nccidxclean package

Module contents

nccidxclean.nccidxclean module

This module contains the main function for the nccidxclean package to clean NCCID clinical data.

nccidxclean.nccidxclean.main()

Handles command line execution of the cleaning pipeline.

nccidxclean.nccidxclean.xclean_nccid(input_df, xpipeline='xclean_default', return_cols='spo2_imputed_cleaned_only', collapse_pmh=True, fio2_ltrs_to_percent=True)

Runs the full cleaning pipeline on a pandas dataframe (input_df).

Parameters:
  • input_df (pd.DataFrame) – Input dataframe of clinical data from NCCID.

  • xpipeline (Union[str, Collection], optional, default = 'xclean_default') – Either the name of a predefined pipeline or a list/tuple of cleaning functions to apply to the dataframe. The default pipeline can be adjusted with the ‘collapse_pmh’ and ‘fio2_ltrs_to_percent’ parameters. The default pipeline is: ‘xclean_default’ = [xclean.check_new_centres, xclean.column_shift, xclean.remap_hospitals, xclean.remap_sex, xclean.remap_ethnicity, xclean.parse_date_columns, xclean.parse_binary_and_cat, xclean.binarise_lung_csv, xclean.rescale_fio2, nhsx._coerce_numeric_columns, nhsx._remap_test_result_columns, xclean.clean_numeric, xclean.clip_numeric, xclean.fix_headers, xclean.sense_check, xclean.inferences]

  • return_cols (str, optional, default = 'spo2_imputed_cleaned_only') – Dataframe columns returned by the pipeline:
      ◦ ‘spo2_imputed_cleaned_only’: returns the cleaned columns only, with spo2 imputed from pao2.
      ◦ ‘spo2_imputed_with_original’: returns the original and cleaned columns, with spo2 imputed from pao2.
      ◦ ‘o2_split_cleaned_only’: returns the cleaned columns only, with spo2 and pao2 as separate columns.
      ◦ ‘o2_split_with_original’: returns the original and cleaned columns, with spo2 and pao2 as separate columns.
      ◦ ‘all_cleaned_only’: returns all possible cleaned columns.
      ◦ ‘all_with_original’: returns all possible original and cleaned columns.

  • collapse_pmh (bool, optional, default = True) – Whether to collapse the pre-existing medical history values to binary values plus ‘unknown’ in the default pipeline.

  • fio2_ltrs_to_percent (bool, optional, default = True) – Whether to convert FiO2 values recorded in litres to percentages in the default pipeline. This prevents the P/F ratio from being calculated in the inferences step.
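For context on the FiO2 and P/F ratio parameters: the P/F ratio is PaO2 divided by FiO2 expressed as a fraction (0.21 to 1.0), so it cannot be computed directly from an oxygen flow rate in litres. The sketch below assumes the widely quoted bedside approximation for low-flow nasal cannula (about 21% plus 4 percentage points per L/min). That rule and both function names are assumptions for illustration; this is not necessarily what `xclean.rescale_fio2` or the inferences step do.

```python
def litres_to_fio2_percent(flow_l_min: float) -> float:
    """Approximate FiO2 (%) from a supplemental O2 flow rate in L/min.

    Assumption: the common bedside rule of thumb for low-flow nasal
    cannula (~21% room air plus ~4 percentage points per L/min).
    This is NOT necessarily the rule used by nccidxclean.
    """
    if flow_l_min < 0:
        raise ValueError("flow rate must be non-negative")
    # Cap at 100%: the approximation breaks down at high flow rates.
    return min(21.0 + 4.0 * flow_l_min, 100.0)


def pf_ratio(pao2: float, fio2_percent: float) -> float:
    """P/F ratio: PaO2 divided by FiO2 expressed as a fraction.

    The result is in whatever unit PaO2 is given in (kPa or mmHg).
    """
    fio2_fraction = fio2_percent / 100.0
    if not 0.21 <= fio2_fraction <= 1.0:
        raise ValueError(f"implausible FiO2: {fio2_percent}%")
    return pao2 / fio2_fraction


# Example: 2 L/min via nasal cannula is roughly 29% FiO2 under this rule.
fio2 = litres_to_fio2_percent(2)
print(fio2, round(pf_ratio(10.0, fio2), 1))
```

Because the litres-to-percent step is only an estimate, any P/F ratio derived from it inherits that uncertainty.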

Returns:

Cleaned dataframe.

Return type:

pd.DataFrame
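The documented type of `xpipeline`, `Union[str, Collection]`, describes the pipeline as an ordered collection of cleaning functions. The sketch below illustrates only that general idea: each step is assumed to take and return a `pd.DataFrame`, and the steps are applied in order. The step functions and the `sex` and `age` columns are hypothetical stand-ins, not the real `xclean` functions, and `xclean_nccid` may run its steps differently.

```python
from functools import reduce
from typing import Callable, Sequence

import pandas as pd

# A cleaning step maps a dataframe to a new dataframe.
Step = Callable[[pd.DataFrame], pd.DataFrame]


def remap_sex_example(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical step: normalise free-text sex values to 'F'/'M'/'Unknown'."""
    mapping = {"female": "F", "f": "F", "male": "M", "m": "M"}
    out = df.copy()  # never mutate the caller's dataframe
    out["sex"] = out["sex"].str.strip().str.lower().map(mapping).fillna("Unknown")
    return out


def clip_age_example(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical step: coerce age to numeric and blank out implausible values."""
    out = df.copy()
    age = pd.to_numeric(out["age"], errors="coerce")  # non-numbers become NaN
    out["age"] = age.where(age.between(0, 120))  # out-of-range ages become NaN
    return out


def run_pipeline(df: pd.DataFrame, steps: Sequence[Step]) -> pd.DataFrame:
    """Apply each step in order, feeding the output of one into the next."""
    return reduce(lambda acc, step: step(acc), steps, df)


raw = pd.DataFrame({"sex": [" Female", "m", "?"], "age": ["54", "abc", "150"]})
cleaned = run_pipeline(raw, [remap_sex_example, clip_age_example])
print(cleaned)
```

Keeping every step as a pure dataframe-to-dataframe function makes it easy to reorder, drop or insert steps, which is the kind of flexibility a collection-typed pipeline parameter offers.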

nccidxclean.analysis module

This module produces plots and numeric analysis used to compare the nccid-cleaning and nccidxclean pipelines.

nccidxclean.analysis.main()

Handles command line execution of the analysis module on cleaned NCCID data.

nccidxclean.analysis.make_figures_and_numeric_analysis(nhs_df, xtd_df, dcm_df)

Produces plots and numeric analysis used to compare the nccid-cleaning and nccidxclean pipelines. Numeric analysis is written to stdout, with plots saved to ./charts.

Parameters:
  • nhs_df (pd.DataFrame) – Dataframe containing clinical data cleaned by nccid-cleaning.

  • xtd_df (pd.DataFrame) – Dataframe containing clinical data cleaned by nccidxclean.

  • dcm_df (pd.DataFrame) – Dataframe containing DICOM metadata.

Returns:

Dictionary of figures, keyed by integer.

Return type:

Dict[int, plt.Figure]
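As an illustration of the kind of numeric comparison that can be drawn between the two cleaned dataframes, the sketch below computes the fraction of missing values per shared column for each pipeline. The helper name, the missingness metric and the example columns are assumptions for illustration; the analyses and plots actually produced by `make_figures_and_numeric_analysis` are not reproduced here.

```python
import pandas as pd


def compare_missingness(nhs_df: pd.DataFrame, xtd_df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: fraction of missing values per shared column.

    Only columns present in both dataframes are compared.
    """
    shared = nhs_df.columns.intersection(xtd_df.columns)
    summary = pd.DataFrame(
        {
            "nccid_cleaning": nhs_df[shared].isna().mean(),
            "nccidxclean": xtd_df[shared].isna().mean(),
        }
    )
    # Positive values mean nccidxclean leaves fewer missing values in that column.
    summary["difference"] = summary["nccid_cleaning"] - summary["nccidxclean"]
    return summary.sort_values("difference", ascending=False)


nhs = pd.DataFrame({"age": [50, None, None, 70], "sex": ["F", "M", None, "F"]})
xtd = pd.DataFrame({"age": [50, 61, None, 70], "sex": ["F", "M", None, "F"]})
result = compare_missingness(nhs, xtd)
print(result)
```

Restricting the comparison to shared columns avoids penalising either pipeline for columns the other does not produce.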

nccidxclean.analysis.print_df_as_latex(df, caption, label)

Prints a dataframe as a LaTeX table.

Parameters:
  • df (pd.DataFrame) – dataframe to print

  • caption (str) – caption for the table

  • label (str) – label for the table

Return type:

None
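A minimal sketch of printing a dataframe as a LaTeX table with a caption and label. It builds the `table`/`tabular` environment by hand, escaping LaTeX special characters such as `_` (common in column names), instead of calling `DataFrame.to_latex`, which needs the optional Jinja2 dependency in recent pandas versions. The function names `latex_escape`, `df_to_latex_table` and `print_latex_table` are hypothetical, and this is not necessarily how `print_df_as_latex` is implemented.

```python
import pandas as pd

# Characters with special meaning in LaTeX and their escaped forms.
_LATEX_SPECIALS = {
    "\\": r"\textbackslash{}",
    "&": r"\&",
    "%": r"\%",
    "$": r"\$",
    "#": r"\#",
    "_": r"\_",
    "{": r"\{",
    "}": r"\}",
    "~": r"\textasciitilde{}",
    "^": r"\textasciicircum{}",
}


def latex_escape(value: object) -> str:
    """Escape LaTeX special characters one character at a time."""
    return "".join(_LATEX_SPECIALS.get(ch, ch) for ch in str(value))


def df_to_latex_table(df: pd.DataFrame, caption: str, label: str) -> str:
    """Render df (including its index) as a LaTeX table environment."""
    headers = [latex_escape(df.index.name or "")] + [latex_escape(c) for c in df.columns]
    lines = [
        r"\begin{table}[ht]",
        r"\centering",
        r"\begin{tabular}{" + "l" * len(headers) + "}",
        r"\hline",
        " & ".join(headers) + r" \\",
        r"\hline",
    ]
    # One LaTeX row per dataframe row, with the index value first.
    for idx, row in zip(df.index, df.itertuples(index=False)):
        cells = [latex_escape(idx)] + [latex_escape(v) for v in row]
        lines.append(" & ".join(cells) + r" \\")
    lines += [
        r"\hline",
        r"\end{tabular}",
        rf"\caption{{{latex_escape(caption)}}}",
        rf"\label{{{label}}}",
        r"\end{table}",
    ]
    return "\n".join(lines)


def print_latex_table(df: pd.DataFrame, caption: str, label: str) -> None:
    """Hypothetical stand-in: print the table to stdout and return None."""
    print(df_to_latex_table(df, caption, label))


example = pd.DataFrame({"spo2_pct": [97, 88]}, index=pd.Index(["a", "b"], name="site"))
print_latex_table(example, caption="SpO2 by site", label="tab:spo2")
```

Separating string building from printing keeps the rendering testable, while the printing wrapper returns `None` as a print-style function normally does.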

nccidxclean.eda module

Subpackages