nccidxclean package
Module contents
nccidxclean.nccidxclean module
This module contains the main function for the nccidxclean package to clean NCCID clinical data.
- nccidxclean.nccidxclean.xclean_nccid(input_df, xpipeline='xclean_default', return_cols='spo2_imputed_cleaned_only', collapse_pmh=True, fio2_ltrs_to_percent=True)
Runs the full cleaning pipeline on a pandas dataframe (input_df).
- Parameters:
input_df (pd.DataFrame) – Input dataframe of clinical data from NCCID.
xpipeline (Union[str, Collection], optional, default = 'xclean_default') – Name of a predefined pipeline, or a list/tuple of cleaning functions to apply to the dataframe in order. The default pipeline can be modified with the ‘collapse_pmh’ and ‘fio2_ltrs_to_percent’ parameters. The default pipeline is: ‘xclean_default’ = [xclean.check_new_centres, xclean.column_shift, xclean.remap_hospitals, xclean.remap_sex, xclean.remap_ethnicity, xclean.parse_date_columns, xclean.parse_binary_and_cat, xclean.binarise_lung_csv, xclean.rescale_fio2, nhsx._coerce_numeric_columns, nhsx._remap_test_result_columns, xclean.clean_numeric, xclean.clip_numeric, xclean.fix_headers, xclean.sense_check, xclean.inferences]
return_cols (str, optional, default = 'spo2_imputed_cleaned_only') – Dataframe columns returned by the pipeline. ‘spo2_imputed_cleaned_only’: returns the cleaned columns only, with spo2 imputed from pao2. ‘spo2_imputed_with_original’: returns the original and cleaned columns, with spo2 imputed from pao2. ‘o2_split_cleaned_only’: returns the cleaned columns only, with spo2 and pao2 kept as separate columns. ‘o2_split_with_original’: returns the original and cleaned columns, with spo2 and pao2 kept as separate columns. ‘all_cleaned_only’: returns all possible cleaned columns. ‘all_with_original’: returns all possible original and cleaned columns.
collapse_pmh (bool, optional, default = True) – Whether to collapse the pre-existing medical history values to binary + unknown in the default pipeline.
fio2_ltrs_to_percent (bool, optional, default = True) – Whether to convert FiO2 values in litres to percentages in the default pipeline. This will prevent the P/F ratio being calculated in the inferences step.
- Returns:
Cleaned dataframe.
- Return type:
pd.DataFrame
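The following is a minimal, dependency-free sketch of how an argument such as xpipeline, which accepts either a pipeline name or a collection of functions, might be resolved and applied step by step. It is not the package's implementation: the helper names (strip_whitespace, uppercase_sex, resolve_pipeline, run_pipeline), the 'demo_default' pipeline, and the use of plain dict records in place of a pandas dataframe are all assumptions made for illustration. Only the six return_cols option names and the commented-out call signature are taken from the documentation above.

```python
from typing import Callable, Collection, Dict, List, Union

# With the package installed, the documented call would look like:
#   from nccidxclean.nccidxclean import xclean_nccid
#   cleaned_df = xclean_nccid(raw_df, return_cols="o2_split_cleaned_only")

Record = Dict[str, object]
Step = Callable[[List[Record]], List[Record]]

# Option names as listed for the return_cols parameter.
VALID_RETURN_COLS = (
    "spo2_imputed_cleaned_only",
    "spo2_imputed_with_original",
    "o2_split_cleaned_only",
    "o2_split_with_original",
    "all_cleaned_only",
    "all_with_original",
)


def strip_whitespace(rows: List[Record]) -> List[Record]:
    """Hypothetical cleaning step: trim whitespace from string values."""
    return [
        {k: v.strip() if isinstance(v, str) else v for k, v in row.items()}
        for row in rows
    ]


def uppercase_sex(rows: List[Record]) -> List[Record]:
    """Hypothetical cleaning step: upper-case the 'sex' field if it is a string."""
    return [
        {**row, "sex": row["sex"].upper()} if isinstance(row.get("sex"), str) else row
        for row in rows
    ]


# Hypothetical registry mapping pipeline names to ordered lists of steps.
NAMED_PIPELINES: Dict[str, List[Step]] = {
    "demo_default": [strip_whitespace, uppercase_sex],
}


def resolve_pipeline(xpipeline: Union[str, Collection[Step]]) -> List[Step]:
    """Turn a pipeline name or a collection of functions into a list of steps."""
    if isinstance(xpipeline, str):
        if xpipeline not in NAMED_PIPELINES:
            raise ValueError(f"Unknown pipeline name: {xpipeline!r}")
        return list(NAMED_PIPELINES[xpipeline])
    return list(xpipeline)


def run_pipeline(
    rows: List[Record],
    xpipeline: Union[str, Collection[Step]] = "demo_default",
    return_cols: str = "spo2_imputed_cleaned_only",
) -> List[Record]:
    """Validate the options, then apply each step to the output of the previous one."""
    if return_cols not in VALID_RETURN_COLS:
        raise ValueError(f"Unknown return_cols option: {return_cols!r}")
    for step in resolve_pipeline(xpipeline):
        rows = step(rows)
    return rows
```

Passing a custom list, for example run_pipeline(rows, xpipeline=[strip_whitespace]), runs only the listed steps, which mirrors the documented option of supplying a list/tuple of cleaning functions in place of the default pipeline name.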
nccidxclean.analysis module
This module produces plots and numeric analysis used to compare the nccid-cleaning and nccidxclean pipelines.
- nccidxclean.analysis.main()
Handles command line execution of the analysis module on cleaned NCCID data.
- nccidxclean.analysis.make_figures_and_numeric_analysis(nhs_df, xtd_df, dcm_df)
Produces plots and numeric analysis used to compare the nccid-cleaning and nccidxclean pipelines. Numeric analysis is written to stdout, with plots saved to ./charts.
- Parameters:
nhs_df (pd.DataFrame) – Dataframe containing clinical data cleaned by nccid-cleaning.
xtd_df (pd.DataFrame) – Dataframe containing clinical data cleaned by nccidxclean.
dcm_df (pd.DataFrame) – Dataframe containing DICOM metadata.
- Returns:
Dictionary containing the generated figures.
- Return type:
Dict[int, plt.Figure]
nccidxclean.eda module
Subpackages
- nccidxclean.clean subpackage
- Submodules
- nccidxclean.clean.binary_and_cat module
- nccidxclean.clean.dicts_and_maps module
- nccidxclean.clean.enrich_with_dcm module
- nccidxclean.clean.ethnicity_and_sex module
- nccidxclean.clean.fix_headers_and_order module
- nccidxclean.clean.geh_col_shift module
- nccidxclean.clean.inferences module
- nccidxclean.clean.numeric module
- nccidxclean.clean.parse_dates module
- nccidxclean.clean.remap_hospitals module
- nccidxclean.clean.sense_check module
- nccidxclean.clean.utils module
- nccidxclean.figures subpackage
- nccidxclean.num_analysis subpackage
- nccidxclean.eda subpackage