Pipeline Comparison and Analysis
The charts and calculations made for our write-up may be replicated using the
figures module.
1. Generating Cleaned Data using the NHSx NCCID-Cleaning Pipeline
Before running the module, the data must also be cleaned using the NHSx nccid-cleaning pipeline. The analysis module needs this output so that it can be compared with the data cleaned by the extended pipeline.
Full documentation of how to run the NHSx pipeline is provided on their GitHub
(nhsx/nccid-cleaning); however, you may now run their default pipeline from the command line using:
xclean_run_nhsx_pipeline <base_path> <clinical_subdir> --xray_subdir --ct_subdir --mri_subdir --xray_meta_path --ct_meta_path --mri_meta_path
The arguments let you specify the location of either the imaging DICOM/JSON files or
the CSV files, if the metadata has previously been extracted. Outputs are again saved in ./data/.
The NHSx team have provided Jupyter notebooks should you prefer to run the pipeline step-by-step or encounter any issues.
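If you would rather launch the command from a script, the argument list can be assembled programmatically. The following is a minimal sketch, not part of the package: build_nhsx_command and run_nhsx_pipeline are hypothetical helpers, the example paths are placeholders, and the sketch assumes each optional flag takes a single path value, which the usage line above does not state explicitly.

```python
import subprocess

# Optional flags copied from the usage line above.
OPTIONAL_FLAGS = (
    "xray_subdir", "ct_subdir", "mri_subdir",
    "xray_meta_path", "ct_meta_path", "mri_meta_path",
)


def build_nhsx_command(base_path, clinical_subdir, **options):
    """Return the argument list for xclean_run_nhsx_pipeline (hypothetical helper).

    Assumption: every optional flag is followed by exactly one value.
    """
    unknown = set(options) - set(OPTIONAL_FLAGS)
    if unknown:
        raise ValueError(f"Unknown option(s): {sorted(unknown)}")

    cmd = ["xclean_run_nhsx_pipeline", str(base_path), str(clinical_subdir)]
    # Append flags in a fixed order, skipping any that were not supplied.
    for flag in OPTIONAL_FLAGS:
        value = options.get(flag)
        if value is not None:
            cmd += [f"--{flag}", str(value)]
    return cmd


def run_nhsx_pipeline(base_path, clinical_subdir, **options):
    """Run the command and raise CalledProcessError on a non-zero exit code."""
    cmd = build_nhsx_command(base_path, clinical_subdir, **options)
    return subprocess.run(cmd, check=True)
```

For example, build_nhsx_command("data", "clinical", xray_subdir="xray") produces the argument list without running anything, which makes it easy to inspect the command before passing it to run_nhsx_pipeline.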
2. Running Analysis to Compare Pipelines
To generate both the analysis and figures using the command line:
xclean_analysis <nhsx_cleaned_path> <extended_cleaned_path> --xray_meta_path --ct_meta_path --mri_meta_path
To run from Python:
import nccidxclean as xclean
xclean.analysis(<nhsx_cleaned_path>, <extended_cleaned_path>, xray_meta_path=None, ct_meta_path=None, mri_meta_path=None)
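Because the analysis reads the outputs of both pipelines, it can help to confirm that the inputs exist before starting. The sketch below is one possible wrapper under stated assumptions: missing_paths and run_comparison are hypothetical helpers, not part of nccidxclean, and the only package call used is xclean.analysis with the signature shown above.

```python
from pathlib import Path


def missing_paths(*paths):
    """Return the given paths that do not exist, ignoring None (hypothetical helper)."""
    return [str(p) for p in paths if p is not None and not Path(p).exists()]


def run_comparison(nhsx_cleaned_path, extended_cleaned_path, **meta_paths):
    """Check the inputs, then call xclean.analysis as documented above.

    meta_paths may contain xray_meta_path, ct_meta_path and mri_meta_path;
    values left as None are skipped by the existence check.
    """
    missing = missing_paths(
        nhsx_cleaned_path, extended_cleaned_path, *meta_paths.values()
    )
    if missing:
        raise FileNotFoundError(f"Missing input(s): {missing}")

    # Imported lazily so the path check can run without the package installed.
    import nccidxclean as xclean

    return xclean.analysis(nhsx_cleaned_path, extended_cleaned_path, **meta_paths)
```

Failing early with a list of missing inputs avoids starting the analysis only to hit a path error partway through.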