Key Changes and Additions
A visual comparison of the original NCCID cleaning pipeline and NCCIDxClean [1]. This figure is a modified version of a figure from our paper, which may be requested below.
- Creatinine
Creatinine units have been made consistent.
- Dates
Excel-formatted dates are handled, and dates are sense checked to ensure a logical order, e.g. death is after admission. Some centers systematically used the wrong date format in their submissions; this has been corrected.
- D-Dimer
D-dimer units have been made consistent.
- EDA
Automated Exploratory Data Analysis (EDA) scripts added.
- FiO2
The units of FiO2 values below 21% have been converted where possible and removed if the conversion is ambiguous.
- Inferences
Where possible, values have been inferred from other values in the data. For example, if a patient has a date of death but the binary death feature is missing, death is set to ‘1’.
- Missing and Unknown Values
Entries of ‘unknown’ or equivalent are retained, using an additional code to distinguish them from missing values.
- PaO2
The PaO2 feature was split into blood gases and oxygen saturations, with oxygen saturations then imputed from any PaO2 values. The original values may be returned depending on the parameters used to run the pipeline.
- Past Medical History
Some of the Past Medical History (PMH) features have become binary (+ ‘Unknown’) due to discrepancies in the submission coding. A significant number of implausible values for the PMH hypertension feature have been removed.
- Sense Checks
Further data sense checking, including enhanced clipping of unrealistic or impossible numerical values.
- Sex
A code error in the ‘Sex’ feature for one hospital has been corrected.
- Truncation
Some numerical features were truncated to ensure consistent maximum / minimum values due to differing laboratory reporting limits between centers, e.g. Troponin I.
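To make a few of the changes listed above more concrete, here is a minimal Python sketch of how such rules could look. It is not taken from the NCCIDxClean code base. Every function name, the `UNKNOWN_CODE` value, the list of ‘unknown’ equivalents and the FiO2 thresholds are assumptions made for illustration only; the actual pipeline may implement these steps differently.

```python
from datetime import date, timedelta
from typing import Optional, Union

# Day zero of Excel's default 1900 date system (valid for serials from 61 upwards).
EXCEL_EPOCH = date(1899, 12, 30)

# Hypothetical code for 'unknown'; the code used by the real pipeline may differ.
UNKNOWN_CODE = -1

# Hypothetical set of strings treated as equivalent to 'unknown'.
UNKNOWN_STRINGS = {"unknown", "not known"}


def excel_serial_to_date(serial: float) -> date:
    """Convert an Excel serial day number to a calendar date."""
    return EXCEL_EPOCH + timedelta(days=int(serial))


def dates_in_order(admission: Optional[date], death: Optional[date]) -> bool:
    """Return False only when both dates are known and death precedes admission."""
    if admission is None or death is None:
        return True
    return death >= admission


def normalise_fio2(value: Optional[float]) -> Optional[float]:
    """Return FiO2 as a percentage, or None when the unit is ambiguous.

    Assumption: room air is 21% oxygen, so a value in [0.21, 1.0] is read as
    a fraction, and any other value below 21 is treated as ambiguous.
    """
    if value is None:
        return None
    if 21 <= value <= 100:
        return float(value)           # already a plausible percentage
    if 0.21 <= value <= 1.0:
        return round(value * 100, 2)  # assumed to be a fraction
    return None                       # ambiguous or impossible: removed


def infer_death(death_flag: Optional[int],
                date_of_death: Optional[date]) -> Optional[int]:
    """Set the binary death feature to 1 when it is missing but a death date exists."""
    if death_flag is None and date_of_death is not None:
        return 1
    return death_flag


def encode_unknown(raw: Optional[str]) -> Union[str, int, None]:
    """Keep 'unknown' entries distinct from missing ones."""
    if raw is None or raw.strip() == "":
        return None                   # genuinely missing
    if raw.strip().lower() in UNKNOWN_STRINGS:
        return UNKNOWN_CODE           # explicitly recorded as unknown
    return raw
```

For example, `normalise_fio2(0.5)` is read as a fraction and converted to a percentage, whereas `normalise_fio2(5)` returns `None` because, under the assumptions above, a value of 5 can be neither a percentage nor a fraction.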
See also
A spreadsheet outlining the features and the changes made in this pipeline versus NCCID Cleaning is available here
A pre-print of our paper is available on request:
A pipeline to further enhance quality, integrity and reusability of the NCCID clinical data. A. Breger, I. Selby, M. Roberts, J. Preller, J.H.F. Rudd, J.A.D. Aston, J.R. Weir-McCall, C.B. Schönlieb on behalf of the AIX-COVNET Collaboration. (under review)