Key Changes and Additions

A visual comparison of the original NCCID cleaning pipeline and NCCIDxClean [1]. This figure is a modified version of a figure from our paper, which may be requested below.

Creatinine

Creatinine units have been made consistent.

Dates

Excel-formatted dates are handled, and dates are sense-checked to ensure a logical order, e.g. death occurs after admission. Some centers systematically used the wrong date format in their submissions; this has been corrected.
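The two steps described here can be illustrated with a minimal sketch. It is not the pipeline's own code: the function names `parse_date` and `dates_in_order` are hypothetical, only ISO strings and Excel serial numbers (1900 date system) are handled, and a death on the same day as admission is assumed to be valid.

```python
import math
from datetime import date, timedelta
from typing import Optional, Union

# Excel's 1900 date system counts days from this epoch
# (valid for serials >= 61, i.e. dates on or after 1 March 1900).
EXCEL_EPOCH = date(1899, 12, 30)


def parse_date(value: Union[str, int, float, None]) -> Optional[date]:
    """Parse an ISO date string or an Excel serial number (hypothetical helper).

    Returns None when the value is missing or cannot be parsed.
    """
    if value is None:
        return None
    if isinstance(value, (int, float)):
        if math.isnan(value):
            return None
        return EXCEL_EPOCH + timedelta(days=int(value))
    text = str(value).strip()
    if text.isdigit():
        # A purely numeric string is assumed to be an Excel serial number.
        return EXCEL_EPOCH + timedelta(days=int(text))
    try:
        return date.fromisoformat(text)
    except ValueError:
        return None


def dates_in_order(admission: Optional[date], death: Optional[date]) -> bool:
    """Return False only when both dates are known and death precedes admission."""
    if admission is None or death is None:
        return True
    return death >= admission
```

For example, `parse_date(43831)` yields 1 January 2020, and `dates_in_order(date(2020, 4, 1), date(2020, 3, 1))` flags the pair as illogical.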

D-Dimer

D-dimer units have been made consistent.

EDA

Automated Exploratory Data Analysis (EDA) scripts have been added.

FiO2

The units of FiO2 values below 21% have been converted where possible and removed if the conversion is ambiguous.
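One way such a rule could look is sketched below. The thresholds are assumptions made for illustration, not the pipeline's documented rules: values between 0.21 and 1.0 are treated as fractions and converted to percent, values between 21 and 100 are kept as percentages, and everything else is treated as ambiguous and removed. The function name `normalise_fio2` is hypothetical.

```python
from typing import Optional


def normalise_fio2(value: Optional[float]) -> Optional[float]:
    """Return FiO2 as a percentage, or None if missing or ambiguous (illustrative rule)."""
    if value is None:
        return None
    if 21.0 <= value <= 100.0:
        # Already a plausible percentage (room air is 21%).
        return float(value)
    if 0.21 <= value <= 1.0:
        # Assumed to be a fraction; convert to percent.
        return round(value * 100.0, 1)
    # Anything else (e.g. 2, which might be a flow rate) is treated as ambiguous.
    return None
```

Under these assumptions, `normalise_fio2(0.4)` returns `40.0`, `normalise_fio2(28)` returns `28.0`, and `normalise_fio2(2)` returns `None`.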

Inferences

Where possible, values have been inferred from other values in the data. For example, if a patient has a date of death but the binary death feature is missing, death is set to ‘1’.
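The death example can be sketched as follows. The field names `death` and `date_of_death` and the function `infer_death` are hypothetical, chosen only for illustration; the pipeline's actual column names may differ.

```python
from typing import Any, Dict


def infer_death(record: Dict[str, Any]) -> Dict[str, Any]:
    """Set the binary death flag to 1 when it is missing but a date of death exists."""
    out = dict(record)  # copy so the input record is left unchanged
    if out.get("death") is None and out.get("date_of_death") is not None:
        out["death"] = 1
    return out
```

A record such as `{"death": None, "date_of_death": "2020-04-03"}` comes back with `death` set to `1`, while records with no date of death are returned unchanged.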

Missing and Unknown Values

Entries of ‘unknown’ or equivalent are retained using an additional code to distinguish from missing values.
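A minimal sketch of this distinction is given below. The code value `-1`, the token lists, and the function name `encode_binary` are all assumptions made for the example; the pipeline's actual code for 'unknown' is not stated here.

```python
from typing import Optional

UNKNOWN_CODE = -1  # hypothetical code for an explicit 'unknown' entry
UNKNOWN_TOKENS = {"unknown", "unk", "not known"}  # illustrative equivalents
TRUE_TOKENS = {"1", "yes", "true"}
FALSE_TOKENS = {"0", "no", "false"}


def encode_binary(value: Optional[object]) -> Optional[int]:
    """Map a raw entry to 1, 0, UNKNOWN_CODE, or None (missing)."""
    if value is None:
        return None
    text = str(value).strip().lower()
    if text == "":
        return None  # an empty entry is missing, not 'unknown'
    if text in UNKNOWN_TOKENS:
        return UNKNOWN_CODE
    if text in TRUE_TOKENS:
        return 1
    if text in FALSE_TOKENS:
        return 0
    return None  # unrecognised entries are treated as missing in this sketch
```

Keeping a separate code means a downstream analysis can still tell "the submitter did not know" apart from "nothing was entered".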

PaO2

The PaO2 feature was split into blood gases and oxygen saturations, with oxygen saturations then imputed from any PaO2 values. The original values may be returned, depending on the parameters used to run the pipeline.

Past Medical History

Some of the Past Medical History (PMH) features have become binary (+ ‘Unknown’) due to discrepancies in the submission coding. A significant number of implausible values for the PMH hypertension feature have been removed.

Sense Checks

Further data sense checking has been added, including enhanced clipping of unrealistic or impossible numerical values.

Sex

A code error in the ‘Sex’ feature for one hospital has been corrected.

Truncation

Some numerical features were truncated to ensure consistent maximum / minimum values due to differing laboratory reporting limits between centers, e.g. Troponin I.
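The idea can be sketched as a simple clamp to a shared range. The function name `truncate` and the limits used in the example are hypothetical; they are not the reporting limits applied by the pipeline.

```python
from typing import Optional


def truncate(value: Optional[float], lower: float, upper: float) -> Optional[float]:
    """Clamp a value to [lower, upper]; missing values stay missing."""
    if value is None:
        return None
    return max(lower, min(upper, value))


# Illustrative only: suppose one center caps a result at 10000 while
# another reports values up to 25000. Truncating both to the narrower
# shared range makes the two submissions comparable.
SHARED_LOWER, SHARED_UPPER = 3.0, 10000.0
harmonised = [truncate(v, SHARED_LOWER, SHARED_UPPER) for v in [1.0, 50.0, 25000.0, None]]
```

Here `harmonised` holds `[3.0, 50.0, 10000.0, None]`: values outside the shared range are pulled to its edges, and in-range or missing values are left alone.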

See also

A spreadsheet outlining the features and the changes made in this pipeline versus NCCID Cleaning is available here

A pre-print of our paper is available on request:

A pipeline to further enhance quality, integrity and reusability of the NCCID clinical data. A. Breger, I. Selby, M. Roberts, J. Preller, J.H.F. Rudd, J.A.D. Aston, J.R. Weir-McCall, C.B. Schönlieb on behalf of the AIX-COVNET Collaboration. (under review)