Numeric Result Cleaning

New Function: clean_numeric

The component nested functions and their impact on certain numeric fields are explained below:

Nested Function Purpose
_additional_check_for_known_errors
  • CRP on admission, Ferritin and Troponin I:
    • Extracts the value from a known error of the form "[6] - 2020-05-19".
    • Extracts the value if there is a leading "<" or ">" (for example, '<4' becomes '4').
  • Temperature on admission:
    • Extracts the value from a string that contains additional information; for example, it returns '37.4' from '37.4 (36 hours post admission, 1st available)'.
_raise_if_date_in_numerical_column
  • During development, one patient was found to have a date in the lymphocyte field (1/3/20) and another in the Troponin T field. Although this affects only a small number of values, an error is raised so that the data can be checked manually.
_merge_o2_sat_into_po2
  • The 'o2_saturation' column contains only one value, which is now merged into PaO2/SpO2.
_split_and_clean_pao2
  • Where a centre has submitted a blood gas PaO2, this is converted from kPa to a percentage, then stored in 'pao2_gas'. Where a centre has submitted an SpO2, this is saved in 'pao2_saturation'.
_impute_pao2_spo2
  • Imputes values of SpO2 from PaO2 values using the equation shown below.
_clean_creatinine
  • Creatinine data appears to have been submitted in multiple units (mg/dL and µmol/L). This function attempts to standardise the units to µmol/L.
_truncate_numerical_field
  • Truncates fields to account for the maximum and minimum values handled by the machines at some institutions. Further details are provided below.
_infer_ddimer_units
  • If a centre was not in the development data, the units for the D-Dimer are inferred by this function.
_clean_d_dimer
  • All values are converted to ng/mL FEU.
_merge_Ferritin_2
  • Ferritin_2 column is merged with 'Ferritin'.
_check_sheffield_trops
  • The units specified by Sheffield to NHSx for their Troponin I values do not fit with those in the data. This function warns the developer to check these values are consistent with those of other centres and correct/exclude this data accordingly.
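
The string handling described for _additional_check_for_known_errors and _raise_if_date_in_numerical_column can be illustrated with a minimal sketch. The function names and regular expressions below are hypothetical and written only from the examples given in the table; they are not the patterns used inside clean_numeric.

```python
import re

# Hypothetical patterns, inferred only from the examples in the table above.
_BRACKETED = re.compile(r"^\[\s*([^\]]+?)\s*\]\s*-\s*\d{4}-\d{2}-\d{2}$")
_LEADING_NUMBER = re.compile(r"^[<>]?\s*(\d+(?:\.\d+)?)(?:\s.*)?$")
_DATE_LIKE = re.compile(r"^\d{1,2}/\d{1,2}/\d{2,4}$")


def extract_known_error(raw):
    """Return the numeric part of a known malformed entry, else the stripped input."""
    text = raw.strip()
    bracketed = _BRACKETED.match(text)
    if bracketed:
        # "[6] - 2020-05-19" style: keep what is inside the brackets.
        return bracketed.group(1)
    leading = _LEADING_NUMBER.match(text)
    if leading:
        # "<4", ">4" or "37.4 (free text)": keep the leading number only.
        return leading.group(1)
    return text


def looks_like_date(raw):
    """True for entries such as '1/3/20' that should be flagged for manual checking."""
    return bool(_DATE_LIKE.match(raw.strip()))
```

Because the leading-number pattern only accepts whitespace after the number, an entry such as '1/3/20' is left untouched by extract_known_error and can still be caught by looks_like_date.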

Significantly Impacted Fields

PaO2 / SpO2

  • PaO2 versus SpO2:

    • When requesting data, NHSx asked for PaO2 (Partial Pressure of Oxygen) with the vital signs rather than the more commonly collected SpO2 (Oxygen Saturation). As a result, some centres submitted PaO2 values from arterial blood gases (ABGs) and others submitted SpO2 values from a pulse oximeter.

    • Typical values of PaO2 (in kPa) and SpO2 (in %) differ by roughly a factor of 10, which allows centres that submitted a PaO2 to be distinguished from those that submitted an SpO2.

    • 3 hospitals completed this field in kPa exclusively.

  • Other corrections:

    • Royal United Hospitals Bath: (1) put some FiO2 values in the wrong column (e.g. 0.21, 21, 26, 38, 50); and (2) entered some blood gas values (PaO2).

    • Ealing Hospital and Ashford and St Peters appear to have entered FiO2 values.

    • Oxford University Hospitals and Liverpool Heart and Chest Hospital have entered SpO2 as a fraction rather than a percentage, i.e. 0.XX where XX is the SpO2 as a percentage.

  • Solutions:

    • Integer values from 21 to 50 inclusive are taken as FiO2 if FiO2 is blank. If FiO2 is not blank a warning is raised.

    • Any common oxygen fractions <=0.5 (e.g. 0.21) are assumed to be FiO2 if FiO2 is blank. If FiO2 is not blank a warning is raised.

    • Any values for which 0.5<=PaO2<=1 holds are assumed to be SpO2 expressed as a fraction of 1 and are multiplied by 100.

    • The PaO2 column is then split into pao2_gas and spo2_saturation.

    • SpO2 is imputed from PaO2 values to merge the columns using the following equation[1]:

      \[SpO_2 = \left(\frac{28.6025^3}{{PaO_{2}}^{3}}+0.99 \right)^{-1} \]
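
As a rough illustration of the rules and the equation above, the sketch below labels a single raw value from the PaO2 column and applies the imputation formula literally. The function names, the returned labels and the "needs_review" outcome are hypothetical. Three points are assumptions rather than documented behaviour: any value strictly between 0 and 0.5 is treated as a "common oxygen fraction", a value of exactly 0.5 is treated as a fractional SpO2, and a warning is issued whenever a FiO2-like value is found while FiO2 is already populated. The units expected by the equation are not stated in this section; the size of the constant suggests PaO2 in mmHg with SpO2 returned as a fraction, but that is only a reading of the numbers.

```python
import warnings


def classify_pao2_entry(value, fio2_is_blank):
    """Label one raw PaO2-column value; returns a (label, value) pair."""
    looks_like_fio2_percent = float(value).is_integer() and 21 <= value <= 50
    looks_like_fio2_fraction = 0 < value < 0.5  # assumption: any such fraction counts
    if looks_like_fio2_percent or looks_like_fio2_fraction:
        if fio2_is_blank:
            return ("fio2", value)
        # Assumption: a populated FiO2 means the entry needs manual review.
        warnings.warn(f"FiO2-like value {value} found but FiO2 is not blank")
        return ("needs_review", value)
    if 0.5 <= value <= 1:
        # SpO2 entered as a fraction of 1: rescale to a percentage.
        return ("spo2_percent", value * 100)
    return ("unchanged", value)


def impute_spo2(pao2):
    """Apply the imputation equation exactly as written above."""
    return 1.0 / (28.6025 ** 3 / pao2 ** 3 + 0.99)
```

For example, classify_pao2_entry(0.21, fio2_is_blank=True) yields the "fio2" label, while classify_pao2_entry(0.97, fio2_is_blank=True) yields "spo2_percent" with the value rescaled to a percentage.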

Creatinine

  • A handful of values are less than 20 and unlikely to be in SI units (µmol/L):

    • Some are less than 0.5 and appear to be in mmol/L rather than µmol/L.

    • Others between 0.5 and 20.0 are decimals (and therefore unlikely to be µmol/L), but too large for mmol/L. These could be errors (e.g. decimal placement) or in mg/dL; however, they did not appear consistent with mg/dL. These values are clipped in the clip_numeric function.
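
The thresholds described above can be written down as a small classification helper. This is a sketch only: the function name and labels are hypothetical, the handling of values exactly equal to 0.5 or 20 is an assumption, and no unit conversion is attempted because the text does not say how _clean_creatinine treats each band.

```python
def creatinine_band(value):
    """Place a raw creatinine value into one of the bands described above."""
    if value < 0.5:
        # Small enough to look like mmol/L rather than µmol/L.
        return "possible_mmol_per_l"
    if value < 20:
        # Too large for mmol/L, too small for µmol/L: left for clipping or review.
        return "ambiguous"
    return "umol_per_l"
```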


D-Dimer

  • D-Dimer may be in DDU (D-Dimer Units) or FEU (Fibrinogen Equivalent Units). The standard is now FEU and use of this was confirmed with the labs.

  • Some centres had values orders of magnitude different from the others as they used any of: ng/mL, μg/mL, mg/L, g/L. For all of the centres in the development data, this was checked by telephone.

  • At some centres, it was apparent that the machine had a maximum reportable value above which results were truncated (e.g. a centre had multiple results at exactly 10,000). The lowest such maximum identified across laboratories was 10,000, so all results were truncated to this value, as if every machine shared this maximum.
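
The unit standardisation and truncation described above can be sketched as follows. The function name and the unit labels used as dictionary keys are hypothetical; the conversion factors are the standard metric relationships between the listed units. The sketch assumes values are already reported in FEU and does not attempt a DDU to FEU conversion.

```python
# ng/mL equivalents of the units listed above (1 mg/L = 1 ug/mL = 1,000 ng/mL).
NG_PER_ML_FACTOR = {
    "ng/mL": 1.0,
    "ug/mL": 1_000.0,
    "mg/L": 1_000.0,
    "g/L": 1_000_000.0,
}

D_DIMER_MAX = 10_000.0  # lowest machine maximum identified, in ng/mL


def d_dimer_to_ng_ml(value, unit):
    """Convert a D-Dimer value to ng/mL and cap it at the shared maximum."""
    converted = value * NG_PER_ML_FACTOR[unit]
    return min(converted, D_DIMER_MAX)
```

A centre reporting 0.5 mg/L would therefore be stored as 500 ng/mL, and any converted result above 10,000 ng/mL would be stored as 10,000.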


Truncated fields

Feature Minimum Maximum
crp_on_admission 4
d-dimer_on_admission 10,000
ferritin 15,000
troponin_i 10
troponin_t 5 25,000
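
A minimal sketch of how such bounds might be applied to a single value is shown below. The helper name is hypothetical and is not the signature of _truncate_numerical_field; either bound may be omitted, which mirrors rows in the table that list only one limit. The usage example relies on the troponin_t row, which lists both a minimum (5) and a maximum (25,000).

```python
def truncate(value, minimum=None, maximum=None):
    """Clamp value to the given bounds; a bound of None is not applied."""
    if minimum is not None and value < minimum:
        return minimum
    if maximum is not None and value > maximum:
        return maximum
    return value


# troponin_t bounds taken from the table above.
troponin_t_low = truncate(3, minimum=5, maximum=25_000)
troponin_t_high = truncate(40_000, minimum=5, maximum=25_000)
```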