Dates

Original NCCID Cleaning Pipeline

NCCID Function: _parse_date_columns

  1. Converts fields expected to be in US date format (MM/DD/YY) into pandas datetime values.

  2. Extracts dates from entries with the known error pattern ‘[Text] - YYYY-MM-DD’.

  3. Sets other known errors (e.g., entries of ‘.’ or ‘ ’) and any unrecognised values to pd.NaT.
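The three steps above can be sketched in pandas as follows. This is a minimal illustration under stated assumptions, not the NCCID code: the helper names `parse_us_date` and `parse_date_columns_original_sketch` and the regular expression used for the ‘[Text] - YYYY-MM-DD’ pattern are hypothetical.

```python
import re

import pandas as pd

# ASSUMPTION: the known error pattern is free text, then " - ", then an ISO date,
# e.g. "Swab - 2020-04-02". The exact NCCID regular expression may differ.
KNOWN_ERROR_PATTERN = re.compile(r"^.*\s-\s(\d{4}-\d{2}-\d{2})$")


def parse_us_date(value):
    """Parse one raw entry into a pd.Timestamp, or pd.NaT if it cannot be parsed."""
    # Non-strings (including numbers) are not handled here and become NaT.
    if not isinstance(value, str):
        return pd.NaT
    value = value.strip()
    # Known error: blank entries such as ' ' become NaT.
    if not value:
        return pd.NaT
    # Step 2: pull the ISO date out of '[Text] - YYYY-MM-DD' entries.
    match = KNOWN_ERROR_PATTERN.match(value)
    if match:
        return pd.to_datetime(match.group(1), format="%Y-%m-%d")
    # Step 1: expected US format MM/DD/YY.
    # Step 3: anything else ('.', unknown errors) is coerced to NaT.
    return pd.to_datetime(value, format="%m/%d/%y", errors="coerce")


def parse_date_columns_original_sketch(df, columns):
    """Return a copy of df with the given columns converted to datetime64."""
    out = df.copy()
    for col in columns:
        out[col] = pd.to_datetime(out[col].map(parse_us_date))
    return out
```

Note that in this sketch a date stored as a number (for example an Excel serial date) falls through to NaT, which mirrors the data loss described for the original pipeline.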


NCCIDxClean

New Function: parse_date_columns

  1. Converts dates stored as numbers in Excel's serial date format, which previously would have been lost (set to np.nan).

  2. Converts swab dates from three centres to the correct format. These were found to be in UK format (DD/MM/YY) rather than the expected US format (MM/DD/YY).

  3. Cleans date_of_positive_covid_swab by setting each entry to the earliest positive swab date provided, since for a substantial number of patients:

    • Their ‘positive’ swab date was equal to their PCR result date rather than the acquisition date; and/or

    • Their ‘positive’ swab date was equal to the date of their second PCR test rather than their first.
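The three changes listed above could be combined roughly as in the following minimal pandas sketch; it is not the NCCIDxClean implementation. The centre identifiers in `UK_FORMAT_CENTRES`, the `centre` column name, the helper names and the choice of candidate swab-date columns are all hypothetical placeholders. Excel serial numbers are converted with the 1899-12-30 origin of Excel's 1900 date system, which is correct for dates from March 1900 onwards.

```python
import numbers
import re

import pandas as pd

# HYPOTHETICAL: the real identifiers of the three UK-format centres are not given here.
UK_FORMAT_CENTRES = {"centre_a", "centre_b", "centre_c"}
# Origin of Excel's 1900 date system (serial 43922 -> 2020-04-01).
EXCEL_EPOCH = "1899-12-30"
# ASSUMPTION: same '[Text] - YYYY-MM-DD' known error pattern as the original pipeline.
KNOWN_ERROR_PATTERN = re.compile(r"^.*\s-\s(\d{4}-\d{2}-\d{2})$")


def parse_date(value, day_first=False):
    """Parse one raw entry into a pd.Timestamp, or pd.NaT if it cannot be parsed."""
    # Change 1: Excel serial numbers, stored as numbers or as digit-only strings.
    if isinstance(value, numbers.Real) and not isinstance(value, bool) and not pd.isna(value):
        return pd.to_datetime(value, unit="D", origin=EXCEL_EPOCH)
    if not isinstance(value, str) or not value.strip():
        return pd.NaT
    value = value.strip()
    if value.isdigit():
        return pd.to_datetime(int(value), unit="D", origin=EXCEL_EPOCH)
    match = KNOWN_ERROR_PATTERN.match(value)
    if match:
        return pd.to_datetime(match.group(1), format="%Y-%m-%d")
    # Change 2: UK-format (DD/MM/YY) entries are parsed day-first.
    fmt = "%d/%m/%y" if day_first else "%m/%d/%y"
    return pd.to_datetime(value, format=fmt, errors="coerce")


def parse_date_columns_sketch(df, date_cols, swab_cols, centre_col="centre",
                              target_col="date_of_positive_covid_swab"):
    """Parse date_cols and reset target_col to the earliest swab date.

    swab_cols must be a subset of date_cols and must include target_col.
    """
    out = df.copy()
    is_uk_centre = out[centre_col].isin(UK_FORMAT_CENTRES)
    for col in date_cols:
        # Only swab dates from the UK-format centres are parsed day-first.
        day_first = is_uk_centre & (col in swab_cols)
        parsed = [parse_date(v, uk) for v, uk in zip(out[col], day_first)]
        out[col] = pd.to_datetime(pd.Series(parsed, index=out.index))
    # Change 3: earliest available swab date per row (NaT values are skipped).
    out[target_col] = out[list(swab_cols)].min(axis=1)
    return out
```

Taking a row-wise minimum means a later PCR result date or second-test date recorded as the ‘positive’ swab date is replaced whenever an earlier swab date is available for that patient.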