Dates
Original NCCID Cleaning Pipeline
NCCID Function: _parse_date_columns
Converts fields expected in US date format (MM/DD/YY) into pd.datetime dates. Dates are extracted from entries with the known error form ‘[Text] - YYYY-MM-DD’. Other known errors (e.g., entries of ‘.’ or ‘ ’) and unknown errors are parsed as pd.NaT.
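The parsing rules above can be sketched per entry as follows. This is a minimal illustration, assuming raw entries arrive as strings; the helper names `parse_us_date` and `parse_date_columns_sketch` are hypothetical and are not the pipeline's actual code.

```python
import re

import pandas as pd

# Matches the known error form "[Text] - YYYY-MM-DD" and captures the ISO date.
EMBEDDED_DATE = re.compile(r"-\s*(\d{4}-\d{2}-\d{2})\s*$")


def parse_us_date(value):
    """Parse one raw entry into a pd.Timestamp, or pd.NaT if it cannot be parsed."""
    if not isinstance(value, str):
        # Missing or non-string entries cannot be parsed under these rules.
        return pd.NaT
    value = value.strip()
    match = EMBEDDED_DATE.search(value)
    if match:
        # Known error form: keep only the trailing ISO date.
        return pd.to_datetime(match.group(1), format="%Y-%m-%d")
    # Expected US format MM/DD/YY; '.', blank and unknown entries become NaT.
    return pd.to_datetime(value, format="%m/%d/%y", errors="coerce")


def parse_date_columns_sketch(df, columns):
    """Hypothetical stand-in for _parse_date_columns: parse each listed column."""
    out = df.copy()
    for col in columns:
        out[col] = pd.to_datetime(out[col].map(parse_us_date))
    return out
```

For example, `parse_date_columns_sketch(df, ["d"])` turns `"03/15/20"` into 15 March 2020, `"Positive - 2020-04-02"` into 2 April 2020, and `"."` into `NaT`.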
NCCIDxClean
New Function: parse_date_columns
Conversion of dates stored as numbers in the Excel date format, which previously would have been lost (set to np.nan).
Swab dates for three centres are converted to the correct format, as these were identified as being in UK format (DD/MM) rather than the expected US format (MM/DD).
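A minimal sketch of both fixes, assuming numeric entries are Excel serial dates in Excel's default 1900 date system, and that the affected centres use a day-first layout with a two-digit year (DD/MM/YY). The `centre` column, the centre names and the helper functions are hypothetical placeholders; the three affected centres are not identified here.

```python
import pandas as pd

# Origin for Excel's default 1900 date system (see the note below the code).
EXCEL_ORIGIN = pd.Timestamp("1899-12-30")


def parse_date_value(value, uk_format=False):
    """Parse one entry; numbers are treated as Excel serials, strings as dates."""
    try:
        serial = float(value)
    except (TypeError, ValueError):
        serial = None
    if serial is not None and not pd.isna(serial):
        # Numeric entry (or numeric string): assumed to be an Excel serial date.
        return pd.to_datetime(serial, unit="D", origin=EXCEL_ORIGIN)
    if not isinstance(value, str):
        return pd.NaT
    # Assumed layouts: day-first for the affected centres, US month-first otherwise.
    fmt = "%d/%m/%y" if uk_format else "%m/%d/%y"
    return pd.to_datetime(value.strip(), format=fmt, errors="coerce")


def parse_swab_dates(df, uk_centres, column="date_of_positive_covid_swab",
                     centre_column="centre"):
    """Parse `column`, using day-first parsing only for rows from `uk_centres`.

    `centre_column` and the centre names are hypothetical placeholders.
    """
    out = df.copy()
    is_uk = out[centre_column].isin(uk_centres)
    parsed = [parse_date_value(v, uk_format=uk) for v, uk in zip(out[column], is_uk)]
    out[column] = pd.to_datetime(pd.Series(parsed, index=out.index))
    return out
```

Using 1899-12-30 as the origin, rather than 1900-01-01, compensates for Excel treating 1900 as a leap year, so serials from March 1900 onwards map to the correct day (e.g. 43905 becomes 15 March 2020).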
Cleaning of date_of_positive_covid_swab to change the entry to the earliest positive swab date provided, as it was noted that for a substantial number of patients:
Their ‘positive’ swab date was equal to their PCR result date rather than the acquisition date; and/or
Their ‘positive’ swab date was equal to the date of their second PCR rather than the first.
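The earliest-date rule can be sketched as a row-wise minimum over candidate dates. This assumes each PCR swab is stored as a pair of columns (an acquisition date and a result text equal to "Positive" when positive), already parsed to datetimes; these column names, the result encoding and the `earliest_positive_swab` helper are hypothetical.

```python
import pandas as pd


def earliest_positive_swab(df, swabs, target="date_of_positive_covid_swab"):
    """Set `target` to the earliest date among the recorded date and positive swabs.

    `swabs` is a list of (date_column, result_column) pairs; these column names
    are hypothetical placeholders, not confirmed NCCID field names.
    """
    out = df.copy()
    # The currently recorded date is kept as one candidate.
    candidates = [out[target]]
    for date_col, result_col in swabs:
        # Assumed result encoding: the text "positive" (case-insensitive).
        is_positive = out[result_col].astype(str).str.lower().eq("positive")
        # Keep the acquisition date only where that swab was positive.
        candidates.append(out[date_col].where(is_positive))
    # Row-wise minimum; NaT values are skipped, so missing swabs do not matter.
    out[target] = pd.concat(candidates, axis=1).min(axis=1, skipna=True)
    return out
```

Keeping the recorded date as a candidate means patients without per-swab data keep their existing value, while a recorded result date or second-PCR date is replaced whenever an earlier positive acquisition date is available.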