nccidxclean.num_analysis subpackage

Submodules

nccidxclean.num_analysis.categorical

Generates an analysis of the categorical fields of the NCCID data cleaned using the nccidxclean pipeline. Outputs are printed to stdout.

nccidxclean.num_analysis.categorical.analyse_age(x_df, x_pos)

Generates an analysis of the age feature and prints it to stdout.

Parameters:
  • x_df (pd.DataFrame) – dataframe of numeric variables cleaned with the nccidxclean tool

  • x_pos (pd.DataFrame) – dataframe of numeric variables for patients with a positive PCR, cleaned with the nccidxclean tool

Returns:

None

Return type:

None

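The docstring does not say which statistics analyse_age prints. As an illustration only, a comparison of the age distribution for all patients against PCR-positive patients can be sketched with pandas; the helper age_summary and the column name "age" are hypothetical, and unparseable entries are assumed to be treated as missing.

```python
import pandas as pd


def age_summary(x_df: pd.DataFrame, x_pos: pd.DataFrame, age_col: str = "age") -> pd.DataFrame:
    """Hypothetical helper: describe() statistics of age for all vs PCR-positive patients."""
    # Coerce to numbers so stray text entries become NaN instead of raising
    all_ages = pd.to_numeric(x_df[age_col], errors="coerce")
    pos_ages = pd.to_numeric(x_pos[age_col], errors="coerce")
    return pd.DataFrame(
        {"all": all_ages.describe(), "pcr_positive": pos_ages.describe()}
    ).round(1)


x_df = pd.DataFrame({"age": [30, 50, "unknown"]})
x_pos = pd.DataFrame({"age": [50]})
print(age_summary(x_df, x_pos))
```

Using describe() for both groups keeps them aligned on the same statistics (count, mean, std, quartiles), so differences between the full cohort and the positive subset read directly across each row.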
nccidxclean.num_analysis.categorical.analyse_pmh_cvs(x_pos, n_pos)

Generates an analysis of the PMH CVS disease feature (past medical history of cardiovascular disease) and prints it to stdout.

Parameters:
  • x_pos (pd.DataFrame) – dataframe of numeric variables for patients with a positive PCR, cleaned with the nccidxclean tool

  • n_pos (pd.DataFrame) – dataframe of numeric variables for patients with a positive PCR, cleaned with the nccid-cleaning tool

Returns:

None

Return type:

None

nccidxclean.num_analysis.categorical.analyse_pmh_htn(x_pos, n_pos)

Generates an analysis of the PMH hypertension feature (past medical history of hypertension) and prints it to stdout.

Parameters:
  • x_pos (pd.DataFrame) – dataframe of numeric variables for patients with a positive PCR, cleaned with the nccidxclean tool

  • n_pos (pd.DataFrame) – dataframe of numeric variables for patients with a positive PCR, cleaned with the nccid-cleaning tool

Returns:

None

Return type:

None

nccidxclean.num_analysis.categorical.analyse_sex(x_df, x_pos, n_pos)

Generates an analysis of the sex feature and prints it to stdout.

Parameters:
  • x_df (pd.DataFrame) – dataframe of numeric variables cleaned with the nccidxclean tool

  • x_pos (pd.DataFrame) – dataframe of numeric variables for patients with a positive PCR, cleaned with the nccidxclean tool

  • n_pos (pd.DataFrame) – dataframe of numeric variables for patients with a positive PCR, cleaned with the nccid-cleaning tool

Returns:

None

Return type:

None

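A side-by-side view is one way to compare a categorical field across the three dataframes that analyse_sex receives. The sketch below is not the package's implementation: the helper compare_categorical, the column name "sex" and the "Unknown" label for missing entries are all assumptions.

```python
import pandas as pd


def compare_categorical(field: str, **frames: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: percentage of each category of `field`, one column per dataframe."""
    shares = {
        # Label missing values explicitly so they show up as their own category
        name: df[field].fillna("Unknown").value_counts(normalize=True).mul(100).round(1)
        for name, df in frames.items()
    }
    # Categories absent from a dataframe get 0% rather than NaN
    return pd.DataFrame(shares).fillna(0.0)


x_df = pd.DataFrame({"sex": ["F", "M", "M", None]})
x_pos = pd.DataFrame({"sex": ["M", "M"]})
n_pos = pd.DataFrame({"sex": ["M", "F"]})
print(compare_categorical("sex", x_df=x_df, x_pos=x_pos, n_pos=n_pos))
```

Passing the dataframes as keyword arguments makes each keyword the column label, so the same helper could compare any number of cleaned versions of a field.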
nccidxclean.num_analysis.categorical.categorical_fields_analysis(x_df, x_pos, n_pos)

Generates an analysis of the categorical fields of the NCCID data cleaned using the nccidxclean pipeline. Outputs are printed to stdout.

Parameters:
  • x_df (pd.DataFrame) – dataframe of numeric variables cleaned with the nccidxclean tool

  • x_pos (pd.DataFrame) – dataframe of numeric variables for patients with a positive PCR, cleaned with the nccidxclean tool

  • n_pos (pd.DataFrame) – dataframe of numeric variables for patients with a positive PCR, cleaned with the nccid-cleaning tool

Returns:

None

Return type:

None

nccidxclean.num_analysis.dates

nccidxclean.num_analysis.dates.count_dates(n_pos, x_pos)
Parameters:
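  • n_pos (DataFrame) – dataframe of NCCID data for patients with a positive PCR, cleaned using the nccid-cleaning pipeline

  • x_pos (DataFrame) – dataframe of NCCID data for patients with a positive PCR, cleaned using the nccidxclean pipeline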

Return type:

DataFrame

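count_dates returns a DataFrame of counts. One plausible shape for such a table, shown here as a sketch rather than the package's actual logic, counts for each date column how many entries parse as valid dates in each version of the data. The helper count_valid_dates, its explicit date_cols argument and the column name "pcr_date" are assumptions.

```python
import pandas as pd


def count_valid_dates(n_pos: pd.DataFrame, x_pos: pd.DataFrame, date_cols: list) -> pd.DataFrame:
    """Hypothetical helper: number of parseable dates per column in each dataframe."""

    def valid_counts(df: pd.DataFrame) -> dict:
        # errors="coerce" turns unparseable entries into NaT, which notna() then excludes
        return {c: int(pd.to_datetime(df[c], errors="coerce").notna().sum()) for c in date_cols}

    # Outer dict keys become columns, inner dict keys (the date columns) become the index
    return pd.DataFrame({"nccid_cleaning": valid_counts(n_pos), "nccidxclean": valid_counts(x_pos)})


n_pos = pd.DataFrame({"pcr_date": ["2020-03-01", "not a date", None]})
x_pos = pd.DataFrame({"pcr_date": pd.to_datetime(["2020-03-01", "2020-03-02", None])})
print(count_valid_dates(n_pos, x_pos, ["pcr_date"]))
```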
nccidxclean.num_analysis.dates.date_field_analysis(n_pos, x_pos, n_dates, x_dates)
Parameters:
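  • n_pos (DataFrame) – dataframe of NCCID data for patients with a positive PCR, cleaned using the nccid-cleaning pipeline

  • x_pos (DataFrame) – dataframe of NCCID data for patients with a positive PCR, cleaned using the nccidxclean pipeline

  • n_dates (DataFrame)

  • x_dates (DataFrame)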

Return type:

None

nccidxclean.num_analysis.dates.dates_summary(date_entry_count)
Parameters:

date_entry_count (DataFrame)

Return type:

None

nccidxclean.num_analysis.dates.excel_dates(n_pos, date_entry_count)
Parameters:
  • n_pos (DataFrame) – dataframe of NCCID data for patients with a positive PCR, cleaned using the nccid-cleaning pipeline

  • date_entry_count (DataFrame)

Return type:

None

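The name excel_dates suggests a check for dates stored as Excel serial numbers (for example "43891" instead of a calendar date). Assuming that is what is being detected, the conversion can be sketched with the standard library. The helpers looks_like_excel_serial and excel_serial_to_date and the five-digit heuristic are hypothetical; the 1899-12-30 epoch is the usual choice for Excel's default 1900 date system.

```python
from datetime import date, timedelta

# Excel's 1900 date system counts days from 1899-12-30 once its fictitious
# 1900-02-29 is accounted for, so this epoch is exact for dates from 1900-03-01 on.
EXCEL_EPOCH = date(1899, 12, 30)


def looks_like_excel_serial(value: str) -> bool:
    """Hypothetical heuristic: a bare five-digit integer (roughly the years 1927 to 2173)."""
    text = value.strip()
    return len(text) == 5 and text.isdigit()


def excel_serial_to_date(value: str) -> date:
    """Hypothetical helper: convert an Excel serial day number to a calendar date."""
    return EXCEL_EPOCH + timedelta(days=int(value))


for raw in ["43891", "2020-03-01"]:
    if looks_like_excel_serial(raw):
        print(raw, "->", excel_serial_to_date(raw))  # prints: 43891 -> 2020-03-01
```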
nccidxclean.num_analysis.dates.incorrect_date_format(n_pos)
Parameters:

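n_pos (DataFrame) – dataframe of NCCID data for patients with a positive PCR, cleaned using the nccid-cleaning pipeline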

Return type:

None

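A check in the spirit of incorrect_date_format can be sketched by trying to parse each raw entry with an expected format. The ISO 8601 layout "%Y-%m-%d" and the helper has_expected_format are assumptions; the package may expect a different format or use a different check.

```python
from datetime import datetime


def has_expected_format(value: str, fmt: str = "%Y-%m-%d") -> bool:
    """Hypothetical helper: True if value parses as a real date in the expected format."""
    try:
        datetime.strptime(value.strip(), fmt)
    except ValueError:
        # Wrong layout (01/03/2020), impossible date (2020-13-01) or not a date at all
        return False
    return True


raw_dates = ["2020-03-01", "01/03/2020", "2020-13-01", "43891"]
print([d for d in raw_dates if not has_expected_format(d)])
# prints: ['01/03/2020', '2020-13-01', '43891']
```

Parsing with strptime, rather than matching a regular expression, also rejects entries that have the right shape but name an impossible day or month.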
nccidxclean.num_analysis.dates.pcr_imaging_gaps(n_dates, x_dates)
Parameters:
  • n_dates (DataFrame)

  • x_dates (DataFrame)

Return type:

None

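From its name, pcr_imaging_gaps appears to examine the time between a PCR test and imaging. Assuming each row of a date dataframe holds one PCR date and one imaging date, the gap in days can be sketched with pandas; the helper pcr_to_imaging_days and the column names "pcr_date" and "imaging_date" are hypothetical.

```python
import pandas as pd


def pcr_to_imaging_days(dates: pd.DataFrame, pcr_col: str = "pcr_date",
                        imaging_col: str = "imaging_date") -> pd.Series:
    """Hypothetical helper: days from PCR to imaging (negative means imaging came first)."""
    pcr = pd.to_datetime(dates[pcr_col], errors="coerce")
    imaging = pd.to_datetime(dates[imaging_col], errors="coerce")
    # Subtracting datetimes gives timedeltas; .dt.days extracts whole days (NaN if a date is missing)
    return (imaging - pcr).dt.days


x_dates = pd.DataFrame({
    "pcr_date": ["2020-03-01", "2020-04-10", None],
    "imaging_date": ["2020-03-03", "2020-04-08", "2020-05-01"],
})
gaps = pcr_to_imaging_days(x_dates)
print(gaps.describe())
print("imaging before PCR:", int((gaps < 0).sum()))  # prints: imaging before PCR: 1
```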
nccidxclean.num_analysis.num_analysis

nccidxclean.num_analysis.num_analysis.count_missing(n_pos, x_pos)
Parameters:
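  • n_pos (DataFrame) – dataframe of NCCID data for patients with a positive PCR, cleaned using the nccid-cleaning pipeline

  • x_pos (DataFrame) – dataframe of NCCID data for patients with a positive PCR, cleaned using the nccidxclean pipeline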

Return type:

None

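count_missing takes the positive-PCR dataframes produced by both cleaning tools. A per-column comparison of missing values can be sketched as follows; the helper missing_by_column and its output column labels are hypothetical, and unlike count_missing the sketch returns the table instead of printing it.

```python
import pandas as pd


def missing_by_column(n_pos: pd.DataFrame, x_pos: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: missing-value count per shared column, one column per pipeline."""
    # Only compare columns that both cleaned versions contain
    shared = n_pos.columns.intersection(x_pos.columns)
    return pd.DataFrame({
        "nccid_cleaning": n_pos[shared].isna().sum(),
        "nccidxclean": x_pos[shared].isna().sum(),
    })


n_pos = pd.DataFrame({"age": [40, None], "sex": [None, None]})
x_pos = pd.DataFrame({"age": [40, 55], "sex": ["F", None], "extra": [1, 2]})
print(missing_by_column(n_pos, x_pos))
```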
nccidxclean.num_analysis.num_analysis.numeric_analysis(xtd_df, xtd_pos, nhs_pos, xtd_dates, nhs_dates)

Performs numeric analysis of the data cleaned by the nccidxclean package.

Parameters:
  • xtd_df (pd.DataFrame) – dataframe of NCCID data cleaned using the nccidxclean pipeline

  • xtd_pos (pd.DataFrame) – dataframe of NCCID data for patients with a positive PCR, cleaned using the nccidxclean pipeline

  • nhs_pos (pd.DataFrame) – dataframe of NCCID data for patients with a positive PCR, cleaned using the nccid-cleaning pipeline

Returns:

None

Return type:

None

nccidxclean.num_analysis.numeric

Generates an analysis of the numerical fields of the NCCID data cleaned using the nccidxclean pipeline. Outputs are printed to stdout.

nccidxclean.num_analysis.numeric.creatinine_analysis(df)

Generates creatinine analysis and prints to stdout.

Parameters:

df (pd.DataFrame) – dataframe of numeric variables cleaned with _coerce_numeric_columns from the NHSX nccid-cleaning tool

Returns:

None

Return type:

None

nccidxclean.num_analysis.numeric.ddimer_analysis(df)

Generates D-dimer analysis and prints it to stdout.

Parameters:

df (pd.DataFrame) – dataframe of numeric variables cleaned with _coerce_numeric_columns from the NHSX nccid-cleaning tool

Returns:

None

Return type:

None

nccidxclean.num_analysis.numeric.fio2_analysis(df)

Generates FiO2 analysis and prints it to stdout.

Parameters:

df (pd.DataFrame) – dataframe of NCCID data cleaned using the nccidxclean pipeline

Returns:

None

Return type:

None

nccidxclean.num_analysis.numeric.numeric_fields_analysis(x_df)
Generates an analysis of the numerical fields to show how the nccidxclean pipeline affected the data.

Parameters:

x_df (pd.DataFrame) – dataframe cleaned by the nccidxclean pipeline.

Returns:

None

Return type:

None

nccidxclean.num_analysis.numeric.pao2_analysis(df)

Generates PaO2/SpO2 analysis and prints it to stdout.

Parameters:

df (pd.DataFrame) – dataframe of NCCID data cleaned using the nccidxclean pipeline

Returns:

None

nccidxclean.num_analysis.numeric.round_sig(x, sig_fig=2)

Rounds a number to a specified number of significant figures and returns it as a string.

Parameters:
  • x (str, int or float) – input number

  • sig_fig (int) – number of significant figures to use, default=2

Returns:

the number rounded to sig_fig significant figures

Return type:

str

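A minimal sketch of rounding to significant figures with the standard library, assuming sig_fig is at least 1. It mirrors the documented signature and str return type but is not the package's source; in particular, the handling of zero and the way trailing zeros are kept are assumptions.

```python
import math


def round_sig(x, sig_fig: int = 2) -> str:
    """Sketch, not the package's source: round x (str, int or float) to sig_fig significant figures."""
    value = float(x)
    if value == 0:
        return "0"
    # Power of ten of the leading digit: 1234 -> 3, 0.012345 -> -2
    exponent = math.floor(math.log10(abs(value)))
    decimals = sig_fig - 1 - exponent
    rounded = round(value, decimals)
    if math.floor(math.log10(abs(rounded))) > exponent:
        decimals -= 1  # rounding carried into a new leading digit, e.g. 9.96 -> 10
    # A negative `decimals` means rounding left of the point, so print no decimal places
    return f"{rounded:.{max(decimals, 0)}f}"


print(round_sig(1234))          # 1200
print(round_sig(0.012345))      # 0.012
print(round_sig("3.14159", 3))  # 3.14
print(round_sig(9.96))          # 10
```

Formatting with a fixed number of decimal places, rather than the "g" format, avoids scientific notation for large values and keeps significant trailing zeros such as "0.50".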
nccidxclean.num_analysis.numeric.troponin_analysis(df)

Generates troponin I analysis and prints it to stdout.

Parameters:

df (pd.DataFrame) – dataframe of numeric variables cleaned with _coerce_numeric_columns from the NHSX nccid-cleaning tool

Returns:

None

Return type:

None

nccidxclean.num_analysis.numeric.value_counts_with_percentage(x)

Generates a dataframe of value counts as a percentage for a given field.

Parameters:

x (pd.Series) – Input data field

Returns:

dataframe of value counts as a percentage

Return type:

pd.DataFrame

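A minimal pandas sketch of the documented behaviour, assuming the returned dataframe holds both the raw count and the percentage share of each value, with missing values counted as their own category. The column labels "count" and "percentage" are assumptions, and this is not the package's source.

```python
import pandas as pd


def value_counts_with_percentage(x: pd.Series) -> pd.DataFrame:
    """Sketch, not the package's source: count and percentage share of each value in x."""
    # dropna=False keeps missing values as their own row
    counts = x.value_counts(dropna=False)
    table = counts.to_frame(name="count")
    table["percentage"] = (100 * table["count"] / table["count"].sum()).round(1)
    return table


sex = pd.Series(["F", "M", "F", None])
print(value_counts_with_percentage(sex))
```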
nccidxclean.num_analysis.tables

nccidxclean.num_analysis.tables.info_by_center(x_df, x_pos)

Creates a table of information for each center, including the number of hospitals, patients and missing values, and the earliest and latest PCR dates. The table is formatted in markdown for use on our website.

Parameters:
  • x_df (pd.DataFrame) – dataframe containing cleaned NCCID data

  • x_pos (pd.DataFrame) – dataframe containing cleaned NCCID data for PCR-positive patients only

Returns:

None

Return type:

None

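info_by_center formats its table in markdown. A cut-down version of that idea, covering only patient counts and the earliest and latest PCR dates per center, can be sketched as below. The helpers to_markdown_table and center_summary and the column names "center" and "pcr_date" are hypothetical, one row per patient is assumed, and the markdown is built by hand because DataFrame.to_markdown needs the optional tabulate package.

```python
import pandas as pd


def to_markdown_table(df: pd.DataFrame) -> str:
    """Hypothetical helper: render a DataFrame as a pipe-delimited markdown table."""
    header = "| " + " | ".join(map(str, df.columns)) + " |"
    divider = "|" + "---|" * len(df.columns)
    rows = ["| " + " | ".join(map(str, row)) + " |" for row in df.itertuples(index=False)]
    return "\n".join([header, divider, *rows])


def center_summary(x_pos: pd.DataFrame, center_col: str = "center",
                   pcr_col: str = "pcr_date") -> pd.DataFrame:
    """Hypothetical helper: number of patients and earliest/latest PCR date per center."""
    grouped = x_pos.assign(pcr=pd.to_datetime(x_pos[pcr_col])).groupby(center_col)
    return pd.DataFrame({
        "patients": grouped.size(),  # assumes one row per patient
        "earliest_pcr": grouped["pcr"].min().dt.date,
        "latest_pcr": grouped["pcr"].max().dt.date,
    }).reset_index()


x_pos = pd.DataFrame({
    "center": ["A", "A", "B"],
    "pcr_date": ["2020-03-01", "2020-04-01", "2020-03-15"],
})
print(to_markdown_table(center_summary(x_pos)))
```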
nccidxclean.num_analysis.tables.summary_table(x_df, x_pos)

Creates the summary table, formatted in markdown, that is used on our website.

Parameters:
  • x_df (pd.DataFrame) – dataframe containing cleaned NCCID data

  • x_pos (pd.DataFrame) – dataframe containing cleaned NCCID data for PCR-positive patients only

Returns:

None

Return type:

None