nccidxclean.num_analysis subpackage#
Submodules#
nccidxclean.num_analysis.categorical#
Generates analysis for the categorical fields of the nccid data cleaned using the nccidxclean pipeline. Outputs are printed to stdout.
- nccidxclean.num_analysis.categorical.analyse_age(x_df, x_pos)[source]#
Generates analysis of the age feature and prints to stdout.
- Parameters:
x_df (pd.DataFrame) – dataframe of numeric variables cleaned with the nccidxclean tool
x_pos (pd.DataFrame) – dataframe of numeric variables for patients with a positive PCR, cleaned with the nccidxclean tool
- Returns:
None
- Return type:
NoReturn
- nccidxclean.num_analysis.categorical.analyse_pmh_cvs(x_pos, n_pos)[source]#
Generates analysis of the pmh cvs disease feature and prints to stdout.
- Parameters:
x_pos (pd.DataFrame) – dataframe of numeric variables for patients with a positive PCR, cleaned with the nccidxclean tool
n_pos (pd.DataFrame) – dataframe of numeric variables for patients with a positive PCR, cleaned with the nccid-cleaning tool
- Returns:
None
- Return type:
NoReturn
- nccidxclean.num_analysis.categorical.analyse_pmh_htn(x_pos, n_pos)[source]#
Generates analysis of the pmh hypertension feature and prints to stdout.
- Parameters:
x_pos (pd.DataFrame) – dataframe of numeric variables for patients with a positive PCR, cleaned with the nccidxclean tool
n_pos (pd.DataFrame) – dataframe of numeric variables for patients with a positive PCR, cleaned with the nccid-cleaning tool
- Returns:
None
- Return type:
NoReturn
- nccidxclean.num_analysis.categorical.analyse_sex(x_df, x_pos, n_pos)[source]#
Generates analysis of the sex feature and prints to stdout.
- Parameters:
x_df (pd.DataFrame) – dataframe of numeric variables cleaned with the nccidxclean tool
x_pos (pd.DataFrame) – dataframe of numeric variables for patients with a positive PCR, cleaned with the nccidxclean tool
n_pos (pd.DataFrame) – dataframe of numeric variables for patients with a positive PCR, cleaned with the nccid-cleaning tool
- Returns:
None
- Return type:
NoReturn
- nccidxclean.num_analysis.categorical.categorical_fields_analysis(x_df, x_pos, n_pos)[source]#
Generates analysis for the categorical fields of the nccid data cleaned using the nccidxclean pipeline. Outputs are printed to stdout.
- Parameters:
x_df (pd.DataFrame) – dataframe of numeric variables cleaned with the nccidxclean tool
x_pos (pd.DataFrame) – dataframe of numeric variables for patients with a positive PCR, cleaned with the nccidxclean tool
n_pos (pd.DataFrame) – dataframe of numeric variables for patients with a positive PCR, cleaned with the nccid-cleaning tool
- Returns:
None
- Return type:
NoReturn
nccidxclean.num_analysis.dates#
- nccidxclean.num_analysis.dates.count_dates(n_pos, x_pos)[source]#
- Parameters:
n_pos (DataFrame) –
x_pos (DataFrame) –
- Return type:
DataFrame
- nccidxclean.num_analysis.dates.date_field_analysis(n_pos, x_pos, n_dates, x_dates)[source]#
- Parameters:
n_pos (DataFrame) –
x_pos (DataFrame) –
n_dates (DataFrame) –
x_dates (DataFrame) –
- Return type:
NoReturn
- nccidxclean.num_analysis.dates.dates_summary(date_entry_count)[source]#
- Parameters:
date_entry_count (DataFrame) –
- Return type:
NoReturn
- nccidxclean.num_analysis.dates.excel_dates(n_pos, date_entry_count)[source]#
- Parameters:
n_pos (DataFrame) –
date_entry_count (DataFrame) –
- Return type:
NoReturn
nccidxclean.num_analysis.num_analysis#
- nccidxclean.num_analysis.num_analysis.count_missing(n_pos, x_pos)[source]#
- Parameters:
n_pos (DataFrame) –
x_pos (DataFrame) –
- Return type:
NoReturn
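count_missing carries no description, so the following is only a sketch of the kind of comparison its name and arguments suggest: counting missing values per column in the two cleaned dataframes. The helper name count_missing_sketch, the output column names, and the sample columns are hypothetical and are not taken from the package; unlike the documented function, this sketch returns a table instead of printing.

```python
import pandas as pd


def count_missing_sketch(n_pos: pd.DataFrame, x_pos: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: compare missing-value counts column by column."""
    # Only compare columns present in both dataframes.
    shared = [col for col in n_pos.columns if col in x_pos.columns]
    summary = pd.DataFrame(
        {
            "n_missing": n_pos[shared].isna().sum(),
            "x_missing": x_pos[shared].isna().sum(),
        }
    )
    # Negative values mean fewer missing entries in x_pos than in n_pos.
    summary["difference"] = summary["x_missing"] - summary["n_missing"]
    return summary


# Toy data with made-up values, for illustration only.
n_pos = pd.DataFrame({"age": [54, None, 71], "sex": ["M", "F", None]})
x_pos = pd.DataFrame({"age": [54, 63, 71], "sex": ["M", "F", None]})
summary = count_missing_sketch(n_pos, x_pos)
print(summary)
```

The result is indexed by column name, so a single field can be inspected with summary.loc["age"].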
- nccidxclean.num_analysis.num_analysis.numeric_analysis(xtd_df, xtd_pos, nhs_pos, xtd_dates, nhs_dates)[source]#
Performs numeric analysis of the data cleaned by the nccidxclean package.
- Parameters:
xtd_df (pd.DataFrame) – dataframe of nccid data cleaned using the nccidxclean pipeline
xtd_pos (pd.DataFrame) – dataframe of nccid data for patients with a positive PCR, cleaned using the nccidxclean pipeline
nhs_pos (pd.DataFrame) – dataframe of nccid data for patients with a positive PCR, cleaned using the nccid-cleaning pipeline
- Returns:
None
- Return type:
NoReturn
nccidxclean.num_analysis.numeric#
Generates analysis for the numerical fields of the nccid data cleaned using the nccidxclean pipeline. Outputs are printed to stdout.
- nccidxclean.num_analysis.numeric.creatinine_analysis(df)[source]#
Generates creatinine analysis and prints to stdout.
- Parameters:
df (pd.DataFrame) – dataframe of numeric variables cleaned with _coerce_numeric_columns from the nhsx nccid-cleaning tool
- Returns:
None
- Return type:
NoReturn
- nccidxclean.num_analysis.numeric.ddimer_analysis(df)[source]#
Generates d-dimer analysis and prints to stdout.
- Parameters:
df (pd.DataFrame) – dataframe of numeric variables cleaned with _coerce_numeric_columns from the nhsx nccid-cleaning tool
- Returns:
None
- Return type:
NoReturn
- nccidxclean.num_analysis.numeric.fio2_analysis(df)[source]#
Generates fio2 analysis and prints to stdout.
- Parameters:
df (pd.DataFrame) – dataframe of nccid data cleaned using the nccidxclean pipeline
- Returns:
None
- Return type:
NoReturn
- nccidxclean.num_analysis.numeric.numeric_fields_analysis(x_df)[source]#
Generates analysis for the numerical fields to understand how the nccidxclean pipeline affected the data.
- Parameters:
x_df (pd.DataFrame) – dataframe cleaned by the nccidxclean pipeline.
- Returns:
None
- Return type:
NoReturn
- nccidxclean.num_analysis.numeric.pao2_analysis(df)[source]#
Generates pao2/spo2 analysis and prints to stdout.
- Parameters:
df (pd.DataFrame) – dataframe of nccid data cleaned using the nccidxclean pipeline
- Returns:
None
- nccidxclean.num_analysis.numeric.round_sig(x, sig_fig=2)[source]#
Rounds a number to a specified number of significant figures.
- Parameters:
x (str, int or float) – input number
sig_fig (int) – number of significant figures to use, default=2
- Returns:
the number rounded to sig_fig significant figures
- Return type:
str
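As an illustration of rounding to significant figures with a string result, here is a minimal sketch using only the standard library. It is not the package's implementation: the name round_sig_sketch and the choice to pad decimals with trailing zeros are assumptions, and the real round_sig may format its output differently.

```python
from math import floor, log10


def round_sig_sketch(x, sig_fig=2):
    """Hypothetical helper: round x to sig_fig significant figures, as a string."""
    value = float(x)  # accepts str, int or float, like the documented signature
    if value == 0:
        return "0"
    # Position of the leading digit: 0 for 1-9, 1 for 10-99, -1 for 0.1-0.9, ...
    magnitude = floor(log10(abs(value)))
    # Decimal places needed so that sig_fig digits are kept in total.
    decimals = sig_fig - 1 - magnitude
    rounded = round(value, decimals)
    if decimals <= 0:
        # No fractional digits are significant; show a whole number.
        return str(int(rounded))
    return f"{rounded:.{decimals}f}"


print(round_sig_sketch(1234.5))         # 1200
print(round_sig_sketch("0.012345", 3))  # 0.0123
```

Returning a string keeps trailing zeros that a float would drop, which may be why the documented return type is str.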
nccidxclean.num_analysis.tables#
- nccidxclean.num_analysis.tables.info_by_center(x_df, x_pos)[source]#
Creates table of information for each center, including the number of hospitals, patients, missing values, and earliest / latest PCR dates. It is formatted in markdown for use on our website.
- Parameters:
x_df (pd.DataFrame) – dataframe containing cleaned nccid data
x_pos (pd.DataFrame) – dataframe containing cleaned nccid data for positive patients only
- Returns:
None
- Return type:
NoReturn
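The per-center table described above could be built along the following lines. This is a sketch under stated assumptions, not the package's code: the function name info_by_center_sketch and the column names center, hospital, patient_id and pcr_date are hypothetical, and the missing-value counts mentioned in the description are left out for brevity.

```python
import pandas as pd


def info_by_center_sketch(x_df: pd.DataFrame) -> str:
    """Hypothetical helper: summarise each center as a markdown table."""
    grouped = x_df.groupby("center").agg(
        hospitals=("hospital", "nunique"),
        patients=("patient_id", "nunique"),
        earliest_pcr=("pcr_date", "min"),
        latest_pcr=("pcr_date", "max"),
    )
    lines = [
        "| center | hospitals | patients | earliest PCR | latest PCR |",
        "| --- | --- | --- | --- | --- |",
    ]
    # One markdown row per center; groupby sorts the centers by default.
    for row in grouped.itertuples():
        lines.append(
            f"| {row.Index} | {row.hospitals} | {row.patients} "
            f"| {row.earliest_pcr:%Y-%m-%d} | {row.latest_pcr:%Y-%m-%d} |"
        )
    return "\n".join(lines)


# Toy data with made-up values, for illustration only.
x_df = pd.DataFrame(
    {
        "center": ["A", "A", "B"],
        "hospital": ["H1", "H2", "H3"],
        "patient_id": ["p1", "p2", "p3"],
        "pcr_date": pd.to_datetime(["2020-03-01", "2020-04-15", "2020-05-20"]),
    }
)
table = info_by_center_sketch(x_df)
print(table)
```

Building the rows by hand avoids the optional tabulate dependency that pandas' DataFrame.to_markdown requires.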
- nccidxclean.num_analysis.tables.summary_table(x_df, x_pos)[source]#
Creates summary table in markdown used on website.
- Parameters:
x_df (pd.DataFrame) – dataframe containing cleaned nccid data
x_pos (pd.DataFrame) – dataframe containing cleaned nccid data for positive patients only
- Returns:
None
- Return type:
NoReturn