Dataloaders

The dataloaders collate the raw xFeatures, yTrue and yPred data and standardize them into input formats and methods that later stages can call for the tasks specific to each feature component. This is the module users call to load their data, which triggers a series of automated processing steps that generate the gap analysis report and spin up the Dash web application.

CSVDataLoader

class rarity.data_loader.CSVDataLoader(xFeatures_file: str, yTrue_file: str, yPred_file_ls: List[str] = [], model_names_ls: List[str] = [], analysis_type: Optional[str] = None)

Dataloader that compiles all input files in csv format.

Parameters

xFeatures_file (str) –
Path to the csv file that contains all features (xFeatures) used for model development/training.

Example of a csv file storing xFeatures:

    feature_0  feature_1
    21         B
    36         A

yTrue_file (str) –
Path to the csv file that contains all actual values (regression) / true labels (classification).

Example of a csv file storing yTrue values for

Regression:

    price
    78634
    98273
    2780

Binary Classification:

    churn
    1
    0
    1

Multiclass Classification:

    size
    big
    medium
    small

yPred_file_ls (List[str]) –
List of csv file paths containing the prediction values (regression) or probabilities (classification) generated by each model, with one csv file per model.

Example of a csv file storing yPred values / labels for

Regression:

    price
    83683
    67293

Binary Classification:

    0       1
    0.0675  0.9325
    0.6237  0.3767

Multiclass Classification:

    big     medium  small
    0.7772  0.1140  0.1088
    0.0014  0.8169  0.1817

model_names_ls (List[str]) – List of model names identifying the models used to generate yPred.
analysis_type (str) – Analysis type defined by the user. The corresponding feature components are auto-populated based on the specified analysis type. Supported analysis types: Regression, Binary Classification, Multiclass Classification.
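
The file layout described by the parameters above can be sketched for a regression task as follows. This is a minimal illustration rather than an official recipe: the file names, the model names, the third feature row and the prediction values are made up, and `write_csv` and `make_loader` are hypothetical helpers introduced here. Only the keyword names come from the `CSVDataLoader` signature.

```python
import csv
import tempfile
from pathlib import Path


def write_csv(path, header, rows):
    """Write a header row followed by data rows, return the path as str."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)
    return str(path)


# Hypothetical scratch directory; any writable location would do.
data_dir = Path(tempfile.mkdtemp())

# Features and true values, one row per sample.
x_file = write_csv(
    data_dir / "xFeatures.csv",
    ["feature_0", "feature_1"],
    [[21, "B"], [36, "A"], [45, "B"]],  # third row is illustrative
)
y_true_file = write_csv(
    data_dir / "yTrue.csv", ["price"], [[78634], [98273], [2780]]
)

# One prediction file per model (values are illustrative only).
y_pred_files = [
    write_csv(data_dir / "yPred_model_a.csv", ["price"],
              [[83683], [67293], [3120]]),
    write_csv(data_dir / "yPred_model_b.csv", ["price"],
              [[80110], [99541], [2455]]),
]

# Keyword names follow the CSVDataLoader signature documented above.
loader_kwargs = dict(
    xFeatures_file=x_file,
    yTrue_file=y_true_file,
    yPred_file_ls=y_pred_files,
    model_names_ls=["model_a", "model_b"],
    analysis_type="Regression",
)


def make_loader(kwargs):
    """Build the loader; needs the rarity package to be installed."""
    from rarity.data_loader import CSVDataLoader
    return CSVDataLoader(**kwargs)
```

With rarity installed, `make_loader(loader_kwargs)` would construct the loader. The import is deferred so that the file-writing part can be run and inspected on its own.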

DataframeLoader

class rarity.data_loader.DataframeLoader(df_xFeatures: pandas.core.frame.DataFrame, df_yTrue: pandas.core.frame.DataFrame, df_yPred_ls: List[pandas.core.frame.DataFrame] = [], model_names_ls: List[str] = [], analysis_type: Optional[str] = None)

Dataloader that compiles all xFeatures, yTrue and yPred data in dataframe format.

Parameters

df_xFeatures (DataFrame) – Dataframe that contains all features (xFeatures) used for model development/training.
df_yTrue (DataFrame) – Dataframe that contains all true values / labels.
df_yPred_ls (List[pd.DataFrame]) – List of dataframes, each containing the predicted values (regression) / probabilities (classification) of one model.
model_names_ls (List[str]) – List of model names identifying the models used to generate yPred.
analysis_type (str) – Analysis type defined by the user. The corresponding feature components are auto-populated based on the specified analysis type. Supported analysis types: Regression, Binary Classification, Multiclass Classification.

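Example of df_xFeatures:

    feature_0  feature_1
    21         B
    36         A

Example of df_yTrue for

Regression:

    price
    78634
    98273
    2780

Binary Classification:

    churn
    1
    0
    1

Multiclass Classification:

    size
    big
    medium
    small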
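
The DataframeLoader inputs described above can be sketched for a multiclass task as follows. This is an illustration under assumptions: the model names, the third sample and the second model's probabilities are made up, `make_loader` is a hypothetical helper, and the `analysis_type` string is assumed to match the name listed in the parameter description, so check it against the installed version.

```python
import pandas as pd

# Features and true labels, one row per sample (third row is illustrative).
df_xFeatures = pd.DataFrame(
    {"feature_0": [21, 36, 45], "feature_1": ["B", "A", "B"]}
)
df_yTrue = pd.DataFrame({"size": ["big", "medium", "small"]})

classes = ["big", "medium", "small"]

# One dataframe of class probabilities per model; columns are class labels.
df_yPred_model_a = pd.DataFrame(
    [[0.7772, 0.1140, 0.1088],
     [0.0014, 0.8169, 0.1817],
     [0.1500, 0.2500, 0.6000]],  # illustrative
    columns=classes,
)
df_yPred_model_b = pd.DataFrame(
    [[0.60, 0.30, 0.10],
     [0.20, 0.70, 0.10],
     [0.05, 0.15, 0.80]],  # illustrative
    columns=classes,
)

df_yPred_ls = [df_yPred_model_a, df_yPred_model_b]
model_names_ls = ["model_a", "model_b"]

# Basic consistency checks before handing the data to the loader.
for df in df_yPred_ls:
    assert len(df) == len(df_yTrue) == len(df_xFeatures)
    assert ((df.sum(axis=1) - 1.0).abs() < 1e-3).all()

# The predicted label per row is the column with the highest probability.
predicted_labels = [df.idxmax(axis=1).tolist() for df in df_yPred_ls]


def make_loader():
    """Build the loader; needs the rarity package to be installed."""
    from rarity.data_loader import DataframeLoader
    return DataframeLoader(
        df_xFeatures,
        df_yTrue,
        df_yPred_ls=df_yPred_ls,
        model_names_ls=model_names_ls,
        analysis_type="Multiclass Classification",  # assumed spelling
    )
```

The row-sum and row-count checks are not part of the documented API. They are cheap guards against the most likely input mistakes, namely misaligned rows and probability columns that do not describe a distribution.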