Dataloaders
Dataloaders collate the raw xFeatures, yTrue and yPred data and standardize them into input formats and methods
that later stages can call to serve the tasks of specific feature components. This is the module that
users call to load their data and trigger the automated processing that generates the gap analysis report and spins up the Dash web application.
CSVDataLoader
- class rarity.data_loader.CSVDataLoader(xFeatures_file: str, yTrue_file: str, yPred_file_ls: List[str] = [], model_names_ls: List[str] = [], analysis_type: Optional[str] = None)
Dataloader that compiles all input files in csv format.
- Parameters
xFeatures_file (str) –
Path to csv file that contains all xFeatures used for model development/training.
Example of csv file storing xFeatures:
    feature_0  feature_1
    21         B
    36         A
yTrue_file (str) –
Path to csv file that contains all actual values (regression) / true labels (classification).
Example of csv file storing yTrue values for
Regression:
    price
    78634
    98273
    2780
Binary Classification:
    churn
    1
    0
    1
Multiclass Classification:
    size
    big
    medium
    small
yPred_file_ls (List[str]) –
List of csv file paths that contain the prediction values / probabilities generated by each model, one csv file per model.
Example of csv file storing yPred values / probabilities for
Regression:
    price
    83683
    67293
Binary Classification:
    0       1
    0.0675  0.9325
    0.6237  0.3767
Multiclass Classification:
    big     medium  small
    0.7772  0.1140  0.1088
    0.0014  0.8169  0.1817
model_names_ls (List[str]) – List of model names representing the models used to generate yPred.
analysis_type (str) – Analysis type defined by user. Corresponding feature components will be auto-populated based on the specified analysis type. Supported analysis types:
Regression, Binary Classification, Multiclass Classification
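The parameters above can be illustrated with a minimal sketch that writes the regression example files to disk and passes their paths to the loader. The temporary directory, the file names, the `write_csv` helper and the model name `model_a` are assumptions made for this example only; the keyword names come from the class signature. The import is guarded so that the file preparation still runs where rarity is not installed.

```python
import csv
import tempfile
from pathlib import Path


def write_csv(path, header, rows):
    """Hypothetical helper: write a header row followed by data rows."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)
    return str(path)


# Hypothetical working directory and file names, chosen for this sketch only.
workdir = Path(tempfile.mkdtemp())

xfeatures_path = write_csv(
    workdir / "xfeatures.csv", ["feature_0", "feature_1"], [[21, "B"], [36, "A"]]
)
ytrue_path = write_csv(workdir / "ytrue.csv", ["price"], [[78634], [98273]])
# One prediction file per model; "model_a" is an illustrative model name.
ypred_path = write_csv(workdir / "ypred_model_a.csv", ["price"], [[83683], [67293]])

loader_kwargs = dict(
    xFeatures_file=xfeatures_path,
    yTrue_file=ytrue_path,
    yPred_file_ls=[ypred_path],
    model_names_ls=["model_a"],
    analysis_type="Regression",
)

try:
    from rarity.data_loader import CSVDataLoader
except ImportError:
    CSVDataLoader = None  # rarity not installed; the files are still prepared

data_loader = CSVDataLoader(**loader_kwargs) if CSVDataLoader is not None else None
```

Each entry in `yPred_file_ls` is paired here with one entry in `model_names_ls`, following the "one csv file per model" description above.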
DataframeLoader
- class rarity.data_loader.DataframeLoader(df_xFeatures: pandas.core.frame.DataFrame, df_yTrue: pandas.core.frame.DataFrame, df_yPred_ls: List[pandas.core.frame.DataFrame] = [], model_names_ls: List[str] = [], analysis_type: Optional[str] = None)
Dataloader that compiles all xFeatures, yTrue and yPred data in dataframe format.
- Parameters
df_xFeatures (DataFrame) – Dataframe that contains all xFeatures used for model development/training.
df_yTrue (DataFrame) – Dataframe that contains all true values / labels.
df_yPred_ls (List[DataFrame]) – List of dataframes that contain the predicted values (regression) / probabilities (classification).
model_names_ls (List[str]) – List of model names representing the models used to generate yPred.
analysis_type (str) – Analysis type defined by user. Corresponding feature components will be auto-populated based on the specified analysis type. Supported analysis types:
Regression, Binary Classification, Multiclass Classification
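As a rough sketch of how these parameters fit together, the example below builds small regression dataframes for two models and runs a basic consistency check before constructing the loader. The `check_inputs` helper, the model names and the second model's prediction values are hypothetical and not part of rarity; whether the loader performs similar validation itself is not stated here. The import is guarded so the sketch runs without rarity installed.

```python
import pandas as pd

df_xFeatures = pd.DataFrame({"feature_0": [21, 36], "feature_1": ["B", "A"]})
df_yTrue = pd.DataFrame({"price": [78634, 98273]})
df_yPred_a = pd.DataFrame({"price": [83683, 67293]})
df_yPred_b = pd.DataFrame({"price": [80000, 95000]})  # made-up values for a second model


def check_inputs(df_x, df_y, df_pred_ls, model_names):
    """Hypothetical pre-check: one model name per prediction dataframe,
    and the same number of rows in every dataframe."""
    if len(df_pred_ls) != len(model_names):
        raise ValueError("expected one model name per prediction dataframe")
    for name, df_pred in zip(model_names, df_pred_ls):
        if not (len(df_x) == len(df_y) == len(df_pred)):
            raise ValueError(f"row count mismatch for model {name!r}")
    return True


loader_kwargs = dict(
    df_xFeatures=df_xFeatures,
    df_yTrue=df_yTrue,
    df_yPred_ls=[df_yPred_a, df_yPred_b],
    model_names_ls=["model_a", "model_b"],  # illustrative names
    analysis_type="Regression",
)

check_inputs(
    loader_kwargs["df_xFeatures"],
    loader_kwargs["df_yTrue"],
    loader_kwargs["df_yPred_ls"],
    loader_kwargs["model_names_ls"],
)

try:
    from rarity.data_loader import DataframeLoader
except ImportError:
    DataframeLoader = None  # rarity not installed; the inputs are still prepared

data_loader = DataframeLoader(**loader_kwargs) if DataframeLoader is not None else None
```

Running the check first surfaces mismatched inputs early, for example a prediction dataframe without a matching model name.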