Dataloaders

The dataloaders collate the raw xFeatures, yTrue and yPred data and standardize them into common input formats and methods that the feature components call at various stages of further processing. This is the module users call to load their data and trigger the automatic processing that generates the gap analysis report and spins up the Dash web application.

CSVDataLoader

class rarity.data_loader.CSVDataLoader(xFeatures_file: str, yTrue_file: str, yPred_file_ls: List[str] = [], model_names_ls: List[str] = [], analysis_type: Optional[str] = None)

Dataloader that compiles all input files in csv format.

Parameters
  • xFeatures_file (str) –

    Path to csv file that contains all xfeatures used for model development/training.

    Example of a csv file storing xFeatures:

    feature_0   feature_1
    ---------   ---------
    21          B
    36          A

  • yTrue_file (str) –

    Path to csv file that contains all actual values (regression) / true labels (classification).

    Example of csv files storing yTrue values for:

    Regression:

    price
    -----
    78634
    98273
    2780

    Binary Classification:

    churn
    -----
    1
    0
    1

    Multiclass Classification:

    size
    ------
    big
    medium
    small

  • yPred_file_ls (List[str]) –

    List of csv file paths, each containing the prediction values (regression) / probabilities (classification) generated by one model. Provide one csv file per model.

    Example of csv files storing yPred values / probabilities for:

    Regression:

    price
    -----
    83683
    67293

    Binary Classification:

    0        1
    ------   ------
    0.0675   0.9325
    0.6237   0.3767

    Multiclass Classification:

    big      medium   small
    ------   ------   ------
    0.7772   0.1140   0.1088
    0.0014   0.8169   0.1817

  • model_names_ls (List[str]) – List of model names identifying the model that generated each yPred file.

  • analysis_type (str) – Analysis type specified by the user. The corresponding feature components are auto-populated based on this analysis type. Supported analysis types: Regression, Binary Classification, Multiclass Classification.
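A minimal sketch, assuming pandas is available, that writes the binary-classification example tables above to csv files and passes them to CSVDataLoader. Only the first two rows of the yTrue example are used so that all three files have the same number of rows; that rows are matched across files by position is an assumption, as are the file names, the model name "model_a", and the exact analysis_type spelling accepted by your installed rarity version.

```python
import tempfile
from pathlib import Path
from typing import Dict, Optional

import pandas as pd


def write_example_csvs(out_dir: Optional[Path] = None) -> Dict[str, str]:
    """Write the binary-classification example tables to csv files.

    Returns csv paths keyed by role; the file names are hypothetical.
    """
    out_dir = Path(tempfile.mkdtemp()) if out_dir is None else out_dir
    out_dir.mkdir(parents=True, exist_ok=True)

    # xFeatures: one column per feature used for model training
    x_features = pd.DataFrame({"feature_0": [21, 36], "feature_1": ["B", "A"]})
    # yTrue: a single column of true labels (first two rows of the example)
    y_true = pd.DataFrame({"churn": [1, 0]})
    # yPred: one column per class label, holding predicted probabilities
    y_pred = pd.DataFrame({"0": [0.0675, 0.6237], "1": [0.9325, 0.3767]})

    paths = {
        "xFeatures": out_dir / "xFeatures.csv",
        "yTrue": out_dir / "yTrue.csv",
        "yPred": out_dir / "yPred_model_a.csv",
    }
    # index=False keeps the pandas row index out of the files, so each file
    # holds only the header row and the data rows shown in the examples
    x_features.to_csv(paths["xFeatures"], index=False)
    y_true.to_csv(paths["yTrue"], index=False)
    y_pred.to_csv(paths["yPred"], index=False)
    return {role: str(path) for role, path in paths.items()}


def build_csv_loader(paths: Dict[str, str]):
    """Pass the csv paths to CSVDataLoader (requires rarity to be installed)."""
    # Imported here so that write_example_csvs runs without rarity
    from rarity.data_loader import CSVDataLoader

    return CSVDataLoader(
        paths["xFeatures"],
        paths["yTrue"],
        yPred_file_ls=[paths["yPred"]],
        model_names_ls=["model_a"],  # hypothetical name, one per yPred file
        # Spelling taken from the supported types listed above; confirm the
        # exact string your installed rarity version expects
        analysis_type="Binary Classification",
    )
```

With rarity installed, build_csv_loader(write_example_csvs()) constructs the loader from the generated files. Additional models are added by appending one csv path to yPred_file_ls and one name to model_names_ls.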

DataframeLoader

class rarity.data_loader.DataframeLoader(df_xFeatures: pandas.core.frame.DataFrame, df_yTrue: pandas.core.frame.DataFrame, df_yPred_ls: List[pandas.core.frame.DataFrame] = [], model_names_ls: List[str] = [], analysis_type: Optional[str] = None)

Dataloader that compiles xFeatures, yTrue and yPred inputs supplied in dataframe format.

Parameters
  • df_xFeatures (DataFrame) – Dataframe that contains all xfeatures used for model development/training.

  • df_yTrue (DataFrame) – Dataframe that contains all true values / labels.

  • df_yPred_ls (List[DataFrame]) – List of dataframes, one per model, each containing the predicted values (regression) / probabilities (classification).

  • model_names_ls (List[str]) – List of model names identifying the model that generated each yPred dataframe.

  • analysis_type (str) – Analysis type specified by the user. The corresponding feature components are auto-populated based on this analysis type. Supported analysis types: Regression, Binary Classification, Multiclass Classification.
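A minimal sketch of the same workflow with DataframeLoader, using the multiclass example values from the CSVDataLoader parameters (two rows each). It assumes the rows of each dataframe correspond by position; the model name "model_a" and the exact analysis_type spelling are assumptions to check against your installed rarity version.

```python
from typing import List, Tuple

import pandas as pd


def build_multiclass_frames() -> Tuple[pd.DataFrame, pd.DataFrame, List[pd.DataFrame]]:
    """Return (df_xFeatures, df_yTrue, df_yPred_ls) for the multiclass example."""
    # xFeatures: one column per feature used for model training
    df_xFeatures = pd.DataFrame({"feature_0": [21, 36], "feature_1": ["B", "A"]})
    # yTrue: a single column of true class labels
    df_yTrue = pd.DataFrame({"size": ["big", "medium"]})
    # yPred: columns are class labels, values are predicted probabilities
    df_yPred = pd.DataFrame(
        {"big": [0.7772, 0.0014], "medium": [0.1140, 0.8169], "small": [0.1088, 0.1817]}
    )
    # df_yPred_ls holds one dataframe per model; here there is a single model
    return df_xFeatures, df_yTrue, [df_yPred]


def build_dataframe_loader():
    """Pass the dataframes to DataframeLoader (requires rarity to be installed)."""
    # Imported here so that build_multiclass_frames runs without rarity
    from rarity.data_loader import DataframeLoader

    df_xFeatures, df_yTrue, df_yPred_ls = build_multiclass_frames()
    return DataframeLoader(
        df_xFeatures,
        df_yTrue,
        df_yPred_ls=df_yPred_ls,
        model_names_ls=["model_a"],  # hypothetical name, one per yPred dataframe
        # Confirm the exact string your installed rarity version expects
        analysis_type="Multiclass Classification",
    )
```

Passing dataframes avoids a round trip through csv files when the predictions are already in memory, for example straight after calling a model's predict method.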