Features

Modules under Features act as the core integrators that link inputs from the interpreters to the interactive graphs produced by the visualizers. The major styled components built with Dash are defined at this stage and customized in the respective feature modules according to the task each serves. All responsive parameters and callback management are also handled at this stage.

Feat - General Metrics

class rarity.features.GeneralMetrics(data_loader: Union[rarity.data_loader.data_loader.CSVDataLoader, rarity.data_loader.data_loader.DataframeLoader])[source]

Main integration for feature component on General Metrics.

  • On Regression: Prediction vs Actual, Prediction vs Offset

  • On Classification: Confusion Matrix, Classification Report, ROC, Precision-Recall

Parameters

data_loader (CSVDataLoader or DataframeLoader) – Class object from data_loader module

Returns

styled dash components displaying graph and/or table objects

Return type

Container

rarity.features.feat_general_metrics.fig_classification_report(data_loader: Union[rarity.data_loader.data_loader.CSVDataLoader, rarity.data_loader.data_loader.DataframeLoader])[source]

Create classification report in table form

Parameters

data_loader (CSVDataLoader or DataframeLoader) – Class object from data_loader module

Returns

list of tables displaying classification report details

Return type

List[~plotly.graph_objects.Figure]
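As a rough illustration of what a classification report tabulates, the sketch below computes per-class precision, recall, F1 and support in plain Python. It is not Rarity's implementation; `classification_report_rows` and the sample labels are hypothetical names introduced only for this example.

```python
from typing import Dict, Sequence


def classification_report_rows(
    y_true: Sequence[str], y_pred: Sequence[str]
) -> Dict[str, Dict[str, float]]:
    """Per-class precision, recall, F1 and support (hypothetical helper)."""
    report: Dict[str, Dict[str, float]] = {}
    for label in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        # Guard against empty denominators for classes never predicted or never seen
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        report[label] = {
            "precision": precision,
            "recall": recall,
            "f1": f1,
            "support": tp + fn,
        }
    return report


# Made-up labels for illustration only
y_true = ["cat", "cat", "dog", "dog", "dog"]
y_pred = ["cat", "dog", "dog", "dog", "cat"]
report = classification_report_rows(y_true, y_pred)
```

Each entry of `report` corresponds to one row of such a table; in a multi-model analysis one table per model would be built the same way.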

rarity.features.feat_general_metrics.fig_confusion_matrix(data_loader: Union[rarity.data_loader.data_loader.CSVDataLoader, rarity.data_loader.data_loader.DataframeLoader])[source]

Create confusion matrix

Parameters

data_loader (CSVDataLoader or DataframeLoader) – Class object from data_loader module

Returns

figure displaying confusion matrix details

Return type

Figure
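For orientation, the counts behind a confusion matrix can be tallied in a few lines. The sketch below is plain Python and not Rarity's implementation; `confusion_counts` and the sample labels are hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def confusion_counts(
    y_true: Sequence[str], y_pred: Sequence[str]
) -> Tuple[List[str], List[List[int]]]:
    """Return sorted labels and a matrix with true labels as rows, predictions as columns."""
    labels = sorted(set(y_true) | set(y_pred))
    pair_counts = Counter(zip(y_true, y_pred))
    matrix = [[pair_counts[(t, p)] for p in labels] for t in labels]
    return labels, matrix


# Made-up labels for illustration only
y_true = ["cat", "cat", "dog", "dog", "dog"]
y_pred = ["cat", "dog", "dog", "dog", "cat"]
labels, matrix = confusion_counts(y_true, y_pred)
```

The diagonal of `matrix` holds the correct predictions; off-diagonal cells hold the miss-predictions per class pair.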

rarity.features.feat_general_metrics.fig_precisionRecall_curve(data_loader: Union[rarity.data_loader.data_loader.CSVDataLoader, rarity.data_loader.data_loader.DataframeLoader])[source]

Display precision-recall curve for comparison on various models

Parameters

data_loader (CSVDataLoader or DataframeLoader) – Class object from data_loader module

Returns

figure displaying line curves comparing precision-recall for various models

Return type

Figure

rarity.features.feat_general_metrics.fig_prediction_actual_comparison(data_loader: Union[rarity.data_loader.data_loader.CSVDataLoader, rarity.data_loader.data_loader.DataframeLoader])[source]

Display scatter plot for comparison on actual values (yTrue) vs prediction values (yPred)

Parameters

data_loader (CSVDataLoader or DataframeLoader) – Class object from data_loader module

Returns

figure displaying scatter plot comparing actual values vs prediction values

Return type

Figure

rarity.features.feat_general_metrics.fig_prediction_offset_overview(data_loader: Union[rarity.data_loader.data_loader.CSVDataLoader, rarity.data_loader.data_loader.DataframeLoader])[source]

Display scatter plot for overview on prediction offset values

Parameters

data_loader (CSVDataLoader or DataframeLoader) – Class object from data_loader module

Returns

figure displaying scatter plot outlining overview on prediction offset values

Return type

Figure
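The offset plotted here is a per-point difference between prediction and actual value. The sketch below assumes the sign convention `yPred - yTrue`, which this page does not state; `prediction_offsets` is a hypothetical helper, not part of Rarity.

```python
from typing import List, Sequence


def prediction_offsets(y_true: Sequence[float], y_pred: Sequence[float]) -> List[float]:
    """Offset per data point, assumed here to be prediction minus actual."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return [pred - true for true, pred in zip(y_true, y_pred)]


# Made-up values for illustration only
y_true = [10.0, 12.5, 8.0]
y_pred = [11.0, 12.0, 8.0]
offsets = prediction_offsets(y_true, y_pred)
```

Under this convention a positive offset means the model over-predicted and a negative offset means it under-predicted.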

rarity.features.feat_general_metrics.fig_roc_curve(data_loader: Union[rarity.data_loader.data_loader.CSVDataLoader, rarity.data_loader.data_loader.DataframeLoader])[source]

Display ROC curves for comparison across models

Parameters

data_loader (CSVDataLoader or DataframeLoader) – Class object from data_loader module

Returns

figure displaying line curves comparing ROC-AUC scores for various models

Return type

Figure
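To make the model comparison concrete, the sketch below derives ROC points and the area under the curve for a binary task in plain Python. It is an illustration only, assuming both classes are present in `y_true`; `roc_points`, `auc` and the scores are hypothetical and not taken from Rarity.

```python
from typing import Dict, List, Sequence, Tuple


def roc_points(y_true: Sequence[int], scores: Sequence[float]) -> List[Tuple[float, float]]:
    """(false positive rate, true positive rate) pairs, one per distinct threshold."""
    positives = sum(y_true)
    negatives = len(y_true) - positives  # assumes both classes occur at least once
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points: List[Tuple[float, float]]) -> float:
    """Area under the curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Made-up scores for two models on the same four data points
y_true = [0, 0, 1, 1]
model_scores: Dict[str, List[float]] = {
    "model_a": [0.1, 0.4, 0.35, 0.8],
    "model_b": [0.2, 0.3, 0.7, 0.9],
}
auc_by_model = {name: auc(roc_points(y_true, s)) for name, s in model_scores.items()}
```

Plotting each model's points as one line on shared axes gives the kind of comparison this figure is described as showing.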

rarity.features.feat_general_metrics.fig_standard_error_metrics(data_loader: Union[rarity.data_loader.data_loader.CSVDataLoader, rarity.data_loader.data_loader.DataframeLoader])[source]

Display table comparing various standard metrics for regression task

Parameters

data_loader (CSVDataLoader or DataframeLoader) – Class object from data_loader module

Returns

table object comparing various standard metrics for regression task

Return type

DataTable
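This page does not list which standard metrics the table contains. As a sketch under that caveat, the example below computes four common regression metrics (MAE, MSE, RMSE, R2) per model in plain Python; `regression_metrics` and the data are hypothetical.

```python
import math
from typing import Dict, Sequence


def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    """Common error metrics for one model (an assumed selection, not Rarity's list)."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    # R2 is undefined when the actual values have zero variance
    r2 = 1.0 - (mse * n) / ss_tot if ss_tot else float("nan")
    return {"MAE": mae, "MSE": mse, "RMSE": math.sqrt(mse), "R2": r2}


# Made-up values: one row of metrics per model
y_true = [1.0, 2.0, 3.0, 4.0]
model_predictions = {
    "model_a": [1.0, 2.0, 3.0, 4.0],
    "model_b": [2.0, 3.0, 4.0, 5.0],
}
metrics_table = {name: regression_metrics(y_true, preds) for name, preds in model_predictions.items()}
```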

Feat - Miss Predictions

class rarity.features.MissPredictions(data_loader: Union[rarity.data_loader.data_loader.CSVDataLoader, rarity.data_loader.data_loader.DataframeLoader])[source]

Main integration for feature component on Miss Predictions.

  • On Regression: To generate single miss-prediction scatter plot by data index points

  • On Classification: To generate scatter plots for probabilities comparison on correct data point vs miss-predicted data point for each class label

Parameters

data_loader (CSVDataLoader or DataframeLoader) – Class object from data_loader module

Important Attributes:

  • analysis_type (str):

    Analysis type defined by user during initial inputs preparation via data_loader stage.

  • model_names (List[str]):

    model names defined by user during initial inputs preparation via data_loader stage.

  • is_bimodal (bool):

    to indicate if analysis involves 2 models

Returns

styled dash components displaying graph and/or table objects

Return type

Container

rarity.features.feat_miss_predictions.convert_relayout_data_to_df_cls(fig_class_label, relayout_data, df_feature, df_viz_specific)[source]

Convert raw data format from relayout selection range by user into the correct df fit for viz purpose

Parameters
  • fig_class_label (str) – class label name

  • relayout_data (Dict) – data containing selection range indices returned from plotly graph

  • df_feature (DataFrame) – dataframe tap-out from interpreters pipeline

  • df_viz_specific (DataFrame) – dataframe prefiltered with right class label and model

Returns

dataframe fit for the responsive table-graph filtering

Return type

DataFrame

rarity.features.feat_miss_predictions.convert_relayout_data_to_df_reg(relayout_data, df, models)[source]

Convert raw data format from relayout selection range by user into the correct df fit for viz purpose

Parameters
  • relayout_data (Dict) – dictionary like data containing selection range indices returned from plotly graph

  • df (DataFrame) – dataframe tap-out from interpreters pipeline

  • models (List[str]) – model names defined by user during spin-up of Tenjin app

Returns

dataframe fit for the responsive table-graph filtering

Return type

DataFrame
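When a user zooms with a box selection, a Plotly graph reports the new axis ranges under keys such as `xaxis.range[0]` and `yaxis.range[1]`. The sketch below shows how such a payload can be turned into a filtered set of rows. It uses plain lists of dicts instead of DataFrames and is not Rarity's code; `filter_records_by_relayout` and the record fields are hypothetical.

```python
from typing import Any, Dict, List


def filter_records_by_relayout(
    relayout_data: Dict[str, Any],
    records: List[Dict[str, float]],
    x_key: str,
    y_key: str,
) -> List[Dict[str, float]]:
    """Keep records whose x and y values fall inside the zoomed axis ranges."""
    try:
        x_lo, x_hi = sorted((relayout_data["xaxis.range[0]"], relayout_data["xaxis.range[1]"]))
        y_lo, y_hi = sorted((relayout_data["yaxis.range[0]"], relayout_data["yaxis.range[1]"]))
    except KeyError:
        # Payloads without a full range (e.g. an autosize event) leave the data unfiltered
        return list(records)
    return [r for r in records if x_lo <= r[x_key] <= x_hi and y_lo <= r[y_key] <= y_hi]


# Made-up rows: data index on the x-axis, prediction offset on the y-axis
records = [
    {"index": 0, "offset": 0.2},
    {"index": 1, "offset": -1.5},
    {"index": 2, "offset": 0.9},
    {"index": 3, "offset": 2.4},
]
relayout = {
    "xaxis.range[0]": 0.5,
    "xaxis.range[1]": 3.5,
    "yaxis.range[0]": -2.0,
    "yaxis.range[1]": 1.0,
}
selected = filter_records_by_relayout(relayout, records, "index", "offset")
```

This simplified version only handles payloads carrying both axis ranges; a zoom restricted to one axis would need its own branch.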

rarity.features.feat_miss_predictions.fig_plot_prediction_offset_overview(data_loader: Union[rarity.data_loader.data_loader.CSVDataLoader, rarity.data_loader.data_loader.DataframeLoader])[source]

For use in regression task only. Display scatter plot for overview on prediction offset values

Parameters

data_loader (CSVDataLoader or DataframeLoader) – Class object from data_loader module

Returns

figure displaying scatter plot outlining overview on prediction offset values by index

Return type

Figure

rarity.features.feat_miss_predictions.fig_probabilities_spread_pattern(data_loader: Union[rarity.data_loader.data_loader.CSVDataLoader, rarity.data_loader.data_loader.DataframeLoader])[source]

For use in classification task only. Function to output collated info packs used to display final graph objects and data tables

Parameters

data_loader (CSVDataLoader or DataframeLoader) – Class object from data_loader module

Returns

Compact outputs consisting of the following:

  • fig_objs_all_models (List[~plotly.graph_objects.Figure]): figures displaying scatter plots outlining probabilities comparison on correct data point vs miss-predicted data point for each class label

  • tables_all_models (List[~dash_table.DataTable]): table objects outlining simple stats on ss, % correct, % wrong, accuracy for each class label

  • ls_dfs_viz (List[~pandas.DataFrame]): dataframes for overview visualization need with true labels and predicted labels included

  • df_features (DataFrame): dataframe storing all features used in dataset

  • ls_class_labels (List[str]): list of class labels found in dataset

rarity.features.feat_miss_predictions.table_with_relayout_datapoints(data, customized_cols, header, exp_format)[source]

Create table outlining dataframe content

Parameters
  • data (DataTable) – dictionary like format storing dataframe info under ‘record’ key

  • customized_cols (List[str]) – list of customized column names

  • header (Dict) – dictionary format storing the style info for table header

  • exp_format (str) – text info indicating the export format

Returns

table object outlining the dataframe content with specific styles

Return type

DataTable

Feat - Loss Clusters

class rarity.features.LossClusters(data_loader: Union[rarity.data_loader.data_loader.CSVDataLoader, rarity.data_loader.data_loader.DataframeLoader])[source]

Main integration for feature component on Loss Clusters.

Parameters

data_loader (CSVDataLoader or DataframeLoader) – Class object from data_loader module

Important Attributes:

analysis_type (str):

Analysis type defined by user during initial inputs preparation via data_loader stage.

model_names (List[str]):

model names defined by user during initial inputs preparation via data_loader stage.

is_bimodal (bool):

to indicate if analysis involves 2 models

num_clusters (int):

Number of clusters to form

log_funct (math.log):

Mathematical logarithm function used to calculate log-loss between yTrue and yPred

specific_dataset (str):

Default to ‘All’ indicating to include all miss-predict labels. Other options flexibly expand depending on class labels

Returns

styled dash components displaying graph and/or table objects

Return type

Container

rarity.features.feat_loss_clusters.convert_cluster_relayout_data_to_df_cls(relayout_data: Dict, dfs_viz: List[pandas.core.frame.DataFrame], df_features: pandas.core.frame.DataFrame, models: List[str])[source]

For use in classification task only. Convert raw data format from relayout selection range by user into the correct df fit for viz purpose

Parameters
  • relayout_data (Dict) – dictionary like data containing selection range indices returned from plotly graph

  • dfs_viz (List[~pd.DataFrame]) – list of dataframes for overview visualization need with offset values included

  • df_features (DataFrame) – dataframe storing all features used in dataset

  • models (List[str]) – model names defined by user during spin-up of Tenjin app

Returns

Compact outputs consisting of the following:

  • df_final_features (DataFrame): dataframe storing all features based on slicing info from relayout_data

  • df_final_probs (DataFrame): dataframe storing probability values by class label corresponding to the slicing relayout_data

rarity.features.feat_loss_clusters.convert_cluster_relayout_data_to_df_reg(relayout_data: Dict, df: pandas.core.frame.DataFrame, models: List[str])[source]

For use in regression task only. Convert raw data format from relayout selection range by user into the correct df fit for viz purpose

Parameters
  • relayout_data (Dict) – dictionary like data containing selection range indices returned from plotly graph

  • df (DataFrame) – dataframe tap-out from interpreters pipeline

  • models (List[str]) – model names defined by user during spin-up of Tenjin app

Returns

dataframe fit for the responsive table-graph filtering

Return type

DataFrame

rarity.features.feat_loss_clusters.fig_plot_logloss_clusters_cls(data_loader: Union[rarity.data_loader.data_loader.CSVDataLoader, rarity.data_loader.data_loader.DataframeLoader], num_cluster: int, log_func: math.log = <built-in function log>, specific_dataset: str = 'All')[source]

For use in classification task only. Function to output collated info packs used to display final graph objects by cluster groups along with calculated silhouette scores

Parameters
  • data_loader (CSVDataLoader or DataframeLoader) – Class object from data_loader module

  • num_cluster (int) – Number of clusters to form

  • log_func (math.log) – Mathematical logarithm function used to calculate log-loss between yTrue and yPred

  • specific_dataset (str) – Default to ‘All’ indicating to include all miss-predict labels. Other options flexibly expand depending on class labels

Returns

Compact outputs consisting of the following:

  • ls_dfs_viz (List[~pd.DataFrame]): dataframes needed for the overview visualization, with offset values included

  • fig_obj_cluster (Figure): figure displaying violin plot outlining cluster groups by offset values

  • ls_cluster_score (List[str]): list of silhouette scores, indication of clustering quality

  • fig_obj_elbow (Figure): figure displaying line plot outlining the change in sum of squared distances along the cluster range

  • ls_class_labels (List[str]): list of all class labels

  • ls_class_labels_misspred (List[str]): list of class labels with minimum of 1 miss-prediction

  • df_features (DataFrame): dataframe storing all features used in dataset
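The `log_func` parameter suggests that a loss is computed per data point before clustering. A minimal sketch of a per-point log-loss with a swappable logarithm follows. The clipping constant `eps`, the helper name `per_point_log_loss` and the probabilities are assumptions for illustration, not Rarity's implementation.

```python
import math
from typing import Callable, Dict, List, Sequence


def per_point_log_loss(
    y_true: Sequence[str],
    y_prob: Sequence[Dict[str, float]],
    log_func: Callable[[float], float] = math.log,
    eps: float = 1e-15,
) -> List[float]:
    """Negative log of the probability assigned to the true class, per data point."""
    losses = []
    for label, probs in zip(y_true, y_prob):
        # Clip to avoid log(0) when a model assigns zero probability to the true class
        p = min(max(probs[label], eps), 1.0 - eps)
        losses.append(-log_func(p))
    return losses


# Made-up probabilities for a two-class task
y_true = ["a", "b", "a"]
y_prob = [{"a": 0.9, "b": 0.1}, {"a": 0.5, "b": 0.5}, {"a": 0.2, "b": 0.8}]
losses = per_point_log_loss(y_true, y_prob)
losses_base2 = per_point_log_loss(y_true, y_prob, log_func=math.log2)
```

Points where the model was confident but wrong receive the largest loss, which is what makes the values useful for grouping miss-predictions.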

rarity.features.feat_loss_clusters.fig_plot_offset_clusters_reg(data_loader: Union[rarity.data_loader.data_loader.CSVDataLoader, rarity.data_loader.data_loader.DataframeLoader], num_cluster: int)[source]

For use in regression task only. Function to output collated info packs used to display final graph objects by cluster groups along with calculated silhouette scores

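Parameters
  • data_loader (CSVDataLoader or DataframeLoader) – Class object from data_loader module

  • num_cluster (int) – Number of clusters to form
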
Returns

Compact outputs consisting of the following:

  • df (DataFrame): dataframe needed for the overview visualization, with offset values included

  • fig_obj_cluster (Figure): figure displaying violin plot outlining cluster groups by offset values

  • ls_cluster_score (List[str]): list of silhouette scores, indication of clustering quality

  • fig_obj_elbow (Figure): figure displaying line plot outlining the change in sum of squared distances along the cluster range
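The elbow figure tracks how the sum of squared distances changes as the number of clusters grows. The sketch below reproduces that idea on one-dimensional offsets with a small k-means loop. It assumes k-means style clustering with a deterministic initialisation, which this page does not confirm; `kmeans_1d` and the offsets are hypothetical.

```python
from typing import List, Sequence, Tuple


def assign(values: Sequence[float], centres: List[float]) -> List[int]:
    """Label each value with the index of its nearest centre."""
    return [min(range(len(centres)), key=lambda c: (v - centres[c]) ** 2) for v in values]


def kmeans_1d(values: Sequence[float], k: int, iterations: int = 50) -> Tuple[List[int], float]:
    """Cluster 1-D values with Lloyd's algorithm; return labels and sum of squared distances."""
    ordered = sorted(values)
    # Deterministic start: spread the initial centres across the sorted values
    centres = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        labels = assign(values, centres)
        new_centres = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new_centres.append(sum(members) / len(members) if members else centres[c])
        if new_centres == centres:
            break
        centres = new_centres
    labels = assign(values, centres)
    inertia = sum((v - centres[lab]) ** 2 for v, lab in zip(values, labels))
    return labels, inertia


# Made-up offsets with three visible groups
offsets = [-5.1, -4.9, -5.0, 0.1, -0.1, 0.0, 4.9, 5.0, 5.1]
inertia_by_k = {k: kmeans_1d(offsets, k)[1] for k in (1, 2, 3)}
labels_k3, _ = kmeans_1d(offsets, 3)
```

The sharp drop in `inertia_by_k` up to three clusters is the "elbow" such a line plot is meant to reveal.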

rarity.features.feat_loss_clusters.table_with_relayout_datapoints(data: dash_table.DataTable.DataTable, customized_cols: List[str], header: Dict, exp_format: str)[source]

Create table outlining dataframe content

Parameters
  • data (DataTable) – dictionary like format storing dataframe info under ‘record’ key

  • customized_cols (List[str]) – list of customized column names

  • header (Dict) – dictionary format storing the style info for table header

  • exp_format (str) – text info indicating the export format

Returns

table object outlining the dataframe content with specific styles

Return type

DataTable

Feat - xFeature Distribution

class rarity.features.FeatureDistribution(data_loader: Union[rarity.data_loader.data_loader.CSVDataLoader, rarity.data_loader.data_loader.DataframeLoader])[source]

Main integration for feature component on Distribution

Parameters

data_loader (CSVDataLoader or DataframeLoader) – Class object from data_loader module

Important Attributes:

analysis_type (str):

Analysis type defined by user during initial inputs preparation via data_loader stage.

model_names (List[str]):

model names defined by user during initial inputs preparation via data_loader stage.

is_bimodal (bool):

to indicate if analysis involves 2 models

feature_to_exclude (List of str, optional):

A list of features to be excluded from the kl-div calculation and visualization

df_features (DataFrame):

Dataframe storing all features used in dataset

specific_feature (List of str):

A list of features to be displayed along with the corresponding kl-div score

display_option (str):
  • indicates whether to display distribution plots by top-N, bottom-N, or both top-N and bottom-N

  • Available options: ‘top’, ‘bottom’ or ‘both’

display_value (int):
  • maximum number of graphs to display, capped at 10

  • if the dataset has fewer than 10 features, the limit equals the number of features in the dataset

Returns

styled dash components displaying graph and/or table objects

Return type

Container

rarity.features.feat_feature_distribution.fig_plot_distribution_by_kl_div_ranking(data_loader: Union[rarity.data_loader.data_loader.CSVDataLoader, rarity.data_loader.data_loader.DataframeLoader], feature_to_exclude: List[str], start_idx: int, stop_idx: int, display_option: str, display_value: int)[source]

Integration of kl-divergence scores to corresponding fig-object

Parameters
  • data_loader (CSVDataLoader or DataframeLoader) – Class object from data_loader module

  • feature_to_exclude (List of str, optional) – A list of features to be excluded from the kl-div calculation and visualization

  • start_idx (int, optional) – Integer number indicating the start index position to slice dataframe

  • stop_idx (int, optional) – Integer number indicating the stop index position to slice dataframe

  • display_option (str) –

    • indicates whether to display distribution plots by top-N, bottom-N, or both top-N and bottom-N

    • Available options: ‘top’, ‘bottom’ or ‘both’

  • display_value (int) –

    • maximum number of graphs to display, capped at 10

    • if the dataset has fewer than 10 features, the limit equals the number of features in the dataset

Returns

dictionary storing distribution figures by display_option

Return type

Dict[~plotly.graph_objects.Figure]

Note

if classification, returns:

List[Dict[~plotly.graph_objects.Figure]]: list of dictionary storing distribution figures by display_option
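To illustrate ranking by kl-divergence, the sketch below scores each feature by the KL divergence between two binned samples and applies the display limit of 10. This page does not state which two groups of data are compared, how values are binned, or whether "top" means the highest score, so all three are assumptions; `histogram`, `kl_divergence` and the samples are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def histogram(values: Sequence[float], bin_edges: Sequence[float]) -> List[float]:
    """Relative frequency per bin; the last bin includes its right edge."""
    counts = [0] * (len(bin_edges) - 1)
    for v in values:
        for i in range(len(counts)):
            is_last = i == len(counts) - 1
            if bin_edges[i] <= v < bin_edges[i + 1] or (is_last and v == bin_edges[-1]):
                counts[i] += 1
                break
    total = sum(counts)  # assumes at least one value falls inside the edges
    return [c / total for c in counts]


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-10) -> float:
    """KL(P || Q) over matching bins, smoothed with eps to avoid division by zero."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


# Made-up samples: feature name -> (values in group A, values in group B)
feature_samples: Dict[str, Tuple[List[float], List[float]]] = {
    "age": ([1, 2, 3, 4], [1, 2, 3, 4]),
    "income": ([1, 1, 1, 2], [3, 4, 4, 4]),
}
edges = [0, 1, 2, 3, 4]
scores = {
    name: kl_divergence(histogram(a, edges), histogram(b, edges))
    for name, (a, b) in feature_samples.items()
}
display_value = min(10, len(scores))  # limit is capped by the number of features
top_features = sorted(scores, key=scores.get, reverse=True)[:display_value]
```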

rarity.features.feat_feature_distribution.fig_plot_distribution_by_specific_feature(data_loader: Union[rarity.data_loader.data_loader.CSVDataLoader, rarity.data_loader.data_loader.DataframeLoader], ls_specific_feature, start_idx, stop_idx)[source]

Integration of kl-divergence scores to specific fig-object

Parameters
  • data_loader (CSVDataLoader or DataframeLoader) – Class object from data_loader module

  • ls_specific_feature (List of str) – A list of features to be displayed along with the corresponding kl-div score

  • start_idx (int, optional) – Integer number indicating the start index position to slice dataframe

  • stop_idx (int, optional) – Integer number indicating the stop index position to slice dataframe

Returns

list of figure objects displaying the distribution plot based on kl-divergence score

Return type

List[~plotly.graph_objects.Figure]

Feat - Similarities (+CounterFactuals)

class rarity.features.SimilaritiesCF(data_loader: Union[rarity.data_loader.data_loader.CSVDataLoader, rarity.data_loader.data_loader.DataframeLoader])[source]

Main integration for feature component on Similarities-CounterFactuals

Parameters

data_loader (CSVDataLoader or DataframeLoader) – Class object from data_loader module

Important Attributes:

analysis_type (str):

Analysis type defined by user during initial inputs preparation via data_loader stage.

df_features (DataFrame):

Dataframe storing all features used in dataset

feature_to_exclude (List[str], optional):

A list of features to be excluded from the ranking and similarities distance calculation

user_defined_idx (int):

Index of the data point of interest specified by user

top_n (int):

Number indicating the max limit of records to be displayed based on the distance ranking

Returns

styled dash components displaying graph and/or table objects

Return type

Container

rarity.features.feat_similarities_counter_factuals.generate_counterfactuals(data_loader: Union[rarity.data_loader.data_loader.CSVDataLoader, rarity.data_loader.data_loader.DataframeLoader], user_defined_idx, feature_to_exclude=None, top_n=3)[source]

Tap out a table collating feature info for the user-defined index and the top N indices by distance score, on condition that the prediction labels of the top N indices differ from the prediction label of the user-defined index. Applicable to classification only.

Parameters
  • data_loader (CSVDataLoader or DataframeLoader) – Class object from data_loader module

  • user_defined_idx (int) – Index of the data point of interest specified by user

  • feature_to_exclude (List[str], optional) – A list of features to be excluded from the ranking and similarities distance calculation

  • top_n (int) – Number indicating the max limit of records to be displayed based on the distance ranking

Returns

table object outlining the dataframe content with dynamic-conditional styles

Return type

DataTable

rarity.features.feat_similarities_counter_factuals.generate_similarities(data_loader: Union[rarity.data_loader.data_loader.CSVDataLoader, rarity.data_loader.data_loader.DataframeLoader], user_defined_idx, feature_to_exclude=None, top_n=3)[source]

Tap out a table collating feature info for the user-defined index and the top N indices by distance score. Applicable to both regression and classification.

Parameters
  • data_loader (CSVDataLoader or DataframeLoader) – Class object from data_loader module

  • user_defined_idx (int) – Index of the data point of interest specified by user

  • feature_to_exclude (List[str], optional) – A list of features to be excluded from the ranking and similarities distance calculation

  • top_n (int) – Number indicating the max limit of records to be displayed based on the distance ranking

Returns

table object outlining the dataframe content with dynamic-conditional styles

Return type

DataTable
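The two functions above share one idea: rank other data points by distance to the point of interest, and for counterfactuals keep only those with a different predicted label. The sketch below shows that logic in plain Python. Euclidean distance is an assumed metric, and `rank_by_distance`, `similarities` and `counterfactuals` are hypothetical helpers, not Rarity's API.

```python
import math
from typing import Dict, List, Optional, Sequence


def rank_by_distance(
    features: List[Dict[str, float]],
    user_defined_idx: int,
    feature_to_exclude: Optional[Sequence[str]] = None,
) -> List[int]:
    """Indices of all other rows, nearest first, by Euclidean distance (assumed metric)."""
    excluded = set(feature_to_exclude or [])
    target = features[user_defined_idx]
    keys = [k for k in target if k not in excluded]

    def distance(idx: int) -> float:
        return math.sqrt(sum((features[idx][k] - target[k]) ** 2 for k in keys))

    others = [i for i in range(len(features)) if i != user_defined_idx]
    return sorted(others, key=distance)


def similarities(features, user_defined_idx, feature_to_exclude=None, top_n=3) -> List[int]:
    """Nearest top_n rows regardless of prediction."""
    return rank_by_distance(features, user_defined_idx, feature_to_exclude)[:top_n]


def counterfactuals(features, y_pred, user_defined_idx, feature_to_exclude=None, top_n=3) -> List[int]:
    """Nearest top_n rows whose predicted label differs from the point of interest."""
    ranked = rank_by_distance(features, user_defined_idx, feature_to_exclude)
    return [i for i in ranked if y_pred[i] != y_pred[user_defined_idx]][:top_n]


# Made-up features and predicted labels
features = [
    {"x1": 0.0, "x2": 0.0},
    {"x1": 1.0, "x2": 0.0},
    {"x1": 0.0, "x2": 2.0},
    {"x1": 3.0, "x2": 3.0},
    {"x1": 0.5, "x2": 0.5},
]
y_pred = ["a", "a", "b", "b", "b"]
similar_idx = similarities(features, user_defined_idx=0, top_n=3)
counterfactual_idx = counterfactuals(features, y_pred, user_defined_idx=0, top_n=2)
```

The returned indices would then select the rows shown in the table, with the user-defined index listed alongside them.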