Features

Modules under Features act as the core integrators: they take inputs from the interpreters and output interactive graphs via the visualizers. The major styled components built with Dash are defined at this stage and customized in the respective feature modules according to the task each one serves. All responsive parameters and callback management are handled at this stage as well.

Feat - General Metrics

class rarity.features.GeneralMetrics(data_loader: Union[rarity.data_loader.data_loader.CSVDataLoader, rarity.data_loader.data_loader.DataframeLoader])

Main integration for feature component on General Metrics.

On Regression: Prediction vs Actual, Prediction vs Offset

On Classification: Confusion Matrix, Classification Report, ROC, Precision-Recall

Parameters: data_loader (CSVDataLoader or DataframeLoader) – Class object from data_loader module
Returns: styled dash components displaying graph and/or table objects
Return type: Container

rarity.features.feat_general_metrics.fig_classification_report(data_loader: Union[rarity.data_loader.data_loader.CSVDataLoader, rarity.data_loader.data_loader.DataframeLoader])

Create classification report in table form

Parameters: data_loader (CSVDataLoader or DataframeLoader) – Class object from data_loader module
Returns: list of tables displaying classification report details
Return type: List[~plotly.graph_objects.Figure]

rarity.features.feat_general_metrics.fig_confusion_matrix(data_loader: Union[rarity.data_loader.data_loader.CSVDataLoader, rarity.data_loader.data_loader.DataframeLoader])

Create confusion matrix

Parameters: data_loader (CSVDataLoader or DataframeLoader) – Class object from data_loader module
Returns: figure displaying confusion matrix details
Return type: Figure
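
To make the content of a confusion matrix concrete, here is a minimal sketch in plain Python. It is not the rarity implementation; the helper name `confusion_matrix`, the row/column orientation (rows are true labels, columns are predicted labels) and the sample labels are assumptions chosen for illustration.

```python
from collections import Counter
from typing import List, Sequence


def confusion_matrix(y_true: Sequence[str], y_pred: Sequence[str],
                     labels: Sequence[str]) -> List[List[int]]:
    """Count (true, predicted) pairs; rows are true labels, columns are predictions."""
    counts = Counter(zip(y_true, y_pred))
    # Counter returns 0 for pairs that never occur, so missing cells are filled in.
    return [[counts[(t, p)] for p in labels] for t in labels]


# Hypothetical labels and predictions, for illustration only.
labels = ["cat", "dog"]
y_true = ["cat", "cat", "dog", "dog", "dog"]
y_pred = ["cat", "dog", "dog", "dog", "cat"]
matrix = confusion_matrix(y_true, y_pred, labels)
```

The diagonal cells hold the correct predictions per class; off-diagonal cells hold the miss-predictions.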

rarity.features.feat_general_metrics.fig_precisionRecall_curve(data_loader: Union[rarity.data_loader.data_loader.CSVDataLoader, rarity.data_loader.data_loader.DataframeLoader])

Display precision-recall curve for comparison on various models

Parameters: data_loader (CSVDataLoader or DataframeLoader) – Class object from data_loader module
Returns: figure displaying line curves comparing precision-recall for various models
Return type: Figure

rarity.features.feat_general_metrics.fig_prediction_actual_comparison(data_loader: Union[rarity.data_loader.data_loader.CSVDataLoader, rarity.data_loader.data_loader.DataframeLoader])

Display scatter plot for comparison on actual values (yTrue) vs prediction values (yPred)

Parameters: data_loader (CSVDataLoader or DataframeLoader) – Class object from data_loader module
Returns: figure displaying scatter plot comparing actual values vs prediction values
Return type: Figure

rarity.features.feat_general_metrics.fig_prediction_offset_overview(data_loader: Union[rarity.data_loader.data_loader.CSVDataLoader, rarity.data_loader.data_loader.DataframeLoader])

Display scatter plot for overview on prediction offset values

Parameters: data_loader (CSVDataLoader or DataframeLoader) – Class object from data_loader module
Returns: figure displaying scatter plot outlining overview on prediction offset values
Return type: Figure
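
As a rough illustration of what a prediction offset is, the sketch below computes per-sample offsets for two models in plain Python. The sign convention (prediction minus actual) and the names `prediction_offsets`, `y_true` and `y_pred_by_model` are assumptions for illustration, not taken from the rarity source.

```python
from typing import Dict, List


def prediction_offsets(y_true: List[float],
                       y_pred_by_model: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Return per-sample offsets (prediction - actual) for each model."""
    offsets = {}
    for model, y_pred in y_pred_by_model.items():
        if len(y_pred) != len(y_true):
            raise ValueError(f"length mismatch for model {model!r}")
        offsets[model] = [pred - true for pred, true in zip(y_pred, y_true)]
    return offsets


# Hypothetical values for two models, for illustration only.
y_true = [10.0, 12.0, 15.0]
y_pred_by_model = {"model_a": [11.0, 12.0, 13.0], "model_b": [10.0, 14.0, 15.0]}
offsets = prediction_offsets(y_true, y_pred_by_model)
```

Plotting these offsets against the data index (or against the prediction) gives the kind of scatter overview described above.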

rarity.features.feat_general_metrics.fig_roc_curve(data_loader: Union[rarity.data_loader.data_loader.CSVDataLoader, rarity.data_loader.data_loader.DataframeLoader])

Display ROC curves for comparison across various models

Parameters: data_loader (CSVDataLoader or DataframeLoader) – Class object from data_loader module
Returns: figure displaying line curves comparing ROC-AUC scores for various models
Return type: Figure

rarity.features.feat_general_metrics.fig_standard_error_metrics(data_loader: Union[rarity.data_loader.data_loader.CSVDataLoader, rarity.data_loader.data_loader.DataframeLoader])

Display table comparing various standard metrics for regression task

Parameters: data_loader (CSVDataLoader or DataframeLoader) – Class object from data_loader module
Returns: table object comparing various standard metrics for regression task
Return type: DataTable
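
The text above does not list which standard metrics the table contains. As a sketch only, the snippet below computes four metrics commonly used for regression (MAE, MSE, RMSE, R2); this selection and the helper name `regression_metrics` are assumptions, not a statement of what rarity reports.

```python
import math
from typing import Dict, List


def regression_metrics(y_true: List[float], y_pred: List[float]) -> Dict[str, float]:
    """Compute common regression error metrics from paired actual/predicted values."""
    n = len(y_true)
    errors = [p - t for p, t in zip(y_pred, y_true)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    mean_true = sum(y_true) / n
    # Total sum of squares; this is zero (and R2 undefined) if y_true is constant.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - (mse * n) / ss_tot
    return {"MAE": mae, "MSE": mse, "RMSE": math.sqrt(mse), "R2": r2}


# Hypothetical values, for illustration only.
metrics = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 6.0])
```

Computing the same dictionary once per model yields one table row per model, which is the comparison the table is meant to support.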

Feat - Miss Predictions

class rarity.features.MissPredictions(data_loader: Union[rarity.data_loader.data_loader.CSVDataLoader, rarity.data_loader.data_loader.DataframeLoader])

Main integration for feature component on Miss Prediction.

On Regression: To generate single miss-prediction scatter plot by data index points

On Classification: To generate scatter plots for probabilities comparison on correct data point vs miss-predicted data point for each class label

Parameters: data_loader (CSVDataLoader or DataframeLoader) – Class object from data_loader module

Important Attributes:

analysis_type (str):
Analysis type defined by user during initial inputs preparation via data_loader stage.

model_names (List[str]):
model names defined by user during initial inputs preparation via data_loader stage.

is_bimodal (bool):
to indicate if analysis involves 2 models

Returns: styled dash components displaying graph and/or table objects
Return type: Container

rarity.features.feat_miss_predictions.convert_relayout_data_to_df_cls(fig_class_label, relayout_data, df_feature, df_viz_specific)

Convert the raw relayout data produced by a user's range selection into a dataframe suited for visualization

Parameters

fig_class_label (str) – class label name
relayout_data (Dict) – data containing selection range indices returned from plotly graph
df_feature (DataFrame) – dataframe tap-out from interpreters pipeline
df_viz_specific (DataFrame) – dataframe prefiltered with the right class label and model

Returns

dataframe fit for the responsive table-graph filtering

Return type

DataFrame

rarity.features.feat_miss_predictions.convert_relayout_data_to_df_reg(relayout_data, df, models)

Convert the raw relayout data produced by a user's range selection into a dataframe suited for visualization

Parameters

relayout_data (Dict) – dictionary-like data containing selection range indices returned from plotly graph
df (DataFrame) – dataframe tap-out from interpreters pipeline
models (List[str]) – model names defined by user during spin-up of Tenjin app

Returns

dataframe fit for the responsive table-graph filtering

Return type

DataFrame
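
The conversion described above can be pictured with a small sketch. When a user zooms on a Dash graph, the `relayoutData` property typically carries keys such as `xaxis.range[0]` and `yaxis.range[1]`; the snippet filters plain records against that window. The helper `filter_by_relayout`, the record keys `index` and `offset`, and the use of a list of dicts instead of a DataFrame are all assumptions made to keep the example self-contained.

```python
from typing import Any, Dict, List


def filter_by_relayout(records: List[Dict[str, Any]],
                       relayout_data: Dict[str, Any],
                       x_key: str = "index",
                       y_key: str = "offset") -> List[Dict[str, Any]]:
    """Keep records inside the zoomed x/y window; keep everything if no range is set."""
    x_lo = relayout_data.get("xaxis.range[0]", float("-inf"))
    x_hi = relayout_data.get("xaxis.range[1]", float("inf"))
    y_lo = relayout_data.get("yaxis.range[0]", float("-inf"))
    y_hi = relayout_data.get("yaxis.range[1]", float("inf"))
    return [r for r in records
            if x_lo <= r[x_key] <= x_hi and y_lo <= r[y_key] <= y_hi]


# Hypothetical records and zoom window, for illustration only.
records = [
    {"index": 0, "offset": 0.5},
    {"index": 1, "offset": -2.0},
    {"index": 2, "offset": 1.5},
    {"index": 3, "offset": 4.0},
]
relayout_data = {"xaxis.range[0]": 0.5, "xaxis.range[1]": 3.5,
                 "yaxis.range[0]": -3.0, "yaxis.range[1]": 2.0}
selected = filter_by_relayout(records, relayout_data)
# A reset/autoscale event carries no range keys, so nothing is filtered out.
reset = filter_by_relayout(records, {"xaxis.autorange": True})
```

The filtered records are what a responsive table next to the graph would then display.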

rarity.features.feat_miss_predictions.fig_plot_prediction_offset_overview(data_loader: Union[rarity.data_loader.data_loader.CSVDataLoader, rarity.data_loader.data_loader.DataframeLoader])

For use in regression task only. Display scatter plot for overview on prediction offset values

Parameters: data_loader (CSVDataLoader or DataframeLoader) – Class object from data_loader module
Returns: figure displaying scatter plot outlining overview on prediction offset values by index
Return type: Figure

rarity.features.feat_miss_predictions.fig_probabilities_spread_pattern(data_loader: Union[rarity.data_loader.data_loader.CSVDataLoader, rarity.data_loader.data_loader.DataframeLoader])

For use in classification task only. Function to output collated info packs used to display final graph objects and data tables

Parameters

data_loader (CSVDataLoader or DataframeLoader) – Class object from data_loader module

Returns

Compact outputs consisting of the following:

fig_objs_all_models (List[~plotly.graph_objects.Figure]): figures displaying scatter plots that compare probabilities of correct data points vs miss-predicted data points for each class label
tables_all_models (List[~dash_table.DataTable]): table objects outlining simple stats on ss, % correct, % wrong and accuracy for each class label
ls_dfs_viz (List[~pandas.DataFrame]): dataframes for overview visualization, with true labels and predicted labels included
df_features (DataFrame): dataframe storing all features used in dataset
ls_class_labels (List[str]): list of class labels found in dataset

rarity.features.feat_miss_predictions.table_with_relayout_datapoints(data, customized_cols, header, exp_format)

Create table outlining dataframe content

Parameters

data (DataTable) – dictionary-like data storing the dataframe content under the ‘record’ key
customized_cols (List[str]) – list of customized column names
header (Dict) – dictionary format storing the style info for table header
exp_format (str) – text info indicating the export format

Returns

table object outlining the dataframe content with specific styles

Return type

DataTable

Feat - Loss Clusters

class rarity.features.LossClusters(data_loader: Union[rarity.data_loader.data_loader.CSVDataLoader, rarity.data_loader.data_loader.DataframeLoader])

Main integration for feature component on Loss Clusters.

Parameters: data_loader (CSVDataLoader or DataframeLoader) – Class object from data_loader module

Important Attributes:

analysis_type (str):
Analysis type defined by user during initial inputs preparation via data_loader stage.

model_names (List[str]):
model names defined by user during initial inputs preparation via data_loader stage.

is_bimodal (bool):
to indicate if analysis involves 2 models

num_clusters (int):
Number of clusters to form

log_funct (math.log):
Mathematical logarithm function used to calculate the log-loss between yTrue and yPred

specific_dataset (str):
Defaults to ‘All’, which includes all miss-predicted labels. The other available options depend on the class labels in the dataset

Returns: styled dash components displaying graph and/or table objects
Return type: Container

rarity.features.feat_loss_clusters.convert_cluster_relayout_data_to_df_cls(relayout_data: Dict, dfs_viz: List[pandas.core.frame.DataFrame], df_features: pandas.core.frame.DataFrame, models: List[str])

For use in classification task only. Convert the raw relayout data produced by a user's range selection into dataframes suited for visualization

Parameters

relayout_data (Dict) – dictionary-like data containing selection range indices returned from plotly graph
dfs_viz (List[~pd.DataFrame]) – list of dataframes for overview visualization need with offset values included
df_features (DataFrame) – dataframe storing all features used in dataset
models (List[str]) – model names defined by user during spin-up of Tenjin app

Returns

Compact outputs consisting of the following:

df_final_features (DataFrame): dataframe storing all features based on slicing info from relayout_data
df_final_probs (DataFrame): dataframe storing probability values by class label corresponding to the slicing relayout_data

rarity.features.feat_loss_clusters.convert_cluster_relayout_data_to_df_reg(relayout_data: Dict, df: pandas.core.frame.DataFrame, models: List[str])

For use in regression task only. Convert the raw relayout data produced by a user's range selection into a dataframe suited for visualization

Parameters

relayout_data (Dict) – dictionary-like data containing selection range indices returned from plotly graph
df (DataFrame) – dataframe tap-out from interpreters pipeline
models (List[str]) – model names defined by user during spin-up of Tenjin app

Returns

dataframe fit for the responsive table-graph filtering

Return type

DataFrame

rarity.features.feat_loss_clusters.fig_plot_logloss_clusters_cls(data_loader: Union[rarity.data_loader.data_loader.CSVDataLoader, rarity.data_loader.data_loader.DataframeLoader], num_cluster: int, log_func: math.log = <built-in function log>, specific_dataset: str = 'All')[source]¶

For use in classification task only. Function to output collated info packs used to display final graph objects by cluster groups along with calculated silhouette scores

Parameters

data_loader (CSVDataLoader or DataframeLoader) – Class object from data_loader module
num_cluster (int) – Number of clusters to form
log_func (math.log) – Mathematical logarithm function used to calculate the log-loss between yTrue and yPred
specific_dataset (str) – Defaults to ‘All’, which includes all miss-predicted labels. The other available options depend on the class labels in the dataset

Returns

Compact outputs consisting of the following:

ls_dfs_viz (List[~pd.DataFrame]): dataframes for overview visualization need with offset values included
fig_obj_cluster (Figure): figure displaying violin plot outlining cluster groups by offset values
ls_cluster_score (List[str]): list of silhouette scores, indication of clustering quality
fig_obj_elbow (Figure): figure displaying line plot outlining the change in sum of squared distances along the cluster range
ls_class_labels (List[str]): list of all class labels
ls_class_labels_misspred (List[str]): list of class labels with minimum of 1 miss-prediction
df_features (DataFrame): dataframe storing all features used in dataset

rarity.features.feat_loss_clusters.fig_plot_offset_clusters_reg(data_loader: Union[rarity.data_loader.data_loader.CSVDataLoader, rarity.data_loader.data_loader.DataframeLoader], num_cluster: int)

For use in regression task only. Function to output collated info packs used to display final graph objects by cluster groups along with calculated silhouette scores

Parameters

data_loader (CSVDataLoader or DataframeLoader) – Class object from data_loader module
num_cluster (int) – Number of clusters to form

Returns

Compact outputs consisting of the following:

df (DataFrame): dataframe for overview visualization, with offset values included
fig_obj_cluster (Figure): figure displaying violin plot outlining cluster groups by offset values
ls_cluster_score (List[str]): list of silhouette scores, indication of clustering quality
fig_obj_elbow (Figure): figure displaying line plot outlining the change in sum of squared distances along the cluster range
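
The elbow figure plots the sum of squared distances against the number of clusters. The sketch below shows how that quantity behaves, using a tiny one-dimensional k-means written in plain Python. It is a stand-in for illustration only: rarity may well rely on a library clustering implementation, and the helper `kmeans_1d_inertia`, its deterministic initialisation and the sample offsets are all assumptions.

```python
from typing import List


def kmeans_1d_inertia(values: List[float], k: int, n_iter: int = 20) -> float:
    """Run a tiny 1-D k-means and return the sum of squared distances (inertia)."""
    data = sorted(values)
    # Deterministic initialisation: pick k evenly spaced points from the sorted data.
    centers = [data[(2 * i + 1) * len(data) // (2 * k)] for i in range(k)]
    for _ in range(n_iter):
        groups = [[] for _ in range(k)]
        for v in data:
            nearest = min(range(k), key=lambda c: (v - centers[c]) ** 2)
            groups[nearest].append(v)
        # Keep the old centre if a group ends up empty.
        centers = [sum(g) / len(g) if g else centers[i] for i, g in enumerate(groups)]
    return sum(min((v - c) ** 2 for c in centers) for v in data)


# Hypothetical offsets forming three obvious groups, for illustration only.
offsets = [-5.1, -4.9, -5.0, 0.1, -0.1, 0.0, 4.9, 5.0, 5.1]
inertia_by_k = {k: kmeans_1d_inertia(offsets, k) for k in range(1, 5)}
```

In this toy data the inertia drops sharply up to three clusters and barely improves afterwards; that bend is the "elbow" one would look for in the line plot.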

rarity.features.feat_loss_clusters.table_with_relayout_datapoints(data: dash_table.DataTable.DataTable, customized_cols: List[str], header: Dict, exp_format: str)

Create table outlining dataframe content

Parameters

data (DataTable) – dictionary-like data storing the dataframe content under the ‘record’ key
customized_cols (List[str]) – list of customized column names
header (Dict) – dictionary format storing the style info for table header
exp_format (str) – text info indicating the export format

Returns

table object outlining the dataframe content with specific styles

Return type

DataTable

Feat - xFeature Distribution

class rarity.features.FeatureDistribution(data_loader: Union[rarity.data_loader.data_loader.CSVDataLoader, rarity.data_loader.data_loader.DataframeLoader])

Main integration for feature component on Distribution

Parameters: data_loader (CSVDataLoader or DataframeLoader) – Class object from data_loader module

Important Attributes:

analysis_type (str):
Analysis type defined by user during initial inputs preparation via data_loader stage.

model_names (List[str]):
model names defined by user during initial inputs preparation via data_loader stage.

is_bimodal (bool):
to indicate if analysis involves 2 models

feature_to_exclude (List of str, optional):
A list of features to be excluded from the kl-div calculation and visualization

df_features (DataFrame):
Dataframe storing all features used in dataset

specific_feature (List of str):
A list of features to be displayed along with the corresponding kl-div score

display_option (str):

Indicates whether to display distribution plots for the top-N features, the bottom-N features, or both

Available options: ‘top’, ‘bottom’ or ‘both’

display_value (int):

Maximum number of graphs to display, capped at 10

If the dataset has fewer than 10 features, the limit equals the number of features in the dataset

Returns: styled dash components displaying graph and/or table objects
Return type: Container

rarity.features.feat_feature_distribution.fig_plot_distribution_by_kl_div_ranking(data_loader: Union[rarity.data_loader.data_loader.CSVDataLoader, rarity.data_loader.data_loader.DataframeLoader], feature_to_exclude: List[str], start_idx: int, stop_idx: int, display_option: str, display_value: int)

Integration of kl-divergence scores to corresponding fig-object

Parameters

data_loader (CSVDataLoader or DataframeLoader) – Class object from data_loader module
feature_to_exclude (List of str, optional) – A list of features to be excluded from the kl-div calculation and visualization
start_idx (int, optional) – Integer number indicating the start index position to slice dataframe
stop_idx (int, optional) – Integer number indicating the stop index position to slice dataframe
display_option (str) –
- Indicates whether to display distribution plots for the top-N features, the bottom-N features, or both
- Available options: ‘top’, ‘bottom’ or ‘both’
display_value (int) –
- Maximum number of graphs to display, capped at 10
- If the dataset has fewer than 10 features, the limit equals the number of features in the dataset

Returns

dictionary storing distribution figures by display_option

Return type

Dict[~plotly.graph_objects.Figure]

Note

if classification, returns:

List[Dict[~plotly.graph_objects.Figure]]: list of dictionary storing distribution figures by display_option
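
The ranking idea can be sketched as follows: compute a KL-divergence score per feature from two histograms, then keep the top-N, bottom-N or both. Which two distributions rarity compares is not stated here, so the histograms `hist_a` and `hist_b` are arbitrary; the helpers `kl_divergence` and `rank_features` and the smoothing constant are likewise assumptions for illustration.

```python
import math
from typing import Dict, List


def kl_divergence(p: List[float], q: List[float], eps: float = 1e-10) -> float:
    """KL(P || Q) for two discrete distributions given as (unnormalised) histograms."""
    p_sum, q_sum = sum(p), sum(q)
    total = 0.0
    for pi, qi in zip(p, q):
        pi, qi = pi / p_sum + eps, qi / q_sum + eps  # smooth to avoid log(0)
        total += pi * math.log(pi / qi)
    return total


def rank_features(scores: Dict[str, float], display_option: str,
                  display_value: int) -> Dict[str, List[str]]:
    """Return feature names for the requested option, highest score first."""
    if display_option not in ("top", "bottom", "both"):
        raise ValueError("display_option must be 'top', 'bottom' or 'both'")
    ordered = sorted(scores, key=scores.get, reverse=True)
    # Cap at 10 and at the number of available features, with a floor of 1.
    n = max(1, min(display_value, 10, len(ordered)))
    ranked = {}
    if display_option in ("top", "both"):
        ranked["top"] = ordered[:n]
    if display_option in ("bottom", "both"):
        ranked["bottom"] = ordered[-n:]
    return ranked


# Hypothetical per-feature histograms, for illustration only.
hist_a = {"age": [5, 5, 5, 5], "income": [1, 2, 3, 4], "height": [4, 3, 2, 1]}
hist_b = {"age": [5, 5, 5, 5], "income": [4, 3, 2, 1], "height": [4, 3, 2, 2]}
scores = {name: kl_divergence(hist_a[name], hist_b[name]) for name in hist_a}
ranked = rank_features(scores, display_option="both", display_value=2)
```

Returning a dictionary keyed by option mirrors the description of the return value above, where figures are stored by display_option.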

rarity.features.feat_feature_distribution.fig_plot_distribution_by_specific_feature(data_loader: Union[rarity.data_loader.data_loader.CSVDataLoader, rarity.data_loader.data_loader.DataframeLoader], ls_specific_feature, start_idx, stop_idx)

Integration of kl-divergence scores to specific fig-object

Parameters

data_loader (CSVDataLoader or DataframeLoader) – Class object from data_loader module
ls_specific_feature (List of str) – A list of features to be displayed along with the corresponding kl-div score
start_idx (int, optional) – Integer number indicating the start index position to slice dataframe
stop_idx (int, optional) – Integer number indicating the stop index position to slice dataframe

Returns

list of figure objects displaying the distribution plot based on kl-divergence score

Return type

List[~plotly.graph_objects.Figure]

Feat - Similarities (+CounterFactuals)

class rarity.features.SimilaritiesCF(data_loader: Union[rarity.data_loader.data_loader.CSVDataLoader, rarity.data_loader.data_loader.DataframeLoader])

Main integration for feature component on Similarities-CounterFactuals

Parameters: data_loader (CSVDataLoader or DataframeLoader) – Class object from data_loader module

Important Attributes:

analysis_type (str):
Analysis type defined by user during initial inputs preparation via data_loader stage.

df_features (DataFrame):
Dataframe storing all features used in dataset

feature_to_exclude (List[str], optional):
A list of features to be excluded from the ranking and similarities distance calculation

user_defined_idx (int):
Index of the data point of interest specified by user

top_n (int):
Number indicating the max limit of records to be displayed based on the distance ranking

Returns: styled dash components displaying graph and/or table objects
Return type: Container

rarity.features.feat_similarities_counter_factuals.generate_counterfactuals(data_loader: Union[rarity.data_loader.data_loader.CSVDataLoader, rarity.data_loader.data_loader.DataframeLoader], user_defined_idx, feature_to_exclude=None, top_n=3)

Output a table collating feature info for the user-defined index and the top N indices ranked by distance score, on condition that the prediction labels of the top N indices differ from the prediction label of the user-defined index. Applicable to classification only.

Parameters

data_loader (CSVDataLoader or DataframeLoader) – Class object from data_loader module
user_defined_idx (int) – Index of the data point of interest specified by user
feature_to_exclude (List[str], optional) – A list of features to be excluded from the ranking and similarities distance calculation
top_n (int) – Number indicating the max limit of records to be displayed based on the distance ranking

Returns

table object outlining the dataframe content with dynamic-conditional styles

Return type

DataTable

rarity.features.feat_similarities_counter_factuals.generate_similarities(data_loader: Union[rarity.data_loader.data_loader.CSVDataLoader, rarity.data_loader.data_loader.DataframeLoader], user_defined_idx, feature_to_exclude=None, top_n=3)

Output a table collating feature info for the user-defined index and the top N indices ranked by distance score. Applicable to both regression and classification.

Parameters

data_loader (CSVDataLoader or DataframeLoader) – Class object from data_loader module
user_defined_idx (int) – Index of the data point of interest specified by user
feature_to_exclude (List[str], optional) – A list of features to be excluded from the ranking and similarities distance calculation
top_n (int) – Number indicating the max limit of records to be displayed based on the distance ranking

Returns

table object outlining the dataframe content with dynamic-conditional styles

Return type

DataTable
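
The difference between the similarities and counterfactuals lookups can be sketched in a few lines of plain Python. The distance metric is not specified above, so Euclidean distance is assumed here; the helper `nearest_indices`, the feature rows and the labels are hypothetical and only serve the illustration.

```python
import math
from typing import List, Optional, Sequence


def nearest_indices(features: Sequence[Sequence[float]],
                    user_defined_idx: int,
                    top_n: int = 3,
                    pred_labels: Optional[Sequence[str]] = None) -> List[int]:
    """Rank other rows by Euclidean distance to the chosen row.

    If pred_labels is given, keep only rows whose predicted label differs
    from that of the chosen row (a counterfactual-style filter).
    """
    target = features[user_defined_idx]
    candidates = []
    for idx, row in enumerate(features):
        if idx == user_defined_idx:
            continue
        if pred_labels is not None and pred_labels[idx] == pred_labels[user_defined_idx]:
            continue
        candidates.append((math.dist(target, row), idx))
    return [idx for _, idx in sorted(candidates)[:top_n]]


# Hypothetical feature rows and predicted labels, for illustration only.
features = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.0, 0.3], [5.0, 5.0]]
pred_labels = ["yes", "yes", "no", "no", "yes"]
similar = nearest_indices(features, user_defined_idx=0, top_n=2)
counterfactual = nearest_indices(features, user_defined_idx=0, top_n=2,
                                 pred_labels=pred_labels)
```

The similarities lookup returns the closest rows regardless of label, while the counterfactual lookup returns the closest rows that received a different prediction, which is why it only makes sense for classification.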