Features¶
Modules under Features act as the core integrators: they take inputs from the interpreters and output interactive graphs via the visualizers. The major styled
components built with Dash are defined at this stage and customized in the respective feature modules according to the task each one serves. All responsive
parameters and callback management are handled at this stage as well.
Feat - General Metrics¶
- class rarity.features.GeneralMetrics(data_loader: Union[rarity.data_loader.data_loader.CSVDataLoader, rarity.data_loader.data_loader.DataframeLoader])[source]¶
Main integration for feature component on General Metrics.
On Regression: Prediction vs Actual, Prediction vs Offset
On Classification: Confusion Matrix, Classification Report, ROC, Precision-Recall
- Parameters
data_loader (CSVDataLoader or DataframeLoader) – Class object from data_loader module
- Returns
styled dash components displaying graph and/or table objects
- Return type
Container
- rarity.features.feat_general_metrics.fig_classification_report(data_loader: Union[rarity.data_loader.data_loader.CSVDataLoader, rarity.data_loader.data_loader.DataframeLoader])[source]¶
Create classification report in table form
- Parameters
data_loader (CSVDataLoader or DataframeLoader) – Class object from data_loader module
- Returns
list of tables displaying classification report details
- Return type
List[~plotly.graph_objects.Figure]
- rarity.features.feat_general_metrics.fig_confusion_matrix(data_loader: Union[rarity.data_loader.data_loader.CSVDataLoader, rarity.data_loader.data_loader.DataframeLoader])[source]¶
Create confusion matrix
- Parameters
data_loader (CSVDataLoader or DataframeLoader) – Class object from data_loader module
- Returns
figure displaying confusion matrix details
- Return type
Figure
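As a rough illustration of the data behind a confusion-matrix figure, the sketch below tallies (actual, predicted) label pairs in plain Python. The helper name `confusion_counts` and the sample labels are hypothetical and not part of rarity; the library's own implementation may differ.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def confusion_counts(y_true: Sequence[str], y_pred: Sequence[str]) -> Tuple[List[str], List[List[int]]]:
    """Return sorted labels and a matrix with actual labels as rows and predicted labels as columns."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    labels = sorted(set(y_true) | set(y_pred))
    # Count how often each (actual, predicted) pair occurs; missing pairs count as 0
    pair_counts = Counter(zip(y_true, y_pred))
    matrix = [[pair_counts[(actual, predicted)] for predicted in labels] for actual in labels]
    return labels, matrix


# Hypothetical labels, for illustration only
y_true = ["cat", "dog", "dog", "cat", "bird"]
y_pred = ["cat", "dog", "cat", "cat", "bird"]
labels, matrix = confusion_counts(y_true, y_pred)
```

The diagonal of `matrix` holds the correct predictions, and each off-diagonal cell holds one kind of miss-prediction.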
- rarity.features.feat_general_metrics.fig_precisionRecall_curve(data_loader: Union[rarity.data_loader.data_loader.CSVDataLoader, rarity.data_loader.data_loader.DataframeLoader])[source]¶
Display precision-recall curve for comparison on various models
- Parameters
data_loader (CSVDataLoader or DataframeLoader) – Class object from data_loader module
- Returns
figure displaying line curves comparing precision-recall for various models
- Return type
Figure
- rarity.features.feat_general_metrics.fig_prediction_actual_comparison(data_loader: Union[rarity.data_loader.data_loader.CSVDataLoader, rarity.data_loader.data_loader.DataframeLoader])[source]¶
Display scatter plot for comparison on actual values (yTrue) vs prediction values (yPred)
- Parameters
data_loader (CSVDataLoader or DataframeLoader) – Class object from data_loader module
- Returns
figure displaying scatter plot comparing actual values vs prediction values
- Return type
Figure
- rarity.features.feat_general_metrics.fig_prediction_offset_overview(data_loader: Union[rarity.data_loader.data_loader.CSVDataLoader, rarity.data_loader.data_loader.DataframeLoader])[source]¶
Display scatter plot for overview on prediction offset values
- Parameters
data_loader (CSVDataLoader or DataframeLoader) – Class object from data_loader module
- Returns
figure displaying scatter plot outlining overview on prediction offset values
- Return type
Figure
- rarity.features.feat_general_metrics.fig_roc_curve(data_loader: Union[rarity.data_loader.data_loader.CSVDataLoader, rarity.data_loader.data_loader.DataframeLoader])[source]¶
Display ROC curve for comparison on various models
- Parameters
data_loader (CSVDataLoader or DataframeLoader) – Class object from data_loader module
- Returns
figure displaying line curves comparing ROC-AUC score for various models
- Return type
Figure
- rarity.features.feat_general_metrics.fig_standard_error_metrics(data_loader: Union[rarity.data_loader.data_loader.CSVDataLoader, rarity.data_loader.data_loader.DataframeLoader])[source]¶
Display table comparing various standard metrics for regression task
- Parameters
data_loader (CSVDataLoader or DataframeLoader) – Class object from data_loader module
- Returns
table object comparing various standard metrics for regression task
- Return type
DataTable
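The kind of comparison table described above could be fed by metrics such as the ones in this minimal sketch. Which metrics rarity actually reports is not stated here, so the choice of MAE, MSE, RMSE and R2, the function name `standard_error_metrics` and the sample values are all assumptions.

```python
import math
from typing import Dict, Sequence


def standard_error_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    """Compute common regression error metrics for one model (assumed metric set)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    n = len(y_true)
    # Offset is taken here as prediction minus actual (an assumption about sign convention)
    offsets = [pred - true for true, pred in zip(y_true, y_pred)]
    ss_res = sum(o * o for o in offsets)
    mean_true = sum(y_true) / n
    ss_tot = sum((true - mean_true) ** 2 for true in y_true)
    return {
        "MAE": sum(abs(o) for o in offsets) / n,
        "MSE": ss_res / n,
        "RMSE": math.sqrt(ss_res / n),
        # R2 is undefined when all actual values are identical
        "R2": 1 - ss_res / ss_tot if ss_tot else float("nan"),
    }


# Hypothetical values, for illustration only
metrics = standard_error_metrics([3.0, 5.0, 7.0], [2.0, 5.0, 9.0])
```

Running the function once per model name would give one row (or column) per model for a side-by-side table.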
Feat - Miss Predictions¶
- class rarity.features.MissPredictions(data_loader: Union[rarity.data_loader.data_loader.CSVDataLoader, rarity.data_loader.data_loader.DataframeLoader])[source]¶
Main integration for feature component on Miss Prediction.
On Regression: To generate single miss-prediction scatter plot by data index points
On Classification: To generate scatter plots for probabilities comparison on correct data point vs miss-predicted data point for each class label
- Parameters
data_loader (CSVDataLoader or DataframeLoader) – Class object from data_loader module
Important Attributes:
- analysis_type (str):
Analysis type defined by user during initial inputs preparation via data_loader stage.
- model_names (List[str]):
Model names defined by user during initial inputs preparation via data_loader stage.
- is_bimodal (bool):
to indicate if analysis involves 2 models
- Returns
styled dash components displaying graph and/or table objects
- Return type
Container
- rarity.features.feat_miss_predictions.convert_relayout_data_to_df_cls(fig_class_label, relayout_data, df_feature, df_viz_specific)[source]¶
Convert the raw relayout data from the user's selection range into a dataframe fit for visualization
- Parameters
fig_class_label (str) – class label name
relayout_data (Dict) – data containing selection range indices returned from plotly graph
df_feature (DataFrame) – dataframe tap-out from interpreters pipeline
df_viz_specific (DataFrame) – dataframe prefiltered with right class label and model
- Returns
dataframe fit for the responsive table-graph filtering
- Return type
DataFrame
- rarity.features.feat_miss_predictions.convert_relayout_data_to_df_reg(relayout_data, df, models)[source]¶
Convert the raw relayout data from the user's selection range into a dataframe fit for visualization
- Parameters
relayout_data (Dict) – dictionary-like data containing selection range indices returned from plotly graph
df (DataFrame) – dataframe tap-out from interpreters pipeline
models (List[str]) – model names defined by user during spin-up of Tenjin app
- Returns
dataframe fit for the responsive table-graph filtering
- Return type
DataFrame
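To make the relayout conversion more concrete, here is a minimal sketch of range filtering. It relies on the `xaxis.range[0]`-style keys that a Plotly graph emits in its relayout data after a zoom; the function name `filter_rows_by_relayout`, the list-of-dicts stand-in for a dataframe, and the `index` and `offset` columns are hypothetical, and rarity's own conversion logic may differ.

```python
from typing import Any, Dict, List, Optional


def filter_rows_by_relayout(
    relayout_data: Optional[Dict[str, Any]],
    rows: List[Dict[str, float]],
    x_key: str,
    y_key: str,
) -> List[Dict[str, float]]:
    """Keep rows inside the zoomed x/y range; keep all rows when no range is present."""
    relayout_data = relayout_data or {}
    # Missing bounds (e.g. after an autoscale event) fall back to an unbounded range
    x_lo = relayout_data.get("xaxis.range[0]", float("-inf"))
    x_hi = relayout_data.get("xaxis.range[1]", float("inf"))
    y_lo = relayout_data.get("yaxis.range[0]", float("-inf"))
    y_hi = relayout_data.get("yaxis.range[1]", float("inf"))
    return [
        row for row in rows
        if x_lo <= row[x_key] <= x_hi and y_lo <= row[y_key] <= y_hi
    ]


# Hypothetical data points: data index on x, prediction offset on y
rows = [
    {"index": 0, "offset": -1.5},
    {"index": 1, "offset": 0.2},
    {"index": 2, "offset": 3.0},
    {"index": 3, "offset": 0.9},
]
zoomed = {"xaxis.range[0]": 0.5, "xaxis.range[1]": 3.5, "yaxis.range[0]": 0.0, "yaxis.range[1]": 1.0}
selected = filter_rows_by_relayout(zoomed, rows, "index", "offset")
unfiltered = filter_rows_by_relayout({"xaxis.autorange": True}, rows, "index", "offset")
```

The filtered rows are what a responsive table would then display alongside the graph.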
- rarity.features.feat_miss_predictions.fig_plot_prediction_offset_overview(data_loader: Union[rarity.data_loader.data_loader.CSVDataLoader, rarity.data_loader.data_loader.DataframeLoader])[source]¶
For use in regression task only. Display scatter plot for overview on prediction offset values
- Parameters
data_loader (CSVDataLoader or DataframeLoader) – Class object from data_loader module
- Returns
figure displaying scatter plot outlining overview on prediction offset values by index
- Return type
Figure
- rarity.features.feat_miss_predictions.fig_probabilities_spread_pattern(data_loader: Union[rarity.data_loader.data_loader.CSVDataLoader, rarity.data_loader.data_loader.DataframeLoader])[source]¶
For use in classification task only. Function to output collated info packs used to display final graph objects and data tables
- Parameters
data_loader (CSVDataLoader or DataframeLoader) – Class object from data_loader module
- Returns
Compact outputs consisting of the following:
fig_objs_all_models (List[~plotly.graph_objects.Figure]): figures displaying scatter plots outlining probabilities comparison on correct data point vs miss-predicted data point for each class label
tables_all_models (List[~dash_table.DataTable]): table objects outlining simple stats on ss, % correct, % wrong, accuracy for each class label
ls_dfs_viz (List[~pandas.DataFrame]): dataframes needed for the overview visualization, with true labels and predicted labels included
df_features (DataFrame): dataframe storing all features used in dataset
ls_class_labels (List[str]): list of class labels found in dataset
- rarity.features.feat_miss_predictions.table_with_relayout_datapoints(data, customized_cols, header, exp_format)[source]¶
Create table outlining dataframe content
- Parameters
data (DataTable) – dictionary-like format storing dataframe info under ‘record’ key
customized_cols (List[str]) – list of customized column names
header (Dict) – dictionary format storing the style info for table header
exp_format (str) – text info indicating the export format
- Returns
table object outlining the dataframe content with specific styles
- Return type
DataTable
Feat - Loss Clusters¶
- class rarity.features.LossClusters(data_loader: Union[rarity.data_loader.data_loader.CSVDataLoader, rarity.data_loader.data_loader.DataframeLoader])[source]¶
Main integration for feature component on Loss Clusters.
- Parameters
data_loader (CSVDataLoader or DataframeLoader) – Class object from data_loader module
Important Attributes:
- analysis_type (str):
Analysis type defined by user during initial inputs preparation via data_loader stage.
- model_names (List[str]):
Model names defined by user during initial inputs preparation via data_loader stage.
- is_bimodal (bool):
to indicate if analysis involves 2 models
- num_clusters (int):
Number of clusters to form
- log_funct (math.log):
Mathematical logarithm function used to calculate log-loss between yTrue and yPred
- specific_dataset (str):
Defaults to ‘All’, which includes all miss-predicted labels. Other options expand depending on the class labels
- Returns
styled dash components displaying graph and/or table objects
- Return type
Container
- rarity.features.feat_loss_clusters.convert_cluster_relayout_data_to_df_cls(relayout_data: Dict, dfs_viz: List[pandas.core.frame.DataFrame], df_features: pandas.core.frame.DataFrame, models: List[str])[source]¶
For use in classification task only. Convert the raw relayout data from the user's selection range into dataframes fit for visualization
- Parameters
relayout_data (Dict) – dictionary-like data containing selection range indices returned from plotly graph
dfs_viz (List[~pd.DataFrame]) – list of dataframes needed for the overview visualization, with offset values included
df_features (DataFrame) – dataframe storing all features used in dataset
models (List[str]) – model names defined by user during spin-up of Tenjin app
- Returns
Compact outputs consisting of the following:
df_final_features (DataFrame): dataframe storing all features based on slicing info from relayout_data
df_final_probs (DataFrame): dataframe storing probability values by class label corresponding to the slicing relayout_data
- rarity.features.feat_loss_clusters.convert_cluster_relayout_data_to_df_reg(relayout_data: Dict, df: pandas.core.frame.DataFrame, models: List[str])[source]¶
For use in regression task only. Convert the raw relayout data from the user's selection range into a dataframe fit for visualization
- Parameters
relayout_data (Dict) – dictionary-like data containing selection range indices returned from plotly graph
df (DataFrame) – dataframe tap-out from interpreters pipeline
models (List[str]) – model names defined by user during spin-up of Tenjin app
- Returns
dataframe fit for the responsive table-graph filtering
- Return type
DataFrame
- rarity.features.feat_loss_clusters.fig_plot_logloss_clusters_cls(data_loader: Union[rarity.data_loader.data_loader.CSVDataLoader, rarity.data_loader.data_loader.DataframeLoader], num_cluster: int, log_func: math.log = <built-in function log>, specific_dataset: str = 'All')[source]¶
For use in classification task only. Function to output collated info packs used to display final graph objects by cluster groups along with calculated silhouette scores
- Parameters
data_loader (CSVDataLoader or DataframeLoader) – Class object from data_loader module
num_cluster (int) – Number of clusters to form
log_func (math.log) – Mathematical logarithm function used to calculate log-loss between yTrue and yPred
specific_dataset (str) – Defaults to ‘All’, which includes all miss-predicted labels. Other options expand depending on the class labels
- Returns
Compact outputs consisting of the following:
ls_dfs_viz (List[~pd.DataFrame]): dataframes needed for the overview visualization, with offset values included
fig_obj_cluster (Figure): figure displaying violin plot outlining cluster groups by offset values
ls_cluster_score (List[str]): list of silhouette scores, indication of clustering quality
fig_obj_elbow (Figure): figure displaying line plot outlining the change in sum of squared distances along the cluster range
ls_class_labels (List[str]): list of all class labels
ls_class_labels_misspred (List[str]): list of class labels with minimum of 1 miss-prediction
df_features (DataFrame): dataframe storing all features used in dataset
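The log-loss between yTrue and yPred mentioned above can be sketched per data point as the negative log of the probability assigned to the true class. This is only an assumed formulation: the function name `per_sample_log_loss`, the probability clipping with `eps`, and the sample data are hypothetical and not confirmed as rarity's approach.

```python
import math
from typing import Callable, Dict, List, Sequence


def per_sample_log_loss(
    y_true: Sequence[str],
    probabilities: Sequence[Dict[str, float]],
    log_func: Callable[[float], float] = math.log,
    eps: float = 1e-15,
) -> List[float]:
    """Return -log(p_true_class) for every data point."""
    losses = []
    for label, probs in zip(y_true, probabilities):
        # Clip to avoid log(0) when a model assigns zero probability to the true class
        p = min(max(probs[label], eps), 1 - eps)
        losses.append(-log_func(p))
    return losses


# Hypothetical predicted probabilities for a two-class problem
y_true = ["a", "b", "a"]
probabilities = [
    {"a": 0.9, "b": 0.1},  # confident and correct: small loss
    {"a": 0.8, "b": 0.2},  # confident and wrong: large loss
    {"a": 0.5, "b": 0.5},  # undecided: medium loss
]
losses = per_sample_log_loss(y_true, probabilities)
```

Values like these, one per data point, are the kind of one-dimensional quantity that can then be grouped into clusters.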
- rarity.features.feat_loss_clusters.fig_plot_offset_clusters_reg(data_loader: Union[rarity.data_loader.data_loader.CSVDataLoader, rarity.data_loader.data_loader.DataframeLoader], num_cluster: int)[source]¶
For use in regression task only. Function to output collated info packs used to display final graph objects by cluster groups along with calculated silhouette scores
- Parameters
data_loader (CSVDataLoader or DataframeLoader) – Class object from data_loader module
num_cluster (int) – Number of clusters to form
- Returns
Compact outputs consisting of the following:
df (DataFrame): dataframe needed for the overview visualization, with offset values included
fig_obj_cluster (Figure): figure displaying violin plot outlining cluster groups by offset values
ls_cluster_score (List[str]): list of silhouette scores, indication of clustering quality
fig_obj_elbow (Figure): figure displaying line plot outlining the change in sum of squared distances along the cluster range
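The "sum of squared distances along the cluster range" behind an elbow plot can be illustrated with a small one-dimensional k-means. The clustering algorithm rarity uses is not stated here, so this sketch swaps in a plain Lloyd's algorithm with a deterministic start; `kmeans_1d` and the offset values are hypothetical, and silhouette scores are not computed.

```python
from typing import Dict, List, Sequence, Tuple


def kmeans_1d(values: Sequence[float], k: int, iterations: int = 20) -> Tuple[List[int], float]:
    """Cluster 1-D values with Lloyd's algorithm; return labels and the sum of squared distances."""
    if not 1 <= k <= len(values):
        raise ValueError("k must be between 1 and the number of values")
    ordered = sorted(values)
    # Deterministic start: k evenly spaced points taken from the sorted values
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]

    def assign() -> List[int]:
        # Each value goes to its nearest centroid (ties go to the lower cluster id)
        return [min(range(k), key=lambda c: (v - centroids[c]) ** 2) for v in values]

    for _ in range(iterations):
        labels = assign()
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster ends up empty
                centroids[c] = sum(members) / len(members)
    labels = assign()
    inertia = sum((v - centroids[lab]) ** 2 for v, lab in zip(values, labels))
    return labels, inertia


# Hypothetical prediction offsets forming three visible groups
offsets = [-5.1, -4.9, -5.0, 0.1, -0.1, 0.0, 4.9, 5.0, 5.1]
inertia_by_k: Dict[int, float] = {k: kmeans_1d(offsets, k)[1] for k in (1, 2, 3)}
labels_3, _ = kmeans_1d(offsets, 3)
```

Plotting `inertia_by_k` against `k` gives the elbow curve: the drop flattens once `k` reaches the number of natural groups in the offsets.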
- rarity.features.feat_loss_clusters.table_with_relayout_datapoints(data: dash_table.DataTable.DataTable, customized_cols: List[str], header: Dict, exp_format: str)[source]¶
Create table outlining dataframe content
- Parameters
data (DataTable) – dictionary-like format storing dataframe info under ‘record’ key
customized_cols (List[str]) – list of customized column names
header (Dict) – dictionary format storing the style info for table header
exp_format (str) – text info indicating the export format
- Returns
table object outlining the dataframe content with specific styles
- Return type
DataTable
Feat - xFeature Distribution¶
- class rarity.features.FeatureDistribution(data_loader: Union[rarity.data_loader.data_loader.CSVDataLoader, rarity.data_loader.data_loader.DataframeLoader])[source]¶
Main integration for feature component on Distribution
- Parameters
data_loader (CSVDataLoader or DataframeLoader) – Class object from data_loader module
Important Attributes:
- analysis_type (str):
Analysis type defined by user during initial inputs preparation via data_loader stage.
- model_names (List[str]):
Model names defined by user during initial inputs preparation via data_loader stage.
- is_bimodal (bool):
to indicate if analysis involves 2 models
- feature_to_exclude (List of str, optional):
A list of features to be excluded from the kl-div calculation and visualization
- df_features (DataFrame):
Dataframe storing all features used in dataset
- specific_feature (List of str):
A list of features to be displayed along with the corresponding kl-div score
- display_option (str):
Indicates whether to display distribution plots by top-N, bottom-N, or both top-N and bottom-N.
Available options: ‘top’, ‘bottom’ or ‘both’
- display_value (int):
Maximum number of graphs to display, capped at 10.
If the dataset has fewer than 10 features, the limit equals the number of features in the dataset.
- Returns
styled dash components displaying graph and/or table objects
- Return type
Container
- rarity.features.feat_feature_distribution.fig_plot_distribution_by_kl_div_ranking(data_loader: Union[rarity.data_loader.data_loader.CSVDataLoader, rarity.data_loader.data_loader.DataframeLoader], feature_to_exclude: List[str], start_idx: int, stop_idx: int, display_option: str, display_value: int)[source]¶
Integration of kl-divergence scores to corresponding fig-object
- Parameters
data_loader (CSVDataLoader or DataframeLoader) – Class object from data_loader module
feature_to_exclude (List of str, optional) – A list of features to be excluded from the kl-div calculation and visualization
start_idx (int, optional) – Integer number indicating the start index position to slice dataframe
stop_idx (int, optional) – Integer number indicating the stop index position to slice dataframe
display_option (str) – Indicates whether to display distribution plots by top-N, bottom-N, or both top-N and bottom-N. Available options: ‘top’, ‘bottom’ or ‘both’
display_value (int) – Maximum number of graphs to display, capped at 10. If the dataset has fewer than 10 features, the limit equals the number of features in the dataset.
- Returns
dictionary storing distribution figures by display_option
- Return type
Dict[~plotly.graph_objects.Figure]
Note
For classification, returns:
List[Dict[~plotly.graph_objects.Figure]]: list of dictionaries storing distribution figures by display_option
- rarity.features.feat_feature_distribution.fig_plot_distribution_by_specific_feature(data_loader: Union[rarity.data_loader.data_loader.CSVDataLoader, rarity.data_loader.data_loader.DataframeLoader], ls_specific_feature, start_idx, stop_idx)[source]¶
Integration of kl-divergence scores to specific fig-object
- Parameters
data_loader (CSVDataLoader or DataframeLoader) – Class object from data_loader module
ls_specific_feature (List of str) – A list of features to be displayed along with the corresponding kl-div score
start_idx (int, optional) – Integer number indicating the start index position to slice dataframe
stop_idx (int, optional) – Integer number indicating the stop index position to slice dataframe
- Returns
list of figure objects displaying the distribution plot based on kl-divergence score
- Return type
List[~plotly.graph_objects.Figure]
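For readers unfamiliar with the kl-div score, the sketch below computes the KL divergence between the binned distributions of one feature in two groups of rows, then ranks features by that score. Which two groups rarity compares, how it bins values, and whether it smooths empty bins are not stated here; the function names, bin edges, `eps` smoothing and sample data are all assumptions.

```python
import math
from typing import Dict, List, Sequence, Tuple


def normalized_histogram(values: Sequence[float], edges: Sequence[float], eps: float = 1e-9) -> List[float]:
    """Bin values into [edges[i], edges[i+1]) and return smoothed probabilities that sum to 1."""
    counts = [0.0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            is_last = i == len(counts) - 1
            # The last bin is closed on the right so the maximum edge is included
            if edges[i] <= v < edges[i + 1] or (is_last and v == edges[-1]):
                counts[i] += 1
                break
    smoothed = [c + eps for c in counts]  # avoid zero probabilities
    total = sum(smoothed)
    return [c / total for c in smoothed]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(P || Q) = sum_i p_i * log(p_i / q_i)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


# Hypothetical feature values split into two groups of rows
edges = [0.0, 1.0, 2.0, 3.0]
groups: Dict[str, Tuple[List[float], List[float]]] = {
    "income": ([0.1, 0.5, 1.2, 2.8], [0.2, 1.1, 1.5, 1.9, 2.5]),
    "age": ([0.5, 1.5, 2.5], [0.5, 1.5, 2.5]),  # identical groups: score 0
}
scores = {
    name: kl_divergence(normalized_histogram(a, edges), normalized_histogram(b, edges))
    for name, (a, b) in groups.items()
}
# Highest score first, as in a top-N display
ranking = sorted(scores, key=scores.get, reverse=True)
```

Slicing `ranking` from the front or the back corresponds to the top-N and bottom-N display options described above.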
Feat - Similarities (+CounterFactuals)¶
- class rarity.features.SimilaritiesCF(data_loader: Union[rarity.data_loader.data_loader.CSVDataLoader, rarity.data_loader.data_loader.DataframeLoader])[source]¶
Main integration for feature component on Similarities-CounterFactuals
- Parameters
data_loader (CSVDataLoader or DataframeLoader) – Class object from data_loader module
Important Attributes:
- analysis_type (str):
Analysis type defined by user during initial inputs preparation via data_loader stage.
- df_features (DataFrame):
Dataframe storing all features used in dataset
- feature_to_exclude (List[str], optional):
A list of features to be excluded from the ranking and similarities distance calculation
- user_defined_idx (int):
Index of the data point of interest specified by user
- top_n (int):
Number indicating the max limit of records to be displayed based on the distance ranking
- Returns
styled dash components displaying graph and/or table objects
- Return type
Container
- rarity.features.feat_similarities_counter_factuals.generate_counterfactuals(data_loader: Union[rarity.data_loader.data_loader.CSVDataLoader, rarity.data_loader.data_loader.DataframeLoader], user_defined_idx, feature_to_exclude=None, top_n=3)[source]¶
Tap-out table collating feature info for the user-defined index and the top N indices ranked by distance score, on the condition that the prediction labels of the top N indices differ from the prediction label of the user-defined index. Applicable to classification only.
- Parameters
data_loader (CSVDataLoader or DataframeLoader) – Class object from data_loader module
user_defined_idx (int) – Index of the data point of interest specified by user
feature_to_exclude (List[str], optional) – A list of features to be excluded from the ranking and similarities distance calculation
top_n (int) – Number indicating the max limit of records to be displayed based on the distance ranking
- Returns
table object outlining the dataframe content with dynamic-conditional styles
- Return type
DataTable
- rarity.features.feat_similarities_counter_factuals.generate_similarities(data_loader: Union[rarity.data_loader.data_loader.CSVDataLoader, rarity.data_loader.data_loader.DataframeLoader], user_defined_idx, feature_to_exclude=None, top_n=3)[source]¶
Tap-out table collating feature info for the user-defined index and the top N indices ranked by distance score. Applicable to both regression and classification.
- Parameters
data_loader (CSVDataLoader or DataframeLoader) – Class object from data_loader module
user_defined_idx (int) – Index of the data point of interest specified by user
feature_to_exclude (List[str], optional) – A list of features to be excluded from the ranking and similarities distance calculation
top_n (int) – Number indicating the max limit of records to be displayed based on the distance ranking
- Returns
table object outlining the dataframe content with dynamic-conditional styles
- Return type
DataTable
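The distance ranking behind both tables can be sketched as follows. The distance metric rarity uses is not stated here, so Euclidean distance on unscaled features is an assumption, and `rank_by_distance`, `counterfactual_indices` and the sample rows are hypothetical names and data.

```python
import math
from typing import Dict, List, Optional, Sequence


def rank_by_distance(
    rows: List[Dict[str, float]],
    user_defined_idx: int,
    feature_to_exclude: Optional[Sequence[str]] = None,
    top_n: int = 3,
) -> List[int]:
    """Return indices of the top_n rows closest to the row at user_defined_idx."""
    excluded = set(feature_to_exclude or [])
    target = rows[user_defined_idx]
    features = [name for name in target if name not in excluded]
    distances = []
    for idx, row in enumerate(rows):
        if idx == user_defined_idx:
            continue  # the point of interest is not its own neighbour
        dist = math.sqrt(sum((row[f] - target[f]) ** 2 for f in features))
        distances.append((dist, idx))
    distances.sort()
    return [idx for _, idx in distances[:top_n]]


def counterfactual_indices(
    rows: List[Dict[str, float]],
    predictions: Sequence[str],
    user_defined_idx: int,
    feature_to_exclude: Optional[Sequence[str]] = None,
    top_n: int = 3,
) -> List[int]:
    """Closest rows whose predicted label differs from that of the point of interest."""
    ranked = rank_by_distance(rows, user_defined_idx, feature_to_exclude, top_n=len(rows))
    differing = [idx for idx in ranked if predictions[idx] != predictions[user_defined_idx]]
    return differing[:top_n]


# Hypothetical feature rows and predicted labels
rows = [
    {"x": 0.0, "y": 0.0},
    {"x": 1.0, "y": 0.0},
    {"x": 0.0, "y": 2.0},
    {"x": 3.0, "y": 3.0},
    {"x": 0.5, "y": 0.5},
]
predictions = ["A", "A", "B", "B", "A"]
similar = rank_by_distance(rows, user_defined_idx=0, top_n=3)
similar_without_y = rank_by_distance(rows, user_defined_idx=0, feature_to_exclude=["y"], top_n=3)
counterfactuals = counterfactual_indices(rows, predictions, user_defined_idx=0, top_n=2)
```

Similarities keep the nearest rows regardless of label, while counterfactuals keep only the nearest rows with a different predicted label, which is why the latter applies to classification only.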