Visualizers
Modules under Visualizers handle all interactive plotting. They take either direct inputs or inputs post-processed by the interpreters
and generate plots with the Plotly framework. The types of graph generated depend on the feature component, each of which is linked to a specific task.
Viz - General Metrics
- rarity.visualizers.general_metrics.plot_classification_report(yTrue: pandas.core.series.Series, yPred: pandas.core.series.Series, model_names: List)
Create a classification report in table form
- Parameters
yTrue (pd.Series) – true labels, output from int_general_metrics
yPred (pd.Series) – predicted labels, output from int_general_metrics
model_names (List[str]) – model names, output from interpreter int_general_metrics
- Returns
list of tables displaying classification report details
- Return type
List[plotly.graph_objects.Figure]
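The report table is built from per-class scores. As a rough sketch, assuming the report uses the standard precision, recall, F1 and support definitions (as in scikit-learn's classification report), the values could be computed in plain Python as follows. `classification_report_rows` is a hypothetical helper for illustration, not part of rarity.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def classification_report_rows(
    y_true: Sequence[Hashable], y_pred: Sequence[Hashable]
) -> Dict[Hashable, Dict[str, float]]:
    """Hypothetical helper (not rarity's code): standard per-class report values."""
    labels = sorted(set(y_true) | set(y_pred))
    # True positives per label, plus how often each label is actual / predicted
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    actual = Counter(y_true)
    predicted = Counter(y_pred)
    report = {}
    for label in labels:
        precision = tp[label] / predicted[label] if predicted[label] else 0.0
        recall = tp[label] / actual[label] if actual[label] else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report[label] = {
            "precision": precision,
            "recall": recall,
            "f1-score": f1,
            "support": actual[label],  # number of true occurrences of the label
        }
    return report


report = classification_report_rows([0, 0, 1, 1, 1], [0, 1, 1, 1, 0])
```

One such dictionary per model would correspond to one table in the returned list.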
- rarity.visualizers.general_metrics.plot_confusion_matrix(yTrue: pandas.core.series.Series, yPred: pandas.core.series.Series, model_names: List)
Create a confusion matrix
- Parameters
yTrue (pd.Series) – true labels, output from int_general_metrics
yPred (pd.Series) – predicted labels, output from int_general_metrics
model_names (List[str]) – model names, output from interpreter int_general_metrics
- Returns
figure displaying confusion matrix details
- Return type
Figure
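A confusion matrix is a table of counts over (actual label, predicted label) pairs. The sketch below uses rows for actual labels and columns for predicted labels; that orientation is an assumption (it matches scikit-learn's convention), and `confusion_counts` is a hypothetical helper, not rarity's implementation.

```python
from typing import Hashable, List, Sequence


def confusion_counts(
    y_true: Sequence[Hashable], y_pred: Sequence[Hashable], labels: Sequence[Hashable]
) -> List[List[int]]:
    """Hypothetical helper: rows are actual labels, columns are predicted labels."""
    position = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for actual, predicted in zip(y_true, y_pred):
        matrix[position[actual]][position[predicted]] += 1
    return matrix


matrix = confusion_counts([0, 0, 1, 1, 1], [0, 1, 1, 1, 0], labels=[0, 1])
# matrix == [[1, 1], [1, 2]]
```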
- rarity.visualizers.general_metrics.plot_precisionRecall_curve(yTrue: pandas.core.series.Series, yPred: pandas.core.series.Series, model_names: List)
Display precision-recall curves comparing multiple models
- Parameters
yTrue (pd.Series) – true labels, output from int_general_metrics
yPred (pd.Series) – predicted labels, output from int_general_metrics
model_names (List[str]) – model names, output from interpreter int_general_metrics
- Returns
figure with line curves comparing precision-recall across models
- Return type
Figure
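A precision-recall curve is traced by sweeping a decision threshold over scores. This minimal sketch assumes a binary task where the prediction input carries positive-class probabilities (an assumption, since the parameter above is described as labels); `precision_recall_points` is a hypothetical helper, not rarity's API.

```python
from typing import List, Sequence, Tuple


def precision_recall_points(
    y_true: Sequence[int], scores: Sequence[float]
) -> List[Tuple[float, float, float]]:
    """Hypothetical helper: (threshold, precision, recall) for each distinct score."""
    total_pos = sum(1 for t in y_true if t == 1)
    points = []
    for threshold in sorted(set(scores), reverse=True):
        # Predict positive when the score reaches the threshold
        predicted = [s >= threshold for s in scores]
        tp = sum(1 for t, p in zip(y_true, predicted) if p and t == 1)
        fp = sum(1 for t, p in zip(y_true, predicted) if p and t != 1)
        precision = tp / (tp + fp)  # the threshold is a real score, so tp + fp >= 1
        recall = tp / total_pos if total_pos else 0.0
        points.append((threshold, precision, recall))
    return points


points = precision_recall_points([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

Plotting recall on the x-axis against precision for each model gives one line per model.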
- rarity.visualizers.general_metrics.plot_prediction_offset_overview(df: pandas.core.frame.DataFrame)
Display a scatter plot giving an overview of prediction offset values
- Parameters
df (DataFrame) – dataframe containing yTrue and yPred values, output from int_general_metrics
- Returns
figure displaying a scatter plot that gives an overview of prediction offset values
- Return type
Figure
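The offsets plotted here are derived from the yTrue and yPred columns. The sign convention is not stated above, so this sketch assumes offset = yPred - yTrue (positive means over-prediction); both helpers are hypothetical and only illustrate the quantity being plotted.

```python
from typing import Dict, List, Sequence


def prediction_offsets(y_true: Sequence[float], y_pred: Sequence[float]) -> List[float]:
    """Assumed convention: offset = predicted value - actual value."""
    return [pred - actual for actual, pred in zip(y_true, y_pred)]


def offset_summary(offsets: Sequence[float]) -> Dict[str, int]:
    """Count over-predictions (> 0), under-predictions (< 0) and exact hits."""
    over = sum(1 for o in offsets if o > 0)
    under = sum(1 for o in offsets if o < 0)
    return {"over": over, "under": under, "exact": len(offsets) - over - under}


offsets = prediction_offsets([3.0, 5.0, 2.0], [2.5, 5.0, 4.0])
# offsets == [-0.5, 0.0, 2.0]
```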
- rarity.visualizers.general_metrics.plot_prediction_vs_actual(df: pandas.core.frame.DataFrame)
Display a scatter plot comparing actual values against predicted values
- Parameters
df (pd.DataFrame) – dataframe containing yTrue and yPred values, output from int_general_metrics
- Returns
figure displaying a scatter plot comparing actual values against predicted values
- Return type
Figure
- rarity.visualizers.general_metrics.plot_roc_curve(yTrue: pandas.core.series.Series, yPred: pandas.core.series.Series, model_names: List)
Display ROC curves comparing multiple models
- Parameters
yTrue (pd.Series) – true labels, output from int_general_metrics
yPred (pd.Series) – predicted labels, output from int_general_metrics
model_names (List[str]) – model names, output from interpreter int_general_metrics
- Returns
figure with line curves comparing ROC-AUC scores across models
- Return type
Figure
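Each ROC line is a set of (false positive rate, true positive rate) points obtained by sweeping a threshold, and the ROC-AUC score is the area under it. This sketch assumes binary labels with positive-class scores and uses the trapezoidal rule; the helper names are hypothetical, not rarity's API.

```python
from typing import List, Sequence, Tuple


def roc_points(y_true: Sequence[int], scores: Sequence[float]) -> List[Tuple[float, float]]:
    """Hypothetical helper: (FPR, TPR) pairs from (0, 0) upward; both classes must be present."""
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t == 1)
        fp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t != 1)
        points.append((fp / neg, tp / pos))
    return points


def auc_trapezoid(points: Sequence[Tuple[float, float]]) -> float:
    """Area under a curve given as ordered (x, y) points, using the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


curve = roc_points([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
auc = auc_trapezoid(curve)  # 0.75 for this data
```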
- rarity.visualizers.general_metrics.plot_std_error_metrics(df: pandas.core.frame.DataFrame)
Display a table comparing standard error metrics for the regression task
- Parameters
df (DataFrame) – dataframe containing info on error metrics, output from int_general_metrics
- Returns
table object comparing standard error metrics for the regression task
- Return type
DataTable
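The exact set of metrics in this table is not listed above. As an illustration only, the sketch below computes four common regression error metrics (MAE, MSE, RMSE and R²); which of these rarity actually reports is an assumption, and `regression_error_metrics` is a hypothetical helper.

```python
import math
from typing import Dict, Sequence


def regression_error_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    """Hypothetical helper: common regression error metrics (assumed set)."""
    n = len(y_true)
    errors = [pred - actual for actual, pred in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    ss_res = sum(e * e for e in errors)                  # residual sum of squares
    r2 = 1 - ss_res / ss_tot if ss_tot else float("nan")
    return {"MAE": mae, "MSE": mse, "RMSE": math.sqrt(mse), "R2": r2}


metrics = regression_error_metrics([3.0, 5.0, 2.0, 6.0], [2.5, 5.0, 4.0, 6.0])
```

One such row per model is the kind of content a comparison table would hold.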
Viz - Miss Predictions
- rarity.visualizers.miss_predictions.plot_prediction_offset_overview(df: pandas.core.frame.DataFrame)
Display a scatter plot giving an overview of prediction offset values
- Parameters
df (DataFrame) – dataframe containing calculated offset values, output from int_miss_predictions
- Returns
figure displaying a scatter plot that gives an overview of prediction offset values by index
- Return type
Figure
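Because this view plots offsets against the data-point index, it helps to keep each offset paired with its index when looking for the worst miss-predictions. A hypothetical helper (not part of rarity) that ranks the largest absolute offsets while preserving their index might look like this:

```python
from typing import List, Sequence, Tuple


def largest_offsets(offsets: Sequence[float], n: int) -> List[Tuple[int, float]]:
    """Hypothetical helper: (index, offset) pairs for the n largest absolute offsets."""
    indexed = list(enumerate(offsets))  # keep each offset tied to its data-point index
    indexed.sort(key=lambda item: abs(item[1]), reverse=True)
    return indexed[:n]


worst = largest_offsets([-0.5, 0.0, 2.0, -3.0], n=2)
# worst == [(3, -3.0), (2, 2.0)]
```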
- rarity.visualizers.miss_predictions.plot_probabilities_spread_pattern(df_specific_label: pandas.core.frame.DataFrame)
Display a scatter plot comparing the probabilities of correctly predicted and miss-predicted data points for each class label
- Parameters
df_specific_label (DataFrame) – dataframe for one specific label of one model type, output from int_miss_predictions
- Returns
figure displaying a scatter plot comparing the probabilities of correctly predicted and miss-predicted data points for each class label
- Return type
Figure
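The comparison rests on splitting one label's data points into correctly predicted and miss-predicted groups. How int_miss_predictions selects the rows is not described here, so this sketch assumes rows are filtered by true label and that `probs` holds each point's probability for that label; `split_probs_by_outcome` is hypothetical.

```python
from typing import Dict, Hashable, List, Sequence


def split_probs_by_outcome(
    y_true: Sequence[Hashable],
    y_pred: Sequence[Hashable],
    probs: Sequence[float],
    label: Hashable,
) -> Dict[str, List[float]]:
    """Hypothetical helper: group probabilities for `label` by prediction outcome."""
    groups: Dict[str, List[float]] = {"correct": [], "miss-predicted": []}
    for actual, predicted, prob in zip(y_true, y_pred, probs):
        if actual != label:
            continue  # assumed: only points whose true label is `label` are kept
        key = "correct" if predicted == actual else "miss-predicted"
        groups[key].append(prob)
    return groups


groups = split_probs_by_outcome([1, 1, 0, 1], [1, 0, 0, 1], [0.9, 0.3, 0.2, 0.7], label=1)
```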
- rarity.visualizers.miss_predictions.plot_simple_probs_spread_overview(df_label_state: pandas.core.frame.DataFrame)
Display a data table listing simple stats (ss, % correct, % wrong, accuracy) for each class label
- Parameters
df_label_state (DataFrame) – dataframe containing info on simple stats, output from int_miss_predictions
- Returns
table object listing simple stats (ss, % correct, % wrong, accuracy) for each class label
- Return type
DataTable
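A minimal sketch of how per-label stats like these could be derived, assuming "ss" means sample size and that points are grouped by their true label. Both readings are assumptions, the accuracy column is left out because its exact definition is not given, and `simple_label_stats` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def simple_label_stats(
    y_true: Sequence[Hashable], y_pred: Sequence[Hashable]
) -> Dict[Hashable, Dict[str, float]]:
    """Hypothetical helper: sample size and % correct / % wrong per actual label."""
    counts: Dict[Hashable, List[int]] = defaultdict(lambda: [0, 0])  # [sample size, correct]
    for actual, predicted in zip(y_true, y_pred):
        counts[actual][0] += 1
        counts[actual][1] += int(actual == predicted)
    stats = {}
    for label, (ss, correct) in sorted(counts.items()):
        pct_correct = 100 * correct / ss
        stats[label] = {"ss": ss, "% correct": pct_correct, "% wrong": 100 - pct_correct}
    return stats


stats = simple_label_stats([1, 1, 0, 1], [1, 0, 0, 1])
```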
Viz - Loss Clusters
- rarity.visualizers.loss_clusters.plot_logloss_clusters(dfs: List[pandas.core.frame.DataFrame], analysis_type: str)
For classification tasks only. Plot a figure displaying cluster groups by log-loss values
- Parameters
dfs (List[pd.DataFrame]) – list of dataframes containing cluster info, output from int_loss_clusters
analysis_type (str) – indicates whether the analysis is regression or classification, inherited from data_loader
- Returns
figure displaying a violin plot of cluster groups by log-loss values
- Return type
Figure
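The values being clustered are per-sample log-loss scores: the negative log of the probability a model assigns to each sample's true class. This sketch clips probabilities with a small epsilon to avoid log(0); the epsilon and the helper name are assumptions, not taken from rarity.

```python
import math
from typing import List, Sequence


def per_sample_logloss(true_class_probs: Sequence[float], eps: float = 1e-15) -> List[float]:
    """Hypothetical helper: -log(p) per sample, with p clipped to [eps, 1 - eps]."""
    return [-math.log(min(max(p, eps), 1 - eps)) for p in true_class_probs]


losses = per_sample_logloss([1.0, 0.5, 0.1])
```

Confident correct predictions land near 0, while confident mistakes produce large values, which is what makes the clusters informative.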
- rarity.visualizers.loss_clusters.plot_offset_clusters(df: pandas.core.frame.DataFrame, analysis_type: str)
For regression tasks only. Plot a figure displaying cluster groups by prediction offset values
- Parameters
df (DataFrame) – dataframe containing cluster info, output from int_loss_clusters
analysis_type (str) – indicates whether the analysis is regression or classification, inherited from data_loader
- Returns
figure displaying a violin plot of cluster groups by offset values
- Return type
Figure
- rarity.visualizers.loss_clusters.plot_optimum_cluster_via_elbow_method(cluster_range: List[int], sum_squared_distance: List[float], models: List[str])
Plot a figure to guide the choice of a reasonable number of clusters to form with the k-means method
- Parameters
cluster_range (List[int]) – list of integers indicating the number of clusters
sum_squared_distance (List[float]) – list of sums of squared distances generated via kmean_inertia
models (List[str]) – list of models used to generate yPred
- Returns
figure displaying a line plot of the change in sum of squared distances across the cluster range
- Return type
Figure
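The elbow method runs k-means for each candidate k and records the within-cluster sum of squared distances (what scikit-learn exposes as `KMeans.inertia_`); the "elbow" where the curve flattens suggests a reasonable k. This stdlib sketch swaps in a plain 1-D Lloyd's algorithm with deterministic initialisation, so the exact numbers may differ from whatever produces `sum_squared_distance` in rarity; both helpers are hypothetical.

```python
from typing import List, Sequence


def kmeans_1d_inertia(values: Sequence[float], k: int, iterations: int = 100) -> float:
    """Sum of squared distances to the nearest centroid after 1-D k-means (Lloyd's algorithm)."""
    data = sorted(values)
    # Deterministic initialisation: spread the initial centroids across the sorted data
    centroids = [data[(i * (len(data) - 1)) // max(k - 1, 1)] for i in range(k)]
    for _ in range(iterations):
        clusters: List[List[float]] = [[] for _ in range(k)]
        for x in data:
            nearest = min(range(k), key=lambda c: (x - centroids[c]) ** 2)
            clusters[nearest].append(x)
        # Keep the previous centroid if a cluster ends up empty
        new_centroids = [sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return sum(min((x - c) ** 2 for c in centroids) for x in data)


def elbow_curve(values: Sequence[float], cluster_range: Sequence[int]) -> List[float]:
    """Hypothetical helper: one inertia value per candidate number of clusters."""
    return [kmeans_1d_inertia(values, k) for k in cluster_range]


offsets = [1.0, 1.1, 0.9, 5.0, 5.1, 4.9, 9.0, 9.2, 8.8]
curve = elbow_curve(offsets, [1, 2, 3])  # roughly [96.12, 24.12, 0.12]
```

Here the drop flattens sharply at k = 3, matching the three obvious groups in the data.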
Viz - xFeature Distribution
- rarity.visualizers.xfeature_distribution.plot_distribution_by_kl_div_ranking(kl_div_dict_sorted: Dict, display_option: str, display_value: int, comparison_base: str, model_name: str)
Create distribution plots ranked by KL-divergence score in descending order
- Parameters
kl_div_dict_sorted (Dict) – dictionary storing KL-divergence scores by feature, sorted in descending order
display_option (str) – whether to display distribution plots for the top-N features, the bottom-N features, or both
Available options: top, bottom or both
display_value (int) – number of graphs to display, capped at 10
If the dataset has fewer than 10 features, the limit equals the number of features in the dataset
comparison_base (str) – baseline for the distribution comparison: dataset_type for regression and pred_state for classification tasks
model_name (str) – model used to generate yPred
- Returns
Dictionary storing distribution figures by display_option
- Return type
Dict[str, plotly.graph_objects.Figure]
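The ranking and selection rules above can be sketched in plain Python. The limit logic (capped at 10, never more than the number of features) follows the parameter description; the KL-divergence formula on pre-binned distributions with an epsilon guard is an assumption about how scores are produced, and both helper names are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-10) -> float:
    """KL(P || Q) for two discrete distributions over the same bins; eps guards empty Q bins."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


def rank_and_select(
    kl_scores: Dict[str, float], display_option: str, display_value: int
) -> Dict[str, List[str]]:
    """Hypothetical helper: sort features by descending score, then pick top / bottom / both."""
    ranked = [f for f, _ in sorted(kl_scores.items(), key=lambda kv: kv[1], reverse=True)]
    # Limit rules from the parameter description: at most 10, never more than the feature count
    limit = min(display_value, 10, len(ranked))
    selection = {"top": ranked[:limit], "bottom": ranked[len(ranked) - limit:]}
    if display_option == "both":
        return selection
    if display_option in selection:
        return {display_option: selection[display_option]}
    raise ValueError(f"display_option must be 'top', 'bottom' or 'both', got {display_option!r}")


scores = {"a": 0.1, "b": 0.9, "c": 0.5}
picked = rank_and_select(scores, "both", 1)  # {"top": ["b"], "bottom": ["a"]}
```

Keying the result by option mirrors the documented return value, a dictionary of figures by display_option.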
- rarity.visualizers.xfeature_distribution.plot_distribution_by_specific_feature(ls_specific_feature: List[str], kl_div_dict_sorted: Dict, comparison_base: str, model_name: str)
Create distribution plots for specific features
- Parameters
ls_specific_feature (List[str]) – list of features whose distribution graphs are plotted
kl_div_dict_sorted (Dict) – dictionary storing KL-divergence scores by feature, sorted in descending order
comparison_base (str) – baseline for the distribution comparison: dataset_type for regression and pred_state for classification tasks
model_name (str) – model used to generate yPred
- Returns
List of figures displaying distribution plots of the specified features
- Return type
List[plotly.graph_objects.Figure]