Rarity is a diagnostic library for tabular data. With minimal setup, it enables a deep dive into datasets to identify features that could have influenced model prediction performance. It is meant to be used in the post-training phase to make mispredictions easier to understand and to support systematic analysis of the gap between actual and predicted values. The auto-generated gap analysis is presented as a Dash web application with adjustable parameters in several feature components.
The only inputs needed to auto-generate a gap analysis report with Rarity are the features, yTrue and yPred values. Rarity is therefore a model-agnostic package and can be used to inspect mispredictions on tabular data generated by any model framework.
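To illustrate why these three inputs are enough, the following is a minimal sketch in plain Python of a per-index gap computation for a regression task. It does not use Rarity's own API: the function name `regression_offsets`, the sample data, and the sign convention (prediction minus actual) are assumptions made for illustration only.

```python
def regression_offsets(y_true, y_pred):
    """Return the per-index offset, assumed here to be prediction minus actual."""
    if len(y_true) != len(y_pred):
        raise ValueError("yTrue and yPred must have the same length")
    return [pred - true for true, pred in zip(y_true, y_pred)]


# Hypothetical inputs: one feature row, one actual value and one prediction per index.
features = [
    {"age": 25, "income": 40_000},
    {"age": 41, "income": 72_000},
    {"age": 33, "income": 55_000},
]
y_true = [10.0, 12.5, 8.0]
y_pred = [11.0, 12.0, 8.0]

offsets = regression_offsets(y_true, y_pred)

# Rank data indices from the largest absolute gap to the smallest,
# so the feature rows behind the worst predictions can be inspected first.
worst_first = sorted(range(len(offsets)), key=lambda i: abs(offsets[i]), reverse=True)
worst_row = features[worst_first[0]]
```

Nothing about the model itself is needed here, only its outputs, which is what makes this kind of analysis independent of the model framework.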
Supported Analysis Type
Rarity currently supports tasks related to
Regression
Binary Classification
Multiclass Classification
It can also be used to conduct bimodal analysis, that is, a comparison of two models. Because Rarity inspects mispredictions at the granularity of individual data indices, comparing many models at once would mean repeating every index-level view for each model, which is not practical. The package therefore supports misprediction gap analysis for up to two models, shown side by side. This is most useful during post-training analysis and the final stage of model fine-tuning.
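As a rough illustration of what a side-by-side, index-level comparison of two models involves, the sketch below computes the per-sample log loss of two hypothetical binary classifiers and lines the values up by data index. The function name, the model names `model_a` and `model_b`, and the probabilities are all assumptions for illustration; this is not Rarity's implementation.

```python
import math


def per_sample_logloss(y_true, p_pred, eps=1e-15):
    """Binary log loss for each data index; p_pred is the predicted P(y = 1)."""
    losses = []
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        losses.append(-(y * math.log(p) + (1 - y) * math.log(1 - p)))
    return losses


y_true = [1, 0, 1, 0]
model_a = [0.9, 0.2, 0.4, 0.1]  # hypothetical probabilities from the first model
model_b = [0.8, 0.6, 0.7, 0.1]  # hypothetical probabilities from the second model

loss_a = per_sample_logloss(y_true, model_a)
loss_b = per_sample_logloss(y_true, model_b)

# One record per data index, with both models' losses next to each other.
comparison = [
    {"index": i, "yTrue": y, "loss_a": round(a, 4), "loss_b": round(b, 4)}
    for i, (y, a, b) in enumerate(zip(y_true, loss_a, loss_b))
]

# The index where the two models differ the most is a natural starting point.
largest_gap = max(range(len(y_true)), key=lambda i: abs(loss_a[i] - loss_b[i]))
```

Each additional model would add another column to every index-level record, which is one way to see why the comparison is kept to two models.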
Core Feature Components
There are five core feature components covered in the auto-generated gap analysis report by Rarity:
General Metrics: covers the general metrics used to evaluate model performance.
Miss Predictions: presents a scatter plot of mispredictions by index number.
Loss Clusters: covers clustering information on offset values (regression) and log loss values (classification).
xFeature Distribution: shows distribution plots ranked by KL-divergence score.
Similarities: tabulates the top-n data points whose features are most similar to a data index specified by the user.
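The idea of ranking features by KL-divergence can be sketched as follows. This example assumes the comparison is between a slice of interest (here, hypothetically, the rows with the worst predictions) and the remaining rows, and it uses simple equal-width histograms with a small smoothing constant. The grouping, the binning, and all names are assumptions, and Rarity's actual computation may differ.

```python
import math


def binned_probs(values, lo, hi, n_bins, eps=1e-9):
    """Equal-width histogram as probabilities; eps keeps empty bins above zero."""
    counts = [eps] * n_bins
    width = (hi - lo) / n_bins
    for v in values:
        idx = 0 if width == 0 else min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    total = sum(counts)
    return [c / total for c in counts]


def kl_divergence(p, q):
    """KL(P || Q) for two discrete distributions defined over the same bins."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


# Hypothetical feature values for the worst-predicted rows and for the rest.
worst = {"age": [60, 62, 65, 70], "income": [30, 40, 50, 60]}
rest = {"age": [20, 25, 30, 35], "income": [30, 40, 50, 60]}

scores = {}
for name in worst:
    combined = worst[name] + rest[name]
    lo, hi = min(combined), max(combined)  # shared bin range for both groups
    p = binned_probs(worst[name], lo, hi, n_bins=4)
    q = binned_probs(rest[name], lo, hi, n_bins=4)
    scores[name] = kl_divergence(p, q)

# Features whose distribution shifts the most between the groups come first.
ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

In this toy data `age` is distributed very differently in the two groups while `income` is identical, so `age` ranks first and `income` scores zero.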
Counter-Factuals is also included under the Similarities component tab for classification tasks, to help compare data points that have the most similar features but different prediction outcomes. For further details on how the feature components are displayed in the web application, please check out the examples in the Features Introduction section.
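The difference between a similarity lookup and a counter-factual lookup can be sketched in a few lines. The example assumes numeric features that are already on comparable scales and uses Euclidean distance; the distance metric, function names, and data are illustrative assumptions rather than Rarity's implementation.

```python
import math


def top_n_similar(features, ref_index, n=3):
    """Indices of the n rows closest to the reference row in feature space."""
    ref = features[ref_index]
    dists = [
        (math.dist(ref, row), i)
        for i, row in enumerate(features)
        if i != ref_index
    ]
    return [i for _, i in sorted(dists)[:n]]


def counter_factuals(features, y_pred, ref_index, n=3):
    """Closest rows whose predicted label differs from the reference row's."""
    ref = features[ref_index]
    dists = [
        (math.dist(ref, row), i)
        for i, row in enumerate(features)
        if i != ref_index and y_pred[i] != y_pred[ref_index]
    ]
    return [i for _, i in sorted(dists)[:n]]


# Hypothetical scaled features and predicted labels for five data points.
features = [[1.0, 1.0], [1.1, 0.9], [1.2, 1.1], [5.0, 5.0], [0.7, 1.3]]
y_pred = [0, 0, 1, 1, 1]

similar = top_n_similar(features, ref_index=0, n=2)
counter = counter_factuals(features, y_pred, ref_index=0, n=2)
```

Here index 2 appears in both results: it is among the rows most similar to index 0, yet it received a different predicted label, which is exactly the kind of pair a counter-factual view is meant to surface.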