# Evaluation
Evaluation reports help identify weaknesses, improvements or regressions in your model.
## k-fold

You can run a k-fold cross-validation evaluation anytime within HumanFirst. When running a k-fold evaluation you can specify the number of folds (more folds will increase the evaluation run time, but will yield more accurate results), and you can include or exclude intents or training examples from the evaluation.
## Regression (Test set)

You can run a regression evaluation using a dedicated test set. Your test set is defined using tags.
Both evaluation types expose the following metrics:
## Precision

Precision measures how many of your model's positive predictions are actually correct. It is calculated using the following formula: # of true positives / (# of true positives + # of false positives).
## Recall

Recall measures how many of the actual positives your model correctly identifies. It is calculated using the following formula: # of true positives / (# of true positives + # of false negatives).
## F1

F1 is the harmonic mean of precision and recall. The formula is the following: 2 x [(Precision x Recall) / (Precision + Recall)].
## Accuracy

Accuracy is calculated using the following formula: # of correct predictions / # of all predictions.
## Comparing Evaluations

By selecting multiple evaluations, you can compare them to visualize how your model's performance has changed over time.
## Exporting Evaluations

You can easily export the full results of an evaluation report by clicking the menu next to the summary and selecting Export to CSV. The CSV will contain a summary for intents, phrases and entities.
## Advanced options

When running an evaluation you can exclude parts of your training data (intents and/or training examples) using tags. Tags can be set on both intents and training examples. This is particularly useful if your model contains context-specific flows (like Dialogflow Pages/Flows): it allows you to test specific parts of your model and reflect the mutual exclusiveness that intents might have.
:::info
Learn more about NLU and the various metrics we expose at the Machine Learning University (MLU).
:::