Evaluation

Evaluation reports help identify weaknesses, improvements or regressions in your model.

k-fold

You can run a k-fold cross-validation evaluation at any time within HumanFirst. When running a k-fold evaluation, you can specify the number of folds (more folds increase the evaluation run time but yield more reliable results) and include or exclude intents or training examples from the evaluation.
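
HumanFirst handles the folding, training, and scoring for you; the sketch below only illustrates the idea behind k-fold splitting, using scikit-learn's KFold and made-up data:

```python
# Illustrative only -- not the HumanFirst implementation.
from sklearn.model_selection import KFold

utterances = ["hi there", "hello", "good morning", "bye", "see you later", "thanks a lot"]
intents    = ["greeting", "greeting", "greeting", "goodbye", "goodbye", "thanks"]

# 3 folds: every training example is held out exactly once and predicted by a
# model trained on the remaining folds; more folds means more runs, but more
# of the data gets used for training in each run.
kfold = KFold(n_splits=3, shuffle=True, random_state=0)
for fold, (train_idx, test_idx) in enumerate(kfold.split(utterances), start=1):
    train = [utterances[i] for i in train_idx]
    held_out = [utterances[i] for i in test_idx]
    print(f"fold {fold}: train on {len(train)} examples, evaluate on {len(held_out)}")
```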

Regression (Test set)

You can run a regression evaluation using a dedicated test set. Your test set is defined using tags.

Both evaluation types expose the following metrics:

Precision

Precision measures how many of your model's positive predictions are actually correct. It is calculated using the following formula: # of true positives / (# of true positives + # of false positives).
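
For example, with made-up counts for a single intent:

```python
# Hypothetical counts for one intent in an evaluation run
true_positives = 8    # phrases predicted as this intent that really belong to it
false_positives = 2   # phrases predicted as this intent that belong to another intent

precision = true_positives / (true_positives + false_positives)
print(precision)  # 0.8
```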

Recall

Recall measures how many of the actual positives your model manages to find. It is calculated using the following formula: # of true positives / (# of true positives + # of false negatives).
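
Continuing the made-up example, suppose the model also missed 4 phrases of that intent:

```python
# Same hypothetical intent: the model found 8 of its phrases but missed 4
true_positives = 8
false_negatives = 4   # phrases of this intent that were predicted as something else

recall = true_positives / (true_positives + false_negatives)
print(recall)  # ≈ 0.67
```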

F1

F1 is the harmonic mean of precision and recall, combining both into a single score. The formula is the following: 2 x [(Precision x Recall) / (Precision + Recall)].
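
Using the made-up precision and recall values from above:

```python
# Hypothetical values carried over from the precision and recall examples
precision = 0.8
recall = 8 / 12   # ≈ 0.67

f1 = 2 * (precision * recall) / (precision + recall)
print(f1)  # ≈ 0.73
```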

Accuracy

Accuracy is calculated using the following formula: # of correct predictions / # of all predictions.
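
For example, with a made-up evaluation run of 100 held-out phrases:

```python
# Hypothetical run: 100 held-out phrases, 87 predicted with the correct intent
correct_predictions = 87
total_predictions = 100

accuracy = correct_predictions / total_predictions
print(accuracy)  # 0.87
```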

Comparing Evaluations

By selecting multiple evaluations, you can compare them to visualize how your model's performance has changed over time.

Exporting Evaluations

You can export the full results of an evaluation report by opening the menu next to the summary and clicking Export to CSV. The CSV contains a summary for intents, phrases, and entities.
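
The exported file can then be inspected with any spreadsheet or data tool. The snippet below is only an illustration: the file name and column name are hypothetical, so check the header row of your own export for the exact schema.

```python
import pandas as pd

# Load an exported evaluation report (file name is hypothetical)
report = pd.read_csv("evaluation_report.csv")

# Surface the weakest intents, e.g. by sorting on an F1 column
# (column name is illustrative -- check your own export's header row)
print(report.sort_values("f1").head(10))
```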

Advanced options

When running an evaluation, you can exclude parts of your training data (intents and/or training examples) using tags. Tags can be set on both intents and training examples. This is particularly useful if your model contains context-specific flows (like Dialogflow Pages/Flows), as it allows you to test specific parts of your model and reflect the mutual exclusivity that some intents may have.

info

Learn more about NLU and the various metrics we expose at the Machine Learning University (MLU).