The HumanFirst Academy

Learn NLU SuperPowers 🦸 by building models based on what your data is showing you, in a way your classifier can understand.

The academy is a workflow-driven course that takes you through the typical stages of creating or working with an NLU model in the HumanFirst tooling, starting from a set of unlabelled data.

Each module contains a video explaining the benefits of the workflow and how to carry it out, along with links to the tool documentation for any features used.

To get the most out of the course, you'll need access to an instance of HumanFirst and the example scripts. If you already have an instance of HumanFirst, you should also have joined our Slack. Look for the #academy channel.


To Follow

  • Building your first high-level model based on what the data is telling you
  • Subdividing your first high-level intents into smaller ones based on what the classifier is telling you
  • Investigating an issue raised by your business within the dataset
  • Levelling the training data down to optimise it
  • Validating, measuring and increasing the coverage of your model
  • Tuning your model and increasing its accuracy
  • Creating a matching blindset for your model
  • Releasing your model

So that anyone can follow along, we have used two publicly available datasets:

Action-Based Conversations Dataset (abcd)

This is a set of 10,042 conversations between human customers and human agents for a fictional online clothing retailer that offers a subscription model, runs a website, and ships products to people. It sells jeans, boots, jackets and shirts from the brands Calvin Klein, Michael Kors, Tommy Hilfiger and Guess across 9 US cities: Baltimore, Brighton, Jacksonville, La Fayette, Monterey, Newark, Raleigh, San Lima and San Mateo.

This set has a lot in common with the kinds of datasets our customers often have when they start trying to build a new classifier or bot. It's a set of unequal-knowledge conversations, where the agent is far more knowledgeable than the customer. It's already marked up with who was speaking (Agent/Expert or Customer/Client) and with system Actions, where the system does something such as process a payment. It was generated from real web and mobile inputs, so it contains typical text speak, including users sending several messages without waiting for a response, or correcting themselves with a * on the next line. It carries metadata about the scenario and the customer, which can be loaded to give additional context in HumanFirst. Conversations tend to run quite long, averaging 22.1 turns, which is longer than many bot conversations.
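
To make this conversational structure concrete, here is a minimal Python sketch. It assumes each conversation has already been parsed into a list of (speaker, text) pairs. The speaker labels "customer", "agent" and "action", the sample turns, and the helper names are hypothetical illustrations, not the academy's parser or the dataset's raw format. The sketch merges back-to-back messages from the same speaker, which is one possible way to handle users who send several lines in a row. It then pulls out the customer turns and computes the average number of turns per conversation.

```python
# Hypothetical representation: each turn is a (speaker, text) pair.
# The labels "customer", "agent" and "action" are assumptions that mirror
# the Customer, Agent and system Action roles described above.
Turn = tuple[str, str]
Conversation = list[Turn]


def merge_consecutive(convo: Conversation) -> Conversation:
    """Join back-to-back turns from the same speaker into a single turn,
    e.g. a customer who sends several messages without waiting."""
    merged: Conversation = []
    for speaker, text in convo:
        if merged and merged[-1][0] == speaker:
            merged[-1] = (speaker, merged[-1][1] + " " + text)
        else:
            merged.append((speaker, text))
    return merged


def customer_utterances(convo: Conversation) -> list[str]:
    """Keep only the customer's turns, e.g. as candidate training examples."""
    return [text for speaker, text in convo if speaker == "customer"]


def average_turns(convos: list[Conversation]) -> float:
    """Mean number of turns per conversation, counting every speaker."""
    return sum(len(c) for c in convos) / len(convos)


# Made-up sample data for illustration only.
convos: list[Conversation] = [
    [("customer", "hi i was charged twice"),
     ("customer", "for my ordr"),
     ("customer", "*order"),
     ("agent", "Sorry to hear that. Can I get your full name?"),
     ("action", "account looked up")],
    [("customer", "where is my package"),
     ("agent", "Let me check the shipping status for you.")],
]

print(average_turns(convos))  # 3.5, i.e. (5 + 2) / 2
print(customer_utterances(merge_consecutive(convos[0])))
# ['hi i was charged twice for my ordr *order']
```

Merging is only one design choice. Keeping each message separate preserves the original rhythm of the conversation, which can matter when you review utterances in context.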

Chen et al. took care over agent training and filtered out bad role-player behaviour, such as copying and pasting across conversations. As a result, we find the dataset quite diverse in its language (though not as diverse as many raw production datasets). The underlying problems are more constrained, however. Each customer was given a brief describing the problem they were facing, drawn from a known set of ~50 minor scenarios across different products and brands. These are grouped into 10 major scenarios, such as having an issue with the site, with shipping, or with being mischarged, which are all fairly typical bot intent scenarios.

So it's not perfect, but it's a pretty good example of loading data with conversational context.


Chen, D., Chen, H., Yang, Y., Lin, A. and Yu, Z., 2021. Action-based conversations dataset: A corpus for building more in-depth task-oriented dialogue systems. arXiv preprint arXiv:2104.00783.

We provide a downloader and parser as part of the academy. The raw dataset is also available here under the MIT license:

Liu et al. Dataset (liuetal)

This is a set of single utterances (i.e. not in conversations) for a fictional PDA-style command-and-control bot called Olly. It was generated using Amazon Mechanical Turk (MTurk). Each worker was given an example scenario from one of multiple domains, ranging from playing music, querying the weather or calendar, and turning on lights, to chitchat.

After removing the utterances the authors labelled as irrelevant, it comes to around 25k utterances split across 18 major (parent) categories containing 68 intents, with between 25 and 1,400 labelled utterances per intent. It's also annotated with entities. The authors normalised the text for use across multiple systems, so it loses some useful subtleties. For example, "At 7am!!!" becomes "at seven am". Only the normalised data is annotated.
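
The filtering and counting described above can be sketched in a few lines of Python, assuming the rows have already been loaded into dictionaries. The field names ("scenario", "intent", "text", "irrelevant") and the sample rows are hypothetical and are not the dataset's real column headers. The sketch drops rows flagged as irrelevant and counts utterances per parent scenario and intent. This is the kind of check that produces summary figures such as the number of intents per parent category and the per-intent range of labelled utterances.

```python
from collections import Counter, defaultdict

# Hypothetical rows: the field names and values are illustrative assumptions,
# not the actual column headers of the published dataset.
rows = [
    {"scenario": "alarm", "intent": "set", "text": "wake me up at seven am", "irrelevant": False},
    {"scenario": "alarm", "intent": "set", "text": "set an alarm for six thirty", "irrelevant": False},
    {"scenario": "alarm", "intent": "remove", "text": "cancel my alarm", "irrelevant": False},
    {"scenario": "weather", "intent": "query", "text": "will it rain today", "irrelevant": False},
    {"scenario": "weather", "intent": "query", "text": "asdf", "irrelevant": True},
]


def drop_irrelevant(rows):
    """Remove utterances flagged as irrelevant."""
    return [r for r in rows if not r["irrelevant"]]


def intent_counts(rows):
    """Count utterances per (parent scenario, intent) pair."""
    return Counter((r["scenario"], r["intent"]) for r in rows)


def intents_per_scenario(rows):
    """Map each parent scenario to the set of child intents it contains."""
    groups = defaultdict(set)
    for r in rows:
        groups[r["scenario"]].add(r["intent"])
    return dict(groups)


clean = drop_irrelevant(rows)
counts = intent_counts(clean)
print(len(clean))                                  # 4
print(min(counts.values()), max(counts.values()))  # 1 2
print(intents_per_scenario(clean))
```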

The authors designed this as a benchmarking set. We'll use it as an example of loading utterance-level (rather than conversation) data and of running accuracy comparisons and optimisations.


Liu, X., Eshghi, A., Swietojanski, P. and Rieser, V., 2019. Benchmarking natural language understanding services for building conversational agents. arXiv preprint arXiv:1903.05566.

Again, we provide a downloader and parser as part of the academy. The raw dataset is available here under the Creative Commons Attribution 4.0 International license: