Clustering

What problem does this solve?

Clustering is an ideal tool for bootstrapping a new intent structure, discovering new intents in an existing model, quickly improving existing intents, or expanding existing intents into more specialized child intents.

Our intent-specific suggestion feature is best for drawing the fine line between what belongs to an intent and what doesn't. The clustering tool, by contrast, is what gets most of the labeling and discovery work done in very little time.

Getting basic clusters

Note: make sure you have uploaded data before getting clusters.

Go to the Unlabeled data view and click Get clusters. You'll get a list of the most likely clusters in your data. These may match existing intents, or they may represent entirely new intents.

Selecting a cluster works like selecting an utterance, except that it selects every utterance in the proposed cluster. You can deselect any items that aren't relevant to your use case. You can also select additional clusters, or keep searching the unlabeled data manually, to expand your selection. When you're done, label the selected utterances as usual and continue with your workflow.

Finding new intents

Sort your unlabeled data by NLU Entropy, then click Get clusters. You'll get a list of clusters that don't match your existing intents well; these are more likely to represent new intents.
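This page doesn't say exactly how NLU Entropy is computed. As a rough illustration only, the sketch below assumes it is the Shannon entropy of the model's predicted probabilities over your existing intents, and that the sort puts the highest values first. Under that assumption, an utterance the model spreads evenly across intents scores high (it fits none of them well), while one it confidently assigns to a single intent scores low. The function `nlu_entropy`, the utterances, intent names and probabilities are all hypothetical.

```python
import math


def nlu_entropy(intent_probs):
    """Shannon entropy, in bits, of a predicted intent distribution.

    Assumption: 'NLU Entropy' is modeled here as the entropy of the
    classifier's probabilities over the existing intents.
    """
    return -sum(p * math.log2(p) for p in intent_probs.values() if p > 0)


# Hypothetical model predictions for three unlabeled utterances.
predictions = {
    "how do I apply for a mortgage": {"Loans": 0.90, "Cards": 0.05, "Accounts": 0.05},
    "my parcel never arrived": {"Loans": 0.34, "Cards": 0.33, "Accounts": 0.33},
    "close my savings account": {"Loans": 0.10, "Cards": 0.10, "Accounts": 0.80},
}

# Highest entropy first (assumed sort direction): these utterances fit no
# existing intent well, so clustering them is more likely to surface new intents.
ranked = sorted(predictions, key=lambda u: nlu_entropy(predictions[u]), reverse=True)
for utterance in ranked:
    print(f"{nlu_entropy(predictions[utterance]):.2f}  {utterance}")
```

Here the off-topic "my parcel never arrived" ranks first because its probability is split almost evenly across the three intents.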

Finding specialized intents

If you have an existing intent and need to identify specialized versions of it, clustering can help. For example, if you have a "Loans" intent, you may want to create specialized child intents for "Mortgages", "Student loans" and "Car loans".

Go to the Labeled data view, select the intent you want to specialize, and click Get clusters. You'll get a list of clusters within that intent, which should help you identify the specialized sub-intents to create. Adjusting the Granularity control (see below) can further improve the results.

Fine-tuning your clusters

Several controls help you find the kinds of clusters you need. Set them before clicking Get clusters.

  • Keyword search helps you pinpoint specific topics.
  • Minimum cluster size weeds out clusters that may not have enough data to justify creating an intent.
  • Granularity determines how specific your clusters are: higher granularity produces more specific clusters.
    Continuing with the "Loans" example: if you're getting clusters about loans in general but want separate clusters for mortgages, student loans and car loans, increasing the granularity will help delineate them.
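How the tool actually builds clusters isn't documented here. The sketch below is a toy model, assuming simple word-overlap (Jaccard) similarity and single-linkage grouping, meant only to show how the three controls interact: keyword search filters the input, granularity acts as a similarity threshold (a higher value allows fewer links and therefore yields smaller, more specific clusters), and minimum cluster size drops groups that are too small. The function `cluster`, its parameters and the sample utterances are hypothetical; real systems typically use richer similarity measures such as sentence embeddings.

```python
def tokens(text):
    """Lowercased word set; a stand-in for a real utterance representation."""
    return set(text.lower().split())


def jaccard(a, b):
    """Word-overlap similarity between two token sets (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster(utterances, granularity=0.3, min_size=2, keyword=None):
    """Toy clustering whose hypothetical parameters mirror the UI controls."""
    # Keyword search: keep only utterances that mention the keyword.
    if keyword:
        utterances = [u for u in utterances if keyword.lower() in u.lower()]

    toks = [tokens(u) for u in utterances]
    parent = list(range(len(utterances)))  # union-find forest

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Granularity as a similarity threshold: link two utterances only if
    # they are at least this similar. Higher values mean fewer links and
    # therefore smaller, more specific clusters.
    for i in range(len(utterances)):
        for j in range(i + 1, len(utterances)):
            if jaccard(toks[i], toks[j]) >= granularity:
                parent[find(i)] = find(j)

    groups = {}
    for i, u in enumerate(utterances):
        groups.setdefault(find(i), []).append(u)

    # Minimum cluster size: drop groups too small to justify an intent.
    return [g for g in groups.values() if len(g) >= min_size]


utterances = [
    "mortgage rates today",
    "current mortgage rates",
    "student loan forgiveness",
    "student loan repayment",
    "car loan rates",
    "used car loan",
]

print(cluster(utterances, granularity=0.2))  # one broad cluster of all 6
print(cluster(utterances, granularity=0.5))  # 3 clusters: mortgage, student, car
print(cluster(utterances, granularity=0.5, min_size=3))  # [] (each group has 2)
print(cluster(utterances, granularity=0.5, keyword="loan"))  # student and car only
```

At granularity 0.2, shared words such as "loan" and "rates" chain every utterance into one broad cluster. At 0.5, only closely matching pairs link, which separates mortgages, student loans and car loans, mirroring the "Loans" example above.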