Clustering is an ideal tool to bootstrap a new intent structure, to discover new intents in an existing model, to quickly improve existing intents or even to expand existing intents into more specialized children.
While our intent-specific suggestion feature is ideal for drawing the fine line between what belongs to an intent and what doesn't, our clustering tool is what gets most of the labeling & discovery work done in very little time.
Before you start, make sure you have uploaded data.
If you go to the Unlabeled data view and click on Get clusters, you'll be provided with a list of the most likely clusters within your data. These may match existing intents or they may represent completely new intents.
Selecting a cluster is akin to selecting an utterance, only it selects all the utterances within the proposed cluster. Feel free to unselect items that may not be relevant to your use-case. You may also select further clusters or continue manually searching unlabeled data to expand your selection list. Once you're done, label the selected utterances as usual and continue with your workflow.
If you sort your unlabeled data by NLU Entropy and then click on Get clusters, you'll obtain a list of clusters that don't match well with your existing intents. These are more likely to represent new intents.
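To illustrate why sorting by entropy surfaces poorly matched utterances, here is a minimal sketch. It assumes that NLU Entropy behaves like the Shannon entropy of the model's intent probability distribution, which is an assumption on our part and not a description of the product's exact formula. The utterances, probabilities and the `prediction_entropy` helper are hypothetical.

```python
import math


def prediction_entropy(probs):
    """Shannon entropy (in bits) of an intent probability distribution.

    Assumption: this stands in for the tool's "NLU Entropy" score.
    """
    return -sum(p * math.log2(p) for p in probs if p > 0)


# Hypothetical NLU outputs: utterance -> probabilities over three existing intents.
predictions = {
    "I want to check my balance": [0.96, 0.02, 0.02],
    "can I refinance my house": [0.40, 0.35, 0.25],
    "what's the weather like": [0.34, 0.33, 0.33],
}

# Highest entropy first: the model is least sure about these utterances,
# so they are the ones that fit the existing intents worst.
ranked = sorted(
    predictions,
    key=lambda utterance: prediction_entropy(predictions[utterance]),
    reverse=True,
)

for utterance in ranked:
    print(f"{prediction_entropy(predictions[utterance]):.3f}  {utterance}")
```

An utterance whose probability mass is spread almost evenly across intents ranks first, while a confidently classified one ranks last. Clustering the top of such a ranking is what makes new intents more likely to appear.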
If you have an existing intent and need to identify specialized versions of it, clustering can help. For example, if you have an intent about "Loans", you may want to create specialized child intents that target "Mortgages", "Student loans" or "Car loans".
First, go to the Labeled data view, select the intent you want to specialize, and click Get clusters. You'll obtain a list of clusters matching that intent, which should help identify specialized sub-intents that need to be created. Using the clustering Granularity control (see below) can further improve the results.
We expose various controls to help you find the types of clusters you need.
Use these before clicking on Get clusters:
- Keyword search can help pinpoint topics.
- Minimum cluster size can help weed out clusters that may not have enough data to justify the creation of an intent.
- Granularity determines how specific your clusters will be. High granularity produces very specific clusters.
Continuing with the same "Loans" example: if you're getting clusters about Loans, but want to see clusters about Mortgages, Student loans and Car loans, granularity will help delineate those different clusters.
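The interplay between granularity and minimum cluster size can be sketched with a toy example. This is not the product's clustering algorithm, which is not described here. It is a deliberately simple stand-in that links utterances whose word overlap (Jaccard similarity) reaches a threshold, then drops groups that are too small. The `cluster` function, its parameters and the sample utterances are all hypothetical, and mapping "granularity" to a similarity threshold is an assumption.

```python
from itertools import combinations


def jaccard(a, b):
    """Word-overlap similarity between two token sets, from 0.0 to 1.0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster(utterances, granularity, min_cluster_size=1):
    """Group utterances whose similarity is at least `granularity`.

    Assumption: higher granularity means a stricter similarity threshold,
    which yields smaller, more specific clusters.
    """
    tokens = [set(u.lower().split()) for u in utterances]
    parent = list(range(len(utterances)))

    def find(i):
        # Union-find lookup with path compression.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Link every pair that is similar enough (single-linkage grouping).
    for i, j in combinations(range(len(utterances)), 2):
        if jaccard(tokens[i], tokens[j]) >= granularity:
            parent[find(i)] = find(j)

    groups = {}
    for i, utterance in enumerate(utterances):
        groups.setdefault(find(i), []).append(utterance)

    # Drop clusters too small to justify creating an intent.
    return [g for g in groups.values() if len(g) >= min_cluster_size]


utterances = [
    "mortgage loan for my house",
    "mortgage loan for a house",
    "student loan for my tuition",
    "student loan for college tuition",
    "car loan",
]

coarse = cluster(utterances, granularity=0.15, min_cluster_size=2)
fine = cluster(utterances, granularity=0.5, min_cluster_size=2)

print(len(coarse), "cluster(s) at low granularity")
print(len(fine), "cluster(s) at high granularity")
```

With the low threshold, every utterance ends up in a single broad "Loans" group. With the higher threshold, the mortgage and student loan utterances separate into their own clusters, and the lone "car loan" utterance is filtered out by the minimum cluster size.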