"Data point" is the term used to refer to data that has been uploaded to HumanFirst. A data point represents a single utterance, which is an interactive piece of your data. Utterances are what you will manipulate most in HumanFirst.
The image below shows 4 utterances.
In practice, data points will often resemble phrases.
However, because data can be uploaded to HumanFirst in a variety of formats (utterance lists, conversation logs, documents, NLU models, ...), each format is treated differently.
When dealing with conversation logs, each input of the conversation is treated as a data point (even something as short as "Yes" or "Thank you"). When dealing with larger documents, HumanFirst provides a post-processing step called sentence splitting, which breaks the document down into smaller semantic pieces. Those pieces are not always delimited by punctuation; a single phrase may be broken down into multiple data points, depending on its content and structure.
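As a rough illustration of the conversation-log rule above, the sketch below counts one data point per conversation input. The log structure and the field names `speaker` and `text` are assumptions made for this example; they are not a HumanFirst upload format.

```python
# Hypothetical conversation log: a list of conversations, each a list of inputs.
# The field names ("speaker", "text") are illustrative, not a HumanFirst format.
conversations = [
    [
        {"speaker": "agent", "text": "Would you like to check your balance?"},
        {"speaker": "client", "text": "Yes"},
        {"speaker": "agent", "text": "Your balance is $42."},
        {"speaker": "client", "text": "Thank you"},
    ],
    [
        {"speaker": "client", "text": "I lost my card"},
    ],
]


def count_data_points(conversations):
    """Count one data point per conversation input, however short."""
    return sum(len(conversation) for conversation in conversations)


print(count_data_points(conversations))  # 5
```

Short inputs such as "Yes" and "Thank you" each count as one data point, the same as a longer input.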
We provide a data usage report to help you understand where your data points come from. This report counts all of your data points across all your workspaces and namespaces.
To access the data usage report:
- Navigate to the Data management section.
- Click on the (i) icon beside your data point usage meter at the top right of the screen.
An example data usage report.
There are 2 sources of data points: conversation sets and workspaces.
Datasets (Conversation sets)
Conversation sets are collections of data that can be reused across workspaces. When you upload data to HumanFirst you add that data to a conversation set. You can then link the conversation set to one or more workspaces to interact with that data.
Counting data points from conversation sets is as straightforward as summing all the data points in each conversation set.
Workspaces are projects that access data via conversation sets (what we call linked data). They also contain NLU models, whose intents hold labeled data, and that labeled data can affect your total data points.
Let's break that down.
- Linking a conversation set to a workspace grants the workspace access to the data within the conversation set. This does not affect your total data points.
- Labeling data linked to your workspace will not affect your total data points as long as the conversation set remains linked to the workspace.
- If you unlink a conversation set from a workspace, any data from that conversation set that has been labeled will increase your data point total, because that data now lives both in the conversation set it originated from and in your workspace's labeled data, where it has been copied. (This is by design; otherwise, unlinking data would completely break a workspace.)
- Likewise, uploading an existing workspace will increase your total data points by the amount of labeled data in the uploaded workspace.
HumanFirst tries to keep your data point count as low as possible by enabling the reuse of uploaded data. This is done by managing collections of data called conversation sets. Those conversation sets can be linked to workspaces to enable interactions with their data.
As long as you link conversation sets to your workspaces (and keep them linked!), counting data points is simple: it will always be the sum of all data points in your conversation sets.
If you upload workspaces, your total data points will increase by the amount of labeled data contained in the uploaded workspaces.
If you find yourself linking, labeling and unlinking data from workspaces, then things get more complicated. In that situation, we take the sum of all labeled data that is no longer linked to its workspace and add that to your total data points from conversation sets.
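The counting rules above can be sketched as a small model. This is a simplified illustration, not HumanFirst's implementation: the `ConversationSet` and `Workspace` classes, their fields, and `total_data_points` are hypothetical names invented for this example. How the count behaves when a conversation set is linked again after being unlinked is not described here, so the sketch should not be read as covering that case.

```python
from dataclasses import dataclass, field


@dataclass
class ConversationSet:
    name: str
    data_points: int  # utterances uploaded to this conversation set


@dataclass
class Workspace:
    name: str
    # Names of the conversation sets currently linked to this workspace.
    linked: set = field(default_factory=set)
    # Source conversation set name -> number of labeled data points from it.
    labeled: dict = field(default_factory=dict)
    # Labeled data points that arrived by uploading an existing workspace.
    uploaded_labeled: int = 0

    def extra_data_points(self) -> int:
        """Labeled data counted in addition to the conversation sets."""
        unlinked = sum(
            count
            for source, count in self.labeled.items()
            if source not in self.linked
        )
        return unlinked + self.uploaded_labeled


def total_data_points(conversation_sets, workspaces) -> int:
    """Sum of conversation set data plus labeled data that is no longer linked."""
    from_sets = sum(cs.data_points for cs in conversation_sets)
    from_workspaces = sum(ws.extra_data_points() for ws in workspaces)
    return from_sets + from_workspaces


sets = [ConversationSet("support_logs", 1000), ConversationSet("faq", 200)]
bot = Workspace("bot", linked={"support_logs", "faq"}, labeled={"support_logs": 150})

print(total_data_points(sets, [bot]))  # 1200: all labeled data is still linked

bot.linked.discard("support_logs")     # unlink a set that has labeled data
print(total_data_points(sets, [bot]))  # 1350: the 150 labeled points are now also counted in the workspace
```

In this model, linking and labeling never change the total; only unlinking a conversation set that has labeled data, or uploading a workspace, adds to it.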