Data management

Datasets

Datasets help you manage your uploaded data. You can think of them as folders containing uploaded files. Both utterances and conversations can be uploaded to datasets.

About data de-duplication

Utterances are de-duplicated within a dataset. This means that only one copy of each unique utterance is made available for labeling when you link a dataset to a workspace. De-duplication does not apply across datasets, however: if you link multiple datasets to a workspace and the same utterance exists in more than one of them, the duplicates will appear in your workspace.
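The behavior described above can be illustrated with a small Python sketch. This is a conceptual model only, not the product's implementation: the `dedupe` helper and the sample utterances are hypothetical, and exact string matching is an assumption, since the rule used to decide that two utterances are "the same" is not specified here.

```python
from collections import Counter


def dedupe(utterances):
    """Keep the first occurrence of each utterance, preserving order.

    Assumption: two utterances are duplicates when their text matches exactly.
    """
    return list(dict.fromkeys(utterances))


# Two hypothetical datasets. Each one is de-duplicated on its own.
dataset_a = dedupe(["reset my password", "reset my password", "cancel my order"])
dataset_b = dedupe(["cancel my order", "where is my refund"])

# Linking both datasets to one workspace combines their contents
# without de-duplicating across datasets.
workspace_view = dataset_a + dataset_b

# Utterances that appear more than once in the workspace came from
# different datasets.
counts = Counter(workspace_view)
cross_dataset_duplicates = [text for text, n in counts.items() if n > 1]
print(cross_dataset_duplicates)
```

In this sketch, "cancel my order" appears once in each dataset, so it shows up twice in the combined workspace view. If that is undesirable, one option is to consolidate the data into a single dataset before linking.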

Linking data to workspaces

When you add data to a workspace, you are in fact telling the workspace to look at the data in one or more datasets. We call each of these a "linked dataset". This means that multiple workspaces can work on the same data. It also means that adding or removing data from a dataset will affect the data available to all workspaces linked to that dataset.

Finally, workspaces provide structure to data, but changes made within a workspace do not affect the linked datasets or their data.
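One way to picture the linking model is as workspaces holding references to datasets rather than copies of them. The sketch below is a minimal illustration under that assumption. The `Dataset` and `Workspace` classes, their methods, and the `labels` mapping are all hypothetical names chosen for this example and do not correspond to any actual API.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Dataset:
    name: str
    utterances: list[str] = field(default_factory=list)


@dataclass
class Workspace:
    name: str
    # References to datasets, not copies of their data.
    linked: list[Dataset] = field(default_factory=list)
    # Structure that lives only in the workspace (hypothetical: utterance -> label).
    labels: dict[str, str] = field(default_factory=dict)

    def link(self, dataset: Dataset) -> None:
        self.linked.append(dataset)

    def data(self) -> list[str]:
        """Return the utterances currently visible through all linked datasets."""
        return [u for ds in self.linked for u in ds.utterances]

    def label(self, utterance: str, name: str) -> None:
        """Record a label in the workspace only; the dataset is not modified."""
        self.labels[utterance] = name


support = Dataset("support", ["reset my password"])
ws_one = Workspace("workspace-one")
ws_two = Workspace("workspace-two")
ws_one.link(support)
ws_two.link(support)

# Adding data to the dataset changes what every linked workspace sees.
support.utterances.append("cancel my order")

# Labeling inside one workspace leaves the dataset and other workspaces untouched.
ws_one.label("reset my password", "account_access")
```

After this runs, both workspaces see the two utterances in `support`, while the label exists only in `ws_one`. The dataset itself holds no trace of the labeling.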