CLI data management
Managing datasets
As documented in the Data Management section, datasets organize unlabelled data files. A dataset can be seen as a folder of uploaded files that can be linked to any workspace in a namespace.
All dataset commands are available via the data sub-commands of hf.
Execute hf data --help to see the list of available sub-commands.
Listing
To list datasets, execute hf data sets list.
The --sources and --workspaces options can be used to display the unique identifiers of the linked data sources and workspaces.
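As a minimal sketch, the listing command and its two options can be wrapped in a shell function for use in scripts. The function name `hf_list_datasets_with_links` is hypothetical, and the sketch assumes that both options can be combined in a single call. The output format is not assumed.

```shell
# Hypothetical wrapper; only the hf sub-command and the two options
# are taken from the documentation above.
hf_list_datasets_with_links() {
  # Show each dataset together with the unique identifiers of its
  # linked data sources and workspaces.
  # Assumption: --sources and --workspaces can be passed together.
  hf data sets list --sources --workspaces
}
```

If the options turn out not to combine, each can be passed on its own in a separate call.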
Creation
To create a new dataset, execute hf data sets create <name of the dataset>.
To link this dataset to a workspace, consult the workspaces documentation.
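The creation command can be sketched as a small helper that checks its argument first. The name `hf_create_dataset` is hypothetical, and whether dataset names may contain spaces is not stated here. The sketch only makes sure that whatever name is given reaches `hf` as a single argument.

```shell
# Hypothetical helper around the documented create command.
hf_create_dataset() {
  if [ "$#" -ne 1 ] || [ -z "$1" ]; then
    echo "usage: hf_create_dataset <name of the dataset>" >&2
    return 2
  fi
  # Quoting keeps a name with spaces together as one argument.
  hf data sets create "$1"
}
```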
Deletion
To delete a dataset, execute hf data sets --dataset <id of the dataset> delete.
A dataset that is still linked to a workspace cannot be deleted. To find the linked workspaces, run the list command with --workspaces, then unlink the dataset with the unlink command of the workspace data sub-command.
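The two steps of a deletion can be sketched as a pair of shell helpers. Both function names are hypothetical. The unlink step is left out on purpose, because its exact syntax is covered in the workspaces documentation rather than here.

```shell
# Step 1 (hypothetical helper): list datasets with the IDs of their
# linked workspaces, so that remaining links can be spotted.
hf_show_dataset_links() {
  hf data sets list --workspaces
}

# Step 2 (hypothetical helper): delete a dataset by its ID.
# --dataset comes before the delete sub-command, as in the
# command shown above.
hf_delete_dataset() {
  if [ "$#" -ne 1 ] || [ -z "$1" ]; then
    echo "usage: hf_delete_dataset <id of the dataset>" >&2
    return 2
  fi
  hf data sets --dataset "$1" delete
}
```

Running `hf_show_dataset_links` first makes it easier to confirm that the dataset is no longer linked before `hf_delete_dataset` is called.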
Linking to workspaces
Refer to the workspaces documentation for more information on how to link datasets to workspaces and how to unlink them.
Managing files
File formats
Different file formats can be uploaded into a dataset, as documented in the Data Management section.
The CLI supports every file format that Studio supports. Each format is referenced by its shorthand name:
| Shorthand name | Description |
|---|---|
| json | HumanFirst JSON format |
| txt | Utterance text format |
| csv | Simple conversation CSV format |
Validating
Before uploading a file, its format can be validated using the hf data validate [--format format] <filename> command.
Refer to the formats section for more information about the supported file formats.
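The validation command, with its optional `--format`, can be sketched as a helper that passes the option only when a format is given. The name `hf_validate_file` is hypothetical. How `hf` reports an invalid file is not described here, so the helper returns whatever status `hf` returns.

```shell
# Hypothetical helper: validate a file, optionally naming its format
# by one of the shorthand names (json, txt, csv).
hf_validate_file() {
  if [ "$#" -lt 1 ] || [ -z "$1" ]; then
    echo "usage: hf_validate_file <filename> [format]" >&2
    return 2
  fi
  if [ -n "${2:-}" ]; then
    # Format given explicitly.
    hf data validate --format "$2" "$1"
  else
    # No format given: --format is optional in the command above.
    hf data validate "$1"
  fi
}
```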
Listing
To list files in a dataset, execute hf data sets --dataset <id of the dataset> files list.
Importing
To import a file into a dataset, execute hf data sets --dataset <id of the dataset> files import [--format format] [filename].
Refer to the formats section for more information about the supported file formats.
The file content must be encoded in UTF-8. It can be passed either as a filename or on stdin.
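The two ways of passing content can be sketched as follows. Both function names are hypothetical. The sketch always passes `--format` explicitly, although the option is optional in the command above.

```shell
# Hypothetical helper: import a file by name.
hf_import_file() {
  if [ "$#" -ne 3 ]; then
    echo "usage: hf_import_file <id of the dataset> <format> <filename>" >&2
    return 2
  fi
  hf data sets --dataset "$1" files import --format "$2" "$3"
}

# Hypothetical helper: import content piped in on stdin.
hf_import_stdin() {
  if [ "$#" -ne 2 ]; then
    echo "usage: <producer> | hf_import_stdin <id of the dataset> <format>" >&2
    return 2
  fi
  # No filename argument, so the content is read from stdin.
  hf data sets --dataset "$1" files import --format "$2"
}
```

The stdin form is convenient when the content has to be converted first. For instance, a Latin-1 file could be re-encoded on the fly with `iconv -f ISO-8859-1 -t UTF-8 legacy.txt | hf_import_stdin <id of the dataset> txt`, where `legacy.txt` is a placeholder name.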
Deleting
To delete a file from a dataset, execute hf data sets --dataset <id of the dataset> files delete <filename>.
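Listing and deleting files go together naturally, since the list shows which filenames exist. A minimal sketch, with hypothetical function names and no assumption about the output format of the list command:

```shell
# Hypothetical helper: list the files of a dataset.
hf_list_files() {
  if [ "$#" -ne 1 ] || [ -z "$1" ]; then
    echo "usage: hf_list_files <id of the dataset>" >&2
    return 2
  fi
  hf data sets --dataset "$1" files list
}

# Hypothetical helper: delete one file from a dataset by filename.
hf_delete_file() {
  if [ "$#" -ne 2 ] || [ -z "$2" ]; then
    echo "usage: hf_delete_file <id of the dataset> <filename>" >&2
    return 2
  fi
  # Quoting keeps a filename with spaces together as one argument.
  hf data sets --dataset "$1" files delete "$2"
}
```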