# CLI data management

## Managing datasets

As documented in the Data Management section, datasets help organize unlabelled data files. They can be thought of as folders of uploaded files that can be linked to any workspace in a namespace.

All dataset commands are available via the `data` sub-commands of `hf`. Execute `hf data --help` to see the list of available sub-commands.
### Listing

To list datasets, execute `hf data sets list`. The `--sources` and `--workspaces` options can be used to display the unique identifiers of the linked data sources and workspaces.
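
As a minimal sketch, the listing command with both options could be wrapped in a small shell function. The function name is hypothetical, and passing the options after `list` is an assumption; `hf data --help` is the place to confirm the exact placement.

```shell
# Hypothetical helper: list datasets together with the identifiers of
# their linked data sources and workspaces.
# Assumption: --sources and --workspaces are accepted after `list`.
list_datasets_with_links() {
  hf data sets list --sources --workspaces
}
```

Calling `list_datasets_with_links` then runs the listing with both options enabled.
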
### Creation

To create a new dataset, execute `hf data sets create <name of the dataset>`. To link this dataset to a workspace, consult the workspaces documentation.
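
The sketch below wraps dataset creation in a shell function, mainly to show the quoting needed when a name contains spaces. The function name is hypothetical.

```shell
# Hypothetical helper: create a dataset with the given name.
# Quoting "$1" keeps a name that contains spaces as a single argument.
create_dataset() {
  hf data sets create "$1"
}
```

For example, `create_dataset "Support chats"` passes the whole name as one argument.
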
### Deletion

To delete a dataset, execute `hf data sets --dataset <id of the dataset> delete`.

A dataset that is still linked to a workspace cannot be deleted. Use the list command with `--workspaces` to see which workspaces are linked to the dataset, then use the unlink command of the workspace data sub-command.
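
One way to combine the two steps above is a helper that prints the linked workspaces and asks for confirmation before deleting. This is only a sketch: the function name and the prompt are hypothetical, placing `--workspaces` after `list` is an assumption, and the unlink command is left out because its syntax is covered in the workspaces documentation.

```shell
# Hypothetical helper: $1 is the id of the dataset to delete.
delete_dataset() {
  # Show which workspaces are still linked before attempting the delete.
  # Assumption: --workspaces is accepted after `list`.
  hf data sets list --workspaces || return 1

  # Ask for confirmation on stderr so stdout only carries hf output.
  printf 'Delete dataset %s? [y/N] ' "$1" >&2
  read -r answer
  if [ "$answer" = "y" ]; then
    hf data sets --dataset "$1" delete
  else
    echo "Aborted." >&2
    return 1
  fi
}
```

If the dataset is still linked, the delete step is expected to fail, so unlink it first and run the helper again.
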
### Linking to workspaces

Refer to the workspaces documentation for more information on how to link datasets to workspaces and unlink them.
## Managing files

### File formats

Different file formats can be uploaded into a dataset, as documented in the Data Management section. The CLI supports all file formats supported by Studio, and they can be referenced by their shorthand name:

| Shorthand name | Description |
|---|---|
| json | HumanFirst JSON format |
| txt | Utterance text format |
| csv | Simple conversation CSV format |
### Validating

Before uploading a file, its format can be validated using the `hf data validate [--format format] <filename>` command. Refer to the formats section for more information about the supported file formats.
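
As an illustration, validation with an explicit format could be wrapped as follows. The function name is hypothetical; the format argument is one of the shorthand names listed in the file formats table.

```shell
# Hypothetical helper: validate a file before uploading it.
# $1 is the shorthand format name (json, txt or csv), $2 the file path.
validate_file() {
  hf data validate --format "$1" "$2"
}
```

For example, `validate_file csv conversations.csv` checks a CSV file, where `conversations.csv` is a placeholder file name.
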
### Listing

To list files in a dataset, execute `hf data sets --dataset <id of the dataset> files list`.
### Importing

To import a file into a dataset, execute `hf data sets --dataset <id of the dataset> files import [--format format] [filename]`. Refer to the formats section for more information about the supported file formats.

The file content must be encoded in UTF-8 and can be passed via filename or stdin.
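
The two ways of passing content could be sketched as a pair of shell functions. The function names are hypothetical, and it is an assumption that omitting the filename makes the command read from stdin, based on the filename being optional in the synopsis.

```shell
# Hypothetical helper: import from a file path.
# $1 is the dataset id, $2 the shorthand format name, $3 the file path.
import_file() {
  hf data sets --dataset "$1" files import --format "$2" "$3"
}

# Hypothetical helper: import content piped in on stdin.
# Assumption: with no filename, the command reads the content from stdin.
# $1 is the dataset id, $2 the shorthand format name.
import_stdin() {
  hf data sets --dataset "$1" files import --format "$2"
}

# Example (legacy.txt and its ISO-8859-1 encoding are placeholders):
#   iconv -f ISO-8859-1 -t UTF-8 legacy.txt | import_stdin <id of the dataset> txt
```

The commented `iconv` pipeline shows one way to meet the UTF-8 requirement for a file that is stored in another encoding.
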
### Deleting

To delete a file from a dataset, execute `hf data sets --dataset <id of the dataset> files delete <filename>`.
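
Listing and deleting files are often used together, for instance to confirm the exact file name before removing it. A minimal sketch with hypothetical function names:

```shell
# Hypothetical helper: list the files of a dataset. $1 is the dataset id.
list_files() {
  hf data sets --dataset "$1" files list
}

# Hypothetical helper: delete one file from a dataset.
# $1 is the dataset id, $2 the file name as shown by list_files.
delete_file() {
  hf data sets --dataset "$1" files delete "$2"
}
```

Running `list_files` first and copying the name from its output avoids typos in the argument to `delete_file`.
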