HumanFirst JSON format
HumanFirst provides a rich JSON format for importing unlabeled data and for importing and exporting labelled data. The format is designed to expose all the features HumanFirst provides, so that data can be exchanged between HumanFirst and other tools with high fidelity.
Details of the format are described in the format specification section. A JSON schema can also be downloaded here.
Workspace details examples
The details of a workspace can be imported using the following format:
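As a minimal sketch (not the official example), the following Python builds a Workspace object that carries only workspace details and serializes it to JSON. The values are hypothetical; only the field names (name, description, color, metadata, examples) come from the Workspace object table below. Because examples is marked as required, the sketch assumes an empty array is acceptable when only details are imported.

```python
import json

# Hypothetical workspace details; only the field names come from the
# Workspace object table in the format specification.
workspace = {
    "name": "Customer support",
    "description": "Utterances collected from the support chat",
    "color": "#133337",
    "metadata": {"team": "support"},
    # "examples" is listed as required in the Workspace object table;
    # an empty array is assumed to be acceptable here.
    "examples": [],
}

print(json.dumps(workspace, indent=2))
```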
Unlabeled data examples
Utterances
Independent utterances can be imported using the following format:
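A minimal sketch of such a payload, generated in Python for illustration: each utterance becomes an Example object with the required text field and an optional id. The utterance texts and the utt-0001 id scheme are hypothetical.

```python
import json

# Hypothetical independent utterances: each one is an Example object
# that only needs the required "text" field; "id" is optional.
utterances = [
    "I want to reset my password",
    "How do I change my billing address?",
    "Cancel my subscription",
]

workspace = {
    "examples": [
        {"id": f"utt-{i:04d}", "text": text}
        for i, text in enumerate(utterances, start=1)
    ]
}

print(json.dumps(workspace, indent=2))
```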
Conversations
The JSON format supports importing conversation data by defining a common context on all examples that belong to the same conversation. In the following example, notice the context.context_id field on the examples of the conversation uniquely identified by conv-0001. To keep the utterances of the conversation in the right order, the timestamp of each utterance is encoded in the created_at field.
Example:
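A minimal sketch of a conversation payload, written in Python for illustration: two utterances share the context_id conv-0001, declare the context type "conversation" and a role, and carry RFC 3339 created_at timestamps for ordering. The turn texts, example ids, and timestamps are hypothetical; the field names and allowed values come from the Example and ExampleContext object tables.

```python
import json
from datetime import datetime, timedelta, timezone

# Hypothetical two-turn conversation; texts and roles are illustrative only.
turns = [
    ("client", "Hi, my order hasn't arrived yet"),
    ("expert", "Sorry to hear that, can you share the order number?"),
]

start = datetime(2022, 1, 1, 12, 0, 0, tzinfo=timezone.utc)

examples = []
for i, (role, text) in enumerate(turns):
    examples.append({
        "id": f"conv-0001-{i}",
        "text": text,
        # The same context_id on every utterance groups them into one conversation.
        "context": {"context_id": "conv-0001", "type": "conversation", "role": role},
        # RFC 3339 timestamp (UTC offset included) used to order the utterances.
        "created_at": (start + timedelta(seconds=30 * i)).isoformat(),
    })

print(json.dumps({"examples": examples}, indent=2))
```

Using a timezone-aware datetime matters here: isoformat() on a naive datetime omits the offset, which RFC 3339 requires.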
Tags and metadata
The JSON format can be used to import utterances with tags and metadata. Metadata annotates an utterance with arbitrary key-value information. Tags can be used to filter utterances in Studio for different purposes.
Note: When importing unlabeled data with tags, tags with the same names as those specified on the examples must be created in the destination workspace beforehand. If tags were missing before the import, they can be created afterwards, but a new workspace revision must then be created so that the data is reprocessed.
Example:
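A minimal sketch of an utterance with tags and metadata, built in Python for illustration. The tag name and metadata keys are hypothetical; per the note above, a tag named "escalated" would need to exist in the destination workspace. Tags are TagReference objects (referenced here by name), and metadata is a map of string keys to string values.

```python
import json

# Hypothetical tag and metadata values. The tag named "escalated" is
# assumed to already exist in the destination workspace.
example = {
    "id": "utt-0042",
    "text": "I need to speak to a manager",
    # TagReference objects may reference a tag by id or by name.
    "tags": [{"name": "escalated"}],
    # Metadata is a map<string, string>, so every value is a string.
    "metadata": {"channel": "chat", "customer_tier": "gold"},
}

print(json.dumps({"examples": [example]}, indent=2))
```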
Streaming (JSONL)
When using the JSON format with unlabeled data, examples can be defined in a streaming fashion instead of as a single JSON object. In this format, a separate Workspace object is written on each line.
Consult the JSON Lines specification for more information.
Example:
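A minimal sketch of producing such a stream in Python: each batch of utterances becomes its own Workspace object, serialized on a single line. The batching and the to_jsonl helper name are hypothetical; json.dumps without indent never emits raw newlines (newlines inside strings are escaped), which keeps each object on one line as JSON Lines requires.

```python
import json

# Hypothetical utterances split into batches; each batch becomes one
# Workspace object written on its own line.
batches = [
    ["I want to reset my password"],
    ["Cancel my subscription", "Where is my order?"],
]

def to_jsonl(batches):
    lines = []
    for batch in batches:
        workspace = {"examples": [{"text": text} for text in batch]}
        # Compact separators; json.dumps without indent stays on one line.
        lines.append(json.dumps(workspace, separators=(",", ":")))
    # JSON Lines: one object per line, newline-terminated.
    return "\n".join(lines) + "\n"

jsonl = to_jsonl(batches)
print(jsonl, end="")
```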
Note: Optionally, the schema can be specified on each line.
Labelled data examples
When using the JSON format for labelled data, all objects (training phrases, intents, entities and tags) of a workspace can be encoded in a single Workspace object.
Note: It is recommended to use examples.parts rather than examples.entities to simplify entity annotations. If examples.parts is specified, the examples.entities field is ignored on import; on export, the system reports both examples.parts and examples.entities.
Example:
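A minimal sketch of a labelled workspace, built in Python for illustration: one tag, one intent, one entity with a value and synonyms, and one training phrase whose entity is annotated through parts, as the note above recommends. All ids, names, and texts are hypothetical; the field names follow the Workspace, Tag, Intent, Entity, EntityValue, Example, ExampleIntent, TextPart, and EntityReference tables.

```python
import json

# Hypothetical labelled workspace with one tag, one intent and one entity.
workspace = {
    "tags": [{"id": "tag-billing", "name": "billing", "color": "red"}],
    "intents": [{"id": "intent-change-plan", "name": "change_plan",
                 "tags": [{"id": "tag-billing"}]}],
    "entities": [{
        "id": "entity-plan",
        "name": "plan",
        "values": [{
            "id": "plan-premium",
            "key_value": "premium",
            "synonyms": [{"value": "premium"}, {"value": "gold"}],
        }],
    }],
    "examples": [{
        "id": "ex-1",
        "text": "switch me to the gold plan",
        "intents": [{"intent_id": "intent-change-plan", "negative": False}],
        # Parts are concatenated to form the text; the entity is annotated
        # inline, so no span is needed on the EntityReference.
        "parts": [
            {"text": "switch me to the "},
            {"entity": {"entity_id": "entity-plan", "value_id": "plan-premium",
                        "text": "gold"}},
            {"text": " plan"},
        ],
    }],
}

print(json.dumps(workspace, indent=2))
```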
Format specification
A JSON schema can also be downloaded here.
Workspace object
Field | Description | Required | Format |
---|---|---|---|
examples | An array of examples/utterances | yes | Array of Example objects |
intents | An array of intents (only for labelled data) | no | Array of Intent objects |
entities | An array of entities (only for labelled data) | no | Array of Entity objects |
tags | An array of tags (only for labelled data) | no | Array of Tag objects |
name | Name of the workspace | no | string |
metadata | Metadata of the workspace | no | map<string, string> |
description | Description of the workspace | no | string |
color | Color of the workspace in the UI | no | string |
Example object
Field | Description | Required | Format |
---|---|---|---|
id | Unique identifier for the example. The identifier will be translated into a deterministic internal id at import time. | no | string |
text | The text of the example. | yes | string |
intents | Optional array of intents in which this example is classified. (only for labelled data) | no | Array of ExampleIntent objects |
context | Optional context information about where this example has occurred. This is used to group examples that have occurred in a common context (ex: all examples of a single conversation). (only for unlabeled data) | no | ExampleContext object |
tags | Optional array of tags (see the note in the Tag object section for usage with unlabeled data) | no | Array of TagReference objects |
entities | Optional array of entity annotations in the example. Use parts to simplify entity annotations. If parts are specified, the entities field is ignored. (only for labelled data) | no | Array of EntityReference objects |
parts | If the example contains entities, this field contains parts of the text and the entities. The parts are concatenated to form the final text. Parts are provided to ease entity annotations. If provided at import, this will take precedence over the entities field. (only for labelled data) | no | Array of TextPart objects |
source | Optional information on the source of the example. | no | ExampleSource object |
metadata | Optional key:value data associated with the example. | no | map<string, string> |
created_at | Optional timestamp of the example. This is useful to order the utterances when the example is part of a conversation. | no | RFC 3339 format (ex: 2006-01-02T15:04:05Z or 2006-01-02T15:04:05-07:00) |
Tag object
Note: When importing unlabeled data, examples are assigned the specified tags only if the destination workspace already has tags with matching names. Otherwise, the examples won't appear tagged in the Data section. Create the tags in the workspace before importing unlabeled data. You can also create the tags after the import, but a new revision must then be created to reprocess the data.
Field | Description | Required | Format |
---|---|---|---|
id | Unique identifier of the tag. Examples can reference the tag by this identifier or by its name. The identifier will be translated to a deterministic internal id at import time. | no | string |
name | Name of the tag | yes | string |
description | Optional description of the tag | no | string |
color | Optional color of the tag used in Studio. Any CSS color format is supported (ex: red, #133337, etc.) | no | string |
Intent object
Field | Description | Required | Format |
---|---|---|---|
id | Optional unique identifier of the intent. Examples can reference the intent by this identifier or by its name. The identifier will be translated to a deterministic internal id at import time. | false | string |
name | Name of the intent. | true | string |
parent_intent_id | Optional unique identifier of the parent intent if the intent is part of a hierarchy. | false | string |
source | Optional information on the source of the intent. | false | IntentSource object |
description | Optional description of the intent. | false | string |
tags | Optional list of tags for the intent. | false | Array of TagReference objects |
metadata | Optional key-value metadata of the intent. | false | map<string, string> |
IntentSource object
Field | Description | Required | Format |
---|---|---|---|
source_id | Optional unique identifier of the intent in the source system. | false | string |
metadata | Optional extra metadata about the intent in the source system. | false | map<string, string> |
ExampleIntent object
Field | Description | Required | Format |
---|---|---|---|
intent_id | Unique identifier of the intent. The identifier will be translated into a deterministic internal id at import time. | true | string |
negative | Indicates whether the example is a negative training example for the intent. | true | boolean |
EntityReference object
Field | Description | Required | Format |
---|---|---|---|
entity_id | Optional unique identifier of the entity. The identifier will be translated into a deterministic internal id at import time. If not specified, the name needs to be specified. | false | string |
name | Optional name of the entity. If not specified, entity_id needs to be specified. | false | string |
text | Text used in the reference of the entity in the example. | true | string |
span | Span of the entity in the example. If the reference is provided through a TextPart, the span is not required, since the parts already convey the text and the entities. | false | AnnotationSpan object |
value | Optional entity value of the entity. This corresponds to the key_value field of the EntityValue object. | false | string |
value_id | Optional unique identifier of the entity value, as defined in the EntityValue object. | false | string |
role | Optional role of the entity in the example if there is more than one present. | false | string |
ExampleContext object
Field | Description | Required | Format |
---|---|---|---|
context_id | Optional unique identifier of the context in which the example is used (ex: conversation id). The identifier will be translated into a deterministic internal id at import time. | false | string |
type | Optional type of the context in which the example is used. | false | "conversation", "utterance", "training_phrase" |
role | For conversation data, the role of the person who produced the utterance. | false | "client" or "expert" (agent) |
ExampleSource object
Field | Description | Required | Format |
---|---|---|---|
source_id | Unique identifier of the example in the source system. | false | string |
AnnotationSpan object
Field | Description | Required | Format |
---|---|---|---|
from_character | Inclusive start index of the span, expressed as a UTF-8 byte offset. | true | int |
to_character | Exclusive end index of the span, expressed as a UTF-8 byte offset. | true | int |
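Because the offsets are described as UTF-8 byte offsets, they differ from Python string indices as soon as the text contains non-ASCII characters. The sketch below assumes that byte-offset reading of the table; the utf8_span helper is hypothetical and simply locates the first occurrence of the entity text.

```python
# Assumes, per the AnnotationSpan table, that from_character/to_character
# are UTF-8 byte offsets (inclusive start, exclusive end).
def utf8_span(text: str, entity_text: str) -> dict:
    """Return the byte span of the first occurrence of entity_text in text."""
    char_start = text.index(entity_text)  # raises ValueError if absent
    # Convert the character index into a byte offset by encoding the prefix.
    byte_start = len(text[:char_start].encode("utf-8"))
    byte_end = byte_start + len(entity_text.encode("utf-8"))
    return {"from_character": byte_start, "to_character": byte_end}

text = "Réserver un vol pour Montréal"
span = utf8_span(text, "Montréal")
print(span)  # → {'from_character': 22, 'to_character': 31}
```

The character index of "Montréal" is 21, but the "é" in "Réserver" takes two bytes in UTF-8, shifting the byte offset to 22.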
Entity object
Field | Description | Required | Format |
---|---|---|---|
id | Optional unique identifier of the entity. The identifier will be translated into a deterministic internal id at import time. | false | string |
name | Name of the entity. | true | string |
values | Values of the entity. They represent the different instances of the entity. | true | Array of EntityValue objects |
is_regex | Indicates that the entity is of type regex. Values of the entity are regular expressions. | false | boolean |
system_type | Indicates that the entity is a pre-trained system entity (ex: time, date, etc.) | false | string |
tags | List of tags for the entity. | false | Array of TagReference objects |
metadata | Key-value metadata of the entity. | false | map<string, string> |
source | Information about the source of the entity. | false | EntitySource object |
settings | Information about the settings for the entity and how it interacts with other objects in the workspace. | false | EntitySettings object |
EntitySource object
Field | Description | Required | Format |
---|---|---|---|
source_id | Unique identifier of the entity in the source system. | false | string |
metadata | Extra metadata about the entity in the source system. | false | map<string, string> |
EntitySettings object
Field | Description | Required | Format |
---|---|---|---|
allowed_intent_ids | Intents that are allowed to be annotated with this entity. If empty, all intents are considered allowed. | false | Array of string |
denied_intent_ids | Intents that are not allowed to be annotated with this entity. If empty, no intents are considered denied. | false | Array of string |
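The two lists amount to a membership check with documented defaults for empty lists. The sketch below, using a hypothetical is_intent_allowed helper, encodes those defaults. The specification does not say how the lists interact when an intent appears in both, so the sketch assumes that a denial wins.

```python
# Sketch of how allowed_intent_ids / denied_intent_ids could be evaluated.
# Assumption: when an intent appears in both lists, the denial wins; the
# specification does not state how the two lists interact.
def is_intent_allowed(settings: dict, intent_id: str) -> bool:
    allowed = settings.get("allowed_intent_ids") or []
    denied = settings.get("denied_intent_ids") or []
    if intent_id in denied:
        return False
    # An empty allowed list means every intent is allowed.
    return not allowed or intent_id in allowed

print(is_intent_allowed({}, "intent-a"))
print(is_intent_allowed({"allowed_intent_ids": ["intent-a"]}, "intent-b"))
```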
EntityValue object
Field | Description | Required | Format |
---|---|---|---|
id | Optional unique identifier of the entity value. The identifier will be translated into a deterministic internal id at import time. Any references to this entity value will also be translated. | false | string |
key_value | Optional key value of the entity value. This corresponds to the main disambiguated value of the entity value. | false | string |
key_value_entities | If supported by the NLU engine, the entities referenced in key_value. Ex: Dialogflow composite entities are referenced here. If the key_value_parts field is provided at import, this field is ignored and built from key_value_parts. | false | Array of EntityReference objects |
key_value_parts | If supported by the NLU engine and key_value contains entity references, this field contains the parts of the text and the entities. The parts are concatenated to form the final key_value. Parts are provided to ease entity annotations. If provided at import, this field takes precedence over the key_value_entities field. | false | Array of TextPart objects |
language | Language of the entity value. This is in two-letter ISO 639-1 format. | false | string |
synonyms | Synonyms of the entity value. | false | Array of EntityValueSynonym objects |
source | Information about the source of the entity value. | false | EntityValueSource object |
EntityValueSynonym object
Field | Description | Required | Format |
---|---|---|---|
value | Value / text of the synonym. | true | string |
entities | Entities annotated in value. This is used when the entity is a composite entity that references other entities. | false | Array of EntityReference objects |
parts | Parts of the text and the entities constituting value. This is used when the entity is a composite entity that references other entities. The parts are concatenated to form the final value. Parts are provided to ease entity annotations. If provided at import, this field takes precedence over the entities field. | false | Array of TextPart objects |
EntityValueSource object
Field | Description | Required | Format |
---|---|---|---|
source_id | Optional unique identifier of the entity value in the source system. | false | string |
metadata | Extra metadata about the entity value in the source system. | false | map<string, string> |
TextPart object
Field | Description | Required | Format |
---|---|---|---|
text | Text of the part if not an entity. If not defined, the entity field needs to be defined. | no | string |
entity | If the part is an entity reference, reference to the entity. If not defined, text needs to be defined. | no | EntityReference object |
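Since parts are concatenated to form the final text, the equivalent entity annotations with spans can be derived from them. The sketch below uses a hypothetical parts_to_text_and_entities helper. It computes spans as UTF-8 byte offsets, following the AnnotationSpan table, and copies each EntityReference so the input parts are not mutated.

```python
# Hypothetical helper: rebuild the example text from TextPart objects and
# derive, for each entity part, an EntityReference with an AnnotationSpan.
# Spans are UTF-8 byte offsets, following the AnnotationSpan table.
def parts_to_text_and_entities(parts: list[dict]) -> tuple[str, list[dict]]:
    text = ""
    entities = []
    for part in parts:
        if "entity" in part:
            ref = dict(part["entity"])  # shallow copy; input stays untouched
            start = len(text.encode("utf-8"))
            text += ref["text"]
            ref["span"] = {"from_character": start,
                           "to_character": len(text.encode("utf-8"))}
            entities.append(ref)
        else:
            text += part["text"]
    return text, entities

text, entities = parts_to_text_and_entities([
    {"text": "book a flight to "},
    {"entity": {"name": "city", "text": "Paris"}},
])
print(text)
print(entities)
```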
TagReference object
Field | Description | Required | Format |
---|---|---|---|
id | Optional unique identifier of the tag. The identifier will be translated into a deterministic internal id at import time. If not specified, name needs to be specified. | false | string |
name | Name of the tag. If not specified, id needs to be specified. | false | string |
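TagReference and EntityReference share the same rule: each needs an identifier or a name, and an EntityReference also requires text. The sketch below shows a pre-import check for these rules. The reference_errors function and its error messages are hypothetical and not part of HumanFirst.

```python
# Sketch of a pre-import check for TagReference and EntityReference objects:
# each must carry an identifier or a name (EntityReference also needs text).
# The function name and error messages are hypothetical.
def reference_errors(kind: str, ref: dict) -> list[str]:
    errors = []
    id_field = "id" if kind == "tag" else "entity_id"
    if not ref.get(id_field) and not ref.get("name"):
        errors.append(f"{kind} reference needs '{id_field}' or 'name'")
    if kind == "entity" and "text" not in ref:
        errors.append("entity reference needs 'text'")
    return errors

print(reference_errors("tag", {"name": "billing"}))
print(reference_errors("entity", {"entity_id": "entity-plan"}))
```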