HumanFirst JSON format

HumanFirst exposes a rich JSON format for importing unlabeled data and for importing and exporting labelled data. The goal of the format is to expose all the features that HumanFirst provides and to make it possible to exchange data between HumanFirst and other tools with high fidelity.

Unlabeled data examples

Utterances

Independent utterances can be imported using the following format:

{
  "examples": [
    {
      "text": "where to report lost discover credit card"
    },
    {
      "text": "I lost my credit card"
    }
  ]
}
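
The structure above can be generated from a plain list of strings. The following is a minimal Python sketch, not an official HumanFirst client: the function name `build_unlabeled_payload` is hypothetical, and dropping blank strings is an assumption made here because `text` is a required field.

```python
import json


def build_unlabeled_payload(utterances):
    """Wrap raw utterance strings in the {"examples": [{"text": ...}]} shape."""
    # Assumption: blank entries are skipped, since "text" is required.
    examples = [{"text": u.strip()} for u in utterances if u and u.strip()]
    return {"examples": examples}


payload = build_unlabeled_payload([
    "where to report lost discover credit card",
    "I lost my credit card",
    "   ",  # blank entry, skipped
])
print(json.dumps(payload, indent=2))
```

The printed document has the same shape as the example above and can be saved to a file for import.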

Conversations

The JSON format supports importing conversation data by defining a common context on all examples that belong to the same conversation. In the following example, the context.context_id field is set to conv-0001 on every example of the conversation, which uniquely identifies it. To keep the utterances of the conversation in the right order, the timestamp of each utterance is encoded in the created_at field.

Example:

{
  "examples": [
    {
      "text": "What is the weather like in Montreal?",
      "context": {
        "context_id": "conv-0001",
        "type": "conversation",
        "role": "client"
      },
      "tags": [
        {
          "name": "montreal"
        }
      ],
      "metadata": {
        "location": "Montreal, QC, Canada"
      },
      "created_at": "2022-01-01T12:00:00Z"
    },
    {
      "text": "It is sunny in Montreal and the temperature is 22 degrees.",
      "context": {
        "context_id": "conv-0001",
        "type": "conversation",
        "role": "expert"
      },
      "created_at": "2022-01-01T12:00:05Z"
    },
    {
      "text": "Thank you!",
      "context": {
        "context_id": "conv-0001",
        "type": "conversation",
        "role": "client"
      },
      "tags": [
        {
          "name": "montreal"
        }
      ],
      "metadata": {
        "location": "Montreal, QC, Canada"
      },
      "created_at": "2022-01-01T12:00:10Z"
    }
  ]
}
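
As an illustration, a conversation like the one above could be assembled from an ordered list of turns. This is a sketch under stated assumptions: the helper `conversation_examples` and its `seconds_between_turns` parameter are hypothetical, and synthetic timestamps are only needed when the source data has no real ones.

```python
import json
from datetime import datetime, timedelta, timezone


def conversation_examples(conversation_id, turns, started_at, seconds_between_turns=5):
    """Build examples for one conversation; turns is a list of (role, text) pairs in order."""
    examples = []
    for index, (role, text) in enumerate(turns):
        if role not in ("client", "expert"):
            raise ValueError(f"unsupported role: {role!r}")
        # Assumption: evenly spaced synthetic timestamps, used only to preserve order.
        timestamp = started_at + timedelta(seconds=index * seconds_between_turns)
        examples.append({
            "text": text,
            "context": {
                "context_id": conversation_id,
                "type": "conversation",
                "role": role,
            },
            # RFC3339 timestamp in UTC, as in the example above.
            "created_at": timestamp.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        })
    return examples


examples = conversation_examples(
    "conv-0001",
    [
        ("client", "What is the weather like in Montreal?"),
        ("expert", "It is sunny in Montreal and the temperature is 22 degrees."),
        ("client", "Thank you!"),
    ],
    started_at=datetime(2022, 1, 1, 12, 0, 0, tzinfo=timezone.utc),
)
print(json.dumps({"examples": examples}, indent=2))
```

Pass a timezone-aware `started_at`; a naive datetime would be interpreted as local time by `astimezone`.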

Tags and metadata

The JSON format can be used to import utterances with tags and metadata. Metadata annotates an utterance with arbitrary key-value information. Tags can be used to filter utterances in Studio for different purposes.

Note: When importing unlabeled data with tags, tags with the same names as those specified on the examples must already exist in the destination workspace. If tags were missing at import time, they can be created afterwards, but a new workspace revision must then be created for the data to be reprocessed.

Example:

{
  "examples": [
    {
      "id": "phrase-4553MGLJMZFOLHAEBXCZ36IB",
      "text": "where to report lost discover credit card",
      "tags": [
        {
          "id": "tag-id",
          "name": "tag-C"
        }
      ],
      "metadata": {
        "key": "value"
      }
    }
  ]
}
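
Because tags must exist in the workspace before unlabeled data is imported, it can help to list the tag names a file references. The sketch below is illustrative only: `referenced_tag_names` is a hypothetical helper, and `existing_tags` stands in for whatever list of workspace tags you have obtained by other means.

```python
def referenced_tag_names(payload):
    """Return the sorted tag names referenced by the examples of a payload."""
    names = set()
    for example in payload.get("examples", []):
        for tag in example.get("tags", []):
            if tag.get("name"):
                names.add(tag["name"])
    return sorted(names)


payload = {
    "examples": [
        {
            "text": "where to report lost discover credit card",
            "tags": [{"id": "tag-id", "name": "tag-C"}],
            "metadata": {"key": "value"},
        }
    ]
}

# Hypothetical set of tag names already present in the destination workspace.
existing_tags = {"tag-A", "tag-B"}
missing = [name for name in referenced_tag_names(payload) if name not in existing_tags]
print(missing)  # tags to create before importing
```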

Streaming (JSONL)

When using the JSON format with unlabeled data, examples can be defined in a streaming fashion instead of defining all examples in a single JSON object. In this format, each line of the file is a complete Workspace object.

Consult the JSON Lines specification for more information.

Example:

{"examples": [{ "text": "Where can I report an issue?" }] }
{"examples": [{ "text": "I want to upgrade my service" }] }
{"examples": [{ "text": "Thanks, you've been very helpful!" }] }
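
A file in this shape can be produced by serializing one Workspace object per line. The following Python sketch assumes one example per line, as in the example above; `write_jsonl` is a hypothetical helper name.

```python
import io
import json


def write_jsonl(texts, stream):
    """Write one {"examples": [...]} object per line to a text stream."""
    for text in texts:
        # json.dumps never emits raw newlines, so each object stays on one line.
        line = json.dumps({"examples": [{"text": text}]}, ensure_ascii=False)
        stream.write(line + "\n")


buffer = io.StringIO()
write_jsonl(
    [
        "Where can I report an issue?",
        "I want to upgrade my service",
        "Thanks, you've been very helpful!",
    ],
    buffer,
)
lines = buffer.getvalue().splitlines()
print("\n".join(lines))
```

In practice the stream would be a file opened in text mode with UTF-8 encoding rather than an in-memory buffer.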

Labelled data examples

When using the JSON format for labelled data, all objects (training phrases, intents, entities and tags) of a workspace can be encoded in a single Workspace object.

Example:

{
  "intents": [
    {
      "id": "intent-OGCBT24IKBOGXM3C6O5BH5LX",
      "name": "card lost",
      "parent_intent_id": "intent-QARARGORRZL5BEFNKORO3EYK",
      "source": {},
      "tags": [
        {
          "id": "tag-LXLJPNCILNIDBA7WMFB2CHH5",
          "name": "live"
        }
      ],
      "metadata": {
        "key": "value"
      },
      "created_at": "2022-05-31T19:36:22Z",
      "updated_at": "2022-05-31T19:36:23Z"
    },
    {
      "id": "intent-QARARGORRZL5BEFNKORO3EYK",
      "name": "card issues",
      "source": {},
      "created_at": "2022-05-31T19:36:22Z",
      "updated_at": "2022-05-31T19:36:22Z"
    },
    {
      "id": "intent-SY4GNPR7TZJXFMBHAKBSPI4T",
      "name": "card stolen",
      "parent_intent_id": "intent-QARARGORRZL5BEFNKORO3EYK",
      "source": {},
      "created_at": "2022-05-31T19:36:22Z",
      "updated_at": "2022-05-31T19:36:22Z"
    },
    {
      "id": "intent-TMMYVTDJWJMSXLBF75HFMSDT",
      "name": "card damaged",
      "parent_intent_id": "intent-QARARGORRZL5BEFNKORO3EYK",
      "source": {},
      "created_at": "2022-05-31T19:36:22Z",
      "updated_at": "2022-05-31T19:36:22Z"
    }
  ],
  "examples": [
    {
      "id": "phrase-GEK6EVSPX5PQRDA2557HHXWR",
      "text": "where to report lost discover credit card",
      "intents": [
        {
          "intent_id": "intent-OGCBT24IKBOGXM3C6O5BH5LX"
        }
      ],
      "entities": [
        {
          "entity_id": "entity-YRMP3DTW3JMN5N7ZL3MEUQZY",
          "name": "card",
          "text": "discover credit card",
          "span": {
            "from_character": 21,
            "to_character": 41
          },
          "value": "credit card",
          "value_id": "entval-VM6QANSJSZILNIKR7OOOVOKU"
        }
      ],
      "parts": [
        {
          "text": "where to report lost "
        },
        {
          "entity": {
            "entity_id": "entity-YRMP3DTW3JMN5N7ZL3MEUQZY",
            "name": "card",
            "text": "discover credit card",
            "value": "credit card",
            "value_id": "entval-VM6QANSJSZILNIKR7OOOVOKU"
          }
        }
      ],
      "context": {
        "context_id": "88ff6ed68d4f16e9ea3b8087f852b929",
        "type": "utterance"
      },
      "source": {},
      "tags": [
        {
          "id": "tag-LXLJPNCILNIDBA7WMFB2CHH5",
          "name": "live"
        }
      ],
      "metadata": {
        "key": "value"
      },
      "created_at": "2022-05-31T19:36:22Z",
      "updated_at": "2022-05-31T19:36:23Z"
    }
  ],
  "entities": [
    {
      "id": "entity-YRMP3DTW3JMN5N7ZL3MEUQZY",
      "name": "card",
      "values": [
        {
          "id": "entval-MZTRDAEPCRMIVGDIR4UAJD2I",
          "key_value": "card",
          "synonyms": [
            {
              "value": "card"
            }
          ],
          "created_at": "2022-05-31T19:36:22Z",
          "updated_at": "2022-05-31T19:36:22Z"
        },
        {
          "id": "entval-VM6QANSJSZILNIKR7OOOVOKU",
          "key_value": "credit card",
          "synonyms": [
            {
              "value": "visa card"
            },
            {
              "value": "discover credit card"
            },
            {
              "value": "capital one credit card"
            }
          ],
          "created_at": "2022-05-31T19:36:22Z",
          "updated_at": "2022-05-31T19:36:22Z"
        },
        {
          "id": "entval-ZVFZVFCLG5MRTESCXRZUKNEV",
          "key_value": "debit card",
          "synonyms": [
            {
              "value": "debit card"
            }
          ],
          "created_at": "2022-05-31T19:36:22Z",
          "updated_at": "2022-05-31T19:36:22Z"
        }
      ],
      "source": {},
      "created_at": "2022-05-31T19:36:22Z",
      "updated_at": "2022-05-31T19:36:22Z"
    }
  ],
  "tags": [
    {
      "id": "tag-LXLJPNCILNIDBA7WMFB2CHH5",
      "name": "live"
    }
  ]
}
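
When reading an exported Workspace object, examples reference intents by `intent_id`, so a lookup table is needed to recover intent names. The sketch below shows one way to do this; `examples_by_intent_name` is a hypothetical helper, and the inline workspace is a trimmed copy of the example above.

```python
def examples_by_intent_name(workspace):
    """Group example texts under the name of each intent they are classified in."""
    names_by_id = {intent["id"]: intent["name"] for intent in workspace.get("intents", [])}
    grouped = {}
    for example in workspace.get("examples", []):
        for ref in example.get("intents", []):
            # Fall back to the raw id if the intent is not defined in this workspace.
            name = names_by_id.get(ref["intent_id"], ref["intent_id"])
            grouped.setdefault(name, []).append(example["text"])
    return grouped


workspace = {
    "intents": [
        {"id": "intent-OGCBT24IKBOGXM3C6O5BH5LX", "name": "card lost"},
        {"id": "intent-QARARGORRZL5BEFNKORO3EYK", "name": "card issues"},
    ],
    "examples": [
        {
            "text": "where to report lost discover credit card",
            "intents": [{"intent_id": "intent-OGCBT24IKBOGXM3C6O5BH5LX"}],
        }
    ],
}

grouped = examples_by_intent_name(workspace)
print(grouped)
```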

Format specification

Workspace object

| Field | Description | Required | Format |
| --- | --- | --- | --- |
| examples | An array of examples/utterances | yes | Array of Example objects |
| intents | An array of intents (only for labelled data) | no | Array of Intent objects |
| entities | An array of entities (only for labelled data) | no | Array of Entity objects |
| tags | An array of tags (only for labelled data) | no | Array of Tag objects |

Example object

| Field | Description | Required | Format |
| --- | --- | --- | --- |
| id | Unique identifier for the example. The identifier will be translated into a deterministic internal id at import time. | no | string |
| text | The text of the example. | yes | string |
| intents | Optional array of intents in which this example is classified (only for labelled data). | no | Array of ExampleIntent objects |
| context | Optional context information about where this example occurred. This is used to group examples that occurred in a common context (ex: all examples of a single conversation) (only for unlabeled data). | no | ExampleContext object |
| tags | Optional array of tags (see the note in Tag object for usage with unlabeled data). | no | Array of TagReference objects |
| entities | Optional array of entity annotations in the example. Use parts to simplify entity annotations. If parts are specified, the entities field is ignored (only for labelled data). | no | Array of EntityReference objects |
| parts | If the example contains entities, this field contains parts of the text and the entities. The parts are concatenated to form the final text. Parts are provided to ease entity annotations. If provided at import, this will take precedence over the entities field (only for labelled data). | no | Array of TextPart objects |
| source | Optional information on the source of the example. | no | ExampleSource object |
| metadata | Optional key:value data associated with the example. | no | map<string, string> |
| created_at | Optional timestamp of the example. This is useful for ordering the utterances when the example is part of a conversation. | no | RFC3339 format (ex: 2006-01-02T15:04:05Z07:00) |

Tag object

Note: When importing unlabeled data, the examples will be assigned the specified tags only if the workspace into which the data is imported has the corresponding tags (with matching names). If the workspace does not have the matching tags, the examples won't appear tagged in the Data section. Prior to importing unlabeled data, you should create the tags in the workspace. You can also create the tags after importing unlabeled data, but a revision will need to be created in order to reprocess the data.

| Field | Description | Required | Format |
| --- | --- | --- | --- |
| id | Unique identifier of the tag. The identifier is used to reference the tag in the examples, but referencing tags by name is also supported. The identifier will be translated to a deterministic internal id at import time. | no | string |
| name | Name of the tag | yes | string |
| description | Optional description of the tag | no | string |
| color | Optional color of the tag used in Studio. Any CSS color format is supported (ex: red, #133337, etc.) | no | string |

Intent object

| Field | Description | Required | Format |
| --- | --- | --- | --- |
| id | Optional unique identifier of the intent. The identifier is used to reference the intent in the examples, but referencing intents by name is also supported. The identifier will be translated to a deterministic internal id at import time. | false | string |
| name | Name of the intent. | true | string |
| parent_intent_id | Optional unique identifier of the parent intent if the intent is part of a hierarchy. | false | string |
| source | Optional information on the source of the intent. | false | IntentSource object |
| description | Optional description of the intent. | false | string |
| tags | Optional list of tags for the intent. | false | Array of TagReference objects |
| metadata | Optional key-value metadata of the intent. | false | map<string, string> |
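
The parent_intent_id field encodes the intent hierarchy as a chain of parent references. As a sketch of how a consumer might walk that chain, the hypothetical helper `intent_path` below returns the names from the root intent down to a given intent; the cycle check is a defensive assumption, not documented behavior.

```python
def intent_path(intents, intent_id):
    """Return intent names from the root of the hierarchy down to intent_id."""
    by_id = {intent["id"]: intent for intent in intents}
    path = []
    seen = set()
    current = intent_id
    while current is not None:
        if current in seen:
            raise ValueError(f"cycle detected at intent {current!r}")
        seen.add(current)
        intent = by_id[current]
        path.append(intent["name"])
        # Root intents have no parent_intent_id, which ends the walk.
        current = intent.get("parent_intent_id")
    return list(reversed(path))


intents = [
    {"id": "intent-QARARGORRZL5BEFNKORO3EYK", "name": "card issues"},
    {
        "id": "intent-OGCBT24IKBOGXM3C6O5BH5LX",
        "name": "card lost",
        "parent_intent_id": "intent-QARARGORRZL5BEFNKORO3EYK",
    },
]

path = intent_path(intents, "intent-OGCBT24IKBOGXM3C6O5BH5LX")
print(" / ".join(path))
```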

IntentSource object

| Field | Description | Format |
| --- | --- | --- |
| source_id | Optional unique identifier of the intent in the source system. | string (optional) |
| metadata | Optional extra metadata about the intent in the source system. | map<string, string> (optional) |

ExampleIntent object

| Field | Description | Required | Format |
| --- | --- | --- | --- |
| intent_id | Unique identifier of the intent. The identifier will be translated into a deterministic internal id at import time. | true | string |
| negative | Identifies if the example is a negative training example of the intent. | true | boolean |

EntityReference object

| Field | Description | Required | Format |
| --- | --- | --- | --- |
| entity_id | Optional unique identifier of the entity. The identifier will be translated into a deterministic internal id at import time. If not specified, the name needs to be specified. | false | string |
| name | Optional name of the entity. If not specified, entity_id needs to be specified. | false | string |
| text | Text used in the reference of the entity in the example. | true | string |
| span | Span of the entity in the example. If the reference is provided through a TextPart, the span is not required since the parts already carry the text and the entities. | false | AnnotationSpan object |
| value | Optional entity value of the entity. This corresponds to the key_value field of the EntityValue object. | false | string |
| value_id | Optional entity value unique identifier, as defined in EntityValue. | false | string |
| role | Optional role of the entity in the example if there is more than one present. | false | string |

ExampleContext object

| Field | Description | Required | Format |
| --- | --- | --- | --- |
| context_id | Optional unique identifier of the context in which the example is used (ex: conversation id). The identifier will be translated into a deterministic internal id at import time. | false | string |
| type | Optional type of the context in which the example is used. | false | "conversation", "utterance", "training_phrase" |
| role | In the case of a conversation input, the role of the person who made the utterance. | false | "client" or "expert" (agent) |

ExampleSource object

| Field | Description | Required | Format |
| --- | --- | --- | --- |
| source_id | Unique identifier of the example in the source system. | false | string |

AnnotationSpan object

| Field | Description | Required | Format |
| --- | --- | --- | --- |
| from_character | Inclusive start index of the span in characters. | true | int |
| to_character | Exclusive end index of the span in characters. | true | int |
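
An inclusive start and exclusive end match Python slice semantics, so a span can be checked against the text of its entity reference with a plain slice. This is a sketch only: `span_matches` is a hypothetical helper, and it assumes that indices count Unicode code points the way Python strings do, which the format description does not state explicitly.

```python
def span_matches(example_text, entity_ref):
    """Check that the span of an EntityReference selects its text in the example."""
    span = entity_ref["span"]
    # Inclusive start, exclusive end: the same convention as a Python slice.
    selected = example_text[span["from_character"]:span["to_character"]]
    return selected == entity_ref["text"]


example_text = "where to report lost discover credit card"
entity_ref = {
    "name": "card",
    "text": "discover credit card",
    "span": {"from_character": 21, "to_character": 41},
}

print(span_matches(example_text, entity_ref))
```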

Entity object

| Field | Description | Required | Format |
| --- | --- | --- | --- |
| id | Optional unique identifier of the entity. The identifier will be translated into a deterministic internal id at import time. | false | string |
| name | Name of the entity. | true | string |
| values | Values of the entity. They represent the different instances of the entity. | true | Array of EntityValue objects |
| is_regex | Indicates that the entity is of type regex. Values of the entity are regular expressions. | false | boolean |
| system_type | Indicates that the entity is a pre-trained system entity (ex: time, date, etc.) | false | string |
| tags | List of tags for the entity. | false | Array of TagReference objects |
| metadata | Key-value metadata of the entity. | false | map<string, string> |
| source | Information about the source of the entity. | false | EntitySource object |

EntitySource object

| Field | Description | Required | Format |
| --- | --- | --- | --- |
| source_id | Unique identifier of the entity in the source system. | false | string |
| metadata | Extra metadata about the entity in the source system. | false | map<string, string> |

EntityValue object

| Field | Description | Required | Format |
| --- | --- | --- | --- |
| id | Optional unique identifier of the entity value. The identifier will be translated into a deterministic internal id at import time. Any references to this entity value will also be translated. | false | string |
| key_value | Optional key value of the entity value. This corresponds to the main disambiguated value of the entity value. | false | string |
| key_value_entities | If supported by the NLU engine, entities referenced in key_value. Ex: DialogFlow composite entities are referenced here. If the key_value_parts field is provided at import, this field is ignored and built from key_value_parts. | false | Array of EntityReference objects |
| key_value_parts | If supported by the NLU engine and key_value contains entity references, this field contains the parts of the text and the entities. The parts are concatenated to form the final text. Parts are provided to ease entity annotations. If provided at import, this will take precedence over the key_value_entities field. | false | Array of TextPart objects |
| language | Language of the entity value. This is in two-letter ISO 639-1 format. | false | string |
| synonyms | Synonyms of the entity value. | false | Array of EntityValueSynonym objects |
| source | Information about the source of the entity value. | false | EntityValueSource object |

EntityValueSynonym object

| Field | Description | Required | Format |
| --- | --- | --- | --- |
| value | Value / text of the synonym. | true | string |
| entities | Entities annotated in the value. This is used when the entity is a composite entity that references other entities. | false | Array of EntityReference objects |
| parts | Parts of the text and the entities constituting value. This is used when the entity is a composite entity that references other entities. The parts are concatenated to form the final value. Parts are provided to ease entity annotations. If provided at import, this will take precedence over the entities field. | false | Array of TextPart objects |

EntityValueSource object

| Field | Description | Required | Format |
| --- | --- | --- | --- |
| source_id | Optional unique identifier of the entity value in the source system. | false | string |
| metadata | Extra metadata about the entity value in the source system. | false | map<string, string> |

TextPart object

| Field | Description | Required | Format |
| --- | --- | --- | --- |
| text | Text of the part if not an entity. If not defined, the entity field needs to be defined. | no | string |
| entity | If the part is an entity reference, reference to the entity. If not defined, text needs to be defined. | no | EntityReference object |
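
Since parts are concatenated to form the final text, the equivalent text and entity spans can be derived from a list of TextPart objects. The sketch below illustrates that relationship and is not HumanFirst's import logic; `flatten_parts` is a hypothetical helper, and the computed spans assume character offsets as counted by Python strings.

```python
def flatten_parts(parts):
    """Concatenate TextPart objects into (text, entity references with spans)."""
    text = ""
    entities = []
    for part in parts:
        if "entity" in part:
            ref = dict(part["entity"])  # copy so the input is left untouched
            start = len(text)
            text += ref["text"]
            # Inclusive start, exclusive end, as in AnnotationSpan.
            ref["span"] = {"from_character": start, "to_character": len(text)}
            entities.append(ref)
        else:
            text += part["text"]
    return text, entities


parts = [
    {"text": "where to report lost "},
    {
        "entity": {
            "entity_id": "entity-YRMP3DTW3JMN5N7ZL3MEUQZY",
            "name": "card",
            "text": "discover credit card",
            "value": "credit card",
        }
    },
]

text, entities = flatten_parts(parts)
print(text)
print(entities[0]["span"])
```

For the parts shown in the labelled data example, this yields the same text and the same 21 to 41 span as that example's entities field.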

TagReference object

| Field | Description | Required | Format |
| --- | --- | --- | --- |
| id | Optional unique identifier of the tag. The identifier will be translated into a deterministic internal id at import time. If not specified, name needs to be specified. | false | string |
| name | Name of the tag. If not specified, id needs to be specified. | false | string |
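
A TagReference can identify a tag by id or by name. As a rough sketch of how a consumer of the format might resolve such a reference against the tags array of a Workspace object, the hypothetical helper `resolve_tag` below tries the id first and then the name; that precedence is an assumption made for illustration, not documented behavior.

```python
def resolve_tag(reference, workspace_tags):
    """Find the Tag object a TagReference points to, or None if there is no match."""
    # Assumption: an id match wins over a name match when both are present.
    if reference.get("id"):
        for tag in workspace_tags:
            if tag.get("id") == reference["id"]:
                return tag
    if reference.get("name"):
        for tag in workspace_tags:
            if tag.get("name") == reference["name"]:
                return tag
    return None


workspace_tags = [{"id": "tag-LXLJPNCILNIDBA7WMFB2CHH5", "name": "live"}]

by_id = resolve_tag({"id": "tag-LXLJPNCILNIDBA7WMFB2CHH5"}, workspace_tags)
by_name = resolve_tag({"name": "live"}, workspace_tags)
unknown = resolve_tag({"name": "montreal"}, workspace_tags)
print(by_id, by_name, unknown)
```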