Search

Once a workspace has a trained classifier, you can configure the workspace to run this classifier on all linked unlabelled data sets. This classifies every utterance in every conversation that is uploaded, and exposes the results through a search API that can include intents in its search terms.

This can easily be hooked into custom dashboards that showcase activity from a conversation corpus.

Concepts#

In this API, everything maps to a conversation (whether you uploaded an utterances file, conversations from a CSV file, or conversations from a supported integration). The search request is made against a specific workspace, and returns results in the form of annotated conversations.

The annotated conversation contains:

  • The list of inputs containing the text of every utterance
  • Metadata about the conversation (when it took place, what file it is contained in)
  • Various annotation objects, the most important one containing the result of the last trained classifier on each input of the conversation

All URLs in this document are relative to https://api.humanfirst.ai/

Intents#

Some predicates refer to intent ids; retrieve them via a GET call to /v1alpha1/workspaces/${namespace}/${workspaceId}/intents
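
For example, reusing the access-token helper shown in the Querying section below, the list can be fetched with curl:

curl -s -H 'Authorization: Bearer '$(hf auth print-access-token) \
  -H 'Accept: application/json' \
  https://api.humanfirst.ai/v1alpha1/workspaces/${namespace}/${workspaceId}/intents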

The response will look like this:

{
  "intents": [
    {
      // The intent's unique id
      "id": "intent-5GG4LZSKNRDWHEAY47V2E6BN",
      // The current name
      "name": "ask_for_flight",
      // Its parent intent in the hierarchy, if any
      "parentId": "intent-PMS43D7LJNDKHPG5URXAPDH2",
      // When it was created/last updated
      "createdAt": "2020-12-15T22:51:19Z",
      "updatedAt": "2020-12-15T22:51:19Z"
    },
    // ...
  ]
}

Querying#

POST /v1alpha1/conversations/${namespace}/${workspaceId}/query

This is where you make queries: you can combine predicates that expose the output of the classifier with full-text search and time windowing. The request is built by stacking a series of predicates (a conversation has to match them all to be returned).

Most API calls require a namespace and a workspace id. You can find these in the URL bar: the workspace id has a format of playbook-... and the namespace is part of the query string as ?namespace=.... From the command line, use hf namespace list and hf workspace list.
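
The curl examples in this document substitute these values as shell variables; a minimal setup (with placeholder values) looks like this:

# Placeholder values - substitute your own namespace and workspace id
namespace=your-namespace
workspaceId=playbook-...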

Let's start with the case of doing a full-text search within conversations:

curl -s -H 'Authorization: Bearer '$(hf auth print-access-token) \
  -H 'Accept: application/json' \
  -d '{
    "predicates": [
      { "inputMatch": { "text": "search text" } }
    ],
    "pageSize": 10
  }' https://api.humanfirst.ai/v1alpha1/conversations/${namespace}/${workspaceId}/query
info

This uses our hf command line tool to generate an access token. Refer to the authentication section for instructions on how to obtain tokens programmatically.

Predicate: timeRange#

Specifies a time window in which the conversation must have taken place. Both the start and end times are optional, so unbounded windows are possible.

{
  "timeRange": {
    "start": "2020-05-26T00:00:00Z",
    "end": "2020-12-26T00:00:00Z"
  }
}

Predicate: intentMatch#

Specifies a series of intent ids that have to be matched in order. An optional minimum match score can be set to threshold the quality of the matches.

{
  "intentMatch": {
    "intentIds": ["intent-..."],
    "minimumMatchScore": 0.50,
    "matchSources": ["MATCH_SOURCE_CLASSIFIER"]
  }
}
note

Only the classifier's top result is indexed, but all subsequent probabilities will be returned as part of the response.

Predicate: conversationSource#

Specifies the filename and file type to match (if you only want conversations and not flat utterances).

note

Depending on the format, certain suffixes have to be appended to the filename in order for results to match our internal representation.

  • [filename].fmt1 represents CSV data
  • [filename].fmt2 represents a flat utterance file

{
  "conversationSource": {
    "sourceFile": "filename.csv.fmt1"
  }
}

Predicate: inputMatch#

Specifies a full-text search predicate used to search all conversation inputs.

{
  "inputMatch": {
    "text": "search text"
  }
}

Response format#

Each conversation is represented as the original converted conversation, along with additional annotations containing the results of various components in our pipeline.

{
  // The top level object will be an array containing the current page
  "results": [
    {
      "annotatedConversation": {
        "conversation": {
          // The unique id of this conversation
          "id": "star-1469",
          // Each input of the conversation is transcribed here.
          "inputs": [
            {
              // The `id` is automatically generated using an internal representation.
              "id": "e6ca117ad37a78738f49d982c170575923e2109f",
              // The input's verbatim value
              "value": "Hey. My name is John I need to get to Forbes and Murray. I am at 5th and main right now. I would really like to ride in a Toyota car please.",
              // An ISO8601 timestamp representing when this input was uttered
              "createdAt": "2020-05-25T23:56:04Z",
              // Valid source values are "client" and "expert"
              "source": "client"
            }
          ],
          "sources": [
            {
              // The id of this conversation
              "id": "star-1469",
              // Where the conversation came from in our system
              "source": "CONVERSATION_SOURCE_USER_UPLOAD",
              // Which format was used during import
              "importFormat": "IMPORT_FORMAT_SIMPLE_CSV",
              // The filename used during import
              "sourcePath": "star.csv"
            }
          ],
          // An ISO8601 timestamp representing when this conversation was created
          "createdAt": "2020-05-25T23:56:04Z",
          // An ISO8601 timestamp representing when this conversation was updated
          "updatedAt": "2020-05-25T23:56:04Z",
          // Defines which dataset this conversation came from
          "conversationSetSource": {
            "conversationSetId": "convset-47VOCM2YO5A4XFHC422UNFLX"
          }
        },
        // Annotations are objects that are added by our pipeline and
        // indicate derivative computed representations of the data.
        // They can change over time as the underlying model evolves.
        "annotations": {
          // "distribution" contains information about the probability
          // distribution outputted by a classifier (the probabilities
          // themselves are in `inputs_intents`)
          "distribution": {
            "@type": "type.googleapis.com/zia.ai.model.DistributionMetricAnnotation",
            "metrics": [
              {
                // Normalized entropy:
                // -sum(prob * log2(prob)) / log2(n_intents)
                "entropy": 0.29151642,
                // Uncertainty score:
                // 1 - topScore
                "uncertainty": 0.15535975,
                // Margin score:
                // 1 - (topScore - secondScore)
                "margin": 0.10032129
              }
            ]
          },
          // "inputs_intents" contains information associating each input with an intent.
          // It is updated whenever an input is converted to a training phrase.
          // The `source` field denotes where this association originates.
          "inputs_intents": {
            "@type": "type.googleapis.com/zia.ai.model.InputsIntentsAnnotation",
            "inputs": [
              // The order of objects is the same as the order of
              // inputs in the conversation
              {
                // Multiple matches are reported per input (but only the first
                // match is searchable)
                "matches": [
                  {
                    "intentId": "intent-AOEQNTKVERAHRMANFGHKRP34",
                    "score": 0.8996787,
                    "source": "MATCH_SOURCE_CLASSIFIER"
                  },
                  {
                    "intentId": "intent-HYSTGTFGMFB5LH4DUNTTXCRB",
                    "score": 0.055038463,
                    "source": "MATCH_SOURCE_CLASSIFIER"
                  },
                  {
                    "intentId": "intent-JCTSPYOZHJBVFL4P6WS2I4WM",
                    "score": 0.01580668,
                    "source": "MATCH_SOURCE_CLASSIFIER"
                  }
                  // Results are truncated and only include intents
                  // with a probability higher than 0.01
                ]
              }
            ]
          },
          // "language" runs a language detection algorithm on each input
          "language": {
            "@type": "type.googleapis.com/zia.ai.model.LanguageAnnotation",
            "confidence": 1,
            "inputs": [
              {
                "language": "en",
                "confidence": 1
              }
            ],
            // An aggregated prediction is made for the conversation itself
            "language": "en"
          },
          // "metrics" contains precomputed metrics about the entire conversation
          "metrics": {
            "@type": "type.googleapis.com/zia.ai.model.MetricAnnotation",
            "clientInputsCount": 1,
            "inputsCount": 1
          }
        }
      }
    }
  ],
  "nextPageToken": "...",
  "totalCount": 646
}
note

Some of the annotations won't be present if no predicate requires the classifier's output. For example, inputs_intents and distribution will not be present unless an intentMatch predicate is included. A minimumMatchScore of 0 can be used to include everything.
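
As an illustration (again using the example intent id from the Intents section as a placeholder), once an intentMatch predicate with a minimum score of 0 is included, the classifier's top match for each input can be extracted from the response with jq:

curl -s -H 'Authorization: Bearer '$(hf auth print-access-token) \
  -H 'Accept: application/json' \
  -d '{
    "predicates": [
      { "intentMatch": { "intentIds": ["intent-5GG4LZSKNRDWHEAY47V2E6BN"], "minimumMatchScore": 0 } }
    ],
    "pageSize": 10
  }' https://api.humanfirst.ai/v1alpha1/conversations/${namespace}/${workspaceId}/query \
  | jq '.results[].annotatedConversation.annotations.inputs_intents.inputs[].matches[0]'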