Annotation set descriptor JSON reference

The purpose of the JSON format described below is slightly different from the purpose of the XML format.

The XML format for the annotation set descriptors is intended to be segmented into sets which map onto steps in each MAT task. When MAT reads this XML, it digests these descriptors, along with other information from the task relating to how the annotations might be defined, and aggregates this information for internal use. The MAT Web server provides a JSON encoding of this aggregated annotation-related information to the MAT UI when the UI starts up. It is this JSON format, rather than the XML format, that the Java and JavaScript libraries understand.

The MATAnnotationInfoToJSON tool will generate this JSON encoding for any known task.

There are two versions of this encoding. Either of them can be used with the standalone JavaScript viewer.

The expanded JSON

The expanded version is the default format provided by the MATAnnotationInfoToJSON tool. It has the following structure:

{
  "alphabetizeLabels": <boolean>,
  "annotationSetRepository": {
    "allAnnotationsKnown": <boolean>,
    "types": {
      <true_label>: <asd>, ...
    }
  },
  "tagHierarchy": <tag_hierarchy>,
  "tagOrder": [<true_or_effective_label>, ...],
  "overlapOrder": [<true_or_effective_label>, ...],
  "languages": {
    <lname>: {
      "name": <lname>,
      "code": <lcode>,
      "text_right_to_left": <boolean>,
      "tokenless_autotag_delimiters": <delims>,
      "tokenless_autotag_respects_delimiters": <boolean>
    }, ...
  },
  "settings": {<setting_key>: <setting_value>, ...}
}

where

This JSON is an appropriate value of the taskATRFragment parameter when you configure your JavaScript standalone viewer.
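As a rough illustration of how the two encodings relate, the following Python sketch pulls the <asd> entries out of an expanded document. It assumes that each key of "types" is a true label whose descriptor repeats that label in its "type" field, which the shared <asd> notation suggests but this page does not state outright. The helper name asds_from_expanded is hypothetical, not part of MAT.

```python
import json


def asds_from_expanded(expanded):
    """Collect the <asd> entries from an expanded annotation-info document.

    Hypothetical helper. Assumes each key of "types" equals the "type"
    field of its descriptor, and raises ValueError when it does not.
    """
    asds = []
    for true_label, asd in expanded["annotationSetRepository"]["types"].items():
        if asd.get("type") != true_label:
            raise ValueError(f"key {true_label!r} does not match type {asd.get('type')!r}")
        asds.append(asd)
    return asds


# Abridged input: only the keys this helper reads are present.
expanded = json.loads("""{
  "alphabetizeLabels": true,
  "annotationSetRepository": {
    "allAnnotationsKnown": false,
    "types": {
      "PERSON": {"type": "PERSON", "display": {"accelerator": "P"}},
      "LOCATION": {"type": "LOCATION", "display": {"accelerator": "L"}}
    }
  }
}""")

print([asd["type"] for asd in asds_from_expanded(expanded)])  # ['PERSON', 'LOCATION']
```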

The simplified JSON

The simplified version is generated when you use the --simplified option for MATAnnotationInfoToJSON. It has the following structure:

[ <asd>, ... ]

This JSON is an appropriate value for the atr parameter of the JavaScript standalone viewer.
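Because the simplified form is a bare list, code that consumes it usually needs to look descriptors up by true label. A minimal Python sketch follows; index_by_type is a hypothetical helper, and treating a repeated "type" as an error is an assumption rather than a documented rule.

```python
import json


def index_by_type(simplified):
    """Map each true label to its <asd>.

    Hypothetical helper; a repeated "type" raises ValueError, which is an
    assumption, not a documented MAT rule.
    """
    index = {}
    for asd in simplified:
        label = asd["type"]
        if label in index:
            raise ValueError(f"duplicate descriptor for {label!r}")
        index[label] = asd
    return index


simplified = json.loads('[{"type": "PERSON"}, {"type": "PERSON_COREF", "hasSpan": false}]')
by_type = index_by_type(simplified)
# "hasSpan" is optional and defaults to true.
print(by_type["PERSON_COREF"].get("hasSpan", True))  # False
```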

The JSON annotation set descriptor

The JSON annotation set descriptor is an amalgamation of the annotation set descriptor information itself, and the display information found in the <label> and <attribute> children of the <annotation_display> element in your task. Please consult those two references for details about the structure and values below.

The descriptor has the following structure:

<asd>: {
  "type": <string>
  (, "hasSpan": <boolean> )?
  (, "category": "admin" | "token" | "zone" | <string> )?
  (, "set_name": <string> )?
  (, "attrs": [ <attr_desc>, ... ] )?
  (, "allAttributesKnown": <boolean> )?
  (, "display": <display_desc> )?
  (, "processable": <boolean> )?
  (, "managed": <boolean> )?
  (, "effective_labels": <el_desc> )?
}

<attr_desc>: {
  "name": <string>
  (, "type": "string" | "int" | "float" | "boolean" | "annotation" )?
  (, "category": "admin" | "token" | "zone" | <string> )?
  (, "set_name": <string> )?
  (, "managed": <boolean> )?
  (, "aggregation": "set" | "list" )?
  (, "default": <val> )?
  (, "default_is_text_span": true )?
  (, "choices": [ <val>+ ] )?
  (, "maxval": <int> | <float> )?
  (, "minval": <int> | <float> )?
  (, "label_restrictions": <label_restr_desc> )?
  (, "display": <attr_display_desc> )?
}

<display_desc>: {
  ( "accelerator": <string> )?
  (, "css": <string> )?
  (, "edit_immediately": true )?
  (, "presented_name": <string> )?
  (, "rendering_style": "normal" | "background_span" )?
  (, "overlap_rank": <int> )?
  (, "gestures" : [ <gesture_desc>+ ] )?
}

<attr_display_desc>: {
  ( "accelerator": <string> )?
  (, "css": <string> )?
  (, "edit_immediately": true )?
  (, "presented_name": <string> )?
  (, "overlap_rank": <int> )?
}

<el_desc>: {
  ( <string> : {
      "attr": <string>,
      "val": <val>,
      "display": <display_desc>
      (, "category": "admin" | "token" | "zone" | <string> )?
      (, "set_name": <string> )?
      (, "aggregation": "set" | "list" )?
    } )+
}

<gesture_desc>: {
  "label": <string>,
  "function": <string>
  (, "writeable_only": <boolean> )?
  (, "attribute_writeable_only": [ <string>+ ] )?
}

<label_restr_desc>: [ ( <string> | <complex_label_restr_desc> )+ ]

<complex_label_restr_desc>: [ <string>, [ [ <string>, <val> ]+ ] ]
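For readers who process these descriptors in Python, the grammar above can be transcribed roughly into TypedDict declarations. This is a hypothetical sketch, not part of MAT: optional keys (the ( ... )? items) go into total=False classes, and alternatives such as "admin" | "token" | "zone" | <string> collapse to plain str.

```python
from typing import Any, Dict, List, TypedDict, Union

# Hypothetical transcription of the descriptor grammar; not part of MAT.


class DisplayDesc(TypedDict, total=False):
    accelerator: str
    css: str
    edit_immediately: bool
    presented_name: str
    rendering_style: str           # "normal" or "background_span"
    overlap_rank: int
    gestures: List[Dict[str, Any]]


class _AttrDescRequired(TypedDict):
    name: str


class AttrDesc(_AttrDescRequired, total=False):
    type: str                      # "string", "int", "float", "boolean", "annotation"
    category: str
    set_name: str
    managed: bool
    aggregation: str               # "set" or "list"
    default: Any
    default_is_text_span: bool
    choices: List[Union[str, int]]
    maxval: Union[int, float]
    minval: Union[int, float]
    label_restrictions: List[Any]  # bare labels or [label, [[attr, val], ...]]
    display: Dict[str, Any]


class _AsdRequired(TypedDict):
    type: str


class Asd(_AsdRequired, total=False):
    hasSpan: bool
    category: str
    set_name: str
    attrs: List[AttrDesc]
    allAttributesKnown: bool
    display: DisplayDesc
    processable: bool
    managed: bool
    effective_labels: Dict[str, Dict[str, Any]]


example: Asd = {"type": "PERSON", "attrs": [{"name": "nomtype"}]}
```

TypedDict only helps static type checkers; it performs no runtime validation, so the value constraints described below still have to be checked separately.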

The "type" corresponds to the "label" attribute of the corresponding <annotation> element.

"hasSpan" indicates whether the annotation has a span. It is optional, and defaults to true.

The "category" and "set_name" define the category and set name to which the annotation belongs.

"allAttributesKnown" indicates whether the annotation type should block the creation of attributes that aren't listed in the "attrs" list. It is optional, and defaults to false.

"attrs", "display", and "effective_labels" are all optional. "effective_labels" is limited to descriptors with exactly one attribute which has a "choices" list, and the keys in <el_desc> and the elements of the "choices" list must be identical.

The "type" of the attribute descriptor is optional, and defaults to "string".
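The defaults stated so far ("hasSpan" true, "allAttributesKnown" false, attribute "type" "string") can be made explicit before a descriptor is processed further. A minimal Python sketch; with_defaults is a hypothetical helper, and it fills in only these three documented defaults.

```python
import copy


def with_defaults(asd):
    """Return a copy of an <asd> with the documented defaults made explicit."""
    result = copy.deepcopy(asd)  # leave the caller's descriptor untouched
    result.setdefault("hasSpan", True)
    result.setdefault("allAttributesKnown", False)
    for attr in result.get("attrs", []):
        attr.setdefault("type", "string")
    return result


person = with_defaults({"type": "PERSON", "attrs": [{"name": "nomtype"}]})
coref = with_defaults({"type": "PERSON_COREF", "hasSpan": False,
                       "attrs": [{"name": "mentions", "type": "annotation"}]})
print(person["attrs"][0]["type"], coref["hasSpan"])  # string False
```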

The "category" and "set_name" define the category and set name to which the attribute belongs. This can differ from that of the annotation which bears the attribute (e.g., a "part_of_speech" attribute of a "lex" tag may not be in the token category).

"default" must be a value appropriate for the type of the attribute descriptor. "default" and "default_is_text_span" cannot co-occur.

"choices" is limited to attribute descriptors with "type" of "string" or "int". The values in the list must be appropriate for the type.

"maxval" and "minval" are limited to attribute descriptors with "type" of "int" or "float".

"label_restrictions" is limited to attribute descriptors with "type" of "annotation".

The <label_restr_desc> requires some comment. Each element of this list is either a true label (not an effective label) or a 2-element list consisting of a true label and a list of 2-element attribute-value pairs. Each attribute in those pairs must be a choice attribute defined for that true label, and each <val> must be appropriate for its attribute. Note that this differs from the <label_restriction> XML element, where the labels can be true labels or effective labels. When MAT digests the XML, it unpacks any effective label it finds in a label restriction into the corresponding true label and attribute-value pair.
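The attribute rules above are mechanical enough to check in code. The following Python sketch is hypothetical (none of these names come from MAT). The mapping from attribute types to Python types in value_fits is an assumption, and so is the reading of a label restriction in which the list elements are alternatives and all pairs within one element must match.

```python
# Hypothetical checker for <attr_desc> rules; not part of MAT.


def value_fits(value, attr_type):
    """Rough test that a JSON-decoded value suits an attribute "type" (assumed mapping)."""
    if attr_type == "string":
        return isinstance(value, str)
    if attr_type == "int":
        return isinstance(value, int) and not isinstance(value, bool)
    if attr_type == "float":
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if attr_type == "boolean":
        return isinstance(value, bool)
    return True  # "annotation" values are not checked in this sketch


def attr_problems(attr):
    """Return a list of rule violations for one <attr_desc> (empty if none)."""
    problems = []
    attr_type = attr.get("type", "string")  # documented default
    if "default" in attr:
        if "default_is_text_span" in attr:
            problems.append("default and default_is_text_span cannot co-occur")
        if not value_fits(attr["default"], attr_type):
            problems.append("default does not fit the attribute type")
    if "choices" in attr:
        if attr_type not in ("string", "int"):
            problems.append("choices requires type string or int")
        elif not all(value_fits(c, attr_type) for c in attr["choices"]):
            problems.append("choices contains a value of the wrong type")
    for key in ("maxval", "minval"):
        if key in attr and attr_type not in ("int", "float"):
            problems.append(f"{key} requires type int or float")
    if "label_restrictions" in attr and attr_type != "annotation":
        problems.append("label_restrictions requires type annotation")
    return problems


def normalize_restrictions(restrictions):
    """Turn a <label_restr_desc> into (true_label, [(attr, val), ...]) pairs."""
    normalized = []
    for item in restrictions:
        if isinstance(item, str):
            normalized.append((item, []))  # bare true label
        else:
            label, pairs = item  # [true_label, [[attr, val], ...]]
            normalized.append((label, [(a, v) for a, v in pairs]))
    return normalized


def restriction_allows(restrictions, label, attr_values):
    """True if some element names this label and all of its pairs match."""
    return any(
        r_label == label and all(attr_values.get(a) == v for a, v in pairs)
        for r_label, pairs in normalize_restrictions(restrictions)
    )
```

For example, the restriction ["PERSON", ["ENAMEX", [["type", "LOCATION"]]]] admits any PERSON annotation and only those ENAMEX annotations whose "type" is "LOCATION".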

The <el_desc> also doesn't correspond directly to anything in the XML. Any effective labels that MAT finds when it digests the XML are accumulated here, at the level of the annotation rather than the attribute. The keys in this element are the string form of each <val>, which must be an appropriate value for "attr", the name of one of the annotation's attributes. The values of "category" and "set_name" must be equivalent to those of the referenced attribute. The "display" has the same form as the display for the toplevel annotation.
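Together with the earlier restriction that "effective_labels" requires exactly one attribute with a "choices" list whose values match its keys, these rules can be sketched as a Python checker. The function name is hypothetical. It reads "exactly one attribute which has a choices list" as "exactly one entry in attrs carries choices", and it compares "category" and "set_name" only when the entry supplies them; both readings are assumptions.

```python
def effective_label_problems(asd):
    """Return rule violations for asd["effective_labels"] (empty list if none).

    Hypothetical checker; not part of MAT.
    """
    el = asd.get("effective_labels")
    if not el:
        return []
    choice_attrs = [a for a in asd.get("attrs", []) if "choices" in a]
    if len(choice_attrs) != 1:
        return ["effective_labels requires exactly one attribute with choices"]
    attr = choice_attrs[0]
    problems = []
    # Keys are the string form of each value, so compare choices as strings.
    if set(el) != {str(c) for c in attr["choices"]}:
        problems.append("effective_labels keys must match the choices list")
    for key, entry in el.items():
        if entry.get("attr") != attr["name"]:
            problems.append(f"{key}: attr should be {attr['name']!r}")
        if key != str(entry.get("val")):
            problems.append(f"{key}: key is not the string form of val")
        for field in ("category", "set_name"):
            if field in entry and entry[field] != attr.get(field):
                problems.append(f"{key}: {field} differs from the attribute's")
        if "display" not in entry:
            problems.append(f"{key}: missing display")
    return problems
```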

The elements of <display_desc> are all optional, and correspond to the <label> children of <annotation_display>. Look there for details.

The <attr_display_desc> corresponds to the <attribute> children of <annotation_display>. These elements are specific to the JavaScript UI. We'll skip documenting them for now.

Examples

Here is the result of generating the simplified output for the Named Entity task:

[
  {
    "display": {
      "accelerator": "P",
      "css": "background-color: #CCFF66"
    },
    "type": "PERSON"
  },
  {
    "display": {
      "accelerator": "L",
      "css": "background-color: #FF99CC"
    },
    "type": "LOCATION"
  },
  {
    "display": {
      "accelerator": "O",
      "css": "background-color: #99CCFF"
    },
    "type": "ORGANIZATION"
  }
]
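A UI consuming this output needs, among other things, a map from accelerator keys to labels. The Python sketch below builds one from the simplified Named Entity output above. accelerator_map is a hypothetical helper, and rejecting a reused accelerator is an assumption about what a consumer would want, not a documented MAT rule.

```python
import json

NE_SIMPLIFIED = json.loads("""[
  {"display": {"accelerator": "P", "css": "background-color: #CCFF66"}, "type": "PERSON"},
  {"display": {"accelerator": "L", "css": "background-color: #FF99CC"}, "type": "LOCATION"},
  {"display": {"accelerator": "O", "css": "background-color: #99CCFF"}, "type": "ORGANIZATION"}
]""")


def accelerator_map(simplified):
    """Map each accelerator key to the true label it selects (hypothetical helper)."""
    keys = {}
    for asd in simplified:
        key = asd.get("display", {}).get("accelerator")
        if key is None:
            continue  # "display" and "accelerator" are both optional
        if key in keys:
            raise ValueError(f"accelerator {key!r} used by {keys[key]} and {asd['type']}")
        keys[key] = asd["type"]
    return keys


print(accelerator_map(NE_SIMPLIFIED))  # {'P': 'PERSON', 'L': 'LOCATION', 'O': 'ORGANIZATION'}
```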

Here's the result for the ENAMEX version of the Named Entity task:

[
  {
    "attrs": [
      {
        "choices": [
          "PERSON",
          "LOCATION",
          "ORGANIZATION"
        ],
        "name": "type"
      }
    ],
    "effective_labels": {
      "LOCATION": {
        "attr": "type",
        "display": {
          "accelerator": "L",
          "css": "background-color: #FF99CC"
        },
        "val": "LOCATION"
      },
      "ORGANIZATION": {
        "attr": "type",
        "display": {
          "accelerator": "O",
          "css": "background-color: #99CCFF"
        },
        "val": "ORGANIZATION"
      },
      "PERSON": {
        "attr": "type",
        "display": {
          "accelerator": "P",
          "css": "background-color: #CCFF66"
        },
        "val": "PERSON"
      }
    },
    "type": "ENAMEX"
  }
]
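In this encoding, the three familiar labels exist only as effective labels of the single true label ENAMEX. The Python sketch below shows how a consumer might move between the two views using the "attr" and "val" fields of each <el_desc> entry. The function names are hypothetical, and the input is an abridged copy of the output above with the "css" entries dropped.

```python
import json

# Abridged copy of the ENAMEX output above ("css" entries dropped for brevity).
ENAMEX = json.loads("""{
  "type": "ENAMEX",
  "attrs": [{"choices": ["PERSON", "LOCATION", "ORGANIZATION"], "name": "type"}],
  "effective_labels": {
    "LOCATION": {"attr": "type", "display": {"accelerator": "L"}, "val": "LOCATION"},
    "ORGANIZATION": {"attr": "type", "display": {"accelerator": "O"}, "val": "ORGANIZATION"},
    "PERSON": {"attr": "type", "display": {"accelerator": "P"}, "val": "PERSON"}
  }
}""")


def effective_label(asd, attr_values):
    """Return the effective label selected by an annotation's attribute values, if any."""
    for name, entry in asd.get("effective_labels", {}).items():
        if attr_values.get(entry["attr"]) == entry["val"]:
            return name
    return None


def to_true_label(asd, eff_label):
    """Return the true label and attribute assignment behind an effective label."""
    entry = asd["effective_labels"][eff_label]
    return asd["type"], {entry["attr"]: entry["val"]}


print(effective_label(ENAMEX, {"type": "PERSON"}))  # PERSON
print(to_true_label(ENAMEX, "LOCATION"))            # ('ENAMEX', {'type': 'LOCATION'})
```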

And finally, here's the result for the Enhanced Named Entity task:

[
  {
    "attrs": [
      {
        "choices": [
          "Proper name",
          "Noun",
          "Pronoun"
        ],
        "name": "nomtype"
      }
    ],
    "display": {
      "accelerator": "P",
      "css": "background-color: #CCFF66",
      "edit_immediately": true
    },
    "type": "PERSON"
  },
  {
    "attrs": [
      {
        "choices": [
          "Proper name",
          "Noun",
          "Pronoun"
        ],
        "name": "nomtype"
      },
      {
        "name": "is_political_entity",
        "type": "boolean"
      }
    ],
    "display": {
      "accelerator": "L",
      "css": "background-color: #FF99CC",
      "edit_immediately": true
    },
    "type": "LOCATION"
  },
  {
    "attrs": [
      {
        "choices": [
          "Proper name",
          "Noun",
          "Pronoun"
        ],
        "name": "nomtype"
      }
    ],
    "display": {
      "accelerator": "O",
      "css": "background-color: #99CCFF",
      "edit_immediately": true
    },
    "type": "ORGANIZATION"
  },
  {
    "attrs": [
      {
        "aggregation": "set",
        "label_restrictions": [
          "PERSON"
        ],
        "name": "mentions",
        "type": "annotation"
      }
    ],
    "display": {
      "accelerator": "C",
      "css": "background-color: lightgreen",
      "edit_immediately": true
    },
    "hasSpan": false,
    "type": "PERSON_COREF"
  },
  {
    "attrs": [
      {
        "label_restrictions": [
          "PERSON"
        ],
        "name": "actor",
        "type": "annotation"
      },
      {
        "label_restrictions": [
          "ORGANIZATION",
          "LOCATION"
        ],
        "name": "location",
        "type": "annotation"
      }
    ],
    "display": {
      "accelerator": "E",
      "css": "background-color: pink",
      "edit_immediately": true
    },
    "type": "LOCATED_EVENT"
  },
  {
    "attrs": [
      {
        "label_restrictions": [
          "ORGANIZATION",
          "PERSON"
        ],
        "name": "located",
        "type": "annotation"
      },
      {
        "label_restrictions": [
          "LOCATION"
        ],
        "name": "location",
        "type": "annotation"
      }
    ],
    "display": {
      "accelerator": "R",
      "css": "background-color: orange",
      "edit_immediately": true
    },
    "hasSpan": false,
    "type": "LOCATION_RELATION"
  }
]
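This task mixes spanned entities with spanless annotations whose "annotation"-typed attributes point at other annotations. The Python sketch below summarizes that structure from an abridged copy of the output above: which labels lack spans, and which labels each annotation-valued attribute may refer to. annotation_references is a hypothetical helper, and it keeps only the true label of any complex restriction element.

```python
# Abridged from the Enhanced Named Entity output above: "display" and the
# nomtype/is_political_entity attributes are omitted.
ENHANCED = [
    {"type": "PERSON"},
    {"type": "LOCATION"},
    {"type": "ORGANIZATION"},
    {"type": "PERSON_COREF", "hasSpan": False,
     "attrs": [{"name": "mentions", "type": "annotation", "aggregation": "set",
                "label_restrictions": ["PERSON"]}]},
    {"type": "LOCATED_EVENT",
     "attrs": [{"name": "actor", "type": "annotation", "label_restrictions": ["PERSON"]},
               {"name": "location", "type": "annotation",
                "label_restrictions": ["ORGANIZATION", "LOCATION"]}]},
    {"type": "LOCATION_RELATION", "hasSpan": False,
     "attrs": [{"name": "located", "type": "annotation",
                "label_restrictions": ["ORGANIZATION", "PERSON"]},
               {"name": "location", "type": "annotation",
                "label_restrictions": ["LOCATION"]}]},
]


def annotation_references(simplified):
    """Map (true label, attribute name) to the labels an annotation-valued attribute may hold."""
    refs = {}
    for asd in simplified:
        for attr in asd.get("attrs", []):
            if attr.get("type") == "annotation":
                # An element is a bare label or [label, [[attr, val], ...]].
                refs[(asd["type"], attr["name"])] = [
                    r if isinstance(r, str) else r[0]
                    for r in attr.get("label_restrictions", [])
                ]
    return refs


# "hasSpan" defaults to true when absent.
spanless = [asd["type"] for asd in ENHANCED if not asd.get("hasSpan", True)]
print(spanless)  # ['PERSON_COREF', 'LOCATION_RELATION']
for (label, attr), allowed in annotation_references(ENHANCED).items():
    print(f"{label}.{attr} -> {', '.join(allowed)}")
```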