The purpose of the JSON format described below differs slightly
from that of the XML format.
The XML format for the annotation set descriptors is intended to
be segmented into sets which map onto steps in each MAT task. When
MAT reads this XML, it digests these descriptors, along with other
information from the task relating to how the annotations might be
defined, and aggregates this information for internal use. The MAT
Web server provides a JSON encoding of this aggregated
annotation-related information to the MAT UI when the UI starts
up. It is this JSON format, rather than the XML format, that the
Java and JavaScript libraries understand.
The MATAnnotationInfoToJSON
tool will generate this JSON encoding for any known task.
There are two versions of this encoding. Either of them can be
used with the standalone
JavaScript viewer.
The expanded version is the default format provided by the
MATAnnotationInfoToJSON tool. It has the following structure:
{
  "alphabetizeLabels": <boolean>,
  "annotationSetRepository": {
    "allAnnotationsKnown": <boolean>,
    "types": {
      <true_label>: <asd>, ...
    }
  },
  "tagHierarchy": <tag_hierarchy>,
  "tagOrder": [<true_or_effective_label>, ...],
  "overlapOrder": [<true_or_effective_label>, ...],
  "languages": {<lname>: {"name": <lname>, "code": <lcode>,
                          "text_right_to_left": <boolean>,
                          "tokenless_autotag_delimiters": <delims>,
                          "tokenless_autotag_respects_delimiters": <boolean>}, ...},
  "settings": {<setting_key>: <setting_value>, ...}
}
where
This JSON is an appropriate value of the taskATRFragment
parameter when you configure your JavaScript standalone viewer.
The simplified version is generated when you use the --simplified
option for MATAnnotationInfoToJSON. It has the following
structure:
[ <asd>, ... ]
This JSON is an appropriate value for the atr parameter of the JavaScript standalone viewer.
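As a rough illustration of how the two versions relate, the following Python sketch tells them apart by their top-level shape and pulls the list of descriptors out of either one. The function names are ours, not part of MAT, and the sketch assumes only what the two structures above show: the expanded version is an object with an "annotationSetRepository" key, and the simplified version is a list.

```python
import json


def encoding_version(text):
    """Classify a JSON string as the 'expanded' or 'simplified' encoding."""
    data = json.loads(text)
    if isinstance(data, dict) and "annotationSetRepository" in data:
        return "expanded"
    if isinstance(data, list):
        return "simplified"
    raise ValueError("not a recognized annotation info encoding")


def descriptors(text):
    """Return the list of <asd> objects found in either encoding."""
    data = json.loads(text)
    if encoding_version(text) == "expanded":
        # In the expanded form the descriptors are the values of "types".
        return list(data["annotationSetRepository"]["types"].values())
    return data
```

For example, descriptors('[{"type": "PERSON"}]') returns a one-element list containing the PERSON descriptor.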
The JSON annotation set descriptor is an amalgamation of the annotation set descriptor
information itself, and the display information found in the
<label> and <attribute> children of the <annotation_display>
element in your task. Please consult those two references for
details about the structure and values below.
The descriptor has the following structure:
<asd>: {
  "type": <string>
  (, "hasSpan": <boolean> )?
  (, "category": "admin" | "token" | "zone" | <string> )?
  (, "set_name": <string> )?
  (, "attrs": [ <attr_desc>, ... ] )?
  (, "allAttributesKnown": <boolean> )?
  (, "display": <display_desc> )?
  (, "processable": <boolean> )?
  (, "managed": <boolean> )?
  (, "effective_labels": <el_desc> )?
}
<attr_desc>: {
  "name": <string>
  (, "type": "string" | "int" | "float" | "boolean" | "annotation" )?
  (, "category": "admin" | "token" | "zone" | <string> )?
  (, "set_name": <string> )?
  (, "managed": <boolean> )?
  (, "aggregation": "set" | "list" )?
  (, "default": <val> )?
  (, "default_is_text_span": true )?
  (, "choices": [ <val>+ ] )?
  (, "maxval": <int> | <float> )?
  (, "minval": <int> | <float> )?
  (, "label_restrictions": [ <label_restr_desc>+ ] )?
  (, "display": <attr_display_desc> )?
}
<display_desc>: {
  ( "accelerator": <string> )?
  (, "css": <string> )?
  (, "edit_immediately": true )?
  (, "presented_name": <string> )?
  (, "rendering_style": "normal" | "background_span" )?
  (, "overlap_rank": <int> )?
  (, "gestures" : [ <gesture_desc>+ ] )?
}
<attr_display_desc>: {
  ( "accelerator": <string> )?
  (, "css": <string> )?
  (, "edit_immediately": true )?
  (, "presented_name": <string> )?
  (, "overlap_rank": <int> )?
}
<el_desc>: {
  ( <string> : {
      "attr": <string>,
      "val": <val>,
      "display": <display_desc>
      (, "category": "admin" | "token" | "zone" | <string> )?
      (, "set_name": <string> )?
      (, "aggregation": "set" | "list" )?
    } )+
}
<gesture_desc>: {
  "label": <string>,
  "function": <string>
  (, "writeable_only": <boolean> )?
  (, "attribute_writeable_only": [ <string>+ ] )?
}
<label_restr_desc>: [ ( <string> | <complex_label_restr_desc> )+ ]
<complex_label_restr_desc>: [ <string>, [ [ <string>, <val> ]+ ] ]
The "type" corresponds to the "label" attribute of the
corresponding <annotation> element.
"hasSpan" indicates whether the annotation has a span. It is
optional, and defaults to true.
The "category" and "set_name" define the category and set name to which the annotation belongs.
"allAttributesKnown" indicates whether the annotation type should
block creation of attributes that aren't listed in the "attrs"
list. It is optional, and defaults to false.
"attrs", "display", and "effective_labels" are all optional.
"effective_labels" is limited to descriptors with exactly one
attribute which has a "choices" list, and the keys in
<el_desc> and the elements of the "choices" list must be
identical.
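The "effective_labels" constraint can be checked mechanically. The sketch below is one possible reading, not MAT's own validation code: it takes "exactly one attribute which has a choices list" to mean that exactly one entry in "attrs" carries "choices", and it compares the keys against the string form of each choice. The function name is hypothetical.

```python
def check_effective_labels(asd):
    """Return a list of problems with "effective_labels"; empty if none."""
    el = asd.get("effective_labels")
    if el is None:
        return []
    attrs = asd.get("attrs", [])
    # Assumption: the rule refers to attributes that carry a "choices" list.
    choice_attrs = [a for a in attrs if "choices" in a]
    if len(choice_attrs) != 1:
        return ["expected exactly one attribute with a 'choices' list"]
    problems = []
    expected = {str(c) for c in choice_attrs[0]["choices"]}
    if set(el) != expected:
        problems.append("effective label keys do not match 'choices'")
    names = {a["name"] for a in attrs}
    for key, entry in el.items():
        # Each entry's "attr" should name an attribute of this annotation.
        if entry.get("attr") not in names:
            problems.append("%s: 'attr' does not name an attribute" % key)
    return problems
```

An empty result means the descriptor passed these particular checks; it does not guarantee that MAT will accept it.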
The "type" of the attribute descriptor is optional, and defaults
to "string".
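A consumer of this JSON may find it convenient to make the documented defaults explicit before using a descriptor. The following sketch (the function name is ours, not part of any MAT library) fills in the three defaults stated above: "hasSpan" is true, "allAttributesKnown" is false, and an attribute's "type" is "string".

```python
import copy


def with_defaults(asd):
    """Return a copy of an <asd> with the documented defaults filled in."""
    out = copy.deepcopy(asd)  # leave the caller's descriptor untouched
    out.setdefault("hasSpan", True)
    out.setdefault("allAttributesKnown", False)
    for attr in out.get("attrs", []):
        attr.setdefault("type", "string")
    return out
```

Values that are already present are left as they are, so a descriptor with "hasSpan": false stays spanless.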
The "category" and "set_name" define the category and set name to which the attribute belongs. This can differ from that of the annotation which bears the attribute (e.g., a "part_of_speech" attribute of a "lex" tag may not be in the token category).
"default" must be a value appropriate for the type of the
attribute descriptor. "default" and "default_is_text_span" cannot
both be present.
"choices" is limited to attribute descriptors with "type" of
"string" or "int". The values in the list must be appropriate for
the type.
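The rules for "default" and "choices" lend themselves to a small check. This is a minimal sketch under our own assumptions, not MAT's validator: the function name is hypothetical, and "appropriate for the type" is interpreted narrowly as a Python str for "string" and a non-boolean int for "int".

```python
def check_attr(attr):
    """Return a list of problems with an <attr_desc>; empty if none found."""
    problems = []
    atype = attr.get("type", "string")  # documented default
    if "default" in attr and "default_is_text_span" in attr:
        problems.append("'default' and 'default_is_text_span' cannot co-occur")
    choices = attr.get("choices")
    if choices is not None:
        if atype not in ("string", "int"):
            problems.append("'choices' requires type 'string' or 'int'")
        else:
            pytype = str if atype == "string" else int
            for c in choices:
                # bool is a subclass of int in Python, so exclude it explicitly.
                if not isinstance(c, pytype) or isinstance(c, bool):
                    problems.append("choice %r is not a valid %s" % (c, atype))
    return problems
```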
"maxval" and "minval" are limited to attribute descriptors with
"type" of "int" or "float".
"label_restrictions" is limited to attribute descriptors with
"type" of "annotation".
The <label_restr_desc> requires some comment. The elements
of this list will be either a true label (not an effective
label), or a 2-element list consisting of a true label and a list
of 2-element attribute-value pair lists. The attribute must be a
choice attribute, and the <val> must be appropriate for the
attribute, and the attribute must be appropriate for the true
label. Note that this differs from the <label_restriction>
XML element, where the labels can be true labels or effective
labels. When MAT digests the XML, it unpacks any effective labels
it finds in label restrictions.
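Because each restriction is either a bare label or a label paired with attribute-value pairs, code that reads them usually wants a single uniform shape. The sketch below (a hypothetical helper, not part of the MAT libraries) converts a "label_restrictions" list of the kind shown in the examples in this section into (label, attribute dictionary) pairs.

```python
def normalize_label_restrictions(restrictions):
    """Turn a label restriction list into (true_label, {attr: val}) pairs."""
    result = []
    for item in restrictions:
        if isinstance(item, str):
            # A bare true label carries no attribute constraints.
            result.append((item, {}))
        else:
            # A 2-element list: the true label, then [attr, val] pairs.
            label, pairs = item
            result.append((label, {attr: val for attr, val in pairs}))
    return result
```

For instance, the input ["PERSON", ["ENAMEX", [["type", "LOCATION"]]]] yields [("PERSON", {}), ("ENAMEX", {"type": "LOCATION"})].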
The <el_desc> also doesn't correspond to what's found in
the XML. Any
effective labels that MAT finds when it digests the XML are
accumulated here at the level of the annotation, rather than the
attribute. The keys in this element are the string version of the
<val>, which must be an appropriate value for the "attr",
which is the name of an attribute for the annotation. The values
of "category" and "set_name" must be equivalent to those for the
attribute referenced. The "display" has the same form as the
display for the toplevel annotation.
The elements of <display_desc> are all optional, and
correspond to the <label> children of <annotation_display>.
Look there for details.
The <attr_display_desc> corresponds to the
<attribute> children of <annotation_display>.
These elements are specific to the JavaScript UI. We'll skip
documenting them for now.
Here is the result of generating the simplified output for the
Named Entity task:
[
  {
    "display": {
      "accelerator": "P",
      "css": "background-color: #CCFF66"
    },
    "type": "PERSON"
  },
  {
    "display": {
      "accelerator": "L",
      "css": "background-color: #FF99CC"
    },
    "type": "LOCATION"
  },
  {
    "display": {
      "accelerator": "O",
      "css": "background-color: #99CCFF"
    },
    "type": "ORGANIZATION"
  }
]
Here's the result for the ENAMEX version of the Named Entity task:
[
  {
    "attrs": [
      {
        "choices": [
          "PERSON",
          "LOCATION",
          "ORGANIZATION"
        ],
        "name": "type"
      }
    ],
    "effective_labels": {
      "LOCATION": {
        "attr": "type",
        "display": {
          "accelerator": "L",
          "css": "background-color: #FF99CC"
        },
        "val": "LOCATION"
      },
      "ORGANIZATION": {
        "attr": "type",
        "display": {
          "accelerator": "O",
          "css": "background-color: #99CCFF"
        },
        "val": "ORGANIZATION"
      },
      "PERSON": {
        "attr": "type",
        "display": {
          "accelerator": "P",
          "css": "background-color: #CCFF66"
        },
        "val": "PERSON"
      }
    },
    "type": "ENAMEX"
  }
]
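The two results above carry the same display information in different places: on the annotation itself in the first case, and under "effective_labels" in the ENAMEX case. A reader that wants one label-to-CSS table for either layout could proceed roughly as follows. This is an illustrative sketch with a hypothetical function name; it is not how the MAT UI itself is implemented.

```python
def label_styles(descriptors):
    """Map each true or effective label to its "css" string, where present."""
    styles = {}
    for asd in descriptors:
        display = asd.get("display")
        if display and "css" in display:
            styles[asd["type"]] = display["css"]
        # Effective labels keep their display under the annotation descriptor.
        for label, entry in asd.get("effective_labels", {}).items():
            css = entry.get("display", {}).get("css")
            if css is not None:
                styles[label] = css
    return styles
```

Applied to either result, this gives a table with entries for PERSON, LOCATION and ORGANIZATION; ENAMEX itself gets no entry because it has no "display" of its own.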
And finally, here's the result for the Enhanced Named Entity task:
[
  {
    "attrs": [
      {
        "choices": [
          "Proper name",
          "Noun",
          "Pronoun"
        ],
        "name": "nomtype"
      }
    ],
    "display": {
      "accelerator": "P",
      "css": "background-color: #CCFF66",
      "edit_immediately": true
    },
    "type": "PERSON"
  },
  {
    "attrs": [
      {
        "choices": [
          "Proper name",
          "Noun",
          "Pronoun"
        ],
        "name": "nomtype"
      },
      {
        "name": "is_political_entity",
        "type": "boolean"
      }
    ],
    "display": {
      "accelerator": "L",
      "css": "background-color: #FF99CC",
      "edit_immediately": true
    },
    "type": "LOCATION"
  },
  {
    "attrs": [
      {
        "choices": [
          "Proper name",
          "Noun",
          "Pronoun"
        ],
        "name": "nomtype"
      }
    ],
    "display": {
      "accelerator": "O",
      "css": "background-color: #99CCFF",
      "edit_immediately": true
    },
    "type": "ORGANIZATION"
  },
  {
    "attrs": [
      {
        "aggregation": "set",
        "label_restrictions": [
          "PERSON"
        ],
        "name": "mentions",
        "type": "annotation"
      }
    ],
    "display": {
      "accelerator": "C",
      "css": "background-color: lightgreen",
      "edit_immediately": true
    },
    "hasSpan": false,
    "type": "PERSON_COREF"
  },
  {
    "attrs": [
      {
        "label_restrictions": [
          "PERSON"
        ],
        "name": "actor",
        "type": "annotation"
      },
      {
        "label_restrictions": [
          "ORGANIZATION",
          "LOCATION"
        ],
        "name": "location",
        "type": "annotation"
      }
    ],
    "display": {
      "accelerator": "E",
      "css": "background-color: pink",
      "edit_immediately": true
    },
    "type": "LOCATED_EVENT"
  },
  {
    "attrs": [
      {
        "label_restrictions": [
          "ORGANIZATION",
          "PERSON"
        ],
        "name": "located",
        "type": "annotation"
      },
      {
        "label_restrictions": [
          "LOCATION"
        ],
        "name": "location",
        "type": "annotation"
      }
    ],
    "display": {
      "accelerator": "R",
      "css": "background-color: orange",
      "edit_immediately": true
    },
    "hasSpan": false,
    "type": "LOCATION_RELATION"
  }
]
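The Enhanced Named Entity result is the first one here in which annotations point at other annotations and some annotations have no span. As a rough sketch of how a consumer might summarize that, the two hypothetical helpers below list which labels each annotation-valued attribute may reference and which types are spanless. They rely only on the fields shown above and on the documented default of true for "hasSpan".

```python
def annotation_links(descriptors):
    """Map (annotation type, attribute name) to the labels it may reference."""
    links = {}
    for asd in descriptors:
        for attr in asd.get("attrs", []):
            if attr.get("type") != "annotation":
                continue
            for item in attr.get("label_restrictions", []):
                # A restriction is a bare label or a [label, pairs] list.
                label = item if isinstance(item, str) else item[0]
                links.setdefault((asd["type"], attr["name"]), []).append(label)
    return links


def spanless_types(descriptors):
    """Return the types whose "hasSpan" is false (it defaults to true)."""
    return [asd["type"] for asd in descriptors if not asd.get("hasSpan", True)]
```

On the result above, spanless_types returns PERSON_COREF and LOCATION_RELATION, and annotation_links shows, for example, that the "location" attribute of LOCATED_EVENT may reference ORGANIZATION or LOCATION annotations.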