Annotation and annotation display XML use cases

This document describes use cases for the XML format for annotations and their display properties in task files (see "Creating a New Task"). The reference document is found here.

This document focuses exclusively on the definition of annotations and their display properties. There are two subsystems that provide this capability. The <annotations> element and its children provide the new subsystem, and the <annotation_set_descriptors> element and its children, together with the <annotation_display> element and its children, provide the legacy subsystem. Other customizations you can make in your task.xml file are described separately.

Conceptual differences between the new and legacy subsystems

If you don't care about the legacy subsystem, feel free to skip this section.

The primary differences between the new <annotations> subsystem and the legacy <annotation_set_descriptors> subsystem are organizational.

First, in the legacy system, all the elements of each annotation set descriptor are defined within a single <annotation_set_descriptor> element. Because attributes can be defined in different sets than the annotations that bear them, attributes in the legacy system are defined outside the scope of the annotations that bear them, which introduced additional verbosity and made it harder to comprehend the annotation inventory. In the new system, attributes are defined within the scope of their annotations, and there's no separate container for annotation set descriptors; the set and category of the annotations and attributes are recorded using the "of_set" and "of_category" XML attributes. This makes it harder to comprehend the set descriptor structure, but complex annotation attribute structures are much more common than multiple set descriptors, and so the new system simplifies the former at the expense of the latter.

Second, in the legacy system, display properties were defined in an entirely separate section of the task.xml file, which led to additional verbosity and introduced opportunities for error. This separate display section has been eliminated in the new system, and all display-relevant attributes are prefixed with "d_" for clarity.

Third, in the new system, we've introduced a number of shorthands to define spanned vs. spanless annotations, different types of attributes (<int>, <string>, etc.) and set and list attributes (<int_set>, <int_list>, etc.). We've also eliminated some extraneous element levels (<range>, for instance, is now expressed as attributes on its former parent).

Fourth, in the legacy system, the visual menu hierarchy of annotations was defined in this separate display section. In the new system, the <display_group> element can be used to create hierarchies of declarations of annotation types.

Note: if you make use of task inheritance (i.e., if you use the "parent" attribute on your <task> element in your task.xml file), you may have good reasons to use the legacy subsystem; it's only in the legacy subsystem that you can inherit annotation types from the parent task and add attributes or display behavior to that type.

In the sections below, we'll exemplify both the new and legacy methods.

Defining content annotations

The simplest example of customizing your annotations in your task.xml file is inheriting all your structural annotations and adding your own content annotations. The role of the different annotation categories is described here.

New:

  <annotations inherit="category:zone,category:token">
<span label="TAG1"/>
<span label="TAG2"/>
</annotations>

Legacy:

  <annotation_set_descriptors inherit="category:zone,category:token">
<annotation_set_descriptor name="content" category="content">
<annotation label="TAG1"/>
<annotation label="TAG2"/>
</annotation_set_descriptor>
</annotation_set_descriptors>

So here, we've inherited the structure annotations and defined two content annotations, TAG1 and TAG2. The content annotations are both spanned annotations, by default.

Defining spanless content annotations

Not all content annotations are spanned annotations; some annotations aren't anchored directly to the text. You can find examples of such annotations in Tutorial 7. It's easy to define these annotations.

New:

  <annotations inherit="category:zone,category:token">
...
<spanless label="SPANLESS1"/>
...
</annotations>

Legacy:

  <annotation_set_descriptors inherit="category:zone,category:token">
<annotation_set_descriptor name="content" category="content">
...
<annotation label="SPANLESS1" span="no"/>
...
</annotation_set_descriptor>
</annotation_set_descriptors>

The UI effects of defining spanless annotations are described here.

Defining attributes

Annotations, spanned or spanless, can have attributes. These attributes can be strings (the default), floats, integers, Booleans, or other annotations, or sets or lists of these types. Here's how to define a simple string attribute.

New:

  <annotations inherit="category:zone,category:token">
...
<span label="TAG1">
...
<string name="string_attr"/>
...
</span>
...
</annotations>

Legacy:

  <annotation_set_descriptors inherit="category:zone,category:token">
<annotation_set_descriptor name="content" category="content">
...
<attribute of_annotation="TAG1" name="string_attr"/>
...
</annotation_set_descriptor>
</annotation_set_descriptors>

String attributes can have default values, or choices.

New:

  <annotations inherit="category:zone,category:token">
...
<span label="TAG1">
...
<string name="string_attr" default="Pronoun"
choices="Pronoun,Nominal,Proper name"/>
...
</span>
...
</annotations>

Legacy:

  <annotation_set_descriptors inherit="category:zone,category:token">
<annotation_set_descriptor name="content" category="content">
...
<attribute of_annotation="TAG1" name="string_attr" default="Pronoun">
<choice>Pronoun</choice>
<choice>Nominal</choice>
<choice>Proper name</choice>
</attribute>
...
</annotation_set_descriptor>
</annotation_set_descriptors>

Integer and float attributes can be defined with accepted ranges.

New:

  <annotations inherit="category:zone,category:token">
...
<span label="TAG1">
...
<int name="int_attr" range_from="10" range_to="20"/>
...
</span>
...
</annotations>

Legacy:

  <annotation_set_descriptors inherit="category:zone,category:token">
<annotation_set_descriptor name="content" category="content">
...
<attribute of_annotation="TAG1" type="int" name="int_attr">
<range from="10" to="20"/>
</attribute>
...
</annotation_set_descriptor>
</annotation_set_descriptors>
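
Float attributes should follow the same pattern. The sketch below assumes that the new subsystem offers a <float> shorthand alongside <int>, and that the legacy type value is "float"; neither spelling appears elsewhere in this document, so check both against the reference document. The attribute name float_attr is hypothetical.

```xml
<!-- New (assumes a <float> shorthand exists) -->
<span label="TAG1">
  <float name="float_attr" range_from="0.0" range_to="1.0"/>
</span>

<!-- Legacy (assumes type="float" is the accepted type name) -->
<attribute of_annotation="TAG1" type="float" name="float_attr">
  <range from="0.0" to="1.0"/>
</attribute>
```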

Annotation attributes must have label restrictions that specify what types of annotations can fill this attribute value (more examples here).

New:

  <annotations inherit="category:zone,category:token">
...
<span label="TAG1">
...
<filler name="annot_attr" filler_types="TAG2"/>
...
</span>
...
</annotations>

Legacy:

  <annotation_set_descriptors inherit="category:zone,category:token">
<annotation_set_descriptor name="content" category="content">
...
<attribute of_annotation="TAG1" type="annotation" name="annot_attr">
<label_restriction label="TAG2"/>
</attribute>
...
</annotation_set_descriptor>
</annotation_set_descriptors>

And any of these attributes can be set or list aggregations.

New:

  <annotations inherit="category:zone,category:token">
...
<span label="TAG1">
...
<filler_set name="mentions" filler_types="TAG2"/>
...
</span>
...
</annotations>

Legacy:

  <annotation_set_descriptors inherit="category:zone,category:token">
<annotation_set_descriptor name="content" category="content">
...
<attribute of_annotation="TAG1" type="annotation" aggregation="set" name="mentions">
<label_restriction label="TAG2"/>
</attribute>
...
</annotation_set_descriptor>
</annotation_set_descriptors>
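
The same aggregation idea should apply to the non-annotation types. The fragment below is a sketch only: it uses the <int_set> and <int_list> shorthands named in the overview of the new subsystem, and it assumes that the legacy aggregation attribute accepts "list" by analogy with the "set" value shown above. The attribute names are hypothetical.

```xml
<!-- New: set and list shorthands for integer attributes -->
<span label="TAG1">
  <int_set name="int_set_attr"/>
  <int_list name="int_list_attr"/>
</span>

<!-- Legacy: aggregation="list" is an assumption, not shown in this document -->
<attribute of_annotation="TAG1" type="int" aggregation="set" name="int_set_attr"/>
<attribute of_annotation="TAG1" type="int" aggregation="list" name="int_list_attr"/>
```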

Defining a single content annotation, partitioned by attribute values

In some situations, you may want to define a single content annotation, which has a distinguished attribute value. One common example of this in language processing arises in tagging for so-called named entities (people, locations, organizations). One common tagging scheme assigns a single ENAMEX tag to these entities, and distinguishes among them using the value of the "type" attribute. This label + attribute/value pair is assigned a notional name, for use in the UI, scorer, etc. We call these effective labels.

Effective labels must be defined on choice restrictions of string or integer attributes. If an effective label is declared for one of the choices, there must be a declaration for all of them. In other words, the choices must completely partition the label.

New:

  <annotations inherit="category:zone,category:token">
...
<span label="ENAMEX">
...
<effective_labels name="type">
<label label="PERSON" value="PER"/>
<label label="ORGANIZATION" value="ORG"/>
<label label="LOCATION" value="LOC"/>
</effective_labels>
...
</span>
...
</annotations>

Legacy:

  <annotation_set_descriptors inherit="category:zone,category:token">
<annotation_set_descriptor name="content" category="content">
...
<annotation label="ENAMEX"/>
<attribute of_annotation="ENAMEX" name="type">
<choice effective_label="PERSON">PER</choice>
<choice effective_label="ORGANIZATION">ORG</choice>
<choice effective_label="LOCATION">LOC</choice>
</attribute>
...
</annotation_set_descriptor>
</annotation_set_descriptors>
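
Since effective labels may also be declared on integer choice attributes, a legacy definition along the following lines should be possible. This is a sketch that combines the type="int" choice syntax with the effective_label attribute, both shown in this document; the RATING label, the level attribute, and the LOW and HIGH effective labels are hypothetical. Note that every choice carries an effective label, so the choices completely partition the label.

```xml
<annotation label="RATING"/>
<attribute of_annotation="RATING" type="int" name="level">
  <!-- every choice declares an effective label -->
  <choice effective_label="LOW">0</choice>
  <choice effective_label="HIGH">1</choice>
</attribute>
```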

Defining complex annotation-valued attributes

You can define complex restrictions on annotation-valued attributes in a number of ways. These restrictions consist of a label and its attributes; the attributes must be choice attributes (i.e., string or integer attributes with choices defined). The availability of these restrictions is independent of whether an effective label is defined for the attribute.

Here's an example fragment. It starts with the effective label attribute definition from the previous example, but defines a second (nonsensical) integer choice attribute.

New:

    <annotations inherit="category:zone,category:token">
...
<span label="ENAMEX">
...
<effective_labels name="type">
<label label="PERSON" value="PER"/>
<label label="ORGANIZATION" value="ORG"/>
<label label="LOCATION" value="LOC"/>
</effective_labels>
<int name="size" choices="0,1"/>
...
</span>
<span label="LOCATED">
<filler name="who">
<filler_type label="ENAMEX" type="PER" size="1"/>
</filler>
</span>
...
</annotations>

Legacy:

  <annotation_set_descriptors inherit="category:zone,category:token">
<annotation_set_descriptor name="content" category="content">
...
<annotation label="ENAMEX"/>
<attribute of_annotation="ENAMEX" name="type">
<choice effective_label="PERSON">PER</choice>
<choice effective_label="ORGANIZATION">ORG</choice>
<choice effective_label="LOCATION">LOC</choice>
</attribute>
<attribute of_annotation="ENAMEX" type="int" name="size">
<choice>0</choice>
<choice>1</choice>
</attribute>
<!-- and now, the annotation-valued attribute -->
<annotation label="LOCATED"/>
<attribute of_annotation="LOCATED" name="who" type="annotation">
<label_restriction label="ENAMEX">
<attributes type="PER" size="1"/>
</label_restriction>
</attribute>
...
</annotation_set_descriptor>
</annotation_set_descriptors>

The label restriction itself can refer either to a true or an effective label, and the effective label can be combined with additional attribute restrictions, as long as those attributes are choice attributes:

New:

      <span label="LOCATED">
<filler name="who">
<filler_type label="PERSON" size="1"/>
</filler>
</span>

Legacy:

      <annotation label="LOCATED"/>
<attribute of_annotation="LOCATED" name="who" type="annotation">
<label_restriction label="PERSON">
<attributes size="1"/>
</label_restriction>
</attribute>

Defining relations among annotations

As you can see from the previous example, you can use annotation-valued attributes and label restrictions to create relations among annotations. This is the only facility that MAT provides for making these connections. We acknowledge that this approach has limitations.

So, for instance, how might you represent an array of time restrictions (before, after, etc.) on an event? Here are three different strategies.

Strategy 1: a slot for each restriction type

New:

  <annotations inherit="category:zone,category:token">
...
<span label="TIME"/>
<!-- this annotation can be spanned or spanless -->
<span label="EVENT">
<!-- if you're anticipating multiple times for a restriction type,
make these set aggregations -->
<filler name="before" filler_types="TIME"/>
<filler name="after" filler_types="TIME"/>
</span>
...
</annotations>

Legacy:

  <annotation_set_descriptors inherit="category:zone,category:token">
<annotation_set_descriptor name="content" category="content">
...
<annotation label="TIME"/>
<!-- this annotation can be spanned or spanless -->
<annotation label="EVENT"/>
<!-- if you're anticipating multiple times for a restriction type,
make these set aggregations -->
<attribute of_annotation="EVENT" type="annotation" name="before">
<label_restriction label="TIME"/>
</attribute>
<attribute of_annotation="EVENT" type="annotation" name="after">
<label_restriction label="TIME"/>
</attribute>
...
</annotation_set_descriptor>
</annotation_set_descriptors>

The obvious problem with this strategy is that you might have many, many temporal relations you care about, and/or you may want to provide attributes for the temporal relations.
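
If you do anticipate several times for one restriction type, the set-aggregation variant suggested in the comments above might look like the following sketch. It reuses the <filler_set> shorthand and the aggregation="set" attribute shown earlier in this document; nothing else is assumed.

```xml
<!-- New -->
<span label="TIME"/>
<span label="EVENT">
  <filler_set name="before" filler_types="TIME"/>
  <filler_set name="after" filler_types="TIME"/>
</span>

<!-- Legacy -->
<annotation label="TIME"/>
<annotation label="EVENT"/>
<attribute of_annotation="EVENT" type="annotation" aggregation="set" name="before">
  <label_restriction label="TIME"/>
</attribute>
<attribute of_annotation="EVENT" type="annotation" aggregation="set" name="after">
  <label_restriction label="TIME"/>
</attribute>
```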

Strategy 2: separate relations for time

New:

  <annotations inherit="category:zone,category:token">
...
<span label="TIME"/>
<!-- this annotation can be spanned or spanless -->
<span label="EVENT"/>
<!-- so can this annotation -->
<span label="BEFORE">
<filler name="event" filler_types="EVENT"/>
<filler name="time" filler_types="TIME"/>
</span>
...
</annotations>

Legacy:

  <annotation_set_descriptors inherit="category:zone,category:token">
<annotation_set_descriptor name="content" category="content">
...
<annotation label="TIME"/>
<!-- this annotation can be spanned or spanless -->
<annotation label="EVENT"/>
<!-- so can this annotation -->
<annotation label="BEFORE"/>
<attribute of_annotation="BEFORE" type="annotation" name="event">
<label_restriction label="EVENT"/>
</attribute>
<attribute of_annotation="BEFORE" type="annotation" name="time">
<label_restriction label="TIME"/>
</attribute>
 ...
</annotation_set_descriptor>
</annotation_set_descriptors>

You could generalize this strategy by having a single temporal relation with an attribute to indicate what kind of temporal relation it is.

New:

  <annotations inherit="category:zone,category:token">
...
<span label="TIME"/>
<!-- this annotation can be spanned or spanless -->
<span label="EVENT"/>
<!-- so can this annotation -->
<span label="TEMPORAL">
<filler name="event" filler_types="EVENT"/>
<filler name="time" filler_types="TIME"/>
<string name="type" choices="BEFORE,AFTER,..."/>
</span>
...
</annotations>

Legacy:

  <annotation_set_descriptors inherit="category:zone,category:token">
<annotation_set_descriptor name="content" category="content">
...
<annotation label="TIME"/>
<!-- this annotation can be spanned or spanless -->
<annotation label="EVENT"/>
<!-- so can this annotation -->
<annotation label="TEMPORAL"/>
<attribute of_annotation="TEMPORAL" type="annotation" name="event">
<label_restriction label="EVENT"/>
</attribute>
<attribute of_annotation="TEMPORAL" type="annotation" name="time">
<label_restriction label="TIME"/>
</attribute>
<attribute of_annotation="TEMPORAL" name="type">
<choice>BEFORE</choice>
<choice>AFTER</choice>
...
</attribute>
 ...
</annotation_set_descriptor>
</annotation_set_descriptors>

The obvious problem with this strategy is that the temporal relations are separated from the events they modify (because there's no way of showing or representing relations as subordinate attributes).

Strategy 3: subordinate the temporal relation

This strategy is a combination of the first two.

New:

  <annotations inherit="category:zone,category:token">
...
<span label="TIME"/>
<!-- this annotation can be spanned or spanless -->
<span label="TEMPORAL">
<filler name="time" filler_types="TIME"/>
<string name="type" choices="BEFORE,AFTER,..."/>
</span>
<!-- so can this annotation; it follows TEMPORAL because it refers to it -->
<span label="EVENT">
<filler_set name="temporal" filler_types="TEMPORAL"/>
</span>
...
</annotations>

Legacy:

  <annotation_set_descriptors inherit="category:zone,category:token">
<annotation_set_descriptor name="content" category="content">
...
<annotation label="TIME"/>
<!-- this annotation can be spanned or spanless -->
<annotation label="EVENT"/>
<!-- so can this annotation -->
<annotation label="TEMPORAL"/>
<attribute of_annotation="TEMPORAL" type="annotation" name="time">
<label_restriction label="TIME"/>
</attribute>
<attribute of_annotation="TEMPORAL" name="type">
<choice>BEFORE</choice>
<choice>AFTER</choice>
...
</attribute>
<attribute of_annotation="EVENT" type="annotation" aggregation="set" name="temporal">
<label_restriction label="TEMPORAL"/>
</attribute>
 ...
</annotation_set_descriptor>
</annotation_set_descriptors>

The distinction here is subtle: instead of TEMPORAL being a two-place relation between an event and a time, it has only one argument, the time, and its relation to the EVENT is represented by its presence in the "temporal" set-aggregation annotation-valued attribute.

The obvious disadvantage to this strategy is that it doesn't correspond trivially to what we'd think of as the "correct event logic". However, given that we're talking about annotations, not objects in a knowledge representation, it might ultimately be the proper compromise.

Defining multiple content annotation sets

Every task in MAT has multiple annotation sets defined: at the very least, the task will define the "admin" set, which contains the SEGMENT annotation. And all the examples up to this point inherit zone and token annotation set categories from the root task. However, all the examples so far also have defined a single content annotation set (i.e., a set which is neither admin, zone, nor token).

But there's no reason you have to do this; defining multiple annotation sets is absolutely trivial. Here's a simple example.

New:

  <annotations inherit="category:zone,category:token">
<span label="TAG1" of_set="content1"/>
<span label="TAG2" of_set="content2"/>
</annotations>

Legacy:

  <annotation_set_descriptors inherit="category:zone,category:token">
<annotation_set_descriptor name="content1" category="content">
<annotation label="TAG1"/>
</annotation_set_descriptor>
<annotation_set_descriptor name="content2" category="content">
<annotation label="TAG2"/>
</annotation_set_descriptor>
</annotation_set_descriptors>

As you can see, in the new subsystem, it's no more complicated than setting the of_set attribute (and of_category, if it's other than "content"); and in the legacy subsystem, it's no more complicated than placing the annotations and attributes in different <annotation_set_descriptor> elements. Note that definition order is important; you can't refer to an annotation label (say, in a <label_restriction> in the legacy subsystem or a filler_types attribute in the new subsystem) before it's defined.
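
As an illustration of the ordering constraint, here is a sketch in the new subsystem in which an annotation in one set refers to an annotation in another. TAG1 is declared first so that the filler_types reference on TAG2 can be resolved; the attribute name "target" is hypothetical.

```xml
<annotations inherit="category:zone,category:token">
  <!-- TAG1 is defined first, so the filler below can refer to it -->
  <span label="TAG1" of_set="content1"/>
  <span label="TAG2" of_set="content2">
    <filler name="target" filler_types="TAG1"/>
  </span>
</annotations>
```

Swapping the two <span> declarations would place the reference to TAG1 before its definition, which is the situation the note above warns against.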

There are many reasons you may not want all your content annotations in the same set. For instance, you may want to structure your annotation task so that all your mentions are annotated before any of your relations are added. You may also have multiple engines you're trying to use, each of which works on a different subset of your annotations. In these cases, you might want multiple annotation steps; and since steps are linked to annotation types at the granularity of annotation sets, if you have multiple annotation steps, you'll want multiple annotation sets.

Note that there are reasons to have multiple annotation sets even if you don't map them to multiple steps. You may have a single annotation step, which adds multiple sets, and in this hand annotation step in the UI, you'll now have multiple active annotation sets, any of which you can disable or hide as a set, rather than annotation by annotation.

Defining attributes in their own annotation set

One useful feature of the way annotation types are defined in MAT is that attributes can be placed in annotation sets other than the set of the annotations that bear them. In the legacy subsystem, attributes are intentionally defined outside the scope of the annotation in order to emphasize this; in the new subsystem, it's managed via XML attributes. For instance, perhaps you want your event tagging workflow to separate finding the event spans from connecting them with their arguments. You could do this:

New:

  <annotations inherit="category:zone,category:token">
<span label="TAG1" of_set="mentions"/>
<span label="TAG2" of_set="mentions"/>
<span label="EVENT1" of_set="events">
<filler name="arg1" of_set="args" filler_types="TAG1"/>
<filler name="arg2" of_set="args" filler_types="TAG2"/>
</span>
</annotations>

Legacy:

  <annotation_set_descriptors inherit="category:zone,category:token">
<annotation_set_descriptor name="mentions" category="content">
<annotation label="TAG1"/>
<annotation label="TAG2"/>
</annotation_set_descriptor>
<annotation_set_descriptor name="events" category="content">
<annotation label="EVENT1"/>
</annotation_set_descriptor>
<annotation_set_descriptor name="args" category="content">
<attribute of_annotation="EVENT1" name="arg1" type="annotation">
<label_restriction label="TAG1"/>
</attribute>
<attribute of_annotation="EVENT1" name="arg2" type="annotation">
<label_restriction label="TAG2"/>
</attribute>
</annotation_set_descriptor>
</annotation_set_descriptors>

In the UI, when you enter a step which adds the "events" set, you'll be able to add, delete, and modify the extent of event spans, but not edit their attributes, and when you enter a step which adds the "args" set, you'll be able to edit the attributes of your event spans, but not delete them, modify their extents, or add new ones.

Customizing the annotation presentation

Defining content annotation colors

The simplest way of customizing the UI behavior of your content annotations is to assign some CSS to distinguish them in the Web UI. The most appropriate way to do this is to use background colors; using font weight or style, or text color, as the sole distinguishing feature will fail in document alignment, comparison, and reconciliation for span annotations, and will always fail for spanless annotations, which don't cover any text.

New:

  <annotations>
...
<span label="TAG1" d_css="background-color: blue"/>
<span label="TAG2" d_css="background-color: green"/>
...
</annotations>

Legacy:

  <annotation_display>
...
<label name="TAG1" css="background-color: blue"/>
<label name="TAG2" css="background-color: green"/>
...
</annotation_display>

(In the legacy case, we assume that TAG1 and TAG2 are defined as true or effective labels in your task.)

We've assigned TAG1 a blue background color, and TAG2 a green background color. Since you're using CSS, you can assign colors using hexadecimal designations as well (or, if you prefer, set a background image, or other wacky things).
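
For instance, a hexadecimal color can be written directly into the attribute value. This sketch uses only the d_css and css attributes shown above; the particular color value is arbitrary.

```xml
<!-- New -->
<span label="TAG1" d_css="background-color: #99CCFF"/>

<!-- Legacy -->
<label name="TAG1" css="background-color: #99CCFF"/>
```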

One caveat: at the moment, annotation spans are styled on a token-by-token basis. So if, for instance, you want to have a left bracket at the left end of an annotation, and a right bracket at the right end, you can't do that quite yet; you'd end up with each token bracketed.

Changing the annotation foreground font

Sometimes the color you choose is too dark to see the text, in which case you can use CSS to change the text color.

New:

  <annotations>
...
<span label="TAG1" d_css="background-color: red; color: white"/>
...
</annotations>

Legacy:

  <annotation_display>
...
<label name="TAG1" css="background-color: red; color: white"/>
...
</annotation_display>

(In the legacy case, we assume that TAG1 is defined as a true or effective label in your task.)

Remember, the value of the css/d_css attribute is really CSS; it's not converted or processed in any way before it's inserted into the CSS rules in the Web UI. The one caveat is that the CSS is applied to each token in the annotated phrase, not to the phrase as a whole.

Ensuring that some annotations appear "in front" of others

If you're marking longer spans and shorter spans (say, paragraphs or sentences on the one hand, and entity mentions on the other), you might want to ensure that the entity mentions always appear "in front" of the longer spans when annotations aren't stacked. By default, the order in which annotation display elements are defined in your task.xml file is the back-to-front overlap order of the annotations in the UI:

New:

  <annotations>
...
<span label="PARAGRAPH" d_css="background-color: blue"/>
<span label="PERSON" d_css="background-color: red"/>
...
</annotations>

Legacy:

  <annotation_display>
...
<label name="PARAGRAPH" css="background-color: blue"/>
<label name="PERSON" css="background-color: red"/>
...
</annotation_display>

(In the legacy case, we assume that PARAGRAPH and PERSON are defined as true or effective labels in your task.)

However, you may be relying on the order of definition of the annotations to be the order that the annotations appear in the legend and the annotation popup menu, and you may not want those orders to be the same. There are two ways to control this order.

First, as of MAT 3.1, you can use the new overlap_rank attribute to control the back-to-front order explicitly. All labels which lack the overlap_rank display attribute will be ordered back-to-front in the order they're defined, and in front of them, all labels which have an overlap_rank attribute will be ordered back-to-front in the order of their overlap_rank attribute. So in this case below, the overlap rank will be, back-to-front, "SECTION", "PARAGRAPH", "LOCATION", "PERSON":

New:

  <annotations>
...
<span label="PERSON" d_css="background-color: red" d_overlap_rank="2"/>
<span label="SECTION" d_css="background-color: pink"/>
<span label="PARAGRAPH" d_css="background-color: blue"/>
<span label="LOCATION" d_css="background-color: green" d_overlap_rank="1"/>
...
</annotations>

Legacy:

  <annotation_display>
...
<label name="PERSON" css="background-color: red" overlap_rank="2"/>
<label name="SECTION" css="background-color: pink"/>
<label name="PARAGRAPH" css="background-color: blue"/>
<label name="LOCATION" css="background-color: green" overlap_rank="1"/>
...
</annotation_display>

Second, as of MAT 3.2, you can use the "background_span" value of the new rendering_style attribute to force some labels to (a) never be stacked, and (b) appear behind all spans which do not have this attribute value. E.g.:

New:

  <annotations>
...
<span label="PERSON" d_css="background-color: red"/>
<span label="SECTION" d_css="background-color: pink" d_rendering_style="background_span"/>
<span label="PARAGRAPH" d_css="background-color: blue" d_rendering_style="background_span"/>
<span label="LOCATION" d_css="background-color: green"/>
...
</annotations>

Legacy:

  <annotation_display>
...
<label name="PERSON" css="background-color: red"/>
<label name="SECTION" css="background-color: pink" rendering_style="background_span"/>
<label name="PARAGRAPH" css="background-color: blue" rendering_style="background_span"/>
<label name="LOCATION" css="background-color: green"/>
...
</annotation_display>

In this example, the PERSON label will occur behind the LOCATION label, and the SECTION label will occur behind the PARAGRAPH label, and both the SECTION and PARAGRAPH labels will occur behind the PERSON and LOCATION labels. In addition, the SECTION and PARAGRAPH labels will never be stacked, no matter how the stacking in the UI is configured.

These two back-to-front ordering control options interact with each other properly; so you can use overlap_rank to explicitly control the order within the background spans, and within the other spans.
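
A sketch of the two options used together, in the new subsystem, might look like the following. It combines only the d_overlap_rank and d_rendering_style attributes shown above. If the interaction works as described, the back-to-front order should be SECTION, PARAGRAPH (the background spans, ordered by rank), then LOCATION, PERSON (the other spans, ordered by rank), regardless of definition order.

```xml
<annotations>
  <!-- ordinary spans, ranked among themselves -->
  <span label="PERSON" d_css="background-color: red" d_overlap_rank="2"/>
  <span label="LOCATION" d_css="background-color: green" d_overlap_rank="1"/>
  <!-- background spans, ranked among themselves and never stacked -->
  <span label="PARAGRAPH" d_css="background-color: blue"
        d_rendering_style="background_span" d_overlap_rank="2"/>
  <span label="SECTION" d_css="background-color: pink"
        d_rendering_style="background_span" d_overlap_rank="1"/>
</annotations>
```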

Customizing annotation editing

Controlling the label order in the legend and annotation popup menu

By default, the order in which labels are defined does not control the order in which they appear in the legend and the annotation popup menu. Instead, the default behavior is for the labels to appear in alphabetical order. The motivation for this default is that as your label set grows, it becomes harder and harder to easily find the label you're interested in, so imposing a default order that the task definer doesn't have to think about is useful. However, this may not be useful behavior for some tasks; either there may be an intuitive non-alphabetical order, or the task may define cascades for its annotation popup menu. In this case, you can force the UI to respect the definition order in the task, as follows:

<task name="My task">
...
<web_customization alphabetize_labels="no"/>
...
</task>

Note that if you do this, you may need to pay additional attention to the back-to-front order of spanned labels.

Defining keyboard accelerators

The display element also supports the option of having keyboard accelerators. These are keys that the user can press when the tagging menu is visible in the UI, which are equivalent to having selected that menu item. You can add an accelerator using an attribute on the element.

New:

  <annotations>
...
<span label="TAG1" d_accelerator="1" d_css="background-color: blue"/>
<span label="TAG2" d_accelerator="2" d_css="background-color: green"/>
...
</annotations>

Legacy:

  <annotation_display>
...
<label name="TAG1" accelerator="1" css="background-color: blue"/>
<label name="TAG2" accelerator="2" css="background-color: green"/>
...
</annotation_display>

(In the legacy case, we assume that TAG1 and TAG2 are defined as true or effective labels in your task.)

It's probably a good idea to choose the accelerators mnemonically (the first letter of the menu item name is always a good mnemonic, unless of course more than one item starts with the same letter). Be careful, though; MAT doesn't yet ensure that there are no clashes among accelerators.
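
A mnemonic assignment along these lines might look like the sketch below. It assumes that single letters are accepted as accelerator values in the same way as the digits shown above; how upper and lower case are treated is not covered here, so check the reference document. The labels are illustrative.

```xml
<!-- New -->
<span label="PERSON" d_accelerator="P" d_css="background-color: red"/>
<span label="LOCATION" d_accelerator="L" d_css="background-color: green"/>

<!-- Legacy -->
<label name="PERSON" accelerator="P" css="background-color: red"/>
<label name="LOCATION" accelerator="L" css="background-color: green"/>
```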

Editing annotations immediately after creation

Spanless annotations will pop up an annotation editor immediately when they're created. If you want this behavior for spanned annotations, you can specify it.

New:

  <annotations>
...
<span label="TAG1" d_css="background-color: red" d_edit_immediately="yes"/>
...
</annotations>

Legacy:

  <annotation_display>
...
<label name="TAG1" css="background-color: red" edit_immediately="yes"/>
...
</annotation_display>

(In the legacy case, we assume that TAG1 is defined as a true or effective label in your task.)

Changing the presented name of an annotation

When an annotation is described in the UI (e.g., as an annotation attribute value in the annotation tables or annotation editors, or in the title bar of an annotation editor), it has a default presentation, which, for spanned annotations, is the string the span covers, and for spanless annotations, is the label and annotation index. There are many times when you might want a different default presented name, e.g., if the label tends to traverse enormous spans. Here's an example of using the presented_name attribute to truncate the text.

New:

  <annotations>
...
<span label="TAG1" d_css="background-color: red" d_presented_name="$(_text:truncate=20)"/>
...
</annotations>

Legacy:

  <annotation_display>
...
<label name="TAG1" css="background-color: red" presented_name="$(_text:truncate=20)"/>
...
</annotation_display>

(In the legacy case, we assume that TAG1 is defined as a true or effective label in your task.)

There's a fairly extensive formatting language which is available for the presented_name attribute; see the annotation XML reference for details.

Controlling the size of string attribute input widgets

Sometimes string attribute values are expected to be short, and sometimes long. By default, non-choice string attribute values get a short type-in field in the annotation editor. If you want a long string input instead, you can specify it.

New:

<annotations>
  ...
  <span label="TAG1">
    <string name="attr1" d_editor_style="long_string"/>
  </span>
  ...
</annotations>

Legacy:

<annotation_display>
  ...
  <attribute name="attr1" of_annotation="TAG1" editor_style="long_string"/>
  ...
</annotation_display>

(In the legacy case, we assume that TAG1 is defined as a true or effective label in your task, with attr1 as one of its attributes.)

Adding a custom attribute editor

You may have a string attribute which is actually a date, and you'd like to populate it with a calendar widget; or you might want to look up the annotation text in a database and use the results to populate the attribute value. If you're willing to do some programming, you can associate an arbitrary JavaScript function with a string, int, or float attribute to use as its editor.

New:

<web_customization>
  <js>js/yourTaskCustomizations.js</js>
</web_customization>
<annotations>
  ...
  <span label="TAG1">
    <string name="attr1" d_custom_editor="yourJavascriptFunction"/>
  </span>
  ...
</annotations>

Legacy:

<web_customization>
  <js>js/yourTaskCustomizations.js</js>
</web_customization>
<annotation_display>
  ...
  <attribute name="attr1" of_annotation="TAG1" custom_editor="yourJavascriptFunction"/>
  ...
</annotation_display>

(In the legacy case, we assume that TAG1 is defined as a true or effective label in your task, with attr1 as one of its attributes.)

Define your function in a file in your task directory, such as js/yourTaskCustomizations.js (the name is up to you); the file is loaded when the UI loads. Unfortunately, we don't have the resources to document the API this function must conform to; the code which governs this capability is the CustomEditorCellDisplay implementation in MAT_PKG_HOME/web/htdocs/js/mat_doc_display.js.

Using cascaded menus for more and less specialized tags

Let's say that you've defined the following annotations.

New:

<annotations>
  <span label="PERSON"/>
  <span label="MAN"/>
  <span label="WOMAN"/>
  <span label="US-LOCATION"/>
  <span label="FOREIGN-LOCATION"/>
</annotations>

Legacy:

<annotation_set_descriptors>
  <annotation_set_descriptor name="content" category="content">
    <annotation label="PERSON"/>
    <annotation label="MAN"/>
    <annotation label="WOMAN"/>
    <annotation label="US-LOCATION"/>
    <annotation label="FOREIGN-LOCATION"/>
  </annotation_set_descriptor>
</annotation_set_descriptors>

Your annotator is instructed to label people, using PERSON as the annotation if she can't tell which of MAN or WOMAN is applicable. You'd like to arrange these in a visual hierarchy for the annotator's convenience, and to do the same with US-LOCATION and FOREIGN-LOCATION, even though they don't have a common, less specific annotation.

New:

<annotations>
  <display_group>
    <span label="PERSON" d_css="background-color: blue"/>
    <span label="MAN" d_css="background-color: pink"/>
    <span label="WOMAN" d_css="background-color: green"/>
  </display_group>
  <display_group name="LOCATION">
    <span label="US-LOCATION" d_css="background-color: gray"/>
    <span label="FOREIGN-LOCATION" d_css="background-color: red"/>
  </display_group>
</annotations>

Legacy:

<annotation_set_descriptors>
  <annotation_set_descriptor name="content" category="content">
    <annotation label="PERSON"/>
    <annotation label="MAN"/>
    <annotation label="WOMAN"/>
    <annotation label="US-LOCATION"/>
    <annotation label="FOREIGN-LOCATION"/>
  </annotation_set_descriptor>
</annotation_set_descriptors>
<annotation_display>
  ...
  <label name="PERSON" css="background-color: blue"/>
  <label name="MAN" css="background-color: pink"/>
  <label name="WOMAN" css="background-color: green"/>
  <label name="US-LOCATION" css="background-color: gray"/>
  <label name="FOREIGN-LOCATION" css="background-color: red"/>
  ...
  <label_group name="PERSON" children="MAN,WOMAN"/>
  <label_group name="LOCATION" children="US-LOCATION,FOREIGN-LOCATION"/>
  ...
</annotation_display>

In the legacy case, a label group can reference an existing annotation (as in the PERSON case) or define a group name of its own (as in the LOCATION case). In the new case, a display group either has its own name, with all of its child annotations as members of the group (as in the LOCATION case), or has no name and treats its first annotation as the parent (as in the PERSON case). Either way, these groups create submenus in the annotation popup in the MAT UI.

Note that in all cases, the actual labels which appear in this construction must be styled. The whole point of this construction is to create cascades for annotation menus, and if the label isn't styled, it doesn't appear in the annotation menu at all. As a result, it's an error for a referenced label not to be styled.
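
If you'd like to catch unstyled labels before loading the task, here's a minimal Python sketch, not part of MAT, that lists labels which appear in a cascade without any CSS. It assumes that "styled" means carrying a d_css attribute (new subsystem) or having a <label> entry with a css attribute (legacy subsystem), and it uses a hypothetical <task> wrapper element in its sample. It checks only the children of each group; it does not check a legacy group's parent name or the effective-label form of <display_group>.

```python
import xml.etree.ElementTree as ET


def unstyled_group_members(xml_text):
    """Return a sorted list of labels that appear in a cascade but carry no CSS."""
    root = ET.fromstring(xml_text)
    missing = set()

    # New subsystem (ASSUMPTION: styling means a non-empty d_css attribute):
    # every annotation nested directly in a <display_group> must have d_css.
    for group in root.iter("display_group"):
        for child in group:
            if child.get("label") and not child.get("d_css"):
                missing.add(child.get("label"))

    # Legacy subsystem (ASSUMPTION: styling means a <label> with css):
    # every name listed in a <label_group> children attribute must be styled.
    styled = {lab.get("name") for lab in root.iter("label") if lab.get("css")}
    for group in root.iter("label_group"):
        for name in group.get("children", "").split(","):
            name = name.strip()
            if name and name not in styled:
                missing.add(name)

    return sorted(missing)


# Hypothetical legacy sample: WOMAN is referenced by the group but has no css.
SAMPLE = """
<task>
  <annotation_display>
    <label name="PERSON" css="background-color: blue"/>
    <label name="MAN" css="background-color: pink"/>
    <label name="WOMAN"/>
    <label_group name="PERSON" children="MAN,WOMAN"/>
  </annotation_display>
</task>
"""

print(unstyled_group_members(SAMPLE))  # ['WOMAN']
```

An empty list means every cascade member the script knows how to check is styled.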

Also note that if you set up annotation menu cascades, you might want to ensure that your labels are not alphabetized.

If you have effective labels, the contrast between the new and legacy cases is a bit more extensive. Let's say you want to make a cascade out of a set of effective labels:

New:

<annotations>
  <span label="ENAMEX">
    <effective_labels name="type">
      <label label="PERSON" value="PER" d_css="background-color: blue"/>
      <label label="ORGANIZATION" value="ORG" d_css="background-color: pink"/>
      <label label="LOCATION" value="LOC" d_css="background-color: green"/>
      <display_group/>
    </effective_labels>
  </span>
</annotations>

Legacy:

<annotation_set_descriptors>
  <annotation_set_descriptor name="content" category="content">
    <annotation label="ENAMEX"/>
    <attribute of_annotation="ENAMEX" name="type">
      <choice effective_label="PERSON">PER</choice>
      <choice effective_label="ORGANIZATION">ORG</choice>
      <choice effective_label="LOCATION">LOC</choice>
    </attribute>
  </annotation_set_descriptor>
</annotation_set_descriptors>
<annotation_display>
  <label name="PERSON" css="background-color: blue"/>
  <label name="ORGANIZATION" css="background-color: pink"/>
  <label name="LOCATION" css="background-color: green"/>
  <label_group name="ENAMEX" children="PERSON,ORGANIZATION,LOCATION"/>
</annotation_display>

In the new case, the <display_group> element within an <effective_labels> element declares that the effective labels form a cascade beneath the parent span label. Here the <display_group> element can be named and styled, as in the main case, but it can have no children. As a result, the legacy case allows an effective label to be the parent of a cascade subtree, while the new case allows effective labels only at the leaves of the cascade.