Task XML Use Cases

This document describes use cases for the XML format of the task files (see "Creating a New Task"). The format itself is specified in the separate reference document.

Although the annotation types are defined in the task.xml file, we provide examples for them separately. Here, we focus on the other elements of the task.xml file.

Customizing the overall UI

Changing the order of annotation labels

By default, the labels available to the annotator are listed in the legend by annotation set, and then alphabetized within each set. Similarly, in the annotation editor popups, the labels are alphabetized by default. If you want the labels to appear in the order they were defined in the task.xml file, you can do this:

  <web_customizations alphabetize_labels="no"/>

Changing the name of the UI

If you want to "brand" your annotation task, you can change the title of the Web page and the "slug" that appears at the leftmost edge of the MAT desktop toolbar. The custom branding is used only if all your installed tasks have the same branding information; otherwise, MAT falls back to its default branding. You can change the branding like this:

  <web_customizations>
    <short_name>MCAT</short_name>
    <long_name>MCAT: My Company's Annotation Tool</long_name>
  </web_customizations>
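
If you want both customizations at once, one possibility is sketched below. It assumes that the alphabetize_labels attribute and the branding elements can be combined in a single web_customizations element, which neither example above shows; check the reference document before relying on it.

```xml
<!-- Assumption: the attribute and the branding children may share one element. -->
<web_customizations alphabetize_labels="no">
  <short_name>MCAT</short_name>
  <long_name>MCAT: My Company's Annotation Tool</long_name>
</web_customizations>
```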

Modifying the language information

Working with right-to-left languages

If you want to annotate, say, Hebrew, you can define it as a right-to-left language as follows:

  <languages>
    ...
    <language name="Hebrew" code="he" text_right_to_left="yes"/>
    ...
  </languages>
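
As a further sketch, a task covering several languages might declare them side by side. The Arabic and English entries are illustrative additions, not taken from an existing task, and the sketch assumes that a language without the text_right_to_left attribute is treated as left-to-right.

```xml
<languages>
  <!-- Right-to-left languages carry the attribute explicitly. -->
  <language name="Hebrew" code="he" text_right_to_left="yes"/>
  <language name="Arabic" code="ar" text_right_to_left="yes"/>
  <!-- Assumption: omitting the attribute means left-to-right. -->
  <language name="English" code="en"/>
</languages>
```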

Defining engines

Changing the Java default heap and stack sizes

If you want the jCarafe tokenizer, trainer, and tagger to call Java with, say, 4GB of heap by default, you can set this globally in your task:

  <engines>
    <java_subprocess_parameters heap_size="4G"/>
    ...
  </engines>

Defining multiple model configurations for a trainable engine

MAT knows that an engine is trainable because you define at least one model configuration for it. You can also define multiple configurations if, for example, you want to use different training strategies or different numbers of model-building iterations. (All these options can be set on the command line, so you don't strictly need multiple configurations, but you might find them convenient.) Here, the default configuration (the unnamed one) uses the PSA training method with 6 iterations; the alternative configuration uses the standard training method with the standard number of iterations:

  <engine name='carafe_tag_engine'>
    ...
    <model_config class='MAT.JavaCarafe.CarafeModelBuilder'>
      <build_settings training_method='psa' max_iterations='6'/>
    </model_config>
    <model_config config_name='alt_model_build'
                  class='MAT.JavaCarafe.CarafeModelBuilder'/>
    ...
  </engine>
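
As a sketch of a named configuration that carries its own settings, the fragment below defines one alongside the default. The name psa_long and the iteration count of 20 are hypothetical; the elements and attributes are the ones used in the example above.

```xml
<engine name='carafe_tag_engine'>
  ...
  <!-- Default (unnamed) configuration: PSA training, 6 iterations. -->
  <model_config class='MAT.JavaCarafe.CarafeModelBuilder'>
    <build_settings training_method='psa' max_iterations='6'/>
  </model_config>
  <!-- Hypothetical named configuration: PSA training with more iterations. -->
  <model_config config_name='psa_long'
                class='MAT.JavaCarafe.CarafeModelBuilder'>
    <build_settings training_method='psa' max_iterations='20'/>
  </model_config>
  ...
</engine>
```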

Defining complex workflows

Multiple mixed-initiative steps with the possibility of correction

MAT 3.0 provides the option of multiple mixed-initiative steps, as we've seen in the Sample Relations task. Such a task starts with multiple annotation sets.

New:

  <annotations inherit='category:zone,category:token'>
    ...
    <span label="LABEL1" of_set="set1" of_category="content"/>
    <span label="LABEL2" of_set="set2" of_category="content"/>
    ...
  </annotations>

Legacy:

  <annotation_set_descriptors inherit='category:zone,category:token'>
    <annotation_set_descriptor name="set1" category="content">
      ...
      <annotation label="LABEL1"/>
      ...
    </annotation_set_descriptor>
    <annotation_set_descriptor name="set2" category="content">
      ...
      <annotation label="LABEL2"/>
      ...
    </annotation_set_descriptor>
  </annotation_set_descriptors>

The key to defining multiple possibilities for mixed initiative is in the steps:

  <steps>
    <annotation_step name="set1_step" type="mixed"
                     engine="..." sets_added="set1"/>
    <annotation_step name="set2_step" type="mixed"
                     engine="..." sets_added="set2"/>
    <annotation_step name="correction_step" type="hand"
                     sets_modified="set1,set2"/>
  </steps>

The idea here is that there are separate engines devoted to adding the annotations in set1 and set2. An annotator might want to do pretagging and correction for each set individually:

  <workflows>
    <workflow name="Mixed Initiative">
      <step name="set1_step" pretty_name="Do set1"/>
      <step name="set2_step" pretty_name="Do set2"/>
    </workflow>
    ...
  </workflows>

This workflow requires that all annotations in set1 be completed before set2 is begun. But even if this reflects the annotator's normal workflow, there may be times when the annotator finds an error in the set1 annotations while annotating set2. That's where the final step comes in:

  <workflows>
    ...
    <workflow name="Correction">
      <step name="correction_step" pretty_name="Correct"/>
    </workflow>
  </workflows>

During hand annotation, the annotator can simply switch to the Correction workflow. The single step in this workflow is defined as only modifying sets, which is interpreted by the UI as permitting the annotator to modify sets which have been otherwise completed. Once corrections in previous steps have been done, the annotator can simply return to the Mixed Initiative workflow and continue.

Note: If you're correcting in the midst of other annotation, as this example suggests, don't complete the correction step in the Correction workflow. Doing so will mark all your annotation sets gold, which isn't what you want in this case, since you'll be returning to annotation for set2.
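
Putting the fragments of this example together, the steps and both workflows might be declared side by side, as in the sketch below. The engine values stay elided, as in the fragments above, and the sketch assumes that the task defines no other workflows.

```xml
<steps>
  <!-- Each mixed-initiative step adds one annotation set. -->
  <annotation_step name="set1_step" type="mixed"
                   engine="..." sets_added="set1"/>
  <annotation_step name="set2_step" type="mixed"
                   engine="..." sets_added="set2"/>
  <!-- The correction step only modifies sets; it adds none. -->
  <annotation_step name="correction_step" type="hand"
                   sets_modified="set1,set2"/>
</steps>
<workflows>
  <!-- Normal path: pretag and correct set1, then set2. -->
  <workflow name="Mixed Initiative">
    <step name="set1_step" pretty_name="Do set1"/>
    <step name="set2_step" pretty_name="Do set2"/>
  </workflow>
  <!-- Side path: revisit sets that were already completed. -->
  <workflow name="Correction">
    <step name="correction_step" pretty_name="Correct"/>
  </workflow>
</workflows>
```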

Defining a workflow step which adds attributes only

One application of this sort of complex workflow might involve separating span tagging from attribute filling. E.g., you might have annotation sets like the following.

New:

  <annotations inherit='category:zone,category:token'>
    <span label="PERSON" of_set="spans">
      <string name="nomtype" choices="NAM,NOM,PRO" of_set="attrs"/>
    </span>
    <span label="ORGANIZATION" of_set="spans">
      <string name="nomtype" choices="NAM,NOM,PRO" of_set="attrs"/>
    </span>
    <span label="LOCATION" of_set="spans">
      <string name="nomtype" choices="NAM,NOM,PRO" of_set="attrs"/>
    </span>
  </annotations>

Legacy:

  <annotation_set_descriptors inherit='category:zone,category:token'>
    <annotation_set_descriptor name="spans" category="content">
      <annotation label="PERSON"/>
      <annotation label="ORGANIZATION"/>
      <annotation label="LOCATION"/>
    </annotation_set_descriptor>
    <annotation_set_descriptor name="attrs" category="content">
      <attribute of_annotation="PERSON,LOCATION,ORGANIZATION" name="nomtype">
        <choice>NAM</choice>
        <choice>NOM</choice>
        <choice>PRO</choice>
      </attribute>
    </annotation_set_descriptor>
  </annotation_set_descriptors>

The "attrs" set descriptor, while still in the content category, defines an attribute on the annotations which are defined in the "spans" set descriptor. Given the following steps and workflow:

  <steps>
    <annotation_step type="hand" name="add spans" sets_added="spans"/>
    <annotation_step type="hand" name="add attrs" sets_added="attrs"/>
  </steps>
  <workflows>
    <workflow name="Annotation">
      <step name="add spans"/>
      <step name="add attrs"/>
    </workflow>
  </workflows>

you'll be able in the first step only to add the spans, but not to edit the "nomtype" attribute; in the second step, you'll be able only to edit the "nomtype" attribute of existing spans, but not to create, modify, or delete the spans.

This partition of activities interacts with attribute defaults in a perhaps unexpected way. If the attribute has a default, it will be assigned when the annotation is created, even if the current step doesn't support adding or editing the attribute. Accordingly, if the workflow is undoable, any attribute which has a default will be restored to the default value when the step that adds the attribute is undone.