The sample tasks

The sample tasks can be found in MAT_PKG_HOME/sample. There are three task directories found there; in this documentation, we discuss two of them. Both of these directories, like all task directories, have a file named task.xml at its root. The format of this file is described in the task XML documentation.

The first of these directories is ne. This directory has a python subdirectory to implement a Python engine for the Sample Relations task below, but no Javascript customizations (so no js subdirectory). See "Creating a task" for a description of the subdirectory structure of the task.

The task.xml file in the ne directory contains three tasks: "Named Entity", "Enhanced Named Entity" and "Sample Relations". The first task is a simple span task; it contains spanned annotations without any complex attribute structure. This task is used for Tutorials 1 - 6, and for a variety of other examples throughout this documentation. The second task is a complex task, containing both spanned and spanless annotations and multiple attributes, some of which take other annotations as their values. This second task is used for Tutorial 7, as well as the UI documentation on editing annotations and spanless annotations. The final task is a complex task, containing spanned and spanless annotations, which is intended to illustrate how multiple content annotation sets and multiple engines can be used. This third task is used for Tutorial 8.

The second of these directories is classification. The task.xml file in this directory contains a single task, "Sample Sentiment", which exemplifies using classification to assign positive and negative sentiment to sentences.

In the sample tasks, we'll exemplify both the new method of defining annotations and their attributes, and the legacy method.

The "Named Entity" task
The "Enhanced Named Entity" task
The "Sample Relations" task
The "Sample Sentiment" task

The "Named Entity" task

The file typically contains a single task declaration, with <task> as the toplevel element. However, if you wish to declare multiple tasks in the same task.xml file, it can also contain multiple <task> elements, within a <tasks> element. Here, we will define three tasks. Each task must be named, and declare supported languages:

     1	<tasks>
     2	  <task name='Named Entity'>
     3	    <languages>
     4	      <language code='en' name='English' tokenless_autotag_delimiters='.,/?!;:'/>
     5	    </languages>

Each task usually contains a block of annotation declarations:

     6	    <annotations all_annotations_known='no'
     7	                 inherit='category:zone,category:token'>
     8	      <span label='PERSON'
     9	            d_css='background-color: #CCFF66' d_accelerator='P'/>
    10	      <span label='LOCATION'
    11	            d_css='background-color: #FF99CC' d_accelerator='L'/>
    12	      <span label='ORGANIZATION'
    13	            d_css='background-color: #99CCFF' d_accelerator='O'/>
    14	    </annotations>

Here, we have inherited the zone and token category tags from the root task, and defined our own content tags, PERSON, LOCATION and ORGANIZATION. We also define the display properties of these tags. For instance, the PERSON tag will display as light green (defined here in hexadecimal), and the tagging menu will support the "P" keyboard accelerator for annotating a selected span with the PERSON tag.

For reference, here's the same annotation declarations and their display attributes defined by the legacy method:

      	    <annotation_set_descriptors all_annotations_known='no'
     	                                inherit='category:zone,category:token'>
     	      <annotation_set_descriptor category='content' name='content'>
     	        <annotation label='PERSON'/>
                <annotation label='LOCATION'/>
    	        <annotation label='ORGANIZATION'/>
    	      </annotation_set_descriptor>
    	    </annotation_set_descriptors>
    	    <annotation_display>
    	      <label css='background-color: #CCFF66' name='PERSON' accelerator='P'/>
    	      <label css='background-color: #FF99CC' name='LOCATION' accelerator='L'/>
    	      <label css='background-color: #99CCFF' name='ORGANIZATION' accelerator='O'/>
    	    </annotation_display>

If the task supports automated annotation (trainable or otherwise), it will define engines:

    15	    <engines>
    16	      <engine name='carafe_tag_engine'>
    17	        <default_model>default_model</default_model>
    18	        <model_config class='MAT.JavaCarafe.CarafeModelBuilder'>
    19	          <build_settings training_method='psa' max_iterations='6'/>
    20	        </model_config>
    21	        <model_config config_name='alt_model_build'
    22	                      class='MAT.JavaCarafe.CarafeModelBuilder'/>
    23	        <step_config class='MAT.JavaCarafe.CarafeTagStep'/>
    24	      </engine>
    25	      <engine name='align_engine'>
    26	        <step_config class='MAT.PluginMgr.AlignStep'/>
    27	      </engine>
    28	      <engine name='whole_zone_engine'>
    29	        <step_config class='MAT.PluginMgr.WholeZoneStep'/>
    30	      </engine>
    31	      <engine name='carafe_tokenize_engine'>
    32	        <step_config class='MAT.JavaCarafe.CarafeTokenizationStep'/>
    33	      </engine>
    34	    </engines>

Each engine is named, and specifies a class, using the <step_config> element, which implements the automated annotation. If the engine is trainable, it will also have at least one <model_config> element (as we see for "carafe_tag_engine" above), which can support customizations for the model build settings. If it is trainable, it also optionally defines a default model file name, which is suffixed with the relevant language and step when it's referenced.

If a task supports any hand or automated annotation activity, it defines steps, which are the basic building blocks of the annotation activities:

    35	    <steps>
    36	      <annotation_step engine='align_engine' type='auto' name='align'/>
    37	      <annotation_step engine='carafe_tag_engine' sets_added='category:content'
    38	                       type='mixed' name='carafe_tag'/>
    39	      <annotation_step engine='whole_zone_engine' sets_added='category:zone'
    40	                       type='auto' name='whole_zone'/>
    41	      <annotation_step engine='carafe_tokenize_engine'
    42	                       sets_added='category:token' type='auto'
    43	                       name='carafe_tokenize'/>
    44	      <annotation_step type='hand' name='correct'
    45	                       sets_modified='category:content'/>
    46	    </steps>

The most common step is the annotation step, and there are four subtypes:

hand annotation steps, which have no engines associated with them and can only be applied in the UI
automated steps, which have engines associated with them which must be used (i.e., you can't add or modify these annotations by hand)
correction steps, which have engines associated with them whose output can be corrected if you're in the UI
mixed steps, which have engines associated with them, in which you can either do hand annotation in the UI or automated annotation followed by correction

This task has three auto steps (whole_zone, align, carafe_tokenize), a mixed step (carafe_tag), and a hand step (correct). These steps specify which annotations they modify or add, and optionally connect those annotation sets or categories with the engine which applies them.

These steps can be assembled into workflows:

    47	    <workflows>
    48	      <workflow name='Tokenless hand annotation'>
    49	        <step pretty_name='zone' name='whole_zone'/>
    50	        <step name='carafe_tag' pretty_name='hand tag' type='hand'/>
    51	      </workflow>
    52	      <workflow name='Review/repair'>
    53	        <step name='correct'/>
    54	      </workflow>
    55	      <workflow name='Demo' undoable="yes">
    56	        <step pretty_name='zone' name='whole_zone'/>
    57	        <step pretty_name='tokenize' name='carafe_tokenize'/>
    58	        <step pretty_name='tag' name='carafe_tag'/>
    59	      </workflow>
    60	      <workflow name='Align'>
    61	        <step pretty_name='zone' name='whole_zone'/>
    62	        <step pretty_name='tokenize' name='carafe_tokenize'/>
    63	        <step name='align'/>
    64	      </workflow>
    65	    </workflows>

Here's what these workflows do:

The most commonly used workflow here is the "Demo" workflow, which provides zoning and tokenization and then mixed-initiative annotation of the tags
The "Tokenless hand annotation" workflow zones without tokenization, and uses the "carafe_tag" step strictly as a hand step (it further restrictions the step type in the context of the workflow).
The "Review/repair" workflow is a single hand step intended for annotation correction.
The "Align" workflow is intended for documents which have content tags but nothing else. These documents were most likely prepared by other tools. The "align" step, instead of doing hand or automated tagging, aligns the content annotations with the token boundaries.

Once the workflows are defined, you can (optionally) specify the properties of workspaces. By default, you don't need to say anything additional about workspaces, since every human-annotatable workflow can serve as the basis of a workspace, but you might want to declare your default configuration (or workflow), and set special properties of workspace operations:

    66	    <workspaces default_config="Demo">
    67	      <workspace workflow='Demo'/>
    68	    </workspaces>

Here, we define a default configuration, and set up a block which can (but currently does not) customize the behavior of the "Demo" workflow in the context of the workspaces.

The "Enhanced Named Entity" task

At this point, we end the first task and begin the second one.

    69	  </task>
    70	  <task name='Enhanced Named Entity'>
    71	    <languages>
    72	      <language code='en' name='English' tokenless_autotag_delimiters='.,/?!;:'/>
    73	    </languages>

And now, we define the annotations and their displays in the second task:

    74	    <annotations all_annotations_known='no'
    75	                 inherit='category:zone,category:token'>
    76	      <span label='PERSON'
    77	            d_css='background-color: #CCFF66' d_accelerator='P'
    78	            d_edit_immediately='yes'>
    79	        <string name='nomtype' choices="Proper name,Noun,Pronoun"/>
    80	      </span>
    81	      <span label='LOCATION'
    82	            d_css='background-color: #FF99CC' d_accelerator='L'
    83	            d_edit_immediately='yes'>
    84	        <string name='nomtype' choices="Proper name,Noun,Pronoun"/>
    85	        <boolean name='is_political_entity'/>
    86	      </span>
    87	      <span label='ORGANIZATION'
    88	            d_css='background-color: #99CCFF' d_accelerator='O'
    89	            d_edit_immediately='yes'>
    90	        <string name='nomtype' choices="Proper name,Noun,Pronoun"/>
    91	      </span>
    92	      <spanless label='PERSON_COREF'
    93	                d_css='background-color: lightgreen' d_accelerator='C'>
    94	        <filler_set name='mentions' filler_types='PERSON'/>
    95	      </spanless>
    96	      <span label='LOCATED_EVENT'
    97	            d_css='background-color: pink' d_accelerator='E'
    98	            d_edit_immediately='yes'>
    99	        <filler name='actor' filler_types='PERSON'/>
   100	        <filler name='location' filler_types='LOCATION,ORGANIZATION'/>
   101	      </span>
   102	      <spanless label='LOCATION_RELATION'
   103	                d_css='background-color: orange' d_accelerator='R'>
   104	        <filler name='located' filler_types='ORGANIZATION,PERSON'/>
   105	        <filler name='location' filler_types='LOCATION'/>
   106	      </spanless>
   107	    </annotations>

This annotation definition block is much more complex than the one in the "Named Entity" task. In addition to the three labels we saw previously, we also have three other labels: "LOCATED_EVENT" (spanned) and "PERSON_COREF" and "LOCATION_RELATION" (spanless). We also have several attributes, of different types. Most notable is the "mentions" attribute of the "PERSON_COREF" annotation, which takes sets of annotations as its value. The annotation display information is also somewhat more complex; we see here that all of the annotations are marked to be edited immediately upon creation.

For reference, here's the same annotation declarations and their display attributes defined by the legacy method:

    	    <annotation_set_descriptors all_annotations_known='no'
    	                                inherit='category:zone,category:token'>
    	      <annotation_set_descriptor category='content' name='content'>
    	        <annotation label='PERSON'/>
    	        <annotation label='LOCATION'/>
    	        <annotation label='ORGANIZATION'/>
    	        <attribute name='nomtype' of_annotation='PERSON,LOCATION,ORGANIZATION'>
    	          <choice>Proper name</choice>
    	          <choice>Noun</choice>
    	          <choice>Pronoun</choice>
    	        </attribute>
    	        <attribute name='is_political_entity' type='boolean'
    	                   of_annotation='LOCATION'/>
    	        <annotation label='LOCATED_EVENT'/>
    	        <attribute name='actor' type='annotation' of_annotation='LOCATED_EVENT'>
    	          <label_restriction label='PERSON'/>
    	        </attribute>
    	        <attribute name='location' type='annotation'
    	                   of_annotation='LOCATED_EVENT'>
    	          <label_restriction label='LOCATION'/>
    	          <label_restriction label='ORGANIZATION'/>
    	        </attribute>
   	        <annotation span='no' label='PERSON_COREF'/>
   	        <attribute name='mentions' aggregation='set' type='annotation'
   	                   of_annotation='PERSON_COREF'>
   	          <label_restriction label='PERSON'/>
   	        </attribute>
   	        <annotation span='no' label='LOCATION_RELATION'/>
   	        <attribute name='located' type='annotation'
   	                   of_annotation='LOCATION_RELATION'>
   	          <label_restriction label='ORGANIZATION'/>
   	          <label_restriction label='PERSON'/>
   	        </attribute>
   	        <attribute name='location' type='annotation'
   	                   of_annotation='LOCATION_RELATION'>
   	          <label_restriction label='LOCATION'/>
   	        </attribute>
   	      </annotation_set_descriptor>
   	    </annotation_set_descriptors>
   	    <annotation_display>
   	      <label css='background-color: #CCFF66' name='PERSON' accelerator='P'
   	             edit_immediately='yes'/>
   	      <label css='background-color: #FF99CC' name='LOCATION' accelerator='L'
   	             edit_immediately='yes'/>
   	      <label css='background-color: #99CCFF' name='ORGANIZATION' accelerator='O'
   	             edit_immediately='yes'/>
   	      <label css='background-color: lightgreen' name='PERSON_COREF'
   	             accelerator='C' edit_immediately='yes'/>
   	      <label css='background-color: pink' name='LOCATED_EVENT' accelerator='E'
   	             edit_immediately='yes'/>
   	      <label css='background-color: orange' name='LOCATION_RELATION'
   	             accelerator='R' edit_immediately='yes'/>
   	    </annotation_display>

The remainder of this task is essentially identical to the "Named Entity" task:

   112	    <engines>
   113	      <engine name='carafe_tag_engine'>
   114	        <default_model>default_enhanced_model</default_model>
   115	        <model_config class='MAT.JavaCarafe.CarafeModelBuilder'>
   116	          <build_settings training_method='psa' max_iterations='6'/>
   117	        </model_config>
   118	        <model_config config_name='alt_model_build'
   119	                      class='MAT.JavaCarafe.CarafeModelBuilder'/>
   120	        <step_config class='MAT.JavaCarafe.CarafeTagStep'/>
   121	      </engine>
   122	      <engine name='align_engine'>
   123	        <step_config class='MAT.PluginMgr.AlignStep'/>
   124	      </engine>
   125	      <engine name='whole_zone_engine'>
   126	        <step_config class='MAT.PluginMgr.WholeZoneStep'/>
   127	      </engine>
   128	      <engine name='carafe_tokenize_engine'>
   129	        <step_config class='MAT.JavaCarafe.CarafeTokenizationStep'/>
   130	      </engine>
   131	    </engines>
   132	    <steps>
   133	      <annotation_step engine='align_engine' type='auto' name='align'/>
   134	      <annotation_step engine='carafe_tag_engine' sets_added='category:content'
   135	                       type='mixed' name='carafe_tag'/>
   136	      <annotation_step engine='whole_zone_engine' sets_added='category:zone'
   137	                       type='auto' name='whole_zone'/>
   138	      <annotation_step engine='carafe_tokenize_engine'
   139	                       sets_added='category:token' type='auto'
   140	                       name='carafe_tokenize'/>
   141	      <annotation_step type='hand' name='correct'
   142	                       sets_modified='category:content'/>
   143	    </steps>
   144	    <workflows>
   145	      <workflow name='Tokenless hand annotation'>
   146	        <step pretty_name='zone' name='whole_zone'/>
   147	        <step name='carafe_tag' pretty_name='hand tag' type='hand'/>
   148	      </workflow>
   149	      <workflow name='Review/repair'>
   150	        <step name='correct'/>
   151	      </workflow>
   152	      <workflow name='Demo' undoable="yes">
   153	        <step pretty_name='zone' name='whole_zone'/>
   154	        <step pretty_name='tokenize' name='carafe_tokenize'/>
   155	        <step pretty_name='tag' name='carafe_tag'/>
   156	      </workflow>
   157	      <workflow name='Align'>
   158	        <step pretty_name='zone' name='whole_zone'/>
   159	        <step pretty_name='tokenize' name='carafe_tokenize'/>
   160	        <step name='align'/>
   161	      </workflow>
   162	    </workflows>
   163	    <workspaces>
   164	      <workspace workflow='Demo'/>
   165	    </workspaces>

Notably, because the jCarafe tagger only operates on the simple span subset of this (or any) task, the "Demo" workflow will only apply the spanned labels, not the attributes associated with them, and won't apply the spanless labels at all.

The "Sample Relations" task

At this point, we end the second task and begin the third:

   166	  </task>
   167	  <task name='Sample Relations'>
   168	    <languages>
   169	      <language code='en' name='English' tokenless_autotag_delimiters='.,/?!;:'/>
   170	    </languages>

This third task is intended to illustrate the impact of multiple content annotation sets: the ability to reuse tagging engines, to segregate annotation activities by annotation set, and to support multiple mixed-initiative steps in the same workflow. As part of this illustration, we've implemented an extremely simplistic two-argument trainable relation tagger, which essentially does classification of the bags of words in between successive pairs of candidate relations. We're not advertising this as a relation tagging capability for anything besides demonstrating how trainable relation tagging might be integrated. Here are the annotation sets and their displays:

   171	    <annotations all_annotations_known='no'
   172	                 inherit='category:zone,category:token'>
   173	      <span label='PERSON' of_set='entities'
   174	            d_css='background-color: LawnGreen' d_accelerator='P'
   175	            d_edit_immediately='yes'/>
   176	      <span label='LOCATION' of_set='entities'
   177	            d_css='background-color: HotPink' d_accelerator='L'
   178	            d_edit_immediately='yes'/>
   179	      <span label='ORGANIZATION' of_set='entities'
   180	            d_css='background-color: DeepSkyBlue' d_accelerator='O'
   181	            d_edit_immediately='yes'/>
   182	      <span label='NATIONALITY' of_set='nationality'
   183	            d_css='background-color: PaleVioletRed' d_accelerator='N'
   184	            d_edit_immediately='yes'/>
   185	      <spanless label="Employment" of_set='relations'
   186	                d_css="background-color: Gray">
   187	        <filler name="Employee" filler_types="PERSON"/>
   188	        <filler name="Employer" filler_types="ORGANIZATION,LOCATION,NATIONALITY"/>
   189	      </spanless>
   190	      <spanless label="Located" of_set='relations'
   191	                d_css="background-color: Thistle">
   192	        <filler name="Located-Entity" filler_types="PERSON,ORGANIZATION"/>
   193	        <filler name="Location" filler_types="LOCATION,NATIONALITY"/>
   194	      </spanless>
   195	    </annotations>

Notice that there are three annotation sets rather than one, and while they're each in category "content", they each have a different set name.

The "entities" set corresponds to the "content" set in the "Named Entity" task.
The "nationality" set contains a single, additional span annotation.
The "relations" set contains two spanless annotations, "Employment" and "Located".

For reference, here's the same annotation declarations and their display attributes defined by the legacy method:

   	    <annotation_set_descriptors all_annotations_known='no'
   	                                inherit='category:zone,category:token'>
   	      <annotation_set_descriptor category='content' name='entities'>
   	        <annotation label='PERSON'/>
   	        <annotation label='LOCATION'/>
   	        <annotation label='ORGANIZATION'/>
   	      </annotation_set_descriptor>
   	      <annotation_set_descriptor category='content' name='nationality'>
   	        <annotation label='NATIONALITY'/>
   	      </annotation_set_descriptor>
   	      <annotation_set_descriptor category='content' name='relations'>
   	        <annotation label="Employment" span="no"/>
   	        <attribute name="Employee" of_annotation="Employment" type="annotation">
   	          <label_restriction label="PERSON"/>
   	        </attribute>
   	        <attribute name="Employer" of_annotation="Employment" type="annotation">
   	          <label_restriction label="ORGANIZATION"/>
   	          <label_restriction label='LOCATION'/>
   	          <label_restriction label="NATIONALITY"/>
   	        </attribute>
   	        <annotation label="Located" span="no"/>
   	        <attribute name="Located-Entity" of_annotation="Located" type="annotation">
   	          <label_restriction label="PERSON"/>
   	          <label_restriction label="ORGANIZATION"/>
   	        </attribute>
   	        <attribute name="Location" of_annotation="Located" type="annotation">
   	          <label_restriction label="LOCATION"/>
   	          <label_restriction label="NATIONALITY"/>
   	        </attribute>
   	      </annotation_set_descriptor>
   	    </annotation_set_descriptors>
   	    <annotation_display>
   	      <label css='background-color: LawnGreen' name='PERSON' accelerator='P'
   	             edit_immediately='yes'/>
   	      <label css='background-color: HotPink' name='LOCATION' accelerator='L'
   	             edit_immediately='yes'/>
   	      <label css='background-color: DeepSkyBlue' name='ORGANIZATION' accelerator='O'
   	             edit_immediately='yes'/>
   	      <label css='background-color: PaleVioletRed' name='NATIONALITY' accelerator='N'
   	             edit_immediately='yes'/>
   	      <label name="Employment" css="background-color: Gray" edit_immediately="yes"/>
   	      <label name="Located" css="background-color: Thistle" edit_immediately="yes"/>
   	    </annotation_display>

The next element, which doesn't appear in the other two tasks, supports a range of Web UI customizations. In this case, we specify that we want the annotations to appear in the annotation menu and legend in the order they're defined, not in alphabetical order:

   196	    <web_customization alphabetize_labels="no"/>

Next, we define the engines.

   197	    <engines>
   198	      <engine name='carafe_tag_engine'>
   199	        <default_model>default_model</default_model>
   200	        <model_config class='MAT.JavaCarafe.CarafeModelBuilder'>
   201	          <build_settings training_method='psa' max_iterations='6'/>
   202	        </model_config>
   203	        <model_config config_name='alt_model_build'
   204	                      class='MAT.JavaCarafe.CarafeModelBuilder'/>
   205	        <step_config class='MAT.JavaCarafe.CarafeTagStep'/>
   206	      </engine>
   207	      <engine name='trivial_relation_tag_engine'>
   208	        <default_model>default_relation_model</default_model>
   209	        <model_config class='TrivialRelationTagger.CarafeMaxentRelationModelBuilder'/>
   210	        <step_config class='TrivialRelationTagger.CarafeRelationTagStep'/>
   211	      </engine>
   212	      <engine name='align_engine'>
   213	        <step_config class='MAT.PluginMgr.AlignStep'/>
   214	      </engine>
   215	      <engine name='whole_zone_engine'>
   216	        <step_config class='MAT.PluginMgr.WholeZoneStep'/>
   217	      </engine>
   218	      <engine name='carafe_tokenize_engine'>
   219	        <step_config class='MAT.JavaCarafe.CarafeTokenizationStep'/>
   220	      </engine>
   221	    </engines>

In addition to the engines declared in the other two tasks, this task includes the "trivial_relation_tag_engine", which implements the simplistic relation tagging we described a moment ago.

The important differences arise in the definition of the steps:

   222	    <steps>
   223	      <annotation_step engine='align_engine' type='auto' name='align'/>
   224	      <annotation_step engine='carafe_tag_engine' sets_added='entities'
   225	                       type='mixed' name='entity_tag'/>
   226	      <annotation_step engine='carafe_tag_engine' sets_added='nationality'
   227	                       type='mixed' name='nationality_tag'/>
   228	      <annotation_step engine='carafe_tag_engine' sets_added='entities,nationality'
   229	                       type='mixed' name='all_entity_tag'/>
   230	      <annotation_step engine='trivial_relation_tag_engine' sets_added='relations'
   231	                       type='mixed' name='relation_tag'/>
   232	      <annotation_step engine='whole_zone_engine' sets_added='category:zone'
   233	                       type='auto' name='whole_zone'/>
   234	      <annotation_step engine='carafe_tokenize_engine'
   235	                       sets_added='category:token' type='auto'
   236	                       name='carafe_tokenize'/>
   237	      <annotation_step type='hand' name='correct'
   238	                       sets_modified='category:content'/>
   239	    </steps>

Here, we see that the "carafe_tag_engine" is used in three different steps: "entity_tag", "nationality_tag", and "all_entity_tag". This last step adds two annotation sets, rather than one. When a trainable engine defines a default model, the model path is suffixed with both the language and the step name when it's referenced and used, so these three cases will be kept separate. We also see that the "trivial_relation_tag_engine" is used in the "relation_tag" step. All four of these steps are mixed steps; so you'll be able to annotate the sets they add by hand de novo, or pretag and correct if a model is available. Finally, we have the workflows and workspaces (and then we end the task and the file):

   240	    <workflows>
   241	      <workflow name='Mixed Initiative Annotation' undoable="yes">
   242	        <step pretty_name='zone' name='whole_zone'/>
   243	        <step pretty_name='tokenize' name='carafe_tokenize'/>
   244	        <step name='entity_tag' pretty_name='tag entities'/>
   245	        <step name='nationality_tag' pretty_name='tag nationalities'/>
   246	        <step name='relation_tag' pretty_name='tag relations'/>
   247	      </workflow>
   248	      <workflow name='Review/repair all steps'>
   249	        <step name='correct'/>
   250	      </workflow>
   251	      <workflow name='Demo'>
   252	        <step pretty_name='zone' name='whole_zone'/>
   253	        <step pretty_name='tokenize' name='carafe_tokenize'/>
   254	        <step pretty_name='tag all entities' name='all_entity_tag'/>
   255	        <step pretty_name='tag relations' name='relation_tag'/>
   256	      </workflow>
   257	      <workflow name='Align'>
   258	        <step pretty_name='zone' name='whole_zone'/>
   259	        <step pretty_name='tokenize' name='carafe_tokenize'/>
   260	        <step name='align'/>
   261	      </workflow>
   262	    </workflows>
   263	    <workspaces default_config="Mixed Initiative Annotation">
   264	      <workspace workflow='Mixed Initiative Annotation'/>
   265	    </workspaces>
   266	  </task>
   267	</tasks>

The "Mixed Initiative Annotation" workflow illustrates how multi-step mixed initiative works: it contains several mixed steps, each responsible for a different content annotation set. The annotator will complete the entity annotations, and then the nationality annotations, and then the relation annotations. In each case, the annotator will have the option of pretagging; most significantly, the annotator will have the opportunity to correct the spans before pretagging the relations.

The "Sample Sentiment" task

The task.xml file which contains this task contains only this task, so its toplevel element is <task>. illustrates sentence classification.

We name the task and declare the language information:

     1	<task name='Sample Sentiment'>
     2	  <languages>
     3	    <language code='en' name='English' tokenless_autotag_delimiters='.,/?!;:'/>
     4	  </languages>

Next, we declare the annotations. We don't bother with zone annotations for this task, since the whole document is annotatable and unzoned documents are treated as such. So we only inherit category:token.

     5	  <annotations all_annotations_known='no' inherit='category:token'>
     6	    <annotation_set name="structure" managed="no"/>
     7	    <span label="sentence" of_set="structure" d_rendering_style="background_span"
     8	          d_css="background-color: lightgray">
     9	      <string name="sentiment" of_set="sentiment">
    10	        <choice value="positive" d_accelerator="P" d_css="background-color: green"/>
    11	        <choice value="negative" d_accelerator="N" d_css="background-color: red; color: white"/>
    12	      </string>
    13	    </span>
    14	  </annotations>

There are a number of important features in this annotation declaration block.

First, the <annotation_set> element declares a new annotation set named structure, which is marked as managed="no". Most annotation sets are managed, the significance of which is discussed here. However, for the structure annotations, there isn't going to be anything to manage, because they won't be hand annotated or corrected. In most circumstances, we wouldn't care that there's no advantage to managing this set; but in this case, the same engine which generates the tokens (the jCarafe tokenizer) also generates the sentences, and the token annotations are unmanaged, and no step (and thus, no engine) can simultaneously add managed and unmanaged sets.

Next, we define a sentence span annotation, and assign it to the structure set. We've declared the rendering style for this annotation to be background_span, which forces the annotation to be styled behind the text, rather than stacked on top of it. This would be important, from the point of view of presentation, if this task were also adding other span annotations (it isn't, but it's useful to illustrate the functionality here).

Next, we define an attribute of the sentence span, named sentiment. We put this attribute in a separate annotation set, also named sentiment. The reason we separate it is that we're going to use the maximum-entropy trainer/tagger to learn the value of this attribute, and the annotation sets which are added or learned by this engine must consist exclusively of attributes.

Next, we define two choices for this attribute: positive and negative. Note that we do not define a default value for the attribute, so the sentence classifier will add a "null" label during classification, and leave all "null"-labeled elements unmarked. In other words, the distinction modeled here is actually a three-way distinction.

Finally, we assign styling to these two choices, so we can distinguish visually between positive, negative, and unmarked sentiment sentences.

This last feature (choice styling) is not available in the legacy annotation declaration format, and so we do not present the legacy version of these declarations.

Next, we declare the engines:

    15	  <engines>
    16	    <engine name='carafe_tokenize_engine'>
    17	      <step_config class='MAT.JavaCarafe.CarafeTokenizationStep'/>
    18	    </engine>
    19	    <engine name='classifier_engine'>
    20	      <model_config class='MAT.JavaCarafe.JCarafeMaxentClassifierModelBuilder'>
    21	        <build_settings feature_extractors="_bagOfWords,_bigrams"/>
    22	      </model_config>
    23	      <step_config class='MAT.JavaCarafe.JCarafeMaxentClassifierTagStep'/>
    24	    </engine>
    25	  </engines>

These declarations consist of the standard jCarafe tokenizer/sentence segmenter, and the jCarafe maximum-entropy classifier, which uses bags of words and bigrams for each sentence as its classification features.

Next, we declare the steps:

    26	  <steps>
    27	    <annotation_step engine="carafe_tokenize_engine" sets_added="category:token,structure"
    28	                     type="auto" name="carafe_structure"/>
    29	    <annotation_step engine='classifier_engine'
    30	                     sets_added='sentiment' type='mixed'
    31	                     name='attribute_tag'/>
    32	  </steps>

We declare only two steps. The first uses the tokenizer engine and adds the token annotations, and the annotations from the structure set (i.e., the sentences). The second engine trains for and applies the attributes in the sentiment set. We enable mixed-initiative annotation (i.e., pretagging plus correction) for this second step. The single workflow in this task can be undone, and uses these two steps:

    33	  <workflows>
    34	    <workflow name="Demo" undoable="yes">
    35	      <step name="carafe_structure">
    36	        <run_settings sentence_label="sentence"/>
    37	      </step>
    38	      <step name='attribute_tag'/>
    39	    </workflow>      
    40	  </workflows>

Note that in order to retrieve the sentences, we have to pass a runtime setting to the step which specifies the label name of the desired sentence annotations.

Finally, we declare a score profile, and end:

    41	  <score_profile>
    42	    <attrs_alone true_labels="sentence"/>
    43	  </score_profile>
    44	</task>

The score profile is not so important in this task, but it's crucial in more complex tasks. The <attrs_alone> element tells the scorer to break out separate scores for the performance of the individual attributes, in addition to evaluating the overall correctness of the sentence annotation itself. The span and label of the sentence annotation here is never in doubt; it's always going to be the same for any document it's run on, because it's not trained. The overall correctness of the sentence annotation, then, comes down to the value of the sentiment attribute. So for the purposes of this task, there's no need to break out the separate attribute score. However, if there were more than one attribute on the sentence that we were training for, we would want to break out the separate contribution of each attribute so we could evaluate them independently without having to do separate runs for each attribute.