The sample tasks

The sample tasks can be found in MAT_PKG_HOME/sample. There are three task directories there; in this documentation, we discuss two of them. Both of these directories, like all task directories, have a file named task.xml at their root. The format of this file is described in the task XML documentation.

The first of these directories is ne. This directory has a python subdirectory, which implements a Python engine for the "Sample Relations" task described below, but no JavaScript customizations (so no js subdirectory). See "Creating a task" for a description of the subdirectory structure of the task.

The task.xml file in the ne directory contains three tasks: "Named Entity", "Enhanced Named Entity" and "Sample Relations". The first task is a simple span task; it contains spanned annotations without any complex attribute structure. This task is used for Tutorials 1-6, and for a variety of other examples throughout this documentation. The second task is a complex task, containing both spanned and spanless annotations and multiple attributes, some of which take other annotations as their values. This second task is used for Tutorial 7, as well as the UI documentation on editing annotations and spanless annotations. The final task is a complex task, containing spanned and spanless annotations, which is intended to illustrate how multiple content annotation sets and multiple engines can be used. This third task is used for Tutorial 8.

The second of these directories is classification. The task.xml file in this directory contains a single task, "Sample Sentiment", which exemplifies using classification to assign positive and negative sentiment to sentences.

In the sample tasks, we'll exemplify both the new method of defining annotations and their attributes, and the legacy method.

The "Named Entity" task

The file typically contains a single task declaration, with <task> as the toplevel element. However, if you wish to declare multiple tasks in the same task.xml file, it can also contain multiple <task> elements, within a <tasks> element. Here, we will define three tasks. Each task must be named, and declare supported languages:

<tasks>
  <task name='Named Entity'>
    <languages>
      <language code='en' name='English' tokenless_autotag_delimiters='.,/?!;:'/>
    </languages>
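
Nothing in this walkthrough requires code, but if you want to poke at a task.xml file programmatically, both shapes of the file (a bare <task> root, or several <task> elements under <tasks>) are easy to handle with Python's standard library. The following is a minimal sketch, not a MAT API:

    import xml.etree.ElementTree as ET

    def list_tasks(path):
        # The root element is either a single <task> or a <tasks> wrapper.
        root = ET.parse(path).getroot()
        tasks = [root] if root.tag == 'task' else root.findall('task')
        for task in tasks:
            codes = [lang.get('code') for lang in task.findall('languages/language')]
            print(task.get('name'), '- languages:', ', '.join(codes))

    # E.g., pointing list_tasks at the ne directory's task.xml prints the
    # three task names defined in this file, each supporting 'en'.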

Each task usually contains a block of annotation declarations:

    <annotations all_annotations_known='no'
                 inherit='category:zone,category:token'>
      <span label='PERSON'
            d_css='background-color: #CCFF66' d_accelerator='P'/>
      <span label='LOCATION'
            d_css='background-color: #FF99CC' d_accelerator='L'/>
      <span label='ORGANIZATION'
            d_css='background-color: #99CCFF' d_accelerator='O'/>
    </annotations>

Here, we have inherited the zone and token category tags from the root task, and defined our own content tags, PERSON, LOCATION and ORGANIZATION. We also define the display properties of these tags. For instance, the PERSON tag will display as light green (defined here in hexadecimal), and the tagging menu will support the "P" keyboard accelerator for annotating a selected span with the PERSON tag.

For reference, here are the same annotation declarations and their display attributes, defined using the legacy method:

    <annotation_set_descriptors all_annotations_known='no'
                                inherit='category:zone,category:token'>
      <annotation_set_descriptor category='content' name='content'>
        <annotation label='PERSON'/>
        <annotation label='LOCATION'/>
        <annotation label='ORGANIZATION'/>
      </annotation_set_descriptor>
    </annotation_set_descriptors>
    <annotation_display>
      <label css='background-color: #CCFF66' name='PERSON' accelerator='P'/>
      <label css='background-color: #FF99CC' name='LOCATION' accelerator='L'/>
      <label css='background-color: #99CCFF' name='ORGANIZATION' accelerator='O'/>
    </annotation_display>

If the task supports automated annotation (trainable or otherwise), it will define engines:

    <engines>
      <engine name='carafe_tag_engine'>
        <default_model>default_model</default_model>
        <model_config class='MAT.JavaCarafe.CarafeModelBuilder'>
          <build_settings training_method='psa' max_iterations='6'/>
        </model_config>
        <model_config config_name='alt_model_build'
                      class='MAT.JavaCarafe.CarafeModelBuilder'/>
        <step_config class='MAT.JavaCarafe.CarafeTagStep'/>
      </engine>
      <engine name='align_engine'>
        <step_config class='MAT.PluginMgr.AlignStep'/>
      </engine>
      <engine name='whole_zone_engine'>
        <step_config class='MAT.PluginMgr.WholeZoneStep'/>
      </engine>
      <engine name='carafe_tokenize_engine'>
        <step_config class='MAT.JavaCarafe.CarafeTokenizationStep'/>
      </engine>
    </engines>

Each engine is named, and uses the <step_config> element to specify the class which implements the automated annotation. If the engine is trainable, it will also have at least one <model_config> element (as we see for "carafe_tag_engine" above), which can customize the model build settings, and it may optionally define a default model file name, which is suffixed with the relevant language and step when it is referenced.
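
To make this structure concrete, here is a small sketch (again plain ElementTree, not a MAT API) that summarizes an <engines> block like the one above: each engine's implementing class, whether it is trainable (i.e., it has at least one <model_config>), and its default model name, if any:

    def summarize_engines(task_elt):
        # task_elt is one of the parsed <task> elements from the earlier sketch.
        for engine in task_elt.findall('engines/engine'):
            step_class = engine.find('step_config').get('class')
            trainable = engine.find('model_config') is not None
            default_model = engine.findtext('default_model')
            print(engine.get('name'))
            print('  step class:   ', step_class)
            print('  trainable:    ', trainable)
            if default_model:
                print('  default model:', default_model)

Run over the "Named Entity" task, only carafe_tag_engine would show up as trainable, with default_model as its default model name.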

If a task supports any hand or automated annotation activity, it defines steps, which are the basic building blocks of the annotation activities:

    <steps>
      <annotation_step engine='align_engine' type='auto' name='align'/>
      <annotation_step engine='carafe_tag_engine' sets_added='category:content'
                       type='mixed' name='carafe_tag'/>
      <annotation_step engine='whole_zone_engine' sets_added='category:zone'
                       type='auto' name='whole_zone'/>
      <annotation_step engine='carafe_tokenize_engine'
                       sets_added='category:token' type='auto'
                       name='carafe_tokenize'/>
      <annotation_step type='hand' name='correct'
                       sets_modified='category:content'/>
    </steps>

The most common step is the annotation step, and there are four subtypes, three of which appear here: this task has three auto steps (whole_zone, align, carafe_tokenize), a mixed step (carafe_tag), and a hand step (correct). Each step specifies which annotation sets or categories it adds or modifies, and, where applicable, names the engine which applies it.
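
A similar sketch (standard library only, not MAT) can summarize the <steps> block, grouping the annotation steps by their type and showing which sets they touch and which engine, if any, applies them:

    from collections import defaultdict

    def summarize_steps(task_elt):
        by_type = defaultdict(list)
        for step in task_elt.findall('steps/annotation_step'):
            by_type[step.get('type')].append(step)
        for step_type, steps in sorted(by_type.items()):
            print(step_type, 'steps:')
            for step in steps:
                sets = step.get('sets_added') or step.get('sets_modified') or '(none)'
                engine = step.get('engine', '(no engine)')
                print('  %s  sets: %s  engine: %s' % (step.get('name'), sets, engine))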

These steps can be assembled into workflows:

    <workflows>
      <workflow name='Tokenless hand annotation'>
        <step pretty_name='zone' name='whole_zone'/>
        <step name='carafe_tag' pretty_name='hand tag' type='hand'/>
      </workflow>
      <workflow name='Review/repair'>
        <step name='correct'/>
      </workflow>
      <workflow name='Demo' undoable="yes">
        <step pretty_name='zone' name='whole_zone'/>
        <step pretty_name='tokenize' name='carafe_tokenize'/>
        <step pretty_name='tag' name='carafe_tag'/>
      </workflow>
      <workflow name='Align'>
        <step pretty_name='zone' name='whole_zone'/>
        <step pretty_name='tokenize' name='carafe_tokenize'/>
        <step name='align'/>
      </workflow>
    </workflows>

Here's what these workflows do. "Tokenless hand annotation" zones the document and then offers the carafe_tag step for hand tagging, without tokenizing first. "Review/repair" consists of the single correct step, for hand-correcting the content annotations of an already-annotated document. "Demo" zones and tokenizes the document and then applies the carafe_tag step; this workflow is undoable. "Align" zones and tokenizes the document and then runs the align step.

Once the workflows are defined, you can (optionally) specify the properties of workspaces. By default, you don't need to say anything additional about workspaces, since every human-annotatable workflow can serve as the basis of a workspace, but you might want to declare your default configuration (or workflow), and set special properties of workspace operations:

    <workspaces default_config="Demo">
      <workspace workflow='Demo'/>
    </workspaces>

Here, we define a default configuration, and set up a block which can (but currently does not) customize the behavior of the "Demo" workflow in the context of the workspaces.

The "Enhanced Named Entity" task

At this point, we end the first task and begin the second one.

  </task>
  <task name='Enhanced Named Entity'>
    <languages>
      <language code='en' name='English' tokenless_autotag_delimiters='.,/?!;:'/>
    </languages>

And now, we define the annotations and their displays in the second task:

    <annotations all_annotations_known='no'
                 inherit='category:zone,category:token'>
      <span label='PERSON'
            d_css='background-color: #CCFF66' d_accelerator='P'
            d_edit_immediately='yes'>
        <string name='nomtype' choices="Proper name,Noun,Pronoun"/>
      </span>
      <span label='LOCATION'
            d_css='background-color: #FF99CC' d_accelerator='L'
            d_edit_immediately='yes'>
        <string name='nomtype' choices="Proper name,Noun,Pronoun"/>
        <boolean name='is_political_entity'/>
      </span>
      <span label='ORGANIZATION'
            d_css='background-color: #99CCFF' d_accelerator='O'
            d_edit_immediately='yes'>
        <string name='nomtype' choices="Proper name,Noun,Pronoun"/>
      </span>
      <spanless label='PERSON_COREF'
                d_css='background-color: lightgreen' d_accelerator='C'>
        <filler_set name='mentions' filler_types='PERSON'/>
      </spanless>
      <span label='LOCATED_EVENT'
            d_css='background-color: pink' d_accelerator='E'
            d_edit_immediately='yes'>
        <filler name='actor' filler_types='PERSON'/>
        <filler name='location' filler_types='LOCATION,ORGANIZATION'/>
      </span>
      <spanless label='LOCATION_RELATION'
                d_css='background-color: orange' d_accelerator='R'>
        <filler name='located' filler_types='ORGANIZATION,PERSON'/>
        <filler name='location' filler_types='LOCATION'/>
      </spanless>
    </annotations>

This annotation definition block is much more complex than the one in the "Named Entity" task. In addition to the three labels we saw previously, we have three new labels: "LOCATED_EVENT" (spanned) and "PERSON_COREF" and "LOCATION_RELATION" (spanless). We also have several attributes, of different types. Most notable is the "mentions" attribute of the "PERSON_COREF" annotation, which takes a set of annotations as its value. The annotation display information is also somewhat more complex; we see here that all of the annotations are marked to be edited immediately upon creation.

For reference, here are the same annotation declarations and their display attributes, defined using the legacy method:

    <annotation_set_descriptors all_annotations_known='no'
                                inherit='category:zone,category:token'>
      <annotation_set_descriptor category='content' name='content'>
        <annotation label='PERSON'/>
        <annotation label='LOCATION'/>
        <annotation label='ORGANIZATION'/>
        <attribute name='nomtype' of_annotation='PERSON,LOCATION,ORGANIZATION'>
          <choice>Proper name</choice>
          <choice>Noun</choice>
          <choice>Pronoun</choice>
        </attribute>
        <attribute name='is_political_entity' type='boolean'
                   of_annotation='LOCATION'/>
        <annotation label='LOCATED_EVENT'/>
        <attribute name='actor' type='annotation' of_annotation='LOCATED_EVENT'>
          <label_restriction label='PERSON'/>
        </attribute>
        <attribute name='location' type='annotation'
                   of_annotation='LOCATED_EVENT'>
          <label_restriction label='LOCATION'/>
          <label_restriction label='ORGANIZATION'/>
        </attribute>
        <annotation span='no' label='PERSON_COREF'/>
        <attribute name='mentions' aggregation='set' type='annotation'
                   of_annotation='PERSON_COREF'>
          <label_restriction label='PERSON'/>
        </attribute>
        <annotation span='no' label='LOCATION_RELATION'/>
        <attribute name='located' type='annotation'
                   of_annotation='LOCATION_RELATION'>
          <label_restriction label='ORGANIZATION'/>
          <label_restriction label='PERSON'/>
        </attribute>
        <attribute name='location' type='annotation'
                   of_annotation='LOCATION_RELATION'>
          <label_restriction label='LOCATION'/>
        </attribute>
      </annotation_set_descriptor>
    </annotation_set_descriptors>
    <annotation_display>
      <label css='background-color: #CCFF66' name='PERSON' accelerator='P'
             edit_immediately='yes'/>
      <label css='background-color: #FF99CC' name='LOCATION' accelerator='L'
             edit_immediately='yes'/>
      <label css='background-color: #99CCFF' name='ORGANIZATION' accelerator='O'
             edit_immediately='yes'/>
      <label css='background-color: lightgreen' name='PERSON_COREF'
             accelerator='C' edit_immediately='yes'/>
      <label css='background-color: pink' name='LOCATED_EVENT' accelerator='E'
             edit_immediately='yes'/>
      <label css='background-color: orange' name='LOCATION_RELATION'
             accelerator='R' edit_immediately='yes'/>
    </annotation_display>

The remainder of this task is essentially identical to the "Named Entity" task:

    <engines>
      <engine name='carafe_tag_engine'>
        <default_model>default_enhanced_model</default_model>
        <model_config class='MAT.JavaCarafe.CarafeModelBuilder'>
          <build_settings training_method='psa' max_iterations='6'/>
        </model_config>
        <model_config config_name='alt_model_build'
                      class='MAT.JavaCarafe.CarafeModelBuilder'/>
        <step_config class='MAT.JavaCarafe.CarafeTagStep'/>
      </engine>
      <engine name='align_engine'>
        <step_config class='MAT.PluginMgr.AlignStep'/>
      </engine>
      <engine name='whole_zone_engine'>
        <step_config class='MAT.PluginMgr.WholeZoneStep'/>
      </engine>
      <engine name='carafe_tokenize_engine'>
        <step_config class='MAT.JavaCarafe.CarafeTokenizationStep'/>
      </engine>
    </engines>
    <steps>
      <annotation_step engine='align_engine' type='auto' name='align'/>
      <annotation_step engine='carafe_tag_engine' sets_added='category:content'
                       type='mixed' name='carafe_tag'/>
      <annotation_step engine='whole_zone_engine' sets_added='category:zone'
                       type='auto' name='whole_zone'/>
      <annotation_step engine='carafe_tokenize_engine'
                       sets_added='category:token' type='auto'
                       name='carafe_tokenize'/>
      <annotation_step type='hand' name='correct'
                       sets_modified='category:content'/>
    </steps>
    <workflows>
      <workflow name='Tokenless hand annotation'>
        <step pretty_name='zone' name='whole_zone'/>
        <step name='carafe_tag' pretty_name='hand tag' type='hand'/>
      </workflow>
      <workflow name='Review/repair'>
        <step name='correct'/>
      </workflow>
      <workflow name='Demo' undoable="yes">
        <step pretty_name='zone' name='whole_zone'/>
        <step pretty_name='tokenize' name='carafe_tokenize'/>
        <step pretty_name='tag' name='carafe_tag'/>
      </workflow>
      <workflow name='Align'>
        <step pretty_name='zone' name='whole_zone'/>
        <step pretty_name='tokenize' name='carafe_tokenize'/>
        <step name='align'/>
      </workflow>
    </workflows>
    <workspaces>
      <workspace workflow='Demo'/>
    </workspaces>

Notably, because the jCarafe tagger only operates on the simple span subset of this (or any) task, the "Demo" workflow will only apply the spanned labels, not the attributes associated with them, and won't apply the spanless labels at all.

The "Sample Relations" task

At this point, we end the second task and begin the third:

  </task>
  <task name='Sample Relations'>
    <languages>
      <language code='en' name='English' tokenless_autotag_delimiters='.,/?!;:'/>
    </languages>

This third task is intended to illustrate the impact of multiple content annotation sets: the ability to reuse tagging engines, to segregate annotation activities by annotation set, and to support multiple mixed-initiative steps in the same workflow. As part of this illustration, we've implemented an extremely simplistic two-argument trainable relation tagger, which essentially classifies the bags of words between successive pairs of candidate relations. We're not advertising this as a real relation tagging capability; it exists only to demonstrate how trainable relation tagging might be integrated. Here are the annotation sets and their displays:

    <annotations all_annotations_known='no'
                 inherit='category:zone,category:token'>
      <span label='PERSON' of_set='entities'
            d_css='background-color: LawnGreen' d_accelerator='P'
            d_edit_immediately='yes'/>
      <span label='LOCATION' of_set='entities'
            d_css='background-color: HotPink' d_accelerator='L'
            d_edit_immediately='yes'/>
      <span label='ORGANIZATION' of_set='entities'
            d_css='background-color: DeepSkyBlue' d_accelerator='O'
            d_edit_immediately='yes'/>
      <span label='NATIONALITY' of_set='nationality'
            d_css='background-color: PaleVioletRed' d_accelerator='N'
            d_edit_immediately='yes'/>
      <spanless label="Employment" of_set='relations'
                d_css="background-color: Gray">
        <filler name="Employee" filler_types="PERSON"/>
        <filler name="Employer" filler_types="ORGANIZATION,LOCATION,NATIONALITY"/>
      </spanless>
      <spanless label="Located" of_set='relations'
                d_css="background-color: Thistle">
        <filler name="Located-Entity" filler_types="PERSON,ORGANIZATION"/>
        <filler name="Location" filler_types="LOCATION,NATIONALITY"/>
      </spanless>
    </annotations>

Notice that there are three annotation sets rather than one, and while they're each in category "content", they each have a different set name.

For reference, here are the same annotation declarations and their display attributes, defined using the legacy method:

    <annotation_set_descriptors all_annotations_known='no'
                                inherit='category:zone,category:token'>
      <annotation_set_descriptor category='content' name='entities'>
        <annotation label='PERSON'/>
        <annotation label='LOCATION'/>
        <annotation label='ORGANIZATION'/>
      </annotation_set_descriptor>
      <annotation_set_descriptor category='content' name='nationality'>
        <annotation label='NATIONALITY'/>
      </annotation_set_descriptor>
      <annotation_set_descriptor category='content' name='relations'>
        <annotation label="Employment" span="no"/>
        <attribute name="Employee" of_annotation="Employment" type="annotation">
          <label_restriction label="PERSON"/>
        </attribute>
        <attribute name="Employer" of_annotation="Employment" type="annotation">
          <label_restriction label="ORGANIZATION"/>
          <label_restriction label='LOCATION'/>
          <label_restriction label="NATIONALITY"/>
        </attribute>
        <annotation label="Located" span="no"/>
        <attribute name="Located-Entity" of_annotation="Located" type="annotation">
          <label_restriction label="PERSON"/>
          <label_restriction label="ORGANIZATION"/>
        </attribute>
        <attribute name="Location" of_annotation="Located" type="annotation">
          <label_restriction label="LOCATION"/>
          <label_restriction label="NATIONALITY"/>
        </attribute>
      </annotation_set_descriptor>
    </annotation_set_descriptors>
    <annotation_display>
      <label css='background-color: LawnGreen' name='PERSON' accelerator='P'
             edit_immediately='yes'/>
      <label css='background-color: HotPink' name='LOCATION' accelerator='L'
             edit_immediately='yes'/>
      <label css='background-color: DeepSkyBlue' name='ORGANIZATION' accelerator='O'
             edit_immediately='yes'/>
      <label css='background-color: PaleVioletRed' name='NATIONALITY' accelerator='N'
             edit_immediately='yes'/>
      <label name="Employment" css="background-color: Gray" edit_immediately="yes"/>
      <label name="Located" css="background-color: Thistle" edit_immediately="yes"/>
    </annotation_display>

The next element, which doesn't appear in the other two tasks, supports a range of Web UI customizations. In this case, we specify that we want the annotations to appear in the annotation menu and legend in the order they're defined, not in alphabetical order:

    <web_customization alphabetize_labels="no"/>

Next, we define the engines.

    <engines>
      <engine name='carafe_tag_engine'>
        <default_model>default_model</default_model>
        <model_config class='MAT.JavaCarafe.CarafeModelBuilder'>
          <build_settings training_method='psa' max_iterations='6'/>
        </model_config>
        <model_config config_name='alt_model_build'
                      class='MAT.JavaCarafe.CarafeModelBuilder'/>
        <step_config class='MAT.JavaCarafe.CarafeTagStep'/>
      </engine>
      <engine name='trivial_relation_tag_engine'>
        <default_model>default_relation_model</default_model>
        <model_config class='TrivialRelationTagger.CarafeMaxentRelationModelBuilder'/>
        <step_config class='TrivialRelationTagger.CarafeRelationTagStep'/>
      </engine>
      <engine name='align_engine'>
        <step_config class='MAT.PluginMgr.AlignStep'/>
      </engine>
      <engine name='whole_zone_engine'>
        <step_config class='MAT.PluginMgr.WholeZoneStep'/>
      </engine>
      <engine name='carafe_tokenize_engine'>
        <step_config class='MAT.JavaCarafe.CarafeTokenizationStep'/>
      </engine>
    </engines>

In addition to the engines declared in the other two tasks, this task includes the "trivial_relation_tag_engine", which implements the simplistic relation tagging we described a moment ago.

The important differences arise in the definition of the steps:

    <steps>
      <annotation_step engine='align_engine' type='auto' name='align'/>
      <annotation_step engine='carafe_tag_engine' sets_added='entities'
                       type='mixed' name='entity_tag'/>
      <annotation_step engine='carafe_tag_engine' sets_added='nationality'
                       type='mixed' name='nationality_tag'/>
      <annotation_step engine='carafe_tag_engine' sets_added='entities,nationality'
                       type='mixed' name='all_entity_tag'/>
      <annotation_step engine='trivial_relation_tag_engine' sets_added='relations'
                       type='mixed' name='relation_tag'/>
      <annotation_step engine='whole_zone_engine' sets_added='category:zone'
                       type='auto' name='whole_zone'/>
      <annotation_step engine='carafe_tokenize_engine'
                       sets_added='category:token' type='auto'
                       name='carafe_tokenize'/>
      <annotation_step type='hand' name='correct'
                       sets_modified='category:content'/>
    </steps>

Here, we see that the "carafe_tag_engine" is used in three different steps: "entity_tag", "nationality_tag", and "all_entity_tag". This last step adds two annotation sets, rather than one. When a trainable engine defines a default model, the model path is suffixed with both the language and the step name when it's referenced and used, so these three cases will be kept separate. We also see that the "trivial_relation_tag_engine" is used in the "relation_tag" step. All four of these steps are mixed steps, so you'll be able to annotate the sets they add by hand de novo, or pretag and correct if a model is available.
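
The bookkeeping that keeps these models separate can be illustrated with one more sketch. It simply enumerates, for a given language, the (default model, language, step) combinations that arise from the trainable steps above; the exact way MAT turns these into model file names is its own concern, so don't read the tuple format as the on-disk naming scheme:

    def trainable_model_keys(task_elt, language='en'):
        # Map each trainable engine to its declared default model name.
        defaults = {}
        for engine in task_elt.findall('engines/engine'):
            if engine.find('model_config') is not None:
                defaults[engine.get('name')] = engine.findtext('default_model')
        # Each step that uses a trainable engine gets its own combination.
        return [(defaults[step.get('engine')], language, step.get('name'))
                for step in task_elt.findall('steps/annotation_step')
                if step.get('engine') in defaults]

For the "Sample Relations" task this yields four combinations: default_model with entity_tag, nationality_tag and all_entity_tag, and default_relation_model with relation_tag. Finally, we have the workflows and workspaces (and then we end the task and the file):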

    <workflows>
      <workflow name='Mixed Initiative Annotation' undoable="yes">
        <step pretty_name='zone' name='whole_zone'/>
        <step pretty_name='tokenize' name='carafe_tokenize'/>
        <step name='entity_tag' pretty_name='tag entities'/>
        <step name='nationality_tag' pretty_name='tag nationalities'/>
        <step name='relation_tag' pretty_name='tag relations'/>
      </workflow>
      <workflow name='Review/repair all steps'>
        <step name='correct'/>
      </workflow>
      <workflow name='Demo'>
        <step pretty_name='zone' name='whole_zone'/>
        <step pretty_name='tokenize' name='carafe_tokenize'/>
        <step pretty_name='tag all entities' name='all_entity_tag'/>
        <step pretty_name='tag relations' name='relation_tag'/>
      </workflow>
      <workflow name='Align'>
        <step pretty_name='zone' name='whole_zone'/>
        <step pretty_name='tokenize' name='carafe_tokenize'/>
        <step name='align'/>
      </workflow>
    </workflows>
    <workspaces default_config="Mixed Initiative Annotation">
      <workspace workflow='Mixed Initiative Annotation'/>
    </workspaces>
  </task>
</tasks>

The "Mixed Initiative Annotation" workflow illustrates how multi-step mixed initiative works: it contains several mixed steps, each responsible for a different content annotation set. The annotator will complete the entity annotations, and then the nationality annotations, and then the relation annotations. In each case, the annotator will have the option of pretagging; most significantly, the annotator will have the opportunity to correct the spans before pretagging the relations.

The "Sample Sentiment" task

The task.xml file in the classification directory contains only this task, so its toplevel element is <task> rather than <tasks>. The task illustrates sentence classification.

We name the task and declare the language information:

<task name='Sample Sentiment'>
  <languages>
    <language code='en' name='English' tokenless_autotag_delimiters='.,/?!;:'/>
  </languages>

Next, we declare the annotations. We don't bother with zone annotations for this task, since the whole document is annotatable, and unzoned documents are treated as entirely annotatable. So we inherit only category:token.

  <annotations all_annotations_known='no' inherit='category:token'>
    <annotation_set name="structure" managed="no"/>
    <span label="sentence" of_set="structure" d_rendering_style="background_span"
          d_css="background-color: lightgray">
      <string name="sentiment" of_set="sentiment">
        <choice value="positive" d_accelerator="P" d_css="background-color: green"/>
        <choice value="negative" d_accelerator="N" d_css="background-color: red; color: white"/>
      </string>
    </span>
  </annotations>

There are a number of important features in this annotation declaration block.

First, the <annotation_set> element declares a new annotation set named structure, which is marked as managed="no". Most annotation sets are managed; the significance of managed sets is discussed elsewhere in this documentation. However, for the structure annotations, there isn't going to be anything to manage, because they won't be hand-annotated or corrected. In most circumstances, we wouldn't care that there's no advantage to managing this set; but in this case, the same engine which generates the tokens (the jCarafe tokenizer) also generates the sentences, the token annotations are unmanaged, and no step (and thus, no engine) can add managed and unmanaged sets simultaneously.

Next, we define a sentence span annotation, and assign it to the structure set. We've declared the rendering style for this annotation to be background_span, which forces the annotation to be styled behind the text, rather than stacked on top of it. This would be important, from the point of view of presentation, if this task were also adding other span annotations (it isn't, but it's useful to illustrate the functionality here).

Next, we define an attribute of the sentence span, named sentiment. We put this attribute in a separate annotation set, also named sentiment. The reason we separate it is that we're going to use the maximum-entropy trainer/tagger to learn the value of this attribute, and the annotation sets which are added or learned by this engine must consist exclusively of attributes.

Next, we define two choices for this attribute: positive and negative. Note that we do not define a default value for the attribute, so the sentence classifier will add a "null" label during classification, and leave all "null"-labeled elements unmarked. In other words, the distinction modeled here is actually a three-way distinction.

Finally, we assign styling to these two choices, so we can distinguish visually between positive, negative, and unmarked sentiment sentences.

This last feature (choice styling) is not available in the legacy annotation declaration format, and so we do not present the legacy version of these declarations.

Next, we declare the engines:

  <engines>
    <engine name='carafe_tokenize_engine'>
      <step_config class='MAT.JavaCarafe.CarafeTokenizationStep'/>
    </engine>
    <engine name='classifier_engine'>
      <model_config class='MAT.JavaCarafe.JCarafeMaxentClassifierModelBuilder'>
        <build_settings feature_extractors="_bagOfWords,_bigrams"/>
      </model_config>
      <step_config class='MAT.JavaCarafe.JCarafeMaxentClassifierTagStep'/>
    </engine>
  </engines>

These declarations consist of the standard jCarafe tokenizer/sentence segmenter, and the jCarafe maximum-entropy classifier, which uses bags of words and bigrams for each sentence as its classification features.

Next, we declare the steps:

  <steps>
    <annotation_step engine="carafe_tokenize_engine" sets_added="category:token,structure"
                     type="auto" name="carafe_structure"/>
    <annotation_step engine='classifier_engine'
                     sets_added='sentiment' type='mixed'
                     name='attribute_tag'/>
  </steps>

We declare only two steps. The first uses the tokenizer engine and adds the token annotations, plus the annotations from the structure set (i.e., the sentences). The second uses the classifier engine, which trains for and applies the attributes in the sentiment set; we enable mixed-initiative annotation (i.e., pretagging plus correction) for this step. The single workflow in this task can be undone, and uses these two steps:

  <workflows>
    <workflow name="Demo" undoable="yes">
      <step name="carafe_structure">
        <run_settings sentence_label="sentence"/>
      </step>
      <step name='attribute_tag'/>
    </workflow>
  </workflows>

Note that in order to retrieve the sentences, we have to pass a runtime setting to the step which specifies the label name of the desired sentence annotations.

Finally, we declare a score profile, and end:

  <score_profile>
    <attrs_alone true_labels="sentence"/>
  </score_profile>
</task>

The score profile is not so important in this task, but it's crucial in more complex tasks. The <attrs_alone> element tells the scorer to break out separate scores for the performance of the individual attributes, in addition to evaluating the overall correctness of the sentence annotation itself. The span and label of the sentence annotation here are never in doubt; they're always going to be the same for any document the task is run on, because the sentence annotation isn't trained. The overall correctness of the sentence annotation, then, comes down to the value of the sentiment attribute, so for the purposes of this task, there's no need to break out the separate attribute score. However, if there were more than one attribute on the sentence that we were training for, we would want to break out the separate contribution of each attribute, so we could evaluate them independently without having to do separate runs for each one.