Task XML Reference

The XML format for the task files (see "Creating a New Task") is described in this document. Use cases are described here.

The reference for declaring annotations and their display properties is handled separately, and can be found here.

Element hierarchy

<tasks>
  <task>
    <languages>
      <language>
    <doc_enhancement_class>
    <web_customization>
      <short_name>, <long_name>, <js>, <css>
    <engines>
      <java_subprocess_parameters>
      <engine>
        <step_config>
          <create_settings>
            <setting>
              <name>, <value>
        <default_model>
        <model_config>
          <build_settings>
            <setting>
              <name>, <value>
    <steps>
      <signal_step>
      <annotation_step>
      <transform_step>
    <workspaces>
      <workspace>
        <operation>
          <settings>
            <setting>
              <name>, <value>
    <settings>
      <setting>
        <name>, <value>
    <workflows>
      <workflow>
        <ui_settings>
          <setting>
            <name>, <value>
        <step>
          <create_settings>
            <setting>
              <name>, <value>
          <run_settings>
            <setting>
              <name>, <value>
          <ui_settings>
            <setting>
              <name>, <value>
    <annotation_set_descriptors>
      <annotation_set_descriptor>
    <annotation_display>
    <annotations>
    <similarity_profile>
      <stratum>
      <tag_profile>
        <dimension>
        <attr_equivalences>
    <score_profile>
      <label_limitation>
      <attrs_alone>
      <aggregation>
      <attr_decomposition>
      <partition_decomposition>

<tasks>

A container for <task> elements. Either this element or <task> can be the toplevel element in the file.

Children

Element
Obligatory?
Repeatable?
Description
<task> yes yes A task definition.

<task>

The usual toplevel element in the file. For historical reasons, some of the tags are obligatory and some not. Conceptually speaking, you always need to specify either <annotation_set_descriptors> and <annotation_display> or <annotations>; and <engines>, <workflows> and <steps> are required for using the engine and experiment infrastructure. <workspaces> should be defined if you want to define a default workflow to use as your workspace, or customize a workflow for use in a workspace. The other elements are for advanced customizations.

If you want to define multiple tasks in the same task.xml file (if, for instance, you're defining a task and a set of child tasks), you can use <tasks> as your toplevel element. This element has no attributes, and only one repeatable child: <task>.

Attributes

Attribute
Value
Obligatory?
Description
name a string
yes The name of the task. This name will appear in menus in the UI, and in help strings in the engine, so make it something mnemonic, distinctive and descriptive.
visible "no"
no If present, the task is not "visible" in the various lists of tasks the user will see. Typically, this is used if this task is not a leaf in the tree of tasks. You will seldom need this capability.
parent a string
no The name of the parent task in the hierarchy. If this is not specified, the system root task will be used. You will seldom need this capability. If you do, typically the parent will specify visible="no".
class a string, the name of a Python class
no If you've found a need to specialize the default task implementation, the value of this attribute should be "<file>.<classname>", where <file> corresponds to a file <taskdir>/python/<file>.py.
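As a sketch of how the parent and visible attributes combine with a <tasks> toplevel element, the fragment below defines a hidden parent task and one child task in a single file. The task names are hypothetical, and the remaining children of each task are left out for brevity.

```xml
<tasks>
  <!-- Parent task: not a leaf in the task tree, so it is hidden
       from the lists of tasks the user sees. -->
  <task name="Sample Base Task" visible="no">
    <languages>
      <language code="en" name="English"/>
    </languages>
  </task>
  <!-- Child task: names its parent and inherits the parent's languages. -->
  <task name="Sample Child Task" parent="Sample Base Task">
    <languages inherit_all="yes"/>
  </task>
</tasks>
```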

Children

Element
Obligatory?
Repeatable?
Description
<languages> yes no The languages this task can apply to.
<doc_enhancement_class> no no If specified, this element should delimit a string "<file>.<classname>", where <file> corresponds to a file <taskdir>/python/<file>.py. The specified class is a class which contributes to specializations of this documentation for the task in question. This functionality is currently undocumented.

This element has no attributes or element children; its value is the text it delimits.
<web_customization> no no Customizations of the Web UI.
<engines> no no The automated tagging engines used in this task.
<steps> no no The steps used in the workflows in this task.
<workspaces> no no The default workspace configuration or workflow to use for workspaces, and any custom configurations built on top of workflows.
<settings> no no The task-specific settings which may be viewed by specializations of the root task.
<workflows> no no The workflows that are used in the MAT engine.
<annotation_set_descriptors> no no The labels and attributes which are used in this task.
<annotation_display> no no The display-related properties of the labels and attributes in this task.
<similarity_profile> no yes The methods for comparing annotations for scoring and visual comparison.
<score_profile> no yes The methods for decomposing and aggregating annotation labels for scoring.
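A minimal sketch of a task that wires one engine, one step and one workflow together might look like the following. The task, engine, step and annotation set names are hypothetical. This reference does not say whether the order of child elements matters, so the order below simply follows the table above.

```xml
<task name="Sample Named Entity Task">
  <languages>
    <language code="en" name="English"/>
  </languages>
  <engines>
    <engine name="tagger">
      <step_config class="MAT.JavaCarafe.CarafeTagStep"/>
    </engine>
  </engines>
  <steps>
    <!-- "entities" is a hypothetical annotation set name. -->
    <annotation_step name="tag" engine="tagger" type="mixed"
                     sets_added="entities"/>
  </steps>
  <workflows>
    <workflow name="Demo">
      <step name="tag"/>
    </workflow>
  </workflows>
  <!-- The annotation declarations (annotation_set_descriptors and
       annotation_display, or annotations) are omitted from this sketch. -->
</task>
```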

<languages> (of <task>)

As of MAT 3.0, each task must declare the languages it applies to. The task can apply to multiple languages. This element contains those language definitions.

Attributes

Attribute
Value
Obligatory?
Description
inherit_all "yes"
no If present, all languages defined in the parent task (if any) will be inherited.
inherit a comma separated list of language names
no If present, a list of languages (by their name attribute) which will be inherited from the parent task.

Children

Element
Obligatory?
Repeatable?
Description
<language> no yes An individual language declaration.

<language> (of <languages>)

This element declares a single language. You can assign a shorthand code (e.g., the ISO language code) for use in most places where languages must be specified.

Attributes

Attribute
Value
Obligatory?
Description
code a string
no A shorthand code (e.g., the ISO language code) which you can refer to this language by.
name a string
no The full name of the language
import_from N/A
no We intend, someday, to allow languages to be defined in their own files, and to be imported by name or code. This feature is not yet implemented.
text_right_to_left "yes" no If specified, documents viewed in this language in the MAT UI will be treated as right-to-left text (e.g., Arabic).
tokenless_autotag_delimiters a string no By default, if you ask the MAT UI to autotag similar strings when you're annotating without tokens, the only edge conditions that the UI recognizes are whitespace, zone boundaries, and document start and end. If your match abuts a punctuation mark, it will not recognize it as a delimiter. If you want other edge conditions to be recognized, you can list them in the value of this attribute. (Remember, though, that you may have to use the XML entity character codes for those characters which are significant to XML syntax, so that the XML parsing doesn't fail.) You can provide UTF-8 for this value; in most other cases in the task.xml file, attribute values are interpreted as ASCII, but not here.
tokenless_autotag_respects_delimiters "no" no By default, if you ask the MAT UI to autotag similar strings when you're annotating without tokens, it will require edges (whitespace, zone boundaries, document start and end, or the elements in the value of the tokenless_autotag_delimiters attribute) to delimit any matched strings. If this language doesn't use whitespace as a delimiter (e.g., Chinese), you might want the tokenless autotagging to ignore delimiters entirely, and autotag the string wherever it's found. You can achieve this by setting this attribute to "no".
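To illustrate the attributes above, here is a sketch of a <languages> element declaring three languages. The particular languages and the delimiter list are illustrative only. Note that the double quote and ampersand in the delimiter list are written as XML entities so that the attribute value parses.

```xml
<languages>
  <!-- Punctuation marks are added as extra autotag delimiters;
       &quot; and &amp; stand for the literal " and & characters. -->
  <language code="en" name="English"
            tokenless_autotag_delimiters=".,;:!?&quot;&amp;"/>
  <!-- Displayed right-to-left in the MAT UI. -->
  <language code="ar" name="Arabic" text_right_to_left="yes"/>
  <!-- No whitespace delimiters: autotag the string wherever it occurs. -->
  <language code="zh" name="Chinese"
            tokenless_autotag_respects_delimiters="no"/>
</languages>
```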

<web_customization> (of <task>)

The Web UI is one of the things a task can customize. This process is quite complicated; it's almost entirely code-oriented, and it's not documented at all. This section is here for reference only; users who aren't really, really brave shouldn't go anywhere near most of these customizations.

Attributes

Attribute
Value
Obligatory?
Description
inherit_css "no"
no If the parent task has CSS customizations, as specified in the <css> element below, they are inherited by default. Use this setting to block inheritance.
inherit_js "no"
no If the parent task has Javascript customizations, as specified in the <js> element below, they are inherited by default. Use this setting to block inheritance.
display_config a string
no Each Web customization set has a name, so that when the user selects a particular task, the UI knows which customization set to use. Can be inherited from parent tasks; a value of "" cancels the inheritance.
alphabetize_labels "no" no By default, the MAT UI orders the annotation labels alphabetically in the legend and the tag popup menu. If this attribute is set, the UI will list the annotation labels in the order they are defined in the <annotation_display> element. Can be inherited from parent tasks; a value of "" cancels the inheritance.

Children

Element
Obligatory?
Repeatable?
Description
<js> no yes The relative pathname of the Javascript customizations. This path is relative to the task directory. By convention, this file should be in the "js" subdirectory.

This element has no attributes or element children; its value is the text it delimits.
<css> no yes The relative pathname of the CSS customizations. This path is relative to the task directory. By convention, this file should be in the "css" subdirectory.

This element has no attributes or element children; its value is the text it delimits.
<short_name> no no This is the name that the UI will display in the upper left corner if this customization is the only customization available. This setting will be inherited by child tasks.

This element has no attributes or element children; its value is the text it delimits.
<long_name> no no This is the name that the UI will use as the title of the Web page if this customization is the only customization available. This setting will be inherited by child tasks.

This element has no attributes or element children; its value is the text it delimits.
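For reference, a <web_customization> element that uses each of the children above might be sketched as follows. The configuration name, display names and file names are hypothetical; only the "js" and "css" subdirectories follow the convention described in the table.

```xml
<web_customization display_config="sample_config" alphabetize_labels="no">
  <short_name>Sample</short_name>
  <long_name>Sample Annotation Task</long_name>
  <!-- Paths are relative to the task directory. -->
  <js>js/sample_ui.js</js>
  <css>css/sample_ui.css</css>
</web_customization>
```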

<engines> (of <task>)

Tasks can define automated processing engines, which may be trainable (see the <model_config> element below).

Attributes

Attribute
Value
Obligatory?
Description
enhance a comma-separated list of engine names
no If present, the named engines will be inherited from the parent task, and any local settings for an engine of the same name will be treated as overrides to the parent values.
inherit_all "yes"
no If present, all engines will be inherited from the parent task.
inherit a comma-separated list of engine names
no If present, the named engines will be inherited from the parent task.

Children

Element
Obligatory?
Repeatable?
Description
<java_subprocess_parameters> no no If present, defaults for various JVM parameters for all Java subprocesses (e.g., jCarafe training and tagging).
<engine> no yes An engine.

<java_subprocess_parameters> (of <engines>)

MAT has some built-in tools to control jCarafe and other Java subprocesses. Using this element, you can declare default settings for Java heap and stack sizes. If not set locally, these settings are inherited from parent tasks.

Attributes

Attribute
Value
Obligatory?
Description
heap_size a string no The value here is a value for the heap size for the Java VM. It is passed to the Java VM using the -Xmx argument. Values like 512M or 2G are examples of expected values. This default value can be overridden by declaring the empty string ("") in any configuration context where the heap size can be specified (see the jCarafe engine for examples).
stack_size a string no The value here is a value for the stack size for the Java VM. It is passed to the Java VM using the -Xss argument. Values like 4096k or 512k are examples of expected values. This default value can be overridden by declaring the empty string ("") in any configuration context where the stack size can be specified (see the jCarafe engine for examples).
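As a small sketch, the fragment below sets both JVM defaults for every Java subprocess in the task. The sizes are taken from the example values above; the engine name is hypothetical.

```xml
<engines>
  <!-- Passed to the Java VM as -Xmx2G and -Xss512k respectively. -->
  <java_subprocess_parameters heap_size="2G" stack_size="512k"/>
  <engine name="tagger">
    <step_config class="MAT.JavaCarafe.CarafeTagStep"/>
  </engine>
</engines>
```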

<engine> (of <engines>)

Each engine references a Python class which implements the engine, and optionally configuration information for training the engine.

Attributes

Attribute
Value
Obligatory?
Description
languages a comma-separated list of language names or codes
no You may choose to restrict the engine to applying to particular languages. These languages must be a subset of the languages declared in the task.
name a string
no The name of the engine
import_from N/A
no We intend, someday, to allow engines to be defined in their own files, and to be imported by name. This feature is not yet implemented.

Children

Element
Obligatory?
Repeatable?
Description
<step_config> yes no The implementation of the engine
<default_model> no no If present, this element should delimit a pathname where models will be saved if MATModelBuilder is invoked with --save_as_default_model. If the pathname is relative, it will be interpreted as relative to the task directory. The path will be suffixed with the relevant step name and language when it is referenced, so that the engine can be reused in multiple languages and steps.

This element has no attributes or children; its value is the text it delimits.
<model_config> no yes Settings for the model building engine, if the engine is trainable
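A sketch of a trainable engine restricted to two languages could look like this. The engine name, language codes and model path are hypothetical; the two class names are the jCarafe classes mentioned elsewhere in this document.

```xml
<engines>
  <!-- The languages listed here must also be declared in the task. -->
  <engine name="tagger" languages="en,zh">
    <step_config class="MAT.JavaCarafe.CarafeTagStep"/>
    <!-- Relative path, so it is resolved against the task directory;
         the step name and language are appended when it is used. -->
    <default_model>models/default_model</default_model>
    <model_config class="MAT.JavaCarafe.CarafeModelBuilder"/>
  </engine>
</engines>
```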

<step_config> (of <engine>)

This element defines the class which implements the engine, and any settings which should be passed to the class when it is created.

Attributes

Attribute
Value
Obligatory?
Description
class a string, the name of a Python class yes The Python class, including its module name, which implements this engine.

Children

Element
Obligatory?
Repeatable?
Description
<create_settings> no no Default settings for initializing the engine implementation.

<create_settings> (of <step_config>)

These are settings that an engine might pass to the initialization phase of its step config class. These settings can be overridden by the values in the <create_settings> element for <step> in the <workflow> element. You can configure these settings either with a child <setting> element, or with an attribute on the <create_settings> element itself; they're interchangeable.

Attributes

Attribute
Value
Obligatory?
Description
<attr> a string
no The <create_settings> tag supports arbitrary attribute-value pairs.

Children

Element
Obligatory?
Repeatable?
Description
<setting> no yes An attribute-value pair.

<setting> (of <create_settings>)

An individual step creation setting.

Children

Element
Obligatory?
Repeatable?
Description
<name> yes no The name of the setting. This element has no attributes or element children; its value is the text it delimits.
<value> yes no The value of the setting. This element has no attributes or element children; its value is the text it delimits.
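The two interchangeable ways of writing settings can be sketched side by side. The class name and the setting names below are hypothetical; which settings a real step accepts depends on its implementation.

```xml
<!-- Form 1: settings written as attributes. -->
<step_config class="sample_module.SampleStep">
  <create_settings sample_option="value1" other_option="value2"/>
</step_config>

<!-- Form 2: the same settings written as child <setting> elements. -->
<step_config class="sample_module.SampleStep">
  <create_settings>
    <setting>
      <name>sample_option</name>
      <value>value1</value>
    </setting>
    <setting>
      <name>other_option</name>
      <value>value2</value>
    </setting>
  </create_settings>
</step_config>
```

The same pattern applies to the other settings containers in this document that accept arbitrary attribute-value pairs.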

<model_config> (of <engine>)

Some of your engines will be trainable. If so, you must provide at least one <model_config> element for the engine. The settings for the config are identical to the command-line options available for the MATModelBuilder.

MAT is delivered with a default jCarafe model builder.

You can have multiple <model_config> entries for an engine, as long as they differ by the config_name attribute. You can inherit an engine from a parent task, and override various elements of the parent engine definition, including the model config.

Attributes

Attribute
Value
Obligatory?
Description
class the name of a Python class yes This attribute names the class which will be used as the model builder. The default jCarafe model builder class is MAT.JavaCarafe.CarafeModelBuilder.
config_name a string no If present, a config name to specify as the --config_name in MATModelBuilder, or for the config_name attribute in <build_settings> in the experiment engine. If omitted, this entry is the default model config. There can be only one default.

Children

Element
Obligatory?
Repeatable?
Description
<build_settings> no no The settings for this model config.

<build_settings> (of <model_config>)

The <build_settings> tag supports arbitrary attribute-value pairs which are passed to the model builder. See the documentation for the jCarafe model builder to see which attributes should be supplied to that engine. You can configure these settings either with a child <setting> element, or with an attribute on the <build_settings> element itself; they're interchangeable.

Attributes

Attribute
Value
Obligatory?
Description
<attr> a string
no The <build_settings> tag supports arbitrary attribute-value pairs.

Children

Element
Obligatory?
Repeatable?
Description
<setting> no yes An attribute-value pair.

<setting> (of <build_settings>)

An individual build setting.

Children

Element
Obligatory?
Repeatable?
Description
<name> yes no The name of the setting. This element has no attributes or element children; its value is the text it delimits.
<value> yes no The value of the setting. This element has no attributes or element children; its value is the text it delimits.
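As a sketch of an engine with a default model config and one named alternative, consider the fragment below. The engine name, the config name and the build setting are hypothetical; the real build settings are those documented for the model builder you use.

```xml
<engine name="tagger">
  <step_config class="MAT.JavaCarafe.CarafeTagStep"/>
  <!-- No config_name, so this entry is the default model config. -->
  <model_config class="MAT.JavaCarafe.CarafeModelBuilder">
    <build_settings sample_option="value1"/>
  </model_config>
  <!-- A named alternative, selected by its config_name. -->
  <model_config class="MAT.JavaCarafe.CarafeModelBuilder"
                config_name="alternate">
    <build_settings>
      <setting>
        <name>sample_option</name>
        <value>value2</value>
      </setting>
    </build_settings>
  </model_config>
</engine>
```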

<steps> (of <task>)

Steps represent the way engines are used within workflows, and how humans interact with workflows. Signal steps modify the signal (and must be workflow-initial); transform steps transform the signal (and must be workflow-final); and annotation steps add or modify annotation sets.

Attributes

Attribute
Value
Obligatory?
Description
inherit_all "yes"
no If present, all steps will be inherited from the parent task.
inherit a comma-separated list of step names
no If present, the named steps will be inherited from the parent task.

Children

Element
Obligatory?
Repeatable?
Description
<signal_step> no yes A signal step
<annotation_step> no yes An annotation step
<transform_step> no yes A transform step

<signal_step> (of <steps>)

Signal steps modify the signal, and must be workflow-initial.

Attributes

Attribute
Value
Obligatory?
Description
name a string
yes The name of the step.
engine an engine name
no The engine this step uses, if applicable
type one of "hand", "auto"
no The type of the step. Currently, "auto" is the only type that's implemented ("hand" is intended for steps where the human annotator can edit the signal before annotating).

<annotation_step> (of <steps>)

Annotation steps add or modify annotations. This is probably the only step you'll define.

There are four types of annotation steps: "hand", "auto", "auto-with-correction", and "mixed".

Each step is associated with annotation sets or categories which it adds or modifies. These declarations are used to determine which annotations to make available to the user in the UI in each step, and how much progress has been made in each step. Annotation steps support the ability to declare sets forbidden, desired and required as well; these declarations were intended to be used in computation of well-formedness conditions in workflows which we have not had a chance to implement yet.

Attributes

Attribute
Value
Obligatory?
Description
name a string
yes The name of the step.
engine an engine name
no The engine this step uses, if applicable.
sets_desired a comma-separated list of annotation set names, or categories prefixed with the string "category:"
no Not currently used.
sets_modified a comma-separated list of annotation set names, or categories prefixed with the string "category:" no The annotation sets or categories modified by this step
sets_required a comma-separated list of annotation set names, or categories prefixed with the string "category:" no Not currently used.
sets_forbidden a comma-separated list of annotation set names, or categories prefixed with the string "category:" no Not currently used.
sets_added a comma-separated list of annotation set names, or categories prefixed with the string "category:" no The annotation sets or categories added by this step.
type one of "hand", "auto", "auto-with-correction", "mixed"
no One of the four annotation step types described above.

<transform_step> (of <steps>)

Transform steps modify the signal based on the previous annotation, and must be workflow-final. Transform steps are all auto steps.

Transform steps support the ability to declare sets forbidden, desired and required; these declarations were intended to be used in computation of well-formedness conditions in workflows which we have not had a chance to implement yet.

Attributes

Attribute
Value
Obligatory?
Description
engine an engine name
yes The engine this step uses
name a string
yes The name of this step
sets_desired a comma-separated list of annotation set names, or categories prefixed with the string "category:" no Not currently used.
sets_required a comma-separated list of annotation set names, or categories prefixed with the string "category:" no Not currently used.
sets_forbidden a comma-separated list of annotation set names, or categories prefixed with the string "category:" no Not currently used.
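The three kinds of step can be sketched together in one <steps> element. All step names, engine names and the annotation set name below are hypothetical, and each engine would need a matching <engine> declaration in the task.

```xml
<steps>
  <!-- Signal step: must be workflow-initial. -->
  <signal_step name="clean" engine="cleaner" type="auto"/>
  <!-- Annotation step: adds the hypothetical "entities" set, either
       by hand or automatically, depending on the workflow. -->
  <annotation_step name="tag" engine="tagger" type="mixed"
                   sets_added="entities"/>
  <!-- Transform step: must be workflow-final; the engine is obligatory. -->
  <transform_step name="redact" engine="redactor"/>
</steps>
```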

<workspaces> (of <task>)

In MAT 3.0, any workflow which contains at least one hand-annotatable step can be used to create a workspace. You might want to define this element if you want to define a special configuration which customizes a workflow for use in workspaces, or if you want to declare a default workflow or configuration.

Attributes

Attribute
Value
Obligatory?
Description
enhance a comma-separated list of configuration names
no If present, the named workspace configurations will be inherited from the parent task, and any local settings for a configuration of the same name will be treated as overrides to the parent values.
default_config a string, either the name of a workflow or of a configuration defined by a <workspace> child element
no The default configuration to use when you define a workspace for this task. If there are multiple hand-annotatable workflows, and you don't declare a default, you'll have to provide the --workspace_config option to the "create" operation of MATWorkspaceEngine. If there's no default_config, but there's a parent task which has a default config which has been inherited, the parent's default will be used.
inherit_all "yes"
no If present, all workspace configurations will be inherited from the parent task.
inherit a comma-separated list of configuration names
no If present, the named workspace configurations will be inherited from the parent task.

Children

Element
Obligatory?
Repeatable?
Description
<workspace> no yes Implementations of the operations in the workspaces.

<workspace> (of <workspaces>)

This element is a container for workspace configurations, which provide options to workspace operation arguments. The workspace operations which can be configured are listed here.

Attributes

Attribute
Value
Obligatory?
Description
workflow the name of a workflow
yes Each workspace configuration is built on top of a workflow, which must be declared in the task. If config_name is not present, the name of the workflow will be the name of the workspace configuration.
config_name a string
no If present, the name of the workspace configuration. You may choose to set this if you have a workflow named "Demo" and you want to associate two different sets of workspace operation arguments with it.

Children

Element
Obligatory?
Repeatable?
Description
<operation> yes yes An individual operation configuration

<operation> (of <workspace>)

Specifies the configuration of a workspace operation. Note that in spite of the fact that operations are associated with folders, these operations are referenced only by name. Currently, no operations which are ambiguous among folders are configurable.

Attributes

Attribute
Value
Obligatory?
Description
name a string
yes The name of the operation

Children

Element
Obligatory?
Repeatable?
Description
<settings> no no The operation settings.

<settings> (of <operation>)

The settings for the operation. What these settings are depends on what sort of operation it is. For instance, for operations which invoke the MAT engine, these settings will be the arguments to the MAT engine. For operations which invoke the MAT model builder, these settings will be the arguments to the MAT model builder. See the documentation on workspaces to find out which operations are configurable, and how they can be configured.

Attributes

Attribute
Value
Obligatory?
Description
<attr> a string
no The <settings> tag supports arbitrary attribute-value pairs.

Children

Element
Obligatory?
Repeatable?
Description
<setting> no yes An attribute-value pair.

<setting> (of <settings>)

An individual operation setting.

Children

Element
Obligatory?
Repeatable?
Description
<name> yes no The name of the setting. This element has no attributes or element children; its value is the text it delimits.
<value> yes no The value of the setting. This element has no attributes or element children; its value is the text it delimits.
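A workspace configuration built on a workflow might be sketched as follows. The configuration name, the operation name "sample_operation" and its setting are placeholders rather than real MAT names; the configurable operations and their arguments are listed in the workspace documentation.

```xml
<workspaces default_config="Demo review">
  <!-- Built on the "Demo" workflow, which must be declared in the task. -->
  <workspace workflow="Demo" config_name="Demo review">
    <operation name="sample_operation">
      <settings sample_option="value1"/>
    </operation>
  </workspace>
</workspaces>
```

Because config_name is given, a second <workspace> element could reuse the "Demo" workflow under a different name with different operation settings.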

<settings> (of <task>)

These are settings that a specialized task might require which the user wishes to be able to configure in XML, rather than by modifying the source code for the specialized task. The chances that a normal user will use this are extremely slim. These settings are not inherited by task children.

You can configure the settings either with a child <setting> element, or with an attribute on the <settings> element itself; they're interchangeable.

Attributes

Attribute
Value
Obligatory?
Description
<attr> a string
no The <settings> tag supports arbitrary attribute-value pairs.

Children

Element
Obligatory?
Repeatable?
Description
<setting> no yes An attribute-value pair.

<setting> (of <settings>)

An individual task-level setting.

Children

Element
Obligatory?
Repeatable?
Description
<name> yes no The name of the task-level setting. This element has no attributes or element children; its value is the text it delimits.
<value> yes no The value of the task-level setting. This element has no attributes or element children; its value is the text it delimits.

<workflows> (of <task>)

Workflows are ordered sets of steps, corresponding to a larger-scale activity the user may wish to apply to the documents.

Attributes

Attribute
Value
Obligatory?
Description
inherit a string, a comma-delimited sequence of workflow names
no If the task has a non-root parent task, you may use this attribute to inherit workflows from the parent task. The implementations of the step names will also be inherited. You can list multiple workflows, delimited by commas, e.g., "Demo,Hand annotation".
inherit_all "yes"
no If the task has a non-root parent, you may use this attribute to specify that all workflows should be inherited from the parent.

Children

Element
Obligatory?
Repeatable?
Description
<workflow> no yes An individual workflow

<workflow> (of <workflows>)

Each non-inherited workflow is specified by a <workflow> element.

Attributes

Attribute
Value
Obligatory?
Description
name a string
yes The name of the workflow that the user can specify in the Web UI or the MAT engine.
undoable "yes"
no In MAT 2.0, the task required a global "undo" order on steps, in order to ensure that when steps were undone in a workflow, all appropriate steps in the task were undone (e.g., if you undid tag and zone in a workflow which doesn't contain tokenization, and some workflow in the task contained tokenization between zone and tag, tokenization was undone as well). This global undo order was impossible to maintain in 3.0, and as a result, it has been abandoned. If you undo steps in a workflow, only those steps will be undone, and as a result, your documents can end up in unusual states (e.g., tokenized but not zoned). In order to compensate for this issue, in 3.0, workflows, by default, are not undoable; the "retreat" buttons in the UI will not be present, for instance.

You can use this attribute to specify the workflow as undoable, but we encourage you to use it sparingly; you should only enable this feature for workflows which support all your tagging steps (content and otherwise).

Children

Element
Obligatory?
Repeatable?
Description
<ui_settings> no no These are settings that are intended to be passed unmodified to the UI. This is not currently used.
<step> no yes An individual step of a workflow.

<ui_settings> (of <workflow>)

These are settings that are intended to be passed unmodified to the UI, in order to declaratively configure UI customizations for particular workflows. At the moment, no tasks use this feature. You can configure these settings either with a child <setting> element, or with an attribute on the <ui_settings> element itself; they're interchangeable.

Attributes

Attribute
Value
Obligatory?
Description
<attr> a string
no The <ui_settings> tag supports arbitrary attribute-value pairs.

Children

Element
Obligatory?
Repeatable?
Description
<setting> no yes An attribute-value pair.

<setting> (of <ui_settings>)

An individual UI setting.

Children

Element
Obligatory?
Repeatable?
Description
<name> yes no The name of the setting. This element has no attributes or element children; its value is the text it delimits.
<value> yes no The value of the setting. This element has no attributes or element children; its value is the text it delimits.

<step> (of <workflow>)

Each workflow contains a sequence of globally-defined steps.

Attributes

Attribute
Value
Obligatory?
Description
name a string
yes The name of a globally defined step.
pretty_name a string
no The name of this step that the user will see in the UI in this workflow.
type one of "hand", "auto", "auto-with-correction" no If the global step is an annotation step of type "mixed", its use can be further narrowed in the context of this workflow through the use of this attribute.

Children

Element
Obligatory?
Repeatable?
Description
<create_settings> no no Settings to pass to the initializer of the step.
<run_settings> no no Settings to pass to the execution of a step.
<ui_settings> no no Settings to pass to the UI for this step. Not currently used.
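As a sketch, the fragment below defines two workflows over the same globally defined steps, narrowing a "mixed" annotation step differently in each. The step names "zone" and "tag" are hypothetical and would have to be declared in <steps>; the workflow names are borrowed from the inheritance example above.

```xml
<workflows>
  <!-- Undoable, on the assumption that this workflow covers
       all of the task's tagging steps. -->
  <workflow name="Demo" undoable="yes">
    <step name="zone"/>
    <step name="tag" pretty_name="Tag entities"
          type="auto-with-correction"/>
  </workflow>
  <!-- Not undoable (the default); "tag" is narrowed to hand annotation. -->
  <workflow name="Hand annotation">
    <step name="zone"/>
    <step name="tag" type="hand"/>
  </workflow>
</workflows>
```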

<create_settings> (of <step>)

These are settings that a step might pass to the initialization phase of its step class. These settings override the values in the <create_settings> element for <step_config>. You can configure these settings either with a child <setting> element, or with an attribute on the <create_settings> element itself; they're interchangeable.

Attributes

Attribute
Value
Obligatory?
Description
<attr> a string
no The <create_settings> tag supports arbitrary attribute-value pairs.

Children

Element
Obligatory?
Repeatable?
Description
<setting> no yes An attribute-value pair.

<setting> (of <create_settings>)

An individual step creation setting.

Children

Element
Obligatory?
Repeatable?
Description
<name> yes no The name of the setting. This element has no attributes or element children; its value is the text it delimits.
<value> yes no The value of the setting. This element has no attributes or element children; its value is the text it delimits.
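The override relationship between the two <create_settings> elements can be sketched as follows. The task, class, engine, step and setting names are all hypothetical, and other task children are omitted.

```xml
<task name="Sample Task">
  <engines>
    <engine name="tagger">
      <step_config class="sample_module.SampleStep">
        <!-- Engine-level default. -->
        <create_settings sample_option="engine_default"/>
      </step_config>
    </engine>
  </engines>
  <steps>
    <annotation_step name="tag" engine="tagger" type="auto"/>
  </steps>
  <workflows>
    <workflow name="Demo">
      <step name="tag">
        <!-- In this workflow only, replaces the engine-level value. -->
        <create_settings sample_option="workflow_override"/>
      </step>
    </workflow>
  </workflows>
</task>
```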

<run_settings> (of <step>)

These are settings which are passed to the do() or doBatch() method of the step config class of the step (i.e., the engine implementation). You can configure the settings either with a child <setting> element, or with an attribute on the <run_settings> element itself; they're interchangeable.

Attributes

Attribute
Value
Obligatory?
Description
<attr> a string
no The <run_settings> tag supports arbitrary attribute-value pairs.

Children

Element
Obligatory?
Repeatable?
Description
<setting> no yes An attribute-value pair.

Most predefined step implementations in MAT do not support any run settings. Two implementations which do are MAT.JavaCarafe.CarafeTokenizationStep and MAT.JavaCarafe.CarafeTagStep. The MAT.JavaCarafe.CarafeTagStep step implements automatic tagging. Any step which implements automatic tagging can bear the following additional attribute-value pairs:

Key
Value
Description
local "yes" By default, the MAT engine will contact the MAT Web server to tag a document, because the Web server has the capability of starting up and monitoring a long-living tagger task (if the engine supports this capability). The reason this is beneficial is that the jCarafe tagger, like many model-based taggers, has a fairly expensive startup cost. To block the engine from contacting the Web server, and force it to start up and shut down the tagger on its own, specify local="yes".
model a string, a filename of a tagging model If the engine for the task does not have a default model (or if no default model has been created yet), the user must specify the location of the tagger model, either here or when invoking MATEngine.

In addition, the jCarafe tagging and tokenization steps support other run settings, documented here.
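
As a minimal sketch of the two interchangeable forms described above, the fragments below set the same two run settings, once as attributes and once as child <setting> elements. The model path is a hypothetical placeholder, and the enclosing <step> element is omitted.

```xml
<!-- Attribute form: each attribute is one run setting. -->
<run_settings local="yes" model="/path/to/tagger_model"/>

<!-- Equivalent child-element form. -->
<run_settings>
  <setting>
    <name>local</name>
    <value>yes</value>
  </setting>
  <setting>
    <name>model</name>
    <value>/path/to/tagger_model</value>
  </setting>
</run_settings>
```

Either form should only be meaningful on a step whose implementation supports these settings, such as an automatic tagging step.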

<setting> (of <run_settings>)

An individual run setting.

Children

Element
Obligatory?
Repeatable?
Description
<name> yes no The name of the setting. This element has no attributes or element children; its value is the text it delimits.
<value> yes no The value of the setting. This element has no attributes or element children; its value is the text it delimits.

<ui_settings> (of <step>)

These are settings that are intended to be passed unmodified to the UI, in order to declaratively configure UI customizations for particular tasks. At the moment, no tasks use this feature. You can configure these settings either with a child <setting> element, or with an attribute on the <ui_settings> element itself; they're interchangeable.

Attributes

Attribute
Value
Obligatory?
Description
<attr> a string no The <ui_settings> tag supports arbitrary attribute-value pairs.

Children

Element
Obligatory?
Repeatable?
Description
<setting> no yes An attribute-value pair.

<setting> (of <ui_settings>)

An individual UI setting.

Children

Element
Obligatory?
Repeatable?
Description
<name> yes no The name of the setting. This element has no attributes or element children; its value is the text it delimits.
<value> yes no The value of the setting. This element has no attributes or element children; its value is the text it delimits.

<annotation_set_descriptors> (of <task>)

The <annotation_set_descriptors> element allows you to define multiple annotation sets. You can inherit annotations from other tasks.

The <annotation_set_descriptors> element and its children, along with the <annotation_display> element and (most of) its children, constitute the legacy method for defining annotations and their display elements. For a more concise and differently organized mechanism, see the <annotations> element. For a comparison of the two mechanisms, see the use cases here.

It's possible to combine the two methods. In that case, the processing order is first, <annotation_set_descriptors>, then <annotations>, then <annotation_display>.

Attributes

Attribute
Value
Obligatory?
Description
all_annotations_known "yes"
no By default, the task leaves its annotation sets "open"; i.e., if the task encounters an unknown annotation label, it won't raise an error. If you provide the value "yes" for this attribute, an error will be raised if the task encounters an unknown annotation.
inherit a comma-separated list of labels to inherit
no You can inherit annotations from other tasks, either by true label, by set (see the "set" attribute of <annotation_set_descriptor>) or by category (see the "category" attribute of <annotation_set_descriptor>). To inherit an annotation by label, simply list it; to inherit a category or set, list "category:" or "set:" + the category or set name. Individual true labels will be inherited with all of their attributes, even those defined in other sets. You can augment inherited sets, categories or true labels in the child task.

A typical value for this attribute is "category:zone,category:token", which inherits the annotations for the zone and token categories from the parent (usually root) task.

Children

Element
Obligatory?
Repeatable?
Description
<annotation_set_descriptor> no yes The definition of a single annotation set.

<annotation_set_descriptor> (of <annotation_set_descriptors>)

The <annotation_set_descriptor> element is described in detail elsewhere. In simple tasks, you'll have a single element of this type; we tend to define this single element with category="content" and name="content". However, you can define multiple sets, and the categories and set names can be whatever you want. See the "Sample Relations" task for an example.

Attributes

Attribute
Value
Obligatory?
Description
category a string
no The category of the annotation set descriptor. These values are user-definable (aside from a few predetermined values like "zone" and "token").
name a string yes The name of the annotation set descriptor. The names are user-definable, and you can have a many-to-one mapping from names to categories, to facilitate set grouping.
managed "no" no By default, annotation sets are managed, which means that SEGMENT annotations are established to track the annotation progress for this set. If, for some reason, you don't want your sets to be managed, set this attribute to "no". We don't recommend doing this.
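
As an illustrative sketch of the attributes described above, the fragment below declares a closed set of annotations, inherits the zone and token categories from the parent task, and defines a single set using the category="content" and name="content" convention. The contents of the <annotation_set_descriptor> element are documented elsewhere and are left as a placeholder comment here.

```xml
<annotation_set_descriptors all_annotations_known="yes"
                            inherit="category:zone,category:token">
  <annotation_set_descriptor category="content" name="content">
    <!-- Annotation declarations go here; see the separate
         annotation reference for the legal children. -->
  </annotation_set_descriptor>
</annotation_set_descriptors>
```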

<annotation_display> (of <task>)

This element defines all the display-related properties in the MAT UI of the elements defined in the <annotation_set_descriptors> element. Most of what you can do here is define the display-related properties of labels, although you can also define some of the properties of attributes, and also define groups for hierarchical annotation displays. The <annotation_display> element is described in detail elsewhere.

Note that the order in which the elements in <annotation_display> are defined is the order in which the CSS display rules are defined; so the styling of later <label> and <label_group> elements takes precedence over the styling of earlier <label> and <label_group> elements.

The <annotation_display> element and (most of) its children, along with the <annotation_set_descriptors> element and its children, constitute the legacy method for defining annotations and their display elements. For a more concise and differently organized mechanism, see the <annotations> element. For a comparison of the two mechanisms, see the use cases here. Note that the <gesture> element is only definable in the legacy <annotation_display> subsystem.

It's possible to combine the two methods. In that case, the processing order is first, <annotation_set_descriptors>, then <annotations>, then <annotation_display>.

<annotations> (of <task>)

The <annotations> element, introduced in MAT 3.1, provides a more concise and differently organized mechanism for defining annotations and their display elements, in comparison with the legacy <annotation_set_descriptors> element and its children, along with the <annotation_display> element and (most of) its children. For a comparison of the two mechanisms, see the use cases here. The <annotations> element is described in detail elsewhere.

It's possible to combine the two methods. In that case, the processing order is first, <annotation_set_descriptors>, then <annotations>, then <annotation_display>.

<similarity_profile> (of <task>)

When you run the MATScore engine, or produce a visual comparison of annotations, MAT uses a set of heuristics to determine the best pairing of annotations. You can affect this process using the <similarity_profile> element.

Similarity profiles are not inherited.

Attributes

Attribute
Value
Obligatory?
Description
name a string
no The name of the profile, for use when creating comparison documents or scoring. If no name is provided, this is the default profile for the task. There can be only one unnamed profile.

Children

Element
Obligatory?
Repeatable?
Description
<stratum> no yes The comparison algorithm is stratified (see the algorithm for more details). You can use this element to define the strata, rather than allowing them to be inferred.
<tag_profile> no yes There's a default similarity profile for spanned and spanless annotations. If you want to declare your own profile explicitly, you can do that with this element.

<stratum> (of <similarity_profile>)

The comparison algorithm is stratified (see the algorithm for more details). You can use this element to define the strata, rather than allowing them to be inferred.

Attributes

Attribute
Value
Obligatory?
Description
true_labels a comma-separated string of labels
yes The labels in this stratum. Note that these labels must be true labels, not effective labels. You may specify all the labels in an annotation set by listing "set:" + the set name as one of the elements in the value, e.g., "set:content".

<tag_profile> (of <similarity_profile>)

There's a default similarity profile for spanned and spanless annotations. If you want to declare your own profile explicitly, you can do that with this element. See the algorithm for details on how to use these.

Attributes

Attribute
Value
Obligatory?
Description
true_labels a comma-separated string of labels
yes The labels to which this profile applies. Note that these labels must be true labels, not effective labels. You may specify all the labels in an annotation set by listing "set:" + the set name as one of the elements in the value, e.g., "set:content".

Children

Element
Obligatory?
Repeatable?
Description
<attr_equivalences> no yes Equivalences for attributes among the various labels in the profile.
<dimension> yes yes One dimension of the profile.

<dimension> (of <tag_profile>)

Each profile consists of a number of dimensions, which define some aspect of the annotation to use in comparison, along with the method to be used for comparison and the relative weight of the dimension. See the algorithm for details about the various dimensions.

Attributes

Attribute
Value
Obligatory?
Description
name a string
yes The name of the dimension. See the algorithm for a list of legal names and their interpretations.
weight a number
yes The relative weight of the dimension. The weights of all the dimensions will be normalized.
param_digester_method a Python function name
no In rare circumstances, the dimension method may accept parameters (see <attr> below) and these parameters may need to be interpreted (e.g., "yes" -> True). The full name of the function, including the module it's in, must be specified.
aggregator_method a Python function name
no If special handling is required for a dimension which has an aggregation value, this option allows you to declare the handler. The full name of the function, including the module it's in, must be specified.
method a string
no The method associated with the dimension, if not the default method. See the algorithm for a list of legal names.
<attr> a string no The <dimension> element supports arbitrary attribute-value pairs.
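
A hedged sketch of a named similarity profile follows. The labels PERSON and LOCATION are hypothetical true labels, and the dimension names "_span" and "_label" are assumptions made for illustration; consult the algorithm documentation for the legal dimension names.

```xml
<similarity_profile name="example_profile">
  <!-- Declare one stratum explicitly instead of letting it be inferred. -->
  <stratum true_labels="PERSON,LOCATION"/>
  <tag_profile true_labels="PERSON,LOCATION">
    <!-- Dimension names are assumed; weights are relative. -->
    <dimension name="_span" weight="3"/>
    <dimension name="_label" weight="1"/>
  </tag_profile>
</similarity_profile>
```

Because the weights are normalized, the values 3 and 1 here would correspond to relative contributions of 0.75 and 0.25.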

<attr_equivalences> (of <tag_profile>)

The true labels in your tag profile may vary in their attribute names, but you may still want these attributes to be comparable. This element allows you to declare your equivalences. See the algorithm for details about the various dimensions.

Attributes

Attribute
Value
Obligatory?
Description
name a string
yes The name of the equivalence. See the algorithm for a list of legal names and their interpretations.
attrs a comma-separated string yes All attributes which stand in this equivalence. Each label in your profile must have at least one of these attributes, and no attribute name can appear more than once among the equivalences in the profile.

<score_profile> (of <task>)

When you run the MATScore engine, you can control how the scored elements are aggregated, decomposed, or filtered in the scoring output. See the algorithm for details on how to use this.

Score profiles are not inherited.

Attributes

Attribute
Value
Obligatory?
Description
name a string
no The name of the profile, for use when scoring. If no name is provided, this is the default profile for the task. There can be only one unnamed profile.

Children

Element
Obligatory?
Repeatable?
Description
<aggregation> no yes A set of labels to aggregate as a separate entry.
<attr_decomposition> no yes An attribute-based decomposition of particular labels to report as a separate entry.
<partition_decomposition> no yes A function-based decomposition of particular labels to report as a separate entry.
<label_limitation> no no A list of labels to restrict the overall reporting to.
<attrs_alone> no no A set of labels for which independent scores for each of the attributes are reported.

<label_limitation> (of <score_profile>)

The scorer will pair all annotations which are not specified as being ignored. Sometimes, you might need to pair some annotations as part of the scoring process (for instance, because they're arguments of relations), but you don't want them in the final output, even though you can't ignore them. You can use this element to provide that filter.

Attributes

Attribute
Value
Obligatory?
Description
true_labels a comma-separated string of labels
yes Only these true labels (and the effective labels that are defined on them) will be included in the scoring output. You may specify all the labels in an annotation set by listing "set:" + the set name as one of the elements in the value, e.g., "set:content".

<attrs_alone> (of <score_profile>)

Sometimes, you might want to know the scores of the individual attributes for a label, for instance if you have a processing step which adds attribute values. This element instructs the scorer to produce these individual scores, for all pairs of annotations bearing the specified labels. These scores are produced for each aggregation and decomposition in which the relevant annotations appear.

Attributes

Attribute
Value
Obligatory?
Description
true_labels a comma-separated string of labels
yes Independent scores for attributes alone will be produced for these labels. You may specify all the labels in an annotation set by listing "set:" + the set name as one of the elements in the value, e.g., "set:content".
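
As a rough sketch of the two filtering elements above, the named profile below restricts the scoring output to one annotation set and requests attribute-only scores for one label. The profile name, the set name "relations" and the label LOCATED_IN are all hypothetical.

```xml
<score_profile name="relations_only">
  <!-- Report only the labels in the (hypothetical) "relations" set. -->
  <label_limitation true_labels="set:relations"/>
  <!-- Also report independent per-attribute scores for this label. -->
  <attrs_alone true_labels="LOCATED_IN"/>
</score_profile>
```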

<aggregation> (of <score_profile>)

Under normal circumstances, annotations are aggregated per document and per run by effective label (if available) or true label, or by equivalence classes passed to MATScore, and then all together into a single heap. You can add other aggregations of true labels using this element.

Attributes

Attribute
Value
Obligatory?
Description
name a string
yes The name of the aggregation as it will appear in the output spreadsheet.
true_labels a comma-separated string of labels
yes The true labels in this aggregation. You may specify all the labels in an annotation set by listing "set:" + the set name as one of the elements in the value, e.g., "set:content".

<attr_decomposition> (of <score_profile>)

Under normal circumstances, the only way to decompose true labels in the score output is by effective label. If you want to decompose them by a particular attribute (e.g., you want to see the score for ENAMEXes when type = NOM), you can use this element. Decompositions can overlap with each other.

Attributes

Attribute
Value
Obligatory?
Description
true_labels a comma-separated string of labels
yes The true labels to which this decomposition applies. You may specify all the labels in an annotation set by listing "set:" + the set name as one of the elements in the value, e.g., "set:content".
attrs a comma-separated string of attrs
yes The names of attributes defined for all the listed labels. There will be a separate decomposition for each tuple of values for these attrs. The name of the decomposition in the score output will be <attr1>=<val1> <attr2>=<val2>...

<partition_decomposition> (of <score_profile>)

Under normal circumstances, the only way to decompose true labels in the score output is by effective label. If you want to decompose them by a Python function, you can use this element. Decompositions can overlap with each other.

Attributes

Attribute
Value
Obligatory?
Description
true_labels a comma-separated string of labels
yes The true labels to which this decomposition applies. You may specify all the labels in an annotation set by listing "set:" + the set name as one of the elements in the value, e.g., "set:content".
method a Python function name
yes This function must take a single argument, which will be an annotation, and return a value. For instance, if you're evaluating a geotagger, and the tagger provides a country attribute for the location, and you want to decompose location scores by US and non-US, you'd define a function which returns "US" if the country attribute is "US", and "non-US" otherwise. The full name of the function, including the module it's in, must be specified. The name of the decomposition in the score output will be <bare function name>=<val>.
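
The aggregation and decomposition elements above might be combined as in the sketch below, which defines the unnamed (default) score profile for a task. The labels PERSON, ORGANIZATION, ENAMEX and LOCATION, the aggregation name, and the function my_task_module.us_or_non_us are hypothetical; the function is assumed to take an annotation and return "US" or "non-US", as in the geotagger example.

```xml
<score_profile>
  <!-- An extra row in the output aggregating two true labels. -->
  <aggregation name="names" true_labels="PERSON,ORGANIZATION"/>
  <!-- One decomposition per value of "type", e.g. type=NOM. -->
  <attr_decomposition true_labels="ENAMEX" attrs="type"/>
  <!-- One decomposition per return value of the named function. -->
  <partition_decomposition true_labels="LOCATION"
                           method="my_task_module.us_or_non_us"/>
</score_profile>
```

Under the naming rules described above, the partition decompositions in this sketch would appear in the score output as us_or_non_us=US and us_or_non_us=non-US.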