The XML format for the task files (see "Creating a New Task") is described in this
document. Use cases are described here.
The reference for declaring annotations and their display
properties is handled separately, and can be found here.
The <tasks> element is a container for <task> elements. Either this element or
<task> can be the toplevel element in the file.
Element | Obligatory? | Repeatable? | Description |
---|---|---|---|
<task> | yes | yes | A task definition. |
The <task> element is the usual toplevel element in the file. For historical reasons,
some of the tags are obligatory and some not. Conceptually
speaking, you always need to specify either
<annotation_set_descriptors> and <annotation_display>
or <annotations>; and <engines>, <workflows> and
<steps> are required for using the engine and experiment
infrastructure. <workspaces> should be defined if you want
to define a default workflow to use as your workspace, or
customize a workflow for use in a workspace. The other elements
are for advanced customizations.
If you want to define multiple tasks in the same task.xml file
(if, for instance, you're defining a task and a set of child
tasks), you can use <tasks> as your toplevel element. This
element has no attributes, and only one repeatable child:
<task>.
Attribute | Value | Obligatory? | Description |
---|---|---|---|
name | a string | yes | The name of the task. This name will appear in menus in the UI, and in help strings in the engine, so make it something mnemonic, distinctive and descriptive. |
visible | "no" | no | If present, the task is not "visible" in the various lists of tasks the user will see. Typically, this is used if this task is not a leaf in the tree of tasks. You will seldom need this capability. |
parent | a string | no | The name of the parent task in the hierarchy. If this is not specified, the system root task will be used. You will seldom need this capability. If you do, typically the parent will specify visible="no". |
class | a string, the name of a Python class | no | If you've found a need to specialize the default task implementation, the value of this attribute should be "<file>.<classname>", where <file> corresponds to a file <taskdir>/python/<file>.py. Note: the value of this attribute is not inherited. |
Element | Obligatory? | Repeatable? | Description |
---|---|---|---|
<languages> | yes | no | The languages this task can apply to. |
<doc_enhancement_class> | no | no | If specified, this element should delimit a string "<file>.<classname>", where <file> corresponds to a file <taskdir>/python/<file>.py. The specified class is a class which contributes to specializations of this documentation for the task in question. This functionality is currently undocumented. Use <doc_enhancement> instead. This element has no attributes or element children; its value is the text it delimits. |
<doc_enhancement> | no | no | Declarative enhancements to the documentation. |
<web_customization> | no | no | Customizations of the Web UI. |
<engines> | no | no | The automated tagging engines used in this task. |
<steps> | no | no | The steps used in the workflows in this task. |
<workspaces> | no | no | The default workspace configuration or workflow to use for workspaces, and any custom configurations built on top of workflows. |
<settings> | no | no | The task-specific settings which may be viewed by specializations of the root task. |
<workflows> | no | no | The workflows that are used in the MAT engine. |
<annotation_set_descriptors> | no | no | The labels and attributes which are used in this task. |
<annotation_display> | no | no | The display-related properties of the labels and attributes in this task. |
<similarity_profile> | no | yes | The methods for comparing annotations for scoring and visual comparison. |
<score_profile> | no | yes | The methods for decomposing and aggregating annotation labels for scoring. |
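To show how these pieces fit together, here is a minimal sketch of a file that uses <tasks> as its toplevel element to hold a hidden parent task and a visible child task. The task names are hypothetical, and the child omits the annotation, engine, step and workflow declarations that a usable task would need.

```xml
<tasks>
  <!-- Hypothetical parent task: not a leaf, so hidden from task lists. -->
  <task name="Base Task" visible="no">
    <languages>
      <language code="en" name="English"/>
    </languages>
  </task>
  <!-- Hypothetical child task that the user actually selects. -->
  <task name="My Task" parent="Base Task">
    <!-- Inherit every language the parent declares. -->
    <languages inherit_all="yes"/>
    <!-- Annotation declarations, engines, steps and workflows go here. -->
  </task>
</tasks>
```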
As of MAT 3.0, each task must declare the languages it
applies to. The task can apply to multiple languages. The <languages>
element contains those language definitions.
Attribute | Value | Obligatory? | Description |
---|---|---|---|
inherit_all | "yes" | no | If present, all languages defined in the parent task (if any) will be inherited. |
inherit | a comma-separated list of language names | no | If present, a list of languages (by their name attribute) which will be inherited from the parent task. |
Element | Obligatory? | Repeatable? | Description |
---|---|---|---|
<language> | no | yes | An individual language declaration. |
This element declares a single language. You can assign a
shorthand code (e.g., the ISO language code) for use in most
places where languages must be specified, and
Attribute | Value | Obligatory? | Description |
---|---|---|---|
code | a string | no | A shorthand code (e.g., the ISO language code) by which you can refer to this language. |
name | a string | no | The full name of the language. |
import_from | N/A | no | We intend, someday, to allow languages to be defined in their own files, and to be imported by name or code. This feature is not yet implemented. |
text_right_to_left | "yes" | no | If specified, documents viewed in this language in the MAT UI will be treated as right-to-left text (e.g., Arabic). |
tokenless_autotag_delimiters | a string | no | By default, if you ask the MAT UI to autotag similar strings when you're annotating without tokens, the only edge conditions that the UI recognizes are whitespace, zone boundaries, and document start and end. If your match abuts a punctuation mark, the UI will not recognize the punctuation mark as a delimiter. If you want other edge conditions to be recognized, you can list them in the value of this attribute. (Remember, though, that you may have to use XML character entities for those characters which are significant to XML syntax, so that the XML parsing doesn't fail.) |
tokenless_autotag_respects_delimiters | "no" | no | By default, if you ask the MAT UI to autotag similar strings when you're annotating without tokens, it will require edges (whitespace, zone boundaries, document start and end, or the elements in the value of the tokenless_autotag_delimiters attribute) to delimit any matched strings. If this language doesn't use whitespace as a delimiter (e.g., Chinese), you might want the tokenless autotagging to ignore delimiters entirely, and autotag the string wherever it's found. You can achieve this by setting this attribute to "no". |
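For example, a task covering English, Arabic and Chinese might declare its languages roughly as below. This is a sketch: the punctuation chosen for English is a hypothetical illustration, and the example assumes that each character in the tokenless_autotag_delimiters value counts as a separate delimiter. Note the XML entity used for the ampersand, since a bare ampersand is not legal in an attribute value.

```xml
<languages>
  <!-- Extra autotag edges (hypothetical choice); the ampersand is escaped. -->
  <language code="en" name="English" tokenless_autotag_delimiters=".,;:!?&amp;"/>
  <!-- Display documents in this language right-to-left in the MAT UI. -->
  <language code="ar" name="Arabic" text_right_to_left="yes"/>
  <!-- No whitespace delimiters: autotag a match wherever it is found. -->
  <language code="zh" name="Chinese" tokenless_autotag_respects_delimiters="no"/>
</languages>
```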
The MAT documentation, which you're currently reading, has two
slots in its documentation sidebar for additional pages provided
by the installed tasks; tasks fill these slots using the <doc_enhancement>
element. These slots are not visible unless one of the installed tasks
places documentation there. See here for further details.
Element | Obligatory? | Repeatable? | Description |
---|---|---|---|
<app_header> | no | yes | Use this element to add documentation to the "application" slot, which is at the top of the sidebar, above "Getting started". You might use this slot if you're using MAT for a single purpose that you want the documentation to lead with. This element will also allow you to "brand" the documentation, changing the documentation window title and the initial page. Each <app_header> entry is its own header in the sidebar, so it will seldom make sense for you to provide more than one of these, or install multiple tasks with this entry. |
<customization_header> | no | yes | Use this element to add documentation to the "customization" slot, which is near the top of the sidebar, above "For users". This is the default place to put documentation about your task. Each <customization_header> entry is its own header in the sidebar. |
<external_id_header> | no | yes | Use this element to add documentation to an <app_header> or <customization_header> section which was defined by a parent task. |
Use this element to add documentation to the "application" slot in the documentation sidebar, which is at the top, above "Getting started". You might use this slot if you're using MAT for a single purpose that you want the documentation to lead with. This element will also allow you to "brand" the documentation, changing the documentation window title and the initial page. Each <app_header> entry is its own header in the sidebar, so it will seldom make sense for you to provide more than one of these, or install multiple tasks with this entry. See here for further details.
Attribute | Value | Obligatory? | Description |
---|---|---|---|
text | a string | yes | The text of the header. |
id | a string | yes | The HTML node ID of the documentation section, for later possible use by <external_id_header>. |
brand_url | a URL relative to the task directory root | no | If a value for this attribute is provided, the value of the text attribute will be used as the title of the documentation tab or page, and the designated URL will be loaded first into the main documentation window. |
Element | Obligatory? | Repeatable? | Description |
---|---|---|---|
<doc_entry> | no | yes | An entry under the documentation section. |
<doc_section> | no | yes | A subsection heading. |
This element is a normal page within the documentation section in question. At the toplevel, it is listed in the sidebar at the same indent level as the section header; at lower levels, it will be indented farther to the right. See here for further details.
Attribute | Value | Obligatory? | Description |
---|---|---|---|
text | a string | yes | The text of the entry. |
url | a URL relative to the task directory root | yes | The actual HTML documentation page. |
This element is a new section within the documentation section in question. At the toplevel, it is listed in the sidebar at the same indent level as its parent section header; at lower levels, it will be indented farther to the right. Its children will be indented farther to the right. See here for further details.
Attribute | Value | Obligatory? | Description |
---|---|---|---|
text | a string | yes | The text of the entry. |
url | a URL relative to the task directory root | no | The actual HTML documentation page for the section header (if appropriate). |
Element | Obligatory? | Repeatable? | Description |
---|---|---|---|
<doc_entry> | no | yes | See <app_header>. |
<doc_section> | no | yes | A subsection heading (i.e., this element can be recursive). |
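A sketch of an <app_header> that brands the documentation and contributes one page and one nested section. The header text, id and HTML paths are hypothetical; each URL is interpreted relative to the task directory root.

```xml
<doc_enhancement>
  <!-- brand_url makes doc/index.html the first page loaded. -->
  <app_header text="My Annotation Project" id="my_app_docs" brand_url="doc/index.html">
    <doc_entry text="Overview" url="doc/overview.html"/>
    <!-- A section with its own page; its children are indented further. -->
    <doc_section text="Guidelines" url="doc/guidelines.html">
      <doc_entry text="Person names" url="doc/guidelines_person.html"/>
    </doc_section>
  </app_header>
</doc_enhancement>
```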
Use this element to add documentation to the "customization" slot of the documentation sidebar, which is near the top, above "For users". This is the default place to put documentation about your task. Each <customization_header> entry is its own header in the sidebar.
Attribute | Value | Obligatory? | Description |
---|---|---|---|
text | a string | yes | The text of the header. |
id | a string | yes | The HTML node ID of the documentation section, for later possible use by <external_id_header>. |
Element | Obligatory? | Repeatable? | Description |
---|---|---|---|
<doc_section> | no | yes | See <app_header>. |
<doc_entry> | no | yes | See <app_header>. |
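Since the customization slot is the default home for task documentation, a typical task might contribute something like the following sketch. The header text, id and page paths are hypothetical.

```xml
<doc_enhancement>
  <customization_header text="My Task" id="my_task_docs">
    <doc_entry text="Annotation guidelines" url="doc/guidelines.html"/>
    <!-- A section header with no page of its own (url is optional). -->
    <doc_section text="Reference">
      <doc_entry text="Label inventory" url="doc/labels.html"/>
    </doc_section>
  </customization_header>
</doc_enhancement>
```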
Use this element to add documentation to an <app_header> or
<customization_header> section which was defined by a parent
task. The children of this element will appear in the
documentation as if they had been defined in the scope of the
referenced elements.
Attribute | Value | Obligatory? | Description |
---|---|---|---|
id | a string | yes | The ID of the <app_header> or <customization_header> section that was previously defined. |
Element | Obligatory? | Repeatable? | Description |
---|---|---|---|
<doc_section> | no | yes | See <app_header>. |
<doc_entry> | no | yes | See <app_header>. |
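For instance, a child task could add a page to a section declared by its parent by referring to that section's id. In this sketch, "my_task_docs" is assumed to be the id of a <customization_header> declared in the parent task, and the page path is hypothetical.

```xml
<doc_enhancement>
  <!-- These entries appear as if declared inside the parent's
       section whose id is "my_task_docs". -->
  <external_id_header id="my_task_docs">
    <doc_entry text="Child task notes" url="doc/child_notes.html"/>
  </external_id_header>
</doc_enhancement>
```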
The <web_customization> element lets a task customize the Web UI in
a number of ways. This process is quite complicated;
it's almost entirely code-oriented, and it's not documented at
all. This section is here for reference only; users who aren't
really, really brave shouldn't go anywhere near most of these
customizations.
Attribute | Value | Obligatory? | Description |
---|---|---|---|
inherit_css | "no" | no | If the parent task has CSS customizations, as specified in the <css> element below, they are inherited by default. Use this setting to block inheritance. |
inherit_js | "no" | no | If the parent task has Javascript customizations, as specified in the <js> element below, they are inherited by default. Use this setting to block inheritance. |
display_config | a string | no | Each Web customization set has a name, so that when the user selects a particular task, the UI knows which customization set to use. Inherited by default from parent tasks; a value of "" cancels the inheritance. |
alphabetize_labels | "no" | no | By default, the MAT UI orders the annotation labels alphabetically in the legend and the tag popup menu. If this attribute is set to "no", the UI will list the annotation labels in the order they are defined in the <annotation_display> element. Inherited by default from parent tasks; a value of "" cancels the inheritance. |
Element | Obligatory? | Repeatable? | Description |
---|---|---|---|
<js> | no | yes | The relative pathname of the Javascript customizations. This path is relative to the task directory. By convention, this file should be in the "js" subdirectory. This element has no attributes or element children; its value is the text it delimits. |
<css> | no | yes | The relative pathname of the CSS customizations. This path is relative to the task directory. By convention, this file should be in the "css" subdirectory. This element has no attributes or element children; its value is the text it delimits. |
<short_name> | no | no | This is the name that the UI will display in the upper left corner if this customization is the only customization available. This setting will be inherited by child tasks. This element has no attributes or element children; its value is the text it delimits. |
<long_name> | no | no | This is the name that the UI will use as the title of the Web page if this customization is the only customization available. This setting will be inherited by child tasks. This element has no attributes or element children; its value is the text it delimits. |
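Purely as a sketch of the shape of this element (writing the customization code itself is the hard part), a task might point to its own Javascript and CSS files and name its customization set as below. The file names, the display_config value and the UI names are hypothetical; the paths follow the "js" and "css" subdirectory conventions described above.

```xml
<!-- Block the parent's CSS; list labels in declaration order. -->
<web_customization display_config="MyTaskDisplay" inherit_css="no" alphabetize_labels="no">
  <js>js/my_task_ui.js</js>
  <css>css/my_task_ui.css</css>
  <short_name>My Task</short_name>
  <long_name>My Task Annotation Tool</long_name>
</web_customization>
```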
Tasks can define automated processing engines in the <engines> element.
These engines may be trainable (see the <model_config> element below).
Attribute | Value | Obligatory? | Description |
---|---|---|---|
enhance | a comma-separated list of engine names | no | If present, the named engines will be inherited from the parent task, and any local settings for an engine of the same name will be treated as overrides to the parent values. |
inherit_all | "yes" | no | If present, all engines will be inherited from the parent task. |
inherit | a comma-separated list of engine names | no | If present, the named engines will be inherited from the parent task. |
Element | Obligatory? | Repeatable? | Description |
---|---|---|---|
<java_subprocess_parameters> | no | no | If present, defaults for various JVM parameters for all Java subprocesses (e.g., jCarafe training and tagging). |
<engine> | no | yes | An engine. |
MAT has some built-in tools to control jCarafe and other Java
subprocesses. Using the <java_subprocess_parameters> element, you can declare default settings
for Java heap and stack sizes. If not set locally, these settings
are inherited from parent tasks.
Attribute | Value | Obligatory? | Description |
---|---|---|---|
heap_size | a string | no | The heap size for the Java VM, passed to the Java VM using the -Xmx argument. Values like 512M or 2G are examples of expected values. This default value can be overridden by declaring the empty string ("") in any configuration context where the heap size can be specified (see the jCarafe engine for examples). |
stack_size | a string | no | The stack size for the Java VM, passed to the Java VM using the -Xss argument. Values like 4096k or 512k are examples of expected values. This default value can be overridden by declaring the empty string ("") in any configuration context where the stack size can be specified (see the jCarafe engine for examples). |
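For example, to give every Java subprocess in a task a 2G heap and a 4096k stack by default, a task might include the following inside its <engines> element. The values are simply the examples quoted in the table above, not recommendations.

```xml
<engines>
  <!-- Passed to the JVM as -Xmx2G and -Xss4096k respectively. -->
  <java_subprocess_parameters heap_size="2G" stack_size="4096k"/>
  <!-- The task's <engine> elements follow here. -->
</engines>
```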
Each engine references a Python class which implements the
engine, and optionally configuration information for training the
engine.
Attribute | Value | Obligatory? | Description |
---|---|---|---|
languages | a comma-separated list of language names or codes | no | You may choose to restrict the engine to particular languages. These languages must be a subset of the languages declared in the task. |
name | a string | no | The name of the engine. |
import_from | N/A | no | We intend, someday, to allow engines to be defined in their own files, and to be imported by name. This feature is not yet implemented. |
Element | Obligatory? | Repeatable? | Description |
---|---|---|---|
<step_config> | yes | no | The implementation of the engine. |
<default_model> | no | no | If present, this element should delimit a pathname where models will be saved if MATModelBuilder is invoked with --save_as_default_model. If the pathname is relative, it will be interpreted as relative to the task directory. The path will be suffixed with the relevant step name and language when it is referenced, so that the engine can be reused in multiple languages and steps. This element has no attributes or children; its value is the text it delimits. |
<model_config> | no | yes | Settings for the model building engine, if the engine is trainable. |
The <step_config> element defines the class which implements the engine, and
any settings which should be passed to the class when it is
created.
Attribute | Value | Obligatory? | Description |
---|---|---|---|
class | a string, the name of a Python class | yes | The Python class, including its module name, which implements this engine. |
Element | Obligatory? | Repeatable? | Description |
---|---|---|---|
<create_settings> | no | no | Default settings for initializing the engine implementation. |
The <create_settings> element contains settings that an engine might pass to the
initialization phase of its step config class. These settings can
be overridden by the values in the <create_settings> element
for <step> in the <workflow> element. You can
configure these settings either with a child <setting>
element, or with an attribute on the <create_settings> element
itself; they're interchangeable.
Attribute | Value | Obligatory? | Description |
---|---|---|---|
<attr> | a string | no | The <create_settings> tag supports arbitrary attribute-value pairs. |
Element | Obligatory? | Repeatable? | Description |
---|---|---|---|
<setting> | no | yes | An attribute-value pair. |
An individual step creation setting.
Element | Obligatory? | Repeatable? | Description |
---|---|---|---|
<name> | yes | no | The name of the setting. This element has no attributes or element children; its value is the text it delimits. |
<value> | yes | no | The value of the setting. This element has no attributes or element children; its value is the text it delimits. |
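Because the two ways of writing a creation setting are interchangeable, either of the following engine definitions should configure the engine the same way. This is a sketch: the engine name, the class name "MyEngineModule.MyEngine" and the setting "max_length" are all hypothetical, since the settings an engine accepts depend on its implementation class.

```xml
<!-- Alternative 1: the setting written as an attribute. -->
<engine name="my_tagger">
  <step_config class="MyEngineModule.MyEngine">
    <create_settings max_length="100"/>
  </step_config>
</engine>

<!-- Alternative 2: the same setting written as a <setting> child. -->
<engine name="my_tagger">
  <step_config class="MyEngineModule.MyEngine">
    <create_settings>
      <setting>
        <name>max_length</name>
        <value>100</value>
      </setting>
    </create_settings>
  </step_config>
</engine>
```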
Some of your engines will be trainable. If so, you must provide
at least one <model_config> element for the engine. The
settings for the config are identical to the command-line options
available for the MATModelBuilder.
MAT is delivered with a default jCarafe
model builder.
You can have multiple <model_config> entries for an engine,
as long as they differ by the config_name attribute. You can
inherit an engine from a parent task, and override various
elements of the parent engine definition, including the model
config.
Attribute | Value | Obligatory? | Description |
---|---|---|---|
class | the name of a Python class | yes | This attribute names the class which will be used as the model builder. The default jCarafe model builder class is MAT.JavaCarafe.CarafeModelBuilder |
config_name | a string | no | If present, a config name to specify as the --config_name in MATModelBuilder, or for the config_name attribute in <build_settings> in the experiment engine. If omitted, this entry is the default model config. There can be only one default. |
Element | Obligatory? | Repeatable? | Description |
---|---|---|---|
<build_settings> | no | no | The settings for this model config. |
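A sketch of a trainable engine with a default model location and two model configs, one default and one named. The engine name, the step_config class "MyEngineModule.MyTaggerEngine", the model path and the config name "fast" are hypothetical; MAT.JavaCarafe.CarafeModelBuilder is the default jCarafe model builder class named in the table above. The named config would be selected through MATModelBuilder's --config_name option.

```xml
<engine name="carafe_tagger">
  <step_config class="MyEngineModule.MyTaggerEngine"/>
  <!-- Relative to the task directory; suffixed with the step name
       and language when it is referenced. -->
  <default_model>models/carafe_tagger</default_model>
  <!-- No config_name: this is the single default model config. -->
  <model_config class="MAT.JavaCarafe.CarafeModelBuilder"/>
  <!-- Selected explicitly by the config name "fast". -->
  <model_config class="MAT.JavaCarafe.CarafeModelBuilder" config_name="fast"/>
</engine>
```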
The <build_settings> tag supports arbitrary attribute-value pairs which are passed to the model builder. See the documentation for the jCarafe model builder for the attributes that engine expects. You can configure these settings either with a child <setting> element, or with an attribute on the <build_settings> element itself; they're interchangeable.
Attribute | Value | Obligatory? | Description |
---|---|---|---|
<attr> | a string | no | The <build_settings> tag supports arbitrary attribute-value pairs. |
Element | Obligatory? | Repeatable? | Description |
---|---|---|---|
<setting> | no | yes | An attribute-value pair. |
An individual build setting.
Element | Obligatory? | Repeatable? | Description |
---|---|---|---|
<name> | yes | no | The name of the setting. This element has no attributes or element children; its value is the text it delimits. |
<value> | yes | no | The value of the setting. This element has no attributes or element children; its value is the text it delimits. |
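As a sketch of where build settings live, the named model config below passes one setting to the model builder in attribute form. The setting name "option_a" is a placeholder, not a real jCarafe option; consult the jCarafe model builder documentation for the attributes it actually accepts.

```xml
<model_config class="MAT.JavaCarafe.CarafeModelBuilder" config_name="fast">
  <!-- option_a is a placeholder. This attribute form is equivalent to
       a <setting> child whose name is option_a and whose value is 10. -->
  <build_settings option_a="10"/>
</model_config>
```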
Steps represent the way engines are used within workflows, and
how humans interact with workflows. Signal steps modify the signal
(and must be workflow-initial); transform steps transform the
signal (and must be workflow-final); and annotation steps add or
modify annotation sets.
Attribute | Value | Obligatory? | Description |
---|---|---|---|
inherit_all | "yes" | no | If present, all steps will be inherited from the parent task. |
inherit | a comma-separated list of step names | no | If present, the named steps will be inherited from the parent task. |
Element | Obligatory? | Repeatable? | Description |
---|---|---|---|
<signal_step> | no | yes | A signal step. |
<annotation_step> | no | yes | An annotation step. |
<transform_step> | no | yes | A transform step. |
Signal steps modify the signal, and must be workflow-initial.
Attribute | Value | Obligatory? | Description |
---|---|---|---|
name | a string | yes | The name of the step. |
engine | an engine name | no | The engine this step uses, if applicable. |
type | one of "hand", "auto" | no | The type of the step. Currently, "auto" is the only type that's implemented ("hand" is intended for steps where the human annotator can edit the signal before annotating). |
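A signal step declaration is short. In this sketch the step and engine names are hypothetical, and the engine is assumed to be declared in the task's <engines> element.

```xml
<steps>
  <!-- Must come first in any workflow that uses it. -->
  <signal_step name="normalize" engine="my_normalizer" type="auto"/>
</steps>
```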
Annotation steps add or modify annotations. This is probably the
only step you'll define.
There are four types of annotation steps: "hand", "auto",
"auto-with-correction" and "mixed".
Each step is associated with annotation sets or categories which
it adds or modifies. These declarations are used to determine which
annotations to make available to the user in the UI in each step,
and how much progress has been made in each step. Annotation steps
support the ability to declare sets forbidden, desired and
required as well; these declarations were intended to be used in
computation of well-formedness conditions in workflows which we
have not had a chance to implement yet.
Attribute | Value | Obligatory? | Description |
---|---|---|---|
name | a string | yes | The name of the step. |
engine | an engine name | no | The engine this step uses, if applicable. |
sets_desired | a comma-separated list of annotation set names, or categories prefixed with the string "category:" | no | Not currently used. |
sets_modified | a comma-separated list of annotation set names, or categories prefixed with the string "category:" | no | The annotation sets or categories modified by this step. |
sets_required | a comma-separated list of annotation set names, or categories prefixed with the string "category:" | no | Not currently used. |
sets_forbidden | a comma-separated list of annotation set names, or categories prefixed with the string "category:" | no | Not currently used. |
sets_added | a comma-separated list of annotation set names, or categories prefixed with the string "category:" | no | The annotation sets or categories added by this step. |
type | one of "hand", "auto", "auto-with-correction", "mixed" | no | One of the four annotation step types described immediately above. |
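For example, a task might declare a zoning step, a tokenization step and a tagging step along these lines. The step names, the engine names and the category names "zone", "token" and "content" are assumptions for illustration; use the sets and categories your own annotation declarations define.

```xml
<steps>
  <!-- Automatic steps that add whole annotation categories. -->
  <annotation_step name="zone" engine="my_zoner" type="auto" sets_added="category:zone"/>
  <annotation_step name="tokenize" engine="my_tokenizer" type="auto" sets_added="category:token"/>
  <!-- A mixed step: each workflow can narrow it with the type
       attribute on its <step> element. -->
  <annotation_step name="tag" engine="my_tagger" type="mixed" sets_added="category:content"/>
</steps>
```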
Transform steps modify the signal based on the previous
annotation, and must be workflow-final. Transform steps are all
auto steps.
Transform steps support the ability to declare sets forbidden, desired and required; these declarations were intended to be used in computation of well-formedness conditions in workflows which we have not had a chance to implement yet.
Attribute | Value | Obligatory? | Description |
---|---|---|---|
engine | an engine name | yes | The engine this step uses. |
name | a string | yes | The name of this step. |
sets_desired | a comma-separated list of annotation set names, or categories prefixed with the string "category:" | no | Not currently used. |
sets_required | a comma-separated list of annotation set names, or categories prefixed with the string "category:" | no | Not currently used. |
sets_forbidden | a comma-separated list of annotation set names, or categories prefixed with the string "category:" | no | Not currently used. |
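A transform step always names an engine, since transform steps are all automatic. The step and engine names below are hypothetical.

```xml
<steps>
  <!-- Must come last in any workflow that uses it. -->
  <transform_step name="redact" engine="my_redactor"/>
</steps>
```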
In MAT 3.0, any workflow which contains at least one
hand-annotatable step can be used to create a workspace. You might want to
define the <workspaces> element if you want a special configuration
which customizes a workflow for use in workspaces, or if you want
to declare a default workflow or configuration.
Attribute | Value | Obligatory? | Description |
---|---|---|---|
enhance | a comma-separated list of configuration names | no | If present, the named workspace configurations will be inherited from the parent task, and any local settings for a configuration of the same name will be treated as overrides to the parent values. |
default_config | a string, either the name of a workflow or of a configuration defined by a <workspace> child element | no | The default configuration to use when you define a workspace for this task. If there are multiple hand-annotatable workflows, and you don't declare a default, you'll have to provide the --workspace_config option to the "create" operation of MATWorkspaceEngine. If there's no default_config, but there's a parent task which has a default config which has been inherited, the parent's default will be used. |
inherit_all | "yes" | no | If present, all workspace configurations will be inherited from the parent task. |
inherit | a comma-separated list of configuration names | no | If present, the named workspace configurations will be inherited from the parent task. |
Element | Obligatory? | Repeatable? | Description |
---|---|---|---|
<workspace> | no | yes | Implementations of the operations in the workspaces. |
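For example, a child task that inherits all of its parent's workspace configurations but wants one workflow to be the default might write the following. The workflow name "Hand annotation" is hypothetical and is assumed to be a hand-annotatable workflow available to the task.

```xml
<!-- With a default declared, creating a workspace should not need
     the workspace_config option even if several workflows qualify. -->
<workspaces inherit_all="yes" default_config="Hand annotation"/>
```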
The <workspace> element defines a workspace configuration, which
provides options to workspace operation arguments. The workspace
operations which can be configured are listed here.
Attribute | Value | Obligatory? | Description |
---|---|---|---|
workflow | the name of a workflow | yes | Each workspace configuration is built on top of a workflow, which must be declared in the task. If config_name is not present, the name of the workflow will be the name of the workspace configuration. |
config_name | a string | no | If present, the name of the workspace configuration. You may choose to set this if you have a workflow named "Demo" and you want to associate two different sets of workspace operation arguments with it. |
Element | Obligatory? | Repeatable? | Description |
---|---|---|---|
<operation> | yes | yes | An individual operation configuration. |
The <operation> element specifies the configuration of a workspace operation. Note that
in spite of the fact that operations are associated with folders,
these operations are referenced only by name. Currently, no
operations which are ambiguous among folders are configurable.
Attribute | Value | Obligatory? | Description |
---|---|---|---|
name | a string | yes | The name of the operation. |
Element | Obligatory? | Repeatable? | Description |
---|---|---|---|
<settings> | no | no | The operation settings. |
The <settings> element contains the settings for the operation. What these settings are depends on what sort of operation it is. For instance, for operations which invoke the MAT engine, these settings will be the arguments to the MAT engine. For operations which invoke the MAT model builder, these settings will be the arguments to the MAT model builder. See the documentation on workspaces to find out which operations are configurable, and how they can be configured.
Attribute | Value | Obligatory? | Description |
---|---|---|---|
<attr> | a string | no | The <settings> tag supports arbitrary attribute-value pairs. |
Element | Obligatory? | Repeatable? | Description |
---|---|---|---|
<setting> | no | yes | An attribute-value pair. |
An individual operation setting.
Element | Obligatory? | Repeatable? | Description |
---|---|---|---|
<name> | yes | no | The name of the setting. This element has no attributes or element children; its value is the text it delimits. |
<value> | yes | no | The value of the setting. This element has no attributes or element children; its value is the text it delimits. |
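Putting the workspace pieces together, a named configuration layered on a workflow might look like this sketch. The "Demo" workflow echoes the example in the <workspace> table; the configuration name, the operation name "my_operation" and the setting "some_option" are hypothetical, so check the workspace documentation for the operations that are actually configurable and the arguments they take.

```xml
<workspaces default_config="Demo with options">
  <!-- A named configuration built on top of the "Demo" workflow. -->
  <workspace workflow="Demo" config_name="Demo with options">
    <operation name="my_operation">
      <!-- Equivalent to a <setting> child whose name is some_option
           and whose value is some_value. -->
      <settings some_option="some_value"/>
    </operation>
  </workspace>
</workspaces>
```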
The task-level <settings> element holds settings that a specialized task might
require and that the user wishes to be able to configure in XML, rather than by
modifying the source code for the specialized task. The chances
that a normal user will use this are extremely slim. These
settings are not inherited by task children.
You can configure the settings either with a child
<setting> element, or with an attribute on the
<settings> element itself; they're interchangeable.
Attribute | Value | Obligatory? | Description |
---|---|---|---|
<attr> | a string | no | The <settings> tag supports arbitrary attribute-value pairs. |
Element | Obligatory? | Repeatable? | Description |
---|---|---|---|
<setting> | no | yes | An attribute-value pair. |
An individual task-level setting.
Element | Obligatory? | Repeatable? | Description |
---|---|---|---|
<name> | yes | no | The name of the task-level setting. This element has no attributes or element children; its value is the text it delimits. |
<value> | yes | no | The value of the task-level setting. This element has no attributes or element children; its value is the text it delimits. |
Workflows are ordered sets of steps, corresponding to a
larger-scale activity the user may wish to apply to the documents.
Attribute | Value | Obligatory? | Description |
---|---|---|---|
inherit | a string, a comma-delimited sequence of workflow names | no | If the task has a non-root parent task, you may use this attribute to inherit workflows from the parent task. The implementations of the step names will also be inherited. You can list multiple workflows, delimited by commas, e.g., "Demo,Hand annotation". |
inherit_all | "yes" | no | If the task has a non-root parent, you may use this attribute to specify that all workflows should be inherited from the parent. |
Element | Obligatory? | Repeatable? | Description |
---|---|---|---|
<workflow> | no | yes | An individual workflow. |
Each non-inherited workflow is specified by a <workflow>
element.
Attribute | Value | Obligatory? | Description |
---|---|---|---|
name | a string | yes | The name of the workflow that the user can specify in the Web UI or the MAT engine. |
undoable | "yes" | no | In MAT 2.0, the task required a global "undo" order on steps, in order to ensure that when steps were undone in a workflow, all appropriate steps in the task were undone (e.g., if you undid tag and zone in a workflow which doesn't contain tokenization, and some workflow in the task contained tokenization between zone and tag, tokenization was undone as well). This global undo order was impossible to maintain in 3.0, and as a result, it has been abandoned. If you undo steps in a workflow, only those steps will be undone, and as a result, your documents can end up in unusual states (e.g., tokenized but not zoned). In order to compensate for this issue, in 3.0, workflows, by default, are not undoable; the "retreat" buttons in the UI will not be present, for instance. You can use this attribute to specify the workflow as undoable, but we encourage you to use it sparingly; you should only enable this feature for workflows which support all your tagging steps (content and otherwise). |
Element |
Obligatory? |
Repeatable? |
Description |
---|---|---|---|
<ui_settings> | no |
no |
These are settings that are intended to be passed unmodified to the UI. This is not currently used. |
<step> | no | yes | An individual step of a
workflow. |
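Following the advice on the undoable attribute, a sketch of an undoable workflow that includes every tagging step might look like this. The workflow name, the step names "zone", "tokenize" and "tag", and the pretty names are all hypothetical placeholders for globally defined steps in your task.

```xml
<!-- Hypothetical workflow; each step name must refer to a globally
     defined step. It is marked undoable because it contains all the
     tagging steps (zone, token and content). -->
<workflow name="Full pipeline" undoable="yes">
  <step name="zone" pretty_name="Mark zones"/>
  <step name="tokenize" pretty_name="Tokenize"/>
  <step name="tag" pretty_name="Tag content"/>
</workflow>
```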
These are settings that are intended to be passed unmodified to
the UI, in order to declaratively configure UI customizations for
particular workflows. At the moment, no tasks use this feature.
You can configure these settings either with a child
<setting> element or with an attribute on the
<ui_settings> element itself; the two forms are interchangeable.
Attribute |
Value |
Obligatory? |
Description |
---|---|---|---|
<attr> | a string |
no | The <ui_settings> tag supports arbitrary attribute-value pairs. |
Element |
Obligatory? |
Repeatable? |
Description |
---|---|---|---|
<setting> | no | yes | An attribute-value pair. |
An individual UI setting.
Element |
Obligatory? |
Repeatable? |
Description |
---|---|---|---|
<name> | yes | no | The name of the setting. This element has no attributes or element children; its value is the text it delimits. |
<value> | yes | no | The value of the setting. This element has no attributes or element children; its value is the text it delimits. |
Each workflow contains a sequence of globally defined steps.
Attribute |
Value |
Obligatory? |
Description |
---|---|---|---|
name | a string |
yes | The name of a globally
defined step. |
pretty_name | a string |
no | The name of this step that
the user will see in the UI in this workflow. |
type |
one of "hand", "auto", "auto-with-correction" |
no |
If the global step is an annotation step of
type "mixed", its use can be further narrowed in the context
of this workflow through the use of this attribute. |
Element |
Obligatory? |
Repeatable? |
Description |
---|---|---|---|
<create_settings> | no | no |
Settings to pass to the
initializer of the step |
<run_settings> |
no |
no |
Settings to pass to the
execution of a step |
<ui_settings> |
no |
no |
Settings to pass to the UI
for this step. Not currently used. |
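For instance, assuming the task defines a global annotation step named "tag" whose type is "mixed" (the step name is hypothetical), two workflows could narrow it differently through the type attribute. Other steps are omitted for brevity.

```xml
<!-- Fragment inside <workflows>; workflow names are hypothetical. -->
<!-- The user annotates by hand only. -->
<workflow name="Manual tagging">
  <step name="tag" pretty_name="Hand tag" type="hand"/>
</workflow>
<!-- Automatic tagging, followed by human correction. -->
<workflow name="Tag and correct">
  <step name="tag" pretty_name="Correct tags" type="auto-with-correction"/>
</workflow>
```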
These are settings that a step might pass to the initialization
phase of its step class. These settings override the values in the
<create_settings> element for <step_config>. You can
configure these settings either with a child <setting>
element or with an attribute on the <create_settings> element
itself; the two forms are interchangeable.
Attribute |
Value |
Obligatory? |
Description |
---|---|---|---|
<attr> | a string |
no | The <create_settings> tag supports arbitrary attribute-value pairs. |
Element |
Obligatory? |
Repeatable? |
Description |
---|---|---|---|
<setting> | no | yes | An attribute-value pair. |
An individual step creation setting.
Element |
Obligatory? |
Repeatable? |
Description |
---|---|---|---|
<name> | yes | no | The name of the setting. This element has no attributes or element children; its value is the text it delimits. |
<value> | yes | no | The value of the setting. This element has no attributes or element children; its value is the text it delimits. |
These are settings which are passed to the do() or doBatch()
method of the step config class of the step (i.e., the engine
implementation). You can configure the settings either with a
child <setting> element or with an attribute on the
<run_settings> element itself; the two forms are interchangeable.
Attribute |
Value |
Obligatory? |
Description |
---|---|---|---|
<attr> | a string |
no | The <run_settings> tag supports arbitrary attribute-value pairs. |
Element |
Obligatory? |
Repeatable? |
Description |
---|---|---|---|
<setting> | no | yes | An attribute-value pair. |
Most predefined step implementations in MAT do not support any
run settings. Two implementations which do are
MAT.JavaCarafe.CarafeTokenizationStep and
MAT.JavaCarafe.CarafeTagStep. The MAT.JavaCarafe.CarafeTagStep
step implements automatic tagging. Any step which implements
automatic tagging can bear the following additional
attribute-value pairs:
Key |
Value |
Description |
---|---|---|
local |
"yes" |
By default, the MAT engine
will contact the MAT Web server to tag a document, because
the Web server has the capability of starting up and
monitoring a long-lived tagger process (if the engine supports
this capability). This is beneficial because the
jCarafe tagger, like many model-based taggers, has a fairly
expensive startup cost. To block the engine from contacting
the Web server, and force it to start up and shut down the
tagger on its own, specify local="yes". |
model |
a string, a filename of a
tagging model |
If the engine for the task
does not have a default model (or if no default model has
been created yet), the user must specify the location of the
tagger model, either here or when invoking MATEngine. |
In addition, the jCarafe tagging and tokenization steps support other run settings, documented here.
An individual run setting.
Element |
Obligatory? |
Repeatable? |
Description |
---|---|---|---|
<name> | yes | no | The name of the setting. This element has no attributes or element children; its value is the text it delimits. |
<value> | yes | no | The value of the setting. This element has no attributes or element children; its value is the text it delimits. |
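Since run settings may be given either as attributes on <run_settings> or as <setting> children, the two fragments below are intended to be equivalent: both pass the local and model settings documented above to an automatic tagging step. The step name "tag" and the model path are hypothetical.

```xml
<!-- Form 1: settings as attributes on <run_settings>. -->
<step name="tag">
  <run_settings local="yes" model="/path/to/my_model"/>
</step>

<!-- Form 2: the same settings as <setting> children. -->
<step name="tag">
  <run_settings>
    <setting>
      <name>local</name>
      <value>yes</value>
    </setting>
    <setting>
      <name>model</name>
      <value>/path/to/my_model</value>
    </setting>
  </run_settings>
</step>
```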
These are settings that are intended to be passed unmodified to
the UI, in order to declaratively configure UI customizations for
particular tasks. At the moment, no tasks use this feature. You
can configure these settings either with a child <setting>
element or with an attribute on the <ui_settings> element
itself; the two forms are interchangeable.
Attribute |
Value |
Obligatory? |
Description |
---|---|---|---|
<attr> | a string |
no | The <ui_settings> tag supports arbitrary attribute-value pairs. |
Element |
Obligatory? |
Repeatable? |
Description |
---|---|---|---|
<setting> | no | yes | An attribute-value pair. |
An individual UI setting.
Element |
Obligatory? |
Repeatable? |
Description |
---|---|---|---|
<name> | yes | no | The name of the setting. This element has no attributes or element children; its value is the text it delimits. |
<value> | yes | no | The value of the setting. This element has no attributes or element children; its value is the text it delimits. |
The <annotation_set_descriptors> element allows you to
define multiple annotation sets. You can inherit annotations from
other tasks.
The <annotation_set_descriptors> element and its children,
along with the <annotation_display>
element and (most of) its children, constitute the legacy method
for defining annotations and their display elements. For a more
concise and differently organized mechanism, see the <annotations>
element. For a comparison of the two mechanisms, see the use cases
here.
It's possible to combine the two methods. In that case, the
processing order is first <annotation_set_descriptors>, then
<annotations>, then <annotation_display>.
Attribute |
Value |
Obligatory? |
Description |
---|---|---|---|
all_annotations_known | "yes" |
no | By default, the task leaves
its annotation sets "open"; i.e., if the task encounters an
unknown annotation label, it won't raise an error. If you
provide the value "yes" for this attribute, an error will be
raised if the task encounters an unknown annotation. |
inherit | a comma-separated list of
labels to inherit |
no | You can inherit annotations
from other tasks, either by true label, by set (see the
"set" attribute of <annotation_set_descriptor>) or by
category (see the "category" attribute of
<annotation_set_descriptor>). To inherit an annotation
by label, simply list it; to inherit a category or set, list
"category:" or "set:" + the category or set name. Individual
true labels will be inherited with all of their attributes,
even those defined in other sets. You can augment inherited
sets, categories or true labels in the child task. A typical value for this attribute is "category:zone,category:token", which inherits the annotations for the zone and token categories from the parent (usually root) task. |
Element |
Obligatory? |
Repeatable? |
Description |
---|---|---|---|
<annotation_set_descriptor> | no | yes |
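A sketch of a simple task's declaration, using the typical inherit value quoted above and a single set with category="content" and name="content". The children of <annotation_set_descriptor> are described elsewhere, so they appear here only as a placeholder comment.

```xml
<!-- Inherit the zone and token categories from the parent task, and
     raise an error on any annotation label the task does not know. -->
<annotation_set_descriptors all_annotations_known="yes"
                            inherit="category:zone,category:token">
  <annotation_set_descriptor category="content" name="content">
    <!-- label and attribute declarations go here -->
  </annotation_set_descriptor>
</annotation_set_descriptors>
```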
The <annotation_set_descriptor> element is described in
detail elsewhere. In simple
tasks, you'll have a single element of this type; we tend to
define this single element with category="content" and
name="content". However, you can define multiple sets, and the
categories and set names can be whatever you want. See the "Sample Relations"
task for an example.
Attribute |
Value |
Obligatory? |
Description |
---|---|---|---|
category | a string |
no | The category of the
annotation set descriptor. These values are user-definable
(aside from a few predetermined values like "zone" and
"token"). The default category is "content". |
name |
a string |
yes |
The name of the annotation set descriptor.
The names are user-definable, and you can have a many-to-one
mapping from names to categories, to facilitate set
grouping. |
managed |
"no" |
no |
By default, annotation sets are managed,
which means that SEGMENT annotations are established to
track the annotation progress for this set. If, for some
reason, you don't want your sets to be managed, set this
attribute to "no". We don't recommend doing this, except in
one specific circumstance, described immediately below. |
Virtually all annotation sets are managed. The one circumstance
in which you might want to define an unmanaged set is if you have
an automated engine which adds tokens simultaneously with another
type of annotation (e.g., sentences). In this case, you'd have to
define the set which contains the sentence annotation as
unmanaged, because the step you'll declare will add both
annotation sets, and you can't define a step which adds both
managed and unmanaged annotation sets simultaneously. Since tokens
are unmanaged, and the engine adds them and sentences at the same
time, sentences must also be unmanaged.
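Continuing that scenario, the set that holds the sentence annotation would be declared unmanaged. The category and name "sentence" are hypothetical, since both are user-definable.

```xml
<annotation_set_descriptors>
  <!-- Unmanaged, because a single automated step adds these sentence
       annotations together with the (unmanaged) token annotations. -->
  <annotation_set_descriptor category="sentence" name="sentence" managed="no">
    <!-- sentence label declaration goes here -->
  </annotation_set_descriptor>
</annotation_set_descriptors>
```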
This element defines all the display-related properties in the MAT UI of the elements defined in the <annotation_set_descriptors> element. Most of what you can do here is define the display-related properties of labels, although you can also define some of the properties of attributes, and also define groups for hierarchical annotation displays. The <annotation_display> element is described in detail elsewhere.
Note that the CSS display rules are defined in the same order as
the elements in <annotation_display>, so the styling of a later
<label> or <label_group> element takes precedence over the
styling of earlier <label> and <label_group> elements.
The <annotation_display> element and (most of) its
children, along with the <annotation_set_descriptors>
element and its children, constitute the legacy method for
defining annotations and their display elements. For a more
concise and differently organized mechanism, see the <annotations>
element. For a comparison of the two mechanisms, see the use cases
here. Note that the
<gesture> element is only definable in the legacy
<annotation_display> subsystem.
It's possible to combine the two methods. In that case, the processing order is first <annotation_set_descriptors>, then <annotations>, then <annotation_display>.
The <annotations> element, introduced in MAT 3.1, provides a more concise and differently organized mechanism for defining annotations and their display elements, in comparison with the legacy <annotation_set_descriptors> element and its children, along with the <annotation_display> element and (most of) its children. For a comparison of the two mechanisms, see the use cases here. The <annotations> element is described in detail elsewhere.
It's possible to combine the two methods. In that case, the processing order is first <annotation_set_descriptors>, then <annotations>, then <annotation_display>.
When you run the MATScore engine, or
produce a visual comparison of
annotations, MAT uses a set of heuristics to determine the best
pairing of annotations. You can affect this process using the
<similarity_profile> element.
Similarity profiles are not inherited.
Attribute |
Value |
Obligatory? |
Description |
---|---|---|---|
name | a string |
no | The name of the profile, for
use when creating comparison documents or scoring. If no
name is provided, this is the default profile for the task.
There can be only one unnamed profile. |
Element |
Obligatory? |
Repeatable? |
Description |
---|---|---|---|
<stratum> | no | yes | The comparison algorithm is stratified
(see the algorithm for
more details). You can use this element to define the
strata, rather than allowing them to be inferred. |
<tag_profile> | no | yes | There's a default similarity
profile for spanned and spanless annotations. If you want to
declare your own profile explicitly, you can do that with
this element. |
The comparison algorithm is stratified (see the algorithm for more details). You can use this element to define the strata, rather than allowing them to be inferred.
Attribute |
Value |
Obligatory? |
Description |
---|---|---|---|
true_labels | a comma-separated string of
labels |
yes | The labels in this stratum.
Note that these labels must be true labels, not effective
labels. You may specify all the labels in an annotation set
by listing "set:" + the set name as one of the elements in
the value, e.g., "set:content". |
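As a sketch, a named similarity profile that declares its strata explicitly might look like the following. The profile name, the set name "content" and the relation labels are hypothetical; see the algorithm documentation for how the strata are used during pairing.

```xml
<!-- A named profile; omitting the name attribute would make it the
     task's default (only one unnamed profile is allowed). -->
<similarity_profile name="strict">
  <!-- Every true label in the hypothetical "content" set. -->
  <stratum true_labels="set:content"/>
  <!-- Hypothetical true labels, listed individually. -->
  <stratum true_labels="RELATION_A,RELATION_B"/>
</similarity_profile>
```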
There's a default similarity profile for spanned and spanless
annotations. If you want to declare your own profile explicitly,
you can do that with this element. See the algorithm for details on how to
use these.
Attribute |
Value |
Obligatory? |
Description |
---|---|---|---|
true_labels | a comma-separated string of
labels |
yes | The labels to which this profile applies. Note that these labels must be true labels, not effective labels. You may specify all the labels in an annotation set by listing "set:" + the set name as one of the elements in the value, e.g., "set:content". |
Element |
Obligatory? |
Repeatable? |
Description |
---|---|---|---|
<attr_equivalences> |
no |
yes |
Equivalences for attributes among the various
labels in the profile. |
<dimension> | yes | yes | One dimension of the profile. |
Each profile consists of a number of dimensions, which define
some aspect of the annotation to use in comparison, along with the
method to be used for comparison and the relative weight of the
dimension. See the algorithm
for details about the various dimensions.
Attribute |
Value |
Obligatory? |
Description |
---|---|---|---|
name | a string |
yes | The name of the dimension.
See the algorithm for a list of legal names and their
interpretations. |
weight | a number |
yes | The relative weight of the
dimension. The weights of all the dimensions will be
normalized. |
param_digester_method | a Python function name |
no | In rare circumstances, the
dimension method may accept parameters (see <attr>
below) and these parameters may need to be interpreted
(e.g., "yes" -> True). The full name of the function,
including the module it's in, must be specified. |
aggregator_method | a Python function name |
no | If special handling is required for a dimension which has an aggregation value, this option allows you to declare the handler. The full name of the function, including the module it's in, must be specified. |
method | a string |
no | The method associated with
the dimension, if not the default method. See the algorithm
for a list of legal names. |
<attr> | a string | no | The <dimension> element supports arbitrary attribute-value pairs, which serve as parameters to the dimension method (see param_digester_method above). |
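A sketch of an explicit tag profile with two weighted dimensions. The dimension names "_label" and "_span" are assumptions here, not confirmed by this page; check the algorithm documentation for the legal names and their methods. The set name "content" is hypothetical.

```xml
<similarity_profile>
  <tag_profile true_labels="set:content">
    <!-- Assumed dimension names; weights are relative and get normalized. -->
    <dimension name="_label" weight="1"/>
    <dimension name="_span" weight="3"/>
  </tag_profile>
</similarity_profile>
```

Because the weights are normalized, weights of 1 and 3 would correspond to relative shares of 0.25 and 0.75, assuming normalization divides each weight by the sum of all the weights (1 + 3 = 4).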
The true labels in your tag profile may vary in their attribute
names, but you may still want these attributes to be comparable.
This element allows you to declare your equivalences. See the algorithm for details about the
various dimensions.
Attribute |
Value |
Obligatory? |
Description |
---|---|---|---|
name | a string |
yes | The name of the equivalence.
See the algorithm for a list of legal names and their
interpretations. |
attrs | a comma-separated string |
yes | All attributes which stand in
this equivalence. Each label in your profile must have at
least one of these attributes, and no attribute name can
appear more than once among the equivalences in the profile. |
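For example, suppose two hypothetical labels record a comparable value under different attribute names: PERSON has "person_type" and ORGANIZATION has "org_type". A sketch of the equivalence follows. The equivalence name "entity_type" is also hypothetical; the algorithm documentation describes which names are legal.

```xml
<tag_profile true_labels="PERSON,ORGANIZATION">
  <!-- Each label bears at least one of the listed attributes, and
       neither attribute appears in any other equivalence here. -->
  <attr_equivalences name="entity_type" attrs="person_type,org_type"/>
  <!-- The obligatory <dimension> elements are omitted for brevity. -->
</tag_profile>
```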
When you run the MATScore engine,
you can control how the scored elements are aggregated,
decomposed, or filtered in the scoring output. See the algorithm for details on how to
use this.
Score profiles are not inherited.
Attribute |
Value |
Obligatory? |
Description |
---|---|---|---|
name | a string |
no | The name of the profile, for use when scoring. If no name is provided, this is the default profile for the task. There can be only one unnamed profile. |
Element |
Obligatory? |
Repeatable? |
Description |
---|---|---|---|
<aggregation> | no | yes | A set of labels to aggregate
as a separate entry. |
<attr_decomposition> | no | yes | An attribute-based
decomposition of particular labels to report as a separate
entry. |
<partition_decomposition> | no | yes | A function-based
decomposition of particular labels to report as a separate
entry. |
<label_limitation> |
no |
no |
A list of labels to restrict the overall
reporting to. |
<attrs_alone> |
no |
no |
A set of labels for which independent
scores for each of the attributes are reported. |
The scorer will pair all annotations which are not specified as
being ignored. Sometimes, you might need to pair some annotations
as part of the scoring process (let's say they're arguments of
relations, for instance), but you don't want them in the final
output, even though you can't ignore them. You can use this
element to provide that filter.
Attribute |
Value |
Obligatory? |
Description |
---|---|---|---|
true_labels | a comma-separated string of
labels |
yes | Only these true labels (and the effective labels that are defined on them) will be included in the scoring output. You may specify all the labels in an annotation set by listing "set:" + the set name as one of the elements in the value, e.g., "set:content". |
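For the relations scenario above, a sketch: the argument annotations are still paired during scoring, but only the relation labels are reported. The label names are hypothetical, and the fragment is a child of the score profile element.

```xml
<!-- Child of the score profile: restrict the reported output to
     these hypothetical relation labels. -->
<label_limitation true_labels="RELATION_A,RELATION_B"/>
```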
Sometimes, you might want to know what the scores of the
individual attributes for a label are, if, for instance, you have
a processing step which adds attribute values. This element
instructs the scorer to produce these individual scores, for all
pairs of annotations bearing the specified labels. These scores
are produced for each aggregation and decomposition in which the
relevant annotations appear.
Attribute |
Value |
Obligatory? |
Description |
---|---|---|---|
true_labels | a comma-separated string of
labels |
yes | Independent scores for attributes alone will be produced for these labels. You may specify all the labels in an annotation set by listing "set:" + the set name as one of the elements in the value, e.g., "set:content". |
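A minimal sketch, using the ENAMEX label mentioned elsewhere on this page as a stand-in for one of your own labels:

```xml
<!-- Child of the score profile: report an independent score for
     each attribute of every paired ENAMEX annotation. -->
<attrs_alone true_labels="ENAMEX"/>
```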
Under normal circumstances, annotations are aggregated per
document and per run by effective label (if available) or true
label, or by equivalence classes passed to MATScore, and then all
together into a single heap. You can add other aggregations of
true labels using this element.
Attribute |
Value |
Obligatory? |
Description |
---|---|---|---|
name | a string |
yes | The name of the aggregation
as it will appear in the output spreadsheet |
true_labels | a comma-separated string of
labels |
yes | The true labels in this aggregation. You may specify all the labels in an annotation set by listing "set:" + the set name as one of the elements in the value, e.g., "set:content". |
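A minimal sketch of an extra aggregation; the aggregation name and the true labels PERSON and LOCATION are hypothetical.

```xml
<!-- Child of the score profile: adds a separate "people_and_places"
     entry that pools the PERSON and LOCATION annotations. -->
<aggregation name="people_and_places" true_labels="PERSON,LOCATION"/>
```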
Under normal circumstances, the only way to decompose true labels
in the score output is by effective label. If you want to
decompose them by a particular attribute (e.g., you want to see
the score for ENAMEXes when type = NOM), you can use this element.
Decompositions can overlap with each other.
Attribute |
Value |
Obligatory? |
Description |
---|---|---|---|
true_labels | a comma-separated string of
labels |
yes | The true labels to which this decomposition applies. You may specify all the labels in an annotation set by listing "set:" + the set name as one of the elements in the value, e.g., "set:content". |
attrs | a comma-separated string of
attrs |
yes | The names of attributes
defined for all the listed labels. There will be a separate
decomposition for each tuple of values for these attrs.
The name of the decomposition in the score output will
be <attr1>=<val1> <attr2>=<val2>... |
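For the ENAMEX example above, a sketch of the decomposition by the type attribute:

```xml
<!-- Child of the score profile: one decomposition per value of
     "type" on ENAMEX, so the output includes an entry named type=NOM. -->
<attr_decomposition true_labels="ENAMEX" attrs="type"/>
```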
Under normal circumstances, the only way to decompose true labels
in the score output is by effective label. If you want to
decompose them by a Python function, you can use this element.
Decompositions can overlap with each other.
Attribute |
Value |
Obligatory? |
Description |
---|---|---|---|
true_labels | a comma-separated string of
labels |
yes | The true labels to which this decomposition applies. You may specify all the labels in an annotation set by listing "set:" + the set name as one of the elements in the value, e.g., "set:content". |
method | a Python function name |
yes | This function must take a single argument, which will be an annotation, and return a value. For instance, if you're evaluating a geotagger, and the tagger provides a country attribute for the location, and you want to decompose location scores by US and non-US, you'd define a function which returns "US" if the country attribute is "US", and "non-US" otherwise. The full name of the function, including the module it's in, must be specified. The name of the decomposition in the score output will be <bare function name>=<val>. |
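For the geotagger example above, a sketch of the declaration. The label LOCATION and the fully qualified function name mygeo.scoring.us_or_non_us are hypothetical; that function would return "US" or "non-US" based on the annotation's country attribute.

```xml
<!-- Child of the score profile. The method names a hypothetical
     Python function; output entries would be named
     us_or_non_us=US and us_or_non_us=non-US. -->
<partition_decomposition true_labels="LOCATION"
                         method="mygeo.scoring.us_or_non_us"/>
```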