The basic unit of document enrichment in MAT is the annotation. There are two
types of annotations in MAT: span(ned) annotations and spanless
annotations. Span annotations are anchored to a
particular contiguous span in the document, and make some implicit
assertion about it; e.g., the span from character 10 to character
15 is a noun phrase. Spanless annotations are not anchored to a
span, and are used to make assertions about the entire document,
or to make assertions about other annotations; e.g., annotation 1
and annotation 2 refer to the same entity, or they stand in some
relation to each other.
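To make the span/spanless distinction concrete, here is a minimal Python sketch. The Annotation class, its field names, and the labels NP and GENRE are hypothetical illustrations, not MAT's actual data model or API. A span is assumed to be a start offset plus an exclusive end offset into the document text.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class Annotation:
    """Hypothetical sketch of an annotation; not MAT's actual API."""
    label: str
    start: Optional[int] = None  # character offset; None means spanless
    end: Optional[int] = None    # exclusive end offset; None means spanless
    attrs: Dict[str, Any] = field(default_factory=dict)

    @property
    def is_spanned(self) -> bool:
        return self.start is not None and self.end is not None

    def covered_text(self, signal: str) -> str:
        """Return the document text this annotation is anchored to."""
        if not self.is_spanned:
            raise ValueError(f"{self.label} is spanless and covers no text")
        return signal[self.start:self.end]


signal = "Yesterday a dog barked."
# Spanned: "characters 10 to 15 are a noun phrase".
np_ann = Annotation("NP", start=10, end=15)
# Spanless: an assertion about the whole document (hypothetical label).
genre = Annotation("GENRE", attrs={"value": "news"})

print(np_ann.covered_text(signal))  # a dog
print(genre.is_spanned)             # False
```

Asking a spanless annotation for its covered text raises an error in this sketch, which mirrors the idea that it makes no claim about any particular stretch of text.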
Annotations have labels, which are strings. Annotations can also
have attributes, whose values are restricted to particular types.
Your task maintainer will define all your annotations, attributes,
and attribute value types and restrictions for you; each task
defines the annotations and attributes available for that task.
You'll learn more about tasks in a minute.
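The sketch below illustrates, under stated assumptions, what it means for attribute values to be restricted to particular types. AnnotationDef, AttributeDef, validate, and the mention_type attribute are all hypothetical; this is not how MAT itself represents task definitions, which your task maintainer sets up for you.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional, Tuple


@dataclass(frozen=True)
class AttributeDef:
    """One attribute a task allows on an annotation (hypothetical schema)."""
    name: str
    value_type: type                           # e.g. str or int
    choices: Optional[Tuple[Any, ...]] = None  # closed set of allowed values, if any


@dataclass(frozen=True)
class AnnotationDef:
    """A task-defined annotation: a label plus its allowed attributes."""
    label: str
    attributes: Tuple[AttributeDef, ...] = ()

    def validate(self, attrs: Dict[str, Any]) -> None:
        """Raise if attrs uses an unknown attribute or a disallowed value."""
        known = {a.name: a for a in self.attributes}
        for name, value in attrs.items():
            if name not in known:
                raise ValueError(f"{self.label} has no attribute {name!r}")
            spec = known[name]
            if not isinstance(value, spec.value_type):
                raise TypeError(f"{name} must be {spec.value_type.__name__}")
            if spec.choices is not None and value not in spec.choices:
                raise ValueError(f"{value!r} is not an allowed value of {name}")


person = AnnotationDef(
    "PERSON",
    (AttributeDef("mention_type", str, ("NAME", "NOMINAL", "PRONOUN")),),
)
person.validate({"mention_type": "NAME"})  # passes silently
```

A value outside the allowed choices, a value of the wrong type, or an attribute the task never defined is rejected, which is the kind of restriction the task definition imposes on you as an annotator.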
While you don't need to know most of the details of how
annotations are constructed or defined, you do need to
know that among the possible attribute value types are
"annotation" and "list or set of annotations"; in other words, the
way MAT implements relation, event and coreference annotation is
via attributes whose values are other annotations. The annotations
which "host" these annotation-valued attributes may be spanned or
spanless. You've already seen examples of some of these in tutorial 7, and you'll see UI
examples in more detail later. If you want all the
gory details, you can look here
and here.
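Annotation-valued attributes can be pictured as ordinary object references. In this hypothetical sketch (the labels MEET and COREF and the attribute names agent, partner, and mentions are illustrative, not taken from any real task or from MAT's API), a spanned event annotation points at two PERSON annotations, and a spanless coreference annotation holds a set of them.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


# eq=False keeps identity-based equality and hashing, so annotations can be
# stored in sets without being confused with other annotations that happen
# to share the same label and span.
@dataclass(eq=False)
class Annotation:
    """Hypothetical annotation sketch; not MAT's actual API."""
    label: str
    start: Optional[int] = None  # None for spanless annotations
    end: Optional[int] = None
    attrs: Dict[str, Any] = field(default_factory=dict)


signal = "Alice met Bob. She smiled."
alice = Annotation("PERSON", 0, 5)
bob = Annotation("PERSON", 10, 13)
she = Annotation("PERSON", 15, 18)

# Spanned host: an event anchored on "met" whose attribute values are annotations.
meeting = Annotation("MEET", 6, 9, {"agent": alice, "partner": bob})

# Spanless host: a coreference annotation whose attribute is a set of annotations.
chain = Annotation("COREF", attrs={"mentions": {alice, she}})

print(sorted(signal[m.start:m.end] for m in chain.attrs["mentions"]))  # ['Alice', 'She']
```

Both hosts use the same mechanism; the only difference is whether the host itself is anchored to text.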
In most circumstances, the name that you'll be given or shown for
an annotation is the annotation's label; e.g., if you're applying
or looking at a PERSON annotation, the label of that annotation
will be "PERSON". However, in some cases, your task maintainer
will choose to make this notional label an effective label,
which corresponds to some combination of annotation label +
attribute value (e.g., ENAMEX type="PER"). As the annotator,
you'll have access to the actual label and attribute/value
information in the UI, but you'll be shown the effective label in
the relevant circumstances (e.g., when you're choosing an
annotation to create from your annotation menu in the UI).
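One way to picture effective labels is as a lookup table from the name you see to a label plus an attribute value. The table and both functions below are hypothetical; only the ENAMEX type="PER" pairing comes from the example above, and the other entries are made up for illustration.

```python
from typing import Any, Dict, Tuple

# Hypothetical table: effective label -> (actual label, attribute, value).
EFFECTIVE_LABELS: Dict[str, Tuple[str, str, Any]] = {
    "PERSON": ("ENAMEX", "type", "PER"),
    "ORGANIZATION": ("ENAMEX", "type", "ORG"),
    "LOCATION": ("ENAMEX", "type", "LOC"),
}


def expand(effective: str) -> Tuple[str, Dict[str, Any]]:
    """Turn a menu choice into the actual label plus attribute/value."""
    if effective in EFFECTIVE_LABELS:
        actual, attr, value = EFFECTIVE_LABELS[effective]
        return actual, {attr: value}
    return effective, {}  # an ordinary label is shown under its own name


def display_name(label: str, attrs: Dict[str, Any]) -> str:
    """Choose the name to show for an existing annotation."""
    for effective, (actual, attr, value) in EFFECTIVE_LABELS.items():
        if actual == label and attrs.get(attr) == value:
            return effective
    return label


print(expand("PERSON"))                         # ('ENAMEX', {'type': 'PER'})
print(display_name("ENAMEX", {"type": "ORG"}))  # ORGANIZATION
```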
A subset of the annotations that MAT can define can be processed
by jCarafe, the engine that MAT is delivered with.
The jCarafe engine can be used with such annotations to perform the tag-a-little, learn-a-little loop, run experiments, etc. However, jCarafe can't currently add annotations of greater complexity (i.e., spanless annotations, or annotations with free-text attributes) automatically. This restriction affects both the MAT processing engine and the experiment harness, and it means that all your more complex annotations and attributes will have to be added by hand. The exception is when your task maintainer has configured MAT to use an engine that can handle more complex annotations; this is not simple, so unless you're told otherwise, assume that these more complex steps are hand-only. For more details about what MAT "out of the box" can and can't do with complex annotations, see here.
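As a rough planning aid, the sketch below applies the simplified rule stated above: an annotation type is treated as engine-addable only if it is spanned and has no free-text attributes. The AnnotationType class and the attribute kind names ("choice", "free_text") are assumptions, and the real restrictions may be more detailed, so this does not describe jCarafe's internals.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class AnnotationType:
    """Hypothetical summary of a task-defined annotation type."""
    label: str
    spanned: bool
    # Kinds of attribute this type carries; "free_text" is an assumed name.
    attribute_kinds: Tuple[str, ...] = ()


def engine_can_add(t: AnnotationType) -> bool:
    """Simplified rule from the text: spanned and no free-text attributes."""
    return t.spanned and "free_text" not in t.attribute_kinds


def split_work(types: Sequence[AnnotationType]) -> Tuple[List[str], List[str]]:
    """Separate labels the engine can add from those you must add by hand."""
    auto = [t.label for t in types if engine_can_add(t)]
    manual = [t.label for t in types if not engine_can_add(t)]
    return auto, manual


task = [
    AnnotationType("PERSON", spanned=True, attribute_kinds=("choice",)),
    AnnotationType("COMMENT", spanned=True, attribute_kinds=("free_text",)),
    AnnotationType("COREF", spanned=False),
]
print(split_work(task))  # (['PERSON'], ['COMMENT', 'COREF'])
```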
There's one more concept you'll need. MAT divides the annotations
in a task into different sets, and assigns these sets to
categories. There are several reserved categories in MAT:
All other categories are established by your task, and for
convenience, we'll call the annotations in these categories content
annotations; these are the annotations you'll be hand-annotating,
by and large.
If this documentation ever talks about annotations without
specifying a category, it's almost certainly talking about content
annotations. As a user, you really won't need to know much, if
anything, about the other categories.
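The relationship between sets, categories, and content annotations can be sketched as a simple filter. Because the list of reserved categories is not reproduced here, the reserved names in this example are placeholders, and the task layout (a set name mapped to its category and labels) is a hypothetical representation, not MAT's own format.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical task description: set name -> (category, labels in the set).
TaskSets = Dict[str, Tuple[str, List[str]]]


def content_labels(task_sets: TaskSets, reserved: FrozenSet[str]) -> List[str]:
    """Labels whose set belongs to a task-defined (non-reserved) category."""
    labels: List[str] = []
    for category, set_labels in task_sets.values():
        if category not in reserved:
            labels.extend(set_labels)
    return sorted(labels)


# Placeholder names: the real reserved categories are defined by MAT itself.
RESERVED = frozenset({"reserved_a", "reserved_b"})
task_sets: TaskSets = {
    "infrastructure": ("reserved_a", ["lex"]),
    "entities": ("entity", ["PERSON", "LOCATION"]),
    "relations": ("relation", ["LOCATED_IN"]),
}
print(content_labels(task_sets, RESERVED))  # ['LOCATED_IN', 'LOCATION', 'PERSON']
```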