Annotations

The basic unit of document enrichment in MAT is the annotation. There are two types of annotations in MAT: span(ned) annotations and spanless annotations. Span annotations are anchored to a particular contiguous span in the document, and make some implicit assertion about it; e.g., the span from character 10 to character 15 is a noun phrase. Spanless annotations are not anchored to a span, and are used to make assertions about the entire document, or to make assertions about other annotations; e.g., annotation 1 and annotation 2 refer to the same entity, or they stand in some relation to each other.
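To make the span/spanless distinction concrete, here is a minimal sketch in Python. The Annotation class, its field names, and the NP and DOC_TOPIC labels are hypothetical stand-ins chosen for illustration; they are not MAT's actual classes or API.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class Annotation:
    """Hypothetical stand-in for an annotation; not MAT's own class."""
    label: str
    start: Optional[int] = None   # None for a spanless annotation
    end: Optional[int] = None     # None for a spanless annotation
    attrs: Dict[str, Any] = field(default_factory=dict)

    def __post_init__(self):
        # A span annotation needs both endpoints; a spanless one has neither.
        if (self.start is None) != (self.end is None):
            raise ValueError("start and end must both be set or both be None")
        if self.start is not None and self.start > self.end:
            raise ValueError("start must not exceed end")

    @property
    def spanless(self) -> bool:
        return self.start is None


text = "Yesterday Alice slept."

# Span annotation: characters 10 to 15 are a noun phrase.
noun_phrase = Annotation("NP", 10, 15)

# Spanless annotation: an assertion about the document as a whole.
topic = Annotation("DOC_TOPIC", attrs={"topic": "sleep"})

print(text[noun_phrase.start:noun_phrase.end], noun_phrase.spanless, topic.spanless)
```

The only structural difference in this sketch is whether the two offsets are present; everything else (label, attributes) is shared by both types.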

Annotations have labels, which are strings. Annotations can also have attributes, whose values are restricted to particular types. Your task maintainer will define all your annotations, attributes, and attribute value types and restrictions for you; each task defines the annotations and attributes available for that task. You'll learn more about tasks in a minute.

While you don't need to know most of the details of how annotations are constructed or defined, you do need to know that the possible attribute value types include "annotation" and "list or set of annotations"; in other words, MAT implements relation, event and coreference annotation via attributes whose values are other annotations. The annotations which "host" these annotation-valued attributes may be spanned or spanless. You've already seen examples of these in tutorial 7, and you'll see UI examples in more detail later. If you want all the gory details, you can look here and here.
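As a rough illustration of annotation-valued attributes, the sketch below models a coreference link hosted by a spanless annotation and an event hosted by a spanned one. The Annotation class, the referenced helper, and the labels and attribute names (COREF, EVENT, mentions, agent) are all assumptions made up for this example, not MAT's API or any particular task's definitions.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterator, Optional


@dataclass(eq=False)  # identity-based equality, so annotations can live in sets
class Annotation:
    """Hypothetical stand-in for an annotation; not MAT's own class."""
    label: str
    start: Optional[int] = None
    end: Optional[int] = None
    attrs: Dict[str, Any] = field(default_factory=dict)


def referenced(ann: Annotation) -> Iterator[Annotation]:
    """Yield each annotation that appears as a value of one of ann's attributes."""
    for value in ann.attrs.values():
        items = value if isinstance(value, (list, set, tuple)) else [value]
        for item in items:
            if isinstance(item, Annotation):
                yield item


text = "Alice said she left."
alice = Annotation("PERSON", 0, 5)
she = Annotation("PRONOUN", 11, 14)

# Spanless host: a list-valued attribute links the two mentions.
coref = Annotation("COREF", attrs={"mentions": [alice, she]})

# Spanned host: the event is anchored to "left" and points at its agent.
left = Annotation("EVENT", 15, 19, attrs={"agent": alice})

print([a.label for a in referenced(coref)])
print([a.label for a in referenced(left)])
```

In both cases the link is stored on the host annotation as an ordinary attribute; what differs is only whether the host itself has a span.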

In most circumstances, the name that you'll be given or shown for an annotation is the annotation's label; e.g., if you're applying or looking at a PERSON annotation, the label of that annotation will be "PERSON". However, in some cases, your task maintainer will choose to make this notional label an effective label, which corresponds to some combination of annotation label + attribute value (e.g., ENAMEX type="PER"). As the annotator, you'll have access to the actual label and attribute/value information in the UI, but you'll be shown the effective label in the relevant circumstances (e.g., when you're choosing an annotation to create from your annotation menu in the UI).
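One way to picture an effective label is as a lookup from a (label, attribute, value) combination to the name you're shown. The sketch below assumes a hypothetical mapping table and function name; the pairing of ENAMEX type="PER" with PERSON follows the example above, but how a real task declares this is up to your task maintainer.

```python
from typing import Dict, Tuple

# Hypothetical table: (true label, attribute, value) -> effective label.
EFFECTIVE_LABELS: Dict[Tuple[str, str, str], str] = {
    ("ENAMEX", "type", "PER"): "PERSON",
}


def effective_label(label: str, attrs: Dict[str, str],
                    table: Dict[Tuple[str, str, str], str]) -> str:
    """Return the effective label if one matches, else the plain label."""
    for (base, attr, value), shown in table.items():
        if label == base and attrs.get(attr) == value:
            return shown
    return label


print(effective_label("ENAMEX", {"type": "PER"}, EFFECTIVE_LABELS))
print(effective_label("PERSON", {}, EFFECTIVE_LABELS))
```

An annotation with no matching entry simply falls back to its own label, which is the common case described at the start of the paragraph.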

A subset of the annotations that MAT can define can be processed by jCarafe, the engine that MAT is delivered with:

The jCarafe conditional random field engine can be used to build models for what we'll call simple span annotations: span annotations which have either no attributes, or a single attribute/value pair which defines the effective label.
The jCarafe maximum-entropy classifier can be used to build models for what we'll call fixed-value attributes: attributes whose values must be drawn from an explicit list of possible values.

The jCarafe engine can be used with such annotations to perform the tag-a-little, learn-a-little loop, run experiments, etc. jCarafe can't currently add annotations of greater complexity (i.e., spanless annotations, or annotations with free-text attributes) automatically. This restriction affects the MAT processing engine and experiment harness, and it means that all your more complex attributes and annotations will have to be added by hand. The exception is a task whose maintainer has configured MAT to use an engine which can handle more complex annotations; that isn't simple to do, so unless you're told otherwise, assume these more complex steps are hand-only. For more details about what MAT "out of the box" can and can't do with complex annotations, see here.
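The two definitions above (simple span annotations and fixed-value attributes) can be read as checks on an annotation type. The following sketch encodes them under assumptions: AttrSpec, AnnotationType, and the example types and value lists are hypothetical and do not reflect how MAT or jCarafe actually represent task definitions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class AttrSpec:
    """Hypothetical attribute definition; not MAT's task-file schema."""
    choices: Optional[List[str]] = None      # explicit value list, or None for free text
    defines_effective_label: bool = False


@dataclass
class AnnotationType:
    label: str
    spanned: bool
    attrs: Dict[str, AttrSpec] = field(default_factory=dict)


def is_simple_span(t: AnnotationType) -> bool:
    """Spanned, with no attributes or a single one that defines the effective label."""
    if not t.spanned:
        return False
    if not t.attrs:
        return True
    specs = list(t.attrs.values())
    return len(specs) == 1 and specs[0].defines_effective_label


def fixed_value_attrs(t: AnnotationType) -> List[str]:
    """Names of attributes whose values come from an explicit list."""
    return [name for name, spec in t.attrs.items() if spec.choices is not None]


person = AnnotationType("PERSON", spanned=True)
enamex = AnnotationType("ENAMEX", True,
                        {"type": AttrSpec(["PER", "ORG", "LOC"], True)})
quote = AnnotationType("QUOTE", True, {"comment": AttrSpec()})  # free-text attribute
coref = AnnotationType("COREF", spanned=False)

for t in (person, enamex, quote, coref):
    print(t.label, is_simple_span(t), fixed_value_attrs(t))
```

In this sketch, PERSON and ENAMEX would count as simple span annotations, while QUOTE (free-text attribute) and COREF (spanless) fall into the group that has to be added by hand.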

There's one more concept you'll need. MAT divides the annotations in a task into different sets, and assigns these sets to categories. There are several reserved categories in MAT:

the token category, whose annotations mark the word boundaries in your documents
the zone category, whose annotations mark the regions of your document that you can add annotations to
the admin category, whose members are controlled by MAT and contain administrative information about the document regions (e.g., who annotated them)

All other categories are established by your task, and for convenience, we'll call the annotations in these categories content annotations; these are the annotations you'll be hand-annotating, by and large.
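The category scheme amounts to a simple rule: anything not in one of the three reserved categories is a content annotation. A small sketch of that rule follows; the labels and the task-defined category names ("entities", "relations") are invented for illustration and are not taken from any real MAT task.

```python
from typing import Dict, List

# The three reserved categories named above.
RESERVED_CATEGORIES = {"token", "zone", "admin"}

# Hypothetical task layout: annotation label -> category name.
CATEGORY_OF: Dict[str, str] = {
    "TOKEN": "token",
    "ZONE": "zone",
    "SEGMENT_INFO": "admin",
    "PERSON": "entities",
    "ORGANIZATION": "entities",
    "COREF": "relations",
}


def content_labels(category_of: Dict[str, str]) -> List[str]:
    """Labels whose category is task-defined, i.e. not reserved."""
    return sorted(label for label, category in category_of.items()
                  if category not in RESERVED_CATEGORIES)


print(content_labels(CATEGORY_OF))
```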

If this documentation ever talks about annotations without specifying a category, it's almost certainly talking about content annotations. As a user, you really won't need to know much, if anything, about the other categories.