If you upgrade your version of Java or Python, and you want MAT
to use the new version, the easiest thing to do is rerun the
installer. See the installation instructions for your platform (Unix, MacOS
or Windows native).
The pip module is a small, almost completely self-contained
module (about 350K compressed, relying only on the third-party
Python munkres library) which is intended for people working with
MAT documents outside the bounds of the full application. The pip
module implements the core document model, task, workflow,
scoring, workspace and experiment capabilities of MAT. The
following are not included:
The various steps involved in this incremental, tag-a-little,
learn-a-little loop deserve their own page, and are described here.
You don't need to train a model or automatically tag data to use
MAT. You can use the MAT UI as a pure hand annotation tool. Tutorial 1 covers this process pretty
well, or, if your task involves more than simple span annotations, check
out tutorial 7.
Here's a summary with pointers to more of the details. First, create your task and install it, or use a task
you've already defined. Make sure that this task has a hand annotation step, as
illustrated in the sample 'Named
Entity' task. Then, start the Web
server and load the MAT UI. You
can either manually load and save individual
documents on your local machine, or you can set up a workspace and access it in the UI.
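For example, on Unix, starting the Web server from a terminal might look like the following sketch (the bin directory location and the port number are illustrative; consult the MATWeb documentation for the options your version accepts):

  % $MAT_PKG_HOME/bin/MATWeb --port 7801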
To score simple span annotations (that is, annotations whose main
label is the distinguishing element) against each other, you don't
even need to have a task.
You can use the scorer with the
--content_annotations option.
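As a sketch, assuming the scorer in your installation is invoked as MATScore from the bin directory (only --content_annotations is taken from this answer; the other option names, file names, and labels are illustrative, so consult the scorer documentation for the exact flags):

  % $MAT_PKG_HOME/bin/MATScore --file hyp.json --ref_file ref.json \
      --content_annotations PERSON,LOCATION,ORGANIZATION

Here the comma-separated values are the simple span labels you want scored, in the style of the sample 'Named Entity' task.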
If you're comparing multiple annotation tools, you can run them
outside of MAT and make sure MAT can read their output by
identifying or creating an appropriate MAT reader.
Alternatively, you can write a wrapper for the annotation tool,
and create workflows and model
configurations in task.xml which use the tool (this is
harder, and not really documented).
If you're comparing multiple annotators, you can have them
annotate the files in file
mode and then compare them using the scorer, or you can use
the assignment capability in the workspaces
to have the files multiply-annotated in the context of a
workspace. In the future, you'll be able to score and reconcile
these multiply-annotated files directly in the workspace.
The MAT document visualization and annotation tool is available
as a standalone utility
which does not require the MATWeb
server.
If you have documents for which you don't have a task, the first
step is to make sure that MAT can read them. If the documents aren't
in a known format, you'll
need to write a reader for them (or
convert them, outside MAT, to a format that MAT knows). Once MAT
can read the documents, you can use MATReport
to summarize the annotations which are present in the documents,
and you can use the --create_task option to create a task, which
you can then install. At
that point, you can start MATWeb and
load the documents into the UI. Alternatively, you can load the
documents into the UI while inferring
a task.
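A rough sketch of that sequence, assuming MATReport is run from the installation's bin directory (every option here other than --create_task is illustrative; check MATReport's usage message for the real input and output options):

  % $MAT_PKG_HOME/bin/MATReport --input_files 'docs/*.json' \
      --output_dir report_out --create_task

You can then install the generated task and start MATWeb as usual.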
MAT is an intimidatingly configurable tool, in many respects.
We've tried to exemplify some of the most common use cases for
various aspects of the tool.
You can find a succinct comparison of file and workspace modes here.
Yes. See the --lexicon_dir option to the jCarafe training
engine. But you should be aware of the case
sensitivity of feature specifications.
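To make that concrete, a jCarafe lexicon directory is essentially a set of word lists. The sketch below assumes one file per lexical class, with one entry per line; check the jCarafe documentation for the exact layout, and keep in mind that the casing of the entries interacts with case-sensitive feature specifications:

  % ls lexicons/
  cities  first_names
  % head -2 lexicons/cities
  Boston
  New York

You would then point --lexicon_dir at that directory when training.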
Because the IP address of the machine your web browser is running
on is different from the IP address of the machine that MATWeb is
running on, and by default MATWeb does not permit remote access to
workspaces. You can override this behavior by providing the
--allow_remote_workspace_access option to MATWeb when you start it
up.
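For example, when you start the server (the bin directory location and the port number are illustrative):

  % $MAT_PKG_HOME/bin/MATWeb --port 7801 --allow_remote_workspace_access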
If you've defined your annotations
in your task.xml file, and you're
sure your document contains annotations, but you can't see them in
the UI, the most likely cause is that you haven't assigned any display
features to them.
If you have lots of annotations (including tokens), the document
redisplay can sometimes take a while. We've found that when full
display redraw is triggered, documents that have more than 1000
annotations take a noticeable amount of time to redisplay. We've
attempted to optimize the panel redraw, but full redisplay is
still triggered in a number of circumstances, including when
documents are loaded or returned from a server-side annotation
process, and sometimes when the annotation window width is
changed.
Sometimes, in the UI, in Firefox, you'll notice a URL in the
bottom left corner of the UI; if you move your mouse over it, it
moves to the bottom right corner. This is an artifact of the
particular way that the Yahoo! UI toolkit implements the tabbing
that the MAT UI uses. If this URL annoys you, you can almost
always make it go away by clicking on one of the tab labels in the
UI. We've done our best to eliminate this floating URL, but we
haven't been able to work around all of the places it occurs.