Upgrade notes

If you've received a previous version of MAT, this page contains instructions on how to upgrade to the new version.

Upgrading from version 3.2 to version 3.3

Version 3.3 is almost completely backward compatible with version 3.2. Its primary goal is to future-proof MAT.

New features

Python 3.6+ compatibility. MAT was originally written in Python 2, and support for Python 2 ended in January 2020. Version 3.3 is intended to be fully compatible with both Python 2.x (2.7 and later) and Python 3.6 and later.
Java 9+ compatibility. The Java ecosystem has changed significantly since the release of Java 8. OpenJDK is replacing Oracle Java as the freely available version; the way version numbers are announced by the Java executable is changing; and the new Java module system has introduced a number of complexities in running applications which use Java serialization (such as the way jCarafe saves its models). The MAT installer and Java controller have been updated to deal with these issues, at least up through Java 15.
Compatibility with Python pip. MAT was developed before the days of package managers, and multiple people have asked to use Python in the context of the Python package manager, pip. Version 3.3 is designed to support this goal.
Web server restart on Windows. Version 3.3 features an entirely new Web service infrastructure based on the Python bottle package. This package handles server restart more generally than the original CherryPy package, and as a result, the Web server "restart" command finally allows you to restart the Web server on Windows (and use it in automated restart configurations).
Classification support. MAT now includes previously private enhancements which support classification engines as model builders and workflow steps, the primary use case being learning the values of choice and boolean attributes. The jCarafe maximum entropy classifier is exposed as a useable example.
More powerful annotation replacement in the UI. In version 3.3, when you replace an annotation which is itself the value of an annotation-valued attribute, the attributes of the replaced annotation are copied to the replacing annotation when possible.
Minor improvements to task definitions. Briefly:

MAT's new-style annotation set declarations now include the <annotation_set> element for explicit control of annotation set properties. It is no longer necessary to combine legacy and new-style annotation elements to achieve this.
It is no longer necessary to specify the category of a non-default annotation set after the first time the annotation set is referenced.
When declaring the fillers for annotation-valued attributes, you may now use the "set:" notation to specify all the labels of an annotation set, as long as you don't specify any additional attribute-value pairs.
The new-style annotation set declarations now support UI styling of choice attributes to support classification annotation, and provide the <positive> and <negative> sub-elements of <boolean> attributes to control the UI properties of these attributes.
The new <doc_enhancement> element exposes long-standing but previously undocumented functionality to add documentation for your task to the MAT HTML documentation.
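As an illustration of the Java version-numbering change mentioned under "Java 9+ compatibility" above: Java 8 and earlier announce themselves as 1.x (e.g., 1.8.0_181), while Java 9 and later lead with the major version (e.g., 11.0.2). The sketch below is not the MAT installer's code; java_major_version is a hypothetical helper that shows one way to normalize the two schemes.

```python
import re

def java_major_version(version_string):
    """Return the major Java version for either numbering scheme.

    Hypothetical helper for illustration; not part of MAT.
    """
    # Read the first one or two numeric fields, ignoring suffixes
    # such as "_181" or "-ea".
    match = re.match(r"(\d+)(?:\.(\d+))?", version_string)
    if match is None:
        raise ValueError("unrecognized Java version: %r" % version_string)
    first = int(match.group(1))
    second = match.group(2)
    # Legacy scheme: "1.8.0_181" means Java 8.
    if first == 1 and second is not None:
        return int(second)
    return first

print(java_major_version("1.8.0_181"))  # 8
print(java_major_version("11.0.2"))     # 11
```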

The Python package structure has changed

In order to make the core of MAT compatible with pip, the Python library has been split in two. The vast majority of the library remains in src/MAT/lib/mat/python/MAT, and the remainder, dedicated specifically to jCarafe bindings and the MAT Web server, is found in src/MAT/lib/mat/python/MATApplication. Developers will almost never need this latter library.
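If your own code needs to know which half of the library is importable in a given environment, a check along the following lines may help. This is a sketch only: it assumes the two directory names above are also the top-level package names (MAT and MATApplication), and available_mat_packages is a hypothetical helper, not a MAT function. It uses Python 3's importlib.

```python
import importlib.util

def available_mat_packages():
    """Report which of the two (assumed) top-level packages can be found."""
    names = ("MAT", "MATApplication")
    # find_spec returns None when a top-level package is not on sys.path.
    return {name: importlib.util.find_spec(name) is not None for name in names}

print(available_mat_packages())
```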

Old utilities have been removed

MATUpdateTaskXML and MATUpdateWorkspace2To3 have been removed. They were both for use in migrating from MAT 2.x to MAT 3.x. MAT 3.x is now five years old, and we no longer test these conversion utilities. If you haven't migrated from MAT 2.x, you'll have to download MAT 3.1 or MAT 3.2 and use the conversion utilities in those releases.

Internal subprocess debug management has changed

Previous to 3.3, there was a variable in the ExecutionContext module called _SUBPROCESS_DEBUG which was used internally to control debug levels for subprocess calls. It has been removed in favor of a lazier way of accessing the value which respects live changes to the MAT configuration.

Attributes in task.xml are now Unicode

Previous to 3.3, in most situations the XML text and attribute values in the task.xml file were required to be ASCII-compatible; the only exception was the tokenless_autotag_delimiters attribute. This restriction was always unnecessary, and the migration to Python 3 has compelled its removal. All such text and values in task.xml in MAT 3.3 can take advantage of all of Unicode (with the caveat that certain nonlinguistic Unicode characters might cause problems).

Python 3 counts offsets differently than Python 2 for Unicode characters outside the Basic Multilingual Plane

The new Unicode issues discussion describes the basic problem, which has lurked within MAT for a very long time, but will be exacerbated by the differences between Python 2 and Python 3. If you have annotated documents which contain characters outside the Unicode Basic Multilingual Plane, you will likely not be able to use MAT.
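A minimal illustration of the counting difference, in Python 3. It assumes the other side of the mismatch counts UTF-16 code units, as some ("narrow") Python 2 builds and JavaScript do. The helpers utf16_length and contains_non_bmp are hypothetical names, not MAT functions.

```python
def utf16_length(text):
    """Number of UTF-16 code units in text (the assumed 'other' offset scheme)."""
    return len(text.encode("utf-16-le")) // 2

def contains_non_bmp(text):
    """True if any character lies outside the Basic Multilingual Plane."""
    return any(ord(ch) > 0xFFFF for ch in text)

sample = "a\U0001F600b"        # 'a', one character outside the BMP, 'b'
print(len(sample))             # 3: Python 3 counts code points
print(utf16_length(sample))    # 4: the non-BMP character takes two code units
print(sample.index("b"))       # 2 here, but offset 3 when counting UTF-16 units
print(contains_non_bmp(sample))  # True
```

A check like contains_non_bmp can be run over your document signals to find out whether this issue affects your corpus at all.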

Boolean attribute keyboard accelerators are no longer automatically declared

Prior to 3.3, any boolean attributes were automatically provided with keyboard accelerators because there was no way of specifying them. In 3.3, the <positive> and <negative> sub-elements of the <boolean> attribute declaration in the new-style annotation set declarations now provide hooks for declaring these accelerators explicitly. This feature has not been backported to the legacy declaration method, so in 3.3, if you want keyboard accelerators for boolean attributes, you must upgrade to the new-style declaration format.

The JSON representation of tasks has changed slightly

If you use the MAT standalone viewer, and you've embedded a JSON task description in your application, or if you use the MATAnnotationInfoToJSON utility to generate the JSON task description for this or any other reason, this representation has changed a tiny bit in version 3.3; in conjunction with the previously-described change in the declaration of boolean attribute keyboard accelerators, the JSON representation of accelerators for all attributes has been changed. If you refer to attribute accelerators, or have boolean attributes in your task, you must regenerate the JSON.

Upgrading from version 3.1 to version 3.2

Version 3.2 is almost completely backward compatible with version 3.1. There are a number of new features, and a few non-backward-compatible changes. The most substantial of these is that we've upgraded the version of jCarafe.

New features

Standalone UI improvements: The standalone UI now supports annotation tables and customizations to control tokenless autotagging.
Enhanced autotagging: There's now a new UI setting to support enhanced autotagging (batch delete, batch replace).
Better JSON conversion for tasks: MATAnnotationInfoToJSON has been enhanced to export more features of the task structure.
Better UI customization for tasks: A number of internal reorganizations now better support customizing the UI for specific MAT tasks.
Annotation table enhancements: The annotation table in the UI now has a UI setting to add a summary table for spanned and spanless annotations. If added, these summaries are now the default tables selected. In addition, it's now possible to specify custom sorting functions for annotation table columns via the task.
Attribute annotation enhancements: You can now change the value of boolean attributes from the annotation popup menu in the same way you can change the value of choice attributes.
Better tokenless swipe control: You can now use the expandTokenlessSwipe and trimTokenlessSwipe UI settings to trim or expand your swipe in tokenless annotation according to the tokenless autotag delimiters.
Better MATReport control: You can now use the --show_spanned_text_for_annotation_attribute_values option to show first-level annotation attribute value text directly in the annotation description, rather than having to look it up elsewhere in the report.
Better MATWeb control: You can now use the --web_settings option to provide a MATWeb startup configuration. You can also save a default startup configuration in your MAT distribution.
Reconciliation enhancements: Hand annotation is fully enabled for chosen annotations in the reconciliation pane. It's now possible to arbitrarily modify chosen annotations, including replacing them with other labels and deleting them, and to add new annotations appropriate to the current reconciliation stratum.
More options for annotation display: You can now use the new rendering_style attribute in the annotation display in task.xml to define background spans, which are never stacked and can never appear in front of spans that aren't background spans. This setting is intended to support display of longer structural annotations like paragraphs, sentences, and sections which provide a context for the more specific content annotations. You can also now define menu gestures in the newer annotation definition method using the <d_gesture> element.
More options for document conversion: The document conversion XML syntax has been expanded to support regular expressions in more elements, and to add the ability to use regular expression backreferences in the mapping targets. We've also added the <discard_annot> operator to the <values> context, and cleaned up and expanded the handling of attribute value matching, and enhanced the documentation. Finally, we've also added the <copy_metadata> operation (most of you will never, ever need it).
Documentation search: There's now a search facility for the documentation available when the documentation is viewed via the MATWeb server.
Slightly better custom attribute editor control: Before 3.2, you could specify a custom attribute editor in the UI as updating multiple attributes, but you couldn't specify which attributes it was updating, with the result that unsetting the attribute only cleared one of the attributes that were set. We've added the custom_editor_multiattributes attribute in task.xml to address this issue.
Slightly better custom gesture control: You can now use the writeable_only and attribute_writeable_only attributes in task.xml to control when custom menu gestures associated with a label appear.

jCarafe version has changed

We've updated MAT to the very latest version of jCarafe. You'll need to rebuild your models.

The workspace database has changed (a bit)

In MAT 3.0, the internal structure of workspaces was completely reorganized to fully support workflows more transparently. However, we didn't fully update the way documents in workspaces were assigned to users; unassigned documents could illegitimately be assigned to users after they'd been modified if they had just advanced to a new workspace step, and previously assigned documents couldn't be assigned at all, because that portion of the code hadn't been updated. In MAT 3.2, we've fixed these errors by extending one of the workspace database tables to keep track of which imported workspace files are entirely pristine. When you open a pre-3.2 workspace in MAT 3.2, the workspace database will be updated automatically, and it will no longer be useable in MAT versions before 3.2.

Standalone viewer API has changed

The getDocument() method of the viewer API has been modified to match the behavior of other document panels. This was necessary in order to support annotation tables in the standalone UI. To retrieve the "bare" document which is equivalent to the previous result of getDocument(), use getBareDocument().

Swiping in tokenized text has improved

In previous versions, a swipe in tokenized text in the UI expanded to the nearest token boundaries on the left and right, but did not contract to the nearest boundaries (i.e., if the user had selected peripheral whitespace). We regard this as a bug, and it has been fixed.
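The corrected behavior can be pictured as snapping the swipe to the tokens it overlaps, which both expands and contracts the selection. The following is a sketch of that idea under simple assumptions (tokens given as character offsets with exclusive ends); it is not the MAT UI code, and snap_to_tokens is a hypothetical name.

```python
def snap_to_tokens(swipe_start, swipe_end, tokens):
    """Snap a swipe to the tokens it overlaps.

    tokens is a list of (start, end) character offsets, end exclusive.
    Returns the snapped (start, end), or None if the swipe touches no token.
    """
    touched = [(s, e) for (s, e) in tokens if s < swipe_end and e > swipe_start]
    if not touched:
        return None
    return min(s for s, _ in touched), max(e for _, e in touched)

tokens = [(0, 5), (6, 11), (12, 17)]    # e.g. "hello world again"
print(snap_to_tokens(4, 8, tokens))     # (0, 11): expands to token boundaries
print(snap_to_tokens(5, 12, tokens))    # (6, 11): peripheral whitespace is dropped
```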

Task installation is more conservative

The MATManagePluginDirs utility is now more thorough in how it inspects tasks during validation and installation. As a result, it will now refuse to install tasks whose Python customizations raise import errors during installation, even if these errors would not be raised during normal execution. Previously, installing such a task would report the errors, but still succeed in installing the task, which could be confusing to users.

Tabbed terminal spawning in Web server has changed

Previous versions of MAT were optionally distributed with tabbed terminal packages to provide a tabbed view of the MAT Web server, where the various server logs were displayed separately from the Web server command loop. These tabbed terminal packages were old, and we had not tested MAT with them in a very long time. In version 3.2, we've stopped distributing these packages. Instead, the --spawn_tabbed_terminal option of MATWeb now accepts an argument which is a 4-argument command which the user can provide to run his or her own tabbed terminal of choice. We've provided an example of such a command for the Unix GNOME windowing package in web/examples/gnome_tabbed_web_server_terminal.sh.

Slightly improved, slightly changed UI logging

For some reason, annotation offsets were not often recorded in the UI log. Although we intend the logs to remain anonymous, this information doesn't really compromise the anonymity; little can be gleaned from the location of an annotation gesture beyond the minimum length of the file being annotated. Furthermore, this information was already being recorded for annotation gestures which modify the extent of an annotation. In 3.2, if a gesture affects an annotation, the offsets are recorded in the UI log.

In addition, the gesture_type column in the UI log has been changed to gesture_method, and the modify_annotation event has been changed to modify_attribute. The remove_annotation_failed action has been changed to remove_annotation_failure, and the parameters have been changed to align with other _failure actions.
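If you have analysis scripts written against pre-3.2 UI logs, one way to cope with the renames is to normalize old rows to the 3.2 names before processing them. The sketch below assumes each log row has been loaded as a dictionary (for instance with csv.DictReader). Because this page does not say which columns hold the event and action names, it renames matching values wherever they appear. It does not attempt to convert the changed parameters of remove_annotation_failure.

```python
# Renames described above; everything else passes through unchanged.
COLUMN_RENAMES = {"gesture_type": "gesture_method"}
VALUE_RENAMES = {
    "modify_annotation": "modify_attribute",
    "remove_annotation_failed": "remove_annotation_failure",
}

def normalize_row(row):
    """Return a copy of a pre-3.2 log row (a dict of strings) using 3.2 names."""
    return {
        COLUMN_RENAMES.get(column, column): VALUE_RENAMES.get(value, value)
        for column, value in row.items()
    }

# The "action" column name and the "menu" value are hypothetical placeholders.
old_row = {"gesture_type": "menu", "action": "remove_annotation_failed"}
print(normalize_row(old_row))
# {'gesture_method': 'menu', 'action': 'remove_annotation_failure'}
```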

A new select_tab action has been added to the UI log.

Reconciliation logging has been completely reorganized. All annotation creation and modification gestures are now captured normally in the log, as well as vote actions.

Reconciliation interface has changed slightly

The buttons in the reconciliation table have been renamed and reorganized.

--subprocess_statistics common option has been removed

The third-party package which supported the --subprocess_statistics common command-line option only ever worked on Linux, and was rarely included in the distribution. It has been removed for simplicity and future maintainability.

Distribution layout has changed slightly

Third-party dependencies are no longer distributed unzipped in the src/ subdirectory; they're now found in the third_party directory, as zips, and unpacked in third_party/install during the installation process. The MANIFEST file has also been removed; the information is now inferred from the distribution during the installation process. This change should be largely invisible to the user; the only consequence is that the MAT 3.2 redistribute.py utility can now be used only with MAT 3.2 or later, and the version of the utility distributed in previous MAT releases can be used only with releases previous to MAT 3.2.

Upgrading from version 3.0 to version 3.1

Version 3.1 is completely backward compatible with version 3.0. Plus, there are a number of new features you can enjoy.

New features

Guided navigation mode: It's now possible to navigate through annotation editors in a guided mode almost entirely from the keyboard. This mode will be of interest to experienced annotators interested in accelerating their pace of annotation.
More concise alternative annotation declarations: The original annotation declarations in MAT were organized to expose the power available in the annotation mechanisms in MAT; but they don't make the easy things easy. We've created an alternative declaration system which renders the simplest cases significantly easier to define. This system is exemplified in the documentation on creating a new task and about the sample tasks.
Crossvalidation in the experiment engine: It's now possible to use the experiment engine for crossvalidation experiments.
Keyboard accelerators for attributes: It is now possible to define keyboard accelerators for attributes that can be set via the annotation popup menu as well as for labels.
Options to save, load and reuse UI settings: It's now possible to save your MAT UI settings as browser cookies, and as XML files for use when you start up the server, or for later loading from within the UI.
Inferred tasks in the UI: It's now possible to infer a UI task from a document, edit the task, augment it by loading other documents using that same task, and then save the task as an XML file for later editing and installation with MATManagePluginDirs. This is useful when you have a reader for a document format, but you don't know what annotations are in the document. This capability has been available for a while in MATReport; the version in the UI is more flexible and interactive.
Overlapping annotations are now always stacked: When hand annotation is available in the MAT UI, overlapping annotations are stacked vertically, rather than layered on top of (and thus obscuring) one another. Previously, however, when hand annotation was not available, overlapping annotations were not stacked. Now, we've introduced a UI setting to control whether annotations are stacked or not when hand annotation is not available, and we've changed the default behavior to stack the annotations. In addition to being visually clearer, this default eliminates document redraw when transitioning in and out of hand annotatable steps.
Overlap management: In spite of the new "always stacked" default, there are still a number of situations in the MAT UI where annotations overlap, including in the reconciliation and comparison views. Before MAT 3.1, there was no good way to control the overlap order; the label definition order determined it, but the label definition order also determined the order of labels in the annotation popup menu. Furthermore, the MAT UI had no access to this ordering information. In MAT 3.1, the overlap order is now explicitly definable, with the new "overlap_rank" attribute of the annotation display configuration, and the UI explicitly tracks this information. This overlap management supports a number of new capabilities:

the annotation popup menu now has "Bring to front" and "Bring <annot> to front" operations to manually control overlap
the UI no longer complains about ambiguous annotations when you click on an overlapping annotation, but rather presents an annotation popup for the frontmost annotation and allows you to bring the others forward
when you hover over an annotation in the UI, the frontmost annotation description is now presented in boldface

More match/nonmatch information in the UI comparison view: Before MAT 3.1, the non-reference annotations in the spanless sidebar were decorated with a small green (match) or red (nonmatch) square at the left edge of the annotation bar, but this facility was not available in the main palette. This was usually not a problem, because typically the label and/or extent of the nonmatching non-reference annotation would differ, but when only attributes differed, there was no good feedback to distinguish it from the case where the non-reference annotation matched. In MAT 3.1, the feedback is enhanced in two ways: first, when you hover over a non-reference annotation, the mouseover information will show whether the annotation matches the reference or not, and second, the iconography from the spanless sidebar has been extended to the main palette (thanks, in part, to the new overlap management).

Upgrading from version 2.0 to version 3.0

New features

Multiple content annotation sets: MAT added the concept of annotation categories and sets in 2.0. In 3.0, MAT allows you to segregate the annotations you add into different annotation sets, so you can add them with different engines, partition your hand annotation task, etc. This is supported by MAT's completely reorganized task.xml specification.
Expanded support for attribute-only annotation sets: In 2.0, MAT added the concept of annotation sets which define attributes of labels which are defined in other annotation sets. In 3.0, we've expanded support for this feature, especially in MATScore. In particular, we've added the <attrs_alone> element to score profiles, so you can see how individual attribute matches and clashes contribute to the overall scores.
Multiple, dynamically derived workspace configurations: In 2.0, MAT allowed only one workspace configuration per task. In 3.0, MAT can create a workspace out of any workflow which has at least one hand-annotatable step.
Multiple languages per task: In 2.0, the language-relevant task features (e.g., text direction, default models) were specified in such a way that only one language could be used per task at a time. In 3.0, languages are cleanly segregated and declared, so that you can use multiple languages per task.
More flexibility in the experiment engine: you can now perform runs in the experiment engine against multiple trained models for multiple workflow steps, or without any models at all (e.g., for baseline comparisons).
Relation reconciliation: In 2.0, MAT featured a reconciliation tool which was based on a limited approach to reconciliation, which did not support spanless annotations and relations. MAT's 3.0 reconciliation tool is based on its scoring and comparison algorithm, which allows you to reconcile virtually any annotation type.
Reconciliation and human review in workspaces: MAT now supports reconciliation and human review in workspaces.
Crossvalidation as a workspace reconciliation input: MAT now supports crossvalidation as an input to reconciliation in workspaces.
All workspace files can be opened: In 2.0, you could not open a file in workspace mode in the UI if it was not assigned to you or was locked by another user. In 3.0, these files can be opened read-only.
Dynamic tagger services: In 2.0, the MATWeb server determined its available taggers at startup; subsequent model builds were not recognized. In 3.0, the MATWeb server makes this determination at the time of the tagging request.
Automatic task generation in the reporter: The MATReport tool now supports the option of creating a task to view the documents you're reporting on. This option is especially useful when you're exploring sets of annotated documents which you haven't annotated with MAT.
Faster annotation pane updates: In 2.0, the entire annotation pane was redrawn when an annotation was added, removed, or modified. In 3.0, the redraw has been localized so that only the affected regions are redrawn. This leads to significant speed upgrades when annotating large files.
Legend controls: The annotation legend in the UI has been augmented with menu controls which allow you to deactivate active annotations or annotation sets, render them invisible, or select single labels for immediate annotation, skipping the annotation popup menu.
UI settings: The UI now features a wide range of settings which allow you to customize its behavior. These settings can be modified from within the UI, and the default settings can be passed as an XML file to MATWeb at startup.
UI annotation menu gestures: you can now augment the annotation edit menu with URLs (e.g., search Google with this span) and custom Javascript actions.
Changing choice attributes from the annotation edit menu: the annotation edit menu now allows you to modify choice attributes directly, rather than having to open the annotation editor itself.
Support for CherryPy 3.2: for those users who receive a version of MAT without its third-party dependencies, support for CherryPy 3.2 has been added.
Global default temp directory: MAT 3.0 allows you to specify, in MAT's global configuration file, a default directory to use for temporary files instead of /tmp.

Completely reorganized task.xml

In order to support the expanded task configuration features, we've completely reorganized the task.xml file. We've introduced the concept of an engine; completely reworked the way workspaces are configured and built; and localized optional and obligatory pretagging in the definition of steps. You can read more about the new task organization here and here. The MATUpdateTaskXML tool is designed to do most, if not all, of this work for you (and should tell you what it can't do as it tries to do it). Your first step in updating to 3.0 should be to run this tool before installing your (updated) task.

Workflows are now not undoable by default

One of the confounding aspects of the 2.0 implementation of workflows and steps was that the task required a global "undo" order, in order to ensure that when steps were undone in a workflow, all appropriate steps in the task were undone (e.g., if you undid tag and zone in a workflow which doesn't contain tokenization, and some workflow in the task contained tokenization between zone and tag, tokenization was undone as well). This global undo order was impossible to maintain in 3.0, and as a result, it has been abandoned. If you undo steps in a workflow, only those steps will be undone, and as a result, your documents can end up in unusual states (e.g., tokenized but not zoned). In order to compensate for this issue, in 3.0, workflows, by default, are not undoable; the "retreat" buttons in the UI will not be present, for instance. You can specify workflows as undoable using the new "undoable" attribute of the <workflow> element in your task.xml file, but we encourage you to use it sparingly; you should only enable this feature for workflows which support all your tagging steps (content and otherwise).

Workspace database and structure changes

Workspaces now explicitly record the workflow they apply and the language they're supporting. In addition, workspaces have a new "review" folder, for human review, and the contents and metadata associated with reconciliation folders are completely different. As a result, your 2.0 workspaces must be updated to 3.0 (after you've updated your task) using the MATUpdateWorkspace2To3 tool.

Demo has been removed

In 2.0, MAT had a demo configuration capability, which we decided we couldn't afford to maintain. It's been removed in 3.0.

UI file mode advancement buttons have changed, and the "Mark gold" is gone

In 2.0, file mode featured two buttons: a forward arrow to complete the current step, and a backward arrow to undo the current step. In 3.0, we've clarified and sorted out what these operations do, and how they interact with marking gold. MAT now features four types of annotation steps, which are specified in your task.xml file:

hand steps, which only allow hand annotation
auto steps, which only allow automated annotation (e.g., zoners and tokenizers)
auto-with-correction steps, which require automated annotation followed by optional hand correction
mixed steps, which support optional automated annotation plus hand annotation

The available advance/retreat buttons vary depending on which step you're currently in. A hand icon refers to hand annotation, a gear icon refers to automated annotation, a right arrow refers to marking the step gold, and a left arrow indicates retreating; coupled with a hand icon, the left arrow indicates retreating into the most recent hand annotation phase, rather than undoing the current or previous step. You can find more details here.

The meaning of SEGMENT status has changed, and SEGMENT annotations have changed

In 2.0, MAT introduced SEGMENT annotations, an administrative annotation type which tracks annotation progress. These SEGMENT annotations referred to the document in general; e.g., a "human gold" status indicated that the document was marked gold. In 3.0, these statuses refer to particular annotation sets; so when you mark a document gold in an annotation step (and you can have multiple hand-annotatable steps), you're marking gold the sets associated with the step. In other words, MAT is now tracking (properly, we think) the annotation status of your annotation sets. To support this, each SEGMENT annotation has a "set" attribute, indicating what annotation set it refers to.

Administrative information in MAT documents has changed

The change to SEGMENT annotations is only one way the administrative information in MAT-JSON documents has changed. Another way is that MAT-JSON documents now record their overall progress in terms of the annotation sets that have been added, rather than the workflow steps that have been applied. When 2.0 MAT-JSON documents are read into a 3.0 tool, all this administrative information is automatically updated. As a result, documents saved in 3.0 cannot be used in 2.0.

MATEngine options have changed

As part of the various 3.0 changes, the options to MATEngine have changed slightly:

The --mark_gold option to the zone step (misnamed because it actually marked documents reconciled) has been removed, and MATEngine itself now has two new options, --mark_gold and --mark_reconciled, each of which takes a comma-separated sequence of step names as arguments. These options can be used to mark steps gold or reconciled, regardless of whether the step is applied via the --steps option.
MATEngine now accepts a --language option.
MATEngine now accepts a --fresh_task option, which clears out all the administrative information in the document and infers it directly from the content in the context of the specified task. You can use this option if, e.g., you're reusing annotation labels between tasks, but the annotation sets to which these labels belong differ between tasks.

As part of the change in administrative information, 2.0 administrative information that can't be updated when the document is read is discarded. If, for instance, you've edited your task to split up your content annotations into multiple sets (say, span annotations and relation annotations), the SEGMENT statuses can't be updated consistently, and "human gold" or "reconciled" statuses will be discarded. You can use the new --mark_gold and --mark_reconciled options of MATEngine to fix this.
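If you drive MATEngine from a script, the comma-separated form of these two options can be assembled as follows. This is a sketch only: mark_options is a hypothetical helper, the step name "relations" is invented for illustration, and the result is just a fragment that would need to be combined with the rest of your MATEngine command line, which is not shown here.

```python
def mark_options(gold_steps=(), reconciled_steps=()):
    """Build the argument fragment for --mark_gold and --mark_reconciled."""
    args = []
    if gold_steps:
        # Each option takes a single comma-separated list of step names.
        args += ["--mark_gold", ",".join(gold_steps)]
    if reconciled_steps:
        args += ["--mark_reconciled", ",".join(reconciled_steps)]
    return args

print(mark_options(gold_steps=["tag"], reconciled_steps=["tag", "relations"]))
# ['--mark_gold', 'tag', '--mark_reconciled', 'tag,relations']
```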

--tagger_local and --tagger_model have changed

Because workflows can now have multiple content annotation steps, the --tagger_local and --tagger_model flags have been replaced. These flags can now be specified as --<step_name>_local and --<step_name>_model, where <step_name> is the name of the step. E.g., if one of your tag steps is named "carafe_tag", these flags would be --carafe_tag_local and --carafe_tag_model. When you declare <run_settings> within a <step> of your <workflow> element in task.xml, these flags should be referenced as "local" and "model". The rule of thumb is: in the context of the engines, a prefix is required; in the context of a step, it is forbidden.

All jCarafe option names are now prefixed

Because you can now use jCarafe as a trainable tagger in multiple steps in your workflow, (almost) all the option names associated with the jCarafe tagger are now prefixed in the same way that "local" and "model" are. E.g., if you want to change the recall/precision balance for the carafe_tag step, you must now use --carafe_tag_prior_adjust instead of --prior_adjust in the context of the workflow or engine, or refer simply to "prior_adjust" in the context of declaring a workflow step.
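The prefixing rule from the last two sections can be summarized in a few lines. The function option_name is a hypothetical helper written for this illustration, not part of MAT. It does not know which jCarafe options are the exceptions to the "(almost) all" rule.

```python
def option_name(option, step_name=None):
    """Return the name to use for a step option.

    With a step name: the command-line form, --<step_name>_<option>.
    Without one: the bare form used inside a step declaration.
    """
    if step_name is None:
        return option
    return "--%s_%s" % (step_name, option)

print(option_name("model", "carafe_tag"))         # --carafe_tag_model
print(option_name("prior_adjust", "carafe_tag"))  # --carafe_tag_prior_adjust
print(option_name("prior_adjust"))                # prior_adjust
```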

Workspace operations have changed

Before MAT 2.0, workspaces had a "prep" operation, an "import" operation and an "autotag" operation. In 2.0, we removed the "prep" operation and folded it into the "import" operation. In 3.0, any human-annotatable workflow can serve as the basis for a workspace, and the goal of the workspace is to advance automatically to the next human-annotatable point. As a result:

The "autotag" operation has been removed.
The new "advance" operation moves documents to the next human-annotatable point. It advances through all auto steps, and applies the automated tagging phase of any mixed or auto-with-correction steps, if those steps can be applied (e.g., if they're not trainable, or if they're trainable and a model has been built).
The "markgold" and "import" operations will, by default, apply the automatic advancement. So unlike in 2.0, if a mixed step following a sequence of initial auto steps has a model available, that model will be applied (i.e., the document will be pretagged) on import.
The "create" operation supports the --language option (in case the task supports more than one language), the --workspace_config option (in case the task contains more than one workspace configuration or hand-annotatable workflow, and no default is provided), and the --similarity_profile option (to specify the appropriate similarity parameters for reconciliation).
The "import" operation has replaced the --users option with the --status_user and --assign_to_users options, to disentangle the specification of users for administrative status updates and users for assigning documents. The --assign option has been removed. The --document_status option has been replaced by the --step_status option.
A number of operations ("import", "markgold", and others) now accept the --suppress_advancement and --defer_reconciliation options, to cancel some of the automatic advancement options applied by default.
The roles available to the "add_roles", "remove_roles" and "register_users" operations have changed, due to the change in how reconciliation works in the workspaces. The available roles are now "annotator" and "reviewer".
The "modelbuild" operation now accepts a --trainable_step option, which is required if the workflow contains more than one trainable step.
The "run_experiment" operation no longer accepts the --workflow option. The --tag_step option has been replaced by the --test_step option (obligatory if the workspace workflow has more than one trainable step). The --test_document_statuses option has been replaced by the --test_step_statuses option.
The "configure_reconciliation" and "submit_to_reconciliation" operations have been removed, since reconciliation is managed differently.

The workspaces now also feature a number of new operations to manage reconciliation and review; see the workspace documentation for details.

Experiment XML and MATExperimentEngine have changed

In order to support the various new workflow and workspace features, the following elements of the experiment XML have changed:

The toplevel <experiment> element now has an optional "language" attribute, for those circumstances where the task supports multiple languages. MATExperimentEngine also supports the --language option, if you don't need or want to specify it directly in the XML.
The <build_settings> element no longer accepts the "model_class" attribute, and replaces it with the "trainable_step" attribute, which should be the name of a trainable step in your task. This attribute can be omitted if your task contains only one trainable step.
The <workspace_corpora> and <workspace_corpus> elements have changed. The <workspace_corpora> element may specify a "step" attribute, which must be specified if the workspace workflow contains more than one trainable step; and the "document_statuses" attribute has been replaced by "step_statuses".
The scoring phase of the experiment will produce scores only for the content annotations added or modified by the trainable step for which the experiment models were built.
The "model" attribute of the <run> element now supports a comma-separated sequence of model set names, so you can create experiments which test the application of multiple steps. The model sets must be trained on the same corpus and partition.

Unless you've used the model_class or <workspace_corpora> features in your experiment, you should not notice these changes in moving from 2.0 to 3.0.

Annotation attribute filling in the UI has been improved

When you fill the value of an annotation attribute in the UI, you have new, more streamlined options in 3.0. First, we've introduced the idea of an "active" annotation editor, if multiple annotation editor windows are open; from the annotation popup menu, you can now add annotations you click on as attribute values in the active annotation editor without returning to the editor window. Second, if your attribute is a set or a list, you can add multiple values without re-enabling the attribute for filling for each value.

Placement of spanless sidebar icons has improved

In 2.0, the default location for spanless annotations without any annotation-valued attributes of their own was at the top of the document. This placement turned out to be problematic for certain spanless annotations (e.g., those representing implicit argument fillers). In 3.0, spanless annotations which have no implicit span information, but are attributes of elements with implicit span information, are positioned next to the elements which point to them.

UI logging output has been changed

A few of the log entry names have been changed, and a number of obsolete entries have been removed.

distinguishing_attribute_for_equality has been removed

The distinguishing_attribute_for_equality attribute in the task.xml file was used pre-2.0 as an input to scoring, and in 2.0 as an input to reconciliation. In 3.0, it's been completely superseded by the similarity configurations, and has been removed.

Upgrading from version 1.3 to version 2.0

New features

All non-automated components now handle annotation attributes: Attributes can be strings, ints, floats, booleans, or other annotations, or set or list aggregations of these types. Strings and ints support choice lists; annotation attributes support type restrictions; ints and floats support range restrictions. All attributes other than annotations support default values.
All non-automated components now handle spanless annotations: These annotations have no direct anchor in the text, and can be used to model relations, coreference entities, and other elements.
Scorer now handles document-internal overlaps: In version 1.3, MATScore did not deal with documents which contained internal content annotation overlaps. In 2.0, we've implemented a sophisticated annotation matching algorithm which addresses this issue.
Scorer is now customizable in task.xml: You can declare which dimensions of annotations will be compared when annotations are compared; how those dimensions will be compared; and what the weight of each dimension is.
New transducer tool now available: In previous versions, if you simply wanted to convert documents from one format to another, you'd use MATEngine. The problem with MATEngine is that it requires a task; it only succeeds if all documents can be processed; and it requires a workflow. There's now a new tool, MATTransducer, which addresses all these issues. The transducer also supports a new XML-driven annotation conversion language.
All tools now support better temporary file management: In 2.0, every command-line tool which invokes a subprocess (e.g., the jCarafe tagger) now takes the --tmpdir_root and --preserve_tempfiles options, which gives you better control over debugging and the placement of temporary files created during processing.
MAT JSON format has been expanded: To support its expanded annotation and attribute model, MAT 2.0 now uses version 2 of the MAT JSON document format by default. All readers recognize previous versions of the format as well. We've also introduced a new writer, mat-json-v1, to allow users to save documents in the format used by MAT 1.3.
New span reconciliation capability: In MAT 2.0, we've deployed a tool for reconciliation of simple span annotations. This tool will be replaced by a general-purpose reconciliation tool in the next version of MAT.
New standalone document viewer and annotation tool: You can now embed a standalone version of the MAT document viewing component in your own Web application. This viewer can be enabled for hand annotation, and it also supports document comparison.
New capabilities in MATReport: Expanded support for annotation attributes, including the ability to generate per-label expanded report spreadsheets.
Hand annotation now supports adding overlapping annotations: This corrects an enormous deficiency in previous versions of MAT. While annotating, the overlapping annotations are also vertically stacked, ensuring that they're visible.

No Cygwin support

Support for Cygwin has been removed, because Python in Cygwin does not support sqlite, and sqlite is required for the MAT workspaces in 2.0. Migrate to Windows native.

Python 2.6 or later required

MAT 2.0 makes extensive use of JSON and sqlite, which are best supported in Python 2.6 or 2.7. It also relies on Python's "with" statement, which is enabled by default starting in 2.6.

Task.xml schema has changed

Because MAT now explicitly defines the annotations and well-formedness conditions for attributes separately from its display information, the task.xml file has been reorganized. You can use the MATUpdateTaskXML tool to update your task.xml file automatically.

The <tags> element has been replaced by the <annotation_set_descriptors> and <annotation_display> elements. These new elements are quite different than the old ones. If you're receiving MAT as a zip file distribution with tasks included, your tasks have been updated.
Because the UI has been completely redesigned, the <web_customization> element no longer accepts the default_tag_window_position and default_tag_window_size attributes.
The tagging_step attribute of <step_implementations> is no longer accepted (or needed).
Because of changes in the implementation of workspaces, the tagprep operation has been replaced by the import operation, and the list of steps required for the tag operation is now only "tag".

All models must be rebuilt (new version of jCarafe)

The version of jCarafe which is delivered with MAT 2.0 is 0.9.8.5.b-06, which has a different model structure than the version delivered with 1.3. You must rebuild all your models, either using MATModelBuilder (in file mode) or the "modelbuild" operation of MATWorkspaceEngine (in workspace mode).

UI has been completely reorganized, with a new URL

The 1.3 UI used a desktop-in-a-browser metaphor, which raised a number of issues, including poor use of screen real estate. In 2.0, we've completely reorganized the UI, and changed the URL.

mat_controller.sh is replaced by the --spawn_tabbed_terminal option of MATWeb

In previous releases, you really didn't have the option to pass any command-line options to the MATWeb server running under the tabbed terminal. As the command-line options to MATWeb expanded, and became more important, this turned out to be a bad idea. As a result, we've now reorganized the tabbed terminal startup so that it's part of MATWeb. The mat_controller.sh application is gone. The Windows mat_controller.bat script is still present, but it simply invokes MATWeb with the --spawn_tabbed_terminal option.

Workspaces have been completely reorganized

We have completely reorganized the internal structure of workspaces for 2.0. These new workspaces are more powerful and impose fewer requirements on the user. Your MAT 1.3 workspaces cannot be used with MAT 2.0 without modification. We've provided an upgrade tool which will allow you to convert your MAT 1.3 workspaces to MAT 2.0.

The new workspaces feature many fewer folders; a SQLite database which manages the document state information; real transaction and file locking; document assignment, potentially to multiple annotators; extensive logging capabilities; and infrastructure for future capabilities like reconciliation and complex reconciliation workflows, prioritization queues, and segment-by-segment annotation.

As a result of this change, it's no longer possible to run an experiment against a workspace by pointing to, e.g., the "completed" folder. So as part of this change, there's now special support for running experiments against workspaces, both from MATWorkspaceEngine and MATExperimentEngine.

Scorer output ranges have changed

In version 1.3, recall, precision and f-measure were all scaled from 0 to 100. In 2.0, they're scaled from 0 to 1.
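
If you need to compare scores saved from 1.3 with scores produced by 2.0, divide the old values by 100. This is a minimal sketch; the function name and the dictionary keys are illustrative and are not MATScore column names.

```python
def rescale_1_3_scores(scores):
    """Convert 1.3-style scores (0 to 100) to the 2.0 scale (0 to 1)."""
    # Keys are hypothetical labels chosen for this example.
    return dict((name, value / 100.0) for name, value in scores.items())


old_scores = {"precision": 87.5, "recall": 50.0}
new_scores = rescale_1_3_scores(old_scores)
```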

CSV spreadsheet management in MATScore and MATExperimentEngine has changed

MATScore and MATExperimentEngine have long supported writing one of three CSV file formats (Excel formulas, OpenOffice formulas, and no formulas). In 2.0, you can now write multiple formats in the same run, and the name of each CSV file clearly indicates the formula type. As a result, the --no_csv_formulas and --oo_separator command-line options have been removed, and replaced with --csv_formula_output.

MATScore --tag_span_details renamed

Because the scorer now provides mismatch details for all conditions, this flag has been renamed to --tag_output_mismatch_details.

MATScore spreadsheet output has changed

Due to enhancements to the scorer, some of the columns in the output spreadsheets have been renamed or moved, and others have a slightly different interpretation. See the MATScore documentation for full details.

Command-line options to MATWorkspaceEngine have changed

In previous releases, we deprecated, but retained, the "operate" operation in MATWorkspaceEngine. This operation has finally been removed in 2.0. If you had still been doing something like this:

% $MAT_PKG_HOME/bin/MATWorkspaceEngine mydirectory operate core modelbuild

you should now do this:

% $MAT_PKG_HOME/bin/MATWorkspaceEngine mydirectory modelbuild core

See the workspace documentation for more details.

Upgrading from version 1.2 to version 1.3

Web server security has been improved

We now provide a separate document on Web server security as it pertains to workspace access. There are a number of new options to MATWeb to support improved security. The most visible effect is that you can restrict access to workspaces from the MAT UI by using the --workspace_container_directory option when you start up the MAT Web server.

Attribute to control right-to-left text display has moved

In version 1.2, the text_right_to_left attribute lived on the workflow element in task.xml; we anticipated that different workflows might be used for different languages within the same task. Since then, we've realized that the task is going to be the appropriate level of encapsulation for language differences for the foreseeable future. Furthermore, the current implementation of right-to-left encoding did not work appropriately with workspaces. Accordingly, we've moved this attribute to the web_customization element, and it is now global to tasks.

Corpus size iteration has changed

The experiment engine has now been extended with general-purpose iterators for sets of values and for value increments. So it's now possible, for instance, to vary the number of model iterations from 20 to 100 by increments of 10 without having to write a separate model set specification for each possible value. These iterators can be combined, in which case you'll get the cross-product of the possible value settings, or you can define your own iterators to get more sophisticated behavior (e.g., iterating over pairs of attribute-value sets). For the user, this means that a couple of attributes have been removed from the experiment engine, and a new set of elements and attributes has been added.
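
To illustrate what "cross-product" means when iterators are combined, here is a small Python sketch. It does not use the experiment engine; the second setting (feature_specs and its values) is made up for the example.

```python
import itertools

# An increment-style iteration: 20 to 100 in steps of 10 (nine values).
max_iterations = range(20, 101, 10)
# A value-set iteration over a hypothetical second setting.
feature_specs = ["spec_a", "spec_b"]

# Combining the two yields every pairing of the values.
combinations = list(itertools.product(max_iterations, feature_specs))
```

Nine iteration counts combined with two values give eighteen settings, which is why combined iterators can multiply the number of models built.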

In version 1.2, all you could iterate on was corpus size. The mechanism for this iteration has now changed. In version 1.2, this is what you'd do:

  [...]
  <model_sets dir="model_sets">
    <build_settings training_increment="4" 
                    truncate_to_increment="yes"/>
    <model_set name="test">
      <training_corpus corpus="test" partition="train"/>
    </model_set>
  </model_sets>
  [...]

In version 1.3, it looks like this instead:

  [...]
  <model_sets dir="model_sets">
    <corpus_settings>
      <iterator type="corpus_size" increment="4"/>
    </corpus_settings>
    <model_set name="test">
      <training_corpus corpus="test" partition="train"/>
    </model_set>
  </model_sets>
  [...]

You can see that the size processing has been removed from the <build_settings> and added to a new <corpus_settings> element, which contains an instance of the new <iterator> element to specify the type of the iteration. See the documentation and examples for the experiment engine for more details. Note that in version 1.2, you had to specify explicitly that the iteration ends on an increment exactly; in 1.3 this is the default, and to force the final corpus size to be used, you'll need the force_last attribute:

  [...]
  <model_sets dir="model_sets">
    <corpus_settings>
      <iterator type="corpus_size" increment="4" force_last="yes"/>
    </corpus_settings>
    <model_set name="test">
      <training_corpus corpus="test" partition="train"/>
    </model_set>
  </model_sets>
  [...]
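
The effect of force_last can be pictured with the sketch below. It reflects our reading of the paragraph above, not the engine's actual code, and it assumes that the iteration starts at one increment; the function name corpus_sizes is hypothetical.

```python
def corpus_sizes(total, increment, force_last=False):
    """List the training corpus sizes an increment iteration would visit."""
    sizes = list(range(increment, total + 1, increment))
    # Without force_last, the iteration stops on the last exact increment.
    # With it, the full corpus is added as a final step if it was skipped.
    if force_last and (not sizes or sizes[-1] != total):
        sizes.append(total)
    return sizes


default_sizes = corpus_sizes(10, 4)
forced_sizes = corpus_sizes(10, 4, force_last=True)
```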

Experiment spreadsheet columns have been expanded

The experiment engine output spreadsheets have been slightly expanded to include information about the run and model "families" in addition to the actual run and model. This change follows from the introduction of general iterators described above. See the documentation on MATExperimentEngine for details.

Experiment directory structure has changed

In order to support the iterators in the experiment engine, we've reorganized the structure of the experiment directory somewhat. See the documentation on MATExperimentEngine for details.

Upgrading from version 1.1 to version 1.2

New native Windows port

It is now possible to run MAT in Windows without Cygwin installed.

Single distribution bundle for all platforms

Unlike previous versions, there is a single distribution bundle for MAT 1.2 for all supported platforms. For compatibility with Windows, this bundle is now a zip file.

New tabbed terminal for Windows

If you use mat_controller.sh or mat_controller.bat under Windows, you'll find that there's a new tabbed terminal tool we're using, which has the advantage of not requiring Cygwin.

New version of Terminator.app for MacOS X 10.6

If you're using mat_controller.sh under MacOS X, and you intend to install 10.6, note that the previous version of Terminator.app, which supports the tabbed terminal behavior in mat_controller.sh, will not work in 10.6; you must install the newer version provided with MAT 1.2.

Tokenizer has changed

In version 1.2, the original OCaml tokenizer and Carafe trainer/tagger have been replaced by the Java reimplementations. There are a number of important changes that are required as a result. Among other things, the Java tokenizer produces slightly different token boundaries than the original OCaml tokenizer. This is problematic because the entire basis of most annotation systems, including MAT, is the subdivision into words (tokens). In order to have optimal performance, the tokenization of documents which are to be automatically tagged should match the tokenization of the documents which were used to create the tagger model. This means that in order to migrate from version 1.1 to version 1.2, among other things, you must retokenize your documents and update any references to the OCaml tokenizer.

First, to retokenize your documents, we've provided the new MATRetokenize utility. Please back up your data before you run this utility.

Next, if you refer to a tokenization step implementation in your task.xml file, you must change all occurrences of MAT.PluginMgr.CarafeTokenizationStep to MAT.JavaCarafe.CarafeTokenizationStep. You may also need to specify the heap_size attribute on the relevant tokenization <step> in any workflow, if it turns out that the default Java heap size isn't large enough for your purposes (this attribute can also be specified on the command line; see the Carafe engine documentation).

Trainer/tagger has changed

First, retokenize your documents using MATRetokenize, as described above, and update your tokenization steps.

Next, update your tagger and trainer settings in task.xml according to the documentation provided for the Carafe engine.

Next, if you refer to a tagging step in your task.xml file, you must change all occurrences of MAT.PluginMgr.CarafeTagStep to MAT.JavaCarafe.CarafeTagStep. You may also need to specify the heap_size attribute on the relevant tag <step> in any workflow, if it turns out that the default Java heap size isn't large enough for your purposes (this attribute can also be specified on the command line; see the Carafe engine documentation). Similarly, if you have a <model_build_settings> entry, you must change all occurrences of MAT.CarafeModelBuilder.CarafeModelBuilder to MAT.JavaCarafe.CarafeModelBuilder, and possibly specify the heap_size attribute as well. (Note below that you must also change the syntax of <model_build_settings>.)
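
The class renames for the tokenization step, the tag step and the model builder are plain text substitutions, so they can be scripted. The following is a minimal sketch (the helper name is hypothetical) that operates on the text of a task.xml file; back up the file first, and note that it does not change the syntax of <model_build_settings>.

```python
# Old class name -> new class name, as listed in the text above.
CLASS_RENAMES = [
    ("MAT.PluginMgr.CarafeTokenizationStep",
     "MAT.JavaCarafe.CarafeTokenizationStep"),
    ("MAT.PluginMgr.CarafeTagStep",
     "MAT.JavaCarafe.CarafeTagStep"),
    ("MAT.CarafeModelBuilder.CarafeModelBuilder",
     "MAT.JavaCarafe.CarafeModelBuilder"),
]


def rename_carafe_classes(task_xml_text):
    """Return the task.xml text with the 1.1 class names replaced."""
    for old, new in CLASS_RENAMES:
        task_xml_text = task_xml_text.replace(old, new)
    return task_xml_text


sample = '<step name="tag" class="MAT.PluginMgr.CarafeTagStep"/>'
updated = rename_carafe_classes(sample)
```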

Note that for the tagger, the prior_adjst attribute has been renamed to prior_adjust. For the trainer, the engine and feature_set attributes have been eliminated; there's now a new feature_spec attribute which refers to a file in which you can describe your feature set, if you don't want to use the default feature set. Also, the psa_iterations flag has been removed, because the Carafe trainer now offers more training options:

psa_iterations="6"

becomes

training_method="psa" max_iterations="6"

Because PSA no longer requires random segments, the no_random_psa_segments flag has been removed.
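
The PSA-related changes can be expressed as a small transformation on a dictionary of trainer attributes. This is a sketch under the assumption that you hold the attributes as a Python dict; the function name is hypothetical, and the engine and feature_set attributes still have to be handled by hand.

```python
def update_trainer_settings(attrs):
    """Return a copy of 1.1 trainer attributes in 1.2 form."""
    updated = dict(attrs)
    iterations = updated.pop("psa_iterations", None)
    if iterations is not None:
        # psa_iterations="N" becomes training_method="psa" max_iterations="N".
        updated["training_method"] = "psa"
        updated["max_iterations"] = iterations
    # The no_random_psa_segments flag no longer exists in 1.2.
    updated.pop("no_random_psa_segments", None)
    return updated


new_settings = update_trainer_settings(
    {"psa_iterations": "6", "no_random_psa_segments": "yes"})
```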

Finally, use the same tools as before to build your models: either MATModelBuilder in file mode, or the modelbuild operation in workspace mode.

Internals of experiment directories have changed

In order to support a more flexible way of specifying partitions in experiments, the way the configuration of experiments is cached has changed in version 1.2. What this means is that you will not be able to invoke MATExperimentEngine on experiment directories created using version 1.1 to regenerate the experiment scores.

Experiment XML files have changed

In order to support a more flexible way of specifying partitions in experiments, we've changed the way partitions are specified in the experiment XML files. We compare the relevant files below:

Version 1.1:

<experiment task='Named Entity'>
  <corpora dir="corpora">
    <partition split_fraction=".2" ctype="split"/>
    <corpus name="test">
      <pattern>*.json</pattern>
    </corpus>
  </corpora>
  <model_sets dir="model_sets">
    <model_set name="test" corpus="test"/>
  </model_sets>
  <runs dir="runs">
    <run_settings>
      <args steps="zone,tokenize,tag" workflow="Demo"/>
    </run_settings>
    <run name="test" model="test" corpus="test"/>
  </runs>
</experiment>

Version 1.2:

<experiment task='Named Entity'>
  <corpora dir="corpora">
    <partition name="train" fraction=".8"/>
    <partition name="test" fraction=".2"/>
    <corpus name="test">
      <pattern>*.json</pattern>
    </corpus>
  </corpora>
  <model_sets dir="model_sets">
    <model_set name="test">
      <training_corpus corpus="test" partition="train"/>
    </model_set>
  </model_sets>
  <runs dir="runs">
    <run_settings>
      <args steps="zone,tokenize,tag" workflow="Demo"/>
    </run_settings>
    <run name="test" model="test">
      <test_corpus corpus="test" partition="test"/>
    </run>
  </runs>
</experiment>

Note the following changes:

The <partition> element now explicitly specifies named partitions and their fractions. You are no longer restricted to designating a corpus exclusively as test, exclusively as train, or as a single split.
The remainder of the attributes of the <partition> element have been moved to a new <size> element (not exemplified here).
Multiple corpora, and partitions, can be associated with a single <model_set> or <run>.
The "corpus" attributes of the <model_set> and <run> elements are no longer recognized. The <training_corpus> and <test_corpus> child elements replace them.
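
The last change in the list can be scripted with Python's standard xml.etree.ElementTree module. The sketch below is not a MAT tool; the function name is hypothetical, and it assumes your partitions are named "train" and "test" as in the example above. The <partition> elements themselves still need to be rewritten by hand.

```python
import xml.etree.ElementTree as ET


def move_corpus_attributes(root, train="train", test="test"):
    """Replace 1.1 'corpus' attributes with 1.2 child elements, in place."""
    rules = (("model_set", "training_corpus", train),
             ("run", "test_corpus", test))
    for tag, child_tag, partition in rules:
        for elem in list(root.iter(tag)):
            corpus = elem.attrib.pop("corpus", None)
            if corpus is not None:
                ET.SubElement(elem, child_tag,
                              corpus=corpus, partition=partition)
    return root


root = ET.fromstring(
    '<experiment>'
    '<model_sets><model_set name="test" corpus="test"/></model_sets>'
    '<runs><run name="test" model="test" corpus="test"/></runs>'
    '</experiment>')
move_corpus_attributes(root)
```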

Settings in task.xml have changed

In order to clarify how task settings are handled in MAT, a number of changes have been made to the task.xml file syntax.

First, the <step> element of <step_implementations> no longer accepts arbitrary attributes. If you made use of this feature to pass settings to the initialization methods of workflow steps, you must now use the <create_settings> child element. We doubt that anyone has made use of this feature.

Second, the <step> element of <workflow> no longer accepts arbitrary attributes. If you make use of this feature to pass settings to workflow steps, you must now use the <create_settings>, <ui_settings>, or <run_settings> child elements. The most likely situation where this might arise is in passing defaults to the run methods of steps. For instance, if you used this feature to increase the Java heap size for Java Carafe, your task.xml file would have to be revised as follows.

Version 1.1:

  ...
  <workflows>
    <workflow name="Demo" hand_annotation_available_at_end="yes">
      <step name="zone"/>
      <step name="tokenize"/>
      <step name="tag" heap_size="2G"/>
    </workflow>
    ...
  </workflows>
  ...

Version 1.2:

  ...
  <workflows>
    <workflow name="Demo" hand_annotation_available_at_end="yes">
      <step name="zone"/>
      <step name="tokenize"/>
      <step name="tag">
        <run_settings heap_size="2G"/>
      </step>
    </workflow>
    ...
  </workflows>
  ...
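
If you have many workflows to update, the change can be sketched with xml.etree.ElementTree. This hypothetical helper assumes that every extra attribute on a workflow <step> is a run setting; attributes that belong in <create_settings> or <ui_settings> have to be moved by hand.

```python
import xml.etree.ElementTree as ET


def move_step_settings(workflow):
    """Move extra attributes of each <step> into a <run_settings> child."""
    for step in workflow.findall("step"):
        extras = dict((k, v) for k, v in step.attrib.items() if k != "name")
        if extras:
            for key in extras:
                del step.attrib[key]
            # Assumption: all extra attributes are run settings.
            ET.SubElement(step, "run_settings", extras)
    return workflow


workflow = ET.fromstring(
    '<workflow name="Demo">'
    '<step name="zone"/><step name="tag" heap_size="2G"/>'
    '</workflow>')
move_step_settings(workflow)
```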

Third, the way settings are specified for model configurations has changed. The name and class for the configuration are now separated from the settings which are passed to the model builder, as follows.

Version 1.1:

  ...
  <model_build_settings class="MAT.JavaCarafe.CarafeModelBuilder"
                        training_method="psa" max_iterations="6"/>
  ...

Version 1.2:

  ...
  <model_config class="MAT.JavaCarafe.CarafeModelBuilder">
    <build_settings training_method="psa" max_iterations="6"/>
  </model_config>
  ...

Finally, the <workflow> element no longer accepts arbitrary settings; these settings must be passed using the <ui_settings> child element. No task appears to use this option yet, so this shouldn't affect anyone.

Upgrading from version 1.0 to version 1.1

Internals of experiment directories have changed

In order to support a more flexible way of invoking the MAT engine in experiments, the way the configuration of experiments is cached has changed in version 1.1. What this means is that you will not be able to invoke MATExperimentEngine on experiment directories created using version 1.0 to regenerate the experiment scores.

Experiment XML files have changed

In order to support a more flexible way of invoking the MAT engine in experiments, we've changed the way corpus preprocessing and test run processing are specified. In version 1.0, the MAT engine was called as a command-line tool, and the options were specified as a command line; in version 1.1, the options are specified as XML attribute-value pairs. We compare the relevant experiment XML blocks below:

Version 1.0:

  <corpora dir="corpora">
    <prep>--input_file_type xml-inline --workflow Align --steps 'zone,tokenize,align'</prep>
    [...]
  </corpora>

  <runs dir="runs">
    <run_settings>
      <args>--steps zone,tokenize,tag --workflow Demo</args>
    </run_settings>
    [...]
  </runs>

Version 1.1:

  <corpora dir="corpora">
    <prep input_file_type="xml-inline" workflow="Align" steps="zone,tokenize,align"/>
    [...]
  </corpora>

  <runs dir="runs">
    <run_settings>
      <args steps="zone,tokenize,tag" workflow="Demo"/>
    </run_settings>
    [...]
  </runs>
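
The conversion from a command line to attribute-value pairs can be sketched with Python's standard shlex module. The helper below is hypothetical and assumes that every option takes exactly one value, which holds for the examples shown here but may not hold for every option you use.

```python
import shlex


def options_to_attributes(command_line):
    """Turn '--name value' pairs into an attribute dictionary."""
    tokens = shlex.split(command_line)
    attrs = {}
    # Assumption: options and values strictly alternate.
    for flag, value in zip(tokens[0::2], tokens[1::2]):
        attrs[flag.lstrip("-")] = value
    return attrs


attrs = options_to_attributes(
    "--input_file_type xml-inline --workflow Align "
    "--steps 'zone,tokenize,align'")
```

The resulting dictionary corresponds to the attributes of the <prep> element in the version 1.1 example.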

New training engine configuration in task.xml

Version 1.1 adds the ability to define different training engines. Because of this change, if you've defined your own task and you specified model build settings in your task.xml file, you must add a class attribute to the model_build_settings element. This attribute is not optional, and there is no default. If you're using the default Carafe engine, the value you should use for this attribute is MAT.CarafeModelBuilder.CarafeModelBuilder, as in the following example:

  <model_build_settings class="MAT.CarafeModelBuilder.CarafeModelBuilder" 
                        engine="anonTrain.native" feature_set="ANON-1"
                        psa_iterations="6"/>

New folder in workspaces

Version 1.1 adds the ability to import MAT JSON documents into your workspaces which haven't yet been processed (as well as other annotation formats, like XML inline). Because of this change, if you have a workspace, you must add a directory to it. This directory is expected by the MAT workspace engine. For each workspace directory, do this:

% mkdir <workspace_dir>/folders/rich_incoming
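
If you maintain several workspaces, the same step can be scripted. This is a minimal sketch using only the Python standard library; the function name is hypothetical, and you supply the list of workspace directories yourself.

```python
import os


def add_rich_incoming(workspace_dirs):
    """Create folders/rich_incoming in each workspace that lacks it."""
    created = []
    for workspace in workspace_dirs:
        target = os.path.join(workspace, "folders", "rich_incoming")
        if not os.path.isdir(target):
            os.makedirs(target)
            created.append(target)
    # Return the directories that were actually created.
    return created
```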

New command line option restriction for MATModelBuilder

In version 1.1, it's possible to have multiple model build configurations in your task.xml file. In order to ensure that the correct configuration adds the appropriate command line options to the MATModelBuilder executable, it was necessary to introduce a new restriction on the --task option for MATModelBuilder: if it appears, it must now be the first command-line option. In other words, the following will now raise an error:

% $MAT_PKG_HOME/bin/MATModelBuilder \
--input_files '/path/to/my/docs/1[0-9][0-9].json' \
--input_dir /path/to/my/other/docs --task "Named Entity" \
--lexicon_dir /path/to/my/lexicon/ --save_as_default_model
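
To make such an invocation valid again, move --task and its value to the front of the option list. If you generate MATModelBuilder command lines from a script, a hypothetical helper along these lines would do it; it assumes the arguments are held as a Python list.

```python
def task_option_first(argv):
    """Return argv with '--task <name>' moved to the front, if present."""
    if "--task" not in argv:
        return list(argv)
    i = argv.index("--task")
    # Take the flag and its value, then everything before and after them.
    return argv[i:i + 2] + argv[:i] + argv[i + 2:]


args = ["--input_dir", "/path/to/my/other/docs",
        "--task", "Named Entity", "--save_as_default_model"]
reordered = task_option_first(args)
```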

Change to how the default model is specified in task.xml

In version 1.0, the default model was defined within the model build settings. In version 1.1, because of the presence of multiple model build configurations, we've separated the specification of the default model in task.xml.

Version 1.0:

  <model_build_settings engine="anonTrain.native" feature_set="ANON-1"
                        psa_iterations="6" default_model="default_model"/>

Version 1.1:

  <model_build_settings class="MAT.CarafeModelBuilder.CarafeModelBuilder" 
                        engine="anonTrain.native" feature_set="ANON-1"
                        psa_iterations="6"/>
  <default_model>default_model</default_model>