Autotagging

When you left-click on an existing span annotation, one of your options is "Autotag matches":

[add or modify]

If you select this option, every eligible span in the text that is not yet annotated and that matches the existing span is assigned the same annotation label as the already-annotated span.

Default autotagging properties

Autotagging has the following default properties:

Controlling and enhancing autotagging

There are numerous ways to control and enhance this autotagging capability.

Controlling case-sensitivity

By default, autotagging is case-sensitive, but you can make it case-insensitive by deselecting "Autotag is case-sensitive" in the View menu:

[view menu]

You can also change this setting via "View -> All settings..." in the "Annotation operations" tab, by toggling "Take case into account...":

[ao settings]

Controlling untokenized autotagging boundaries

If your document is not tokenized, the characters that determine the edges of autotag candidates are specified, per language, via the tokenless_autotag_delimiters property in the task configuration file. Here, we show the relevant block from the definition of the "Named Entity" sample task:

    <languages>
      <language code='en' name='English' tokenless_autotag_delimiters='.,/?!;:'/>
    </languages>

So, for instance, suppose you're using this task and you annotate "Pakistan" in an untokenized document, and you then want to autotag all other "Pakistan" instances with the same label. Each such instance must be flanked, on each end, by whitespace, the beginning or end of the document, or one of the delimiting characters shown above. The substring "Pakistan" in " Pakistani" is therefore not a candidate (because "i" is neither whitespace nor one of the delimiters), but the "Pakistan" substring in " Pakistan," is (because the comma is one of the delimiters). Note that the substring "Pakistan" in "the India-Pakistan border" is also not a candidate, because the dash is not one of the delimiter characters; you could change this by adding the dash to the set of delimiter characters in your task.
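The boundary rule described above can be sketched in a few lines of Python. This is only an illustration of the rule, not the tool's actual implementation: the function name find_autotag_candidates and its parameters are hypothetical, and the sketch returns every matching occurrence, including any that might already be annotated.

```python
import re

def find_autotag_candidates(text, span_text, delimiters=".,/?!;:", case_sensitive=True):
    """Hypothetical sketch: return (start, end) offsets of span_text occurrences
    that are flanked on both sides by whitespace, a delimiter, or a document edge."""
    # Any character that is neither whitespace nor a delimiter blocks a match.
    not_boundary = "[^\\s" + re.escape(delimiters) + "]"
    # The negative lookarounds also succeed at the very start and end of the text.
    pattern = "(?<!" + not_boundary + ")" + re.escape(span_text) + "(?!" + not_boundary + ")"
    flags = 0 if case_sensitive else re.IGNORECASE
    return [(m.start(), m.end()) for m in re.finditer(pattern, text, flags)]

text = "Pakistan and Pakistani; Pakistan, India-Pakistan"
default_matches = find_autotag_candidates(text, "Pakistan")
with_dash = find_autotag_candidates(text, "Pakistan", delimiters=".,/?!;:-")
ignoring_case = find_autotag_candidates(text, "pakistan", case_sensitive=False)
```

With the default delimiters, the occurrences inside "Pakistani" and "India-Pakistan" are skipped; adding the dash to the delimiter set admits the latter, and the case_sensitive flag mirrors the case-sensitivity setting described earlier.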

Enhanced autotagging operations

You'll notice that the "Annotation operations" tab above has one other setting, which displays additional options for autotagging. If you enable this option, your popup menu will contain an "Autotag..." entry in place of "Autotag matches", with a submenu containing the following operations: