The transducer converts documents from one format to another, and
can also manipulate the documents using an XML configuration
specification.
Unix:
% $MAT_PKG_HOME/bin/MATTransducer
Windows native:
> %MAT_PKG_HOME%\bin\MATTransducer.cmd
Usage: MATTransducer [options]
Options:
-h, --help show this help message and exit
Core options:
--task=task name of the task to use, if helpful to the reader/writer. Optional. Known tasks are: ...
--debug Enable debug output.
...
If no arguments are provided to MATTransducer, the help message
above is presented.
The transducer will transduce all the files it can process, one
by one. If a file generates an error, the transducer skips it and
reports the skip.
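The skip-and-continue behavior described above can be pictured with a small loop. This is only a sketch of the control flow, not MAT's implementation: transduce_one is a hypothetical stand-in for "load, transduce and save one file", and the filenames are made up.

```shell
# Hypothetical stand-in for "load, transduce and save one file";
# here it simply fails for names ending in .bad.
transduce_one() {
    case "$1" in
        *.bad) return 1 ;;
        *)     echo "saved $1" ;;
    esac
}

skipped=""
for f in a.txt b.bad c.txt; do
    # A failure on one file is reported and skipped; the loop continues.
    if ! transduce_one "$f"; then
        echo "skipping $f"
        skipped="$skipped $f"
    fi
done
```

Because each file is saved before the next one is loaded, a failure on b.bad does not affect the output already written for a.txt.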
You can also use MATEngine for
transducing, but it's less useful in a number of ways. Here's a
comparison:
MATTransducer | MATEngine |
---|---|
Loads, transduces and saves files one at a time | Loads and transduces all files before any save is performed |
Skips and reports files which generate an error | Fails if any file generates an error |
Does not require a task | Requires a task |
Does not require a workflow | Requires a workflow |
Supports XML-configurable conversion | Does not support conversion |

Command line option | Description |
---|---|
--task <s> | The name of the task to use, if helpful to the reader/writer. Optional. The known tasks are reported here. |
--verbose | If specified, report each file to stdout as it's transduced. |
MATTransducer also makes the common options available.
The remainder of the options can be grouped into input and output options.
The input options specify the input files. You can specify
individual files, or directories (possibly filtering their
contents using a regular expression). You must specify a file
type. For raw files, you can also specify an input character
encoding.
Command line option | XML attribute | Value | Description |
---|---|---|---|
--input_file <f> | | | The file to process. Either this or --input_dir must be specified. A single dash ('-') will cause the engine to read from standard input. |
--input_dir <d> | | | The directory to process. Either this or --input_file must be specified. |
--input_file_re <s> | | | If --input_dir is specified, a regular expression to match the filenames in the directory against. The pattern must cover the entire filename (and only the filename, not the full path). |
--input_encoding <e> | | | Input character encoding for raw files. Default is UTF-8. |
--input_file_type <t> | | | The file type of the input. One of the available readers and writers. Required. |
--handle_non_bmp <v> | | one of 'warn', 'scrub_or_warn', 'fail', 'ignore' | Instructions on how to handle Unicode characters outside the Basic Multilingual Plane. Overrides the default HANDLE_NON_BMP configuration variable. See the Unicode issues discussion for details. Default is 'warn'. |
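To illustrate the whole-filename matching described for --input_file_re, the sketch below uses grep -Ex, which likewise accepts a name only if the pattern covers all of it. This is an approximation: grep's extended regular expressions are assumed here as a stand-in for whatever regular expression dialect MATTransducer itself uses, and the filenames are hypothetical.

```shell
# Hypothetical filenames in an input directory.
# grep -x keeps a line only if the pattern matches the whole line,
# approximating the "must cover the entire filename" rule.
matched=$(printf '%s\n' doc1.txt doc1.txt.bak notes.md | grep -Ex '.*\.txt')
echo "$matched"   # prints: doc1.txt
```

Note that doc1.txt.bak is rejected even though it contains ".txt": a pattern that matches only part of the name does not select the file.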
The output options specify how the result is saved. Unlike with
MATEngine, an output is required. You can specify an output file
for an input file, or an output directory and/or name mapping for
an input directory. You must also specify the output format;
usually, you'll want this to be one of the rich formats, but "raw"
is useful in some rare circumstances. Finally, you can specify an
output character encoding for raw files.
Command line option | XML attribute | Value | Description |
---|---|---|---|
--output_file <f> | | | Where to save the output. Either this or --output_dir must be provided. Must be paired with --input_file. A single dash ('-') will cause the engine to write to standard output. |
--output_dir <d> | | | Where to save the output. Either this or --output_file must be provided. Must be paired with --input_dir. |
--output_fsuff <s> | | | The suffix to add to each filename when --output_dir is specified. If absent, the name of each file will be identical to the name of the file in the input directory. |
--output_file_type <t> | | | The type of the file to save. One of the available readers and writers. Required if either --output_file or --output_dir is specified. |
--output_encoding <e> | | | Output character encoding for raw files. Default is UTF-8. |
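The name mapping for --output_dir and --output_fsuff can be sketched as follows. The helper out_name is hypothetical and not part of MAT, and the sketch assumes the suffix is simply appended to the full input filename, as the description above suggests; the paths are made up.

```shell
# out_name is a hypothetical helper illustrating the naming rule:
# output name = input file's name plus the suffix (if any),
# placed in the output directory.
out_name() {
    # $1 = input path, $2 = output directory, $3 = optional suffix
    printf '%s/%s%s\n' "$2" "$(basename "$1")" "${3:-}"
}

out_name /path/to/in/doc1.txt /path/to/out .json
out_name /path/to/in/doc1.txt /path/to/out
```

Under these assumptions the first call corresponds to passing --output_fsuff .json, and the second to omitting --output_fsuff, in which case the output keeps the input file's name.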
A few further options affect task information and document mapping:

Command line option | XML attribute | Value | Description |
---|---|---|---|
--fresh_task | | | If this option is present, all task information in each document will be removed and re-inferred before the document is saved. Use this option if you're processing documents which were created using a task other than the current one. If no task is specified, this option will remove the task information without replacing it. |
--document_mapping_xml <xml> | | | If present, the mapping XML will be applied to the document(s) before they're saved. Only one of this and --document_mapping_xml_file can be provided. |
--document_mapping_xml_file <f> | | | If present, the specified file containing mapping XML will be applied to the document(s) before they're saved. Only one of this and --document_mapping_xml can be provided. |
--document_mapping_record <f> | | | If --document_mapping_xml or --document_mapping_xml_file is present, or if the reader itself has a convertor registered for this task, this option specifies a file to save a CSV record of the mapping to, in a format quite similar to the format found in MATReport. If the value of this option is '-', the record will be printed out in plain text to the terminal. |
The readers and writers described above may introduce additional
options, which are described here.
These options must follow the input and output options.
Let's say you have an XML file /path/to/my/document.xml, and you
wish to translate it into MAT JSON, converting all its tags, and
printing it to standard output:
Unix:
% $MAT_PKG_HOME/bin/MATTransducer --input_file /path/to/my/document.xml \
--input_file_type xml-inline --xml_translate_all \
--output_file - --output_file_type mat-json
Windows native:
> %MAT_PKG_HOME%\bin\MATTransducer.cmd --input_file c:\path\to\my\document.xml ^
--input_file_type xml-inline --xml_translate_all ^
--output_file - --output_file_type mat-json