Transducer

Description

The transducer converts documents from one format to another, and can also manipulate them according to an XML configuration specification.

Usage

Unix:

% $MAT_PKG_HOME/bin/MATTransducer

Windows native:

> %MAT_PKG_HOME%\bin\MATTransducer.cmd

Usage: MATTransducer [options]

Options:
-h, --help show this help message and exit

Core options:
--task=task name of the task to use, if helpful to the reader/writer. Optional. Known tasks are: ...
--debug Enable debug output.
...

If no arguments are provided to MATTransducer, the help message above is presented.

The transducer will transduce all the files it can process, one by one. It will skip (and report skipping) files which generate an error.

You can also use MATEngine for transducing, but it's less useful in a number of ways. Here's a comparison:

MATTransducer | MATEngine
Loads, transduces and saves files one at a time | Loads and transduces all files before any save is performed
Skips and reports files which generate an error | Fails if any file generates an error
Does not require a task | Requires a task
Does not require a workflow | Requires a workflow
Supports XML-configurable conversion | Does not support conversion
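
The per-file behavior in the MATTransducer column can be illustrated with a minimal sketch. This is not MAT's actual implementation: the function transduce_one_by_one and the stand-in fake_convert are hypothetical names introduced only to show the "process one file at a time, skip and report failures" pattern.

```python
from typing import Callable, Iterable, List, Tuple


def transduce_one_by_one(
    paths: Iterable[str],
    convert_and_save: Callable[[str], None],
) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Process each file independently; skip and report any that fail."""
    done: List[str] = []
    skipped: List[Tuple[str, str]] = []
    for path in paths:
        try:
            # Load, transduce and save this file before touching the next one.
            convert_and_save(path)
            done.append(path)
        except Exception as err:
            # A failing file is reported and skipped; the batch continues.
            print(f"Skipping {path}: {err}")
            skipped.append((path, str(err)))
    return done, skipped


def fake_convert(path: str) -> None:
    # Hypothetical stand-in for the real load/transduce/save step.
    if path.endswith(".bad"):
        raise ValueError("cannot parse")


done, skipped = transduce_one_by_one(["a.xml", "b.bad", "c.xml"], fake_convert)
```

A batch that loads everything before saving anything would instead lose all results when the second file fails, which is the difference the comparison describes.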

Core options

--task <s>
The name of the task to use, if helpful to the reader/writer. Optional. The known tasks are reported here.
--verbose
If specified, report each file to stdout as it's transduced.

MATTransducer also makes the common options available.

The remainder of the options can be grouped into input and output options.

Input options

The input options specify the input files. You can specify individual files, or directories (possibly filtering their contents using a regular expression). You must specify a file type. For raw files, you can also specify an input character encoding.

Command line option | XML attribute | Value | Description
--input_file <f> | | | The file to process. Either this or --input_dir must be specified. A single dash ('-') will cause the engine to read from standard input.
--input_dir <d> | | | The directory to process. Either this or --input_file must be specified.
--input_file_re <s> | | | If --input_dir is specified, a regular expression to match the filenames in the directory against. The pattern must cover the entire filename (and only the filename, not the full path).
--input_encoding <e> | | | Input character encoding for raw files. Default is UTF-8.
--input_file_type <t> | | | The file type of the input. One of the available readers and writers. Required.
--handle_non_bmp <v> | | one of 'warn', 'scrub_or_warn', 'fail', 'ignore' | Instructions on how to handle Unicode characters outside the Basic Multilingual Plane. Overrides the default HANDLE_NON_BMP configuration variable. See the Unicode issues discussion for details. Default is 'warn'.
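
The requirement that an --input_file_re pattern cover the entire filename is easy to get wrong, so here is a small sketch of that matching rule. It assumes the behavior is equivalent to Python's re.fullmatch applied to the base filename; the helper select_inputs and the sample paths are hypothetical and only illustrate the rule.

```python
import os
import re
from typing import Iterable, List


def select_inputs(paths: Iterable[str], pattern: str) -> List[str]:
    """Keep paths whose base filename is matched in full by the pattern."""
    compiled = re.compile(pattern)
    # Match against the filename only, never the directory part, and
    # require the pattern to cover the whole name (assumed semantics).
    return [p for p in paths if compiled.fullmatch(os.path.basename(p))]


candidates = ["/data/in/doc1.xml", "/data/in/doc1.xml.bak", "/data/in/notes.txt"]

# Covers the whole filename, so only the .xml file is selected.
selected = select_inputs(candidates, r".*\.xml")

# Matches only part of "doc1.xml", so nothing is selected.
partial = select_inputs(candidates, r"doc")
```

Under this reading, a pattern such as doc selects nothing; it would need to be written as doc.* to select files whose names begin with "doc".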

Output options

The output options specify how the result is saved. Unlike with MATEngine, an output is required. You can specify an output file for an input file, or an output directory and/or name mapping for an input directory. You must also specify the output format; usually, you'll want this to be one of the rich formats, but "raw" is useful in some rare circumstances. Finally, you can specify an output character encoding for raw files.

Command line option | XML attribute | Value | Description
--output_file <f> | | | Where to save the output. Either this or --output_dir must be provided. Must be paired with --input_file. A single dash ('-') will cause the engine to write to standard output.
--output_dir <d> | | | Where to save the output. Either this or --output_file must be provided. Must be paired with --input_dir.
--output_fsuff <s> | | | The suffix to add to each filename when --output_dir is specified. If absent, the name of each file will be identical to the name of the file in the input directory.
--output_file_type <t> | | | The type of the file to save. One of the available readers and writers. Required if either --output_file or --output_dir is specified.
--output_encoding <e> | | | Output character encoding for raw files. Default is UTF-8.
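
The pairing rules and the output naming described above can be sketched as follows. The helpers output_args and output_name are hypothetical and not part of MAT. The sketch also assumes that --output_fsuff is appended to the complete input filename, extension included; the text above says only that the suffix is added to each filename.

```python
import os
from typing import List, Optional


def output_args(
    input_is_dir: bool,
    output_file_type: str,
    output_file: Optional[str] = None,
    output_dir: Optional[str] = None,
    output_fsuff: Optional[str] = None,
) -> List[str]:
    """Build the output portion of a command line, checking the pairing rules."""
    if (output_file is None) == (output_dir is None):
        raise ValueError("specify exactly one of --output_file or --output_dir")
    if output_file is not None and input_is_dir:
        raise ValueError("--output_file must be paired with --input_file")
    if output_dir is not None and not input_is_dir:
        raise ValueError("--output_dir must be paired with --input_dir")
    args: List[str] = []
    if output_file is not None:
        args += ["--output_file", output_file]
    else:
        args += ["--output_dir", output_dir]
        if output_fsuff is not None:
            args += ["--output_fsuff", output_fsuff]
    args += ["--output_file_type", output_file_type]
    return args


def output_name(input_path: str, output_dir: str, output_fsuff: str = "") -> str:
    # Assumption: the suffix is appended to the full input filename.
    return os.path.join(output_dir, os.path.basename(input_path) + output_fsuff)


dir_args = output_args(True, "mat-json", output_dir="out", output_fsuff=".json")
predicted = output_name("/in/doc1.xml", "out", ".json")
```

With no suffix, output_name keeps the input filename unchanged, matching the documented default.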

Conversion options

Command line option | XML attribute | Value | Description
--fresh_task | | | If this option is present, all task information in each document will be removed and re-inferred before the document is saved. Use this option if you're processing documents which were created using a task other than the current one. If no task is specified, this option will remove the task information without replacing it.
--document_mapping_xml <xml> | | | If present, the mapping XML will be applied to the document(s) before they're saved. Only one of this and --document_mapping_xml_file can be provided.
--document_mapping_xml_file <f> | | | If present, the specified file containing mapping XML will be applied to the document(s) before they're saved. Only one of this and --document_mapping_xml can be provided.
--document_mapping_record <f> | | | If --document_mapping_xml or --document_mapping_xml_file is present, or if the reader itself has a convertor registered for this task, this option specifies a file to save a CSV record of the mapping to, in a format quite similar to the format found in MATReport. If the value of this option is '-', the record will be printed out in plain text to the terminal.

Other options

The readers and writers described above may introduce additional options, which are described here. These options must follow the input and output options.

Examples

Example 1

Let's say you have an XML file /path/to/my/document.xml, and you wish to translate it into MAT JSON, converting all its tags, and printing it to standard output:

Unix:

% $MAT_PKG_HOME/bin/MATTransducer --input_file /path/to/my/document.xml \
--input_file_type xml-inline --xml_translate_all \
--output_file - --output_file_type mat-json


Windows native:

> %MAT_PKG_HOME%\bin\MATTransducer.cmd --input_file c:\path\to\my\document.xml ^
--input_file_type xml-inline --xml_translate_all ^
--output_file - --output_file_type mat-json