delphin.codecs
Serialization Codecs for Semantic Representations
The delphin.codecs package is a namespace package for modules used in the
serialization and deserialization of semantic representations. All
modules included in this namespace must follow the common API (based
on Python’s pickle and json modules) in order to work correctly with
PyDelphin. This document describes that API.
Included Codecs
MRS:
DMRS:
EDS:
Codec API
Module Constants
There is one required module constant for codecs: CODEC_INFO. Its
purpose is primarily to specify which representation (MRS, DMRS, or EDS)
the codec serializes. A codec without CODEC_INFO will work for
programmatic usage, but it will not work with the
delphin.commands.convert() function or at the command line with the
delphin convert command, which use the representation key in
CODEC_INFO to determine when and how to convert representations.
- CODEC_INFO
A dictionary containing information about the codec. While codec authors may put arbitrary data here, there are two keys used by PyDelphin’s conversion features: representation and description. Only representation is required, and it should be set to one of mrs, dmrs, or eds. For example, the mrsjson codec uses the following:
    CODEC_INFO = {
        'representation': 'mrs',
        'description': 'JSON-serialized MRS for the Web API'
    }
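As a rough illustration of how the representation key can drive conversion decisions, the sketch below builds stand-in codec modules with types.ModuleType and inspects their CODEC_INFO. The helper codec_representation() and the module names are hypothetical; this is not how delphin.commands.convert() is actually implemented.

```python
import types

# A stand-in codec module (hypothetical); real codecs define CODEC_INFO at module level.
mrsjson_like = types.ModuleType("mrsjson_like")
mrsjson_like.CODEC_INFO = {
    "representation": "mrs",
    "description": "JSON-serialized MRS for the Web API",
}

# A codec without CODEC_INFO still works programmatically, but cannot be classified.
bare_codec = types.ModuleType("bare_codec")


def codec_representation(module):
    """Return the representation a codec serializes, or None if it has no CODEC_INFO."""
    info = getattr(module, "CODEC_INFO", None)
    if info is None:
        return None
    rep = info.get("representation")  # the only required key
    if rep not in ("mrs", "dmrs", "eds"):
        raise ValueError(f"invalid representation in {module.__name__}: {rep!r}")
    return rep


print(codec_representation(mrsjson_like))  # mrs
print(codec_representation(bare_codec))    # None
```

A converter built this way can refuse codecs that return None, which matches the behavior described above for codecs lacking CODEC_INFO.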
The following module constants are optional and describe strings
that must appear in valid documents when serializing multiple
semantic representations at a time, as with dump() and dumps(). They
are used by delphin.commands.convert() to provide streaming
serialization rather than dumping the entire file at once. If the
values are not defined in the codec module, default values will be
used.
- HEADER
The string to output before any semantic representations are serialized. For example, in delphin.codecs.mrx, the value of HEADER is <mrs-list>, and in delphin.codecs.dmrstikz it is an entire LaTeX preamble followed by \begin{document}.
- JOINER
The string used to join multiple serialized semantic representations. For example, in delphin.codecs.mrsjson, it is a comma (,), following JSON’s syntax. Normally it is an empty string, a space, or a newline, depending on the conventions of the format and on whether the indent argument is set.
- FOOTER
The string to output after all semantic representations have been serialized. For example, in delphin.codecs.mrx, it is </mrs-list>, and in delphin.codecs.dmrstikz it is \end{document}.
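The three constants let a caller stream output one representation at a time instead of building the whole document in memory. The sketch below shows the idea with a hypothetical stream_serialize() helper and a toy codec module; the fallback values ('' for HEADER and FOOTER, '\n' for JOINER) are assumptions for illustration, not necessarily the defaults PyDelphin uses.

```python
import io
import types


def stream_serialize(codec, items, out):
    """Write items to out as one document, calling codec.encode() once per item."""
    # Assumed fallbacks for when the codec module omits the optional constants.
    header = getattr(codec, "HEADER", "")
    joiner = getattr(codec, "JOINER", "\n")
    footer = getattr(codec, "FOOTER", "")
    out.write(header)
    for i, item in enumerate(items):
        if i > 0:
            out.write(joiner)  # only between items, never after the last one
        out.write(codec.encode(item))
    out.write(footer)


# A toy codec (hypothetical) that mirrors mrx's <mrs-list> wrapper.
toy = types.ModuleType("toy")
toy.HEADER = "<mrs-list>"
toy.JOINER = ""
toy.FOOTER = "</mrs-list>"
toy.encode = lambda x: f"<mrs>{x}</mrs>"

buf = io.StringIO()
stream_serialize(toy, iter(["a", "b"]), buf)  # items may be a lazy iterator
print(buf.getvalue())  # <mrs-list><mrs>a</mrs><mrs>b</mrs></mrs-list>
```

Because only one item is encoded at a time, the input can be a generator over a large corpus.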
Deserialization Functions
The deserialization functions load(), loads(), and decode() accept
textual serializations and return the interpreted semantic
representations. Both load() and loads() expect full documents
(including headers and footers, such as <mrs-list> and </mrs-list>
around an mrx serialization) and return lists of semantic structure
objects. The decode() function expects a single representation
(without headers and footers) and returns a single semantic structure
object.
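To make the document versus single-representation split concrete, here is a toy codec for a made-up line-based format in which a document is wrapped in "[" and "]" lines and each representation occupies one line. The format and the Toy class are hypothetical stand-ins for a real serialization and for SemanticStructure subclasses.

```python
from dataclasses import dataclass

HEADER = "["
FOOTER = "]"


@dataclass
class Toy:
    """Hypothetical stand-in for a SemanticStructure subclass."""
    top: str


def decode(s):
    """Parse one bare representation (no header or footer)."""
    return Toy(top=s.strip())


def loads(s):
    """Parse a full document and return a list of representations."""
    lines = [line for line in s.splitlines() if line.strip()]
    if not lines or lines[0] != HEADER or lines[-1] != FOOTER:
        raise ValueError("document must start with [ and end with ]")
    # Everything between the header and footer is one representation per line.
    return [decode(line) for line in lines[1:-1]]


doc = "[\nh0\nh1\n]"
print(loads(doc))    # [Toy(top='h0'), Toy(top='h1')]
print(decode("h0"))  # Toy(top='h0')
```

Note that loads() reuses decode() for each item, so the two functions cannot drift apart in how they interpret a single representation.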
Reading from a file or stream
- load(source)
Deserialize and return semantic representations from source.
- Parameters:
source – path-like object or file handle of a source containing serialized semantic representations
- Return type:
list (of subclasses of delphin.sembase.SemanticStructure)
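Because source may be either a path-like object or an open file handle, a codec's load() can dispatch on the argument type before delegating to loads(). The following is a minimal sketch under that assumption; the loads() stand-in (one "representation" per non-empty line) and the UTF-8 encoding are hypothetical, and this section does not specify exactly how each PyDelphin codec handles source.

```python
import io
import os


def loads(s):
    """Hypothetical stand-in: one 'representation' per non-empty line."""
    return [line for line in s.splitlines() if line.strip()]


def load(source):
    """Accept a path-like object (str or os.PathLike) or a readable text handle."""
    if isinstance(source, (str, os.PathLike)):
        # Assumed encoding for illustration.
        with open(source, encoding="utf-8") as fh:
            return loads(fh.read())
    return loads(source.read())  # assume an already-open text handle


print(load(io.StringIO("a\nb\n")))  # ['a', 'b']
```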
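Reading from a string
- loads(s)
Deserialize and return semantic representations from string s.
- Parameters:
s – string containing serialized semantic representations
- Return type:
list (of subclasses of delphin.sembase.SemanticStructure)
Decoding from a string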
- decode(s)
Deserialize and return the semantic representation from string s.
- Parameters:
s – string containing a serialized semantic representation
- Return type:
subclass of delphin.sembase.SemanticStructure
Serialization Functions
The serialization functions dump(), dumps(), and encode() take
semantic representations as input and either return a string or write
to a file or stream. Both dump() and dumps() provide the appropriate
HEADER, JOINER, and FOOTER values to make the result a valid document.
The encode() function serializes only a single semantic
representation. This is useful when working with single
representations, and also when headers and footers are not desired
(e.g., if you want the dmrx representation of a DMRS without
<dmrs-list> and </dmrs-list> surrounding it).
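The difference between dumps() and encode() is just the document wrapping. The toy functions below use a made-up XML-like format loosely modeled on dmrx (they are not PyDelphin code) to show that dumps() adds HEADER, JOINER, and FOOTER around what encode() produces for each item.

```python
HEADER = "<dmrs-list>"
JOINER = ""
FOOTER = "</dmrs-list>"


def encode(x, properties=True, lnk=True, indent=False):
    """Serialize one item with no surrounding header or footer (toy format).

    properties, lnk, and indent are accepted but ignored here, which the
    API allows: codecs keep the arguments even when they do not use them.
    """
    return f"<dmrs top={x!r}/>"


def dumps(xs, properties=True, lnk=True, indent=False):
    """Serialize many items as a complete document."""
    body = JOINER.join(
        encode(x, properties=properties, lnk=lnk, indent=indent) for x in xs
    )
    return HEADER + body + FOOTER


print(encode("h0"))          # <dmrs top='h0'/>
print(dumps(["h0", "h1"]))   # <dmrs-list><dmrs top='h0'/><dmrs top='h1'/></dmrs-list>
```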
Writing to a file or stream
- dump(xs, destination, properties=True, lnk=True, indent=False, encoding='utf-8')
Serialize semantic representations in xs to destination.
- Parameters:
xs – iterable of SemanticStructure objects to serialize
destination – path-like object or file object where data will be written
properties (bool) – if False, suppress morphosemantic properties
lnk (bool) – if False, suppress surface alignments and strings
indent – if True or an integer value, add newlines and indentation; some codecs may support an integer value for indent, which specifies how many columns to indent
encoding (str) – if destination is a filename, write to the file with the given encoding; otherwise it is ignored
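Writing to a string
- dumps(xs, properties=True, lnk=True, indent=False)
Serialize semantic representations in xs and return the resulting string.
- Parameters:
xs – iterable of SemanticStructure objects to serialize
properties (bool) – if False, suppress morphosemantic properties
lnk (bool) – if False, suppress surface alignments and strings
indent – if True or an integer value, add newlines and indentation
- Return type:
str
Encoding to a string
- encode(x, properties=True, lnk=True, indent=False)
Serialize the single semantic representation x and return the resulting string.
- Parameters:
x – SemanticStructure object to serialize
properties (bool) – if False, suppress morphosemantic properties
lnk (bool) – if False, suppress surface alignments and strings
indent – if True or an integer value, add newlines and indentation
- Return type:
str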
Variations
All serialization codecs should use the function signatures above, but
some variations are possible. Codecs should not remove any positional
or keyword arguments from functions, but they may ignore them. If any
new positional arguments are added, they should appear after the last
existing positional argument of the function and before the keyword
arguments. New keyword arguments may be added in any order. Finally, a
codec may omit some functions entirely, as export-only codecs do by not
providing load(), loads(), or decode(). The module constants HEADER,
JOINER, and FOOTER are also optional. Here are some examples of
variations in PyDelphin:
- delphin.codecs.indexedmrs requires a semi positional argument.
- delphin.codecs.mrsjson, delphin.codecs.dmrsjson, and delphin.codecs.edsjson introduce to_dict() and from_dict() functions in their public API, as they may be generally useful.
- delphin.codecs.dmrspenman and delphin.codecs.edspenman introduce to_triples() and from_triples() functions in their public API.
- delphin.codecs.eds allows a show_status keyword argument to turn on graph connectedness markers on serialization.
- delphin.codecs.mrsprolog and delphin.codecs.dmrstikz are export-only codecs and do not provide load(), loads(), or decode() functions.
- delphin.ace is an import-only codec and does not provide dump(), dumps(), or encode() functions.
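Because codecs may omit functions, callers that work with arbitrary codecs can check which half of the API a module provides before using it. The codec_directions() helper below is a hypothetical sketch built on hasattr, not part of PyDelphin; the stand-in modules only mimic the export-only and import-only cases listed above.

```python
import types

READERS = ("load", "loads", "decode")
WRITERS = ("dump", "dumps", "encode")


def codec_directions(module):
    """Report whether a codec module can read, write, or both."""
    return {
        "import": any(hasattr(module, name) for name in READERS),
        "export": any(hasattr(module, name) for name in WRITERS),
    }


def _stub(*args, **kwargs):
    raise NotImplementedError


# Export-only stand-in (in the spirit of mrsprolog or dmrstikz).
export_only = types.ModuleType("export_only")
export_only.dump = export_only.dumps = export_only.encode = _stub

# Import-only stand-in (in the spirit of the delphin.ace entry above).
import_only = types.ModuleType("import_only")
import_only.load = import_only.loads = import_only.decode = _stub

print(codec_directions(export_only))  # {'import': False, 'export': True}
print(codec_directions(import_only))  # {'import': True, 'export': False}
```

Using any() rather than all() reflects that a codec may provide only some of the three functions in each direction.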