delphin.lnk
Surface alignment for semantic entities.
In DELPH-IN semantic representations, entities are aligned to the input surface string through so-called “lnk” (pronounced “link”) values. There are four types of lnk values, which align to the surface in different ways:
Character spans (also called “characterization pointers”); e.g.,
<0:4>
Token indices; e.g.,
<0 1 3>
Chart vertex spans; e.g.,
<0#2>
Edge identifier; e.g.,
<@42>
The latter two are unlikely to be encountered by users. Chart vertices were used by the PET parser but are now essentially deprecated, and edge identifiers are only used internally in the LKB for generation. I will therefore focus on the first two kinds.
Character spans (sometimes called “characterization pointers”) are by
far the most commonly used type—possibly even the only type most
users will encounter. These spans indicate the positions between
characters in the input string that correspond to a semantic entity,
similar to how Python and Perl do string indexing. For example,
<0:4>
would capture the first through fourth characters—a span
that would correspond to the first word in a sentence like “Dogs
bark”. These spans assume the input is a flat, or linear, string and
can only select contiguous chunks. Character spans are used by REPP
(the Regular Expression PreProcessor; see delphin.repp) to
track the surface alignment prior to string changes introduced by
tokenization.
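Because character spans mark positions between characters, they behave like Python slice bounds. The following is a minimal sketch using plain string slicing only; it makes no PyDelphin calls, and the variable names are chosen purely for illustration.

```python
# Character spans mark positions *between* characters, so they map
# directly onto Python slice bounds.
sentence = "Dogs bark"

cfrom, cto = 0, 4                      # the span written as <0:4>
first_word = sentence[cfrom:cto]

cfrom2, cto2 = 5, 9                    # the span written as <5:9>
second_word = sentence[cfrom2:cto2]

print(first_word)   # Dogs
print(second_word)  # bark
```

The space at position 4 falls outside both spans, which is why the end pointer of one word and the start pointer of the next differ by one.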
Token indices select input tokens rather than characters. This method, though not widely used, is more suitable for input sources that are not flat strings (e.g., a lattice of automatic speech recognition (ASR) hypotheses), or where non-contiguous sequences are needed (e.g., from input containing markup or other noise).
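To illustrate why token indices can express non-contiguous alignments, here is a small sketch in plain Python. The token list and the helper `select_tokens` are hypothetical and not part of PyDelphin; they only show how a value like `<0 2 4>` could pick tokens while skipping markup.

```python
# Tokens from some upstream tokenizer (assumed for illustration); the
# markup tokens are noise that a semantic entity should skip over.
tokens = ["Dogs", "<b>", "bark", "</b>", "loudly"]


def select_tokens(tokens, indices):
    """Return the tokens picked out by a token-index lnk such as <0 2 4>."""
    return [tokens[i] for i in indices]


selected = select_tokens(tokens, (0, 2, 4))
print(selected)  # ['Dogs', 'bark', 'loudly']
```

A single character span could not express this selection, since it can only cover one contiguous stretch of the input.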
Note
Much of this background is from comments in the LKB source code: See: http://svn.emmtee.net/trunk/lingo/lkb/src/mrs/lnk.lisp
Support for lnk values in PyDelphin is rather simple. The Lnk
class is able to parse lnk strings and model the contents for
serialization of semantic representations. In addition, semantic
entities such as DMRS Nodes and MRS EPs have cfrom and cto
attributes, which are the start and end pointers for character spans
(defaulting to -1 if a character span is not specified for the entity).
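To make the four string forms concrete, the sketch below recognizes each of them with regular expressions. It is an illustrative stand-in, not PyDelphin's own parser: the function `parse_lnk`, the kind names, and the patterns are all assumptions made for this example.

```python
import re

# Hypothetical patterns for the four lnk string forms described above.
# This is an illustrative stand-in, not PyDelphin's own parser.
_PATTERNS = [
    ("charspan", re.compile(r"^<(\d+):(\d+)>$")),
    ("chartspan", re.compile(r"^<(\d+)#(\d+)>$")),
    ("edge", re.compile(r"^<@(\d+)>$")),
    ("tokens", re.compile(r"^<(\d+(?: \d+)*)>$")),
]


def parse_lnk(s):
    """Return a (kind, data) pair for an lnk string, or raise ValueError."""
    for kind, pattern in _PATTERNS:
        m = pattern.match(s)
        if m is None:
            continue
        if kind == "edge":
            return kind, int(m.group(1))
        if kind == "tokens":
            return kind, tuple(int(t) for t in m.group(1).split())
        # charspan and chartspan both carry a (start, end) pair
        return kind, (int(m.group(1)), int(m.group(2)))
    raise ValueError(f"invalid lnk string: {s!r}")


print(parse_lnk("<0:4>"))    # ('charspan', (0, 4))
print(parse_lnk("<0 1 3>"))  # ('tokens', (0, 1, 3))
print(parse_lnk("<0#2>"))    # ('chartspan', (0, 2))
print(parse_lnk("<@42>"))    # ('edge', 42)
```

The separator character alone (`:`, `#`, `@`, or a space) is enough to tell the four forms apart, which is what keeps the notation compact.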
Classes
class delphin.lnk.Lnk(arg, data=None)
    Surface-alignment information for predications.
Lnk objects link predicates to the surface form in one of several ways, the most common of which is the character span of the original string.
Valid types and their associated data are shown in the table below.
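=============  ===================  =========
type           data                 example
=============  ===================  =========
Lnk.CHARSPAN   surface string span  (0, 5)
Lnk.CHARTSPAN  chart vertex span    (0, 5)
Lnk.TOKENS     token identifiers    (0, 1, 2)
Lnk.EDGE       edge identifier      1
=============  ===================  =========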
Parameters
    arg – Lnk type or the string representation of a Lnk
    data – alignment data (assumes arg is a Lnk type)
type
    the way the Lnk relates the semantics to the surface form
data
    the alignment data (depends on the Lnk type)
Example
>>> from delphin.lnk import Lnk
>>> Lnk('<0:5>').data
(0, 5)
>>> str(Lnk.charspan(0,5))
'<0:5>'
>>> str(Lnk.chartspan(0,5))
'<0#5>'
>>> str(Lnk.tokens([0,1,2]))
'<0 1 2>'
>>> str(Lnk.edge(1))
'<@1>'
classmethod charspan(start, end)
    Create a Lnk object for a character span.
    Parameters
        start – the initial character position (cfrom)
        end – the final character position (cto)
classmethod chartspan(start, end)
    Create a Lnk object for a chart span.
    Parameters
        start – the initial chart vertex
        end – the final chart vertex
class delphin.lnk.LnkMixin(lnk=None, surface=None)
    A mixin class for adding cfrom and cto properties on structures.
property cfrom
    The initial character position in the surface string.
    Defaults to -1 if there is no valid cfrom value.
property cto
    The final character position in the surface string.
    Defaults to -1 if there is no valid cto value.
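The defaulting behavior of cfrom and cto can be pictured with a small mixin written from scratch. This is a hedged sketch of the general pattern only: `SpanMixin`, `Node`, and the `span` argument are hypothetical names and do not reproduce the actual LnkMixin implementation or its constructor.

```python
class SpanMixin:
    """Hypothetical stand-in for a mixin exposing cfrom and cto."""

    def __init__(self, span=None):
        # span is either None or a (start, end) pair of character positions
        self._span = span

    @property
    def cfrom(self):
        # Fall back to -1 when no character span is available
        return self._span[0] if self._span is not None else -1

    @property
    def cto(self):
        # Fall back to -1 when no character span is available
        return self._span[1] if self._span is not None else -1


class Node(SpanMixin):
    """Toy semantic entity that reuses the span properties."""

    def __init__(self, predicate, span=None):
        super().__init__(span)
        self.predicate = predicate


aligned = Node("_dog_n_1", span=(0, 4))
unaligned = Node("udef_q")
print(aligned.cfrom, aligned.cto)      # 0 4
print(unaligned.cfrom, unaligned.cto)  # -1 -1
```

Returning -1 rather than raising lets calling code read both properties on any entity and test for a missing alignment with a simple comparison.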
Exceptions
exception delphin.lnk.LnkError(*args, **kwargs)
    Bases: delphin.exceptions.PyDelphinException
    Raised on invalid Lnk values or operations.