delphin.tsdb
Test Suite Database (TSDB) Primitives
Note
This module implements the basic, low-level functionality for
working with TSDB databases. For higher-level views and uses of
these databases, see delphin.itsdb. For complex queries of the
databases, see delphin.tsql.
TSDB databases are plain-text file-based relational databases
minimally consisting of a directory with a file, called relations,
containing the database’s schema (see Schemas). Every relation, or
table, in the database has its own file, which may be gzipped to
save space. The relations have a simple format with columns
delimited by @ and records delimited by newlines. This makes them
easy to inspect at the command line with standard Unix tools such as
cut and awk (but gzipped relations need to be decompressed or piped
from a tool such as zcat).
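The file layout described above is simple enough to read with the Python standard library alone. The following is a minimal sketch, not this module's implementation: it builds a throwaway directory with a plain-text item file and a gzipped parse file (the file contents and the helper name read_rows are made up for illustration) and reads each one by splitting on @. It ignores escape sequences, which a real reader must handle.

```python
import gzip
import os
import tempfile

FIELD_DELIMITER = '@'  # same character as delphin.tsdb.FIELD_DELIMITER


def read_rows(directory, name):
    """Yield each record of a relation file as a list of raw strings."""
    plain = os.path.join(directory, name)
    # Fall back to the gzipped file when no plain-text file exists.
    if os.path.exists(plain):
        fh = open(plain, encoding='utf-8')
    else:
        fh = gzip.open(plain + '.gz', 'rt', encoding='utf-8')
    with fh:
        for line in fh:
            yield line.rstrip('\n').split(FIELD_DELIMITER)


# Build a tiny illustrative database: one plain and one gzipped relation.
demo_dir = tempfile.mkdtemp()
with open(os.path.join(demo_dir, 'item'), 'w', encoding='utf-8') as f:
    f.write('10@The dog barks.\n20@Dogs bark.\n')
with gzip.open(os.path.join(demo_dir, 'parse.gz'), 'wt', encoding='utf-8') as f:
    f.write('1@10@2\n')

item_rows = list(read_rows(demo_dir, 'item'))
parse_rows = list(read_rows(demo_dir, 'parse'))
print(item_rows)
print(parse_rows)
```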
This module handles the technical details of reading and writing TSDB databases, including:
- parsing database schemas
- transparently opening either the plain-text or gzipped relations on disk, as appropriate
- escaping and unescaping reserved characters in the data
- pairing columns with their schema descriptions
- casting types (such as :integer, :date, etc.)
Additionally, this module provides very basic abstractions of
databases and relations as the Database and Relation classes,
respectively. These serve as base classes for the more featureful
delphin.itsdb.TestSuite and delphin.itsdb.Table classes, but may be
useful as they are for simple needs.
Module Constants
- delphin.tsdb.SCHEMA_FILENAME
relations – The filename for the schema.
- delphin.tsdb.FIELD_DELIMITER
@ – The character used to delimit fields (or columns) in a record.
- delphin.tsdb.TSDB_CORE_FILES
The list of files used in “skeletons”. Includes:
    item
    analysis
    phenomenon
    parameter
    set
    item-phenomenon
    item-set
- delphin.tsdb.TSDB_CODED_ATTRIBUTES
The default values of specific fields. Includes:
    i-wf = 1
    i-difficulty = 1
    polarity = -1
Fields without a special value given above get assigned one based on their datatype.
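As a rough illustration of that lookup order, the sketch below checks the coded attributes first and falls back to the datatype. The helper name default_for is hypothetical, and the datatype fallback ('-1' for :integer, an empty string otherwise) is an assumption taken from the description of format() later in this document, not from the module's source.

```python
# Values copied from the TSDB_CODED_ATTRIBUTES list above.
CODED_ATTRIBUTES = {'i-wf': 1, 'i-difficulty': 1, 'polarity': -1}


def default_for(name, datatype):
    """Return an illustrative default string for a field (hypothetical helper)."""
    if name in CODED_ATTRIBUTES:
        return str(CODED_ATTRIBUTES[name])
    # Assumed datatype fallback, following the rule documented for format().
    return '-1' if datatype == ':integer' else ''


print(default_for('i-wf', ':integer'))      # coded attribute
print(default_for('i-length', ':integer'))  # datatype fallback
print(repr(default_for('i-input', ':string')))
```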
Schemas
A TSDB database defines its schema in a file called relations.
This file contains descriptions of each relation (table) and its
fields (columns), including the datatypes and whether a column
counts as a “key”. Key columns may be used when joining relations
together. As an example, the first 9 lines of the description of
the run relation are as follows:
run:
  run-id :integer :key     # unique test run identifier
  run-comment :string      # descriptive narrative
  platform :string         # implementation platform (version)
  protocol :integer        # [incr tsdb()] protocol version
  tsdb :string             # tsdb(1) (version) used
  application :string      # application (version) used
  environment :string      # application-specific information
  grammar :string          # grammar (version) used
...
See also
See the TsdbSchemaRfc wiki for a description of the format of
relations files.
In PyDelphin, TSDB schemas are represented as dictionaries of lists
of Field objects.
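To make the dictionary-of-lists shape concrete, here is a minimal parsing sketch. It assumes only the layout visible in the excerpt above (a relation name ending in a colon, followed by field lines of the form name, datatype, optional flags, and an optional # comment); the TsdbSchemaRfc wiki remains the authority on the format. SketchField and parse_relations are hypothetical stand-ins, not the module's Field class or read_schema() function.

```python
from collections import namedtuple

# Hypothetical stand-in for delphin.tsdb.Field, for illustration only.
SketchField = namedtuple('SketchField', 'name datatype flags comment')

RELATIONS_TEXT = """\
run:
  run-id :integer :key                  # unique test run identifier
  run-comment :string                   # descriptive narrative

item:
  i-id :integer :key
  i-input :string
"""


def parse_relations(text):
    """Parse relations-file text into {relation name: [SketchField, ...]}."""
    schema = {}
    current = None
    for line in text.splitlines():
        body, _, comment = line.partition('#')
        body = body.strip()
        if not body:
            continue  # blank or comment-only line
        if body.endswith(':'):
            # A new relation starts; later field lines belong to it.
            current = schema.setdefault(body[:-1], [])
        else:
            name, datatype, *flags = body.split()
            current.append(
                SketchField(name, datatype, tuple(flags),
                            comment.strip() or None))
    return schema


schema = parse_relations(RELATIONS_TEXT)
for relation, fields in schema.items():
    print(relation, [field.name for field in fields])
```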
- class delphin.tsdb.Field(name, datatype, flags=None, comment=None)[source]
A tuple describing a column in a TSDB database relation.
- Parameters:
- delphin.tsdb.read_schema(path)[source]
Instantiate schema dict from a schema file given by path.
If path is a directory, use the relations file under path. If path is a file, use it directly as the schema’s path. Otherwise raise a TSDBSchemaError.
- delphin.tsdb.write_schema(path, schema)[source]
Serialize schema and write it to the relations file at path.
If path is a directory, write to a relations file under path, otherwise write to the file path.
- delphin.tsdb.make_field_index(fields)[source]
Create and return a mapping of field names to indices.
This mapping helps with looking up columns by their names.
- Parameters:
fields – iterable of Field objects
Examples
>>> fields = [tsdb.Field('i-id', ':integer'),
...           tsdb.Field('i-input', ':string')]
>>> tsdb.make_field_index(fields)
{'i-id': 0, 'i-input': 1}
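A mapping of this kind can be approximated with a dictionary comprehension, which then allows a column to be picked out of a record by name. This is only a sketch of the idea: SketchField is a hypothetical stand-in for Field, and the record is invented.

```python
from collections import namedtuple

# Hypothetical stand-in for delphin.tsdb.Field, for illustration only.
SketchField = namedtuple('SketchField', 'name datatype')

fields = [SketchField('i-id', ':integer'), SketchField('i-input', ':string')]

# Map each field name to its position in the relation.
index = {field.name: i for i, field in enumerate(fields)}

record = ('10', 'The dog barks.')
sentence = record[index['i-input']]
print(index, sentence)
```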
Data Operations
Character Escaping and Unescaping
- delphin.tsdb.escape(string)[source]
Replace any special characters with their TSDB escape sequences. The characters and their escape sequences are:
    @          ->  \s
    (newline)  ->  \n
    \          ->  \\
Also see unescape()
- Parameters:
string – string to escape
- Returns:
The escaped string
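The three substitutions interact: escaping a backslash produces more backslashes, and an escaped backslash followed by the letter s must not be read back as an escaped @. One way to avoid both problems is to substitute in a single pass with a regular expression, as in the sketch below. The function names are hypothetical and this is not necessarily how the module implements escape() and unescape().

```python
import re

# Reserved characters and their escape sequences, from the table above.
_ESCAPES = {'\\': '\\\\', '@': '\\s', '\n': '\\n'}
_UNESCAPES = {seq: char for char, seq in _ESCAPES.items()}


def escape_sketch(string):
    # Replace every reserved character in one pass, so backslashes added by
    # one replacement are never escaped a second time.
    return re.sub(r'[\\@\n]', lambda m: _ESCAPES[m.group(0)], string)


def unescape_sketch(string):
    # Consume whole two-character sequences from left to right, so an escaped
    # backslash followed by "s" is not misread as an escaped "@".
    return re.sub(r'\\[\\sn]', lambda m: _UNESCAPES[m.group(0)], string)


tricky = 'back\\slash @ and\nnewline'
print(escape_sketch(tricky))
print(unescape_sketch(escape_sketch(tricky)) == tricky)
```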
Record Splitting and Joining
- delphin.tsdb.split(line, fields=None)[source]
Split a raw line from a relation into a list of column values.
Decoding involves splitting the line by the field delimiter and unescaping special characters. The column value for empty fields is None. If fields is given, cast each column value into its datatype, otherwise the value is returned as a string.
- Parameters:
line – raw line from a TSDB relation file.
fields – iterable of Field objects
- Returns:
A list of column values.
- delphin.tsdb.join(values, fields=None)[source]
Join a list of column values into a string for a relation file.
Encoding involves escaping special characters for each value, then joining the values into a single string with the field delimiter. If fields is given, None values will be replaced with the default value for their datatype.
For creating a record from a mapping of column names to values, see make_record().
- Parameters:
values – list of column values
fields – iterable of Field objects
- Returns:
A TSDB-encoded string
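Splitting and joining are mirror images when no field information is supplied. The sketch below covers only that untyped case and is not the module's code: the function names are hypothetical, and it assumes that None is written as an empty column when fields is not given.

```python
import re

_ESC = {'\\': '\\\\', '@': '\\s', '\n': '\\n'}
_UNESC = {seq: char for char, seq in _ESC.items()}


def split_sketch(line):
    """Split a raw relation line into values; empty columns become None."""
    columns = line.rstrip('\n').split('@')
    return [
        re.sub(r'\\[\\sn]', lambda m: _UNESC[m.group(0)], col) if col else None
        for col in columns
    ]


def join_sketch(values):
    """Join values into a raw line; None becomes an empty column (assumed)."""
    return '@'.join(
        '' if value is None
        else re.sub(r'[\\@\n]', lambda m: _ESC[m.group(0)], str(value))
        for value in values
    )


print(split_sketch('10@@a\\sb\n'))
print(join_sketch([10, None, 'a@b']))
```

Because every delimiter inside a value is escaped before joining, splitting on @ can never cut a value in two.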
- delphin.tsdb.make_record(colmap, fields)[source]
Create a record tuple from a mapping of column names to values.
This function is useful when colmap is either a subset or superset of the columns defined for a relation (as determined by fields). That is, it selects the relevant column values and fills in the missing ones with None. fields is also responsible for determining the column order.
- Parameters:
colmap – mapping of column names to values
fields – iterable of Field objects
- Returns:
A tuple of column values
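The selection and ordering behavior described above can be sketched in one line with dict.get(). SketchField and make_record_sketch are hypothetical names used only for this illustration, and the column values are invented.

```python
from collections import namedtuple

# Hypothetical stand-in for delphin.tsdb.Field, for illustration only.
SketchField = namedtuple('SketchField', 'name datatype')


def make_record_sketch(colmap, fields):
    # The field list fixes the column order; dict.get() yields None for
    # columns missing from colmap, and keys not in fields are ignored.
    return tuple(colmap.get(field.name) for field in fields)


fields = [SketchField('i-id', ':integer'),
          SketchField('i-input', ':string'),
          SketchField('i-wf', ':integer')]
record = make_record_sketch(
    {'i-input': 'Dogs bark.', 'i-id': 10, 'extra': 1}, fields)
print(record)
```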
Datatype Conversion
- delphin.tsdb.cast(datatype, raw_value)[source]
Cast TSDB field raw_value into datatype.
If raw_value is None or an empty string (''), None will be returned, regardless of the datatype. However, when datatype is :integer and raw_value is '-1' (the default value for most :integer columns), -1 is returned instead of None. This means that cast() is the inverse of format() except for integer values of -1, some date formats, and coded defaults.
Supported datatypes:
    TSDB datatype    Python type
    :integer         int
    :string          str
    :float           float
    :date            datetime.datetime
Casting the :integer, :string, and :float types is trivial, but for :date TSDB uses a non-standard date format. This format generally follows the DD-MM-YY pattern, optionally followed by a time (with no timezone or UTC-offset allowed). The day of the month may be left unspecified, in which case 01 is used. Years may be 2 or 4 digits: in the case of 2-digit years, 19 is prepended if the 2-digit year is greater than or equal to 93 (the year of the first TSNLP publications and the earliest test suites), otherwise 20 is prepended (meaning that users are advised to start using 4-digit years by, at least, the year 2093). In addition, the more universal YYYY-MM-DD format is allowed, but it must have 4-digit years (to disambiguate with the other pattern).
Examples
>>> tsdb.cast(':integer', '15')
15
>>> tsdb.cast(':float', '2.05e-3')
0.00205
>>> tsdb.cast(':string', 'Abrams slept.')
'Abrams slept.'
>>> tsdb.cast(':date', '10-6-2002')
datetime.datetime(2002, 6, 10, 0, 0)
>>> tsdb.cast(':date', '8-sep-1999')
datetime.datetime(1999, 9, 8, 0, 0)
>>> tsdb.cast(':date', 'apr-95')
datetime.datetime(1995, 4, 1, 0, 0)
>>> tsdb.cast(':date', '01-dec-02 (15:31:01)')
datetime.datetime(2002, 12, 1, 15, 31, 1)
>>> tsdb.cast(':date', '2008-10-12 10:51')
datetime.datetime(2008, 10, 12, 10, 51)
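The 2-digit year rule is the least obvious part of date casting, so here it is in isolation. This sketch covers only the year expansion described above (a pivot at 93); expand_year is a hypothetical helper, and parsing of the day, month, and time is left out.

```python
def expand_year(year_text):
    """Expand a 2-digit year around the pivot 93; keep 4-digit years as-is."""
    year = int(year_text)
    if len(year_text) == 2:
        # 93-99 belong to the 1900s, 00-92 to the 2000s.
        year += 1900 if year >= 93 else 2000
    return year


for text in ('95', '93', '92', '02', '1999'):
    print(text, expand_year(text))
```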
- delphin.tsdb.format(datatype, value, default=None)[source]
Format a column value based on its field.
If value is None then default is returned if it is given (i.e., not None). If default is None, '-1' is returned if datatype is ':integer', otherwise an empty string ('') is returned.
If datatype is ':date' and value is a datetime.datetime object then a TSDB-compatible date format (DD-MM-YYYY) is returned.
In all other cases, value is cast directly to a string and returned.
Examples
>>> tsdb.format(':integer', 42)
'42'
>>> tsdb.format(':integer', None)
'-1'
>>> tsdb.format(':integer', None, default='1')
'1'
>>> tsdb.format(':date', datetime.datetime(1999,9,8))
'8-sep-1999'
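The last example suggests that dates are written with an unpadded day and a lowercase, abbreviated month name. The sketch below reproduces that one output shape. It is inferred from that single example only, the helper name is hypothetical, and time-of-day handling is omitted, so treat it as an approximation and not the module's formatting code.

```python
import datetime

_MONTHS = ('jan', 'feb', 'mar', 'apr', 'may', 'jun',
           'jul', 'aug', 'sep', 'oct', 'nov', 'dec')


def format_date_sketch(value):
    """Render a date as day-month-year with an abbreviated month name."""
    return f'{value.day}-{_MONTHS[value.month - 1]}-{value.year:04d}'


formatted = format_date_sketch(datetime.datetime(1999, 9, 8))
print(formatted)
```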
File and Directory Operations
Paths
- delphin.tsdb.is_database_directory(path)[source]
Return True if path is a valid TSDB database directory.
A path is a valid database directory if it is a directory containing a schema file. This is a simple test; the schema file itself is not checked for validity.
- delphin.tsdb.get_path(dir, name)[source]
Determine if the file path should end in .gz or not and return it.
A .gz path is preferred only if it exists and is newer than any regular text file path.
- Parameters:
dir – TSDB database directory
name – name of a file in the database
- Raises:
TSDBError – when neither the .gz nor the text file exist.
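The preference rule can be sketched with modification times from the standard library. This is an illustration under stated assumptions, not the module's code: choose_path is a hypothetical name, "newer" is taken to mean a later modification time, and FileNotFoundError stands in for TSDBError so that the example has no dependencies.

```python
import os
import tempfile


def choose_path(directory, name):
    """Return the .gz path only if it exists and is newer than the text file."""
    plain = os.path.join(directory, name)
    gzipped = plain + '.gz'
    has_plain, has_gz = os.path.isfile(plain), os.path.isfile(gzipped)
    if not (has_plain or has_gz):
        # Stand-in for the TSDBError raised by get_path().
        raise FileNotFoundError(f'no {name} or {name}.gz in {directory}')
    if has_gz and (not has_plain
                   or os.path.getmtime(gzipped) > os.path.getmtime(plain)):
        return gzipped
    return plain


demo_dir = tempfile.mkdtemp()
plain_path = os.path.join(demo_dir, 'item')
gz_path = plain_path + '.gz'
for path in (plain_path, gz_path):
    open(path, 'w').close()

os.utime(plain_path, (1000, 1000))  # older plain-text file
os.utime(gz_path, (2000, 2000))     # newer gzipped file
newer_gz_choice = choose_path(demo_dir, 'item')

os.utime(plain_path, (3000, 3000))  # now the plain-text file is newer
newer_plain_choice = choose_path(demo_dir, 'item')
print(newer_gz_choice, newer_plain_choice)
```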
Relation File Access
- delphin.tsdb.open(dir, name, encoding=None)[source]
Open a TSDB database file.
Unlike a normal open() call, this function takes a base directory dir and a filename name and determines whether the plain text dir/name or compressed dir/name.gz file is opened. Furthermore, this function only opens files in read-only text mode. For writing database files, see write().
- Parameters:
dir – path to the database directory
name – name of the file to open
encoding – character encoding of the file
Example
>>> sentences = []
>>> with tsdb.open('my-profile', 'item') as item:
...     for line in item:
...         sentences.append(tsdb.split(line)[6])
- delphin.tsdb.write(dir, name, records, fields=None, append=False, gzip=False, encoding='utf-8')[source]
Write records to relation name in the database at dir.
The simplest way to write data to a file would be something like the following:
>>> with open(os.path.join(db.path, 'item'), 'w') as fh:
...     print('\n'.join(map(tsdb.join, db['item'])), file=fh)
This function improves on that method by doing the following:
- Determining the path from the gzip parameter and existing files
- Writing plain text or compressed data, as appropriate
- Appending or overwriting data, as requested
- Using the schema information to format fields
- Writing to a temporary file then copying when done; this prevents accidental data loss when overwriting a file that is being read
- Deleting any alternative (compressed or plain text) file to avoid having inconsistent files (e.g., delete any existing item when writing item.gz)
Note that append cannot be used with gzip or with an existing gzipped file and in such a case a NotImplementedError will be raised. This may be allowed in the future, but as appending to a gzipped file (in general) results in inefficient compression, it is better to append to plain text and compress when done.
- Parameters:
dir – path to the database directory
name – name of the relation to write
records – iterable of records to write
fields – iterable of Field objects, optional if dir points to an existing test suite directory
append – if True, append to rather than overwrite the file
gzip – if True and the file is not empty, compress the file with gzip; if False, do not compress
encoding – character encoding of the file
Example
>>> tsdb.write('my-profile',
...            'item',
...            item_records,
...            schema['item'])
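The temporary-file step matters most when the records being written are read lazily from the very file being overwritten. The sketch below shows that idea with the standard library only; safe_write_lines is a hypothetical helper, the data is invented, and the real function additionally handles gzip, appending, and field formatting.

```python
import os
import shutil
import tempfile


def safe_write_lines(path, lines, encoding='utf-8'):
    """Write lines to a temporary file, then copy it over path."""
    with tempfile.NamedTemporaryFile(
            'w', encoding=encoding, suffix='.tmp', delete=False) as tmp:
        for line in lines:
            tmp.write(line + '\n')
    try:
        # The destination is touched only after all lines were produced.
        shutil.copyfile(tmp.name, path)
    finally:
        os.remove(tmp.name)


demo_dir = tempfile.mkdtemp()
item_path = os.path.join(demo_dir, 'item')
safe_write_lines(item_path, ['10@The dog barks.', '20@Dogs bark.'])

# Rewrite the file from a lazy reader over that same file.
with open(item_path, encoding='utf-8') as fh:
    safe_write_lines(item_path, (line.rstrip('\n').upper() for line in fh))

with open(item_path, encoding='utf-8') as fh:
    result = fh.read()
print(result)
```

Opening the destination directly in write mode would truncate it before the lazy reader had produced any lines, which is the data loss the list above refers to.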
Database Directories
- delphin.tsdb.initialize_database(path, schema, files=False)[source]
Initialize a bare database directory at path.
Initialization creates the directory at path if it does not exist, writes the schema, and deletes any existing files defined by the schema.
Warning
If path points to an existing directory, all relation files defined by the schema will be overwritten or deleted.
- Parameters:
path – the path to the destination database directory
schema – the destination database schema
files – if True, create an empty file for every relation in schema
- delphin.tsdb.write_database(db, path, names=None, schema=None, gzip=False, encoding='utf-8')[source]
Write TSDB database db to path.
If path is an existing file (not a directory), a TSDBError is raised. If path is an existing directory, the files for all relations in the destination schema will be cleared. Every relation name in names must exist in the destination schema. If schema is given (even if it is the same as for db), every record will be remade (using make_record()) using the schema, and columns may be dropped or None values inserted as necessary, but no more sophisticated changes will be made.
Warning
If path points to an existing directory, all relation files defined by the schema will be overwritten or deleted.
- Parameters:
db – Database containing data to write
path – the path to the destination database directory
names – list of names of relations to write; if None use all relations in the destination schema
schema – the destination database schema; if None use the schema of db
gzip – if True, compress all non-empty files; if False, do not compress
encoding – character encoding for the database files
Basic Database Class
- class delphin.tsdb.Database(path, autocast=False, encoding='utf-8')[source]
A basic abstraction of a TSDB database.
This class manages basic access into a TSDB database by loading its schema and allowing for named access to relation data.
Warning
Named access to relation data returns a generator iterator of an open file. Calling generator.close() or using an idiom like contextlib.closing() ensures that the file descriptor gets closed.
- Parameters:
path – path to the database directory
autocast – if True, automatically cast column values to their datatypes
encoding – character encoding of the database files
Example
>>> db = tsdb.Database('my-profile')
>>> items = db['item']
>>> first_record = next(items)
>>> items.close()
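The contextlib.closing() idiom mentioned in the warning can be tried on any generator that holds a file open. The sketch below uses a hypothetical iter_records generator and an invented item file in place of a real Database, so it shows the idiom itself and not this class's internals.

```python
import contextlib
import os
import tempfile


def iter_records(path):
    """Yield split records; the file stays open until the generator ends."""
    with open(path, encoding='utf-8') as fh:
        for line in fh:
            yield line.rstrip('\n').split('@')


demo_dir = tempfile.mkdtemp()
item_path = os.path.join(demo_dir, 'item')
with open(item_path, 'w', encoding='utf-8') as f:
    f.write('10@The dog barks.\n20@Dogs bark.\n')

# closing() calls records.close() on exit, which raises GeneratorExit inside
# the generator and lets its "with" block close the file.
with contextlib.closing(iter_records(item_path)) as records:
    first_record = next(records)

remaining = list(records)  # a closed generator yields nothing further
print(first_record, remaining)
```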
- schema
The schema for the database.
- autocast
Whether to automatically cast column values to their datatypes.
- encoding
The character encoding of database files.
- property path
The database directory’s path.
Exceptions
- exception delphin.tsdb.TSDBSchemaError(*args, **kwargs)[source]
Bases: TSDBError
Raised when there is an error processing a TSDB schema.
- exception delphin.tsdb.TSDBError(*args, **kwargs)[source]
Bases: PyDelphinException
Raised when encountering invalid TSDB databases.
- exception delphin.tsdb.TSDBWarning(*args, **kwargs)[source]
Bases: PyDelphinWarning
Raised when encountering possibly invalid TSDB data.