Pete's Alley - Data and Metadata

Written by Rich Morin.


Path:  Areas/Content/Overviews

Precis:  how Pete's Alley uses data and metadata

In Pete’s Alley, the distinction between data and metadata is less a sharp line than a spectrum. At one end is body text, such as the paragraph you’re currently reading. For ease of reading and editing, it is encoded as Markdown, with some minor extensions. At the other end are the file ID, keys for other items, and the like. In between lies a range of sub-trees (e.g., address, meta) whose content may be interpreted as either data or metadata.

This information is stored in a flexible, if somewhat unusual, data structure. The input data takes the form of a shallow directory tree of TOML files (e.g., Areas/Content/Overviews/PA_Metadata/main.toml). Each file contains a tree of hash maps, using text strings as both keys and leaf nodes.
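The "strings all the way down" shape described above is easy to check mechanically. The Python sketch below is not Pete's Alley's own code; it assumes a TOML file has already been parsed into nested dicts (e.g., by a TOML library), and the sample keys other than address and meta (id, main, city) are hypothetical.

```python
from typing import Any


def is_string_tree(node: Any) -> bool:
    """Return True if `node` is a string leaf, or a dict whose keys are
    strings and whose values are themselves string trees."""
    if isinstance(node, str):
        return True
    if isinstance(node, dict):
        return all(
            isinstance(key, str) and is_string_tree(value)
            for key, value in node.items()
        )
    return False


# Hypothetical parsed contents of a main.toml file. Only "address", "meta",
# and "tags" come from the text; "id", "main", and "city" are illustrative.
parsed = {
    "id": "example-0001",
    "main": "Body text, encoded as *Markdown*.",
    "address": {"city": "Example City"},
    "meta": {"tags": {"impairments": "blindness"}},
}

print(is_string_tree(parsed))         # True
print(is_string_tree({"count": 42}))  # False: integer leaf
```

A check like this could run at load time to catch a stray number or boolean in a hand-edited file.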

At load time, the file tree is flattened into a single hash (items) whose keys are the relative path names of the TOML files. Each file’s content, with minor additions and changes, is stored as the corresponding item’s value. For example, as a nod toward efficiency, the item’s hash keys are converted to symbols. Some metadata is also harvested and added to the item. Finally, a set of inverted indexes is created, allowing items to be located rapidly by tag and type.
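The flattening step might look roughly like the following sketch. It assumes the files have already been read and parsed into a map from relative path to nested dicts. Python has no symbol type, so `sys.intern` stands in for the key-to-symbol conversion, and the harvested `_dir` field is a hypothetical example of added metadata.

```python
import posixpath
import sys
from typing import Any, Dict


def intern_keys(node: Any) -> Any:
    """Return a copy of a dict tree whose keys are interned strings.

    Python has no symbol type; sys.intern is a rough stand-in for
    converting hash keys to symbols.
    """
    if isinstance(node, dict):
        return {sys.intern(key): intern_keys(value) for key, value in node.items()}
    return node


def load_items(files: Dict[str, Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Flatten parsed TOML files into one `items` map keyed by relative path."""
    items: Dict[str, Dict[str, Any]] = {}
    for rel_path, content in files.items():
        item = intern_keys(content)  # new dict; the parsed input is not mutated
        # Hypothetical harvested metadata: the directory holding the file.
        item["_dir"] = posixpath.dirname(rel_path)
        items[rel_path] = item
    return items


files = {
    "Areas/Content/Overviews/PA_Metadata/main.toml": {
        "meta": {"tags": {"impairments": "blindness"}},
    },
}
items = load_items(files)
print(items["Areas/Content/Overviews/PA_Metadata/main.toml"]["_dir"])
# Areas/Content/Overviews/PA_Metadata
```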

Most of an item’s metadata resides in its meta sub-tree. The meta.tags hash, in particular, stores sets of tag values (e.g., blindness) under a limited number of tag types (e.g., impairments). Our search code uses these to support complex queries, combining query results by set intersection and union.
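A minimal sketch of a tag index and the set operations it enables follows. It assumes that, after loading, each meta.tags entry maps a tag type to a Python set of values; the index layout (keyed by a (type, value) pair) is one plausible choice, not necessarily the one Pete's Alley uses. Apart from impairments and blindness, the tag types and values below are invented for illustration.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple

Index = Dict[Tuple[str, str], Set[str]]


def build_tag_index(items: Dict[str, dict]) -> Index:
    """Map each (tag type, tag value) pair to the keys of items carrying it."""
    index: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for key, item in items.items():
        tags = item.get("meta", {}).get("tags", {})
        for tag_type, values in tags.items():
            for value in values:  # assumes values is a set of strings
                index[(tag_type, value)].add(key)
    return dict(index)


def lookup(index: Index, tag_type: str, value: str) -> Set[str]:
    """Return a fresh set of item keys, so callers can't mutate the index."""
    return set(index.get((tag_type, value), ()))


# Hypothetical items; "deafness", "topics", and "reading" are invented examples.
items = {
    "a/main.toml": {"meta": {"tags": {"impairments": {"blindness"}}}},
    "b/main.toml": {"meta": {"tags": {"impairments": {"blindness", "deafness"},
                                      "topics": {"reading"}}}},
    "c/main.toml": {"meta": {"tags": {"impairments": {"deafness"}}}},
}
index = build_tag_index(items)

blind = lookup(index, "impairments", "blindness")
deaf = lookup(index, "impairments", "deafness")
both = blind & deaf    # intersection: items tagged with both values
either = blind | deaf  # union: items tagged with at least one value
print(sorted(both))    # ['b/main.toml']
```

Because each lookup yields a plain set, arbitrarily nested queries reduce to chains of `&`, `|`, and `-` on those sets.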

Graph Databases

In effect, we are using collections of hashes and functions to implement a small subset of the capabilities offered by a graph database such as Neo4j. We expect to experiment with Neo4j as development continues, but for now this informal approach provides all the flexibility and performance we need.
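To make the graph analogy concrete, the sketch below treats items as nodes and the keys they hold for other items as edges, then runs a bounded breadth-first search. This is roughly what a variable-length pattern such as `MATCH (a)-[*1..2]->(b)` expresses in Neo4j's Cypher. The meta.refs field (assumed to be a list of item keys after loading) and the short item keys are hypothetical.

```python
from collections import deque
from typing import Dict, List, Set


def neighbors(items: Dict[str, dict], key: str) -> List[str]:
    """Keys of items that `key` refers to, via a hypothetical meta.refs list."""
    return list(items.get(key, {}).get("meta", {}).get("refs", []))


def reachable(items: Dict[str, dict], start: str, max_depth: int) -> Set[str]:
    """Breadth-first search: keys reachable from `start` in at most
    `max_depth` hops, excluding `start` itself."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        key, depth = frontier.popleft()
        if depth == max_depth:
            continue  # don't expand past the hop limit
        for nxt in neighbors(items, key):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    seen.discard(start)
    return seen


items = {
    "a": {"meta": {"refs": ["b"]}},
    "b": {"meta": {"refs": ["c"]}},
    "c": {"meta": {"refs": ["a"]}},
}
print(sorted(reachable(items, "a", 1)))  # ['b']
print(sorted(reachable(items, "a", 2)))  # ['b', 'c']
```

The `seen` set keeps cycles (here, c refers back to a) from causing infinite loops, one of the chores a graph database would otherwise handle for us.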

To be continued…