Pete's Alley - Extensions

Written by Rich Morin.

Precis:  syntax extensions (and tweaks) used by Pete's Alley

Pete’s Alley uses some minor extensions (and tweaks) to Markdown and TOML syntax. This HowTo page attempts to cover all of the differences; please let me know if you run into anything I haven’t covered.

URL Shorthand

URL shortening is a common theme in this page, so let’s handle that topic first. When URLs get long and complicated, they become difficult to remember and unwieldy to edit. So, we support several forms of shorthand notation. Mostly, this involves expansion of common prefixes, but we also play around a bit with suffixes.

Note: The prefix configuration file (_config/prefix.toml) contains definitive information on globally supported prefixes.

External URLs

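We support a simple form of shorthand notation for frequently referenced external sites. This has the form ext_foo|bar and results in the expansion of common prefixes. For example, ext_gh|foo/bar expands to https://github.com/foo/bar, and ext_wp|HTML expands to https://en.wikipedia.org/wiki/HTML.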
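To make the prefix expansion concrete, here is a minimal Python sketch. It is an illustration only, not the code Pete's Alley actually runs: the PREFIXES table and the expand_external name are assumptions, with the two prefixes inferred from the link examples later on this page. The definitive list lives in _config/prefix.toml.

```python
# Hypothetical prefix table; the real one is defined in _config/prefix.toml.
PREFIXES = {
    "ext_gh": "https://github.com/",
    "ext_wp": "https://en.wikipedia.org/wiki/",
}

def expand_external(shorthand):
    """Expand 'ext_foo|bar' shorthand; return other strings unchanged."""
    prefix, sep, rest = shorthand.partition("|")
    if sep and prefix in PREFIXES:
        return PREFIXES[prefix] + rest
    return shorthand

print(expand_external("ext_gh|foo/bar"))   # https://github.com/foo/bar
print(expand_external("ext_wp|HTML"))      # https://en.wikipedia.org/wiki/HTML
```

Strings without a known prefix (e.g., a full URL) pass through untouched, so shorthand and ordinary URLs can be mixed freely.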

Internal URLs

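We support a more nuanced form of shorthand notation for internal URLs. This notation has the form foo_bar|baz; it results in the expansion of common prefixes and the addition of type-specific suffixes. For example, cat_gro|LHBVI expands to /item?key=Areas/.../LHBVI/main.toml, while cat_gro|LHBVI:s expands to /source?key=Areas/.../LHBVI/main.toml.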

Markdown

The original Markdown syntax, as defined by John Gruber, generally serves us quite well as a markup language. We use Earmark (written by Dave Thomas) to transform this syntax into HTML. However, the flavor of Markdown used in our TOML files differs in some minor respects from the text we send to Earmark. With luck, most of these additions and changes will make sense to you.

Includes

Neither the original Markdown nor CommonMark has any syntax for file inclusion. Looking about on the web, I’ve found a couple of proposed extensions:

{% include_relative <filepath> %}   # Jekyll
[!include[<title>](<filepath>)]     # DocFX Flavored Markdown

I’ve decided to support a variation on the Jekyll syntax in which most file paths are relative to the including file, but paths starting with PA_toml/ are special-cased.
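As a rough illustration of the path handling, here is a Python sketch. Several things in it are assumptions rather than documented behavior: it uses the Jekyll directive verbatim (our variation may differ in detail), the resolve_include name is made up, and it guesses that PA_toml/ paths are resolved against a fixed root directory (the toml_root value is a placeholder).

```python
import re
from pathlib import PurePosixPath

# Assumed directive syntax, copied from Jekyll: {% include_relative <filepath> %}
INCLUDE_RE = re.compile(r"\{%\s*include_relative\s+(\S+)\s*%\}")

def resolve_include(directive, including_file, toml_root="/srv/PA_toml"):
    """Return the path named by an include directive, or None if no match.

    toml_root is a hypothetical location for the PA_toml tree.
    """
    match = INCLUDE_RE.fullmatch(directive.strip())
    if match is None:
        return None
    path = match.group(1)
    if path.startswith("PA_toml/"):
        # Special case (assumed): resolve against the TOML tree's root.
        return str(PurePosixPath(toml_root) / path[len("PA_toml/"):])
    # Default case: resolve relative to the including file's directory.
    return str(PurePosixPath(including_file).parent / path)
```

For instance, a directive naming notes/a.md inside Areas/Content/main.toml would resolve to Areas/Content/notes/a.md under these assumptions.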

Links

Markdown’s syntax for hyperlinks requires both link text and a URL. A title attribute can also be added to explain the function of the link:

[<text>](<URL>)
[<text>](<URL> '<title>')

Although this syntax is shorter and simpler than the corresponding HTML, it can still be tedious and error-prone to edit manually. As discussed above, many URLs (such as ours!) are long enough to be awkward, particularly on a width-limited display. We also want most of our links to use title attributes, because these work well with screen readers. However, adding these attributes manually is a nuisance and can lead to inconsistencies.

Extended Links

Extending Markdown’s link syntax would be tricky for us and possibly confusing for our users, so we leave it alone. Instead, we support a very similar syntax which we can easily recognize and convert to vanilla Markdown. The basic format is as follows:

[<text>]{<URL shorthand>}

If we want to display the URL as the link text, it would be silly to have to enter it twice. So, we support a $url shorthand for this case:

[$url]{<URL shorthand>}

Examples

Here are some examples of extended links, followed by the generated Markdown links. Careful Reader will recognize the use of URL shorthand, as described above:

[Foo Bar]{ext_gh|foo/bar}
[Foo Bar](https://github.com/foo/bar)

[HTML]{ext_wp|HTML}
[HTML](https://en.wikipedia.org/wiki/HTML)

[$url]{http://www.foo.com}
[http://www.foo.com](http://www.foo.com)

[$url]{ext_gh|foo/bar}
[https://github.com/foo/bar](https://github.com/foo/bar)

[Catalog]{cat|:a}
[Catalog](/area?key=Areas/Catalog/_area.toml)

[LHBVI]{cat_gro|LHBVI}
[LHBVI](/item?key=Areas/.../LHBVI/main.toml)

[LHBVI]{cat_gro|LHBVI:s}
[LHBVI](/source?key=Areas/.../LHBVI/main.toml)

[About]{_text/about.toml}
[About](/text?key=_text/about.toml)

[About]{_text/about.toml:s}
[About](/source?key=_text/about.toml)

Spacing

To ease formatting, we remove leading and trailing spaces from the link text and all spaces from the generated URL. This, along with HTML’s tolerance of internal spaces in link text, allows the use of almost any desired spacing, including absurd syntax such as:

[
  One of Those Ridiculously Long Titles that Some Folks Use
  for Blog Posts and Such
]{
  http://www.foo.com/blog?id=123456789876543210/
  one_of_those_ridiculously_long_urls_that_some_folks_use_
  for_blog_posts_and_such.html
}
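The conversion from extended links to vanilla Markdown, including the spacing cleanup and the $url shorthand, might be sketched in Python as follows. This is not our production code: the regular expression, the function names, and the two-entry prefix table are assumptions made for illustration, and internal URL shorthand is left out entirely.

```python
import re

# Hypothetical prefix table; the real one is defined in _config/prefix.toml.
PREFIXES = {
    "ext_gh": "https://github.com/",
    "ext_wp": "https://en.wikipedia.org/wiki/",
}

# Matches [<text>]{<URL shorthand>}; both parts may span several lines.
EXTENDED_LINK = re.compile(r"\[([^\]]*)\]\{([^}]*)\}")

def expand_url(shorthand):
    """Expand a known external prefix; return other strings unchanged."""
    prefix, sep, rest = shorthand.partition("|")
    if sep and prefix in PREFIXES:
        return PREFIXES[prefix] + rest
    return shorthand

def convert_links(markdown):
    """Rewrite extended links as ordinary Markdown links."""
    def replace(match):
        text = match.group(1).strip()                # trim the link text
        url = re.sub(r"\s+", "", match.group(2))     # drop all URL spacing
        url = expand_url(url)
        if text == "$url":                           # show the URL itself
            text = url
        return f"[{text}]({url})"
    return EXTENDED_LINK.sub(replace, markdown)

print(convert_links("[ Foo Bar ]{ ext_gh|foo/bar }"))
# [Foo Bar](https://github.com/foo/bar)
```

Because ordinary Markdown links use parentheses rather than braces, the pattern leaves them alone; only the `]{` form is rewritten.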

Line Breaks

When a Markdown processor encounters two or more trailing spaces on an input line, it is supposed to add a “hard break” (i.e., an HTML br tag). John Gruber explains the rationale for this behavior in this note. Although I understand his motivation, I think the decision was unfortunate. So, our code removes any trailing spaces it encounters; if you want to add a hard break, use an HTML br tag (e.g., <br>).
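Stripping the trailing spaces before the text reaches the Markdown processor takes only a line or two. Here is a minimal Python sketch of the idea; the function name is an assumption, and the sketch also strips trailing tabs, which goes slightly beyond what is described above.

```python
def strip_trailing_spaces(text):
    """Remove trailing spaces and tabs from every line of the input."""
    return "\n".join(line.rstrip(" \t") for line in text.split("\n"))

print(repr(strip_trailing_spaces("one  \ntwo\n")))   # 'one\ntwo\n'
```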

Discussion

Languages that change their behavior based on invisible syntax differences are (IMNSHO) setting the user up for failure. Specifically, they lead the user into “white space” errors which are typically invisible when displayed or printed. In addition, the spaces may not even be reported by a screen reader. Here are some examples I’ve encountered over the last four decades:

TOML

TOML is a simple and concise language which is well tuned to our needs. So, the only extension we support is the expansion of certain values, for brevity and editing convenience.

CSV Expansion

Certain of our TOML fields may contain multiple values. We use a simplified version of comma-separated value syntax for this. Here’s a (fairly ridiculous) example, for discussion:

' , foo,  , bar  , baz,  '

At runtime, we expand these comma-separated strings into lists of strings, trimming off any leading or trailing white space. We then discard any empty fields:

[ "foo", "bar", "baz" ]
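The expansion just described (split on commas, trim, discard empties) can be sketched in a few lines of Python. The function name is an assumption; this illustrates the rule and is not our actual code.

```python
def expand_csv(value):
    """Split on commas, trim white space, and discard empty fields."""
    fields = (field.strip() for field in value.split(","))
    return [field for field in fields if field]

print(expand_csv(" , foo,  , bar  , baz,  "))   # ['foo', 'bar', 'baz']
```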

It is often convenient to enter lists of lengthy strings (e.g., URLs), each on its own line. Because empty fields are discarded, we can have a comma on the last line. This regularity helps to prevent editing errors when lines are added or removed:

http://bar.com,
http://baz.com,

It is extremely unusual (and bad practice, in general) for list items to require an embedded comma. Usually, there is a simple workaround, but in some cases there is no real alternative. For example, a URL can legally contain a comma. In this event, escape the comma by preceding it with a pair of backslashes:

https://wiki.openstreetmap.org/wiki/Proposed_features/sidewalk_schema,
https://wiki.openstreetmap.org/wiki/Seattle\\,_Washington/Sidewalk_Import,
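Handling the escape takes a little more care than a plain split. The Python sketch below assumes that the string reaching the expander still contains two literal backslash characters before the comma; whether that holds depends on the kind of TOML string used, so treat it as an assumption. The function name is also made up.

```python
import re

# Split only on commas that are NOT preceded by two backslashes.
UNESCAPED_COMMA = re.compile(r"(?<!\\\\),")

def expand_csv_escaped(value):
    """Like plain CSV expansion, but honors commas escaped with two backslashes."""
    result = []
    for field in UNESCAPED_COMMA.split(value):
        # Replace each escaped comma with a bare comma, then trim.
        field = field.replace("\\\\,", ",").strip()
        if field:
            result.append(field)
    return result
```

Given the two-line example above, this would yield two URLs, the second containing Seattle,_Washington with a bare comma.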

URL Expansion

We also expand external URLs, as discussed above. This is done on values which are expected to contain (single or multiple) URLs. So, for example:

wikipedia   = 'ext_wp|Linux'

would be expanded to:

wikipedia   = 'https://en.wikipedia.org/wiki/Linux'

In addition, we do one level of symbol substitution, using names in the same section of the file. We then remove any definitions whose name starts with an underscore. So, for example:

_1          = 'https://en.wikipedia.org/'
linux       = '_1|Linux'
naming      = 'linux|#Naming'

would be converted to:

linux       = 'https://en.wikipedia.org/Linux'
naming      = 'https://en.wikipedia.org/Linux#Naming'
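One way to get this behavior is to walk the section in file order, substituting any name that has already been resolved, and then drop the underscore-prefixed helpers. The Python sketch below makes that ordering assumption explicit; it is a guess at the mechanism, not a description of our code, and the function name is hypothetical.

```python
def expand_section(section):
    """Substitute 'name|rest' values using earlier names in the same section.

    Assumes entries are processed in file order, so a value may refer to
    any name defined above it.
    """
    resolved = {}
    for name, value in section.items():
        prefix, sep, rest = value.partition("|")
        if sep and prefix in resolved:
            value = resolved[prefix] + rest
        resolved[name] = value
    # Remove helper definitions whose names start with an underscore.
    return {k: v for k, v in resolved.items() if not k.startswith("_")}

section = {
    "_1": "https://en.wikipedia.org/",
    "linux": "_1|Linux",
    "naming": "linux|#Naming",
}
print(expand_section(section))
```

Run on the example above, this produces the linux and naming entries shown, with _1 removed.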