About

BaseML is for writers. As an ultra-light version of CommonMark and John Gruber's original Markdown, it diverges from Markdown's “overriding design goal ... to make [the source] as readable as possible” in favor of making it as fast as possible to learn, write, organize, parse, render, and transfer.

For Writers

Inspired by modern writing, social media, and self-publishing services, BaseML is ideal for note taking, writing articles, blogging, and building conversational user interfaces. In fact, BaseML, rendered or raw, can usually be cut and pasted directly into another editor with little or no change.

Gruber emphasized readability of Markdown for understandable reasons. But open just about any Markdown file today and you will see things like soft-wrapped lines, use of # for headers, and other least-difficulty-to-write decisions by those who write it. Markdown has evolved into the syntax of knowledge source. Knowledge coders (aka writers) like a clean syntax that is, above all, quick to write, easy to understand, and fast to parse. This allows their code to be rendered and published in the most places possible.

Unfortunately most Markdown writers are also coders. As one would expect, they have taken the pure simplicity of readable Markdown and bastardized it with extensions into virtually unreadable source code. Gruber would probably shake his head at this (which might explain his lack of involvement in CommonMark). Some files today have been so overly extended and adapted that they no longer resemble anything close to Markdown.

BaseML seeks to strike a happy medium by giving up some source readability for writing efficiencies and streaming without throwing readability completely out the window.

Transitive Conversion

Perhaps the biggest reason to consider BaseML is the notion of transitive conversion, which means going back and forth between rendered and raw BaseML.

Any rendered BaseML, on GitHub for example, can be copied and pasted directly into blogging sites without a problem.

Transitive conversion example animated GIF

The same is true in reverse. Most blog content that is cut and pasted from rendered HTML can be easily converted to BaseML for use elsewhere.

The applications of this are many. Most importantly this allows the core content to be maintained as a GitHub repository and simply copied by dragging and dropping. This is common when a writer wants to maintain his or her own content but enjoy the benefits of posting to a specific blogging site or community.

A Modern Markdown for an Internet of Things

BaseML is great when you don't want to bloat your application with a full CommonMark parsing and rendering engine, for example, when developing anything that does not need a full web pane to render, as with Qt or GTK+. This opens up many possibilities for things like

  • light, open reader apps,
  • modern writing standards,
  • an essential web of signed, light-weight content, or even
  • a decentralized knowledge net with no web dependency at all.

For traditional (everything-including-the-kitchen-sink) publishing, CommonMark and Pandoc are recommended.

One Best Way

BaseML gets out of the way and lets you create. This seems to be one large motivator for the original Markdown, which was created by a writer and podcaster. BaseML, however, sacrifices redundancy for simplicity. There is only one way to do something in BaseML, which reduces the cognitive overhead and allows you to focus on your content and message. Since the simplifications are based on well-established best practices, you need not worry.

💬 This constraint is exactly the reason behind the success of Medium.

Down, Not Up

Current Markdown parsers such as markdown-it and CommonMark are floating away from the original intent of Markdown. They are going up in complexity, extension, complication, and size. (The CommonMark JavaScript parser is 38KB gzipped.) The original intent of Markdown was to take these things down and away from complicated markup made by academics, scientists, and programmers for academics, scientists, and programmers. BaseML is our response to this problem, a lightweight markup for the least tech savvy among us that does only what we need and nothing more.

JAMstack

Modern web development favors a JAMstack approach to make best use of content delivery networks and offline-first design. M is for "Markup" but could equally mean "Markdown" since most web content starts out written in it. Because BaseML is the fastest way to write Markdown, it is also the quickest way to create JAMstack sites and applications. In fact, with as little as a single README.md file and VuePress you instantly have a progressive web app that is ready for JAMstack deployment.

Inline Parsing, Streamable

Every other Markdown flavor requires at least two passes through the entire document to parse properly. BaseML elements, by contrast, have no interdependency on one another (such as with reference-based links), meaning parsing can be streamed and processed reliably with parse-event-driven handler callbacks (like SAX) to produce a stream of nodes that can be rendered or piped immediately. This opens possibilities heretofore impossible with existing Markdown formats.

Streamed parsing requires a fraction of the memory required by other parsers. The parser only needs enough memory to hold the currently open nodes, meaning parsers can be implemented on very small devices, potentially even streaming to a single-row LED display.
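As a rough illustration of event-driven, single-pass parsing, here is a minimal Python sketch. It assumes a hypothetical two-rule toy grammar (a line starting with `# ` is a heading, a blank line closes a paragraph), not the actual BaseML grammar, and the name `stream_events` and the event tuples are invented for this example. The point is only that the parser keeps one flag of state between lines and never looks back.

```python
from typing import Iterable, Iterator, Tuple

Event = Tuple[str, str]  # (event name, payload)


def stream_events(lines: Iterable[str]) -> Iterator[Event]:
    """Yield SAX-like events one line at a time, in a single pass.

    Hypothetical toy grammar, NOT the BaseML specification:
    a line starting with "# " is a heading, any other non-blank line
    is paragraph text, and a blank line closes an open paragraph.
    """
    in_paragraph = False  # the only state carried between lines
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("# "):
            if in_paragraph:
                yield ("end", "paragraph")
                in_paragraph = False
            yield ("start", "heading")
            yield ("text", line[2:])
            yield ("end", "heading")
        elif line.strip() == "":
            if in_paragraph:
                yield ("end", "paragraph")
                in_paragraph = False
        else:
            if not in_paragraph:
                yield ("start", "paragraph")
                in_paragraph = True
            yield ("text", line)
    if in_paragraph:
        yield ("end", "paragraph")  # close whatever is still open
```

Because `stream_events` is a generator, a renderer can consume each event as soon as it is produced, for example by iterating over an open file or socket line by line, without ever holding the whole document in memory.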

Other parsers can be added using a render pipeline model (such as to LaTeX/MathJax, web components, VSCode extensions, or HTML rendering).

When combined with BaseML's front matter, several BaseML documents can be reliably streamed over a single open network connection.
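One way this could work is sketched below in Python. It rests on assumptions that are not taken from the BaseML specification: that front matter is fenced by `---` lines (as in common YAML front matter), that every document opens with such a block, and that a bare `---` line never occurs inside a body. The function name `document_events` and the event names are hypothetical.

```python
from typing import Iterable, Iterator, Tuple


def document_events(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Mark document boundaries in one continuous stream of lines.

    Assumption for illustration only: each document begins with front
    matter fenced by "---" lines, so an opening fence seen while in a
    body means the previous document has ended.
    """
    state = "idle"  # "idle" -> "front" -> "body" -> "front" -> ...
    for raw in lines:
        line = raw.rstrip("\n")
        if line == "---":
            if state == "front":
                state = "body"  # closing fence: the body follows
            else:
                if state == "body":
                    yield ("end-document", "")
                yield ("start-document", "")
                state = "front"  # opening fence of the next document
        elif state == "front":
            yield ("meta", line)
        elif state == "body":
            yield ("body", line)
        # lines seen before the first fence are ignored
    if state == "body":
        yield ("end-document", "")
```

A receiver can hand each `body` line to a streaming parser as it arrives and start a fresh parser on every `start-document` event, so the connection never needs to be closed between documents.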

Standardization

The BaseML specification is to be submitted to the IETF and other standards bodies. While Leonard's IETF submissions clarify the dilemma with Markdown, no version of Markdown itself has yet been submitted. This is not surprising given the monumental amount of work needed just to arrive at the CommonMark consensus. The time is right for a light-weight, minimal version of Markdown to be submitted. The ability to stream BaseML makes it particularly well-suited for an IETF standard as more Internet content is streamed over open network connections. No other existing Markdown version allows this.

Consistency

Minimal blogging services with constraints on formatting have successfully demonstrated that readers and writers of all types prefer a consistent, predictable format to allow the focus to be on the content, the writing. The cognitive overhead to learn and process a different presentation unnecessarily distracts writers from their writing. This makes BaseML particularly useful when paired with frameworks like VuePress for documentation and educational content and distribution.

No Regular Expression Parser Dependency

Regular expressions are great tools, but they are horrible when built into a production parser. The size of the regex library alone puts most parsers that depend on one out of range for many small devices with parsers written in C. Therefore, even though the entire BaseML specification can be represented as a single complex regular expression, parsers should never use regular expressions for anything other than experimentation.
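To show what a regex-free approach can look like, here is a small hand-written scanner in Python. The inline rule it handles (a pair of `*` characters wraps an emphasized span) is a hypothetical example and not a statement of BaseML syntax, and `scan_inline` is an invented name. It uses only index arithmetic and `str.find`, which plays the role `strchr` or `memchr` would play in a C implementation.

```python
from typing import Iterator, Tuple


def scan_inline(text: str) -> Iterator[Tuple[str, str]]:
    """Split text into ("text", s) and ("emph", s) spans without regex.

    Hypothetical rule for illustration: a pair of "*" characters wraps
    an emphasized span. An unmatched "*" is kept as literal text.
    """
    start = 0  # beginning of the pending plain-text run
    i = 0
    n = len(text)
    while i < n:
        if text[i] == "*":
            close = text.find("*", i + 1)
            if close == -1:
                break  # no closing delimiter: the rest is plain text
            if i > start:
                yield ("text", text[start:i])
            yield ("emph", text[i + 1:close])
            i = close + 1
            start = i
        else:
            i += 1
    if start < n:
        yield ("text", text[start:])
```

For instance, `list(scan_inline("a *b* c"))` gives `[("text", "a "), ("emph", "b"), ("text", " c")]`, while a string with a single unmatched `*` comes back as one plain-text span.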

Suggested Semantic Macrostructure

Semantic macrostructure is the idea that the structure of the content itself provides meaning and levels of importance. No MIME headers, properties, JSON, hashtags, or extra keywords and categories are needed. The content itself provides this. When additional meta information is needed, front matter can be used.

💬 Semantic macrostructure cannot be gamed by those looking to manipulate an SEO ranking without being completely obvious to anyone reading the content.

Here is a suggested relevance ranking for any BaseML document. The primary and secondary data can be read and ranked before any other parsing even takes place.

Primary:

  • title

Secondary:

  • summary (first paragraph)

Tertiary:

  • headings

General:

  • image alt text
  • list content
  • paragraphs
  • blocks

Ancillary:

  • subdocs (blockquotes)
  • fenced content
  • link addresses

Links can also be easily identified and cataloged as they are parsed, providing a collection of all outbound links that can be checked for validity or extracted into an index.

This set structure allows creators of BaseML search engines to easily provide users the ability to set their own search priorities without any guesswork about how things are ranked.
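The suggested ranking can be expressed as a simple lookup table, sketched here in Python. The tier numbers follow the list above (primary through ancillary), but the node-type labels such as `image_alt` and `link_address`, the function name `rank_matches`, and the `overrides` parameter are hypothetical names chosen for this example.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Tier per node type, following the suggested ranking
# (lower number = more relevant). Labels are illustrative only.
DEFAULT_TIERS: Dict[str, int] = {
    "title": 0,                                   # primary
    "summary": 1,                                 # secondary
    "heading": 2,                                 # tertiary
    "image_alt": 3, "list": 3, "paragraph": 3, "block": 3,  # general
    "subdoc": 4, "fenced": 4, "link_address": 4,  # ancillary
}
UNKNOWN_TIER = 5  # anything not listed ranks last


def rank_matches(
    matches: Iterable[Tuple[str, str]],
    overrides: Optional[Dict[str, int]] = None,
) -> List[Tuple[str, str]]:
    """Order (node_type, text) search hits by tier.

    `overrides` lets a user replace the default tier of any node type.
    sorted() is stable, so hits in the same tier keep document order.
    """
    tiers = {**DEFAULT_TIERS, **(overrides or {})}
    return sorted(matches, key=lambda m: tiers.get(m[0], UNKNOWN_TIER))
```

A user who cares most about code samples could call `rank_matches(hits, overrides={"fenced": 0})` to lift fenced content to the top tier while leaving every other priority visible and unchanged.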

Last Updated: 1/1/2019, 8:09:34 PM