Specification

BaseML is a simplification of CommonMark, inspired by Medium and Discord, and designed above all for fast learning, writing, and parsing.

Version

BaseML uses semantic versioning to track the specification.

Version v0.26.0

Version 1.0 will be submitted to the IETF and other standards bodies. Please submit any questions or suggestions as issues.

ABNF

document    = [ frontmatter LF ]
              *( ( subdoc / block ) LF )

frontmatter = "---" LF      ; must be at doc start
              *( *rune LF ) ; optional, unparsed
              "---" LF      ; not greedy

subdoc      = 1*( sub-blank / sub-block ) LF
sub-blank   = ">" LF
sub-block   = "> " block

block       = heading / paragraph / list ; text
            / image / imagelink          ; images
            / pre / fenced               ; raw
            / separator

heading     = h1 / h2 / h3 / h4 / h5 / h6
paragraph   = 1*( text / link ) LF
list        = ( list-u / list-o / llist-u / llist-o ) LF
image       = image-src LF
imagelink   = "[" image-src "](" 1*<!rparen> ")" LF
pre         = 1*( 4SP *rune LF ) LF
fenced      = fenced-b / fenced-t
separator   = "----" LF

h1          = "# " 1*text LF
h2          = "## " 1*text LF
h3          = "### " 1*text LF
h4          = "#### " 1*text LF
h5          = "##### " 1*text LF
h6          = "###### " 1*text LF

list-u      = 1*item-u
list-o      = 1*item-o
item-u      = "* " paragraph
item-o      = "1. " paragraph

llist-u     = 1*item-u-l
llist-o     = 1*item-o-l
item-u-l    = item-u LF <peek:"* ">   ; loose
item-o-l    = item-o LF <peek:"1. ">

image-src   = "![" plain-nz "]" ; alt text required
              "(" 1*<!rparen> ")" ; path, URL, IRI

fenced-b    = "```" [ plain-nz ] LF ; info string
              *( *rune LF )         ; code
              "```" LF              ; not greedy

fenced-t    = "~~~" [ plain-nz ] LF ; same but ~s
              *( *rune LF )
              "~~~" LF

text        = bold-italic / bold / italic / code / plain-nz-sq
bold-italic = "***" plain-nz-sq "***"
bold        = "**" plain-nz-sq "**"
italic      = "*" plain-nz-sq "*"
code        = "`" plain-nz "`"

smartq      = smdq-l / smdq-r / smsq-l / smsq-r / smsq-c
smdq-l      = " " dq <vrune!dq> ; "o
smdq-r      = <vrune!dq> dq " " ; o"
smsq-l      = " " sq <vrune!sq> ; 'o
smsq-r      = <vrune!sq> sq " " ; o'
smsq-c      = <vrune!sq> sq <vrune!sq> ; n't

link        = autolink / marklink

marklink    = "[" 1*text "](" 1*<!rparen> ")" 
autolink    = "<" scheme ":" 1*<!gt> ">"
scheme      = ALPHA 1*31( ALPHA / DIGIT / "+" / "." / "-" )

plain-nz-sq = linebreak / smartq / plain-nz
plain-nz    = vrune *plain
plain       = *( rune / collapse-sp )
collapse-sp = 1*" "

linebreak   = 2SP LF

rune        = vrune / " "
vrune       = %x21-7E / %xA1-167F / %x1681-1FFF
            / %x200B-2027 / %x202A-202E / %x2030-205E
            / %x2060-2FFF / %x3001-D7FF / %xF900-FDCF
            / %xFDF0-FFFD / %x10000-1FFFD / %x20000-2FFFD
            / %x30000-3FFFD / %x40000-4FFFD / %x50000-5FFFD
            / %x60000-6FFFD / %x70000-7FFFD / %x80000-8FFFD
            / %x90000-9FFFD / %xA0000-AFFFD / %xB0000-BFFFD
            / %xC0000-CFFFD / %xD0000-DFFFD / %xE0000-EFFFD

rparen      = ")"
gt          = ">"
dq          = %x22 ; " (double quote)
sq          = %x27 ; ' (single quote)
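
The `rune` and `vrune` rules are plain code point range checks. As a rough illustration only, they could be sketched as follows. The names `VRUNE_RANGES`, `isVrune`, and `isRune` are hypothetical and not part of the specification.

```javascript
// Ranges copied from the vrune rule; each pair is [first, last] inclusive.
const VRUNE_RANGES = [
  [0x21, 0x7e], [0xa1, 0x167f], [0x1681, 0x1fff],
  [0x200b, 0x2027], [0x202a, 0x202e], [0x2030, 0x205e],
  [0x2060, 0x2fff], [0x3001, 0xd7ff], [0xf900, 0xfdcf],
  [0xfdf0, 0xfffd]
]
// Supplementary planes 1 through 14: %x10000-1FFFD ... %xE0000-EFFFD
for (let plane = 1; plane <= 0xe; plane++) {
  VRUNE_RANGES.push([plane * 0x10000, plane * 0x10000 + 0xfffd])
}

// True when the first code point of ch is a visible rune.
function isVrune (ch) {
  const cp = ch.codePointAt(0)
  return VRUNE_RANGES.some(([lo, hi]) => cp >= lo && cp <= hi)
}

// A rune is a visible rune or a plain space.
function isRune (ch) {
  return ch === ' ' || isVrune(ch)
}
```

Note that tabs and the no-break space (U+00A0) fall outside every range, so under this sketch they are neither runes nor visible runes.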

Files

CommonMark and Markdown "do not specify an encoding." BaseML does:

  • must be UTF-8 encoded
  • emojis and such written directly (no escaping)
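
One way a tool might enforce the UTF-8 requirement is to decode the raw bytes strictly and reject anything malformed. This is only a sketch: `decodeBaseML` is a hypothetical helper name, and it relies on the standard `TextDecoder` with its `fatal` option.

```javascript
// Hypothetical helper: returns the decoded text, or throws a TypeError
// when the bytes are not valid UTF-8.
function decodeBaseML (bytes) {
  const decoder = new TextDecoder('utf-8', { fatal: true })
  return decoder.decode(bytes)
}
```

Emoji written directly in the file decode like any other character, so no escaping layer is needed.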

These naming conventions are not required but are common best practice:

  • .md suffix
  • ./<title-slug>/README.md primary
  • ./<title-slug>.md secondary
  • ./<any>.md permitted

Whitespace and Lines

  • line feed (\n) line endings only
  • two spaces plus line return is a hard break
  • no leading spaces allowed (other than the four-space indent of preformatted blocks)
  • multiple blank lines forbidden
  • only spaces (no tabs)
  • multiple spaces collapsed
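
Several of these rules can be checked or applied mechanically. The following is a minimal sketch, not a complete linter. `checkWhitespace` and `collapseSpaces` are hypothetical names, and the leading-space rule is left out because preformatted blocks legitimately begin with four spaces.

```javascript
// Hypothetical lint helper: returns a list of human-readable problems.
function checkWhitespace (source) {
  const problems = []
  if (source.includes('\r')) problems.push('carriage return found; use \\n only')
  if (source.includes('\t')) problems.push('tab found; use spaces only')
  if (/\n\n\n/.test(source)) problems.push('multiple blank lines found')
  return problems
}

// Collapses runs of spaces inside one line of text, leaving a trailing
// two-space hard break marker untouched.
function collapseSpaces (line) {
  const hardBreak = /[^ ] {2}$/.test(line)
  const body = hardBreak ? line.slice(0, -2) : line
  return body.replace(/ {2,}/g, ' ') + (hardBreak ? '  ' : '')
}
```

For example, `checkWhitespace('a\r\nb')` reports one problem, and `collapseSpaces('a   b')` returns `'a b'`.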

Headings

# Heading Level One

## Heading Level Two

### Heading Level Three

#### Heading Level Four

##### Heading Level Five

###### Heading Level Six

  • only ATX (#)
  • bold, italic, bold-italic, code, and plain text only
  • automatically converted to local links (slugs)
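
Because only ATX headings exist, recognizing one comes down to counting the leading `#` characters. A minimal sketch, assuming the hypothetical helper name `parseHeading` and leaving inline formatting of the heading text unparsed:

```javascript
// Returns { level, text } for a heading line, or null for anything else.
// One to six # characters, exactly one space, then non-empty text.
function parseHeading (line) {
  const match = /^(#{1,6}) (\S.*)$/.exec(line)
  if (!match) return null
  return { level: match[1].length, text: match[2] }
}
```

Lines with seven or more `#` characters, with leading spaces, or with no space after the marker are not headings under this sketch.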

Slugs

## A 😁 Smiley Section Header

... back in the [smiley](#a-😁-smiley-section-header) section ...

  • IRI and Unicode friendly
  • first heading slug becomes recommended IRI address
  • renderers must provide local heading links as slugs

Although regular expressions are discouraged in all parser implementations in favor of a proper scanner and parser, here is an IRI-compliant slug generation function:

const removeDiacritics = require('diacritics').remove
// eslint-disable-next-line no-control-regex
const rControl = /[\u0000-\u001f]/g
const rSpecial = /[\s~`!@#$%^&*()\-_+=[\]{}|\\;:"'<>,.?/]+/g

module.exports = function slugify (str) {
  return removeDiacritics(str)
    // Remove control characters
    .replace(rControl, '')
    // Replace special characters
    .replace(rSpecial, '-')
    // Remove continuous separators
    .replace(/\-{2,}/g, '-')
    // Remove prefixing and trailing separators
    .replace(/^\-+|\-+$/g, '')
    // lowercase
    .toLowerCase()
}

💢 Many popular slug generators are not IRI compliant, causing them to fail when used internationally.

Formatted Text

*one star for italics.*

**two stars for bold.**

***three stars for bold italics.***

`backticks for code.`

*do* ***this*** *instead of mixing*

  • never mixed

Links

<https://baseml.soilsrc.org>
<mailto:foo@bar.example.com>
<tel:555-555-5555>
[text](link/)
[text](./link/)
[text](/link/)
[external](https://link/...)
[email](mailto:foo@bar.example.com)

[![alt](image.gif)](https://somewhere)

  • inline only []() (no footnotes)
  • no link titles
  • anywhere in a paragraph (including list items)
  • images can be used as hyperlinks
  • always use < and > for autolinks
  • no inferred email autolinks, use mailto: instead
  • no link conversion (.md to .html), link to foo/ instead
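
To make the autolink rules concrete, here is a small sketch that tells autolinks and inline links apart. It reads the `scheme` rule as a letter followed by 1 to 31 further scheme characters. `AUTOLINK`, `MARKLINK`, and `parseLink` are hypothetical names. Regular expressions are used only for brevity, and image links and formatted link text are not handled.

```javascript
// <scheme:rest> where scheme is a letter plus 1 to 31 scheme characters.
const AUTOLINK = /^<([A-Za-z][A-Za-z0-9+.-]{1,31}):([^>]+)>$/
// [text](target) with no right parenthesis inside the target.
const MARKLINK = /^\[([^\]]+)\]\(([^)]+)\)$/

function parseLink (source) {
  let m = AUTOLINK.exec(source)
  if (m) {
    return { kind: 'autolink', scheme: m[1].toLowerCase(), target: m[1] + ':' + m[2] }
  }
  m = MARKLINK.exec(source)
  if (m) return { kind: 'marklink', text: m[1], target: m[2] }
  return null
}
```

Since a scheme and colon are required, `<foo@bar.example.com>` and HTML-like text such as `<b>` are not autolinks here and fall through as ordinary text.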

Images

![alt](/assets/image.png)

![alt](image.gif)

![alt](./image.jpg)

  • images must be in their own paragraph
  • no parentheses allowed in image source address
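
The two requirements above, along with the required alt text from the grammar, can be illustrated with a short sketch. `parseImage` is a hypothetical name, and the function expects a single line that holds nothing but the image.

```javascript
// Returns { alt, src } for an image line, or null when the line is not a
// valid image: alt text is required and the source may not contain
// parentheses.
function parseImage (line) {
  const m = /^!\[([^\]]+)\]\(([^()]+)\)$/.exec(line)
  return m ? { alt: m[1], src: m[2] } : null
}
```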

These other recommendations are not required but should be carefully considered:

  • raster images preferred
  • landscape preferred
  • large: 1366x768px limited
  • preferred: 683x384px
  • animated GIF allowed
  • 12 MB per image preferred size
  • local links might need ./
  • remote image links strongly discouraged

AI starter files: large, preferred

Video

Watch [this video](./linked.mp4).

[![An image from the video](./image.jpg)](https://youtube.com/...)

Embedded video has never been supported by any Markdown format. The long-standing best practice is to link an image from the video to a video IRI, local or remote. The following are simply considerations when using video:

  • local preferred for offline-first
  • only MP4 supported, suffix required: .mp4
  • can link from image to simulate embedding
  • large: 1366x768px (when detail needed)
  • preferred: 683x384px
  • secondary to written content

Audio

Listen to [this](./linked.mp3) when you can.

Embedded audio has never been supported by any Markdown format. The long-standing best practice is to link text or an image to an IRI pointing to a sound resource, local or remote. The following are simply considerations when using audio:

  • just links to sound assets, local or remote
  • local preferred for offline-first
  • only MP3 supported, suffix required: .mp3
  • secondary to written content

Preformatted, Code Blocks

    Roses are red,
    Violets are blue.
    This isn't code,
    But who asked you?

```
Roses are red,
Violets are blue.
This isn't code,
But who asked you?
```

```js
console.log('code block')
```

~~~md
Also a code block since sometimes you need ```
~~~

  • only three back ticks or tildes
  • info string is all characters after fence
  • first word from info string is language identifier
  • no whitespace before identifier
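
The fence rules above can be sketched as a small function that inspects the opening line of a fenced block. `FENCES` and `parseFenceOpen` are hypothetical names. The fence strings are built with `repeat` only so the sample does not itself contain a fence.

```javascript
// The two permitted fences: three back ticks or three tildes.
const FENCES = ['`'.repeat(3), '~'.repeat(3)]

// Returns { fence, info, language } for an opening fence line, or null.
function parseFenceOpen (line) {
  const fence = FENCES.find(f => line.startsWith(f))
  if (!fence) return null
  const info = line.slice(3) // everything after the fence
  // Reject a fourth fence character and whitespace before the identifier.
  if (info.startsWith(fence[0]) || info.startsWith(' ')) return null
  // The first word of the info string is the language identifier.
  const language = info.split(' ')[0] || null
  return { fence, info, language }
}
```

A bare fence with no info string yields `language: null`, which a renderer could treat as plain preformatted text.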

Paragraphs

Lorem *ipsum* dolor sit amet, consectetur adipiscing elit. Praesent leo massa, pretium et dui et, blandit tempus risus. Fusce lacinia non magna quis pharetra. 

Etiam fringilla <mailto:purus@non.eros> volutpat, `sed` [pharetra](/some/) ante imperdiet. Quisque tincidunt magna vitae ullamcorper ornare. Donec ut neque eu velit condimentum consequat. Nulla facilisi.

Roses are ***red***  
Violets are ***blue***

  • consist of text and links
  • contiguous line required, no line returns
  • hard break with two spaces and line return

Blockquote Sub-documents

> Something someone said.

> Lorem *ipsum* dolor sit amet, consectetur adipiscing elit. Praesent leo massa, pretium et dui et, blandit tempus risus. Fusce lacinia non magna quis pharetra.
>
> * unordered
> * list
> * here
>
> * another
> * unordered
> * list
> * here
>
> Etiam fringilla <mailto:purus@non.eros> volutpat, `sed` [pharetra](/some/) ante imperdiet. Quisque tincidunt magna vitae ullamcorper ornare. Donec ut neque eu velit condimentum consequat. Nulla facilisi.
>
> Roses are ***red***  
> Violets are ***blue***

> 💬 A block with context based on the emoji.

  • best considered as a sub-document
  • can contain any other blocks except other blockquote sub-documents
  • first unicode code point parsed as context indicator (emoji, etc.)
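
Treating a blockquote as a sub-document suggests a simple approach: strip the `> ` prefix from every line and hand the result back to the block parser. A minimal sketch, assuming the hypothetical names `unwrapSubdoc` and `contextIndicator`:

```javascript
// Removes the blockquote prefix from each line. A bare ">" becomes a blank
// line. Returns null if any line lacks the prefix.
function unwrapSubdoc (block) {
  const inner = []
  for (const line of block.split('\n')) {
    if (line === '>') inner.push('')
    else if (line.startsWith('> ')) inner.push(line.slice(2))
    else return null
  }
  return inner.join('\n')
}

// The first Unicode code point of the inner text (not the first UTF-16
// code unit), so an emoji comes back whole.
function contextIndicator (innerText) {
  return Array.from(innerText)[0] || null
}
```

Because sub-documents cannot contain other sub-documents, the unwrapped text needs only one pass through the ordinary block rules.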

Hard Breaks

PO Box 384  
Davidson, NC, 28036

  • exactly two spaces plus line return
  • only in text (including link text)
  • automatically inserts a hard break
  • does not start a new paragraph
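
As an illustration of the "exactly two spaces" rule, a paragraph could be split at its hard breaks like this. `splitHardBreaks` is a hypothetical name. The lookbehind keeps three or more trailing spaces from counting as a hard break, which is one possible reading of "exactly".

```javascript
// Splits paragraph text at hard breaks: exactly two spaces followed by a
// line feed, with no extra space before them.
function splitHardBreaks (paragraph) {
  return paragraph.split(/(?<! ) {2}\n/)
}
```

A renderer would then join the resulting pieces with its own line break element, all within the same paragraph.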

Lists

* unordered
* list
* here

1. item One
1. item Two
1. item Three

1. Lorem *ipsum* dolor sit amet, consectetur adipiscing elit. Praesent leo massa, pretium et dui et, blandit tempus risus. Fusce lacinia non magna quis pharetra.

1. Consectetur adipiscing elit. Praesent leo massa, pretium et dui et, blandit tempus risus. Fusce lacinia non magna quis pharetra.

1. Praesent leo massa, pretium et dui et, blandit tempus risus. Fusce lacinia non magna quis pharetra.

  • only * for unordered
  • only 1. for ordered ("lazy list numbering")
  • never nested, one level only
  • contain only paragraph content
  • loose lists supported (lines between items)
  • only one paragraph per loose list item (no indented paragraphs)
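
With a single marker per list type and no nesting, list recognition stays small. The sketch below, with the hypothetical name `parseList`, takes the text of one list block and reports whether it is ordered and whether it is loose. Item content is returned as unparsed paragraph text.

```javascript
// Returns { ordered, loose, items } for a list block, or null when the
// block mixes markers, nests items, or is not a list at all.
function parseList (block) {
  const lines = block.replace(/\n+$/, '').split('\n')
  const marker = lines[0].startsWith('* ')
    ? '* '
    : lines[0].startsWith('1. ') ? '1. ' : null
  if (!marker) return null
  const items = []
  let loose = false
  for (const line of lines) {
    if (line === '') { loose = true; continue } // blank line between items
    if (!line.startsWith(marker)) return null // mixed markers or nesting
    items.push(line.slice(marker.length))
  }
  return { ordered: marker === '1. ', loose, items }
}
```

Since every ordered item starts with `1.`, numbering is left entirely to the renderer.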

Separators (Horizontal Rules)

Before the separator.

----

After the separator.

  • exactly four dashes at start of line with no spaces
  • four instead of three to avoid conflict with front matter
  • must be own paragraph (blank line before and after)
  • separates without specific format inferred
  • call it a separator (not horizontal rule)

HTML/XML

Here is a <b>paragraph</b> but it has a component after it.

<SomeIgnoredComponent/>

And another paragraph.

  • ignored as just text, no parsing, passed directly on
  • no validation or security checking
  • distinguished from autolinks

YAML Front Matter

---
frontMatter: true
---

  • must be at the beginning of the document or data stream
  • always three dashes
  • ignored as just text, no parsing, passed directly on

Other considerations:

  • YAML 1.2 expected but not parsed or validated
  • only YAML, no plans for other format support
  • placed at start of document
  • allows multiple document streaming and delineation
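
Since front matter is passed on without parsing, a processor only needs to locate its two delimiters. A minimal sketch with the hypothetical name `splitFrontMatter`. It stops at the first closing `---` line (not greedy) and never treats a four-dash separator as a delimiter.

```javascript
// Splits a document into raw front matter text and the remaining body.
// frontMatter is null when the document does not start with front matter.
function splitFrontMatter (source) {
  if (!source.startsWith('---\n')) return { frontMatter: null, body: source }
  // Find the first line that is exactly three dashes after the opener.
  const end = source.indexOf('\n---\n', 3)
  if (end === -1) return { frontMatter: null, body: source }
  return {
    frontMatter: source.slice(4, end + 1), // unparsed text, passed on as-is
    body: source.slice(end + 5)
  }
}
```

The returned `frontMatter` string can then be handed to any YAML library or ignored entirely.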

Other Template Markers

$$ dollared $$
$$$ dollar3d $$$
[[ bracketed ]]
[[[ bracket3d ]]]
{{ curlied }}
{{{ curli3d }}}

  • ignored as just text, no parsing or validation, passed directly on
  • commonly used by other parsers, MathJax, LaTeX, etc.
  • allows piping to additional renderers

Last Updated: 1/1/2019, 7:37:24 PM