CommonMark compliance

Overview

The parser provides complete support for the CommonMark 0.31.2 specification with additional support for GitHub Flavored Markdown (GFM) tables. The implementation closely follows the structure of the reference implementation.

CommonMark is a strongly specified, unambiguous syntax for Markdown. It resolves edge cases and ambiguities found in the original Markdown specification.

Implementation approach

From the README:

The implementation is inspired by various other markdown parsers, including commonmark.js, markdown-it, and marked.js. In fact, the implementation is structurally very similar to how commonmark.js goes about parsing.

The parser uses a line-by-line approach with state management for container blocks, matching the CommonMark reference implementation’s design.

Block-level elements

All CommonMark block elements are fully supported:

Leaf blocks

ATX headings (# to ######)
Setext headings (underlined)
Indented code blocks
Fenced code blocks
HTML blocks
Link reference definitions
Paragraphs
Thematic breaks

Container blocks

Block quotes
List items (ordered and unordered)
Tight and loose lists

ATX headings

Supports 1-6 levels with optional closing sequence (markdown-parser.ts:1232-1309):

function parseATXHeading(
  line: string,
): { level: 1 | 2 | 3 | 4 | 5 | 6; content: string } | null {
  // Must not be indented more than 3 spaces
  if (isIndentedCodeLine(line)) return null;
  
  line = line.trim();
  if (line.charAt(0) !== "#") return null;
  
  // Count consecutive # characters (max 6)
  let numOfOpeningHashes: 1 | 2 | 3 | 4 | 5 | 6 = 1 as 1 | 2 | 3 | 4 | 5 | 6;
  while (numOfOpeningHashes < line.length && line.charAt(numOfOpeningHashes) === "#") {
    numOfOpeningHashes++;
  }
  if (numOfOpeningHashes > 6) return null;
  
  // Must be followed by space/tab or end of line
  if (numOfOpeningHashes < line.length &&
      line.charAt(numOfOpeningHashes) !== " " &&
      line.charAt(numOfOpeningHashes) !== "\t") {
    return null;
  }
  
  // Strip optional closing sequence
  // ...
}

Examples:

# Heading 1
## Heading 2 ##
### Heading 3 ###############
####No space (not a heading)
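
The closing-sequence step is elided in the excerpt above. A minimal sketch of how it could work, assuming the text after the opening hashes has already been isolated; `stripClosingSequence` is a hypothetical helper, not a function taken from markdown-parser.ts:

```typescript
// Hypothetical helper: removes an optional closing run of # characters.
// The run only counts as a closing sequence when it is preceded by a
// space or tab, or when it makes up the entire content.
function stripClosingSequence(content: string): string {
  const trimmed = content.trim();

  // Walk backwards over any trailing # characters
  let end = trimmed.length;
  while (end > 0 && trimmed.charAt(end - 1) === "#") {
    end--;
  }

  if (end === trimmed.length) return trimmed; // no trailing # at all
  if (end === 0) return "";                   // content is only # characters

  const before = trimmed.charAt(end - 1);
  if (before === " " || before === "\t") {
    return trimmed.slice(0, end).trim();
  }
  return trimmed; // e.g. "foo#" keeps its trailing #
}
```

With this sketch, `Heading 2 ##` becomes `Heading 2`, while `foo#` is left unchanged because the trailing `#` is not preceded by whitespace.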

Fenced code blocks

Supports both backtick and tilde fences (markdown-parser.ts:882-945):

function parseCodeFenceStart(line: string): {
  indentLevel: number;
  numOfMarkers: number;
  marker: "~" | "`";
  info: string | undefined;
} | null {
  const indentColumns = getLeadingNonspaceColumn(line);
  
  // Must be indented at most 3 spaces
  if (indentColumns > 3) return null;
  
  line = line.trim();
  if (line.length < 3) return null;
  
  const marker = line.charAt(0);
  if (marker !== "~" && marker !== "`") return null;
  
  // Count markers (minimum 3)
  let numOfMarkers = 1;
  while (numOfMarkers < line.length && line.charAt(numOfMarkers) === marker) {
    numOfMarkers++;
  }
  if (numOfMarkers < 3) return null;
  
  // For backtick fences, info string cannot contain backticks
  const info = line.slice(numOfMarkers).trim();
  if (marker === "`" && info.indexOf("`") >= 0) return null;
  
  return { indentLevel: indentColumns, numOfMarkers, marker, info: info || undefined };
}
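
The excerpt covers only the opening fence. As a rough sketch of the matching step, a closing fence under the CommonMark rules uses the same marker, is at least as long as the opening fence, is indented at most 3 spaces, and carries no info string. `OpenFence` and `isClosingFence` are hypothetical names chosen for illustration, and the sketch assumes the line has no trailing newline:

```typescript
// Hypothetical shape describing the fence that opened the block
interface OpenFence {
  marker: "~" | "`";
  numOfMarkers: number;
}

function isClosingFence(line: string, open: OpenFence): boolean {
  // Count leading spaces; a leading tab never matches the marker below,
  // so tab-indented lines are rejected as well
  let indent = 0;
  while (indent < line.length && line.charAt(indent) === " ") {
    indent++;
  }
  if (indent > 3) return false;

  // Count consecutive marker characters
  const rest = line.slice(indent);
  let count = 0;
  while (count < rest.length && rest.charAt(count) === open.marker) {
    count++;
  }
  if (count < open.numOfMarkers) return false;

  // Only spaces or tabs may follow the closing fence
  return /^[ \t]*$/.test(rest.slice(count));
}
```

A longer fence closes a shorter one, but a fence followed by text such as an info string does not close anything.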

Examples:

```javascript
const x = 1;
```

~~~ruby
puts "Hello"
~~~

``` info string can have spaces
code
```

Indented code blocks

Four spaces or one tab creates a code block (markdown-parser.ts:860-863):

function isIndentedCodeLine(line: string): boolean {
  const column = getLeadingNonspaceColumn(line);
  return column >= 4;
}

Tab expansion: Tabs are expanded to the next multiple of 4 spaces (markdown-parser.ts:1067-1080):

function getLeadingNonspaceColumn(line: string): number {
  let columns = 0;
  for (let i = 0; i < line.length; i++) {
    const ch = line.charAt(i);
    if (ch === " ") {
      columns += 1;
    } else if (ch === "\t") {
      columns += 4 - (columns % 4);  // Tab stops at multiples of 4
    } else {
      break;
    }
  }
  return columns;
}

Block quotes

Lines starting with > create blockquotes (markdown-parser.ts:1167-1211):

function parseBlockquoteLine(line: string): { content: string } | null {
  if (isIndentedCodeLine(line)) return null;
  
  const firstNonspaceIndex = getFirstNonspaceIndex(line);
  
  // First non-whitespace must be >
  if (line.charAt(firstNonspaceIndex) !== ">") return null;
  
  let characterIndex = firstNonspaceIndex + 1;
  let numOfColumns = characterIndex;
  
  // Handle tabs and spaces after >
  while (characterIndex < line.length) {
    if (line.charAt(characterIndex) === "\t") {
      numOfColumns += 4 - (numOfColumns % 4);
      characterIndex++;
    } else if (line.charAt(characterIndex) === " ") {
      numOfColumns++;
      characterIndex++;
    } else {
      break;
    }
  }
  
  // Construct content, consuming optional space after >
  const content = " ".repeat(numOfColumns - firstNonspaceIndex - 1) + line.slice(characterIndex);
  if (content.charAt(0) === " ") {
    return { content: content.slice(1) };
  }
  return { content };
}

Examples:

> Single level
>
> Multiple paragraphs

> Level 1
>> Level 2
>>> Level 3

Lists

Supports both ordered and unordered lists with proper nesting (markdown-parser.ts:383-465).

Tight vs loose: determined by blank lines between items (markdown-parser.ts:198-215):

if (lastMatchedNode.type === "list-item" && lastMatchedNode.hasPendingBlankLine) {
  lastMatchedNode.parent.isTight = false;
  let node = lastMatchedNode;
  while (node !== null) {
    if (node.type === "list-item") {
      node.hasPendingBlankLine = false;
    }
    node = node.parent;
  }
}

Examples:

<!-- Tight list -->
- Item 1
- Item 2
- Item 3

<!-- Loose list -->
- Item 1

- Item 2

- Item 3

<!-- Nested -->
1. First
   - Nested bullet
   - Another
2. Second
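
The marker syntax used in these examples can be recognized roughly as follows. This is a simplified sketch, not the parser's own code: `ListMarker` and `parseListMarker` are hypothetical names, container context (nesting inside a list item or blockquote) is ignored, and a real parser has to rule out thematic breaks such as `- - -` before treating a line as a list item:

```typescript
// Hypothetical result shape for a recognized list marker
interface ListMarker {
  ordered: boolean;
  delimiter: string; // "-", "+", "*" for bullets; "." or ")" for ordered lists
  start?: number;    // first number of an ordered list
}

function parseListMarker(line: string): ListMarker | null {
  // Up to 3 spaces of indentation, then either a bullet character or
  // 1-9 digits followed by "." or ")". The marker must be followed by
  // a space, a tab, or the end of the line.
  const match = /^ {0,3}(?:([-+*])|(\d{1,9})([.)]))(?:[ \t]|$)/.exec(line);
  if (match === null) return null;

  const bullet = match[1];
  if (bullet) {
    return { ordered: false, delimiter: bullet };
  }
  return {
    ordered: true,
    delimiter: match[3] || ".",
    start: parseInt(match[2] || "0", 10),
  };
}
```

The 9-digit limit comes from the CommonMark specification; a line such as `1234567890. x` is therefore an ordinary paragraph.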

Thematic breaks

Three or more -, _, or * characters (markdown-parser.ts:1126-1165):

function isSeparator(line: string): boolean {
  // Must not be indented 4+ spaces
  if (isIndentedCodeLine(line)) return false;
  
  line = line.trim();
  const marker = line.charAt(0);
  
  // Only -, _, and * can create separators
  if (marker !== "-" && marker !== "_" && marker !== "*") return false;
  
  // Count markers (minimum 3)
  let markerCount = 1;
  for (let i = 1; i < line.length; i++) {
    const character = line.charAt(i);
    if (isSpaceOrTab(character)) continue;  // Spaces/tabs allowed
    if (character !== marker) return false;
    markerCount++;
  }
  
  return markerCount >= 3;
}

Examples:

---
***
___
- - -
* * * *

Inline-level elements

All CommonMark inline elements are fully supported:

Text formatting
Code spans
Links and images
Line breaks
HTML and entities

Emphasis and strong emphasis use the CommonMark flanking rules (inline-parser.ts:131-177):

// Left-flanking: can open emphasis
const isLeftFlanking =
  !isNextCharacterWhitespace &&
  (!isNextCharacterPunctuation ||
    isPreviousCharacterWhitespace ||
    isPreviousCharacterPunctuation);

// Right-flanking: can close emphasis
const isRightFlanking =
  !isPreviousCharacterWhitespace &&
  (!isPreviousCharacterPunctuation ||
    isNextCharacterWhitespace ||
    isNextCharacterPunctuation);
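
To see the two conditions applied to concrete input, here is a self-contained sketch. `classifyDelimiterRun` and `FlankingResult` are hypothetical names, and the sketch simplifies character classification: punctuation is limited to ASCII and whitespace is whatever `\s` matches, whereas the parser uses its Unicode-aware helpers. The start and end of the input are treated as whitespace:

```typescript
// Simplification: ASCII punctuation only
const ASCII_PUNCTUATION_REGEX = /[!-\/:-@\[-`{-~]/;

interface FlankingResult {
  isLeftFlanking: boolean;
  isRightFlanking: boolean;
}

function classifyDelimiterRun(
  input: string,
  start: number,
  length: number,
): FlankingResult {
  const end = start + length;
  // Beginning and end of input count as whitespace
  const previous = start > 0 ? input.charAt(start - 1) : " ";
  const next = end < input.length ? input.charAt(end) : " ";

  const isPreviousCharacterWhitespace = /\s/.test(previous);
  const isNextCharacterWhitespace = /\s/.test(next);
  const isPreviousCharacterPunctuation = ASCII_PUNCTUATION_REGEX.test(previous);
  const isNextCharacterPunctuation = ASCII_PUNCTUATION_REGEX.test(next);

  const isLeftFlanking =
    !isNextCharacterWhitespace &&
    (!isNextCharacterPunctuation ||
      isPreviousCharacterWhitespace ||
      isPreviousCharacterPunctuation);

  const isRightFlanking =
    !isPreviousCharacterWhitespace &&
    (!isPreviousCharacterPunctuation ||
      isNextCharacterWhitespace ||
      isNextCharacterPunctuation);

  return { isLeftFlanking, isRightFlanking };
}
```

For `*foo*`, the first `*` is left-flanking only and the last is right-flanking only; in `a * b` the `*` is neither, so it stays literal text.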

Rule of three (inline-parser.ts:489-500):

// When a delimiter can both open and close:
// If sum of run lengths is divisible by 3,
// they don't match unless both are divisible by 3
if (
  (node.canClose || closer.node.canOpen) &&
  (node.count + closer.node.count) % 3 === 0 &&
  (node.count % 3 !== 0 || closer.node.count % 3 !== 0)
) {
  continue;
}

Examples:

*italic* _italic_
**bold** __bold__
***bold italic***
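
The rule of three is easier to follow as a standalone predicate. The sketch below restates the condition from the excerpt with hypothetical names (`DelimiterRun`, `canMatch`); it is not the parser's own function:

```typescript
// Hypothetical minimal view of a delimiter run
interface DelimiterRun {
  count: number;     // number of * or _ characters in the run
  canOpen: boolean;
  canClose: boolean;
}

// Returns false when the rule of three forbids pairing opener with closer
function canMatch(opener: DelimiterRun, closer: DelimiterRun): boolean {
  const eitherCanOpenAndClose = opener.canClose || closer.canOpen;
  const sumIsMultipleOfThree = (opener.count + closer.count) % 3 === 0;
  const bothAreMultiplesOfThree =
    opener.count % 3 === 0 && closer.count % 3 === 0;

  if (eitherCanOpenAndClose && sumIsMultipleOfThree && !bothAreMultiplesOfThree) {
    return false;
  }
  return true;
}
```

In `*foo**bar*`, the middle `**` can both open and close, and pairing it with either single `*` gives a sum of 3, so it matches neither; the two outer `*` runs pair with each other and the `**` stays literal.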

Backtick-delimited inline code (inline-parser.ts:69-110):

const numOfOpeningBackticks = getNumOfConsecutiveCharacters(input, {
  characters: ["`"],
  startIndex: characterCursor,
});

// Find matching closing backticks
let closerIndex = openerIndex;
while (closerIndex < input.length) {
  closerIndex = input.indexOf("`", closerIndex);
  if (closerIndex === -1) break;
  
  const numOfClosingBackticks = getNumOfConsecutiveCharacters(input, {
    characters: ["`"],
    startIndex: closerIndex,
  });
  
  if (numOfClosingBackticks === numOfOpeningBackticks) {
    hasClosingBackticks = true;
    break;
  }
  closerIndex += numOfClosingBackticks;
}

// Strip surrounding spaces if both present
if (content[0] === " " && content[content.length - 1] === " ") {
  if (content.trim().length > 0) {
    content = content.slice(1, content.length - 1);
  }
}

Examples:

`code`
``code with ` backtick``
``` lots of backticks ```
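
The excerpt above depends on surrounding parser state. A self-contained sketch of the same matching idea follows, under these assumptions: `countBackticks` and `parseCodeSpan` are hypothetical names, line endings inside the span are converted to spaces as the CommonMark specification requires, and the function returns `null` when no closing run of equal length exists:

```typescript
// Counts consecutive backticks starting at the given index
function countBackticks(input: string, start: number): number {
  let count = 0;
  while (start + count < input.length && input.charAt(start + count) === "`") {
    count++;
  }
  return count;
}

function parseCodeSpan(
  input: string,
  start: number,
): { content: string; end: number } | null {
  const openerLength = countBackticks(input, start);
  if (openerLength === 0) return null;

  const contentStart = start + openerLength;
  let cursor = contentStart;
  while (cursor < input.length) {
    const closerIndex = input.indexOf("`", cursor);
    if (closerIndex === -1) return null;

    const closerLength = countBackticks(input, closerIndex);
    if (closerLength === openerLength) {
      let content = input.slice(contentStart, closerIndex).replace(/\n/g, " ");
      // Strip one space from each side unless the content is only spaces
      if (
        content.length >= 2 &&
        content.charAt(0) === " " &&
        content.charAt(content.length - 1) === " " &&
        content.trim().length > 0
      ) {
        content = content.slice(1, content.length - 1);
      }
      return { content, end: closerIndex + closerLength };
    }
    // Run of a different length: skip it and keep searching
    cursor = closerIndex + closerLength;
  }
  return null;
}
```

A run of a different length inside the span is skipped as content, which is why a double-backtick span can contain a single backtick.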

Inline links:

[text](url "title")
![alt](image.png)

Reference links (inline-parser.ts:223-274):

[text][ref]
[text][]
[text]

[ref]: url "title"

Autolinks (inline-parser.ts:298-332):

<http://example.com>
<user@example.com>

Link nesting prevention (inline-parser.ts:288-295):

// Links cannot contain other links per CommonMark
if (openerBracket.marker === "[") {
  for (const bracket of brackets) {
    if (bracket.marker === "[") {
      bracket.isActive = false;
    }
  }
}

Hard breaks: Two or more spaces or backslash before newline (inline-parser.ts:14-43):

if (marker === "\n") {
  let numOfPrecedingSpaces = 0;
  while (true) {
    const currentIndex = startIndex - numOfPrecedingSpaces - 1;
    if (currentIndex < 0) break;
    if (input.charAt(currentIndex) !== " ") break;
    numOfPrecedingSpaces++;
  }
  
  if (numOfPrecedingSpaces >= 2) {
    nodes.push({ type: "hardbreak" });
  } else {
    nodes.push({ type: "softbreak" });
  }
}

Examples:

Hard break:  
(two spaces)

Hard break:\
(backslash)

Soft break:
(just newline)
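
The excerpt shows only the trailing-space case. A sketch covering both hard-break forms is given below; `classifyLineBreak` is a hypothetical helper, and it assumes the raw input still contains the backslash at the point where the newline is processed, which may differ from how the inline parser actually tokenizes escapes:

```typescript
function classifyLineBreak(
  input: string,
  newlineIndex: number,
): "hardbreak" | "softbreak" {
  // Count spaces immediately before the newline
  let numOfPrecedingSpaces = 0;
  while (
    newlineIndex - numOfPrecedingSpaces - 1 >= 0 &&
    input.charAt(newlineIndex - numOfPrecedingSpaces - 1) === " "
  ) {
    numOfPrecedingSpaces++;
  }
  if (numOfPrecedingSpaces >= 2) return "hardbreak";

  // Count backslashes immediately before the newline. An even count means
  // every backslash is itself escaped, so none is left to force a break.
  let numOfBackslashes = 0;
  while (
    newlineIndex - numOfBackslashes - 1 >= 0 &&
    input.charAt(newlineIndex - numOfBackslashes - 1) === "\\"
  ) {
    numOfBackslashes++;
  }
  return numOfBackslashes % 2 === 1 ? "hardbreak" : "softbreak";
}
```

A single trailing space or an escaped backslash (`\\` before the newline) therefore still yields a soft break.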

Raw HTML (inline-parser.ts:1326-1340):

const HTML_TAG_REGEX = new RegExp(
  "^(?:" +
    OPEN_TAG + "|" +
    CLOSE_TAG + "|" +
    COMMENT + "|" +
    PROCESSING + "|" +
    DECLARATION + "|" +
    CDATA +
  ")"
);

HTML entities (inline-parser.ts:344-354):

const ENTITY_REGEX = /^&(?:#x[a-f0-9]{1,6}|#[0-9]{1,7}|[a-z][a-z0-9]{1,31});/i;

const match = input.slice(characterCursor).match(ENTITY_REGEX);
if (match !== null) {
  const entity = match[0];
  nodes.push({ type: "text", text: decodeHTMLStrict(entity) });
}

Examples:

<strong>HTML tags</strong>
&lt; &gt; &amp;
&#35; &#x1F600;
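
`decodeHTMLStrict` is not shown in the excerpt. For the numeric forms only, decoding can be sketched as below. `decodeNumericEntity` is a hypothetical helper; named entities need a lookup table and are out of scope here. Replacing U+0000 and invalid code points with U+FFFD follows the CommonMark specification; treating surrogate code points as invalid is an assumption of this sketch:

```typescript
function decodeNumericEntity(entity: string): string | null {
  const match = /^&#(?:x([a-f0-9]{1,6})|([0-9]{1,7}));$/i.exec(entity);
  if (match === null) return null;

  const hexDigits = match[1];
  const codePoint = hexDigits
    ? parseInt(hexDigits, 16)
    : parseInt(match[2] || "0", 10);

  // NUL, out-of-range values, and surrogates become the replacement character
  if (
    codePoint === 0 ||
    codePoint > 0x10ffff ||
    (codePoint >= 0xd800 && codePoint <= 0xdfff)
  ) {
    return "\uFFFD";
  }

  if (codePoint <= 0xffff) return String.fromCharCode(codePoint);

  // Encode astral code points as a UTF-16 surrogate pair
  const offset = codePoint - 0x10000;
  return String.fromCharCode(0xd800 + (offset >> 10), 0xdc00 + (offset & 0x3ff));
}
```

With this sketch `&#35;` decodes to `#`, and `&#x1F600;` decodes to the two UTF-16 code units of U+1F600.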

Character encoding

Backslash escapes

ASCII punctuation can be escaped (inline-parser.ts:708-746):

function isAsciiPunctuationCharacter(character: string): boolean {
  switch (character) {
    case "!": case '"': case "#": case "$": case "%": case "&": case "'":
    case "(": case ")": case "*": case "+": case ",": case "-": case ".":
    case "/": case ":": case ";": case "<": case "=": case ">": case "?":
    case "@": case "[": case "\\": case "]": case "^": case "_": case "`":
    case "{": case "|": case "}": case "~":
      return true;
    default:
      return false;
  }
}

Examples:

\* Not a bullet
\[Not a link\]
\\Literal backslash
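
Once escapable characters are known, removing the escapes is a single substitution. The following is a sketch with hypothetical names (`ESCAPABLE_REGEX`, `unescapeBackslashes`) that uses a character-range regex covering the same ASCII punctuation set as the `switch` above; it is not how inline-parser.ts necessarily applies escapes:

```typescript
// Backslash followed by any ASCII punctuation character:
// ranges ! to /, : to @, [ to `, and { to ~
const ESCAPABLE_REGEX = /\\([!-\/:-@\[-`{-~])/g;

function unescapeBackslashes(text: string): string {
  // Keep the escaped character, drop the backslash
  return text.replace(ESCAPABLE_REGEX, "$1");
}
```

A backslash before any other character, such as `\a`, is left in place as a literal backslash.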

URL encoding

URLs are percent-encoded for safety (inline-parser.ts:1217-1291):

function encodeUnsafeChars(
  input: string,
  allowedChars?: string,
  keepExistingEscapes?: boolean,
): string {
  const DEFAULT_ALLOWED_CHARS = ";/?:@&=+$,-_.!~*'()#";
  const asciiTable = getAsciiEncodeTable(allowedChars || DEFAULT_ALLOWED_CHARS);
  
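  // ... (the checks below are excerpted from the loop over each UTF-16
  // code unit; `i`, `codeUnit`, and `encoded` are loop state)

  // Preserve existing %XX sequences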
  if (keepExistingEscapes && codeUnit === 0x25 && i + 2 < input.length) {
    const maybeHex = input.slice(i + 1, i + 3);
    if (/^[0-9a-f]{2}$/i.test(maybeHex)) {
      encoded += input.slice(i, i + 3);
      continue;
    }
  }
  
  // Handle UTF-16 surrogate pairs
  if (codeUnit >= 0xd800 && codeUnit <= 0xdfff) {
    // Valid pair or replacement character
  }
}
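
The same behaviour can be approximated with the built-in `encodeURI`, which leaves ASCII letters, digits, and exactly the characters in `DEFAULT_ALLOWED_CHARS` unescaped. This is a swapped-in technique for illustration, not the parser's implementation; `percentEncodePreservingEscapes` is a hypothetical name. Unlike the excerpt, `encodeURI` throws a `URIError` on lone surrogates instead of substituting a replacement character:

```typescript
function percentEncodePreservingEscapes(input: string): string {
  // Splitting on a capturing group keeps each existing %XX escape
  // as its own element at an odd index
  const parts = input.split(/(%[0-9a-fA-F]{2})/);

  return parts
    .map((part, index) => (index % 2 === 1 ? part : encodeURI(part)))
    .join("");
}
```

Existing escapes such as `%20` pass through untouched, while a stray `%` that is not followed by two hex digits is encoded as `%25`.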

Unicode handling

Full Unicode support with proper character classification (inline-parser.ts:748-786):

const UNICODE_P_REGEX = /[!-#%-*,-/:;?@[-\]_{}...]/; // Punctuation
const UNICODE_S_REGEX = /[$+<->^`|~...]/;            // Symbols

function isUnicodePunctuationCharacter(character: string): boolean {
  return UNICODE_P_REGEX.test(character) || UNICODE_S_REGEX.test(character);
}

function isWhiteSpaceCharacter(character: string): boolean {
  const code = character.charCodeAt(0);
  // U+0009 (tab), U+000A (LF), U+000B (VT), U+000C (FF), U+000D (CR)
  // U+0020 (space), U+00A0 (nbsp), U+1680, U+2000-U+200A, U+202F, U+205F, U+3000
}
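
The body of `isWhiteSpaceCharacter` is elided above. A runnable sketch built directly from the code points listed in its comments could look like this; the name `isListedWhitespace` is hypothetical, chosen to avoid implying it is the parser's own function:

```typescript
function isListedWhitespace(character: string): boolean {
  const code = character.charCodeAt(0);
  return (
    (code >= 0x0009 && code <= 0x000d) || // tab, LF, VT, FF, CR
    code === 0x0020 ||                    // space
    code === 0x00a0 ||                    // no-break space
    code === 0x1680 ||
    (code >= 0x2000 && code <= 0x200a) ||
    code === 0x202f ||
    code === 0x205f ||
    code === 0x3000
  );
}
```

An empty string yields `NaN` from `charCodeAt(0)`, so every comparison is false and the function returns `false`. U+200B (zero-width space) is just outside the U+2000-U+200A range and is not treated as whitespace.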

GFM tables

Tables are a GitHub Flavored Markdown (GFM) extension, not part of core CommonMark. This is the only non-CommonMark feature supported by the parser.

Table syntax

Tables require a header row and delimiter row (markdown-parser.ts:1337-1436):

function parseTableStartLine({
  firstLine,
  secondLine,
}: {
  firstLine?: string;
  secondLine?: string;
}): {
  alignments: Array<"left" | "right" | "center" | undefined>;
  head: { cells: string[] };
} | null {
  // First line must contain pipes
  if (firstLine.indexOf("|") === -1) return null;
  
  // Second line must be delimiter: :---, :---:, ---:
  const delimiterCells = secondLine.split("|");
  const alignments: Array<"left" | "right" | "center" | undefined> = [];
  
  for (let i = 0; i < delimiterCells.length; i++) {
    const cell = delimiterCells[i]?.trim();
    if (!cell && (i === 0 || i === delimiterCells.length - 1)) continue;
    if (!/^:?-+:?$/.test(cell)) return null;
    
    if (cell.charAt(cell.length - 1) === ":") {
      alignments.push(cell.charAt(0) === ":" ? "center" : "right");
    } else if (cell.charAt(0) === ":") {
      alignments.push("left");
    } else {
      alignments.push(undefined);
    }
  }

  // ... (remaining validation and return value elided)
}

Escaped pipes: Pipes can be escaped in cell content (markdown-parser.ts:1438-1465):

function parseTableRow(line: string): Array<string> {
  const cells = line
    .trim()
    .split(/(?<!\\)\|/)  // Split on unescaped pipes
    .map((cell) => cell.replace(/\\\|/g, "|"))  // Unescape \|
    .map((cell) => cell.trim());
  
  // Remove leading/trailing empty cells
  if (cells[0] === "") cells.shift();
  if (cells[cells.length - 1] === "") cells.pop();
  
  return cells;
}

Examples:

| Header 1 | Header 2 | Header 3 |
| :------- | :------: | -------: |
| Left     | Center   | Right    |
| A        | B        | C        |

<!-- Escaped pipe -->
| Code     | Result   |
| -------- | -------- |
| a \| b   | Shows pipe |

<!-- Minimal table -->
Header 1 | Header 2
--- | ---
Cell 1 | Cell 2
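
`parseTableRow` relies on a regex lookbehind, which older JavaScript engines do not support. As an alternative sketch, the same split can be done with a character scan. `splitTableRow` is a hypothetical name and this is a different technique from the one in markdown-parser.ts, though it gives the same cells for the rows shown here:

```typescript
function splitTableRow(line: string): string[] {
  const trimmed = line.trim();
  const cells: string[] = [];
  let current = "";

  for (let i = 0; i < trimmed.length; i++) {
    const ch = trimmed.charAt(i);
    if (ch === "\\" && trimmed.charAt(i + 1) === "|") {
      // Escaped pipe: keep the pipe as cell content, skip the backslash
      current += "|";
      i++;
    } else if (ch === "|") {
      // Unescaped pipe: cell boundary
      cells.push(current.trim());
      current = "";
    } else {
      current += ch;
    }
  }
  cells.push(current.trim());

  // Remove the empty cells produced by leading and trailing pipes
  if (cells.length > 0 && cells[0] === "") cells.shift();
  if (cells.length > 0 && cells[cells.length - 1] === "") cells.pop();

  return cells;
}
```

The row `| a \| b   | Shows pipe |` yields two cells, `a | b` and `Shows pipe`.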

Edge cases

The CommonMark spec resolves many ambiguities:

Precedence rules

Block elements are parsed before inline elements. Within blocks, earlier rules take precedence. Example:

<!-- Indented 4 spaces: a code block, not a heading -->
    # This is code

<div> HTML blocks take precedence over paragraphs
Not a paragraph
</div>

Container continuation

Blockquotes and list items follow specific continuation rules (markdown-parser.ts:51-196). Example:

> Quote line 1
continued (still in quote)

Not in quote

Lazy continuation

Paragraphs in containers can be lazy-continued without markers. Example:

> Paragraph starts here
and continues without >

- List item paragraph
continues without bullet

Indentation and tabs

Tabs expand to 4-space tab stops, not fixed 4-space width. Example:

␣␣⇥X  → column 4 (2 spaces + tab to next stop)
␣␣␣⇥X → column 4 (3 spaces + tab to next stop)
␣⇥X   → column 4 (1 space + tab to next stop)
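
These columns can be checked by running the three inputs through `getLeadingNonspaceColumn`, reproduced here from the indented code blocks section so the snippet is self-contained. The `tabStopExamples` array is added only for illustration:

```typescript
function getLeadingNonspaceColumn(line: string): number {
  let columns = 0;
  for (let i = 0; i < line.length; i++) {
    const ch = line.charAt(i);
    if (ch === " ") {
      columns += 1;
    } else if (ch === "\t") {
      columns += 4 - (columns % 4); // advance to the next multiple of 4
    } else {
      break;
    }
  }
  return columns;
}

// Illustrative inputs: the three cases above, plus two that reach column 8
const tabStopExamples = ["  \tX", "   \tX", " \tX", "\t\tX", "    \tX"];
for (let i = 0; i < tabStopExamples.length; i++) {
  const example = tabStopExamples[i] || "";
  console.log(JSON.stringify(example), getLeadingNonspaceColumn(example));
}
// Columns printed, in order: 4, 4, 4, 8, 8
```

A fixed-width expansion would give 6, 7, and 5 for the first three inputs, which is why the tab-stop arithmetic matters for deciding whether a line reaches the 4-column code block threshold.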

Compliance testing

The implementation can be tested against the official CommonMark test suite (spec.json). The parser structure closely follows commonmark.js to ensure spec compliance.

For detailed edge cases and examples, refer to the CommonMark specification directly.
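
A compliance run could be structured along the following lines. This is a sketch under stated assumptions: each entry of spec.json is taken to carry `markdown`, `html`, `example`, and `section` fields; the parser's rendering entry point is unknown here, so it is passed in as a `renderToHtml` callback; and `toyRenderer` and `sampleExamples` are illustrative stand-ins, not real spec data or the real parser:

```typescript
interface SpecExample {
  markdown: string;
  html: string;
  example: number;
  section: string;
}

interface SpecFailure {
  example: number;
  section: string;
  expected: string;
  actual: string;
}

// Runs every example through the renderer and collects mismatches
function runSpecExamples(
  examples: SpecExample[],
  renderToHtml: (markdown: string) => string,
): SpecFailure[] {
  const failures: SpecFailure[] = [];
  examples.forEach((specExample) => {
    const actual = renderToHtml(specExample.markdown);
    if (actual !== specExample.html) {
      failures.push({
        example: specExample.example,
        section: specExample.section,
        expected: specExample.html,
        actual,
      });
    }
  });
  return failures;
}

// Stand-in renderer that only understands level-1 ATX headings
const toyRenderer = (markdown: string): string => {
  const heading = /^# (.*)\n$/.exec(markdown);
  return heading
    ? "<h1>" + (heading[1] || "") + "</h1>\n"
    : "<p>" + markdown.trim() + "</p>\n";
};

// Hand-written entries in the assumed spec.json shape
const sampleExamples: SpecExample[] = [
  { markdown: "# Title\n", html: "<h1>Title</h1>\n", example: 1, section: "Illustrative" },
  { markdown: "*hi*\n", html: "<p><em>hi</em></p>\n", example: 2, section: "Illustrative" },
];

const sampleFailures = runSpecExamples(sampleExamples, toyRenderer);
console.log(sampleFailures.length + " of " + sampleExamples.length + " examples failed");
```

In a real run, `sampleExamples` would be replaced by the parsed contents of spec.json and `toyRenderer` by the parser's own Markdown-to-HTML function. Grouping failures by `section` makes it easier to see which part of the specification a regression affects.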
