Overview

The parser provides complete support for the CommonMark 0.31.2 specification with additional support for GitHub Flavored Markdown (GFM) tables. The implementation closely follows the structure of the reference implementation.
CommonMark is a strongly specified, unambiguous syntax for Markdown. It resolves edge cases and ambiguities found in the original Markdown specification.

Implementation approach

From the README:
The implementation is inspired by various other markdown parsers, including commonmark.js, markdown-it, and marked.js. In fact, the implementation is structurally very similar to how commonmark.js goes about parsing.
The parser uses a line-by-line approach with state management for container blocks, matching the CommonMark reference implementation’s design.

Block-level elements

All CommonMark block elements are fully supported:

Leaf blocks

  • ATX headings (# to ######)
  • Setext headings (underlined)
  • Indented code blocks
  • Fenced code blocks
  • HTML blocks
  • Link reference definitions
  • Paragraphs
  • Thematic breaks

Container blocks

  • Block quotes
  • List items (ordered and unordered)
  • Tight and loose lists

ATX headings

Supports heading levels 1-6 with an optional closing sequence (markdown-parser.ts:1232-1309):
function parseATXHeading(
  line: string,
): { level: 1 | 2 | 3 | 4 | 5 | 6; content: string } | null {
  // Must not be indented more than 3 spaces
  if (isIndentedCodeLine(line)) return null;
  
  line = line.trim();
  if (line.charAt(0) !== "#") return null;
  
  // Count consecutive # characters (max 6)
  let numOfOpeningHashes: 1 | 2 | 3 | 4 | 5 | 6 = 1 as 1 | 2 | 3 | 4 | 5 | 6;
  while (numOfOpeningHashes < line.length && line.charAt(numOfOpeningHashes) === "#") {
    numOfOpeningHashes++;
  }
  if (numOfOpeningHashes > 6) return null;
  
  // Must be followed by space/tab or end of line
  if (numOfOpeningHashes < line.length &&
      line.charAt(numOfOpeningHashes) !== " " &&
      line.charAt(numOfOpeningHashes) !== "\t") {
    return null;
  }
  
  // Strip optional closing sequence
  // ...
}
Examples:
# Heading 1
## Heading 2 ##
### Heading 3 ###############
####No space (not a heading)
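
The excerpt above elides the closing-sequence step. The following is a minimal sketch of that step, assuming the CommonMark rule that a trailing run of # characters only counts as a closing sequence when a space or tab precedes it. `stripClosingSequence` is a hypothetical helper name and is not taken from markdown-parser.ts.

```typescript
// Hypothetical helper, not the parser's actual code: takes the text that
// follows the opening # run and returns the heading's raw content.
function stripClosingSequence(rest: string): string {
  // Only spaces and tabs are trimmed around heading content.
  const text = rest.replace(/^[ \t]+|[ \t]+$/g, "");

  // Walk backwards over a trailing run of # characters.
  let end = text.length;
  while (end > 0 && text.charAt(end - 1) === "#") end--;

  if (end === text.length) return text; // no trailing # run
  if (end === 0) return "";             // the content is only a closing sequence

  // The run closes the heading only if a space or tab precedes it.
  const before = text.charAt(end - 1);
  if (before === " " || before === "\t") {
    return text.slice(0, end).replace(/[ \t]+$/, "");
  }
  return text; // e.g. "foo#": the # characters are part of the content
}

console.log(stripClosingSequence(" Heading 2 ##")); // "Heading 2"
console.log(stripClosingSequence(" foo#"));         // "foo#"
```

An escaped run such as `\###` is left in place by this sketch because a backslash, not whitespace, precedes it; removing the backslash is a job for the inline pass.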

Fenced code blocks

Supports both backtick and tilde fences (markdown-parser.ts:882-945):
function parseCodeFenceStart(line: string): {
  indentLevel: number;
  numOfMarkers: number;
  marker: "~" | "`";
  info: string | undefined;
} | null {
  const indentColumns = getLeadingNonspaceColumn(line);
  
  // Must be indented at most 3 spaces
  if (indentColumns > 3) return null;
  
  line = line.trim();
  if (line.length < 3) return null;
  
  const marker = line.charAt(0);
  if (marker !== "~" && marker !== "`") return null;
  
  // Count markers (minimum 3)
  let numOfMarkers = 1;
  while (numOfMarkers < line.length && line.charAt(numOfMarkers) === marker) {
    numOfMarkers++;
  }
  if (numOfMarkers < 3) return null;
  
  // For backtick fences, info string cannot contain backticks
  const info = line.slice(numOfMarkers).trim();
  if (marker === "`" && info.indexOf("`") >= 0) return null;
  
  return { indentLevel: indentColumns, numOfMarkers, marker, info: info || undefined };
}
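
Detecting the closing fence is the counterpart to the function above. The sketch below assumes the CommonMark rules: the closing fence uses the same character as the opener, is at least as long, is indented at most 3 spaces, and carries no info string. `isClosingFence` and `OpenFence` are hypothetical names, and tab indentation is ignored for brevity.

```typescript
// Mirrors the relevant fields returned by parseCodeFenceStart.
interface OpenFence {
  marker: "~" | "`";
  numOfMarkers: number;
}

// Hypothetical helper: does `line` close the fence described by `open`?
function isClosingFence(line: string, open: OpenFence): boolean {
  // Up to 3 spaces, a run of 3+ identical fence characters, then only spaces/tabs.
  const match = /^ {0,3}(`{3,}|~{3,})[ \t]*$/.exec(line);
  if (match === null) return false;
  const run = match[1] || "";
  return run.charAt(0) === open.marker && run.length >= open.numOfMarkers;
}

const backticks = "`".repeat(3);
console.log(isClosingFence(backticks, { marker: "`", numOfMarkers: 3 }));         // true
console.log(isClosingFence(backticks + " js", { marker: "`", numOfMarkers: 3 })); // false
```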
Examples:
```javascript
const x = 1;
```

~~~ruby
puts "Hello"
~~~

```info string can have spaces
code
```

Indented code blocks

Four spaces or one tab of indentation creates a code block (markdown-parser.ts:860-863):
function isIndentedCodeLine(line: string): boolean {
  const column = getLeadingNonspaceColumn(line);
  return column >= 4;
}
Tab expansion: A tab advances the column to the next tab stop, which is the next multiple of 4 (markdown-parser.ts:1067-1080):
function getLeadingNonspaceColumn(line: string): number {
  let columns = 0;
  for (let i = 0; i < line.length; i++) {
    const ch = line.charAt(i);
    if (ch === " ") {
      columns += 1;
    } else if (ch === "\t") {
      columns += 4 - (columns % 4);  // Tab stops at multiples of 4
    } else {
      break;
    }
  }
  return columns;
}

Block quotes

Lines starting with > create blockquotes (markdown-parser.ts:1167-1211):
function parseBlockquoteLine(line: string): { content: string } | null {
  if (isIndentedCodeLine(line)) return null;
  
  const firstNonspaceIndex = getFirstNonspaceIndex(line);
  
  // First non-whitespace must be >
  if (line.charAt(firstNonspaceIndex) !== ">") return null;
  
  let characterIndex = firstNonspaceIndex + 1;
  let numOfColumns = characterIndex;
  
  // Handle tabs and spaces after >
  while (characterIndex < line.length) {
    if (line.charAt(characterIndex) === "\t") {
      numOfColumns += 4 - (numOfColumns % 4);
      characterIndex++;
    } else if (line.charAt(characterIndex) === " ") {
      numOfColumns++;
      characterIndex++;
    } else {
      break;
    }
  }
  
  // Construct content, consuming optional space after >
  const content = " ".repeat(numOfColumns - firstNonspaceIndex - 1) + line.slice(characterIndex);
  if (content.charAt(0) === " ") {
    return { content: content.slice(1) };
  }
  return { content };
}
Examples:
> Single level
>
> Multiple paragraphs

> Level 1
>> Level 2
>>> Level 3
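
Nesting depth follows from applying the marker rule repeatedly to the remaining content. The sketch below is a simplified illustration that handles spaces only (no tab expansion, unlike parseBlockquoteLine above); `stripBlockquoteMarkers` is a hypothetical helper name.

```typescript
// Hypothetical helper: count the `>` markers that open a line and return
// what is left after them. Spaces only; tabs are not expanded here.
function stripBlockquoteMarkers(line: string): { depth: number; content: string } {
  let rest = line;
  let depth = 0;
  for (;;) {
    // Up to 3 spaces of indentation, the marker, and one optional space.
    const match = /^ {0,3}> ?/.exec(rest);
    if (match === null) break;
    rest = rest.slice(match[0].length);
    depth++;
  }
  return { depth, content: rest };
}

console.log(stripBlockquoteMarkers(">>> Level 3")); // { depth: 3, content: "Level 3" }
console.log(stripBlockquoteMarkers("Not a quote")); // { depth: 0, content: "Not a quote" }
```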

Lists

Supports both ordered and unordered lists with proper nesting (markdown-parser.ts:383-465).
Tight vs loose: Determined by blank lines between items (markdown-parser.ts:198-215):
if (lastMatchedNode.type === "list-item" && lastMatchedNode.hasPendingBlankLine) {
  lastMatchedNode.parent.isTight = false;
  let node = lastMatchedNode;
  while (node !== null) {
    if (node.type === "list-item") {
      node.hasPendingBlankLine = false;
    }
    node = node.parent;
  }
}
Examples:
<!-- Tight list -->
- Item 1
- Item 2
- Item 3

<!-- Loose list -->
- Item 1

- Item 2

- Item 3

<!-- Nested -->
1. First
   - Nested bullet
   - Another
2. Second
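
The list-item parsing code itself is not reproduced here. As a rough illustration of the marker rules in the CommonMark spec, the sketch below recognizes a bullet (`-`, `+`, `*`) or an ordered marker (1 to 9 digits followed by `.` or `)`), each followed by a space, a tab, or the end of the line. `parseListMarker` and `ListMarker` are hypothetical names, tab indentation is ignored, and a real parser has to check for thematic breaks first, since `- - -` would otherwise look like a bullet.

```typescript
type ListMarker =
  | { type: "bullet"; bullet: string }
  | { type: "ordered"; start: number; delimiter: string };

// Hypothetical helper, not the parser's own code.
function parseListMarker(line: string): ListMarker | null {
  // Up to 3 leading spaces are allowed; a 4th leaves a space in front and fails below.
  const text = line.replace(/^ {0,3}/, "");

  const bullet = /^([-+*])(?=[ \t]|$)/.exec(text);
  if (bullet !== null) {
    return { type: "bullet", bullet: bullet[1] || "" };
  }

  // At most 9 digits, so the start number stays within a safe integer range.
  const ordered = /^(\d{1,9})([.)])(?=[ \t]|$)/.exec(text);
  if (ordered !== null) {
    return {
      type: "ordered",
      start: parseInt(ordered[1] || "0", 10),
      delimiter: ordered[2] || "",
    };
  }
  return null;
}

console.log(parseListMarker("- Item 1")); // { type: "bullet", bullet: "-" }
console.log(parseListMarker("1. First")); // { type: "ordered", start: 1, delimiter: "." }
console.log(parseListMarker("-Item"));    // null
```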

Thematic breaks

Three or more -, _, or * characters (markdown-parser.ts:1126-1165):
function isSeparator(line: string): boolean {
  // Must not be indented 4+ spaces
  if (isIndentedCodeLine(line)) return false;
  
  line = line.trim();
  const marker = line.charAt(0);
  
  // Only -, _, and * can create separators
  if (marker !== "-" && marker !== "_" && marker !== "*") return false;
  
  // Count markers (minimum 3)
  let markerCount = 1;
  for (let i = 1; i < line.length; i++) {
    const character = line.charAt(i);
    if (isSpaceOrTab(character)) continue;  // Spaces/tabs allowed
    if (character !== marker) return false;
    markerCount++;
  }
  
  return markerCount >= 3;
}
Examples:
---
***
___
- - -
* * * *

Inline-level elements

All CommonMark inline elements are fully supported:
Emphasis and strong emphasis use the CommonMark flanking rules (inline-parser.ts:131-177):
// Left-flanking: can open emphasis
const isLeftFlanking =
  !isNextCharacterWhitespace &&
  (!isNextCharacterPunctuation ||
    isPreviousCharacterWhitespace ||
    isPreviousCharacterPunctuation);

// Right-flanking: can close emphasis
const isRightFlanking =
  !isPreviousCharacterWhitespace &&
  (!isPreviousCharacterPunctuation ||
    isNextCharacterWhitespace ||
    isNextCharacterPunctuation);
Rule of three (inline-parser.ts:489-500):
// When a delimiter can both open and close:
// If sum of run lengths is divisible by 3,
// they don't match unless both are divisible by 3
if (
  (node.canClose || closer.node.canOpen) &&
  (node.count + closer.node.count) % 3 === 0 &&
  (node.count % 3 !== 0 || closer.node.count % 3 !== 0)
) {
  continue;
}
Examples:
*italic* _italic_
**bold** __bold__
***bold italic***
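
To see how the flanking rules decide whether a delimiter run can open or close emphasis, the sketch below combines them with the extra restriction that CommonMark places on `_`, which cannot open or close emphasis inside a word. It is an illustration only: `classifyDelimiterRun` is a hypothetical name, and the whitespace and punctuation tests are simplified stand-ins for the Unicode-aware helpers the parser uses.

```typescript
// Simplified character classes for this sketch. An empty string stands for
// the start or end of the text and is treated as whitespace.
const isWs = (ch: string): boolean => ch === "" || /\s/.test(ch);
const isPunct = (ch: string): boolean => /[!-\/:-@[-`{-~]/.test(ch);

// `previous` and `next` are the characters on either side of the delimiter run.
function classifyDelimiterRun(
  marker: "*" | "_",
  previous: string,
  next: string,
): { canOpen: boolean; canClose: boolean } {
  const leftFlanking =
    !isWs(next) && (!isPunct(next) || isWs(previous) || isPunct(previous));
  const rightFlanking =
    !isWs(previous) && (!isPunct(previous) || isWs(next) || isPunct(next));

  if (marker === "*") {
    return { canOpen: leftFlanking, canClose: rightFlanking };
  }
  // Underscores additionally may not open or close in the middle of a word.
  return {
    canOpen: leftFlanking && (!rightFlanking || isPunct(previous)),
    canClose: rightFlanking && (!leftFlanking || isPunct(next)),
  };
}

console.log(classifyDelimiterRun("*", "", "i"));  // { canOpen: true, canClose: false }
console.log(classifyDelimiterRun("_", "o", "b")); // { canOpen: false, canClose: false }
```

The second call corresponds to the underscore in `foo_bar`, which is why snake_case identifiers are not turned into emphasis.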

Character encoding

Backslash escapes

ASCII punctuation can be escaped (inline-parser.ts:708-746):
function isAsciiPunctuationCharacter(character: string): boolean {
  switch (character) {
    case "!": case '"': case "#": case "$": case "%": case "&": case "'":
    case "(": case ")": case "*": case "+": case ",": case "-": case ".":
    case "/": case ":": case ";": case "<": case "=": case ">": case "?":
    case "@": case "[": case "\\": case "]": case "^": case "_": case "`":
    case "{": case "|": case "}": case "~":
      return true;
    default:
      return false;
  }
}
Examples:
\* Not a bullet
\[Not a link\]
\\Literal backslash
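
The effect of these escapes on plain text can be sketched with a single replacement that drops a backslash when it precedes an ASCII punctuation character. `unescapeBackslashes` is a hypothetical helper and not the parser's code; a real inline parser also has to leave backslashes untouched inside code spans, which this sketch does not attempt.

```typescript
// Hypothetical helper: the character class covers the four ASCII punctuation
// ranges (! to /, : to @, [ to `, { to ~) listed in the switch above.
function unescapeBackslashes(text: string): string {
  return text.replace(/\\([!-\/:-@[-`{-~])/g, "$1");
}

console.log(unescapeBackslashes("\\* Not a bullet"));        // "* Not a bullet"
console.log(unescapeBackslashes("\\\\Literal backslash"));   // "\Literal backslash"
console.log(unescapeBackslashes("\\a stays as it is"));      // "\a stays as it is"
```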

URL encoding

URLs are percent-encoded for safety (inline-parser.ts:1217-1291):
function encodeUnsafeChars(
  input: string,
  allowedChars?: string,
  keepExistingEscapes?: boolean,
): string {
  const DEFAULT_ALLOWED_CHARS = ";/?:@&=+$,-_.!~*'()#";
  const asciiTable = getAsciiEncodeTable(allowedChars || DEFAULT_ALLOWED_CHARS);
  
  // Preserve existing %XX sequences
  if (keepExistingEscapes && codeUnit === 0x25 && i + 2 < input.length) {
    const maybeHex = input.slice(i + 1, i + 3);
    if (/^[0-9a-f]{2}$/i.test(maybeHex)) {
      encoded += input.slice(i, i + 3);
      continue;
    }
  }
  
  // Handle UTF-16 surrogate pairs
  if (codeUnit >= 0xd800 && codeUnit <= 0xdfff) {
    // Valid pair or replacement character
  }
}
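
The excerpt above omits the surrounding loop. The following self-contained sketch shows one way the pieces could fit together, under stated assumptions: the allowed set is fixed to the default shown above, ASCII letters and digits always pass through, and anything else is encoded as UTF-8 with the standard `encodeURIComponent`. `percentEncode` is a hypothetical name and not the parser's function.

```typescript
const ALLOWED_CHARS = ";/?:@&=+$,-_.!~*'()#";

// Hypothetical sketch of a URL encoder that keeps existing %XX escapes.
function percentEncode(input: string): string {
  let encoded = "";
  for (let i = 0; i < input.length; i++) {
    const ch = input.charAt(i);
    const code = input.charCodeAt(i);

    if (/[A-Za-z0-9]/.test(ch) || ALLOWED_CHARS.indexOf(ch) >= 0) {
      encoded += ch;
      continue;
    }
    // Preserve an existing, well-formed %XX sequence.
    if (ch === "%" && /^[0-9a-f]{2}$/i.test(input.slice(i + 1, i + 3))) {
      encoded += input.slice(i, i + 3);
      i += 2;
      continue;
    }
    // Surrogates: encode a valid pair together, replace a lone one with U+FFFD.
    if (code >= 0xd800 && code <= 0xdfff) {
      const next = input.charCodeAt(i + 1); // NaN at the end of the string
      if (code <= 0xdbff && next >= 0xdc00 && next <= 0xdfff) {
        encoded += encodeURIComponent(input.slice(i, i + 2));
        i++;
      } else {
        encoded += "%EF%BF%BD";
      }
      continue;
    }
    encoded += encodeURIComponent(ch);
  }
  return encoded;
}

console.log(percentEncode("/a b?q=%20")); // "/a%20b?q=%20"
```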

Unicode handling

Full Unicode support with proper character classification (inline-parser.ts:748-786):
const UNICODE_P_REGEX = /[!-#%-*,-/:;?@[-\]_{}...]/; // Punctuation
const UNICODE_S_REGEX = /[$+<->^`|~...]/;            // Symbols

function isUnicodePunctuationCharacter(character: string): boolean {
  return UNICODE_P_REGEX.test(character) || UNICODE_S_REGEX.test(character);
}

function isWhiteSpaceCharacter(character: string): boolean {
  const code = character.charCodeAt(0);
  // U+0009 (tab), U+000A (LF), U+000B (VT), U+000C (FF), U+000D (CR)
  // U+0020 (space), U+00A0 (nbsp), U+1680, U+2000-U+200A, U+202F, U+205F, U+3000
}
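
The whitespace excerpt above lists code points but not the comparison itself. A minimal sketch of such a check, written directly from that list, could look as follows; `isWhitespaceCodePoint` is a hypothetical name, and the parser's real function may be organized differently.

```typescript
// Sketch based on the code points named in the comment above.
function isWhitespaceCodePoint(code: number): boolean {
  return (
    (code >= 0x0009 && code <= 0x000d) || // tab, LF, VT, FF, CR
    code === 0x0020 ||                    // space
    code === 0x00a0 ||                    // no-break space
    code === 0x1680 ||
    (code >= 0x2000 && code <= 0x200a) ||
    code === 0x202f ||
    code === 0x205f ||
    code === 0x3000
  );
}

console.log(isWhitespaceCodePoint(" ".charCodeAt(0))); // true
console.log(isWhitespaceCodePoint(0x200b));            // false (zero width space)
```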

GFM tables

Tables are a GitHub Flavored Markdown (GFM) extension, not part of core CommonMark. This is the only non-CommonMark feature supported by the parser.

Table syntax

Tables require a header row and delimiter row (markdown-parser.ts:1337-1436):
function parseTableStartLine({
  firstLine,
  secondLine,
}: {
  firstLine?: string;
  secondLine?: string;
}): {
  alignments: Array<"left" | "right" | "center" | undefined>;
  head: { cells: string[] };
} | null {
  // First line must contain pipes
  if (firstLine.indexOf("|") === -1) return null;
  
  // Second line must be delimiter: :---, :---:, ---:
  const delimiterCells = secondLine.split("|");
  const alignments: Array<"left" | "right" | "center" | undefined> = [];
  
  for (let i = 0; i < delimiterCells.length; i++) {
    const cell = delimiterCells[i]?.trim();
    if (!cell && (i === 0 || i === delimiterCells.length - 1)) continue;
    if (!/^:?-+:?$/.test(cell)) return null;
    
    if (cell.charAt(cell.length - 1) === ":") {
      alignments.push(cell.charAt(0) === ":" ? "center" : "right");
    } else if (cell.charAt(0) === ":") {
      alignments.push("left");
    } else {
      alignments.push(undefined);
    }
  }
}
Escaped pipes: Pipes can be escaped in cell content (markdown-parser.ts:1438-1465):
function parseTableRow(line: string): Array<string> {
  const cells = line
    .trim()
    .split(/(?<!\\)\|/)  // Split on unescaped pipes
    .map((cell) => cell.replace(/\\\|/g, "|"))  // Unescape \|
    .map((cell) => cell.trim());
  
  // Remove leading/trailing empty cells
  if (cells[0] === "") cells.shift();
  if (cells[cells.length - 1] === "") cells.pop();
  
  return cells;
}
Examples:
| Header 1 | Header 2 | Header 3 |
| :------- | :------: | -------: |
| Left     | Center   | Right    |
| A        | B        | C        |

<!-- Escaped pipe -->
| Code     | Result   |
| -------- | -------- |
| a \| b   | Shows pipe |

<!-- Minimal table -->
Header 1 | Header 2
--- | ---
Cell 1 | Cell 2
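
The GFM spec additionally requires the header row and the delimiter row to have the same number of cells. The sketch below illustrates that check together with escaped-pipe handling; `splitCells` and `isTableStart` are hypothetical helpers that scan character by character instead of relying on regex lookbehind.

```typescript
// Hypothetical helper: split a row on unescaped pipes and trim each cell.
function splitCells(line: string): string[] {
  const text = line.trim();
  const cells: string[] = [];
  let current = "";
  for (let i = 0; i < text.length; i++) {
    const ch = text.charAt(i);
    if (ch === "\\" && text.charAt(i + 1) === "|") {
      current += "|"; // an escaped pipe becomes a literal pipe
      i++;
    } else if (ch === "|") {
      cells.push(current.trim());
      current = "";
    } else {
      current += ch;
    }
  }
  cells.push(current.trim());
  // Drop the empty cells produced by a leading or trailing pipe.
  if (cells.length > 0 && cells[0] === "") cells.shift();
  if (cells.length > 0 && cells[cells.length - 1] === "") cells.pop();
  return cells;
}

// Hypothetical helper: a table starts only if both rows agree on the cell count
// and every delimiter cell has the form ---, :---, ---: or :---:.
function isTableStart(headerLine: string, delimiterLine: string): boolean {
  const header = splitCells(headerLine);
  const delimiter = splitCells(delimiterLine);
  return (
    header.length > 0 &&
    header.length === delimiter.length &&
    delimiter.every((cell) => /^:?-+:?$/.test(cell))
  );
}

console.log(splitCells("| a \\| b   | Shows pipe |"));        // ["a | b", "Shows pipe"]
console.log(isTableStart("Header 1 | Header 2", "--- | ---")); // true
console.log(isTableStart("| a | b |", "| --- |"));             // false
```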

Edge cases

The CommonMark spec resolves many ambiguities:
Block elements are parsed before inline elements. Within blocks, earlier rules take precedence.
Example:
<!-- Indented 4 spaces: code block, not a heading -->
    # This is code

<div> HTML blocks take precedence over paragraphs
Not a paragraph
</div>
Blockquotes and list items follow specific continuation rules (markdown-parser.ts:51-196).
Example:
> Quote line 1
continued (still in quote)

Not in quote
Paragraphs in containers can be lazy-continued without markers.
Example:
> Paragraph starts here
and continues without >

- List item paragraph
continues without bullet
Tabs expand to 4-space tab stops, not fixed 4-space width.
Example:
␣␣⇥X  → column 4 (2 spaces + tab to next stop)
␣␣␣⇥X → column 4 (3 spaces + tab to next stop)
␣⇥X   → column 4 (1 space + tab to next stop)
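
The three rows above can be checked with the same tab-stop arithmetic that getLeadingNonspaceColumn uses. The short sketch below repeats that arithmetic under a hypothetical name, `indentColumn`, so it can run on its own.

```typescript
// Hypothetical helper: column reached after a line's leading spaces and tabs,
// with tab stops at every multiple of 4 columns.
function indentColumn(line: string): number {
  let column = 0;
  for (let i = 0; i < line.length; i++) {
    const ch = line.charAt(i);
    if (ch === " ") {
      column += 1;
    } else if (ch === "\t") {
      column += 4 - (column % 4); // jump to the next tab stop
    } else {
      break;
    }
  }
  return column;
}

console.log(indentColumn("  \tX"));  // 4
console.log(indentColumn("   \tX")); // 4
console.log(indentColumn(" \tX"));   // 4
console.log(indentColumn("\t\tX"));  // 8
```

With a fixed 4-space expansion the first three inputs would land on columns 6, 7, and 5, which is why the distinction matters for deciding whether a line is indented code.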

Compliance testing

The implementation can be tested against the official CommonMark test suite (spec.json). The parser structure closely follows commonmark.js to ensure spec compliance.
For detailed edge cases and examples, refer to the CommonMark specification directly.
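
A compliance run over spec.json can be sketched as a loop that renders each example and compares the result with the expected HTML. The `markdown`, `html`, `example`, and `section` fields follow the published spec.json format; the `render` callback is an assumption that stands in for whatever Markdown-to-HTML function the parser exposes, since its API is not described here.

```typescript
// One entry of the CommonMark spec.json test file (fields used by this sketch).
interface SpecExample {
  markdown: string;
  html: string;
  example: number;
  section: string;
}

// Hypothetical runner: returns the examples whose rendered output differs
// from the expected HTML, or whose rendering throws.
function runSpecExamples(
  examples: SpecExample[],
  render: (markdown: string) => string,
): SpecExample[] {
  const failures: SpecExample[] = [];
  for (const example of examples) {
    try {
      if (render(example.markdown) !== example.html) failures.push(example);
    } catch {
      failures.push(example);
    }
  }
  return failures;
}
```

Grouping the returned failures by `section` gives a quick view of which parts of the spec still need work.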