5. Container blocks

A container block is a block that has other blocks as its contents. There are two basic kinds of container blocks: [block quotes] and [list items]. [Lists] are meta-containers for [list items].

We define the syntax for container blocks recursively. The general form of the definition is:

If X is a sequence of blocks, then the result of transforming X in such-and-such a way is a container of type Y with these blocks as its content.

So, we explain what counts as a block quote or list item by explaining how these can be generated from their contents. This should suffice to define the syntax, although it does not give a recipe for parsing these constructions. (A recipe is provided below in the section entitled A parsing strategy.)

5.1. Block quotes

A block quote marker, optionally preceded by up to three spaces of indentation, consists of (a) the character > together with a following space of indentation, or (b) a single character > not followed by a space of indentation.
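
The marker definition above can be sketched as a small helper. This is only an illustration: the name `strip_block_quote_marker` is hypothetical, and the sketch assumes indentation made of spaces only (tabs are not handled).

```python
import re

# Up to three spaces of indentation, then '>', then at most one space.
# Assumption of this sketch: tabs are not handled.
_BLOCK_QUOTE_MARKER = re.compile(r"^ {0,3}> ?")


def strip_block_quote_marker(line):
    """Return the line with its block quote marker removed,
    or None if the line does not begin with a block quote marker."""
    match = _BLOCK_QUOTE_MARKER.match(line)
    if match is None:
        return None
    return line[match.end():]


if __name__ == "__main__":
    for sample in ["> # Foo", ">bar", "   > baz", "    > too deep"]:
        print(repr(sample), "->", repr(strip_block_quote_marker(sample)))
```

Only one space after the > is consumed as part of the marker, so any further spaces remain part of the content.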

The following rules define [block quotes]:

  1. Basic case. If a string of lines Ls constitute a sequence of blocks Bs, then the result of prepending a [block quote marker] to the beginning of each line in Ls is a block quote containing Bs.

  2. Laziness. If a string of lines Ls constitute a block quote with contents Bs, then the result of deleting the initial [block quote marker] from one or more lines in which the next character other than a space or tab after the [block quote marker] is [paragraph continuation text] is a block quote with Bs as its content. Paragraph continuation text is text that will be parsed as part of the content of a paragraph, but does not occur at the beginning of the paragraph.

  3. Consecutiveness. A document cannot contain two [block quotes] in a row unless there is a [blank line] between them.

Nothing else counts as a block quote.

Here is a simple example:

> # Foo
> bar
> baz
<blockquote>
<h1>Foo</h1>
<p>bar
baz</p>
</blockquote>

The space or tab after the > characters can be omitted:

># Foo
>bar
> baz
<blockquote>
<h1>Foo</h1>
<p>bar
baz</p>
</blockquote>

The > characters can be preceded by up to three spaces of indentation:

   > # Foo
   > bar
 > baz
<blockquote>
<h1>Foo</h1>
<p>bar
baz</p>
</blockquote>

Four spaces of indentation is too many:

    > # Foo
    > bar
    > baz
<pre><code>&gt; # Foo
&gt; bar
&gt; baz
</code></pre>

The Laziness clause allows us to omit the > before paragraph continuation text:

> # Foo
> bar
baz
<blockquote>
<h1>Foo</h1>
<p>bar
baz</p>
</blockquote>
A block quote can contain some lazy and some non-lazy continuation lines:

> bar
baz
> foo
<blockquote>
<p>bar
baz
foo</p>
</blockquote>

Laziness only applies to lines that would have been continuations of paragraphs had they been prepended with [block quote markers]. For example, the > cannot be omitted in the second line of

> foo
> ---

without changing the meaning:

> foo
---
<blockquote>
<p>foo</p>
</blockquote>
<hr />

Similarly, if we omit the > in the second line of

> - foo
> - bar

then the block quote ends after the first line:

> - foo
- bar
<blockquote>
<ul>
<li>foo</li>
</ul>
</blockquote>
<ul>
<li>bar</li>
</ul>

For the same reason, we can’t omit the > in front of subsequent lines of an indented or fenced code block:

>     foo
    bar
<blockquote>
<pre><code>foo
</code></pre>
</blockquote>
<pre><code>bar
</code></pre>

foo

.
<blockquote>
<pre><code></code></pre>
</blockquote>
<p>foo</p>
<pre><code></code></pre>

Note that in the following case, we have a [lazy continuation line]:

> foo
    - bar
<blockquote>
<p>foo
- bar</p>
</blockquote>

To see why, note that in

> foo
>     - bar

the - bar is indented too far to start a list, and can’t be an indented code block because indented code blocks cannot interrupt paragraphs, so it is [paragraph continuation text].

A block quote can be empty:

>
<blockquote>
</blockquote>

>
>  
> 
<blockquote>
</blockquote>

A block quote can have initial or final blank lines:

>
> foo
>  
<blockquote>
<p>foo</p>
</blockquote>

A blank line always separates block quotes:

> foo

> bar
<blockquote>
<p>foo</p>
</blockquote>
<blockquote>
<p>bar</p>
</blockquote>

(Most current Markdown implementations, including John Gruber’s original Markdown.pl, will parse this example as a single block quote with two paragraphs. But it seems better to allow the author to decide whether two block quotes or one are wanted.)

Consecutiveness means that if we put these block quotes together, we get a single block quote:

> foo
> bar
<blockquote>
<p>foo
bar</p>
</blockquote>

To get a block quote with two paragraphs, use:

> foo
>
> bar
<blockquote>
<p>foo</p>
<p>bar</p>
</blockquote>

Block quotes can interrupt paragraphs:

foo
> bar
<p>foo</p>
<blockquote>
<p>bar</p>
</blockquote>

In general, blank lines are not needed before or after block quotes:

> aaa
***
> bbb
<blockquote>
<p>aaa</p>
</blockquote>
<hr />
<blockquote>
<p>bbb</p>
</blockquote>

However, because of laziness, a blank line is needed between a block quote and a following paragraph:

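> bar
baz
<blockquote>
<p>bar
baz</p>
</blockquote>

> bar

baz
<blockquote>
<p>bar</p>
</blockquote>
<p>baz</p>

> bar
>
baz
<blockquote>
<p>bar</p>
</blockquote>
<p>baz</p>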

It is a consequence of the Laziness rule that any number of initial >s may be omitted on a continuation line of a nested block quote:

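> > > foo
bar
<blockquote>
<blockquote>
<blockquote>
<p>foo
bar</p>
</blockquote>
</blockquote>
</blockquote>

>>> foo
> bar
>>baz
<blockquote>
<blockquote>
<blockquote>
<p>foo
bar
baz</p>
</blockquote>
</blockquote>
</blockquote>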

When including an indented code block in a block quote, remember that the [block quote marker] includes both the > and a following space of indentation. So five spaces are needed after the >:

>     code

>    not code
<blockquote>
<pre><code>code
</code></pre>
</blockquote>
<blockquote>
<p>not code</p>
</blockquote>

5.2. List items

A list marker is a [bullet list marker] or an [ordered list marker].

A bullet list marker is a -, +, or * character.

An ordered list marker is a sequence of 1–9 arabic digits (0-9), followed by either a . character or a ) character. (The reason for the length limit is that with 10 digits we start seeing integer overflows in some browsers.)
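
The marker definitions above can be approximated with two regular expressions. This is a rough sketch, not a reference implementation: the function name `parse_list_marker` and the returned dictionary keys are hypothetical. The lookahead for a following space, tab, or end of line reflects the requirement, stated later in this section, that a marker be followed by whitespace or begin an item that starts with a blank line.

```python
import re

# A bullet list marker: one of '-', '+', '*'.
_BULLET = re.compile(r"^([-+*])(?=[ \t]|$)")
# An ordered list marker: 1 to 9 ASCII digits, then '.' or ')'.
_ORDERED = re.compile(r"^([0-9]{1,9})([.)])(?=[ \t]|$)")


def parse_list_marker(text):
    """Return a description of the list marker at the start of `text`
    (leading indentation already removed), or None if there is none."""
    match = _BULLET.match(text)
    if match:
        return {"type": "bullet", "char": match.group(1), "width": 1}
    match = _ORDERED.match(text)
    if match:
        return {
            "type": "ordered",
            "start": int(match.group(1)),   # leading zeros are dropped
            "delimiter": match.group(2),
            "width": len(match.group(0)),   # digits plus delimiter
        }
    return None


if __name__ == "__main__":
    for sample in ["- foo", "10) foo", "003. ok", "1234567890. not ok", "-one"]:
        print(repr(sample), "->", parse_list_marker(sample))
```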

The following rules define [list items]:

  1. Basic case. If a sequence of lines Ls constitute a sequence of blocks Bs starting with a character other than a space or tab, and M is a list marker of width W followed by 1 ≤ N ≤ 4 spaces of indentation, then the result of prepending M and the following spaces to the first line of Ls, and indenting subsequent lines of Ls by W + N spaces, is a list item with Bs as its contents. The type of the list item (bullet or ordered) is determined by the type of its list marker. If the list item is ordered, then it is also assigned a start number, based on the ordered list marker.

    Exceptions:

    1. When the first list item in a [list] interrupts a paragraph—that is, when it starts on a line that would otherwise count as [paragraph continuation text]—then (a) the lines Ls must not begin with a blank line, and (b) if the list item is ordered, the start number must be 1.

    2. If any line is a [thematic break][thematic breaks] then that line is not a list item.

For example, let Ls be the lines

A paragraph
with two lines.

    indented code

> A block quote.
<p>A paragraph
with two lines.</p>
<pre><code>indented code
</code></pre>
<blockquote>
<p>A block quote.</p>
</blockquote>

And let M be the marker 1., and N = 2. Then rule #1 says that the following is an ordered list item with start number 1, and the same contents as Ls:

1.  A paragraph
    with two lines.

        indented code

    > A block quote.
<ol>
<li>
<p>A paragraph
with two lines.</p>
<pre><code>indented code
</code></pre>
<blockquote>
<p>A block quote.</p>
</blockquote>
</li>
</ol>

The most important thing to notice is that the position of the text after the list marker determines how much indentation is needed in subsequent blocks in the list item. If the list marker takes up two spaces of indentation, and there are three spaces between the list marker and the next character other than a space or tab, then blocks must be indented five spaces in order to fall under the list item.
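
The arithmetic in the paragraph above can be sketched as follows. The helper names `required_indent` and `falls_under_item` are hypothetical, and the sketch covers only the basic case (1 to 4 spaces after the marker, space-only indentation, no lazy continuation lines, no enclosing containers).

```python
def required_indent(marker_width, spaces_after_marker):
    """Indentation (W + N) that later blocks need in order to fall
    under a list item, per the basic case of rule #1."""
    if not 1 <= spaces_after_marker <= 4:
        raise ValueError("the basic case covers 1 to 4 spaces after the marker")
    return marker_width + spaces_after_marker


def falls_under_item(line, indent):
    """True if `line` is indented at least `indent` spaces.
    Assumption of this sketch: blank lines never end the item by themselves."""
    if line.strip() == "":
        return True
    leading = len(line) - len(line.lstrip(" "))
    return leading >= indent


if __name__ == "__main__":
    # A two-character marker followed by three spaces needs five spaces.
    print(required_indent(2, 3))
    # "10)" is three characters wide, so a sublist needs four spaces.
    indent = required_indent(3, 1)
    print(falls_under_item("    - bar", indent), falls_under_item("   - bar", indent))
```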

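Here are some examples showing how far content must be indented to be put under the list item:

- one

 two
<ul>
<li>one</li>
</ul>
<p>two</p>

- one

  two
<ul>
<li>
<p>one</p>
<p>two</p>
</li>
</ul>

 -    one

     two
<ul>
<li>one</li>
</ul>
<pre><code> two
</code></pre>

 -    one

      two
<ul>
<li>
<p>one</p>
<p>two</p>
</li>
</ul>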

It is tempting to think of this in terms of columns: the continuation blocks must be indented at least to the column of the first character other than a space or tab after the list marker. However, that is not quite right. The spaces of indentation after the list marker determine how much relative indentation is needed. Which column this indentation reaches will depend on how the list item is embedded in other constructions, as shown by this example:

   > > 1.  one
>>
>>     two
<blockquote>
<blockquote>
<ol>
<li>
<p>one</p>
<p>two</p>
</li>
</ol>
</blockquote>
</blockquote>

Here two occurs in the same column as the list marker 1., but is actually contained in the list item, because there is sufficient indentation after the last containing blockquote marker.

The converse is also possible. In the following example, the word two occurs far to the right of the initial text of the list item, one, but it is not considered part of the list item, because it is not indented far enough past the blockquote marker:

>>- one
>>
  >  > two
<blockquote>
<blockquote>
<ul>
<li>one</li>
</ul>
<p>two</p>
</blockquote>
</blockquote>

Note that at least one space or tab is needed between the list marker and any following content, so these are not list items:

-one

2.two
<p>-one</p>
<p>2.two</p>

A list item may contain blocks that are separated by more than one blank line.

- foo


  bar
<ul>
<li>
<p>foo</p>
<p>bar</p>
</li>
</ul>

A list item may contain any kind of block:

  1. foo

    bar
    

    baz

    bam .

  1. foo

    bar
    

    baz

    bam

A list item that contains an indented code block will preserve empty lines within the code block verbatim.

- Foo

      bar


      baz
<ul>
<li>
<p>Foo</p>
<pre><code>bar


baz
</code></pre>
</li>
</ul>

Note that ordered list start numbers must be nine digits or less:

123456789. ok
<ol start="123456789">
<li>ok</li>
</ol>

1234567890. not ok
<p>1234567890. not ok</p>

A start number may begin with 0s:

0. ok
<ol start="0">
<li>ok</li>
</ol>

003. ok
<ol start="3">
<li>ok</li>
</ol>

A start number may not be negative:

-1. not ok
<p>-1. not ok</p>

  2. Item starting with indented code. If a sequence of lines Ls constitute a sequence of blocks Bs starting with an indented code block, and M is a list marker of width W followed by one space of indentation, then the result of prepending M and the following space to the first line of Ls, and indenting subsequent lines of Ls by W + 1 spaces, is a list item with Bs as its contents. If a line is empty, then it need not be indented. The type of the list item (bullet or ordered) is determined by the type of its list marker. If the list item is ordered, then it is also assigned a start number, based on the ordered list marker.

An indented code block will have to be preceded by four spaces of indentation beyond the edge of the region where text will be included in the list item. In the following case that is 6 spaces:

- foo

      bar
<ul>
<li>
<p>foo</p>
<pre><code>bar
</code></pre>
</li>
</ul>

And in this case it is 11 spaces:

  10.  foo

           bar
<ol start="10">
<li>
<p>foo</p>
<pre><code>bar
</code></pre>
</li>
</ol>

If the first block in the list item is an indented code block, then by rule #2, the contents must be preceded by one space of indentation after the list marker:

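    indented code

paragraph

    more code
<pre><code>indented code
</code></pre>
<p>paragraph</p>
<pre><code>more code
</code></pre>

1.     indented code

   paragraph

       more code
<ol>
<li>
<pre><code>indented code
</code></pre>
<p>paragraph</p>
<pre><code>more code
</code></pre>
</li>
</ol>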

Note that an additional space of indentation is interpreted as space inside the code block:

1.      indented code

   paragraph

       more code
<ol>
<li>
<pre><code> indented code
</code></pre>
<p>paragraph</p>
<pre><code>more code
</code></pre>
</li>
</ol>

Note that rules #1 and #2 only apply to two cases: (a) cases in which the lines to be included in a list item begin with a character other than a space or tab, and (b) cases in which they begin with an indented code block. In a case like the following, where the first block begins with three spaces of indentation, the rules do not allow us to form a list item by indenting the whole thing and prepending a list marker:

   foo

bar
<p>foo</p>
<p>bar</p>

-    foo

  bar
<ul>
<li>foo</li>
</ul>
<p>bar</p>

This is not a significant restriction, because when a block is preceded by up to three spaces of indentation, the indentation can always be removed without a change in interpretation, allowing rule #1 to be applied. So, in the above case:

-  foo

   bar
<ul>
<li>
<p>foo</p>
<p>bar</p>
</li>
</ul>

  3. Item starting with a blank line. If a sequence of lines Ls starting with a single [blank line] constitute a (possibly empty) sequence of blocks Bs, and M is a list marker of width W, then the result of prepending M to the first line of Ls, and preceding subsequent lines of Ls by W + 1 spaces of indentation, is a list item with Bs as its contents. If a line is empty, then it need not be indented. The type of the list item (bullet or ordered) is determined by the type of its list marker. If the list item is ordered, then it is also assigned a start number, based on the ordered list marker.

Here are some list items that start with a blank line but are not empty:

  • foo

  • bar
    
  • baz
    

.

  • foo
  • bar
    
  • baz
    

When the list item starts with a blank line, the number of spaces following the list marker doesn’t change the required indentation:

-   
  foo
<ul>
<li>foo</li>
</ul>

A list item can begin with at most one blank line. In the following example, foo is not part of the list item:

-

  foo
<ul>
<li></li>
</ul>
<p>foo</p>

Here is an empty bullet list item:

- foo
-
- bar
<ul>
<li>foo</li>
<li></li>
<li>bar</li>
</ul>

It does not matter whether there are spaces or tabs following the [list marker]:

- foo
-   
- bar
<ul>
<li>foo</li>
<li></li>
<li>bar</li>
</ul>

Here is an empty ordered list item:

1. foo
2.
3. bar
<ol>
<li>foo</li>
<li></li>
<li>bar</li>
</ol>

A list may start or end with an empty list item:

*
<ul>
<li></li>
</ul>

However, an empty list item cannot interrupt a paragraph:

foo
*

foo
1.
<p>foo
*</p>
<p>foo
1.</p>

  4. Indentation. If a sequence of lines Ls constitutes a list item according to rule #1, #2, or #3, then the result of preceding each line of Ls by up to three spaces of indentation (the same for each line) also constitutes a list item with the same contents and attributes. If a line is empty, then it need not be indented.

Indented one space:

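 1.  A paragraph
     with two lines.

         indented code

     > A block quote.
<ol>
<li>
<p>A paragraph
with two lines.</p>
<pre><code>indented code
</code></pre>
<blockquote>
<p>A block quote.</p>
</blockquote>
</li>
</ol>

Indented two spaces:

  1.  A paragraph
      with two lines.

          indented code

      > A block quote.
<ol>
<li>
<p>A paragraph
with two lines.</p>
<pre><code>indented code
</code></pre>
<blockquote>
<p>A block quote.</p>
</blockquote>
</li>
</ol>

Indented three spaces:

   1.  A paragraph
       with two lines.

           indented code

       > A block quote.
<ol>
<li>
<p>A paragraph
with two lines.</p>
<pre><code>indented code
</code></pre>
<blockquote>
<p>A block quote.</p>
</blockquote>
</li>
</ol>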

Four spaces indent gives a code block:

    1.  A paragraph
        with two lines.

            indented code

        > A block quote.
<pre><code>1.  A paragraph
    with two lines.

        indented code

    &gt; A block quote.
</code></pre>

  5. Laziness. If a string of lines Ls constitute a list item with contents Bs, then the result of deleting some or all of the indentation from one or more lines in which the next character other than a space or tab after the indentation is [paragraph continuation text] is a list item with the same contents and attributes. The unindented lines are called lazy continuation lines.

Here is an example with [lazy continuation lines]:

  1.  A paragraph
with two lines.

          indented code

      > A block quote.
<ol>
<li>
<p>A paragraph
with two lines.</p>
<pre><code>indented code
</code></pre>
<blockquote>
<p>A block quote.</p>
</blockquote>
</li>
</ol>

Indentation can be partially deleted:

  1.  A paragraph
    with two lines.
<ol>
<li>A paragraph
with two lines.</li>
</ol>

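These examples show how laziness can work in nested structures:

> 1. > Blockquote
continued here.
<blockquote>
<ol>
<li>
<blockquote>
<p>Blockquote
continued here.</p>
</blockquote>
</li>
</ol>
</blockquote>

> 1. > Blockquote
> continued here.
<blockquote>
<ol>
<li>
<blockquote>
<p>Blockquote
continued here.</p>
</blockquote>
</li>
</ol>
</blockquote>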

  6. That’s all. Nothing that is not counted as a list item by rules #1–5 counts as a list item.

The rules for sublists follow from the general rules [above][List items]. A sublist must be indented the same number of spaces of indentation a paragraph would need to be in order to be included in the list item.

So, in this case we need two spaces indent:

- foo
  - bar
    - baz
      - boo
<ul>
<li>foo
<ul>
<li>bar
<ul>
<li>baz
<ul>
<li>boo</li>
</ul>
</li>
</ul>
</li>
</ul>
</li>
</ul>

One is not enough:

- foo
 - bar
  - baz
   - boo
<ul>
<li>foo</li>
<li>bar</li>
<li>baz</li>
<li>boo</li>
</ul>

Here we need four, because the list marker is wider:

10) foo
    - bar
<ol start="10">
<li>foo
<ul>
<li>bar</li>
</ul>
</li>
</ol>

Three is not enough:

10) foo
   - bar
<ol start="10">
<li>foo</li>
</ol>
<ul>
<li>bar</li>
</ul>

A list may be the first block in a list item:

- - foo
<ul>
<li>
<ul>
<li>foo</li>
</ul>
</li>
</ul>

1. - 2. foo
<ol>
<li>
<ul>
<li>
<ol start="2">
<li>foo</li>
</ol>
</li>
</ul>
</li>
</ol>

A list item can contain a heading:

- # Foo
- Bar
  ---
  baz
<ul>
<li>
<h1>Foo</h1>
</li>
<li>
<h2>Bar</h2>
baz</li>
</ul>

5.2.1. Motivation

John Gruber’s Markdown spec says the following about list items:

  1. “List markers typically start at the left margin, but may be indented by up to three spaces. List markers must be followed by one or more spaces or a tab.”

  2. “To make lists look nice, you can wrap items with hanging indents… But if you don’t want to, you don’t have to.”

  3. “List items may consist of multiple paragraphs. Each subsequent paragraph in a list item must be indented by either 4 spaces or one tab.”

  4. “It looks nice if you indent every line of the subsequent paragraphs, but here again, Markdown will allow you to be lazy.”

  5. “To put a blockquote within a list item, the blockquote’s > delimiters need to be indented.”

  6. “To put a code block within a list item, the code block needs to be indented twice — 8 spaces or two tabs.”

These rules specify that a paragraph under a list item must be indented four spaces (presumably, from the left margin, rather than the start of the list marker, but this is not said), and that code under a list item must be indented eight spaces instead of the usual four. They also say that a block quote must be indented, but not by how much; however, the example given has four spaces indentation. Although nothing is said about other kinds of block-level content, it is certainly reasonable to infer that all block elements under a list item, including other lists, must be indented four spaces. This principle has been called the four-space rule.

The four-space rule is clear and principled, and if the reference implementation Markdown.pl had followed it, it probably would have become the standard. However, Markdown.pl allowed paragraphs and sublists to start with only two spaces indentation, at least on the outer level. Worse, its behavior was inconsistent: a sublist of an outer-level list needed two spaces indentation, but a sublist of this sublist needed three spaces. It is not surprising, then, that different implementations of Markdown have developed very different rules for determining what comes under a list item. (Pandoc and python-Markdown, for example, stuck with Gruber’s syntax description and the four-space rule, while discount, redcarpet, marked, PHP Markdown, and others followed Markdown.pl’s behavior more closely.)

Unfortunately, given the divergences between implementations, there is no way to give a spec for list items that will be guaranteed not to break any existing documents. However, the spec given here should correctly handle lists formatted with either the four-space rule or the more forgiving Markdown.pl behavior, provided they are laid out in a way that is natural for a human to read.

The strategy here is to let the width and indentation of the list marker determine the indentation necessary for blocks to fall under the list item, rather than having a fixed and arbitrary number. The writer can think of the body of the list item as a unit which gets indented to the right enough to fit the list marker (and any indentation on the list marker). (The laziness rule, #5, then allows continuation lines to be unindented if needed.)

This rule is superior, we claim, to any rule requiring a fixed level of indentation from the margin. The four-space rule is clear but unnatural. It is quite unintuitive that

- foo

  bar

  - baz

should be parsed as two lists with an intervening paragraph,

<ul>
<li>foo</li>
</ul>
<p>bar</p>
<ul>
<li>baz</li>
</ul>

as the four-space rule demands, rather than a single list,

<ul>
<li>
<p>foo</p>
<p>bar</p>
<ul>
<li>baz</li>
</ul>
</li>
</ul>

The choice of four spaces is arbitrary. It can be learned, but it is not likely to be guessed, and it trips up beginners regularly.

Would it help to adopt a two-space rule? The problem is that such a rule, together with the rule allowing up to three spaces of indentation for the initial list marker, allows text that is indented less than the original list marker to be included in the list item. For example, Markdown.pl parses

   - one

  two

as a single list item, with two a continuation paragraph:

<ul>
<li>
<p>one</p>
<p>two</p>
</li>
</ul>

and similarly

>   - one
>
>  two

as

<blockquote>
<ul>
<li>
<p>one</p>
<p>two</p>
</li>
</ul>
</blockquote>

This is extremely unintuitive.

Rather than requiring a fixed indent from the margin, we could require a fixed indent (say, two spaces, or even one space) from the list marker (which may itself be indented). This proposal would remove the last anomaly discussed. Unlike the spec presented above, it would count the following as a list item with a subparagraph, even though the paragraph bar is not indented as far as the first paragraph foo:

 10. foo

   bar  

Arguably this text does read like a list item with bar as a subparagraph, which may count in favor of the proposal. However, on this proposal indented code would have to be indented six spaces after the list marker. And this would break a lot of existing Markdown, which has the pattern:

1.  foo

        indented code

where the code is indented eight spaces. The spec above, by contrast, will parse this text as expected, since the code block’s indentation is measured from the beginning of foo.

The one case that needs special treatment is a list item that starts with indented code. How much indentation is required in that case, since we don’t have a “first paragraph” to measure from? Rule #2 simply stipulates that in such cases, we require one space indentation from the list marker (and then the normal four spaces for the indented code). This will match the four-space rule in cases where the list marker plus its initial indentation takes four spaces (a common case), but diverge in other cases.

5.3. Lists

A list is a sequence of one or more list items [of the same type]. The list items may be separated by any number of blank lines.

Two list items are of the same type if they begin with a [list marker] of the same type. Two list markers are of the same type if (a) they are bullet list markers using the same character (-, +, or *) or (b) they are ordered list numbers with the same delimiter (either . or )).
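
The comparison described above can be sketched as follows. The function name `same_list_type` is hypothetical, and the sketch assumes each argument is a bare marker string such as "-" or "10)" with no surrounding whitespace.

```python
def _marker_key(marker):
    """Reduce a list marker to the part that determines its type."""
    if marker in ("-", "+", "*"):
        return ("bullet", marker)
    digits, delimiter = marker[:-1], marker[-1:]
    if digits.isascii() and digits.isdigit() and delimiter in (".", ")"):
        # The number itself is irrelevant; only the delimiter matters.
        return ("ordered", delimiter)
    raise ValueError(f"not a list marker: {marker!r}")


def same_list_type(marker_a, marker_b):
    """True if two list markers are of the same type, so that the
    items they begin can belong to the same list."""
    return _marker_key(marker_a) == _marker_key(marker_b)


if __name__ == "__main__":
    print(same_list_type("-", "-"))    # same bullet character
    print(same_list_type("-", "+"))    # different bullet characters
    print(same_list_type("1.", "2."))  # same delimiter
    print(same_list_type("2.", "3)"))  # different delimiters
```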

A list is an ordered list if its constituent list items begin with [ordered list markers], and a bullet list if its constituent list items begin with [bullet list markers].

The start number of an [ordered list] is determined by the list number of its initial list item. The numbers of subsequent list items are disregarded.

A list is loose if any of its constituent list items are separated by blank lines, or if any of its constituent list items directly contain two block-level elements with a blank line between them. Otherwise a list is tight. (The difference in HTML output is that paragraphs in a loose list are wrapped in <p> tags, while paragraphs in a tight list are not.)

Changing the bullet or ordered list delimiter starts a new list:

- foo
- bar
+ baz
<ul>
<li>foo</li>
<li>bar</li>
</ul>
<ul>
<li>baz</li>
</ul>

1. foo
2. bar
3) baz
<ol>
<li>foo</li>
<li>bar</li>
</ol>
<ol start="3">
<li>baz</li>
</ol>

In CommonMark, a list can interrupt a paragraph. That is, no blank line is needed to separate a paragraph from a following list:

Foo
- bar
- baz
<p>Foo</p>
<ul>
<li>bar</li>
<li>baz</li>
</ul>

Markdown.pl does not allow this, through fear of triggering a list via a numeral in a hard-wrapped line:

The number of windows in my house is
14.  The number of doors is 6.

Oddly, though, Markdown.pl does allow a blockquote to interrupt a paragraph, even though the same considerations might apply.

In CommonMark, we do allow lists to interrupt paragraphs, for two reasons. First, it is natural and not uncommon for people to start lists without blank lines:

I need to buy
- new shoes
- a coat
- a plane ticket

Second, we are attracted to a

principle of uniformity: if a chunk of text has a certain meaning, it will continue to have the same meaning when put into a container block (such as a list item or blockquote).

(Indeed, the spec for [list items] and [block quotes] presupposes this principle.) This principle implies that if

  * I need to buy
    - new shoes
    - a coat
    - a plane ticket

is a list item containing a paragraph followed by a nested sublist, as all Markdown implementations agree it is (though the paragraph may be rendered without <p> tags, since the list is “tight”), then

I need to buy
- new shoes
- a coat
- a plane ticket

by itself should be a paragraph followed by a nested sublist.

Since it is well established Markdown practice to allow lists to interrupt paragraphs inside list items, the [principle of uniformity] requires us to allow this outside list items as well. (reStructuredText takes a different approach, requiring blank lines before lists even inside other list items.)

In order to solve the problem of unwanted lists in paragraphs with hard-wrapped numerals, we allow only lists starting with 1 to interrupt paragraphs. Thus,

The number of windows in my house is
14.  The number of doors is 6.
<p>The number of windows in my house is
14.  The number of doors is 6.</p>

We may still get an unintended result in cases like

The number of windows in my house is
1.  The number of doors is 6.
<p>The number of windows in my house is</p>
<ol>
<li>The number of doors is 6.</li>
</ol>

but this rule should prevent most spurious list captures.
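
The interruption rule discussed above (together with the earlier exception that an empty list item cannot interrupt a paragraph) can be sketched as a predicate on the candidate line. The name `can_interrupt_paragraph` is hypothetical, and the sketch deliberately ignores other constructs that could claim the line first, such as thematic breaks.

```python
import re

# Up to three spaces, then a bullet or ordered marker, then either
# whitespace plus content or nothing at all.
_MARKER = re.compile(r"^ {0,3}(?:([-+*])|([0-9]{1,9})[.)])(?:[ \t]+(.*))?$")


def can_interrupt_paragraph(line):
    """True if `line` could start a list that interrupts a paragraph."""
    match = _MARKER.match(line)
    if match is None:
        return False
    _bullet, number, rest = match.groups()
    if rest is None or rest.strip() == "":
        return False  # an empty list item cannot interrupt a paragraph
    if number is not None and int(number) != 1:
        return False  # an ordered list must start with 1 to interrupt
    return True


if __name__ == "__main__":
    for sample in ["14.  The number of doors is 6.",
                   "1.  The number of doors is 6.",
                   "- new shoes",
                   "*"]:
        print(repr(sample), "->", can_interrupt_paragraph(sample))
```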

There can be any number of blank lines between items:

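- foo

- bar


- baz
<ul>
<li>
<p>foo</p>
</li>
<li>
<p>bar</p>
</li>
<li>
<p>baz</p>
</li>
</ul>

- foo
  - bar
    - baz


      bim
<ul>
<li>foo
<ul>
<li>bar
<ul>
<li>
<p>baz</p>
<p>bim</p>
</li>
</ul>
</li>
</ul>
</li>
</ul>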

To separate consecutive lists of the same type, or to separate a list from an indented code block that would otherwise be parsed as a subparagraph of the final list item, you can insert a blank HTML comment:

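- foo
- bar

<!-- -->

- baz
- bim
<ul>
<li>foo</li>
<li>bar</li>
</ul>
<!-- -->
<ul>
<li>baz</li>
<li>bim</li>
</ul>

-   foo

    notcode

-   foo

<!-- -->

    code
<ul>
<li>
<p>foo</p>
<p>notcode</p>
</li>
<li>
<p>foo</p>
</li>
</ul>
<!-- -->
<pre><code>code
</code></pre>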

List items need not be indented to the same level. The following list items will be treated as items at the same list level, since none is indented enough to belong to the previous list item:

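- a
 - b
  - c
   - d
  - e
 - f
- g
<ul>
<li>a</li>
<li>b</li>
<li>c</li>
<li>d</li>
<li>e</li>
<li>f</li>
<li>g</li>
</ul>

1. a

  2. b

   3. c
<ol>
<li>
<p>a</p>
</li>
<li>
<p>b</p>
</li>
<li>
<p>c</p>
</li>
</ol>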

Note, however, that list items may not be preceded by more than three spaces of indentation. Here - e is treated as a paragraph continuation line, because it is indented more than three spaces:

- a
 - b
  - c
   - d
    - e
<ul>
<li>a</li>
<li>b</li>
<li>c</li>
<li>d
- e</li>
</ul>

And here, 3. c is treated as an indented code block, because it is indented four spaces and preceded by a blank line.

1. a

  2. b

    3. c
<ol>
<li>
<p>a</p>
</li>
<li>
<p>b</p>
</li>
</ol>
<pre><code>3. c
</code></pre>

This is a loose list, because there is a blank line between two of the list items:

- a
- b

- c
<ul>
<li>
<p>a</p>
</li>
<li>
<p>b</p>
</li>
<li>
<p>c</p>
</li>
</ul>

So is this, with an empty second item:

* a
*

* c
<ul>
<li>
<p>a</p>
</li>
<li></li>
<li>
<p>c</p>
</li>
</ul>

These are loose lists, even though there are no blank lines between the items, because one of the items directly contains two block-level elements with a blank line between them:

- a
- b

  c
- d
<ul>
<li>
<p>a</p>
</li>
<li>
<p>b</p>
<p>c</p>
</li>
<li>
<p>d</p>
</li>
</ul>

- a
- b

  [ref]: /url
- d
<ul>
<li>
<p>a</p>
</li>
<li>
<p>b</p>
</li>
<li>
<p>d</p>
</li>
</ul>

This is a tight list, because the blank lines are in a code block:

  • a

  • b
    
    
    
  • c .

  • a
  • b
    

  • c

This is a tight list, because the blank line is between two paragraphs of a sublist. So the sublist is loose while the outer list is tight:

- a
  - b

    c
- d
<ul>
<li>a
<ul>
<li>
<p>b</p>
<p>c</p>
</li>
</ul>
</li>
<li>d</li>
</ul>

This is a tight list, because the blank line is inside the block quote:

* a
  > b
  >
* c
<ul>
<li>a
<blockquote>
<p>b</p>
</blockquote>
</li>
<li>c</li>
</ul>

This list is tight, because the consecutive block elements are not separated by blank lines:

  • a

    b

    c
    
  • d .

  • a

    b

    c
    
  • d

A single-paragraph list is tight:

- a
<ul>
<li>a</li>
</ul>

- a
  - b
<ul>
<li>a
<ul>
<li>b</li>
</ul>
</li>
</ul>

This list is loose, because of the blank line between the two block elements in the list item:

  1. foo
    

    bar .

  1. foo
    

    bar

Here the outer list is loose, the inner list tight:

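* foo
  * bar

  baz
<ul>
<li>
<p>foo</p>
<ul>
<li>bar</li>
</ul>
<p>baz</p>
</li>
</ul>

- a
  - b
  - c

- d
  - e
  - f
<ul>
<li>
<p>a</p>
<ul>
<li>b</li>
<li>c</li>
</ul>
</li>
<li>
<p>d</p>
<ul>
<li>e</li>
<li>f</li>
</ul>
</li>
</ul>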