5. Container blocks
A container block is a block that has other blocks as its contents. There are two basic kinds of container blocks: [block quotes] and [list items]. [Lists] are meta-containers for [list items].
We define the syntax for container blocks recursively. The general form of the definition is:
If X is a sequence of blocks, then the result of transforming X in such-and-such a way is a container of type Y with these blocks as its content.
So, we explain what counts as a block quote or list item by explaining how these can be generated from their contents. This should suffice to define the syntax, although it does not give a recipe for parsing these constructions. (A recipe is provided below in the section entitled A parsing strategy.)
5.1. Block quotes
A block quote marker, optionally preceded by up to three spaces of indentation, consists of (a) the character > together with a following space of indentation, or (b) a single character > not followed by a space of indentation.
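The marker definition can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the helper name `strip_block_quote_marker` is hypothetical, and tabs are not expanded (only literal spaces are counted as indentation).

```python
from typing import Optional


def strip_block_quote_marker(line: str) -> Optional[str]:
    """Return the text after a block quote marker, or None if there is none.

    Assumption: indentation consists of spaces only; tab handling is omitted.
    """
    indent = len(line) - len(line.lstrip(" "))
    # More than three spaces of indentation means this is not a marker.
    if indent > 3 or not line[indent:].startswith(">"):
        return None
    rest = line[indent + 1:]
    # Case (a): a single following space belongs to the marker itself.
    if rest.startswith(" "):
        return rest[1:]
    # Case (b): the > is not followed by a space.
    return rest


print(repr(strip_block_quote_marker("   > baz")))
```

Note that only one space after the > is consumed, so any further spaces remain part of the quoted content.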
The following rules define [block quotes]:
Basic case. If a string of lines Ls constitute a sequence of blocks Bs, then the result of prepending a [block quote marker] to the beginning of each line in Ls is a block quote containing Bs.
Laziness. If a string of lines Ls constitute a block quote with contents Bs, then the result of deleting the initial [block quote marker] from one or more lines in which the next character other than a space or tab after the [block quote marker] is [paragraph continuation text] is a block quote with Bs as its content. Paragraph continuation text is text that will be parsed as part of the content of a paragraph, but does not occur at the beginning of the paragraph.
Consecutiveness. A document cannot contain two [block quotes] in a row unless there is a [blank line] between them.
Nothing else counts as a block quote.
Here is a simple example:
> # Foo
> bar
> baz
<blockquote>
<h1>Foo</h1>
<p>bar
baz</p>
</blockquote>
The space or tab after the > characters can be omitted:
># Foo
>bar
> baz
<blockquote>
<h1>Foo</h1>
<p>bar
baz</p>
</blockquote>
The > characters can be preceded by up to three spaces of indentation:
   > # Foo
   > bar
 > baz
<blockquote>
<h1>Foo</h1>
<p>bar
baz</p>
</blockquote>
Four spaces of indentation is too many:
    > # Foo
    > bar
    > baz
<pre><code>&gt; # Foo
&gt; bar
&gt; baz
</code></pre>
The Laziness clause allows us to omit the > before paragraph continuation text:
> # Foo
> bar
baz
<blockquote>
<h1>Foo</h1>
<p>bar
baz</p>
</blockquote>
A block quote can contain some lazy and some non-lazy continuation lines:
> bar
baz
> foo
<blockquote>
<p>bar
baz
foo</p>
</blockquote>
Laziness only applies to lines that would have been continuations of paragraphs had they been prepended with [block quote markers]. For example, the > cannot be omitted in the second line of
> foo
> ---
without changing the meaning:
> foo
---
<blockquote>
<p>foo</p>
</blockquote>
<hr />
Similarly, if we omit the > in the second line of
> - foo
> - bar
then the block quote ends after the first line:
> - foo
- bar
<blockquote>
<ul>
<li>foo</li>
</ul>
</blockquote>
<ul>
<li>bar</li>
</ul>
For the same reason, we can’t omit the > in front of subsequent lines of an indented or fenced code block:
>     foo
    bar
<blockquote>
<pre><code>foo
</code></pre>
</blockquote>
<pre><code>bar
</code></pre>
foo
.
<blockquote>
<pre><code></code></pre>
</blockquote>
<p>foo</p>
<pre><code></code></pre>
Note that in the following case, we have a [lazy continuation line]:
> foo
    - bar
<blockquote>
<p>foo
- bar</p>
</blockquote>
To see why, note that in
> foo
>     - bar
the - bar is indented too far to start a list, and can’t be an indented code block because indented code blocks cannot interrupt paragraphs, so it is [paragraph continuation text].
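A block quote can be empty:
>
<blockquote>
</blockquote>
>
>
>
<blockquote>
</blockquote>
A block quote can have initial or final blank lines:
>
> foo
>
<blockquote>
<p>foo</p>
</blockquote>
A blank line always separates block quotes:
> foo

> bar
<blockquote>
<p>foo</p>
</blockquote>
<blockquote>
<p>bar</p>
</blockquote>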
(Most current Markdown implementations, including John Gruber’s original Markdown.pl, will parse this example as a single block quote with two paragraphs. But it seems better to allow the author to decide whether two block quotes or one are wanted.)
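Consecutiveness means that if we put these block quotes together, we get a single block quote:
> foo
> bar
<blockquote>
<p>foo
bar</p>
</blockquote>
To get a block quote with two paragraphs, use:
> foo
>
> bar
<blockquote>
<p>foo</p>
<p>bar</p>
</blockquote>
Block quotes can interrupt paragraphs:
foo
> bar
<p>foo</p>
<blockquote>
<p>bar</p>
</blockquote>
In general, blank lines are not needed before or after block quotes:
> aaa
***
> bbb
<blockquote>
<p>aaa</p>
</blockquote>
<hr />
<blockquote>
<p>bbb</p>
</blockquote>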
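However, because of laziness, a blank line is needed between a block quote and a following paragraph:
> bar
baz
<blockquote>
<p>bar
baz</p>
</blockquote>
> bar

baz
<blockquote>
<p>bar</p>
</blockquote>
<p>baz</p>
> bar
>
baz
<blockquote>
<p>bar</p>
</blockquote>
<p>baz</p>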
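It is a consequence of the Laziness rule that any number of initial >s may be omitted on a continuation line of a nested block quote:
> > > foo
bar
<blockquote>
<blockquote>
<blockquote>
<p>foo
bar</p>
</blockquote>
</blockquote>
</blockquote>
>>> foo
> bar
>>baz
<blockquote>
<blockquote>
<blockquote>
<p>foo
bar
baz</p>
</blockquote>
</blockquote>
</blockquote>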
When including an indented code block in a block quote, remember that the [block quote marker] includes both the > and a following space of indentation. So five spaces are needed after the >:
>     code

>    not code
<blockquote>
<pre><code>code
</code></pre>
</blockquote>
<blockquote>
<p>not code</p>
</blockquote>
5.2. List items
A list marker is a [bullet list marker] or an [ordered list marker].
A bullet list marker is a -, +, or * character.
An ordered list marker is a sequence of 1–9 arabic digits (0-9), followed by either a . character or a ) character. (The reason for the length limit is that with 10 digits we start seeing integer overflows in some browsers.)
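As an illustration, the two marker definitions can be recognized with regular expressions. The sketch below is not part of the spec: the function name `parse_list_marker` and its return shape are assumptions made for the example, and it only recognizes the marker itself, without checking the space that must follow it in a list item.

```python
import re
from typing import Optional, Tuple

# Bullet markers: one of -, +, or *.
_BULLET = re.compile(r"[-+*]")
# Ordered markers: 1 to 9 ASCII digits followed by . or ).
_ORDERED = re.compile(r"([0-9]{1,9})([.)])")


def parse_list_marker(text: str) -> Optional[Tuple[str, Optional[int], int]]:
    """Return (kind, start_number, width) for a marker at the start of text."""
    m = _ORDERED.match(text)
    if m:
        # Leading zeros are allowed; the start number is the integer value.
        return ("ordered", int(m.group(1)), m.end())
    if _BULLET.match(text):
        return ("bullet", None, 1)
    return None


print(parse_list_marker("003. ok"))
```

The width returned here is the W used by the list item rules that follow.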
The following rules define [list items]:
1. Basic case. If a sequence of lines Ls constitute a sequence of blocks Bs starting with a character other than a space or tab, and M is a list marker of width W followed by 1 ≤ N ≤ 4 spaces of indentation, then the result of prepending M and the following spaces to the first line of Ls, and indenting subsequent lines of Ls by W + N spaces, is a list item with Bs as its contents. The type of the list item (bullet or ordered) is determined by the type of its list marker. If the list item is ordered, then it is also assigned a start number, based on the ordered list marker.
Exceptions:
When the first list item in a [list] interrupts a paragraph—that is, when it starts on a line that would otherwise count as [paragraph continuation text]—then (a) the lines Ls must not begin with a blank line, and (b) if the list item is ordered, the start number must be 1.
If any line is a [thematic break] then that line is not a list item.
For example, let Ls be the lines
A paragraph
with two lines.

    indented code

> A block quote.
<p>A paragraph
with two lines.</p>
<pre><code>indented code
</code></pre>
<blockquote>
<p>A block quote.</p>
</blockquote>
And let M be the marker 1., and N = 2. Then rule #1 says that the following is an ordered list item with start number 1, and the same contents as Ls:
1.  A paragraph
    with two lines.

        indented code

    > A block quote.
<ol>
<li>
<p>A paragraph
with two lines.</p>
<pre><code>indented code
</code></pre>
<blockquote>
<p>A block quote.</p>
</blockquote>
</li>
</ol>
The most important thing to notice is that the position of the text after the list marker determines how much indentation is needed in subsequent blocks in the list item. If the list marker takes up two spaces of indentation, and there are three spaces between the list marker and the next character other than a space or tab, then blocks must be indented five spaces in order to fall under the list item.
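The arithmetic can be made concrete with a small sketch that computes how far continuation blocks must be indented, given the first line of a list item. The function name `required_indent` is hypothetical, tabs are ignored, and the handling of a blank start or of five or more spaces after the marker (both treated as N = 1) anticipates the rules for items starting with a blank line or with indented code.

```python
import re
from typing import Optional

# Up to three spaces, a list marker, the spaces after it, then the content.
_ITEM_START = re.compile(r"( {0,3})([-+*]|[0-9]{1,9}[.)])( *)(.*)")


def required_indent(first_line: str) -> Optional[int]:
    """Spaces of indentation needed for later blocks to fall under the item."""
    m = _ITEM_START.match(first_line)
    if m is None:
        return None
    lead, marker, spaces, rest = m.groups()
    if rest and not spaces:
        # Content must be separated from the marker by at least one space.
        return None
    n = len(spaces)
    if rest == "" or n > 4:
        # Blank start, or content that begins with indented code: N counts as 1.
        n = 1
    return len(lead) + len(marker) + n


print(required_indent("1.   one"))  # marker width 2 + 3 spaces
```

Under these assumptions a two-character marker followed by three spaces gives 5, matching the figure in the paragraph above.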
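Here are some examples showing how far content must be indented to be put under the list item:
- one

 two
<ul>
<li>one</li>
</ul>
<p>two</p>
- one

  two
<ul>
<li>
<p>one</p>
<p>two</p>
</li>
</ul>
 -    one

     two
<ul>
<li>one</li>
</ul>
<pre><code> two
</code></pre>
 -    one

      two
<ul>
<li>
<p>one</p>
<p>two</p>
</li>
</ul>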
It is tempting to think of this in terms of columns: the continuation blocks must be indented at least to the column of the first character other than a space or tab after the list marker. However, that is not quite right. The spaces of indentation after the list marker determine how much relative indentation is needed. Which column this indentation reaches will depend on how the list item is embedded in other constructions, as shown by this example:
   > > 1.  one
>>
>>     two
<blockquote>
<blockquote>
<ol>
<li>
<p>one</p>
<p>two</p>
</li>
</ol>
</blockquote>
</blockquote>
Here two occurs in the same column as the list marker 1., but is actually contained in the list item, because there is sufficient indentation after the last containing blockquote marker.
The converse is also possible. In the following example, the word two occurs far to the right of the initial text of the list item, one, but it is not considered part of the list item, because it is not indented far enough past the blockquote marker:
>>- one
>>
  >  > two
<blockquote>
<blockquote>
<ul>
<li>one</li>
</ul>
<p>two</p>
</blockquote>
</blockquote>
Note that at least one space or tab is needed between the list marker and any following content, so these are not list items:
-one

2.two
<p>-one</p>
<p>2.two</p>
A list item may contain blocks that are separated by more than one blank line.
- foo


  bar
<ul>
<li>
<p>foo</p>
<p>bar</p>
</li>
</ul>
A list item may contain any kind of block:
foo
bar
baz
bam .
-
foo
bar
baz
bam
A list item that contains an indented code block will preserve empty lines within the code block verbatim.
- Foo

      bar


      baz
<ul>
<li>
<p>Foo</p>
<pre><code>bar


baz
</code></pre>
</li>
</ul>
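Note that ordered list start numbers must be nine digits or less:
123456789. ok
<ol start="123456789">
<li>ok</li>
</ol>
1234567890. not ok
<p>1234567890. not ok</p>
A start number may begin with 0s:
0. ok
<ol start="0">
<li>ok</li>
</ol>
003. ok
<ol start="3">
<li>ok</li>
</ol>
A start number may not be negative:
-1. not ok
<p>-1. not ok</p>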
2. Item starting with indented code. If a sequence of lines Ls constitute a sequence of blocks Bs starting with an indented code block, and M is a list marker of width W followed by one space of indentation, then the result of prepending M and the following space to the first line of Ls, and indenting subsequent lines of Ls by W + 1 spaces, is a list item with Bs as its contents. If a line is empty, then it need not be indented. The type of the list item (bullet or ordered) is determined by the type of its list marker. If the list item is ordered, then it is also assigned a start number, based on the ordered list marker.
An indented code block will have to be preceded by four spaces of indentation beyond the edge of the region where text will be included in the list item. In the following case that is 6 spaces:
- foo

      bar
<ul>
<li>
<p>foo</p>
<pre><code>bar
</code></pre>
</li>
</ul>
And in this case it is 11 spaces:
  10.  foo

           bar
<ol start="10">
<li>
<p>foo</p>
<pre><code>bar
</code></pre>
</li>
</ol>
If the first block in the list item is an indented code block, then by rule #2, the contents must be preceded by one space of indentation after the list marker:
    indented code

paragraph

    more code
<pre><code>indented code
</code></pre>
<p>paragraph</p>
<pre><code>more code
</code></pre>
1.     indented code

   paragraph

       more code
<ol>
<li>
<pre><code>indented code
</code></pre>
<p>paragraph</p>
<pre><code>more code
</code></pre>
</li>
</ol>
Note that an additional space of indentation is interpreted as space inside the code block:
1.      indented code

   paragraph

       more code
<ol>
<li>
<pre><code> indented code
</code></pre>
<p>paragraph</p>
<pre><code>more code
</code></pre>
</li>
</ol>
Note that rules #1 and #2 only apply to two cases: (a) cases in which the lines to be included in a list item begin with a character other than a space or tab, and (b) cases in which they begin with an indented code block. In a case like the following, where the first block begins with three spaces of indentation, the rules do not allow us to form a list item by indenting the whole thing and prepending a list marker:
   foo

bar
<p>foo</p>
<p>bar</p>
-    foo

  bar
<ul>
<li>foo</li>
</ul>
<p>bar</p>
This is not a significant restriction, because when a block is preceded by up to three spaces of indentation, the indentation can always be removed without a change in interpretation, allowing rule #1 to be applied. So, in the above case:
-  foo

   bar
<ul>
<li>
<p>foo</p>
<p>bar</p>
</li>
</ul>
3. Item starting with a blank line. If a sequence of lines Ls starting with a single [blank line] constitute a (possibly empty) sequence of blocks Bs, and M is a list marker of width W, then the result of prepending M to the first line of Ls, and preceding subsequent lines of Ls by W + 1 spaces of indentation, is a list item with Bs as its contents. If a line is empty, then it need not be indented. The type of the list item (bullet or ordered) is determined by the type of its list marker. If the list item is ordered, then it is also assigned a start number, based on the ordered list marker.
Here are some list items that start with a blank line but are not empty:
foo
bar
baz
.
- foo
-
bar
-
baz
When the list item starts with a blank line, the number of spaces following the list marker doesn’t change the required indentation:
-   
  foo
<ul>
<li>foo</li>
</ul>
A list item can begin with at most one blank line. In the following example, foo is not part of the list item:
-

  foo
<ul>
<li></li>
</ul>
<p>foo</p>
Here is an empty bullet list item:
- foo
-
- bar
<ul>
<li>foo</li>
<li></li>
<li>bar</li>
</ul>
It does not matter whether there are spaces or tabs following the [list marker]:
- foo
-   
- bar
<ul>
<li>foo</li>
<li></li>
<li>bar</li>
</ul>
Here is an empty ordered list item:
1. foo
2.
3. bar
<ol>
<li>foo</li>
<li></li>
<li>bar</li>
</ol>
A list may start or end with an empty list item:
*
<ul>
<li></li>
</ul>
However, an empty list item cannot interrupt a paragraph:
foo
*

foo
1.
<p>foo
*</p>
<p>foo
1.</p>
4. Indentation. If a sequence of lines Ls constitutes a list item according to rule #1, #2, or #3, then the result of preceding each line of Ls by up to three spaces of indentation (the same for each line) also constitutes a list item with the same contents and attributes. If a line is empty, then it need not be indented.
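Indented one space:
 1.  A paragraph
     with two lines.

         indented code

     > A block quote.
<ol>
<li>
<p>A paragraph
with two lines.</p>
<pre><code>indented code
</code></pre>
<blockquote>
<p>A block quote.</p>
</blockquote>
</li>
</ol>
Indented two spaces:
  1.  A paragraph
      with two lines.

          indented code

      > A block quote.
<ol>
<li>
<p>A paragraph
with two lines.</p>
<pre><code>indented code
</code></pre>
<blockquote>
<p>A block quote.</p>
</blockquote>
</li>
</ol>
Indented three spaces:
   1.  A paragraph
       with two lines.

           indented code

       > A block quote.
<ol>
<li>
<p>A paragraph
with two lines.</p>
<pre><code>indented code
</code></pre>
<blockquote>
<p>A block quote.</p>
</blockquote>
</li>
</ol>
Four spaces indent gives a code block:
    1.  A paragraph
        with two lines.

            indented code

        > A block quote.
<pre><code>1.  A paragraph
    with two lines.

        indented code

    &gt; A block quote.
</code></pre>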
5. Laziness. If a string of lines Ls constitute a list item with contents Bs, then the result of deleting some or all of the indentation from one or more lines in which the next character other than a space or tab after the indentation is [paragraph continuation text] is a list item with the same contents and attributes. The unindented lines are called lazy continuation lines.
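Here is an example with [lazy continuation lines]:
  1.  A paragraph
with two lines.

          indented code

      > A block quote.
<ol>
<li>
<p>A paragraph
with two lines.</p>
<pre><code>indented code
</code></pre>
<blockquote>
<p>A block quote.</p>
</blockquote>
</li>
</ol>
Indentation can be partially deleted:
  1.  A paragraph
    with two lines.
<ol>
<li>A paragraph
with two lines.</li>
</ol>
These examples show how laziness can work in nested structures:
> 1. > Blockquote
continued here.
<blockquote>
<ol>
<li>
<blockquote>
<p>Blockquote
continued here.</p>
</blockquote>
</li>
</ol>
</blockquote>
> 1. > Blockquote
> continued here.
<blockquote>
<ol>
<li>
<blockquote>
<p>Blockquote
continued here.</p>
</blockquote>
</li>
</ol>
</blockquote>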
6. That’s all. Nothing that is not counted as a list item by rules #1–5 counts as a list item.
The rules for sublists follow from the general rules [above][List items]. A sublist must be indented the same number of spaces of indentation a paragraph would need to be in order to be included in the list item.
So, in this case we need two spaces indent:
- foo
  - bar
    - baz
      - boo
<ul>
<li>foo
<ul>
<li>bar
<ul>
<li>baz
<ul>
<li>boo</li>
</ul>
</li>
</ul>
</li>
</ul>
</li>
</ul>
One is not enough:
- foo
 - bar
  - baz
   - boo
<ul>
<li>foo</li>
<li>bar</li>
<li>baz</li>
<li>boo</li>
</ul>
Here we need four, because the list marker is wider:
10) foo
    - bar
<ol start="10">
<li>foo
<ul>
<li>bar</li>
</ul>
</li>
</ol>
Three is not enough:
10) foo
   - bar
<ol start="10">
<li>foo</li>
</ol>
<ul>
<li>bar</li>
</ul>
A list may be the first block in a list item:
- - foo
<ul>
<li>
<ul>
<li>foo</li>
</ul>
</li>
</ul>
1. - 2. foo
<ol>
<li>
<ul>
<li>
<ol start="2">
<li>foo</li>
</ol>
</li>
</ul>
</li>
</ol>
A list item can contain a heading:
- # Foo
- Bar
  ---
  baz
<ul>
<li>
<h1>Foo</h1>
</li>
<li>
<h2>Bar</h2>
baz</li>
</ul>
5.2.1. Motivation
John Gruber’s Markdown spec says the following about list items:
“List markers typically start at the left margin, but may be indented by up to three spaces. List markers must be followed by one or more spaces or a tab.”
“To make lists look nice, you can wrap items with hanging indents… But if you don’t want to, you don’t have to.”
“List items may consist of multiple paragraphs. Each subsequent paragraph in a list item must be indented by either 4 spaces or one tab.”
“It looks nice if you indent every line of the subsequent paragraphs, but here again, Markdown will allow you to be lazy.”
“To put a blockquote within a list item, the blockquote’s > delimiters need to be indented.”
“To put a code block within a list item, the code block needs to be indented twice — 8 spaces or two tabs.”
These rules specify that a paragraph under a list item must be indented four spaces (presumably, from the left margin, rather than the start of the list marker, but this is not said), and that code under a list item must be indented eight spaces instead of the usual four. They also say that a block quote must be indented, but not by how much; however, the example given has four spaces indentation. Although nothing is said about other kinds of block-level content, it is certainly reasonable to infer that all block elements under a list item, including other lists, must be indented four spaces. This principle has been called the four-space rule.
The four-space rule is clear and principled, and if the reference implementation Markdown.pl had followed it, it probably would have become the standard. However, Markdown.pl allowed paragraphs and sublists to start with only two spaces indentation, at least on the outer level. Worse, its behavior was inconsistent: a sublist of an outer-level list needed two spaces indentation, but a sublist of this sublist needed three spaces. It is not surprising, then, that different implementations of Markdown have developed very different rules for determining what comes under a list item. (Pandoc and python-Markdown, for example, stuck with Gruber’s syntax description and the four-space rule, while discount, redcarpet, marked, PHP Markdown, and others followed Markdown.pl’s behavior more closely.)
Unfortunately, given the divergences between implementations, there is no way to give a spec for list items that will be guaranteed not to break any existing documents. However, the spec given here should correctly handle lists formatted with either the four-space rule or the more forgiving Markdown.pl behavior, provided they are laid out in a way that is natural for a human to read.
The strategy here is to let the width and indentation of the list marker determine the indentation necessary for blocks to fall under the list item, rather than having a fixed and arbitrary number. The writer can think of the body of the list item as a unit which gets indented to the right enough to fit the list marker (and any indentation on the list marker). (The laziness rule, #5, then allows continuation lines to be unindented if needed.)
This rule is superior, we claim, to any rule requiring a fixed level of indentation from the margin. The four-space rule is clear but unnatural. It is quite unintuitive that
- foo

  bar

  - baz
should be parsed as two lists with an intervening paragraph,
<ul>
<li>foo</li>
</ul>
<p>bar</p>
<ul>
<li>baz</li>
</ul>
as the four-space rule demands, rather than a single list,
<ul>
<li>
<p>foo</p>
<p>bar</p>
<ul>
<li>baz</li>
</ul>
</li>
</ul>
The choice of four spaces is arbitrary. It can be learned, but it is not likely to be guessed, and it trips up beginners regularly.
Would it help to adopt a two-space rule? The problem is that such a rule, together with the rule allowing up to three spaces of indentation for the initial list marker, allows text that is indented less than the original list marker to be included in the list item. For example, Markdown.pl parses
   - one

  two
as a single list item, with two a continuation paragraph:
<ul>
<li>
<p>one</p>
<p>two</p>
</li>
</ul>
and similarly
>   - one
>
>  two
as
<blockquote>
<ul>
<li>
<p>one</p>
<p>two</p>
</li>
</ul>
</blockquote>
This is extremely unintuitive.
Rather than requiring a fixed indent from the margin, we could require a fixed indent (say, two spaces, or even one space) from the list marker (which may itself be indented). This proposal would remove the last anomaly discussed. Unlike the spec presented above, it would count the following as a list item with a subparagraph, even though the paragraph bar is not indented as far as the first paragraph foo:
 10. foo

   bar
Arguably this text does read like a list item with bar as a subparagraph, which may count in favor of the proposal. However, on this proposal indented code would have to be indented six spaces after the list marker. And this would break a lot of existing Markdown, which has the pattern:
1.  foo

        indented code
where the code is indented eight spaces. The spec above, by contrast, will parse this text as expected, since the code block’s indentation is measured from the beginning of foo.
The one case that needs special treatment is a list item that starts with indented code. How much indentation is required in that case, since we don’t have a “first paragraph” to measure from? Rule #2 simply stipulates that in such cases, we require one space indentation from the list marker (and then the normal four spaces for the indented code). This will match the four-space rule in cases where the list marker plus its initial indentation takes four spaces (a common case), but diverge in other cases.
5.3. Lists
A list is a sequence of one or more list items [of the same type]. The list items may be separated by any number of blank lines.
Two list items are of the same type if they begin with a [list marker] of the same type. Two list markers are of the same type if (a) they are bullet list markers using the same character (-, +, or *) or (b) they are ordered list numbers with the same delimiter (either . or )).
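The comparison can be sketched as follows. The helper `same_list_type` is a hypothetical name, and it assumes both arguments are already valid list markers such as "-" or "12)".

```python
def same_list_type(marker_a: str, marker_b: str) -> bool:
    """Decide whether two (assumed valid) list markers are of the same type."""

    def key(marker: str):
        if marker in ("-", "+", "*"):
            # Bullet markers must use the same character.
            return ("bullet", marker)
        # Ordered markers are compared by delimiter only: "." or ")".
        return ("ordered", marker[-1])

    return key(marker_a) == key(marker_b)


print(same_list_type("1.", "3."), same_list_type("-", "+"))
```

The numbers of ordered markers play no role in the comparison, only the delimiter does.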
A list is an ordered list if its constituent list items begin with [ordered list markers], and a bullet list if its constituent list items begin with [bullet list markers].
The start number of an [ordered list] is determined by the list number of its initial list item. The numbers of subsequent list items are disregarded.
A list is loose if any of its constituent list items are separated by blank lines, or if any of its constituent list items directly contain two block-level elements with a blank line between them. Otherwise a list is tight. (The difference in HTML output is that paragraphs in a loose list are wrapped in <p> tags, while paragraphs in a tight list are not.)
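The loose/tight distinction can be illustrated with a toy model. Everything in this sketch is an assumption made for illustration: the `Item` record and its two fields are hypothetical, and a real parser would derive this information while building the block tree rather than receive it ready-made.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Item:
    # One flag per gap between consecutive direct child blocks:
    # True if a blank line separates the two blocks.
    blank_between_children: List[bool] = field(default_factory=list)
    # True if a blank line follows this item.
    blank_before_next_item: bool = False


def is_loose(items: List[Item]) -> bool:
    """Apply the definition of a loose list to the toy model above."""
    for index, item in enumerate(items):
        if any(item.blank_between_children):
            return True
        # A blank line only counts if another item follows it.
        if item.blank_before_next_item and index < len(items) - 1:
            return True
    return False


print(is_loose([Item(blank_before_next_item=True), Item()]))
```

Blank lines inside nested containers (a sublist, a block quote, a code block) are not recorded in this model, which mirrors the word "directly" in the definition.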
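Changing the bullet or ordered list delimiter starts a new list:
- foo
- bar
+ baz
<ul>
<li>foo</li>
<li>bar</li>
</ul>
<ul>
<li>baz</li>
</ul>
1. foo
2. bar
3) baz
<ol>
<li>foo</li>
<li>bar</li>
</ol>
<ol start="3">
<li>baz</li>
</ol>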
In CommonMark, a list can interrupt a paragraph. That is, no blank line is needed to separate a paragraph from a following list:
Foo
- bar
- baz
<p>Foo</p>
<ul>
<li>bar</li>
<li>baz</li>
</ul>
Markdown.pl does not allow this, through fear of triggering a list via a numeral in a hard-wrapped line:
The number of windows in my house is
14. The number of doors is 6.
Oddly, though, Markdown.pl does allow a blockquote to interrupt a paragraph, even though the same considerations might apply.
In CommonMark, we do allow lists to interrupt paragraphs, for two reasons. First, it is natural and not uncommon for people to start lists without blank lines:
I need to buy
- new shoes
- a coat
- a plane ticket
Second, we are attracted to a principle of uniformity: if a chunk of text has a certain meaning, it will continue to have the same meaning when put into a container block (such as a list item or blockquote).
(Indeed, the spec for [list items] and [block quotes] presupposes this principle.) This principle implies that if
* I need to buy
  - new shoes
  - a coat
  - a plane ticket
is a list item containing a paragraph followed by a nested sublist, as all Markdown implementations agree it is (though the paragraph may be rendered without <p> tags, since the list is “tight”), then
I need to buy
- new shoes
- a coat
- a plane ticket
by itself should be a paragraph followed by a nested sublist.
Since it is well established Markdown practice to allow lists to interrupt paragraphs inside list items, the [principle of uniformity] requires us to allow this outside list items as well. (reStructuredText takes a different approach, requiring blank lines before lists even inside other list items.)
In order to solve the problem of unwanted lists in paragraphs with hard-wrapped numerals, we allow only lists starting with 1 to interrupt paragraphs. Thus,
The number of windows in my house is
14.  The number of doors is 6.
<p>The number of windows in my house is
14.  The number of doors is 6.</p>
We may still get an unintended result in cases like
The number of windows in my house is
1.  The number of doors is 6.
<p>The number of windows in my house is</p>
<ol>
<li>The number of doors is 6.</li>
</ol>
but this rule should prevent most spurious list captures.
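The interruption rule can be sketched as a single check on the candidate line. This is an illustration only: the name `can_interrupt_paragraph` is hypothetical, the line is assumed to follow paragraph text directly, and thematic breaks and tab expansion are not handled.

```python
import re

# Up to three spaces, a bullet or ordered marker, then optional content
# separated from the marker by at least one space or tab.
_ITEM = re.compile(
    r" {0,3}(?:([-+*])|([0-9]{1,9})[.)])(?:[ \t]+(\S.*))?[ \t]*$"
)


def can_interrupt_paragraph(line: str) -> bool:
    """True if `line` may start a list in the middle of a paragraph."""
    m = _ITEM.match(line)
    if m is None:
        return False
    bullet, number, content = m.groups()
    if content is None:
        # An empty list item cannot interrupt a paragraph.
        return False
    # Ordered lists may interrupt a paragraph only if they start with 1.
    return bullet is not None or int(number) == 1


print(can_interrupt_paragraph("14. The number of doors is 6."))
```

With this check, the hard-wrapped "14." line stays part of the paragraph, while a line beginning with "1." still starts a list, which is the residual case noted above.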
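There can be any number of blank lines between items:
- foo

- bar


- baz
<ul>
<li>
<p>foo</p>
</li>
<li>
<p>bar</p>
</li>
<li>
<p>baz</p>
</li>
</ul>
- foo
  - bar
    - baz


      bim
<ul>
<li>foo
<ul>
<li>bar
<ul>
<li>
<p>baz</p>
<p>bim</p>
</li>
</ul>
</li>
</ul>
</li>
</ul>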
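To separate consecutive lists of the same type, or to separate a list from an indented code block that would otherwise be parsed as a subparagraph of the final list item, you can insert a blank HTML comment:
- foo
- bar

<!-- -->

- baz
- bim
<ul>
<li>foo</li>
<li>bar</li>
</ul>
<!-- -->
<ul>
<li>baz</li>
<li>bim</li>
</ul>
-   foo

    notcode

-   foo

<!-- -->

    code
<ul>
<li>
<p>foo</p>
<p>notcode</p>
</li>
<li>
<p>foo</p>
</li>
</ul>
<!-- -->
<pre><code>code
</code></pre>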
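List items need not be indented to the same level. The following list items will be treated as items at the same list level, since none is indented enough to belong to the previous list item:
- a
 - b
  - c
   - d
  - e
 - f
- g
<ul>
<li>a</li>
<li>b</li>
<li>c</li>
<li>d</li>
<li>e</li>
<li>f</li>
<li>g</li>
</ul>
1. a

  2. b

   3. c
<ol>
<li>
<p>a</p>
</li>
<li>
<p>b</p>
</li>
<li>
<p>c</p>
</li>
</ol>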
Note, however, that list items may not be preceded by more than three spaces of indentation. Here - e is treated as a paragraph continuation line, because it is indented more than three spaces:
- a
 - b
  - c
   - d
    - e
<ul>
<li>a</li>
<li>b</li>
<li>c</li>
<li>d
- e</li>
</ul>
And here, 3. c is treated as an indented code block, because it is indented four spaces and preceded by a blank line.
1. a

  2. b

    3. c
<ol>
<li>
<p>a</p>
</li>
<li>
<p>b</p>
</li>
</ol>
<pre><code>3. c
</code></pre>
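This is a loose list, because there is a blank line between two of the list items:
- a
- b

- c
<ul>
<li>
<p>a</p>
</li>
<li>
<p>b</p>
</li>
<li>
<p>c</p>
</li>
</ul>
So is this, with an empty second item:
* a
*

* c
<ul>
<li>
<p>a</p>
</li>
<li></li>
<li>
<p>c</p>
</li>
</ul>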
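These are loose lists, even though there are no blank lines between the items, because one of the items directly contains two block-level elements with a blank line between them:
- a
- b

  c
- d
<ul>
<li>
<p>a</p>
</li>
<li>
<p>b</p>
<p>c</p>
</li>
<li>
<p>d</p>
</li>
</ul>
- a
- b

  [ref]: /url
- d
<ul>
<li>
<p>a</p>
</li>
<li>
<p>b</p>
</li>
<li>
<p>d</p>
</li>
</ul>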
This is a tight list, because the blank lines are in a code block:
a
b
c .
- a
-
b
- c
This is a tight list, because the blank line is between two paragraphs of a sublist. So the sublist is loose while the outer list is tight:
- a
  - b

    c
- d
<ul>
<li>a
<ul>
<li>
<p>b</p>
<p>c</p>
</li>
</ul>
</li>
<li>d</li>
</ul>
This is a tight list, because the blank line is inside the block quote:
* a
  > b
  >
* c
<ul>
<li>a
<blockquote>
<p>b</p>
</blockquote>
</li>
<li>c</li>
</ul>
This list is tight, because the consecutive block elements are not separated by blank lines:
a
b
c
d .
- a
b
c
- d
A single-paragraph list is tight:
- a
<ul>
<li>a</li>
</ul>
- a
  - b
<ul>
<li>a
<ul>
<li>b</li>
</ul>
</li>
</ul>
This list is loose, because of the blank line between the two block elements in the list item:
foo
bar .
-
foo
bar
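Here the outer list is loose, the inner list tight:
* foo
  * bar

  baz
<ul>
<li>
<p>foo</p>
<ul>
<li>bar</li>
</ul>
<p>baz</p>
</li>
</ul>
- a
  - b
  - c

- d
  - e
  - f
<ul>
<li>
<p>a</p>
<ul>
<li>b</li>
<li>c</li>
</ul>
</li>
<li>
<p>d</p>
<ul>
<li>e</li>
<li>f</li>
</ul>
</li>
</ul>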