Yet Another Markup Language (YAML) 1.0

Working Draft 01 Aug 2001

Editors:

Brian Ingerson mailto:briani@ActiveState.Com
Clark C. Evans
Oren Ben-Kiki mailto:oren@ben-kiki.org

Copyright © 2001 Brian Ingerson, Clark Evans & Oren Ben-Kiki, All Rights Reserved. This document may be freely copied provided it is not modified.

Status of this Document

This specification is a working draft and reflects consensus reached by the members of the yaml-core mailing list. Any questions regarding this draft should be raised on this list. This is a draft and changes are expected; implementers should therefore closely follow this mailing list to stay up to date on trends and announcements.

This should not stop an implementer, though! Feedback is welcome.


Abstract

YAML (pronounced "yaamel") is a straightforward, machine-parsable data serialization format designed for human readability and interaction with scripting languages such as Perl and Python. YAML is optimized for configuration settings, log files, Internet messaging and filtering. This specification describes the YAML information model and serialization format.

Table of Contents

1 Introduction
    1.1 Origin and Goals
    1.2 Key Concepts
    1.3 Example
    1.4 Relation to XML
    1.5 Terminology
2 Serialization
    2.1 Information Model
    2.2 Characters
        2.2.1 Character Set
        2.2.2 Encoding
        2.2.3 End-of-Line Normalization
        2.2.4 Indicators
        2.2.5 Escape Sequences
        2.2.6 Miscellaneous Characters
    2.3 Strings
        2.3.1 Indentation
        2.3.2 Line Folding
        2.3.3 Quoted String
        2.3.4 Anchor String
        2.3.5 Separator Line
    2.4 Document
        2.4.1 Node
        2.4.2 Reference
        2.4.3 Null
        2.4.4 List
        2.4.5 Map
        2.4.6 Shorthand
    2.5 Scalar
        2.5.1 Quoted Scalar
        2.5.2 Simple Scalar
        2.5.3 Block Scalar
3 Changes From Other Versions
    3.1 Changes from the 31 Jul 2001 Draft
    3.2 Changes from the 22 Jul 2001 Draft
    3.3 Changes from the 23 Jun 2001 Draft
    3.4 Changes from the 16 Jun 2001 Draft
    3.5 Changes from the 09 Jun 2001 Draft
    3.6 Changes from the 26 May 2001 Draft
    3.7 Probable Future Changes

1 Introduction

Yet Another Markup Language, abbreviated YAML, describes a class of data objects called YAML documents and partially describes the behavior of computer programs that process them.

YAML documents encode into a serialized form information having a recursive null, scalar, map, or list structure. YAML also includes a method to encode references. At its core, a YAML document consists of a sequence of characters, some of which are considered part of the document's content, and others that are used to indicate structure within the information stream.

A software module called a YAML parser is used to read YAML documents and provide access to their content and structure. In a similar way, a YAML emitter is used to write YAML documents, serializing their content and structure. A YAML processor is a module that provides parser or emitter functionality or both. It is assumed that a YAML processor does its work on behalf of another module, called an application. This specification describes the interface and required behavior of a YAML processor in terms of how it must read or write YAML documents and the information it must provide or obtain from the application.

1.1 Origin and Goals

The design goals for YAML are:

  1. YAML documents are very readable by humans.

  2. YAML interacts well with scripting languages.

  3. YAML uses host language's native data structures.

  4. YAML works well with Internet mail architecture.

  5. YAML allows large formatted text.

  6. YAML has a consistent information model.

  7. YAML includes a stream based interface.

  8. YAML is expressive and extensible.

  9. YAML is easy to implement.

YAML was designed with experience gained from the construction and deployment of Data::Denter. YAML has also enjoyed much markup language critique from SML-DEV list participants, including experience with the Minimal XML and Common XML specifications.

This specification, together with the Unicode standard for characters, provides all the information necessary to understand YAML Version 1.0 and construct computer programs to process it.

1.2 Key Concepts

YAML builds upon the structures and concepts described by XML, SOAP, Perl, HTML, Python, C, RFC0822, RFC2045, RFC2046 and SAX.

YAML's type structures are similar to those of Perl. In YAML, there are four fundamental structures: nulls, scalars, maps (%) and lists (@). YAML also supports references to enable the serialization of graphs. This type structure is common to many other languages and provides a solid basis for an information model. Furthermore, it enables the programmer to use their programming language's native data constructs for YAML manipulation, instead of a document object.

YAML has a unique way of handling whitespace. In YAML, a line break is folded into a single space. Excepting indentation, a sequence of spaces and tabs is usually preserved "as is". This technique makes markup code readable while allowing for line-wrapping without affecting the canonical form of the content.

YAML's block scoping is similar to Python's. In YAML, the extent of a node is indicated by its child's nesting level, i.e., what column it is in. Block indenting provides for easy inspection of the document's structure and greatly improves readability.

YAML's quoted strings are similar to C's. In YAML, text scalars can be surrounded by quotes, enabling escape sequences such as \n to represent a new line, \t to represent a tab, and \\ to represent the backslash. Unlike C, since a line break is folded into a space, a trailing \ is used as a continuation marker, allowing content to be broken into multiple lines without introducing unwanted whitespace. Further, YAML treats an empty line (two consecutive line breaks) as being equivalent to \n. Lastly, 8-bit (ISO 8859-1) characters can be specified using "\x3B" style escapes, 16-bit (Unicode) characters can be specified using "\u003B" style escapes, and 32-bit (ISO/IEC 10646) characters can be specified using "\U0000003B" style escapes.

The syntax of YAML is an extension of RFC0822, allowing for direct usage of YAML in mail handlers.

YAML was designed to allow for both a native in-memory load/save interface and an incremental interface which includes both a pull style input stream and a push style (SAX like) output stream interface. This enables YAML to directly support the processing of large documents, such as a transaction log, or continuous streams, such as a feed from a production machine.

1.3 Example

Following is a simple example of an invoice represented as a YAML document. The colon is used to separate name:value pairs. The percent sign following a colon indicates that the value is a mapping. The at sign indicates that the value is an ordered list.

buyer: %
    address     : %
       city       : Royal Oak
       line one   : 458 Wittigen's Way
       line two   : Suite #292
       postal     : 48046
       state      : MI
    family name : Dumars
    given name  : Chris
date    : 12-JAN-2001
delivery: %
    method : UZS Express Overnight
    price  : $45.50
comments :
    Mr. Dumars is frequently gone in the morning
    so it is best advised to try things in late
    afternoon. If Joe isn't around, try his house
    keeper, Nancy Billsmer @ (734) 338-4338.
invoice : 00034843
product : @
    : %
        desc      : Grade A, Leather Hide Basketball
        id        : BL394D
        price     : $450.00
        quantity  : 4
    : %
        desc      : Super Hoop (tm)
        id        : BL4438H
        price     : $2,392.00
        quantity  : 1
tax      : $0.00
total    : $4237.50
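
For illustration, the invoice above corresponds roughly to the following native Python structure. This is a sketch, assuming a loader that turns maps into dict, lists into list, and leaves every scalar as a string (this draft defines no scalar typing). The variable name invoice is hypothetical.

```python
# Hypothetical result of loading the invoice example; all scalars stay strings.
invoice = {
    "buyer": {
        "address": {
            "city": "Royal Oak",
            "line one": "458 Wittigen's Way",
            "line two": "Suite #292",
            "postal": "48046",
            "state": "MI",
        },
        "family name": "Dumars",
        "given name": "Chris",
    },
    "date": "12-JAN-2001",
    "delivery": {"method": "UZS Express Overnight", "price": "$45.50"},
    # Line breaks inside the multi-line scalar fold into single spaces.
    "comments": (
        "Mr. Dumars is frequently gone in the morning "
        "so it is best advised to try things in late "
        "afternoon. If Joe isn't around, try his house "
        "keeper, Nancy Billsmer @ (734) 338-4338."
    ),
    "invoice": "00034843",
    # The list keeps its order; each entry is an anonymous map.
    "product": [
        {
            "desc": "Grade A, Leather Hide Basketball",
            "id": "BL394D",
            "price": "$450.00",
            "quantity": "4",
        },
        {
            "desc": "Super Hoop (tm)",
            "id": "BL4438H",
            "price": "$2,392.00",
            "quantity": "1",
        },
    ],
    "tax": "$0.00",
    "total": "$4237.50",
}
```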

1.4 Relation to XML

There are many differences between YAML and the eXtensible Markup Language ("XML"). XML was designed to be backwards compatible with Standard Generalized Markup Language ("SGML") and thus had many design constraints placed on it that YAML does not share. Also XML, inheriting SGML's legacy, is designed to support structured documents, where YAML is more closely targeted at messaging with direct support for the native data structures of modern programming languages. Further, XML is a pioneer in many domains and YAML has grown from the lessons learned by the XML community. Beyond these differences in origin, the two also differ in information model and syntax.

The YAML and XML information models are starkly different. In XML, the primary construct is an attributed tree, where each element has an ordered, named list of children and an unordered mapping of names to strings. In YAML, the primary hierarchical construct alternates between a list of anonymous entries, a map of named entries, and scalar and null values. This difference is critical since YAML's model is directly supported by native data structures in most modern programming languages, where XML's model requires mapping, conventions, or an alternative programming component, a document object model.

The YAML and XML syntaxes differ significantly. In XML, tags are used to denote the beginning and end of an element. In YAML, scope is determined by line indentation. Where YAML builds upon RFC0822, XML builds upon SGML and has separate syntaxes for processing instructions and comments. Furthermore, YAML has a simple whitespace policy, where XML's whitespace policy is completely configurable.

1.5 Terminology

The terminology used to describe YAML is defined in the body of this specification. The terms defined in the following list are used in building those definitions and in describing the actions of a YAML processor:

may

Conformant YAML texts and processors are permitted to but need not behave as described.

must

Conformant YAML texts and processors are required to behave as described, otherwise they are in error.

error

A violation of the rules of this specification; results are undefined. Conforming software may detect and report an error and may recover from it.

fatal error

An error which a conforming YAML processor must detect and report to the application. After encountering a fatal error, the processor may continue processing the data to search for further errors and may report such errors to the application.

2 Serialization

A YAML document can reside in many different forms as long as it complies with the information model below. This includes a sequence of bytes in memory, on disk, or arriving via a network socket as much as it includes events from a sequential interface or a language specific in-memory representation. After covering the information model, this section focuses on the serialized representation.

2.1 Information Model

The information model for YAML is a directed graph having map, list, scalar and null nodes. These data structures are directly supported in modern programming languages such as Python, Perl, Java, and C++.

Since the information model is a graph, a separate serialization model is required. The serialization model adds an additional node type, the reference, to record subsequent occurrences of a given node within a sequence. This enables a more compact notation so that duplicate occurrences of a given node need not be serialized more than once.

document

An ordered sequence of anonymous map nodes.

node

A YAML node can be one of four types: list, null, scalar, map.

list

An ordered sequence of zero or more nodes. Nodes are included by reference, thus they may be part of more than one list or map.

null

A "no value" value.

scalar

A sequence of zero or more characters.

map

An unordered sequence of zero or more (key, node) tuples where each scalar key is unique within the sequence.

The serialization model adds an anchor attribute to every node, and introduces the reference node. The reference node advertises an anchor to indicate the repetition of a node previously encountered.

anchor

An additional attribute added to each node that provides for identification of the node within a given node sequence. Only nodes which could be referenced further in the sequence must be given an anchor. It is not necessary that the anchor be unique.

reference

An additional node type consisting of an anchor which is used to indicate the repetition of the most recently encountered node with the same anchor.

In the serialization model, it is important that each node is serialized exactly once. If a node appears more than once in the graph, only the first occurrence of the node should be serialized. All remaining occurrences of this node should be represented with reference nodes. In this scheme, if a YAML document is loaded into a random access representation, then the reference nodes and anchor indicators should not be available, as the non-serialized information model should be used. Also note that anchors can repeat to allow for concatenation, although only the most recent node with a given anchor may be referenced.
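
The serialize-once rule can be sketched as a graph walk that records each list or map on first visit and emits a reference on every later visit. This is a minimal illustration, not a processor interface defined by this specification: the function name serialize and the event tuples are hypothetical, and for simplicity every list and map receives an anchor, whereas the specification only requires anchors on nodes that are actually referenced.

```python
def serialize(node, seen=None, events=None):
    """Flatten a Python object graph into a list of event tuples (hypothetical format)."""
    if seen is None:
        seen, events = {}, []
    if isinstance(node, (dict, list)):
        if id(node) in seen:
            # Node already serialized: emit a reference to its anchor instead.
            events.append(("ref", seen[id(node)]))
            return events
        anchor = len(seen) + 1          # simplification: anchor every list and map
        seen[id(node)] = anchor
        if isinstance(node, dict):
            events.append(("map", anchor))
            for key, child in node.items():
                events.append(("key", key))
                serialize(child, seen, events)
        else:
            events.append(("list", anchor))
            for child in node:
                serialize(child, seen, events)
        events.append(("end", anchor))
    elif node is None:
        events.append(("null",))
    else:
        events.append(("scalar", str(node)))
    return events


shared = ["x"]
events = serialize({"a": shared, "b": shared})
```

Because the anchor is recorded before the children are visited, the same walk also terminates on cyclic graphs.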

2.2 Characters

Characters are the basis for a serialized version of a YAML document. Below is a general definition of a character followed by several characters which have specific meaning in particular contexts.

2.2.1 Character Set

Serialized YAML uses a subset of the Unicode character set.

[01] char ::= #x9 | #xA | [#x20-#xD7FF]
| [#xE000-#xFFFD]
| [#x10000-#x10FFFF]
/* a single printable Unicode character, including the space, tab and new line characters */

Due to the end of line normalization rules, the carriage return (#xD) is not included. In keeping with standard practice, the surrogate block, FFFE and FFFF are excluded.
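
As an illustration, production [01] can be checked per code point. The helper name is_yaml_char is hypothetical; the ranges are copied from the production.

```python
def is_yaml_char(ch: str) -> bool:
    """Return True if the single character ch matches production [01] char."""
    cp = ord(ch)
    return (
        cp == 0x9                       # tab
        or cp == 0xA                    # normalized end of line
        or 0x20 <= cp <= 0xD7FF         # up to, but excluding, the surrogate block
        or 0xE000 <= cp <= 0xFFFD       # excludes #xFFFE and #xFFFF
        or 0x10000 <= cp <= 0x10FFFF
    )
```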

2.2.2 Encoding

A YAML processor is required to support both UTF-16 and UTF-8 character encodings. If an input stream begins with a byte order mark, then the initial character encoding shall be UTF-16. Otherwise, the initial encoding shall be UTF-8.

[02] bom ::= #xFEFF /* the Unicode ZERO WIDTH NON-BREAKING SPACE character used to mark a UTF-16 stream and determine byte ordering */

If the stream begins with #xFFFE, then the byte order of the input stream must be swapped when reading. For more information on the byte order mark see the Unicode FAQ.
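
A minimal sketch of the detection rule above, working on the raw bytes of the stream. The function name detect_encoding is hypothetical, and the returned strings are Python codec names chosen for illustration; a byte order mark read as FE FF indicates big-endian order, and FF FE is the swapped (little-endian) form.

```python
def detect_encoding(data: bytes) -> str:
    """Pick the initial encoding of a YAML stream from its first two bytes."""
    if data[:2] == b"\xfe\xff":
        return "utf-16-be"   # byte order mark #xFEFF read in stream order
    if data[:2] == b"\xff\xfe":
        return "utf-16-le"   # stream begins with #xFFFE: bytes must be swapped
    return "utf-8"           # no byte order mark
```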

2.2.3 End-of-Line Normalization

On input and before parsing, a compliant YAML parser must translate both the two-character sequence #xD #xA (CR LF) and any #xD (CR) which is not followed by a #xA (LF) into a single #xA (LF) character (this does not apply to escaped characters). This allows for the definition of an end-of-line marker.

[03] eol ::= #xA /* a normalized end of line marker */

On output, a YAML emitter is free to serialize end of line markers using whatever convention is most appropriate. For Internet mail, CRLF is the preferred form.
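
The input normalization can be sketched in two replacements, applied in this order so that a CR LF pair is not turned into two line feeds. The function name normalize_eol is hypothetical, and the sketch assumes it runs on raw input text before any escape processing.

```python
def normalize_eol(text: str) -> str:
    """Translate CR LF pairs and lone CR characters into a single LF."""
    # CR LF must be handled first; any CR that remains is a lone CR.
    return text.replace("\r\n", "\n").replace("\r", "\n")
```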

2.2.4 Indicators

Indicators are special characters which are used to describe the structure of a YAML document.

[04] imap ::= '%' /* indicates a map node */
[05] ilist ::= '@' /* indicates a list node */
[06] isquote ::= ''' /* indicates a single quoted string */
[07] idquote ::= '"' /* indicates a double quoted string */
[08] iblk ::= '|' /* indicates a block scalar */
[09] iref ::= '*' /* indicates a reference node */
[10] inull ::= '~' /* indicates a null node */
[11] ianchor ::= '&' /* indicates an anchor attribute */
[12] isopen ::= '[' /* open a shorthand section */
[13] isclose ::= ']' /* close a shorthand section */
[14] iescape ::= '\' /* indicates an escape sequence */
[15] ientry ::= ':' /* indicates a list entry or a map entry separator between key and value */
[16] ireserved ::= '!' | '`' | '#' | '^'
| ';' | '.' | ','
| '(' | ')' | '{' | '}'
/* reserved */
[17] indicator ::= imap | ilist | isquote | idquote
| iblk | iref | inull | ianchor
| isopen | isclose | iescape | ientry
| ireserved
/* indicator characters */

2.2.5 Escape Sequences

Escape sequences are used to denote significant whitespace, specify characters by a hexadecimal value, and produce the literal quote and escape indicators.

[18] eescape ::= iescape iescape /* escape literal */
[19] esquote ::= iescape isquote
| isquote isquote
/* single quote literal */
[20] edquote ::= iescape idquote
| idquote idquote
/* double quote literal */
[21] ebel ::= iescape 'a' /* ASCII alert (BEL) */
[22] ebs ::= iescape 'b' /* ASCII backspace (BS) */
[23] eesc ::= iescape 'e' /* ASCII escape (ESC) */
[24] eff ::= iescape 'f' /* ASCII formfeed (FF) */
[25] eeol ::= iescape 'n' /* ASCII linefeed (LF) */
[26] eret ::= iescape 'r' /* ASCII carriage return (CR) */
[27] etab ::= iescape 't' /* ASCII horizontal tab (TAB) */
[28] evtab ::= iescape 'v' /* ASCII vertical tab (VTAB) */
[29] enul ::= iescape 'z' /* ASCII zero (NUL) */
[30] ex2 ::= iescape 'x' hex hex /* 8-bit character */
[31] eu4 ::= iescape 'u' hex hex hex hex /* 16-bit character */
[32] eu8 ::= iescape 'U'
    hex hex hex hex
    hex hex hex hex
/* 32-bit character */
[33] sescape ::= eescape | esquote
| ebel | ebs | eesc | eff
| eeol | eret | etab | evtab
| enul | ex2 | eu4 | eu8
/* single quote escape sequences */
[34] descape ::= eescape | edquote
| ebel | ebs | eesc | eff
| eeol | eret | etab | evtab
| enul | ex2 | eu4 | eu8
/* double quote escape sequences */
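
The escape productions above can be sketched as a small decoder. The function name decode_escapes is hypothetical, and the sketch makes two simplifying assumptions: it accepts both quote escapes regardless of the surrounding quote style, and it omits the doubled-quote forms and escaped line breaks.

```python
SIMPLE_ESCAPES = {
    "\\": "\\", "'": "'", '"': '"',
    "a": "\a", "b": "\b", "e": "\x1b", "f": "\f",
    "n": "\n", "r": "\r", "t": "\t", "v": "\v", "z": "\0",
}
HEX_LENGTHS = {"x": 2, "u": 4, "U": 8}   # productions [30], [31] and [32]
HEX_DIGITS = set("0123456789abcdefABCDEF")


def decode_escapes(text: str) -> str:
    """Replace backslash escape sequences in text with the characters they denote."""
    out = []
    i = 0
    while i < len(text):
        if text[i] != "\\":
            out.append(text[i])
            i += 1
            continue
        if i + 1 >= len(text):
            raise ValueError("dangling escape indicator")
        code = text[i + 1]
        if code in SIMPLE_ESCAPES:
            out.append(SIMPLE_ESCAPES[code])
            i += 2
        elif code in HEX_LENGTHS:
            width = HEX_LENGTHS[code]
            digits = text[i + 2:i + 2 + width]
            if len(digits) != width or not set(digits) <= HEX_DIGITS:
                raise ValueError("malformed hexadecimal escape")
            out.append(chr(int(digits, 16)))
            i += 2 + width
        else:
            raise ValueError("invalid escape sequence: \\" + code)
    return "".join(out)
```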

2.2.6 Miscellaneous Characters

This section includes several common character range definitions.

[35] lwsp ::= #x20 | #x9 /* linear whitespace, the space or tab character */
[36] lchr ::= char - eol /* linear characters */
[37] pchr ::= lchr - lwsp /* printable linear non-whitespace characters */
[38] sqchr ::= ( ( pchr - isquote ) - iescape ) /* printables less the single quote and escape character */
[39] dqchr ::= ( ( pchr - idquote ) - iescape ) /* printables less the double quote and escape character */
[40] kchr ::= pchr - ientry /* printables less the map entry separator character */
[41] ichr ::= pchr - indicator /* printables less the indicator characters */
[42] ascii ::= [#x21-#x7f] /* ASCII printable characters */
[43] alpha ::= [#x41-#x5a] | [#x61-#x7a] /* ASCII alphabetic character, a-z and A-Z */
[44] number ::= [#x30-#x39] /* ASCII numeric character, 0-9 */
[45] alphanum ::= alpha | number /* ASCII alpha numeric character */
[46] nonalnum ::= ascii - alphanum /* ASCII non alpha numeric character */
[47] hex ::= number | [#x41-#x46] | [#x61-#x66] /* one hexadecimal digit 0-9, A-F or a-f */

2.3 Strings

Moving to a higher level of abstraction, this section deals with sequences of characters, or strings. It describes line folding and indentation policies, as well as quoted and raw strings.

2.3.1 Indentation

In a YAML serialization, structure is determined from indentation, where indentation is defined as an end of line marker followed by zero or more spaces or tabs. Indentation level is defined recursively.

[48] indent(0) ::= /* the first level of indentation is zero spaces */
[49] indent(n) ::= indent(n-1) #x20 #x20 #x20 #x20 /* the previous indentation setting plus four spaces */

Since the YAML serialization depends upon indentation level to delineate blocks, additional productions are a function of an integer, based on the indent(n) production above.

The indentation level is used exclusively to delineate blocks. Indentation characters are otherwise ignored. In particular, they are never taken to be a part of the value of serialized text.
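
A minimal sketch of productions [48] and [49]: indent(n) is exactly 4n spaces, and whatever follows belongs to the content. The helper name strip_indent is hypothetical, and the sketch assumes the end of line marker has already been removed from the line.

```python
def strip_indent(line: str, n: int) -> str:
    """Remove indent(n), i.e. four spaces per level, from the start of line."""
    prefix = " " * (4 * n)
    if not line.startswith(prefix):
        raise ValueError("line is not indented to level %d" % n)
    # Any further leading spaces are part of the content, not the indentation.
    return line[len(prefix):]
```

For example, a line with nine leading spaces at level 1 keeps five of them as content.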

2.3.2 Line Folding

To increase readability, YAML serialization allows for breaking long text lines. Therefore in many cases the parser replaces a line break with a single space (#x20). When encountering a sequence of (possibly indented) line breaks without any additional intermediate characters, the parser ignores the first one and preserves the rest. Thus, a single line break would be serialized as two, two line breaks would be serialized as three, etc. When this functionality is implied, the lfeols(n) production below will be used.

[50] lfeols(n) ::= eol ( indent(n)? eol )* indent(n) /* line folded newlines */
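
The folding rule can be sketched with a regular expression over text whose indentation has already been stripped (an assumption of this sketch, as is the function name fold_lines). A run of k line breaks, possibly separated by whitespace-only lines, becomes a single space when k is 1 and k-1 line feeds otherwise.

```python
import re


def fold_lines(text: str) -> str:
    """Apply line folding to dedented text: the first break of each run is dropped."""
    def replace_run(match):
        breaks = match.group(0).count("\n")
        return " " if breaks == 1 else "\n" * (breaks - 1)

    # One eol, then any number of (optional blanks, eol) repetitions.
    return re.sub(r"\n(?:[ \t]*\n)*", replace_run, text)
```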

In quoted strings there is special handling of line breaks immediately preceded by a \ character. In this case the following production is used:

[51] elfeols(n) ::= iescape? eol
( ( indent(n) iescape? )? eol )*
indent(n)
/* escaped line folded newlines */

2.3.3 Quoted String

A quoted string is a mechanism to treat a sequence of characters as a single unit. Within a quoted string, indicators (with the exception of \ and ' or ") can be used without worry and escape sequences can be used to introduce unprintable characters and control line folding.

A quoted string begins and ends with a quote character. It can extend for as many lines as necessary, although an editor or emitter is free to re-break and indent a quoted string as needed to maintain readability.

A line break in a multi-line quoted string is subject to line folding, unless prefixed by a \ in which case the line break is completely ignored. This allows a quoted string to be broken into multiple lines at arbitrary positions.

A quoted string cannot contain an un-escaped quote or an invalid escape sequence.

[52] sqstr(n) ::= isquote
( sqchr | sescape | lwsp | elfeols(n) )*
isquote
/* single quoted string */
[53] dqstr(n) ::= idquote
( dqchr | descape | lwsp | elfeols(n) )*
idquote
/* double quoted string */
[54] qstr(n) ::= sqstr(n) | dqstr(n) /* quoted string */

2.3.4 Anchor String

An anchor string is a sequence of numeric digits used when referencing a node previously visited in the document stream.

[55] astr ::= number+ /* any sequence of digits 0-9 */

2.3.5 Separator Line

A separator line is used to separate consecutive top level maps.

[56] sep ::= '-' '-' '-' '-' eol /* separator string */

2.4 Document

A serialized object is a YAML document if, taken as a whole, it complies with the following production.

[57] document ::= bom? eol* pair(0)*
( sep eol* pair(0)* )*
sep? eol*
/* a byte order mark followed by a sequence of maps separated by separator lines */
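
As a sketch of how a reader might apply this production, the following splits a normalized stream into the text of each top level map at separator lines. The function name split_top_level_maps is hypothetical, and the sketch assumes the byte order mark has been removed and that a separator is exactly four hyphens on a line of its own, per production [56].

```python
def split_top_level_maps(text: str) -> list:
    """Split a document into the serialized text of each top level map."""
    chunks, current = [], []
    for line in text.splitlines(keepends=True):
        if line.rstrip("\n") == "----":
            # A separator line closes the map collected so far.
            chunks.append("".join(current))
            current = []
        else:
            current.append(line)
    if current:
        # Text after the last separator (the trailing separator is optional).
        chunks.append("".join(current))
    return chunks
```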

2.4.1 Node

A node begins at a particular level of indentation, n-1, and its content is indented at a level n. A node can either be a map, list, scalar, a reference or a null.

[58] node(n) ::= ref
| ( shand lwsp+ )? null
| ( anchor lwsp+ )? ( shand lwsp+ )?
  ( list(n) | map(n) | scalar(n) )
/* a reference, a null with optional shorthand, or a list, map or scalar with optional anchor and shorthand */

2.4.2 Reference

An anchor is an indicator which can be used to mark a node, giving it a sequential numeric identifier. The reference node type can then be used to indicate additional inclusions of an anchored node. The anchor string of a reference refers to the most recent node having the equivalent anchor string. Two anchor strings are equivalent if they are identical after removal of any leading 0 characters.

It is an error to have a reference use an anchor string which does not occur previously in the serialization.

[59] anchor ::= ianchor astr /* associates an anchor string with a given node for further reference */
[60] ref ::= iref astr eol /* a reference node */
anchor : &0001  This scalar has an anchor.
repeat :  &001  An anchor string may be reused.
non-ref:        Next node refers to the previous one.
reference: *01
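
The equivalence and most-recent-wins rules can be sketched with a small lookup table. The class AnchorTable and its method names are hypothetical; anchor strings are compared after stripping leading zeros, and a later definition replaces an earlier one.

```python
class AnchorTable:
    """Tracks the most recent node seen for each anchor string."""

    def __init__(self):
        self._nodes = {}

    @staticmethod
    def _normalize(astr: str) -> str:
        # "0001", "001" and "01" are equivalent; keep a lone "0" for all-zero strings.
        return astr.lstrip("0") or "0"

    def define(self, astr: str, node) -> None:
        self._nodes[self._normalize(astr)] = node

    def lookup(self, astr: str):
        key = self._normalize(astr)
        if key not in self._nodes:
            raise KeyError("reference to an anchor not seen earlier: " + astr)
        return self._nodes[key]


table = AnchorTable()
table.define("0001", "This scalar has an anchor.")
table.define("001", "An anchor string may be reused.")
resolved = table.lookup("01")
```

In the example above, the reference *01 therefore resolves to the second anchored scalar.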

2.4.3 Null

In some cases a list entry or a map key exists but has no associated value. To indicate this a null node is used.

[61] null ::= inull eol /* a null value node */
first: ~
second: @
    : ~
    : Second entry.
    : ~
    : This list has 4 entries, only two with values.
three:
    This map has three keys,
    only two with values.
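
For comparison, a loader that maps the null node to Python's None (an assumption of this sketch, as is the variable name document) would produce the following structure from the example above.

```python
# Hypothetical in-memory form of the null example; None stands for the null node.
document = {
    "first": None,
    "second": [
        None,
        "Second entry.",
        None,
        "This list has 4 entries, only two with values.",
    ],
    # The folded two-line scalar becomes one string.
    "three": "This map has three keys, only two with values.",
}
```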

2.4.4 List

A list is the simplest form of node: a sequence of nodes at a higher indentation.

[62] list(n) ::= ilist eol
( indent(n) ientry lwsp+ node(n+1) )*
/* a list of zero or more indented nodes */
list: @
    : First item in top list
    : @
        : Subordinate list
    : @
    : Above list is empty
    :
        A multi-line
        bulleted list entry
    : Sixth item in top list

2.4.5 Map

A map is an association of unique keys with values, where a key is either a quoted string or a single line simple string.

[63] map(n) ::= imap eol ( indent(n) pair(n) )* /* a map indicator, followed by a list of map items */
[64] pair(n) ::= key(n) lwsp* ientry lwsp+ node(n+1) /* a key/node map pair indented appropriately */
[65] key(n) ::= kstr | qstr(n) /* a simple or quoted key string */

There is the further restriction that, within a given map, no two keys may be identical after line folding.

map: %
    first : First entry
    second: %
        key: Subordinate map!
    third item: @
        : Subordinate list
        : %
        : Previous map is empty.
    ":": This key had to be quoted.
    "This is
    a multi-line
    key" :
        Whose value is in the next line.
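
The uniqueness restriction applies to keys after folding, so a multi-line quoted key can collide with a single-line one. A minimal sketch, assuming indentation has already been stripped from the key text and that keys contain only single line breaks; the function name find_duplicate_keys is hypothetical.

```python
def find_duplicate_keys(keys):
    """Return the folded form of every key that repeats an earlier key."""
    seen, duplicates = set(), []
    for key in keys:
        folded = key.replace("\n", " ")   # a single line break folds to a space
        if folded in seen:
            duplicates.append(folded)
        seen.add(folded)
    return duplicates
```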

2.4.6 Shorthand

YAML provides a shorthand form for serializing a severely restricted set of map keys. This is merely an alternative syntax for writing the same keys the usual way.

The shorthand form is intentionally restricted. The key must be a single non-alphanumeric ASCII character and the value must be an indicator-free and space-free string.

The shorthand form is meant to be used for special purposes rather than for normal application keys. Currently, three special keys have standard semantics assigned to them:

!

Specifies a class name. If the YAML parser recognizes this name, the parser de-serializes the map into an object of that class instead of into a regular map.

%

Specifies a serialization format. A serialization format optionally accompanies a class name and specifies the exact syntax used to de-serialize the object.

=

Specifies a default value for the map. This allows a certain type of schema evolution. What used to be a list or a simple scalar value may be converted into a map (or an object), given additional properties, and still be acceptable to an older application expecting the original value type, provided it is using an API aware of the default value convention.

Additional standard keys may be defined in future versions of this spec, and a set of keys may be reserved for application-specific use.

The shorthand keys may be associated with a scalar or list node as well as with a map node. When they are associated with such a node, the effect is to convert it into a map node as far as the information model is concerned. The map will include the specified shorthand keys, and the original node value (scalar or list) is placed under the = default value key.

[66] shand ::= isopen sentry ( lwsp sentry )* isclose /* a list of one or more shorthand entries */
[67] sentry ::= skey svalue /* a shorthand key and value */
[68] skey ::= nonalnum /* a shorthand key character */
[69] svalue ::= ichr* /* a shorthand scalar value */
line: [!line] %
    from: [!point] %
        x: 12.5
        y: 3.5
    to:
        !: point
        x: 12.5
        y: 3.5
ordered: 10-JAN-2001
sent: [!date %m/d/y] 1/12/2001
delivered: %
    !: date
    %: d/m/y
    =: 14/1/2001
triangle: [!poly] @
    : [!point] %
        x: 12.5
        y: 3.5
    : [!point] %
        x: 3.5
        y: 12.5
    : [!point] %
        x: 1.5
        y: 2.3
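
The conversion described in this section can be sketched as follows. The function names parse_shorthand and apply_shorthand are hypothetical, scalars are kept as strings, and the sketch does not check that shorthand values are free of indicator characters.

```python
def parse_shorthand(text: str) -> dict:
    """Parse a shorthand section such as '[!date %m/d/y]' into key/value pairs."""
    if not (text.startswith("[") and text.endswith("]")):
        raise ValueError("shorthand must be enclosed in [ and ]")
    entries = {}
    for entry in text[1:-1].split():
        key, value = entry[0], entry[1:]
        # Production [68]: the key is one non-alphanumeric ASCII character.
        if not (0x21 <= ord(key) <= 0x7F) or key.isalnum():
            raise ValueError("invalid shorthand key: " + key)
        entries[key] = value
    if not entries:
        raise ValueError("shorthand needs at least one entry")
    return entries


def apply_shorthand(text: str, node):
    """Merge shorthand keys into a map, or wrap a scalar or list under '='."""
    entries = parse_shorthand(text)
    if isinstance(node, dict):
        return {**entries, **node}
    return {**entries, "=": node}


sent = apply_shorthand("[!date %m/d/y]", "1/12/2001")
```

Here sent has the same shape as the explicitly written delivered map in the example above: a class name under !, a format under % and the original value under =.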

2.5 Scalar

While most of the document productions are fairly strict, the scalar production is generous. It offers three styles of expressing scalar values depending upon the readability requirements. Some of these styles may be used for specifying map keys. The table below describes the various styles.

                 Line Folded?    Used in keys?             Escaped?
Quoted Scalar    Yes             Yes                       Yes
Simple Scalar    Yes             Yes (single line only)    No
Block Scalar     No              No                        No

[70] scalar(n) ::= quoted(n) | simple(n) | block(n) /* scalar node styles */

2.5.1 Quoted Scalar

A quoted scalar uses quoted strings. This is the most general scalar form, allowing every possible Unicode string to be expressed at the cost of some verbosity.

Quoted strings may be used for keys as well as for scalar values.

[71] quoted(n) ::= qstr(n) eol /* a quoted scalar value */
first: "Quoted scalar.\nWith a new line."
second: @
    :   "Line breaks are folded so this ->
        <- new line is the same as a space. A
        new line may be inserted by using a
        blank line:

        Or an escape: \n. A line may be brok\
        en anywhere by escaping the newline."
    :   'Each type of quotes may
        be used to avoid the need
        for quoting the other: "'
    :   "Furthermore indicators such
        as @ # : can be added."
    :   'Escape sequences can be used
        to specify quotes and unprintable
        characters: \', \a, \x01.'
    :   "  Leading and trailing  
           spaces are significant
           in all lines   "
    :   'This was a list of six quoted
        scalars!'

2.5.2 Simple Scalar

Simple scalars are more limited than quoted scalars. Line folding is always performed as there is no way to escape a newline. There is no way to specify non-printable characters, and the content must not start with an indicator. Leading white space cannot be specified on the same line as the map key or the list entry indicator. Any such white space must be specified in the following line.

In exchange for these limitations, a simple scalar is more readable than a quoted string, since no escaping is required for ', " and \ characters and no surrounding quotes are used. To delineate the end of this scalar, indentation is employed.

If the value of a simple scalar begins or ends with a single LF character, this character is ignored rather than adding a leading or trailing space character to the value. This allows a scalar value to be naturally specified starting at a separate line, and also allows an elegant way of specifying the empty string value.

A limited form of simple scalars may be used as keys. Simple scalar keys are limited to a single line and may not contain any leading or trailing white space.

[72] simple(n) ::= ( ichr lchr* )?
( lfeols(n) lchr* )*
eol
/* one or more indented, non-escaped, line-folded value characters */
[73] kstr ::= kchr+ ( lwsp+ kchr+ )* /* simple string without the map entry separator character */
empty:
first: The value of the previous key is the empty string.
second:
     This value has just one leading white space,
    and is terminated by a hard newline (LF).

third:
    <html><head><title>Embedded HTML!</title></head>
    <body><p class="none">This can even
    have embedded HTML since there is no
    escaping, and since < (the starting character)
    is not an indicator!
    </p></body>
fourth: Indicators like @ : % are allowed, as 
    well as quotes, as long as the first
    character is not an indicator.  Further,
    whitespace     is   preserved.
fifth: @
    : A single line entry.
    :   A second, multi-line,
        entry of the list.
    :
        A third, multi-line list
        entry, without any leading
        or trailing white space.
    :


    :   The value of the previous
        entry is two hard newlines
        (LF LF).

2.5.3 Block Scalar

A block scalar is even more restricted than a simple scalar. A block scalar value must begin in a line following the block indicator. Like simple scalars it is restricted to printable characters only. Unlike a simple scalar, no line folding is done, and therefore long lines cannot be broken. In fact, no processing is performed on block characters aside from stripping away indentation and end of line normalization.

In exchange, block scalars are the most readable format for source code or other text values with significant use of indicators, quoted escaping, or significant newlines. They may also start with any printable character, indicators included.

The value of a block scalar contains, by default, a trailing LF. To prevent this trailing newline from being added, the block indicator should be immediately followed by a '-' character.

[74] block(n) ::= iblk '-'?
( eol ( indent(n) lchr* )? )*
eol
/* an indented character block */
first: |
    This is a block scalar,    with significant
    whitespace, and the use of " @, etc.
         All whitespace    is    significant.
second: |-
    No leading nor trailing new line.
third: @
    : |
        
        First list item which has a 
        leading and trailing new line.
    : |-
        Second list item. Does not have 
        leading nor trailing new line.  
        Has two new lines altogether.
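
The processing described above (strip indentation, normalize line ends, and add a trailing LF unless the indicator is followed by '-') can be sketched in Python. This is only an illustration, not a reference implementation: the function name block_scalar_value and its parameters are hypothetical, and the indentation is assumed to be passed in as a number of spaces rather than derived from the indent(n) production.

```python
def block_scalar_value(raw_lines, indent_width, strip_trailing=False):
    """Hypothetical helper: compute the value of a block scalar.

    raw_lines      -- the lines that follow the block indicator
    indent_width   -- assumed number of indentation spaces to strip
    strip_trailing -- True when the indicator is followed by '-'
    """
    prefix = " " * indent_width
    content = []
    for line in raw_lines:
        # End of line normalization: drop any CR / LF terminator.
        line = line.rstrip("\r\n")
        if line.startswith(prefix):
            # Strip the indentation only; all other white space is kept.
            content.append(line[indent_width:])
        elif line.strip(" ") == "":
            # A line with no content contributes an empty line.
            content.append("")
        else:
            raise ValueError("line is not indented enough: %r" % line)
    value = "\n".join(content)
    if not strip_trailing:
        value += "\n"  # the default trailing LF
    return value


# The value of the "first" key in the example above (indentation of 4 spaces).
first = block_scalar_value(
    [
        "    This is a block scalar,    with significant",
        '    whitespace, and the use of " @, etc.',
        "         All whitespace    is    significant.",
    ],
    indent_width=4,
)
```

Note that the third line keeps the five spaces that remain after the four indentation spaces are removed, since no white space processing other than indentation stripping is performed.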

3 Changes From Other Versions

3.1 Changes From The 31 Jul 2001 Draft

Simple Scalar and End Of Lines

Moved eol productions to the end, rather than the start, of most productions. The wording and productions for the simple scalar were fixed to match each other and the intended semantics. The simple scalar example set was enhanced to clarify the proper interpretation.

Empty Document

Both an empty top level map and the absence of a top level map are now allowed, and hence so are empty documents.

3.2 Changes From The 22 Jul 2001 Draft

Thanks to Joe Lapp for reviewing the 22 Jul 2001 draft and recommending these changes.

Phrasing fixes

Fixed phrasing in the abstract, and sections 1.4, 2.1, 2.3.1, 2.4.3, 2.4.4, 2.4.5, 2.4.6 and 2.5.3.

Production fixes

Fixed productions: added productions 47 and 59, fixed productions 57, 58, 60 and 64 (production numbers in the 22 Jul 2001 draft are off by one in some cases). Most are bug fixes. Actual changes include allowing for empty lines surrounding a top level map, allowing an optional trailing separator line, and forbidding annotations which have no sensible semantics (anchor to null, anchor to a reference, shorthand for a reference).

3.3 Changes From The 23 Jun 2001 Draft

Merge Spec

Due to the decision to leave all API related issues outside the core spec, the spec has been re-merged into a single file, covering just what used to be the introduction and serialization sections of the previous specs.

Character Encodings

The spec now refers only to the Unicode standard. Due to the efforts by the Unicode and ISO/IEC 10646 groups, both standards are in almost complete agreement. The additional features provided by the ISO/IEC standard are rarely used in practice, while Unicode is simpler and is more widely supported by existing languages and systems.

Strict Indentation

Indentation is now a strict 4 spaces per level. This allows for the new whitespace policy and the new block notation.

Shorthand Notation

The spec introduces a shorthand notation for attaching special keys to any node type (converting it to a map if necessary). This will need more work.

Null Nodes

Null nodes have finally been added, after somehow eluding all previous versions.

Bullet Lists

Change the * optional prefix for scalar list entries to a mandatory : and therefore remove the special name "bulleted list entries".

Simplify Keys

Multi-line simple keys are now out. The door is open for re-introducing them, however.

Change Whitespace Policy

White space folding has been replaced by line break folding. White space is now always significant, except for indentation and for separation of structure tokens.

Block Scalar Syntax

The syntax for block scalars has been replaced by a more elegant one.

3.4 Changes From The 16 Jun 2001 Draft

Split Spec

The spec is now separated into several files. This allows different versions of the spec to share the same version of unchanged sections, and makes it easier to refer to a particular version of important pieces of the spec such as serialization and interfaces. All the HTML files use the same shared CSS file. Cross references between the separate parts of the spec are now relative, though references to older versions are absolute and refer to the main site.

Cyclical Graph

Change the wording on the information model to allow for graphs with cycles. The alternative is to define the anchor semantics in such a way that would preclude cycles.

Null Character Escape

The escape sequence \z was added to allow convenient escaping of the ASCII zero (null) character.

Remove Binary Scalars

The information model now contains just one type of scalar. The special syntax for binary scalars has been removed. This functionality will be re-added in the form of a color.

Remove Class Shorthand

The syntax no longer supports the !class syntax. This functionality will be re-added in the form of a color.

Bullet Lists

Change the optional prefix for scalar list entries to * and rename such entries to "bulleted list entries".

Make Keys More Scalars-Compatible

Allow for multi-line simple keys and unify the description of scalar keys and values where it makes sense.

HTML Tidying

All the HTML pages have gone through Tidy. Also, all the HTML files have been run through an HTML validation service and a CSS validation service. Broken links and spelling were checked using another online HTML validator. This needs to be repeated for all future drafts.

3.5 Changes From The 09 Jun 2001 Draft

Relationship with MIME

Beyond using base64 for binary scalars, no additional special relationship with MIME is expected. Hence references to the MIME and mail RFCs were moved from section 1.1 ("required reading") to section 1.2 ("background material").

Strict Indentation

Indentation is now completely strict for all scalar styles. Also, the productions were changed to use consistent semantics for the indentation level parameter.

List Scalar Prefixes

A list scalar entry may be prefixed by an optional : indicator to improve readability of multi-line simple scalar values.

Anchor Semantics

Leading zeros are now ignored for comparing anchor strings.

No Empty Line At Start

The document production was fixed so as not to require an empty line at the start of a document.

Character Escapes

The set of character escapes is now maximal (including the rare \e escape for the useful ASCII ESC character). Also, it is now possible to "escape" a line break in a quoted string (the previous drafts were inconsistent at this point).

32 Bit Characters

The current draft allows such characters, and includes a specialized escaping format ('\Uxxxxxxxx') to support them.

3.6 Changes From The 26 May 2001 Draft

Changes Section

The changes section was added for easier comparison of different versions. The final draft will not contain this section.

Class Indicator

The indicator was changed from # to ! to allow for # to be used for comments.

No Empty Line At End

The document production was fixed so as not to require an empty line at the end of a document.

Strict Indentation

Indentation in quoted strings and binary blocks is now strict to ensure readability.

Productions

Problems in the productions were fixed, especially where related to white space issues and formatting of the result.

BOM Comment

The link to the Unicode FAQ was moved to section 2.2.2.

Binary Scalars

The information model now distinguishes between text and binary scalars.

3.7 Probable Future Changes

Character Set

It may be useful to base the definition of a valid character on Unicode character properties. For example, we may define a valid character as any printable Unicode character.

Shorthand Syntax

As this is the first draft to contain this feature, changes are to be expected. In particular, the exact set of valid keys needs to be defined and partitioned into "standard YAML keys" and "application defined keys". Similarly, the set of possible values for each key must be defined. Currently, for example, it excludes the possibility of using Java style '.'-separated class names as a value.

Shorthand References

On the one hand, this may be useful. On the other, in some cases (reference to a map), it is no longer possible to map a YAML reference to a native language pointer or reference. The current draft simply forbids attaching shorthand to a reference, but the decision is not final (yet).

Separator Lines

The policy with regard to separator lines and empty lines following them needs to be finalized.

Reviewing Examples

Ensure there are enough examples. In particular, special syntax forms should be demonstrated to remove doubt in the interpretation of the productions.

Verifying Productions

Verify all productions are correct, are actually used, and properly hyper-linked. Joe has done a good job reviewing the previous draft, so this draft is much improved. Still, a re-verification will need to be done for a release candidate.

Polish

Spell and grammar checking, formatting, etc. Again, this draft is much improved thanks to Joe's review, but this will need to be repeated for a release candidate.