YAML Ain't Markup Language (YAML) (tm) 1.0

Working Draft 31 Oct 2002

Latest version:
http://yaml.org/spec/
Editors:
Oren Ben-Kiki, Clark C. Evans, Brian Ingerson

Status of this Document

This specification is a working draft and reflects consensus reached by the members of the yaml-core mailing list. Any questions regarding this draft should be raised on this list at http://lists.sourceforge.net/lists/listinfo/yaml-core.

With this release of the YAML specification, we now encourage development of YAML processors, so that the design of YAML can be validated. The specification is still subject to change; however, such changes will be limited to polish and fixing any logical flaws and bugs.

Therefore, this is "Last Call" for changes; if you have a pet feature, now is the very last time that it can be proposed before Release Candidate status. Changes that would cause "Last Call" YAML streams to be invalid will be seriously considered only if absolutely necessary.

Abstract

YAML(tm) (rhymes with "camel") is a straightforward machine-parsable data serialization format designed for human readability and interaction with scripting languages such as Perl and Python. YAML is designed for data serialization, formatted dumping, configuration files, log files, Internet messaging and filtering. This specification describes the YAML information model and serialization format. Together with the Unicode standard for characters, it provides all the information necessary to understand YAML Version 1.0 and to construct programs to process YAML information.

Table of Contents

    1 Introduction
        1.1 Goals
        1.2 Design
        1.3 Prior Art
        1.4 Relation to XML
        1.5 Terminology

    2 Preview
        2.1 Collections
        2.2 Structures
        2.3 Scalars
        2.4 Type Family
        2.5 Full Length Example

    3 Syntax
        3.1 Characters
            3.1.1 Character Set
            3.1.2 Encoding
            3.1.3 Indicators
            3.1.4 Line Breaks
            3.1.5 Miscellaneous
        3.2 Space Processing
            3.2.1 Indentation
            3.2.2 Throwaway comments
        3.3 YAML Stream
            3.3.1 Document
            3.3.2 Directive
            3.3.3 Text Node
            3.3.4 Node Property
            3.3.5 Transfer Method
            3.3.6 Anchor
        3.4 Alias
        3.5 Collection
            3.5.1 Sequence
            3.5.2 Mapping
        3.6 Scalar
            3.6.1 End Of Line Normalization
            3.6.2 Block Modifiers
            3.6.3 Explicit Indentation
            3.6.4 Chomping
            3.6.5 Literal
            3.6.6 Folding
            3.6.7 Folded
            3.6.8 Single Quoted
            3.6.9 Escaping
            3.6.10 Double Quoted
            3.6.11 Plain

    4 Information Models
        4.1 Overview
        4.2 Native (Language) Model
            4.2.1 Native Node
            4.2.2 Type Family
            4.2.3 Equivalence
            4.2.4 Documents Stream
        4.3 Graph (Generic) Model
            4.3.1 Format
            4.3.2 Type Family Formats
            4.3.3 Graph Node
            4.3.4 Implicit Types and Formats
        4.4 Serial (Tree) Model
            4.4.1 Serial Node
            4.4.2 Alias
            4.4.3 Pair
            4.4.4 Serial Mapping
            4.4.5 Ordering
        4.5 Text (Syntax) Model
            4.5.1 Style
            4.5.2 Comment
            4.5.3 Directive
            4.5.4 Details

    5 Type Families
        5.1 Sequence
        5.2 Mapping
        5.3 String
 
    Language-Independent Types
 
    Change History

1 Introduction

YAML Ain't Markup Language, abbreviated YAML, is both a human-readable data serialization format and processing model. This text describes the class of data objects called YAML document streams and partially describes the behavior of computer programs that process them.

YAML document streams encode in a textual form the native data constructs of modern scripting languages. Strings, arrays, hashes, and other user-defined data types are supported. A YAML document stream consists of a sequence of characters, some of which are considered part of the document's content, and others that are used to indicate document structure.

YAML information can be viewed in two primary ways, for machine processing and for human presentation. A YAML processor is a tool for converting information between these complementary views. It is assumed that a YAML processor does its work on behalf of another module, called an application. This specification describes the required behavior of a YAML processor. It describes how a YAML processor must read or write YAML document streams and the information structures it must provide to or obtain from the application.

1.1 Goals

The design goals for YAML are:

  1. YAML documents are very readable by humans.

  2. YAML interacts well with scripting languages.

  3. YAML uses host languages' native data structures.

  4. YAML has a consistent model to support generic tools.

  5. YAML enables stream-based processing.

  6. YAML is expressive and extensible.

  7. YAML is easy to implement.

YAML's initial direction was set by the markup language discussions among SML-DEV members. YAML was also conceived with experience gained from the construction and deployment of Brian Ingerson's Perl module Data::Denter. Since then YAML has matured through the ideas and support it has received from its user community.

1.2 Design

YAML was first conceived as a notation for a simple set of primitives, the sequence, the mapping and the scalar, which, when used recursively to form a graph structure, are strong enough for most machine processing needs. By sequence we mean an ordered collection, by mapping we mean an unordered association of unique keys to values, and by scalar we mean a series of Unicode characters. These primitives map cleanly to most modern programming languages; the sequence corresponds to a Perl array and a Python list, the mapping corresponds to a Perl hashtable and a Python dictionary. This basis is also formally justified, as both mapping and sequence are mathematical functions with well defined characteristics. With this core model, YAML supports machine processing with a balance of practical motivation and theory.
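As a rough illustration of this correspondence, the following Python sketch expresses the three primitives as native values and classifies a node by its kind. The data is borrowed from the preview examples in this document; the helper name `kind` is hypothetical and is not defined by this specification.

```python
# Illustrative only: the three primitives as native Python values.
# A mapping becomes a dict, a sequence becomes a list, a scalar becomes a str.
leagues = {
    "american": ["Boston Red Sox", "Detroit Tigers", "New York Yankees"],
    "national": ["New York Mets", "Chicago Cubs", "Atlanta Braves"],
}


def kind(node):
    """Return the name of the primitive that a native value corresponds to."""
    if isinstance(node, dict):
        return "mapping"
    if isinstance(node, list):
        return "sequence"
    if isinstance(node, str):
        return "scalar"
    raise TypeError("not one of the three primitives: %r" % (node,))
```

Because the primitives nest recursively, `kind` can be applied at any depth of the structure, for example to `leagues`, to `leagues["american"]`, or to a single club name.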

To meet the needs of serialization and human presentation, YAML has many syntactical aspects beyond the primitives described above. As a graph is flattened into a tree, ordering is imposed upon mapping keys and an alias mechanism is used to write subsequent occurrences of duplicate nodes. To enhance readability, various writing styles are provided for different aesthetic needs. Further, a comment mechanism allows for annotation orthogonal to the "content" of a YAML stream. YAML syntax also has other details, such as placement of line breaks and choice of escaping and scalar formats. While these aspects are essential to a human presentation of YAML, they are not needed for machine processing.

This split between machine processing and human presentation creates an inherent tension. While it may be tempting to drive machine processing with comments, key order, styles and other presentation information, this would greatly complicate the definition and operation of generic tools. Although one could argue that their YAML data is sufficiently isolated, information is often used in unforeseen ways. Therefore, applications should only rely upon the formal definition of YAML's primitives to drive processing. For example, sequences should be used when order is important for machine processing even though mapping key order may be available. Likewise, duplicate keys should never be used, even though a parser may report them without warning. Overall, this distinction is one of intent. Applications which respect the split between human presentation and machine processing will enjoy the ability to use generic tools such as path expression evaluators, graph transformation languages, or schema validators.

This separation does not prevent YAML processors from providing mechanisms to report or handle presentation aspects. Human readability is a prime directive for YAML. Therefore, a YAML processor may provide a shadow or wrapper mechanism to maintain and provide access to presentation aspects of a YAML text. In this way an application can have influence over how its information will be written to a stream for the best human impact. Since presentation aspects may be the same for a large class of YAML documents, a stylesheet could also be used to provide preferred key ordering, syntax styles, comments, and other presentation oriented instructions.

1.3 Prior Art

YAML integrates and builds upon structures and concepts described by C, Java, Perl, Python, Ruby, RFC0822 (MAIL), RFC1866 (HTML), RFC2045 (MIME), RFC2396 (URI), SAX, SOAP and XML.

YAML's core type system is based on the serialization requirements of Perl, Python and Ruby. YAML directly supports both scalar (string) values and collection (array, hash) values. Support for common types enables programmers to use their language's native data constructs for YAML manipulation, instead of requiring a special document object model (DOM).

Like XML's SOAP, the YAML serialization supports native graph structures through a rich alias mechanism. Also like SOAP, YAML provides for application-defined types. This allows YAML to serialize rich data structures required for modern distributed computing. YAML provides unique global type names using a namespace mechanism inspired by Java's DNS based package naming convention and XML's URI based namespaces.

YAML's block scoping is similar to Python's. In YAML, the extent of a node is indicated by its column. YAML's literal scalar leverages this by enabling formatted text to be cleanly mixed within an indented structure without troublesome escaping. Further, YAML's block indenting provides for easy inspection of the document's structure.

Motivated by HTML's end-of-line normalization, YAML's folded scalar introduces a unique method of handling white space. In YAML, single line breaks may be folded into a single space, while empty lines represent line break characters. This technique allows for paragraphs to be word-wrapped without affecting the canonical form of the content.
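The folding idea described above can be sketched in a few lines of Python. This is a deliberately simplified illustration: it handles only the two cases named in the paragraph (single line breaks and empty lines), ignores indentation and chomping, drops leading and trailing empty lines, and is not the normative algorithm, which is given in the Scalar section. The function name `fold` is hypothetical.

```python
def fold(text):
    """Simplified folding: a single line break becomes one space,
    and each empty line becomes one line break."""
    out = []
    pending_breaks = 0              # empty lines seen since the last text line
    for line in text.split("\n"):
        if line == "":
            pending_breaks += 1
            continue
        if out:
            # Join to the previous text line: a space if the lines were
            # merely wrapped, otherwise one line break per empty line.
            out.append("\n" * pending_breaks if pending_breaks else " ")
        out.append(line)
        pending_breaks = 0
    return "".join(out)
```

With this sketch, re-wrapping a paragraph at a different column leaves the folded result unchanged, which is the property the paragraph above describes.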

YAML's double quoted scalar uses familiar C-style escape sequences. This enables ASCII representation of non-printable or 8-bit (ISO 8859-1) characters such as '\x3B'. 16-bit Unicode and 32-bit (ISO/IEC 10646) characters are supported with escape sequences such as '\u003B' and '\U0000003B'.
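The three escape widths can be illustrated with a small Python sketch that picks the shortest of the forms named above for a given character. The function name `escape` is hypothetical, and the choice of the shortest form is an assumption made for the example; the specification's own escaping rules appear in the Escaping section.

```python
def escape(ch):
    """Return an 8-bit, 16-bit or 32-bit escape sequence for one character."""
    cp = ord(ch)
    if cp <= 0xFF:
        return "\\x%02X" % cp       # 8-bit form, e.g. \x3B
    if cp <= 0xFFFF:
        return "\\u%04X" % cp       # 16-bit form, e.g. \u003B
    return "\\U%08X" % cp           # 32-bit form, e.g. \U0000003B
```

Python string literals happen to use the same three escape forms, so `"\x3B"`, `"\u003B"` and `"\U0000003B"` all denote the same character there as well.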

The syntax of YAML was motivated by Internet Mail (RFC0822) and remains partially compatible with this standard. Further, YAML borrows the idea of having multiple documents from MIME (RFC2045). YAML's top-level production is a stream of independent documents; ideal for message-based distributed processing systems.

YAML was designed to have an incremental interface that includes both a pull-style input stream interface and a push-style (SAX-like) output stream interface. Together these enable YAML to support the processing of large documents, such as a transaction log, or continuous streams, such as a feed from a production machine.

1.4 Relation to XML

Newcomers to YAML often search for its correlation to the eXtensible Markup Language (XML). While the two languages may actually compete in several application domains, there is no direct correlation between them. YAML is primarily a data serialization language. XML is often used for various types of data serialization but that is not its fundamental design goal.

There are many differences between YAML and XML. XML was designed to be backwards compatible with Standard Generalized Markup Language (SGML) and thus had many design constraints placed on it that YAML does not share. Also XML, inheriting SGML's legacy, is designed to support structured documents, where YAML is more closely targeted at messaging and native data structures. Where XML is a pioneer in many domains, YAML is the result of many lessons from the XML community.

The YAML and XML information models are starkly different. In XML, the primary construct is an attributed tree, where each element has an ordered, named list of children and an unordered mapping of names to strings. In YAML, the primary constructs are sequence (natively stored as an array), mapping (natively stored as a hash) and scalar values (string, integer, floating point). This difference is critical since YAML's model is directly supported by native data structures in most modern programming languages, where XML's model requires mapping conventions, or an alternative programming component (e.g. a document object model).

It should be mentioned that there are ongoing efforts to define standard XML/YAML mappings. This generally requires that a subset of each language be used.

1.5 Terminology

The terminology used to describe YAML is defined in the body of this specification. The terms defined in the following list are used in building those definitions and in describing the actions of a YAML processor:

may   Conformant YAML streams and processors are permitted to but need not behave as described.
should   Conformant YAML streams and processors are encouraged to behave as described, but may do otherwise if a warning message is provided to the user and any deviant behavior requires conscious effort to enable (i.e., a non-default setting).
must   Conformant YAML streams and processors are required to behave as described, otherwise they are in error.
error   A violation of the rules of this specification; results are undefined. Conforming software must detect and report an error and may recover from it.

2 Preview

This section provides a quick glimpse into the expressive power of YAML. It is not expected that the first-time reader grok all of the examples. Rather, these selections are used as motivation for the remainder of the specification.

2.1 Collections

YAML's block collections use indentation for scope and begin each member on its own line. Block sequences indicate each member with a dash (-). Block mappings use a colon to mark each (key: value) pair.
- Mark McGwire
- Sammy Sosa
- Ken Griffey

A1

Sequence of scalars
(ball players)

hr:  65
avg: 0.278
rbi: 147

A2

Mapping of scalars to scalars
(player statistics)

american:
   - Boston Red Sox
   - Detroit Tigers
   - New York Yankees
national:
   - New York Mets
   - Chicago Cubs
   - Atlanta Braves

A3

Mapping of scalars to sequences
(ball clubs in each league)

- 
  name: Mark McGwire
  hr:   65
  avg:  0.278
- 
  name: Sammy Sosa
  hr:   63
  avg:  0.288

A4

Sequence of mappings
(players' statistics)


YAML also has in-line flow styles for compact notation. The flow sequence is written as a comma separated list within square brackets. In a similar manner, the flow mapping uses curly braces.
- [ name         , hr , avg   ]
- [ Mark McGwire , 65 , 0.278 ] 
- [ Sammy Sosa   , 63 , 0.288 ]

A5

Sequence of sequences

Mark McGwire: {hr: 65, avg: 0.278} 
Sammy Sosa:   {hr: 63,
               avg: 0.288}

A6

Mapping of mappings

2.2 Structures

YAML uses three dashes (---) to separate documents within a stream (file or socket). Comment lines begin with the pound sign (#). Repeated nodes are first marked with the ampersand (&) and then referenced with an asterisk (*) thereafter.
---
name: Mark McGwire
hr:   65
avg:  0.278
---
name: Sammy Sosa
hr:   63
avg:  0.288

B1

Two documents; one stream
(players' statistics)

# Ranking of players by
# 1998 season home runs.
---
- Mark McGwire
- Sammy Sosa
- Ken Griffey



B2

Document with leading comment

hr: # 1998 hr ranking
   - Mark McGwire 
   - Sammy Sosa 
rbi:
   # 1998 rbi ranking
   - Sammy Sosa
   - Ken Griffey

B3

Single document with two comments

hr:
   - Mark McGwire
   # Following node labeled SS
   - &SS Sammy Sosa
rbi:
   - *SS # Subsequent occurrence
   - Ken Griffey

B4

Node for Sammy Sosa appears twice in this document
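One way to picture the anchor and alias in the example above is as a single native value referenced from two places. The Python sketch below builds that structure by hand; how a real YAML processor exposes aliased nodes to an application is an assumption here, not something this preview specifies.

```python
# Hand-built native structure for the anchor/alias example (illustrative only).
ss = "Sammy Sosa"                    # the node anchored as &SS
doc = {
    "hr": ["Mark McGwire", ss],      # first occurrence (anchor)
    "rbi": [ss, "Ken Griffey"],      # subsequent occurrence (alias *SS)
}

# Both entries refer to the same object, not to two equal copies.
same_node = doc["hr"][1] is doc["rbi"][0]
```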


The question mark indicates a complex key. Within a block sequence, mapping pairs can start immediately following the dash.
? # PLAY SCHEDULE
  - Detroit Tigers
  - Chicago Cubs
:  
  - 2001-07-23

? [ New York Yankees,
    Atlanta Braves ]
: [ 2001-07-02, 2001-08-12, 
    2001-08-14 ]

B5

Mapping between sequences

invoice: 34843
date   : 2001-01-23
bill-to: Chris Dumars
product:
   - item    : Super Hoop
     quantity: 1
   - item    : Basketball
     quantity: 4
   - item    : Big Shoes
     quantity: 1
        

B6

Sequence key shortcut

2.3 Scalars

Scalar values can be written in block form using a literal style (|) where all new lines count. Or they can be written with the folded style (>) for content that can be word wrapped. In the folded style, newlines are treated as a space unless they are part of a blank or indented line.

--- |
    \/|\/|
    / |  |_


C1

In literals, newlines are preserved

--- >
    Mark McGwire's
    year was crippled
    by a knee injury.

C2

In folded, newlines are treated as a space

--- >
 Sammy Sosa completed another
 fine season with great stats.

   63 Home Runs
   0.288 Batting Average

 What a year!
        

C3

Newlines preserved for indented and blank lines

name: Mark McGwire
accomplishment: >
   Mark set a major league 
   home run record in 1998.
stats: |
   65 Home Runs
   0.278 Batting Average


C4

Indentation determines scope


YAML's flow scalars include the plain style (most examples thus far) and quoted styles. The double quoted style provides escape sequences. Single quoted style is useful when escaping is not needed. All flow scalars can span multiple lines; intermediate whitespace is trimmed to a single space.
unicode: "Sosa did fine.\u263A"
control: "\b1998\t1999\t2000\n" 
hexesc:  "\x0D\x0A is \r\n"

single: '"Howdy!" he cried.'
quoted: ' # not a ''comment''.'
tie-fighter: '|\-*-/|'

C5

Quoted scalars

plain: This unquoted
       scalar spans
       many lines.
quoted: "\
  So does this quoted
  scalar.\n"

        

C6

Multiline flow scalars

2.4 Type Family

In YAML, plain (unquoted) scalars are given an implicit type depending on the application. YAML's type repository includes integers, floating point values, timestamps, null, boolean, and string values.

canonical: 12345
decimal: +12,345
octal: 014
hexadecimal: 0xC


D1

Integers
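The four integer notations in the example above can be read with a short Python sketch. The rules used here (commas as digit separators, a leading 0 for octal, a leading 0x for hexadecimal) are inferred from the example alone and are an assumption; the formal definition belongs to the language-independent type repository, not to this preview. The function name `parse_int` is hypothetical.

```python
def parse_int(text):
    """Parse the integer notations shown in the example (inferred rules)."""
    s = text.replace(",", "")            # "+12,345" -> "+12345"
    sign = 1
    if s.startswith(("+", "-")):
        sign = -1 if s[0] == "-" else 1
        s = s[1:]
    if s.lower().startswith("0x"):       # hexadecimal, e.g. 0xC
        return sign * int(s[2:], 16)
    if len(s) > 1 and s.startswith("0"): # octal, e.g. 014
        return sign * int(s[1:], 8)
    return sign * int(s, 10)             # canonical and decimal
```

Under these assumptions the canonical and decimal entries of the example denote the same value, and so do the octal and hexadecimal entries.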

canonical: 1.23015e+3
exponential: 12.3015e+02
fixed: 1,230.15
negative infinity: (-inf)
not a number: (NaN)

D2

Floating point

null: ~
true: +
false: -
string: '12345'

D3

Miscellaneous

canonical: 2001-12-15T02:59:43.1Z
iso8601:  2001-12-14t21:59:43.10-05:00
spaced:  2001-12-14 21:59:43.10 -05:00
date:   2002-12-14

D4

Timestamps


Explicit typing is denoted with the bang (!) symbol. Application types should include a domain name and may use the caret (^) to abbreviate subsequent types.
---
not-date: !str 2002-04-28
picture: !binary#base64 |
 R0lGODlhDAAMAIQAAP//9/X
 17unp5WZmZgAAAOfn515eXv
 Pz7Y6OjuDg4J+fn5OTk6enp
 56enmleECcgggoBADs=

hmm: !somewhere.com,2002/type | 
 family above is short for
 taguri:somewhere.com,2002:type

D5

Various explicit families

--- !clarkevans.com,2002/graph/^shape
- !^circle
  center: &ORIGIN {x: 73, y: 129}
  radius: 7
- !^line # !clarkevans.com,2002/graph/line
  start: *ORIGIN
  finish: { x: 89, y: 102 }
- !^text
  start: *ORIGIN
  color: 0xFFEEBB
  value: Pretty vector drawing.

D6

Application specific family

2.5 Full Length Example

Below are two full-length examples of YAML. The first is a sample invoice; the second is a sample log file.

--- !clarkevans.com,2002/^invoice
invoice: 34843
date   : 2001-01-23
bill-to: &id001
    given  : Chris
    family : Dumars
    address:
        lines: |
            458 Walkman Dr.
            Suite #292
        city    : Royal Oak
        state   : MI
        postal  : 48046
ship-to: *id001
product:
    - sku         : BL394D
      quantity    : 4
      description : Basketball
      price       : 450.00
    - sku         : BL4438H
      quantity    : 1
      description : Super Hoop
      price       : 2392.00
tax  : 251.42
total: 4443.52
comments: >
    Late afternoon is best.
    Backup contact is Nancy
    Billsmer @ 338-4338.

E1

Invoice

---
Time: 2001-11-23 15:01:42 -05:00
User: ed
Warning: >
  This is an error message
  for the log file
---
Time: 2001-11-23 15:02:31 -05:00
User: ed
Warning: >
  A slightly different error
  message.
---
Date: 2001-11-23 15:03:17 -05:00
User: ed
Fatal: >
  Unknown variable "bar"
Stack:
  - file: TopClass.py
    line: 23
    code: |
      x = MoreObject("345\n")
  - file: MoreClass.py
    line: 58
    code: |-
      foo = bar




E2

Log file

3 Syntax

Following are the BNF productions defining the syntax of YAML streams.

3.1 Characters

Characters are the basis for a YAML stream. Below is a general definition of a character followed by several characters that have specific meaning in particular contexts.

3.1.1 Character Set

YAML streams use a subset of the Unicode character set. A YAML parser must accept all printable ASCII characters, the space, tab, line break, and all Unicode characters beyond 0x9F. A YAML emitter must only produce those characters accepted by the parser, but should also escape all non-printable Unicode characters if a character table is readily available.

[001] printable_char ::=   #x9
                         | #xA | #xD | #x85
                         | [#x20-#x7E]
                         | [#xA0-#xD7FF]
                         | [#xE000-#xFFFD]
                         | [#x10000-#x10FFFF]
                         /* characters as defined by the Unicode standard, excluding most control characters and the surrogate blocks */

The range above explicitly excludes the surrogate block [#xD800-#xDFFF], DEL 0x7F, the C0 control block [#x0-#x1F], the C1 control block [#x80-#x9F], #xFFFE and #xFFFF. Note that in UTF-16, characters above #xFFFF are represented with a surrogate pair. DEL and characters in the C0 and C1 control block may be represented in a YAML stream using escape sequences.
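The printable_char production translates directly into a range check. The following Python sketch is one such translation; the function name `is_printable` is hypothetical.

```python
def is_printable(ch):
    """Return True if the character matches the printable_char production."""
    cp = ord(ch)
    return (
        cp == 0x9                        # tab
        or cp in (0xA, 0xD, 0x85)        # LF, CR, NEL
        or 0x20 <= cp <= 0x7E            # printable ASCII and space
        or 0xA0 <= cp <= 0xD7FF          # up to the surrogate block
        or 0xE000 <= cp <= 0xFFFD        # past the surrogates, minus FFFE/FFFF
        or 0x10000 <= cp <= 0x10FFFF     # characters beyond 16 bits
    )
```

An emitter could use a check like this to decide which characters must be written as escape sequences rather than emitted literally.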

3.1.2 Encoding

A YAML parser is required to support the UTF-32, UTF-16 and UTF-8 character encodings. If an input stream does not begin with a byte order mark, the encoding shall be UTF-8. Otherwise the encoding shall be UTF-32 (LE or BE), UTF-16 (LE or BE) or UTF-8, as signaled by the byte order mark. Note that as YAML files may only contain printable characters, this does not raise any ambiguities. For more information about the byte order mark and the Unicode character encoding schemes see the Unicode FAQ.

[002] byte_order_mark ::= #xFEFF /* the Unicode ZERO WIDTH NON-BREAKING SPACE character used to mark a UTF-32 or UTF-16 stream and determine byte ordering */
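The encoding rule above can be sketched as a prefix test on the first bytes of the stream. The Python example below uses the byte order mark constants from the standard `codecs` module; the function name `detect_encoding` and the returned codec names are choices made for this illustration. The UTF-32 marks are tested before the UTF-16 marks because the UTF-32 little-endian mark begins with the same two bytes as the UTF-16 little-endian mark.

```python
import codecs

# Order matters: BOM_UTF32_LE (FF FE 00 00) starts with BOM_UTF16_LE (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def detect_encoding(data):
    """Return the encoding signaled by a leading byte order mark, else UTF-8."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return "utf-8"                       # no byte order mark: UTF-8
```

This relies on the observation in the text that a stream contains only printable characters, so a UTF-16 stream cannot begin with a mark followed by a NUL character.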

3.1.3 Indicators

Indicator characters.

Indicators are special characters that are used to describe the structure of a YAML document.

[003] sequence_entry_indicator ::= '-' /* indicates a sequence entry */
[004] mapping_entry_separator ::= ':' /* separates a key from its value */
[005] sequence_flow_start ::= '[' /* starts a flow sequence collection */
[006] sequence_flow_end ::= ']' /* ends a flow sequence collection */
[007] mapping_flow_start ::= '{' /* starts a flow mapping collection */
[008] mapping_flow_end ::= '}' /* ends a flow mapping collection */
[009] collect_line_separator ::= ',' /* separates flow collection entries */
[010] top_key_indicator ::= '?' /* indicates a complex key */
[011] alias_indicator ::= '*' /* indicates an alias node */
[012] anchor_indicator ::= '&' /* indicates an anchor property */
[013] transfer_indicator ::= '!' /* indicates a transfer method property */
[014] literal_indicator ::= '|' /* indicates a literal scalar */
[015] folded_indicator ::= '>' /* indicates a folded scalar */
[016] single_quote ::= ''' /* indicates a single quoted scalar */
[017] double_quote ::= '"' /* indicates a double quoted scalar */
[018] throwaway_indicator ::= '#' /* indicates a throwaway comment */
[019] reserved_indicators ::= '%' | '@' | '`' /* reserved for future use */
Indicator categories

Indicators can be grouped into two categories. The '-', ':', ',', '?' and '#' space indicators are always followed by a white space character (space, tab or line break). If followed by any other character, they are taken to be normal content characters. The remaining indicators are taken to be indicators even if followed by a non-space character.

[020] space_indicators ::=   sequence_entry_indicator
                           | mapping_entry_separator
                           | collect_line_separator
                           | top_key_indicator
                           | throwaway_indicator
                           /* must be followed by white space */
[021] non_space_indicators ::=   sequence_flow_start
                               | sequence_flow_end
                               | mapping_flow_start
                               | mapping_flow_end
                               | alias_indicator
                               | anchor_indicator
                               | transfer_indicator
                               | literal_indicator
                               | folded_indicator
                               | single_quote
                               | double_quote
                               | reserved_indicators
                               /* do not require a following white space */
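The two indicator categories can be illustrated with a small character-level check in Python. This sketch looks only at one character and its successor; it ignores context such as quoted scalars, where these characters are ordinary content. The function name `is_indicator` is hypothetical, and treating the end of input as "not white space" is an assumption made for the example.

```python
SPACE_INDICATORS = set("-:,?#")
NON_SPACE_INDICATORS = set("[]{}*&!|>'\"%@`")
WHITE_SPACE = (" ", "\t", "\n", "\r", "\x85", "\u2028", "\u2029")


def is_indicator(text, i):
    """Return True if text[i] would be read as an indicator (context-free)."""
    ch = text[i]
    if ch in NON_SPACE_INDICATORS:
        return True                      # no following white space required
    if ch in SPACE_INDICATORS:
        following = text[i + 1:i + 2]    # empty string at end of input
        return following in WHITE_SPACE  # assumption: end of input -> False
    return False
```

This is why a plain scalar such as `-1` or a timestamp containing ':' is read as content, while `- 1` starts a sequence entry.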

3.1.4 Line Breaks

Line break characters

The Unicode standard defines the following line break characters.

[022] line_feed ::= #xA /* ASCII line feed (LF) */
[023] carriage_return ::= #xD /* ASCII carriage return (CR) */
[024] next_line ::= #x85 /* Unicode next line (NEL) */
[025] line_separator ::= #x2028 /* Unicode line separator (LS) */
[026] paragraph_separator ::= #x2029 /* Unicode paragraph separator (PS) */
[027] line_break_char ::=   line_feed
                          | carriage_return
                          | next_line
                          | line_separator
                          | paragraph_separator
                          /* line break characters */
Line break categories

Line breaks can be grouped into two categories. Specific line breaks have well-defined semantics for breaking text into lines and paragraphs. The semantics of generic line break characters are not defined beyond "ending a line".

Outside scalar text content, YAML allows any line break to be used to terminate lines, and in most cases also allows such line breaks to be preceded by trailing comment characters. On output, a YAML emitter is free to emit such line breaks using whatever convention is most appropriate. An emitter should avoid emitting trailing line spaces.

[028] generic_break ::=   ( carriage_return line_feed ) greedy
                        | carriage_return
                        | line_feed
                        | next_line
                        /* line break with non-specific semantics */
[029] specific_break ::=   line_separator
                         | paragraph_separator
                         /* line break with specific semantics */
[030] any_break ::=   generic_break
                    | specific_break
                    /* any non-content line break */
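The any_break production can be approximated with a regular expression in which the two-character carriage return plus line feed alternative is tried first, so that it is consumed as one break rather than two. The Python sketch below uses the standard `re` module; the names `ANY_BREAK` and `split_lines` are hypothetical.

```python
import re

# CR LF is listed first so the pair is matched greedily as a single break.
ANY_BREAK = re.compile("\r\n|\r|\n|\x85|\u2028|\u2029")


def split_lines(text):
    """Split text on any generic or specific line break."""
    return ANY_BREAK.split(text)
```

A processor that needs to preserve specific breaks inside scalar content would have to keep track of which alternative matched; this sketch discards that information.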

3.1.5 Miscellaneous

This section includes several common character range definitions.

[031] flow_char ::= printable_char - line_break_char /* characters valid in a line */
[032] flow_space ::= #x20 | #x9 /* white space valid in a line */
[033] flow_non_space ::= flow_char - flow_space /* non-space characters valid in a line */
[034] ascii_letter ::= [#x41-#x5A] | [#x61-#x7A] /* ASCII letters, A-Z or a-z */
[035] decimal_digit ::= [#x30-#x39] /* 0-9 */
[036] hex_digit ::= decimal_digit | [#x41-#x46] | [#x61-#x66] /* 0-9, A-F or a-f */
[037] word_char ::= decimal_digit | ascii_letter | '-' /* characters valid in a word */

3.2 Space Processing

YAML streams use lines and spaces to convey structure. This requires special processing rules for white space (space and tab).

3.2.1 Indentation

In a YAML text representation, structure is determined from indentation, where indentation is defined as a line break character followed by zero or more space characters.

Tab characters are not allowed in indentation. Since different systems treat tabs differently, portability problems are a concern. Therefore, YAML's tab policy is conservative; they are not allowed. Note that most modern editors may be configured so that pressing the tab key results in the insertion of an appropriate number of spaces.

A node must be more indented than its parent node. All sibling nodes must use the exact same indentation level. However the content of each such node may be indented independently.

The indentation level is used exclusively to delineate structure. Indentation characters are otherwise ignored. In particular, they are never taken to be a part of the serialized text.

[038] indent(n) ::= #x20 x n /* specific level of indentation */
[039] indent(<n) ::= indent(m) /* for some specific m such that m < n */
[040] indent(<=n) ::= indent(m) /* for some specific m such that m <= n */

Since the YAML stream depends upon indentation level to delineate blocks, additional productions are a function of an integer, based on the indent(n), indent(<n) and indent(<=n) productions above. In some cases the notation production(any) is used; it is a shorthand for "production(n) for some specific value of n".

The '-' sequence entry indicator is considered to be part of the indentation, as this seems the way people tend to interpret it. Hence this indicator itself need not be indented relative to its parent node. Note that spaces following this indicator are not taken to be part of the indentation except in one special case (map_in_seq).
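As an informal illustration of the indentation rules, the Python sketch below measures the indentation level of a line as its count of leading spaces and rejects a tab where indentation is expected. The function name `indent_level` is hypothetical. Treating a tab that immediately follows the leading spaces as an error is one conservative reading of the tab policy, and the special handling of the '-' indicator described above is not modeled.

```python
def indent_level(line):
    """Return the number of leading spaces; reject tabs used as indentation."""
    n = len(line) - len(line.lstrip(" "))
    if line[n:n + 1] == "\t":
        raise ValueError("tab characters are not allowed in indentation")
    return n
```

Comparing the levels of consecutive lines is then enough to tell whether a node is a child (deeper), a sibling (equal), or belongs to an enclosing node (shallower).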

3.2.2 Throwaway comments

Throwaway comments have no effect whatsoever on the data serialized in the stream. Their usual purpose is to communicate between the human maintainers of the file. A typical example is comments in a configuration file.

A throwaway comment always spans to the end of a line. It consists of white space, optionally followed by a '#' indicator, a white space character, and arbitrary comment characters to the end of the line.

Outside text content, empty lines or lines containing only white space are taken to be implicit throwaway comment lines. Lines containing indentation followed by '#' and comment characters are taken to be explicit throwaway comment lines.

A throwaway comment may appear before a document node or following any node. It may not appear inside a scalar node, but may precede or follow it.

[041] throwaway_comment ::= throwaway_indicator+ ( flow_space flow_char* )? /* comment trailing a line */
[042] comment_line(n) ::=   comment_empty_line(n)
                          | comment_text_line(n)
                          /* types of comment lines */
[043] comment_empty_line(n) ::= indent(<=n) any_break /* empty throwaway comment line */
[044] comment_text_line(n) ::= indent(<n) throwaway_comment any_break /* explicit throwaway comment line */
[045] comment_break ::= ( flow_space+ throwaway_comment? )? any_break /* trailing non-content spaces, comment and line break */
### The first three lines of this stream

### are comments (the second one is empty).
this: |   # Comments may trail block indicators.
    contains three lines of text.
    The third one starts with a
    # character. This isn't a comment.

# The last three lines of this stream
# are comments (the first line is empty).
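The distinction between implicit and explicit throwaway comment lines can be sketched with a regular expression in Python. The check below applies only outside scalar text content: inside a literal scalar, as in the example above, a line that begins with '#' is content and must not be passed to this function. The indentation limits of the comment_line(n) productions are not modeled. The names `EXPLICIT_COMMENT` and `comment_kind` are hypothetical, and the line is assumed to have had its line break removed.

```python
import re

# Leading spaces, one or more '#', then either end of line or
# a space or tab followed by arbitrary comment characters.
EXPLICIT_COMMENT = re.compile(r"^ *#+(?:[ \t].*)?$")


def comment_kind(line):
    """Classify a line outside scalar content as a comment line, or None."""
    if line.strip(" \t") == "":
        return "implicit"                # empty or white space only
    if EXPLICIT_COMMENT.match(line):
        return "explicit"                # indentation, '#', comment text
    return None                          # not a throwaway comment line
```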

3.3 YAML Stream

A sequence of bytes is a YAML stream if, taken as a whole, it complies with the following production. Note that an empty stream is a valid YAML stream containing no documents.

Encoding is assumed to be UTF-8 unless explicitly specified by including a byte order mark as the first character of the stream. While a byte order mark may also appear before additional document headers, the same encoding must be used for all documents contained in a YAML stream.

[046] yaml_stream ::= implicit_document? explicit_document* /* YAML document stream */
[047] implicit_document ::= byte_order_mark?
                            comment_line(any)*
                            blk_collection(any)
                            document_trailer?
                            /* first document with an implicit header line */
[048] explicit_document ::= byte_order_mark?
                            comment_line(any)*
                            document_header
                            ( top_scalar_node(any) | top_collect_node(any) )
                            document_trailer?
                            /* stream document with an explicit header */

3.3.1 Document

A YAML stream may contain several independent YAML documents. A document header line is used to start a new document. This line must start with a document separator: '---' followed by a line break or a sequence of space characters. If no explicit header line is specified at the start of the stream, the parser should behave as if a header line containing '--- #YAML:1.0' was specified.

When YAML is used as the format for a communication stream, it is useful to be able to indicate the end of a document independent of starting the next one. Without such a marker, the YAML processor reading the stream would be forced to wait for the header of the next document (that may be a long time in coming) in order to detect the end of the previous document.

To support this scenario, a YAML document may be terminated by a '...' line. Nothing but throwaway comments may appear between this line and the (mandatory) header line of the following document.

[049] document_header ::= document_start
( flow_space+ directive )*
/* YAML document header */
[050] document_start ::= '-' '-' '-' /* YAML document start indicator */
[051] document_trailer ::= document_end
any_break
comment_line(any)*
/* YAML document trailer */
[052] document_end ::= '.' '.' '.' /* YAML document end indicator */

--- >
This YAML stream contains a single text value.
The next stream is a log file - a sequence of
log entries. Adding an entry to the log is a
simple matter of appending it at the end.

---
at: 2001-08-12 09:25:00.00 Z
type: GET
HTTP: '1.0'
url: '/index.html'
---
at: 2001-08-12 09:25:10.00 Z
type: GET
HTTP: '1.0'
url: '/toc.html'

# This stream is an example of a top-level mapping.
invoice : 34843
date    : 2001-01-23
total   : 4443.52

# The following is a stream of three documents. The first is an empty
# mapping, the second an empty sequence, and the last an empty string.
--- {}
--- [ ]
--- ''

# A communication channel based on a YAML stream.
---
sent at: 2002-06-06 11:46:25.10 Z
payload: Whatever
# Receiver can process this as soon as the following is sent:
...
# Even if the next message is sent long after:
---
sent at: 2002-06-06 12:05:53.47 Z
payload: Whatever
...
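As a rough illustration of how the '---' and '...' lines delimit documents, the following Python sketch splits a decoded stream into its documents. It is a simplified, line-based approximation under stated assumptions: the name `split_documents` is hypothetical, and the sketch does not parse node content, so it would misread a top-level block scalar whose text happens to look like a header or trailer line.

```python
def split_documents(text):
    """Split a YAML stream into documents using '---' and '...' lines."""
    documents = []      # finished documents, each a list of lines
    current = None      # lines of the document being read, or None
    terminated = False  # True after a '...' line, until the next header
    for line in text.splitlines():
        if line == "---" or line.startswith("--- "):
            # A header line starts a new document and ends the previous one.
            if current is not None:
                documents.append(current)
            current = [line]
            terminated = False
        elif line.rstrip() == "...":
            # A trailer line ends the document without starting another.
            if current is not None:
                documents.append(current)
                current = None
            terminated = True
        elif current is not None:
            current.append(line)
        elif line.strip() and not line.lstrip().startswith("#"):
            if terminated:
                # Only comments may appear between '...' and the next header.
                raise ValueError("content found where a '---' header is required")
            # Content before any header: the implicit first document.
            current = [line]
    if current is not None:
        documents.append(current)
    return ["\n".join(doc) for doc in documents]
```

A receiver built along these lines can hand over a document as soon as its '...' line arrives, rather than waiting for the next '---' header.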

3.3.2 Directive

Directives are instructions to the YAML parser. Like throwaway comments, directives are not reflected in the data serialized in the stream. Directives apply to a single document. It is an error for the same directive to be specified more than once for the same document.

[053] directive ::= throwaway_indicator
directive_name
mapping_entry_separator
directive_value
/* document directive */
[054] directive_name ::= word_char+ /* document directive name */
[055] directive_value ::= flow_non_space+ /* document directive value */

This version of YAML defines a single directive, #YAML. Additional directives may be added in future versions of YAML. A parser should ignore unknown directives with an appropriate warning. There is no provision for specifying private directives. This is intentional.

The #YAML directive specifies the version of YAML the document adheres to. This specification defines version 1.0.

A version 1.0 parser should accept documents with an explicit #YAML:1.0 directive, as well as documents lacking a #YAML directive. Documents with a directive specifying a higher minor version (e.g. #YAML:1.1) should be processed with an appropriate warning. Documents with a directive specifying a higher major version (e.g. #YAML:2.0) should be rejected with an appropriate error message.
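The directive rules above can be sketched as follows. This is an illustrative reading rather than a normative algorithm: the function names `read_directives` and `check_version` are hypothetical, the character class used for directive names is an assumption standing in for word_char, and the treatment of versions lower than 1.0 is left open because the text does not address it.

```python
import re
import warnings

# Assumed shape of a directive token: '#', a name, ':' and a value.
DIRECTIVE = re.compile(r"#([A-Za-z0-9_-]+):(\S+)")


def read_directives(header):
    """Collect the directives that follow '---' on a document header line."""
    if header != "---" and not header.startswith("--- "):
        raise ValueError("not a document header line")
    directives = {}
    for token in header.split()[1:]:
        match = DIRECTIVE.fullmatch(token)
        if match is None:
            break  # first token that is not a directive ends the list
        name, value = match.groups()
        if name in directives:
            raise ValueError(f"directive #{name} specified more than once")
        directives[name] = value
    return directives


def check_version(directives):
    """Apply the version 1.0 acceptance rules; return (major, minor)."""
    for name in directives:
        if name != "YAML":
            warnings.warn(f"ignoring unknown directive #{name}")
    # A document lacking a #YAML directive is treated as version 1.0.
    major, minor = (int(part) for part in directives.get("YAML", "1.0").split("."))
    if major > 1:
        raise ValueError(f"unsupported YAML version {major}.{minor}")
    if major == 1 and minor > 0:
        warnings.warn(f"processing YAML {major}.{minor} document as version 1.0")
    return major, minor
```

For example, `check_version(read_directives("--- #YAML:1.0"))` accepts the document silently, while a header carrying `#YAML:2.0` is rejected with an error.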

3.3.3 Text Node

A text node begins at a particular level of indentation, n, and its content is indented at some level >n. A text node can be a collection (mapping or sequence), a scalar (block or flow) or an alias.

A YAML document is a normal node. However, a document cannot be an alias (there is nothing it could refer to). Also, if the header line is omitted, the first document must be a block (not flow) collection.

[056] top_value_node(n) ::= top_alias_node
| top_collect_node(n)
| top_scalar_node(n)
/* value node outside flow collection */
[057] flow_value_node(n) ::= alias
| flow_collect_node(n)
| flow_scalar_value_node(n)
/* value node inside flow collection */
[058] top_key_node(n) ::= ( top_key_indicator
  top_value_node(>n)
  indent(n) )
| ( flow_key_node(n)
  flow_space* )
/* key node outside flow collection */
[059] flow_key_node(n) ::= alias
| flow_collect_node(n)
| flow_scalar_key_node(n)
/* key node inside flow collection */
[060] top_alias_node ::= flow_space+
alias
comment_break
comment_line(any)*
/* alias node outside flow collection */
[061] top_collect_node(n) ::= blk_collect_node(n)
| ( flow_space+
  flow_collect_node(n)
  comment_break
  comment_line(any)* )
/* collection node outside flow collection */
[062] blk_collect_node(n) ::= ( flow_space+
  collect_properties )?
comment_break
comment_line(any)*
blk_collection(n)
/* collection node in block style */
[063] flow_collect_node(n) ::= ( collect_properties
  flow_space+ )?
flow_collection(n)
/* collection node inside flow collection */
[064] top_scalar_node(n) ::= blk_scalar_node(n)
| ( flow_space+
  top_scalar_value_node(n)
  comment_break
  comment_line(any)* )
/* scalar node outside flow collection */
[065] blk_scalar_node(n) ::= ( flow_space+
  scalar_properties )?
flow_space+
blk_scalar(n)
/* scalar node in block style */
[066] top_scalar_value_node(n) ::= ( scalar_properties
  flow_space+ )?
top_scalar_value(n)
/* scalar node using flow style outside flow collection */
[067] flow_scalar_value_node(n) ::= ( scalar_properties
  flow_space+ )?
flow_scalar_value(n)
/* scalar value node inside flow collection */
[068] flow_scalar_key_node(n) ::= ( scalar_properties
  flow_space+ )?
flow_scalar_key(n)
/* scalar key node inside flow collection */

3.3.4 Node Property

Each text node may have anchor and transfer method properties. These properties are specified in a properties list appearing before the node value itself. For a top-level node (a document), the properties appear in the document header line, following the directives (if any). It is an error for the same property to be specified more than once for the same node.

[069] collect_properties ::= ( collect_transfer
  ( flow_space+
    anchor_property )? )
| ( anchor_property
  ( flow_space+
    collect_transfer )? )
/* collection properties list */
[070] scalar_properties ::= ( scalar_transfer
  ( flow_space+
    anchor_property )? )
| ( anchor_property
  ( flow_space+
    scalar_transfer )? )
/* scalar properties list */

3.3.5 Transfer Method

The transfer method property specifies how to load the associated node. It includes the type family for the node and, for global scalar type families, an optional specific format, separated from the type family by a '#' character.

Like throwaway comments and directives, formats are not reflected in the data serialized in the stream. In contrast, the type family is considered to be part of this data.

Explicit/Implicit

Providing an explicit transfer property for a node prevents implicit typing. However, an explicit empty transfer method property can be used to force implicit typing to be applied to a node. If either an empty explicit format or no explicit format is given, the loader automatically detects the format.

implicit integer type family: 12
also implicit integer family: ! "12"
explicit integer, implicit format: !int 12
also implicit format: !int# 0x12
explicit format: !int#dec 0x12

Shorthands

YAML makes use of the taguri: scheme for defining URIs for its global type families and the x-private: scheme for its private type families. While these schemes provide the necessary semantics for identifying type families, they are rather verbose.

To increase readability, YAML does not use the full URI notation in the stream. Instead, it provides several shorthand notations for different groups of type family URIs. A parser may choose not to expand shorthand type family names to URIs. However, in such a case the parser must still perform escaping to ensure a single unique representation of each type family name.

  • If the type family begins with a '!' character, it is taken to be a private type family whose URI is under the x-private: scheme. URI fragments are allowed, but their meaning is determined entirely by the private type. In particular, they may or may not indicate a format.

# Both examples below make use of the 'x-private:ball'
# type family URI, but with different semantics.
---
pool: !!ball { number: 8 }
---
bearing: !!ball { material: steel }

  • If the type family contains no ':' and no '/' characters it is assumed to be defined under the yaml.org domain. This domain is used to define the core and language-independent YAML data types.

# The URI is 'taguri:yaml.org,2002:str'
- !str a Unicode string

  • Otherwise, if the type family begins with a single word, followed by a '/' character, it is assumed to belong to a sub-domain of yaml.org.

    Each domain language.yaml.org will include all globally unique types of the language that aren't covered by the set of language-independent types. Globally unique types for each language include any built-in types and any standard library types. For languages such as Java and C#, all type names based on reverse DNS strings are globally unique. For languages such as Perl, which have a central authority (CPAN) for managing the global namespace, all the types sanctioned by the central authority are globally unique. The list of supported languages and their types is maintained as part of the YAML type repository.

# The URI is 'taguri:perl.yaml.org,2002:Text::Tabs'
- !perl/Text::Tabs {}

  • Otherwise, the type family must begin with a domain name and a date (separated by a ',' character), followed by a '/' character. In this case it is taken to be defined under the specified domain and date.

# The URI is 'taguri:clarkevans.com,2003-02:timesheet'
- !clarkevans.com,2003-02/timesheet

Type families defined in the yaml.org domain or any of its sub-domains must be defined using the appropriate specialized shorthand rather than using the generic domain syntax. This ensures each type family has a unique representation as a shorthand, in addition to having a unique representation as a URI.
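The four shorthand rules above might be expanded to full URIs along the following lines. This is a sketch under stated assumptions, not a reference implementation: the function name `expand_shorthand` is hypothetical, its argument is the type family text with the leading transfer indicator and any format or '^' separator already removed, the 2002 date for yaml.org is taken from the example URIs above, and the character class used for a "single word" is a guess.

```python
import re

YAML_DOMAIN = "yaml.org"
YAML_DATE = "2002"  # date that appears in the example URIs above


def expand_shorthand(family: str) -> str:
    """Expand a shorthand type family name into its full URI."""
    # Rule 1: a leading '!' marks a private type family.
    if family.startswith("!"):
        return "x-private:" + family[1:]
    # Rule 2: no ':' and no '/' means a core yaml.org type.
    if ":" not in family and "/" not in family:
        return f"taguri:{YAML_DOMAIN},{YAML_DATE}:{family}"
    # Rule 3: a single word before '/' names a yaml.org sub-domain.
    match = re.fullmatch(r"([A-Za-z0-9_-]+)/(.+)", family)
    if match:
        language, name = match.groups()
        return f"taguri:{language}.{YAML_DOMAIN},{YAML_DATE}:{name}"
    # Rule 4: an explicit 'domain,date/' prefix.
    match = re.fullmatch(r"([^,/]+),([^/]+)/(.+)", family)
    if match is None:
        raise ValueError(f"malformed type family: {family!r}")
    domain, date, name = match.groups()
    if domain == YAML_DOMAIN or domain.endswith("." + YAML_DOMAIN):
        # yaml.org families must use the specialized shorthand instead.
        raise ValueError(f"use the yaml.org shorthand for {family!r}")
    return f"taguri:{domain},{date}:{name}"
```

Applied to the examples above, `expand_shorthand("perl/Text::Tabs")` gives `taguri:perl.yaml.org,2002:Text::Tabs`. Rejecting the generic syntax for yaml.org names is what keeps each type family down to a single shorthand form.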

Escaping

YAML allows non-printable Unicode characters to be used in a transfer method using escape sequences.

# The following values have the same type family.
- !domain.tld,2002/type\x30 value
- !domain.tld,2002/type0 value

Expanding

Sometimes it may be helpful for a YAML type family or transfer method to be expanded to a full URI. A YAML processor may provide a mechanism to perform such expansion. Since URIs support a limited ASCII-based character set, this expansion requires all characters outside this set to be encoded in UTF-8 and the resulting bytes to be encoded using % notation.

When an explicit % character appears in a transfer method, it is passed to the URI form unchanged, allowing explicit % escapes to be used in the transfer method where necessary. It is an error for a transfer method not to have a valid expanded URI format (e.g., contain an invalid explicit % escape sequence).
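One way to picture this expansion step is the short Python sketch below. It is an assumption-laden illustration: the name `to_uri_form` is hypothetical, and the exact set of characters left unencoded (here the URI reserved punctuation plus '%') is a guess rather than something this section defines.

```python
import re
from urllib.parse import quote

# Characters passed through unchanged. Keeping '%' in this set is what
# lets explicit % escapes survive the expansion. (Assumed set.)
SAFE = "%:/?#[]@!$&'()*+,;="


def to_uri_form(name: str) -> str:
    """Percent-encode a transfer method name for use in a URI."""
    # An explicit '%' must be followed by exactly two hex digits.
    if re.search(r"%(?![0-9A-Fa-f]{2})", name):
        raise ValueError("invalid explicit % escape in transfer method")
    # quote() encodes other characters as UTF-8 bytes in % notation.
    return quote(name, safe=SAFE)
```

Since '%' is never re-encoded, a name written with an explicit `%30` keeps that escape in its URI form and therefore stays distinct from the same name written with a literal `0`.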

# The following are different as far as YAML is concerned.
- !domain.tld,2002/type%30 value
- !domain.tld,2002/type0 value

Prefixing

YAML provides a convenient shorthand for the common case where a node and (most of) its descendants have global type families whose shorthand forms share a common prefix. For this case, YAML allows using the '^' character to separate the ancestor node's type family into a prefix and a suffix. The parser does not consider the separator to be part of the type family name.

When the parser encounters a descendant node whose type family name begins with '^', it prepends the ancestor node's prefix to it. Again, the '^' character is not taken to be part of the name.

It is possible for a descendant node to establish a different prefix. In this case, the node may not make use of its ancestor node's prefix. It must specify a full type family name, separated into a prefix