YAML Ain't Markup Language (YAML) (tm) 1.0

Working Draft 01 Sep 2002

Latest version:
http://yaml.org/spec/
Editors:
Oren Ben-Kiki (mailto:oren@ben-kiki.org), Clark C. Evans, Brian Ingerson (mailto:ingy@ttul.org)

Status of this Document

This specification is a working draft and reflects consensus reached by the members of the yaml-core mailing list. Any questions regarding this draft should be raised on this list at http://lists.sourceforge.net/lists/listinfo/yaml-core.

With this release of the YAML specification, we now encourage development of YAML processors, both to validate the design of YAML and to support early adoption. The specification is still subject to change; however, such changes will be limited to polish and to fixing logical flaws and bugs. Changes to the special keys area may also occur pending work on complementary specifications, but special keys are a rather isolated aspect of the specification.

Therefore, this is "Last Call" for changes; if you have a pet feature, now is the very last time it can be proposed before Release Candidate status. Changes that would cause "Last Call" YAML texts to be invalid will be seriously considered only if absolutely necessary.

Abstract

YAML(tm) (rhymes with "camel") is a straightforward machine parsable data serialization format designed for human readability and interaction with scripting languages such as Perl and Python. YAML is optimized for data serialization, formatted dumping, configuration files, log files, Internet messaging and filtering. This specification describes the YAML information model and serialization format. Together with the Unicode standard for characters, it provides all the information necessary to understand YAML Version 1.0 and construct programs to process it.

Table of Contents

1 Introduction
   1.1 Goals
   1.2 Prior Art
   1.3 Relation to XML
   1.4 Terminology

2 Preview
   2.1 Collections
   2.2 Structures
   2.3 Scalars
   2.4 Type Family
   2.5 Full Length Example

3 Information Models
   3.1 Native Model
      3.1.1 Native Node
      3.1.2 Type Family
      3.1.3 Equivalence
      3.1.4 Documents Stream
   3.2 Generic Model
      3.2.1 Formats
      3.2.2 Type Family Formats
      3.2.3 Node Format
   3.3 Serial Model
      3.3.1 Serial Node
      3.3.2 Alias
      3.3.3 Pair
      3.3.4 Serial Mapping
      3.3.5 Ordering
   3.4 Syntax Model
      3.4.1 Style
      3.4.2 Comment
      3.4.3 Directive

4 Serialization Syntax
   4.1 Characters
      4.1.1 Character Set
      4.1.2 Encoding
      4.1.3 Indicators
      4.1.4 Line Breaks
      4.1.5 Miscellaneous
   4.2 Space Processing
      4.2.1 Indentation
      4.2.2 Throwaway comments
   4.3 YAML Stream
      4.3.1 Document
      4.3.2 Directive
      4.3.3 Serialization Node
      4.3.4 Node Property
      4.3.5 Transfer Method
      4.3.6 Anchor
   4.4 Alias
   4.5 Collection
      4.5.1 Sequence
      4.5.2 Mapping
   4.6 Scalar
      4.6.1 End Of Line Normalization
      4.6.2 Block Modifiers
      4.6.3 Explicit Indentation
      4.6.4 Chomping
      4.6.5 Literal
      4.6.6 Folding
      4.6.7 Folded
      4.6.8 Single Quoted
      4.6.9 Escaping
      4.6.10 Double Quoted
      4.6.11 Plain

5 Transfer Methods
   5.1 Sequence
   5.2 Mapping
   5.3 String
   5.4 Null
   5.5 Boolean
   5.6 Integer
   5.7 Float
   5.8 Time
   5.9 Binary
   5.10 Special Keys

6 Change History

1 Introduction

YAML Ain't Markup Language, abbreviated YAML, is a human-readable data serialization format and processing model. This text describes the class of data objects called YAML document streams and partially describes the behavior of computer programs that process them.

YAML document streams encode into a serialized form the native data constructs of modern scripting languages. Strings, arrays, hashes, and other user-defined data types are supported. A YAML document stream consists of a sequence of characters, some of which are considered part of the document's content, and others that are used to indicate structure within the information stream.

A YAML processor is a software module that is used to manipulate YAML information. A processor may perform multiple functions, such as parsing a YAML serialization into a sequence of events, loading these events into a native language representation, dumping a native representation into a sequence of events, and emitting these events into a serialized form. It is assumed that a YAML processor does its work on behalf of another module, called an application. This specification describes the required behavior of a YAML processor. It describes how a YAML processor must read or write YAML document streams and the information structures it must provide to or obtain from the application.

1.1 Goals

The design goals for YAML are:

  1. YAML documents are very readable by humans.

  2. YAML interacts well with scripting languages.

  3. YAML uses host languages' native data structures.

  4. YAML has a consistent information model.

  5. YAML enables stream-based processing.

  6. YAML is expressive and extensible.

  7. YAML is easy to implement.

YAML was designed with experience gained from the construction and deployment of Brian Ingerson's Perl module Data::Denter. YAML's initial direction was set by the markup language discussions among SML-DEV members. Since then YAML has matured through the support and encouragement it has received from its user community.

1.2 Prior Art

YAML integrates and builds upon structures and concepts described by C, Java, Perl, Python, RFC0822 (MAIL), RFC1866 (HTML), RFC2045 (MIME), RFC2396 (URI), SAX, SOAP and XML.

YAML's core type system is based on the serialization requirements of Perl. YAML directly supports both scalar values (string, integer) and collections (array, hash). Support for common types enables programmers to use their language's native data constructs for YAML manipulation, instead of requiring a special document object model (DOM).

Like XML's SOAP, the YAML serialization supports native graph structures through a rich alias mechanism. Also like SOAP, YAML provides for application-defined types. This allows YAML to serialize rich data structures required for modern distributed computing. YAML provides unique global type names using a namespace mechanism inspired by Java's DNS based package naming convention and XML's URI based namespaces.

YAML's block scoping is similar to Python's. In YAML, the extent of a node is indicated by its column. YAML's literal scalar leverages this by enabling formatted text to be cleanly mixed within an indented structure without troublesome escaping. Further, YAML's block indenting provides for easy inspection of the document's structure.

Motivated by HTML's end-of-line normalization, YAML's folded scalar introduces a unique method of handling white space. In YAML, single line breaks may be folded into a single space, while empty lines represent line break characters. This technique allows for paragraphs to be word-wrapped without affecting the canonical form of the content.

YAML's double quoted scalar uses familiar C-style escape sequences. This enables ASCII representation of non-printable or 8-bit (ISO 8859-1) characters such as '\x3B'. 16-bit Unicode and 32-bit (ISO/IEC 10646) characters are supported with escape sequences such as '\u003B' and '\U0000003B'.

The syntax of YAML was motivated by Internet Mail (RFC0822) and remains partially compatible with this standard. Further, YAML borrows the idea of having multiple documents from MIME (RFC2045). YAML's top-level production is a stream of independent documents, which is ideal for message-based distributed processing systems.

YAML was designed to have an incremental interface that includes both a pull-style input stream interface and a push-style (SAX-like) output stream interface. Together these enable YAML to support the processing of large documents, such as a transaction log, or continuous streams, such as a feed from a production machine.

1.3 Relation to XML

There are many differences between YAML and the eXtensible Markup Language (XML). XML was designed to be backwards compatible with Standard Generalized Markup Language (SGML) and thus had many design constraints placed on it that YAML does not share. Also XML, inheriting SGML's legacy, is designed to support structured documents, where YAML is more closely targeted at messaging and native data structures. Where XML is a pioneer in many domains, YAML is the result of many lessons from the XML community.

The YAML and XML information models are starkly different. In XML, the primary construct is an attributed tree, where each element has an ordered, named list of children and an unordered mapping of names to strings. In YAML, the primary graph constructs are sequence (natively stored as an array), mapping (natively stored as a hash) and scalar values (string, integer, floating point). This difference is critical since YAML's model is directly supported by native data structures in most modern programming languages, where XML's model requires mapping conventions, or an alternative programming component (e.g. a document object model).

1.4 Terminology

The terminology used to describe YAML is defined in the body of this specification. The terms defined in the following list are used in building those definitions and in describing the actions of a YAML processor:

may   Conformant YAML streams and processors are permitted to but need not behave as described.
should   Conformant YAML texts and processors are encouraged to behave as described, but may do otherwise if a warning message is provided to the user and any deviant behavior requires conscious effort to enable (i.e. a non-default setting).
must   Conformant YAML texts and processors are required to behave as described, otherwise they are in error.
error   A violation of the rules of this specification; results are undefined. Conforming software must detect and report an error and may recover from it.

2 Preview

This section provides a quick glimpse into the expressive power of YAML. It is not expected that the first-time reader grok all of the examples. Rather, these selections are used as motivation for the remainder of the specification.

2.1 Collections

YAML's block collections use indentation for scope and begin each member on its own line. Block sequences indicate each member with a dash (-). Block mappings use a colon to mark each (key: value) pair.
- Mark McGwire
- Sammy Sosa
- Ken Griffey

Example A1. Sequence of scalars (ball players)

hr:  65
avg: 0.278
rbi: 147

Example A2. Mapping of scalars to scalars (player statistics)

american:
   - Boston Red Sox
   - Detroit Tigers
   - New York Yankees
national:
   - New York Mets
   - Chicago Cubs
   - Atlanta Braves

Example A3. Mapping of scalars to sequences (ball clubs in each league)

- 
  name: Mark McGwire
  hr:   65
  avg:  0.278
- 
  name: Sammy Sosa
  hr:   63
  avg:  0.288

Example A4. Sequence of mappings (players' statistics)


YAML also has in-line flow styles for compact notation. The flow sequence is written as a comma separated list within square brackets. In a similar manner, the flow mapping uses curly braces.
- [ name         , hr , avg   ]
- [ Mark McGwire , 65 , 0.278 ] 
- [ Sammy Sosa   , 63 , 0.288 ]

Example A5. Sequence of sequences

Mark McGwire: {hr: 65, avg: 0.278} 
Sammy Sosa:   {hr: 63,
               avg: 0.288}

Example A6. Mapping of mappings
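As a rough illustration of how the collections above map onto host-language data, the following Python sketch writes out by hand the structures a loader might produce for examples A3 and A4. The variable names are hypothetical, and the choice of dict, list, int and float is an assumption about one possible Python binding, not something this specification mandates.

```python
# Hand-written native equivalents of examples A3 and A4 (no YAML parser is used).

# A3: mapping of scalars to sequences
leagues = {
    "american": ["Boston Red Sox", "Detroit Tigers", "New York Yankees"],
    "national": ["New York Mets", "Chicago Cubs", "Atlanta Braves"],
}

# A4: sequence of mappings
players = [
    {"name": "Mark McGwire", "hr": 65, "avg": 0.278},
    {"name": "Sammy Sosa", "hr": 63, "avg": 0.288},
]

# Ordinary language operations apply directly; no document object model is needed.
top_hitter = max(players, key=lambda player: player["hr"])["name"]
```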

2.2 Structures

YAML uses three dashes (---) to separate documents within a file or stream. Comment lines begin with the pound sign (#). Repeated nodes are first marked with the ampersand (&) and then referenced with an asterisk (*) thereafter.
---
name: Mark McGwire
hr:   65
avg:  0.278
---
name: Sammy Sosa
hr:   63
avg:  0.288

Example B1. Two documents; one stream (players' statistics)

# Ranking of players by
# 1998 season home runs.
---
   - Mark McGwire
   - Sammy Sosa
   - Ken Griffey



Example B2. Document with leading comment

hr: # 1998 hr ranking
   - Mark McGwire 
   - Sammy Sosa 
rbi:
   # 1998 rbi ranking
   - Sammy Sosa
   - Ken Griffey

Example B3. Single document with two comments

hr:
   - Mark McGwire
   # Following node labeled SS
   - &SS Sammy Sosa
rbi:
   - *SS # Subsequent occurrence
   - Ken Griffey

Example B4. Node for Sammy Sosa appears twice in this document


The question mark indicates a complex key. Within a block sequence, mapping pairs can start immediately following the dash.
? # PLAY SCHEDULE
  - Detroit Tigers
  - Chicago Cubs
:  
  - 2001-07-23

? [ New York Yankees,
    Atlanta Braves ]
: [ 2001-07-02, 2001-08-12, 
    2001-08-14 ]

Example B5. Mapping between sequences

invoice: 34843
date   : 2001-01-23
bill-to: Chris Dumars
product:
   - item    : Super Hoop
     quantity: 1
   - item    : Basketball
     quantity: 4
   - item    : Big Shoes
     quantity: 1
        

Example B6. Sequence key shortcut

2.3 Scalars

Scalar values can be written in block form using the literal style (|), where all line breaks are preserved, or using the folded style (>) for content that can be word-wrapped. In the folded style, a line break is treated as a space unless it is part of a blank or indented line.

--- |
    \/|\/|
    / |  |_


Example C1. In literals, newlines are preserved

--- >
    Mark McGwire's
    year was crippled
    by a knee injury.

Example C2. In folded, newlines are treated as a space

--- >
 Sammy Sosa completed another
 fine season with great stats.

   63 Home Runs
   0.288 Batting Average

 What a year!
        

Example C3. Newlines preserved for indented and blank lines
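The folding rule can be sketched in Python as follows. This is a simplified illustration, not the normative algorithm: the function name fold is hypothetical, the input is assumed to have already been stripped of its block indentation, and the special handling of more-indented lines shown in C3 is deliberately left out.

```python
import re


def fold(text: str) -> str:
    """Fold single line breaks into spaces; keep the breaks implied by empty lines."""
    # Set a final line break aside so that it is not folded into a trailing space.
    body = text[:-1] if text.endswith("\n") else text
    tail = "\n" if text.endswith("\n") else ""

    def replace_run(match):
        breaks = len(match.group())
        # One break joins two lines with a space; a run of n breaks contains
        # n - 1 empty lines, each of which stands for one line break.
        return " " if breaks == 1 else "\n" * (breaks - 1)

    return re.sub(r"\n+", replace_run, body) + tail
```

Applied to the content of C2, this sketch joins the three lines into one sentence ending in a single line break.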

name: Mark McGwire
accomplishment: >
   Mark set a major league 
   home run record in 1998.
stats: |
   65 Home Runs
   0.278 Batting Average


Example C4. Indentation determines scope


YAML's flow scalars include the plain style (most examples thus far) and quoted styles. The double quoted style provides escape sequences. Single quoted style is useful when escaping is not needed. All flow scalars can span multiple lines; intermediate whitespace is trimmed to a single space.
unicode: "Sosa did fine.\u263A"
control: "\b1998\t1999\t2000\n" 
hexesc:  "\x0D\x0A is \r\n"

single: '"Howdy!" he cried.'
quoted: ' # not a ''comment''.'
tie-fighter: '|\-*-/|'

Example C5. Quoted scalars
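To make the double quoted escapes concrete, here is a minimal Python sketch of a decoder. The function name unescape is hypothetical, and the set of escapes is an assumption limited to those that appear in this preview (\b, \t, \n, \r, \x, \u, \U) plus \\ and \" for completeness; the full list belongs to the Escaping section. Escaped line breaks are not handled.

```python
import re

# Single-character escapes assumed for this sketch.
_SIMPLE = {"n": "\n", "t": "\t", "r": "\r", "b": "\b", "\\": "\\", '"': '"'}

# A backslash followed by a 2-, 4- or 8-digit hex escape, or by any one character.
_ESCAPE = re.compile(
    r"\\(x[0-9A-Fa-f]{2}|u[0-9A-Fa-f]{4}|U[0-9A-Fa-f]{8}|.)", re.DOTALL
)


def unescape(body: str) -> str:
    """Decode the escapes in the content of a double quoted scalar."""

    def replace(match):
        escape = match.group(1)
        if escape[0] in "xuU" and len(escape) > 1:
            # Hexadecimal code point: \xXX, \uXXXX or \UXXXXXXXX.
            return chr(int(escape[1:], 16))
        if escape in _SIMPLE:
            return _SIMPLE[escape]
        raise ValueError("unsupported escape: \\" + escape)

    return _ESCAPE.sub(replace, body)
```

Under these assumptions, the escapes \x0D\x0A decode to the same two characters as \r\n.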

plain: This unquoted
       scalar spans
       many lines.
quoted: "\
  So does this quoted
  scalar.\n"

        

Example C6. Multiline flow scalars

2.4 Type Family

In YAML, plain (unquoted) scalars are given an implicit type depending on which regular expression they match. YAML recognizes integers, floating point values, timestamps, null, boolean, and string values.

canonical: 12345
decimal: +12,345
octal: 014
hexadecimal: 0xC


Example D1. Integers

canonical: 1.23015e+3
exponential: 12.3015e+02
fixed: 1,230.15
negative infinity: (-inf)
not a number: (NaN)

Example D2. Floating point

null: ~
true: +
false: -
string: '12345'

Example D3. Miscellaneous

canonical: 2001-12-15T02:59:43.1Z
iso8601:  2001-12-14t21:59:43.10-05:00
spaced:  2001-12-14 21:59:43.10 -05:00
date:   2002-12-14 # Time is noon UTC 

Example D4. Timestamps
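Implicit typing by regular expression can be sketched in Python as below. The patterns and the function name implicit_family are assumptions written only to cover the values shown in D1 through D3; they are not the normative expressions, and timestamps are omitted, so they fall through to the string family here.

```python
import re

# Hypothetical, simplified patterns; the normative ones belong to each type family.
_IMPLICIT = [
    ("null", re.compile(r"~")),
    ("bool", re.compile(r"[+-]")),
    ("int", re.compile(r"[-+]?(0x[0-9a-fA-F]+|0[0-7]+|[1-9][0-9,]*|0)")),
    ("float", re.compile(
        r"[-+]?[0-9][0-9,]*\.[0-9]*([eE][-+]?[0-9]+)?|\((-?inf|NaN)\)")),
]


def implicit_family(text: str) -> str:
    """Return the first family whose pattern matches the whole plain scalar."""
    for family, pattern in _IMPLICIT:
        if pattern.fullmatch(text):
            return family
    # Anything unrecognized is treated as a string.
    return "str"
```

Because each pattern must match the entire scalar, a value such as Mark McGwire matches none of them and stays a string.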


Explicit typing is denoted with the bang (!) symbol. Application types should include a domain name and may use the caret (^) to avoid repeating a common prefix.
---
not-date: !str 2002-04-28
picture: !binary|base64 |
 R0lGODlhDAAMAIQAAP//9/X
 17unp5WZmZgAAAOfn515eXv
 Pz7Y6OjuDg4J+fn5OTk6enp
 56enmleECcgggoBADs=

hmm: !somewhere.com,2002/type | 
 family above is short for
 taguri:somewhere.com,2002:type

Example D5. Various explicit families

--- !clarkevans.com,2002/graph/^shape
- !^circle
  center: &ORIGIN {x: 73, y: 129}
  radius: 7
- !^line # !clarkevans.com,2002/graph/line
  start: *ORIGIN
  finish: { x: 89, y: 102 }
- !^text
  start: *ORIGIN
  color: 0xFFEEBB
  value: Pretty vector drawing.

Example D6. Application specific family
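The two shorthands visible in D5 and D6 can be sketched in Python as follows. Both function names are hypothetical, and the rules are inferred only from those two examples (the text after the bang is passed in without the bang itself); the complete rules are given in the Transfer Method section and may differ in detail.

```python
def expand_caret(tag, prefix=""):
    """Return (expanded tag, prefix to apply to later nodes)."""
    if tag.startswith("^"):
        # "^line" reuses the prefix set by an enclosing node.
        return prefix + tag[1:], prefix
    if "^" in tag:
        # "a/b/^shape" names "a/b/shape" and sets the prefix "a/b/".
        head, tail = tag.split("^", 1)
        return head + tail, head
    return tag, prefix


def to_taguri(tag):
    """Rewrite "domain,date/identifier" as "taguri:domain,date:identifier"."""
    authority, identifier = tag.split("/", 1)
    return "taguri:" + authority + ":" + identifier
```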

2.5 Full Length Example

Below are two full-length examples of YAML. The first is a sample invoice; the second is a sample log file.

--- !clarkevans.com,2002/^invoice
invoice: 34843
date   : 2001-01-23
bill-to: &id001
    given  : Chris
    family : Dumars
    address:
        lines: |
            458 Walkman Dr.
            Suite #292
        city    : Royal Oak
        state   : MI
        postal  : 48046
ship-to: *id001
product:
    - sku         : BL394D
      quantity    : 4
      description : Basketball
      price       : 450.00
    - sku         : BL4438H
      quantity    : 1
      description : Super Hoop
      price       : 2392.00
tax  : 251.42
total: 4443.52
comments: >
    Late afternoon is best.
    Backup contact is Nancy
    Billsmer @ 338-4338.

Example E1. Invoice

---
Time: 2001-11-23 15:01:42 -05:00
User: ed
Warning: >
  This is an error message
  for the log file
---
Time: 2001-11-23 15:02:31 -05:00
User: ed
Warning: >
  A slightly different error
  message.
---
Date: 2001-11-23 15:03:17 -05:00
User: ed
Fatal: >
  Unknown variable "bar"
Stack:
  - file: TopClass.py
    line: 23
    code: |
      x = MoreObject("345\n")
  - file: MoreClass.py
    line: 58
    code: |-
      foo = bar




Example E2. Log file

3 Information Models

Each YAML file/stream is a series of disjoint directed graphs, each having a single root. YAML processing may be understood in terms of four interacting representations of the data: a serialization format, an event stream, a native binding and a generic view of this binding.

Translating YAML information between these representations are five processing components: a parser, a loader, a viewer, a dumper and an emitter. The parser extracts structured information from the input stream. The loader converts this information into an appropriate native structure. A viewer presents this native structure in a YAML-compatible way. A dumper converts this view into an event stream. An emitter converts this stream into YAML syntax.

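[Figure: YAML processing overview. Loading: SYNTAX (serialization format) -> Parser -> SERIAL (stream of tree events) -> Loader -> NATIVE (in-memory objects) -> Viewer -> GENERIC (uniform view). Saving: GENERIC -> Dumper -> SERIAL -> Emitter -> SYNTAX. The Application is drawn alongside the NATIVE and GENERIC representations.]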

For each one of the representations above, there is a corresponding information model. The native model is defined by the programming language used. The generic model provides a concrete uniform view of the native model. The serial model covers the one-pass view of this data. The syntax model covers the serialization format. Type information is moved between these representations using the type family and format constructs.

A processor need not expose the event stream (serial model) or uniform view (generic model) and may translate directly between a serialization and its native binding. However, such a direct translation should take place so that the native binding is constructed only from information available in the native model. In particular, information specific to the generic model (format), serial model (alias anchors and pair ordering) and syntax model (comments and styles) should not be used in the construction of a native binding. Exceptions to this guideline include editors that must operate on a direct image of the serialization format.

native model
The native model describes the structure of application, platform, and language specific information which can be represented as YAML. Information native to a specific environment is modeled by YAML as a graph, where nodes in the graph include atomic values called scalars, sequences of nodes, or mappings of nodes from one set to another.

The native model may be implemented by arbitrary native data structures of the programming language used. The only constraint on the native representation is that it preserve the information defined by the native model.

generic model
The generic model provides a view of native data structures in a way which is independent of particular platform, language, or application. This allows for the definition of a generic YAML API and corresponding tools that do not depend on any particular native representation.

Implementations of the generic model are, by necessity, specific to a particular programming language. Such implementations are constrained to only provide the information specified by the generic model.

serial model
The serial model flattens the graph structure into a hierarchy using alias nodes. An alias node is a surrogate used for subsequent occurrences of any kind of node. In this model, mappings are realized as an ordered set of node pairs.

The serial model is often implemented as an event stream, and is important for implementing one-pass operations on YAML data. Again, of necessity implementations are specific to a particular programming language, and are constrained to provide the information required by the serial model.

syntax model
The syntax model enhances the serial model with comments, styles and other serialization specific details. Serializations must comply with the syntax productions given in the following section.

Serialization is fully defined by this specification and hence any instance is independent of the particular programming language chosen. This allows the definition of generic YAML tools that may be applied independently of the programming language used, and provides a way to interchange data between applications implemented in different languages.
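The dumper and loader roles described in this section can be sketched in Python as a round trip through an event stream. The event names ("scalar", "seq-start" and so on) and the functions dump and load are hypothetical; the sketch skips the generic view, ignores aliases, and assumes mapping keys are scalars.

```python
def dump(node):
    """Dumper sketch: walk native data and yield a flat stream of events."""
    if isinstance(node, dict):
        yield ("map-start",)
        for key, value in node.items():
            yield from dump(key)
            yield from dump(value)
        yield ("map-end",)
    elif isinstance(node, list):
        yield ("seq-start",)
        for item in node:
            yield from dump(item)
        yield ("seq-end",)
    else:
        yield ("scalar", node)


def load(events):
    """Loader sketch: rebuild native data from the event stream in one pass."""
    stream = iter(events)

    def build(event):
        if event[0] == "scalar":
            return event[1]
        if event[0] == "seq-start":
            items = []
            for following in stream:
                if following[0] == "seq-end":
                    return items
                items.append(build(following))
        elif event[0] == "map-start":
            pairs = {}
            for following in stream:
                if following[0] == "map-end":
                    return pairs
                key = build(following)
                pairs[key] = build(next(stream))
        # Reached on an unknown event or a collection that is never closed.
        raise ValueError("malformed event stream")

    return build(next(stream))
```

Because both functions touch each event once and in order, neither needs the whole document in memory at the same time, which is the property the serial model is meant to provide.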

3.1 Native Model

The native model abstracts data structures of common programming languages. In the native model, any data is viewed as a directed graph of typed nodes. Nodes that are defined in terms of other nodes are collections and nodes that are defined independent of any other nodes are scalars.

YAML supports two kinds of collection nodes, mappings and sequences. The native model also defines when two different nodes have the same content and provides a definition of node identity.

3.1.1 Native Node

A native node is the building block of data structures. A native node stands for anything from a single integer to a complex data structure such as a complete VRML scene or SQL database. A native node has the following properties:

type family
Each native node is associated with a type family. This association may be implicit, based on the native data type of the node. Indirectly each node is also associated with a kind through its family.

value
A node has an associated value. This value must satisfy the constraints specified by the type family. In particular, the value of a collection (mappings and sequences) is given in terms of other nodes, while the value of a scalar is defined independent of any other node.

3.1.2 Type Family

The type family mechanism provides an abstraction of data types that is portable across languages and platforms. Each native binding may have zero or more native concrete types or class constructs that correspond to a given type family.

YAML supports both global and private type families. Global type families have consistent semantics across all YAML documents. Private type families should not be expected to maintain the same semantics in different documents, even if these appear in the same document stream.

name
Global type family names are URIs under the taguri: scheme. Private type family names are URIs under the x-private: scheme. See section 4.3.5 for further details. The taguri: scheme is described in http://www.taguri.org.

YAML only makes use of taguri: URIs that take the form taguri:domain,date:identifier. Specifically, it does not make use of taguri: URIs that are based on an E-mail address. Nor does it make use of URIs outside the taguri: scheme.

definition
A description of the particular category of information, independent of language and platform.

im/mutable
Each type family is either mutable or immutable. If a type family is mutable, it is possible to modify the value of a node of this type "in place". If a type family is immutable, it is impossible to do so; instead, modifications require the creation of a new, independent value of the same type family and using it instead.

To better understand this distinction, consider the following example:

C syntax: |
    struct Point { int x; int y; } p = { 1, 2 };
YAML syntax: !Point { x: 1, y: 2 }

It is impossible to modify the integer value 1. The only modification possible is constructing a new, unrelated integer value 3 and using this new value for the X coordinate. Performing this replacement would cause the point to change "in place" from { x: 1, y: 2 } to { x: 3, y: 2 }. Thus, in this example points are mutable but integers are not.

Typically collection type families are mutable and scalar type families are immutable, though exceptions are possible.

kind
Each type family must have a kind that is either a scalar or a collection. There are two kinds of collections, sequence and mapping. Usually the kind of each type family follows immediately from the definition (for example, integers are scalars while Point structures are mappings). In other cases, deciding on the kind requires a data modeling decision (for example, whether a date is thought of as a single scalar or as a mapping with independent sub-parts).

scalar
Scalar type families are the simplest. The value of a scalar node is defined in some mathematical terms, independent of any other nodes and type families.

sequence
The value of a sequence node is defined as an ordered set of nodes. Each sequence type family may impose additional constraints on these nodes. For example, it may require that they belong to particular type families.

mapping
The value of a mapping node is defined as a function from a domain to a range. Each mapping type family may impose additional constraints. For example, it may require a specific set of keys and that the value for each key must be of a particular type family.

domain
A domain is an unordered set of nodes, restricted such that no two nodes in the set may be equal. Nodes that are members of the domain are often called "keys".

range
A range is an unordered set of nodes without restrictions. Nodes that are members of the range are often called "values".

function
A function is a rule of correspondence from the domain onto the range such that there is a unique value in the range assigned to every key in the domain, and every value in the range is assigned to at least one key.

collection
It is possible to think of a sequence as a mapping using a special domain (all integer values between zero and some maximal value). A unified collection model is helpful both for theoretical analysis and in constructing practical YAML tools and APIs.
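The unified view of collections can be illustrated with a short Python sketch. The helper names are hypothetical, and Python's dict and == stand in for the mapping kind and for node equality.

```python
def sequence_as_mapping(sequence):
    """View a sequence as a mapping whose domain is 0 .. len(sequence) - 1."""
    return dict(enumerate(sequence))


def is_function(pairs):
    """Check that no two (key, value) pairs share an equal key."""
    keys = [key for key, _ in pairs]
    return len(keys) == len(set(keys))
```

The second helper reflects the domain restriction: a list of pairs describes a mapping only if every key occurs once.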

3.1.3 Equivalence

In most programming languages, there are two distinct manners in which variables can be equivalent.

identity
The first form of equivalence is by reference, where the two variables refer to the same memory address. We call this equivalence relation "identity".

equality
The second form of equivalence occurs when two nodes are different (have different memory addresses), but have the same content. We call this second form of equivalence "equality". It follows that when two nodes are identical they are also equal.

Equality is defined between scalar nodes and between collection nodes, as described below.

scalar equality
Two scalar nodes are equal if and only if they have the same type family and their values are the same under the type family's definition.

collection equality
Equality of collections is defined recursively. Two collection nodes are equal if and only if they have the same type family and for each key in the domain of one, there is a corresponding key in the domain of the other such that both keys are equal and their corresponding values are equal; here corresponding value refers to the unique node in the range of the collection assigned to the key by the collection's function.

For immutable type families, the distinction between equal and identical nodes is only of interest for efficiency reasons (reducing memory usage), and has no semantic significance. Hence for such type families a YAML processor may freely replace two equal but separate (non-identical) nodes with two occurrences of the same (identical) node, and vice versa.

For mutable type families, however, this distinction is an important part of the information model and a YAML processor is required to preserve node identity. It follows that if a YAML processor supports the handling of unknown type families, it must treat them as mutable (preserve node identity). In particular, a YAML processor cannot assume unknown scalar type families are immutable.
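The difference between identity and equality, and why it matters for mutable nodes, can be seen in a few lines of Python. Here the operators is and == are used as stand-ins for identity and equality, and the variable names are hypothetical.

```python
first = {"x": 1, "y": 2}
second = {"x": 1, "y": 2}   # equal content, but a separate node
alias = first               # the same node under another name

equal_but_distinct = (first == second) and (first is not second)
identical = alias is first

# A mutable node changed through one reference changes for every reference
# to it, which is why identity has to be preserved for mutable type families.
alias["x"] = 3
```

After the assignment, first shows the new value while second, which was merely equal, is unchanged.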

3.1.4 Documents Stream

A YAML stream is a sequence of disjoint graphs, each with a root node.

stream
A sequence of zero or more document root nodes.

document
A top-level node that is disjoint from all other root document nodes.

The term disjoint means that for any two nodes x and y, there does not exist a third node z that is reachable from both x and y. For any node x, x is reachable from y if and only if either x and y are identical, or y is a collection and there exists a node z in the domain or the range of y such that x is reachable from z.
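The reachability and disjointness definitions can be sketched in Python as follows. The function names are hypothetical, and as a stated simplification only list and dict objects are treated as nodes with identity, since Python may share immutable scalar objects freely.

```python
def reachable_collections(root):
    """Return the ids of all list and dict nodes reachable from root."""
    seen = {}
    pending = [root]
    while pending:
        node = pending.pop()
        if not isinstance(node, (list, dict)) or id(node) in seen:
            continue  # scalars are skipped; visited nodes stop cycles
        seen[id(node)] = node  # keep the node alive so its id stays valid
        if isinstance(node, dict):
            pending.extend(node.keys())    # the domain
            pending.extend(node.values())  # the range
        else:
            pending.extend(node)
    return set(seen)


def disjoint(x, y):
    """True if no collection node is reachable from both x and y."""
    return not (reachable_collections(x) & reachable_collections(y))
```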

3.2 Generic Model

The generic model provides a concrete uniform realization of the native model. This model allows the creation of generic YAML APIs and tools that can apply to arbitrary native data given appropriate viewer code. It is also possible to use the generic model as a guide for creating generic YAML data structures for processing arbitrary YAML data.

It is impossible to implement concrete generic APIs directly using the native model, because of the differences between the native data types that may be used to represent each type family. To overcome this problem, the generic model provides a view of the value that is independent of the native data type chosen, using the concept of a format.

3.2.1 Format

It may be possible to write a string value of a scalar in more than one way. For example, an integer value of 255 can also be written in hex as 0xFF. This distinction is covered by the concept of a format.

A format defines a way to write the values of a scalar type family as Unicode strings. Using formats allows generic YAML APIs to be implemented in terms of such strings and still allow handling of arbitrary native data.

name
Each format has a name used for explicit typing and for general identification. This name must comply with the format production, and must be unique within the type families it applies to.

definition
A description of the format as it applies to particular data values.

regexp
Regular expressions may be provided to allow implicit typing using the string format, or to enable the YAML processor to validate that a given value is indeed compliant with the string format.

Formats are an extension required by the generic model, and are not part of the native model. Hence, when constructing native data structures from YAML data, format need not be preserved. For example, a YAML integer node should be loaded to a native integer data type, discarding the information that the integer was serialized in hex format.

3.2.2 Type Family Formats

Each type family used for scalar nodes has associated formats. These formats can be separated into two groups, implicit formats and explicit formats. In addition, one of the formats is designated to be the type family's canonical format.

Type families used for collection nodes do not have any associated formats.

implicit formats
A set of zero or more formats used for implicit typing. Each format may only be used in a single type family for this purpose.

explicit formats
A set of zero or more formats used for explicit typing. It is possible for two type families to share the same explicit format, though this practice is discouraged.

canonical format
In addition to the above, each scalar type family must provide a canonical format. This must be one of the implicit or explicit formats, or a subset of one of these formats. The canonical format must provide exactly one unique string representation for each possible value of the scalar.

3.2.3 Node Format

In the generic model, each scalar node has an associated format that is one of those defined by the node's type family. Collection nodes do not have an associated format.

The value of generic scalar nodes is a Unicode string that is a representation of the appropriate native value using the node's format.

3.3 Serial Model

To allow for YAML to be communicated as a sequence of events, an ordered tree structure must be used instead of a graph. This section describes an extension to the generic model where the graph is flattened and ordered to provide a serial interface. The resulting tree-structured model imposes a linear ordering and uses several constructs that are not part of the generic model. Applications constructing a native binding from the serial model should not use these additional constructs and the imposed ordering for the preservation of important data.

3.3.1 Serial Node

To lay out graph nodes as a tree structure, a mechanism is needed to manage duplicate occurrences. This is solved using an additional node kind, alias. The first occurrence of a node is represented using a serial node of the appropriate kind. Subsequent occurrences of either a collection or a scalar are represented by an alias.

All nodes in the serial model have the following properties in addition to the properties defined in the generic model:

parent
The parent property gives access to the collection that holds the current serial node.

anchor
The anchor is a Unicode string that complies with the anchor production. The anchor is used to associate the first occurrence of a node with subsequent occurrences, via the alias serial node. This property is optional for scalar or collection nodes, provided that the scalar or collection represented does not occur more than once.

Note that when a serial node is converted to a generic node, the anchor, if any, is not converted. Likewise the parent property and the alias kind are not preserved as the node may participate in several collections.

3.3.2 Alias

The alias serial node represents subsequent occurrences of a scalar or collection in the serialization. Like all serial nodes, an alias node has a parent and an anchor property. An alias node also has one further property:

referent
The collection or scalar serial node that the alias references is the closest preceding serial node having the same anchor.

When an alias is converted into a generic node it becomes a subsequent occurrence of its referent's generic node.
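
The flattening of a graph into serial nodes can be sketched as follows. This is only an illustration under stated assumptions: serial nodes are represented as plain Python dicts, only collections are tracked for sharing (by object identity), and the anchor names "A001", "A002", ... are an arbitrary choice.

```python
import itertools


def to_serial(node, seen=None, ids=None):
    """Flatten a graph of lists, dicts and scalars into a tree of serial nodes.

    The first occurrence of a collection becomes a "seq" or "map" node;
    every later occurrence becomes an alias carrying the same anchor.
    """
    seen = {} if seen is None else seen              # id(native) -> serial node
    ids = itertools.count(1) if ids is None else ids
    if isinstance(node, (list, dict)):
        first = seen.get(id(node))
        if first is not None:
            # Second or later occurrence: anchor the first one on demand.
            if first["anchor"] is None:
                first["anchor"] = "A%03d" % next(ids)
            return {"kind": "alias", "anchor": first["anchor"]}
        kind = "seq" if isinstance(node, list) else "map"
        serial = {"kind": kind, "anchor": None, "value": None}
        seen[id(node)] = serial                      # register first so cycles work
        if kind == "seq":
            serial["value"] = [to_serial(x, seen, ids) for x in node]
        else:
            serial["value"] = [(to_serial(k, seen, ids), to_serial(v, seen, ids))
                               for k, v in node.items()]
        return serial
    return {"kind": "scalar", "anchor": None, "value": node}
```

Nodes that occur only once keep an anchor of None, reflecting that the anchor property is optional for them.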

3.3.3 Pair

A pair is an ordered set of two serial nodes. The first member of the set is the key and the second member of the set is the value.

3.3.4 Serial Mapping

Mapping serial nodes represent the first occurrence of a mapping in a given serialization. The value of mapping serial nodes is an ordered set of node pairs.

When a mapping serial node is converted into a generic node, three operations occur. The domain is constructed with the graph node for each key in its set of pairs. Likewise, the range is constructed with the graph node for each value in its set of pairs. Last, the function is constructed via association of key graph nodes to value graph nodes, as provided by the set of pairs. Note that the ordering of the pairs is explicitly not converted.

3.3.5 Ordering

When serializing a YAML graph, every serial node is put into a single linear sequence within a given document through the mapping pair ordering. With the composition of collections, this ordering becomes total. For any two nodes or aliases x and y, we say that x precedes y when any of the following holds:

  • x is the parent of y.

  • x and y are nodes within a sequence, and x appears before y.

  • x is a key and y is a value in a given pair.

  • x and y are nodes in two pairs within a mapping, and the pair containing x comes before the pair containing y.

  • There exists a node z such that x precedes z and z precedes y.
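
One way to realize this ordering is a pre-order walk of the serial tree. The sketch below assumes a hypothetical representation of serial nodes as tuples ("scalar", value), ("seq", [children]) and ("map", [(key, value), ...]); it is not a prescribed data structure.

```python
def linear_order(node, out=None):
    """Pre-order walk: a parent comes first, sequence entries follow in
    order, and in every pair the key comes before its value."""
    out = [] if out is None else out
    out.append(node)
    kind, value = node
    if kind == "seq":
        for child in value:
            linear_order(child, out)
    elif kind == "map":
        for key, val in value:
            linear_order(key, out)
            linear_order(val, out)
    return out


def precedes(order, x, y):
    """True when node x comes before node y in the linear order."""
    position = {id(n): i for i, n in enumerate(order)}
    return position[id(x)] < position[id(y)]
```

Because the walk visits every node exactly once, any two nodes are comparable, which is the total ordering the rules above describe.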

3.4 Syntax Model

To enhance readability, a YAML serialization extends the serial model with syntax styles, comments and directives. Although the parser may provide this information, applications should take care not to use these features to encode information found in a native binding.

3.4.1 Style

The serial node is extended with a style property that can have different values depending upon its kind.

scalar style
Scalar styles include two block styles and three flow styles. All but the double quoted style are limited to scalars having only printable characters.

collection style
There are two styles for each of the collection kinds, a block style and a flow style.

3.4.2 Comment

The syntax model allows optional comment blocks to be interleaved with the node blocks. Comment blocks may appear before or after any node block. A comment block can't appear inside a scalar node value.

comment
A comment is a sequence of zero or more Unicode characters complying with the comment productions.

3.4.3 Directive

Attached to each document is a document directive section.

directive section
A collection of directives to the parser where each member of the domain and range are scalar values matching the directive_name and directive_value productions.

4 Serialization Syntax

Following are the syntax productions for the YAML serialization.

4.1 Characters

Characters are the basis for a serialized version of a YAML document. Below is a general definition of a character followed by several characters that have specific meaning in particular contexts.

4.1.1 Character Set

Serialized YAML uses a subset of the Unicode character set. A YAML parser must accept all printable ASCII characters, the space, tab, line break, and all Unicode characters beyond 0x9F. A YAML emitter must only produce those characters accepted by the parser, but should also escape all non-printable Unicode characters if a character table is readily available.

[001] printable_char ::=
        #x9
      | #xA | #xD | #x85
      | [#x20-#x7E]
      | [#xA0-#xD7FF]
      | [#xE000-#xFFFD]
      | [#x10000-#x10FFFF]
      /* characters as defined by the Unicode standard, excluding most control characters and the surrogate blocks */

The range above explicitly excludes the surrogate block [#xD800-#xDFFF], DEL 0x7F, the C0 control block [#x0-#x1F], the C1 control block [#x80-#x9F], #xFFFE and #xFFFF. Note that in UTF-16, characters above #xFFFF are represented with a surrogate pair. DEL and characters in the C0 and C1 control block may be represented in a YAML serialization using escape sequences.
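
The printable_char production translates directly into a range check. The following Python sketch is one possible transcription; the function name is_printable is an assumption made for the example.

```python
# Inclusive code point ranges transcribed from production [001].
PRINTABLE_RANGES = (
    (0x9, 0x9), (0xA, 0xA), (0xD, 0xD), (0x85, 0x85),
    (0x20, 0x7E),
    (0xA0, 0xD7FF),
    (0xE000, 0xFFFD),
    (0x10000, 0x10FFFF),
)


def is_printable(ch):
    """True when the single character ch matches printable_char."""
    code_point = ord(ch)
    return any(low <= code_point <= high for low, high in PRINTABLE_RANGES)
```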

4.1.2 Encoding

A YAML processor is required to support the UTF-32, UTF-16 and UTF-8 character encodings. If an input stream does not begin with a byte order mark, the encoding shall be UTF-8. Otherwise the encoding shall be UTF-32 (LE or BE), UTF-16 (LE or BE) or UTF-8, as signaled by the byte order mark. Note that as YAML files may only contain printable characters, this does not raise any ambiguities. For more information about the byte order mark and the Unicode character encoding schemes see the Unicode FAQ.

[002] byte_order_mark ::= #xFEFF /* the Unicode ZERO WIDTH NON-BREAKING SPACE character used to mark a UTF-32 or UTF-16 stream and determine byte ordering */
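
A minimal sketch of the encoding detection described above, using the byte order mark constants from Python's standard codecs module. The function names are assumptions made for the example. UTF-32 is tested before UTF-16 because the little-endian UTF-32 mark begins with the little-endian UTF-16 mark; since a YAML stream contains only printable characters, a UTF-16 mark is never followed by #x0, so this order is safe.

```python
import codecs

# Longest marks first, so that UTF-32 LE is not mistaken for UTF-16 LE.
_BOMS = (
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
)


def detect_encoding(data):
    """Return (codec name, length of the byte order mark) for a byte stream."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return "utf-8", 0  # no byte order mark: the encoding is UTF-8


def decode_stream(data):
    """Decode a byte stream to text, dropping the byte order mark if present."""
    name, skip = detect_encoding(data)
    return data[skip:].decode(name)
```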

4.1.3 Indicators

Indicator characters.

Indicators are special characters that are used to describe the structure of a YAML document.

[003] sequence_entry_indicator ::= '-' /* indicates a sequence entry */
[004] mapping_entry_separator ::= ':' /* separates a key from its value */
[005] sequence_flow_start ::= '[' /* starts a flow sequence collection */
[006] sequence_flow_end ::= ']' /* ends a flow sequence collection */
[007] mapping_flow_start ::= '{' /* starts a flow mapping collection */
[008] mapping_flow_end ::= '}' /* ends a flow mapping collection */
[009] collect_line_separator ::= ',' /* separates flow collection entries */
[010] top_key_indicator ::= '?' /* indicates a complex key */
[011] alias_indicator ::= '*' /* indicates an alias node */
[012] anchor_indicator ::= '&' /* indicates an anchor property */
[013] transfer_indicator ::= '!' /* indicates a transfer method property */
[014] literal_indicator ::= '|' /* indicates a literal scalar */
[015] folded_indicator ::= '>' /* indicates a folded scalar */
[016] single_quote ::= ''' /* indicates a single quoted scalar */
[017] double_quote ::= '"' /* indicates a double quoted scalar */
[018] throwaway_indicator ::= '#' /* indicates a throwaway comment */
Indicator categories

Indicators can be grouped into two categories. The '-' , ':', ',', '?' and '#' space indicators are always followed by a white space character (space, tab or line break). If followed by any other character, they are taken to be normal content characters. The remaining indicators are taken to be indicators even if followed by a non-space character.

[019] space_indicators ::=
        sequence_entry_indicator
      | mapping_entry_separator
      | collect_line_separator
      | top_key_indicator
      | throwaway_indicator
      /* must be followed by white space */
[020] non_space_indicators ::=
        sequence_flow_start
      | sequence_flow_end
      | mapping_flow_start
      | mapping_flow_end
      | alias_indicator
      | anchor_indicator
      | transfer_indicator
      | literal_indicator
      | folded_indicator
      | single_quote
      | double_quote
      /* do not require a following white space */
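
The two indicator categories can be told apart with one character of lookahead. The sketch below is an illustration only: the function name is hypothetical, and treating the end of input as white space is an assumption that this section does not state.

```python
SPACE_INDICATORS = set("-:,?#")
NON_SPACE_INDICATORS = set("[]{}*&!|>'\"")
WHITE_SPACE = set(" \t\n\r\x85\u2028\u2029")  # space, tab and line breaks


def is_indicator(ch, following):
    """Decide whether ch acts as an indicator, given the next character.

    `following` is the next character, or "" at the end of the input.
    """
    if ch in NON_SPACE_INDICATORS:
        return True
    if ch in SPACE_INDICATORS:
        # Assumption: the end of the input counts as white space.
        return following == "" or following in WHITE_SPACE
    return False
```

For example, the '-' in "- item" is a sequence entry indicator, while the '-' in "-1" is ordinary content.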

4.1.4 Line Breaks

Line break characters

The Unicode standard defines the following line break characters.

[021] line_feed ::= #xA /* ASCII line feed (LF) */
[022] carriage_return ::= #xD /* ASCII carriage return (CR) */
[023] next_line ::= #x85 /* Unicode next line (NEL) */
[024] line_separator ::= #x2028 /* Unicode line separator (LS) */
[025] paragraph_separator ::= #x2029 /* Unicode paragraph separator (PS) */
[026] line_break_char ::=
        line_feed
      | carriage_return
      | next_line
      | line_separator
      | paragraph_separator
      /* line break characters */
Line break categories

Line breaks can be grouped into two categories. Specific line breaks have well-defined semantics for breaking text into lines and paragraphs. The semantics of generic line break characters is not defined beyond "ending a line".

Outside text content, YAML allows any line break to be used to terminate lines, and in most cases also allows such line breaks to be preceded by trailing comment characters. On output, a YAML emitter is free to emit non-content line breaks using whatever convention is most appropriate. An emitter should avoid emitting trailing line spaces.

[027] generic_break ::=
        ( carriage_return line_feed ) greedy
      | carriage_return
      | line_feed
      | next_line
      /* line break with non-specific semantics */
[028] specific_break ::= line_separator | paragraph_separator /* line break with specific semantics */
[029] any_break ::= generic_break | specific_break /* any non-content line break */
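
A minimal sketch of splitting text on any_break, assuming Python's re module. Listing the carriage return and line feed pair first makes the match greedy, so the pair counts as one break rather than two. The built-in str.splitlines is not used because it also splits on characters such as form feed, which are not YAML line breaks.

```python
import re

# CR LF is tried first so that the pair is consumed as a single break.
ANY_BREAK = re.compile("\r\n|[\r\n\x85\u2028\u2029]")


def split_lines(text):
    """Split text into lines at every generic or specific line break."""
    return ANY_BREAK.split(text)
```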

4.1.5 Miscellaneous

This section includes several common character range definitions.

[030] flow_char ::= printable_char - line_break_char /* characters valid in a line */
[031] flow_space ::= #x20 | #x9 /* white space valid in a line */
[032] flow_non_space ::= flow_char - flow_space /* non-space characters valid in a line */
[033] flow_non_ascii ::= flow_char - [#x00-#x7F] /* non-ASCII line characters */
[034] ascii_letter ::= [#x41-#x5A] | [#x61-#x7A] /* ASCII letters, A-Z or a-z */
[035] non_zero_digit ::= [#x31-#x39] /* 1-9 */
[036] decimal_digit ::= [#x30-#x39] /* 0-9 */
[037] hex_digit ::= decimal_digit | [#x41-#x46] | [#x61-#x66] /* 0-9, A-F or a-f */
[038] word_char ::= decimal_digit | ascii_letter | '-' /* characters valid in a word */

4.2 Space Processing

Serialized YAML uses text lines to convey structure. This requires special processing rules for white space (space and tab).

4.2.1 Indentation

In a YAML serialization, structure is determined from indentation, where indentation is defined as a line break character followed by zero or more space characters.

Tab characters are not allowed in indentation unless a #TAB directive is used. If such a directive is used, each indentation tab is equivalent to a certain number of spaces determined by the specified tab policy.

A node must be more indented than its parent node. All sibling nodes must use the exact same indentation level. However the content of each such node may be indented independently.

The indentation level is used exclusively to delineate structure. Indentation characters are otherwise ignored. In particular, they are never taken to be a part of the value of serialized text.

[039] indent(n) ::= #x20 x n /* specific level of indentation */
[040] indent(<n) ::= indent(m) /* for some specific m such that m < n */
[041] indent(<=n) ::= indent(m) /* for some specific m such that m <= n */

Since the YAML serialization depends upon indentation level to delineate blocks, additional productions are a function of an integer, based on the indent(n), indent(<n) and indent(<=n) productions above. In some cases the notation production(any) is used; it is a shorthand for "production(n) for some specific value of n".
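
As a rough illustration of the default policy, the following Python sketch measures the indentation level of a line and rejects tabs. The function name is hypothetical, and treating every tab in the leading white space as an indentation tab is a simplifying assumption.

```python
def indentation(line):
    """Return the number of leading spaces of a line (line break removed).

    Follows the default #TAB:NONE policy: a tab in the leading white
    space is reported as an error.
    """
    level = 0
    for ch in line:
        if ch == " ":
            level += 1
        elif ch == "\t":
            raise ValueError("tab in indentation; an explicit #TAB policy is required")
        else:
            break
    return level
```

A parser built on this would compare the returned level with that of the parent node (which must be smaller) and of sibling nodes (which must be equal).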

4.2.2 Throwaway comments

Throwaway comments have no effect whatsoever on the serial, generic, or native models represented in the file. Their usual purpose is to communicate between the human maintainers of the file. A typical example is comments in a configuration file.

A throwaway comment always spans to the end of a line. It consists of white space, optionally followed by one or more '#' indicators, a white space character, and arbitrary comment characters to the end of the line.

Outside text content, empty lines or lines containing only white space are taken to be implicit throwaway comment lines. Lines containing indentation followed by '#' and comment characters are taken to be explicit throwaway comment lines.

A throwaway comment may appear before a document node or following any node. A throwaway comment may not appear inside a scalar node, but may precede or follow it.

[042] throwaway_comment ::= throwaway_indicator+ ( flow_space flow_char* )? /* comment trailing a line */
[043] comment_line(n) ::= comment_empty_line(n) | comment_text_line(n) /* types of comment lines */
[044] comment_empty_line(n) ::= indent(<=n) any_break /* empty throwaway comment line */
[045] comment_text_line(n) ::= indent(<n) throwaway_comment any_break /* explicit throwaway comment line */
[046] comment_break ::= ( flow_space+ throwaway_comment? )? any_break /* trailing non-content spaces, comment and line break */
### These are three throwaway comment  ###

### lines (the second line is empty). ###
this: |   # Comments may trail lines.
    contains three lines of text.
    The third one starts with a
    # character. This isn't a comment.

# These are three throwaway comment
# lines (the first line is empty).
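
The distinction between implicit and explicit throwaway comment lines can be sketched as below. This is a simplification under stated assumptions: the function name is hypothetical, the line break has already been removed, indentation limits are not checked, and the caller must only pass lines that lie outside text content (the sketch cannot tell that a line belongs to a block scalar).

```python
def classify_line(line):
    """Classify a line outside text content as 'empty', 'comment' or 'content'."""
    stripped = line.lstrip(" \t")
    if stripped == "":
        return "empty"      # implicit throwaway comment line
    if stripped.startswith("#"):
        rest = stripped.lstrip("#")
        # One or more '#' must end the line or be followed by white space.
        if rest == "" or rest[0] in " \t":
            return "comment"  # explicit throwaway comment line
    return "content"
```

Note that a line such as "#YAML:1.0" is not a comment under this rule, because the '#' is followed by a non-space character.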

4.3 YAML Stream

A sequence of bytes is a YAML stream if, taken as a whole, it complies with the following production. Note that an empty stream is a valid YAML stream containing no documents.

Encoding is assumed to be UTF-8 unless explicitly specified by including a byte order mark as the first character of the stream. While a byte order mark may also appear before additional document headers, the same encoding must be used for all documents contained in a YAML stream.

[047] yaml_stream ::= implicit_document? explicit_document* /* YAML document stream */
[048] implicit_document ::= byte_order_mark? comment_line(any)* blk_collection(any) document_trailer? /* first document with an implicit header line */
[049] explicit_document ::= byte_order_mark? comment_line(any)* document_header ( top_scalar_node(any) | top_collect_node(any) ) document_trailer? /* stream document with an explicit header */

4.3.1 Document

A YAML stream may contain several independent YAML documents. A document header line is used to start a new document. This line must start with a document separator: '---' followed by a line break or a sequence of space characters.

If no explicit header line is specified at the start of the stream, the parser should behave as if a header line containing '--- #YAML:1.0 #TAB:NONE' was specified.

When YAML is used as the format for a communication stream, it is useful to be able to indicate the end of a document independent of starting the next one. Without such a marker, the YAML processor reading the stream would be forced to wait for the header of the next document (which may be a long time in coming) in order to detect the end of the previous document.

To support this scenario, a YAML document may be terminated by a '...' line. Nothing but throwaway comments may appear between this line and the (mandatory) header line of the following document.

[050] document_header ::= document_start ( flow_space+ directive )* /* YAML document header */
[051] document_start ::= '-' '-' '-' /* YAML document start indicator */
[052] document_trailer ::= document_end any_break comment_line(any)* /* YAML document trailer */
[053] document_end ::= '.' '.' '.' /* YAML document end indicator */
--- >
This YAML stream contains a single text value.
The next stream is a log file - a sequence of
log entries. Adding an entry to the log is a
simple matter of appending it at the end.
---
at: 2001-08-12 09:25:00.00 Z
type: GET
HTTP: '1.0'
url: '/index.html'
---
at: 2001-08-12 09:25:10.00 Z
type: GET
HTTP: '1.0'
url: '/toc.html'
# This stream is an example of a top-level mapping.
invoice : 34843
date    : 2001-01-23
total   : 4443.52
# The following is a sequence of three documents.
# The first contains an empty mapping, the second
# an empty sequence, and the last an empty string.
--- {}
--- [ ]
--- ''
# A communication channel based on a YAML stream.
---
sent at: 2002-06-06 11:46:25.10 Z
payload: Whatever
# Receiver can process this as soon as the following is sent:
...
# Even if the next message is sent long after:
---
sent at: 2002-06-06 12:05:53.47 Z
payload: Whatever
...
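
The way '---' and '...' lines delimit documents, as in the streams above, can be sketched with a line-oriented splitter. This is a rough illustration rather than a parser: the function name and the (header, body lines) result shape are assumptions, the header text is returned unparsed, any line starting with '#' between documents is treated as a comment, and a '---' or '...' line occurring inside text content is not recognized as content.

```python
def split_documents(text):
    """Split a YAML stream into a list of (header, body_lines) tuples."""
    docs = []
    current = None      # [header, body_lines] of the document being read
    started = False     # True once any document has begun
    for line in text.splitlines():
        if line == "---" or line.startswith("--- "):
            if current is not None:
                docs.append(tuple(current))
            current, started = [line[3:].strip(), []], True
        elif line == "...":
            if current is not None:
                docs.append(tuple(current))   # end known without waiting
            current = None
        elif current is not None:
            current[1].append(line)
        elif line.strip() == "" or line.lstrip().startswith("#"):
            continue    # throwaway comment outside any document
        elif not started:
            current, started = ["", [line]], True   # implicit first document
        else:
            raise ValueError("document header '---' expected")
    if current is not None:
        docs.append(tuple(current))
    return docs
```

A streaming reader would hand over each document at the point where this sketch appends it to the list, which for a '...' line is before the next header arrives.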

4.3.2 Directive

Directives are instructions to the YAML parser. Like throwaway comments, directives are not reflected in the serial, generic or native models. Directives apply to a single document. It is an error for the same directive to be specified more than once for the same document.

[054] directive ::= throwaway_indicator directive_name mapping_entry_separator directive_value /* document directive */
[055] directive_name ::= word_char+ /* document directive name */
[056] directive_value ::= flow_non_space+ /* document directive value */

YAML defines two directives, #YAML and #TAB. Additional directives may be added in future versions of YAML. A parser should ignore unknown directives with an appropriate warning. There is no provision for specifying private directives. This is intentional.

#YAML

The #YAML directive specifies the version of YAML the document adheres to. This specification defines version 1.0.

A version 1.0 parser should accept documents with an explicit #YAML:1.0 directive, as well as documents lacking a #YAML directive. Documents with a directive specifying a higher minor version (e.g. #YAML:1.1) should be processed with an appropriate warning. Documents with a directive specifying a higher major version (e.g. #YAML:2.0) should be rejected with an appropriate error message.
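
The version rules above amount to a small check on the directive value. The following Python sketch is one way to express it; the function name and the `supported` parameter are assumptions made for the example, and versions with a lower major number, which this section does not discuss, are simply accepted.

```python
import warnings


def check_version(value, supported=(1, 0)):
    """Validate the value of a #YAML directive such as "1.0".

    A higher minor version produces a warning; a higher major version
    is rejected with an error.
    """
    major, minor = (int(part) for part in value.split("."))
    if major > supported[0]:
        raise ValueError("unsupported YAML version %s" % value)
    if major == supported[0] and minor > supported[1]:
        warnings.warn("document uses YAML %s; processing it as %d.%d"
                      % (value, supported[0], supported[1]))
    return major, minor
```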

#TAB

Since different systems treat tabs differently, portability problems are a concern. Therefore, the default tab policy of YAML is conservative: tabs are not allowed in indentation (#TAB:NONE). However, some users' editors make it difficult to avoid tabs. In this case, the #TAB directive is available so that the tab policy is explicitly provided to the YAML parser. Note that tab characters in text content are always valid and must be preserved by the parser, regardless of the tab policy used.

YAML supports the following tab policies:

#TAB:NONE (default policy)
This policy forbids the use of tabs in indentation. If such a tab character is detected, the parser must treat it as an error. The error message should refer to the need for providing an explicit tab policy for tabs to be used as indentation characters.

Many editors can be configured such that pressing the tab key is automatically converted to the insertion of an appropriate number of spaces into the edited file, and in general support convenient editing of indented blocks without making use of tab characters. Where possible, YAML editors should be configured to use this indentation policy, as it is the only truly portable one. The http://yaml.org/editors page contains instructions on configuring known editors to use this policy.

#TAB:N (for some positive integer N)
Tab characters in indentation are equivalent to the number of spaces that would bring the indentation level to the next multiple of N.

Almost every editor supports this type of policy, with #TAB:8 being the most common, followed by #TAB:4. Most editors also allow users to configure the value of N. Typically an editor providing this flexibility can also be configured to use the #TAB:NONE policy as described above.

When #TAB:N policy is used, the parser must expand indentation tabs to spaces accordingly. Each tab, when expanded to spaces, must not span beyond the indentation into the serialized text. While this is an error, parsers should recover from it with a warning, by assigning some of the spaces to the indentation and some to the serialized text.
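
Tab expansion under a #TAB:N policy can be sketched as follows. The function name is hypothetical. The sketch expands every tab in the leading white space and does not know the expected indentation level, so it does not detect the case where a tab spans beyond the indentation into the serialized text.

```python
def expand_indent(line, tab_width):
    """Return (indentation in columns, remaining text) under #TAB:N.

    Each tab advances the indentation to the next multiple of
    tab_width; each space advances it by one column.
    """
    column = 0
    for index, ch in enumerate(line):
        if ch == " ":
            column += 1
        elif ch == "\t":
            column += tab_width - (column % tab_width)
        else:
            return column, line[index:]
    return column, ""
```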

4.3.3 Serialization Node

A serialization node begins at a particular level of indentation, n, and its content is indented at some level >n. A serialization node can be a collection (mapping or sequence), a scalar (block or flow) or an alias.

A YAML document is a normal node. However a document can't be an alias (there is nothing it may refer to). Also if the header line is omitted the first document must be a block (not flow) collection.

[057] top_value_node(n) ::=
        top_alias_node
      | top_collect_node(n)
      | top_scalar_node(n)
      /* value node outside flow collection */
[058] flow_value_node(n) ::=
        alias
      | flow_collect_node(n)
      | flow_scalar_value_node(n)
      /* value node inside flow collection */
[059] top_key_node(n) ::=
        ( top_key_indicator top_value_node(>n) indent(n) )
      | ( flow_key_node(n) flow_space* )
      /* key node outside flow collection */
[060] flow_key_node(n) ::=
        alias
      | flow_collect_node(n)
      | flow_scalar_key_node(n)
      /* key node inside flow collection */
[061] top_alias_node ::= flow_space+ alias comment_break comment_line(any)* /* alias node outside flow collection */
[062] top_collect_node(n) ::=
        blk_collect_node(n)
      | ( flow_space+ flow_collect_node(n) comment_break comment_line(any)* )
      /* collection node outside flow collection */
[063] blk_collect_node(n) ::= ( flow_space+ collect_properties )? comment_break comment_line(any)* blk_collection(n) /* collection node in block style */
[064] flow_collect_node(n) ::= ( collect_properties flow_space+ )? flow_collection(n) /* collection node inside flow collection */
[065] top_scalar_node(n) ::=
        blk_scalar_node(n)
      | ( flow_space+ top_scalar_value_node(n) comment_break comment_line(any)* )
      /* scalar node outside flow collection */
[066] blk_scalar_node(n) ::= ( flow_space+ scalar_properties )? flow_space+ blk_scalar(n) /* scalar node in block style */
[067] top_scalar_value_node(n) ::= ( scalar_properties flow_space+ )? top_scalar_value(n) /* scalar node using flow style outside flow collection */
[068] flow_scalar_value_node(n) ::= ( scalar_properties flow_space+ )? flow_scalar_value(n) /* scalar value node inside flow collection */
[069] flow_scalar_key_node(n) ::= ( scalar_properties flow_space+ )? flow_scalar_key(n) /* scalar key node inside flow collection */

4.3.4 Node Property

Each serialization node may have anchor and transfer method properties. These properties are specified in a properties list appearing before the node value itself. For a top-level node (a document), the properties appear in the document header line, following the directives (if any). It is an error for the same property to be specified more than once for the same node.

[070] collect_properties ::=
        ( collect_transfer ( flow_space+ anchor_property )? )
      | ( anchor_property ( flow_space+ collect_transfer )? )
      /* collection properties list */
[071] scalar_properties ::=
        ( scalar_transfer ( flow_space+ anchor_property )? )
      | ( anchor_property ( flow_space+ scalar_transfer )? )
      /* scalar properties list */

4.3.5 Transfer Method

The transfer method property specifies how to deserialize the associated node. It includes the type family for the node and optionally the specific format used, separated by a '|' character.

Like throwaway comments and directives, formats are not reflected in the native model. Unlike them, however, formats are preserved in the serial and generic models. While type families are preserved in all the data models, the native model may do so by implicitly associating a specific type family with certain native types.

Explicit/Implicit

By providing an explicit transfer property to a node, implicit typing is prevented. However, an explicit empty transfer method property can be used to force implicit typing to be applied to a non-plain scalar value.

integer: 12
also int: ! "12"
string: !str 12
Shorthands

YAML makes use of the taguri: scheme for defining URIs for its global type families and the x-private: scheme for defining private type families. While these schemes provide the necessary semantics for identifying type families, they are rather verbose.

To increase readability, YAML does not use the full URI notation in the serialization. Instead, it provides several shorthand notations for different groups of type family URIs. A parser may choose not to expand shorthand type family names to URIs. However, in such a case the parser must still perform escaping to ensure a single unique representation of each type family name.

  • If the type family begins with a '!' character, it is taken to be a private type family whose URI is under the x-private: scheme.

# Both examples below make use of the 'x-private:ball'
# type family URI, but with different semantics.
---
pool: !!ball
   number: 8
   color: black
---
bearing: !!ball
   material: steel
  • If the type family contains no ':' and no '/' characters it is assumed to be defined under the yaml.org domain. This domain is used to define the core YAML data types.

# The URI is 'taguri:yaml.org,2002:str'
- !str a Unicode string
  • Otherwise, if the type family begins with a single word, followed by a '/' character, it is assumed to belong to a sub-domain of yaml.org.

    Each domain language.yaml.org will include all globally unique types of the language that aren't covered by the set of language-independent types. Globally unique types for each language include any built-in types and any standard library types. For languages such as Java and C#, all type names based on reverse DNS strings are globally unique. For languages such as Perl, which have a central authority (CPAN) for managing the global namespace, all the types sanctioned by the central authority are globally unique. The list of supported languages and their types is maintained as part of the YAML type repository.

# The URI is 'taguri:perl.yaml.org,2002:Text::Tabs'
- !perl/Text::Tabs {}
  • Otherwise, the type family must begin with a domain name and a date (separated by a ',' character), followed by a '/' character. In this case it is taken to be defined under the specified domain and date.

# The URI is 'taguri:clarkevans.com,2003-02:timesheet'
- !clarkevans.com,2003-02/timesheet

Type families defined in the yaml.org domain or any of its sub-domains must be defined using the appropriate specialized shorthand rather than using the generic domain syntax. This ensures each type family has a unique representation as a shorthand, in addition to having a unique representation as a URI.
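
The four shorthand rules can be sketched as a single expansion function. This is an illustration under stated assumptions: the function name is hypothetical, the input is the type family name after the leading '!' transfer indicator, and prefixing with '^', the '|' format separator and escaping are not handled here.

```python
import re

_WORD = r"[0-9A-Za-z-]+"
# A single word followed by '/': a sub-domain of yaml.org.
_LANGUAGE = re.compile(r"^(%s)/(.*)$" % _WORD, re.S)
# domain ',' date '/': a type family under an arbitrary domain.
_DOMAIN = re.compile(
    r"^(%s(?:\.%s)+,\d{4}(?:-\d{2}(?:-\d{2})?)?)/(.*)$" % (_WORD, _WORD), re.S)


def expand_shorthand(name):
    """Expand a shorthand type family name to its full URI."""
    if name == "":
        raise ValueError("an empty transfer method has no URI")
    if name.startswith("!"):
        return "x-private:" + name[1:]
    if ":" not in name and "/" not in name:
        return "taguri:yaml.org,2002:" + name
    match = _LANGUAGE.match(name)
    if match:
        return "taguri:%s.yaml.org,2002:%s" % match.groups()
    match = _DOMAIN.match(name)
    if match:
        return "taguri:%s:%s" % match.groups()
    raise ValueError("not a valid type family shorthand: %r" % name)
```

The order of the tests mirrors the order of the rules above, so each shorthand is matched by exactly one of them.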

Escaping

URIs support a limited ASCII-based character set. Hence, when parsing a type family name, the parser must convert any non-ASCII character to UTF-8 encoding, then use '%' style escaping to represent the resulting bytes.

In general, expanding '%' escaped characters may change the semantics of a URI. Hence the parser must accept such sequences and pass them unmodified to the application.

The parser must also accept YAML style escape sequences. These must be converted to '%' style escape sequences as described above even if the specified character is a valid printable ASCII URI character.

same:
  - !domain.tld,2002/type%30%10 value
  - !domain.tld,2002/type\0x30\n value
different: # As far as the YAML parser is concerned
  - !domain.tld,2002/type0%10 value
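
The conversion of non-ASCII characters to '%' escapes can be sketched as follows. The function name is hypothetical. The sketch covers only the UTF-8 step: existing '%' sequences are passed through unmodified, and YAML style escape sequences are not handled because their productions are defined elsewhere in the specification.

```python
def escape_family_name(name):
    """Replace each non-ASCII character by the '%XX' escapes of its UTF-8 bytes."""
    out = []
    for ch in name:
        if ord(ch) > 0x7F:
            out.extend("%%%02X" % byte for byte in ch.encode("utf-8"))
        else:
            out.append(ch)  # ASCII, including any existing '%XX' sequence
    return "".join(out)
```
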
Prefixing

YAML provides a convenient shorthand for the common case where a node and (most of) its descendants have global type families whose URIs share a common prefix. For this case, YAML allows using the '^' character to separate the ancestor node's type family into a prefix and a suffix. The parser does not consider the separator to be part of the type family name.

When the parser encounters a descendant node whose type family name begins with '^', it prepends the ancestor node's prefix to the rest of the name. Again the '^' character is not taken to be part of the name.

It is possible for a descendant node to establish a different prefix. In this case the node may not make use of its ancestor's node prefix. It must specify a full type family name, separated into a prefix and suffix as above.

It is an error for a node's type family name to begin with '^' unless it has an ancestor node establishing a prefix. However, a node may establish a prefix even if none of its descendants make use of it.

Note that the type prefix mechanism is purely syntactical and does not imply any additional semantics. In particular, the prefix must not be assumed to be an identifier for anything.

# 'taguri:domain.tld,2002:invoice' is some type family.
invoice: !domain.tld,2002/^invoice
  # 'seq' is shorthand for 'taguri:yaml.org,2002:seq'.
  # This does not affect '^customer' below
  # because it does not specify a prefix.
  customers: !seq
    # '^customer' is shorthand for the full notation
    # '!domain.tld,2002/customer' that stands for the
    # URI 'taguri:domain.tld,2002:customer'.
    - !^customer
      given : Chris
      family : Dumars
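
The prefix rules can be sketched as a function that takes a node's type family name together with the prefix inherited from its ancestors. The function name and the returned pair are assumptions made for the example; the names are still in shorthand form and are not expanded to URIs here.

```python
def resolve_prefix(name, inherited=None):
    """Return (full type family name, prefix visible to descendants)."""
    if name.startswith("^"):
        if inherited is None:
            raise ValueError("'^' used without an ancestor establishing a prefix")
        return inherited + name[1:], inherited
    if "^" in name:
        # The node establishes a new prefix; '^' is not part of the name.
        prefix, suffix = name.split("^", 1)
        return prefix + suffix, prefix
    # No separator: the name is complete and the inherited prefix passes through.
    return name, inherited
```

Applied to the example above, 'domain.tld,2002/^invoice' establishes the prefix 'domain.tld,2002/', 'seq' leaves it untouched, and '^customer' resolves to 'domain.tld,2002/customer'.
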
[072] prefix_separator ::= '^' /* separates prefix from type */
[073] format_separator ::= '|' /* separates type from format */
[074] uri_char ::=
        esc_8_bit
      | esc_16_bit
      | esc_32_bit
      | '%' ( hex_digit x 2 )
      | flow_non_ascii
      | word_char
      | ';' | '/' | '?' | ':'
      | '@' | '&' | '=' | '+'
      | '$' | ',' | '_' | '.'
      | '!' | '~' | '*' | '''
      | '(' | ')' | '#'
      /* characters valid in a URI as defined in RFC2396, plus YAML style escaping and non-ASCII characters */
[075] mundane_uri_char ::= uri_char - ':' - '/' /* non-magical URI character */
[076] collect_transfer ::= transfer_indicator ( /* empty (implicit) */ | private_family | global_family ) /* collection transfer method (no format) */
[077] scalar_transfer ::=
        collect_transfer
      | ( transfer_indicator global_family format_separator format )
      /* scalar transfer method (with format) */
[078] private_family ::= transfer_indicator uri_char+ /* private type names */
[079] global_family ::= core_family | language_family | domain_family /* global type names */
[080] format ::= flow_non_space+ /* format of a scalar */
[081] core_family ::=
        ( ( mundane_uri_char - transfer_indicator ) mundane_uri_char* )
      | ( prefix-of-above? prefix_separator suffix-of-above )
      /* shorthand for taguri:yaml.org,2002:type names */
[082] language_family ::=
        ( word_char+ '/' uri_char* )
      | ( prefix-of-above? prefix_separator suffix-of-above )
      /* shorthand for taguri:language.yaml.org,2002:type names */
[083] domain_family ::=
        ( word_char+ ( '.' word_char+ )+ ',' ( decimal_digit x 4 ) ( '-' ( decimal_digit x 2 ) ( '-' decimal_digit x 2 )? )? ) '/' uri_char*
      | ( prefix-of-above? prefix_separator suffix-of-above )
      /* shorthand for taguri:domain,date:type names */

4.3.6 Anchor

An anchor is a property that can be used to mark a serialization node for future reference. An alias node can then be used to indicate additional inclusions of an anchored node by specifying the node's anchor.

[084] anchor_property ::= anchor_indicator
anchor
/* associates an anchor with a given node */
[085] anchor ::= word_char+ /* unique anchor */

4.4 Alias

Once an anchor is used to mark a node, an alias should be used to indicate additional occurrences of the node in the graph. An alias refers to the most recent preceding node having the same anchor. It is an error to have an alias use an anchor that does not occur previously in the serialization of the document.

An alias node only exists in the syntax and serial models. When converted to the generic model, an alias node becomes a second occurrence of the anchored node.

[086] alias ::= alias_indicator anchor /* alias of a preceding anchored node */
anchor : &A001 This scalar has an anchor.
override : &A001 >
 The alias node below is a repeated use of this value.
alias : *A001
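
The "most recent preceding anchor" rule can be sketched as a single pass over the serialized nodes. This is only an illustration: the flat list of (kind, anchor, value) tuples and the name resolve_aliases are assumptions made for the example, not part of the specification.

```python
def resolve_aliases(events):
    """Replace each alias with the node most recently anchored under its name.

    `events` is a hypothetical flat list of (kind, anchor, value) tuples,
    where kind is "node" or "alias" (an assumed shape, for illustration).
    """
    anchors = {}
    resolved = []
    for kind, anchor, value in events:
        if kind == "alias":
            if anchor not in anchors:
                # An alias must refer to an anchor that occurs earlier.
                raise ValueError(f"alias *{anchor} has no preceding anchor")
            resolved.append(anchors[anchor])
        else:
            if anchor is not None:
                # A later node with the same anchor overrides the earlier one.
                anchors[anchor] = value
            resolved.append(value)
    return resolved
```

Applied to the example above, the alias yields the folded "override" value, since that node is the most recent one carrying the anchor A001.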

4.5 Collection

Collection nodes come in two kinds, sequence and mapping. Each kind has two styles, block and flow.

[087] blk_collection(n) ::= blk_sequence(n)
| blk_mapping(n)
/* block collection node styles */
[088] flow_collection(n) ::= flow_sequence(n)
| flow_mapping(n)
/* flow collection node styles */

Flow collection styles may span multiple lines. In most cases where tokens may be separated by white space, it is possible to end the line (with an optional throwaway comment) and continue the collection in the next line. Line spanning functionality is indicated by the use of the optional_space and the required_space productions.

[089] optional_space(n) ::= flow_space*
| ( comment_break
  indent(n)
  flow_space+ )
/* optional white space separating tokens */
[090] required_space(n) ::= flow_space+
| ( comment_break
  indent(n)
  flow_space+ )
/* required white space separating tokens */

4.5.1 Sequence

A sequence node is the simplest node style. It contains a sequence of sub-nodes at a higher indentation level. A flow style is available for short, simple sequences.

[091] blk_sequence(n) ::= ( indent(n)
  blk_seq_entry(n) )+
/* block sequence node */
[092] blk_seq_entry(n) ::= sequence_entry_indicator
( top_value_node(>n)
| map_in_seq(n) )
/* block sequence node entry */
[093] flow_sequence(n) ::= sequence_flow_start
optional_space(n)
( flow_seq_entry(n)
  ( collect_line_separator
    required_space(n)
    flow_seq_entry(n) )* )?
sequence_flow_end
/* flow sequence node */
[094] flow_seq_entry(n) ::= flow_value_node(n)
optional_space(n)
/* flow sequence node entry */
empty: []
flow: [ one, two, three # May span lines,
         , four,        # indentation is
           five ]       # mostly ignored.
block:
 - First item in top sequence
 -
  - Subordinate sequence entry
 - >
  A folded sequence entry
 - Fourth item in top sequence

4.5.2 Mapping

A mapping node is an association of unique keys with values. It is an error for two equal key entries to appear in the same mapping node. In such a case the parser may continue processing, ignoring the second key and issuing an appropriate warning. This strategy preserves a consistent information model for streaming and random access applications.
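
A minimal sketch of the lenient strategy described above, assuming the parser hands over key:value pairs in document order. The function name build_mapping and the use of Python's warnings module are illustrative choices, not requirements of this specification.

```python
import warnings


def build_mapping(pairs):
    """Build a mapping from (key, value) pairs, keeping the first of any duplicate key."""
    mapping = {}
    for key, value in pairs:
        if key in mapping:
            # Equal keys are an error; continue, but ignore the later entry.
            warnings.warn(f"duplicate mapping key {key!r} ignored")
            continue
        mapping[key] = value
    return mapping
```

Keeping the first entry means a streaming consumer, which has already seen that entry, and a random access consumer agree on the value of the key.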

A flow form is available for short, simple mapping nodes. Also, if a mapping node has no properties, and its first key is specified as a flow scalar without any properties, this first key may immediately follow the sequence entry indicator.

[095] blk_mapping(n) ::= ( indent(n)
  blk_map_entry(n) )+
/* block mapping node */
[096] map_in_seq(n) ::= ( flow_space x m )
flow_scalar_key(n+m+1)
flow_space*
mapping_entry_separator
top_value_node(>n+m+1)
blk_mapping(n)?
/* mapping node with no properties in a sequence entry (where m > 0) */
[097] blk_map_entry(n) ::= top_key_node(n)
mapping_entry_separator
top_value_node(>n)
/* single key:value pair */
[098] flow_mapping(n) ::= mapping_flow_start
optional_space(n)
( flow_map_entry(n)
  ( collect_line_separator
    required_space(n)
    flow_map_entry(n) )* )?
mapping_flow_end
/* flow mapping node */
[099] flow_map_entry(n) ::= flow_key_node(n)
flow_space*
mapping_entry_separator
required_space(n)
flow_value_node(n)
optional_space(n)
/* flow key:value pair */
empty: {}
flow: { one: 1, two: 2 }
spanning: { one: 1,
   two: 2 }
block:
 first : First entry
 second:
  key: Subordinate mapping
 third:
  - Subordinate sequence
  - { }
  - Previous mapping is empty.
  - A key: value pair in a sequence.
    A second: key:value pair.
  - The previous entry is equal to the following one.
  -
   A key: value pair in a sequence.
   A second: key:value pair.
 !float 12 : This key is a float.
 ? >
  ?
 : This key had to be protected.
 "\a" : This key had to be escaped.
 ? >
  This is a
  multi-line
  folded key
 : Whose value is
   also multi-line.
 ? this also works as a key
 : with a value at the next line.
 ?
  - This key
  - is a sequence
 :
  - With a sequence value.
 ?
  This: key
  is a: mapping
 :
  with a: mapping value.

4.6 Scalar

While most of the document productions are fairly strict, the scalar production is generous. It offers three flow style variants and two block style variants to choose from depending upon the readability requirements.

Throwaway comments may follow a scalar node, but may not appear inside one. The comment lines following a block scalar node must be less indented than the block scalar value. Empty lines in a scalar node that are followed by a non-empty content line are interpreted as content rather than as implicit comments. Such lines may be less indented than the text content.

[100] blk_scalar(n) ::= literal(n)
| folded(n)
/* block scalar styles */
[101] top_scalar_value(n) ::= single_quoted(n)
| double_quoted(n)
| plain_top_value(n)
/* flow scalar value styles outside flow collection */
[102] flow_scalar_value(n) ::= single_quoted(n)
| double_quoted(n)
| plain_flow_value(n)
/* flow scalar value styles inside flow collection */
[103] flow_scalar_key(n) ::= single_quoted(n)
| double_quoted(n)
| plain_key
/* flow scalar key styles */

4.6.1 End of Line Normalization

Inside all scalar nodes, a compliant YAML parser must translate the two-character combination CR LF, any CR that is not followed by an LF, and any NEL into a single LF (this does not apply to escaped characters). LS and PS characters are preserved. These rules are compatible with Unicode's newline guidelines.

Normalization functionality is indicated by the use of the line_feed_break production defined below.

[104] line_feed_break ::= generic_break /* line break converted to a line feed */
[105] normalized_break ::= line_feed_break
| specific_break
/* a normalized end of line marker */

On output, a YAML emitter is free to serialize end of line markers using whatever convention is most appropriate, though again LS and PS must be preserved.
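
The input-side rule can be sketched in a few lines. The function name normalize_breaks is an assumption for illustration, and the sketch is meant to run on raw scalar text before any escape sequences are decoded, so escaped characters are not affected.

```python
import re

# CR LF, a lone CR, and NEL (U+0085) all become a single LF.
# The two-character CR LF alternative is listed first so it wins over a lone CR.
_GENERIC_BREAK = re.compile("\r\n|\r|\x85")


def normalize_breaks(text):
    """Map CR LF, CR and NEL to LF; LS (U+2028) and PS (U+2029) are left untouched."""
    return _GENERIC_BREAK.sub("\n", text)
```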

4.6.2 Block Modifiers

Each block scalar may have explicit indentation and chomping modifiers. These modifiers are specified following the block style indicator. It is an error for the same modifier to be specified more than once for the same node.

[106] blk_modifiers ::= ( explicit_indent
  chomp_control? )
| ( chomp_control
  explicit_indent? )
/* block scalar modifiers */

4.6.3 Explicit Indentation

Typically the indentation level of a block scalar node is detected from its first content line. This detection fails when this first line is empty, contains a leading '#' character, or contains leading white space characters.

In such cases YAML requires that the indentation level for the scalar node text content be given explicitly. This level is specified as the integer number of the additional indentation spaces used for the text content.

It is always valid to specify an explicit indentation level, though emitters should not do so in cases where detection succeeds. It is an error for detection to fail when there is no explicit indentation specified.

[107] explicit_indent ::= non_zero_digit
decimal_digit*
/* explicit additional indentation level */
# Explicit indentation must
# be given in all the three
# following cases.
leading spaces: |2
      This value starts with four spaces.

leading line break: |2

  This value starts with a line break.

leading comment indicator: |2
  # first line starts with a
  # character.

# Explicit indentation may
# also be given when it is
# not required.
redundant: |2
  This value is indented 2 spaces.
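
From the emitter's side, the three failure cases can be tested before choosing whether to write an explicit indentation modifier. The sketch below is an assumption-laden illustration: needs_explicit_indent is a hypothetical helper, "white space" is taken to mean space or tab, and an empty value is treated as needing no modifier, following the "empty: |" examples in this document.

```python
def needs_explicit_indent(value):
    """Return True when indentation cannot be detected from the first content line."""
    if value == "":
        # No content line at all; assumed not to need a modifier.
        return False
    first_line = value.split("\n", 1)[0]
    # Detection fails on an empty first line, a leading '#', or leading white space.
    return first_line == "" or first_line[0] in "# \t"
```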

4.6.4 Chomping

Typically the final line break of a block scalar is considered to be a part of its value, and any trailing empty lines are taken to be comment lines. This default "clip" chomping behavior can be overridden by specifying a chomp control modifier.

[108] chomp_control ::= '-' | '+' /* strip or keep */
- : strip
The '-' chomp control specifies that the final line break character of the block scalar should be stripped from its value.

+ : keep
The '+' chomp control specifies that any trailing empty lines following the block scalar should be considered to be a part of its value. If this modifier is not specified, such lines are considered to be empty throwaway comment lines and are ignored. When this functionality is implied, the trailing_lines(n) production will be used.

[109] trailing_lines(n) ::= line_feed_empty_line(n)+ /* trailing content empty line (ignored unless '+' keep) */
clipped: |
    This has one newline.

same as "clipped" above: "This has one newline.\n"

stripped: |-
    This has no newline.

same as "stripped" above: "This has no newline."

kept: |+
    This has two newlines.

same as "kept" above: "This has two newlines.\n\n"
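
The three behaviors can be summarized in a short sketch. It assumes the block scalar text has already had its indentation removed and its line breaks normalized to LF, and that trailing empty lines are still attached; the function name chomp and its control argument are illustrative only.

```python
def chomp(text, control=""):
    """Apply a chomp control ('', '-' or '+') to block scalar text."""
    body = text.rstrip("\n")
    trailing = text[len(body):]  # final line break plus any trailing empty lines
    if control == "-":
        return body              # strip: drop the final line break as well
    if control == "+":
        return body + trailing   # keep: trailing empty lines are content
    if control == "":
        return body + trailing[:1]  # clip: keep only the final line break
    raise ValueError(f"unknown chomp control {control!r}")
```

With the text of the examples above, each followed by one empty line, the default gives one newline, '-' gives none, and '+' gives two.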

4.6.5 Literal

A literal scalar is the simplest scalar form. No processing is performed on literal scalar characters aside from end of line normalization and stripping away the indentation. Indentation is detected from the first content line. Explicit indentation must be specified in case this yields the wrong result.

This restricts literal scalars to printable characters only. Also, long lines can't be broken. In exchange for these restrictions, a literal scalar may use any printable character, including line breaks. This makes literal scalars the most readable format for source code or other text values with significant use of indicators, quotes, escape sequences, and line breaks.

[110] literal(n) ::= literal_indicator
blk_modifiers?
comment_break
literal_value(n)?
trailing_lines(n)
( comment_text_line(n)
  comment_line(n)* )?
/* literal scalar */
[111] literal_value(n) ::= literal_chunk(n)+ /* value of literal scalar */
[112] literal_chunk(n) ::= line_feed_empty_line(n)*
( literal_text_line(n)
| specific_empty_line(n) )
/* chunk of literal scalar lines */
[113] specific_empty_line(n) ::= indent(<=n)
specific_break
/* empty line with preserved specific line break */
[114] line_feed_empty_line(n) ::= indent(<=n)
line_feed_break
/* empty line with line break normalized to line feed */
[115] literal_text_line(n) ::= indent(n) flow_char+
normalized_break
/* literal line character data */
empty: |

literal: |
 The \ ' " characters may be
 freely used. Leading white
    space is significant.

 Line breaks are significant. Thus this value
 contains one empty line and ends with a single
 line break, but does not start with one.

is equal to: "The \\ ' \" characters may \
 be\nfreely used. Leading white\n   space \
 is significant.\n\nLine breaks are \
 significant. Thus this value\ncontains \
 one empty line and ends with a single\nline \
 break, but does not start with one.\n"

# Comments may follow a block scalar value.
# They must be less indented.

# Modifiers may be combined in any order.
indented and chomped: |2-
    This has no newline.

also written as: |-2
    This has no newline.

both are equal to: "  This has no newline."

4.6.6 Folding

When folding is done, a single normalized line feed is converted to a single space (#x20). When two or more consecutive (possibly indented) normalized line feeds are encountered, the parser does not convert them into spaces. Instead, the parser ignores the first of the line feeds and preserves the rest. Thus a single line feed can be serialized as two, two line feeds can be serialized as three, etc.

When folding is done, specific line breaks are preserved and may be safely used to convey text structure.

Folding block scalars

Block scalars are based on indentation to convey structure. Hence leading white space in block scalar lines is always significant. Folding block scalars builds on this fact to offer powerful and intuitive semantics.

In block scalars, folding only applies to line feeds that separate text lines starting with a non-space character. Hence, folding does not apply to leading line feeds, line feeds surrounding an empty line ending with a specific line break, or line feeds surrounding a text line that starts with a space character.

The combined effect of the processing rules above is that each "paragraph" is interpreted as a single line, empty lines are used to represent a line feed, and "more indented" lines are preserved. Also, specific line breaks may be safely used to indicate text structure.

[116] space_line_feed ::= line_feed_break /* line feed converted to a space */
[117] ignored_line_feed ::= line_feed_break /* ignored line feed */
[118] blk_line_feeds(n) ::= ignored_line_feed
line_feed_empty_line(n)+
/* sequence of line feeds in block scalar */
Folding flow scalars

Flow scalars depend on explicit indicators to convey structure, rather than indentation. Hence, in such scalars, all white space preceding or following a line break is not considered to be part of the scalar value. Folding flow scalars therefore provides more relaxed, less powerful semantics.

In flow scalars, all leading and trailing white space is stripped from each line. All generic line breaks are folded (even if the line was "more indented").

The combined effect of these processing rules is that each "paragraph" is interpreted as a single line, empty lines are used to represent a line feed, and text can be freely "indented" without affecting the scalar value. Again, specific line breaks may be safely used to indicate text structure.

[119] ignored_trail_spaces ::= flow_space* /* ignored spaces before a line break */
[120] ignored_lead_spaces(n) ::= indent(n)
flow_space+
/* ignored spaces following a line break */
[121] trail_space_line ::= ignored_trail_spaces
line_feed_break
/* line feed converted to a space */
[122] trail_line_feeds ::= ignored_trail_spaces
ignored_line_feed
( ignored_trail_spaces
  line_feed_break )+
/* sequence of line feeds in flow scalar */
[123] trail_specific_line ::= ignored_trail_spaces
specific_break
/* preserved specific line break */
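
The flow folding rules can be sketched as follows. This is a simplification under stated assumptions: line breaks are already normalized to LF, specific line breaks (LS and PS) are not treated specially, "white space" means space or tab, and line breaks after the last non-empty line are dropped. The name fold_flow is hypothetical.

```python
def fold_flow(text):
    """Fold the lines of a multi-line flow scalar into its value."""
    # Leading and trailing white space is stripped from every line.
    lines = [line.strip(" \t") for line in text.split("\n")]
    parts = [lines[0]]
    breaks = 0  # consecutive line feeds since the last non-empty line
    for line in lines[1:]:
        breaks += 1
        if not line:
            continue
        # One line feed folds to a space; n line feeds leave n - 1 of them.
        parts.append(" " if breaks == 1 else "\n" * (breaks - 1))
        parts.append(line)
        breaks = 0
    return "".join(parts)
```

For instance, fold_flow("one\n   two\n\n   three") gives "one two\nthree": the first break becomes a space and the empty line contributes a single line feed.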

4.6.7 Folded

A folded scalar is similar to a literal scalar. However, unlike a literal scalar, a folded scalar is subject to (block) line folding. This allows long lines to be broken anywhere a space character (#x20) appears, at the cost of requiring an empty line to represent each line feed character.

[124] folded(n) ::= folded_indicator
blk_modifiers?
comment_break
folded_value(n)?
trailing_lines(n)
( comment_text_line(n)
  comment_line(n)* )?
/* folded scalar */
[125] folded_value(n) ::= line_feed_empty_line(n)*
( folded_chunk(n)
| non_folded_chunk(n) )
/* value of folded scalar */
[126] folded_chunk(n) ::= ( folded_paragraph(n)
  blk_line_feeds(n) )*
folded_paragraph(n)
folded_after_chunk(n)
/* value starting with a chunk of folded text */
[127] folded_paragraph(n) ::= ( folded_text_line(n)
  space_line_feed )*
folded_text_line(n)
/* single content paragraph folded into multiple physical lines */
[128] folded_text_line(n) ::= indent(n)
flow_non_space
flow_char*
/* folded text line characters */
[129] folded_after_chunk(n) ::= normalized_break
( line_feed_empty_line(n)*
  non_folded_chunk(n) )?
/* text following a folded text chunk */
[130] non_folded_chunk(n) ::= non_folded_empty(n)
| non_folded_indent(n)
/* value starting with non-folded text chunk */
[131] non_folded_empty(n) ::= indent(<=n)
specific_break
folded_value(n)?
/* not folded due to specific line break */
[132] non_folded_indent(n) ::= indent(n)
flow_space
flow_char*
normalized_break
folded_value(n)?
/* not folded due to starting white space */
empty: >

one paragraph: >
 Line feeds are converted to spaces,
 so this value contains no line
 breaks except for the final one.

multiple paragraphs: >2

  An empty line, either at
  the start or in the value:

  Is interpreted as a line
  break. Thus this value
  contains three line breaks.

indented text: >
    This is a folded paragraph
    followed by a list:
     * first entry
     * second entry
    Followed by another folded
    paragraph, another list:

     * first entry

     * second entry

    And a final folded
    paragraph.

above is equal to: |
    This is a folded paragraph followed by a list:
     * first entry
     * second entry
    Followed by another folded paragraph, another list:

     * first entry

     * second entry

    And a final folded paragraph.

# Explicit comments may follow
# but must be less indented.
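
Block folding, as used by the folded scalar, can be sketched along the same lines. The sketch assumes the indentation has already been removed, line breaks are normalized to LF, and specific line breaks are not treated specially; trailing line feeds are left in place for chomping. The name fold_block is hypothetical.

```python
def fold_block(text):
    """Fold block scalar text; "more indented" lines keep their line feeds."""
    out = []
    prev = None  # previous non-empty line, None before the first one
    breaks = 0   # line feeds seen since `prev`
    for index, line in enumerate(text.split("\n")):
        if index > 0:
            breaks += 1
        if line == "":
            continue
        if prev is None or prev.startswith(" ") or line.startswith(" "):
            # Leading line feeds and those around indented lines are preserved.
            out.append("\n" * breaks)
        elif breaks == 1:
            out.append(" ")  # a single line feed becomes a space
        else:
            out.append("\n" * (breaks - 1))  # the first line feed is ignored
        out.append(line)
        prev, breaks = line, 0
    out.append("\n" * breaks)  # trailing line feeds, left for chomping
    return "".join(out)
```

Run on the "multiple paragraphs" text above, this yields a value with exactly three line breaks: the leading one, the one between the paragraphs, and the final one.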

4.6.8 Single Quoted

The single quoted flow scalar style is indicated by surrounding ''' characters. Therefore, within a single quoted scalar such characters need to be escaped. No other form of escaping is done, limiting single quoted scalars to printable characters.

Single quoted scalars are subject to (flow) folding. This allows long lines to be broken everywhere a single space character (#x20) separates non-space characters, at the cost of requiring an empty line to represent each line feed character.

[133] single_quoted(n) ::= single_quote
single_quoted_value(n)?
single_quote
/* single quoted scalar */
[134] single_quoted_value(n) ::= single_line_feeds(n)
| single_specific_lines(n)
| ( flow_space*
  single_text_chunk(n)? )
/* value of a single quoted scalar */
[135] single_line_feeds(n) ::= ( trail_space_line
| trail_line_feeds(n) )
( single_specific_lines(n)
| single_text_line(n) )
/* value starting with line feeds */
[136] single_specific_lines(n) ::= trail_specific_line+
( single_line_feeds(n)
| single_text_line(n) )
/* value starting with specific line breaks */
[137] single_text_line(n) ::= ignored_lead_spaces(n)
single_text_chunk(n)?
/* value ending or continuing with a text line */
[138] single_text_chunk(n) ::= single_quoted_char
( flow_space*
  single_quoted_char )*
( single_line_feeds(n)
| single_specific_lines(n)
| flow_space* )
/* value starting with some non-space text */
[139] single_quoted_char ::= escaped_single_quote
| ( flow_non_space
- single_quote )
/* non-space char valid in a single quoted scalar */
[140] escaped_single_quote ::= single_quote
single_quote
/* indicates a single quote */
empty: ''
second: '! : \ etc. can be used freely.'
third: 'a single quote '' must be escaped.'
span: 'this contains
      six spaces
      
      and one
      line break'
is same as: "this contains six spaces\nand one line break"
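
Because doubling the quote is the only escape in this style, quoting and unquoting a one-line value reduces to a pair of string replacements. The helper names below are hypothetical, and folding of multi-line values is deliberately left out of this sketch.

```python
def quote_single(value):
    """Wrap a printable one-line string in single quotes, doubling embedded quotes."""
    return "'" + value.replace("'", "''") + "'"


def unquote_single(token):
    """Inverse of quote_single for a one-line token; folding is not handled."""
    if len(token) < 2 or token[0] != "'" or token[-1] != "'":
        raise ValueError("not a single quoted scalar")
    return token[1:-1].replace("''", "'")
```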

4.6.9 Escaping

Escaping allows YAML scalar nodes to specify arbitrary Unicode characters, using C-style escape codes. Non-escaped nodes are restricted to printable Unicode characters.

[141] escape ::= '\' /* indicates an escape code */
[142] esc_escape ::= escape escape /* escaped backslash */
[143] esc_double_quote ::= escape double_quote /* escaped double quote character */
[144] esc_bel ::= escape 'a' /* ASCII alert (BEL) */
[145] esc_backspace ::= escape 'b' /* ASCII backspace (BS) */
[146] esc_esc ::= escape 'e' /* ASCII escape (ESC) */
[147] esc_form_feed ::= escape 'f' /* ASCII formfeed (FF) */
[148] esc_line_feed ::= escape 'n' /* ASCII linefeed (LF) */
[149] esc_return ::= escape 'r' /* ASCII carriage return (CR) */
[150] esc_tab ::= escape 't' /* ASCII horizontal tab (TAB) */
[151] esc_vertical_tab ::= escape 'v' /* ASCII vertical tab (VTAB) */
[152] esc_null ::= escape 'z' /* ASCII zero (NUL) */
[153] esc_space ::= escape #x20 /* ASCII space (SP) */
[154] esc_non_breaking_space ::= escape '_' /* Unicode non-breaking space (NBSP) */
[155] esc_next_line ::= escape 'N' /* Unicode next line (NEL) */
[156] esc_line_separator ::= escape 'L' /* Unicode line separator (LS) */
[157] esc_paragraph_separator ::= escape 'P' /* Unicode paragraph separator (PS) */
[158] esc_8_bit ::= escape 'x'
( hex_digit x 2 )
/* 8-bit character */
[159] esc_16_bit ::= escape 'u'
( hex_digit x 4 )
/* 16-bit character */
[160] esc_32_bit ::= escape 'U'
( hex_digit x 8 )
/* 32-bit character */
[161] escape_sequence ::= esc_escape
| esc_double_quote
| esc_bel
| esc_backspace
| esc_esc
| esc_form_feed
| esc_line_feed
| esc_return
| esc_tab
| esc_vertical_tab
| esc_null
| esc_space
| esc_non_breaking_space
| esc_next_line
| esc_line_separator
| esc_paragraph_separator
| esc_8_bit
| esc_16_bit
| esc_32_bit
/* escape codes in escaped scalars */

An escaped line break is completely ignored.

[162] ignored_break ::= escape any_break /* ignored (escaped) line break */
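
The escape productions above lend themselves to a table-driven decoder. The sketch below is illustrative only: the name unescape is hypothetical, hexadecimal digits are accepted in either case (an assumption, since hex_digit is defined elsewhere), and line breaks are assumed to be normalized to LF before decoding.

```python
import re

# One-character escape codes, following productions [142] to [157].
_SIMPLE = {
    "\\": "\\", '"': '"', "a": "\a", "b": "\b", "e": "\x1b", "f": "\f",
    "n": "\n", "r": "\r", "t": "\t", "v": "\v", "z": "\0", " ": " ",
    "_": "\xa0", "N": "\x85", "L": "\u2028", "P": "\u2029",
    "\n": "",  # an escaped line break is completely ignored
}

# A backslash followed by a hex escape, or by any single character.
_ESCAPE = re.compile(
    r"\\(x[0-9A-Fa-f]{2}|u[0-9A-Fa-f]{4}|U[0-9A-Fa-f]{8}|.)", re.DOTALL
)


def _decode(match):
    code = match.group(1)
    if len(code) > 1:  # \xXX, \uXXXX or \UXXXXXXXX
        return chr(int(code[1:], 16))
    if code in _SIMPLE:
        return _SIMPLE[code]
    raise ValueError(f"invalid escape sequence \\{code}")


def unescape(text):
    """Decode the escape sequences of an escaped scalar; invalid ones raise ValueError."""
    return _ESCAPE.sub(_decode, text)
```

A malformed hexadecimal escape such as a backslash, 'x' and a single digit falls through to the one-character branch and is rejected, since 'x' alone is not a valid code.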

4.6.10 Double Quoted

The double quoted style variant adds escaping to the single quoted style variant. This is indicated by surrounding '"' characters. Escaping allows arbitrary Unicode characters to be specified at the cost of some verbosity: escaping the printable '\' and '"' characters. It is an error for a double quoted value to contain invalid escape sequences.

Like single quoted scalars, double quoted scalars may span multiple lines, resulting in a single space content character for each line break. If the line break is escaped, any white space preceding it is preserved, and the line break and any leading white space in the continuation line are discarded.

[163] double_quoted(n) ::= double_quote
double_quoted_value(n)?
double_quote
/* double quoted scalar value */
[164] double_quoted_value(n) ::= double_line_feeds(n)
| double_specific_lines(n)
| double_escaped_line(n)
| ( flow_space*
  double_text_chunk(n)? )
/* value of a double quoted scalar */
[165] double_line_feeds(n) ::= ( trail_space_line
| trail_line_feeds(n) )
( double_specific_lines(n)
| double_empty_lines(n)
| double_text_line(n) )
/* value starting with line feeds */
[166] double_specific_lines(n) ::= trail_specific_line+
( double_line_feeds(n)
| double_empty_lines(n)
| double_text_line(n) )
/* value starting with specific line breaks */
[167] double_text_line(n) ::= ignored_lead_spaces(n)
double_text_chunk(n)?
/* value ending or continuing with a text line */
[168] double_text_chunk(n) ::= double_quoted_char
( flow_space*
  double_quoted_char )*
( double_line_feeds(n)
| double_specific_lines(n)
| double_escaped_line(n)
| flow_space* )
/* value starting with some non-space text */
[169] double_escaped_line(n) ::= flow_space*
ignored_break
( double_line_feeds(n)
| double_specific_lines(n)
| double_empty_lines(n)
| double_text_line(n) )
/* value starting with an escaped line break */
[170] double_empty_lines(n) ::= ( ignored_lead_spaces(n)
  ignored_break )*
( double_line_feeds(n)
| double_specific_lines(n)
| double_text_line(n) )
/* value starting with escaped empty lines */
[171] double_quoted_char ::= escape_sequence
| ( flow_non_space
- escape )
/* non-space character valid in a double quoted scalar */
empty: ""
second: "! : etc. can be used freely."
third: "a \" or a \\ must be escaped."
fourth: "this value ends with an LF.\n"
span: "this contains
  four  \
      spaces"
is equal to: "this contains four  spaces"

4.6.11 Plain

The plain style variant is a restricted form of the single quoted style variant. As it has no identifying markers, it may not start or end with white space characters, may not start with most indicators, and may not contain certain indicators. Also, a plain scalar is subject to implicit typing. This can be avoided by providing an explicit transfer method property.

Since it lacks identifying markers, the restrictions on a plain scalar depend on the context. There are three such contexts, with increasing restrictions.

Top level plain values are the least restricted plain scalar format. While they can't start with most indicators, they may contain any indicator except ' # ' and ': '. Plain scalars used in flow collections are further restricted not to contain the ', ' indicator. Finally, plain keys are further restricted to a single line.

[172] plain_top_value(n), plain_flow_value(n) ::= '-' | plain_value(n) /* avoiding either top or inline indicator chars */
[173] plain_value(n) ::= plain_first_line
( plain_line_feeds(n)
| plain_specific_lines(n) )
/* plain scalar used as a value */
[174] plain_key ::= '-' | plain_first_line /* plain scalar used as a key; avoiding inline indicator chars */
[175] plain_first_line ::= plain_first_char
( flow_space*
  plain_char )*
/* start of plain scalar */
[176] plain_first_char ::= ( flow_non_space
- space_indicators
- non_space_indicators )
| ( space_indicators
  plain_char )
/* first character of plain scalar */
[177] plain_char ::= ( flow_non_space
- plain_space_indicator )
| ( plain_space_indicator
  plain_char )
| ( plain_char
  throwaway_indicator+ )
/* char valid inside plain value */
[178] plain_space_indicator ::= plain_top_indicator
| plain_flow_indicator
/* depends on being reached from key/inline or top */
[179] plain_top_indicator ::= throwaway_indicator
| mapping_entry_separator
/* invalid in top plain value, unless followed by non-space char */
[180] plain_flow_indicator ::= plain_top_indicator
| collect_line_separator
/* invalid in flow plain value, unless followed by non-space char */
[181] plain_line_feeds(n) ::= ( trail_space_line
| trail_line_feeds(n) )
( plain_specific_lines(n)
| plain_text_line(n) )
/* value starting with line feeds */
[182] plain_specific_lines(n) ::= trail_specific_line+
( plain_line_feeds(n)
| plain_text_line(n) )
/* value starting with specific line breaks */
[183] plain_text_line(n) ::= ignored_lead_spaces(n)
plain_text_chunk(n)?
/* value ending or continuing with a text line */
[184] plain_text_chunk(n) ::= plain_char
( flow_space*
  plain_char )*
( plain_line_feeds(n)
| plain_specific_lines(n) )?
/* value starting with some non-space text */
first: There is no unquoted empty string.
second: 12          ## This is an integer.
boolean: -          ## This is (false).
third: !str 12      ## This is a string.
span: this contains
      six spaces

      and one
      line break

indicators: this has no comments.
            #:foo and bar# are
            both text.
flow: [ can span
           lines, # comment
           like
           this ]
note: { one-line keys: but
        multi-line values }

5 Transfer Methods

A transfer method is the combination of the type family and (for scalars) the format used to serialize a value in a YAML document stream. This section provides a list of common type families and their associated formats defined under the yaml.org domain.

Every serialization node has, by definition, a transfer method. YAML provides three mechanisms for identifying the transfer method of a node.

Default Typing
By default the parser assigns the str type family to all scalar nodes (except for plain scalars), the map type family to all mapping nodes, and the seq type family to all sequence nodes.

Implicit Typing
All plain scalars are subject to implicit typing, unless they are annotated with an explicit transfer method property. For each scalar type family, there is a set of implicit formats, and each such format has a regular expression. The parser compares the scalar value with the list of these regular expressions. If the value matches one of these expressions, it is parsed as if it were explicitly annotated with the appropriate type family and format. It is an error for a value to match more than one such regular expression.

The active set of implicit transfer methods depends upon the application. Regular expressions for implicit formats must start with '^`' if they are not defined in this specification or accepted into the YAML type repository. Values matching such private implicit transfer methods therefore always begin with the '`' character. This prevents private implicit transfer methods from interfering with global ones.

Explicit Typing
A node may be given an explicit transfer method property, specifying the node's type family and optionally its format. If no format is given, the parser matches the value with the regular expressions of each of the implicit and explicit formats provided by the type family to determine the specific format used. It is an error for a value to match more than one such regular expression.

Using an explicit transfer method is required when default and implicit typing fail to identify the intended type family and format for a node. This usually occurs only for application-defined collection type families or when there is no implicit type for a scalar type family (such as binary). In addition, implicit typing will assign the wrong type family to plain string values that match any implicit format. Quoting the value is usually a more elegant way of handling this than adding an explicit !str transfer method.

Following is a list of common type families and their associated formats defined under the yaml.org domain. YAML requires support for the seq, map and str type families. While the other type families are not mandatory, they usually map to native data types in most programming languages, so using them promotes interoperability with other YAML systems.

Additional common type families are defined in the YAML type repository available at http://yaml.org/repository. An application may also use private type families or global type families defined on the basis of some URI or DNS domain name. The exact set of transfer methods used in a document is a part of the document's schema, and is tied to the expected document graph structure, the set of valid mapping keys, etc.

5.1 Sequence

This type family is used for sequence nodes unless they are given an explicit transfer method property. Example bindings include the Perl array, Python's list or tuple, and Java's array or vector.

name: http://yaml.org/seq
definition: Collections indexed by sequential integers starting with zero.
kind: Sequence.
node: Mutable.

# The following are equal seqs
# with different identities.
flow: [ one, two ]
spanning: [ one,
     two ]
block:
  - one
  - two

5.2 Mapping

This type family is used for mapping nodes unless they are given an explicit transfer method property. Example bindings include the Perl hash, Python's dictionary, and Java's Hashtable.

name: http://yaml.org/map
definition: Associative container, where each key is unique in the association and mapped to exactly one value.
kind: Mapping.
node: Mutable.

# The following are equal maps
# with different identities.
flow: { one: 1, two: 2 }
block:
    one: 1
    two: 2

5.3 String

This type family is used for all scalar styles with the exception of plain scalars, unless they are given an explicit transfer method property. It is also used as the implicit type for all plain scalars starting with a '/' character, with an alphabetic character, or with a digit, unless they match the integer, float or time implicit formats. Note that '_' and all non-ASCII characters are assumed to be alphabetic for this purpose. This allows the detection pattern to be independent of the Unicode character properties table.

This type is usually bound to the native language's string or character array construct.

name: http://yaml.org/str
definition: Unicode strings, a sequence of zero or more Unicode characters.
kind: Scalar.
node: Immutable. Note that C/C++ string classes are mutable and hence a copy-on-write policy must be used when modifying C/C++ YAML string values.
formats:
  subset
    canonical ~= .* /* any sequence of characters */
  implicit
    text_first ~= [/_a-zA-Z0-9\x80-\Uffffffff].* - ( int | flt | time ) /* start with text character, unless it is an integer, float or time */
  explicit
    non_text_first ~= [^/_a-zA-Z0-9\x80-\Uffffffff].* | int | flt | time /* start with a non-text character, or is an integer, float or time */

Specifying an explicit string type family is required to bypass implicit typing for a plain scalar. The same effect can be achieved by converting it to another scalar style.

- 12 # An integer
# The following scalars
# are loaded to the
# string value '12'.
- !str 12
- '12'
- "12"
- "\
 1\
 2\
 "
# Strings containing paths and regexps can be unquoted:
- /foo/bar      # A string containing an absolute path.
- d:/foo/bar    # A string containing a CP/M absolute path.
- foo/bar       # A string containing a relative path.
- /a.*b/        # A string containing a regular expression.
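
The first-character test used for implicit string detection can be sketched as follows. This is a minimal illustration, not a complete detector: the function name is hypothetical, and the exclusion of scalars matching the integer, float or time formats is deliberately left out.

```python
def starts_like_text(plain):
    """Return True if a plain scalar begins with a "text" character.

    Text characters are '/', '_', ASCII letters and digits, and every
    non-ASCII character, so no Unicode property table is consulted.
    """
    if not plain:
        return False
    first = plain[0]
    if ord(first) >= 0x80:          # all non-ASCII counts as alphabetic
        return True
    return first in "/_" or first.isalnum()   # ASCII letters and digits
```

A scalar that passes this test is a string only if it does not also match one of the integer, float or time implicit formats.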

5.4 Null

A null value is used to indicate the lack of a value. This is typically converted into a native null-like value (e.g., undef in Perl, None in Python). Note that in most programming languages a mapping entry with a key and a null value is valid and different from not having that key in the mapping.

name: http://yaml.org/null
definition:

Devoid of value.

kind:

Scalar.

node:

Immutable.

formats:

implicit

canonical ~= ~ /* single tilde character */

implicit

english ~= \(null\)|\(Null\)|\(NULL\)
|\(nil\)|\(Nil\)|\(NIL\)
/* English word format */
canonical: ~
english: (null)
# This sequence has four
# entries, two with values.
sparse:
  - ~
  - 2nd entry
  - (nil)
  - 4th entry
four: This mapping has four keys,
      only two with values.
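
The two null formats can be recognized with a single pattern. The sketch below is an assumption about how a loader might do this in Python (the names NULL_RE and is_null are hypothetical); it also shows the difference between a key mapped to null and an absent key.

```python
import re

# Canonical '~' or one of the parenthesized English words, in the
# three capitalizations listed by the format.
NULL_RE = re.compile(r"~|\((?:null|Null|NULL|nil|Nil|NIL)\)")

def is_null(plain):
    """Return True if the plain scalar is an implicit null."""
    return NULL_RE.fullmatch(plain) is not None

# A key with a null value is not the same as a missing key.
entry_with_null = {"english": None}
entry_missing = {}
```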

5.5 Boolean

The Boolean represents a true/false value. Booleans can be formatted either as +/- or as English words (true/false, yes/no and on/off).

name: http://yaml.org/bool
definition:

Mathematical Booleans.

kind:

Scalar.

node:

Immutable.

formats:

implicit

canonical ~= \+|- /* single sign character */

implicit

english ~= \(true\)|\(True\)|\(TRUE\)
|\(false\)|\(False\)|\(FALSE\)
|\(yes\)|\(Yes\)|\(YES\)
|\(no\)|\(No\)|\(NO\)
|\(on\)|\(On\)|\(ON\)
|\(off\)|\(Off\)|\(OFF\)
/* English word format */
- : used as key  # Does not indicate a sequence.
canonical: +
logical:  (true)
informal: (no)
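
A loader might map these formats to a native Boolean as sketched below. The function name is hypothetical, and the sketch assumes that '+' stands for true and '-' for false, following the order in which the formats are listed.

```python
TRUE_WORDS = {"true", "True", "TRUE", "yes", "Yes", "YES", "on", "On", "ON"}
FALSE_WORDS = {"false", "False", "FALSE", "no", "No", "NO", "off", "Off", "OFF"}

def load_bool(plain):
    """Return True/False for an implicit Boolean, or None if no format matches."""
    if plain == "+":                # assumed: '+' is the canonical true
        return True
    if plain == "-":                # assumed: '-' is the canonical false
        return False
    if len(plain) > 2 and plain[0] == "(" and plain[-1] == ")":
        word = plain[1:-1]
        if word in TRUE_WORDS:
            return True
        if word in FALSE_WORDS:
            return False
    return None
```

Only the three listed capitalizations are accepted, so a scalar such as (tRue) is not a Boolean.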

5.6 Integer

The integer represents arbitrarily sized finite mathematical integers. Integers can be formatted using the familiar decimal notation, or may have a leading '0x' to signal hexadecimal, or a leading '0' to signal an octal base. Any ',' characters in the number are ignored, allowing a readable representation of large values.

Scalars of this type should be represented by a native integer data type, if possible. However, a serialized integer may overflow the native type's storage capacity. In this case, the loader should find some way to round-trip the integer, perhaps using a string-based representation. In general, integers representable using 32 binary digits should safely round-trip through most systems.

name: http://yaml.org/int
definition:

Mathematical integers.

kind:

Scalar.

node:

Immutable.

formats:

subset

canonical ~= 0|-?[1-9][0-9]* /* canonical integer format */

implicit

dec ~= [-+]?(0|[1-9][0-9,]*) /* base 10 signed decimal integer format */

implicit

oct ~= [-+]?0[0-7,]+ /* base 8 integer format */

implicit

hex ~= [-+]?0x[0-9a-fA-F,]+ /* base 16 integer format */
canonical: 12345
decimal: +12,345
octal: 014
hexadecimal: 0xC
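
The three implicit formats can be loaded as sketched below. This is an illustration under stated assumptions: the names are hypothetical, a leading '0' is taken to always select octal (so the decimal pattern allows no leading zero), and Python's unbounded int stands in for the "arbitrarily sized" integers described above.

```python
import re

HEX = re.compile(r"[-+]?0x[0-9a-fA-F,]+")
OCT = re.compile(r"[-+]?0[0-7,]+")
DEC = re.compile(r"[-+]?(0|[1-9][0-9,]*)")   # assumed: no leading zero

def load_int(plain):
    """Return the integer value of a plain scalar, or None if it is not one."""
    digits = plain.replace(",", "")          # ',' characters are ignored
    for pattern, base in ((HEX, 16), (OCT, 8), (DEC, 10)):
        if pattern.fullmatch(plain):
            try:
                return int(digits, base)
            except ValueError:               # e.g. "0x," has no digits left
                return None
    return None
```

A binding with fixed-width native integers would additionally need an overflow check and a fallback representation.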

5.7 Float

The floating point type family handles approximations to real numbers, including three special values (positive and negative infinity and "not a number"). Scalars of this type should be loaded into a native float data type. The loader may choose from a range of such native data types according to the size and accuracy of the floating point value. Note that not all float values can be represented exactly when stored in a native float type, and hence a float value may change by "a small amount" when round-tripped through a native type. The valid range and accuracy depend on the loader, though 32 bit IEEE floats should be safe.

name: http://yaml.org/float
definition:

Floating point approximation to real numbers.

kind:

Scalar.

node:

Immutable.

formats:

subset

canonical ~= [-]?[0-9]\.([0-9]*[1-9])\
?e[-+](0|[1-9][0-9]*)\
|\(-?inf\)|\(nan\)
/* canonical (scientific notation and special values) floating point format */

implicit

exp ~= [-+]?[0-9][0-9,]*\.[0-9,]*\
[eE][-+][0-9]+
/* exponential notation floating point format */

implicit

fix ~= [-+]?[0-9][0-9,]*\.[0-9,]* /* fixed point notation floating point format */

implicit

english ~= \(nan\)|\(NaN\)|\(NAN\)
|\([+-]?inf\)|\([+-]?Inf\)\
|\([+-]?INF\)
/* special floating point values */
canonical: 1.23015e+3
exponential: 12.3015e+02
fixed: 1,230.15
negative infinity: (-inf)
not a number: (NaN)
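
A loader for these formats might look like the following sketch. The names are hypothetical, the exponential and fixed point formats are merged into one pattern for brevity, and ',' is assumed to be the only ignorable separator. The native type used here is Python's float, an IEEE double.

```python
import re

SPECIAL = re.compile(r"\((nan|NaN|NAN|[+-]?(?:inf|Inf|INF))\)")
NUMERIC = re.compile(r"[-+]?[0-9][0-9,]*\.[0-9,]*(?:[eE][-+][0-9]+)?")

def load_float(plain):
    """Return the float value of a plain scalar, or None if it is not one."""
    special = SPECIAL.fullmatch(plain)
    if special:
        # float() accepts 'nan' and 'inf' in any capitalization.
        return float(special.group(1))
    if NUMERIC.fullmatch(plain):
        return float(plain.replace(",", ""))  # ',' characters are ignored
    return None
```

The decimal point is mandatory in the numeric formats, which is what separates a float such as 12. from the integer 12.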

5.8 Time

A time value represents a single point in time. This can be serialized using a subset of the ISO8601 format and the formats proposed by the W3C note on datetime. In addition, a similar format is also supported for enhanced readability, based on white space separation.

The time part may be omitted, resulting in a date format. In such a case, the time part is assumed to be 12:00:00Z (noon UTC). This allows the time point to retain its date, regardless of the time zone used.

name: http://yaml.org/time
definition:

A point in time.

kind:

Scalar.

node:

Immutable.

formats:

subset

canonical ~= [0-9][0-9][0-9][0-9]-\
[0-9][0-9]-[0-9][0-9]\
T[0-9][0-9]:[0-9][0-9]:\
[0-9][0-9](\.[0-9]*[1-9])?Z
/* canonical specific ISO 8601 format based on UTC */

implicit

iso8601 ~= [0-9][0-9][0-9][0-9]-\
[0-9][0-9]-[0-9][0-9]\
[Tt][0-9][0-9]:[0-9][0-9]:\
[0-9][0-9](\.[0-9]*)?\
(Z|[-+][0-9][0-9]\
(:[0-9][0-9])?)
/* valid ISO 8601 format variant */

implicit

ymd_hms_z ~= [0-9][0-9][0-9][0-9]-\
[0-9][0-9]-[0-9][0-9]\
[ \t]+[0-9][0-9]:[0-9][0-9]:\
[0-9][0-9](\.[0-9]*)?\
[ \t]+(Z|[-+][0-9][0-9]\
(:[0-9][0-9])?)
/* space separated (non-ISO 8601) format for enhanced readability */

implicit

ymd ~= [0-9][0-9][0-9][0-9]-\
[0-9][0-9]-[0-9][0-9]
/* omitted time (valid ISO 8601 format) */
canonical:       2001-12-15T02:59:43.1Z
valid iso8601:   2001-12-14t21:59:43.10-05:00
space separated: 2001-12-14 21:59:43.10 -05:00
date (noon UTC): 2002-12-14
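
The sketch below shows one way the formats could be loaded into a time-zone aware value. It is an illustration only: the names are hypothetical, a single pattern that is slightly looser than the individual productions is used, and fractions finer than a microsecond are truncated because of the chosen native type.

```python
import re
from datetime import datetime, timedelta, timezone

TIME_RE = re.compile(
    r"([0-9]{4})-([0-9]{2})-([0-9]{2})"                  # date part
    r"(?:(?:[Tt]|[ \t]+)"                                # 'T' or white space
    r"([0-9]{2}):([0-9]{2}):([0-9]{2})(?:\.([0-9]*))?"   # time and fraction
    r"[ \t]*(Z|[-+][0-9]{2}(?::[0-9]{2})?))?"            # time zone
)

def load_time(plain):
    """Return an aware datetime for a time scalar, or None if it is not one."""
    m = TIME_RE.fullmatch(plain)
    if m is None:
        return None
    year, month, day = int(m[1]), int(m[2]), int(m[3])
    if m[4] is None:
        # Omitted time part: assume 12:00:00Z (noon UTC).
        return datetime(year, month, day, 12, 0, 0, tzinfo=timezone.utc)
    micro = int(((m[7] or "") + "000000")[:6])
    zone = m[8]
    if zone == "Z":
        tz = timezone.utc
    else:
        sign = -1 if zone[0] == "-" else 1
        minutes = int(zone[4:6]) if len(zone) > 3 else 0
        tz = timezone(sign * timedelta(hours=int(zone[1:3]), minutes=minutes))
    return datetime(year, month, day, int(m[4]), int(m[5]), int(m[6]),
                    micro, tzinfo=tz)
```

Under these assumptions the first three example values above denote the same point in time and compare equal once loaded.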

5.9 Binary

The binary type family accepts the base64 format and deserializes it into some native binary data type (e.g., byte[] in Java). This is the recommended way to store such data in YAML files. Note however that many forms of binary data have internal structure that may benefit from being represented as YAML nodes (e.g. the Java serialization format).

name: http://yaml.org/binary
definition:

Binary data, a sequence of zero or more octets (8 bit values).

kind:

Scalar.

node:

Immutable.

formats:

subset

canonical ~= Clean base64 /* base64 encoded data without any white space characters */

explicit

base64 ~= Generic base64 /* base64 encoded data as per RFC2045 */
canonical: !binary "\
 R0lGODlhDAAMAIQAAP//9/X17unp5WZmZgAAAOf\
 n515eXvPz7Y6OjuDg4J+fn5OTk6enp56enmlpaW\
 NjY6Ojo4SEhP/++f/++f/++f/++f/++f/++f/++\
 f/++f/++f/++f/++f/++f/++f/++SH+Dk1hZGUg\
 d2l0aCBHSU1QACwAAAAADAAMAAAFLCAgjoEwnuN\
 AFOhpEMTRiggcz4BNJHrv/zCFcLiwMWYNG84Bww\
 EeECcgggoBADs="
base64: !binary |
 R0lGODlhDAAMAIQAAP//9/X17unp5WZmZgAAAOf
 n515eXvPz7Y6OjuDg4J+fn5OTk6enp56enmlpaW
 NjY6Ojo4SEhP/++f/++f/++f/++f/++f/++f/++
 f/++f/++f/++f/++f/++f/++f/++SH+Dk1hZGUg
 d2l0aCBHSU1QACwAAAAADAAMAAAFLCAgjoEwnuN
 AFOhpEMTRiggcz4BNJHrv/zCFcLiwMWYNG84Bww
 EeECcgggoBADs=
description: >
 The binary value above is a tiny arrow
 encoded as a gif image.
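
Loading a binary scalar and producing the canonical form can be sketched with Python's standard base64 module. The function names are hypothetical; the sketch assumes that white space is the only non-alphabet content to be discarded before decoding.

```python
import base64

def load_binary(text):
    """Decode base64 text, ignoring line breaks and indentation."""
    compact = "".join(text.split())           # drop all white space
    return base64.b64decode(compact, validate=True)

def dump_canonical(data):
    """Encode octets as base64 without any white space characters."""
    return base64.b64encode(data).decode("ascii")
```

For instance, the first eight characters of the value above, R0lGODlh, decode to the six octets of the GIF89a signature.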

5.10 Special Keys

The special key type family is used for special YAML defined items that are used as mapping keys to denote structural information.

name: http://yaml.org/special
kind:

Scalar.

node:

Mutable.

definition:

Special mapping keys (see below).

formats:

implicit

canonical ~= = /* serializable special keys */

virtual

canonical ~= !|\||&|\* /* special non-serializable keys */

All special keys stand for special in-memory values that are different from any value in any other type family. Specifically, these special in-memory values must not be implemented as string values.

=
The '=' key is used to denote the "default value" of a mapping. In some cases, it is useful to evolve a schema so that a scalar value is replaced with a mapping. A processor may present a "scalar value" method that provides the value directly if the node is a scalar, or returns the value of this special key if the node is a mapping. If applications only ask for the scalar value, then the schema may freely grow over time, replacing scalar values with richer data constructs without breaking older processing systems.

---     # Old schema
link with:
  - library1.dll
  - library2.dll
---     # New schema
link with:
  - = : library1.dll
    version: 1.2
  - = : library2.dll
    version: 2.3
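
The "scalar value" method described above can be sketched as follows. The names DEFAULT_KEY and scalar_value are hypothetical; a unique sentinel object stands in for the in-memory '=' key, since special keys must not be implemented as string values.

```python
# Hypothetical in-memory stand-in for the '=' special key.
DEFAULT_KEY = object()

def scalar_value(node):
    """Return a scalar node as is, or the default value of a mapping node."""
    if isinstance(node, dict):
        return node.get(DEFAULT_KEY)   # None if the mapping has no '=' key
    return node

# In-memory forms of the old and new schema documents shown above.
old_schema = {"link with": ["library1.dll", "library2.dll"]}
new_schema = {"link with": [
    {DEFAULT_KEY: "library1.dll", "version": 1.2},
    {DEFAULT_KEY: "library2.dll", "version": 2.3},
]}

old_names = [scalar_value(n) for n in old_schema["link with"]]
new_names = [scalar_value(n) for n in new_schema["link with"]]
```

An application that reads the entries only through scalar_value obtains the same library names from both documents.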
! | & *
These keys should not be used in serialized YAML documents. Their names are merely a convention for representing the appropriate special in-memory values. Hence these keys are called "virtual keys".

Virtual keys are used when a YAML parser encounters a valid YAML value of an unknown transfer method. For a schema-specific application, this is no different from encountering any other valid YAML document that does not satisfy the schema. Such an application may safely use a parser that rejects any value of an unknown transfer method, or one that discards the transfer method property with an appropriate warning and parses the value as if the property were not present.

For a schema-independent application (for example, a hypothetical YAML pretty print application), this is not an option. Parsers used by such applications should encode the value instead. This may be done by wrapping the value in a mapping containing virtual special keys. The '!' key denotes the unsupported type family, and the '|' key denotes the format used. In some cases it may be necessary to encode anchors and alias nodes as well. The '&' and '*' keys are used for this purpose.

This encoding should be reversed on output, allowing the application to safely round-trip any valid YAML document. In-memory, the encoded data may be accessed and manipulated in a standard way using the three basic data types (mapping, sequence and string), allowing limited processing to be applied to arbitrary YAML data.

"!": These three keys
"&": had to be quoted
"=": and are normal strings.
# NOTE: the following node should NOT be serialized this way.
encoded node :
 !special '!' : '!type'
 !special|canonical '&' : 12
 = : value
# The proper way to serialize the above node is as follows:
node : !!type &12 value
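
The encoding and its reversal on output might be sketched as below for the simple case of a scalar value. This is an assumption about one possible in-memory design, not a prescribed interface: the SpecialKey class and both function names are hypothetical, and the '|' and '*' keys are omitted.

```python
class SpecialKey:
    """In-memory special key; compared by identity, never equal to a string."""
    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return "SpecialKey(%r)" % self.name

TYPE_KEY = SpecialKey("!")      # unsupported type family
ANCHOR_KEY = SpecialKey("&")    # anchor of the encoded node
DEFAULT_KEY = SpecialKey("=")   # the encoded value itself

def encode_unknown(type_family, value, anchor=None):
    """Wrap a value of an unknown type family in a mapping of special keys."""
    node = {TYPE_KEY: type_family, DEFAULT_KEY: value}
    if anchor is not None:
        node[ANCHOR_KEY] = anchor
    return node

def serialize_encoded(node):
    """Reverse the encoding, restoring the transfer method and anchor."""
    parts = ["!" + node[TYPE_KEY]]
    if ANCHOR_KEY in node:
        parts.append("&" + str(node[ANCHOR_KEY]))
    parts.append(str(node[DEFAULT_KEY]))
    return " ".join(parts)
```

Because the special keys are distinct objects, a mapping may hold both TYPE_KEY and the ordinary string key "!" without conflict, as in the quoted keys of the example above.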