YAML Ain't Markup Language (YAML) (tm) 1.0

Working Draft 07 Apr 2002

Latest version:
https://yaml.org/spec/
Editors:
Brian Ingerson (mailto:ingy@ttul.org),
Clark C. Evans,
Oren Ben-Kiki (mailto:oren@ben-kiki.org)

Status of this Document

This specification is a working draft and reflects consensus reached by the members of the yaml-core mailing list. Any questions regarding this draft should be raised on this list. This is a draft and changes are expected. Therefore, implementers should follow this mailing list closely.


Abstract

YAML(tm) (rhymes with "camel") is a straightforward machine parsable data serialization format designed for human readability and interaction with scripting languages such as Perl and Python. YAML is optimized for data serialization, configuration settings, log files, Internet messaging and filtering. This specification describes the YAML information model and serialization format. Together with the Unicode standard for characters, it provides all the information necessary to understand YAML Version 1.0 and construct computer programs to process it.

Table of Contents

1 Introduction
    1.1 Goals
    1.2 Origin
    1.3 Relation to XML
    1.4 Terminology

2 Preview
    2.1 Collections
    2.2 Structures
    2.3 Styles
    2.4 Type Family
    2.5 Full Length Examples

3 Key Concepts

     3.1 General Concepts
         3.1.1 Type Family
         3.1.2 String Format

     3.2 Graph Model
         3.2.1 Graph Node
         3.2.2 Scalar
         3.2.3 Identity
         3.2.4 Node set
         3.2.5 Collection
         3.2.6 Equality
         3.2.7 Documents Stream

     3.3 Tree Model
         3.3.1 Tree Node
         3.3.2 Leaf
         3.3.3 Alias
         3.3.4 Pair
         3.3.5 Branch
         3.3.6 Ordering

     3.4 Syntax Model
         3.4.1 Style
         3.4.2 Comment
         3.4.3 Directive

4 Serialization Syntax

     4.1 Characters
         4.1.1 Character Set
         4.1.2 Encoding
         4.1.3 Indicators
         4.1.4 Line Breaks
         4.1.5 Miscellaneous Characters

     4.2 Line Processing
         4.2.1 Indentation
         4.2.2 Throwaway comments

     4.3 YAML Stream
         4.3.1 Header
         4.3.2 Directive
         4.3.3 Serialization Node
         4.3.4 Node Property
         4.3.5 Transfer Method
         4.3.6 Anchor

     4.4 Alias

     4.5 Branch
         4.5.1 Series
         4.5.2 Keyed

     4.6 Leaf

         4.6.1 Nested Properties
             4.6.1.1 Folding
             4.6.1.2 Escaping
             4.6.1.3 Chomping
             4.6.1.4 Explicit Indentation

         4.6.2 Nested
             4.6.2.1 Plain Block
             4.6.2.2 Chomped Block
             4.6.2.3 Escaped Block
             4.6.2.4 Chomped Escaped Block
             4.6.2.5 Plain Folded
             4.6.2.6 Chomped Folded
             4.6.2.7 Escaped Folded
             4.6.2.8 Chomped Escaped Folded

         4.6.3 In-line
             4.6.3.1 Single Quoted
             4.6.3.2 Double Quoted
             4.6.3.3 Simple

5 Transfer Methods
     5.1 Sequence
     5.2 Map
     5.3 String
     5.4 Null
     5.5 Pointer
     5.6 Integer
     5.7 Float
     5.8 Date
     5.9 Time
     5.10 Timestamp
     5.11 Binary
     5.12 Special Keys

6 Changes From Other Versions

1 Introduction

YAML Ain't Markup Language, abbreviated YAML, is a human-readable data serialization format and processing model. This text describes the class of data objects called YAML document streams and partially describes the behavior of computer programs that process them.

YAML document streams encode into a serialized form the native data constructs of modern scripting languages. Strings, arrays, hashes, and other user-defined data types are supported. A YAML document stream consists of a sequence of characters, some of which are considered part of the document's content, and others which are used to indicate structure within the information stream.

A YAML processor is a software module that is used to manipulate YAML information. A processor may perform multiple functions, such as parsing a YAML serialization into a series of events, loading these events into a native language representation, dumping a native representation into a series of events, and emitting these events into a serialized form. It is assumed that a YAML processor does its work on behalf of another module, called an application. This specification describes the required behavior of a YAML processor. It describes how a YAML processor must read or write YAML document streams and the information structures it must provide to or obtain from the application.

1.1 Goals

The design goals for YAML are:

  1. YAML documents are very readable by humans.

  2. YAML interacts well with scripting languages.

  3. YAML uses host languages' native data structures.

  4. YAML has a consistent information model.

  5. YAML enables stream-based processing.

  6. YAML is expressive and extensible.

  7. YAML is easy to implement.

YAML was designed with experience gained from the construction and deployment of Brian Ingerson's Perl module Data::Denter. YAML has also enjoyed much markup language critique from SML-DEV list participants and builds upon the experiences with the Minimal XML and Common XML specifications.

1.2 Origin

YAML integrates and builds upon structures and concepts described by C, Java, Perl, Python, RFC0822 (MAIL), RFC1866 (HTML), RFC2045 (MIME), RFC2396 (URI), SAX, SOAP and XML.

YAML's core type system is based on the serialization requirements of Perl. YAML directly supports both scalar values (string, integer) and collections (array, hash). Support for common types enables programmers to use their language's native data constructs for YAML manipulation, instead of requiring a special document object model (DOM).

Like XML's SOAP, the YAML serialization supports native graph structures through a rich alias mechanism. Also like SOAP, YAML provides for application-defined types. This allows YAML to serialize rich data structures required for modern distributed computing.

YAML provides unique global type names using a namespace mechanism inspired by Java's DNS based package naming convention and XML's URI based namespaces.

YAML's block scoping is similar to Python's. In YAML, the extent of a node is indicated by its column. YAML's block leaf leverages this by enabling formatted text to be cleanly mixed within an aggregate structure without troublesome escaping. Further, YAML's block indenting provides for easy inspection of the document's structure.

Motivated by HTML's end-of-line normalization, YAML's folded leaf introduces a unique method of handling whitespace. In YAML, single line breaks may be folded into a single space. This technique allows for paragraphs to be word-wrapped without affecting the canonical form of the content.
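The folding idea can be illustrated with a minimal Python sketch. It only replaces isolated line breaks with a space; leaving runs of two or more line breaks untouched is an assumption made here for simplicity, and the function name fold is hypothetical. The precise rules belong to the Folding section (4.6.1.1).

```python
import re

def fold(text):
    # Replace each isolated line break with a single space.
    # Assumption: runs of two or more line breaks are left as they are.
    return re.sub(r"(?<!\n)\n(?!\n)", " ", text)

wrapped = "Mark McGwire's\nyear was crippled\nby a knee injury."
print(fold(wrapped))
```

Under this sketch, re-wrapping the paragraph at a different width would yield the same folded string.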

YAML's escaped leaf uses familiar C-style escape sequences. This enables ASCII representation of non-printable or 8-bit (ISO 8859-1) characters such as '\x3B'. 16-bit Unicode and 32-bit (ISO/IEC 10646) characters are supported with escape sequences such as '\u003B' and '\U0000003B'.
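As a rough illustration, the three numeric escape forms mentioned above could be expanded as follows in Python. This is a sketch, not the normative algorithm: the name unescape is hypothetical, and other C-style escapes (such as '\t' or '\n') are deliberately not handled.

```python
import re

# Matches \xXX, \uXXXX and \UXXXXXXXX (two, four and eight hex digits).
_ESCAPE = re.compile(
    r"\\x([0-9A-Fa-f]{2})|\\u([0-9A-Fa-f]{4})|\\U([0-9A-Fa-f]{8})"
)

def unescape(text):
    # Expand each numeric escape into the character it names.
    def expand(match):
        hex_digits = match.group(1) or match.group(2) or match.group(3)
        return chr(int(hex_digits, 16))
    return _ESCAPE.sub(expand, text)

# All three forms name the same character, the semicolon.
print(unescape(r"\x3B"), unescape(r"\u003B"), unescape(r"\U0000003B"))
```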

The syntax of YAML was motivated by Internet Mail (RFC0822). Further, YAML borrows the document separator from MIME (RFC2045). YAML's top level production is a stream of independent documents, which is ideal for distributed processing systems.

YAML was designed to have incremental interfaces, including both a pull-style input stream and a push-style (SAX-like) output stream. Together these enable YAML to support the processing of large documents, such as a transaction log, or continuous streams, such as a feed from a production machine.

1.3 Relation to XML

There are many differences between YAML and the eXtensible Markup Language (XML). XML was designed to be backwards compatible with Standard Generalized Markup Language (SGML) and thus had many design constraints placed on it that YAML does not share. Also XML, inheriting SGML's legacy, is designed to support structured documents, where YAML is more closely targeted at messaging and native data structures. Where XML is a pioneer in many domains, YAML is the result of many lessons from the XML community.

The YAML and XML information models are starkly different. In XML, the primary construct is an attributed tree, where each element has an ordered, named list of children and an unordered mapping of names to strings. In YAML, the primary graph constructs are keyed collections (natively stored as a hash or array) and scalar values (string, integer, floating point). This difference is critical since YAML's model is directly supported by native data structures in most modern programming languages, where XML's model requires mapping conventions, or an alternative programming component (e.g. a document object model).

1.4 Terminology

The terminology used to describe YAML is defined in the body of this specification. The terms defined in the following list are used in building those definitions and in describing the actions of a YAML processor:

may

Conformant YAML streams and processors are permitted to but need not behave as described.

should

Conformant YAML texts and processors are encouraged to behave as described, but may do otherwise if a warning message is provided to the user and any deviant behavior requires conscious effort to enable (i.e. a non-default setting).

must

Conformant YAML texts and processors are required to behave as described, otherwise they are in error.

error

A violation of the rules of this specification; results are undefined. Conforming software must detect and report an error and may recover from it.

2 Preview

This section provides a quick glimpse into the expressive power of YAML without going into too much detail. It is not expected that the first-time reader grok all of the examples. Instead these selections are used as motivation for the following sections.

2.1 Collections

YAML collections allow for aggregation of data. There are two primary types of collections which YAML supports, sequences and mappings. Most tree structures can be constructed by nesting collections.

- Mark McGwire
- Sammy Sosa
- Ken Griffey

A1

Sequence of scalars
(ball players)

hr:  65
avg: 0.278
rbi: 147

A2

Mapping of scalars to scalars
(player statistics)

american:
   - Boston Red Sox
   - Detroit Tigers
   - New York Yankees
   - Texas Rangers
national:
   - New York Mets
   - Chicago Cubs
   - Atlanta Braves
   - Montreal Expos

A3

Mapping to sequences of scalars
(ball clubs in each league)

- 
  name: Mark McGwire
  hr:   65
  avg:  0.278
  rbi:  147
- 
  name: Sammy Sosa
  hr:   63
  avg:  0.288
  rbi:  141

A4

Sequence of mappings
(players' statistics)

?
    - New York Yankees
    - Atlanta Braves
:
  - 2001-07-02
  - 2001-08-12
  - 2001-08-14
?
    - Detroit Tigers
    - Chicago Cubs
:
  - 2001-07-23

A5

Mapping from sequences to sequences
(team pair to play dates)

invoice: 34843
date   : 2001-01-23
bill-to:
   given  : Chris
   family : Dumars
product:
   - quantity: 4
     desc    : Basketball
   - quantity: 1
     desc    : Super Hoop





A6

Nesting of mappings and sequences
(a simple invoice)
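To show how such nesting lines up with native data structures, the invoice above might correspond to the following Python value, built from plain dictionaries and lists. The use of int and datetime.date for the implicitly typed values is an assumption about a particular loader, not something this preview prescribes.

```python
import datetime

# Hypothetical native binding of the invoice example.
invoice = {
    "invoice": 34843,
    "date": datetime.date(2001, 1, 23),
    "bill-to": {"given": "Chris", "family": "Dumars"},
    "product": [
        {"quantity": 4, "desc": "Basketball"},
        {"quantity": 1, "desc": "Super Hoop"},
    ],
}

# Ordinary language constructs work directly on the structure.
total_items = sum(item["quantity"] for item in invoice["product"])
print(total_items)
```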

2.2 Structures

YAML streams can be commented and separated into multiple documents. To allow for graph serialization, YAML has a built-in alias mechanism.

---
name: Mark McGwire
hr:  65
avg: 0.278
rbi: 147

---
name: Sammy Sosa
hr:  63
avg: 0.288
rbi: 141

B1

Two documents within a stream
(players' statistics)

# Ranking of players by
# season home runs.
---
   - Mark McGwire
   - Sammy Sosa
   - Ken Griffey






B2

Single document with leading comment

# Home runs
hr:
 # 1998 record
   - Mark McGwire
   - Sammy Sosa
# Runs batted in
rbi:
   - Sammy Sosa
   - Ken Griffey

B3

Single document with nested comments

# Home runs
hr:
 # 1998 record
   - Mark McGwire
   - &SS Sammy Sosa
# Runs batted in
rbi:
   - *SS
   - Ken Griffey

B4

Alias used for second occurrence of Sammy Sosa.

2.3 Styles

Besides the simple in-line scalars used above, YAML has support for several nested and quoted scalar styles. For small sequences and mappings, an in-line style helps make YAML easy to author.

--- ]
    Mark McGwire's
    year was crippled
    by a knee injury.

C1

Line folding helps readability

--- |
    \/|\/|
    / |  |_


C2

Line folding is not desired

--- ]\
Sosa completed
another fine
season. \u263A



C3

Unicode smiley using ASCII

name: Mark McGwire
occupation: baseball player
comments: ]
   Mark set a major
   league home run
   record in 1998.

C4

Scalars within a collection

years: "1998\t1999\t2000\n"
msg:   "Sosa did fine. \u263A"

C5

Double quoted (escaped in-line)

- ' \/|\/|  '
- ' / |  |_ '

C6

Single quoted (unescaped in-line)

- [ name        , hr,  avg ]
- [ Mark McGwire, 65, 0.278 ]
- [ Sammy Sosa  , 63, 0.288 ]

C7

Sequence of sequences (in-line)

Mark McGwire: {hr: 65, avg: 0.278}
Sammy Sosa:   {hr: 63, avg: 0.288}


C8

Mapping of mappings (in-line)

2.4 Type Family

To encode data type and other application semantics in a YAML serialization, every node has a type family and leaf nodes have a syntax format.

invoice: 34843
date   : 2001-01-23
buyer:
  given  : Chris
  family : Dumars
product:
  - Basketball: 4
  - Superhoop:  1

D1

Implicit family and format

invoice: !int|dec 34843
date   : !date|ymd 2001-01-23
buyer: !map
   given  : !str Chris
   family : !str Dumars
product: !seq
   - !str Basketball: !int 4
   - !str Superhoop:  !int 1

D2

Explicit family and format

--- !binary|base64 ]
 R0lGODlhDAAMAIQAAP/
 9/X17unp5WZmZgAAAOf
 n515eXvPz7Y6OjuDg4J
 +fn5OTk6enp56enmlpa
 NjY6Ojo4SEhP/++f/++
 f/++f/++f/++f/++f/+
 EeECcgggoBADs=

D3

Binary type family and Base64 string format

--- !seq
  0: Mark McGwire
  1: Sammy Sosa
  2: Ken Griffey
---
empty: !map
invoice: !str 34843


D4

Override implicit family

--- !clarkevans.org/schedule/^entry
who: Clark C. Evans
when: 2001-11-18
hours: !^hours 3
description: ]
   Wrote up these examples
   and learned alot about
   baseball statistics.


D5

Application-specific family

--- !clarkevans.com/graph/^shape
- !^circle
  center: &ORIGIN {x: 73, y: 129}
  radius: 7
- !^line [23,32,300,200]
- !^text
  center: *ORIGIN
  color: 0x02FDBA
  value: Center of circle

D6

Application-specific family

2.5 Full Length Examples

Following are two full-length examples. The first is a sample invoice; the second is a sample log file.

--- !clarkevans.com/^invoice
invoice: 34843
date   : 2001-01-23
bill-to: &id001
    given  : Chris
    family : Dumars
    address:
        lines: |
            458 Walkman Dr.
            Suite #292
        city    : Royal Oak
        state   : MI
        postal  : 48046
ship-to: *id001
product:
    - sku         : BL394D
      quantity    : 4
      description : Basketball
      price       : 450.00
    - sku         : BL4438H
      quantity    : 1
      description : Super Hoop
      price       : 2392.00
tax  : 251.42
total: 4443.52
comments: ]
    Late afternoon is best.
    Backup contact is Nancy
    Billsmer @ 338-4338.

E1

Invoice

---
Date: 2001-11-23
Time: 13:02+5:00
User: ed
Warning: ]
  This is an error message
  for the log file
---
Date: 2001-11-23
Time: 15:02+5:00
User: ed
Warning: ]
  A slightly different error
  message.
---
Date: 2001-11-23
Time: 15:03+5:00
User: ed
Fatal: ]
  Unknown variable "bar"
Stack:
  - file: TopClass.py
    line: 23
    code: |
      x = MoreObject("345\n")
  - file: MoreClass.py
    line: 58
    code: |
      foo = bar

E2

Log file

3 Key Concepts

Conceptually, a YAML system may be understood as three interacting states: a serialization format, an event stream, and a native binding. Translating YAML information between these states are four processing components: a parser, a loader, a dumper and an emitter. The parser extracts structured information from the input stream. The loader converts this information into the appropriate native structures. The dumper and the emitter perform the reverse translations: the dumper converts native structures into events, and the emitter writes these events in serialized form.

    [serialization format] --(parser)--> [event stream] --(loader)--> [native binding]

    [serialization format] <--(emitter)-- [event stream] <--(dumper)-- [native binding]

For each one of the states above, there is a corresponding information model. The graph model covers the native binding, the tree model covers the event stream, and the syntax model covers the serialization format. Type information is moved between these states using the type family and string format constructs.

graph model

The graph model abstracts data structures of common programming languages. Nodes in the graph include collections or scalars. A collection is modeled as a function from one set of nodes to another. Scalars are nodes having a string representation. Both kinds of nodes have a type family.

tree model

The tree model flattens the graph structure into a hierarchy of branches, leaves and alias nodes. A branch represents the first occurrence of a collection, a leaf represents the first occurrence of a given scalar, and an alias is a surrogate used for subsequent occurrences of either collections or scalars. In this model, collections are realized as an ordered set of node pairs, called a branch.

syntax model

The syntax model enhances the tree model with comments, leaf styles and other serialization specific details. Serializations must comply with the syntax productions given in the following section.

A processor need not expose the event stream (tree model) and may translate directly between a serialization and its native binding. However, such a direct translation should take place so that the native binding is constructed only from information available in the graph model. In particular, information specific to the tree model (alias anchors and pair ordering) and syntax-specific information (comments and styles) should not be used in the construction of a native binding. Exceptions to this guideline include editors which must operate on a direct image of the serialization format.
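The dumper and loader roles can be sketched in Python as a pair of functions that translate between native structures and a flat series of events. The event vocabulary used here ("branch", "end", "leaf") is hypothetical and chosen only for illustration; aliases, type families and styles are ignored, every scalar is reduced to a string, and mapping keys are assumed to be scalars.

```python
# Hypothetical events:
#   ("branch", kind)  opens a collection; kind is "map" or "seq"
#   ("end",)          closes the most recently opened collection
#   ("leaf", text)    a scalar given by its string value

def dump(native):
    # Dumper: walk a native structure and yield events.
    if isinstance(native, dict):
        yield ("branch", "map")
        for key, value in native.items():
            yield from dump(key)
            yield from dump(value)
        yield ("end",)
    elif isinstance(native, list):
        yield ("branch", "seq")
        for item in native:
            yield from dump(item)
        yield ("end",)
    else:
        yield ("leaf", str(native))

_END = object()  # sentinel returned when a collection closes

def _load_node(events):
    event = next(events)
    if event[0] == "end":
        return _END
    if event[0] == "leaf":
        return event[1]
    items = []
    while True:
        item = _load_node(events)
        if item is _END:
            break
        items.append(item)
    if event[1] == "seq":
        return items
    # In a map, the collected nodes alternate key, value.
    return dict(zip(items[0::2], items[1::2]))

def load(events):
    # Loader: rebuild a native structure from a series of events.
    return _load_node(iter(events))

data = {"hr": ["Mark McGwire", "Sammy Sosa"], "rbi": ["Sammy Sosa", "Ken Griffey"]}
print(load(dump(data)) == data)
```

Because both functions work incrementally on an event iterator, neither needs the whole document in memory at once, which is the property the event stream is meant to provide.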

3.1 General Concepts

There are several core concepts shared by each information model, primarily relating to type information and how it is communicated between the serialization format and a native binding.

3.1.1 Type Family

The type family mechanism provides an abstraction of data types which is portable across various languages and platforms. Each native binding may have zero or more native concrete types or class constructs which correspond to a given type family.

name

A URI used as a globally unique identifier for the type family. YAML does not require that this URI point to anything in particular. However, where possible, it is considered good practice to have the URI point to some human-readable document providing information about the type family.

definition

A description of the particular category of information, independent of language and platform.

formats

Each type family used for scalar nodes has associated string formats. These formats can be separated into two groups, implicit formats and explicit formats. In addition, one of the formats is designated to be the type family's canonical string format.

Type families used for collection nodes do not have any associated string formats.

implicit formats

A set of zero or more string formats used for implicit typing. Each format may only be used in a single type family for this purpose.

explicit formats

A set of zero or more string formats used for explicit typing. It is possible for two type families to share the same explicit format, though this practice is discouraged.

canonical format

In addition to the above, each scalar type family must provide a canonical string format. This must be one of the implicit or explicit formats, or a subset of one of these formats. The canonical format must provide exactly one unique string representation for each possible value of the scalar.

In general, there may be more than one native type which corresponds to a YAML type family. In the Python language, for example, the integer family may be bound to either the plain integer capable of holding 32 bits, or the long integer with unlimited size. In ambiguous situations like this, the loader should choose between the alternative based on the requirements of the native binding.

In other cases, a binding may not have an appropriate native construct for a given type family. This may be addressed with a generic YAML construct to act as a place-holder so that the data value and the type family may round-trip. Alternatively, with warning to the user, a value may be cast to a different, perhaps less specific family. Otherwise, when a native binding for a particular value is not possible, the parser must treat it as an error.

3.1.2 String Format

It may be possible to write a string value of a leaf in more than one way. For example, an integer value of 255 can also be written in hex as 0xFF. This distinction is covered by the concept of a string format.

name

Each string format has a name used for explicit typing and for general identification. This name must comply with the format production, and must be unique within the type families it applies to.

definition

A description of the format as it applies to particular data values.

regexp

Regular expressions may be provided to allow implicit typing using the string format, or to enable the YAML processor to validate that a given value is indeed compliant with the string format.

As noted above, each scalar type family has exactly one canonical string format, although more than one string format may apply. For example, the scientific format is the canonical format for floating point numbers, but such numbers are typically written using the fixed format.
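The relationship between string formats, regular expressions and a canonical format can be sketched for an integer type family as follows. The format table, the "hex" format name, and the choice of plain decimal as the canonical format are assumptions made for illustration; the actual formats are defined with each transfer method.

```python
import re

# Hypothetical formats for an integer family: name -> (regexp, numeric base).
INT_FORMATS = {
    "dec": (re.compile(r"[-+]?[0-9]+"), 10),
    "hex": (re.compile(r"0x[0-9A-Fa-f]+"), 16),
}

def parse_int(text):
    # Return (format name, value) for the first format whose regexp matches.
    for name, (pattern, base) in INT_FORMATS.items():
        if pattern.fullmatch(text):
            return name, int(text, base)
    raise ValueError(f"{text!r} matches no known integer format")

def canonical_int(value):
    # Assumed canonical format: plain decimal.
    return str(value)

print(parse_int("255"), parse_int("0xFF"))
```

Both strings denote the same value, so they share one canonical representation even though they were written in different formats.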

3.2 Graph Model

The graph model abstracts data structures of common programming languages. The model is a graph of collection and scalar values, where each node in the graph is provided with type information. The model provides an intermediate interface between the parser/emitter, which can be shared by multiple native languages, and the loader/dumper, which is specific to a particular binding. The model also provides a concrete representation for language-independent storage, simple structural queries, and graph transformations.

In the graph model, YAML is viewed as a directed graph of typed nodes. Nodes that can reference other nodes are collections and nodes with a string representation are scalars. The graph model also requires node identity and a mechanism to determine if two different nodes have the same content.

3.2.1 Graph Node

A graph node is the building block of YAML structures. In the serialization, they are represented by indented blocks. Within a native binding they represent application-specific objects. In the graph model, a node is tagged with a type family and can either be a collection or a scalar.

kind

A node may be one of two kinds, a collection or a scalar.

type family

Each node is associated with a type family. For native data, this association may be implicit, based on the native data type of the node.

3.2.2 Scalar

A scalar is a graph node with a string representation.

value

Each scalar has a value as specified by the type family definition.

string representations

Each scalar has one or more string representations. Each string representation is a series of zero or more printable Unicode characters compliant with one of the type family's string formats.

canonical representation

A single unique string representation of the scalar according to the type family's canonical string format.

A string representation of a scalar together with its type family and format should be sufficient to encode most native data types not having a composite structure.

YAML requires the Unicode string scalar type family. Other scalar type families include integer, float, date, time, timestamp and binary. Application specific type families may also be used.

3.2.3 Identity

In most programming languages, there are two manners in which variables can be equivalent. The first is by reference, where the two variables refer to the same memory address. We call this equivalence relation "identity".

The second form of equivalence occurs when two nodes are different (have different memory addresses), but share the same content (same binary layout). We call this second form of equivalence "equality". It follows that when two nodes are identical they are also equal.
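In Python, for example, the two relations correspond to the "is" and "==" operators. The variable names below are illustrative only.

```python
first = ["Sammy Sosa", 63]
second = first                 # another reference to the same list
third = ["Sammy Sosa", 63]     # a different list with the same content

print(first is second)   # identity: True
print(first is third)    # identity: False
print(first == third)    # equality: True
```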

3.2.4 Node set

A node set is an unordered association of zero or more graph nodes. A node may participate in many node sets without restriction, allowing for a graph structure. Node sets may not contain duplicates, that is, a node with a particular identity may only appear once. The primary purpose of the node set is to provide a basis for the definition of a collection. A native binding usually exposes node sets through a mechanism to enumerate the keys of a hash or dictionary.

3.2.5 Collection

A collection is a graph node which represents sequences such as lists or arrays, or mappings such as hashes or dictionaries. In the graph model, sequences are treated uniformly as mappings with integer keys. There are three collection rules. First, a set of keys may not contain two nodes that are equal. Second, each key is associated with exactly one value. Finally, each value is associated with at least one key. Note that this does not prevent a value from being associated with more than one key.

domain

A domain is a node set restricted such that no two nodes in the set may be equal. Nodes which are members of the domain are often called "keys".

range

A range is a node set without restrictions. Nodes which are members of the range are often called "values".

function

A function is a rule of correspondence from the domain onto the range such that there is a unique value in the range assigned to every key in the domain, and every value in the range is assigned to at least one key.

YAML requires the mapping collection type family, which covers associative containers such as the Perl hash or Python dictionary. When the domain is a series of sequential integers starting with zero, the preferred type family is the sequence which corresponds to a Perl array or a Python list.
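A binding could apply this preference with a check along the following lines. This is a sketch using Python dictionaries and lists; the function name as_sequence is hypothetical.

```python
def as_sequence(mapping):
    # Return a list if the keys are exactly the integers 0..n-1, else None.
    if not all(type(key) is int for key in mapping):
        return None
    if set(mapping) != set(range(len(mapping))):
        return None
    return [mapping[index] for index in range(len(mapping))]

players = {0: "Mark McGwire", 1: "Sammy Sosa", 2: "Ken Griffey"}
print(as_sequence(players))
print(as_sequence({"hr": 65, "avg": 0.278}))
```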

3.2.6 Equality

Node equality determines when two given nodes have the same content. When two nodes are equivalent under this equivalence relation, they are said to be "equal". Equality is defined between scalar nodes and between collection nodes, as described below.

scalar equality

Two scalars are equal if and only if they have the same type family and their canonical string representations have exactly the same series of Unicode characters.

collection equality

Equality of a collection is defined recursively. Two collections are equal if and only if they have the same type family and for each key in the domain of one, there is a corresponding key in the domain of the other such that both keys are equal and their corresponding values are equal; here corresponding value refers to the unique node in the range of the collection assigned to the key by the collection's function.
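The two definitions can be sketched recursively in Python. The Scalar and Collection classes are hypothetical stand-ins for graph nodes, the family names are illustrative, and the sketch assumes the graph contains no cycles.

```python
from dataclasses import dataclass, field

@dataclass(eq=False)
class Scalar:
    family: str
    canonical: str          # canonical string representation

@dataclass(eq=False)
class Collection:
    family: str
    pairs: list = field(default_factory=list)   # (key node, value node) tuples

def equal(a, b):
    if isinstance(a, Scalar) and isinstance(b, Scalar):
        return a.family == b.family and a.canonical == b.canonical
    if isinstance(a, Collection) and isinstance(b, Collection):
        if a.family != b.family or len(a.pairs) != len(b.pairs):
            return False
        # Every key of a needs an equal key in b with an equal value.
        # Keys within one domain are never equal to each other, so equal
        # sizes make this one-directional check sufficient.
        return all(
            any(equal(ka, kb) and equal(va, vb) for kb, vb in b.pairs)
            for ka, va in a.pairs
        )
    return False

a = Collection("map", [(Scalar("str", "hr"), Scalar("int", "65")),
                       (Scalar("str", "avg"), Scalar("float", "0.278"))])
b = Collection("map", [(Scalar("str", "avg"), Scalar("float", "0.278")),
                       (Scalar("str", "hr"), Scalar("int", "65"))])
print(equal(a, b))
```

Note that the order of the pairs plays no part in the comparison.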

3.2.7 Documents Stream

A YAML text (file or stream) is a series of disjoint graphs, each with a root node.

stream

A series of zero or more document root nodes.

document

A top level graph node that is disjoint from all other root document nodes.

The term disjoint means that for any two nodes x and y, there does not exist a third node z that is reachable from both x and y. For any node x, x is reachable from y if and only if either x and y are identical, or y is a collection and there exists a node z in the domain or the range of y such that x is reachable from z.
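The reachability definition translates almost directly into a recursive function. The sketch below uses Python dictionaries and lists as collections and compares nodes by identity; the set of visited nodes is an addition that keeps the search from looping on cyclic graphs. Because Python may share immutable scalar objects such as short strings, the sketch is only meaningful when x is a collection.

```python
def reachable(x, y, _seen=None):
    # True if node x is reachable from node y (nodes compared by identity).
    seen = set() if _seen is None else _seen
    if x is y:
        return True
    if id(y) in seen:
        return False            # already searched from this node
    seen.add(id(y))
    if isinstance(y, dict):
        members = list(y.keys()) + list(y.values())   # domain and range
    elif isinstance(y, list):
        members = y             # integer keys are not stored as nodes here
    else:
        return False            # a scalar references no other nodes
    return any(reachable(x, z, seen) for z in members)

shared = ["Sammy Sosa"]
document = {"hr": shared, "rbi": [shared]}
other = ["Ken Griffey"]
print(reachable(shared, document), reachable(other, document))
```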

3.3 Tree Model

To allow for YAML to be communicated as a series of events, an ordered tree structure must be used instead of a graph. This section describes an extension to the graph model where the graph is flattened and ordered to provide a tree interface. The resulting tree-structured model imposes a linear ordering and uses several constructs which are not part of the graph model. Applications constructing a native binding from the tree model should not use these additional constructs and the imposed ordering for the preservation of important data.

3.3.1 Tree Node

To lay out graph nodes as a tree structure, a mechanism is needed to manage duplicate occurrences. This is solved with three node kinds: branch, leaf, and alias. The first occurrence of a scalar is represented by a leaf, the first occurrence of a collection is represented by a branch, and subsequent occurrences of either a collection or a scalar are represented by an alias. All tree nodes in this model have the following properties:

kind

A tree node may be one of three kinds, a branch, a leaf or an alias.

parent

The parent property gives access to the branch which holds the current tree node.

anchor

The anchor is a Unicode string which complies with the anchor production. The anchor is used to associate the first occurrence of a graph node with subsequent occurrences, via the alias tree node. This property is optional for leaf or branch nodes, provided that the scalar or collection represented does not occur more than once.

Note that when a tree node is converted to a graph node, the anchor, if any, is not converted. Likewise the parent property and the alias kind are not preserved as the graph node may participate in several collections.
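The flattening of a graph into branch, leaf and alias nodes can be sketched in Python as a two-pass walk. The tuple layouts and the "id001"-style anchor names are assumptions made for illustration. Only collections are aliased in this sketch, since Python does not give scalars a dependable identity, and sequence positions are written out as string keys.

```python
def count_occurrences(node, counts):
    # First pass: count how often each collection object is encountered.
    counts[id(node)] = counts.get(id(node), 0) + 1
    if counts[id(node)] > 1:
        return                  # children were already walked
    children = node.values() if isinstance(node, dict) else node
    for child in children:
        if isinstance(child, (dict, list)):
            count_occurrences(child, counts)

def to_tree(node, counts, anchors):
    # Second pass: build ("branch", anchor, pairs), ("leaf", text)
    # or ("alias", anchor) tuples.
    if not isinstance(node, (dict, list)):
        return ("leaf", str(node))
    if id(node) in anchors:
        return ("alias", anchors[id(node)])     # subsequent occurrence
    anchor = None
    if counts[id(node)] > 1:                    # anchor only when needed
        anchor = f"id{len(anchors) + 1:03d}"
        anchors[id(node)] = anchor
    items = node.items() if isinstance(node, dict) else enumerate(node)
    pairs = [(to_tree(key, counts, anchors), to_tree(value, counts, anchors))
             for key, value in items]
    return ("branch", anchor, pairs)

def flatten(root):
    counts = {}
    if isinstance(root, (dict, list)):
        count_occurrences(root, counts)
    return to_tree(root, counts, {})

bill_to = {"given": "Chris"}
tree = flatten({"bill-to": bill_to, "ship-to": bill_to})
print(tree)
```

The first occurrence of the shared mapping becomes an anchored branch, and the second becomes an alias carrying the same anchor.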

3.3.2 Leaf

Leaf tree nodes represent the first occurrence of a scalar in a given serialization.

type family

Like a scalar, each leaf is associated with a type family.

format

Unlike a scalar, each leaf is associated with a specific string format.

string value

Each leaf has a string value which is a string representation of the scalar according to the specific string format used.

When a leaf is converted into a graph node it becomes a scalar of the same type family. The scalar's value would be such that its string representation according to the specific format used would be identical to the leaf's string value. Note that the particular format used is not converted.

3.3.3 Alias

The alias tree node represents subsequent occurrences of a scalar or collection in the serialization.

referent

The branch or leaf which the alias references is the closest preceding tree node having the same anchor.

When an alias is converted into a graph node it becomes a subsequent occurrence of its referent's graph node.

3.3.4 Pair

A pair is an ordered set of two tree nodes. The first member of the set is the key and the second member of the set is the value.

3.3.5 Branch

Branch tree nodes represent the first occurrence of a collection in a given serialization.

type family

Like a collection, each branch is associated with a type family.

pairs

A branch has an ordered set of zero or more pairs.

When a branch is converted into a graph node, three operations occur. The domain is constructed with the graph node for each key in its set of pairs. Likewise, the range is constructed with the graph node for each value in its set of pairs. Last, the function is constructed via association of key graph nodes to value graph nodes, as provided by the set of pairs. Note that the ordering of the pairs is explicitly not converted.

3.3.6 Ordering

When serializing a YAML graph, every tree node is put into a single linear sequence within a given document through the branch pair ordering. With the composition of branches, this ordering becomes total, so that for any two distinct tree nodes in a serialization, one can be said to precede another.

For any two nodes or aliases x and y, we say that x precedes y when any of the following holds:

  • x is the parent of y.

  • x is a key and y is a value in a given pair.

  • x and y are nodes in two pairs within a branch, and the pair containing x comes before the pair containing y.

  • There exists a node z such that x precedes z and z precedes y.
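One ordering consistent with these rules is a pre-order walk of the tree: each node comes before its children, each key before its value, and each pair before the pairs that follow it. The Python sketch below assumes a hypothetical layout in which a branch is a list of (key, value) tuples and a leaf is a string.

```python
def linearize(node, out=None):
    # Return the tree nodes in serialization order (pre-order walk).
    out = [] if out is None else out
    out.append(node)                    # a parent precedes its children
    if isinstance(node, list):          # a branch: list of (key, value) pairs
        for key, value in node:         # earlier pairs precede later pairs
            linearize(key, out)         # a key precedes its value
            linearize(value, out)
    return out

player = [("name", "Mark McGwire"), ("hr", "65")]
root = [("0", player)]
leaves = [node for node in linearize(root) if isinstance(node, str)]
print(leaves)
```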

3.4 Syntax Model

To enhance readability, a YAML serialization extends the tree model with syntax styles, comments and directives. Although the parser may provide this information, applications should take care not to use these features to encode information found in a native binding.

3.4.1 Style

The tree node is extended with a style property, which can have different values depending upon its kind.

leaf style

Leaf styles include eight nested styles and three in-line styles. All but the escaped and double quoted styles are limited to scalars having only printable characters.

branch style

Branch styles are series and keyed. The series style may only be used if the domain of the collection's function is the set of consecutive non-negative integers starting at zero.

3.4.2 Comment

The syntax model allows optional comment blocks to be interleaved with the node blocks. Comment blocks may appear before or after any node block. A comment block can't appear in a nested leaf node block value.

comment

A comment is a series of zero or more Unicode characters complying with the comment productions.

3.4.3 Directive

Attached to each document is a document directive section.

directive section

A collection of directives to the parser, in which each member of the domain is a scalar value matching the directive_name production and each member of the range is a scalar value matching the directive_value production.

4 Serialization Syntax

Following are the syntax productions for the YAML serialization.

4.1 Characters

Characters are the basis for a serialized version of a YAML document. Below is a general definition of a character followed by several characters which have specific meaning in particular contexts.

4.1.1 Character Set

Serialized YAML uses a subset of the Unicode character set. A YAML parser must accept all printable ASCII characters, the space, tab, line break, and all Unicode characters beyond 0x9F. A YAML emitter must only produce those characters accepted by the parser, but should also escape all non-printable Unicode characters if a character table is readily available.

[001] printable_char ::=
        #x9
      | #xA | #xD | #x85
      | [#x20-#x7E]
      | [#xA0-#xD7FF]
      | [#xE000-#xFFFD]
      | [#x10000-#x10FFFF]
      /* characters as defined by the Unicode standard, excluding most control characters and the surrogate blocks */

The range above explicitly excludes the surrogate block [#xD800-#xDFFF], DEL 0x7F, the C0 control block [#x0-#x1F], the C1 control block [#x80-#x9F], #xFFFE and #xFFFF. Note that in UTF-16, characters above #xFFFF are represented with a surrogate pair. DEL and characters in the C0 and C1 control blocks may be represented in a YAML serialization using escape sequences.
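
As an illustration, the printable_char production can be checked with a simple range table. This is a minimal sketch rather than a normative implementation; the names PRINTABLE_RANGES and is_printable_char are hypothetical.

```python
# Code point ranges copied from production [001] printable_char.
PRINTABLE_RANGES = [
    (0x9, 0x9),                        # tab
    (0xA, 0xA), (0xD, 0xD), (0x85, 0x85),  # LF, CR, NEL
    (0x20, 0x7E),                      # printable ASCII
    (0xA0, 0xD7FF),
    (0xE000, 0xFFFD),
    (0x10000, 0x10FFFF),
]


def is_printable_char(ch):
    """Return True if the single character ch matches printable_char."""
    code_point = ord(ch)
    return any(low <= code_point <= high for low, high in PRINTABLE_RANGES)
```

An emitter could use such a check to decide which characters need to be written as escape sequences.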

4.1.2 Encoding

A YAML processor is required to support the UTF-32, UTF-16 and UTF-8 character encodings. If an input stream does not begin with a byte order mark, the encoding shall be UTF-8. Otherwise the encoding shall be UTF-32 (LE or BE), UTF-16 (LE or BE) or UTF-8, as signaled by the byte order mark. Note that as YAML files may only contain printable characters, this does not raise any ambiguities. For more information about the byte order mark and the Unicode character encoding schemes see the Unicode FAQ.

[002] byte_order_mark ::= #xFEFF /* the Unicode ZERO WIDTH NON-BREAKING SPACE character used to mark a UTF-32 or UTF-16 stream and determine byte ordering */
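
The encoding rule above can be sketched as follows, using the byte order mark constants from Python's standard codecs module. The function names detect_encoding and decode_stream are hypothetical. The UTF-32 marks are tested before the UTF-16 marks because the UTF-32 LE mark begins with the same two bytes as the UTF-16 LE mark; since a YAML stream may not contain a NUL character, this ordering does not misclassify a valid UTF-16 stream.

```python
import codecs
from typing import Tuple

# Longer marks first: the UTF-32 LE mark starts with the UTF-16 LE mark.
_BYTE_ORDER_MARKS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def detect_encoding(data: bytes) -> Tuple[str, int]:
    """Return (encoding name, length of the byte order mark in bytes)."""
    for mark, name in _BYTE_ORDER_MARKS:
        if data.startswith(mark):
            return name, len(mark)
    return "utf-8", 0  # no byte order mark: the encoding shall be UTF-8


def decode_stream(data: bytes) -> str:
    """Decode a raw YAML stream, dropping the byte order mark if present."""
    encoding, skip = detect_encoding(data)
    return data[skip:].decode(encoding)
```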

4.1.3 Indicators

Indicators are special characters which are used to describe the structure of a YAML document.

[003] series_entry_indicator ::= '-' /* indicates a series entry */
[004] keyed_entry_separator ::= ':' /* separates a key from its value */
[005] series_inline_start ::= '[' /* starts an in-line series branch */
[006] series_inline_end ::= ']' /* ends an in-line series branch */
[007] keyed_inline_start ::= '{' /* starts an in-line keyed branch */
[008] keyed_inline_end ::= '}' /* ends an in-line keyed branch */
[009] branch_inline_separator ::= ',' /* separates in-line branch entries */
[010] nested_key_indicator ::= '?' /* indicates a nested key */
[011] alias_indicator ::= '*' /* indicates an alias node */
[012] anchor_indicator ::= '&' /* indicates an anchor property */
[013] transfer_indicator ::= '!' /* indicates a transfer method property */
[014] block_indicator ::= '|' /* indicates a block leaf */
[015] folded_indicator ::= ']' /* indicates a folded leaf */
[016] single_quote ::= ''' /* indicates a single quoted leaf */
[017] double_quote ::= '"' /* indicates a double quoted leaf */
[018] throwaway_indicator ::= '#' /* indicates a throwaway comment */
[019] reserved_indicators ::= '@' | '%' | '^' /* reserved */

Indicators can be grouped into three categories. The '-' and ':' space indicators are always followed by a white space character (space, tab or line break). If followed by any other character, these indicators are treated as content. The '[', ']', '{', '}' and ',' in-line indicators are used to denote in-line branch structure and therefore must not be used as content text characters unless protected in some way. The remaining indicators are used to denote the start of various YAML elements and hence may be used as internal content text characters in most cases. The exact restrictions on the use of indicators as content text characters depend on the particular leaf style used.

[020] space_indicators ::= series_entry_indicator | keyed_entry_separator /* indicators which are always followed by white space */
[021] inline_indicators ::=
        series_inline_start
      | series_inline_end
      | keyed_inline_start
      | keyed_inline_end
      | branch_inline_separator
      /* indicators for in-line structure */
[022] non_space_indicators ::=
        nested_key_indicator
      | alias_indicator
      | anchor_indicator
      | transfer_indicator
      | block_indicator
      | folded_indicator
      | single_quote
      | double_quote
      | throwaway_indicator
      | reserved_indicators
      /* additional indicators, which don't require a following white space */

4.1.4 Line Breaks

The Unicode standard defines the following line break characters.

[023] line_feed ::= #xA /* ASCII line feed (LF) */
[024] carriage_return ::= #xD /* ASCII carriage return (CR) */
[025] next_line ::= #x85 /* Unicode next line (NEL) */
[026] line_separator ::= #x2028 /* Unicode line separator (LS) */
[027] paragraph_separator ::= #x2029 /* Unicode paragraph separator (PS) */
[028] line_break_char ::=
        line_feed
      | carriage_return
      | next_line
      | line_separator
      | paragraph_separator
      /* line break characters */

Line breaks can be grouped into two groups. Specific line breaks have well-defined semantics for breaking text into lines and paragraphs. The semantics of generic line break characters are not defined beyond ending a line.

Outside text content, YAML allows any line break to be used to terminate lines, and in most cases also allows such line breaks to be preceded by trailing line space characters. On output, a YAML emitter is free to emit non content line breaks using whatever convention is most appropriate. An emitter should avoid emitting trailing line spaces.

[029] generic_line_break ::=
        ( carriage_return line_feed )   greedy
      | carriage_return
      | line_feed
      | next_line
      /* line break with non-specific semantics */
[030] specific_line_break ::= line_separator | paragraph_separator /* line break with specific semantics */
[031] any_line_break ::= generic_line_break | specific_line_break /* any non-content line break */
[032] trailing_line_break ::= line_space* any_line_break /* trailing non-content spaces and line break */

4.1.5 Miscellaneous Characters

This section includes several common character range definitions.

[033] line_char ::= printable_char - line_break_char /* characters valid in a line */
[034] line_space ::= #x20 | #x9 /* whitespace valid in a line */
[035] line_non_space ::= line_char - line_space /* non space characters valid in a line */
[036] line_non_ascii ::= line_char - [#x00-#x7F] /* non-ASCII line characters */
[037] ascii_letter ::= [#x41-#x5A] | [#x61-#x7A] /* ASCII letters, A-Z or a-z */
[038] non_zero_digit ::= [#x31-#x39] /* 1-9 */
[039] decimal_digit ::= [#x30-#x39] /* 0-9 */
[040] hexadecimal_digit ::= decimal_digit | [#x41-#x46] | [#x61-#x66] /* 0-9, A-F or a-f */
[041] word_char ::= decimal_digit | ascii_letter | '-' /* characters valid in a word */

4.2 Line Processing

Serialized YAML uses text lines to convey structure. This requires special processing rules for white space (space, tab and line break) characters. These rules are compatible with Unicode's newline guidelines.

4.2.1 Indentation

In a YAML serialization, structure is determined from indentation, where indentation is defined as a line break character followed by zero or more space characters.

Tab characters are not allowed in indentation unless a '#TAB' directive is used. If such a directive is used, each indentation tab is equivalent to a certain number of spaces determined by the specified tab policy.

A node must be more indented than its parent node. All sibling nodes must use the exact same indentation level. However the content of each such node may be indented independently.

The indentation level is used exclusively to delineate structure. Indentation characters are otherwise ignored. In particular, they are never taken to be a part of the value of serialized text.

[042] indent(n) ::= #x20 x n /* specific level of indentation */
[043] indent(<n) ::= indent(m) /* for some specific m such that m < n */
[044] indent(<=n) ::= indent(m) /* for some specific m such that m <= n */

Since the YAML serialization depends upon indentation level to delineate blocks, additional productions are a function of an integer, based on the indent(n), indent(<n) and indent(<=n) productions above.

4.2.2 Throwaway comments

Throwaway comments have no effect whatsoever on the tree or graph models represented in the file. Their usual purpose is to communicate between the human maintainers of the file. A typical example is comments in a configuration file.

A throwaway comment always spans a complete line. An explicit throwaway comment line consists of some indentation, a '#' indicator, and arbitrary comment characters to the end of the line. Empty lines or lines containing only indentation spaces are taken to be an implicit throwaway comment.

A throwaway comment may appear before a document node or following any node. A throwaway comment may not appear inside a nested leaf node, but may precede or follow such a node. When following a nested leaf value, the first comment line must be explicit and be less indented than the nested node value. Following comment lines are not restricted.

[045] implicit_comment(n) ::= indent(<n) ( throwaway_indicator line_char* )? normalized_line_break /* explicit or empty throwaway comment line */
[046] explicit_comment(n) ::= indent(<n) throwaway_indicator line_char* normalized_line_break /* throwaway comment line with indicator */

# These are three throwaway comment

# lines (the second line is empty).
this: |
    contains two lines of text, the
    # second of which starts with '#'.
# A comment may follow a leaf value.

4.3 YAML Stream

A series of bytes is a YAML stream if, taken as a whole, it complies with the following production. Note that an empty stream is a valid YAML stream containing no documents.

[047] yaml_stream ::= byte_order_mark? implicit_comment(any)* first_document? next_document* /* YAML document stream */
[048] first_document ::= nested_branch(any) /* first document with an implicit header line */
[049] next_document ::= document_header non_alias_node(any) /* separated document top level node */

4.3.1 Header

A YAML stream may contain several independent YAML documents. A document header line is used to separate documents. This line must start with a document separator: '--' followed by one or more non-space characters (for example, '---'). The same separator line must be used in all the document headers throughout the stream.

If no explicit header line is specified at the start of the stream, the parser should behave as if a header line containing '--- #YAML:1.0 #TAB:NONE' was specified.

[050] document_header ::= document_separator ( line_space+ directive )* /* YAML document header */
[051] document_separator ::= '-' '-' line_non_space+ /* YAML document separator */

--- ]
This YAML stream contains a single text value.
The next stream is a log file - a series of log
entries. Adding an entry to the log is a simple
matter of appending it at the end.

---
at: 2001-08-12 09:25:00.00
type: GET
HTTP: '1.0'
url: '/index.html'
---
at: 2001-08-12 09:25:10.00
type: GET
HTTP: '1.0'
url: '/toc.html'

# This stream is an example of a top level map.
invoice : 34843
date    : 2001-01-23
total   : 4443.52

# The following is a sequence of five documents.
# The first two contain an empty map, the second two
# an empty sequence, and the last an empty string.
--- {}
--- !map
--- []
--- !seq
---

4.3.2 Directive

Directives are instructions to the YAML parser. Like throwaway comments, directives are not reflected in the tree or graph models. Directives apply to a single document. It is an error for the same directive to be specified more than once for the same document.

[052] directive ::= throwaway_indicator directive_name keyed_entry_separator directive_value /* document directive */
[053] directive_name ::= word_char+ /* document directive name */
[054] directive_value ::= line_non_space+ /* document directive value */

YAML defines two directives, '#YAML' and '#TAB'. Additional directives may be added in future versions of YAML. A parser should ignore unknown directives with an appropriate warning. There is no provision for specifying private directives. This is intentional.
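
A minimal sketch of collecting the directives from a document header line follows. The function name parse_directives is hypothetical, and the regular expression only approximates the directive_name and directive_value productions (Python's \S stands in for line_non_space). Unknown directive names are returned to the caller, which can then issue the warning described above.

```python
import re

# '#' directive_name ':' directive_value, with word_char taken as [0-9A-Za-z-].
_DIRECTIVE = re.compile(r"#([0-9A-Za-z-]+):(\S+)")


def parse_directives(header):
    """Return the directives of a header line such as '--- #YAML:1.0 #TAB:NONE'."""
    directives = {}
    tokens = header.split()
    for token in tokens[1:]:              # tokens[0] is the document separator
        match = _DIRECTIVE.fullmatch(token)
        if match is None:
            break                         # the first non-directive token ends the list
        name, value = match.groups()
        if name in directives:
            raise ValueError("directive #%s specified more than once" % name)
        directives[name] = value
    return directives
```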

#YAML

The '#YAML' directive specifies the version of YAML the document adheres to. This specification defines version '1.0'.

A version 1.0 parser should accept documents with an explicit '#YAML:1.0' directive, as well as documents lacking a '#YAML' directive. Documents with a directive specifying a higher minor version (e.g. '#YAML:1.1') should be processed with an appropriate warning. Documents with a directive specifying a higher major version (e.g. '#YAML:2.0') should be rejected with an appropriate error message.
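
The version rule can be sketched as a small classification function. The name check_yaml_version and the result strings are hypothetical. The paragraph above does not say how a lower version should be handled; this sketch accepts it, which is an assumption.

```python
def check_yaml_version(value, supported=(1, 0)):
    """Classify a '#YAML' directive value as 'accept', 'warn' or 'reject'.

    value is the directive value (e.g. '1.0'), or None if the document
    has no '#YAML' directive.
    """
    if value is None:
        return "accept"
    major, minor = (int(part) for part in value.split("."))
    if major > supported[0]:
        return "reject"   # higher major version: reject with an error message
    if major == supported[0] and minor > supported[1]:
        return "warn"     # higher minor version: process with a warning
    return "accept"       # assumption: equal or lower versions are accepted
```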

#TAB

Since different systems treat tabs differently, portability problems are a concern. Therefore, the default tab policy of YAML is conservative ('#TAB:NONE'): tabs are not allowed in indentation. However, some users' editors make it difficult to avoid tabs. In this case, the '#TAB' directive is available so that the tab policy is explicitly provided to the YAML parser. Note that tab characters in text content are always valid and must be preserved by the parser, regardless of the tab policy used.

YAML supports the following tab policies:

#TAB:NONE

This default policy forbids the use of tabs in indentation. If such a tab character is detected, the parser must treat it as an error. The error message should refer to the need for providing an explicit tab policy for tabs to be used as indentation characters.

Many editors can be configured such that pressing the tab key is automatically converted to the insertion of an appropriate number of spaces into the edited file, and in general support convenient editing of indented blocks without making use of tab characters. Where possible, YAML editors should be configured to use this indentation policy, as it is the only truly portable one. The https://yaml.org/editors page contains instructions on configuring known editors to use this policy.

#TAB:N (for some positive integer N)

Tab characters in indentation are equivalent to the number of spaces which would bring the indentation level to the next multiple of N.

Almost every editor supports this type of policy, with '#TAB:8' being the most common, followed by '#TAB:4'. Most editors also allow users to configure the value of N. Typically an editor providing this flexibility can also be configured to use the '#TAB:NONE' policy as described above.

#TAB:N:HARD (for some positive integer N)

Tab characters in indentation are equivalent to exactly N spaces. This type of policy is supported by far fewer editors. Moreover, an editor that does use this type of policy is less likely to allow itself to be configured to use a different one.

When either a '#TAB:N' or a '#TAB:N:HARD' policy is used, the parser must expand indentation tabs to spaces accordingly. Each tab, when expanded to spaces, must not span beyond the indentation into the serialized text. While this is an error, parsers should recover from it with a warning, by assigning some of the spaces to the indentation and some to the serialized text.
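
The difference between the three tab policies can be sketched as follows. The function name indentation_width is hypothetical, and the sketch simplifies matters by treating all leading spaces and tabs of a line as indentation; it does not detect a tab that spans beyond the expected indentation into the serialized text.

```python
def indentation_width(line, policy="NONE"):
    """Return the indentation of line, in spaces, under a tab policy.

    policy is the '#TAB' directive value: 'NONE', 'N' or 'N:HARD'.
    """
    hard = policy.endswith(":HARD")
    tab_size = None if policy == "NONE" else int(policy.split(":")[0])
    width = 0
    for ch in line:
        if ch == " ":
            width += 1
        elif ch == "\t":
            if tab_size is None:
                raise ValueError(
                    "tab in indentation; an explicit '#TAB' policy is required"
                )
            if hard:
                width += tab_size                     # exactly N spaces
            else:
                width += tab_size - width % tab_size  # up to the next multiple of N
        else:
            break                                     # first content character
    return width
```

For example, two spaces followed by a tab give an indentation of 8 under '#TAB:8' but of 6 under '#TAB:4:HARD'.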

4.3.3 Serialization Node

A serialization node begins at a particular level of indentation, n, and its content is indented at some level >n. A serialization node can be either a branch (keyed or series), a leaf (nested or in-line) or an alias.

A YAML document is a normal node. However a document can't be an alias (there is nothing it may refer to). Also if the header line is omitted the first document must be a nested (not in-line) branch.

[055] value_node(n) ::= alias_value_node | leaf_value_node(n) | branch_value_node(n) /* node used as a value */
[056] alias_value_node ::= line_space+ alias trailing_line_break implicit_comment(any)* /* alias node used as a value */
[057] branch_value_node(n) ::= ( line_space+ branch_properties )? branch(n) /* branch node used as a value */
[058] leaf_value_node(n) ::= ( line_space+ leaf_properties )? leaf(n) /* leaf node used as a value */
[059] key_node(n) ::=
        ( nested_key_indicator nested_node(>n) indent(n) )
      | ( inline_node line_space* )
      /* node used as a key */
[060] nested_node(n) ::= nested_branch_node | nested_leaf_node /* node nested in following lines */
[061] nested_branch_node(n) ::= ( line_space+ branch_properties )? trailing_line_break implicit_comment(any)* nested_branch(n) /* branch node nested in following lines */
[062] nested_leaf_node(n) ::= ( line_space+ leaf_properties )? line_non_space+ nested_leaf(n) /* leaf node nested in following lines */
[063] inline_node ::= alias | inline_branch_node | inline_leaf_node /* node embedded in-line */
[064] inline_branch_node ::= ( branch_properties line_space+ )? inline_branch /* branch node embedded in-line */
[065] inline_leaf_node ::= ( leaf_properties line_space+ )? inline_leaf /* leaf node embedded in-line */

4.3.4 Node Property

Each serialization node may have anchor and transfer method properties. These properties are specified in a properties list appearing before the node value itself. For a top level node (a document), the properties appear in the document header line, following the directives (if any). It is an error for the same property to be specified more than once for the same node.

[066] branch_properties ::=
        ( branch_transfer_property ( line_space+ anchor_property )? )
      | ( anchor_property ( line_space+ branch_transfer_property )? )
      /* branch properties list */
[067] leaf_properties ::=
        ( leaf_transfer_property ( line_space+ anchor_property )? )
      | ( anchor_property ( line_space+ leaf_transfer_property )? )
      /* leaf properties list */

4.3.5 Transfer Method

The transfer method property specifies how to deserialize the associated node. It includes the type family for the node and optionally the specific format used, separated by a '|' character.

A type family may be either public or private. A public type family name is a globally unique URI. A private type family name must begin with a '!' character. Such type families should not be expected to have consistent semantics in different documents.

By providing an explicit transfer property to a node, implicit typing is prevented. However, an explicit empty transfer method property can be used to force implicit typing to be applied to a non-simple leaf value.

Escaping

URIs support a limited ASCII-based character set. Hence, when parsing a URI type family name, the parser must convert any non-ASCII character to UTF-8 encoding, then use '%' style escaping to represent the resulting bytes.

In general, expanding '%' escaped characters may change the semantics of a URI. Hence the parser must accept such sequences and pass them unmodified to the application.

The parser must also accept YAML style escape sequences. These must be converted to '%' style escape sequences as described above even if the specified character is a valid printable ASCII URI character. Thus the parser must convert the YAML escape sequence '\x30' to the URI escape sequence '%30' rather than to the digit '0'.
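
The escaping rules can be sketched as follows. The function name escape_type_family is hypothetical, and only the 8-bit '\xNN' escape form quoted above is handled; the 16-bit and 32-bit escape forms are left out because their syntax is not shown in this section.

```python
import re


def _percent_escape(code_point):
    """Encode one code point as UTF-8 and '%'-escape each resulting byte."""
    return "".join("%{:02X}".format(byte) for byte in chr(code_point).encode("utf-8"))


def escape_type_family(name):
    """Convert YAML '\\xNN' escapes and non-ASCII characters to '%' escapes."""
    # A YAML style escape is always '%'-escaped, even for printable ASCII.
    name = re.sub(
        r"\\x([0-9A-Fa-f]{2})",
        lambda match: _percent_escape(int(match.group(1), 16)),
        name,
    )
    # Remaining non-ASCII characters are escaped; existing '%NN' sequences
    # are plain ASCII and therefore pass through unmodified.
    return re.sub(r"[^\x00-\x7F]", lambda match: _percent_escape(ord(match.group())), name)
```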

Prefixing

YAML provides a convenient shorthand for the common case where a node and (most of) its descendants have public type families whose URIs share a common prefix.

For this case, YAML allows using the '^' character to separate the ancestor node's type family URI into a prefix and a suffix. The parser does not consider the separator to be part of type family name.

When the parser encounters a descendant node whose type family name begins with '^', it prepends the ancestor node's prefix to the remainder of the name. Again the '^' character is not taken to be part of the name.

It is possible for a descendant node to establish a different prefix. In this case the node may not make use of its ancestor's node prefix. It must specify a full type family name URI, separated into a prefix and suffix as above.

It is an error for a node's type family name to begin with '^' unless it has an ancestor node establishing a prefix. However, a node may establish a prefix even if none of its descendants make use of it.

Note that the type prefix mechanism is purely syntactical and does not imply any additional semantics. In particular, the prefix must not be assumed to be an identifier for anything.
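
The prefix mechanism can be sketched as a function that is applied while descending the tree. The name resolve_prefix and its tuple return value are hypothetical. The sketch deals only with the '^' separator and leaves shorthand expansion to a separate step.

```python
def resolve_prefix(name, ancestor_prefix=None):
    """Return (type family name, prefix in effect for descendant nodes)."""
    if name.startswith("^"):
        if ancestor_prefix is None:
            raise ValueError("'^' used without an ancestor establishing a prefix")
        # The ancestor's prefix stays in effect for this node's descendants.
        return ancestor_prefix + name[1:], ancestor_prefix
    if "^" in name:
        # This node establishes a new prefix; '^' itself is not part of the name.
        prefix, suffix = name.split("^", 1)
        return prefix + suffix, prefix
    # A name without '^' neither uses nor changes the prefix in effect.
    return name, ancestor_prefix
```

For example, a node typed 'company.tld/^invoice' yields the name 'company.tld/invoice' and the prefix 'company.tld/', so a descendant typed '^customer' resolves to 'company.tld/customer'.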

Shorthands

To increase readability, YAML provides shorthand notations for certain type family URIs. Like the prefixing mechanism, shorthand notations are merely syntactical and do not imply any additional semantics. Note that it is valid to use a shorthand type family in order to establish a prefix.

  • If the type family contains no ':' and no '/' characters it is prefixed with https://yaml.org/. Thus the parser must report a node with the type family !seq as if it was written using the full !https://yaml.org/seq notation.

  • Otherwise, if the type family begins with a single word followed by a '/' character, it is assumed to belong to a sub-domain of yaml.org. Hence the parser must report a node with the type family !perl/Text::Tabs as if it was written using the full !http://perl.yaml.org/Text::Tabs notation.

    Each domain language.yaml.org will include all globally unique types of the language which aren't covered by the set of language-independent types. Globally unique types for each language include any built-in types and any standard library types. For languages such as Java and C#, all type names based on reverse DNS strings are globally unique. For languages such as Perl, which has a central authority (CPAN) for managing the global namespace, all the types sanctioned by the central authority are globally unique. The list of supported languages and their types is maintained as part of the YAML type repository.

  • Otherwise, if the type family contains a '/' before any ':' characters it may include, it is prefixed with http://. Therefore the parser must report a node with the type family !clarkevans.com/timesheet as if it was written using the full !http://clarkevans.com/timesheet notation.

  • Otherwise, the type family contains a ':' before any '/' characters it may include; it is assumed to begin with an explicit URI scheme and is preserved. Hence the parser must report a node with the type family !modem:+3585551234567;type=v32b?7e1;type=v110 as it is written.
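
The four shorthand rules above can be sketched as a single function that tests them in order. The name expand_shorthand is hypothetical; the input is the type family name without the leading '!' and with any '^' separator already removed. A "single word" is taken to be one or more word_char characters.

```python
import re


def expand_shorthand(name):
    """Expand a shorthand type family name to the full URI reported by the parser."""
    # Rule 1: no ':' and no '/' at all.
    if ":" not in name and "/" not in name:
        return "https://yaml.org/" + name
    # Rule 2: a single word followed by '/' names a sub-domain of yaml.org.
    match = re.match(r"([0-9A-Za-z-]+)/", name)
    if match:
        return "http://" + match.group(1) + ".yaml.org/" + name[match.end():]
    # Rule 3: a '/' appears before any ':'.
    slash, colon = name.find("/"), name.find(":")
    if slash != -1 and (colon == -1 or slash < colon):
        return "http://" + name
    # Rule 4: an explicit URI scheme; the name is preserved.
    return name
```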

[068] prefix_separator ::= '^' /* separates prefix from type */
[069] format_separator ::= '|' /* separates type from format */
[070] uri_char ::=
        escaped_8_bit
      | escaped_16_bit
      | escaped_32_bit
      | '%' hexadecimal_digit x 2
      | line_non_ascii
      | word_char
      | ';' | '/' | '?' | ':'
      | '@' | '&' | '=' | '+'
      | '$' | ',' | '_' | '.'
      | '!' | '~' | '*' | '''
      | '(' | ')' | '#'
      /* characters valid in a URI as defined in RFC2396, plus YAML style escaping and non-ASCII characters */
[071] mundane_uri_char ::= uri_char - ':' - '/' /* non magical URI character */
[072] branch_transfer_property ::= transfer_indicator ( /* empty (implicit) */ | private_type | public_type ) /* branch transfer method (no format) */
[073] leaf_transfer_property ::=
        branch_transfer_property
      | ( transfer_indicator public_type format_separator format )
      /* leaf transfer method (with format) */
[074] private_type ::= transfer_indicator line_non_space+ /* private type names */
[075] public_type ::=
        yaml_uri
      | language_uri   greedy
      | http_uri
      | scheme_uri
      | suffix_uri
      /* public type names */
[076] format ::= line_non_space+ /* format of a leaf */
[077] yaml_uri ::=
        mundane_uri_char+
      | ( mundane_uri_char+ prefix_separator mundane_uri_char* )
      /* Shorthand for https://yaml.org/type names */
[078] language_uri ::=
        ( word_char+ '/' uri_char* )
      | ( word_char+ prefix_separator word_char* '/' uri_char* )
      | ( word_char+ '/' uri_char* prefix_separator uri_char* )
      /* Shorthand for http://language.yaml.org/type names */
[079] http_uri ::=
        ( mundane_uri_char+ '/' uri_char* )
      | ( mundane_uri_char+ prefix_separator mundane_uri_char* '/' uri_char* )
      | ( mundane_uri_char+ '/' uri_char* prefix_separator uri_char* )
      /* Shorthand for http://type names */
[080] scheme_uri ::=
        ( mundane_uri_char+ ':' uri_char* )
      | ( mundane_uri_char+ prefix_separator mundane_uri_char* ':' uri_char* )
      | ( mundane_uri_char+ ':' uri_char* prefix_separator uri_char* )
      /* full URI names */
[081] suffix_uri ::= prefix_separator uri_char* /* URI names based on ancestor prefix */

# All entries in the series
# have the same type and value.
- 10.0
- !float 10
- !yaml.org/^float '10'
- !https://yaml.org/float ]\
  1\
  0

# Private types are per-document.
---
pool: !!ball
   number: 8
   color: black
---
bearing: !!ball
        material: steel

# 'http://company.tld/invoice' is some type family.
invoice: !company.tld/^invoice
  # 'seq' is a shorthand for 'https://yaml.org/seq'.
  # This does not affect '^customer' below
  # because it does not specify a prefix.
  customers: !seq
    # '^customer' is a shorthand for the full
    # notation 'http://company.tld/customer'.
    - !^customer
      given : Chris
      family : Dumars

# It is possible to use XML namespace URIs as
# YAML namespaces. Using the ancestor's URI
# allows specifying it only once. The $ separates
# between the XML namespace URI and the tag name.
doc: !http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd$^html
 - !^body
  - !^p This is an HTML paragraph.

4.3.6 Anchor

An anchor is a property which can be used to mark a serialization node for future reference. An alias node can then be used to indicate additional inclusions of an anchored node by specifying the node's anchor.

[082] anchor_property ::= anchor_indicator anchor /* associates an anchor with a given node */
[083] anchor ::= word_char+ /* unique anchor */

4.4 Alias

Once an anchor is used to mark a node, an alias should be used to indicate additional occurrences of the node in the graph. An alias refers to the most recent preceding node having the same anchor.

An alias node only exists in the syntax and tree models. When converted to the graph model, an alias node becomes a second occurrence of the anchored node.

It is an error to have an alias use an anchor which does not occur previously in the serialization of the document.

[084] alias ::= alias_indicator anchor /* alias of a preceding anchored node */

anchor : &A001 This leaf has an anchor.
override : &A001 ]
 The alias node below is a
 repeated use of this value.
alias : *A001
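
Alias resolution as described above can be sketched with a table mapping each anchor to the most recent node that carries it. The function name resolve_aliases and the tuple shapes used to represent serialized nodes are hypothetical.

```python
def resolve_aliases(nodes):
    """Replace each alias by the most recent preceding node with its anchor.

    nodes lists the nodes of one document in serialization order, as
    ('node', anchor_or_None, value) or ('alias', anchor) tuples.
    """
    anchors = {}
    resolved = []
    for item in nodes:
        if item[0] == "alias":
            anchor = item[1]
            if anchor not in anchors:
                raise ValueError("alias *%s has no preceding anchor" % anchor)
            resolved.append(anchors[anchor])   # the same object, not a copy
        else:
            _, anchor, value = item
            if anchor is not None:
                anchors[anchor] = value        # a later anchor overrides an earlier one
            resolved.append(value)
    return resolved
```

Applied to the example above, the alias resolves to the second, overriding node anchored as A001.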

4.5 Branch

Branch nodes come in two styles, series and keyed. Each style has two variants, nested and in-line.

[085] branch(n) ::=
        ( trailing_line_break implicit_comment(any)* nested_branch(n) )
      | ( line_space+ inline_branch trailing_line_break implicit_comment(any)* )
      /* branch node styles */
[086] nested_branch(n) ::= nested_series(n) | nested_keyed(n) /* nested branch node styles */
[087] inline_branch ::= inline_series | inline_keyed /* in-line branch node styles */

4.5.1 Series

A series node is the simplest node style. It contains a series of sub-nodes at a higher indentation level. An in-line style is available for short, simple series.

[088] nested_series(n) ::= ( indent(n) nested_series_entry(n) )+ /* nested series node */
[089] nested_series_entry(n) ::= series_entry_indicator ( value_node(>n) | keyed_in_series(>n) ) /* nested series node entry */
[090] inline_series ::= series_inline_start ( inline_series_entry branch_inline_separator )* inline_series_entry? series_inline_end /* in-line series node */
[091] inline_series_entry ::= line_space* inline_node line_space* /* inline series node entry */

empty: []
inline: [ one, two, three ]
nested:
 - First item in top series
 -
  - Subordinate series entry
 - ]
  A multi-line
  series entry
 - Fourth item in top series

4.5.2 Keyed

A keyed node is an association of unique keys with values. It is an error for two equal key entries to appear in the same keyed node. In such a case the parser may continue processing, ignoring the second key and issuing an appropriate warning. This strategy preserves a consistent information model for streaming and random access applications.
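
The recovery strategy for duplicate keys can be sketched as follows. The function name build_keyed is hypothetical, and the sketch assumes that keys are hashable Python values and that Python equality stands in for YAML node equality.

```python
import warnings


def build_keyed(pairs):
    """Build a mapping from (key, value) pairs, keeping the first of equal keys."""
    result = {}
    for key, value in pairs:
        if key in result:
            # Continue processing, ignoring the second key with a warning.
            warnings.warn("duplicate key %r ignored" % (key,))
            continue
        result[key] = value
    return result
```

Because the first occurrence always wins, a streaming consumer that has already acted on a key sees the same value as a consumer that loads the whole node first.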

An in-line form is available for short, simple keyed nodes. Also, if a keyed node has no properties, it may start in-line in a series entry.

[092] nested_keyed(n) ::= ( indent(n) nested_keyed_entry(n) )+ /* nested keyed node */
[093] keyed_in_series(n) ::= line_space+ nested_keyed_entry(n) nested_keyed(n)? /* keyed node with no properties in a series entry */
[094] nested_keyed_entry(n) ::= key_node(n) keyed_entry_separator value_node(>n) /* single key:value pair */
[095] inline_keyed ::= keyed_inline_start ( inline_keyed_entry branch_inline_separator )* inline_keyed_entry? keyed_inline_end /* in-line keyed node */
[096] inline_keyed_entry ::= line_space* inline_node line_space* keyed_entry_separator line_space+ inline_node line_space* /* in-line key:value pair */

empty: {}
inline: { one: 1, two: 2 }
nested:
 first : First entry
 second:
  key: Subordinate keyed
 third:
  - Subordinate series
  - !map
  - Previous keyed is empty.
  - A key: value pair in a series.
    A second: key:value pair.
  - The previous entry is equal to the following one.
  -
   A key: value pair in a series.
   A second: key:value pair.
 !float 12 : This key is a float.
 ? ]
  ?
 : This key had to be protected.
 ? ]\
  \a
 : This key had to be escaped.
 "\b": Another way to escape
 ? ]
  This is a
  multi-line
  plain key
 : ]
  Whose value is
  also multi-line.
 ?
  - This key
  - is a series
 :
  - With a series value.
 ?
  This: key
  is a: mapping
 :
  with a: mapping value.

4.6 Leaf

While most of the document productions are fairly strict, the leaf production is generous. It offers three in-line style variants and eight nested style variants to choose from depending upon the readability requirements.

[097] leaf(n) ::=
        ( line_space+ nested_leaf(n) trailing_comment(n)? )
      | ( line_space+ inline_leaf trailing_line_break implicit_comment(any)* )
      /* leaf node styles */

Throwaway comments may follow a leaf node, but may not appear inside one. The first comment line following a nested leaf node must be explicit and less indented than the nested leaf value. Further comment lines are unrestricted. Comment lines following an in-line leaf node are unrestricted.

[098] trailing_comment(n) ::= explicit_comment(<n) implicit_comment(any)* /* comments trailing nested leaf value */

Empty lines in nested leaf blocks appearing before the trailing explicit comment line, if any, are interpreted as content rather than as implicit comments. Such lines may be less indented than the text content.

[099] less_indented_empty_line(n) ::= indent(<n) /* empty line with optional indentation */

4.6.1 Nested Properties

The style variant of a nested leaf node is defined using the following three independent properties: folding, escaping, and chomping. In addition, a nested leaf may have explicit indentation.

4.6.1.1 Folding
Block (normalized, non-folded)

Block leaf values are indicated by a '|' character. In such values, line break characters are taken to be a part of the serialized value.

Each generic line break is converted to a single LF character. Specific line breaks are preserved. This functionality is indicated by the use of the normalized_line_break production defined below.

[100] line_feed_line_break ::= generic_line_break /* line break converted to a line feed */
[101] normalized_line_break ::= line_feed_line_break
| specific_line_break
/* normalized line break */

On output, a YAML emitter is free to serialize LF characters using whatever convention is most appropriate. Escaping must be used to serialize significant CR and NEL content characters. LS and PS must be preserved.
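
The normalization rule can be sketched in Python as follows. This is only an illustration: it assumes that the generic line breaks are CR LF, CR, LF and NEL and that LS and PS are the specific line breaks, and the function name is hypothetical rather than part of this specification.

```python
import re

# Assumed generic line breaks: CR LF, CR, LF and NEL (#x85).
# CR LF is listed first so that the pair is replaced by a single LF.
_GENERIC_LINE_BREAK = re.compile("\r\n|\r|\n|\x85")


def normalize_line_breaks(text):
    """Convert each generic line break to one LF; LS and PS pass through."""
    return _GENERIC_LINE_BREAK.sub("\n", text)
```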

Folded

Folded leaf values are indicated by a ']' character. In folded values, in addition to being normalized, line break characters are subject to line folding. Line folding provides increased readability by allowing long text lines to be broken.

In a folded leaf, a single normalized line feed is converted to a single space (#x20). When two or more consecutive (possibly indented) normalized line feeds are encountered, the parser does not convert them into spaces. Instead, the parser ignores the first of the line feeds and preserves the rest. Thus a single line feed can be serialized as two, two line feeds can be serialized as three, etc. When this functionality is implied, the folded_internal_line_breaks(n) production below will be used.

When a folded value starts with one or more line feed characters, the parser preserves them, neither converting a single line feed to a space nor stripping away the first line feed in a series. When this functionality is implied, the folded_leading_line_feeds(n) production below will be used.

When a folded value ends with one or more line feed characters, the parser always ignores the first one and preserves the rest. When this functionality is implied, the folded_trailing_line_breaks(n) production below will be used.

The combined effect of the three processing rules above is that each empty line within a folded leaf represents a single line feed character, be it at the start, middle or end of the value.

Note that folding only applies to generic line break characters. Specific line break characters are preserved, and may be safely used to indicate line/paragraph text structure even when line folding is done.
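
The three folding rules can be sketched in Python as shown below. The sketch assumes its input has already had indentation stripped and generic line breaks normalized to LF; the name fold is illustrative only. Because only LF is touched, LS and PS characters pass through unchanged.

```python
import re


def fold(text):
    """Apply line folding to a value whose generic line breaks are already LF."""
    # Leading line feeds are preserved as they are.
    lead = re.match(r"\n*", text).group(0)
    body = text[len(lead):]

    # Trailing line feeds: the first is ignored, the rest are preserved.
    stripped = body.rstrip("\n")
    trailing = len(body) - len(stripped)
    tail = "\n" * max(trailing - 1, 0)

    def internal(match):
        run = match.group(0)
        # A single line feed becomes a space; in a longer run the first
        # line feed is ignored and the remaining ones are kept.
        return " " if len(run) == 1 else run[1:]

    return lead + re.sub(r"\n+", internal, stripped) + tail
```

For instance, fold("\nfirst\nline:\n\nsecond\nline.\n\n") keeps exactly three line feeds, one for each empty line in the input.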

[102] space_line_feed ::= line_feed_line_break /* single line feed converted to a space */
[103] ignored_line_feed ::= line_feed_line_break /* ignored line feed */
[104] clipped_line_feeds(n) ::= ignored_line_feed
( indent(<=n)
  line_feed_line_break )+
greedy
/* clipped series of line feeds */
[105] folded_leading_line_feeds(n) ::= ( indent(<=n)
  line_feed_line_break )*
greedy
/* preserved line feeds at start of a folded value */
[106] folded_internal_line_breaks(n) ::= clipped_line_feeds(n)
| space_line_feed(n)
| specific_line_break(n)
/* line breaks in a folded value */
[107] folded_trailing_line_breaks(n) ::= clipped_line_feeds(n)
| ignored_line_feed
| specific_line_break
/* line breaks ending a folded value */
4.6.1.2 Escaping

Escaped leaf values are indicated by a '\' character. In escaped values, arbitrary Unicode characters may be specified using escape codes. Plain (non-escaped) values are restricted to printable Unicode characters.

[108] escape ::= '\' /* indicates an escape code */
[109] escaped_escape ::= escape escape /* escape literal */
[110] escaped_double_quote ::= escape double_quote /* escaped double quote character */
[111] escaped_bel ::= escape 'a' /* ASCII alert (BEL) */
[112] escaped_backspace ::= escape 'b' /* ASCII backspace (BS) */
[113] escaped_esc ::= escape 'e' /* ASCII escape (ESC) */
[114] escaped_form_feed ::= escape 'f' /* ASCII formfeed (FF) */
[115] escaped_line_feed ::= escape 'n' /* ASCII linefeed (LF) */
[116] escaped_return ::= escape 'r' /* ASCII carriage return (CR) */
[117] escaped_tab ::= escape 't' /* ASCII horizontal tab (TAB) */
[118] escaped_vertical ::= escape 'v' /* ASCII vertical tab (VTAB) */
[119] escaped_null ::= escape 'z' /* ASCII zero (NUL) */
[120] escaped_non_breaking_space ::= escape #x20 /* Unicode non breaking space (NBSP) */
[121] escaped_next_line ::= escape 'N' /* Unicode next line (NEL) */
[122] escaped_line_separator ::= escape 'L' /* Unicode line separator (LS) */
[123] escaped_paragraph_separator ::= escape 'P' /* Unicode paragraph separator (PS) */
[124] escaped_8_bit ::= escape 'x'
hexadecimal_digit x 2
/* 8-bit character */
[125] escaped_16_bit ::= escape 'u'
hexadecimal_digit x 4
/* 16-bit character */
[126] escaped_32_bit ::= escape 'U'
hexadecimal_digit x 8
/* 32-bit character */
[127] escape_sequence ::= escaped_escape
| escaped_double_quote
| escaped_bel
| escaped_backspace
| escaped_esc
| escaped_form_feed
| escaped_line_feed
| escaped_return
| escaped_tab
| escaped_vertical
| escaped_null
| escaped_next_line
| escaped_non_breaking_space
| escaped_line_separator
| escaped_paragraph_separator
| escaped_8_bit
| escaped_16_bit
| escaped_32_bit
/* escape codes in escaped leaves */

An escaped line break is ignored, allowing long lines in an escaped value to be broken at arbitrary positions, regardless of folding.

[128] indented_escaped_chars ::= indent(n) escaped_char* /* escaped line data */
[129] escaped_char ::= escape_sequence
| ( line_char - escape )
/* single escaped character */
[130] escaped_line_break ::= escape
any_line_break
/* ignored line break */

In contrast, plain leaf values may freely make use of the '\' as a text character.

[131] indented_plain_chars ::= indent(n) line_char* /* plain line character data */
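
The escape codes listed above can be expanded with a small table-driven routine. The Python sketch below is one possible reading, not a reference implementation: the names are hypothetical, an escaped line break is taken to be any of CR LF, CR, LF, NEL, LS or PS (an assumption about any_line_break), and an unknown escape code is reported as an error.

```python
import re

# Single-character escape codes from productions [109] to [123].
_SIMPLE_ESCAPES = {
    "\\": "\\", '"': '"',
    "a": "\x07", "b": "\x08", "e": "\x1b", "f": "\x0c",
    "n": "\n", "r": "\r", "t": "\t", "v": "\x0b", "z": "\x00",
    " ": "\xa0", "N": "\x85", "L": "\u2028", "P": "\u2029",
}
# Assumed set of line breaks that may follow an escape character.
_LINE_BREAKS = {"\r\n", "\r", "\n", "\x85", "\u2028", "\u2029"}
_ESCAPE = re.compile(
    r"\\(x[0-9a-fA-F]{2}|u[0-9a-fA-F]{4}|U[0-9a-fA-F]{8}|\r\n|.)?",
    re.DOTALL,
)


def _expand(match):
    code = match.group(1)
    if code is None:
        raise ValueError("dangling escape character at end of value")
    if code in _LINE_BREAKS:
        return ""  # an escaped line break is ignored
    if code[0] in "xuU" and len(code) > 1:
        return chr(int(code[1:], 16))  # 8, 16 or 32 bit character
    try:
        return _SIMPLE_ESCAPES[code]
    except KeyError:
        raise ValueError(f"invalid escape sequence: \\{code}") from None


def unescape(text):
    """Expand the escape codes of an escaped leaf value."""
    return _ESCAPE.sub(_expand, text)
```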
4.6.1.3 Chomping

Chomped leaf values are indicated by a '-' character. In chomped values, any trailing generic line break characters are ignored. Trailing specific_line_break characters are preserved.
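
As a minimal Python sketch, assuming the value has already been normalized so that every generic line break is a single LF, chomping amounts to dropping the trailing LF characters. The function name is illustrative only.

```python
def chomp(value):
    """Drop all trailing LF characters; a trailing LS or PS is preserved."""
    return value.rstrip("\n")
```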

[132] chomp_indicator ::= '-' /* indicates stripping the final line break character */
[133] ignored_line_breaks(n) ::= generic_line_break
( indent(<=n)
  generic_line_break )*
greedy
/* non-content line breaks ending chomped leaf value */
[134] chomped_line_breaks(n) ::= ignored_line_breaks(n)
| specific_line_break
/* final line breaks for a chomped value */
4.6.1.4 Explicit Indentation

Typically the indentation level of a nested leaf value is detected from its first content line. This detection fails when this first line is empty, contains a leading '#' character, or contains leading white space characters.

In such cases YAML requires that the indentation level for the leaf value text content be given explicitly. This level is specified as an integer giving the number of additional indentation spaces used for the text content.

It is always valid to specify an explicit indentation level, though emitters should not do so in cases where detection succeeds. It is an error for detection to fail when there is no explicit indentation specified.

[135] explicit_indent ::= non_zero_digit
decimal_digit*
/* explicit additional indentation level */
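
From the emitter's side, the rule can be sketched in Python as follows. The function names and the default of two additional spaces are hypothetical. The sketch also assumes that "white space" means a space or a tab, and that an empty value needs no explicit indentation because it has no content line at all.

```python
def needs_explicit_indent(value):
    """Return True when detection from the first content line would fail."""
    if value == "":
        return False  # assumption: an empty leaf has no content line
    first_line = value.split("\n", 1)[0]
    # Detection fails for an empty first line, a leading '#',
    # or leading white space.
    return first_line == "" or first_line[0] in "# \t"


def block_header(value, extra_indent=2):
    """Build the indicator of a plain block, adding an indent only if needed."""
    if extra_indent < 1:
        raise ValueError("explicit indentation must be a non-zero integer")
    return "|%d" % extra_indent if needs_explicit_indent(value) else "|"
```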

4.6.2 Nested

A nested leaf node may be in one of eight style variants defined by the three nested leaf properties.

[136] nested_leaf(n) ::= plain_block(n)
| chomped_block(n)
| escaped_block(n)
| chomped_escaped_block(n)
| plain_folded(n)
| chomped_folded(n)
| escaped_folded(n)
| chomped_escaped_folded(n)
/* nested leaf styles */
4.6.2.1 Plain Block

A plain block is the simplest style variant of a nested leaf. The only processing done is end-of-line normalization and stripping away the indentation.

[137] plain_block(n) ::= block_indicator
explicit_indent?
trailing_line_break
plain_block_value(n)
/* plain block leaf */
[138] plain_block_value(n) ::= plain_block_line(n)* /* value of plain block leaf */
[139] plain_block_line(n) ::= ( indented_plain_chars(n)
| less_indented_empty_line(n) )
normalized_line_break
/* single plain block line */
empty: |
detected: |
 The \ character may be freely
 used. Leading white space
    is significant.

 All line breaks are significant,
 including the final one. Thus
 this value contains one empty
 line and ends with a line break,
 but does not start with one.
# Comments may follow a nested
 # leaf value. Once the first
  # comment line is seen, they
   # can be at any indentation
    # level, or even empty lines.

# Explicit indentation must
# be given in all the three
# following cases.
leading spaces: |2
      This value starts with
  four spaces. It ends with
  two line breaks.

leading line break: |2

  This value starts with
  a line break and ends
  with one.
leading comment indicator: |2
  # first line starts with a
  #. This value does not start
  with a line break but ends
  with one.
# Explicit indentation may
# also be given when it is
# not required.
redundant: |2
  This value is indented 2 spaces.
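
A loader might recover the value of a plain block along the following lines. This Python sketch is only illustrative: it assumes the content lines have already been separated from any trailing comments and split at their line breaks, and that the content indentation (detected or explicit) is known. The function name is hypothetical.

```python
def decode_plain_block(lines, indent):
    """Strip the indentation from each content line and restore the line breaks.

    lines:  content lines without their line breaks
    indent: number of indentation spaces in front of the text content
    """
    decoded = []
    for line in lines:
        if line.strip(" ") == "":
            # An empty line may be less indented than the text content.
            decoded.append(line[indent:])
        elif line.startswith(" " * indent):
            decoded.append(line[indent:])
        else:
            raise ValueError("line is less indented than the block content")
    # Every line break is significant, including the final one.
    return "".join(text + "\n" for text in decoded)
```

Applied to the "leading spaces" example above with an indentation of 2, this yields a value that starts with four spaces and ends with two line breaks.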
4.6.2.2 Chomped Block

The chomped block style variant is identical to the plain block, except that the trailing line breaks are subject to chomping.

[140] chomped_block(n) ::= block_indicator
chomp_indicator
explicit_indent?
trailing_line_break
chomped_block_value(n)
/* chomped block leaf */
[141] chomped_block_value(n) ::= ( plain_block_line(n)*
  chomped_block_last(n) )?
/* value of chomped block leaf */
[142] chomped_block_last(n) ::= ( indented_plain_chars(n)
| less_indented_empty_line(n) )
chomped_line_breaks(n)
/* final line of chomped block leaf */
empty: |-
detected: |-
 All line breaks are significant,
 except for the trailing ones.
 Thus this value does not end
 with a line break.

# Comments may follow.
4.6.2.3 Escaped Block

The escaped block style variant is identical to the plain block, except that escape sequences are expanded in it, and escaping a line break causes it to be ignored.

[143] escaped_block(n) ::= block_indicator
escape
explicit_indent?
trailing_line_break
escaped_block_value(n)
/* escaped block leaf */
[144] escaped_block_value(n) ::= escaped_block_line(n)* /* value of escaped block leaf */
[145] escaped_block_line(n) ::= ( indented_escaped_chars(n)
  ( normalized_line_break
  | escaped_line_break ) )
| ( less_indented_empty_line(n)
  normalized_line_break )
/* line of escaped block leaf */
empty: |\
detected: |\
 Escape sequences may be used,
 for example this\nforces a
 line break and th\
 is isn't one. Quotes " '
 and other indicators may be
 freely used, but the \\
 character must be escaped.
explicit: |\2

  This value starts with
  a line break but does
  not end with one.\
# Comments may follow.
4.6.2.4 Chomped Escaped Block

The chomped escaped block style variant combines stripping away the trailing line breaks as in a chomped block and expanding escape sequences as in an escaped block.

[146] chomped_escaped_block(n) ::= block_indicator
escape
chomp_indicator
explicit_indent?
trailing_line_break
chomped_escaped_block_value(n)
/* chomped escaped leaf */
[147] chomped_escaped_block_value(n) ::= ( escaped_block_line(n)*
  chomped_escaped_block_last(n) )?
/* value of chomped escaped leaf */
[148] chomped_escaped_block_last(n) ::= ( indented_escaped_chars(n)
  ( chomped_line_breaks(n)
  | escaped_line_break ) )
| ( less_indented_empty_line(n)
  chomped_line_breaks(n) )
/* final line of chomped escaped leaf */
empty: |\-
detected: |\-
 Trailing line breaks
 are stripped, so this
 value does not end
 with a line break.

escaped: |\-
 Escaped line breaks
 are not chomped so
 this value ends with
 a single line break.\n

ignored: |\-
 It is possible to
 explicitly ignore
 a trailing line break,
 for example in case
 it is an LS or PS
 character.\
# Comments may follow.
4.6.2.5 Plain Folded

The plain folded style variant is identical to the plain block, except that line breaks are subject to line folding.

[149] plain_folded(n) ::= folded_indicator
explicit_indent?
trailing_line_break
plain_folded_value(n)
/* plain folded leaf */
[150] plain_folded_value(n) ::= folded_leading_line_feeds(n)
( plain_folded_line(n)*
  plain_folded_last(n) )?
/* value of plain folded leaf */
[151] plain_folded_line(n) ::= ( indented_plain_chars(n)
| less_indented_empty_line(n) )
folded_internal_line_breaks(n)
/* line of plain folded leaf */
[152] plain_folded_last(n) ::= ( indented_plain_chars(n)
| less_indented_empty_line(n) )
folded_trailing_line_breaks(n)
/* final line of plain folded leaf */
empty: ]
detected: ]
 Line feeds are converted
 to spaces, and the final
 line break series is
 clipped, so this value
 contains no line breaks.
explicit: ]2

  An empty line, either
  at the start, end or
  in the value:

  Is interpreted as a
  line break. Thus this
  value contains three
  line breaks.

# Comments may follow.
4.6.2.6 Chomped Folded

The chomped folded style variant combines stripping away the trailing line breaks as in a chomped block and line folding as in the plain folded style variant.

[153] chomped_folded(n) ::= folded_indicator
chomp_indicator
explicit_indent?
trailing_line_break
chomped_folded_value(n)
/* chomped folded leaf */
[154] chomped_folded_value(n) ::= ( folded_leading_line_feeds(n)
  plain_folded_line(n)* )?
chomped_folded_last(n)
/* value of chomped folded leaf */
[155] chomped_folded_last(n) ::= ( indented_plain_chars(n)
| less_indented_empty_line(n) )
chomped_line_breaks(n)
/* final line of chomped folded leaf */
empty: ]-
detected: ]-
 The final sequence of
 line breaks is chomped,
 so this value contains
 no line breaks.

# Comments may follow.
4.6.2.7 Escaped Folded

The escaped folded style variant combines expanding escape sequences as in an escaped block and line folding as in the plain folded style variant.

[156] escaped_folded(n) ::= folded_indicator
escape
explicit_indent?
trailing_line_break
escaped_folded_value(n)
/* escaped folded leaf */
[157] escaped_folded_value(n) ::= folded_leading_line_feeds(n)
( escaped_folded_line(n)*
  escaped_folded_last(n) )?
/* value of escaped folded leaf */
[158] escaped_folded_line(n) ::= ( indented_escaped_chars(n)
  ( folded_internal_line_breaks(n)
  | escaped_line_break ) )
| ( less_indented_empty_line(n)
  folded_internal_line_breaks(n) )
/* line of escaped folded leaf */
[159] escaped_folded_last(n) ::= ( indented_escaped_chars(n)
  ( folded_trailing_line_breaks(n)
  | escaped_line_break ) )
| ( less_indented_empty_line(n)
  folded_trailing_line_breaks(n) )
/* final line of escaped folded leaf */
empty: ]\
detected: ]\
 Escaped line feeds are not
 converted to a space, so
 this \n is a line feed.
explicit: ]\2

  This value starts with
  a line feed, but doesn't
  end with one even though
  it has a trailing empty
  line.\

# Comments may follow.
4.6.2.8 Chomped Escaped Folded

The chomped escaped folded style variant combines stripping away the trailing line breaks as in a chomped block, expanding escape sequences as in an escaped block, and line folding as in the plain folded style variant.

[160] chomped_escaped_folded(n) ::= folded_indicator
escape
chomp_indicator
explicit_indent?
trailing_line_break
chomped_escaped_folded_value(n)
/* chomped escaped folded leaf */
[161] chomped_escaped_folded_value(n) ::= ( folded_leading_line_feeds(n)
  escaped_folded_line(n)* )?
chomped_escaped_folded_last(n)
/* value of chomped escaped folded leaf */
[162] chomped_escaped_folded_last(n) ::= ( indented_escaped_chars(n)
  ( chomped_line_breaks(n)
  | escaped_line_break ) )
| ( less_indented_empty_line(n)
  chomped_line_breaks(n) )
/* final line of chomped escaped folded leaf */
empty: ]\-
detected: ]\-
 Trailing line feeds are
 chomped, but escaped ones
 are preserved, so this
 value ends with a single
 line feed.\n

# Comments may follow.

4.6.3 In-line

An in-line leaf may be in one of three style variants. The double quoted style variant supports escaping and can be used to represent arbitrary Unicode strings. The single quoted style variant is limited to printable Unicode characters. The simple style variant is further restricted to not contain most indicator characters or leading or trailing space characters.

[163] inline_leaf ::= single_quoted
| double_quoted
| simple
/* in-line leaf styles */
4.6.3.1 Single Quoted

The single quoted style variant is indicated by surrounding ''' characters. Therefore, within a single quoted leaf such characters need to be escaped. No other form of escaping is done, limiting single quoted leaves to printable characters. Also, single quoted leaves may not contain any line break characters.

[164] single_quoted ::= single_quote
single_quoted_char*
single_quote
/* single quoted leaf value */
[165] single_quoted_char ::= escaped_single_quote
| ( line_char -
  single_quote )
/* characters valid in a single quoted leaf */
[166] escaped_single_quote ::= single_quote
single_quote
/* indicates a single quote */
empty: ''
second: '! : \ etc. can be used freely.'
third: 'a single quote '' must be escaped.'
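
The quote-doubling rule can be sketched in Python as shown below. The function names are hypothetical, the set of rejected line break characters is an assumption, and the check for printable characters is left out for brevity.

```python
# Assumed line break characters that a single quoted leaf may not contain.
_LINE_BREAK_CHARS = "\r\n\x85\u2028\u2029"


def quote_single(value):
    """Serialize a value as a single quoted leaf, doubling each quote."""
    if any(ch in _LINE_BREAK_CHARS for ch in value):
        raise ValueError("single quoted leaves may not contain line breaks")
    return "'" + value.replace("'", "''") + "'"


def unquote_single(text):
    """Recover the value of a single quoted leaf."""
    if len(text) < 2 or not (text.startswith("'") and text.endswith("'")):
        raise ValueError("not a single quoted leaf")
    inner = text[1:-1]
    # Any quote left after removing the doubled ones was not escaped.
    if "'" in inner.replace("''", ""):
        raise ValueError("unescaped single quote")
    return inner.replace("''", "'")
```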
4.6.3.2 Double Quoted

The double quoted style variant adds escaping to the single quoted style variant. This is indicated by surrounding '"' characters. Escaping allows arbitrary Unicode characters to be specified at the cost of some verbosity: escaping the printable '\' and '"' characters. It is an error for a double quoted value to contain invalid escape sequences.

[167] double_quoted ::= double_quote
double_quoted_char*
double_quote
/* double quoted leaf value */
[168] double_quoted_char ::= escaped_char
- double_quote
/* characters valid in a double quoted leaf */
empty: ""
second: "! : etc. can be used freely."
third: "a \" or a \\ must be escaped."
fourth: "this value ends with an LF.\n"
4.6.3.3 Simple

The simple style variant is a restricted form of the single quoted style variant. As it has no identifying markers, it may not start or end with white space characters, may not start with most indicators, and may not contain certain indicators. Also, a simple leaf is subject to implicit typing. This can be avoided by providing an explicit transfer method property.

[169] simple ::= simple_1st
( simple_char*
  simple_last )?
/* simple leaf value */
[170] simple_char ::= ( line_char
  - space_indicators
  - inline_indicators )
| ( space_indicators
  line_non_space )
/* non-space characters valid in a simple leaf */
[171] simple_last ::= simple_char
- line_space
/* characters valid at end of a simple leaf */
[172] simple_1st ::= simple_last
- non_space_indicators
/* characters valid at start of a simple leaf */
empty:
second: The value of the previous key is the empty string.
third: 12
fourth: The above entry is an integer.

5 Transfer Methods

A transfer method is the combination of the type family and string format used to serialize a value in a YAML document stream. This section provides a list of common type families and their associated string formats defined under the yaml.org domain.

Every serialization node has, by definition, a transfer method (type family and string format). YAML provides three mechanisms for identifying the transfer method of a node.

Default Typing

By default the parser assigns the string type family to all leaf nodes (except for simple leaves), the map type family to all keyed nodes, and the seq type family to all series nodes.

Implicit Typing

All simple leaves are subject to implicit typing, unless they are annotated with an explicit transfer method property. For each type family, there is a set of implicit string formats, and each such format has a regular expression. The parser compares the leaf value with the list of these regular expressions. If the value matches one of these expressions, it is parsed as if it were explicitly annotated with the appropriate type family and format. It is an error for a value to match more than one such regular expression.

The active set of implicit transfer methods depends upon the application. Regular expressions for implicit string formats must start with '^`' if they are not defined in this specification or accepted into the YAML type repository. Values matching such private implicit transfer methods therefore always begin with the '`' character. This prevents private implicit transfer methods from interfering with public ones.
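
The matching procedure can be sketched in Python as follows. The table holds only a subset of the implicit formats defined later in this section, chosen so that no two entries overlap; the names are hypothetical. The sketch returns None when nothing matches, and what a loader does in that case is left to the caller. The upper bound of the alphabetic range is written as \U0010ffff because that is the largest code point Python accepts.

```python
import re

# (type family, format name, regular expression); a subset for illustration.
_IMPLICIT_FORMATS = [
    ("null", "tilde", r"~"),
    ("int", "dec", r"[-+]?(0|[1-9][0-9]*)"),
    ("int", "oct", r"[-+]?0[0-7]+"),
    ("int", "hex", r"[-+]?0x[0-9a-fA-F]+"),
    ("float", "exp", r"[-+]?[0-9]+\.[0-9]*[eE][-+][0-9]+"),
    ("float", "fix", r"[-+]?[0-9]+\.[0-9]*"),
    ("date", "ymd", r"[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]"),
    ("str", "alpha_first", r"[_a-zA-Z\x80-\U0010ffff].*"),
]
_COMPILED = [(family, name, re.compile(pattern))
             for family, name, pattern in _IMPLICIT_FORMATS]


def resolve_implicit(value):
    """Return (type family, format) for a simple leaf, or None if none match."""
    matches = [(family, name) for family, name, pattern in _COMPILED
               if pattern.fullmatch(value)]
    if len(matches) > 1:
        raise ValueError("value matches more than one implicit format")
    return matches[0] if matches else None
```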

Explicit Typing

A node may be given an explicit transfer method property, specifying the node's type family and optionally its string format. If no format is given, the parser matches the value with the regular expressions of each of the implicit and explicit string formats provided by the type family to determine the specific format used. It is an error for a value to match more than one such regular expression.

Using an explicit transfer method is required when default and implicit typing fail to identify the intended type family and string format for a node. Common cases are handling application-defined types and specifying empty sequences and maps.

Following is a list of common type families and their associated string formats defined under the yaml.org domain. YAML requires support for the sequence, map and string type families. While the other type families are not mandatory, they usually map to native data types in most programming languages, so using them promotes interoperability with other YAML systems.

Additional common type families are defined in the YAML type repository available at https://yaml.org/repository. An application may also use private type families or public type families defined on the basis of some URI or DNS domain name. The exact set of transfer methods used in a document is a part of the document's schema, and is tied to the expected document graph structure, the set of valid map keys, etc.

5.1 Sequence

This type family is used for series nodes unless they are given an explicit transfer method property. Example bindings include the Perl array, Python's list or tuple, and Java's array or vector.

name: https://yaml.org/seq
styles:

Series, keyed by integers, empty leaf.

definition:

Collections indexed by sequential integers starting with zero.

Applying this type family to an empty leaf provides a natural syntax for representing an empty sequence.

# The following is an empty
# top level sequence.
--- !seq
---
# An empty sequence.
empty: !seq
---

In some applications, large sequences may contain only a small number of non-null entries. While it is possible to serialize such sparse sequences using null values, this is awkward. YAML allows serializing such sequences using the mapping style with an explicit sequence type family. The only supported keys are integers, serving as zero-based sequence entry indices.

# The following map style node is
# loaded to a sequence, with
# unspecified entries containing
# a null value.
sparse sequence: !seq
    2: Third entry
    4: ~
# The following sequence node is
# loaded into an identical
# in-memory sequence, which has
# a separate identity.
equal sequence:
 - ~
 - ~
 - Third entry
 - ~
 - ~
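
A loader could expand a sparse sequence as in the Python sketch below. The function name is hypothetical, and the parsed keyed node is assumed to arrive as a dictionary with integer keys, with None standing in for the null value.

```python
def load_sparse_seq(entries):
    """Expand a map of zero-based indices into a list padded with nulls."""
    for index in entries:
        # bool is a subclass of int in Python, so it is excluded explicitly.
        if type(index) is not int or index < 0:
            raise ValueError("sparse sequence keys must be non-negative integers")
    if not entries:
        return []
    result = [None] * (max(entries) + 1)
    for index, value in entries.items():
        result[index] = value
    return result
```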

5.2 Map

This type family is used for keyed nodes unless they are given an explicit transfer method property. Example bindings include the Perl hash, Python's dictionary, and Java's hash table.

name: https://yaml.org/map
styles:

Keyed, series, empty leaf.

definition:

Associative container, where each key is unique in the association and mapped to exactly one value.

Applying this type family to an empty leaf provides a natural syntax for representing an empty map.

# The following is an empty top level map.
--- !map
---
# An empty map.
empty map: !map

If the set of keys of a map happens to be all the integers in the range 0 to some N, YAML allows serializing the map using the series style with an explicit map type family.

# The following series style node is
# loaded to a map.
integer map: !map
 - Value for integer key '0'
 - Value for integer key '1'
 - Value for integer key '2'
# The following keyed style node is
# loaded to an equal in-memory map,
# which has a separate identity.
equal map:
 2: Value for integer key '2'
 0: Value for integer key '0'
 1: Value for integer key '1'
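
The correspondence between the two styles can be sketched in Python as follows. The names are hypothetical, and treating an empty map as not eligible for the series style is an assumption made here, since the text requires the keys to cover a range starting at 0.

```python
def series_to_map(values):
    """Load a series style node tagged !map: entry i gets the integer key i."""
    return dict(enumerate(values))


def can_use_series_style(mapping):
    """Check that the keys are exactly the integers 0 to N for some N."""
    return (
        len(mapping) > 0
        and all(type(key) is int for key in mapping)
        and sorted(mapping) == list(range(len(mapping)))
    )
```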

5.3 String

This type family is used for all leaf styles with the exception of simple leaves, unless they are given an explicit transfer method property. It is also used as the implicit type for all simple leaves starting with an alphabetic character. Note that all non-ASCII characters are assumed to be alphabetic for this purpose. This allows the detection pattern to be independent of the Unicode character properties table.

This type is usually bound to the native language's string or character array construct.

Name: https://yaml.org/str
Styles:

Leaf.

definition:

Unicode strings, a series of zero or more Unicode characters.

formats:

explicit
canonical

any ~= .* /* any sequence of characters */

implicit

alpha_first ~= [_a-zA-Z\x80-\
\Uffffffff].*
/* any sequence of characters starting with an alphabetic character */

Specifying an explicit string type family is required to bypass implicit typing for a simple leaf. The same effect can be achieved by converting it to another leaf style.

# The following leaves are
# loaded to the string
# value '12'.
- !str 12
- '12'
- "12"
- ]
 12
- ]\
 1\
 2
- |-
 12

5.4 Null

The null type family accepts simple leaves with the value '~' and converts them into any native null-like value (e.g., undef in Perl, None in Python). A null value is used to indicate the lack of a value. Note that in most programming languages a map entry with a key and a null value is valid and different from not having that key in the map.

Name: https://yaml.org/null
Styles:

Simple leaf.

definition:

Devoid of value.

formats:

implicit
canonical

tilde ~= ~ /* single tilde character */
first: ~
second:
 - ~
 - Second entry.
 - ~
 - This sequence has 4 entries, two with values.
three:
 This map has three keys,
 only two with values.

5.5 Pointer

The pointer type family accepts a keyed node with a single key, '=', and is loaded into any native pointer-like data type, pointing to the value given for that key (e.g., a hard reference in Perl). Note that this is not necessarily the native data type used to implement alias nodes. For example, in Java aliases are directly supported, but pointers must be emulated using a special class.

Name: https://yaml.org/ptr
Styles:

Keyed.

definition:

A hard reference, explicit memory address.

Perl: |
 $map{YAML} = \"content";
# The following map is loaded
# into a pointer to a text string.
YAML: !ptr
 = : content

5.6 Integer

The integer type family represents arbitrarily sized (finite) mathematical integers. Integers can be formatted using the familiar decimal notation, or may have a leading '0x' to signal hexadecimal, or a leading '0' to signal an octal base. Leaves of this type should be represented by a native integer data type, if possible. However, there are cases where an integer provided may overflow the native type's storage capability. In this case, the loader should find some manner to round-trip the integer, perhaps as a string value. In general, integers representable using 32 binary digits should safely round-trip through most systems.

Name: https://yaml.org/int
Styles:

Simple leaf.

definition:

Mathematical integers.

formats:

canonical

int ~= 0|[-]?[1-9][0-9]* /* canonical integer format */

implicit

dec ~= [-+]?(0|[1-9][0-9]*) /* base 10 signed decimal integer format */

implicit

oct ~= [-+]?0[0-7]+ /* base 8 integer format */

implicit

hex ~= [-+]?0x[0-9a-fA-F]+ /* base 16 integer format */
canonical: 12
decimal: +12
octal: 014
hexadecimal: 0xC
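
The three implicit formats can be loaded as in the Python sketch below, which reuses the regular expressions given above. The function names are hypothetical. Python integers have arbitrary precision, so the overflow case discussed above does not arise in this sketch.

```python
import re

_DEC = re.compile(r"[-+]?(0|[1-9][0-9]*)")
_OCT = re.compile(r"[-+]?0[0-7]+")
_HEX = re.compile(r"[-+]?0x[0-9a-fA-F]+")


def load_int(text):
    """Convert a leaf in the dec, oct or hex format to a native integer."""
    if _DEC.fullmatch(text):
        return int(text, 10)
    if _OCT.fullmatch(text):
        return int(text, 8)
    if _HEX.fullmatch(text):
        return int(text, 16)
    raise ValueError("not a valid integer format: %r" % text)


def dump_int(number):
    """Serialize an integer in the canonical format."""
    return str(number)
```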

5.7 Float

The floating point type family handles approximations to real numbers. This should be loaded to some native float data type. The loader may choose from a range of such native data types according to the size and accuracy of the floating point value. The valid range and accuracy depend on the loader, though 32 bit IEEE floats should be safe.

Name: https://yaml.org/float
Styles:

Simple leaf.

definition:

Floating point approximation to real numbers.

formats:

canonical

sci ~= [-]?[0-9]\.([0-9]*[1-9])\
?e[-+](0|[1-9][0-9]*)
/* canonical (scientific notation) floating point format */

implicit

exp ~= [-+]?[0-9]+\.[0-9]*\
[eE][-+][0-9]+
/* exponential notation floating point format */

implicit

fix ~= [-+]?[0-9]+\.[0-9]* /* fixed point notation floating point format */
canonical: 1.23e-1
exponential: 12.30e-02
fixed: 0.1230

5.8 Date

YAML supports a single date format which is a strict subset of the ISO 8601 standard and the formats proposed by the W3C note on datetime.

Name: https://yaml.org/date
Styles:

Simple leaf.

definition:

Gregorian date.

formats:

implicit
canonical

ymd ~= [0-9][0-9][0-9][0-9]\
-[0-9][0-9]-[0-9][0-9]
/* Date in YYYY-MM-DD format */
date: 2001-12-14

5.9 Time

YAML supports a single time of day format which is one of many formats defined in the ISO 8601 standard. The format chosen was motivated by the W3C note on this issue.

Name: https://yaml.org/time
Styles:

Simple leaf.

definition:

Time of day.

formats:

implicit
canonical

time ~= [0-9][0-9]:[0-9][0-9]\
:[0-9][0-9](\.[0-9]*[1-9])?
/* Canonical time of day HH:MM:SS.SS (24 hours) format */

implicit

hms ~= [0-9][0-9]:[0-9][0-9]\
:[0-9][0-9](\.[0-9]*)?
/* Generic time of day HH:MM:SS.SS (24 hours) format */
canonical: 21:59:43.1
hms: 21:59:43.10

5.10 Timestamp

A timestamp denotes a particular point in time, which is a combination of a date and a time of day. Hence the format for a timestamp builds upon the formats for a date and a time. The canonical combination chosen is a subset of a valid ISO 8601 format. A similar, more readable format is also supported.

Name: https://yaml.org/timestamp
Styles:

Simple leaf.

definition:

A point in time.

formats:

canonical

timestamp ~= [0-9][0-9][0-9][0-9]-\
[0-9][0-9]-[0-9][0-9]T\
[0-9][0-9]:[0-9][0-9]:\
[0-9][0-9](\.[0-9]*[1-9])?Z
/* Canonical specific ISO 8601 format based on UTC */

implicit

ymdhmsz ~= [0-9][0-9][0-9][0-9]-\
[0-9][0-9]-[0-9][0-9]T\
[0-9][0-9]:[0-9][0-9]:\
[0-9][0-9](\.[0-9]*)?\
(Z|[-+][0-9][0-9](:[0-9][0-9])?)
/* A valid ISO 8601 timestamp format variant */

implicit

ymd_hms_z ~= [0-9][0-9][0-9][0-9]-\
[0-9][0-9]-[0-9][0-9] \
[0-9][0-9]:[0-9][0-9]:\
[0-9][0-9](\.[0-9]*)? \
(Z|[-+][0-9][0-9](:[0-9][0-9])?)
/* Space separated (non ISO 8601) format for enhanced readability */
canonical: 2001-12-15T02:59:43.1Z
valid iso8601: 2001-12-14T21:59:43.10-05:00
space separated: 2001-12-14 21:59:43.10 -05:00
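
The three examples above denote the same instant, which the following Python sketch makes explicit by loading each of them into a UTC datetime. The function name is hypothetical. For brevity the sketch uses a single pattern that is slightly more lenient than the three formats above, and fractions finer than a microsecond are truncated.

```python
import re
from datetime import datetime, timedelta, timezone

_TIMESTAMP = re.compile(
    r"([0-9]{4})-([0-9]{2})-([0-9]{2})"    # date
    r"[T ]"                                # 'T' or a space
    r"([0-9]{2}):([0-9]{2}):([0-9]{2})"    # time of day
    r"(?:\.([0-9]*))?"                     # optional fraction of a second
    r" ?(Z|[-+][0-9]{2}(?::[0-9]{2})?)"    # time zone
)


def load_timestamp(text):
    """Convert a timestamp leaf to a timezone-aware datetime in UTC."""
    match = _TIMESTAMP.fullmatch(text)
    if match is None:
        raise ValueError("not a valid timestamp: %r" % text)
    year, month, day, hour, minute, second = (
        int(group) for group in match.group(1, 2, 3, 4, 5, 6)
    )
    fraction = match.group(7) or ""
    microsecond = int((fraction + "000000")[:6])
    zone = match.group(8)
    if zone == "Z":
        offset = timedelta(0)
    else:
        sign = -1 if zone[0] == "-" else 1
        minutes = int(zone[4:6]) if len(zone) > 3 else 0
        offset = sign * timedelta(hours=int(zone[1:3]), minutes=minutes)
    local = datetime(year, month, day, hour, minute, second, microsecond,
                     tzinfo=timezone(offset))
    return local.astimezone(timezone.utc)
```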

5.11 Binary

The binary type family accepts the base64 format and deserializes it into some native binary data type (e.g., byte[] in Java). This is the recommended way to store such data in YAML files. Note however that many forms of binary data have internal structure which may benefit from being represented as YAML nodes (e.g. the Java serialization format).

Name: https://yaml.org/binary
Styles:

Leaf.

definition:

Binary data, a series of zero or more octets (8 bit values).

formats:

canonical

binary ~= Clean base64 /* Base64 encoded data without any white space characters */

canonical

base64 ~= Generic base64 /* Base64 encoded data as per RFC2045 */
canonical: !binary ]\
 R0lGODlhDAAMAIQAAP//9/X17unp5WZmZgAAAOf\
 n515eXvPz7Y6OjuDg4J+fn5OTk6enp56enmlpaW\
 NjY6Ojo4SEhP/++f/++f/++f/++f/++f/++f/++\
 f/++f/++f/++f/++f/++f/++f/++SH+Dk1hZGUg\
 d2l0aCBHSU1QACwAAAAADAAMAAAFLCAgjoEwnuN\
 AFOhpEMTRiggcz4BNJHrv/zCFcLiwMWYNG84Bww\
 EeECcgggoBADs=
base64: !binary ]
 R0lGODlhDAAMAIQAAP//9/X17unp5WZmZgAAAOf
 n515eXvPz7Y6OjuDg4J+fn5OTk6enp56enmlpaW
 NjY6Ojo4SEhP/++f/++f/++f/++f/++f/++f/++
 f/++f/++f/++f/++f/++f/++f/++SH+Dk1hZGUg
 d2l0aCBHSU1QACwAAAAADAAMAAAFLCAgjoEwnuN
 AFOhpEMTRiggcz4BNJHrv/zCFcLiwMWYNG84Bww
 EeECcgggoBADs=
description: ]
 The binary value above is a tiny arrow
 encoded as a gif image.
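
Loading either format can be sketched in Python with the standard base64 module. The function names are hypothetical. White space is removed first, so the same routine accepts the clean form and a generic form whose folded line breaks have become spaces.

```python
import base64


def load_binary(text):
    """Decode base64 content, ignoring any white space inside it."""
    clean = "".join(text.split())
    return base64.b64decode(clean, validate=True)


def dump_binary(data):
    """Encode octets as base64 without any white space characters."""
    return base64.b64encode(data).decode("ascii")
```

For example, the first eight characters of the value above, R0lGODlh, decode to the six octets of the GIF signature GIF89a.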

5.12 Special Keys

The special key type family is used for special YAML defined items, which are used as map keys to denote structural information.

Name: https://yaml.org/special
Styles:

Simple leaf (or none).

definition:

Special mapping keys.

formats:

implicit
canonical

special ~= =|// /* serializable special keys */

virtual
canonical

virtual ~= !|\||&|\* /* non serializable special keys */

All special keys stand for special in-memory values which are different from any value in any other type family. Specifically, these special in-memory values must not be implemented as string values.

The '=' key is used to denote the "default value" of a map. In some cases, it is useful to migrate a schema so that a scalar value is replaced with a collection. A processor may present a "scalar value" method which provides the value directly if the node is a scalar, or, returns the value of this special key if the node is a collection. If applications only ask for the scalar value, then the schema may freely grow over time replacing scalar values with richer data constructs without breaking older processing systems.
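
A "scalar value" method of this kind might look as follows in Python. This is a hedged sketch, not a prescribed API: the SpecialKey class, the DEFAULT constant and the scalar_value function are hypothetical names, maps are modelled as dict, and only maps are handled as collections.

```python
class SpecialKey:
    """An in-memory special key; deliberately not a string value."""

    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return f"SpecialKey({self.name!r})"


# Hypothetical in-memory value standing for the '=' special key.
DEFAULT = SpecialKey("=")


def scalar_value(node):
    """Return a scalar node as is, or the '=' entry of a map node."""
    if isinstance(node, dict):
        return node[DEFAULT]  # raises KeyError if the map has no default
    return node


# Older data stored a plain scalar; newer data stores a richer map.
old_color = "red"
new_color = {DEFAULT: "red", "shade": "dark"}
print(scalar_value(old_color), scalar_value(new_color))
```

Because DEFAULT is a distinct object rather than the string "=", a map with an ordinary quoted "=" key is not mistaken for one carrying a default value.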

The '//' key is used to attach a note or persistent comment to a map. A simple filter can remove these notes before reaching the application, while allowing such comments to survive round-trips and to be manipulated as normal data when necessary.
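
Such a filter can be sketched in a few lines of Python. The names SpecialKey, NOTE and strip_notes are hypothetical, and maps and sequences are modelled as dict and list purely for illustration.

```python
class SpecialKey:
    """An in-memory special key; deliberately not a string value."""

    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return f"SpecialKey({self.name!r})"


# Hypothetical in-memory value standing for the '//' special key.
NOTE = SpecialKey("//")


def strip_notes(node):
    """Return a copy of node with every '//' entry removed, recursively."""
    if isinstance(node, dict):
        return {k: strip_notes(v) for k, v in node.items() if k is not NOTE}
    if isinstance(node, list):
        return [strip_notes(item) for item in node]
    return node


point = {NOTE: "This is the center point.", "x": 12, "y": 3}
print(strip_notes(point))
```

An application that wants to manipulate the notes simply skips the filter and reads the NOTE entries as ordinary data.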

The rest of the keys should not be used in serialized YAML documents. Their names are merely a convention for representing the appropriate special in-memory values. Hence these keys are called "virtual keys".

Virtual keys are used when a YAML parser encounters a valid YAML value of an unknown transfer method. For a schema-specific application, this is not different from encountering any other valid YAML document which does not satisfy the schema. Such an application may safely use a parser which rejects any value of any unknown transfer method, or discards the transfer method property with an appropriate warning and parses the value as if the property was not present.

For a schema-independent application (for example, a hypothetical YAML pretty print application), this is not an option. Parsers used by such applications should encode the value instead. This may be done by wrapping the value in a map containing virtual special keys. The '!' key denotes the unsupported type family, and the '|' key denotes the format used. In some cases it may be necessary to encode anchors and alias nodes as well. The '&' and '*' keys are used for this purpose.

This encoding should be reversed on output, allowing the application to safely round-trip any valid YAML document. In-memory, the encoded data may be accessed and manipulated in a standard way using the three basic data types (map, sequence and string), allowing limited processing to be applied to arbitrary YAML data.

annotated text:
 - This text contains
 -
  = : colored
  color : red
 - characters.
"!": These three keys
"&": had to be quoted
"=": and are normal strings.
# NOTE: the following encoded node
# should NOT be serialized this way.
encoded node :
 !special '!' : '!type'
 !special '&' : 12
 = : value
# The proper way to serialize the
# above structure is as follows:
node : !!type &12 value
commented: !!point
 //: This is the center point.
 x : 12
 y : 3
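
The wrapping and unwrapping of a value with an unknown transfer method, as in the "encoded node" example above, can be sketched in Python as follows. This is an illustration under several assumptions: the names SpecialKey, TYPE, ANCHOR, DEFAULT, encode_unknown and emit are hypothetical, only scalar values are handled, the '|' format key and alias nodes are left out, and the output syntax is inferred from the single example rather than from the productions.

```python
class SpecialKey:
    """An in-memory special key; deliberately not a string value."""

    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return f"SpecialKey({self.name!r})"


# Hypothetical in-memory values for the '!', '&' and '=' special keys.
TYPE = SpecialKey("!")
ANCHOR = SpecialKey("&")
DEFAULT = SpecialKey("=")


def encode_unknown(type_family, value, anchor=None):
    """Wrap a value of an unsupported type family in a map of special keys."""
    node = {TYPE: type_family, DEFAULT: value}
    if anchor is not None:
        node[ANCHOR] = anchor
    return node


def emit(node):
    """Reverse the encoding on output; plain scalars are emitted unchanged."""
    if not (isinstance(node, dict) and TYPE in node):
        return str(node)
    parts = ["!" + node[TYPE]]
    if ANCHOR in node:
        parts.append("&" + str(node[ANCHOR]))
    parts.append(str(node[DEFAULT]))
    return " ".join(parts)


encoded = encode_unknown("!type", "value", anchor=12)
print(emit(encoded))
```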

6 Changes From Other Versions

Changes from the 10 Mar 2002 Draft
Changes from the 20 Feb 2002 Draft
Changes from the 16 Feb 2002 Draft
Changes from the 10 Feb 2002 Draft
Changes from the 10 Dec 2001 Draft
Changes from the 11 Nov 2001 Draft
Changes from the 04 Nov 2001 Draft
Changes from the 12 Aug 2001 Draft
Changes from the 01 Aug 2001 Draft
Changes from the 31 Jul 2001 Draft
Changes from the 22 Jul 2001 Draft
Changes from the 23 Jun 2001 Draft
Changes from the 16 Jun 2001 Draft
Changes from the 09 Jun 2001 Draft
Changes from the 26 May 2001 Draft

Changes From The 10 Mar 2002 Draft

Anchors for aliases

Are now forbidden.

Timestamp format

Fixed a problem with the regexp. Allow both a valid ISO8601 format (using 'T') and a space separated format (for readability).

Productions

Broke node productions into smaller bits.

Changes From The 20 Feb 2002 Draft

YAC#22

The #TAB directive is now required to allow tabs in indentation.

Changes From The 16 Feb 2002 Draft

Corrected examples

Floating point numbers in all examples now start with '0'.

Wording improvements

In various places.

Empty line indentation

Is now indent(<=n) instead of the previous indent(n)?

Changes From The 10 Feb 2002 Draft

YAC#20

The yaml: scheme has been replaced by an http: one. Mapping of XML namespaces is now done using '$' instead of ';'. The prefix indicator is now '^' and the format indicator is now '|'.

Changes From The 10 Dec 2001 Draft

Transfer Method

Is now split to type family and format; type family is now a URI.

Date/Time Types

Were added.

Tree Model

Now includes the format.

Wording changes

Throughout the whole spec.

Format in Transfer Method

Is now separated by a '`'.

YAC List

Removed the Probable changes section; it is now replaced by the YAC list.

YAC#3, YAC#5

Added a uniform URI based namespace scheme.

YAC#6

Unrecognized explicitly typed implicit (simple) leafs are allowed.

YAC#7

C1 Control codes are explicitly forbidden.

YAC#10

In-series in-line syntax was modified according to the flexible indentation scheme (YAC needs to be updated).

YAC#11

Rename implicit leaves to simple leaves.

YAC#12

Throwaways are now allowed everywhere. Blank lines are comments. There are ambiguity problems in chomped leaf values.

YAC#13

Indentation is now generic (flexible).

YAC#14

Nested leaf format is now built out of three orthogonal properties.

YAC#15

Top level nodes can be in-line.

YAC#16

A '--- #YAML:1.0' is assumed if there's no header.

YAC#18

YAML Ain't Markup Language

Changes From The 11 Nov 2001 Draft

Quoted Strings

Are now implemented as a text implicit transfer format. This slightly changed the definition of an escaped leaf so that the two would be equivalent.

Relative DNS type families

Are now supported using the simplest form only.

Syntax/Grammar/Formatting

Were fixed according to Brian's inputs.

Block

Is now chomped using '||' rather than '|-' for consistency.

Document Header

Can now be anything starting with '--'; it is therefore required before the first document in a multi-document stream.

Transfer Methods

Can now accept any printable characters, not just words. '!' now means 'force implicit typing'.

New Scalar Styles

We now have the following: | || \ \\ ' " implicit.

Treat sequence/map as Collection Styles

In productions and in information model.

Add Structured Keys

Using a key indicator (done - Oren).

Add Examples

Added a detailed examples section to the introduction to better acquaint the user so that the spec can proceed with some basic knowledge.

In-line maps/sequences

Are now supported. Empty maps/sequences are a natural special case.

Minor Changes

Made list of prior versions shorter.

Moved list of changes down... it was cluttering the top of the spec.

Information model and Preview

Completely rewritten.

Changes From The 04 Nov 2001 Draft

Polish

Minor wording fixes, added internal links, etc.

List

Was renamed to "sequence".

Separator

Was changed to "---" instead of "----".

Indentation

Was changed to one space instead of one tab.

Base64

Is no longer an implicit type. The surrounding '[=...=]' are kept, however, in case we change our mind later (e.g., if we introduce pipelining). The type was renamed to "binary" to stress its class rather than the encoding used.

Float

Was renamed to "real" to decouple it from specific in-memory representation. Mathematicians may object :-)

Length

Was removed from the sequence map.

Type vs. Class

Added some wording to clarify the difference. Most likely this will need to be changed once we settle the pipelining issue.

Productions

Were completely overhauled, again, to accommodate the new semantics.

Next Line Scalars

Now have two separate indicators, one for quoted and one for unquoted values.

Duplicate keys

Are now an error. The parser may ignore the second occurrence with a warning.

Changes From The 12 Aug 2001 Draft

Indentation

Has been changed to use tabs instead of spaces.

Throwaway comments

Were added. The persistent comment key was changed to '//'.

Indicators

Were changed. '-' now signifies a list entry and '\' signifies a next-line leaf value. '@' and '%' are no longer necessary (they may be if we ever support map/list keys). As a result no lookahead is ever required.

Multiple documents

Are now possible in a single file (again), using '----' as a separator.

Wording

Has been changed in numerous locations, hopefully to make it clearer. There was also some shuffling of the text sections to remove redundancy.

Productions

Were thoroughly overhauled and therefore undoubtedly contain new bugs. Also, all the shorthand production names were replaced by long ones to improve readability.

Indicator keys

Are no longer allowed. Structure keys are used instead, where some have only an in-memory representation.

Map/List keys

Are no longer allowed. This may have to be revisited when Perl 6 comes out.

Deep References

Are still supported but as an explicit type rather than as a hack.

Types List

Has been shrunk to only the common types, with a reference to yaml.org for a fuller list of types. The three core types were added as required types.

Type vs. Kind

This distinction was inserted explicitly into the text, with several examples to drive the point home.

Changes From The 01 Aug 2001 Draft

Character Set

Is now defined as simply printable Unicode characters without explicit ranges. This makes the spec resistant to the evolution of the Unicode spec.

Reserved Indicators

The set of such indicators has been minimized. There is now a conflict between reserving them for future use and allowing people to use them as markers for implicit leaf types.

Simple Scalar

Has been renamed to unquoted leaf.

Generic Model

Has been generalized to allow for typed nodes.

Implicit Typing

Has been added with an assortment of suggested types.

General Keys

Keys can now be any nodes to allow for Java serialization.

Multi-level references

Are now supported for Perl serialization.

Changes From The 31 Jul 2001 Draft

Simple Scalar and End Of Lines

Moved eol productions to the end, rather than the start, of most productions. The wording and productions for the simple leaf were fixed to match each other and the intended semantics. The simple leaf example set was enhanced to clarify the proper interpretation.

Empty Document

Both empty top level maps and no top level maps are now allowed, and hence so are empty documents.

Changes From The 22 Jul 2001 Draft

Thanks to Joe Lapp for reviewing the 22 Jul 2001 draft and recommending these changes.

Phrasing fixes

Fixed phrasing in the abstract, and sections 1.3, 2.1, 2.3.1, 2.4.3, 2.4.4, 2.4.5, 2.4.6 and 2.5.3.

Production fixes

Fixed productions: added productions 47 and 59, fixed productions 57, 58, 60 and 64 (production numbers in the 22 Jul 2001 draft are off by one in some cases). Most are bug fixes. Actual changes include allowing for empty lines surrounding a top level map, allowing an optional trailing separator line, and forbidding annotations which have no sensible semantics (anchor to null, anchor to a reference, shorthand for a reference).

Changes From The 23 Jun 2001 Draft

Merge Spec

Due to the decision to leave all API related issues outside the core spec, the spec has been re-merged into a single file, covering just what used to be the introduction and serialization sections of the previous specs.

Character Encodings

The spec now refers only to the Unicode standard. Due to the efforts by the Unicode and ISO/IEC 10646 groups, both standards are in almost complete agreement. The additional features provided by the ISO/IEC standard are rarely used in practice, while Unicode is simpler and is more widely supported by existing languages and systems.

Strict Indentation

Indentation is now a strict 4 spaces per level. This allows for the new whitespace policy and the new block notation.

Shorthand Notation

The spec introduces a shorthand notation for attaching special keys to any node kind (converting it to a map if necessary). This will need more work.

Null Nodes

Null nodes have finally been added, after somehow eluding all previous versions.

Bullet Lists

Change the * optional prefix for leaf list entries to a mandatory : and therefore remove the special name "bulleted list entries".

Simplify Keys

Multi-line simple keys are now out. The door is open for re-introducing them, however.

Change Whitespace Policy

White space folding has been replaced by line break folding. White space is now always significant, except for indentation and for separation of structure tokens.

Block Scalar Syntax

The syntax for block leaves has been replaced by a more elegant one.

Changes From The 16 Jun 2001 Draft

Split Spec

The spec is now separated into several files. This allows different versions of the spec to share the same version of unchanged sections, and makes it easier to refer to a particular version of important pieces of the spec such as serialization and interfaces. All the HTML files use the same shared CSS file. Cross references between the separate parts of the spec are now relative, though references to older versions are absolute and refer to the main site.

Cyclical Graph

Change the wording on the information model to allow for graphs with cycles. The alternative is to define the anchor semantics in such a way that would preclude cycles.

Null Character Escape

The escape sequence \z was added to allow convenient escaping of the ASCII zero (null) character.

Remove Binary Scalars

The information model now contains just one type of leaf. The special syntax for binary leaves has been removed. This functionality will be re-added in the form of a color.

Remove Class Shorthand

The syntax no longer supports the !class syntax. This functionality will be re-added in the form of a color.

Bullet Lists

Change the optional prefix for leaf list entries to * and rename such entries to "bulleted list entries".

Make Keys More Scalars-Compatible

Allow for multi-line simple keys and unify the description of leaf keys and values where it makes sense.

HTML Tidying

All the HTML pages have gone through Tidy. Also, all the HTML files have been run through an HTML validation service and a CSS validation service. Broken links and spelling were checked using another online HTML validator. This needs to be repeated for all future drafts.

Changes From The 09 Jun 2001 Draft

Relationship with MIME

Beyond using base64 for binary leaves, no additional special relationship with MIME is expected. Hence references to the MIME and mail RFCs were moved from section 1.1 ("required reading") to section 1.2 ("background material").

Strict Indentation

Indentation is now completely strict for all leaf styles. Also, the productions were changed to use consistent semantics for the indentation level parameter.

List Scalar Prefixes

A list leaf entry may be prefixed by an optional : indicator to improve readability of multi-line simple leaf values.

Anchor Semantics

Leading zeros are now ignored for comparing anchor strings.

No Empty Line At Start

The document production was fixed so as not to require an empty line at the start of a document.

Character Escapes

The set of character escapes is now maximal (including the rare \e escape for the useful ASCII ESC character). Also, it is now possible to "escape" a line break in a quoted string (the previous drafts were inconsistent at this point).

32 Bit Characters

The current draft allows such characters, and includes a specialized escaping format ('\Uxxxxxxxx') to support them.

Changes From The 26 May 2001 Draft

Changes Section

The changes section was added for easier comparison of different versions. The final draft will not contain this section.

Class Indicator

The indicator was changed from # to ! to allow for # to be used for comments.

No Empty Line At End

The document production was fixed so as not to require an empty line at the end of a document.

Strict Indentation

Indentation in quoted strings and binary blocks is now strict to ensure readability.

Productions

Problems in the productions were fixed, especially where related to white space issues and formatting of the result.

BOM Comment

The link to the Unicode FAQ was moved to section 2.2.2.

Binary Scalars

The information model now distinguishes between text and binary leaves.