Copyright © 2001 Brian Ingerson, Clark Evans & Oren Ben-Kiki, all rights reserved. This document may be freely copied provided that it is not modified.
This specification is a working draft and reflects consensus reached by the members of the yaml-core mailing list. Any questions regarding this draft should be raised on this list. This is a draft and changes are expected. Therefore, implementers should follow this mailing list closely.
YAML(tm) (rhymes with "camel") is a straightforward machine parsable data serialization format designed for human readability and interaction with scripting languages such as Perl and Python. YAML is optimized for data serialization, configuration settings, log files, Internet messaging and filtering. This specification describes the YAML information model and serialization format. Together with the Unicode standard for characters, it provides all the information necessary to understand YAML Version 1.0 and construct computer programs to process it.
1 Introduction
1.1 Goals
1.2 Origin
1.3 Relation to XML
1.4 Terminology
2 Preview
2.1 Collections
2.2 Structures
2.3 Styles
2.4 Type Family
2.5 Full Length Examples
3.1 General Concepts
3.1.1 Type Family
3.1.2 String Format
3.2 Graph Model
3.2.1 Node
3.2.2 Scalar
3.2.3 Identity
3.2.4 Node set
3.2.5 Collection
3.2.6 Equality
3.2.7 Documents Stream
3.3 Tree Model
3.3.1 Tree Node
3.3.2 Leaf
3.3.3 Alias
3.3.4 Pair
3.3.5 Branch
3.3.6 Ordering
3.4 Syntax Model
3.4.1 Style
3.4.2 Comment
3.4.3 Directive
4.1 Characters
4.1.1 Character Set
4.1.2 Encoding
4.1.3 Indicators
4.1.4 Line Breaks
4.1.5 Miscellaneous Characters
4.2 Line Processing
4.2.1 Indentation
4.2.2 Throwaway comments
4.3 YAML Stream
4.3.1 Header
4.3.2 Directive
4.3.3 Serialization Node
4.3.4 Node Property
4.3.5 Transfer Method
4.3.6 Anchor
4.4 Alias
4.5 Branch
4.5.1 Series
4.5.2 Keyed
4.6 Leaf
4.6.1 Nested Properties
4.6.1.1 Folding
4.6.1.2 Escaping
4.6.1.3 Chomping
4.6.1.4 Explicit Indentation
4.6.2 Nested
4.6.2.1 Plain Block
4.6.2.2 Chomped Block
4.6.2.3 Escaped Block
4.6.2.4 Chomped Escaped Block
4.6.2.5 Plain Folded
4.6.2.6 Chomped Folded
4.6.2.7 Escaped Folded
4.6.2.8 Chomped Escaped Folded
4.6.3 In-line
4.6.3.1 Single Quoted
4.6.3.2 Double Quoted
4.6.3.3 Simple
5 Transfer Methods
5.1 Sequence
5.2 Map
5.3 String
5.4 Null
5.5 Pointer
5.6 Integer
5.7 Float
5.8 Date
5.9 Time
5.10 Timestamp
5.11 Binary
5.12 Special Keys
YAML Ain't Markup Language, abbreviated YAML, is a human-readable data serialization format and processing model. This text describes the class of data objects called YAML document streams and partially describes the behavior of computer programs that process them.
YAML document streams encode into a serialized form the native data constructs of modern scripting languages. Strings, arrays, hashes, and other user-defined data types are supported. A YAML document stream consists of a sequence of characters, some of which are considered part of the document's content, and others which are used to indicate structure within the information stream.
A YAML processor is a software module that is used to manipulate YAML information. A processor may perform multiple functions, such as parsing a YAML serialization into a series of events, loading these events into a native language representation, dumping a native representation into a series of events, and emitting these events into a serialized form. It is assumed that a YAML processor does its work on behalf of another module, called an application. This specification describes the required behavior of a YAML processor. It describes how a YAML processor must read or write YAML document streams and the information structures it must provide to or obtain from the application.
The design goals for YAML are:
YAML documents are very readable by humans.
YAML interacts well with scripting languages.
YAML uses host languages' native data structures.
YAML has a consistent information model.
YAML enables stream-based processing.
YAML is expressive and extensible.
YAML is easy to implement.
YAML was designed with experience gained from the construction and deployment of Brian Ingerson's Perl module Data::Denter. YAML has also enjoyed much markup language critique from SML-DEV list participants and builds upon the experiences with the Minimal XML and Common XML specifications.
YAML integrates and builds upon structures and concepts described by C, Java, Perl, Python, RFC0822 (MAIL), RFC1866 (HTML), RFC2045 (MIME), RFC2396 (URI), SAX, SOAP and XML.
YAML's core type system is based on the serialization requirements of Perl. YAML directly supports both scalar values (string, integer) and collections (array, hash). Support for common types enables programmers to use their language's native data constructs for YAML manipulation, instead of requiring a special document object model (DOM).
Like XML's SOAP, the YAML serialization supports native graph structures through a rich alias mechanism. Also like SOAP, YAML provides for application-defined types. This allows YAML to serialize rich data structures required for modern distributed computing.
YAML provides unique global type names using a namespace mechanism inspired by Java's DNS based package naming convention and XML's URI based namespaces.
YAML's block scoping is similar to Python's. In YAML, the extent of a node is indicated by its column. YAML's block leaf leverages this by enabling formatted text to be cleanly mixed within an aggregate structure without troublesome escaping. Further, YAML's block indenting provides for easy inspection of the document's structure.
Motivated by HTML's end-of-line normalization, YAML's folded leaf introduces a unique method of handling whitespace. In YAML, single line breaks may be folded into a single space. This technique allows for paragraphs to be word-wrapped without affecting the canonical form of the content.
YAML's escaped leaf uses familiar C-style escape sequences. This enables ASCII representation of non-printable or 8-bit (ISO 8859-1) characters such as '\x3B'. 16-bit Unicode and 32-bit (ISO/IEC 10646) characters are supported with escape sequences such as '\u003B' and '\U0000003B'.
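As a quick check of the three escape widths mentioned above, the following Python sketch shows that each spelling denotes the same character. It relies on the fact that Python string literals happen to use the same C-style notation; it is only an illustration and does not parse YAML.

```python
# Python string literals use the same C-style escape notation,
# so they can illustrate the three widths (this is not a YAML parser).
eight_bit = "\x3B"             # 8-bit escape: two hex digits
sixteen_bit = "\u003B"         # 16-bit escape: four hex digits
thirty_two_bit = "\U0000003B"  # 32-bit escape: eight hex digits

# All three spell the same code point, U+003B (semicolon).
same = eight_bit == sixteen_bit == thirty_two_bit == ";"
print(same, hex(ord(eight_bit)))
```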
The syntax of YAML was motivated by Internet Mail (RFC0822). Further, YAML borrows the document separator from MIME (RFC2045). YAML's top level production is a stream of independent documents; ideal for distributed processing systems.
YAML was designed to have an incremental interface which includes both a pull-style input stream and a push-style (SAX-like) output stream interfaces. Together this enables YAML to support the processing of large documents, such as a transaction log, or continuous streams, such as a feed from a production machine.
There are many differences between YAML and the eXtensible Markup Language (XML). XML was designed to be backwards compatible with Standard Generalized Markup Language (SGML) and thus had many design constraints placed on it that YAML does not share. Also XML, inheriting SGML's legacy, is designed to support structured documents, where YAML is more closely targeted at messaging and native data structures. Where XML is a pioneer in many domains, YAML is the result of many lessons from the XML community.
The YAML and XML information models are starkly different. In XML, the primary construct is an attributed tree, where each element has an ordered, named list of children and an unordered mapping of names to strings. In YAML, the primary graph constructs are keyed collections (natively stored as a hash or array) and scalar values (string, integer, floating point). This difference is critical since YAML's model is directly supported by native data structures in most modern programming languages, where XML's model requires mapping conventions, or an alternative programming component (e.g. a document object model).
The terminology used to describe YAML is defined in the body of this specification. The terms defined in the following list are used in building those definitions and in describing the actions of a YAML processor:
may: Conformant YAML streams and processors are permitted to but need not behave as described.
should: Conformant YAML texts and processors are encouraged to behave as described, but may do otherwise if a warning message is provided to the user and any deviant behavior requires conscious effort to enable (i.e., a non-default setting).
must: Conformant YAML texts and processors are required to behave as described, otherwise they are in error.
error: A violation of the rules of this specification; results are undefined. Conforming software must detect and report an error and may recover from it.
This section provides a quick glimpse into the expressive power of YAML without going into too much detail. It is not expected that the first-time reader grok all of the examples. Instead these selections are used as motivation for the following sections.
YAML collections allow for aggregation of data. There are two primary types of collections which YAML supports, sequences and mappings. Most tree structures can be constructed by nesting collections.
YAML streams can be commented and separated into multiple documents. To allow for graph serialization, YAML has a built-in alias mechanism.
---
name: Mark McGwire
hr: 65
avg: 0.278
rbi: 147
---
name: Sammy Sosa
hr: 63
avg: 0.288
rbi: 141

# Ranking of players by
# season home runs.
---
- Mark McGwire
- Sammy Sosa
- Ken Griffey

# Home runs
hr:
  # 1998 record
  - Mark McGwire
  - Sammy Sosa
# Runs batted in
rbi:
  - Sammy Sosa
  - Ken Griffey

# Home runs
hr:
  # 1998 record
  - Mark McGwire
  - &SS Sammy Sosa
# Runs batted in
rbi:
  - *SS
  - Ken Griffey
Besides the simple in-line scalars used above, YAML has support for several nested and quoted scalar styles. For small sequences and mappings, an in-line style helps make YAML easy to author.
--- ]
  Mark McGwire's
  year was crippled
  by a knee injury.

--- |
  \/|\/|
  / | |_

--- ]\
  Sosa completed another fine season. \u263A

name: Mark McGwire
occupation: baseball player
comments: ]
  Mark set a major league home run record in 1998.

years: "1998\t1999\t2000\n"
msg: "Sosa did fine. \u263A"

- ' \/|\/| '
- ' / | |_ '

- [ name        , hr, avg   ]
- [ Mark McGwire, 65, 0.278 ]
- [ Sammy Sosa  , 63, 0.288 ]

Mark McGwire: {hr: 65, avg: 0.278}
Sammy Sosa: {hr: 63, avg: 0.288}
To encode data type and other application semantics in a YAML serialization, every node has a type family and leaf nodes have a syntax format.
invoice: 34843
date   : 2001-01-23
buyer:
  given  : Chris
  family : Dumars
product:
  - Basketball: 4
  - Superhoop: 1

invoice: !int|dec 34843
date   : !date|ymd 2001-01-23
buyer: !map
  given  : !str Chris
  family : !str Dumars
product: !seq
  - !str Basketball: !int 4
  - !str Superhoop: !int 1

--- !binary|base64 ]
  R0lGODlhDAAMAIQAAP/
  9/X17unp5WZmZgAAAOf
  n515eXvPz7Y6OjuDg4J
  +fn5OTk6enp56enmlpa
  NjY6Ojo4SEhP/++f/++
  f/++f/++f/++f/++f/+
  EeECcgggoBADs=

--- !seq
0: Mark McGwire
1: Sammy Sosa
2: Ken Griffey
---
empty: !map
invoice: !str 34843

--- !clarkevans.org/schedule/^entry
who: Clark C. Evans
when: 2001-11-18
hours: !^hours 3
description: ]
  Wrote up these examples and learned a lot about baseball statistics.

--- !clarkevans.com/graph/^shape
- !^circle
  center: &ORIGIN {x: 73, y: 129}
  radius: 7
- !^line [23,32,300,200]
- !^text
  center: *ORIGIN
  color: 0x02FDBA
  value: Center of circle
Following are two full-length examples. The first is a sample invoice; the second is a sample log file.
--- !clarkevans.com/^invoice
invoice: 34843
date   : 2001-01-23
bill-to: &id001
    given  : Chris
    family : Dumars
    address:
        lines: |
            458 Walkman Dr.
            Suite #292
        city    : Royal Oak
        state   : MI
        postal  : 48046
ship-to: *id001
product:
    - sku         : BL394D
      quantity    : 4
      description : Basketball
      price       : 450.00
    - sku         : BL4438H
      quantity    : 1
      description : Super Hoop
      price       : 2392.00
tax  : 251.42
total: 4443.52
comments: ]
    Late afternoon is best.
    Backup contact is Nancy
    Billsmer @ 338-4338.
---
Date: 2001-11-23
Time: 13:02+5:00
User: ed
Warning: ]
    This is an error message
    for the log file
---
Date: 2001-11-23
Time: 15:02+5:00
User: ed
Warning: ]
    A slightly different error
    message.
---
Date: 2001-11-23
Time: 15:03+5:00
User: ed
Fatal: ]
    Unknown variable "bar"
Stack:
    - file: TopClass.py
      line: 23
      code: |
        x = MoreObject("345\n")
    - file: MoreClass.py
      line: 58
      code: |
        foo = bar
Conceptually, a YAML system may be understood as three interacting states: a serialization format, an event stream, and a native binding. Translating YAML information between these states are four processing components: a parser, a loader, a dumper and an emitter. The parser extracts structured information from the input stream. The loader converts this information into the appropriate native structures.
[serialization format] --(parser)--> [event stream] --(loader)--> [native binding]
[serialization format] <--(emitter)-- [event stream] <--(dumper)-- [native binding]
For each one of the states above, there is a corresponding information model. The graph model covers the native binding, the tree model covers the event stream, and the syntax model covers the serialization format. Type information is moved between these states using the type family and string format constructs.
The graph model abstracts data structures of common programming languages. Nodes in the graph include collections or scalars. A collection is modeled as a function from one set of nodes to another. Scalars are nodes having a string representation. Both kinds of nodes have a type family.
The tree model flattens the graph structure into a hierarchy of branches, leaves and alias nodes. A branch represents the first occurrence of a collection, a leaf represents the first occurrence of a given scalar, and an alias is a surrogate used for subsequent occurrences of either collections or scalars. In this model, collections are realized as an ordered set of node pairs, called a branch.
The syntax model enhances the tree model with comments, leaf styles and other serialization specific details. Serializations must comply with the syntax productions given in the following section.
A processor need not expose the event stream (tree model) and may translate directly between a serialization and its native binding. However, such a direct translation should take place so that the native binding is constructed only from information available in the graph model. In particular, information specific to the tree model (alias anchors and pair ordering) and syntax-specific information (comments and styles) should not be used in the construction of a native binding. Exceptions to this guideline include editors which must operate on a direct image of the serialization format.
There are several core concepts shared by each information model, primarily relating to type information and how it is communicated between the serialization format and a native binding.
The type family mechanism provides an abstraction of data types which is portable across various languages and platforms. Each native binding may have zero or more native concrete types or class constructs which correspond to a given type family.
name: A URI used as a globally unique identifier for the type family. YAML does not require that this URI point to anything in particular. However, where possible, it is considered good practice to have the URI point to some human-readable document providing information about the type family.
definition: A description of the particular category of information, independent of language and platform.
formats: Each type family used for scalar nodes has associated string formats. These formats can be separated into two groups, implicit formats and explicit formats. In addition, one of the formats is designated to be the type family's canonical string format.
Type families used for collection nodes do not have any associated string formats.
implicit formats: A set of zero or more string formats used for implicit typing. Each format may only be used in a single type family for this purpose.
explicit formats: A set of zero or more string formats used for explicit typing. It is possible for two type families to share the same explicit format, though this practice is discouraged.
canonical format: In addition to the above, each scalar type family must provide a canonical string format. This must be one of the implicit or explicit formats, or a subset of one of these formats. The canonical format must provide exactly one unique string representation for each possible value of the scalar.
In general, there may be more than one native type which corresponds to a YAML type family. In the Python language, for example, the integer family may be bound to either the plain integer capable of holding 32 bits, or the long integer with unlimited size. In ambiguous situations like this, the loader should choose between the alternatives based on the requirements of the native binding.
In other cases, a binding may not have an appropriate native construct for a given type family. This may be addressed with a generic YAML construct to act as a place-holder so that the data value and the type family may round-trip. Alternatively, with warning to the user, a value may be cast to a different, perhaps less specific family. Otherwise, when a native binding for a particular value is not possible, the parser must treat it as an error.
It may be possible to write a string value of a leaf in more than one way. For example, an integer value of 255 can also be written in hex as 0xFF. This distinction is covered by the concept of a string format.
name: Each string format has a name used for explicit typing and for general identification. This name must comply with the format production, and must be unique within the type families it applies to.
definition: A description of the format as it applies to particular data values.
regexp: Regular expressions may be provided to allow implicit typing using the string format, or to enable the YAML processor to validate that a given value is indeed compliant with the string format.
As noted above, each scalar type family has exactly one canonical string format, although more than one string format may apply. For example, the scientific format is the canonical format for floating point numbers, but such numbers are typically written using the fixed format.
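The relationship between string formats and the canonical format can be sketched in Python for the integer example given earlier (255 written also as 0xFF). This is a minimal sketch, not the specification's definition: the format name "hex", both regular expressions, the choice of "dec" as canonical, and the helper names are assumptions made for illustration.

```python
import re

# Hypothetical string formats for an integer type family.
# The names and regular expressions are assumptions for illustration only.
INTEGER_FORMATS = {
    "dec": (re.compile(r"-?(0|[1-9][0-9]*)\Z"), lambda s: int(s, 10)),
    "hex": (re.compile(r"0x[0-9A-Fa-f]+\Z"), lambda s: int(s, 16)),
}
CANONICAL_FORMAT = "dec"  # assumed canonical format for this sketch


def parse_integer(text):
    """Return (format_name, value) for the first format whose regexp matches."""
    for name, (pattern, convert) in INTEGER_FORMATS.items():
        if pattern.match(text):
            return name, convert(text)
    raise ValueError(f"{text!r} matches no integer string format")


def canonical_integer(value):
    """Produce the single canonical (decimal) string for a value."""
    return str(value)


print(parse_integer("255"), parse_integer("0xFF"))
print(canonical_integer(parse_integer("0xFF")[1]))
```

Both spellings yield the same value, and the canonical function maps that value back to exactly one string, which is what makes canonical representations usable for comparing scalars.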
The graph model abstracts data structures of common programming languages. The model is a graph of collection and scalar values, where each node in the graph is provided with type information. The model provides an intermediate interface between the parser/emitter, which can be shared by multiple native languages, and the loader/dumper, which is specific to a particular binding. The model also provides a concrete representation for language-independent storage, simple structural queries, and graph transformations.
In the graph model, YAML is viewed as a directed graph of typed nodes. Nodes that can reference other nodes are collections and nodes with a string representation are scalars. The graph model also requires node identity and a mechanism to determine if two different nodes have the same content.
A graph node is the building block of YAML structures. In the serialization, they are represented by indented blocks. Within a native binding they represent application-specific objects. In the graph model, a node is tagged with a type family and can either be a collection or a scalar.
kind: A node may be one of two kinds, a collection or a scalar.
type family: Each node is associated with a type family. For native data, this association may be implicit, based on the native data type of the node.
A scalar is a graph node with a string representation.
value: Each scalar has a value as specified by the type family definition.
string representations: Each scalar has one or more string representations. Each string representation is a series of zero or more printable Unicode characters compliant with one of the type family's string formats.
canonical representation: A single unique string representation of the scalar according to the type family's canonical string format.
A string representation of a scalar together with its type family and format should be sufficient to encode most native data types not having a composite structure.
YAML requires the Unicode string scalar type family. Other scalar type families include integer, float, date, time, timestamp and binary. Application specific type families may also be used.
In most programming languages, there are two manners in which variables can be equivalent. The first is by reference, where the two variables refer to the same memory address. We call this equivalence relation "identity".
The second form of equivalence occurs when two nodes are different (have different memory addresses), but share the same content (same binary layout). We call this second form of equivalence "equality". It follows that when two nodes are identical they are also equal.
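The two equivalence relations can be seen in a native binding such as Python, where the `is` operator tests identity and `==` tests equality. This is only an analogy: Python compares content rather than binary layout.

```python
# Two distinct list objects with the same content, plus a second
# name bound to the first object.
a = [1998, "Sammy Sosa"]
b = [1998, "Sammy Sosa"]
c = a

identical_ac = a is c  # same object: identity holds
identical_ab = a is b  # distinct objects: identity does not hold
equal_ab = a == b      # same content: equality holds
print(identical_ac, identical_ab, equal_ab)
```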
A node set is an unordered association of zero or more graph nodes. A node may participate in many node sets without restriction, allowing for a graph structure. Node sets may not contain duplicates, that is, a node with a particular identity may only appear once. The primary purpose of the node set is to provide a basis for the definition of a collection. A native binding usually exposes node sets through a mechanism to enumerate the keys of a hash or dictionary.
A collection is a graph node which represents sequences such as lists or arrays, or mappings such as hashes or dictionaries. In the graph model, sequences are treated uniformly as mappings with integer keys. There are three collection rules. First, a set of keys may not contain two nodes that are equal. Second, each key is associated with exactly one value. Finally, each value is associated with at least one key. Note that this does not prevent a value from being associated with more than one key.
domain: A domain is a node set restricted such that no two nodes in the set may be equal. Nodes which are members of the domain are often called "keys".
range: A range is a node set without restrictions. Nodes which are members of the range are often called "values".
function: A function is a rule of correspondence from the domain onto the range such that there is a unique value in the range assigned to every key in the domain, and every value in the range is assigned to at least one key.
YAML requires the mapping collection type family, which covers associative containers such as the Perl hash or Python dictionary. When the domain is a series of sequential integers starting with zero, the preferred type family is the sequence which corresponds to a Perl array or a Python list.
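The rule for preferring the sequence family can be sketched as a check on the domain of a native mapping. The function names below are hypothetical, the family names "seq" and "map" follow the transfer names used in the preview examples, and treating an empty mapping as "map" is an assumption of this sketch.

```python
def preferred_family(mapping):
    """Pick 'seq' when the keys are exactly 0, 1, ..., n-1, else 'map'."""
    keys = set(mapping)
    is_sequential = (
        bool(keys)
        and all(type(k) is int for k in keys)  # excludes bool and float keys
        and keys == set(range(len(keys)))
    )
    return "seq" if is_sequential else "map"


def as_sequence(mapping):
    """View such a mapping as a native list, ordered by its integer keys."""
    return [mapping[i] for i in range(len(mapping))]


players = {0: "Mark McGwire", 1: "Sammy Sosa", 2: "Ken Griffey"}
print(preferred_family(players), as_sequence(players))
print(preferred_family({"hr": 65, "avg": 0.278}))
```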
Node equality determines when two given nodes have the same content. When two nodes are equivalent under this equivalence relation, they are said to be "equal". Equality is defined between scalar nodes and between collection nodes, as described below.
scalar equality: Two scalars are equal if and only if they have the same type family and their canonical string representations have exactly the same series of Unicode characters.
collection equality: Equality of a collection is defined recursively. Two collections are equal if and only if they have the same type family and for each key in the domain of one, there is a corresponding key in the domain of the other such that both keys are equal and their corresponding values are equal; here corresponding value refers to the unique node in the range of the collection assigned to the key by the collection's function.
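The recursive definition above can be sketched in Python as follows. The classes `Scalar` and `Collection` and the function `nodes_equal` are hypothetical names introduced for this illustration, and the sketch assumes the graphs being compared contain no cycles.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # eq=False keeps Python's default identity-based ==
class Scalar:
    family: str
    canonical: str  # canonical string representation


@dataclass(eq=False)
class Collection:
    family: str
    pairs: list = field(default_factory=list)  # (key node, value node) tuples


def nodes_equal(a, b):
    """Equality as defined above; assumes the graphs contain no cycles."""
    if a is b:
        return True  # identical nodes are always equal
    if type(a) is not type(b) or a.family != b.family:
        return False
    if isinstance(a, Scalar):
        return a.canonical == b.canonical
    if len(a.pairs) != len(b.pairs):
        return False
    # Every key of a needs an equal key in b whose value is also equal.
    return all(
        any(nodes_equal(ka, kb) and nodes_equal(va, vb) for kb, vb in b.pairs)
        for ka, va in a.pairs
    )


x = Collection("map", [(Scalar("str", "hr"), Scalar("int", "65"))])
y = Collection("map", [(Scalar("str", "hr"), Scalar("int", "65"))])
z = Collection("map", [(Scalar("str", "hr"), Scalar("str", "65"))])
print(nodes_equal(x, y), nodes_equal(x, z))
```

Because a domain may not contain two equal keys, checking one direction together with the pair count is enough to establish the correspondence in both directions.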
A YAML text (file or stream) is a series of disjoint graphs, each with a root node.
stream: A series of zero or more document root nodes.
document: A top level graph node that is disjoint from all other root document nodes.
The term disjoint means that for any two nodes x and y, there does not exist a third node z that is reachable from both x and y. For any node x, x is reachable from y if and only if either x and y are identical, or y is a collection and there exists a node z in the domain or the range of y such that x is reachable from z.
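The definitions of reachable and disjoint translate directly into a graph traversal. In this minimal sketch the `Node` class is a hypothetical stand-in for a graph node, with `children` holding the nodes in the domain and range of a collection (and left empty for a scalar).

```python
class Node:
    """A graph node; `children` holds the nodes in its domain and range."""

    def __init__(self, label, children=()):
        self.label = label
        self.children = list(children)


def all_reachable(root):
    """Map id(node) -> node for every node reachable from root."""
    found, stack = {}, [root]
    while stack:
        node = stack.pop()
        if id(node) not in found:  # identity, not equality, decides membership
            found[id(node)] = node
            stack.extend(node.children)
    return found


def disjoint(x, y):
    """True if no node is reachable from both x and y."""
    return not (all_reachable(x).keys() & all_reachable(y).keys())


shared = Node("Sammy Sosa")
doc1 = Node("hr", [Node("Mark McGwire"), shared])
doc2 = Node("rbi", [shared])
doc3 = Node("other", [Node("Ken Griffey")])
print(disjoint(doc1, doc2), disjoint(doc1, doc3))
```

The visited table also makes the traversal terminate on cyclic graphs.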
To allow for YAML to be communicated as a series of events, an ordered tree structure must be used instead of a graph. This section describes an extension to the graph model where the graph is flattened and ordered to provide a tree interface. The resulting tree-structured model imposes a linear ordering and uses several constructs which are not part of the graph model. Applications constructing a native binding from the tree model should not use these additional constructs and the imposed ordering for the preservation of important data.
To lay out graph nodes as a tree structure, a mechanism is needed to manage duplicate occurrences. This is solved with three node kinds: branch, leaf, and alias. The first occurrence of a scalar is represented by a leaf, the first occurrence of a collection is represented by a branch, and subsequent occurrences of either a collection or a scalar are represented by an alias. All tree nodes in this model have the following properties:
kind: A tree node may be one of three kinds, a branch, a leaf or an alias.
parent: The parent property gives access to the branch which holds the current tree node.
anchor: The anchor is a Unicode string which complies with the anchor production. The anchor is used to associate the first occurrence of a graph node with subsequent occurrences, via the alias tree node. This property is optional for leaf or branch nodes, provided that the scalar or collection represented does not occur more than once.
Note that when a tree node is converted to a graph node, the anchor, if any, is not converted. Likewise the parent property and the alias kind are not preserved as the graph node may participate in several collections.
Leaf tree nodes represent the first occurrence of a scalar in a given serialization.
type family: Like a scalar, each leaf is associated with a type family.
format: Unlike a scalar, each leaf is associated with a specific string format.
string value: Each leaf has a string value which is a string representation of the scalar according to the specific string format used.
When a leaf is converted into a graph node it becomes a scalar of the same type family. The scalar's value would be such that its string representation according to the specific format used would be identical to the leaf's string value. Note that the particular format used is not converted.
The alias tree node represents subsequent occurrences of a scalar or collection in the serialization.
referent: The branch or leaf which the alias references is the closest preceding tree node having the same anchor.
When an alias is converted into a graph node it becomes a subsequent occurrence of its referent's graph node.
A pair is an ordered set of two tree nodes. The first member of the set is the key and the second member of the set is the value.
Branch tree nodes represent the first occurrence of a collection in a given serialization.
type family: Like a collection, each branch is associated with a type family.
pairs: A branch has an ordered set of zero or more pairs.
When a branch is converted into a graph node, three operations occur. The domain is constructed with the graph node for each key in its set of pairs. Likewise, the range is constructed with the graph node for each value in its set of pairs. Last, the function is constructed via association of key graph nodes to value graph nodes, as provided by the set of pairs. Note that the ordering of the pairs is explicitly not converted.
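The opposite direction, laying out a graph as branch, leaf and alias tree nodes, can be sketched as below. The classes, the `to_tree` function and the tuple layout are hypothetical; for simplicity the sketch gives every first occurrence an anchor (named in the style of the `&id001` anchor from the invoice example), although the anchor is optional for nodes that occur only once.

```python
class Scalar:
    def __init__(self, value):
        self.value = value


class Collection:
    def __init__(self, pairs):
        self.pairs = pairs  # list of (key node, value node) tuples


def to_tree(node, anchors):
    """Return a nested tuple describing the tree form of a graph node.

    `anchors` maps id(graph node) -> anchor for nodes already laid out.
    """
    if id(node) in anchors:
        # Subsequent occurrence: emit an alias to the first occurrence.
        return ("alias", anchors[id(node)])
    anchor = f"id{len(anchors) + 1:03d}"
    anchors[id(node)] = anchor  # registered first, so cycles become aliases
    if isinstance(node, Scalar):
        return ("leaf", anchor, node.value)
    return (
        "branch",
        anchor,
        [(to_tree(k, anchors), to_tree(v, anchors)) for k, v in node.pairs],
    )


sosa = Scalar("Sammy Sosa")
graph = Collection([(Scalar("hr"), sosa), (Scalar("rbi"), sosa)])
tree = to_tree(graph, {})
print(tree)
```

The second occurrence of the shared scalar comes out as an alias carrying the anchor of the first, which mirrors the `&SS` / `*SS` pair in the preview example.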
When serializing a YAML graph, every tree node is put into a single linear sequence within a given document through the branch pair ordering. With the composition of branches, this ordering becomes total, so that for any two distinct tree nodes in a serialization, one can be said to precede another.
For any two nodes or aliases, x and y, we say that x precedes y when any of the following holds:
To enhance readability, a YAML serialization extends the tree model with syntax styles, comments and directives. Although the parser may provide this information, applications should take care not to use these features to encode information found in a native binding.
The tree node is extended with a style property, which can have different values depending upon its kind.
leaf style: Leaf styles include eight nested styles and three in-line styles. All but the escaped and double quoted styles are limited to scalars having only printable characters.
branch style: Branch styles are series and keyed. The series style may only be used if the domain of the collection's function is the set of sequential integers starting at zero.
The syntax model allows optional comment blocks to be interleaved with the node blocks. Comment blocks may appear before or after any node block. A comment block can't appear in a nested leaf node block value.
comment: A comment is a series of zero or more Unicode characters complying with the comment productions.
Attached to each document is a document directive section.
directive section: A collection of directives to the parser where each member of the domain and range are scalar values matching the directive_name and directive_value productions.
Following are the syntax productions for the YAML serialization.
Characters are the basis for a serialized version of a YAML document. Below is a general definition of a character followed by several characters which have specific meaning in particular contexts.
Serialized YAML uses a subset of the Unicode character set. A YAML parser must accept all printable ASCII characters, the space, tab, line break, and all Unicode characters beyond 0x9F. A YAML emitter must only produce those characters accepted by the parser, but should also escape all non-printable Unicode characters if a character table is readily available.
[001] printable_char ::= #x9 |
The range above explicitly excludes the surrogate block [#xD800-#xDFFF], DEL (0x7F), the C0 control block [#x0-#x1F], the C1 control block [#x80-#x9F], #xFFFE and #xFFFF. Note that in UTF-16, characters above #xFFFF are represented with a surrogate pair. DEL and characters in the C0 and C1 control blocks may be represented in a YAML serialization using escape sequences.
A YAML processor is required to support the UTF-32, UTF-16 and UTF-8 character encodings. If an input stream does not begin with a byte order mark, the encoding shall be UTF-8. Otherwise the encoding shall be UTF-32 (LE or BE), UTF-16 (LE or BE) or UTF-8, as signaled by the byte order mark. Note that as YAML files may only contain printable characters, this does not raise any ambiguities. For more information about the byte order mark and the Unicode character encoding schemes see the Unicode FAQ.
[002] byte_order_mark ::= #xFEFF
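The encoding rule above (use the byte order mark if present, otherwise assume UTF-8) can be sketched with the constants from Python's standard `codecs` module. The function names are hypothetical and this is only one possible way to arrange the check.

```python
import codecs

# Longer marks first: the UTF-32 LE mark begins with the UTF-16 LE mark.
_BOMS = [
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]


def detect_encoding(data: bytes):
    """Return (encoding name, number of byte order mark bytes to skip)."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return "utf-8", 0  # no byte order mark: the encoding shall be UTF-8


def decode_stream(data: bytes) -> str:
    """Decode an input stream, dropping the byte order mark if present."""
    encoding, skip = detect_encoding(data)
    return data[skip:].decode(encoding)


sample = codecs.BOM_UTF16_LE + "hr: 65".encode("utf-16-le")
print(detect_encoding(sample), decode_stream(sample))
```

Testing the UTF-32 marks before the UTF-16 marks matters because the UTF-32 LE mark starts with the same two bytes as the UTF-16 LE mark; since a stream may only contain printable characters, a UTF-16 LE stream cannot continue with the two zero bytes that would make the cases collide.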
Indicators are special characters which are used to describe the structure of a YAML document.
[003] series_entry_indicator ::= '-'
[004] keyed_entry_separator ::= ':'
[005] series_inline_start ::= '['
[006] series_inline_end ::= ']'
[007] keyed_inline_start ::= '{'
[008] keyed_inline_end ::= '}'
[009] branch_inline_separator ::= ','
[010] nested_key_indicator ::= '?'
[011] alias_indicator ::= '*'
[012] anchor_indicator ::= '&'
[013] transfer_indicator ::= '!'
[014] block_indicator ::= '|'
[015] folded_indicator ::= ']'
[016] single_quote ::= '''
[017] double_quote ::= '"'
[018] throwaway_indicator ::= '#'
[019] reserved_indicators ::= '@' | '%' | '^'
Indicators can be grouped into three categories. The '-' and ':' space indicators are always followed by a white space character (space, tab or line break). If followed by any other character, these indicators are treated as content. The '[', ']', '{', '}' and ',' in-line indicators are used to denote in-line branch structure and therefore must not be used as content text characters unless protected in some way. The remaining indicators are used to denote the start of various YAML elements and hence may be used as internal content text characters in most cases. The exact restrictions on the use of indicators as content text characters depend on the particular leaf style used.
[020] |
space_indicators |
::= |
series_entry_indicator |
|
[021] |
inline_indicators |
::= |
series_inline_start |
|
[022] |
non_space_indicators |
::= |
nested_key_indicator |
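The rule for space indicators can be illustrated with a small Python check. This is a sketch under stated assumptions: the function name `is_space_indicator` and the constants are hypothetical, and treating the end of input like white space is an assumption that the text above does not state.

```python
SPACE_INDICATORS = {"-", ":"}
INLINE_INDICATORS = {"[", "]", "{", "}", ","}

# Space, tab and the line break characters.
WHITE_SPACE = {" ", "\t", "\n", "\r", "\x85", "\u2028", "\u2029"}


def is_space_indicator(text: str, pos: int) -> bool:
    """True if text[pos] is '-' or ':' and is followed by white space."""
    if text[pos] not in SPACE_INDICATORS:
        return False
    following = text[pos + 1:pos + 2]
    # Assumption: reaching the end of the input is treated like white space.
    return following == "" or following in WHITE_SPACE
```

With this check the '-' in `- item` is an indicator while the '-' in `-1` is content, and the ':' in `key: value` is an indicator while the ':' in `http://x` is content.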
The Unicode standard defines the following line break characters.
[023] |
line_feed |
::= |
#xA |
|
[024] |
carriage_return |
::= |
#xD |
|
[025] |
next_line |
::= |
#x85 |
|
[026] |
line_separator |
::= |
#x2028 |
|
[027] |
paragraph_separator |
::= |
#x2029 |
|
[028] |
line_break_char |
::= |
line_feed |
Line breaks can be grouped into two groups. Specific line breaks have well-defined semantics for breaking text into lines and paragraphs. The semantics of generic line break characters are not defined beyond ending a line.
Outside text content, YAML allows any line break to be used to terminate lines, and in most cases also allows such line breaks to be preceded by trailing line space characters. On output, a YAML emitter is free to emit non-content line breaks using whatever convention is most appropriate. An emitter should avoid emitting trailing line spaces.
[029] |
generic_line_break |
::= |
( carriage_returngreedy |
|
[030] |
specific_line_break |
::= |
line_separator |
|
[031] |
any_line_break |
::= |
generic_line_break |
|
[032] |
trailing_line_break |
::= |
line_space* |
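As an illustration of line termination outside text content, the following Python sketch splits a string on any of the line break characters listed above and drops trailing line spaces. The names `ANY_LINE_BREAK` and `split_lines` are hypothetical, and matching a carriage return followed by a line feed as a single break is an assumption. The sketch is not suitable for text content, where the kind of line break and the trailing spaces may be significant.

```python
import re

# Assumption: CR LF is tried first so that the pair counts as one line break.
ANY_LINE_BREAK = re.compile("\r\n|[\n\r\x85\u2028\u2029]")


def split_lines(text: str) -> list:
    """Split on any line break and strip trailing spaces and tabs."""
    return [line.rstrip(" \t") for line in ANY_LINE_BREAK.split(text)]
```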
This section includes several common character range definitions.
[033] |
line_char |
::= |
printable_char |
|
[034] |
line_space |
::= |
#x20 |
#x9 |
|
[035] |
line_non_space |
::= |
line_char |
|
[036] |
line_non_ascii |
::= |
line_char |
|
[037] |
ascii_letter |
::= |
[#x41-#x5A] |
|
[038] |
non_zero_digit |
::= |
[#x31-#x39] |
|
[039] |
decimal_digit |
::= |
[#x30-#x39] |
|
[040] |
hexadecimal_digit |
::= |
decimal_digit |
|
[041] |
word_char |
::= |
decimal_digit |
Serialized YAML uses text lines to convey structure. This requires special processing rules for white space (space, tab and line break) characters. These rules are compatible with Unicode's newline guidelines.
In a YAML serialization, structure is determined from indentation, where indentation is defined as a line break character followed by zero or more space characters.
Tab characters are not allowed in indentation unless a
'#TAB'
directive is used. If such a directive is used, each
indentation tab is equivalent to a certain number of
spaces determined by the specified tab policy.
A node must be more indented than its parent node. All sibling nodes must use the exact same indentation level. However, the content of each such node may be indented independently.
The indentation level is used exclusively to delineate structure. Indentation characters are otherwise ignored. In particular, they are never taken to be a part of the value of serialized text.
[042] |
indent(n) |
::= |
#x20 x n |
|
[043] |
indent(<n) |
::= |
indent(m) |
/* m such that m < n */ |
[044] |
indent(<=n) |
::= |
indent(m) |
/* m such that m <= n */ |
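The indentation rules above can be sketched in Python as follows. The function names are hypothetical, and the sketch assumes that no '#TAB' directive is in effect, so any tab found in the leading white space is rejected.

```python
def indentation(line: str) -> int:
    """Return the number of leading space characters on a line."""
    level = len(line) - len(line.lstrip(" "))
    # Assumption: a tab directly after the leading spaces counts as a tab in
    # the indentation, which is an error without a '#TAB' directive.
    if line[level:level + 1] == "\t":
        raise ValueError("tab character in indentation")
    return level


def is_valid_child(parent_level: int, child_level: int) -> bool:
    """A node must be more indented than its parent node."""
    return child_level > parent_level


def are_siblings_aligned(lines) -> bool:
    """All sibling nodes must use the exact same indentation level."""
    return len({indentation(line) for line in lines}) <= 1
```

The returned level is used only for structural comparisons such as these; the indentation characters themselves are never part of a serialized value.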
Since the YAML serialization depends upon indentation
level to delineate blocks, additional productions are a
function of an integer, based on the
Throwaway comments have no effect whatsoever on the tree or graph models represented in the file. Their usual purpose is to communicate between the human maintainers of the file. A typical example is comments in a configuration file.
A throwaway comment always spans a complete line. An explicit throwaway comment line consists of some indentation, a '#' indicator, and arbitrary comment characters to the end of the line. Empty lines or lines containing only indentation spaces are taken to be an implicit throwaway comment.
A throwaway comment may appear before a document node or following any node. A throwaway comment may not appear inside a nested line leaf node, but may precede or follow such a node. When following a nested leaf value, the first comment line must be explicit and be less indented than the nested node value. Following comment lines are not restricted.
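The distinction between explicit and implicit throwaway comment lines can be sketched in Python as follows. The function name `comment_kind` is hypothetical. The check is deliberately context-free: it does not know whether the line lies inside a nested leaf node, where a line starting with '#' is content rather than a comment, and it does not distinguish directives such as '#TAB'. A real parser would have to handle both.

```python
def comment_kind(line: str):
    """Classify one line, given without its line break.

    Returns 'explicit', 'implicit', or None if the line is not a
    throwaway comment line.
    """
    rest = line.lstrip(" ")
    if rest == "":
        return "implicit"  # empty line, or indentation spaces only
    if rest.startswith("#"):
        return "explicit"  # indentation, '#', then comment characters
    return None
```

Because a throwaway comment always spans a complete line, a '#' that appears after other content on the line does not make the line a comment in this sketch.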
[045] |
|
::= |
indent(<n) |
|
[046] |
|
::= |
indent(<n) |
# These are three throwaway comment
# lines (the second line is empty).
this: |
contains two lines of text, the
# second of which starts