Brian Ingerson (mailto:ingy@ttul.org),
Clark C. Evans
Oren Ben-Kiki (mailto:oren@ben-kiki.org)
Copyright © 2001 Brian Ingerson, Clark Evans & Oren Ben-Kiki, All Rights Reserved. This document may be freely copied provided it is not modified.
This specification is a working draft and reflects consensus reached by the members of the yaml-core mailing list. Any questions regarding this draft should be raised on this list. Because this is a draft, changes are expected; implementers should follow the mailing list closely to stay up to date on trends and announcements.
YAML (rhymes with "camel") is a straightforward machine parsable data serialization format designed for human readability and interaction with scripting languages such as Perl and Python. YAML is optimized for data serialization, configuration settings, log files, Internet messaging and filtering. This specification describes the YAML information model and serialization format.
1 Introduction
1.1 Goals
1.2 Origin
1.3 Relation to XML
1.4 Terminology
2 Preview
2.1 Collections
2.2 Structures
2.3 Styles
2.4 Type Family
2.5 Full Length Examples
3 Key Concepts
3.1 General Concepts
3.1.1 Type Family
3.1.2 String Format
3.2 Graph Model
3.2.1 Node
3.2.2 Scalar
3.2.3 Identity
3.2.4 Node set
3.2.5 Collection
3.2.6 Equality
3.2.7 Documents
3.3 Tree Model
3.3.1 Tree Node
3.3.2 Leaf
3.3.3 Alias
3.3.4 Pair
3.3.5 Branch
3.3.6 Ordering
3.4 Syntax Model
3.4.1 Style
3.4.2 Format
3.4.3 Comment
3.4.4 Directive
4 Serialization Syntax
4.1 Characters
4.1.1 Character Set
4.1.2 Encoding
4.1.3 Indicators
4.1.4 Escape Codes
4.1.5 Miscellaneous Characters
4.2 White Space Processing
4.2.1 Indentation
4.2.2 End-of-Line Normalization
4.2.3 Throwaway comments
4.2.4 Line Folding
4.3 YAML Stream
4.3.1 Directive
4.3.2 Node
4.3.3 Property
4.3.4 Transfer Method
4.3.5 Anchor
4.4 Alias
4.5 Collection
4.5.1 Sequence
4.5.2 Map
4.6 Scalar
4.6.1 Block Scalar
4.6.2 Folded Scalar
4.6.3 Escaped Scalar
4.6.4 Plain Scalar
5 Transfer Methods
5.1 Explicit Typing
5.2 Implicit Typing
5.3 Common Type Families
5.3.1 Sequence
5.3.2 Map
5.3.3 String
5.3.4 Null
5.3.5 Pointer
5.3.6 Integer
5.3.7 Float
5.3.8 Binary
5.3.9 Special Keys
5.4 Unsupported Transfer Methods
Yet Another Markup Language, abbreviated YAML, is a human readable data serialization format and processing model. This text describes the class of data objects called YAML documents and partially describes the behavior of computer programs that process them.
YAML documents encode into a serialized form the native data constructs of modern scripting languages. Strings, arrays, hashes, and other user defined data types are supported. A YAML document stream consists of a sequence of characters, some of which are considered part of the document's content, and others that are used to indicate structure within the information stream.
A software module called a YAML parser is used to read YAML documents and provide access to their content and structure. In a similar way, a YAML emitter is used to write YAML documents, serializing their content and structure. A YAML processor is a module that provides parser or emitter functionality or both. It is assumed that a YAML processor does its work on behalf of another module, called an application. This specification describes the interface and required behavior of a YAML processor in terms of how it must read or write YAML document streams and the information it must provide to or obtain from the application.
The design goals for YAML are:
YAML documents are very readable by humans.
YAML interacts well with scripting languages.
YAML uses host languages' native data structures.
YAML has a consistent information model.
YAML enables stream based processing.
YAML is expressive and extensible.
YAML is easy to implement.
YAML was designed with experience gained from the construction and deployment of Data::Denter. YAML has also enjoyed much markup language critique from SML-DEV list participants, including experience with the Minimal XML and Common XML specifications.
YAML integrates and builds upon structures and concepts described by Perl, XML, SOAP, Python, HTML, C, RFC0822, RFC2045 and SAX.
YAML's core type system is based on serialization requirements of the Perl language. YAML directly supports both scalar values (string, integer) and collections (array, hash). Support for common types enables programmers to use their language's native data constructs for YAML manipulation, instead of requiring a special document object model (DOM).
Like XML's SOAP, the YAML serialization supports native graph structures through a rich alias mechanism. Also like SOAP, YAML provides for application defined types. This allows YAML to serialize rich data structures required for modern distributed computing.
YAML's block scoping is similar to Python's. In YAML, the extent of a node is indicated by its column. YAML's block scalar leverages this by enabling formatted text to be cleanly mixed within an aggregate structure without troublesome escaping. Further, YAML's block indenting provides for easy inspection of the document's structure.
Motivated by HTML's end of line normalization, YAML's folded scalars introduce a unique method of handling whitespace. In YAML, single line breaks may be folded into a single space. This technique allows for paragraphs to be word-wrapped without affecting the canonical form of the content.
YAML's escaped scalars use familiar C style escape sequences. This enables ASCII representation of non-printable or 8-bit (ISO 8859-1) characters using '\x3B', of 16-bit (Unicode) characters using '\u003B', and of 32-bit (ISO/IEC 10646) characters using '\U0000003B' style escapes.
The syntax of YAML was motivated by Internet Mail (RFC0822) and can be used for HTTP headers. Further, YAML borrows the document separator from MIME (RFC2045). With this insight, YAML's top level production is a stream of independent documents; ideal for distributed processing systems.
YAML was designed to have an incremental interface which includes both a pull style input stream and a push style (SAX like) output stream interfaces. Together this enables YAML to support the processing of large documents, such as a transaction log, or continuous streams, such as a feed from a production machine.
There are many differences between YAML and the eXtensible Markup Language ("XML"). XML was designed to be backwards compatible with Standard Generalized Markup Language ("SGML") and thus had many design constraints placed on it that YAML does not share. Also XML, inheriting SGML's legacy, is designed to support structured documents, where YAML is more closely targeted at messaging and native data structures. Where XML is a pioneer in many domains, YAML has been grown on the lessons learned by the XML community.
The YAML and XML information models are starkly different. In XML, the primary construct is an attributed tree, where each element has an ordered, named list of children and an unordered mapping of names to strings. In YAML, the primary graph constructs are keyed collections (natively stored as a hash or array) and scalar values (string, integer, float). This difference is critical since YAML's model is directly supported by native data structures in most modern programming languages, where XML's model requires mapping conventions, or an alternative programming component (e.g. a document object model).
The terminology used to describe YAML is defined in the body of this specification. The terms defined in the following list are used in building those definitions and in describing the actions of a YAML processor:
may
Conformant YAML streams and processors are permitted to but need not behave as described.
should
Conformant YAML texts and processors are encouraged to behave as described, but may do otherwise if a warning message is provided to the user and any deviant behavior requires conscious effort (non-default setting) to enable.
must
Conformant YAML texts and processors are required to behave as described, otherwise they are in error.
error
A violation of the rules of this specification; results are undefined. Conforming software may detect and report an error and may recover from it.
This specification, together with the Unicode standard for characters, provides all the information necessary to understand YAML Version 1.0 and construct computer programs to process it.
This section provides a quick glimpse into the expressive power of YAML (and its clean syntax) without going into too much detail. It is not expected that the first time reader grok all of the examples. Instead these selections are used to motivate the information model and as guide posts for the serialization productions.
YAML collections allow for aggregation of data. There are two primary types of collections which YAML supports, sequences and mappings. Most tree structures can be constructed by nesting collections.
YAML streams can be commented and separated into multiple documents. To allow for graph serialization, YAML has a built-in alias mechanism.
---
name: Mark McGwire
hr: 65
avg: .278
rbi: 147
---
name: Sammy Sosa
hr: 63
avg: .288
rbi: 141

# Ranking of players by
# season home runs.
---
- Mark McGwire
- Sammy Sosa
- Ken Griffey

# Home runs
hr:
  # 1998 record
  - Mark McGwire
  - Sammy Sosa
# Runs batted in
rbi:
  - Sammy Sosa
  - Ken Griffey

# Home runs
hr:
  # 1998 record
  - Mark McGwire
  - &001 Sammy Sosa
# Runs batted in
rbi:
  - *001
  - Ken Griffey
Besides in-line scalars used above, YAML has support for several multi-line and quoted scalar styles. Furthermore, for small sequences and mappings, an in-line style helps make YAML easy to author.
--- \
  Mark McGwire's year was crippled by a knee injury.

--- |
\/|\/|
/ | |_

--- \\
  Sosa completed another fine season. \u263A

name: Mark McGwire
occupation: baseball player
comments: \
  Mark set a major league home run record in 1998.

years: "1998\t1999\t2000\n"
msg: "Sosa did fine. \u263A"

- ' \/|\/| '
- ' / | |_ '

- [ name , hr, avg ]
- [ Mark McGwire, 65, .278 ]
- [ Sammy Sosa , 63, .288 ]

Mark McGwire: {hr: 65 , avg: .278}
Sammy Sosa: {hr: 63 , avg: .288}
To encode data type and other application semantics in a YAML serialization, every node has a type family and leaf nodes have a syntax format.
invoice: 34843
date : 2001-01-23
buyer:
  given : Chris
  family : Dumars
product:
  - 4 Basketballs
  - 1 Superhoop

invoice: !int;decimal 34843
date : !date;iso8609 2001-01-23
buyer: !map
  given : !str Chris
  family : !str Dumars
product: !seq
  - !str 4 Basketballs
  - !str 1 Superhoop

--- !binary;base64 \
  R0lGODlhDAAMAIQAAP/
  9/X17unp5WZmZgAAAOf
  n515eXvPz7Y6OjuDg4J
  +fn5OTk6enp56enmlpa
  NjY6Ojo4SEhP/++f/++
  f/++f/++f/++f/++f/+
  EeECcgggoBADs=

--- !seq
0: Mark McGwire
1: Sammy Sosa
2: Ken Griffey
---
empty: !map
invoice: !str 34843

--- !org.clarkevans.timesheet
who: Clark C. Evans
when: 2001-11-18
hours: !.hours 3
description: \
  Wrote up these examples and learned a lot about baseball statistics.

--- !com.clarkevans.graph
- !.circle
  center: &ORIG {x: 73 , y: 129}
  radius: 7
- !.line [23,32,200,300]
- !.line [23,32,300,200]
- !.text
  center: *ORIG
  color: 0x02FDBA
  value: Center of circle
Following are two full length examples. The first is a sample invoice; the second is a sample log file.

--- !com.clarkevans.invoice
invoice: 34843
date : 2001-01-23
bill-to: &001
  given : Chris
  family : Dumars
  address:
    line one: '458 Walkman Dr.'
    line two: Suite #292
    city : Royal Oak
    state : MI
    postal : 48046
ship-to: *001
product:
  -
    quantity: 4
    id : BL394D
    desc : Basketball
    price : $450.00
  -
    quantity: 1
    id : BL4438H
    desc : Super Hoop
    price : $2,392.00
tax : $251.42
total: $4443.52
comments: \
  Late afternoon is best. Backup contact is Nancy Billsmer @ 338-4338.

---
Date: 2001-11-23
Time: 13:02+5:00
User: ed
Warning: \
  This is an error message
  for the log file
---
Date: 2001-11-23
Time: 15:02+5:00
User: ed
Warning: \
  A slightly different error
  message.
---
Date: 2001-11-23
Time: 15:03+5:00
User: ed
Fatal: \
  Unknown variable "bar"
Stack:
  -
    file: TopClass.py
    line: 23
    code: x = MoreObject('345')
  -
    file: MoreClass.py
    line: 58
    code: foo = bar
Conceptually, a YAML system may be visualized as three interacting states: a serialization format, an event stream, and a native binding. Translating YAML information between these states are four processing components: a parser, a loader, a dumper, and an emitter. The parser extracts structured information from the input stream. The loader converts this information into the appropriate native structures. In the other direction, the dumper converts native structures into an event stream, and the emitter writes that stream out in the serialization format.
[serialization format] --(parser)--> [event stream] --(loader)--> [native binding]
[serialization format] <--(emitter)-- [event stream] <--(dumper)-- [native binding]
For each one of the states above, there is a corresponding information model. The graph model covers the native binding, the tree model covers the event stream, and the syntax model covers the serialization format. Type information is moved between these states with the type family and string format constructs.
| graph model | The graph model abstracts data structures of common programming languages. Nodes in the graph are either collections or scalars. A collection is modeled as a function from one set of nodes to another. Scalars are nodes having a string representation. Both node kinds have a type family. |
| tree model | The tree model flattens the graph structure into a hierarchy of branches, leaves and alias nodes. A branch represents the first occurrence of a collection, a leaf represents the first occurrence of a given scalar, and an alias is a surrogate used for subsequent occurrences of either kind of graph node. The branch is modeled as an ordered set of tree node pairs. |
| syntax model | The syntax model enhances the tree model with comments, leaf styles and string formats, and other serialization specific details. Character serializations must also comply with the syntax productions given in the following section. |
A processor need not expose the event stream (or the tree model) and may directly translate between a serialization and its native binding. However, such a direct translation should take place so that the native binding is constructed only from information available in the graph model. In particular, information specific to the tree model (alias anchors and pair ordering) and syntax specific information (comments and styles) should not be used in the construction of a native binding. Exceptions to this guideline include editors which must operate on a direct image of the serialization format.
There are several core concepts shared by each information model, primarily relating to type information and how it is communicated between the serialization format and a native binding.
The type family mechanism provides an abstraction of data types which is portable across various languages and platforms. Each native binding may have zero or more native concrete types or class constructs which correspond to a given type family.
name
definition
format
implicit
In general, there may be more than one native type which corresponds to the type family. In the Python language, for example, the integer family may be bound either to the plain integer, capable of holding 32 bits, or to the long integer with unlimited size. In situations like this, the loader makes the choice.
In other cases, a binding may not have an appropriate native construct for a given type family. This may be addressed with a generic YAML construct to act as a place-holder so that the data value and the type family may round-trip. Alternatively, with warning to the user, a value may be cast to a different, perhaps less specific family. Otherwise, a processor must raise an exception when a native binding for a particular value is not possible.
It may be possible to write the string value of a leaf in more than one way. For example, an integer value of 255 can also be written in hexadecimal as 0xFF. This distinction is covered by the concept of a string format.
name
definition
regex
As noted above, each type family has exactly one default string format, although more than one string format may apply. For example, the decimal format is the default for integers and the base64 format is the default for the binary type family.
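The idea of several string formats for one value can be illustrated with a minimal Python sketch. It is not part of the YAML model itself; the variable names are illustrative assumptions, and Python's built-in int() stands in for a loader's format handling.

```python
# Two string formats (spellings) of the same integer value.
decimal_text = "255"   # decimal format, the default for integers
hex_text = "0xFF"      # hexadecimal format

# int() accepts the "0x" prefix when the base is 16.
decimal_value = int(decimal_text, 10)
hex_value = int(hex_text, 16)

# Both spellings load to the same native value, so the format
# carries no information that the graph model needs to keep.
same_value = decimal_value == hex_value
```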
The graph model abstracts data structures of common programming languages. The model is a graph of collection and scalar values, where each node in the graph is provided with type information. The model provides an intermediate interface between the parser/emitter which can be shared by multiple native languages, and the loader/dumper which is specific to a particular binding. The model also provides a concrete representation for language independent storage, simple structural queries, and graph transformations.
In the graph model, YAML is viewed as a directed graph of typed nodes. Nodes that can reference other nodes are collections and nodes with a string representation are scalars. The graph model also requires node identity and a mechanism to determine if two different nodes have the same content.
A graph node is the building block of YAML structures. In the serialization, nodes correspond to indented blocks. Within a native binding they correspond to application specific objects. In the graph model, a node is tagged with a type family and can either be a collection or a scalar.
kind
family
A scalar is a graph node with a string representation.
string
The default type family for scalar nodes is org.yaml.str. The string representation of the scalar together with its type family should be sufficient to encode most native data types not having a composite structure. Other scalar type families include integer, float, and binary.
In most programming languages, there are two manners in which variables can be equivalent. The first is by reference, where the two variables refer to the same memory address. We call this equivalence identity.
The second form of equivalence occurs when two nodes are different (have a different memory address), but share the same content or have the same binary layout. We call this second form of equivalence equality. It follows that when two nodes are identical they are also equal.
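In Python, for instance, the two forms of equivalence correspond to the `is` and `==` operators. The following sketch uses illustrative variable names and assumes a native binding in which collections are Python lists.

```python
players = ["Mark McGwire", "Sammy Sosa"]
same_node = players           # a second reference to the same object
equal_copy = list(players)    # a different object with the same content

# Identity: both names refer to one object.
identical = players is same_node

# Equality without identity: same content, different objects.
equal_but_distinct = players == equal_copy and players is not equal_copy
```

Identical nodes are always equal, which is why `players == same_node` also holds.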
A node set is an unordered association of zero or more graph nodes. A node may participate in many node sets without restriction, allowing for a graph structure. However, node sets may not contain duplicates, that is, a node with a particular identity may only appear once. The primary purpose of the node set is to provide a basis for the definition of a collection. A native binding usually exposes node sets through a mechanism to enumerate the keys of a hash or dictionary.
A collection is a graph node which represents sequences such as lists or arrays, or mappings such as hashes or dictionaries. In the graph model, sequences are treated uniformly as mappings with integer keys. There are two collection rules. First, a set of keys may not contain two nodes that are equal. Second, each key is associated with exactly one value. Note that this does not prevent a value from being associated with more than one key.
domain
keys.
range
values.
function
The default type family for collection nodes is org.yaml.map, which covers associative containers such as the Perl hash or Python dictionary. When the domain is a contiguous series of non-negative integers starting with zero, the preferred type family is org.yaml.seq, which includes the Perl array or Python list.
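The uniform treatment of sequences as mappings with integer keys can be sketched in Python as follows. The helper names `as_mapping` and `looks_like_sequence` are hypothetical and are not defined by this specification.

```python
def as_mapping(collection):
    """View a native list or dict as a key -> value function."""
    if isinstance(collection, list):
        # A sequence is a mapping whose keys are 0, 1, ..., n-1.
        return dict(enumerate(collection))
    return dict(collection)


def looks_like_sequence(mapping):
    """True when the keys are exactly the integers 0 .. n-1."""
    return set(mapping.keys()) == set(range(len(mapping)))


ranking = as_mapping(["Mark McGwire", "Sammy Sosa", "Ken Griffey"])
stats = as_mapping({"hr": 65, "avg": 0.278})
```

A dumper could use a check of this kind to decide between the org.yaml.seq and org.yaml.map type families.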
Node equality determines when two given nodes have the same content. Technically, equality is an equivalence relation (like identity above). When two nodes are equivalent under this relation, they are said to be equal.
Equality is defined between scalar nodes and between collection nodes, as described below.
scalar equality
collection equality
The start of a YAML text (file or stream) is a series of disjoint graphs, each with a root node.
root
document
The term disjoint means that for any two nodes x and y, there does not exist a third node z such that z is reachable from both x and y.
For any node x, x is reachable from y means that either x and y are identical; or y is a collection and there exists a node z in the domain or the range of y such that x is reachable from z.
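The definitions of reachable and disjoint can be sketched in Python. The `Node` class below is a hypothetical stand-in for a graph node (it is not an API defined by this specification), and node identity is modeled with Python object identity.

```python
class Node:
    """Hypothetical graph node: a scalar (text) or a collection (pairs)."""

    def __init__(self, text=None, pairs=()):
        self.text = text
        self.pairs = list(pairs)   # (key node, value node) tuples


def reachable_from(y):
    """Return every node reachable from y, including y itself."""
    seen, stack = [], [y]
    while stack:
        node = stack.pop()
        if any(node is s for s in seen):
            continue               # already visited; also guards cycles
        seen.append(node)
        for key, value in node.pairs:
            stack.extend((key, value))   # domain and range of the node
    return seen


def is_reachable(x, y):
    return any(x is node for node in reachable_from(y))


def are_disjoint(root_a, root_b):
    return not any(is_reachable(z, root_b) for z in reachable_from(root_a))


sosa = Node("Sammy Sosa")
doc1 = Node(pairs=[(Node("hr"), sosa)])
doc2 = Node(pairs=[(Node("rbi"), sosa)])                # shares the sosa node
doc3 = Node(pairs=[(Node("rbi"), Node("Sammy Sosa"))])  # equal content, new node
```

Here doc1 and doc2 are not disjoint because both reach the same node, while doc1 and doc3 are disjoint even though they contain equal scalars.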
To allow for YAML to be communicated as a series of events, an ordered tree structure must be used instead of a graph. This section describes an extension to the graph model where the graph is flattened and ordered to provide a tree interface. The resulting tree structured model uses several constructs and imposes a linear ordering which is not part of the graph model. Applications constructing a native binding from an implementation of the tree model should not use these additional constructs and the imposed ordering to preserve important data.
To lay out graph nodes as a tree structure, a mechanism is needed to manage duplicates. This is solved with a three node system: branch, leaf, and alias. The first occurrence of a scalar is represented by a leaf, the first occurrence of a collection is represented by a branch, and subsequent occurrences of either a collection or a scalar are represented by an alias. All tree nodes in the serial model have the following properties:
kind
parent
The parent property gives access to the branch which holds the current tree node.
anchor
Leaf tree nodes represent the first occurrence of a scalar in a given serialization.
family
string
When a leaf is converted into a graph node it becomes a scalar with the same type family and string representation. Note that the anchor, if any, is not converted.
The alias tree node represents subsequent occurrences of a scalar or collection in the serialization.
referent
When an alias is converted into a graph node it becomes a subsequent occurrence of its referent's graph node.
A pair is an ordered set of two tree nodes. The first member of the set is called the key and the second member of the set is called the value.
Branch tree nodes represent the first occurrence of a collection in a given serialization.
family
pairs
When a branch is converted to a graph node, three operations occur. The domain is constructed with the graph node for each key in its set of pairs. Likewise, the range is constructed with the graph node for each value in its set of pairs. Last, the function is constructed via association of key graph nodes to value graph nodes, as provided by the set of pairs. Note that the ordering of the pairs is explicitly not converted.
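The flattening of a graph into branch, leaf and alias tree nodes can be sketched as below. The `Node` class, the tuple layout and the numeric anchors are illustrative assumptions; in particular, this sketch gives every first occurrence an anchor, which a real emitter need not do.

```python
class Node:
    """Hypothetical graph node: pairs is None for scalars."""

    def __init__(self, text=None, pairs=None):
        self.text = text
        self.pairs = pairs


def to_tree(node, seen=None):
    """Flatten a graph into nested ("branch" | "leaf" | "alias") tuples."""
    seen = {} if seen is None else seen
    if id(node) in seen:
        # Subsequent occurrence: refer back to the first one.
        return ("alias", seen[id(node)])
    anchor = len(seen) + 1
    seen[id(node)] = anchor
    if node.pairs is None:
        return ("leaf", anchor, node.text)
    pairs = [(to_tree(k, seen), to_tree(v, seen)) for k, v in node.pairs]
    return ("branch", anchor, pairs)


sosa = Node("Sammy Sosa")
root = Node(pairs=[(Node("hr"), sosa), (Node("rbi"), sosa)])
tree = to_tree(root)
```

The second occurrence of the `sosa` node becomes an alias to anchor 3, mirroring the `&001` / `*001` pair in the preview examples.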
When serializing a YAML graph, every tree node is put into a single linear sequence within a given document through the branch ordering. Through the composition of branches, this ordering becomes total, so that for any two distinct tree nodes in a serialization, one can be said to precede another.
For any two nodes or aliases, x and y, we say that x precedes y when any of the following holds:
To enhance readability, a YAML serialization extends the tree model with syntax styles, string formats, comments, and directives. Although the parser may provide this information, applications should take care not to use these features to encode data which must be preserved.
The tree node is extended with a style property, which can have different values depending upon its kind.
leaf style
plain, folded, escaped, and block. All but the escaped style are limited to scalars having only printable characters.
branch style
sequence and mapping. The sequence style may only be used if the domain of the collection's function is a series of consecutive integers starting at zero.
Each leaf node is given a particular format to represent the actual format used by its string representation. Note that once this property is added, the string representation stops being canonical since it overrides the default format for the leaf's family.
format
Before each pair in the serialization is an optional comment.
comment
Attached to each document is a document directive section.
directive section
Following are the syntax productions for the YAML serialization.
Characters are the basis for a serialized version of a YAML document. Below is a general definition of a character followed by several characters which have specific meaning in particular contexts.
Serialized YAML uses a subset of the Unicode character set. A YAML parser must accept all printable ASCII characters and all non-ASCII Unicode characters. However a YAML emitter should attempt to emit only printable characters (including space, tab and line break characters). Characters known to be non-printable may be escaped.
[001] printable_char ::= #x9 | #xA | #xD | (printable Unicode characters starting at #x20 and upwards)
As with standard practice, the surrogate block, #xFFFE and #xFFFF are excluded.
A YAML processor is required to support the UTF-32, UTF-16 and UTF-8 character encodings. If an input stream does not begin with a byte order mark, the initial encoding shall be UTF-8. Otherwise the initial encoding shall be UTF-32 (LE or BE), UTF-16 (LE or BE) or UTF-8, as deduced from the byte order mark. Note that as YAML files may only contain printable characters, this does not raise any ambiguities. For more information on the byte order mark and the Unicode character encoding schemes see the Unicode FAQ.
[002] byte_order_mark ::= #xFEFF
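The encoding rule above can be sketched with Python's standard codecs module. The function names `detect_encoding` and `decode_stream` are hypothetical. The UTF-32 marks are tested first because the UTF-32 LE mark begins with the same two bytes as the UTF-16 LE mark.

```python
import codecs

# Longer marks first: BOM_UTF32_LE starts with the bytes of BOM_UTF16_LE.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def detect_encoding(data: bytes):
    """Return (encoding, mark length); UTF-8 when no mark is present."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return "utf-8", 0


def decode_stream(data: bytes) -> str:
    """Decode a byte stream, dropping the byte order mark if any."""
    encoding, skip = detect_encoding(data)
    return data[skip:].decode(encoding)
```

Because a YAML stream may only contain printable characters, a UTF-16 LE mark is never followed by a null character, so testing the UTF-32 LE mark first does not misread a UTF-16 stream.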
Indicators are special characters which are used to describe the structure of a YAML document.
[003] series_entry_indicator ::= '-'
[004] keyed_entry_separator ::= ':'
[005] series_in_line_start ::= '['
[006] series_in_line_end ::= ']'
[007] keyed_in_line_start ::= '{'
[008] keyed_in_line_end ::= '}'
[009] branch_in_line_separator ::= ','
[010] nested_key_indicator ::= '?'
[011] alias_indicator ::= '*'
[012] anchor_indicator ::= '&'
[013] transfer_indicator ::= '!'
[014] block_indicator ::= '|'
[015] plain_indicator ::= '\'
[016] single_quote ::= '''
[017] double_quote ::= '"'
[018] throwaway_indicator ::= '#'
[019] reserved_indicators ::= '^' | '@' | '%'
Indicators can be grouped into three categories. The '-' and ':' space indicators are always followed by a white space character (space, tab or line break). If followed by any other character they are treated as content text characters. The '[', ']', '{', '}' and ',' in line indicators are used to denote in-line branch structure and therefore must not be used as content text characters unless protected in some way. The remaining indicators are used to denote the start of various YAML elements and hence may be used as internal content text characters in most cases. The exact restrictions on the use of indicators as content text characters depend on the particular leaf style used.
[020] space_indicators ::= series_entry_indicator | keyed_entry_separator
[021] in_line_indicators ::= series_in_line_start | series_in_line_end | keyed_in_line_start | keyed_in_line_end | branch_in_line_separator
[022] non_space_indicators ::= nested_key_indicator |
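The rule that '-' and ':' act as indicators only when followed by white space can be sketched as follows. The function name is hypothetical, line breaks are assumed to be already normalized to a line feed, and treating an indicator at the very end of the input as content is an assumption of this sketch.

```python
SPACE_INDICATORS = "-:"
WHITE_SPACE = " \t\n"   # space, tab, normalized line break


def is_space_indicator(text, index):
    """True when text[index] is '-' or ':' followed by white space."""
    if text[index] not in SPACE_INDICATORS:
        return False
    following = text[index + 1:index + 2]
    # An empty slice means end of input; treated as content here.
    return following != "" and following in WHITE_SPACE


line = "Time: 13:02+5:00"
```

In this line only the colon at index 4 separates a key from its value; the colons inside the time are content text characters.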
Escape codes are used in escaped and double quoted leaves to denote common non-printable characters, specify characters by a hexadecimal value, and produce the literal escape and double quote characters.
[023] escape ::= '\'
[024] escaped_escape ::= escape escape
[025] escaped_double_quote ::= escape double_quote
[026] escaped_bel ::= escape 'a'
[027] escaped_backspace ::= escape 'b'
[028] escaped_esc ::= escape 'e'
[029] escaped_form_feed ::= escape 'f'
[030] escaped_line_feed ::= escape 'n'
[031] escaped_return ::= escape 'r'
[032] escaped_tab ::= escape 't'
[033] escaped_vertical ::= escape 'v'
[034] escaped_null ::= escape 'z'
[035] escaped_8_bit ::= escape 'x' |
[036] escaped_16_bit ::= escape 'u' |
[037] escaped_32_bit ::= escape 'U' |
[038] escape_sequence ::= escaped_escape | escaped_double_quote | escaped_bel | escaped_backspace | escaped_esc | escaped_form_feed | escaped_line_feed | escaped_return | escaped_tab | escaped_vertical | escaped_null | escaped_8_bit | escaped_16_bit | escaped_32_bit
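A decoder for these escape codes might look like the following Python sketch. The function name `unescape` is hypothetical. The digit counts (two, four and eight hexadecimal digits for 'x', 'u' and 'U') are an assumption taken from the '\x3B', '\u003B' and '\U0000003B' examples in the introduction.

```python
import re

# Single-letter escapes from the productions above.
SIMPLE_ESCAPES = {
    "\\": "\\", '"': '"',
    "a": "\a", "b": "\b", "e": "\x1b", "f": "\f",
    "n": "\n", "r": "\r", "t": "\t", "v": "\v", "z": "\0",
}

_ESCAPE = re.compile(
    r"\\(?:x([0-9A-Fa-f]{2})|u([0-9A-Fa-f]{4})|U([0-9A-Fa-f]{8})|(.))",
    re.DOTALL,
)


def unescape(text):
    """Replace each escape sequence in text with its character."""
    def replace(match):
        hex_digits = match.group(1) or match.group(2) or match.group(3)
        if hex_digits is not None:
            return chr(int(hex_digits, 16))
        letter = match.group(4)
        if letter not in SIMPLE_ESCAPES:
            raise ValueError("unknown escape: \\" + letter)
        return SIMPLE_ESCAPES[letter]
    return _ESCAPE.sub(replace, text)
```

A lone escape character at the very end of the text is left untouched by this sketch; a conforming parser would presumably report it as an error.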
In single quoted leaves, a single quote character needs to be escaped. This is done by repeating the character.
[039] escaped_single_quote ::= single_quote single_quote
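Escaping by repetition can be sketched in a few lines of Python. The helper names are hypothetical, and the sketch ignores any other restrictions a single quoted leaf may carry.

```python
def quote_single(text):
    """Wrap text in single quotes, doubling any embedded single quote."""
    return "'" + text.replace("'", "''") + "'"


def unquote_single(quoted):
    """Reverse quote_single: strip the quotes and undouble the rest."""
    if len(quoted) < 2 or quoted[0] != "'" or quoted[-1] != "'":
        raise ValueError("not a single quoted leaf")
    return quoted[1:-1].replace("''", "'")
```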
Unicode defines the following line break characters.
[040] line_feed ::= #xA
[041] carriage_return ::= #xD
[042] next_line ::= #x85
[043] line_separator ::= #x2028
[044] paragraph_separator ::= #x2029
[045] line_break ::= line_feed | carriage_return | next_line | line_separator | paragraph_separator
This section includes several common character range definitions.
[046] line_char ::= printable_char |
[047] line_space ::= #x20 | #x9
[048] line_non_space ::= line_char |
[049] ascii_letter ::= [#x41-#x5A] |
[050] decimal_digit ::= [#x30-#x39]
[051] hexadecimal_digit ::= decimal_digit |
[052] word_char ::= ascii_letter | '-'
[053] non_word_char ::= line_non_space |
Serialized YAML uses text lines to convey structure. This requires special processing rules for white space (space, tab and line break) characters. These rules are compatible with Unicode's newline guidelines.
In a YAML serialization, structure is determined from indentation, where indentation is defined as an end of line marker followed by zero or more space characters. Indentation level is defined recursively.
[054] indent(0) ::=
[055] indent(n) ::= indent(n-1) #x20
Since the YAML serialization depends upon indentation level to delineate blocks, additional productions are a function of an integer, based on the
The indentation level is used exclusively to delineate blocks. Indentation characters are otherwise ignored. In particular, they are never taken to be a part of the value of serialized text.
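As an illustration, the indentation level of a line can be computed by counting its leading space characters. The function name is hypothetical; only #x20 is counted, since the indent(n) production is built from #x20 alone.

```python
def indentation_level(line):
    """Number of leading space (#x20) characters; tabs are not counted."""
    return len(line) - len(line.lstrip(" "))


def strip_indentation(line, level):
    """Drop the indentation, which is never part of the value."""
    if indentation_level(line) < level:
        raise ValueError("line is indented less than the block")
    return line[level:]
```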
On input and before parsing, a compliant YAML parser must translate the two-character combination CR LF, any CR which is not followed by an LF, and any NEL into a single LF (this does not apply to escaped characters). LS and PS characters are preserved. This functionality is indicated by the use of the normalized_line_break production defined below.
[056] line_feed_line_break ::= ( carriage_return greedy |
[057] normalized_line_break ::= line_feed_line_break |
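The input normalization described above can be sketched with a single regular expression. The function name is hypothetical, and the sketch operates on already decoded text, before any escape processing.

```python
import re

# CR LF must be tried before a lone CR so the pair yields one LF.
_LINE_BREAK = re.compile("\r\n|\r|\x85")


def normalize_line_breaks(text):
    """Translate CR LF, lone CR and NEL into LF; keep LS and PS."""
    return _LINE_BREAK.sub("\n", text)
```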
On output, a YAML emitter is free to serialize end of line markers using whatever convention is most appropriate, though again LS and PS must be preserved.
To increase readability, YAML serialization allows for breaking long text lines. Therefore in many cases the parser replaces a single normalized line feed with a single space (#x20). LS and PS characters are preserved, so it is safe to use them to indicate line/paragraph text structure even when line folding is done.
When encountering two or more consecutive (possibly indented) normalized line feeds, the parser does not convert them into spaces. However, if the series of line feeds is surrounded by other text characters, the parser ignores the first line feed, requiring a single line feed to be serialized as two, two line feeds to be serialized as three etc. Thus each "empty line" in a folded text represents a single line feed character, be it at the start, middle or end of the value.
When this functionality is implied, the
[058] space_line_feed ::= line_feed_line_break |
[059] empty_line_feeds(n) ::= line_feed_line_break |
[060] folded_line_breaks(n) ::= empty_line_feeds(n) greedy |
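The folding rule for line feeds that sit between other text characters can be sketched as follows. The function name is hypothetical. The sketch assumes the input is already normalized and stripped of indentation, and it applies the same rule to line feeds at the start or end of the value, which is a simplification.

```python
import re


def fold(text):
    """Fold line feeds: one becomes a space, n > 1 become n - 1."""
    def replace(match):
        count = len(match.group())
        if count == 1:
            return " "              # a single line feed folds to a space
        return "\n" * (count - 1)   # the first line feed is ignored
    return re.sub(r"\n+", replace, text)


folded = fold("Mark set a major league\nhome run record in 1998.")
```

Word wrapping therefore does not change the value, while each empty line in the serialization contributes exactly one line feed.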