Break Syntax Chapter
As proposed by Rich Morin.
Nit Fixes
A lot of nits detected by Rich Morin, Tichy Braoo and Alexis Layton.
Broken Links
Several broken links detected by Tichy Braoo.
Break Syntax Chapter
To several smaller chapters.
Boolean canonicals
Change to “y”/“n”. Remove “+”/“-” from examples and the bool type definition. This should have been
done a while back (when a plain value could no longer be “-”).
Time-less dates
Changed to 00:00:00Z instead of 12:00:00Z.
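A minimal sketch of what this default means in practice, assuming a date-only value of the form YYYY-MM-DD. The helper name `date_to_timestamp` is hypothetical and is not part of any YAML processor.

```python
from datetime import date, datetime, time, timezone


def date_to_timestamp(text: str) -> datetime:
    """Interpret a date-only value (YYYY-MM-DD) as midnight UTC.

    Hypothetical helper: it illustrates the 00:00:00Z default only.
    """
    day = date.fromisoformat(text)
    # Midnight (00:00:00) in UTC, rather than the earlier noon default.
    return datetime.combine(day, time(0, 0, 0), tzinfo=timezone.utc)
```

For example, `date_to_timestamp("2001-12-14")` would give a UTC datetime at 00:00:00 on that day.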
Renamed !special to !yaml
Better reflects the intent.
DocBook format
To allow for easier media retargeting. Support HTML and PDF (using XEP).
Fix production bugs
Prevent header/trailer from appearing as content - Thanks to Robin Green for catching this one.
Change “Serialization” back to “Document”
By popular demand.
Implicit/explicit indentation
Allow leading empty lines, and indented content lines to start with a
“#” character.
Changed “taguri:” to “tag:”
As per the updated tag URI RFC.
YAML Processor
Used the term “(YAML) processor” instead of parser, loader etc. This avoids letting preconceived notions of system architecture affect the semantics.
Rearranged and renamed node/value productions
Hopefully this would increase production readability.
Hungarian explained
Add a note explaining the production prefixes.
Status
Changed to release candidate.
Allow inline collection form in complex keys
Reusing the productions from the *-in-seq form.
Allow omitting complex and flow keys values
Allows for a nicer set notation.
Allow implicit single pair maps in flow sequences
Allows for a nicer ordered map connection.
Major re-write of model and intro chapters
Add figures, rephrase, make more readable.
URI escaping
Gone.
Empty plain scalars
Are forbidden as keys and as flow sequence entries.
Trailing commas
Are allowed in flow collections and following ... line. Empty lines ending with a specific line break are now treated as comments even if following a block scalar.
Folding and Chomping
Rules have been massively simplified.
Examples
Completely new examples set, with highlights associating parts of the characters stream with specific productions.
Productions
Re-work almost all productions in the syntax section, together with examples.
Sam’s corrections
Incorporated minor wording issues (thanks to Sam Vilain).
YAML 1.1
New directive syntax, %TAG directives, patched indentation rules, moved !!str, !!seq and !!map to the repository. Hopefully this is the last "global" change.
Index
Replace all internal links with index entries. It is simply amazing how much work is expressed in this short sentence.
Implicit header
Was changed to “---” instead of “--- %YAML:1.0”.
Implicit document
Now allows all collection styles, not only block collections.
Block sequence indentation
Wording, productions and examples fixed to allow for more human-friendly interpretation of “-” as indentation.
BNF productions
Were renamed with a hungarian-like prefix notation and reformatted so that cut-and-paste from HTML would yield proper results.
Vocabulary instead of language
The term “vocabulary” better describes the intent of sub-domains of yaml.org (for type families).
Seq-in-seq
Was added (similar to map-in-seq).
Throwaway indicator
Changed from “ # ” to
“ #”.
Directive indicator
Changed from “#” to
“%”.
Anchors and directive names
Can be any non-space flow char.
Flow values
May start on a following line.
Special “-” plain value
Is canceled (ambiguous with next-line flow). Therefore the
-/+ Boolean implicits are gone as well.
Language-independent types
Corrected date indicators.
Integers and floats
Added sexagesimal (base 60) format (using
“:”) for time (and degrees).
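A minimal sketch of how a sexagesimal integer could be evaluated, assuming each “:”-separated group is one base 60 digit with the most significant group first. The function name `parse_sexagesimal` is hypothetical, and float values and range checks on the groups are not handled.

```python
def parse_sexagesimal(text: str) -> int:
    """Evaluate a base 60 integer such as "190:20:30".

    Hypothetical helper: groups are assumed to be plain decimal digits.
    """
    sign = -1 if text.startswith("-") else 1
    digits = text.lstrip("+-")
    value = 0
    for group in digits.split(":"):
        # Shift the running total by one base 60 position, then add the group.
        value = value * 60 + int(group)
    return sign * value
```

Under these assumptions, “1:30” evaluates to 90, which reads naturally as one minute and thirty seconds.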
Nulls
Removed the “Undef” possibility.
Merge
Changed from “<=” to
“<<”.
Brian’s wording fixes
Spelling and minor re-wording.
#TAB was removed
Tabs are now completely banned from indentation.
Escaping of type family name
The interaction between “\” and
“%” escaping has been clarified
in a better way.
Zero explicit indentation
Is now allowed for top level nodes.
Moved language-independent types out of the spec.
Only !map, !seq and !str are in it now.
Changed implicit types
No longer require (). Implicit empty string is !null. Changed nil to undef. These changes depend on a resolution to the implicit typing issues; hence all the separate type documents are not in “last call” status. Yet.
Changed NULL escape
From \z to \0.
Changed “- entry” indentation
Consider the “-” to be
indentation, but not the following spaces.
Changed format separator to “#”
This makes transfer methods into URI-references (allow fragments). Type families are URIs (without fragments).
Mutability == collection
Make all collections mutable and all scalars immutable. This simplifies generic tools and the native model.
Re-ordered sections
Moved information models to the end.
Graph implicit types
Allow types to remain unknown (implicit) all the way up to and including the graph model; allow implicit collection types.
Status
Was changed to “Last Call”. At long last.
Implicit strings
May begin with a “/” to allow
paths to be unquoted.
Removed the // special keys.
They conflict with the new implicit string format. What to replace them with, if anything, is still TBD.
Example fixes
Fixed minor nits in some examples (4.3.1, 4.6.4, 4.6.5, 4.6.7).
Renamed “public” type families to “global”
This better conveys the intent.
Separated changes
As of this draft, changes are in a separated document.
Repaginated draft
The previous pagination was nice but it was very wasteful of pages. In the spirit of keeping the spec small, page breaks were moved, reducing the printed size by 10 pages or so.
Table of contents
Was enhanced to specify page numbers as well as links. Also, entry 3.1.4 was missing.
Section 1.2, paragraph 7
Remove the sentence about indentation being used for multiple documents (it isn’t).
Syntactical problems
Fixed uses of “an flow” (leftover from the global substitute), along with various other minor fixes.
Renamed “nested” to “block”
In all the wording. This makes the wording match the productions.
Cleaned canonical formats
In some cases, the canonical format was a subset of another format, rather than being an implicit format on its own. This is allowed, and is now explicit in the format definition. Also fixed the string formats to satisfy the requirement that a value would only match one implicit or explicit format.
Printing
Should be much improved. Explicit page breaks were added and some reorganization was done so that breaks should reflect logical chunks. At least in A4 and Letter page sizes on IE6 :-)
Productions 94 and 98
Now reflect the correct indentation policy for map-in-seq.
Productions 129 and 130
Were renamed to “folded-paragraph” and “folded-text-line” to better reflect the intent.
The “...” “pause” marker
Was added as per the discussions in the mailing list. This added two productions.
Examples
Some examples were re-worked as part of the re-organization. Hopefully for the better. The XML example was removed.
Printing enhancements
Add page-break in many locations for better printing, make table of contents two columns.
Terminology
Change “block scalar” to “literal scalar”, change “in-line” to “flow”. This allows for two sorts of mechanisms, the “block” mechanism and the “flow” mechanism. Within block are literal and folded scalars; within flow are plain, single, and double. This involved a few global substitutions, which will probably introduce errors. They are:
s/in-line/flow/
s/nest-/blk-/
s/line-/flow-/
s/flow-feed/line-feed/
s/flow-separator/line-separator/
s/flow-break-char/line-break-char/
s/flow-break/break/
Preview Re-structuring
Restructure the preview section for general readability.
Production Changes
Minor change to the comment-break production
Use (word) instead of .word for “enums”
This is going to be our generic way to handle implicits that otherwise would be taken to be strings.
Change “word” format names to “english”
More correctly reflects their semantics.
Canonical Booleans
Change canonical Booleans to +/-. Had to
make a special case to allow “-”
in unquoted values to do it. This doesn’t apply to keys, but using
the English format there makes more sense anyway. This allows
canonical boolean to be implicit.
Add (null)/(nil)
In addition to ~, consistent with booleans.
Unify date/time
Merge them into timestamp (and rename it to “time”).
Relax string regexp
Anything starting with a “text” character
(“-”, letter or digit),
except for int/flt/time.
Change space escapes
“\ ” is now a space,
“\-” is now NBSP.
Allow repeated #
To begin throwaway comments.
Renamed the three chomping variants
To “strip”, “clip” (default) and “keep”.
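A minimal sketch of the three variants, assuming the semantics these names carry in the published YAML specification: “strip” drops all trailing line breaks, “clip” keeps a single final line break, and “keep” preserves them all. The function name `chomp` is hypothetical, and only “\n” line breaks are considered.

```python
def chomp(text: str, mode: str = "clip") -> str:
    """Apply a chomping variant to the text of a block scalar.

    Hypothetical helper illustrating "strip", "clip" and "keep".
    """
    body = text.rstrip("\n")
    if mode == "strip":
        # Drop every trailing line break.
        return body
    if mode == "keep":
        # Preserve all trailing line breaks.
        return text
    if mode == "clip":
        # Keep at most one final line break.
        return body + "\n" if text.endswith("\n") else body
    raise ValueError(f"unknown chomping mode: {mode!r}")
```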
Changed in-line folding
To allow empty lines to represent line feeds and to preserve specific line breaks. The WIKI feature isn’t included as it depends on indentation.
Changed simple style
To reflect the new restrictions (keys can’t span; in-lines can’t have
“, ” and “: ”; for top-level values, anything goes except
comments). Comments can no longer be interleaved in such scalars.
Make “?” into a space separator
No reason not to. It was always followed by a space anyway.
Renamed scalar styles
Rename “plain” to “block” and
“simple” to “plain”. As a result of the
above it isn’t that simple any more :-) and
having both “plain” and “simple” was
confusing. Reverting to “block” more accurately reflects
the semantics, freeing “plain” for the unquoted style.
Renamed productions
To make the document more printer-friendly. The main change was to
rename nested- to
nest- and
inline- to
line-. The latter is particularly painful,
but there was no other good choice.
Float round-tripping
Added a warning that float values may not round-trip exactly due to the use of native float data types.
Provide mailing list URL
So that its address would be clear even from the printed version.
Line Processing section
Was renamed “Space processing”.
Document productions
Were cleaned up and now allow comments before the header line.
Printing
Many tweaks to ensure printing doesn’t clip the comments of the productions tables.
Float format
Extended to allow for infinity and not a number.
Block indicators
Changed to |, > and
\ (without the leading
|).
Information model
Whole section was re-worked to accommodate the latest decisions in the mailing list. Branch and leaf are now renamed to collection and scalar; keyed and series are renamed to mapping and sequence.
Multi-line in-lines
Are now allowed, including multi-line in-line scalars.
Limit implicit strings
To word and space characters only.
Unreserve # % ^
No need to reserve them - they can’t be used as first characters in a simple value anyway because there’s no implicit type using them. Unreserving them allows them to be used in implicit types in the future.
Modify folding
To preserve line breaks around more-indented and specific empty lines.
Spell checking
Run the whole document through Word for spell checking.
Model chart
Changed to use the style sheet in a more dependable way, and now reflects the fact that the native and generic models “are one” in some respects.
Wording for String
Changed to reflect the fact
“-” isn’t a word character.
Chomp control
Was added (“-” for chomp,
“+” for keep).
Escaped nested style
Was removed. It is redundant due to “..” spanning lines.
Explicit Syntax
Explicit notation is now required for empty scalars. This reverts the change in the previous draft.
No alternate style for collections.
Map and seq can now only be written in keyed and series style.
Comments
Are now allowed to trail lines except for inside multi line text. Comments (explicit and implicit) may trail multi-line text values as long as they are less indented.
URI Scheme
Is now correctly limited to word characters.
Trailing comma
Is no longer allowed in in-line collections.
Simple char
Production 170 fixed to prohibit in-line indicators after a space indicator, or two space indicators followed by a space.
Base64 Format
Was listed as “canonical”. Fixed to “explicit”.
Time intro examples
Used the wrong format
Production fixes
Correct minor problems (missing “(n)”, improper handling
of spaces in in-line collections, missing
“|”, etc.)
Integer and Float formats
Now ignores any “,” inside the
value. This required making “,”
into a space separator.
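A minimal sketch of the effect on number parsing, assuming the “,” characters are simply dropped before the value is converted. The function name `parse_number` is hypothetical, and no check is made that the commas fall on thousands boundaries.

```python
def parse_number(text: str):
    """Parse an integer or float, ignoring any "," inside the value.

    Hypothetical helper: commas are removed wherever they appear.
    """
    cleaned = text.replace(",", "")
    try:
        return int(cleaned)
    except ValueError:
        # Not an integer, so fall back to a float.
        return float(cleaned)
```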
Boolean
Is now a type of its own.
Ptr
Was removed. Will probably be a Perl-specific type.
Leaf styles
Were simplified to only 6 (3 nested and 3 in-line).
Document Separator
Is now restricted to “---”. BOM
is allowed if encoding does not change. Production 49 has been
corrected accordingly, and the reference to the obsolete
non-alias_node was corrected.
In-series keyed
This special case (production 93) has been restricted to in-line leaf keys without any properties. This minimizes lookahead in parsers.
Inline leaf
Now allows an empty unquoted value (production 163).
Anchors for aliases
Are now forbidden.
Timestamp format
Fixed a problem with the regexp. Allow both a valid ISO8601 format
(using “T”) and a space separated
format (for readability).
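A rough sketch of a pattern that accepts both forms. This is not the regexp from the spec: the optional fraction and time zone parts are assumptions added for illustration, and the names `TIMESTAMP` and `is_timestamp` are hypothetical.

```python
import re

# Illustrative pattern only; the fraction and zone parts are assumptions.
TIMESTAMP = re.compile(
    r"\d{4}-\d{2}-\d{2}"        # date: YYYY-MM-DD
    r"[T ]"                     # ISO 8601 "T" or a single space
    r"\d{2}:\d{2}:\d{2}"        # time: HH:MM:SS
    r"(?:\.\d+)?"               # optional fraction of a second
    r"(?:Z|[+-]\d{2}:\d{2})?"   # optional time zone
)


def is_timestamp(text: str) -> bool:
    """Return True if the whole text matches the sketch pattern."""
    return TIMESTAMP.fullmatch(text) is not None
```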
Productions
Break node productions to smaller bits.
YAC 22
The #TAB directive is now required to allow
tabs in indentation.
Corrected examples
Floating point numbers in all examples now start with
“0”.
Wording improvements
In various places.
Empty line indentation
Is now indent(<=n) instead of the
previous indent(n)?
YAC 20
The yaml: scheme has been replaced by an http: one. Mapping of XML
namespaces is now done using “$”
instead of “;”. The prefix
indicator is now “^” and the
format indicator is now “|”.
Transfer Method
Is now split to type family and format; type family is now a URI.
Date/Time Types
Were added.
Tree Model
Now includes the format.
Wording changes
Throughout the whole spec.
Format in Transfer Method
Is now separated by a “`”.
YAC List
Removed the Probable changes section; it is now replaced by the YAC list.
YAC 3, YAC 5
Added a uniform URI based namespace scheme.
YAC 6
Unrecognized explicitly typed implicit (simple) leaves are allowed.
YAC 7
C1 Control codes are explicitly forbidden.
YAC 10
In-series in-line syntax was modified according to the flexible indentation scheme (YAC needs to be updated).
YAC 11
Rename implicit leaves to simple leaves.
YAC 12
Throwaways are now allowed everywhere. Blank lines are comments. There are ambiguity problems in chomped leaf values.
YAC 13
Indentation is now generic (flexible).
YAC 14
Nested leaf format is now built out of three orthogonal properties.
YAC 15
Top level nodes can be in-line.
YAC 16
A “--- #YAML:1.0” is assumed if
there’s no header.
YAC 18
YAML Ain’t Markup Language
Quoted Strings
Are now implemented as a text implicit transfer format. This changed slightly the definition of an escaped leaf so that the two would be equivalent.
Relative DNS type families
Are now supported using the simplest form only.
Syntax/Grammar/Formatting
Were fixed according to Brian’s inputs.
Block
Is now chomped using “||” rather
than “|-” for consistency.
Document Header
Can now be anything starting with
“--”; therefore is required
before the first document in a multi-document stream.
Transfer Methods
Can now accept any printable characters, not just words.
“!” now means “force implicit typing”.
New Scalar Styles
We now have the following: | || \ \\ ' "
implicit.
Treat sequence/map as Collection Styles
In productions and in information model.
Add Structured Keys
Using a key indicator (done - Oren).
Add Examples
Added a detailed examples section to the introduction to better acquaint the user so that the spec can proceed with some basic knowledge.
In-line maps/sequences
Are now supported. Empty maps/sequences are a natural special case.
Minor Changes
Moved list of changes down... it was cluttering the top of the spec.
Information model and Preview
Completely new rewrite.
Polish
Minor wording fixes, added internal links, etc.
List
Was renamed to “sequence”.
Separator
Was changed to “---” instead of “----”.
Indentation
Was changed to one space instead of one tab.
Base64
Is no longer an implicit type. The surrounding
“[=...=]” are kept,
however, in case we change our mind later (e.g., if we introduce
pipelining). The type was renamed to “binary” to stress
its class rather than the encoding used.
Float
Was renamed to “real” to decouple it from specific in-memory representation. Mathematicians may object :-)
Length
Was removed from the sequence map.
Type vs. Class
Added some wording to clarify the difference. Most likely this will need to be changed once we settle the pipelining issue.
Productions
Were completely overhauled, again, to accommodate the new semantics.
Next Line Scalars
Now have two separate indicators, one for quoted and one for unquoted values.
Duplicate keys
Are now an error. The parser may ignore the second occurrence with a warning.
Indentation
Has been changed to use tabs instead of spaces.
Throwaway comments
Were added. The persistent comment key was changed to
“//”.
Indicators
Were changed. “-” now signifies a
list entry and “\” signifies a
next-line leaf value. “@” and
“%” are no longer necessary (they
may be if we ever support map/list keys). As a result no lookahead is
ever required.
Multiple documents
Are now possible in a single file (again), using
“----” as a separator.
Wording
Has been changed in numerous locations, hopefully to make it clearer. There was also some shuffling of the text sections to remove redundancy.
Productions
Were thoroughly overhauled and therefore undoubtedly contain new bugs. Also, all the shorthand production names were replaced by long ones to improve readability.
Indicator keys
Are no longer allowed. Structure keys are used instead, where some have only an in-memory representation.
Map/List keys
Are no longer allowed. This may have to be revisited when Perl 6 comes out.
Deep References
Are still supported but as an explicit type rather than as a hack.
Types List
Has been shrunk to only the common types, with a reference to yaml.org for a fuller list of types. The three core types were added as required types.
Type vs. Kind
This distinction was inserted explicitly into the text, with several examples to drive the point home.
Character Set
Is now defined as simply printable Unicode characters without explicit ranges. This makes the spec resistant to the evolution of the Unicode spec.
Reserved Indicators
The set of such indicators has been minimized. There is now a conflict between reserving them for future use and allowing people to use them as markers for implicit leaf types.
Simple Scalar
Has been renamed to unquoted leaf.
Generic Model
Has been generalized to allow for typed nodes.
Implicit Typing
Has been added with an assortment of suggested types.
General Keys
Keys can now be any nodes to allow for Java serialization.
Multi-level references
Are now supported for Perl serialization.
Simple Scalar and End Of Lines
Moved eol productions to the end, rather than the start, of most productions. The wording and productions for the simple leaf were fixed to match each other and the intended semantics. The simple leaf example set was enhanced to clarify the proper interpretation.
Empty Document
Both empty top level maps and no top level maps are now allowed, and hence so are empty documents.
Thanks to Joe Lapp for reviewing the 22 Jul 2001 draft and recommending these changes.
Phrasing fixes
Fixed phrasing in the abstract, and sections 1.3, 2.1, 2.3.1, 2.4.3, 2.4.4, 2.4.5, 2.4.6 and 2.5.3.
Production fixes
Fixed productions: added production 47, 59, fixed productions 57, 58, 60 and 64 (productions numbers in the 22 Jul 2001 draft are off by one in some cases). Most are bug fixes. Actual changes include allowing for empty lines surrounding a top level map, allowing an optional trailing separator line, and forbidding annotations which have no sensible semantics (anchor to null, anchor to a reference, shorthand for a reference).
Merge Spec
Due to the decision to leave all API related issues outside the core spec, the spec has been re-merged into a single file, covering just what used to be the introduction and serialization sections of the previous specs.
Character Encodings
The spec now refers only to the Unicode standard. Due to the efforts by the Unicode and ISO/IEC 10646 groups, both standards are in almost complete agreement. The additional features provided by the ISO/IEC standard are rarely used in practice, while Unicode is simpler and is more widely supported by existing languages and systems.
Strict Indentation
Indentation is now a strict 4 spaces per level. This allows for the new white space policy and the new block notation.
Shorthand Notation
The spec introduces a shorthand notation for attaching special keys to any node kind (converting it to a map if necessary). This will need more work.
Null Nodes
Null nodes have finally been added, after somehow eluding all previous versions.
Bullet Lists
Change the * optional prefix for leaf list
entries to a mandatory : and therefore remove the special name
“bulleted list entries”.
Simplify Keys
Multi-line simple keys are now out. The door is open for re-introducing them, however.
Change Whitespace Policy
White space folding has been replaced by line break folding. White space is now always significant, except for indentation and for separation of structure tokens.
Block Scalar Syntax
The syntax for block leaves has been replaced by a more elegant one.
Split Spec
The spec is now separated into several files. This allows different versions of the spec to share the same version of unchanged sections, and makes it easier to refer to a particular version of important pieces of the spec such as serialization and interfaces. All the HTML files use the same shared CSS file. Cross references between the separate parts of the spec are now relative, though references to older versions are absolute and refer to the main site.
Cyclical Graph
Change the wording on the information model to allow for graphs with cycles. The alternative is to define the anchor semantics in such a way that would preclude cycles.
Null Character Escape
The escape sequence \z was added to allow convenient escaping of the ASCII zero (null) character.
Remove Binary Scalars
The information model now contains just one type of leaf. The special syntax for binary leaves has been removed. This functionality will be re-added in the form of a color.
Remove Class Shorthand
The syntax no longer supports the !class syntax. This functionality will be re-added in the form of a color.
Bullet Lists
Change the optional prefix for leaf list entries to
* and rename such entries to “bulleted
list entries”.
Make Keys More Scalars-Compatible
Allow for multi-line simple keys and unify the description of leaf keys and values where it makes sense.
HTML Tidying
All the HTML pages have gone through Tidy. Also, all the HTML files have been run through an HTML validation service and a CSS validation service. Broken links and spelling were checked using another online HTML validator. This needs to be repeated for all future drafts.
Relationship with MIME
Beyond using base64 for binary leaves, no additional special relationship with MIME is expected. Hence references to the MIME and mail RFCs were moved from section 1.1 (“required reading”) to section 1.2 (“background material”).
Strict Indentation
Indentation is now completely strict for all leaf styles. Also, the productions were changed to use consistent semantics for the indentation level parameter.
List Scalar Prefixes
A list leaf entry may be prefixed by an optional : indicator to improve readability of multi-line simple leaf values.
Anchor Semantics
Leading zeros are now ignored for comparing anchor strings.
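A minimal sketch of such a comparison, assuming anchors are compared as strings once their leading “0” characters are removed. The function names `normalize_anchor` and `same_anchor` are hypothetical.

```python
def normalize_anchor(anchor: str) -> str:
    """Drop leading zeros so that "007" and "7" compare equal.

    Hypothetical helper: an all-zero anchor normalizes to "0".
    """
    stripped = anchor.lstrip("0")
    return stripped or "0"


def same_anchor(first: str, second: str) -> bool:
    """Compare two anchor strings, ignoring leading zeros."""
    return normalize_anchor(first) == normalize_anchor(second)
```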
No Empty Line At Start
The document production was fixed so as not to require an empty line at the start of a document.
Character Escapes
The set of character escapes is now maximal (including the rare \e escape for the useful ASCII ESC character). Also, it is now possible to “escape” a line break in a quoted string (the previous drafts were inconsistent at this point).
32 Bit Characters
The current draft allows such characters, and includes a specialized
escaping format (“\Uxxxxxxxx”) to
support them.
Changes Section
The changes section was added for easier comparison of different versions. The final draft will not contain this section.
Class Indicator
The indicator was changed from # to
! to allow for # to be
used for comments.
No Empty Line At End
The document production was fixed so as not to require an empty line at the end of a document.
Strict Indentation
Indentation in quoted strings and binary blocks is now strict to ensure readability.
Productions
Problems in the productions were fixed, especially where related to white space issues and formatting of the result.
BOM Comment
The link to the Unicode FAQ was moved to section 2.2.2.
Binary Scalars
The information model now distinguishes between text and binary leaves.