Welcome to YAML (tm) -- WORKING DRAFT 0.19a

This is the first serious working draft of YAML.

Key Concepts

YAML is founded on several key concepts from very successful languages.

  • YAML uses similar type structure as Perl. In YAML, there there are three fundamental structures: scalars ($), maps (%), and lists (@). YAML also supports references to enable the serilization of graphs. Furthermore, each data value can be associated with a class name to allow the use of specific data types.
  • YAML uses block scoping similar to Python. In YAML, the extent of a node is indicated by its child's nesting level, i.e., what column it is in. Block indenting provides for easy inspection of the document's structure which helps to identify scope errors.
  • YAML uses similar whitespace handling as HTML. In YAML, sequences of spaces, tabs, and carriage return characters are folded into a single space during parse. This wonderful technique makes markup code readable by enabling indentation and word-wrapping without affecting the canonical form of the content.
  • YAML uses similar slash style escape sequences as C. In YAML, \n is used to represent a new line, \t is used to represent a tab, and \\ is used to represent the slash. In addition, since whitespace is folded, YAML uses bash style "\ " to escape additional spaces that are part of the content and should not be folded. Lastly, the trailing \ is used as a continuation marker, allowing content to be broken into multiple lines without introducing unwanted whitespace.
  • YAML allows for a rfc822 compatible header area for comments, specific processing instructions, and encoding declarations. This provides a flexible and forward looking method to augment the YAML parser with other features such as a validator similar to TREX or RELAX. Furthermore, this will allow a mail processing system to directly use YAML as its input parser.
  • YAML supports binary and formatted text entities with MIME multi-part attachments. Each attachment is given an reference identifier which can be associated with a location in hierarchical YAML content. This allows leaf values which would distrupt the in-line structural flow to be handled out of band in a seperate block mechanism.
  • YAML has a SAX like sequential "C" API. This C library can be used to easily construct native-language representations of a YAML serilization. The API also show cases a clever substitutability technique which allows schema changes to occur at the leaf nodes in a backwards compatible manner without breaking older code. This brings resiliance to older code, while allowing the structure of your data to grow over time.
Example: Basic

Below is an example of an invoice expressed via YAML. Each value's type indicated by either percent (map), or an at (list) sign, or an optional dollar sign (scalar). The content for each value follows the indicator either on the same line for scalars or on subsequent indented lines. The content for a map, which is also the starting production, is a list of key value paris. Each key and value are seperated by a colon.

buyer    : %
    address     : %
       city       : Royal Oak
       line one   : 458 Wittigen's Way
       line two   : Suite #292
       postal     : 48046
       state      : MI
    family name : Dumars
    given name  : Chris
date     : 12-JAN-2001
comments :
    Mr. Dumars is frequently gone in the morning
    so it is best advised to try things in late
    afternoon. \nIf Joe isn't around, try his house\
    keeper, Nancy Billsmer @ (734) 338-4338.\n
delivery : %
    method : UZS Express Overnight
    price  : 45.50
invoice : 00034843
product : @
    %
        desc      : Grade A, Leather Hide Basketball
        id        : BL394D
        price     : 450.00
        quantity  : 4
    %
        desc      : Super Hoop (tm)
        id        : BL4438H
        price     : 2,392.00
        quantity  : 1
tax      : 0.00
total    : 4237.50

Since "product" is a list, it only has values and thus is missing the key and colon. Also notice that the "comments" scalar is on multiple lines. Since whitespace is folded, the carriage return (\n) is escaped and the line ending \ is required to keep housekeeper as a single word. By default, the serilizer will sort map keys, although this isn't a requirement of the serilization structure.

Example: References and Class Names

Below is an example of a YAML document which demonstrates the use of references and classes. Immediately after an indicator a class name can occur and then within parenthesis an optional reference handle. If the indicator is a "*", then no further content is allowed, as this indicator signifies a reference to another value. The class name may be used as a specific language specific binding to a particular object or type appropriate class, otherwise it can be considered a comment. The production for allowable names and a namespace mechanism have yet to be worked out.

buyer : %person
    comments :
        This is a person object accessable
        through the "buyer" key from the
        top level map.
    family name : Dumars
    given name  : Chris
inline : $(0001)
    This is a folded text entity
    that is associated with a
    reference so that it can be
    re-used later on.
seller : %person(0002)
    comments:
        This is another person object, only
        that it is given a handle of 0001 as 
        well as a class so that it can be 
        refered to later.  Handles must be
        numeric, and classes cannot start
        with a number.
    family name : Sellers
    given name  : Peter
zzz :
   comments:
       The first two items in this map are references
       The first is to the person object "Peter Sellers".
       The second is to the inline text object "This is..."
       The price scalar below is given a class "price".
   peter : *(0002)
   price : $currency 
       23.34
   text  : *(0001)
Example: Block References and Attachments

Below is an example of a YAML document which includes the optional rfc822 style header, specifically a rfc2046 multipart header. A YAML Parser must handle these headers to allow for application specific processing instructions, and MIME for raw/binary references.

Date: Sun, 13 May 2001 23:48:04 -0400
MIME-Version: 1.0
Content-Type: multipart/related; 
    boundary="================================"
X-YAML-Version: 1.0

--================================
Content-Type: text/plain; id="0001"

  XX XXX    XXXXX   XX   XX
   XXX XX       X   XX X XX
   XX      XXXXXX   XX X XX
   XX      X   XX   XXXXXXX
  XXXX     XXXXX X   XX XX

--================================
Content-Type: image/gif; id="0002"
Content-Transfer-Encoding: base64

R0lGODlhGQAPAOMAAAICBDaanAJSVAISFP7+/GbOzAJmZAIeHGbMzGbMzGbMzGbMzGbMzGbMzGbM
zGbMzCH+Dk1hZGUgd2l0aCBHSU1QACH5BAEKAAYALAAAAAAZAA8AQAR70EgZArlBWHw7Nts1gB6R
GV0wCBMlkp4qlHJppkNoyW1r5SmcTeV6wUwrFI4VEulSMyRLchhYrYLq4MDKYrm9XuFQuIzLhALA
pm6VV+g44FBSHybokQGdnivNfhJ8enwFSR12eB4jcWZ3gHeCJQJycXSJEzaIc5SIWz0RADs=

--================================
Content-Type: text/x-yaml; id="0003"

an inline : $(0004)
    This is a folded text entity
    that is associated with a
    reference.
content :
   comment:
       The cyclic item is a reference
       to the top-level map.
   cyclic : *(0003)
   image  : *(0002)
   inline : *(0004)
   raw    : *(0001)
   title  : This contains multiple references

Information Model

A map/list/scalar data structures found in modern programming languages such as ML, Python, Perl, and C. This model should also be very compatible with relational database tables. Note: This model lacks classes and references which are still under consideration.

Document The the starting production for YAML is a Map.
Map An un-ordered sequence of zero or more (Key,Value) tuples. Where they Key is unique within the sequence and matches the Key production.
Value Exactly one of Scalar, Map, or List
List An ordered sequence of zero or more Values.
Scalar Any type directly serilizable through or able to be constructed from a sequence of zero or more characters. These characters must match the Char production.

Default This is a synthesized attribute of every Value. If the Value is a Scalar, then the Default property refers to the Value itself. If this Value is a List, then the Default refers to the Default property of the first Value in its sequence. If the Value is a Map, then Default refers to the Default property of the Value in its Pair entry lacking a Key. By using Default, a Scalar Value can be substituted with a Map or a List Value without braking older code.

Take careful note that the information model does not admit a "parent" property of each value. Quite the contrary, YAML may be a graph structure and is not necessarly a tree.

Mapping To Popular Environments

For Python, the internal representation has a top-level object is a Dictionary, and from there, depending upon each value's indicator, can either be a List, Dictionary or a String. It is possible for a schema mechansim to be included which affords for more specific decoding into classes and types. The default attribute is implemented through a stand-alone function.

For Perl, the internal represenation starts with a top-level hash. And from there, depending upon the indicators can either be a list, hash, or string scalar. Of course, it is also possible for a schema mechanism to be included which affords for more specific decoding. The default attribute is implemeted through a stand-alone function.

Haven't done Java or Javascript since '98, but I remember Strings, Maps and Lists being Objects. So there shouldn't be any problem in Java. Javascript is probably in the same boat but I can't veryify since that book has mysteriously dissapeared as well.

For ML, C, and C++ all of which lack a built-in, variable type Map and List structure require a specific schema to build an internal representation. For these languages, a YamlValue type could be created with sub-types of Scalar, List, and Map. For C++, STL could make the implementation very quick, especially with iterator support. An alternative approach would be a class builder... but this, of course, requires a bit more smarts and a schema system.

Mapping to a relational database will also require some sort of schema to indicate how to pack/unpack. However, given that a tuple (record) is easlily associated with a Map, and a relation (table) is easily associated with a List, there should not be that much difficulty. Mapping NULL values will be represented by a lack of a particular map entry.

Serilization Format / BNF

This section contains the BNF productions for the YAML syntax. Much to do...

Parser Behavior

This section describes how a parser should parse YAML. Much to do...

Emitter Behavior / Canonical Form

This section describes how an emitter should write YAML into canonical form. Includes specific word-wrapping algorithem. Minimal content length of 20 chararacters, and does it's best to word-wrap by 76 columns.

Implementations

To do... an implementation in C, C++/STL, Python, Java, and ...

Credits

This work is the result of long, thoughtful discussions on the SML-DEV mailing list. Specific contributors include... (to do)

Some thoughts
  1. This is very preliminary thoughts on the subject, feedback is very welcome.
  2. Implementations needed... Clark is happy to write the Python, C, and perhaps even a C++ implementation. Any takers?
  3. Was thinking hard about using # for a comment indicator, or perhaps as a numeric indicator. Benfits? In any case, the BNF should leave all of these special characters open to future versions.
FAQ
  1. Don't the indicator characters need to be escaped in the content? Answer: No.
Specific Productions
Char  ::=  #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF] Any unicode character, excluding surrogate blocks, FFFE, and FFFF. Where unicode is defined by ISO/IEC 10646-2000
Characters  ::=  Char* Zero or more characters.
WhiteChar  ::=  #x20 | #x9 | #xD | #xA A space, tab, new line or carriage return, escaped by \s, \t, \n, and \r respectively.
Whitespace  ::=  WhiteChar+ Any sequence of spaces, tabs, new lines or carriage returns.
Indicator  ::=  '$' | '%' | '@' | '*' The dollar sign indicates a scalar, a percent sign indicates a map, an at sign indicates a list, and a star represents a reference.
Reserved  ::=  WhiteChar | Indicator | [#x21-#x2F] | '/' | [#x3B-#x40] | [#x5B-#x5E] | #x60 | [#x7B-#x7F] Printable, non-alpha, non-numeric ASCII characters excluding the period, colon, underscore, and dash.
Key  ::=  (Char - Reserved)* One or more non-reserved characters.