Welcome to Day 79 of the #100DaysOfDevOps Challenge! Today we will look at advanced YAML syntax.
Advanced YAML Syntax
Documents
Each YAML snippet we have seen so far is called a document. A single YAML file can contain more than one document. Each document is interpreted as if it were a separate YAML file, so different documents can reuse the same keys, whereas duplicate keys are not allowed within a single document.
The beginning of a document is denoted by three hyphens (---).
A YAML file with multiple documents looks like this, where each new document starts with ---.
---
# document 1
codename: YAML
name: YAML ain't markup language
release: 2001
---
# document 2
uses:
  - configuration language
  - data persistence
  - internet messaging
  - cross-language data sharing
---
# document 3
company: spacelift
domain:
  - devops
  - devsecops
tutorial:
  - name: yaml
  - type: awesome
  - rank: 1
  - born: 2001
author: omkarbirade
published: true
...
Finally, three dots (...) end a document without starting a new one.
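To illustrate the point about duplicate keys, here is a minimal sketch of a file with two documents that reuse the same keys. The key names and values (env, replicas) are made up for illustration; a parser that supports multi-document streams would return two separate mappings.

```yaml
---
# document 1: illustrative staging settings
env: staging
replicas: 1
---
# document 2: the same keys again, with no conflict,
# because each document is parsed independently
env: prod
replicas: 3
...
```

Putting both env keys inside a single document would be a duplicate-key error instead.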
Before we learn more about YAML, this is a good time to practice writing your own YAML file. You can validate it with an online YAML parser.
Now that we have seen an online YAML parser in action, it’s time we learn about schemas and tags.
Schemas and Tags
Let’s take a moment to consider how YAML will interpret the following document. Is the first item in the sequence a string or a boolean?
literals:
  - true
  - random
You are correct if you answer that the first item on the list is a boolean, and you are also correct if you answer that it is a string. The way it is resolved is determined by the YAML schema that the parser has implemented. But what exactly are schemas?
Schemas can be thought of as the way a parser resolves or understands nodes (values) present in a YAML file. There are primarily 3 default schemas:
FailSafe Schema: It only understands maps, sequences and strings and is guaranteed to work for any YAML file.
JSON schema: It understands all types supported within JSON including boolean, null, int and float as well as the ones in the FailSafe schema.
Core schema: It extends the JSON schema to be more human-readable, supporting the same types but accepting several spellings of each value.
For example, null, Null and NULL all resolve to the same null value, and true, True and TRUE all resolve to the same boolean value.
Note: It is also possible to create your own custom schemas based on the default schemas above.
So coming back to the original question, if the parser supports only the basic schema (FailSafe Schema), the first item will be evaluated as a string. Otherwise, it will be evaluated as a boolean.
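As a rough sketch of the resolution rules described above, the comments below note how each plain (unquoted) scalar would resolve under the failsafe and core schemas of YAML 1.2. The list is illustrative only, and an individual parser may implement a different or older rule set.

```yaml
literals:
  - true      # failsafe: string; JSON and core: boolean
  - True      # failsafe: string; core: boolean
  - NULL      # failsafe: string; core: null
  - random    # failsafe: string; core: string
```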
This leads to the next question: What if we explicitly want a value to be parsed in a specific way?
Let’s say from the same example that we want the first true value to be parsed as a string instead of a boolean, even when the parser uses the JSON or the core schema.
This is where tags come into the picture. Tags can be thought of as types in YAML.
Even though we didn’t explicitly mention the tags/types in any of the YAML snippets we have seen so far, they are inferred automatically by the YAML parser. For instance, maps have the tag/type tag:yaml.org,2002:map, sequences have tag:yaml.org,2002:seq and strings have tag:yaml.org,2002:str.
The snippet below works perfectly well with the tags specified explicitly. It can be checked with an online YAML parser.
---
# A sample yaml file
company: !!str spacelift
domain:
  - !!str devops
  - !!str devsecops
tutorial:
  - name: !!str yaml
  - type: !!str awesome
  - rank: !!int 1
  - born: !!int 2001
author: !!str omkarbirade
published: !!bool true
We can use these tags to explicitly specify a type. For our example, all we have to do is specify the type as a string, and the YAML parser will parse it as a string.
scalars:
  - !!str true
  - random
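The same idea applies to other scalar types. The sketch below, with made-up key names, contrasts implicitly resolved values with explicitly tagged ones; it assumes a parser that uses the core schema. Quoting a value is a common alternative to !!str when all you need is a string.

```yaml
release_implicit: 2001         # resolved as an integer
release_as_text: !!str 2001    # explicit tag forces a string
retries_as_float: !!float 3    # explicit tag forces a float
flag_as_text: !!str true       # a string, not a boolean
flag_quoted: "true"            # quoted scalars are strings as well
```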
Anchors and Aliases
With a lot of configuration, configuration files can become quite large.
In YAML files, anchors (&) and aliases (*) are used to avoid duplication. When writing large configurations in YAML, it is common for a specific block of configuration to be repeated. For example, the config block is repeated for all three services under vars in the following YAML snippet.
---
vars:
  service1:
    config:
      env: prod
      retries: 3
      version: 4.8
  service2:
    config:
      env: prod
      retries: 3
      version: 4.8
  service3:
    config:
      env: prod
      retries: 3
      version: 4.8
...
As more and more things are repeated for large configuration files, this becomes tedious.
Anchors and aliases allow us to rewrite the same snippet without having to repeat any configuration.
Anchors (&) are used to define a chunk of configuration, and aliases are used to refer to that chunk at a different part of the configuration.
---
vars:
  service1:
    config: &service_config
      env: prod
      retries: 3
      version: 4.8
  service2:
    config: *service_config
  service3:
    config: *service_config
...
Anchors and aliases here helped us cut down the repeated configuration.
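Anchors are not limited to mappings. As a small sketch with hypothetical names (region, ports), an anchor can also mark a single scalar or a sequence, and the alias reuses it wherever it appears later in the same document.

```yaml
---
defaults:
  region: &region eu-west-1    # anchor on a scalar
  ports: &ports                # anchor on a sequence
    - 80
    - 443
service1:
  region: *region
  ports: *ports
service2:
  region: *region
  ports: *ports
...
```

An alias has to come after the anchor it refers to, and anchors do not carry over from one document to the next.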
In practice, though, configurations are rarely completely identical; they vary here and there. For instance, what if the services above all run different versions? Does this mean we have to rewrite and repeat the whole config again?
This is where overrides, written with the merge key (<<:), come to the rescue. We can still use aliases and make only the changes that we need. Note that the merge key comes from the YAML 1.1 type repository rather than the YAML 1.2 core specification, so support depends on the parser.
---
vars:
  service1:
    config: &service_config
      env: prod
      retries: 3
      version: 4.8
  service2:
    config:
      <<: *service_config
      version: 5
  service3:
    config:
      <<: *service_config
      version: 4.2
...
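To make the effect of the merge concrete, here is roughly what a parser that supports the merge key would produce for service2 and service3 in the snippet above. Keys written directly in the mapping take precedence over merged ones, so only version changes.

```yaml
service2:
  config:
    env: prod       # merged from the anchored mapping
    retries: 3      # merged from the anchored mapping
    version: 5      # local key wins over the merged 4.8
service3:
  config:
    env: prod
    retries: 3
    version: 4.2    # local key wins over the merged 4.8
```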
YAML treats characters such as : { } [ ] , & * # ? | - < > = ! % @ \ as special. But what if these special characters are actually part of the data/value? How do we escape them?
Special characters can be escaped in several ways:
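Entity Escapes (HTML-style entities; a YAML parser passes these through as literal text, so they only help when the consuming application decodes them)
space: "&#x20;"
colon: "&#58;"
ampersand: "&amp;"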
Unicode Escapes
space: "\u0020"
single quote: "\u0027"
double quote: "\u0022"
Quoted Escapes
Double quote in a single quote: 'YAML is the "best" configuration language'
Single quote in a double quote: "Yes, the 'best'"
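The quoting rules above can be sketched in a short document. The keys and values are invented for illustration; the comments note why each value is quoted.

```yaml
---
message: "key: value"              # ": " would otherwise start a nested mapping
note: "text # not a comment"       # " #" would otherwise start a comment
handle: "@devops"                  # @ is reserved and cannot start a plain scalar
pattern: "*.yaml"                  # a leading * would be read as an alias
single: 'It''s escaped by doubling the quote'
double: "She said \"hi\"\tand left"   # backslash escapes work only in double quotes
path: 'C:\temp\new'                # backslashes stay literal in single quotes
unicode: "caf\u00e9"               # \u escapes also require double quotes
...
```

As a rule of thumb, single quotes keep everything literal except the quote itself, while double quotes enable backslash and Unicode escapes.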