How can I parse a YAML file in Python?
Parse YAML File in Python – How to Guide
python, yaml
Related Solutions
If your YAML file looks like this:
# tree format
treeroot:
    branch1:
        name: Node 1
        branch1-1:
            name: Node 1-1
    branch2:
        name: Node 2
        branch2-1:
            name: Node 2-1
And you've installed PyYAML like this:
pip install PyYAML
And the Python code looks like this:
import yaml

with open('tree.yaml') as f:
    # use safe_load instead of load
    dataMap = yaml.safe_load(f)
The variable dataMap now contains a dictionary with the tree data. If you print dataMap using pprint (Python's pretty-printer), you will get something like:
{
    'treeroot': {
        'branch1': {
            'branch1-1': {
                'name': 'Node 1-1'
            },
            'name': 'Node 1'
        },
        'branch2': {
            'branch2-1': {
                'name': 'Node 2-1'
            },
            'name': 'Node 2'
        }
    }
}
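As a minimal sketch of the load-and-print step, the snippet below parses the same tree from an inline string instead of a file, so it runs without tree.yaml on disk. The names TREE_YAML, data_map, and node_name are illustrative only, and the exact layout that pprint produces will differ from the hand-formatted dictionary above.

```python
import pprint

import yaml  # provided by the PyYAML package

# The tree from the example, inlined as a string (assumption: in real use
# you would pass an open file object to safe_load instead).
TREE_YAML = """\
treeroot:
    branch1:
        name: Node 1
        branch1-1:
            name: Node 1-1
    branch2:
        name: Node 2
        branch2-1:
            name: Node 2-1
"""

# safe_load accepts either a string or an open file.
data_map = yaml.safe_load(TREE_YAML)

# Pretty-print the nested dictionary.
pprint.pprint(data_map)

# Nested values are reached with ordinary dictionary indexing.
node_name = data_map["treeroot"]["branch1"]["branch1-1"]["name"]
```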
So, now we have seen how to get data into our Python program. Saving data is just as easy:
with open('newtree.yaml', "w") as f:
    yaml.dump(dataMap, f)
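To check that saving and loading are symmetric, here is a small round-trip sketch. It dumps to a string rather than a file so it stays self-contained, and it uses yaml.safe_dump, the counterpart of safe_load that only emits plain YAML types. The sample dictionary is a cut-down stand-in for the tree above.

```python
import yaml

# A small stand-in for the tree dictionary (illustrative data).
data = {"treeroot": {"branch1": {"name": "Node 1"}}}

# With no stream argument, safe_dump returns the YAML text as a string.
text = yaml.safe_dump(data, default_flow_style=False)

# Loading the dumped text should give back an equal dictionary.
restored = yaml.safe_load(text)
print(text)
print(restored == data)
```

Passing an open file as the second argument, as in the snippet above, writes the same text to disk instead of returning it.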
You have a dictionary, and now you have to convert it to a Python object:
class Struct:
    def __init__(self, **entries):
        self.__dict__.update(entries)
Then you can use:
>>> args = dataMap  # your YAML dictionary
>>> s = Struct(**args)
>>> s
<__main__.Struct instance at 0x01D6A738>
>>> s...
and follow "Convert Python dict to object".
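The Struct class above only converts the top level, so nested dictionaries stay dictionaries. One possible way to convert the whole tree is sketched below with the standard library's types.SimpleNamespace. This is a swapped-in technique, not the approach from the linked question, and to_namespace is a hypothetical helper name. It assumes every key is a string.

```python
from types import SimpleNamespace


def to_namespace(value):
    """Recursively turn dicts into SimpleNamespace objects (hypothetical helper)."""
    if isinstance(value, dict):
        # Assumes all keys are strings; non-string keys would raise TypeError.
        return SimpleNamespace(**{k: to_namespace(v) for k, v in value.items()})
    if isinstance(value, list):
        return [to_namespace(item) for item in value]
    return value


tree = {
    "treeroot": {
        "branch1": {"name": "Node 1", "branch1-1": {"name": "Node 1-1"}},
    }
}
ns = to_namespace(tree)

# Keys that are valid identifiers become ordinary attributes.
branch_name = ns.treeroot.branch1.name

# Keys such as "branch1-1" are not valid identifiers, so getattr is needed.
nested_name = getattr(ns.treeroot.branch1, "branch1-1").name
```

Because many YAML keys contain hyphens or spaces, plain dictionary access is often the less surprising choice.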
For more information you can look at pyyaml.org and this.
Since PyYAML's yaml.load() function parses YAML documents to native Python data structures, you can just access items by key or index. Using the example from the question you linked:
import yaml

with open('tree.yaml', 'r') as f:
    # safe_load needs no Loader argument; recent PyYAML versions require one for yaml.load
    doc = yaml.safe_load(f)
To access branch1 text you would use:
txt = doc["treeroot"]["branch1"]
print(txt)
# Output: branch1 text
because, in your YAML document, the value of the branch1 key is under the treeroot key.
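Chained indexing raises KeyError as soon as one level is missing. If that matters, a small lookup helper can return a default instead. The sketch below is one way to do it; get_path is a hypothetical name, not part of PyYAML, and the sample document stands in for the parsed file.

```python
def get_path(doc, *keys, default=None):
    """Follow keys through nested dicts, returning default if any is missing."""
    current = doc
    for key in keys:
        if isinstance(current, dict) and key in current:
            current = current[key]
        else:
            return default
    return current


# Stand-in for the result of parsing the YAML document (illustrative data).
doc = {"treeroot": {"branch1": "branch1 text"}}

found = get_path(doc, "treeroot", "branch1")
missing = get_path(doc, "treeroot", "branch9", default="n/a")
print(found)
print(missing)
```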
Best Answer
The easiest and purest method without relying on C headers is PyYAML (documentation), which can be installed via pip install pyyaml. Import yaml, pass an open file to yaml.safe_load(), and that's it.
A plain yaml.load() function also exists, but yaml.safe_load() should always be preferred to avoid introducing the possibility of arbitrary code execution. So unless you explicitly need arbitrary object serialization/deserialization, use safe_load.
Note that the PyYAML project supports versions up through the YAML 1.1 specification. If YAML 1.2 specification support is needed, see ruamel.yaml as noted in this answer.
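A minimal sketch of safe_load with error handling follows. The wrapper parse_yaml is a hypothetical helper; yaml.YAMLError is PyYAML's base exception class, so it covers both malformed input and tags that safe_load refuses to construct. The inputs are inline strings chosen for illustration.

```python
import yaml


def parse_yaml(text):
    """Return the parsed data, or None if safe_load rejects the input."""
    try:
        return yaml.safe_load(text)
    except yaml.YAMLError as exc:
        print(f"Could not parse YAML: {exc}")
        return None


# Well-formed input parses to a plain dictionary.
good = parse_yaml("name: Node 1")

# Syntactically broken input (unclosed flow sequence) raises a YAMLError subclass.
malformed = parse_yaml("name: [unclosed")

# A Python-specific tag that could run code is refused by safe_load,
# which also surfaces as a YAMLError subclass.
unsafe = parse_yaml("!!python/object/apply:os.system ['echo hi']")
```

The last case is the reason to prefer safe_load: the tag is rejected rather than executed.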
You could also use oyaml, a drop-in replacement for PyYAML that keeps your YAML file ordered the same way you had it. View Snyk of oyaml here
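If key order on output is the only concern, plain PyYAML may be enough on a recent setup. The sketch below does not use oyaml. It assumes PyYAML 5.1 or later, where the dump functions accept sort_keys, and a Python version whose dictionaries preserve insertion order (3.7+). The config dictionary is illustrative data.

```python
import yaml

# Keys deliberately out of alphabetical order (illustrative data).
config = {"zebra": 1, "apple": 2, "mango": 3}

# By default the dump functions sort mapping keys alphabetically.
sorted_text = yaml.safe_dump(config)

# sort_keys=False keeps the dictionary's insertion order
# (assumption: PyYAML >= 5.1).
ordered_text = yaml.safe_dump(config, sort_keys=False)

print(sorted_text)
print(ordered_text)
```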