About Parsing Nodes with Namespaces in XML

If you use defusedxml (or lxml) to parse RSS or other XML documents, you need to be able to read values from namespaced nodes, for example <content:encoded>. You can do that by passing a dictionary with your namespaces to the find() or findall() methods, like this:

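from defusedxml.ElementTree import fromstring

# Map the prefixes used in find()/findall() paths to namespace URIs.
# Only the URIs must match the feed; the prefixes are local to the query.
namespaces = {
    "content": "http://purl.org/rss/1.0/modules/content/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# xml_string is assumed to hold the RSS document (str or bytes)
xml_doc = fromstring(xml_string)  # returns the root <rss> element
for item in xml_doc.findall("channel/item"):
    encoded = item.find("content:encoded", namespaces)
    if encoded is not None:  # not every item has <content:encoded>
        print(encoded.text)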
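Below is a self-contained sketch you can run against a small, made-up feed. It shows the prefix-dictionary lookup next to the equivalent `{uri}localname` form (often called Clark notation), which needs no mapping at all. It uses `findtext()`, which returns `None` for a missing element; note that `namespaces` has to be passed by keyword there, because the second positional argument of `findtext()` is the default value. The sample feed and the helper names `encoded_bodies` / `encoded_bodies_clark` are illustrative only, and the standard-library fallback import exists just so the sketch runs without defusedxml installed.

```python
# Prefer defusedxml for untrusted feeds; the stdlib fallback is only here so
# this sketch runs where defusedxml isn't installed.
try:
    from defusedxml.ElementTree import fromstring
except ImportError:
    from xml.etree.ElementTree import fromstring

CONTENT_NS = "http://purl.org/rss/1.0/modules/content/"

# A tiny, made-up RSS document for illustration (second item lacks the tag).
SAMPLE_RSS = b"""<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <item>
      <title>First post</title>
      <content:encoded><![CDATA[<p>Hello</p>]]></content:encoded>
    </item>
    <item>
      <title>Second post</title>
    </item>
  </channel>
</rss>"""


def encoded_bodies(xml_bytes):
    """Return each item's <content:encoded> text, or None if it is missing."""
    root = fromstring(xml_bytes)
    # The prefix "c" is our own choice; only the URI must match the feed.
    namespaces = {"c": CONTENT_NS}
    return [
        # namespaces is passed by keyword: findtext's 2nd positional arg is default
        item.findtext("c:encoded", namespaces=namespaces)
        for item in root.findall("channel/item")
    ]


def encoded_bodies_clark(xml_bytes):
    """Same lookup using {uri}localname syntax instead of a prefix mapping."""
    root = fromstring(xml_bytes)
    return [
        item.findtext(f"{{{CONTENT_NS}}}encoded")
        for item in root.findall("channel/item")
    ]


print(encoded_bodies(SAMPLE_RSS))        # ['<p>Hello</p>', None]
print(encoded_bodies_clark(SAMPLE_RSS))  # ['<p>Hello</p>', None]
```

Because matching happens on the namespace URI, the feed could declare `xmlns:foo="http://purl.org/rss/1.0/modules/content/"` and both functions would still find the element.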

XML namespaces are usually declared on the root node of the XML document with the xmlns: prefix, for example:

<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
    xmlns:content="http://purl.org/rss/1.0/modules/content/"
    xmlns:wfw="http://wellformedweb.org/CommentAPI/"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:atom="http://www.w3.org/2005/Atom"
    xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
    xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
>
<!-- ... -->
</rss>
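If you would rather not copy these URIs by hand, the declarations can be read from the document itself. The sketch below assumes you want a plain prefix-to-URI dictionary suitable for find()/findall(); it relies on `iterparse()` with the `"start-ns"` event, which yields a `(prefix, uri)` pair for every `xmlns` declaration. The helper name `declared_namespaces` and the trimmed sample document are illustrative, and the standard-library fallback is again only there so the sketch runs without defusedxml.

```python
import io

# Prefer defusedxml for untrusted input; the fallback just keeps the sketch runnable.
try:
    from defusedxml.ElementTree import iterparse
except ImportError:
    from xml.etree.ElementTree import iterparse

# A trimmed version of the feed header above, for illustration.
SAMPLE_HEAD = b"""<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
    xmlns:content="http://purl.org/rss/1.0/modules/content/"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
>
<channel/>
</rss>"""


def declared_namespaces(xml_bytes):
    """Collect prefix -> URI pairs from the xmlns declarations in a document."""
    namespaces = {}
    # With only "start-ns" requested, each event carries a (prefix, uri) tuple.
    for _event, (prefix, uri) in iterparse(io.BytesIO(xml_bytes), events=("start-ns",)):
        # Keep the first URI seen if a prefix is redeclared deeper in the tree.
        namespaces.setdefault(prefix, uri)
    return namespaces


print(declared_namespaces(SAMPLE_HEAD))
```

A default namespace (a plain `xmlns="..."` attribute) would show up under the empty-string key, so you may want to drop that entry before reusing the dictionary in find() paths.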
