About Parsing Nodes with Namespaces in XML

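If you use defusedxml (or lxml) to parse RSS feeds or other XML documents, you will often need to read values from namespaced nodes, for example <content:encoded>. You can do that by passing a dictionary that maps prefixes to namespace URIs to the find() or findall() methods. The prefixes in the dictionary are your own choice; only the URIs have to match the ones declared in the document. It looks like this: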

from defusedxml.ElementTree import fromstring

namespaces = {
    "content": "http://purl.org/rss/1.0/modules/content/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

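# xml_string holds the raw XML of the feed
xml_doc = fromstring(xml_string)
for item in xml_doc.findall("channel/item"):
    # findtext() returns the default instead of failing when the node is missing
    print(item.findtext("content:encoded", default="", namespaces=namespaces))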
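To try the lookup end to end, here is a minimal sketch. The sample feed, the read_items() helper, and the fallback to the standard library parser are assumptions made up for illustration; the fallback is there only so the snippet runs without defusedxml installed, and defusedxml remains the better choice for untrusted input.

```python
try:
    # defusedxml mirrors the standard ElementTree API with safer defaults.
    from defusedxml.ElementTree import fromstring
except ImportError:
    # Fallback for this sketch only; prefer defusedxml for untrusted input.
    from xml.etree.ElementTree import fromstring

# Hypothetical sample feed, invented for this example.
SAMPLE_FEED = """<rss version="2.0"
    xmlns:content="http://purl.org/rss/1.0/modules/content/"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <item>
      <title>First post</title>
      <dc:creator>Alice</dc:creator>
      <content:encoded><![CDATA[<p>Hello</p>]]></content:encoded>
    </item>
    <item>
      <title>Second post</title>
    </item>
  </channel>
</rss>
"""

# Prefix-to-URI mapping used for lookups.
NAMESPACES = {
    "content": "http://purl.org/rss/1.0/modules/content/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def read_items(xml_string):
    """Return a list of dicts with the title, creator, and content of each item."""
    root = fromstring(xml_string)
    items = []
    for item in root.findall("channel/item"):
        items.append({
            # Plain RSS nodes need no namespace mapping.
            "title": item.findtext("title", default=""),
            # Namespaced nodes are addressed as "prefix:tag".
            "creator": item.findtext("dc:creator", default="", namespaces=NAMESPACES),
            "content": item.findtext("content:encoded", default="", namespaces=NAMESPACES),
        })
    return items


items = read_items(SAMPLE_FEED)
for entry in items:
    print(entry)
```

The second item has no <dc:creator> or <content:encoded> node, so findtext() falls back to the empty default instead of raising an AttributeError. If you prefer not to keep a dictionary, the same node can also be addressed with the namespace URI in curly braces, for example item.find("{http://purl.org/dc/elements/1.1/}creator").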

XML namespaces are usually declared on the root node of the XML document as attributes with the xmlns prefix, for example:

<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
    xmlns:content="http://purl.org/rss/1.0/modules/content/"
    xmlns:wfw="http://wellformedweb.org/CommentAPI/"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:atom="http://www.w3.org/2005/Atom"
    xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
    xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
>
<!-- ... -->
</rss>
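If you would rather not copy the declarations by hand, one possible approach is to collect them while parsing. The sketch below uses iterparse() with the "start-ns" event, which reports each declaration as a (prefix, uri) pair. The collect_namespaces() helper and the sample document are hypothetical names made up for illustration, and the fallback import exists only so the snippet runs without defusedxml installed.

```python
import io

try:
    # defusedxml provides a hardened iterparse() with the same events.
    from defusedxml.ElementTree import iterparse
except ImportError:
    # Fallback for this sketch only; prefer defusedxml for untrusted input.
    from xml.etree.ElementTree import iterparse

# Hypothetical sample document, invented for this example.
SAMPLE_DOC = """<rss version="2.0"
    xmlns:content="http://purl.org/rss/1.0/modules/content/"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel/>
</rss>
"""


def collect_namespaces(xml_string):
    """Return a {prefix: uri} dict of the xmlns declarations in a document."""
    namespaces = {}
    source = io.BytesIO(xml_string.encode("utf-8"))
    # Each "start-ns" event carries a (prefix, uri) tuple.
    for _event, (prefix, uri) in iterparse(source, events=("start-ns",)):
        # Keep the first URI seen if a prefix is declared more than once.
        namespaces.setdefault(prefix, uri)
    return namespaces


found = collect_namespaces(SAMPLE_DOC)
print(found)
```

The resulting dictionary can be passed to find(), findall(), or findtext() as the namespaces argument. Relying on the document's own prefixes is a trade-off: feeds are free to pick different prefixes for the same URI, so a hand-written mapping is more predictable when you parse feeds from many sources.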
