Metadata-Version: 2.1
Name: defusedxml
Version: 0.6.0
Summary: XML bomb protection for Python stdlib modules
Home-page: https://github.com/tiran/defusedxml
Author: Christian Heimes
Author-email: christian@python.org
Maintainer: Christian Heimes
Maintainer-email: christian@python.org
License: PSFL
Download-URL: https://pypi.python.org/pypi/defusedxml
Keywords: xml bomb DoS
Platform: all
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: Python Software Foundation License
Classifier: Natural Language :: English
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 2
Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.5
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Topic :: Text Processing :: Markup :: XML
Requires-Python: >=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, !=3.4.*

===================================================
defusedxml -- defusing XML bombs and other exploits
===================================================

.. image:: https://img.shields.io/pypi/v/defusedxml.svg
    :target: https://pypi.org/project/defusedxml/
    :alt: Latest Version

.. image:: https://img.shields.io/pypi/pyversions/defusedxml.svg
    :target: https://pypi.org/project/defusedxml/
    :alt: Supported Python versions

.. image:: https://travis-ci.org/tiran/defusedxml.svg?branch=master
    :target: https://travis-ci.org/tiran/defusedxml
    :alt: Travis CI

.. image:: https://codecov.io/github/tiran/defusedxml/coverage.svg?branch=master
    :target: https://codecov.io/github/tiran/defusedxml?branch=master
    :alt: codecov

.. image:: https://img.shields.io/pypi/dm/defusedxml.svg
    :target: https://pypistats.org/packages/defusedxml
    :alt: PyPI downloads

.. image:: https://img.shields.io/badge/code%20style-black-000000.svg
    :target: https://github.com/ambv/black
    :alt: Code style: black

..

    "It's just XML, what could probably go wrong?"

Christian Heimes <christian@python.org>

Synopsis
========

The results of an attack on a vulnerable XML library can be fairly dramatic.
With just a few hundred **Bytes** of XML data an attacker can occupy several
**Gigabytes** of memory within **seconds**. An attacker can also keep
CPUs busy for a long time with a small to medium size request. Under some
circumstances it is even possible to access local files on your
server, to circumvent a firewall, or to abuse services to rebound attacks to
third parties.

The attacks use and abuse less common features of XML and its parsers. The
majority of developers are unacquainted with features such as processing
instructions and entity expansions that XML inherited from SGML. At best
they know about ``<!DOCTYPE>`` from experience with HTML but they are not
aware that a document type definition (DTD) can generate an HTTP request
or load a file from the file system.

None of the issues is new. They have been known for a long time. Billion
laughs was first reported in 2003. Nevertheless some XML libraries and
applications are still vulnerable and even heavy users of XML are
surprised by these features. It's hard to say whom to blame for the
situation. It's too short-sighted to shift all blame onto XML parsers and
XML libraries for using insecure default settings. After all they
properly implement XML specifications. Application developers must not rely
on a library being configured to handle potentially harmful data securely
by default.


.. contents:: Table of Contents
   :depth: 2


Attack vectors
==============

billion laughs / exponential entity expansion
---------------------------------------------

The `Billion Laughs`_ attack -- also known as exponential entity expansion --
uses multiple levels of nested entities. The original example uses 9 levels
of 10 expansions in each level to expand the string ``lol`` to a string of
3 * 10 :sup:`9` bytes, hence the name "billion laughs". The resulting string
occupies 3 GB (2.79 GiB) of memory; intermediate strings require additional
memory. Because most parsers don't cache the intermediate result of each
expansion, the same expansion is repeated over and over again, which
increases the CPU load even more.

An XML document of just a few hundred bytes can disrupt all services on a
machine within seconds.

Example XML::

    <!DOCTYPE xmlbomb [
    <!ENTITY a "1234567890" >
    <!ENTITY b "&a;&a;&a;&a;&a;&a;&a;&a;">
    <!ENTITY c "&b;&b;&b;&b;&b;&b;&b;&b;">
    <!ENTITY d "&c;&c;&c;&c;&c;&c;&c;&c;">
    ]>
    <bomb>&d;</bomb>
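
The growth in the example document can be checked with a few lines of
Python. The following is a minimal sketch and not part of defusedxml;
``expanded_length`` and ``ENTITIES`` are hypothetical names chosen for this
illustration. The helper computes the size of the fully expanded text from
the entity table without ever building the string.

```python
import re

# Entity table copied from the example document.
ENTITIES = {
    "a": "1234567890",
    "b": "&a;" * 8,
    "c": "&b;" * 8,
    "d": "&c;" * 8,
}

_REFERENCE = re.compile(r"&(\w+);")


def expanded_length(name, entities, _cache=None):
    """Return the length of entity ``name`` after full expansion.

    Assumes the entity table contains no cycles, which well-formed XML
    does not allow anyway.
    """
    cache = {} if _cache is None else _cache
    if name not in cache:
        value = entities[name]
        # Characters that are not part of an entity reference.
        literal = len(_REFERENCE.sub("", value))
        nested = sum(
            expanded_length(ref, entities, cache)
            for ref in _REFERENCE.findall(value)
        )
        cache[name] = literal + nested
    return cache[name]


print(expanded_length("d", ENTITIES))  # 10 * 8 * 8 * 8 = 5120
```

Three levels of eight references turn a 10 character string into 5120
characters. Each additional level multiplies the result again, which is how
the original attack reaches 3 * 10 :sup:`9` bytes.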


quadratic blowup entity expansion
---------------------------------

A quadratic blowup attack is similar to a `Billion Laughs`_ attack; it abuses
entity expansion, too. Instead of nested entities it repeats one large entity
with a couple of thousand chars over and over again. The attack isn't as
efficient as the exponential case but it avoids triggering countermeasures of
parsers against heavily nested entities. Some parsers limit the depth and
breadth of a single entity but not the total amount of expanded text
throughout an entire XML document.

A medium-sized XML document with a couple of hundred kilobytes can require a
couple of hundred MB to several GB of memory. When the attack is combined
with some level of nested expansion an attacker is able to achieve a higher
ratio of success.

::

    <!DOCTYPE bomb [
    <!ENTITY a "xxxxxxx... a couple of ten thousand chars">
    ]>
    <bomb>&a;&a;&a;... repeat</bomb>
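
The ratio between document size and expanded size can be estimated with a
short calculation. This is a sketch under assumed numbers: the helper name
``quadratic_blowup_sizes`` and the figure of 50000 characters and 50000
references are picked for illustration only.

```python
def quadratic_blowup_sizes(entity_chars, references):
    """Return (document_size, expanded_size) for a one-entity document."""
    entity = "x" * entity_chars
    document = (
        '<!DOCTYPE bomb [<!ENTITY a "%s">]>' % entity
        + "<bomb>" + "&a;" * references + "</bomb>"
    )
    # Every reference is replaced by the full entity text.
    expanded = entity_chars * references
    return len(document), expanded


doc_size, expanded_size = quadratic_blowup_sizes(50000, 50000)
print(doc_size, expanded_size)
```

With these numbers a document of roughly 200 kB expands to 2.5 * 10 :sup:`9`
characters. The entity is only one level deep, so a limit on nesting depth
never triggers.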


external entity expansion (remote)
----------------------------------

Entity declarations can contain more than just text for replacement. They can
also point to external resources by public identifiers or system identifiers.
System identifiers are standard URIs. When the URI is a URL (e.g. a
``http://`` locator) some parsers download the resource from the remote
location and embed it into the XML document verbatim.

Simple example of a parsed external entity::

    <!DOCTYPE external [
    <!ENTITY ee SYSTEM "http://www.python.org/some.xml">
    ]>
    <root>&ee;</root>

The case of parsed external entities works only for valid XML content. The
XML standard also supports unparsed external entities with a
``NData declaration``.

External entity expansion opens the door to plenty of exploits. An attacker
can abuse a vulnerable XML library and application to rebound and forward
network requests with the IP address of the server. What kind of exploit is
possible depends heavily on the parser and the application. For example:

* An attacker can circumvent firewalls and gain access to restricted
  resources as all the requests are made from an internal and trustworthy
  IP address, not from the outside.
* An attacker can abuse a service to attack, spy on or DoS your servers but
  also third party services. The attack is disguised with the IP address of
  the server and the attacker is able to utilize the high bandwidth of a big
  machine.
* An attacker can exhaust additional resources on the machine, e.g. with
  requests to a service that doesn't respond or responds with very large
  files.
* An attacker may learn when, how often and from which IP address
  an XML document is accessed.
* An attacker could send mail from inside your network if the URL handler
  supports ``smtp://`` URIs.


external entity expansion (local file)
--------------------------------------

External entities with references to local files are a sub-case of external
entity expansion. They are listed as a separate attack because they deserve
extra attention. Some XML libraries such as lxml disable network access by
default but still allow entity expansion with local file access by default.
Local files are either referenced with a ``file://`` URL or by a file path
(either relative or absolute).

An attacker may be able to access and download all files that can be read by
the application process. This may include critical configuration files, too.

::

    <!DOCTYPE external [
    <!ENTITY ee SYSTEM "file:///PATH/TO/simple.xml">
    ]>
    <root>&ee;</root>
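
To see which external resources a document declares without fetching any of
them, the declarations can be listed with the stdlib ``xml.parsers.expat``
module. This is a minimal sketch; ``external_entities`` is a hypothetical
helper name. expat reports the declaration to the handler and does not open
the resource itself, since no external entity reference handler is
installed here.

```python
import xml.parsers.expat


def external_entities(data):
    """Return (name, system_id, public_id) for each declared external entity."""
    found = []

    def on_entity_decl(name, is_parameter_entity, value, base,
                       system_id, public_id, notation_name):
        # Internal entities carry their replacement text in ``value``;
        # external ones have ``value`` set to None and a system identifier.
        if value is None and system_id is not None:
            found.append((name, system_id, public_id))

    parser = xml.parsers.expat.ParserCreate()
    parser.EntityDeclHandler = on_entity_decl
    parser.Parse(data, True)
    return found


DOCUMENT = b"""<!DOCTYPE external [
<!ENTITY ee SYSTEM "file:///PATH/TO/simple.xml">
]>
<root>&ee;</root>"""

print(external_entities(DOCUMENT))
```

An application could log or reject such documents before handing them to a
parser that resolves external entities.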


DTD retrieval
-------------

This case is similar to external entity expansion, too. Some XML libraries
like Python's xml.dom.pulldom retrieve document type definitions from remote
or local locations. Several attack scenarios from the external entity case
apply to this issue as well.

::

    <?xml version="1.0" encoding="utf-8"?>
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
      "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
    <html>
        <head/>
        <body>text</body>
    </html>


Python XML Libraries
====================

.. csv-table:: vulnerabilities and features
   :header: "kind", "sax", "etree", "minidom", "pulldom", "xmlrpc", "lxml", "genshi"
   :widths: 24, 7, 8, 8, 7, 8, 8, 8
   :stub-columns: 0

   "billion laughs", "**True**", "**True**", "**True**", "**True**", "**True**", "False (1)", "False (5)"
   "quadratic blowup", "**True**", "**True**", "**True**", "**True**", "**True**", "**True**", "False (5)"
   "external entity expansion (remote)", "**True**", "False (3)", "False (4)", "**True**", "False", "False (1)", "False (5)"
   "external entity expansion (local file)", "**True**", "False (3)", "False (4)", "**True**", "False", "**True**", "False (5)"
   "DTD retrieval", "**True**", "False", "False", "**True**", "False", "False (1)", "False"
   "gzip bomb", "False", "False", "False", "False", "**True**", "**partly** (2)", "False"
   "xpath support (7)", "False", "False", "False", "False", "False", "**True**", "False"
   "xsl(t) support (7)", "False", "False", "False", "False", "False", "**True**", "False"
   "xinclude support (7)", "False", "**True** (6)", "False", "False", "False", "**True** (6)", "**True**"
   "C library", "expat", "expat", "expat", "expat", "expat", "libxml2", "expat"

1. lxml is protected against billion laughs attacks and doesn't do network
   lookups by default.
2. libxml2 and lxml are not directly vulnerable to gzip decompression bombs
   but they don't protect you against them either.
3. xml.etree doesn't expand entities and raises a ParseError when an entity
   occurs.
4. minidom doesn't expand entities and simply returns the unexpanded entity
   verbatim.
5. genshi.input of genshi 0.6 doesn't support entity expansion and raises a
   ParseError when an entity occurs.
6. Library has (limited) XInclude support but requires an additional step to
   process inclusion.
7. These are features but they may introduce exploitable holes, see
   `Other things to consider`_


Settings in standard library
----------------------------


xml.sax.handler Features
........................

feature_external_ges (http://xml.org/sax/features/external-general-entities)
  setting the feature to false disables external entity expansion

feature_external_pes (http://xml.org/sax/features/external-parameter-entities)
  the option is ignored and doesn't modify any functionality
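
The SAX feature can be set on a stdlib parser before any data is fed to it.
The sketch below assumes the default expat-based reader returned by
``xml.sax.make_parser()``; ``TextCollector`` is a hypothetical handler
written for this example. Setting the feature explicitly avoids depending on
whatever default the running Python version uses.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_external_ges


class TextCollector(ContentHandler):
    """Collect all character data reported by the parser."""

    def __init__(self):
        ContentHandler.__init__(self)
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)


DOCUMENT = b"""<!DOCTYPE external [
<!ENTITY ee SYSTEM "file:///PATH/TO/simple.xml">
]>
<root>&ee;</root>"""

parser = xml.sax.make_parser()
# Switch off expansion of external general entities explicitly.
parser.setFeature(feature_external_ges, False)
handler = TextCollector()
parser.setContentHandler(handler)
parser.parse(io.BytesIO(DOCUMENT))

text = "".join(handler.chunks)
print(repr(text))
```

With the feature off the reference to ``ee`` is skipped, so the file named
in the system identifier is never opened and no text is reported for it.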

DOM xml.dom.xmlbuilder.Options
..............................

external_parameter_entities
  ignored

external_general_entities
  ignored

external_dtd_subset
  ignored

entities
  unsure


defusedxml
==========

The `defusedxml package`_ (`defusedxml on PyPI`_)
contains several Python-only workarounds and fixes
for denial of service and other vulnerabilities in Python's XML libraries.
In order to benefit from the protection you just have to import and use the
listed functions / classes from the right defusedxml module instead of the
original module. Only `defusedxml.xmlrpc`_ is implemented as a monkey patch.

Instead of::

   >>> from xml.etree.ElementTree import parse
   >>> et = parse(xmlfile)

alter code to::

   >>> from defusedxml.ElementTree import parse
   >>> et = parse(xmlfile)

Additionally the package has an **untested** function to monkey patch
all stdlib modules with ``defusedxml.defuse_stdlib()``.

All functions and parser classes accept three additional keyword arguments.
They return either the same objects as the original functions or compatible
subclasses.

forbid_dtd (default: False)
  disallow XML with a ``<!DOCTYPE>`` declaration and raise a
  *DTDForbidden* exception when a DTD is found.

forbid_entities (default: True)
  disallow XML with ``<!ENTITY>`` declarations inside the DTD and raise an
  *EntitiesForbidden* exception when an entity is declared.

forbid_external (default: True)
  disallow any access to remote or local resources in external entities
  or DTD and raise an *ExternalReferenceForbidden* exception when a DTD
  or entity references an external resource.
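
The effect of the three keyword arguments can be approximated with plain
expat handlers from the standard library. The following is an illustrative
sketch only, not the package's code: ``check_xml`` and the ``Forbidden``
exception are hypothetical stand-ins, and the real package raises the
separate exception classes named above.

```python
import xml.parsers.expat


class Forbidden(ValueError):
    """Hypothetical stand-in for the defusedxml exception classes."""


def check_xml(data, forbid_dtd=False, forbid_entities=True,
              forbid_external=True):
    """Parse ``data`` with expat and raise Forbidden on rejected constructs."""
    parser = xml.parsers.expat.ParserCreate()

    def on_doctype(name, system_id, public_id, has_internal_subset):
        raise Forbidden("DTD: %s" % name)

    def on_entity_decl(name, is_parameter_entity, value, base,
                       system_id, public_id, notation_name):
        raise Forbidden("entity declaration: %s" % name)

    def on_unparsed_entity_decl(name, base, system_id, public_id,
                                notation_name):
        raise Forbidden("unparsed entity declaration: %s" % name)

    def on_external_ref(context, base, system_id, public_id):
        raise Forbidden("external reference: %s" % system_id)

    # Install only the handlers that the caller asked for.
    if forbid_dtd:
        parser.StartDoctypeDeclHandler = on_doctype
    if forbid_entities:
        parser.EntityDeclHandler = on_entity_decl
        parser.UnparsedEntityDeclHandler = on_unparsed_entity_decl
    if forbid_external:
        parser.ExternalEntityRefHandler = on_external_ref
    parser.Parse(data, True)


BOMB = b"""<!DOCTYPE xmlbomb [
<!ENTITY a "1234567890" >
]>
<bomb>&a;</bomb>"""

try:
    check_xml(BOMB)
except Forbidden as exc:
    print("rejected:", exc)
```

Raising from the declaration handler stops the parser before any entity is
expanded, so the size of the expansion never matters.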


defusedxml (package)
--------------------

DefusedXmlException, DTDForbidden, EntitiesForbidden,
ExternalReferenceForbidden, NotSupportedError

defuse_stdlib() (*experimental*)


defusedxml.cElementTree
-----------------------

parse(), iterparse(), fromstring(), XMLParser


defusedxml.ElementTree
-----------------------

parse(), iterparse(), fromstring(), XMLParser


defusedxml.expatreader
----------------------

create_parser(), DefusedExpatParser


defusedxml.sax
--------------

parse(), parseString(), make_parser()


defusedxml.expatbuilder
-----------------------

parse(), parseString(), DefusedExpatBuilder, DefusedExpatBuilderNS


defusedxml.minidom
------------------

parse(), parseString()


defusedxml.pulldom
------------------

parse(), parseString()


defusedxml.xmlrpc
-----------------

The fix is implemented as a monkey patch for the stdlib's xmlrpc package (3.x)
or xmlrpclib module (2.x). The function `monkey_patch()` enables the fixes,
`unmonkey_patch()` removes the patch and puts the code in its former state.

The monkey patch protects against XML related attacks as well as
decompression bombs and excessively large requests or responses. The default
setting is 30 MB for requests, responses and gzip decompression. You can
modify the default by changing the module variable `MAX_DATA`. A value of
`-1` disables the limit.
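
A size limit on gzip decompression can be sketched with the stdlib ``zlib``
module. This is not the code of the monkey patch; ``bounded_gunzip`` is a
hypothetical helper, and reading 30 MB as 30 * 1024 * 1024 bytes is an
assumption made for the example.

```python
import gzip
import zlib

MAX_DATA = 30 * 1024 * 1024  # assumed byte value for the 30 MB default


def bounded_gunzip(data, max_data=MAX_DATA):
    """Decompress gzip ``data`` but refuse more than ``max_data`` bytes."""
    decompressor = zlib.decompressobj(16 + zlib.MAX_WBITS)  # gzip container
    if max_data < 0:
        # A negative limit disables the check, like MAX_DATA = -1.
        return decompressor.decompress(data)
    # Ask for one byte more than allowed so that an overflow is detectable.
    output = decompressor.decompress(data, max_data + 1)
    if len(output) > max_data:
        raise ValueError("decompressed data exceeds %d bytes" % max_data)
    return output


bomb = gzip.compress(b"\0" * (10 * 1024 * 1024))
try:
    bounded_gunzip(bomb, max_data=1024 * 1024)
except ValueError as exc:
    print("rejected:", exc)
```

Because the decompressor is told how much output it may produce, the check
fails before the full payload is ever held in memory.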


defusedxml.lxml
---------------

**DEPRECATED** The module is deprecated and will be removed in a future
release.

The module acts as an *example* of how you could protect code that uses
lxml.etree. It implements a custom Element class that filters out
Entity instances, a custom parser factory and a thread local storage for
parser instances. It also has a check_docinfo() function which inspects
a tree for internal or external DTDs and entity declarations. In order to
check for entities lxml > 3.0 is required.

parse(), fromstring()
RestrictedElement, GlobalParserTLS, getDefaultParser(), check_docinfo()


defusedexpat
============

The `defusedexpat package`_ (`defusedexpat on PyPI`_)
comes with binary extensions and a
`modified expat`_ library instead of the standard `expat parser`_. It's
basically a stand-alone version of the patches for Python's standard
library C extensions.

Modifications in expat
----------------------

new definitions::

  XML_BOMB_PROTECTION
  XML_DEFAULT_MAX_ENTITY_INDIRECTIONS
  XML_DEFAULT_MAX_ENTITY_EXPANSIONS
  XML_DEFAULT_RESET_DTD

new XML_FeatureEnum members::

  XML_FEATURE_MAX_ENTITY_INDIRECTIONS
  XML_FEATURE_MAX_ENTITY_EXPANSIONS
  XML_FEATURE_IGNORE_DTD

new XML_Error members::

  XML_ERROR_ENTITY_INDIRECTIONS
  XML_ERROR_ENTITY_EXPANSION

new API functions::

  int XML_GetFeature(XML_Parser parser,
                     enum XML_FeatureEnum feature,
                     long *value);
  int XML_SetFeature(XML_Parser parser,
                     enum XML_FeatureEnum feature,
                     long value);
  int XML_GetFeatureDefault(enum XML_FeatureEnum feature,
                            long *value);
  int XML_SetFeatureDefault(enum XML_FeatureEnum feature,
                            long value);

XML_FEATURE_MAX_ENTITY_INDIRECTIONS
   Limit the number of indirections that are allowed to occur during the
   expansion of a nested entity. A counter starts when an entity reference
   is encountered. It resets after the entity is fully expanded. The limit
   protects the parser against exponential entity expansion attacks (aka
   billion laughs attack). When the limit is exceeded the parser stops and
   fails with `XML_ERROR_ENTITY_INDIRECTIONS`.
   A value of 0 disables the protection.

   Supported range
     0 .. UINT_MAX
   Default
     40

XML_FEATURE_MAX_ENTITY_EXPANSIONS
   Limit the total length of all entity expansions throughout the entire
   document. The lengths of all entities are accumulated in a parser variable.
   The setting protects against quadratic blowup attacks (lots of expansions
   of a large entity declaration). When the sum of all entities exceeds
   the limit, the parser stops and fails with `XML_ERROR_ENTITY_EXPANSION`.
   A value of 0 disables the protection.

   Supported range
     0 .. UINT_MAX
   Default
     8 MiB

XML_FEATURE_RESET_DTD
   Reset all DTD information after the <!DOCTYPE> block has been parsed. When
   the flag is set (default: false) all DTD information is reset after the
   endDoctypeDeclHandler has been called. The flag can be set inside the
   endDoctypeDeclHandler. Without DTD information any entity reference in
   the document body leads to `XML_ERROR_UNDEFINED_ENTITY`.

   Supported range
     0, 1
   Default
     0
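
The accumulating limit described for ``XML_FEATURE_MAX_ENTITY_EXPANSIONS``
can be modelled in a few lines of Python. This is a model of the description
only, not a translation of the C patch; the class ``ExpansionBudget`` and
the 50000 character entity are assumptions made for illustration.

```python
class ExpansionBudget(object):
    """Track the accumulated length of all expanded entities."""

    def __init__(self, max_total=8 * 1024 * 1024):
        self.max_total = max_total  # 0 disables the check
        self.total = 0

    def charge(self, expanded_length):
        """Account for one expansion; raise once the budget is exceeded."""
        self.total += expanded_length
        if self.max_total and self.total > self.max_total:
            raise OverflowError("entity expansion limit exceeded")


budget = ExpansionBudget(max_total=8 * 1024 * 1024)
entity_length = 50000
expansions = 0
try:
    while True:
        budget.charge(entity_length)
        expansions += 1
except OverflowError:
    pass

print(expansions)  # 167
```

With the 8 MiB default a 50000 character entity can be expanded 167 times
before the parser would stop, no matter how the references are spread over
the document.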


How to avoid XML vulnerabilities
================================

Best practices
--------------

* Don't allow DTDs
* Don't expand entities
* Don't resolve externals
* Limit parse depth
* Limit total input size
* Limit parse time
* Favor a SAX or iterparse-like parser for potentially large data
* Validate and properly quote arguments to XSL transformations and
  XPath queries
* Don't use XPath expressions from untrusted sources
* Don't apply XSL transformations that come from untrusted sources

(based on Brad Hill's `Attacking XML Security`_)
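
Two of the practices listed, limiting the total input size and the parse
depth, can be sketched with the stdlib expat bindings. ``parse_limited`` is
a hypothetical helper and both default limits are arbitrary values chosen
for the example, not recommendations.

```python
import xml.parsers.expat


def parse_limited(data, max_bytes=1024 * 1024, max_depth=50):
    """Parse ``data`` with expat, enforcing input size and nesting limits.

    Returns the deepest nesting level seen in the document.
    """
    if len(data) > max_bytes:
        raise ValueError("input larger than %d bytes" % max_bytes)

    state = {"depth": 0, "deepest": 0}

    def on_start(name, attrs):
        state["depth"] += 1
        state["deepest"] = max(state["deepest"], state["depth"])
        if state["depth"] > max_depth:
            raise ValueError("nesting deeper than %d elements" % max_depth)

    def on_end(name):
        state["depth"] -= 1

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = on_start
    parser.EndElementHandler = on_end
    parser.Parse(data, True)
    return state["deepest"]


print(parse_limited(b"<a><b><c/></b></a>"))  # 3
```

The size check runs before the parser sees any data, and the depth check
aborts parsing at the first element that is nested too deeply.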


Other things to consider
========================

XML, XML parsers and processing libraries have more features and possible
issues that could lead to DoS vulnerabilities or security exploits in
applications. I have compiled an incomplete list of theoretical issues that
need further research and more attention. The list is deliberately pessimistic
and a bit paranoid, too. It contains things that might go wrong under daffy
circumstances.


attribute blowup / hash collision attack
----------------------------------------

XML parsers may use an algorithm with quadratic runtime O(n :sup:`2`) to
handle attributes and namespaces. If a parser uses hash tables (dictionaries)
to store attributes and namespaces the implementation may be vulnerable to
hash collision attacks, thus reducing the performance to O(n :sup:`2`) again.
In either case an attacker is able to mount a denial of service attack with
an XML document that contains thousands upon thousands of attributes in
a single node.

I haven't researched yet if expat, pyexpat or libxml2 are vulnerable.


decompression bomb
------------------

The issue of decompression bombs (aka `ZIP bomb`_) applies to all XML libraries
that can parse compressed XML streams like gzipped HTTP streams or LZMA-ed
files. For an attacker it can reduce the amount of transmitted data by three
orders of magnitude or more. Gzip is able to compress 1 GiB of zeros to
roughly 1 MB, lzma is even better::

    $ dd if=/dev/zero bs=1M count=1024 | gzip > zeros.gz
    $ dd if=/dev/zero bs=1M count=1024 | lzma -z > zeros.xy
    $ ls -sh zeros.*
    1020K zeros.gz
     148K zeros.xy

None of Python's standard XML libraries decompress streams except for
``xmlrpclib``. The module is
`vulnerable <https://bugs.python.org/issue16043>`_
to decompression bombs.

lxml can load and process compressed data through libxml2 transparently.
libxml2 can handle even very large blobs of compressed data efficiently
without using too much memory. But it doesn't protect applications from
decompression bombs. A carefully written SAX or iterparse-like approach can
be safe.


Processing Instruction
----------------------

`PI`_'s like::

  <?xml-stylesheet type="text/xsl" href="style.xsl"?>

may pose further threats for XML processing. It depends on whether and how a
processor handles processing instructions. The issue of URL retrieval with
network or local file access applies to processing instructions, too.


Other DTD features
------------------

`DTD`_ has more features like ``<!NOTATION>``. I haven't researched how
these features may be a security threat.


XPath
-----

XPath statements may introduce DoS vulnerabilities. Code should never execute
queries from untrusted sources. An attacker may also be able to create an XML
document that makes certain XPath queries costly or resource hungry.


XPath injection attacks
-----------------------

XPath injection attacks pretty much work like SQL injection attacks.
Arguments to XPath queries must be quoted and validated properly, especially
when they are taken from the user. The page `Avoid the dangers of XPath injection`_
lists some ramifications of XPath injections.

Python's standard library doesn't have XPath support. lxml supports
parameterized XPath queries which do proper quoting. You just have to use
its xpath() method correctly::

   # DON'T
   >>> tree.xpath("/tag[@id='%s']" % value)

   # instead do
   >>> tree.xpath("/tag[@id=$tagid]", tagid=value)
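
Why string interpolation is dangerous becomes visible when the resulting
query is printed. The sketch below needs no XML library; ``unsafe_query`` is
a hypothetical helper that builds the query the same way as the "DON'T"
example.

```python
def unsafe_query(value):
    """Build the query by string interpolation, as in the DON'T example."""
    return "/tag[@id='%s']" % value


print(unsafe_query("42"))
print(unsafe_query("x' or '1'='1"))
```

The first call prints ``/tag[@id='42']``. The second prints
``/tag[@id='x' or '1'='1']``, a predicate that is true for every ``tag``
element, so the attacker controls which nodes are selected. In the
parameterized form the value is passed as an XPath variable and never
becomes part of the expression text.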


XInclude
--------

`XML Inclusion`_ is another way to load and include external files::

   <root xmlns:xi="http://www.w3.org/2001/XInclude">
     <xi:include href="filename.txt" parse="text" />
   </root>

This feature should be disabled when XML files from an untrusted source are
processed. Some Python XML libraries and libxml2 support XInclude but don't
have an option to sandbox inclusion and limit it to allowed directories.
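
Before an inclusion step is run on a parsed tree, the pending include
targets can be inspected. The following sketch uses the stdlib
``xml.etree.ElementTree``, which parses the ``xi:include`` element as an
ordinary element and does not process it on its own; ``xinclude_targets``
is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

XINCLUDE_TAG = "{http://www.w3.org/2001/XInclude}include"

DOCUMENT = """<root xmlns:xi="http://www.w3.org/2001/XInclude">
  <xi:include href="filename.txt" parse="text" />
</root>"""


def xinclude_targets(root):
    """Return the href of every xi:include element at or below ``root``."""
    return [element.get("href") for element in root.iter(XINCLUDE_TAG)]


root = ET.fromstring(DOCUMENT)
print(xinclude_targets(root))  # ['filename.txt']
```

An application can reject the document or compare the targets against an
allow list before any inclusion is processed.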


XMLSchema location
------------------

A validating XML parser may download schema files from the information in a
``xsi:schemaLocation`` attribute.

::

  <ead xmlns="urn:isbn:1-931666-22-9"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="urn:isbn:1-931666-22-9 http://www.loc.gov/ead/ead.xsd">
  </ead>
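
The locations a validating parser might try to fetch can be listed up front
with the stdlib ``xml.etree.ElementTree``, which does not validate and does
not download schemas. This is a minimal sketch; ``schema_locations`` is a
hypothetical helper.

```python
import xml.etree.ElementTree as ET

SCHEMA_LOCATION = "{http://www.w3.org/2001/XMLSchema-instance}schemaLocation"

DOCUMENT = """<ead xmlns="urn:isbn:1-931666-22-9"
     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
     xsi:schemaLocation="urn:isbn:1-931666-22-9 http://www.loc.gov/ead/ead.xsd">
</ead>"""


def schema_locations(root):
    """Return (namespace, location) pairs from all xsi:schemaLocation attributes."""
    pairs = []
    for element in root.iter():
        tokens = element.get(SCHEMA_LOCATION, "").split()
        # The attribute holds whitespace separated namespace/location pairs.
        pairs.extend(zip(tokens[0::2], tokens[1::2]))
    return pairs


print(schema_locations(ET.fromstring(DOCUMENT)))
```

For the example document the result is a single pair of the ``urn:isbn``
namespace and the ``ead.xsd`` URL.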


XSL Transformation
------------------

You should keep in mind that XSLT is a Turing complete language. Never
process XSLT code from unknown or untrusted sources! XSLT processors may
allow you to interact with external resources in ways you can't even imagine.
Some processors even support extensions that allow read/write access to the
file system, access to JRE objects or scripting with Jython.

Example from `Attacking XML Security`_ for Xalan-J::

    <xsl:stylesheet version="1.0"
     xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
     xmlns:rt="http://xml.apache.org/xalan/java/java.lang.Runtime"
     xmlns:ob="http://xml.apache.org/xalan/java/java.lang.Object"
     exclude-result-prefixes= "rt ob">
     <xsl:template match="/">
       <xsl:variable name="runtimeObject" select="rt:getRuntime()"/>
       <xsl:variable name="command"
         select="rt:exec($runtimeObject, 'c:\Windows\system32\cmd.exe')"/>
       <xsl:variable name="commandAsString" select="ob:toString($command)"/>
       <xsl:value-of select="$commandAsString"/>
     </xsl:template>
    </xsl:stylesheet>


Related CVEs
============

CVE-2013-1664
  Unrestricted entity expansion induces DoS vulnerabilities in Python XML
  libraries (XML bomb)

CVE-2013-1665
  External entity expansion in Python XML libraries inflicts potential
  security flaws and DoS vulnerabilities


Other languages / frameworks
=============================

Several other programming languages and frameworks are vulnerable as well. A
couple of them are affected by the fact that libxml2 up to 2.9.0 has no
protection against quadratic blowup attacks. Most of them have potentially
dangerous default settings for entity expansion and external entities, too.

Perl
----

Perl's XML::Simple is vulnerable to quadratic entity expansion and external
entity expansion (both local and remote).


Ruby
----

Ruby's REXML document parser is vulnerable to entity expansion attacks
(both quadratic and exponential) but it doesn't do external entity
expansion by default. In order to counteract entity expansion you have to
disable the feature::

  REXML::Document.entity_expansion_limit = 0

libxml-ruby and hpricot don't expand entities in their default configuration.


PHP
---

PHP's SimpleXML API is vulnerable to quadratic entity expansion and loads
entities from local and remote resources. The option ``LIBXML_NONET`` disables
network access but still allows local file access. ``LIBXML_NOENT`` seems to
have no effect on entity expansion in PHP 5.4.6.


C# / .NET / Mono
----------------

Information in `XML DoS and Defenses (MSDN)`_ suggests that .NET is
vulnerable with its default settings. The article contains code snippets
that show how to create a secure XML reader::

  XmlReaderSettings settings = new XmlReaderSettings();
  settings.ProhibitDtd = false;
  settings.MaxCharactersFromEntities = 1024;
  settings.XmlResolver = null;
  XmlReader reader = XmlReader.Create(stream, settings);


Java
----

Untested. The documentation of Xerces and its `Xerces SecurityMananger`_
suggests that Xerces is also vulnerable to billion laughs attacks with its
default settings. It also does entity resolving when an
``org.xml.sax.EntityResolver`` is configured. I'm not yet sure about the
default setting here.

Java specialists suggest using a custom builder factory::

  import javax.xml.XMLConstants;
  import javax.xml.parsers.DocumentBuilderFactory;

  DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();
  builderFactory.setXIncludeAware(false);
  builderFactory.setExpandEntityReferences(false);
  builderFactory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
  // either
  builderFactory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
  // or if you need DTDs
  builderFactory.setFeature("http://xml.org/sax/features/external-general-entities", false);
  builderFactory.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
  builderFactory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
  builderFactory.setFeature("http://apache.org/xml/features/nonvalidating/load-dtd-grammar", false);


TODO
====

* DOM: Use xml.dom.xmlbuilder options for entity handling
* SAX: take feature_external_ges and feature_external_pes (?) into account
* test experimental monkey patching of stdlib modules
* improve documentation


License
=======

Copyright (c) 2013-2017 by Christian Heimes <christian@python.org>

Licensed to PSF under a Contributor Agreement.

See https://www.python.org/psf/license for licensing details.


Acknowledgements
================

Brett Cannon (Python Core developer)
  review and code cleanup

Antoine Pitrou (Python Core developer)
  code review

Aaron Patterson, Ben Murphy and Michael Koziarski (Ruby community)
  Many thanks to Aaron, Ben and Michael from the Ruby community for their
  report and assistance.

Thierry Carrez (OpenStack)
  Many thanks to Thierry for his report to the Python Security Response
  Team on behalf of the OpenStack security team.

Carl Meyer (Django)
  Many thanks to Carl for his report to PSRT on behalf of the Django security
  team.

Daniel Veillard (libxml2)
  Many thanks to Daniel for his insight and assistance with libxml2.

semantics GmbH (https://www.semantics.de/)
  Many thanks to my employer semantics for letting me work on the issue
  during working hours as part of semantics's open source initiative.


References
==========

* `XML DoS and Defenses (MSDN)`_
* `Billion Laughs`_ on Wikipedia
* `ZIP bomb`_ on Wikipedia
* `Configure SAX parsers for secure processing`_
* `Testing for XML Injection`_

.. _defusedxml package: https://github.com/tiran/defusedxml
.. _defusedxml on PyPI: https://pypi.python.org/pypi/defusedxml
.. _defusedexpat package: https://github.com/tiran/defusedexpat
.. _defusedexpat on PyPI: https://pypi.python.org/pypi/defusedexpat
.. _modified expat: https://github.com/tiran/expat
.. _expat parser: http://expat.sourceforge.net/
.. _Attacking XML Security: https://www.isecpartners.com/media/12976/iSEC-HILL-Attacking-XML-Security-bh07.pdf
.. _Billion Laughs: https://en.wikipedia.org/wiki/Billion_laughs
.. _XML DoS and Defenses (MSDN): https://msdn.microsoft.com/en-us/magazine/ee335713.aspx
.. _ZIP bomb: https://en.wikipedia.org/wiki/Zip_bomb
.. _DTD: https://en.wikipedia.org/wiki/Document_Type_Definition
.. _PI: https://en.wikipedia.org/wiki/Processing_Instruction
.. _Avoid the dangers of XPath injection: http://www.ibm.com/developerworks/xml/library/x-xpathinjection/index.html
.. _Configure SAX parsers for secure processing: http://www.ibm.com/developerworks/xml/library/x-tipcfsx/index.html
.. _Testing for XML Injection: https://www.owasp.org/index.php/Testing_for_XML_Injection_(OWASP-DV-008)
.. _Xerces SecurityMananger: https://xerces.apache.org/xerces2-j/javadocs/xerces2/org/apache/xerces/util/SecurityManager.html
.. _XML Inclusion: https://www.w3.org/TR/xinclude/#include_element

Changelog
=========

defusedxml 0.6.0
----------------

*Release date: 17-Apr-2019*

- Increase test coverage.
- Add badges to README.


defusedxml 0.6.0rc1
-------------------

*Release date: 14-Apr-2019*

- Test on Python 3.7 stable and 3.8-dev
- Drop support for Python 3.4
- No longer pass *html* argument to XMLParse. It has been deprecated and
  ignored for a long time. The DefusedXMLParser still takes a html argument.
  A deprecation warning is issued when the argument is False and a TypeError
  when it's True.
- defusedxml now fails early when pyexpat stdlib module is not available or
  broken.
- defusedxml.ElementTree.__all__ now lists ParseError as public attribute.
- The defusedxml.ElementTree and defusedxml.cElementTree modules had a typo
  and used XMLParse instead of XMLParser as an alias for DefusedXMLParser.
  Both the old and fixed name are now available.


defusedxml 0.5.0
----------------

*Release date: 07-Feb-2017*

- No changes


defusedxml 0.5.0.rc1
--------------------

*Release date: 28-Jan-2017*

- Add compatibility with Python 3.6
- Drop support for Python 2.6, 3.1, 3.2, 3.3
- Fix lxml tests (XMLSyntaxError: Detected an entity reference loop)


defusedxml 0.4.1
----------------

*Release date: 28-Mar-2013*

- Add more demo exploits, e.g. python_external.py and Xalan XSLT demos.
- Improved documentation.


defusedxml 0.4
--------------

*Release date: 25-Feb-2013*

- As per http://seclists.org/oss-sec/2013/q1/340 please REJECT
  CVE-2013-0278, CVE-2013-0279 and CVE-2013-0280 and use CVE-2013-1664,
  CVE-2013-1665 for OpenStack/etc.
- Add missing parser_list argument to sax.make_parser(). The argument is
  ignored, though. (thanks to Florian Apolloner)
- Add demo exploit for external entity attack on Python's SAX parser, XML-RPC
  and WebDAV.


defusedxml 0.3
--------------

*Release date: 19-Feb-2013*

- Improve documentation


defusedxml 0.2
--------------

*Release date: 15-Feb-2013*

- Rename ExternalEntitiesForbidden to ExternalReferenceForbidden
- Rename defusedxml.lxml.check_dtd() to check_docinfo()
- Unify argument names in callbacks
- Add arguments and formatted representation to exceptions
- Add forbid_external argument to all functions and classes
- More tests
- LOTS of documentation
- Add example code for other languages (Ruby, Perl, PHP) and parsers (Genshi)
- Add protection against XML and gzip attacks to xmlrpclib

defusedxml 0.1
--------------

*Release date: 08-Feb-2013*

- Initial and internal release for PSRT review