XML 101
XML Parsers
XML-based applications typically use an XML parser to validate the data and convert it into an internal object. XML parsing vulnerabilities can serve as primitives (building blocks) for further attacks, such as SSRF, information disclosure, etc.
XML Structure
Extensible Markup Language (XML) is designed for storing and transporting data, and it encodes that data in a format that both humans and machines can read. XML documents consist of markup, mostly tags, and content. An XML document resembles an HTML file, but whereas HTML has predefined tags, each XML document defines its own set of tags. The key components of an XML file are depicted below (Figure 1).

<?xml version="1.0" encoding="UTF-8"?> <!-- XML declaration -->
<contacts> <!-- start tag of the contacts element -->
  <contact id="123"> <!-- element attributes are defined within the start tag -->
    <firstName>Tom</firstName> <!-- sub-element of contact -->
    <lastName>Jones</lastName>
  </contact>
  <contact id="456">
    <firstName>Tom</firstName>
    <lastName>Petty</lastName>
  </contact>
</contacts> <!-- end tag of the contacts element -->
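As a rough illustration of how a parser turns such a document into an internal object, the sketch below loads the contacts example with Python's standard xml.etree.ElementTree module. The helper name load_contacts and the dictionary layout are assumptions made for this example, not part of any particular application.

```python
import xml.etree.ElementTree as ET

# The contacts document from the example above, without the comments.
CONTACTS_XML = """<contacts>
  <contact id="123">
    <firstName>Tom</firstName>
    <lastName>Jones</lastName>
  </contact>
  <contact id="456">
    <firstName>Tom</firstName>
    <lastName>Petty</lastName>
  </contact>
</contacts>"""


def load_contacts(xml_text):
    """Map each contact's id attribute to a (firstName, lastName) tuple."""
    root = ET.fromstring(xml_text)  # root is the <contacts> element
    contacts = {}
    for contact in root.findall("contact"):
        # Attributes are read with get(), sub-element text with findtext().
        contacts[contact.get("id")] = (
            contact.findtext("firstName"),
            contact.findtext("lastName"),
        )
    return contacts


if __name__ == "__main__":
    print(load_contacts(CONTACTS_XML))
```

The attribute value (id) and the sub-element text end up as ordinary Python strings, which is the "internal object" the application then works with.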
If we need to use the XML-specific characters within an XML document they must be replaced with their corresponding entity references.
Character    Entity reference
<            &lt;
>            &gt;
&            &amp;
'            &apos;
"            &quot;
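The replacement of the five XML-specific characters (<, >, &, ' and ") with their entity references can be sketched with Python's standard xml.sax.saxutils helpers. The function names to_xml_text and from_xml_text are hypothetical wrappers chosen for this example. Note that escape() only handles &, < and > by default, so the two quote characters are passed in explicitly.

```python
from xml.sax.saxutils import escape, unescape

# escape() covers &, < and > on its own; the quotes are supplied as extras.
QUOTE_ENTITIES = {"'": "&apos;", '"': "&quot;"}
QUOTE_CHARS = {"&apos;": "'", "&quot;": '"'}


def to_xml_text(raw):
    """Replace XML-specific characters with their entity references."""
    return escape(raw, QUOTE_ENTITIES)


def from_xml_text(escaped):
    """Reverse to_xml_text(), turning entity references back into characters."""
    return unescape(escaped, QUOTE_CHARS)


if __name__ == "__main__":
    sample = 'Tom & "Jerry" <b>'
    print(to_xml_text(sample))
    print(from_xml_text(to_xml_text(sample)))
```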
An alternative is to enclose those characters in a Character Data (CDATA) section. XML parsers will treat the contents of a CDATA block as text instead of markup.
<![CDATA[ content ]]>
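A small sketch of that behavior, assuming Python's standard xml.etree.ElementTree parser: the characters inside the CDATA section come back as plain text, and no child element is created for the tag-like content. The element name note and the helper cdata_text are made up for this example.

```python
import xml.etree.ElementTree as ET

# The <, & and <b> inside the CDATA section would be markup errors or tags
# if they appeared unescaped in ordinary element content.
DOC = "<note><![CDATA[ 5 < 6 & <b>not a tag</b> ]]></note>"


def cdata_text(xml_text):
    """Return the text of the root element and its number of child elements."""
    root = ET.fromstring(xml_text)
    return root.text, len(root)


if __name__ == "__main__":
    text, child_count = cdata_text(DOC)
    print(repr(text), child_count)
```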
XML Entities
What interests us most from an attacker's perspective is the XML Document Type Definition (DTD), which defines a document structure that an XML document can be validated against. The DTD can be placed within the XML document itself (Figure 2), imported from an external file (Figure 3.1), or referenced through a URL (Figure 3.2).


Within DTDs we can define entities (XML variables) using the ENTITY keyword. XML entities can be categorized in three ways:
Internal, i.e., defined locally within the DTD using the entity name and its value.
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE note [
<!ENTITY sender "x7331">
]>
External, i.e., defined outside of the DTD and referenced by a URI. These can be further classified as private (declared with the SYSTEM keyword) or public (declared with the PUBLIC keyword and a public identifier), depending on their intended audience.
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE note [
<!ENTITY senderPrivate SYSTEM "http://example.com/note.txt">
<!ENTITY senderPublic PUBLIC "public_id" "http://example.com/note.txt">
]>
Parameter entities exist only within a DTD and are declared with the % prefix.
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE note [
<!ENTITY % name SYSTEM "URI">
]>
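To see how a parser distinguishes these entity types, the sketch below registers an EntityDeclHandler on Python's standard expat binding (xml.parsers.expat) and labels each declaration. The document is assembled from the declarations shown above. The function classify_entities and its label strings are assumptions for this example. The declarations are only recorded here, never referenced, so nothing should be fetched from the listed URIs.

```python
from xml.parsers import expat

# One declaration of each kind, taken from the examples above.
DOC = b"""<!DOCTYPE note [
<!ENTITY sender "x7331">
<!ENTITY senderPrivate SYSTEM "http://example.com/note.txt">
<!ENTITY senderPublic PUBLIC "public_id" "http://example.com/note.txt">
<!ENTITY % name SYSTEM "URI">
]>
<note/>"""


def classify_entities(document):
    """Return {entity name: kind} for every ENTITY declaration in the DTD."""
    found = {}

    def on_entity_decl(name, is_parameter, value, base,
                       system_id, public_id, notation):
        if is_parameter:
            kind = "parameter"           # declared with the % prefix
        elif value is not None:
            kind = "internal"            # value given inline in the DTD
        elif public_id is not None:
            kind = "external (PUBLIC)"   # public identifier plus URI
        else:
            kind = "external (SYSTEM)"   # URI only
        found[name] = kind

    parser = expat.ParserCreate()
    parser.EntityDeclHandler = on_entity_decl
    parser.Parse(document, True)
    return found


if __name__ == "__main__":
    for entity, kind in classify_entities(DOC).items():
        print(entity, "->", kind)
```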
Different entity types can be combined and used together.
<!-- defining a parameter entity -->
<!ENTITY % name 'x7331'>
<!-- defining an internal entity which includes a parameter entity -->
<!ENTITY Title 'This is the site of %name;'>
We can expand an entity using an entity reference, i.e., the entity name placed between the & and ; symbols.
<name>&name;</name>
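As a minimal sketch of entity expansion in practice, the snippet below declares the internal entity sender from the earlier example and references it inside an element. It assumes Python's standard xml.etree.ElementTree parser, which substitutes internal entities while parsing; other parsers and configurations may behave differently. The from element and the helper expand_sender are hypothetical names for this example.

```python
import xml.etree.ElementTree as ET

# An internal entity declared in the DTD and referenced in the document body.
NOTE_XML = """<!DOCTYPE note [
<!ENTITY sender "x7331">
]>
<note><from>&sender;</from></note>"""


def expand_sender(xml_text):
    """Parse the document and return the text of the <from> element."""
    root = ET.fromstring(xml_text)
    # The parser has already replaced &sender; with the entity's value.
    return root.findtext("from")


if __name__ == "__main__":
    print(expand_sender(NOTE_XML))
```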