XML 101
XML Parsers
XML-based applications typically use an XML parser to validate the data and convert it into an internal object. XML parsing vulnerabilities can serve as primitives (building blocks) for further attacks, such as SSRF, information disclosure, DoS, and RCE.
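As a rough illustration of the parsing step, the sketch below uses Python's standard xml.etree.ElementTree module. The parse_order helper and the order/item document are hypothetical and not part of the original text. Note that ElementTree only checks well-formedness; it does not validate against a DTD.

```python
import xml.etree.ElementTree as ET


def parse_order(xml_text):
    """Convert an XML string into a plain dict, or return None if it is malformed.

    Hypothetical helper: the element and attribute names are made up
    for this example.
    """
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        # The parser rejects input that is not well-formed XML.
        return None
    return {"id": root.get("id"), "item": root.findtext("item")}


# Well-formed input is converted into an internal object (here, a dict).
good = parse_order('<order id="42"><item>book</item></order>')

# The <item> tag is never closed, so the parser refuses the document.
bad = parse_order('<order id="42"><item>book</order>')
```

Everything an application does with the data afterwards depends on what the parser accepted, which is why parser behaviour matters so much for the attacks listed above.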
XML Structure
Extensible Markup Language (XML) is designed for storing and transporting data, and it encodes data in a format that is readable by both humans and machines. XML documents consist of markup, mostly tags, and content. An XML document resembles an HTML file, but whereas HTML has a predefined set of tags, each XML document defines its own. The key components of an XML file are depicted below (Figure 1).
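To make the split between markup and content concrete, here is a minimal sketch that parses a small made-up document (the library/book/title tags are assumptions chosen for this example) and reads back its tags, attributes, and text.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: an XML declaration, a root element,
# a child element with an attribute, and some text content.
doc = """<?xml version="1.0"?>
<library>
  <book id="b1">
    <title>Dune</title>
  </book>
</library>"""

root = ET.fromstring(doc)   # root element: <library>
book = root.find("book")    # first <book> child of the root

# Markup (tags, attributes) and content (text) are exposed separately.
summary = (root.tag, book.tag, book.attrib, book.findtext("title"))
```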

If we need to use XML-specific characters as literal content within an XML document, they must be replaced with their corresponding entity references:
Character   Entity reference
<           &lt;
>           &gt;
&           &amp;
'           &apos;
"           &quot;
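In Python, this replacement can be sketched with the standard xml.sax.saxutils module. By default, escape only handles &, < and >, so the mapping for the two quote characters is passed explicitly. The QUOTES name and the sample string are assumptions made for this example.

```python
from xml.sax.saxutils import escape, unescape

# escape() covers &, < and > on its own; quotes need an explicit mapping.
QUOTES = {"'": "&apos;", '"': "&quot;"}

raw = '<a href="x">Tom & Jerry\'s</a>'

# Replace all five XML-specific characters with entity references.
escaped = escape(raw, QUOTES)

# Reverse the process; the mapping is inverted for unescape().
restored = unescape(escaped, {ref: char for char, ref in QUOTES.items()})
```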
An alternative is to enclose those characters in a Character Data (CDATA) section, written as <![CDATA[ ... ]]>. XML parsers treat the contents of a CDATA section as text instead of markup.
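A short sketch of how a parser handles a CDATA section, again using xml.etree.ElementTree. The note element and its contents are made up for this example.

```python
import xml.etree.ElementTree as ET

# The <, > and & characters appear unescaped inside the CDATA section.
doc = "<note><![CDATA[if (a < b && b > c) { run(); }]]></note>"

note = ET.fromstring(doc)

# The parser returns the CDATA contents as plain text and creates
# no child elements from it.
text = note.text
children = len(note)
```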
XML Entities
From an attacker's point of view, the most interesting part is the XML Document Type Definition (DTD), which allows an XML document to be validated against a predefined document structure. The DTD can be placed within the XML document itself (Figure 2), imported from an external file (Figure 3.1), or referenced through a URL (Figure 3.2).
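The three placements can be sketched as DOCTYPE declarations. The note element, the note.dtd file name, and the example.com URL are all hypothetical. This sketch assumes the default behaviour of Python's xml.etree.ElementTree, which reads the declaration but neither fetches the external DTD nor validates against it, so all three documents parse without touching the file system or the network.

```python
import xml.etree.ElementTree as ET

# 1. DTD placed inside the document itself (internal subset).
inline_dtd = """<!DOCTYPE note [
  <!ELEMENT note (#PCDATA)>
]>
<note>hi</note>"""

# 2. DTD imported from an external file (hypothetical file name).
file_dtd = """<!DOCTYPE note SYSTEM "note.dtd">
<note>hi</note>"""

# 3. DTD referenced through a URL (hypothetical address).
url_dtd = """<!DOCTYPE note SYSTEM "http://example.com/note.dtd">
<note>hi</note>"""

# ElementTree is a non-validating parser: it accepts all three forms
# and, in this sketch, does not retrieve the external resources.
roots = [ET.fromstring(d) for d in (inline_dtd, file_dtd, url_dtd)]
tags = [r.tag for r in roots]
```

Other parsers may behave differently here: one that resolves external DTDs would attempt to open the file or request the URL.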
Within DTDs we can define entities (XML variables) using the ENTITY keyword. XML entities can be categorized in three ways:
Internal entities are defined locally within the DTD using the entity name and its value.
External entities are defined outside of the DTD and are referenced using the SYSTEM keyword and a path or URI. These can be further classified as private or public (the latter declared with the PUBLIC keyword), depending on their intended audience.
Parameter entities exist only within a DTD and are declared and referenced with the % prefix.
Different entity types can be combined and used together.
We can expand an entity by using an entity reference, which consists of the & symbol, the entity name, and a semicolon (for example, &name;).
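As a minimal sketch of entity expansion, the example below declares an internal entity in an inline DTD and references it from the document body. The greeting element and the name entity are made up for this example. Only an internal entity is shown; how external and parameter entities are handled varies between parsers and their configuration.

```python
import xml.etree.ElementTree as ET

# Internal entity "name" is declared in the DTD and referenced as &name;.
doc = """<!DOCTYPE greeting [
  <!ENTITY name "world">
]>
<greeting>Hello, &name;!</greeting>"""

root = ET.fromstring(doc)

# The parser substitutes the entity's value for the reference.
expanded = root.text
```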