Understanding XXE

An XXE (XML External Entity) is a vulnerability listed in the OWASP Top 10 (as A4 in its 2017 edition) that affects programs which parse XML.

Its main characteristic is that it allows an attacker to read files on the target server. This can endanger the server, for example by exposing a configuration file containing passwords, by copying database files or by retrieving an application's source code.

XML and Entities

XML (eXtensible Markup Language) is a markup metalanguage, in other words a way of structuring text. It is recognisable by its use of angle brackets and by its extensibility, which allows new tags to be added whenever desired.

Here is an example of an XML file describing a breakfast menu:
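
As a minimal sketch, here is a small hypothetical menu (the element names breakfast_menu, food, name and price, and the prices, are our own assumptions) parsed with Python's standard xml.etree.ElementTree module, to show how the nested tags become a tree that a program can walk:

```python
import xml.etree.ElementTree as ET

# Hypothetical breakfast menu: element names and values are illustrative only.
MENU_XML = """<breakfast_menu>
  <food>
    <name>Pancakes</name>
    <price>5.95</price>
  </food>
  <food>
    <name>French Toast</name>
    <price>4.50</price>
  </food>
</breakfast_menu>"""


def list_menu(xml_text: str) -> list:
    """Return (name, price) pairs for every <food> element of the menu."""
    root = ET.fromstring(xml_text)
    return [
        (food.findtext("name"), float(food.findtext("price")))
        for food in root.iter("food")
    ]


for name, price in list_menu(MENU_XML):
    print(f"{name}: {price:.2f}")
```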

One XML feature is particularly interesting in our case: entities.
An entity is, broadly speaking, the XML equivalent of a variable: it lets us define a value once and then reuse it elsewhere in the document.

Going back to our menu, if we want the currency of the price to be easy to change, we can use an entity like this:

When this XML file is parsed, every instance of “&currency;” is replaced by the value defined in the ENTITY declaration, which gives us this result:
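
To illustrate, here is a sketch assuming a hypothetical one-item menu with the euro as currency. Python's xml.etree.ElementTree (which relies on the Expat parser) expands the internal entity while parsing, so the program only ever sees the substituted value:

```python
import xml.etree.ElementTree as ET

# Hypothetical menu: the currency is declared once, as an internal entity.
MENU_WITH_ENTITY = """<!DOCTYPE breakfast_menu [
  <!ENTITY currency "€">
]>
<breakfast_menu>
  <food>
    <name>Pancakes</name>
    <price>5.95&currency;</price>
  </food>
</breakfast_menu>"""

root = ET.fromstring(MENU_WITH_ENTITY)
# By the time we read the tree, "&currency;" has already been replaced by its value.
print(root.find("food/price").text)
```

Changing the currency then only requires editing the ENTITY declaration.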

A particularity of entities is that they can refer to a so-called “external” resource (i.e. external to the document: a file that is either local or located on a remote server) by adding the SYSTEM keyword to the entity declaration. For example, if we want to refer to a pancake picture located in our /photos folder, we could define the following entity:

<!DOCTYPE breakfast_menu [ <!ENTITY pancakePhoto SYSTEM "file:///photos/pancake.jpg"> ]>
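
A declaration on its own does not fetch anything: the parser merely records it, and the external resource is only resolved when the entity is referenced. The sketch below (the function name list_entity_declarations and the sample document are our own) uses Python's low-level xml.parsers.expat module to list the entities a document declares, showing that an internal entity carries a value while an external one carries only a system identifier:

```python
import xml.parsers.expat

# Hypothetical document declaring one internal and one external entity.
DOC = """<!DOCTYPE breakfast_menu [
  <!ENTITY currency "€">
  <!ENTITY pancakePhoto SYSTEM "file:///photos/pancake.jpg">
]>
<breakfast_menu/>"""


def list_entity_declarations(xml_text: str) -> list:
    """Return (name, replacement value, system identifier) for each declared entity."""
    declarations = []

    def on_entity_decl(name, is_parameter_entity, value, base,
                       system_id, public_id, notation_name):
        # Internal entities carry a value; external ones only a system identifier.
        declarations.append((name, value, system_id))

    parser = xml.parsers.expat.ParserCreate()
    parser.EntityDeclHandler = on_entity_decl
    parser.Parse(xml_text, True)
    return declarations


for name, value, system_id in list_entity_declarations(DOC):
    print(name, "value:", value, "system id:", system_id)
```

Since pancakePhoto is never referenced in this document, the file is never opened.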

And there lies the heart of an XXE.

How an XXE works

It is common for websites to use user data to create XML files (or XML-based files such as DOCX or SVG, or PDFs generated from XML) before returning them to the user.

A malicious user can therefore try to inject into the XML code references to external entities that point at sensitive files on the server. Since the server is the one reading the XML, it resolves these external entities and incorporates their content into the final document, which is then returned to the client along with the sensitive data.

For example, suppose the server creates a breakfast menu from the following XML:

The document generated from this XML code will incorporate the /etc/passwd file of the server that parsed it (if that server is vulnerable to XXE, of course).
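
The mechanism can be reproduced locally. This sketch assumes a hypothetical render_menu function standing in for the server: it parses client-supplied XML with Python's xml.sax module and returns the document's text, while a temporary file plays the role of /etc/passwd. Since Python 3.7.1 the SAX parser no longer resolves external general entities by default, so the example switches the feature_external_ges feature on explicitly to simulate a vulnerable parser:

```python
import pathlib
import tempfile
import xml.sax
from xml.sax.handler import feature_external_ges


class TextCollector(xml.sax.ContentHandler):
    """Gathers the document's character data (what the hypothetical menu would show)."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)


def render_menu(xml_bytes: bytes, resolve_external: bool) -> str:
    """Hypothetical server-side renderer: parse client XML, return its text content."""
    parser = xml.sax.make_parser()
    # Simulates the parser setting: are external general entities resolved or not?
    parser.setFeature(feature_external_ges, resolve_external)
    handler = TextCollector()
    parser.setContentHandler(handler)
    parser.feed(xml_bytes)
    parser.close()
    return "".join(handler.chunks)


with tempfile.TemporaryDirectory() as tmp:
    # Stand-in for a sensitive server file such as /etc/passwd.
    secret = pathlib.Path(tmp, "passwd")
    secret.write_text("root:x:0:0:root:/root:/bin/bash\n")

    # Attacker-controlled XML declaring an external entity that points at the file.
    payload = f"""<!DOCTYPE breakfast_menu [
  <!ENTITY xxe SYSTEM "{secret.as_uri()}">
]>
<breakfast_menu><food><name>&xxe;</name></food></breakfast_menu>""".encode()

    leaked = render_menu(payload, resolve_external=True)    # file content is inlined
    blocked = render_menu(payload, resolve_external=False)  # reference is skipped

print(repr(leaked))
print(repr(blocked))
```

With resolution enabled, the file's content ends up in the rendered menu; with it disabled, the reference is skipped and nothing leaks.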

To be vulnerable, a service must exhibit three behaviours:

  • It must be possible for a user to manipulate XML content which will then be parsed by the server. This can be done via a file upload, a text editor (accepting XML formats) or the reuse of client data in an XML field (for example first and last names, an address, or an email address).
  • The XML parser must allow entities to be defined and used.
  • The XML parser must resolve (i.e. fetch and include) external references in entities.
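
Regarding the first condition: when client values must end up in an XML document, building that document with an XML API rather than by string concatenation keeps them as plain text. In this sketch (build_order and its fields are hypothetical), ElementTree escapes the ampersand and angle brackets, so an entity reference or extra markup typed by the client is never interpreted as markup:

```python
import xml.etree.ElementTree as ET


def build_order(first_name: str, email: str) -> bytes:
    """Hypothetical server-side template: client values are only inserted as text."""
    order = ET.Element("order")
    ET.SubElement(order, "first_name").text = first_name
    ET.SubElement(order, "email").text = email
    # tostring() escapes &, < and > in text content.
    return ET.tostring(order)


# A client trying to smuggle an entity reference and extra markup into a field.
xml_bytes = build_order("&xxe;</first_name><admin>1</admin>", "alice@example.com")
print(xml_bytes.decode())
```

This only protects the fields the server fills in itself; it does not help if the client can supply a whole XML document.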

Additional note:

There is another type of attack using XML entities (non-external ones this time): XEE (XML Entity Expansion), also known as the “billion laughs” attack.
This is a denial-of-service attack in which nested entities expand into billions of entity references, consuming so many resources that it can slow down or even block a system.

For example:

The a7 entity contains the a6 entity 10 times, which itself contains the a5 entity 10 times, and so on down to the entity a0, which is just a simple string of characters. The size therefore grows exponentially: a7 expands to 10⁷ (ten million) copies of a0.
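
The growth is easy to quantify. This sketch assumes a seed string of "lol" for a0, a root element named bomb, and the a0 to a7 naming used above; it builds the document as text and computes, without parsing it, how many characters a7 would expand to (len(seed) × 10⁷). The document is only a few hundred characters long, while its expansion is 30 million characters. Recent versions of Expat include protection against this pattern, but the full payload should not be fed to arbitrary parsers; only a tiny instance is actually parsed here:

```python
import xml.etree.ElementTree as ET


def build_xee_payload(depth: int = 7, fan_out: int = 10, seed: str = "lol") -> str:
    """Build a nested-entity document as text: a0 is the seed string and each
    a{n} repeats a reference to a{n-1} fan_out times. Illustration only."""
    lines = [f'  <!ENTITY a0 "{seed}">']
    for level in range(1, depth + 1):
        refs = f"&a{level - 1};" * fan_out
        lines.append(f'  <!ENTITY a{level} "{refs}">')
    dtd = "\n".join(lines)
    return f"<!DOCTYPE bomb [\n{dtd}\n]>\n<bomb>&a{depth};</bomb>"


def expanded_length(depth: int = 7, fan_out: int = 10, seed: str = "lol") -> int:
    """Characters produced by expanding a{depth}: fan_out**depth copies of seed."""
    return len(seed) * fan_out ** depth


payload = build_xee_payload()
print(len(payload), "characters in the document")
print(expanded_length(), "characters once a7 is expanded")

# Harmless sanity check of the formula on a tiny instance (2 levels, fan-out 2):
# "ab" is repeated 2**2 = 4 times, i.e. 8 characters.
tiny = ET.fromstring(build_xee_payload(depth=2, fan_out=2, seed="ab"))
assert len(tiny.text) == expanded_length(depth=2, fan_out=2, seed="ab")
```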

This attack uses the same vectors as an XXE. The only notable difference is that it needs no external reference, which means that forbidding external entities while still allowing the definition of “standard” (internal) entities leaves you vulnerable.

Impact and Prevention of XXE attacks

The impact of an XXE can be devastating: since it allows files to be extracted from the web server, it enables:

  • The extraction of the application source code.
  • The theft of API keys, hard-coded passwords, etc.: in short, any data or password stored in a configuration file.
  • Database theft: entire database files can be copied. This is particularly true of “single-file” databases such as SQLite.
  • Theft of files generated or uploaded by other users and stored on the server. This is particularly serious if they contain personal data (invoices, identity papers, medical data…).

To protect yourself from XXE, it is enough to remove one of the three behaviours mentioned above:

  • Do not use user data in XML documents. This option may be viable if the dynamic data in the documents does not depend on the users (for example, a date). The XML code itself must NEVER pass through the client.
  • Forbid the definition of external entities, or of entities altogether (which also protects against XEE). Most XML parsers provide such options, whether at configuration, initialisation or use. Read your parser’s documentation carefully to learn how to configure it securely.
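
As an illustration of the second option, here is a sketch using Python's low-level xml.parsers.expat module: the hypothetical parse_untrusted function refuses any entity declaration, internal or external, so it blocks both XXE and XEE payloads before a single reference is expanded. The third-party defusedxml package follows a similar approach and may be preferable in production:

```python
import xml.parsers.expat


class ForbiddenEntityError(ValueError):
    """Raised when an untrusted document tries to declare an entity."""


def parse_untrusted(data: bytes) -> str:
    """Hypothetical helper: parse untrusted XML, refusing every entity declaration.

    Returns the concatenated character data of the document.
    """
    parser = xml.parsers.expat.ParserCreate()
    texts = []

    def refuse_entity(name, is_parameter_entity, value, base,
                      system_id, public_id, notation_name):
        # Called for each <!ENTITY ...> declaration, before any reference to it
        # can be expanded; raising here aborts the whole parse.
        kind = "external" if system_id else "internal"
        raise ForbiddenEntityError(f"{kind} entity {name!r} is not allowed")

    parser.EntityDeclHandler = refuse_entity
    parser.CharacterDataHandler = texts.append
    parser.Parse(data, True)
    return "".join(texts)


print(parse_untrusted(b"<menu><food>Pancakes</food></menu>"))

for payload in (
    b'<!DOCTYPE m [<!ENTITY xxe SYSTEM "file:///etc/passwd">]><m>&xxe;</m>',  # XXE
    b'<!DOCTYPE m [<!ENTITY a0 "lol">]><m>&a0;</m>',                          # XEE seed
):
    try:
        parse_untrusted(payload)
    except ForbiddenEntityError as err:
        print("rejected:", err)
```

The trade-off is that legitimate documents relying on entities are rejected too, which is usually acceptable for user-supplied XML.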

Trying to “clean up” user data before using it can be tempting, but it is risky because it is difficult to account for every possible case.

In conclusion, XXEs are relatively little-known vulnerabilities whose impact can be very serious. The versatility of XML (found in .pdf, .docx, .svg, .xlsx… files) can make them difficult to detect, even for site owners. Fortunately, once the risk is identified, prevention is (usually) easy and effective.