November 12, 2008

HTML Form Controls reviewed

Intro

Inspired by a post by John Resig about conflicts between HTML element IDs and DOM properties/JavaScript variables, I started to think about related techniques that could lead to security risks or even vulnerabilities. Garrett Smith and Frank Manno have also written an excellent article on this topic and related problems, in case you prefer a deeper introduction to form controls and unsafe names. And guess who copied this feature from whom?

Some Code

Let's have a short look at what we are talking about here. First we have a bunch of markup - a simple form. Notice the IDs the elements have.

<form id="a">
<input id="b" />
</form>

It's now possible to access the elements directly with window.a and window.b - or just a or b. We can also traverse through a to get b with a.b. The traversal only works for elements which make their child elements accessible via a numeric index - this being basically forms and their input elements. You can do the same with name attributes of images and forms but only in the document scope - we won't touch that aspect here.
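To check which IDs on a page really end up as globals, a small helper can compare every element with the global of the same name. This is only a sketch: listNamedGlobals is a made-up name, and doc and win are passed in as parameters (instead of using document and window directly) so the function can also be tried with stand-in objects.

```javascript
// Sketch: list element IDs that are reachable as global variables.
// In a page you would call listNamedGlobals(document, window).
function listNamedGlobals(doc, win) {
  const exposed = [];
  const elements = doc.querySelectorAll('[id]');
  for (let i = 0; i < elements.length; i++) {
    const el = elements[i];
    // Named access means win[id] hands back the element itself.
    // Protected names such as location return their own object instead.
    if (win[el.id] === el) {
      exposed.push(el.id);
    }
  }
  return exposed;
}
```

For the form above, the expectation would be that both a and b show up in the result, while an element with a protected ID would not.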

But what if the form elements have IDs like location and href? Theoretically they should be accessible via location.href - generating a severe conflict with an already existing and pretty important DOM property.

<form id="location">
<input id="href">
</form>

Opera, Firefox, Gecko and Safari know how to deal with attempts like these. Most of the really interesting DOM properties, such as navigator, window, document and location, are protected from being touched via form controls. It is still possible to overwrite a lot of DOM properties, though, as the following example shows.

<form id="a">
<button id="length">0</button>
<button id="style">1</button>
<button id="id">2</button>
<button id="className">3</button>
<button id="baseURI">4</button>
<button id="textContent">5</button>
<button id="innerHTML">6</button>
<button id="title">7</button>
<button id="elements">8</button>
<button id="method">9</button>
</form>

The element with the ID id can now be accessed via id, or via a.id, since it resides in a's properties too. We can also set variables, for example via var b = id.id, which in this case would be the string "id". The next example tries to be less confusing and shows how window.url can be defined by markup and then have its value executed by a single assignment. There's no reason why someone would write code like that, but it works.

<a id="url" href="javascript:alert(1)"></a>
<script>
// url resolves to the anchor, which stringifies to its href
location = url;
</script>
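The ten-button form above can also be checked mechanically. The following sketch reports which id or name values of a set of controls collide with property names on some reference object. findShadowedNames and its parameters are hypothetical names, and using HTMLFormElement.prototype as the reference object is an assumption about what counts as a "built-in" property.

```javascript
// Sketch: report id/name values that collide with built-in property names.
// `controls` is any iterable of elements, `builtins` any object carrying
// the property names to protect.
function findShadowedNames(controls, builtins) {
  const collisions = [];
  for (const control of controls) {
    for (const attr of ['id', 'name']) {
      const key = control.getAttribute(attr);
      // `in` also sees inherited names such as id, title or innerHTML.
      if (key && key in builtins && !collisions.includes(key)) {
        collisions.push(key);
      }
    }
  }
  return collisions;
}
```

In a browser the call could look like findShadowedNames(Element.prototype.querySelectorAll.call(form, '[id], [name]'), HTMLFormElement.prototype). The controls are deliberately not read from form.elements, because a control with the ID elements shadows exactly that property.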

Conclusion already?

Altogether this doesn't seem to be very interesting from a security point of view. The juicy properties can't be overwritten, and an attacker has to be able to inject form elements with IDs, which most WYSIWYG implementations allow, by the way. What makes the issue even more boring is the fact that a variable that has already been set before the markup is parsed won't be reset by the injected HTML elements.

But maybe you noticed that one important browser is missing from the list above: Internet Explorer, of course. IE6 up to IE8 Beta 2 don't care whether properties like location or document should be protected from being set via markup and IDs. So, incredible but true, the following code works perfectly in all tested IE versions.

<form id="document" cookie="foo"></form>
<script>alert(document.cookie)</script>

Or:

<form id="location" href="bar"></form>
<script>alert(location.href)</script>

It's also possible to interfere with really important properties like document.cookie, document.body.innerHTML and almost all others I tested. The technique doesn't seem to depend on the doctype or any other factor. Furthermore, you can define your own attributes and make their values accessible via traversal, as in the document.cookie example.

<form id="document">
<select id="body">bar</select>
</form>
<script>
alert(document.body.innerHTML)
</script>
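A script that has to run in such an environment could at least check whether the object it is about to use is a real document. The sketch below relies on the fact that Document nodes report a nodeType of 9, while a form answering to the name document is an element. isRealDocument is a hypothetical helper, and how the affected IE versions resolve nodeType on a shadowed document is an assumption I have not verified.

```javascript
// Sketch: check that `doc` is a real Document node and not an element
// that merely answers to the name "document".
const DOCUMENT_NODE = 9; // nodeType value of Document nodes

function isRealDocument(doc) {
  // Strict comparison on purpose: an injected attribute such as
  // nodeType="9" would yield the string "9", not the number 9.
  return Boolean(doc) && doc.nodeType === DOCUMENT_NODE;
}
```

A caller would test isRealDocument(document) before trusting values such as document.cookie.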

Scripts used on millions of pages, like Google Analytics, work with those properties and are usually included right before the closing body tag. Depending on where the markup containing the malicious attributes and IDs can be injected, it's at least possible to influence the JavaScript application flow or, in the worst case, to execute arbitrary code nested in the HTML attributes. If an application allows users to post inactive HTML, it's very important to make sure the submitted elements don't contain any IDs when they are rendered. In some cases it may make sense to initially assign the properties to themselves, thereby blocking them from being overridden by markup.
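Stripping the IDs could look roughly like the following sketch. It assumes the untrusted markup has already been parsed into a container element (root) that is not yet part of the page. stripIdentifiers is a made-up name, and removing name in addition to id is my own precaution, since name attributes expose forms and images in a similar way.

```javascript
// Sketch: remove the attributes that expose elements as named properties.
// `root` is a container holding already-parsed, untrusted markup.
// Returns the number of attributes that were removed.
function stripIdentifiers(root, attributes = ['id', 'name']) {
  const selector = attributes.map((attr) => '[' + attr + ']').join(', ');
  let removed = 0;
  // querySelectorAll only matches descendants, not root itself.
  for (const el of root.querySelectorAll(selector)) {
    for (const attr of attributes) {
      if (el.hasAttribute(attr)) {
        el.removeAttribute(attr);
        removed++;
      }
    }
  }
  return removed;
}
```

In a browser, root could be the body of a document created with new DOMParser().parseFromString(html, 'text/html'). Custom attributes like cookie="foo" stay in place here, but without an ID on the form there is no name left to reach them by.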