 
 Seven Deadliest Web Application Attacks 
 By Mike Shema 
 SYNGRESS 
Copyright © 2010 Elsevier Inc.
All rights reserved.
 ISBN: 978-1-59749-544-8 
    Chapter One 
  Cross-Site Scripting  
  
  INFORMATION IN THIS CHAPTER  
   Understanding HTML Injection  
   Employing Countermeasures  
  
When the Spider invited the Fly into his parlor, the Fly at first declined with the wariness of prey confronting its predator. The Internet is rife with traps, murky corners, and malicious hosts that make casually surfing random Web sites a dangerous proposition. Some areas are, if not obviously dangerous, at least highly suspicious. Web sites offering warez (pirated software), free porn, or pirated music tend to be laden with viruses and malicious software waiting for the next insecure browser to visit.
These Spiders' parlors also exist at sites typically assumed to be safe: social networking, well-established online shopping, Web-based e-mail, news, sports, entertainment, and more. Although such sites do not encourage visitors to download and execute untrusted, virus-laden programs, they do serve content to the browser. The browser blindly executes this content, a mix of Hypertext Markup Language (HTML) and JavaScript, to perform all sorts of activities. If you're lucky, the browser shows the next message in your inbox or displays the current balance of your bank account. If you're really lucky, the browser isn't siphoning your password to a server in some other country or executing money transfers in the background.
In October 2005, a user logged in to MySpace and checked out someone else's profile. The browser, executing JavaScript code it encountered on the page, automatically updated the user's own profile to declare someone named Samy their hero. Then a friend viewed that user's profile and agreed on his own profile that Samy was indeed "my hero." Then another friend, who had neither heard of nor met Samy, visited MySpace and added the same declaration. This pattern continued with such explosive growth that 24 hours later, Samy had over one million friends, and MySpace was melting down from the traffic. Samy had crafted a cross-site scripting (XSS) attack that, with approximately 4,000 characters of text, caused a denial of service against a company whose servers numbered in the thousands and whose valuation at the time hovered around $500 million. The attack also enshrined Samy as the reference point for the mass effect of XSS. (An interview with the creator of Samy can be found at http://blogoscoped.com/archive/2005-10-14-n81.html.)
How often have you encountered a prompt to reauthenticate to a Web site? Have you used Web-based e-mail? Checked your bank account online? Sent a tweet? Friended someone? There are examples of XSS vulnerabilities in Web sites offering every one of these services.
XSS isn't always so benign that it acts merely as a nuisance for the user. (Taking down a Web site is more than a nuisance for the site's operators.) It is also used to download keyloggers that capture banking and online gaming credentials. It is used to capture browser cookies to access victims' accounts without the need for a username or password. In many ways, it serves as the stepping stone for very simple yet very dangerous attacks against anyone who uses a Web browser.
  
  UNDERSTANDING HTML INJECTION  
XSS can be more generally, although less excitingly, described as HTML injection. The more popular name belies the fact that successful attacks need not cross sites or domains and need not consist of JavaScript to be effective.
An XSS attack rewrites the structure of a Web page or executes arbitrary JavaScript within the victim's Web browser. This occurs when a Web site takes some piece of information from the user (an e-mail address, a user ID, a comment to a blog post, a zip code, and so on) and displays the information in a Web page. If the Web site is not careful, then the meaning of the HTML document can be disrupted by a carefully crafted string.
For example, consider the search function of an online store. Visitors to the site are expected to search for their favorite book, movie, or pastel-colored squid pillow, and if the item exists, they purchase it. If the visitor searches for DVD titles that contain living dead, the phrase might show up in several places in the HTML source. Here, it appears in a meta tag:
    <meta name="description" content="Cheap DVDs. Search results for living dead" />
    <meta name="keywords" content="dvds,cheap,prices" /><title>
  
The phrase may also be displayed for the visitor at the top of the search results, and again near the bottom of the HTML inside a script element that creates an ad banner:
    matches for "<span id="ctl00_body_ctl00_lblSearchString">living dead</span>"
    ... lots of HTML here ...
    <script type="text/javascript"><!--
      ggl_ad_client = "pub-6655321";
      ggl_ad_width = 468;
      ggl_ad_height = 60;
      ggl_ad_format = "468x60_as";
      ggl_ad_channel = "";
      ggl_hints = "living dead";
    //-->
    </SCRIPT>
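To make the three reflection points concrete, here is a minimal Node.js sketch of how such a page might be assembled by plain string concatenation. The renderSearchPage function is hypothetical, and its markup is trimmed from the fragments above rather than taken from the store's real code.

```javascript
// Hypothetical sketch: a search results page that reflects the raw query in
// three different contexts without any encoding (the vulnerable pattern).
function renderSearchPage(phrase) {
  return [
    // Context 1: inside a double-quoted HTML attribute value
    '<meta name="description" content="Cheap DVDs. Search results for ' +
      phrase + '" />',
    // Context 2: inside the text content of an element
    'matches for "<span id="ctl00_body_ctl00_lblSearchString">' +
      phrase + '</span>"',
    // Context 3: inside a double-quoted JavaScript string literal
    '<script type="text/javascript">',
    '  ggl_hints = "' + phrase + '";',
    "</script>",
  ].join("\n");
}

console.log(renderSearchPage("living dead"));
console.log(renderSearchPage('living dead"'));
```

The same untrusted value lands in all three places, but each context is terminated by different characters, which is why an attack has to be shaped differently for each location.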
  
XSS comes into play when the visitor can use characters normally reserved for HTML markup as part of the search query. Imagine that the visitor appends a double quote (") to the phrase. Compare how the browser renders the results of the two different queries in each of the windows in Figure 1.1.
Note that the first search matched several titles in the site's database, but the second search reported "No matches found" and displayed some guesses for a close match. This happened because living dead" (with the quote) was included in the database query, and no titles ended with a quote. Examining the HTML source of the response confirms that the quote was preserved:
    matches for "<span id="ctl00_body_ctl00_lblSearchString">living dead"</span>"
If the Web site will echo anything we type in the search box, what might happen if a more complicated phrase were used? Figure 1.2 shows what happens when JavaScript is entered directly into the search.
By breaking down the search phrase, we can see how the page was rewritten to convey a very different message to the Web browser than the Web site's developers intended. HTML is a set of grammar and syntax rules that tell the browser how to interpret pieces of the page. The browser's structured representation of the rendered page is referred to as the Document Object Model (DOM). The quotes and angle brackets enabled the attacker to change the page's grammar and add a script element with code that launched a pop-up window. This happened because the phrase was placed directly in line with the rest of the HTML content.
    matches for "<span id="ctl00_body_ctl00_lblSearchString">
      living dead<script>alert("They're coming to get you, Barbara.")
      </SCRIPT></span>"
  
Instead of displaying <script>alert ... as text, as it does for the words living dead, the browser treats the <script> tag as the beginning of a code block and executes it. Consequently, the attacker is able to change the content of the Web page arbitrarily by manipulating the DOM.
Before we delve too deeply into what an attack might look like, let's see what happens to the phrase when it appears in the meta tag and ad banner. Here is the meta tag when the phrase living dead" is used:
    <meta name="description" content="Cheap DVDs. Search results for living dead&quot;" />

The quote character has been rewritten to its HTML-encoded version, &quot;, which browsers know to display as the " symbol. This encoding preserves the syntax of the meta tag and the DOM in general. Otherwise, the syntax of the meta tag would have been slightly different:
    <meta name="description" content="Cheap DVDs. Search results for living dead"" />
  
This lands an innocuous pair of quotes inside the element, and most browsers will be able to recover from the apparent typo. On the other hand, if the search phrase is echoed verbatim in the meta element's content attribute, then the attacker has a delivery point for an XSS payload:
    <meta name="description" content="Cheap DVDs. Search results for living dead"/>
    <script>alert("They're coming to get you, Barbara.")</SCRIPT>
    <meta name="" />
  
Here's a more clearly annotated version of the XSS payload. Note how the syntax and grammar of the HTML page have been changed. The first meta element is properly closed, a script element follows, and a second meta element is added to maintain the validity of the HTML.
    <meta name="description" content="Cheap DVDs. Search results for
      living dead"/>          close the content attribute with a quote, close the meta element with />
    <script> ... </SCRIPT>    add some arbitrary JavaScript
    <meta name="              create an empty meta element to prevent the browser from displaying
                              the dangling "/> from the original <meta description ... element
    " />
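The annotated payload, and the effect of the HTML encoding described earlier, can be reproduced in a few lines of Node.js. This is a sketch under stated assumptions: renderMeta and encodeHtml are hypothetical helpers that mirror the meta fragment above, not the store's actual code, and a real site would more likely rely on its template engine's built-in escaping.

```javascript
// Encode the characters that are significant in HTML text and attribute values.
// "&" is replaced first so the "&" introduced by later entities is not re-encoded.
function encodeHtml(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Hypothetical template for the description meta element. When encode is
// false, the phrase is echoed verbatim, as in the vulnerable page above.
function renderMeta(phrase, encode) {
  const shown = encode ? encodeHtml(phrase) : phrase;
  return '<meta name="description" content="Cheap DVDs. Search results for ' +
    shown + '" />';
}

// The three-part payload from the annotated listing: '"/>' closes the attribute
// and element, a script element carries the code, and '<meta name="' absorbs
// the dangling '" />' left over from the original element.
const payload =
  'living dead"/>\n<script>alert("They\'re coming to get you, Barbara.")</SCRIPT>\n<meta name="';

console.log(renderMeta(payload, false)); // payload escapes the content attribute
console.log(renderMeta(payload, true));  // payload stays inside content="..."
```

With encoding applied, every quote and angle bracket in the payload becomes a character reference, so the browser sees one long, harmless attribute value instead of three elements.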
  
The ggl_hints parameter in the ad banner script element can be similarly manipulated. In this case, however, the payload already appears inside a script element, so the attacker needs only to insert valid JavaScript code to exploit the Web site. No new elements need to be added to the DOM for this attack. Even if the developers had been savvy enough to blacklist <script> tags or any element with angle brackets, the attack would still have succeeded.
    <script type="text/javascript"><!--
      ggl_ad_client = "pub-6655321";
      ggl_ad_width = 468;
      ggl_ad_height = 60;
      ggl_ad_format = "468x60_as";
      ggl_ad_channel = "";
      ggl_hints = "living dead";   close the ggl_hints string with ";
    ggl_ad_client="pub-attacker";  override the ad_client to give the attacker credit
    function nefarious() { }       perhaps add some other function
    foo="                          create a dummy variable to catch the final ";
    ";
    //-->
    </SCRIPT>
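A short Node.js sketch shows why blacklisting angle brackets does nothing against this payload, and what encoding suited to a JavaScript string might look like instead. naiveBlacklist and toJsStringLiteral are hypothetical helpers written for illustration; the JSON.stringify approach is one common technique, not the site's or the author's prescribed fix.

```javascript
// Hypothetical blacklist filter of the kind the text warns about:
// it deletes <script> tags and then any remaining angle brackets.
function naiveBlacklist(input) {
  return input.replace(/<\/?script[^>]*>/gi, "").replace(/[<>]/g, "");
}

// The ggl_hints payload from the annotated example contains no angle
// brackets at all, so the filter passes it through unchanged.
const hintsPayload =
  'living dead"; ggl_ad_client="pub-attacker"; function nefarious() { } foo="';
const filtered = naiveBlacklist(hintsPayload);
console.log(filtered === hintsPayload); // true

// Emit the value as a JavaScript string literal instead. JSON.stringify escapes
// quotes and backslashes; replacing "<" with \u003c also keeps a "</script>"
// inside the value from ending the script element early.
function toJsStringLiteral(value) {
  return JSON.stringify(String(value)).replace(/</g, "\\u003c");
}

console.log("ggl_hints = " + toJsStringLiteral(hintsPayload) + ";");
```

HTML character references such as &quot; are not decoded inside a script element, so the encoding used for the meta tag is the wrong tool here; the value needs escaping that follows JavaScript string rules.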
  
Each of the previous examples demonstrated an important aspect of XSS attacks: the location on the page where the payload is echoed influences which characters are necessary to implement the attack. In some cases, new elements can be created, such as <script> or <iframe>. In other cases, an element's attribute might be modified. If the payload shows up within a JavaScript variable, then the payload need only consist of code.
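The relationship between the echo location and the characters an attacker needs can be summarized as a lookup table. This is a deliberately simplified Node.js sketch: the context names, the CONTEXT_TERMINATORS table, and the findBreakouts helper are hypothetical, and the terminator lists are not exhaustive (real HTML and JavaScript parsers have more edge cases).

```javascript
// Simplified map from output context to the characters (or sequences)
// that can end that context and let attacker-supplied text take over.
const CONTEXT_TERMINATORS = {
  htmlText: ["<"],                 // start a new tag such as <script> or <iframe>
  doubleQuotedAttribute: ['"'],    // close the attribute value
  jsDoubleQuotedString: ['"', "\\", "</script"], // close the string, escape its quote, or end the element
};

// Return the terminators for the given context that appear in the input.
function findBreakouts(context, input) {
  const terminators = CONTEXT_TERMINATORS[context];
  if (!terminators) throw new Error("unknown context: " + context);
  const lowered = input.toLowerCase(); // "</SCRIPT>" ends a script just like "</script>"
  return terminators.filter((t) => lowered.includes(t));
}

console.log(findBreakouts("htmlText", 'living dead"'));              // []
console.log(findBreakouts("doubleQuotedAttribute", 'living dead"')); // [ '"' ]
```

The same input, living dead" with its trailing quote, is inert as element text but terminates a quoted attribute, which matches what the search page examples showed.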
Pop-up windows are a trite example of XSS. More vicious payloads have been demonstrated to
  Steal cookies so attackers can impersonate victims without having to steal passwords
   Spoof login prompts to steal passwords (attackers like to cover all the angles)  
   Capture keystrokes for banking, e-mail, and game Web sites  
   Use the browser to port scan a local area network  
   Surreptitiously reconfigure a home router to drop its firewall  
   Automatically add random people to your social network  
   Lay the groundwork for a cross-site request forgery (CSRF) attack  
  
Regardless of what the actual payload is trying to accomplish, all forms of the XSS attack rely on the ability of a user-supplied bit of information to be rendered in the site's Web page in a way that modifies the DOM structure. Keep in mind that changing the HTML means that the Web site is merely the penultimate victim of the attack. The Web site acts as a broker that carries the payload from the attacker to the Web browser of anyone who visits it.
Alas, this chapter is far too brief to provide a detailed investigation of all XSS attack techniques. One technique deserves mention alongside the focus on inserting JavaScript code and creating HTML elements, although it is addressed here only briefly: Cascading Style Sheets (CSS). Cascading Style Sheets, abbreviated CSS and not to be confused with this attack's abbreviation, control the layout of a Web site for various media. A Web page could be resized or modified depending on whether it's being rendered in a browser, displayed on a mobile phone, or sent to a printer. Clever use of CSS can attain many of the same outcomes as a JavaScript-based attack. In 2006, MySpace suffered a CSS-based attack that tricked victims into divulging their passwords (www.caughq.org/advisories/CAU-2006-0001.txt). Other detailed examples can be found at http://p42.us/css/.
  
  Identifying Points of Injection  
The Web browser is not to be trusted. Links and form fields may be the obvious sources of attack, yet all data from the Web browser should be considered tainted. A value that isn't evident to the user, such as the User-Agent header that identifies every type of browser, can still be modified by a malicious user. If the Web application uses some piece of information from the browser, then that information is a potential injection point, regardless of whether the value is assumed to be supplied manually by a human or automatically by the browser.
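A header that no user ever types is still attacker-controlled, since any HTTP client can set it to any value. The following minimal Node.js sketch assumes a hypothetical renderBrowserNotice helper that receives a request's headers object (shaped as Node's http module provides it, with lowercase header names) and echoes the User-Agent without encoding.

```javascript
// Hypothetical page fragment built from a request header.
function renderBrowserNotice(headers) {
  const ua = headers["user-agent"] || "unknown browser";
  // Vulnerable: the header value is treated as trusted markup.
  return "<p>You appear to be using " + ua + ".</p>";
}

// A browser sends an ordinary value; a script or intercepting proxy can send anything.
const normal = { "user-agent": "Mozilla/5.0 (X11; Linux x86_64)" };
const forged = { "user-agent": "<script>alert(document.cookie)</script>" };

console.log(renderBrowserNotice(normal));
console.log(renderBrowserNotice(forged)); // the forged header becomes a script element
```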
  
  Uniform Resource Identifier Components  
Any portion of the Uniform Resource Identifier (URI) can be manipulated for XSS. Directory names, file names, and parameter name/value pairs will all be interpreted by the Web server in some manner. The URI parameters may be the most obvious area of concern; we've already seen what may happen if the search parameter contains an XSS payload. Yet the URI is dangerous even when it is invalid, points to a nonexistent page, or has no bearing on the Web site's logic. If the Web site echoes the link in a page, then it has the potential to be exploited. For example, a site might display the URI if it can't find the location the link was pointing to:
          Oops! We couldn't find [UNABLE TO REPRODUCE CHARACTER STRING IN ASCII].        Please return to our [UNABLE TO REPRODUCE CHARACTER STRING IN ASCII]
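The error page described above can be sketched as follows, assuming a Node.js handler. renderNotFound, the store.example host, and the sample link are hypothetical; the point is that a requested path becomes page content just as a search parameter does.

```javascript
// Encode characters that are significant in HTML text.
function encodeHtml(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Hypothetical "not found" handler. The WHATWG URL parser keeps "<" and ">"
// percent-encoded in the pathname, but many applications decode the path
// before using it, which turns %3Cscript%3E back into <script>.
function renderNotFound(requestUrl, encode) {
  const path = decodeURIComponent(new URL(requestUrl, "http://store.example").pathname);
  const shown = encode ? encodeHtml(path) : path;
  return "Oops! We couldn't find " + shown + ".";
}

const link = "/dvds/%3Cscript%3Ealert(1)%3C/script%3E";
console.log(renderNotFound(link, false)); // vulnerable: echoes a script element
console.log(renderNotFound(link, true));  // encoded: the path is displayed as text
```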