This is volume 78 of Advances in Computers. This series, which began publication in 1960, is the oldest continuously published anthology that chronicles the ever-changing information technology field. In these volumes we publish from 5 to 7 chapters, three times per year, that cover the latest changes to the design, development, use, and implications of computer technology on society today.
- Covers the full breadth of innovations in hardware, software, theory, design, and applications.
- Many of the in-depth reviews have become standard references that continue to be of significant, lasting value in this rapidly expanding field.
Advances in Computers, Volume 78: Improving the Web
Academic Press. Copyright © 2010 Elsevier Inc.
All rights reserved.
Chapter One
Search Engine Optimization—Black and White Hat Approaches
ROSS A. MALAGA
Management and Information Systems, School of Business, Montclair State University, Montclair, New Jersey, USA
Abstract Today the first stop for many people looking for information or to make a purchase online is one of the major search engines. So appearing toward the top of the search results has become increasingly important. Search engine optimization (SEO) is a process that manipulates Web site characteristics and incoming links to improve a site's ranking in the search engines for particular search terms. This chapter provides a detailed discussion of the SEO process. SEO methods that stay within the guidelines laid out by the major search engines are generally termed "white hat," while those that violate the guidelines are called "black hat." Black hat sites may be penalized or banned by the search engines. However, many of the tools and techniques used by "black hat" optimizers may also be helpful in "white hat" SEO campaigns. Black hat SEO approaches are examined and compared with white hat methods.
1. Introduction
2. Background
   2.1. Search Engines History and Current Statistics
   2.2. SEO Concepts
3. The SEO Process
   3.1. Keyword Research
   3.2. Indexing
   3.3. On-Site Optimization
   3.4. Link Building
4. Black Hat SEO
   4.1. Black Hat Indexing Methods
   4.2. On-Page Black Hat Techniques
   4.3. Cloaking
   4.4. Doorway Pages
   4.5. Content Generation
   4.6. Link Building Black Hat Techniques
   4.7. Negative SEO
5. Legal and Ethical Considerations
   5.1. Copyright Issues
   5.2. SEO Ethics
   5.3. Search Engine Legal and Ethical Considerations
6. Conclusions
   6.1. Conclusions for Site Owners and SEO Practitioners
   6.2. Future Research Directions
References
The past few years have seen a tremendous growth in the area of search engine marketing (SEM). SEM includes paid search engine advertising and search engine optimization (SEO). According to the Search Engine Marketing Professional Organization (SEMPO), search engine marketers spent over $13.4 billion in 2008. In addition, this figure is expected to grow to over $26 billion by 2013. Of the $13.4 billion spent on SEM, about 10% ($1.4 billion) was spent on SEO.
Paid search ads are the small, usually text-based, advertisements that appear alongside the query results on search engine sites (see Fig. 1). Paid search engine advertising usually works on a pay-per-click (PPC) basis. SEO is a process that seeks to achieve a high ranking in the search engine results for certain search words or phrases. The main difference between SEO and PPC is that with PPC, the merchant pays for every click. With SEO each click is free (although the Web site owner may pay a considerable amount to achieve the high ranking). In addition, recent research has shown that users trust the SEO (also called organic) results and are more likely to purchase from them.
Industry research indicates that most search engine users click only on sites that appear on the first page of the search results—basically the top 10 results. Very few users click beyond the third page of search results. These findings confirm the research conducted by Granka et al., who found that almost 80% of the clicks on a search engine results page went to the sites listed in the first three spots.
SEO has become a very big business. Some of the top optimizers and SEO firms regularly charge $20,000 or more per month for ongoing optimization. It is not uncommon for firms with large clients to charge them $150,000 or more on a monthly basis.
Because of the importance of high search engine rankings and the profits involved, search engine optimizers look for tools, methods, and techniques that will help them achieve their goals. Some focus their efforts on methods aimed at fooling the search engines. These optimizers are considered "black hat," while those that closely follow the search engine guidelines would be considered "white hat." There are two main reasons why it is important to understand the methods employed by black hat optimizers. First, some black hats have proven successful in achieving high rankings. When these rankings are achieved, it means that white hat sites are pushed lower in the search results. However, in some cases these rankings might prove fleeting and there are mechanisms in place to report such sites to the search engines. Second, some of the tools and methods used by black hat optimizers can actually be used by white hat optimizers. In many cases, it is just a matter of scope and scale that separates black and white hat.
While there are some studies dealing with SEO, notably Refs. [6–9], academic research in the area of SEO has been relatively scant given its importance in the online marketing field. This chapter combines that academic work with the extensive practitioner information available, much of which comes in the form of blogs, forum discussions, anecdotes, and Web sites.
The remainder of this chapter proceeds as follows. Section 2 provides a background on search engines in general and basic SEO concepts. After that, a detailed discussion of the SEO process follows, including keyword research, indexing, on-site factors, and linking. The section after that focuses on black hat SEO techniques. Legal and ethical implications of SEO are then discussed. Finally, implications for management, conclusions, and future research directions are detailed.
A search engine is simply a database of Web pages, a method for finding Web pages and indexing them, and a way to search the database. Search engines rely on spiders—software that follows hyperlinks—to find new Web pages to index and to ensure that pages already indexed are kept up to date.
Although more complex searches are possible, most Web users conduct simple searches on a keyword or key phrase. Search engines return the results of a search based on a number of factors. All of the major search engines consider the relevance of the search term to sites in their databases when returning search results. So, a search for the word "car" would return Web pages that had something to do with automobiles. The exact algorithms used to determine relevance are constantly changing and are a trade secret.
2.1 Search Engines History and Current Statistics
The concept of optimizing a Web site so that it appears toward the top of the results when somebody searches on a particular word or term has existed since the mid-1990s. Back then the search engine landscape was dominated by about 6–10 companies, including AltaVista, Excite, Lycos, and Northern Light. At that time, SEO largely consisted of keyword stuffing: adding the search term to the Web site numerous times. A typical trick was repeating the search term hundreds of times in white letters on a white background. Thus, the search engines would "see" the text, but a human user would not.
The search engine market and SEO have changed dramatically over the past few years. The major shift has been the rise and dominance of Google. Google currently handles more than half of all Web searches. The other major search engines used in the United States are Yahoo and MSN. Combined, these three search engines are responsible for over 91% of all searches. It should be noted that at the time this chapter was written, Microsoft had just released Bing.com as its main search engine.
The dominance of the three major search engines (and Google in particular), combined with the research on user habits, means that for any particular search term, a site must appear in the top 30 spots on at least one of the search engines or it is effectively invisible. So, for a given term, for example "toyota corolla," there are only 90 spots available overall. In addition, 30 of those spots (the top 10 in each search engine) are highly coveted, and the top 10 spots in Google are extremely important.
2.2 SEO Concepts
Curran states, "search engine optimization is the process of improving a website's position so that the webpage comes up higher in the search results [search engine results page (SERP)] of major search engines" (p. 202). This process includes manipulation of dozens or even hundreds of Web site elements. For example, some of the elements used by the major search engines to determine relevance include, but are not limited to: age of the site, how often new content is added, the ratio of keywords or terms to the total amount of content on the site, and the quality and number of external sites linking to the site.
3. The SEO Process
In general, the process of SEO can be broken into four main steps: (1) keyword research, (2) indexing, (3) on-site optimization, and (4) off-site optimization.
3.1 Keyword Research
A search engine query is basically just a word or phrase. It is the results returned for a specific word or phrase that are of interest to search engine optimizers. The problem is that there are usually many words or phrases that can be used for a particular search. For example, if a user was looking to purchase a car—say a Toyota Prius—she might use any of the following words or phrases in her search:
New Toyota Prius
Toyota Prius New York City
Toyota Prius NYC
NYC Toyota Prius
It is easy to see that this list can keep going. In terms of SEO, which term or terms should we try to optimize our site for?
Keyword research consists of building a large list of relevant search words and phrases and then comparing them along three main dimensions. First, we need to consider the number of people who are using the term in the search engines. After all, why optimize for a term that nobody (or very few people) use? Fortunately, Google now makes search volume data available via its external keyword tool (available at https://adwords.google.com/select/KeywordToolExternal). Simply type the main keywords and terms and click on Get Keyword Ideas. Google will generate a large list of relevant terms and provide the approximate average search volume (see Fig. 2).
Clearly, we are looking for terms with a comparatively high search volume. So, for example, we can start building a keyword list with:
Many search engine optimizers also consider simple misspellings. For example, we can add the following to our list:
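Collecting misspellings by hand does not scale to lists with thousands of terms, so they are often derived mechanically from single-character edits. The sketch below is an illustrative assumption about how such variants could be generated, not a method described in this chapter:

```python
def misspellings(term):
    """Generate simple single-edit misspellings of a keyword:
    dropped letters and adjacent-letter transpositions."""
    variants = set()
    for i in range(len(term)):
        # Drop one character, e.g. "prius" -> "pius".
        dropped = term[:i] + term[i + 1:]
        if len(dropped) > 2:
            variants.add(dropped)
    for i in range(len(term) - 1):
        # Swap adjacent characters, e.g. "prius" -> "pruis".
        variants.add(term[:i] + term[i + 1] + term[i] + term[i + 2:])
    variants.discard(term)  # a swap of identical letters is not a misspelling
    return sorted(variants)

print(misspellings("prius"))
```

Each generated variant would still need to be checked for real search volume; most single-edit typos are searched rarely or not at all.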
Once we have generated a large list of keywords and phrases (most optimizers generate lists with thousands of terms), the second phase is to determine the level of competition for each term. To determine the level of competition, simply type the term into Google and see how many results are reported at the top right of the page (see Fig. 3).
To compare keyword competition, optimizers determine the results-to-search (R/S) ratio. The R/S ratio is calculated by simply dividing the number of results (competitors) by the number of searches over a given period of time. On this scale lower numbers are better. So, we might end up with a list like that in Table I.
Comparing R/S ratios is more effective than just looking at how many people are searching for a particular word or phrase because it incorporates the level of competition. In general, optimizers want to target terms that are highly searched and have a low level of competition. However, the R/S ratio can also reveal terms that have a relatively low level of searches but a very low level of competition. For instance, Table I shows that a misspelling of "hybrid" has a lower search volume than many of the other terms. However, when the competition is also considered via the R/S ratio, the misspelled word appears to be a good potential target for SEO.
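The R/S comparison is straightforward to automate once search volumes and result counts have been collected. In the sketch below, the keyword figures are made-up placeholders for illustration, not the values from Table I:

```python
# Hypothetical (monthly searches, competing results) per keyword;
# these numbers are illustrative, not data from Table I.
keywords = {
    "toyota prius":     (150_000, 12_000_000),
    "new toyota prius": (22_000,   3_500_000),
    "toyota prius nyc": (4_500,      400_000),
}

def rs_ratio(results, searches):
    """Results-to-search ratio: competitors per search. Lower is better."""
    return results / searches

# Rank keywords from the most to the least attractive target.
ranked = sorted(keywords.items(),
                key=lambda kv: rs_ratio(kv[1][1], kv[1][0]))

for term, (searches, results) in ranked:
    print(f"{term:20s} R/S = {rs_ratio(results, searches):7.1f}")
```

Here the highest-volume term also happens to have the best (lowest) R/S ratio, but with real data a low-volume, low-competition term can easily come out on top, which is exactly the situation the misspelling example illustrates.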
The third factor to consider, at least in most cases, is the commercial viability of the term. To determine commercial viability we must understand a bit about consumer buying behavior. The traditional consumer purchase model consists of five phases: (1) need recognition, (2) information search, (3) option evaluation, (4) purchase decision, and (5) postpurchase behavior.
Once a potential consumer becomes aware of a need, she begins to search for more information about how to fulfill that need. For example, a person who becomes aware of a need for a car would begin by gathering some general information, researching SUVs, trucks, sedans, and so on. At this point the consumer does not even know what type of vehicle she wants. She might use search terms like "how to buy a car" or "types of cars." Since the consumer does not know what type of car she wants at this point, these terms would be considered to have low commercial viability.
In the next phase, the consumer begins to narrow down the choices and evaluate the various options. Some exemplar search terms in this phase might include "car review," "SUV recommendation," and "best cars." These terms have a bit more commercial viability, but would still not be considered highly viable.
During the fourth phase, consumers have made a choice and are now just looking for where to purchase, comparing options like price, warranties, service, and trust. At this point the search terms become much more specific. For example, the consumer might use terms like "Toyota Prius 2009," "prius best price," and "new jersey Toyota dealer." Since the consumer is ready to purchase, these terms are considered to have high commercial viability.
A good optimizer will actually target multiple terms—some for the site's homepage and some for the internal pages of the site. For instance, the site for a Toyota dealer in Montclair, New Jersey might use "New Jersey Toyota Dealer" as the main SEO target for the homepage. The same site might use "Toyota Prius 2009 Best Price" for an internal page that lists the features of that car and the details of the vehicles on the lot.
Clearly, determining commercial viability is a combination of art and science; it requires the optimizer to think like a consumer. Microsoft researchers have studied this area, breaking searches into three main categories: navigational, informational, and transactional. In addition, queries are categorized as either commercial or noncommercial based on the nature of the search term used, resulting in the 3 × 2 grid shown in Table II. For example, terms that include words such as "buy," "purchase," or "price" would be considered commercial in nature. The researchers determined the commercial/noncommercial categorization by asking human reviewers to rate various terms along those dimensions. Obviously, this approach has serious limitations. However, Microsoft has developed a Detecting Commercial Online Intent tool, which is available at http://adlab.microsoft.com/Online-Commercial-Intention/. Many optimizers use this site to gauge the commercial viability and search type of their keywords.
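The lexical signal described above (words such as "buy," "purchase," or "price" marking a query as commercial) can be sketched as a crude first-pass filter. The word list and function name below are assumptions for illustration only; Microsoft's actual classifier relied on human ratings and statistical modeling, not a simple lookup:

```python
# Illustrative purchase-oriented marker words; not Microsoft's actual list.
COMMERCIAL_MARKERS = {"buy", "purchase", "price", "cheap", "deal", "discount"}

def is_commercial(query):
    """Crude lexical test: does the query contain a purchase-oriented word?"""
    return any(word in COMMERCIAL_MARKERS for word in query.lower().split())

print(is_commercial("prius best price"))   # a transactional-phase query
print(is_commercial("how to buy a car"))   # contains "buy" but is informational
print(is_commercial("types of cars"))      # early information search
```

The second example shows the limitation of a pure word-list approach: "how to buy a car" is flagged as commercial even though, as discussed above, it belongs to the low-viability information search phase.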
Finally, some optimizers have attempted to capture the consumer earlier in the process, during the option evaluation phase in particular. This is typically accomplished by developing review and recommendation sites. There are, to date, no reliable data on how well these types of sites perform in terms of moving the visitor from the information search phase to the transactional phase.
Table of Contents
- Search Engine Optimization – Black and White Hat Approaches by Ross A. Malaga
- Web Searching and Browsing: A Multilingual Perspective by Wingyan Chung
- Features for Content-Based Audio Retrieval by Dalibor Mitrovic, Matthias Zeppelzauer and Christian Breiteneder
- Multimedia Services over Wireless Metropolitan Area Networks by Kostas Pentikousis, Jarno Pinola, Esa Piri, Pedro Neves, and Susana Sargento
- An Overview of Web Effort Estimation by Emilia Mendes
- Communication Media Selection for Remote Interaction of Ad Hoc Groups by Fabio Calefato and Filippo Lanubile