Data Mining the Web: Uncovering Patterns in Web Content, Structure, and Usage / Edition 1

Overview

This book introduces the reader to methods of data mining on the web, including uncovering patterns in web content (classification, clustering, language processing), structure (graphs, hubs, metrics), and usage (modeling, sequence analysis, performance).

Editorial Reviews

From the Publisher
"…it has to be noted that this book is an excellent resource for conducting Web mining lectures or single units within Data mining class. The data can be used for small as well as quite comprehensive business intelligence projects. The book's content is easy to access; even students with very basic statistical skills can get the flavor of the intriguing aspects of Web mining." (Journal of Statistical Software, April 2008)

"…highlight[s] the exciting research related to data mining the Web…a detailed summary of the current state of the art." (CHOICE, December 2007)

"I can say I really enjoyed reading this book…a great educational resource for students and teachers." (Information Retrieval, 2008)

Meet the Authors

Zdravko Markov, PhD, is Associate Professor of Computer Science at Central Connecticut State University. The author of three textbooks, Dr. Markov teaches undergraduate and graduate courses in computer science and artificial intelligence. He is currently a Principal Investigator (PI) in a National Science Foundation–funded project designed to introduce machine learning to undergraduates.

Daniel T. Larose, PhD, is Professor of Statistics in the Department of Mathematical Sciences at Central Connecticut State University. He is the author of three data mining books and a forthcoming textbook in undergraduate statistics. He developed and directs CCSU's DataMining@CCSU programs.

Table of Contents

PREFACE.

PART I: WEB STRUCTURE MINING.

1 INFORMATION RETRIEVAL AND WEB SEARCH.

Web Challenges.

Web Search Engines.

Topic Directories.

Semantic Web.

Crawling the Web.

Web Basics.

Web Crawlers.

Indexing and Keyword Search.

Document Representation.

Implementation Considerations.

Relevance Ranking.

Advanced Text Search.

Using the HTML Structure in Keyword Search.

Evaluating Search Quality.

Similarity Search.

Cosine Similarity.

Jaccard Similarity.

Document Resemblance.

References.

Exercises.

2 HYPERLINK-BASED RANKING.

Introduction.

Social Networks Analysis.

PageRank.

Authorities and Hubs.

Link-Based Similarity Search.

Enhanced Techniques for Page Ranking.

References.

Exercises.

PART II: WEB CONTENT MINING.

3 CLUSTERING.

Introduction.

Hierarchical Agglomerative Clustering.

k-Means Clustering.

Probability-Based Clustering.

Finite Mixture Problem.

Classification Problem.

Clustering Problem.

Collaborative Filtering (Recommender Systems).

References.

Exercises.

4 EVALUATING CLUSTERING.

Approaches to Evaluating Clustering.

Similarity-Based Criterion Functions.

Probabilistic Criterion Functions.

MDL-Based Model and Feature Evaluation.

Minimum Description Length Principle.

MDL-Based Model Evaluation.

Feature Selection.

Classes-to-Clusters Evaluation.

Precision, Recall, and F-Measure.

Entropy.

References.

Exercises.

5 CLASSIFICATION.

General Setting and Evaluation Techniques.

Nearest-Neighbor Algorithm.

Feature Selection.

Naive Bayes Algorithm.

Numerical Approaches.

Relational Learning.

References.

Exercises.

PART III: WEB USAGE MINING.

6 INTRODUCTION TO WEB USAGE MINING.

Definition of Web Usage Mining.

Cross-Industry Standard Process for Data Mining.

Clickstream Analysis.

Web Server Log Files.

Remote Host Field.

Date/Time Field.

HTTP Request Field.

Status Code Field.

Transfer Volume (Bytes) Field.

Common Log Format.

Identification Field.

Authuser Field.

Extended Common Log Format.

Referrer Field.

User Agent Field.

Example of a Web Log Record.

Microsoft IIS Log Format.

Auxiliary Information.

References.

Exercises.

7 PREPROCESSING FOR WEB USAGE MINING.

Need for Preprocessing the Data.

Data Cleaning and Filtering.

Page Extension Exploration and Filtering.

De-Spidering the Web Log File.

User Identification.

Session Identification.

Path Completion.

Directories and the Basket Transformation.

Further Data Preprocessing Steps.

References.

Exercises.

8 EXPLORATORY DATA ANALYSIS FOR WEB USAGE MINING.

Introduction.

Number of Visit Actions.

Session Duration.

Relationship between Visit Actions and Session Duration.

Average Time per Page.

Duration for Individual Pages.

References.

Exercises.

9 MODELING FOR WEB USAGE MINING: CLUSTERING, ASSOCIATION, AND CLASSIFICATION.

Introduction.

Modeling Methodology.

Definition of Clustering.

The BIRCH Clustering Algorithm.

Affinity Analysis and the A Priori Algorithm.

Discretizing the Numerical Variables: Binning.

Applying the A Priori Algorithm to the CCSU Web Log Data.

Classification and Regression Trees.

The C4.5 Algorithm.

References.

Exercises.

INDEX.
