Filtering the Web to Feed Data Warehouses




Information is a key factor in business today, and data warehousing has become a major activity in the development and management of information systems that support the proper flow of information. Unfortunately, most information systems are based on structured information stored in organizational databases, which means that a company isolates itself from the business environment by concentrating only on its internal data sources. It is therefore vital that organizations take advantage of external business information, which can be retrieved from Internet services and mechanically organized within the existing information structures. Such a continuously extending, integrated collection of documents and data could facilitate decision-making processes in the organization. Filtering the Web to Feed Data Warehouses discusses areas such as:
- how to use a data warehouse for filtering Web content
- how to retrieve relevant information from diverse sources on the Web
- how to handle the time aspect
- how to mechanically establish links among data warehouse structures and documents filtered from external sources
- how to use collected information to increase corporate knowledge
and gives a comprehensive example, illustrating the idea of supplying data warehouses with relevant information filtered from the Web.


Product Details

  • ISBN-13: 9781447111078
  • Publisher: Springer London
  • Publication date: 1/28/2013
  • Edition description: Softcover reprint of the original 1st ed. 2002
  • Edition number: 1
  • Pages: 267
  • Product dimensions: 6.14 in (w) x 9.21 in (h) x 0.60 in (d)

Table of Contents

1 Introduction.- 1.1 Information Systems.- 1.2 Information Filtering Systems.- 1.3 Database Systems.- 1.3.1 Transactional Systems.- 1.3.2 Analytical Systems.- 1.4 Organization of this Book.
2 Data Warehouse: Corporate Knowledge Repository.- 2.1 Introduction.- 2.2 Data Warehouse Definition and Features.- 2.2.1 Definition.- 2.2.2 Metadata.- 2.2.3 Characteristic Features of Data in the Data Warehouse.- 2.3 Data Warehouse System.- 2.3.1 Architecture of the Data Warehouse System.- 2.3.2 Metadata Structures.- 2.3.3 Data Warehouse Products.- 2.4 Deploying Data Warehouse in the Organization.- 2.4.1 Data Warehouse Life Cycle.- 2.4.2 Analysis and Research.- 2.4.3 Identifying Architecture and Demands.- 2.4.4 Design and Development.- 2.4.5 Implementation and On-going Administration.- 2.5 Knowledge Management in Data Warehouses.- 2.5.1 Knowledge Management.- 2.5.2 Knowledge in Terms of Data Warehousing.- 2.5.3 Knowledge Discovery in Data Warehouses.- 2.5.4 Significance of Business Metadata.- 2.6 Evolution of the Data Warehouse.- 2.6.1 Criticism of the Traditional Data Warehouse.- 2.6.2 Virtual Data Warehouse.- 2.6.3 Information Data Superstore.- 2.6.4 Exploration Warehouse.- 2.6.5 Internet/Intranet Data Warehouse.- 2.6.6 Web Farming.- 2.6.7 Enterprise Information Portals.- 2.7 Chapter Summary.- 2.8 References.
3 Knowledge Representation Standards.- 3.1 Introduction.- 3.1.1 Basic Concepts.- 3.1.2 Metadata Representation.- 3.1.3 Metadata Interoperability.- 3.1.4 Theory of Metadata.- 3.2 Markup Languages.- 3.2.1 Background.- 3.2.2 XML Document.- 3.2.3 Document Presentation.- 3.2.4 Document Linking.- 3.2.5 Programming Interfaces.- 3.3 Dublin Core.- 3.3.1 Dublin Core Metadata Elements.- 3.3.2 Dublin Core in HTML.- 3.4 Warwick Framework.- 3.5 Meta Content Framework.- 3.5.1 Origins of MCF.- 3.5.2 Conceptual Building Blocks of MCF.- 3.5.3 XML Syntax.- 3.5.4 Directed Labelled Graph Formalism.- 3.6 Resource Description Framework.- 3.6.1 Background.- 3.6.2 Formal RDF Data Model.- 3.6.3 The RDF Syntax.- 3.6.4 RDF Schema.- 3.7 Common Warehouse Metamodel.- 3.7.1 History of OMG Projects.- 3.7.2 Objectives of the CWM.- 3.7.3 Metadata Architecture.- 3.7.4 CWM Elements.- 3.7.5 Conclusions for CWM.- 3.8 Chapter Summary.- 3.9 References.
4 Information Filtering And Retrieval From Web Sources.- 4.1 Introduction.- 4.1.1 Document, Information, Knowledge.- 4.1.2 Indexing.- 4.1.3 Hypertext.- 4.1.4 Information on the Web.- 4.1.5 Constraints of this Book.- 4.2 Information Retrieval Systems.- 4.2.1 Definitions.- 4.2.2 Information Retrieval System Architectures and Models.- 4.2.3 Sample Information Retrieval Systems.- 4.3 Information Filtering Systems.- 4.3.1 Filtering Versus Retrieval.- 4.3.2 Information Filtering Models and Architectures.- 4.3.3 Sample Filtering Systems.- 4.4 Internet Sources of Business Information.- 4.4.1 Business View on Internet Information Sources.- 4.4.2 General Characteristics of Business Information Sources.- 4.4.3 Information Overflow.- 4.5 Filtering the Web to Feed Business Information Systems.- 4.5.1 Problems with Web Filtering and Retrieval.- 4.5.2 New Information Filtering System Model Proposal.- 4.5.3 Transparent Filtering and Retrieval.- 4.6 Chapter Summary.- 4.7 References.
5 Enhanced Data Warehouse.- 5.1 Introduction.- 5.2 Justification of the Need for Integration.- 5.2.1 Value of Knowledge.- 5.2.2 Attention Economy.- 5.2.3 Content Management and Lifecycle of Content.- 5.2.4 Example of Integration: Metadata and Data.- 5.3 Preliminary Vision of the System.- 5.3.1 Analytical Point of View.- 5.3.2 Trends.- 5.3.3 Goals of the System.- 5.3.4 User Requirements Towards the Information Retrieval Systems.- 5.4 Software Agents.- 5.4.1 Introduction.- 5.4.2 Intelligent Agents or Just Agents?.- 5.4.3 Software Agents or Just Agents?.- 5.4.4 Possible Applications of Agents.- 5.4.5 Definitions of Software Agents.- 5.4.6 Agent Properties.- 5.4.7 Classifications of Software Agents.- 5.4.8 Agent-based Systems and Multi-agent Systems.- 5.5 Proposed Solution: enhanced Data Warehouse.- 5.5.1 Introduction.- 5.5.2 Overview of the eDW System.- 5.5.3 Assumptions for the eDW System.- 5.5.4 Components.- 5.5.5 Agent-based System Architecture.- 5.5.6 Logging Server.- 5.5.7 Profiling Server.- 5.5.8 Source Agent Server.- 5.5.9 Document Server.- 5.5.10 Properties of eDW Agents.- 5.6 Formal Model of eDW.- 5.6.1 CSL: The Extension of the Organizational Metamodel.- 5.6.2 Time Consistency among Documents and Warehouse Data.- 5.6.3 DWL: The Intranet Collection of Relevant Documents for the Data Warehouse.- 5.6.4 enhanced Data Warehouse Report: The Final Product of the eDW System.- 5.6.5 Formal Definitions of eDW Agents.- 5.7 System Implementation.- 5.7.1 Programming Environment.- 5.7.2 System Control Centre.- 5.7.3 Communication.- 5.7.4 Status.- 5.7.5 Configuration File.- 5.7.6 Logging Server.- 5.8 Chapter Summary.- 5.9 References.
6 Profiling.- 6.1 Introduction.- 6.2 Personalization and Data Warehouse Profiles.- 6.2.1 Classification of Information.- 6.2.2 Personalization.- 6.2.3 Personalization in Data Warehouses and its Aspects.- 6.2.4 Overview of Profile Creation.- 6.2.5 Data Warehouse Profiles.- 6.3 Algorithms Specification.- 6.3.1 Algorithm for Creating Warehouse Profiles.- 6.3.2 Computational Complexity.- 6.3.3 Thesauri.- 6.4 Profiling Server.- 6.4.1 Basic Assumptions.- 6.4.2 Profiling Agent.- 6.4.3 User Interface in Profiling Application.- 6.4.4 Sample Results.- 6.5 Chapter Summary.- 6.6 References.
7 Source Exploitation.- 7.1 Introduction.- 7.2 Sample Business Content Providers.- 7.2.1 Sample Business Gateways.- 7.2.2 Sample Business Search Engines.- 7.2.3 Sample Business Portals and Vortals.- 7.2.4 Sample Business Online Databases.- 7.3 Information Ants to Filter Information from Internet Sources.- 7.3.1 Introduction.- 7.3.2 Ant Colony Optimization.- 7.3.3 Environment for Information Ants.- 7.3.4 Information Ants to Filter Information from the Web.- 7.3.5 Experiment with Ant-like Navigation.- 7.3.6 Advantages and Drawbacks of the Proposed Solution.- 7.4 Indexing Parser.- 7.4.1 Parsing Web Documents.- 7.4.2 Indexing Web Documents.- 7.5 Transparent Filtering in the eDW System.- 7.5.1 Building Warehouse Profiles.- 7.5.2 Registering Sources.- 7.5.3 Source Exploration.- 7.5.4 Source Penetration.- 7.6 Chapter Summary.- 7.7 References.
8 Building Data Warehouse Library.- 8.1 Introduction.- 8.1.1 Characteristics of WWW: A Dream of Non-volatile Internet.- 8.1.2 Digital Libraries.- 8.2 Time Indexing.- 8.2.1 Finite State Automaton.- 8.2.2 Time Indexer.- 8.2.3 Trapezoidal Time Indices.- 8.2.4 Simple Overlap Measure for Trapezoidal Time Indices.- 8.3 Experiment with Time Indexing.- 8.3.1 Experiment with Time Indexing Real-World Documents.- 8.3.2 Conclusions for the eDW System.- 8.4 Future Trends: Multimedia Indexing.- 8.4.1 Introduction.- 8.4.2 Filtering Web Documents.- 8.4.3 Neural Nets for Image Categorization.- 8.4.4 The Proposed Solution: Perceptron Categorization Tree.- 8.4.5 Advantages and Drawbacks.- 8.4.6 Application for eDW.- 8.5 Chapter Summary.- 8.6 References.
9 Context Queries And Enhanced Reports.- 9.1 Introduction.- 9.2 Context Queries.- 9.2.1 Definition of Context.- 9.2.2 Justification of Transparent Retrieval.- 9.2.3 Elements of Context.- 9.2.4 Conceptual Similarity Measure.- 9.2.5 Simple Temporal Similarity Measure.- 9.2.6 Parameterized Temporal Similarity Measure.- 9.2.7 Pertinence.- 9.3 enhanced Report.- 9.3.1 User Interface in Accessing the Information.- 9.3.2 How enhanced Report is Created.- 9.4 Reporting Application.- 9.4.1 Basic Assumptions.- 9.4.2 Description of the Algorithms.- 9.4.3 Context Query Agent.- 9.4.4 Computational Complexity.- 9.4.5 User Interface in Reporting Application.- 9.4.6 Results.- 9.5 Histograms: The Helpful Tool for Analysis.- 9.5.1 Non-parameterized Histogram.- 9.5.2 Past-oriented Analysis.- 9.5.3 Future-oriented Analysis.- 9.5.4 General Documents.- 9.5.5 Detailed Documents.- 9.5.6 Compact and Dispersed Histograms.- 9.6 Chapter Summary.- 9.7 References.
10 Conclusions.- 10.1 Concluding Remarks.- 10.2 Improvements.- 10.3 Open Issues and Future Work.
