Presents corporate portal technology from planning and modeling stages to implementation with a special emphasis on converting legacy system data to modern data warehouses.
I know the data is there, but I can't get the information I need.
How many times have you heard this cry from management? But you are not alone; the same cry has been expressed in most languages around the world. It is a common problem: the data is in the computer, but cannot be located readily; or it is not in a format that is suitable for use by management. So what do you do?
You have taken an important first step: you are reading this book. Between the three of us (Clive Finkelstein, Peter Aiken, and yourself), we will discuss approaches that can help you resolve this problem. The solution is based on Corporate Portals (also called Enterprise Portals or Enterprise Information Portals). It is also based on Engineering Enterprise Portals (EEP), a methodology used for the design, development, and deployment of Enterprise Portals (EPs). We will discuss the basic concepts of Corporate Portals, Enterprise Portals, and the Engineering Enterprise Portals methodology later in this chapter.
Enterprise Portals are based on Data Warehousing technologies, using Metadata and the Extensible Markup Language (XML) to integrate both structured and unstructured data throughout an enterprise. Metadata, XML, and EPs will be vital elements of the twenty-first century enterprise. We will briefly introduce the basic concepts of metadata, XML, and Enterprise Portals below, covering them in more detail in later chapters.
Structured data exists in databases and data files that are used by current and older operational systems in an enterprise. We call these older systems legacy systems; we call the data they use legacy data. In most enterprises, structured data comprises only 10 percent of the data, information, and knowledge resources of the business; the other 90 percent exists as unstructured data in textual documents, as graphics and images, or in audio or video formats. These unstructured data sources are not easily accessible to Data Warehouses, but EPs use metadata and XML to integrate both structured and unstructured data seamlessly, for easy access throughout the enterprise.
IT staff in most enterprises have a common problem. How can they convince managers to plan, budget, and apply resources for metadata management? What is metadata and why is it important? What technologies are involved? Internet and intranet technologies are part of the answer and will get the immediate attention of management. XML is the other technology. The following analogy may help you outline to management the important role that metadata takes in an enterprise.
Every country is now interconnected in a vast, global telephone network. We are now able to telephone anywhere in the world. We can phone a number, and the telephone assigned to that number would ring in Russia, or China, or in Outer Mongolia. But when it is answered, we may not understand the person at the other end. They may speak a different language. So we can be connected, but what is said has no meaning. We cannot share information.
Today, we also use a computer and the World Wide Web. We enter a Web site address into a browser on our desktop machine: a unique address in words that is analogous to a telephone number. We can then be connected immediately to a computer assigned to that address and attached to the Internet anywhere in the world. That computer sends a Web page based on the address we have supplied, to be displayed in our browser. This is typically in English, but may be in another language. We are connected, but as in the telephone analogy, if the page is in another language, what is said has no meaning. We cannot share information.
Now consider the reason why it is difficult for some of the systems used in an organization to communicate with and share information with other systems. Technically, the programs in each system are able to be interconnected and so can communicate with other programs. But they use different terms to refer to the same data that needs to be shared. For example, an accounting system may use the term "customer" to refer to a person or organization that buys products or services. Another system may refer to the same person or organization as a "client." Sales may use the term "prospect." They all use different terminology (a different language) to refer to the same data and information. And if they do not use the same language, again they cannot share information.
The problem is even worse. Consider terminology used in different parts of the business. Accountants use a "jargon" (a technical language) that is difficult for non-accountants to understand. So too is the jargon used by engineers, production people, sales and marketing people, or managers difficult for others to understand. They all speak a different "language." What is said has no meaning. They cannot easily share common information. In fact, in some enterprises it is a miracle that people manage to communicate meaning at all!
Each organization has its own internal language, its own jargon, which has evolved over time so similar people can communicate meaning. As we saw above, there can be more than one language or jargon used in an organization. Metadata identifies an organization's own "language." Where different terms refer to the same thing, a common term is agreed for all to use. Then people can communicate more clearly. And systems and programs can intercommunicate with meaning. But without a clear definition and without common use of an organization's metadata, information cannot be shared effectively throughout the enterprise.
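The idea of agreeing on a common term can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the mapping table, the "entity" field, and the function names are hypothetical, and the choice of "customer" as the agreed term is ours, not something a particular enterprise has decided.

```python
# Hypothetical table: each department's local term -> the agreed common term.
# The choice of "customer" as the common term is an assumption for illustration.
COMMON_TERMS = {
    "customer": "customer",  # term used by the accounting system
    "client": "customer",    # term used by another system
    "prospect": "customer",  # term used by sales
}


def to_common_term(local_term: str) -> str:
    """Return the agreed enterprise-wide term for a department's local term."""
    key = local_term.strip().lower()
    if key not in COMMON_TERMS:
        # An unknown term means the metadata has not been defined yet.
        raise KeyError(f"no agreed common term for {local_term!r}")
    return COMMON_TERMS[key]


def normalize_record(record: dict) -> dict:
    """Copy a record, replacing its local 'entity' label with the common term."""
    normalized = dict(record)
    normalized["entity"] = to_common_term(record["entity"])
    return normalized


# Two systems describe the same organization with different labels.
records = [
    {"entity": "Client", "name": "Acme Pty Ltd"},
    {"entity": "prospect", "name": "Acme Pty Ltd"},
]
normalized = [normalize_record(r) for r in records]
```

Once both records carry the same label, they can be recognized as describing the same thing. A term that is missing from the table raises an error rather than passing through silently, which reflects the point that undefined metadata cannot be shared.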
Previously, each part of the business maintained its own version of "customer," "client," or "prospect." Each defined processes, and assigned staff, to add new customers, clients, or prospects to its own files and databases. When common details about customers, clients, or prospects changed, each redundant version of that data also had to be changed, and staff were required to make these changes. Yet these are all redundant processes making the same changes to redundant data versions. This is enormously expensive in time and people. It is also quite unnecessary.
The importance of metadata can now be seen. Metadata defines the common language used within an enterprise so that all people, systems, and programs can communicate precisely. Confusion disappears. Common data is shared. And enormous cost savings are made, because the redundant processes (used to keep redundant data versions up to date) are eliminated as the redundant data versions are integrated into a common data version for all to share.
Much effort previously went into the definition and implementation of Electronic Data Interchange (EDI) standards to address the problem of intercommunication between dissimilar systems and databases. EDI has now been widely used for business-to-business commerce for many years. It works well, but it is quite complex and very expensive. As a result, it is generally cost-justifiable only for large corporations.
Once an organization's metadata is defined and documented, all programs can use it to communicate. EDI was the mechanism that was used previously. But now this intercommunication has become much easier.
Extensible Markup Language (XML) is a new Internet technology that has been developed to address this problem. XML can be used to document the metadata used by one system so that it can be integrated with the metadata used by other systems. This is analogous to language dictionaries which are used throughout the world, so that people from different countries can communicate. Legacy files and other databases can now be integrated more readily. Systems throughout the business can now coordinate their activities more effectively as a direct result of XML and management support for metadata.
XML now provides the capability that was previously only available to large organizations through the use of EDI. XML allows the metadata used by each program and database to be published as the language to be used for this intercommunication. But distinct from EDI, XML is simple to use and inexpensive to implement for both small and large organizations. Because of this simplicity, we like to think of XML as:
XML is EDI for the Rest of Us
XML will become a major part of the application development mainstream. It provides a bridge between structured and unstructured data, delivered via XML then converted to HTML for display in Web browsers. Together with metadata, XML is a key component in the design, development, and deployment of Enterprise Portals.
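As a rough sketch of what "delivered via XML then converted to HTML" can mean in practice, the Python below wraps a piece of unstructured text in metadata tags and renders it as an HTML fragment. The tag names (document, title, author, body), the sample text, and the function name are assumptions made up for this illustration; real portals would more likely use a stylesheet-based transformation.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical XML: unstructured text wrapped in metadata tags.
XML_SOURCE = """
<document>
  <title>Quarterly Sales Memo</title>
  <author>J. Example</author>
  <body>Sales rose in the north & fell in the south.</body>
</document>
""".replace("&", "&amp;")  # the ampersand must be escaped to be well-formed XML


def document_to_html(xml_text: str) -> str:
    """Parse the XML and render its tagged parts as a small HTML fragment."""
    root = ET.fromstring(xml_text.strip())
    # The metadata tags tell us which piece of text plays which role.
    title = escape(root.findtext("title", default="").strip())
    author = escape(root.findtext("author", default="").strip())
    body = escape(root.findtext("body", default="").strip())
    return f"<h1>{title}</h1><p><em>{author}</em></p><p>{body}</p>"


html_fragment = document_to_html(XML_SOURCE)
print(html_fragment)
```

The XML carries the meaning (which text is the title, who the author is), while the HTML carries only the presentation. The same XML could be rendered differently for another audience without changing the source.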
Metadata is used to define the structure of an XML document or file. Metadata is published in a Document Type Definition (DTD) file for reference by other systems. A DTD file defines the structure of an XML file or document. It is analogous to the Data Definition Language (DDL) file that is used to define the structure of a database, but with a different syntax.
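To make the relationship between a DTD and an XML document concrete, here is a minimal sketch in Python. The element names, the sample values, and the decision to embed the DTD inside the document (rather than publish it as a separate file) are all assumptions for illustration. Note also that Python's standard-library parser reads the DTD declarations but does not validate the document against them; a validating parser would be needed for that.

```python
import xml.etree.ElementTree as ET

# Hypothetical person document with its DTD embedded as an internal subset.
# Each <!ELEMENT ...> line declares which child elements a tag may contain.
PERSON_XML = """<!DOCTYPE person [
  <!ELEMENT person (person_name, contact_details)>
  <!ELEMENT person_name (#PCDATA)>
  <!ELEMENT contact_details (email, phone, fax, mobile)>
  <!ELEMENT email (#PCDATA)>
  <!ELEMENT phone (#PCDATA)>
  <!ELEMENT fax (#PCDATA)>
  <!ELEMENT mobile (#PCDATA)>
]>
<person>
  <person_name>Jane Example</person_name>
  <contact_details>
    <email>jane@example.com</email>
    <phone>555-0100</phone>
    <fax>555-0101</fax>
    <mobile>555-0102</mobile>
  </contact_details>
</person>
"""

# Parse the document (no validation is performed by this parser).
root = ET.fromstring(PERSON_XML)

# Because every value is tagged, contact details can be looked up by name.
contact = {child.tag: child.text for child in root.find("contact_details")}
print(root.findtext("person_name"), contact["email"])
```

The DTD plays the role that a table definition plays for a database: it names the parts and states how they nest, so any system that reads the DTD knows what to expect in the document.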
An example of an XML document identifying data retrieved from a PERSON database is illustrated in Figure 1-1. This includes metadata markup tags (surrounded by < ... >, such as <person_name>) that provide various details about a person. From this, we can see that it is easy to find specific contact information in <contact_details>, such as <email>, <phone>, <fax>, and <mobile> (cell phone) numbers. Although we have not shown it here, the DTD also specifies
Metadata that is used by various industries, communities, or bodies can be used with XML to define markup vocabularies. The World Wide Web Consortium (W3C) has developed a standard framework that can be used to define these vocabularies. This is called the Resource Description Framework (RDF). It is a model for metadata applications that support XML. RDF was initiated by the W3C to build standards for XML applications so that they can interoperate and intercommunicate more easily, avoiding the communication problems that we discussed earlier.
With XML, many applications that were difficult to implement before, often due to metadata differences, now become possible. For example, an organization can define the unique metadata used by each supplier's legacy inventory systems. This enables the organization to place orders via the Internet directly with those suppliers' systems, for automatic fulfillment of product orders. We will see an example of this in Chapter 12.
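The supplier scenario can be sketched as follows. Everything specific here is hypothetical: the supplier names, the tag vocabularies, and the function name are invented for illustration, and the step of actually transmitting the order over the Internet is left out.

```python
import xml.etree.ElementTree as ET

# Hypothetical metadata: the tag names each supplier's system expects
# for the same three concepts (an order, a product identifier, a quantity).
SUPPLIER_VOCABULARY = {
    "supplier_a": {"order": "PurchaseOrder", "product_id": "PartNo", "quantity": "Qty"},
    "supplier_b": {"order": "order", "product_id": "sku", "quantity": "units"},
}


def build_order(supplier: str, product_id: str, quantity: int) -> str:
    """Express one order in the XML vocabulary of the chosen supplier."""
    tags = SUPPLIER_VOCABULARY[supplier]
    order = ET.Element(tags["order"])
    ET.SubElement(order, tags["product_id"]).text = product_id
    ET.SubElement(order, tags["quantity"]).text = str(quantity)
    # encoding="unicode" returns a str rather than bytes.
    return ET.tostring(order, encoding="unicode")


# The same business request, expressed in each supplier's own terms.
order_a = build_order("supplier_a", "X-100", 12)
order_b = build_order("supplier_b", "X-100", 12)
print(order_a)
print(order_b)
```

The ordering organization keeps one internal notion of an order and relies on the recorded metadata to translate it for each supplier. Adding a new supplier then means adding one more vocabulary entry rather than writing a new program.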
XML is an enabling technology for integrating structured and unstructured data in next-generation E-Commerce and EDI applications. Web sites will evolve to use XML, with far greater power and flexibility than offered by HTML. Netscape Communicator 5.0 and Microsoft Internet Explorer 5.0 browsers both support XML. Most productivity tools and office suites (such as Microsoft Office 2000) support XML. Business Intelligence and Knowledge Management tools will support XML. XML development tools are also being released so that XML applications can be developed more easily.
The acceptance of XML is progressing rapidly, as it offers a very simple, yet extremely powerful, way to intercommunicate between different databases and systems, both within and outside an organization. How well an organization accesses and uses its knowledge resources can determine its competitive advantage and future prosperity. Use and application of knowledge will become even more important in the competitive Armageddon of the Internet, in which we will all participate.
The tools are coming, but a greater task still remains to be completed. This is the definition of your own metadata, your common enterprise language for intercommunication, so that you can use these tools effectively. The definition of metadata depends on knowledge of data modeling, previously carried out by IT people. But this is not just a task for IT. As it is vitally dependent on business knowledge, it also requires the involvement of business experts. Not by interview, but by their active participation. While data modeling has until now been a technical IT discipline, business data modeling is not. It can be learned by business people as well as IT staff. This was one of our motivations for the development of the Engineering Enterprise Portals methodology.
One thing we are not short of today is information. We are swimming in it! Our information comes from traditional printed sources such as books, magazines, newspapers, subscription reports, and newsletters; from audio sources such as radio; from video sources such as free-to-air television or cable TV; from email; and from word-of-mouth. The saving grace with these information sources, apart from radio and free-to-air TV, is that they are limited to those who have subscribed to receive that information.
Not any more. Even today, and certainly more so in the future, each of these sources is moving to the Internet. They are offered as free services, where the cost of preparation is paid not by subscription but by advertising. Even word-of-mouth, previously a reliable source of information from people you knew personally and whose opinion you respected, has moved to the Internet in newsgroups and chat rooms, but with opinions offered by people, perhaps in another country, who are totally unknown to you. Both accurate and inaccurate comments now circle the globe not at word-of-mouth speed, but at electronic speed.
Email is the killer application of the Internet; even more so of the corporate intranet. Enormous knowledge is retained in corporate email archives (much to the chagrin of Microsoft, with certain email messages used by government prosecutors in the Microsoft Antitrust trial as smoking guns to illustrate alleged abuses of monopoly power). Corporate email is a knowledge resource that is of great value, yet until now it has been largely inaccessible.
Text searches on the Internet by traditional search engines are largely ineffective; a simple query can return thousands of links containing the entered keywords or search phrase. Only a small fraction of these may be relevant, yet unless relevancy ratings are also provided, each link must be manually investigated to assess its content.
The problem is no less severe with enterprises. We are inundated with information. To the credit of the Information Technology (IT) industry, at least this information is being organized and made more readily available through Data Warehouses. We discuss the building of Data Warehouses extensively in this book.
Most information in Data Warehouses is based on structured data sources, such as the operational databases used by older legacy systems, and relational databases. Data Warehouse products that use Internet technologies are also now becoming available. These valuable information tools can now be used within an enterprise across the corporate intranet. The information is thus more readily available.
We discussed earlier that structured data represents only 10 percent of the information and knowledge resource in most enterprises. The remaining 90 percent exists as unstructured data that has been largely inaccessible to Data Warehouses. Text documents, email messages, reports, graphics, images, audio and video files all are valuable sources of data, information, and knowledge that have been untapped. They exist in physical formats that have been difficult to access by computer, as if they were behind locked doors.
The technologies are now available to open these doors. XML is one such technology, as we have briefly seen. XML enables structured and unstructured data sources to be integrated easily, where this was extremely difficult before. Organizations will develop new business processes and systems based on this integration, using Business Reengineering and Systems Reengineering methods. They will at last be able to break away from the business process constraints that have inhibited change in the past. . .