Business intelligence and analytics software enables businesses to analyze performance data in order to make better decisions through the use of cloud computing, an Internet-based model for convenient, on-demand network access to a shared pool of configurable computing resources. This book is a practitioner’s guide to successfully evaluating, designing, and implementing a Cognos Business Intelligence cloud solution for either Cognos 8 BI or Cognos Business Intelligence Version 10. With pragmatic information about best practices and guidelines, as well as specific software and configuration steps, this guide for solutions and IT architects includes detailed screen shots, code samples, and input instructions.
Publisher: MC Press, LLC
Product dimensions: 5.90(w) x 8.90(h) x 0.40(d)
Read an Excerpt
IBM Business Analytics and Cloud Computing
Best Practices for Deploying Cognos Business Intelligence to the IBM Cloud
By Anant Jhingran, Stephan Jou, William Lee, Thanh Pham, Biraj Saha
MC Press
Copyright © 2010 IBM Corporation
All rights reserved.
Cloud Computing and Analytics
At the time of writing, a Google search on the phrase "cloud computing definition" returned more than 3.5 million results. There appear to be as many definitions of cloud computing as there are people excited about it! Some of these definitions are very good. For example, the U.S. National Institute of Standards and Technology (NIST) provides a concise but comprehensive effort at http://csrc.nist.gov/groups/SNS/cloud-computing
This book will not repeat such efforts at defining cloud computing. Instead, we intend this book to be a practical companion to leveraging cloud computing in your IBM Cognos analytics solution. As such, it focuses on the main characteristics of cloud computing with respect to their tangible advantages for you, the cloud practitioner.
On cloud computing platforms, the required IT infrastructure for your applications is provided to you, based on what you actually require. Nearly all clouds now provide compute cycles, networking, storage space, and memory capacity, all on an on-demand basis. Because you can simply release unused resources back into the pool, you do not have to worry about purchasing more hardware than you actually need.
In a pure and simple comparison with traditional data centers, this arrangement provides immediate and obvious cost advantages. Underutilization of purchased hardware is a genuine problem. It's what made virtualization such an attractive IT strategy in the early 2000s: replace the physical hardware with virtual hardware so you can allocate virtual machines when you need them and deallocate them when you're done. This strategy is particularly cost-effective for analytical applications that are tied to seasonal behavior, such as a sales application that is used only during the end of a quarter.
Small wonder that the major cloud platforms, including those from Amazon and IBM, are, at their lowest level, Web interfaces wrapped around virtual machines (VMs), storage, and networking. Being able to create and configure VMs through a simple browser interface or through Representational State Transfer (REST) calls is one simple way to think about and approach cloud computing.
This pay-as-you-go, utility-based cost model is, in some ways, the most innovative aspect of cloud computing. You trade away the requirement for up-front capital expenditures (capex) to purchase hardware and software, and instead favor ongoing operational expenditures (opex) based on what you actually use.
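To make the trade-off between paying for capacity up front and paying for use more concrete, here is a minimal arithmetic sketch for a seasonal workload. The hourly rate and the usage pattern are hypothetical numbers chosen only for illustration; they do not come from any real price list, and the function names are our own.

```python
# Hypothetical figures for illustration only; not real cloud or hardware prices.
HOURS_PER_YEAR = 24 * 365


def always_on_cost(hourly_rate: float) -> float:
    """Yearly cost of capacity that is paid for around the clock."""
    return hourly_rate * HOURS_PER_YEAR


def on_demand_cost(hourly_rate: float, hours_used: float) -> float:
    """Yearly cost when you pay only for the hours actually used."""
    return hourly_rate * hours_used


def utilization(hours_used: float) -> float:
    """Fraction of the year during which the capacity is actually busy."""
    return hours_used / HOURS_PER_YEAR


if __name__ == "__main__":
    # Assumed workload: a sales application used for two weeks at each quarter end.
    hours_used = 4 * 14 * 24
    rate = 0.50  # assumed price per instance-hour
    print(f"utilization: {utilization(hours_used):.1%}")
    print(f"always on:   {always_on_cost(rate):.2f}")
    print(f"on demand:   {on_demand_cost(rate, hours_used):.2f}")
```

Under these assumed numbers the capacity is busy for only a small part of the year, which is exactly the situation in which paying per hour of use is attractive. A real comparison would also need to account for storage, licensing, and data transfer.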
This book takes you through the process of leveraging such an infrastructure to create a fully working IBM Cognos Business Intelligence (BI) virtual instance running in the cloud. This instance is then saved as an image, consuming no resources and incurring no cost until you need a Cognos deployment.
While the steps in this book are based on the IBM Smart Business Development and Test Cloud, they are also applicable with little modification to other cloud infrastructures, such as Amazon's. And of course, the best practices we describe here have general applicability and relevance, no matter how you ultimately deploy your Cognos application.
On-Demand Higher Services
Moving above the Infrastructure-as-a-Service (IaaS) layer to the higher Platform-as-a-Service (PaaS) and Software-as-a-Service (SaaS) layers is where cloud computing starts to differentiate itself from simple virtualization. The PaaS and SaaS cloud layers bring higher-level services to the table, and things get much more interesting. Rather than thinking in terms of machines, networking, bandwidth, and storage space, imagine services related to the provisioning of complex topologies with defined quality-of-service constraints, analytical and reporting services, and Hadoop-style big-calculation jobs.
Although these higher-level cloud services are still nascent, the cloud computing ecosystem around them is growing rapidly. For example, both Amazon and Yahoo! offer platforms that can execute Hadoop applications in their cloud infrastructures.
There is a good and practical reason why higher-level services are interesting: they are more cost-efficient. For example, running a Hadoop job using Amazon's Elastic MapReduce, which leverages Amazon's Elastic Compute Cloud (EC2) and Simple Storage Service (S3) under the covers, costs less than directly using EC2 or S3 yourself. That's because not only do you avoid having to install and configure the software, but Amazon can optimize and manage the entire infrastructural stack much more efficiently.
The end result is that as we move up the cloud stack and focus more on higher-level services that provide targeted solutions and workloads, we are able to build more for less.
Resource Pooling and Rapid Elasticity
The distressing amount of hardware underutilization in traditional data centers that we noted previously remains the main reason why virtualization and cloud computing have been on the IT agenda for the past few years and will continue to be in the years to come. Being able to more closely match capacity and cost with demand is the cost justification we all need.
This ability to pool and share resources to match demand clearly requires rapid elasticity. The amount of storage available (and being paid for) should always be slightly ahead of the growth of your database. Any new virtual machines required to handle added load should be recruited and connected to the system in minutes or hours, not days or weeks.
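The idea of keeping capacity slightly ahead of demand can be sketched as a small sizing calculation. This is an illustrative sketch only: the load unit, the capacity per virtual machine, and the 20 percent headroom are assumptions, and the function names are hypothetical rather than part of any Cognos or cloud API.

```python
import math


def instances_needed(current_load: float, capacity_per_instance: float,
                     headroom: float = 0.2, minimum: int = 1) -> int:
    """Number of VMs needed to serve current_load with some spare headroom.

    current_load and capacity_per_instance share one unit (for example,
    concurrent report requests); headroom=0.2 keeps capacity 20% ahead.
    """
    if capacity_per_instance <= 0:
        raise ValueError("capacity_per_instance must be positive")
    target = current_load * (1 + headroom)
    return max(minimum, math.ceil(target / capacity_per_instance))


def scaling_action(running: int, needed: int) -> str:
    """Describe whether to recruit or release VMs to match demand."""
    if needed > running:
        return f"recruit {needed - running} instance(s)"
    if needed < running:
        return f"release {running - needed} instance(s)"
    return "no change"


if __name__ == "__main__":
    needed = instances_needed(current_load=450, capacity_per_instance=100)
    print(needed, scaling_action(running=4, needed=needed))
```

Running the same calculation on a schedule, and releasing instances when the load drops, is what matches cost to demand in both directions.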
This book provides techniques and best practices to scale up or down a Cognos BI system, dynamically recruiting additional virtual machines and connecting them or disconnecting them as required.
Flexible Deployment Models
When most people first hear of cloud computing, they think of the public cloud — an IT infrastructure that is delivered externally. While a public cloud is appropriate for many scenarios, there are many other cases in which data cannot leave the enterprise boundaries. Fortunately, in addition to public clouds, you can deploy your solution to private clouds, where the cloud infrastructure is erected within the enterprise firewalls and managed by the enterprise IT department itself, or even to hybrid clouds, which are a combination of public and private clouds, with systems on both sides and a secure connection between them.
No single model will work in all cases. Data security and sensitivity, bandwidth and latency, and even legal and regulatory requirements all need to factor into the deployment and topology of this solution. Fortunately, cloud computing offers a flexible allocation of resources and the ability to loosely couple systems together with standard Internet protocols, letting you design the right solution for the requirements on hand.
This book provides some general guidance about which components of your Cognos deployment can be located in the cloud, and when.
The Workload Model for Cloud
When you do any sort of reading in the area of cloud computing, you quickly run into the concept of a workload. A workload is a set of operations executed on IT resources for a particular purpose and typically considered as a single logical element. The key components of a given workload are the application(s), the usage pattern, the service level agreement, and a data structure. For example, you might hear people talk about a "departmental BI with a seasonal transaction model but large data volume" workload.
A workload approach to cloud computing helps us understand when an application is ideal for the cloud, as well as which cloud architecture is most appropriate. IBM has analyzed the various workload types, based on the relative cost and benefits of leveraging cloud computing. Figures 1.1 and 1.2 summarize the results for a typical example in each broad workload category for external and private clouds.
These figures serve as good guidance for representative workloads within each category, but, of course, there are always unique situations in each category. A more detailed cloud affinity tool that lets you input the specific characteristics of your particular analytics workload (or other workload types) is available at http://freedom.researchlabs.ibm.com/ibmappcr/applications/p1/CloudAffinityAnalyzer/CloudAffinityAnalyzer.html.
An analytical workload needs to be examined first to see how appropriate it is for the cloud and then for what type of cloud deployment makes sense. This book covers several considerations, but the main ones are usually the nature of the data being analyzed and its sensitivity to public exposure. If the data is too large or too sensitive to be moved, a private or possibly a hybrid cloud is a strong candidate.
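The workload model can be written down as a small data structure, which makes the first screening question explicit. The sketch below is an assumption-laden illustration: the field names are our own, and the 500 GB limit for "too large to move" is a made-up threshold, not a figure from IBM's analysis.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """The four workload components named in the text, plus two data traits."""
    applications: list[str]
    usage_pattern: str
    service_level_agreement: str
    data_structure: str
    data_volume_gb: float    # assumed measure of "too large to move"
    data_is_sensitive: bool  # assumed flag for "too sensitive to move"


def suggest_deployment(workload: Workload, movable_limit_gb: float = 500.0) -> str:
    """Return a coarse deployment hint; the default limit is a hypothetical value."""
    if workload.data_is_sensitive or workload.data_volume_gb > movable_limit_gb:
        return "private or hybrid cloud"
    return "public cloud"


if __name__ == "__main__":
    departmental_bi = Workload(
        applications=["Cognos BI"],
        usage_pattern="seasonal",
        service_level_agreement="business hours",
        data_structure="relational",
        data_volume_gb=2000.0,
        data_is_sensitive=False,
    )
    print(suggest_deployment(departmental_bi))
```

A hint like this is only a starting point; bandwidth, latency, and regulatory requirements still have to be weighed for the specific workload.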
Examples of ideal IBM Cognos Business Intelligence workloads for the IBM Smart Business Development and Test Cloud include the following:
Development and test workloads, including pilots and proofs-of-concept, which typically involve non-sensitive or small amounts of data
Standalone BI implementations
Variable or seasonal workloads that take advantage of peaks and valleys and load balance between on-premises and cloud-based systems
Cloud-to-cloud applications, where the data is coming from another cloud service
These workloads all take advantage of several key benefits that cloud computing provides:
Significant hardware and software cost savings
The flexibility of opex versus capex
On-demand, elastic IT resources
Faster time to provisioning
These benefits enable you to standardize and share costs while maintaining the control and ownership within your IT department.
This book, and particularly Chapter 2, provides best practices on other characteristics of your analytics workload and its implications for your Cognos Business Intelligence solution architecture.
Cognos Software and Analytics as a Service
The concept of analytics as a service is a compelling one: identify higher-level services associated with analytical functions (such as reporting, querying, prediction, and exploration) and provide those in a cloud-hosted model. As we discussed earlier, providing such higher-level services in the cloud can have several advantages. Although it's still early days in this area, there are already a number of interesting and innovative analytics-as-a-service experiments in the cloud ecosystem.
A cloud deployment of IBM Cognos BI actually provides much of what you might expect from an analytics as a service offering and gives a hint of what might be in the future. With a platform built on service-oriented architecture concepts, Cognos 8 and Cognos 10 BI provide programmatic interfaces that let their capabilities be leveraged by both Cognos and custom applications. There are several integration points:
Cognos portlets can be embedded in external applications through the portlet SDK, Web Services for Remote Portlets (WSRP), Webparts, or iWidgets.
Web content can be embedded in Cognos portlets or reports through the portlet SDK or HTML components.
Reports and data can be processed by external applications, using the SOAP SDK or Cognos Mashup Service.
Data, models, and other content can be inserted or manipulated through the SOAP SDK, FM API, or data drivers.
Many of the relevant application program interfaces in these integration points are cloud-friendly, leveraging standard protocols such as ATOM, REST, SOAP, and XML. As a result, integrating with your cloud-deployed instance of IBM Cognos Business Intelligence is often an exercise in ensuring that you have network connectivity between the client machine and your cloud instance and then pointing the client application to the instance's dispatcher URI.
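Before pointing a client application at the dispatcher URI, it can help to confirm that the client machine can reach the cloud instance at all. The following is a minimal sketch using only the Python standard library. The host name is hypothetical, and the port and path in the example URI are assumptions; use the values from your own Cognos configuration.

```python
import socket
from urllib.parse import urlsplit


def dispatcher_endpoint(dispatcher_uri: str) -> tuple[str, int]:
    """Extract the host and TCP port that a client must be able to reach."""
    parts = urlsplit(dispatcher_uri)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError(f"not an HTTP(S) URI: {dispatcher_uri!r}")
    default_port = 443 if parts.scheme == "https" else 80
    return parts.hostname, parts.port or default_port


def can_reach(dispatcher_uri: str, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to the dispatcher host and port succeeds."""
    host, port = dispatcher_endpoint(dispatcher_uri)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Hypothetical cloud instance; replace with your own dispatcher URI.
    uri = "http://cognos-cloud.example.com:9300/p2pd/servlet/dispatch"
    print(dispatcher_endpoint(uri), can_reach(uri))
```

A successful TCP connection shows only that the network path and firewall rules are in place; it does not verify that the dispatcher itself is running or that authentication will succeed.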
This book provides steps on how to configure other IBM Cognos applications that do not run in the cloud (e.g., Framework Manager, Mobile, Office, PowerPlay, and Transformer) and point them to your cloud installation of the Cognos Business Intelligence Server. This background, along with the details provided by the various Cognos SDK manuals, will ensure that you, too, will be successful in using your own Cognos analytics cloud deployment as a service.
CHAPTER 2
The first steps in getting started with cloud computing involve data, security, topology, and Linux considerations.
Databases and data sources can be co-located in the cloud with your cloud application. Or, they can be located on-premises behind your firewall, but with a secure connection to your cloud. Cognos Business Intelligence has two classes of data sources: 1) the content store and metric store database and 2) the query databases and other data sources. Figure 2.1 depicts these aspects of the Cognos 8 tiered architecture.
The content store is a relational database that contains data that Cognos BI needs, including report specifications, published models (and the packages that contain them), connection information for data sources, information about your users, and information about scheduling and bursting reports. The metric store is the equivalent of the content store for Metric Studio (an optional component of Cognos BI). It contains content for metric packages and other Metric Studio settings, such as user preferences. If you are not using Metric Studio, you do not need a metric store.
The query databases are relational databases that can be accessed through Cognos BI. They provide the data for its reports and analyses, through a JDBC or Virtual View Manager connection.
The data sources include all relational databases. Cognos BI can also access other, less common data sources that are not relational databases, such as dimensional cubes and files.
For best performance, the content store and metric store databases should be as close as possible to the application on the network. Close proximity on the network minimizes latency between the Cognos BI Server components and the content and metric store databases. Ideally, therefore, these databases should be in the IBM Cloud environment — either in a separate virtual machine instance or with your application tier components in the same instance.
For your query databases, consider the specifics of your intended workload and scenario, as outlined in Table 2.1. Perhaps, for example, your reports require high performance or rapid querying of your data, and this data can easily be moved to the cloud, too. At the same time, privacy or security concerns are not a priority. In this case, you can realize significant cost savings by creating the query databases in the IBM Cloud.
In other situations, high performance or rapid data querying is not a priority, and there is a large amount of data that is difficult or expensive to move. Privacy, security, or other legal reasons may require you to maintain the data within your corporate firewalls. In this case, the query databases should be kept on-premises, within the network bounded by your firewalls. These databases can then be accessed from the IBM Cloud through a secure network connection. This configuration is sometimes referred to as a "hybrid cloud" environment because it is a mix of cloud instances and traditional behind-the-firewall instances.
In some circumstances, your query database is already in the cloud (e.g., Salesforce data). In this case, the security and latency challenges associated with the query data are not new to a cloud solution. Such "cloud-born" query data sources are ideal candidates for leaving in the cloud.
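The three situations described above can be summarized as a small decision function. This is a simplified sketch, not a reproduction of Table 2.1: the flag names are our own, performance requirements are left out, and letting a firewall requirement override everything else is an assumption made for cases the text does not spell out.

```python
from enum import Enum


class Placement(Enum):
    CLOUD = "create the query database in the cloud"
    ON_PREMISES = "keep it on-premises and connect securely (hybrid cloud)"
    LEAVE_IN_CLOUD = "leave the cloud-born source where it is"


def place_query_database(*, already_in_cloud: bool, easy_to_move: bool,
                         must_stay_behind_firewall: bool) -> Placement:
    """Suggest where a query database should live, following the cases in the text."""
    if already_in_cloud:
        # Cloud-born data (for example, from another cloud service) stays put.
        return Placement.LEAVE_IN_CLOUD
    if must_stay_behind_firewall or not easy_to_move:
        # Privacy, security, legal, or data-volume reasons keep the data local.
        return Placement.ON_PREMISES
    # Movable data without privacy constraints can be created in the cloud.
    return Placement.CLOUD


if __name__ == "__main__":
    choice = place_query_database(already_in_cloud=False, easy_to_move=False,
                                  must_stay_behind_firewall=True)
    print(choice.value)
```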
Another combination is also worth mentioning. In some situations, database replication can be used to copy an on-premises database instance to a database instance in the cloud. For example, one of IBM DB2®'s various replication alternatives might be an attractive option for you, provided you leverage a secured connection for the transaction.
File-based data sources, such as dimensional cubes and other files, are usually amenable to synchronization or transport to the cloud in a directory that is accessible to the Cognos application instance. They can also be synchronized to an IBM Cloud storage instance that appears as a mounted directory to your instance.
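A one-way synchronization of file-based data sources into a directory visible to the Cognos instance can be sketched as follows. The directory paths and the `*.mdc` file pattern are assumptions for illustration, and the function name is hypothetical. The target could be a local directory on the instance or the mount point of a cloud storage volume.

```python
import shutil
from pathlib import Path


def sync_files(source_dir: Path, target_dir: Path, pattern: str = "*.mdc") -> list[str]:
    """Copy files matching pattern that are missing from, or newer than, the target."""
    target_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in sorted(source_dir.glob(pattern)):
        if not src.is_file():
            continue
        dst = target_dir / src.name
        if not dst.exists() or src.stat().st_mtime > dst.stat().st_mtime:
            shutil.copy2(src, dst)  # copy2 also copies the modification time
            copied.append(src.name)
    return copied


if __name__ == "__main__":
    # Hypothetical paths; adjust to your own source and mounted target directory.
    source = Path("/data/cubes")
    target = Path("/mnt/cloud-storage/cubes")
    if source.is_dir():
        print(sync_files(source, target))
```

This sketch never deletes files from the target and transfers whole files. For large cubes or slow links, a dedicated synchronization tool over a secured connection is likely a better fit.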
Excerpted from IBM Business Analytics and Cloud Computing by Anant Jhingran, Stephan Jou, William Lee, Thanh Pham, Biraj Saha. Copyright © 2010 IBM Corporation. Excerpted by permission of MC Press.
All rights reserved. No part of this excerpt may be reproduced or reprinted without permission in writing from the publisher.
Table of Contents
The Organization of This Book
Conventions Used in This Book
Corrections and Errata
1 Cloud Computing and Analytics
On-Demand Infrastructure
On-Demand Higher Services
Flexible Deployment Models
The Workload Model for Cloud
Cognos Software and Analytics as a Service
2 Getting Started
Data Considerations
Security Provider Considerations
Designing and Testing Your Topology
Embracing Linux®
3 Installation and Configuration
Set Up the Windows Client
Set Up and Configure the Cloud Instance
Configure the Windows Client
Assemble Your Software
Set Up the Database and Web Server
Set Up Cognos 8 or Cognos 10 BI Server
Configure Security and Access
Create a Cognos BI Cloud Image
Installation Variations
4 Security Best Practices
Cloud Security Best Practices
5 Handling Cloud Topologies
Using the Hosts File to Manage Multiple Images
Example: An Elastic Cognos Cluster with a Single Image
Creating Snapshots Using Private Images
Files in the Cloud
6 Performance and Scalability Best Practices
User Community and Geographic Distribution
Application Complexity
Web-Server-Tier Performance and Scalability
Application-Tier Performance and Scalability
Content Manager Performance and Scalability
Post-Deployment Consideration
7 High Availability Best Practices
Cognos Gateways and Application Servers
The Cognos Application Server as a Gateway
Active and Standby Cognos Content Manager
IBM DB2 High Availability and Disaster Recovery (HADR)