Computation and Storage in the Cloud: Understanding the Trade-Offs

Overview

Computation and Storage in the Cloud is the first comprehensive and systematic work investigating the issue of computation and storage trade-off in the cloud in order to reduce the overall application cost. Scientific applications are usually computation and data intensive, where complex computation tasks take a long time for execution and the generated datasets are often terabytes or petabytes in size. Storing valuable generated application datasets can save their regeneration cost when they are reused, not to mention the waiting time caused by regeneration. However, the large size of the scientific datasets is a big challenge for their storage. By proposing innovative concepts, theorems and algorithms, this book will help bring the cost down dramatically for both cloud users and service providers to run computation and data intensive scientific applications in the cloud.

  • Covers cost models and benchmarking that explain the necessary tradeoffs for both cloud providers and users
  • Describes several novel strategies for storing application datasets in the cloud
  • Includes real-world case studies of scientific research applications

Editorial Reviews

From the Publisher
"Cloud computing systems charge for both data storage and for calculating, say Yuan, Yang…and Chen…, so there is a trade-off between storing large data sets in the cloud or deleting them and regenerating them each time they are needed. They suggest some approaches to figuring out which is cheaper… they cover motivating example and research issues, a cost model of data set storage in the cloud, minimum cost benchmarking approaches,…" (ProtoView.com, January 2014)

"Cloud computing systems charge for both data storage and for calculating, say Yuan, Yang….and Chen…so there is a trade-off between storing large data sets in the cloud or deleting them and regenerating them each time they are needed. They suggest some approaches to figuring out which is cheaper." (Reference & Research Book News, December 2013)

"…this book does a good job at tackling a variety of complex subjects. It brings forward state-of-the-art concepts and elaborate algorithms, illustrates issues related to cost-effectiveness, and helps both cloud providers and users get a grip on the intricate world of cloud computing." (Help Net Security online, August 28, 2013)

Product Details

  • ISBN-13: 9780124077676
  • Publisher: Elsevier Science
  • Publication date: 2/15/2013
  • Pages: 128
  • Product dimensions: 5.90 (w) x 8.90 (h) x 0.50 (d)

Meet the Author

Dong Yuan is currently a research fellow in the School of Software and Electrical Engineering at Swinburne University of Technology, Melbourne, Australia. His research interests include data management in parallel and distributed systems, scheduling and resource management, and grid and cloud computing.

Yun Yang is currently a full professor in the School of Software and Electrical Engineering at Swinburne University of Technology, Melbourne, Australia. Prior to joining Swinburne in 1999 as an associate professor, he was a lecturer and senior lecturer at Deakin University, Australia, from 1996 to 1999. He has coauthored four books and published over 200 papers in journals and refereed conference proceedings. He is currently on the Editorial Board of IEEE Transactions on Cloud Computing. His current research interests include software technologies, cloud computing, p2p/grid/cloud workflow systems, and service-oriented computing.

Jinjun Chen received his PhD degree in Computer Science and Software Engineering from Swinburne University of Technology, Melbourne, Australia in 2007. He is currently an Associate Professor in the Faculty of Engineering and Information Technology, University of Technology, Sydney, Australia. His research interests include scientific workflow management and applications, workflow management and applications in Web service or SOC environments, workflow management and applications in grid (service)/cloud computing environments, software verification and validation in workflow systems, QoS and resource scheduling in distributed computing systems such as cloud computing, service-oriented computing, semantics and knowledge management, and cloud computing.

Read an Excerpt

Computation and Storage in the Cloud

Understanding the Trade-Offs
By Dong Yuan, Yun Yang and Jinjun Chen

ELSEVIER

Copyright © 2013 Elsevier Inc.
All rights reserved.

ISBN: 978-0-12-407879-6


Chapter One

Introduction

This book investigates the trade-off between computation and storage in the cloud. This is a brand new and significant issue for deploying applications with the pay-as-you-go model in the cloud, especially computation and data-intensive scientific applications. The novel research reported in this book is for both cloud service providers and users to reduce the cost of storing large generated application data sets in the cloud. A suite consisting of a novel cost model, benchmarking approaches and storage strategies is designed and developed with the support of new concepts, solid theorems and innovative algorithms. Experimental evaluation and a case study demonstrate that our work helps bring the cost down dramatically for running computation and data-intensive scientific applications in the cloud.

This chapter introduces the background and key issues of this research. It is organised as follows. Section 1.1 gives a brief introduction to running scientific applications in the cloud. Section 1.2 outlines the key issues of this research. Finally, Section 1.3 presents an overview for the remainder of this book.

1.1 Scientific Applications in the Cloud

Running scientific applications usually requires not only high-performance computing (HPC) resources but also massive storage. In many scientific research fields, like astronomy, high-energy physics and bioinformatics, scientists need to analyse a large amount of data either from existing data resources or collected from physical devices. During these processes, large amounts of new data might also be generated as intermediate or final products. Scientific applications are usually data intensive, where the generated data sets are often terabytes or even petabytes in size. As reported by Szalay et al., science is in an exponential world and the amount of scientific data will double every year over the next decade and on into the future. Producing scientific data sets involves a large number of computation-intensive tasks, e.g., with scientific workflows, and hence takes a long time for execution. These generated data sets contain important intermediate or final results of the computation, and need to be stored as valuable resources. This is because (i) data can be reused – scientists may need to re-analyse the results or apply new analyses on the existing data sets – and (ii) data can be shared – for collaboration, the computation results may be shared, hence the data sets are used by scientists from different institutions. Storing valuable generated application data sets can save their regeneration cost when they are reused, not to mention the waiting time caused by regeneration. However, the large size of the scientific data sets presents a serious challenge in terms of storage. Hence, popular scientific applications are often deployed in grid or HPC systems because they have HPC resources and/or massive storage. However, building and maintaining a grid or HPC system is extremely expensive and neither can easily be made available for scientists all over the world to utilise.

In recent years, cloud computing has been emerging as the latest distributed computing paradigm, providing redundant, inexpensive and scalable resources on demand according to system requirements. Since late 2007 when the concept of cloud computing was proposed, it has been utilised in many areas with a certain degree of success. Meanwhile, cloud computing adopts a pay-as-you-go model where users are charged according to the usage of cloud services such as computation, storage and network services in the same manner as for conventional utilities in everyday life (e.g., water, electricity, gas and telephone). Cloud computing systems offer a new way to deploy computation and data-intensive applications. As Infrastructure as a Service (IaaS) is a very popular way to deliver computing resources in the cloud, the heterogeneity of the computing systems of one service provider can be well shielded by virtualisation technology. Hence, users can deploy their applications in unified resources without any infrastructure investment in the cloud, where ample processing power and storage can be obtained from commercial cloud service providers. Furthermore, cloud computing systems offer a new paradigm in which scientists from all over the world can collaborate and conduct their research jointly. As cloud computing systems are usually based on the Internet, scientists can upload their data and launch their applications in the cloud from anywhere in the world. In addition, as all the data are managed in the cloud, it is easy to share data among scientists.

However, new challenges also arise when we deploy a scientific application in the cloud. With the pay-as-you-go model, the resources need to be paid for by users; hence the total application cost for generated data sets in the cloud highly depends on the strategy used to store them. For example, storing all the generated application data sets in the cloud may result in a high storage cost since some data sets may be seldom used but large in size, but if we delete all the generated data sets and regenerate them every time they are needed, the computation cost may also be very high. Hence there should be a trade-off between computation and storage for deploying applications; this is an important and challenging issue in the cloud. By investigating this issue, this research proposes a new cost model, novel benchmarking approaches and innovative storage strategies, which would help both cloud service providers and users to reduce application costs in the cloud.
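The store-or-regenerate decision described above can be illustrated with a small calculation. This is a minimal sketch and not the cost model from the book: the function name, the prices and the usage figures are all hypothetical, and it assumes a single data set with no dependencies on other data sets.

```python
def cheaper_to_store(size_gb, storage_price_per_gb_day, regen_cost, uses_per_day):
    """Return True if keeping the data set costs less per day than regenerating it on demand.

    All parameters are illustrative assumptions, not values from the book.
    """
    storage_cost_per_day = size_gb * storage_price_per_gb_day
    regeneration_cost_per_day = regen_cost * uses_per_day
    return storage_cost_per_day < regeneration_cost_per_day


# A large, rarely used data set: 500 GB at $0.005 per GB per day costs $2.50 per day
# to keep, while regenerating it for $10 once every 20 days costs $0.50 per day.
rarely_used = cheaper_to_store(500, 0.005, 10.0, 1 / 20)

# A small, frequently used data set: 10 GB costs $0.05 per day to keep,
# while regenerating it for $10 twice a day costs $20 per day.
often_used = cheaper_to_store(10, 0.005, 10.0, 2)
```

Under these assumed figures, the first data set is cheaper to delete and regenerate, while the second is cheaper to keep.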

1.2 Key Issues of This Research

In the cloud, the application cost highly depends on the strategy of storing the large generated data sets due to the pay-as-you-go model. A good strategy is to find a balance to selectively store some popular data sets and regenerate the rest when needed, i.e. finding a trade-off between computation and storage. However, the generated application data sets in the cloud often have dependencies; that is, a computation task can operate on one or more data set(s) and generate new one(s). The decision about whether to store or delete an application data set impacts not only the cost of the data set itself but also that of other data sets in the cloud. To achieve the best trade-off and utilise it to reduce the application cost, we need to investigate the following issues:

1. Cost model. Users need a new cost model that can represent the amount that they actually spend on their applications in the cloud. Theoretically, users can get unlimited resources from the commercial cloud service providers for both computation and storage. Hence, for the large generated application data sets, users can flexibly choose how many to store and how many to regenerate. Different storage strategies lead to different consumptions of computation and storage resources and ultimately lead to different total application costs. The new cost model should be able to represent the cost of the applications in the cloud, which is the trade-off between computation and storage.

2. Minimum cost benchmarking approaches. Based on the new cost model, we need to find the best trade-off between computation and storage, which leads to the theoretical minimum application cost in the cloud. This minimum cost serves as an important benchmark for evaluating the cost-effectiveness of storage strategies in the cloud. For different applications and users, cloud service providers should be able to provide benchmarking services according to their requirements. Hence benchmarking algorithms need to be investigated, so that we can develop different benchmarking approaches to meet the requirements of different situations in the cloud.

3. Cost-effective dataset storage strategies. By investigating the trade-off between computation and storage, we determine that cost-effective storage strategies are needed for users to apply in their applications at run-time in the cloud. Unlike in benchmarking, the minimum cost storage strategy may not be the best strategy for applications in practice. First, storage strategies must be efficient enough to be applied at run-time in the cloud. Furthermore, users may have certain preferences concerning the storage of some particular data sets (e.g. tolerance of the accessing delay). Hence we need to design cost-effective storage strategies according to different requirements.
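To see why dependencies make the store-or-delete decision for one data set affect the cost of others, consider the following rough sketch. It is an illustration under stated assumptions rather than the book's cost model: the `DataSet` fields, the `total_cost_rate` function and all numbers are hypothetical, the data sets form a simple linear chain, and the original input to the first data set is assumed to be always available.

```python
from dataclasses import dataclass


@dataclass
class DataSet:
    name: str
    gen_cost: float      # cost to generate it from its direct predecessor
    storage_rate: float  # cost per day to keep it stored
    usage_rate: float    # expected uses per day


def total_cost_rate(chain, stored):
    """Cost per day of a linear chain under a given storage strategy.

    `chain` is ordered so each data set is derived from the one before it;
    `stored` is the set of names that are kept in storage.
    """
    total = 0.0
    # Accumulated generation cost since the nearest stored predecessor.
    regen = 0.0
    for d in chain:
        if d.name in stored:
            total += d.storage_rate
            regen = 0.0
        else:
            # A deleted data set must be rebuilt together with every
            # deleted data set between it and the nearest stored one.
            regen += d.gen_cost
            total += regen * d.usage_rate
    return total


chain = [
    DataSet("d1", gen_cost=10.0, storage_rate=1.5, usage_rate=0.1),
    DataSet("d2", gen_cost=20.0, storage_rate=0.5, usage_rate=0.2),
    DataSet("d3", gen_cost=5.0, storage_rate=2.0, usage_rate=0.05),
]

store_all = total_cost_rate(chain, {"d1", "d2", "d3"})
store_none = total_cost_rate(chain, set())
store_d2_only = total_cost_rate(chain, {"d2"})
```

With these made-up numbers, storing only `d2` is cheaper than either extreme, because keeping `d2` also shortens the regeneration path of `d3`.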

1.3 Overview of This Book

In particular, this book includes new concepts, solid theorems and complex algorithms, which form a suite of systematic and comprehensive solutions to deal with the issue of computation and storage trade-off in the cloud and bring cost-effectiveness to the applications for both users and cloud service providers. The remainder of this book is organised as follows.

In Chapter 2, we introduce the work related to this research. We start by introducing data management in some traditional scientific application systems, especially in grid systems, and then we move to the cloud. By introducing some typical cloud systems for scientific application, we raise the issue of cost-effectiveness in the cloud. Next, we introduce some works that also touch upon the issue of computation and storage trade-off and analyse the differences to ours. Finally, we introduce some works on the subject of data provenance which are the important foundation for our own work.

In Chapter 3, we first introduce a motivating example: a real-world scientific application from astrophysics that is used for searching for pulsars in the universe. Based on this example, we identify and analyse our research problems.

In Chapter 4, we first give a classification of the application data in the cloud and propose the important concept of the data dependency graph (DDG). A DDG is built on data provenance, which depicts the generation relationships of the data sets in the cloud. Based on the DDG, we propose a new cost model for data set storage in the cloud.
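As a rough illustration of what a graph of generation relationships makes possible, the sketch below represents a dependency graph as a plain dictionary and finds which deleted data sets would have to be regenerated to obtain a target. The representation, the function name and the data set names are hypothetical and chosen only for this example; the book's own DDG definition is not shown in this excerpt.

```python
def datasets_to_regenerate(ddg, stored, target):
    """Return the deleted data sets that must be regenerated to obtain `target`.

    `ddg` maps each data set name to the list of data sets it is derived from.
    Traversal stops at stored data sets, since those can be read directly.
    """
    needed = set()
    stack = [target]
    while stack:
        name = stack.pop()
        if name in stored or name in needed:
            continue
        needed.add(name)
        stack.extend(ddg.get(name, []))
    return needed


# Hypothetical provenance for a small workflow.
ddg = {
    "raw": [],
    "dedispersed": ["raw"],
    "candidates": ["dedispersed"],
    "folded": ["candidates", "raw"],
}

only_target = datasets_to_regenerate(ddg, {"raw", "candidates"}, "folded")
whole_path = datasets_to_regenerate(ddg, {"raw"}, "folded")
```

When `candidates` is stored, only `folded` itself needs regenerating; when it is not, the traversal walks back through every deleted predecessor.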

In Chapter 5, we develop novel minimum cost benchmarking approaches with algorithms for the best trade-off between computation and storage in the cloud. We propose two approaches, namely static on-demand benchmarking and dynamic on-the-fly benchmarking, to accommodate different application requirements in the cloud.
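The idea of a minimum cost benchmark can be illustrated by exhaustively trying every storage strategy on a tiny example. This brute-force search is a stand-in for illustration only and is not the benchmarking algorithm developed in the book; it takes exponential time, and the function names, the linear-chain assumption and all figures are hypothetical.

```python
from itertools import product


def cost_rate(chain, keep):
    """Cost per day of a linear chain; keep[i] says whether data set i is stored.

    Each chain entry is (gen_cost, storage_rate, usage_rate), all assumed values.
    """
    total, regen = 0.0, 0.0
    for (gen_cost, storage_rate, usage_rate), kept in zip(chain, keep):
        if kept:
            total += storage_rate
            regen = 0.0
        else:
            # Rebuild from the nearest stored predecessor.
            regen += gen_cost
            total += regen * usage_rate
    return total


def minimum_cost_benchmark(chain):
    """Try all 2**n store/delete combinations and return (lowest cost rate, strategy)."""
    best = min(
        product([False, True], repeat=len(chain)),
        key=lambda keep: cost_rate(chain, keep),
    )
    return cost_rate(chain, best), best


chain = [(10.0, 1.5, 0.1), (20.0, 0.5, 0.2), (5.0, 2.0, 0.05)]
benchmark_cost, benchmark_strategy = minimum_cost_benchmark(chain)
```

Any practical storage strategy can then be judged by how close its cost rate comes to `benchmark_cost` on the same input.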

In Chapter 6, we develop innovative cost-effective storage strategies for users to apply at run-time in the cloud. According to different user requirements, we design different strategies, i.e. a highly efficient cost-rate-based strategy and a highly cost-effective local-optimisation-based strategy.

In Chapter 7, we demonstrate experiment results to evaluate our work as described in the entire book. First, we introduce our cloud computing simulation environment, i.e. SwinCloud. Then we conduct general random simulations to evaluate the performance of our benchmarking approaches and storage strategies. Finally, we demonstrate a case study of the pulsar searching application in which all the research outcomes presented in this book are utilised.

Finally, in Chapter 8, we summarise the new ideas presented in this book and the major contributions of this research.

In order to improve the readability of this book, we have included a notation index in Appendix A; all proofs of theorems, lemmas and corollaries in Appendix B; and a related method in Appendix C.

Chapter Two

Literature Review

This chapter reviews the existing literature related to this research. It is organised as follows. In Section 2.1, we summarise the data management work about scientific applications in the traditional distributed computing systems. In Section 2.2, we first review some existing work about deploying scientific applications in the cloud and raise the issue of cost-effectiveness; we then analyse some research that has touched upon the issue of the trade-off between computation and storage and point out the differences to our work. In Section 2.3, we introduce some work about data provenance which is the important foundation for our work.

2.1 Data Management of Scientific Applications in Traditional Distributed Systems

Alongside the development of information technology (IT), e-science has also become increasingly popular. Since scientific applications are often computation and data intensive, they are now usually deployed in distributed systems to obtain high-performance computing resources and massive storage. Roughly speaking, one can make a distinction between two subgroups in the traditional distributed systems: clusters (including the HPC system) and grids.

Early studies about data management of scientific applications are in cluster computing systems. Since cluster computing is a relatively homogeneous environment that has a tightly coupled structure, data management in clusters is usually straightforward. The application data are commonly stored according to the system's capacity and moved within the cluster via a fast Ethernet connection while the applications execute.

(Continues...)



Excerpted from Computation and Storage in the Cloud by Dong Yuan, Yun Yang and Jinjun Chen. Copyright © 2013 by Elsevier Inc. Excerpted by permission of ELSEVIER. All rights reserved. No part of this excerpt may be reproduced or reprinted without permission in writing from the publisher.
Excerpts are provided by Dial-A-Book Inc. solely for the personal use of visitors to this web site.

Table of Contents

  1. Introduction
  2. Data management and cost-effectiveness
  3. Motivating example and research issues
  4. Cost model of dataset storage in the cloud
  5. Minimum cost benchmarking approaches
  6. Cost-effective dataset storage strategies
  7. Evaluations
  8. Conclusions

Customer Reviews

Average Rating: 5 (1 review)

  • Posted September 29, 2013

    Are you a cloud service provider? If you are, then this book is for you! Authors Dong Yuan, Yun Yang and Jinjun Chen have done an outstanding job of writing a book that shows you how to reduce the cost of storing large generated application data sets in the cloud.




    Authors Yuan, Yang and Chen begin by introducing the background and key issues of computation and storage in the cloud. Next, they focus on data management in some traditional scientific application systems (grid systems), and then move on to the cloud. Then, the authors examine a motivating example: a real-world scientific application from astrophysics that is used for searching for pulsars in the universe. They continue by giving a classification of the application data in the cloud, and propose the important concept of the data dependency graph. Next, the authors show you how to develop novel minimum cost benchmarking approaches with algorithms for the best trade-off between computation and storage in the cloud. Then, they show you how to develop innovative cost-effective storage strategies for a user to apply at run-time in the cloud. The authors continue by demonstrating experiment results to evaluate their work in a cloud computing simulation environment (SwinCloud). Finally, the authors summarize the new ideas presented in this book and the major contributions of their research.




    This most excellent book is the first comprehensive and systematic work investigating the issue of computation and storage trade-off in the cloud, in order to reduce the overall application cost.  The authors also proposed innovative concepts (theorems and algorithms) in this great book that helped bring the cost down dramatically for both cloud users and service providers to run computation and data-intensive scientific applications in the cloud. 