Deduplication For Large Scale Backup And Archival Storage

Overview

This dissertation provides scalable solutions to problems unique to chunk-based deduplication, which backup and archival storage systems use to reduce storage space requirements. We show how to conduct similarity-based searches over large repositories and how to scale out these searches as the repository grows; how to deduplicate low-locality file-based workloads and how to scale out deduplication through parallelization and through data and index organization; how to build a unified deduplication solution that can adapt to tape-based and file-based workloads; and how to introduce strategic redundancies in deduplicated data to improve the overall robustness of the system.
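As a rough illustration of what chunk-based deduplication means, the sketch below splits data into chunks, identifies each chunk by a hash of its content, and stores each distinct chunk only once. It is a minimal sketch under stated assumptions, not the dissertation's implementation: the fixed 4-byte chunk size, the use of SHA-256, and the names `ChunkStore`, `put`, and `get` are all hypothetical choices made for readability.

```python
import hashlib

CHUNK_SIZE = 4  # assumption: tiny fixed-size chunks, chosen only to keep the demo readable


def chunk_fixed(data: bytes, size: int = CHUNK_SIZE) -> list:
    """Split data into consecutive fixed-size chunks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


class ChunkStore:
    """Hypothetical in-memory store that keeps one copy of each distinct chunk."""

    def __init__(self):
        self.chunks = {}  # chunk ID (hex digest) -> chunk bytes

    def put(self, data: bytes) -> list:
        """Store an object and return its recipe: the ordered list of chunk IDs."""
        recipe = []
        for chunk in chunk_fixed(data):
            cid = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(cid, chunk)  # a chunk already present is not stored again
            recipe.append(cid)
        return recipe

    def get(self, recipe: list) -> bytes:
        """Rebuild an object from its recipe."""
        return b"".join(self.chunks[cid] for cid in recipe)


store = ChunkStore()
recipe_1 = store.put(b"AAAABBBBCCCC")
recipe_2 = store.put(b"AAAABBBBDDDD")  # shares its first two chunks with the first object
stored_bytes = sum(len(c) for c in store.chunks.values())
print(len(store.chunks), stored_bytes)  # 4 distinct chunks, 16 bytes stored for 24 logical bytes
```

The two objects share two chunks, so only four chunks are kept. The shared chunks are also the reason the loss of a single chunk can affect more than one object.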

Given an object, our scalable similarity-based search solution finds highly similar objects within a large store by examining only a small subset of that object's features. We show how to partition the feature index to scale out the search, and how to select a small subset of the partitions (less than 3%), independent of object size and based on the content of the query object alone, to conduct distributed similarity-based searches.
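One way to picture a search that consults only a few features and a few index partitions is sketched below. The overview does not say how features are extracted or how partitions are chosen, so everything here is an assumption made for illustration: features are hashes of overlapping 4-byte windows, the sketch of an object is its smallest few feature hashes, and a feature is routed to a partition by its hash value modulo the partition count. The names `PartitionedIndex`, `sketch`, `NUM_PARTITIONS`, and `SKETCH_SIZE` are hypothetical.

```python
import hashlib
from collections import Counter, defaultdict

NUM_PARTITIONS = 128  # hypothetical number of index partitions
SKETCH_SIZE = 2       # hypothetical number of features examined per object


def features(obj: bytes, width: int = 4) -> set:
    """Hash every overlapping `width`-byte window of the object to an integer."""
    last = max(len(obj) - width + 1, 1)
    return {
        int.from_bytes(hashlib.sha256(obj[i:i + width]).digest()[:8], "big")
        for i in range(last)
    }


def sketch(obj: bytes) -> list:
    """Keep only the smallest few feature hashes, whatever the object size."""
    return sorted(features(obj))[:SKETCH_SIZE]


class PartitionedIndex:
    """Feature index split into partitions; each feature lives in exactly one."""

    def __init__(self):
        self.partitions = [defaultdict(set) for _ in range(NUM_PARTITIONS)]

    def add(self, name: str, obj: bytes) -> None:
        for f in sketch(obj):
            self.partitions[f % NUM_PARTITIONS][f].add(name)

    def query(self, obj: bytes):
        """Return (candidates ranked by shared features, partitions consulted)."""
        votes, touched = Counter(), set()
        for f in sketch(obj):
            p = f % NUM_PARTITIONS  # partition chosen from the query's content alone
            touched.add(p)
            for name in self.partitions[p].get(f, ()):
                votes[name] += 1
        return votes.most_common(), touched


index = PartitionedIndex()
index.add("a", b"the quick brown fox jumps over the lazy dog")
index.add("b", b"0123456789" * 4)
hits, touched = index.query(b"the quick brown fox jumps over the lazy dog")
print(hits[0][0], len(touched))
```

Because the sketch has a fixed size, a query consults at most `SKETCH_SIZE` partitions regardless of how large the object is, which is the property the paragraph above describes.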

We show how to deduplicate low-locality file-based workloads using Extreme Binning. Extreme Binning uses file similarity to find duplicates accurately and makes only one disk access for chunk lookup per file to yield reasonable throughput. Multi-node backup systems built with Extreme Binning scale gracefully with the data size. Each backup node is autonomous: there is no dependency between nodes, which makes housekeeping tasks robust and low-overhead.
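The sketch below shows one plausible shape for a similarity-driven lookup that needs at most one bin read per file. It is not taken from the overview, which does not describe the data structures. The assumptions are: a file's representative is its minimum chunk hash, a small in-memory index maps each representative to a bin, bins stand in for on-disk structures, and chunks are fixed-size. `BinningNode`, `backup`, and `bin_reads` are hypothetical names.

```python
import hashlib


def chunk_ids(data: bytes, size: int = 4) -> list:
    """Chunk IDs for fixed-size chunks (assumption: real systems chunk differently)."""
    return [hashlib.sha256(data[i:i + size]).hexdigest()
            for i in range(0, len(data), size)]


class BinningNode:
    """Hypothetical single backup node with a RAM index and simulated on-disk bins."""

    def __init__(self):
        self.primary = {}   # in RAM: representative chunk ID -> bin key
        self.bins = {}      # stands in for disk: bin key -> set of known chunk IDs
        self.bin_reads = 0  # number of simulated disk accesses for chunk lookup

    def backup(self, data: bytes) -> int:
        """Deduplicate one file; return how many of its chunks were new."""
        ids = set(chunk_ids(data))
        if not ids:
            return 0
        rep = min(ids)  # assumption: similar files tend to share their minimum chunk hash
        if rep not in self.primary:
            # No similar file seen before: create a bin without reading anything.
            self.primary[rep] = rep
            self.bins[rep] = set(ids)
            return len(ids)
        # A similar file exists: load exactly one bin and deduplicate against it only.
        self.bin_reads += 1
        known = self.bins[self.primary[rep]]
        fresh = ids - known
        known |= fresh
        return len(fresh)


node = BinningNode()
first = node.backup(b"AAAABBBBCCCC")    # three new chunks, no bin read
second = node.backup(b"AAAABBBBCCCC")   # identical file: one bin read, nothing new
third = node.backup(b"AAAABBBBCCCCDDDD")
print(first, second, third, node.bin_reads)
```

In this sketch a file whose representative differs from that of a similar file is placed in a separate bin, so some duplicates can be missed; that is the price of limiting each file to a single lookup. Since a node consults only its own index and bins, nodes built this way would not depend on one another.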

We build a 'unified deduplication' solution that can adapt to and deduplicate a variety of workloads. Some workloads consist of large byte streams with high locality; others are made up of files of varying sizes with no locality between them. Separate deduplication solutions exist for each kind of workload, but so far no unified solution works well for all of them. Our unified deduplication solution simplifies administration, because organizations do not have to deploy a dedicated solution for each kind of workload, and it yields better storage space savings than dedicated solutions because it deduplicates across workloads.

Deduplication reduces storage space requirements by allowing common chunks to be shared between similar objects. This sharing reduces the reliability of the storage system, because the loss of a few shared chunks can lead to the loss of many objects. We show how to eliminate this problem by choosing for each chunk a replication level that is a function of the amount of data that would be lost if that chunk were lost. Experiments show that this technique can achieve significantly higher robustness than a conventional approach combining data mirroring and Lempel-Ziv compression while requiring about half the storage space.
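To make the idea of a per-chunk replication level concrete, the sketch below computes, for each chunk, the total size of the objects that reference it and maps that amount to a number of copies. The overview does not give the actual function, so the logarithmic rule and the `base`, `max_copies`, and `unit` parameters are purely illustrative assumptions, as is the name `replication_levels`.

```python
import math
from collections import defaultdict


def replication_levels(objects: dict, base: int = 1, max_copies: int = 4,
                       unit: int = 1024) -> dict:
    """Map each chunk ID to a copy count that grows with the data at risk.

    objects: object name -> (size in bytes, list of chunk IDs it references).
    Assumption: one extra copy per doubling of at-risk bytes beyond `unit`.
    """
    at_risk = defaultdict(int)  # chunk ID -> bytes of objects lost if the chunk is lost
    for size, chunks in objects.values():
        for cid in set(chunks):  # count each object once per chunk
            at_risk[cid] += size

    levels = {}
    for cid, lost in at_risk.items():
        extra = int(math.log2(max(lost / unit, 1)))
        levels[cid] = min(base + extra, max_copies)
    return levels


objects = {
    "a": (1024, ["x", "y"]),
    "b": (1024, ["x", "z"]),
    "c": (2048, ["x"]),
}
levels = replication_levels(objects)
print(levels)  # chunk "x" is shared by all three objects and gets the most copies
```

Here losing chunk "x" would take 4096 bytes of objects with it, so it receives three copies, while "y" and "z" each protect a single 1024-byte object and keep one copy.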

Product Details

  • ISBN-13: 9781244576698
  • Publisher: BiblioLabsII
  • Publication date: 9/30/2011
  • Pages: 198
  • Product dimensions: 7.44 (w) x 9.69 (h) x 0.42 (d)