Data Analysis with Open Source Tools

Price: $26.95 (List Price $39.99, save 32%)

NOOK Book (eBook): $17.99 (List Price $31.99, save 43%)

Collecting data is relatively easy, but turning raw information into something useful requires that you know how to extract precisely what you need. With this insightful book, intermediate to experienced programmers interested in data analysis will learn techniques for working with data in a business environment. You'll learn how to look at data to discover what it contains, how to capture those ideas in conceptual models, and then feed your understanding back into the organization through business plans, metrics dashboards, and other applications.

Along the way, you'll experiment with concepts through hands-on workshops at the end of each chapter. Above all, you'll learn how to think about the results you want to achieve — rather than rely on tools to think for you.

  • Use graphics to describe data with one, two, or dozens of variables
  • Develop conceptual models using back-of-the-envelope calculations, as well as scaling and probability arguments
  • Mine data with computationally intensive methods such as simulation and clustering
  • Make your conclusions understandable through reports, dashboards, and other metrics programs
  • Understand financial calculations, including the time-value of money
  • Use dimensionality reduction techniques or predictive analytics to conquer challenging data analysis situations
  • Become familiar with different open source programming environments for data analysis

"Finally, a concise reference for understanding how to conquer piles of data."—Austin King, Senior Web Developer, Mozilla

"An indispensable text for aspiring data scientists."—Michael E. Driscoll, CEO/Founder, Dataspora

Product Details

  • ISBN-13: 9780596802356
  • Publisher: O'Reilly Media, Incorporated
  • Publication date: 11/18/2010
  • Edition number: 1
  • Pages: 509
  • Sales rank: 952,547
  • Product dimensions: 7.00 (w) x 9.10 (h) x 1.20 (d) inches

Meet the Author

After previous careers in physics and software development, Philipp K. Janert currently provides consulting services for data analysis, algorithm development, and mathematical modeling. He has worked for small start-ups and in large corporate environments, both in the U.S. and overseas. He prefers simple solutions that work to complicated ones that don't, and thinks that purpose is more important than process. Philipp is the author of "Gnuplot in Action: Understanding Data with Graphs" (Manning Publications), and has written for the O'Reilly Network, IBM developerWorks, and IEEE Software. He is a named inventor on a handful of patents, and is an occasional contributor to CPAN. He holds a Ph.D. in theoretical physics from the University of Washington. Visit his company website at

Table of Contents

Before We Begin
  Conventions Used in This Book
  Using Code Examples
  Safari® Books Online
  How to Contact Us
Chapter 1: Introduction
  1.1 Data Analysis
  1.2 What’s in This Book
  1.3 What’s with the Workshops?
  1.4 What’s with the Math?
  1.5 What You’ll Need
  1.6 What’s Missing
Graphics: Looking at Data
Chapter 2: A Single Variable: Shape and Distribution
  2.1 Dot and Jitter Plots
  2.2 Histograms and Kernel Density Estimates
  2.3 The Cumulative Distribution Function
  2.4 Rank-Order Plots and Lift Charts
  2.5 Only When Appropriate: Summary Statistics and Box Plots
  2.6 Workshop: NumPy
  2.7 Further Reading
Chapter 3: Two Variables: Establishing Relationships
  3.1 Scatter Plots
  3.2 Conquering Noise: Smoothing
  3.3 Logarithmic Plots
  3.4 Banking
  3.5 Linear Regression and All That
  3.6 Showing What’s Important
  3.7 Graphical Analysis and Presentation Graphics
  3.8 Workshop: matplotlib
  3.9 Further Reading
Chapter 4: Time As a Variable: Time-Series Analysis
  4.1 Examples
  4.2 The Task
  4.3 Smoothing
  4.4 Don’t Overlook the Obvious!
  4.5 The Correlation Function
  4.6 Optional: Filters and Convolutions
  4.7 Workshop: scipy.signal
  4.8 Further Reading
Chapter 5: More Than Two Variables: Graphical Multivariate Analysis
  5.1 False-Color Plots
  5.2 A Lot at a Glance: Multiplots
  5.3 Composition Problems
  5.4 Novel Plot Types
  5.5 Interactive Explorations
  5.6 Workshop: Tools for Multivariate Graphics
  5.7 Further Reading
Chapter 6: Intermezzo: A Data Analysis Session
  6.1 A Data Analysis Session
  6.2 Workshop: gnuplot
  6.3 Further Reading
Analytics: Modeling Data
Chapter 7: Guesstimation and the Back of the Envelope
  7.1 Principles of Guesstimation
  7.2 How Good Are Those Numbers?
  7.3 Optional: A Closer Look at Perturbation Theory and Error Propagation
  7.4 Workshop: The Gnu Scientific Library (GSL)
  7.5 Further Reading
Chapter 8: Models from Scaling Arguments
  8.1 Models
  8.2 Arguments from Scale
  8.3 Mean-Field Approximations
  8.4 Common Time-Evolution Scenarios
  8.5 Case Study: How Many Servers Are Best?
  8.6 Why Modeling?
  8.7 Workshop: Sage
  8.8 Further Reading
Chapter 9: Arguments from Probability Models
  9.1 The Binomial Distribution and Bernoulli Trials
  9.2 The Gaussian Distribution and the Central Limit Theorem
  9.3 Power-Law Distributions and Non-Normal Statistics
  9.4 Other Distributions
  9.5 Optional: Case Study: Unique Visitors over Time
  9.6 Workshop: Power-Law Distributions
  9.7 Further Reading
Chapter 10: What You Really Need to Know About Classical Statistics
  10.1 Genesis
  10.2 Statistics Defined
  10.3 Statistics Explained
  10.4 Controlled Experiments Versus Observational Studies
  10.5 Optional: Bayesian Statistics: The Other Point of View
  10.6 Workshop: R
  10.7 Further Reading
Chapter 11: Intermezzo: Mythbusting: Bigfoot, Least Squares, and All That
  11.1 How to Average Averages
  11.2 The Standard Deviation
  11.3 Least Squares
  11.4 Further Reading
Computation: Mining Data
Chapter 12: Simulations
  12.1 A Warm-Up Question
  12.2 Monte Carlo Simulations
  12.3 Resampling Methods
  12.4 Workshop: Discrete Event Simulations with SimPy
  12.5 Further Reading
Chapter 13: Finding Clusters
  13.1 What Constitutes a Cluster?
  13.2 Distance and Similarity Measures
  13.3 Clustering Methods
  13.4 Pre- and Postprocessing
  13.5 Other Thoughts
  13.6 A Special Case: Market Basket Analysis
  13.7 A Word of Warning
  13.8 Workshop: Pycluster and the C Clustering Library
  13.9 Further Reading
Chapter 14: Seeing the Forest for the Trees: Finding Important Attributes
  14.1 Principal Component Analysis
  14.2 Visual Techniques
  14.3 Kohonen Maps
  14.4 Workshop: PCA with R
  14.5 Further Reading
Chapter 15: Intermezzo: When More Is Different
  15.1 A Horror Story
  15.2 Some Suggestions
  15.3 What About Map/Reduce?
  15.4 Workshop: Generating Permutations
  15.5 Further Reading
Applications: Using Data
Chapter 16: Reporting, Business Intelligence, and Dashboards
  16.1 Business Intelligence
  16.2 Corporate Metrics and Dashboards
  16.3 Data Quality Issues
  16.4 Workshop: Berkeley DB and SQLite
  16.5 Further Reading
Chapter 17: Financial Calculations and Modeling
  17.1 The Time Value of Money
  17.2 Uncertainty in Planning and Opportunity Costs
  17.3 Cost Concepts and Depreciation
  17.4 Should You Care?
  17.5 Is This All That Matters?
  17.6 Workshop: The Newsvendor Problem
  17.7 Further Reading
Chapter 18: Predictive Analytics
  18.1 Topics in Predictive Analytics
  18.2 Some Classification Terminology
  18.3 Algorithms for Classification
  18.4 The Process
  18.5 The Secret Sauce
  18.6 The Nature of Statistical Learning
  18.7 Workshop: Two Do-It-Yourself Classifiers
  18.8 Further Reading
Chapter 19: Epilogue: Facts Are Not Reality
Programming Environments for Scientific Computation and Data Analysis
  Software Tools
  A Catalog of Scientific Software
  Writing Your Own
  Further Reading
Results from Calculus
  Common Functions
  Useful Tricks
  Notation and Basic Math
  Where to Go from Here
  Further Reading
Working with Data
  Sources for Data
  Cleaning and Conditioning
  Data File Formats
  The Care and Feeding of Your Data Zoo
  Further Reading
About the Author

Customer Reviews

Average Rating: 2.5 (2 reviews)
  • Anonymous

    Posted December 6, 2011

    Not a good review data analysis with oss

    If someone uses R for one example, why use Python for another?

    0 out of 1 people found this review helpful.

  • Anonymous

    Posted August 9, 2013

    No text was provided for this review.
