Test-Driven Data Analysis
Test-driven data analysis is the application of ideas from test-driven development of software to data-intensive work, including data science, data analysis, and data engineering. It is a methodology for improving the quality of data and of analytical pipelines and processes. It can be thought of as data analysis as if the answers actually matter.

Part I of the book describes the use of constraints for data validation and shows how tools can use a machine-learning-style approach to infer suitable constraints from data. This is illustrated using the command-line tools in the open-source Python tdda library, as well as alternative tools. These tools can be used regardless of the programming language in use.

Part II covers reference testing of analytical pipelines and processes. Again, the tdda library supports this, including automatic test generation for testing software written in any language. There are specific discussions of testing models and modeling, including the large language models used in chatbots, coding bots, and summarizers.

Part III covers broader kinds of errors that can occur in analysis, including

  • errors of formulation, when the problem domain, methods, or libraries are misinterpreted;
  • errors of process, which arise as developed analytical processes are used;
  • errors of applicability, when analytical processes are used in situations for which they were not designed;
  • errors of communication, when valid analyses are misinterpreted; and
  • errors of judgement, when harmful analyses are carried out.
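As a rough illustration of what inferring constraints from data means, here is a minimal pure-Python sketch. It is not the tdda library's API: the names `ColumnConstraints`, `discover`, and `verify` are hypothetical, and the sketch infers only a numeric range and whether nulls occur, whereas the tools described in the book may infer a wider range of constraint types.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class ColumnConstraints:
    """Constraints inferred for one numeric column (hypothetical structure)."""
    min_value: float
    max_value: float
    allow_nulls: bool


def discover(values: Sequence[Optional[float]]) -> ColumnConstraints:
    """Infer constraints from example data: the observed range and null presence."""
    present = [v for v in values if v is not None]
    if not present:
        raise ValueError("cannot infer a range from an all-null column")
    return ColumnConstraints(
        min_value=min(present),
        max_value=max(present),
        allow_nulls=len(present) < len(values),
    )


def verify(values: Sequence[Optional[float]], c: ColumnConstraints) -> List[str]:
    """Check new data against inferred constraints; return a list of violations."""
    failures = []
    nulls = sum(v is None for v in values)
    if nulls and not c.allow_nulls:
        failures.append(f"{nulls} null value(s) where none were expected")
    present = [v for v in values if v is not None]
    below = [v for v in present if v < c.min_value]
    above = [v for v in present if v > c.max_value]
    if below:
        failures.append(f"{len(below)} value(s) below minimum {c.min_value}")
    if above:
        failures.append(f"{len(above)} value(s) above maximum {c.max_value}")
    return failures


if __name__ == "__main__":
    # Infer constraints from "known good" data, then check a later batch.
    constraints = discover([3.0, 7.5, 4.2, 9.9])
    for problem in verify([5.0, None, 12.0], constraints):
        print(problem)
```

In this sketch, data arriving later are checked against constraints learned from earlier data, so a batch containing an unexpected null or an out-of-range value yields one message per kind of violation rather than silently flowing into the analysis.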

Extensive checklists are provided that can be used to improve quality before, during, and after analysis.

Test-driven data analysis can be thought of as a sibling to reproducible research, with similar concerns but a greater emphasis on automated testing and less need for a human to reproduce results.

Key Features:

  • Prevents costly errors in analytical processes from reaching production, using automated data validation and reference testing of data pipelines.
  • Provides actionable checklists for issues beyond the reach of automated testing.
  • Equips readers with open-source Python tools and language-agnostic command-line interfaces.
  • Addresses testing challenges for modern LLM-based systems, including chatbots and coding assistants.
  • Instills in analysts an inner voice that is always asking: “How is this misleading data misleading me?”

Nicholas Radcliffe is the Founder and Director of Stochastic Solutions Limited, a Scottish company providing consulting in data science, data analysis, and data engineering. Since 1995, he has also been a Visiting Professor in the Operations Research Group in the School of Mathematics at the University of Edinburgh. He is known for developing forma analysis (sic) of genetic algorithms and uplift modeling, and more recently for his work on test-driven data analysis.

by Nicholas J. Radcliffe

Hardcover, $79.99
Available for pre-order. This item will be released on April 21, 2026.

Product Details

ISBN-13: 9781032897158
Publisher: CRC Press
Publication date: 04/21/2026
Series: Chapman & Hall/CRC Data Science Series
Pages: 432
Product dimensions: 6.12 in (w) x 9.19 in (h)

Table of Contents

Foreword
Preface
Acknowledgements
Author
1 Orientation
I Data Validation with Constraints
  2 Data Validation
  3 Textual Data
  4 Profiling and Auditing Data
  5 Constraint Discovery and Validation
  6 Custom Constraints
  7 Practical Considerations
  8 Serial Data
II Reference Testing
  9 Introduction to Reference Tests
  10 Modern Software Testing
  11 Reference Tests for Analytical Pipelines
  12 Testing Models and Modeling
III Errors of Interpretation, of Process, & of Applicability
  13 Errors of Interpretation I: Formulation
  14 Errors of Interpretation II: Communication
  15 Errors of Interpretation III: Graphing Sins
  16 Errors of Process
  17 Errors of Applicability and Errors of Judgement
IV Appendices
  The TDDA Library, Resources, & Tools
  Glossary
  Bibliography
