Data Management for Researchers

A comprehensive guide to everything scientists need to know about data management, this book is essential for researchers who need to learn how to organize, document and take care of their own data.

Researchers in all disciplines are faced with the challenge of managing the growing amounts of digital data that are the foundation of their research. Kristin Briney offers practical advice and clearly explains policies and principles, in an accessible and in-depth text that will allow researchers to understand and achieve the goal of better research data management.

Data Management for Researchers includes sections on:

* The data problem – an introduction to the growing importance and challenges of using digital data in research. Covers both the inherent problems with managing digital information and how the research landscape is changing to give more value to research datasets and code.

* The data lifecycle – a framework for data’s place within the research process and how data’s role is changing. Greater emphasis on data sharing and data reuse will not only change the way we conduct research but also how we manage research data.

* Planning for data management – covers the many aspects of data management and how to put them together in a data management plan. This section also includes sample data management plans.

* Documenting your data – an often overlooked part of the data management process, but one that is critical to good management; data without documentation are frequently unusable.

* Analyzing your data – covers managing information through the analysis process. This section starts by comparing the management of raw and analyzed data and then describes ways to make analysis easier, such as spreadsheet best practices. It also examines practices for research code, including version control systems.

* Managing secure and private data – many researchers are dealing with data that require extra security. This section outlines what data falls into this category and some of the policies that apply, before addressing the best practices for keeping data secure.

* Short-term storage – deals with the practical matters of storage and backup and covers the many options available. This section also goes through the best practices to ensure that data are not lost.

* Preserving and archiving your data – digital data can have a long life if properly cared for. This section covers managing data in the long term, including choosing good file formats and media, as well as determining who will manage the data in the long term.

* Sharing/publishing your data – the reasons for and against data sharing and some of the practical aspects of sharing. This section covers intellectual property and licenses for datasets, before ending with the altmetrics that measure the impact of shared data.

* Collaborations and data – this section addresses how to make data sharing across research groups easier. It covers the practical aspects of systems for collaboration as well as policy concerns like ownership.

* Reusing data – as more data are shared, it becomes possible to use outside data in your research. This chapter discusses strategies for finding datasets and lays out how to cite data once you have found it.

This book is designed for active scientific researchers but it is useful for anyone who wants to get more from their data: academics, educators, professionals or anyone who teaches data management, sharing and preservation.

Paperback(New Edition)

$47.99

Product Details

ISBN-13:	9781784270117
Publisher:	Pelagic Publishing
Publication date:	11/01/2015
Series:	Research Skills
Edition description:	New Edition
Pages:	250
Product dimensions:	6.10(w) x 9.10(h) x 0.20(d)

About the Author

Kristin Briney has a PhD in physical chemistry and a Master’s degree in library and information studies from the University of Wisconsin-Madison, and currently works in an academic library, advising researchers on data management planning. Her blog can be found at www.dataabinitio.com.

Read an Excerpt

CHAPTER 1

THE DATA PROBLEM

On July 20, 1969, Neil Armstrong climbed out of his spacecraft and placed his feet on the moon. The landing was broadcast live all over the world and was a significant event in both scientific and human history. Today, we can still watch the grainy video of the moon landing but what we cannot do is watch the original, higher quality footage or examine some of the data from this mission. This is because much of the data from early space exploration is lost forever.

Among the lost data are the original Apollo 11 tapes containing high-quality video footage of the moon landing. Their loss first came to light in 2006 (Macey 2006) and NASA personnel spent the next three years searching for the tapes across multiple continents before concluding that they were likely wiped and reused for data storage sometime in the 1970s (NASA 2009; O'Neal 2009; Pearlman 2009). Other data from this era fared better but at the cost of significant time and money. The Lunar Orbiter Image Recovery Project (LOIRP 2014), for example, spent years and well over half a million dollars recovering images taken of the moon by the five Lunar Orbiter spacecraft missions preparing for the moon landing in 1969 (Wood 2009; Turi 2014). The project required finding specialized and obsolete hardware to read the original magnetic tapes, reconstructing how to process the raw data into high-quality images, decoding the labeling scheme on each of the tapes, and doing all of this with little to no documentation. Only the cultural importance of the data on these tapes, such as the first image of the earth as seen from the moon, made such efforts worthwhile.

The story of this momentous occasion in scientific history ends with an all-too-common example of failing to plan for data management. Almost 50 years later, researchers are still inadvertently destroying data or having trouble finding data that still exists. A recent study of biology data, for example, found that data disappears at a rate of 17% per year after publishing the results (Vines et al. 2014). Another estimate says that 31% of all PC users have suffered complete data loss due to events outside of their control; this correlates with 6% of PCs losing data in any given year (Anon 2014a). Unfortunately, very few of us have significant resources – as with the lunar data projects – to recover our own data when something happens to it. Lost, misplaced, and even difficult-to-understand data represents a real cost in terms of time and money. Fortunately, there are practices you can use to make it easier to find and use your data when you need it; those practices are collectively called "data management".
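To get a feel for how the annual rates quoted above add up over time, here is a minimal sketch in Python. It rests on assumptions that are not in the source: that each year is independent, that the rate stays constant, and that the 17% figure can be read as a simple year-on-year decline. The function names are hypothetical and chosen only for illustration.

```python
# Sketch only: compound a constant annual rate over several years.
# Assumptions (not from the source): years are independent and the
# rate does not change from year to year.

def cumulative_loss(annual_rate: float, years: int) -> float:
    """Chance of at least one loss within `years` at a fixed annual rate."""
    return 1.0 - (1.0 - annual_rate) ** years


def surviving_fraction(annual_decline: float, years: int) -> float:
    """Fraction still available after `years` of a fixed annual decline."""
    return (1.0 - annual_decline) ** years


# 6% of PCs losing data per year, viewed over a six-year span
six_year_loss = cumulative_loss(0.06, 6)

# 17% yearly decline in availability, viewed ten years after publication
ten_year_survival = surviving_fraction(0.17, 10)

print(f"Chance of at least one loss in 6 years: {six_year_loss:.1%}")
print(f"Share of datasets still available after 10 years: {ten_year_survival:.1%}")
```

Under these simplifying assumptions, a 6% annual rate compounds to roughly the 31% figure after about six years, which is one plausible way the two numbers could relate; the source does not state how they were derived.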

At its most essential, data management is about taking care of your data better so that you don't experience small frustrations when actively working with your data, like having trouble finding documentation for a particular dataset, or bigger problems after a project ends, like lost data. Having well-managed data means that you can find a particular dataset, will have all of the notes you need, can prevent a security breach, can easily use a co-worker's data, and can manage the chaos of an ever-growing number of digital files. Basically, many of the little headaches that researchers often encounter around data during the research process can be prevented through good data management. Just as you need to periodically clean your home, so too should you do regular upkeep on your data.

The good news is that dealing with your digital research data does not have to be difficult, though it is different than managing analog content. This book will show you many practices you can use to take care of your research data better. The ultimate goal is for you to be able to easily find and use your data when needed, whether it is historic 50-year-old data or the critical dissertation data you collected last week.

1.1 WHY IS EVERYONE TALKING ABOUT DATA MANAGEMENT?

"Data management" is a relatively new term within research, arising in the mid-2000s with funder requirements for both data management and data sharing. Such mandates gained momentum in the UK with the 2011 Common Principles on Data Policy from Research Councils UK (Research Councils UK 2011) and in the United States with the National Science Foundation's data management plan requirement in 2011 (NSF 2013). Data management and sharing policies are now becoming commonplace in science, with recent adoption by journals such as Science (Science/AAAS 2014), Nature (Nature Publishing Group 2006), and PLOS (Bloom 2013). The overall trend is for increased data management but let's examine why this trend exists in the first place.

We cannot discuss the rise in data management requirements without examining its partner, data sharing. The two concepts often pair together in addressing similar problems in the scientific process, such as limited resources, reproducibility issues, and advancing science at a faster rate (Borgman 2012). The pairing also occurs because well-managed data requires less preparation for sharing. Taken as a whole, most of the reasons why you are now required to manage and share your data are external, though there are many personal benefits to having well managed research data, which we will examine throughout the book.

One of the main reasons behind the implementation of data management and sharing requirements relates to money. The rise of data management requirements roughly coincided with the global economic recession of the late 2000s when many research funding groups faced smaller budgets. With limited resources, funders want to be sure that researchers are making the best use of those resources, for example, by preventing the common occurrence of losing data at the end of a project (Vines et al. 2014). Public funders face additional pressure to make research products like articles and data available to the public who support the research; the current default is that these resources are locked behind paywalls, or are not even made available in the first place. By requiring data management and sharing, funders can not only stem the loss of important data but also provide accountability to those ultimately paying for the research. As an added benefit, any data reuse – either by the original researcher or other researchers – means that the same amount of money will result in more research because data usually costs more to collect than to reuse. Therefore, many research funders see data management and sharing requirements as advantageous.

Another key reason for data management and sharing policies is the prevalence of digital data in scientific research. Research data is digital on a scale never seen before, which opens up a whole new set of possibilities in scientific research. First, digital data is shareable in a way not easily done with physical samples and paper-based measurements. It's simple to copy and paste digital values, attach a file to an email, or upload a dataset to the web, meaning it's easy to share research data. We are discussing data sharing so much more because it's actually possible to share data on a global scale. We also generate more data than ever before. The world created an estimated 1.8 zettabytes (1.8 × 10^21 bytes) of digital content in 2011, a number which is expected to be 50 times bigger in 2020 (EMC 2011; Mearian 2011). Researchers are seeing a similar increase in not only their own data but also the availability of external data. This changes the types of analysis scientists can do. You can now perform meta-analysis or correlate your data with third-party data you would otherwise not be able to collect. Researchers lacking funding or from less developed countries are now able to take part in cutting-edge research because of shared data. Basically, by sharing research data, we open up scientific research to many new types of analysis and can advance scientific research at a faster rate.
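The growth estimate above (1.8 zettabytes in 2011, expected to be 50 times bigger in 2020) implies a yearly growth rate that is easy to work out. The short Python sketch below does the arithmetic, assuming steady compound growth across the nine years, which is a simplification not stated in the source. The function name is hypothetical.

```python
# Sketch only: yearly growth rate implied by a 50-fold increase over
# 2011-2020, assuming steady compound growth (a simplification).

def implied_annual_growth(factor: float, years: int) -> float:
    """Constant yearly growth rate that yields `factor` after `years`."""
    return factor ** (1.0 / years) - 1.0


volume_2011_zb = 1.8          # zettabytes created in 2011, per the text
growth_factor = 50            # expected multiple by 2020, per the text
years = 2020 - 2011

annual_growth = implied_annual_growth(growth_factor, years)
volume_2020_zb = volume_2011_zb * growth_factor

print(f"Implied growth per year: {annual_growth:.1%}")
print(f"Projected 2020 volume: {volume_2020_zb:.0f} zettabytes")
```

On that assumption, the quoted figures correspond to the volume of digital content growing by a little over half each year.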

In spite of all the benefits of digital data, digital data is fragile. If you do not care for your data, many things can go wrong. Storage devices become corrupt, files are lost, software becomes out-of-date, and media become obsolete. Most people have digital files from ten years ago that they cannot use. However, this does not have to be the fate of your research data. Data is a valuable research product that should be treated with care and data management requirements are one way to make that happen.

Finally, data management and sharing policies arose in response to recent reproducibility crises in several scientific disciplines. For example, prominent psychology researcher Diederik Stapel prompted a reproducibility crisis in his field when it came to light that he committed widespread data fabrication (Bhattacharjee 2013); Stapel amassed over 50 retractions as a result (Oransky 2013a). In economics, a graduate student, Thomas Herndon, proved that the seminal paper supporting economic austerity policies was fundamentally flawed after examining the raw dataset behind the paper (Alexander 2013). In medical research, a study of cancer researchers found that half of the survey respondents had trouble reproducing published results at some point in time (Mobley et al. 2013). Clearly, there is a reproducibility crisis in scientific research as these stories represent just a few highlights of reproducibility issues in recent years. Adding to this is the fact that it can be difficult to tell from an article alone whether a study is reproducible because "a scientific publication is not the scholarship itself, it is merely advertising of the scholarship" (Buckheit and Donoho 1995). The creation of data management and sharing policies is one response to this reproducibility crisis, as these policies help ensure that data is available for review should questions arise about the research. Misconduct investigations are also starting to look at data management. For example, investigations leading to the high-profile retractions of two STAP (stimulus-triggered acquisition of pluripotency) stem cell papers in 2014 "found inadequacies in data management, record-keeping and oversight" (Anon 2014b). With a growing number of retractions in recent years (Fanelli 2013; Steen et al. 2013), good data management increasingly needs to be part of a good defense against charges of fabrication or even more political attacks on high-profile research.

All of these issues – limited funding, the ease of sharing digital data, the availability of new types of analysis, the fragility of digital data, and reproducibility issues within scientific research – coincided to provide an optimal environment for the creation of data management and sharing policies. Along with new policy requirements, they will continue to fuel the drive toward better data management in scientific research.

1.2 WHAT IS DATA MANAGEMENT?

While many researchers were introduced to the concept of data management through a funder's requirement to write a data management plan, there's actually a lot more to data management than planning. Moreover, it's the data management you do after writing the plan that really helps in your research. This section covers what "data management" actually entails, but first we need to define what is meant by the "data" portion of "data management".

1.2.1 Defining data

Defining research data is challenging because data by its very nature is heterogeneous. Research fields are diverse and even specific subfields use a huge variety of data types. So instead of limiting ourselves to one definition of data – which likely doesn't cover everything – let's explore several definitions.

In the United States, research data created under federal funding falls under the definition of data in OMB Circular A-81:

Research data means the recorded factual material commonly accepted in the scientific community as necessary to validate research findings, but not any of the following: preliminary analyses, drafts of scientific papers, plans for future research, peer reviews, or communications with colleagues. This "recorded" material excludes physical objects (e.g., laboratory samples). Research data also do not include:

(i) Trade secrets, commercial information, materials necessary to be held confidential by a researcher until they are published, or similar information which is protected under law; and

(ii) Personnel and medical information and similar information the disclosure of which would constitute a clearly unwarranted invasion of personal privacy, such as information that could be used to identify a particular person in a research study. (White House Office of Management and Budget 2013)

This definition is very broad, covering anything necessary to validate research findings, but is helpful in that it outlines what definitely is not research data. These exclusions are particularly useful when complying with data sharing requirements, because they tell you what you are not required to share.

More globally, the Organisation for Economic Co-operation and Development (OECD), consisting of 34 member nations, provides a similar definition in their "Principles and Guidelines for Access to Research Data from Public Funding":

"Research data" are defined as factual records (numerical scores, textual records, images, and sounds) used as primary sources for scientific research, and that are commonly accepted in the scientific community as necessary to validate research findings. A research data set constitutes a systematic, partial representation of the subject being investigated.

This term does not cover the following: laboratory notebooks, preliminary analyses, and drafts of scientific papers, plans for future research, peer reviews, or personal communication with colleagues or physical objects (e.g. laboratory samples, strains of bacteria and test animals such as mice). (Organisation for Economic Co-operation and Development 2007)

This report focuses on the sharing of digital datasets so this definition of data skews toward digital content. In actuality, physical samples can be research data and, in some cases, fall under data sharing requirements (see Chapter 10).

You may also see data defined by type. Just as social science data often falls into one of two categories – quantitative or qualitative – so too does scientific data fall into specific groups. For scientific research data, those categories are:

• Observational data

• Experimental data

• Simulation data

• Compiled data

Observational data results from monitoring events, often at a specific time and place, and yields data such as species counts and weather measurements. Scientists produce experimental data in highly controlled environments so that similar conditions will always result in similar data; examples of experimental data are spectra of chemical reaction products and the measurements coming from the Large Hadron Collider. The third category of scientific data is simulation data, which results from computer models of scientific systems. Global warming simulations and optimized protein folding pathways represent two types of simulation data. The final category, compiled data, applies when you amass data from other sources for secondary use, such as performing meta-analysis or building a database containing a variety of data on one topic. While not perfect, most of the content we consider to be scientific research data fits into one of these four categories.

For this book, we'll use a broad definition of research data: data is anything you perform analysis upon. This means that data can be spreadsheets of numbers, images and video, text, or another type of content necessary for your research. Data can also be physical samples or paper-based measurements, though analog content usually has fairly established management practices. In the end, it's simply too much to try to define every possible type of data and a broad definition of data allows you, the researcher, to be generous in identifying content that you need to manage better.

1.2.2 Defining data management

If you've ever gotten halfway through a project and thought "why didn't I write down that information?" or "where did I put that file?" or "why didn't I back up my data?" then you could benefit from data management. Data management is the compilation of many small practices that make your data easier to find, easier to understand, less likely to be lost, and more likely to be usable during a project or ten years later. Data management is fundamentally about taking care of one of the most important things you create during the research process: your data.

Data management involves many practices. We will examine these more in Chapter 2. Briefly, data management includes data management planning, documenting your data, organizing your data, improving analysis procedures, securing sensitive data properly, having adequate storage and backups during a project, taking care of your data after a project, sharing data effectively, and finding data for reuse in a new project. Such a wide range of practices means that data management is something you do before the start of a research project, during the project, and after the project's completion.

(Continues…)

Excerpted from "Data Management for Researchers" by Kristin Briney.
Copyright © 2015 Kristin Briney.
Excerpted by permission of Pelagic Publishing.
All rights reserved. No part of this excerpt may be reproduced or reprinted without permission in writing from the publisher.
Excerpts are provided by Dial-A-Book Inc. solely for the personal use of visitors to this web site.

Table of Contents

About the author
Acknowledgements
Chapter 1 The data problem
Chapter 2 The data lifecycle
Chapter 3 Planning for data management
Chapter 4 Documentation
Chapter 5 Organization
Chapter 6 Improving data analysis
Chapter 7 Managing sensitive data
Chapter 8 Storage and backups
Chapter 9 Long-term storage and preservation
Chapter 10 Sharing data
Chapter 11 Data reuse and restarting the data lifecycle
References
Index


Editorial Reviews

Kristin Briney's Data Management for Researchers is a book that should be on the shelf (physical or virtual) of every librarian, researcher and research administrator. Scientists, engineers, social scientists, humanists - anyone whose work involves generating and keeping track of digital data: this is the book for you.
.... I recommend this book without hesitation for all academic libraries. Individual researchers, research administrators, funding agency employees and academic librarians would all find much useful information. Simply giving a copy to new graduate students is probably a worthwhile investment at any institution.
http://scienceblogs.com/confessions/2016/01/11/reading-diary-data-management-for-researchers-organize-maintain-and-share-your-data-for-research-success-by-kristin-briney/
This concise book offers sound advice on all aspects of data management, in an accessible style. The intended audience is researchers, as the title makes clear. And I could easily imagine it being bought and recommended to PhD students to be read as part of their first year training. The author has a PhD in Chemistry, and most examples are drawn from the natural sciences, rather than social science, but much of the content would also be relevant to social scientists. It might be less easy for a humanities researcher to relate to, however.
The book is pretty comprehensive in its coverage of the topic. The opening chapter gives some context for the importance of managing research data, with a little about the context of policy change, but Briney presents it mostly as a practical issue, with a concern for working effectively and not losing data. The book's dedication is "In memory of data lost." The chapter then reviews a range of definitions and I really liked the definition she herself coins that data is "anything you perform analysis on." This is a really good universal definition, complementing the approaches that list lots of examples - though arguably the definition does not differentiate data from literature. She also discusses whether the word data should be singular or plural, opting for the former.
Chapter 2 considers the concept of the data lifecycle. She contrasts an old linear model ending in publishing, with a more complex, cyclic model, that repeats itself, through reuse. Usefully she also maps subsequent chapters to this lifecycle, as her "data roadmap". This serves as a neat outline of the whole book.
Chapter 3 is about data management planning. Briney presents the key questions that need to be answered in any DMP. She then reviews policies that might affect the DMP, be they about privacy, retention, ownership etc. This section I found a bit general, but as she is trying to write for an international audience, and for researchers in many subjects, it would be hard to be more specific. I think if I were a researcher I would still be wondering how to be sure I had a comprehensive view of the policies that affected my own work. Also, she does not stress university policies very much. Case studies include a DMP for the book itself.
Chapter 4 covers documentation including lab notebooks for recording the research process, protocols for methods, data dictionaries, metadata and standards (general and subject specific). She gives examples of standards for some of the main science disciplines - but again, given the proliferation of standards within research specialisms this is inevitably a quite broad brush stroke.
Chapter 5 explores file organization, including filenaming conventions and version control. There is also an introduction to databases. Chapter 6 is entitled "improving your data analysis". There is advice here on processing data and on using spreadsheets effectively. It also offers advice on managing code, including how to use GitHub. This is getting further into advice on the research process itself.
Chapter 7 is on managing sensitive data, and includes advice on basic computer security, including encryption. She also outlines approaches to anonymization. Chapter 8 addresses storage and back-ups. Again the advice is very practical. Chapter 9 is about long term storage and preservation - and has especially useful advice on how to collect data with preservation in mind.
Chapter 10 explores data sharing. The first part of the chapter is mostly about issues around licensing data, followed by public sharing of data. Given the importance of open data in current policy agendas, this section seemed a bit buried. Chapter 11 deals with data reuse, explaining how to go about finding data for secondary analysis and how to cite such resources.
Thus the coverage of data management in the book is pretty comprehensive; the structure logical. Briney writes from e
... recommended as a textbook for graduate-level research techniques courses. It's an important resource for academic and special library shelves and a vital reference for anyone working with data.
Apparently, NASA lost much of the early data from space exploration, including high quality video footage of the first moon landing. All the more reason to do as it says in the sub-title to the book.
Briney has written a useful primer on data management for researchers which provides practical advice throughout on managing data. It is easy to read and clearly structured. http://www.ariadne.ac.uk/issue75/cole
Data Management for Researchers: Organize, Maintain and Share Your Data for Research Success joins others in the 'Research Skills' series with an in-depth guide to data management and manipulation created especially with the research community in mind.
As digital data sources and results translate to larger chunks of data and databases available for research purposes, it becomes necessary to develop different types of data management strategies that use a researcher's structure and purposes to best advantage. Data Management for Researchers is one of the few books on the market to delve into such basics as documenting data, improving analytical approaches, assuring security for sensitive data, and backing up work.
Examples and case histories pepper the approach, adding interest and real-world examples to validate the importance of the data management process in research circles. Without the proper protocols in place, data may be compromised, corrupted, or even lost - along with the PhD or study associated with it.
Any serious researcher working with data must make this book a priority read.
For researchers and consumers of data who are often fraught with managing excess information, Briney's book offers valuable techniques, strategies and standards to help achieve proficient data management and successful outcomes. This book can be useful to both novice researchers and well-established scientists alike.

[I] intend to give a copy of this book to each graduate student / trainee that joins my lab

Briney takes the reader through a pragmatic and sensible route through the activities of data management.

I cannot recommend this slender, seemingly innocent looking book enough - it will literally change how you think about data management.

This practical handbook can help bring new researchers quickly up-to-speed on the topic, as well as serve as a reference to meet specific data management needs they encounter throughout the data life cycle.
