ISBN-10: 1119978157
ISBN-13: 9781119978152
Pub. Date: 09/25/2012
Publisher: Wiley
Statistical Disclosure Control / Edition 1

Product Details

ISBN-13: 9781119978152
Publisher: Wiley
Publication date: 09/25/2012
Series: Wiley Series in Survey Methodology
Pages: 302
Product dimensions: 9.10(w) x 6.20(h) x 0.80(d)

About the Authors

Anco Hundepool, Statistics Netherlands, The Netherlands.

Josep Domingo-Ferrer, Universitat Rovira i Virgili, Spain.

Luisa Franconi, Head of Unit on Statistical Disclosure Control Methods, ISTAT, Italy.

Sarah Giessing, Federal Statistical Office of Germany, Germany.

Keith Spicer, Office for National Statistics, Portsmouth, UK.

Eric Schulte Nordholt, Senior researcher and project leader at Statistics Netherlands, The Netherlands.

Peter-Paul De Wolf, Methodologist at National Institute of Statistics, The Netherlands.

Table of Contents

Preface xi

Acknowledgements xv

1 Introduction 1

1.1 Concepts and definitions 2

1.1.1 Disclosure 2

1.1.2 Statistical disclosure control 3

1.1.3 Tabular data 3

1.1.4 Microdata 3

1.1.5 Risk and utility 4

1.2 An approach to Statistical Disclosure Control 7

1.2.1 Why is confidentiality protection needed? 7

1.2.2 What are the key characteristics and uses of the data? 8

1.2.3 What disclosure risks need to be protected against? 8

1.2.4 Disclosure control methods 8

1.2.5 Implementation 9

1.3 The chapters of the handbook 9

2 Ethics, principles, guidelines and regulations – a general background 10

2.1 Introduction 10

2.2 Ethical codes and the new ISI code 11

2.2.1 ISI Declaration on Professional Ethics 11

2.2.2 New ISI Declaration on Professional Ethics 12

2.2.3 European Statistics Code of Practice 15

2.3 UNECE principles and guidelines 16

2.3.1 UNECE Principles and Guidelines on Confidentiality Aspects of Data Integration 18

2.3.2 Future activities on the UNECE principles and guidelines 19

2.4 Laws 19

2.4.1 Committee on Statistical Confidentiality 20

2.4.2 European Statistical System Committee 20

3 Microdata 23

3.1 Introduction 23

3.2 Microdata concepts 24

3.2.1 Stage 1: Assess need for confidentiality protection 24

3.2.2 Stage 2: Key characteristics and use of microdata 27

3.2.3 Stage 3: Disclosure risk 30

3.2.4 Stage 4: Disclosure control methods 32

3.2.5 Stage 5: Implementation 34

3.3 Definitions of disclosure 36

3.3.1 Definitions of disclosure scenarios 37

3.4 Definitions of disclosure risk 38

3.4.1 Disclosure risk for categorical quasi-identifiers 39

3.4.2 Notation and assumptions 40

3.4.3 Disclosure risk for continuous quasi-identifiers 41

3.5 Estimating re-identification risk 43

3.5.1 Individual risk based on the sample: Threshold rule 44

3.5.2 Estimating individual risk using sampling weights 44

3.5.3 Estimating individual risk by Poisson model 47

3.5.4 Further models that borrow information from other sources 48

3.5.5 Estimating per record risk via heuristics 49

3.5.6 Assessing risk via record linkage 50

3.6 Non-perturbative microdata masking 51

3.6.1 Sampling 51

3.6.2 Global recoding 52

3.6.3 Top and bottom coding 53

3.6.4 Local suppression 53

3.7 Perturbative microdata masking 53

3.7.1 Additive noise masking 54

3.7.2 Multiplicative noise masking 57

3.7.3 Microaggregation 60

3.7.4 Data swapping and rank swapping 72

3.7.5 Data shuffling 73

3.7.6 Rounding 73

3.7.7 Re-sampling 74

3.7.8 PRAM 74

3.7.9 MASSC 78

3.8 Synthetic and hybrid data 78

3.8.1 Fully synthetic data 79

3.8.2 Partially synthetic data 84

3.8.3 Hybrid data 86

3.8.4 Pros and cons of synthetic and hybrid data 98

3.9 Information loss in microdata 100

3.9.1 Information loss measures for continuous data 101

3.9.2 Information loss measures for categorical data 108

3.10 Release of multiple files from the same microdata set 110

3.11 Software 111

3.11.1 μ-argus 111

3.11.2 sdcMicro 113

3.11.3 IVEware 115

3.12 Case studies 116

3.12.1 Microdata files at Statistics Netherlands 116

3.12.2 The European Labour Force Survey microdata for research purposes 118

3.12.3 The European Structure of Earnings Survey microdata for research purposes 121

3.12.4 NHIS-linked mortality data public use file, USA 128

3.12.5 Other real case instances 130

4 Magnitude tabular data 131

4.1 Introduction 131

4.1.1 Magnitude tabular data: Basic terminology 131

4.1.2 Complex tabular data structures: Hierarchical and linked tables 132

4.1.3 Risk concepts 134

4.1.4 Protection concepts 137

4.1.5 Information loss concepts 137

4.1.6 Implementation: Software, guidelines and case study 138

4.2 Disclosure risk assessment I: Primary sensitive cells 138

4.2.1 Intruder scenarios 138

4.2.2 Sensitivity rules 140

4.3 Disclosure risk assessment II: Secondary risk assessment 152

4.3.1 Feasibility interval 152

4.3.2 Protection level 154

4.3.3 Singleton and multi cell disclosure 155

4.3.4 Risk models for hierarchical and linked tables 155

4.4 Non-perturbative protection methods 157

4.4.1 Global recoding 157

4.4.2 The concept of cell suppression 157

4.4.3 Algorithms for secondary cell suppression 158

4.4.4 Secondary cell suppression in hierarchical and linked tables 161

4.5 Perturbative protection methods 163

4.5.1 A pre-tabular method: Multiplicative noise 165

4.5.2 A post-tabular method: Controlled tabular adjustment 165

4.6 Information loss measures for tabular data 166

4.6.1 Cell costs for cell suppression 166

4.6.2 Cell costs for CTA 167

4.6.3 Information loss measures to evaluate the outcome of table protection 167

4.7 Software for tabular data protection 168

4.7.1 Empirical comparison of cell suppression algorithms 169

4.8 Guidelines: Setting up an efficient table model systematically 173

4.8.1 Defining spanning variables 174

4.8.2 Response variables and mapping rules 175

4.9 Case studies 178

4.9.1 Response variables and mapping rules of the case study 178

4.9.2 Spanning variables of the case study 179

4.9.3 Analysing the tables of the case study 179

4.9.4 Software issues of the case study 181

5 Frequency tables 183

5.1 Introduction 183

5.2 Disclosure risks 184

5.2.1 Individual attribute disclosure 185

5.2.2 Group attribute disclosure 186

5.2.3 Disclosure by differencing 187

5.2.4 Perception of disclosure risk 190

5.3 Methods 191

5.3.1 Pre-tabular 191

5.3.2 Table re-design 192

5.3.3 Post-tabular 193

5.4 Post-tabular methods 193

5.4.1 Cell suppression 193

5.4.2 ABS cell perturbation 193

5.4.3 Rounding 194

5.5 Information loss 199

5.6 Software 201

5.6.1 Introduction 201

5.6.2 Optimal, first feasible and RAPID solutions 202

5.6.3 Protection provided by controlled rounding 203

5.7 Case studies 204

5.7.1 UK Census 204

5.7.2 Australian and New Zealand Censuses 205

6 Data access issues 208

6.1 Introduction 208

6.2 Research data centres 209

6.3 Remote execution 209

6.4 Remote access 210

6.5 Licensing 211

6.6 Guidelines on output checking 211

6.6.1 Introduction 211

6.6.2 General approach 212

6.6.3 Rules for output checking 215

6.6.4 Organisational/procedural aspects of output checking 224

6.6.5 Researcher training 233

6.7 Additional issues concerning data access 236

6.7.1 Examples of disclaimers 236

6.7.2 Output description 236

6.8 Case studies 237

6.8.1 The US Census Bureau Microdata Analysis System 237

6.8.2 Remote access at Statistics Netherlands 239

Glossary 243

References 261

Author index 279

Subject index 282
