Protecting respondents identities in microdata release
- 1 January 2001
- journal article
- Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Knowledge and Data Engineering
- Vol. 13 (6), 1010-1027
- https://doi.org/10.1109/69.971193
Abstract
Today's globally networked society places great demands on the dissemination and sharing of information. While in the past released information was mostly in tabular and statistical form, many situations call for the release of specific data (microdata). In order to protect the anonymity of the entities (called respondents) to which information refers, data holders often remove or encrypt explicit identifiers such as names, addresses, and phone numbers. Deidentifying data, however, provides no guarantee of anonymity. Released information often contains other data, such as race, birth date, sex, and ZIP code, that can be linked to publicly available information to reidentify respondents and infer information that was not intended for disclosure. In this paper we address the problem of releasing microdata while safeguarding the anonymity of the respondents to which the data refer. The approach is based on the definition of k-anonymity. A table provides k-anonymity if attempts to link explicitly identifying information to its content map the information to at least k entities. We illustrate how k-anonymity can be provided without compromising the integrity (or truthfulness) of the information released by using generalization and suppression techniques. We introduce the concept of minimal generalization that captures the property of the release process not distorting the data more than needed to achieve k-anonymity, and present an algorithm for the computation of such a generalization. We also discuss possible preference policies to choose among different minimal generalizations.
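The ideas of k-anonymity, generalization, and suppression described in the abstract can be illustrated with a small sketch. It is not the paper's minimal-generalization algorithm. It assumes a common operational reading of k-anonymity: every combination of values over the linkable attributes (called `quasi_identifiers` here, a name not used in the abstract) must occur in at least k rows. The function names, the sample rows, and the ZIP-masking scheme are all hypothetical.

```python
from collections import Counter


def qi_key(row, quasi_identifiers):
    """Tuple of the row's values over the linkable attributes (assumed name)."""
    return tuple(row[attr] for attr in quasi_identifiers)


def is_k_anonymous(rows, quasi_identifiers, k):
    """True if every quasi-identifier combination occurs in at least k rows."""
    counts = Counter(qi_key(row, quasi_identifiers) for row in rows)
    return all(count >= k for count in counts.values())


def generalize_zip(rows, digits_kept):
    """Generalization: keep the leading ZIP digits and mask the rest with '*'."""
    result = []
    for row in rows:
        zip_code = row["zip"]
        masked = zip_code[:digits_kept] + "*" * (len(zip_code) - digits_kept)
        result.append({**row, "zip": masked})
    return result


def suppress_small_groups(rows, quasi_identifiers, k):
    """Suppression: drop rows whose quasi-identifier combination is rarer than k."""
    counts = Counter(qi_key(row, quasi_identifiers) for row in rows)
    return [row for row in rows if counts[qi_key(row, quasi_identifiers)] >= k]


# Hypothetical deidentified microdata: names are gone, but ZIP and sex remain.
rows = [
    {"zip": "94138", "sex": "F", "diagnosis": "flu"},
    {"zip": "94139", "sex": "F", "diagnosis": "asthma"},
    {"zip": "94141", "sex": "M", "diagnosis": "flu"},
    {"zip": "94142", "sex": "M", "diagnosis": "diabetes"},
    {"zip": "94251", "sex": "F", "diagnosis": "flu"},
]
QI = ["zip", "sex"]

generalized = generalize_zip(rows, digits_kept=4)
released = suppress_small_groups(generalized, QI, k=2)

print(is_k_anonymous(rows, QI, 2))         # raw table: every row is unique
print(is_k_anonymous(generalized, QI, 2))  # one outlier row still stands alone
print(is_k_anonymous(released, QI, 2))     # after suppressing the outlier
```

In this sketch, generalization alone leaves one row that is still unique, so that row is suppressed. Every released value stays truthful, only less specific, which matches the integrity property the abstract emphasizes. Choosing how far to generalize each attribute so that the data is distorted no more than necessary is the problem the paper's minimal generalization addresses, and this sketch does not attempt it.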