Using k-anonymization for registry data: pitfalls and alternatives

Authors

  • Sten Anspal, Estonian Centre for Applied Research, Tallinn
  • Mart Kaska, Estonian Centre for Applied Research, Tallinn
  • Indrek Seppo, Estonian Centre for Applied Research, Tallinn

DOI:

https://doi.org/10.12697/ACUTM.2017.21.05

Keywords:

privacy-preserving computing, k-anonymization

Abstract

We describe an applied study of ICT students' employment in Estonia based on data from two national registries. The study offered an opportunity to compare results obtained from k-anonymized data with those obtained on the novel Sharemind platform for privacy-preserving statistical computing, which offers a way to use confidential data for research without loss of information. A comparison of the results from k-anonymized and lossless data indicates substantial differences in estimates of students' employment rates. The results illustrate, on the basis of a real-world study, how k-anonymization can lead to considerable bias in estimates. While privacy-preserving computing does entail inconveniences because the original microdata are not revealed to the statistician, these can be offset by greater confidence in the results.
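
The mechanism by which k-anonymization can bias an estimate can be sketched with a toy example. The sketch below assumes k-anonymity is enforced by suppression, that is, by dropping every record whose combination of quasi-identifiers occurs fewer than k times; the abstract does not state which variant the study used. The records, the quasi-identifiers (age band and region), and the function names are all hypothetical and are not taken from the study's registry data.

```python
from collections import Counter

# Synthetic toy records, NOT data from the study.
# Each record is ((age_band, region), employed).
records = (
    [(("20-24", "Tallinn"), True)] * 6
    + [(("20-24", "Tallinn"), False)] * 4
    + [(("20-24", "Tartu"), True)] * 2    # small group, all employed
    + [(("25-29", "Narva"), True)] * 1    # small group
    + [(("25-29", "Narva"), False)] * 1
)


def k_anonymize_by_suppression(rows, k):
    """Keep only rows whose quasi-identifier combination occurs at least k times."""
    group_sizes = Counter(quasi_id for quasi_id, _ in rows)
    return [row for row in rows if group_sizes[row[0]] >= k]


def employment_rate(rows):
    """Share of rows with employed == True."""
    return sum(employed for _, employed in rows) / len(rows)


true_rate = employment_rate(records)            # all 14 records
anonymized = k_anonymize_by_suppression(records, k=3)
anonymized_rate = employment_rate(anonymized)   # only the 10 Tallinn records remain

print(f"rate on full data:       {true_rate:.3f}")
print(f"rate on k-anonymized:    {anonymized_rate:.3f}")
print(f"records suppressed:      {len(records) - len(anonymized)}")
```

In this toy data the suppressed small groups have a different employment rate from the large group, so the estimate on the k-anonymized data differs from the estimate on the full data. The size and direction of such a bias depend on how the outcome varies across the suppressed groups.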

Published

2017-07-03

Section

Articles