Abstract:
Data swapping is a common technique for statistical disclosure limitation, but its effects on real data are not completely understood. In this paper, we consider measures that can be used to quantify the distortion that data swapping introduces into a data set whose variables are categorical. These measures are applied to a data set derived from the Current Population Survey, and their behavior is studied and compared for various swapping rates and for different choices of the variable to be swapped.
Keywords:
data utility; data confidentiality; statistical disclosure limitation; Hellinger distance; Shannon entropy; total variation distance; contingency coefficient; Cramér's V.
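The abstract does not specify how swapping is carried out or how each measure is computed, so what follows is a minimal Python sketch under stated assumptions, not the report's own procedure. It assumes a simplified swap: a fraction of records equal to the swapping rate is chosen at random, the chosen records are paired, and the values of one categorical variable are exchanged within each pair (practical schemes usually restrict pairs to records that agree on other key variables). The helper names (`swap_variable`, `joint_dist`, and so on) and the toy data are hypothetical. Distortion is measured on the two-way table of the swapped variable and one other variable: the Hellinger and total variation distances compare the original and swapped joint distributions, while Shannon entropy, the contingency coefficient, and Cramér's V are computed on each table so the before and after values can be compared.

```python
import math
import random
from collections import Counter


def swap_variable(records, var, rate, seed=0):
    """Hypothetical simple swap: pick about rate * n records at random,
    pair them, and exchange the value of `var` within each pair."""
    rng = random.Random(seed)
    k = int(round(rate * len(records)))
    k -= k % 2  # an even number of records is needed to form pairs
    chosen = rng.sample(range(len(records)), k)
    swapped = [dict(r) for r in records]  # copy so the original is untouched
    for i, j in zip(chosen[0::2], chosen[1::2]):
        swapped[i][var], swapped[j][var] = swapped[j][var], swapped[i][var]
    return swapped


def joint_dist(records, a, b):
    """Empirical joint distribution of variables a and b as {(x, y): proportion}."""
    counts = Counter((r[a], r[b]) for r in records)
    return {cell: c / len(records) for cell, c in counts.items()}


def hellinger(p, q):
    """Hellinger distance, scaled to lie in [0, 1]."""
    cells = set(p) | set(q)
    s = sum((math.sqrt(p.get(c, 0.0)) - math.sqrt(q.get(c, 0.0))) ** 2 for c in cells)
    return math.sqrt(s / 2)


def total_variation(p, q):
    """Total variation distance: half the L1 distance between the distributions."""
    cells = set(p) | set(q)
    return 0.5 * sum(abs(p.get(c, 0.0) - q.get(c, 0.0)) for c in cells)


def entropy(p):
    """Shannon entropy in bits."""
    return -sum(v * math.log2(v) for v in p.values() if v > 0)


def chi_square(p, n):
    """Pearson chi-square statistic for independence, from cell proportions p and sample size n."""
    rows, cols = Counter(), Counter()
    for (x, y), v in p.items():
        rows[x] += v
        cols[y] += v
    stat = 0.0
    for x in rows:
        for y in cols:
            expected = rows[x] * cols[y]
            stat += (p.get((x, y), 0.0) - expected) ** 2 / expected
    return n * stat, len(rows), len(cols)


def contingency_coefficient(p, n):
    """Pearson's contingency coefficient C = sqrt(chi2 / (chi2 + n))."""
    chi2, _, _ = chi_square(p, n)
    return math.sqrt(chi2 / (chi2 + n))


def cramers_v(p, n):
    """Cramér's V = sqrt(chi2 / (n * (min(r, c) - 1)))."""
    chi2, r, c = chi_square(p, n)
    if min(r, c) < 2:
        raise ValueError("Cramér's V needs at least two rows and two columns")
    return math.sqrt(chi2 / (n * (min(r, c) - 1)))


if __name__ == "__main__":
    # Toy data, made up for illustration: two categorical variables.
    pairs = [("M", "Y"), ("M", "Y"), ("F", "N"), ("F", "N"), ("M", "N"), ("F", "Y")] * 50
    original = [{"sex": s, "emp": e} for s, e in pairs]
    n = len(original)
    p = joint_dist(original, "sex", "emp")
    for rate in (0.05, 0.2, 0.5):
        q = joint_dist(swap_variable(original, "emp", rate, seed=1), "sex", "emp")
        print(f"rate={rate}: Hellinger={hellinger(p, q):.4f}, TV={total_variation(p, q):.4f}, "
              f"entropy {entropy(p):.4f}->{entropy(q):.4f}, "
              f"V {cramers_v(p, n):.4f}->{cramers_v(q, n):.4f}")
```

Because this kind of swap only exchanges values between records, the marginal distribution of the swapped variable is unchanged; the distortion appears in its joint distribution with the other variables, which is why the measures in the sketch are computed on the cross-tabulation rather than on a single variable.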
Publication Date: 
Wednesday, January 1, 2003
File Attachment:
tr131.pdf
Report Number:
131