From the Introduction:
The barriers to integrating databases are numerous. One is confidentiality: the database holders—we term them “agencies”—almost always wish to protect the identities of their data subjects. Another is regulation: agencies such as the Census Bureau (Census) and Bureau of Labor Statistics (BLS) are largely forbidden by law to share their data, even with each other, let alone with a trusted third party. A third is scale: despite advances in networking technology, there are few ways to move a terabyte of data from point A today to point B tomorrow. In this paper we focus on linear regression and related analyses.
The regression setting is important because of its prediction aspect; for example, vulnerable critical infrastructure components might be identified using a regression model. We begin in §2 with background on data confidentiality and on secure multi-party computation. Linear regression is treated for “horizontally partitioned data” in §3 and for “vertically partitioned data” in §4. Two methods for secure data integration and an application to secure contingency tables appear in §5, and a concluding discussion in §6.
