National Institute of Statistical Sciences
Digital Government:
Project Update - May, 2000
1. GEOGRAPHICAL AGGREGATION. Code is being turned over to MCNC for development of a prototype using NASS data. A design meeting is scheduled for 6/5/00. Target date for completion of the prototype is 8/31/00.
2. TABLE AGGREGATION. Code to make tables disclosable by aggregating "adjacent" categories in various dimensions is being developed. Initial versions are operative; their behavior is being explored. The algorithms use various objective functions (e.g., information entropy), and include penalties for removing variables entirely.
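To illustrate the idea of aggregating adjacent categories, here is a minimal sketch in Python. It is not the project code. It assumes a single dimension, a hypothetical disclosability rule (every cell count must reach `min_count`), and a greedy search that, at each step, merges the adjacent pair that leaves the highest information entropy. The function names and the threshold rule are assumptions made for illustration, and the penalty for removing a variable entirely is not modeled.

```python
import math


def entropy(counts):
    """Shannon entropy (natural log) of the distribution implied by counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def aggregate_adjacent(counts, min_count):
    """Greedily merge adjacent categories until every cell is >= min_count.

    Returns the merged counts and, for each merged cell, the list of
    original category indices it covers. The threshold rule is an
    assumption for illustration only.
    """
    cells = list(counts)
    groups = [[i] for i in range(len(counts))]
    while len(cells) > 1 and min(cells) < min_count:
        best = None
        for i in range(len(cells) - 1):
            # Only consider merges that involve at least one undersized cell.
            if cells[i] >= min_count and cells[i + 1] >= min_count:
                continue
            merged = cells[:i] + [cells[i] + cells[i + 1]] + cells[i + 2:]
            h = entropy(merged)
            if best is None or h > best[0]:
                best = (h, i)
        i = best[1]
        cells[i:i + 2] = [cells[i] + cells[i + 1]]
        groups[i:i + 2] = [groups[i] + groups[i + 1]]
    return cells, groups


cells, groups = aggregate_adjacent([1, 2, 40, 30, 3], min_count=5)
```

Each entry of `groups` is a contiguous run of original category indices, so the result can be read as a coarsened version of the original dimension. If the loop collapses everything into one cell, the variable has in effect been removed, which is the case a penalty term would discourage.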
3. TABLE SERVERS. A prototype table server has been developed and was demonstrated at the PI meeting (see item 8). Queries are for cross-tabulations from a large contingency table, and risk is assessed dynamically using a criterion based on iterative proportional fitting. Visualization of the query space, the query history and the risk effects of releasing particular tables is a key functionality. A version involving cell suppression is under discussion.
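For readers unfamiliar with iterative proportional fitting, the following Python sketch shows the basic procedure on a two-way table. It is not the prototype's risk criterion, which is not described here. It only fits a table to given row and column totals, starting from a uniform table. The function name `ipf_2d` and its parameters are assumptions made for illustration.

```python
def ipf_2d(row_totals, col_totals, max_iterations=50, tol=1e-9):
    """Fit a two-way table to the given margins by iterative proportional fitting.

    Starts from a table of ones and alternately rescales rows and columns
    until the row sums match row_totals to within tol.
    """
    n_rows, n_cols = len(row_totals), len(col_totals)
    table = [[1.0] * n_cols for _ in range(n_rows)]
    for _ in range(max_iterations):
        # Rescale each row to match its target total.
        for i in range(n_rows):
            s = sum(table[i])
            if s > 0:
                factor = row_totals[i] / s
                table[i] = [v * factor for v in table[i]]
        # Rescale each column to match its target total.
        for j in range(n_cols):
            s = sum(table[i][j] for i in range(n_rows))
            if s > 0:
                factor = col_totals[j] / s
                for i in range(n_rows):
                    table[i][j] *= factor
        # Columns now match exactly; stop once the rows match as well.
        err = max(abs(sum(table[i]) - row_totals[i]) for i in range(n_rows))
        if err < tol:
            break
    return table


fitted = ipf_2d([30, 70], [40, 60])
```

With a uniform starting table and one-way margins, the fit is the independence table, so `fitted` is approximately `[[12, 18], [28, 42]]`. A risk assessment could compare such fitted values with the true cell counts, but how the prototype does this is not specified in this update.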
4. BOUNDS FOR TABLE ENTRIES. Research at CMU on scalable methods to compute bounds on entries in large tables is continuing.
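As a point of reference for what bounds on table entries means, the Python sketch below computes the classical Fréchet bounds for a two-way table from its row and column totals. This is a simple textbook baseline and is not claimed to be the scalable method under study at CMU. The function name `frechet_bounds` is an assumption made for illustration.

```python
def frechet_bounds(row_totals, col_totals):
    """Return (lower, upper) bounds for each cell of a two-way table.

    Given only the margins, cell (i, j) must satisfy
    max(0, r_i + c_j - N) <= n_ij <= min(r_i, c_j),
    where N is the grand total.
    """
    total = sum(row_totals)
    if total != sum(col_totals):
        raise ValueError("row and column totals must have the same grand total")
    return [
        [(max(0, r + c - total), min(r, c)) for c in col_totals]
        for r in row_totals
    ]


bounds = frechet_bounds([30, 70], [40, 60])
# bounds[0][0] == (0, 30) and bounds[1][1] == (30, 60)
```

A narrow interval for a small cell indicates that the released margins reveal that cell almost exactly, which is why such bounds are of interest for disclosure risk.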
5. RISK VS. INFORMATIVENESS. Research is proceeding at CMU/LANL on problem formulations that accommodate both the disclosure risk and the informativeness of releasing information. In an initial formulation, disclosure risk is represented by the uncertainty regarding a (data set) parameter sought by an intruder (to be maximized), and informativeness by the uncertainty in a parameter of interest to legitimate users (to be minimized).
6. PROJECT WEB SITE. As of 6/5/00, a project Web site is in operation: www.niss.org/dg. This site is also accessible via the Digital Government Program Web site: www.diggov.org.
7. SUMMER INTERNS. Two graduate interns are spending the summer at NISS working on the project. Chris Holloman (Duke) is working on visualizations of the query space and risk for table servers. Karen Brady (George Washington), who arrives June 5, will be working on summarizing and comparing methods for risk computation and reduction in large tables.
8. PRESENTATIONS concerning the project were made at: