Classification of Partially Observed Data with Association Trees (2006)

Abstract:

Classification methods have trouble with missing data. Even CART, which was designed to handle missing data, performs poorly when over 90% of the predictors are unobserved. We use the Apriori algorithm to fit decision trees: continuous predictors are converted to categorical variables, and missing values are treated as absent items, which bypasses the missing-data problem. We demonstrate our methodology in a setting that simulates a distributed, low-overhead quality assurance system, where we control which predictors are missing for each observation. We also show how performance can be improved by introducing a simple adaptive sampling method.
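The encoding idea in the abstract can be illustrated with a minimal sketch. It assumes binary low/high discretization at hypothetical cutpoints, and it uses a simple rule-based classifier built from Apriori's frequent itemsets in place of the report's association trees (the tree-building step is not reproduced here). Each observation becomes a transaction of `predictor=level` items. A missing predictor contributes no item, so nothing needs to be imputed. The names `to_items`, `class_rules`, `classify`, `CUTPOINTS`, and the toy data are all hypothetical.

```python
from itertools import combinations
from typing import Optional

# Hypothetical per-predictor cutpoints used to discretize continuous values.
CUTPOINTS = {"x1": 0.5, "x2": 0.5, "x3": 0.5}


def to_items(row: dict[str, Optional[float]], cutpoints: dict[str, float]) -> frozenset[str]:
    """Map each observed predictor to a 'name=low' or 'name=high' item.
    A missing predictor (None) contributes no item at all."""
    items = set()
    for name, value in row.items():
        if value is None:
            continue  # missing data is treated as an absent item
        level = "low" if value <= cutpoints[name] else "high"
        items.add(f"{name}={level}")
    return frozenset(items)


def apriori(transactions: list[frozenset[str]], min_support: int) -> dict[frozenset[str], int]:
    """Level-wise Apriori: return every itemset found in >= min_support transactions."""
    counts: dict[frozenset[str], int] = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)
    k = 2
    while current:
        # Join frequent (k-1)-itemsets, then prune candidates that have an
        # infrequent (k-1)-subset (the Apriori property).
        candidates = set()
        for a, b in combinations(current, 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in current for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {s: c for s, c in counts.items() if c >= min_support}
        frequent.update(current)
        k += 1
    return frequent


def class_rules(frequent, labels):
    """Turn each frequent itemset that contains a class item into a rule
    (antecedent, label, confidence, support)."""
    rules = []
    for itemset, support in frequent.items():
        for label in labels:
            if label in itemset and len(itemset) > 1:
                antecedent = itemset - {label}
                # Every subset of a frequent itemset is frequent, so this lookup succeeds.
                confidence = support / frequent[antecedent]
                rules.append((antecedent, label, confidence, support))
    return rules


def classify(items, rules, default):
    """Predict with the highest-confidence (then highest-support) matching rule."""
    matching = [r for r in rules if r[0] <= items]
    if not matching:
        return default
    return max(matching, key=lambda r: (r[2], r[3]))[1]


# Toy training data (invented for illustration); many predictors are missing.
training = [
    ({"x1": 0.9, "x2": None, "x3": 0.1}, "fail"),
    ({"x1": 0.8, "x2": 0.2, "x3": None}, "fail"),
    ({"x1": 0.7, "x2": None, "x3": None}, "fail"),
    ({"x1": 0.1, "x2": 0.9, "x3": None}, "pass"),
    ({"x1": None, "x2": 0.6, "x3": 0.3}, "pass"),
    ({"x1": 0.2, "x2": None, "x3": 0.8}, "pass"),
]
transactions = [to_items(row, CUTPOINTS) | {f"class={y}"} for row, y in training]
frequent = apriori(transactions, min_support=2)
rules = class_rules(frequent, ["class=fail", "class=pass"])

# Only x2 is observed here; the rule {x2=high} -> class=pass still applies.
print(classify(to_items({"x1": None, "x2": 0.95, "x3": None}, CUTPOINTS), rules, "unknown"))
```

Because prediction uses only the items an observation actually contains, the same rule set serves observations with different subsets of missing predictors.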

Author: 
Michael Last, Sandro Fouche, Alan F. Karr, Alex Orso, Adam Porter, S. Stanley Young
Publication Date: 
Wednesday, November 1, 2006
File Attachment: 
tr164.pdf
Report Number: 
164