thinkingmachine
 

KDD2004: Data Mining and Spam

Pedro Domingos (from UW) presented a paper (co-authored with a number of UWers) about data mining in the presence of an adversary who is deliberately trying to deceive the data miner. It has been the big hit of the conference so far. He made the point that this happens in many settings -- spam detection, intrusion detection, counterterrorism, etc. -- where an adversary can alter the data to prevent the data miner from detecting what he seeks to detect. He argued that this problem had not been addressed before in the data mining field, but that it is interesting and important.

The basic contribution of the paper was to formalize this situation as a game in which the miner and the adversary alternate turns -- the miner builds a system to classify the data (e.g. a spam detector), and then the adversary alters the data specifically to foil that detector (e.g. the way spammers tweak their messages to get past anti-spam filters). It's an adversarial game in which each side tries to maximize its own utility function. Unfortunately, solving the full game is wildly intractable (doubly exponential), because the space of possible detection algorithms and adversarial responses is itself exponential. Instead, they defined an iterated game and presented an algorithm that adapts a naive Bayes classifier to account for the way the spammer tries to get around it. Their approach makes many assumptions, but it is an interesting first step.
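To make the naive Bayes adaptation a bit more concrete, here is a toy sketch. It is not the paper's algorithm, just a brute-force illustration of the one-round idea under made-up assumptions: three binary word features, hypothetical probabilities, per-word flip costs, and a spammer gain (`SPAM_GAIN`), with all function names invented for illustration. The adversary's move finds the cheapest set of word flips that makes a spam message look like ham to the plain classifier, and makes it only if that cost is below the gain. The classifier's counter-move re-estimates P(x'|spam) by summing P(x|spam) over every spam message x that the adversary would turn into the observed x'. Brute force only works because the toy has 2^3 = 8 possible messages; with n features there are 2^n messages and 2^(2^n) possible classifiers over them, which gives a feel for why the full game blows up.

```python
from itertools import product

# Hypothetical toy parameters (NOT from the paper): 3 binary word features.
P_SPAM = 0.5                    # prior probability of spam
P_WORD_SPAM = [0.8, 0.7, 0.6]   # P(word i present | spam)
P_WORD_HAM = [0.2, 0.3, 0.5]    # P(word i present | ham)
FLIP_COST = [1.0, 1.0, 1.0]     # spammer's cost to add/remove word i
SPAM_GAIN = 2.5                 # spammer's gain when spam is classified as ham

ALL_MESSAGES = list(product([0, 1], repeat=len(FLIP_COST)))


def likelihood(x, probs):
    """P(x | class) under naive Bayes' feature-independence assumption."""
    p = 1.0
    for xi, pi in zip(x, probs):
        p *= pi if xi else 1.0 - pi
    return p


def naive_is_spam(x):
    """Plain naive Bayes: spam iff P(spam) P(x|spam) > P(ham) P(x|ham)."""
    return P_SPAM * likelihood(x, P_WORD_SPAM) > (1 - P_SPAM) * likelihood(x, P_WORD_HAM)


def flip_cost(x, y):
    """Total cost for the spammer of editing message x into message y."""
    return sum(c for xi, yi, c in zip(x, y, FLIP_COST) if xi != yi)


def adversary(x):
    """Adversary's move on a spam message x: cheapest disguise that fools
    the plain classifier, applied only if it costs less than SPAM_GAIN."""
    if not naive_is_spam(x):
        return x  # already classified as ham; nothing to gain by editing it
    disguises = [y for y in ALL_MESSAGES if not naive_is_spam(y)]
    if not disguises:
        return x
    best = min(disguises, key=lambda y: flip_cost(x, y))
    return best if flip_cost(x, best) < SPAM_GAIN else x


def aware_is_spam(x_obs):
    """Classifier's counter-move: P_A(x'|spam) = sum of P(x|spam) over all x
    with adversary(x) == x'. Ham is assumed to arrive unmodified."""
    if naive_is_spam(x_obs):
        return True  # assumption: this adversary never makes spam look spammier
    p_obs_spam = sum(likelihood(x, P_WORD_SPAM)
                     for x in ALL_MESSAGES if adversary(x) == x_obs)
    return P_SPAM * p_obs_spam > (1 - P_SPAM) * likelihood(x_obs, P_WORD_HAM)


if __name__ == "__main__":
    for x in ALL_MESSAGES:
        print(x, "naive:", naive_is_spam(x), "aware:", aware_is_spam(x))
```

With these made-up numbers the plain classifier keys entirely on the first word, so the adversary simply drops it. The adapted classifier then flags disguised messages that still contain the second word, at the cost of also flagging some legitimate mail that contains it, while messages with neither word stay ham.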

Response to the talk was strong. The room was standing room only, and the questions went on and on until the session was broken up. People were still discussing the problem during the next break; it's a fertile idea. Identifying and formalizing the problem is a great step, and I expect more work will follow in this area.
