Abstract | This paper presents the design, implementation, evaluation, and initial deployment of SpamSpotter, the first
open, large-scale, real-time reputation system for filtering
spam. Existing blacklists (e.g., SpamHaus) have trouble
keeping pace with spammers’ increasing ability to send
spam from “fresh” IP addresses, and filters based purely
on content are easily evaded. In contrast, SpamSpotter
dynamically classifies email senders in real time based
on their global sending behavior, rather than based on
ephemeral features such as an IP address or the content of
the message. In implementing SpamSpotter, we address
significant challenges involving both dynamism (i.e., determining
when to “retrain” our dynamic classification
algorithms) and scale (i.e., maintaining fast, accurate performance
in the face of tremendous email message volume).
We have evaluated the performance and accuracy
of SpamSpotter using traces from a large email-hosting
provider and a spam appliance vendor that receives 300
million messages a day. Our evaluation shows that
SpamSpotter is scalable, fast, and accurate. SpamSpotter
is also operational today: it currently answers queries
from existing spam filtering software (e.g., SpamAssassin)
with only minor configuration changes.
|