Tuesday, March 27, 2007

Blog spam

I've been scratching my head over this one most of the day. How do you get rid of blog spam once and for all? You can make it harder, but you cannot actually stop automated posting. If you're running a high-profile site, any measure you put in place may sooner or later be rendered useless.

Consider the product called BotMaster. I'm not going to put a link to them for obvious reasons, but add ".net" to the end of that name if you want to see what we're up against.

Most of my pondering actually started when an abuse email was received from our hosting company for bugs.python.org. This is a brand new Roundup tracker that is supposed to take over from the SourceForge tracker currently in use. Unfortunately it is being targeted quite actively by spammers. They register an account using a free email address, then they post their spam as attachments to existing issues. The process is completed by spamming various blogs on the internet with links to these attachments.

Two very effective prevention methods have been put into place. One is a mandatory 4-second delay. The current implementation of the bot does not wait 4 seconds, so the automated registration fails. The other is that attachments are always served with a content-type of text/plain: this renders JavaScript redirects ineffective and makes the spam message a lot less readable.
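To make the delay idea concrete, here is a minimal sketch of one way a form could enforce it. I don't know how the tracker actually implements its delay, so treat everything here as an assumption: the names `issue_token` and `accept_submission`, the secret key and the hidden-field approach are all made up for illustration. The form is rendered with a signed timestamp, and a submission is refused if it comes back too quickly or with a forged timestamp.

```python
import hashlib
import hmac
import time

# Hypothetical values for illustration; only the 4 seconds comes from the post.
SECRET_KEY = b"change-me"
MIN_DELAY_SECONDS = 4


def _sign(issued_text):
    """Sign the timestamp so a bot cannot simply invent an older one."""
    return hmac.new(SECRET_KEY, issued_text.encode(), hashlib.sha256).hexdigest()


def issue_token(now=None):
    """Return 'timestamp:signature' to embed in a hidden form field."""
    issued_text = str(int(time.time() if now is None else now))
    return issued_text + ":" + _sign(issued_text)


def accept_submission(token, now=None):
    """True only if the token is genuine and at least MIN_DELAY_SECONDS old."""
    try:
        issued_text, signature = token.split(":", 1)
        issued = int(issued_text)
    except ValueError:
        return False  # malformed token
    if not hmac.compare_digest(signature.encode(), _sign(issued_text).encode()):
        return False  # timestamp was tampered with
    current = time.time() if now is None else now
    return current - issued >= MIN_DELAY_SECONDS
```

A bot that posts the form the instant it receives it fails the check, but as soon as its author adds a sleep it passes, which is exactly the weakness described next.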

The problem with a site like this is its popularity. If it is popular enough, someone will work at it to defeat any measures you put in place. If the attacker is determined enough, the program will soon be changed to wait while filling in forms.

CAPTCHA? No longer sufficient. Work out the square root of 3 squared plus minus four squared? If they are out to get you, someone will write a parser for that. Who sang the theme song to "Spy Hard"? Pretty soon the bot will google for the answer and fill it in. Multiple choice? Just try every possible answer.
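To show how little work the arithmetic question takes, here is a rough sketch of such a parser. It is only an illustration under my own assumptions: the function name `solve_puzzle`, the word lists and the tiny grammar (terms joined by "plus", an optional "minus" that negates the number before it is squared, an optional leading "square root of") are invented for this example and cover nothing beyond puzzles of that shape.

```python
import math
import re

# Hypothetical vocabulary, just large enough for this style of puzzle.
NUMBER_WORDS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
}
FILLER = {"work", "out", "the", "of", "what", "is"}


def solve_puzzle(question):
    """Evaluate questions like 'square root of 3 squared plus minus four squared'."""
    text = question.lower().replace("squareroot", "square root")
    words = [w for w in re.findall(r"[a-z]+|\d+", text) if w not in FILLER]

    take_root = words[:2] == ["square", "root"]
    if take_root:
        words = words[2:]

    total = 0
    negate = False
    i = 0
    while i < len(words):
        word = words[i]
        if word == "minus":
            negate = True  # applies to the next number
        elif word != "plus":
            value = NUMBER_WORDS[word] if word in NUMBER_WORDS else int(word)
            if negate:
                value, negate = -value, False
            if words[i + 1:i + 2] == ["squared"]:
                value *= value  # assumption: the sign is squared along with the number
                i += 1
            total += value
        i += 1
    return math.sqrt(total) if take_root else total


print(solve_puzzle("Work out the squareroot of 3 squared plus minus four squared"))
```

That is a few dozen lines for one question format, and a spammer only has to write it once per site.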

At the moment the only good answer seems to be an actively managed Bayesian classifier. Nothing sufficiently easy for the lowest common denominator of humans (including those with visual disabilities) remains that cannot be broken by a computer.
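For anyone who hasn't seen one, this is roughly what a Bayesian classifier boils down to. It is a toy naive Bayes sketch with add-one smoothing, not any particular product; the class name `SpamFilter`, its methods and the training messages are all made up for illustration. "Actively managed" just means somebody keeps calling `train` with freshly labelled spam and legitimate posts.

```python
import math
import re
from collections import Counter


class SpamFilter:
    """Toy naive Bayes text classifier (illustrative only)."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    @staticmethod
    def tokenize(text):
        return re.findall(r"[a-z0-9']+", text.lower())

    def train(self, text, label):
        """Record one message that a human has labelled 'spam' or 'ham'."""
        self.word_counts[label].update(self.tokenize(text))
        self.message_counts[label] += 1

    def spam_probability(self, text):
        vocabulary = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_messages = sum(self.message_counts.values())
        log_scores = {}
        for label in ("spam", "ham"):
            # Smoothed prior: how common this kind of message is overall.
            score = math.log((self.message_counts[label] + 1) / (total_messages + 2))
            total_words = sum(self.word_counts[label].values())
            for word in self.tokenize(text):
                # Add-one smoothing so unseen words never give log(0).
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + len(vocabulary) + 1))
            log_scores[label] = score
        # Turn the two log scores into a probability, avoiding overflow in exp().
        difference = min(log_scores["ham"] - log_scores["spam"], 700)
        return 1 / (1 + math.exp(difference))


spam_filter = SpamFilter()
spam_filter.train("buy cheap pills online now", "spam")
spam_filter.train("cheap casino bonus click here", "spam")
spam_filter.train("the tracker crashes when I attach a patch", "ham")
spam_filter.train("patch attached for the unicode bug", "ham")
```

With that training, `spam_filter.spam_probability("cheap pills click here")` comes out well above 0.5 and `spam_filter.spam_probability("patch for the unicode bug")` well below it. The catch is the upkeep: the spammers change their wording, so the training never stops.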

I'd be very interested in a simple method to verify that someone is human. Something that is hard for a machine to do, a little like breaking RSA by brute force. Unfortunately math of that kind is a little hard for humans too.

It would seem that the only solution is to legislate, disconnect and/or prosecute, but first you need to catch the little bastards.
