Lately, the war between spammers and anyone using CAPTCHAs to protect an online community from them has intensified. (A CAPTCHA is the part of registration on a forum, social network, or whatnot where you typically need to type out the distorted letters displayed to verify that you’re a human and not a bot.) MySpace is apparently reworking its algorithms every couple of days, and the spammers are moving just as quickly to adjust.
It’s going to be very interesting to see what happens to social networks now that the barrier CAPTCHAs erected to stop automated spammers is about to completely crumble.
16 comments
November 26th, 2006 at 7:03 pm
Pentharian
I’ve noticed a few sites lately that require you to listen to a randomized, distorted sound file that reads you numbers and/or letters that you need to type. I’d imagine that such a system is harder, although still not impossible, for a spam bot to get past.
November 26th, 2006 at 8:32 pm
Marc/Richter
Gah, the spamming these days is terrible. I have five or so porn friend requests every few days on my MySpace account, and my inbox on the Lusternia forums was full of reported posts asking to delete spam threads. That kind of stuff seems to get worse and worse every few months.
The end is upon us!
November 27th, 2006 at 1:53 am
Yeshaya Distara
They’ve gotten the Aetolia forum… Tydeus even changed the name of one of them in order to try and stop the spammage. ’Tis ridiculous.
November 27th, 2006 at 3:01 pm
Riashain
My gmail is up to 250+! About 4000 less than my old hotmail used to get, but still. Sick of spam!
November 27th, 2006 at 11:53 pm
Dellaster
Wouldn’t it be fairly simple to design the registration page to allow the site administrator to display the text/image of her own choosing and define the correct input? Examples:
A) The text reads “Bob’s Mother’s Sister’s Brother is named Tom. What is Tom’s relationship to Bob?” Correct input would be “Uncle”.
B) An abstract, but clear, image of a duck is displayed. The text asks “what is this?”. Correct input is “duck”.
C) A photograph of a girl riding a horse through a mountain meadow, with forest and a lake in the background. The text asks “what is the girl riding?”. Correct input is “horse”.
D) (Specific to The Forge) The text asks “what structure is shown in the image at the top of the page?” Correct input is “tower”.
E) (Also specific to The Forge) The text reads “What does Matt Mihaly have little interest in sharing? [Click the ‘About’ link at the top of this page for the answer.]” Correct input is “personal life”.
You’d want the input parser to ignore caps and allow several alternate acceptable answers if desired. So in answer to E above, “His Personal Life” would be accepted. In answer to C, “Palomino” might be accepted (if that was what kind of horse she was riding). Nothing difficult here.
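The answer checking described above can be sketched roughly as follows. This is a minimal illustration rather than anything a real blog package is known to ship: the function names `normalize` and `is_correct` and the set `ACCEPTED_E` are hypothetical, and the normalization rules (lowercase, drop punctuation, collapse spaces) are an assumption about what "ignore caps" would need in practice.

```python
import re


def normalize(answer: str) -> str:
    """Lowercase the answer, drop punctuation, and collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9\s]", "", answer.lower())
    return " ".join(cleaned.split())


def is_correct(answer: str, accepted: set[str]) -> bool:
    """Return True if the normalized answer matches any accepted answer."""
    return normalize(answer) in {normalize(a) for a in accepted}


# Hypothetical answer list for example E, chosen by the site administrator.
ACCEPTED_E = {"personal life", "his personal life"}

if __name__ == "__main__":
    print(is_correct("His Personal Life", ACCEPTED_E))
    print(is_correct("  UNCLE. ", {"uncle"}))
```

The administrator would store one question and one set of accepted answers per registration page, so nothing here depends on the question being text, an image, or a link to follow.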
Could a spam bot get past any of these, keeping in mind that no two sites would have anything resembling each others’ questions or the format for those questions? The bots couldn’t be optimized to search for specifics as they can for CAPTCHAs.
Seems too easy. Surely someone’s come up with the same idea and discarded it for some reason or other. I’d be interested to learn what flaw(s) I’m not seeing.
November 28th, 2006 at 12:08 am
Matt
Well, let me go through your examples and point out problems that I see with them. I’m not an expert in this area, of course:
A. “What is Tom’s relationship to Bob?” The answer you want is Uncle, but asking a user to think through a logic problem, however simple, creates a barrier to entry, and barriers to entry are B - A - D.
B. “What is this?” You might think duck, whereas I might think “goose” (well, I wouldn’t, but still), or who knows what. Again, barrier to entry, as some people will guess incorrectly. You have to show them something different pretty much every time too, or spammers will scrape the specific pictures off the site and learn to recognize them.
C. Same problem as B.
D. Same problem as B.
E. Depends on the user actually reading your site.
The problem is fundamentally that long-term, any barrier you place in front of a computer in terms of recognizing an image is also a barrier to some segment of the human population. I’m sure we’ve all encountered standard alphanumeric CAPTCHAs that are actually kind of difficult to read. They’re difficult to read in order to fool the computer, but the computer gets better at this all the time.
November 28th, 2006 at 12:46 am
Dellaster
Point taken. But …
I had in mind small sites, like The Forge, rather than big ones like Myspace. A spam bot’s programmer would have no incentive to scrape the specific picture, logic puzzle, or whatever off of one blog among millions to get past that one registration.
The bar for entry with my examples seems pretty low. Of course the blogger/site admin would decide if such a test was appropriate for her site. It would be an optional, extra feature for blog/site software. Personally I’m not sure I’d desire comments from someone who couldn’t do, say, example E. What could they offer a discussion if they couldn’t follow a link and understand a short paragraph? Am I being elitist here?
Thanks for the response. I can see now why the idea hasn’t been implemented.
November 28th, 2006 at 12:50 am
Matt
Yeah, I’d have no problem with implementing it for something like the Forge (which is just my personal blog rather than really ‘part’ of Iron Realms), but for most commercial enterprises, increasing the registration burden is not an attractive option.
November 28th, 2006 at 4:41 am
Andrew Crystall
For forums, you can make a forum that is invisible to human users and is “first on the list”. 90%+ of spambots will post there.
For posting on blogs and the like, there is the recurring idea of a ~5 second processing task that must be completed before the post shows. The average poster will wait 5 seconds. The average spam PC will be seriously impacted.
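One way the processing task could work is a hashcash-style proof of work, sketched below. The comment does not name a scheme, so this choice is an assumption, as are the names `make_challenge`, `solve`, and `verify`. The server hands out a random challenge, the posting computer searches for a nonce whose SHA-256 hash has enough leading zero bits, and the server checks the result with a single hash. The `difficulty` value that costs roughly five seconds depends entirely on the hardware and would need tuning.

```python
import hashlib
import os
from itertools import count


def make_challenge() -> str:
    """Server side: issue a random challenge string for one post."""
    return os.urandom(16).hex()


def leading_zero_bits(digest: bytes) -> int:
    """Count the zero bits at the start of a hash digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """Server side: one hash is enough to check the submitted nonce."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    return leading_zero_bits(digest) >= difficulty


def solve(challenge: str, difficulty: int) -> int:
    """Client side: brute-force search; cost doubles per difficulty bit."""
    for nonce in count():
        if verify(challenge, nonce, difficulty):
            return nonce


if __name__ == "__main__":
    challenge = make_challenge()
    nonce = solve(challenge, difficulty=16)  # low difficulty for a quick demo
    print(verify(challenge, nonce, difficulty=16))
```

The asymmetry is the point of the design: solving takes many hash attempts on the poster's machine, while checking takes one on the server.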
November 28th, 2006 at 7:18 am
Doogal
Most of the ideas that have been presented forget the fact that the IRE muds are played by blind people and a text reader will not be able to determine what is in that picture.
November 28th, 2006 at 11:13 am
Matt
You’d be surprised how much Spam blogs get, Andrew. The spam is auto-edited out of the Forge by a service that looks for spam across many, many blogs and categorizes similar posts as spam. It doesn’t keep the spammers out (of course, you don’t have to register to post here either) unfortunately.
November 30th, 2006 at 2:53 pm
chas
As Doogal mentions, most of the tools that block machine readability also block access by blind users. While I love kittenAuth (http://www.thepcspy.com/kav2test?close=1), that IS the downfall of the system…
December 1st, 2006 at 8:01 am
Andrew Crystall
Well yes, but as I said, making the posting computer (sorry, wasn’t clear on that side - not the human) work through a 5 second processing task would rapidly jam the spammer’s PC. And I can live with my PC doing it.
Sure, they can “only” then send twenty spams per minute, but that’s ruinously low compared to the current rate, AND if they’re hijacking PCs, people WILL notice when they start using 90% CPU time on that task.
December 2nd, 2006 at 1:46 am
Chrissie
I’ll just add that it’s gotten really horrid on the Achaea forums too. If I had a dollar for each of those threads I’ve deleted, I’d be filthy stinkin’ rich.
And unfortunately, the “invisible forum at the top” idea wouldn’t work there… almost all of the bots post in Politics for some reason, which is near the bottom but not at the very bottom of the list.
What could work would be auto-deleting any threads that have a “:)” smiley in the header, ’cause all of the bots use those and no one else ever does -___-
December 2nd, 2006 at 11:31 am
Andrew Crystall
Chrissie, a question - is the politics forum the lowest *numbered*?
Because that’s what I meant (again, should have been clearer).
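A rough sketch of the trap-forum idea, assuming bots pick the lowest-numbered forum rather than the first one displayed. The `Forum` class, its `hidden` flag, and `is_probable_bot` are all hypothetical names for illustration; real forum software would store and expose this differently.

```python
from dataclasses import dataclass


@dataclass
class Forum:
    forum_id: int
    name: str
    hidden: bool = False  # True if the forum is not shown to human users


def is_probable_bot(post_forum_id: int, forums: list[Forum]) -> bool:
    """Flag a post made in the hidden, lowest-numbered trap forum."""
    trap = min(forums, key=lambda f: f.forum_id)
    return trap.hidden and post_forum_id == trap.forum_id


if __name__ == "__main__":
    forums = [
        Forum(1, "Trap", hidden=True),
        Forum(2, "General"),
        Forum(12, "Politics"),
    ]
    print(is_probable_bot(1, forums))
    print(is_probable_bot(12, forums))
```

A check like this only catches bots that actually target the lowest forum number; bots that choose a forum some other way would pass straight through.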
December 3rd, 2006 at 5:42 am
Chrissie
Newp! It’s number 12. I can’t figure out which one is number 1 actually (can you tell I’m a bit clueless *shifty* ); the ones at the top of the list are 2 and 3.