Mentalfloss.com made news recently for claiming they had to take down their RSS feeds to prevent content scrapers from getting their website blacklisted in Google, after their entire website was removed from Google's index. Google's Matt Cutts vehemently denied that was why they were blacklisted, claiming Mental Floss was removed from the index only after it got hacked. Matt Cutts even republished, in a comment on TechDirt, the email he says Google sent to Mental Floss' webmaster(s). In it, Google supposedly explains MF's sudden and complete removal from the index:
"The site was hacked. RSS has nothing to do with it." Matt Cutts on Jul 21st, 2008 @ 1:30pm
We emailed this site on July 7th to let them know exactly why we were removing the site; looks like it got hacked and was showing nasty content. It has *nothing to do* with full-text RSS feeds.
Here's some of the email that we sent on July 7th to this site owner:
Dear site owner or webmaster of mentalfloss.com
While we were indexing your webpages, we detected that some of your pages were using techniques that are outside our quality guidelines, which can be found here: http://www.google.com/webmasters/guidelines.html. This appears to be because your site has been modified by a third party. Typically, the offending party gains access to an insecure directory that has open permissions. Many times, they will upload files or modify existing ones, which then show up as spam in our index. The following is some example hidden text we found at eg: http://www.mentalfloss.com/blogs/archives/2192:
economics times india
The application fee is collected by the JUPAS economics times india on behalf of the 9 participating institutions and is not refundable or transferable to another year. free 2004 income tax forms
Request for use of Accumulated Surplus must be signed by the Hon Fin Sec/Treasurer and countersigned by the President of the Union/Club and submitted to OSA for approval. According to the agreement, Castrol will use Deutsche Bank's complete end-to-end payment and collection solution, as well as db-eBills - the Bank's innovative electronic invoice presentment and payment (EIPP) solution. The Internet's largest source of legitimate, copyrighted 100% digital sheet music since 1997, we now have over 10,000 songs for instant download! For extremely poor families, free 2004 income tax forms provides emergency assistance, while the conditionalities promote longer-term investments in human capital. Australia order viagra online clinic uk in Australia order viagra without a prescription in Australia order generic viagra and other prescription drugs online in Australia viagra order by phone in Australia viagra order on line in Australia order cheap viagra in Australia levitra cialis viagra comparison online order in Australia buy online order viagra in Australia order generic viagra in Australia order viagra overnight in Australia order by phone generic viagra in Australia viagra no prescr chase mastercard rewards program
A device which forms a digitised image of a human fmger print for the purpose of biometric authentication. T subject to search without a warrant while on prison property, according to the lawsuit. It is rare to find an amateur player using this move in a poker game, so if your opponents see you using this move they can be fairly sure you know how to play good poker, and may think twice about bluffing you out of future pots. Download one of listed teens for chase mastercard rewards program taylor torrents or choose from category bit torrent downloads listed here to download your favorite torrent at torrentz. ACI Worldwide Eastern Europe Development is the fast-growing Romanian branch of ACI Worldwide.
bad credit personal finance loans
[...]
In order to preserve the quality of our search engine, we have temporarily removed some of your webpages from our search results. Currently pages from mentalfloss.com are scheduled to be removed for at least 30 days.
We would prefer to have your pages in Google's index. If you wish to be reconsidered, please correct or remove all pages (may not be limited to the examples provided) that are outside our quality guidelines. One potential remedy is to contact your web host technical support for assistance. For more information about security for webmasters, see Security Checklist for Webmasters.
When you are ready, please visit Google for Webmasters [MM's note: I had to hide the Google links Matt included because they were breaking my layout] to learn more and submit your site for reconsideration.
Sincerely,
Google Search Quality Team
The people who run Mental Floss never received that email; the first time they learned of it was when they saw it on TechDirt. Even after seeing it there, and although they had apparently been unaware that their website was hacked, Will Pearson, the president of Mental Floss, later commented on TechDirt:
I was just informed of this post/conversation and wanted to chime in. I'm the president of mental_floss and simply wanted to clear up some confusion. We did not claim that Google instructed us to tweak our RSS feed and we are not blaming Google for any of this. For some reason I did not see the note from Google posted above and so we did not realize why we'd been pulled from their search.
Once we realized we were no longer in Google's natural search, we immediately began taking steps to try and figure out what was going on. After asking a few others with experience in this area, it was suggested to us that we make sure no one was lifting our content from our RSS feed and publishing it in full on their site. We discovered another site that was and decided to tweak our RSS feed just in case that was the cause.
We are continuing to look into this and will resolve the problem Matt has pointed out.
It's very important to us that we are included in Google's index again so we'll work quickly to get this fixed. It's unfortunate because we run a clean operation so I hate that this has happened.
But again, this is not Google's fault. They've simply recognized a problem and we'll work to fix it.
Matt, if you'd be willing to discuss, I would love to have a conversation with you. Thanks for your attention to these matters.
Thanks,
Will
So what was Mental Floss originally claiming? That they got removed from Google's index for allowing their content to be scraped through full-text RSS feeds, a bit of nonsense they picked up from their supposedly SEO-knowledgeable friends. How did Google reply? By claiming that wasn't the reason for the removal - hacking was. Also, Will is really careful to "not blame" Google, since they, like 99% of the population, fear Google's wrath far too much to dispense with the usual butt-kissing.
But what of this hacking? You can't tell whether Mental Floss was hacked by checking Google, since Google removed its cache of that page. You can't tell by checking the page source - everything looks fine. You can't tell by checking Yahoo!'s cache or the Wayback Machine at archive.org either; though the Wayback Machine did index Mental Floss' content for October 2006, the year and month in question, the page that supposedly got "hacked" is missing. Nor can you tell by checking MSN's cache - the page source there is as clean as a whistle, having been retrieved just yesterday.
So what's the deal? Most likely comment spam. The page Google claims got "hacked" is almost two years old and received only one comment. Old web pages are usually good targets for comment spam since they are often quite neglected, especially on websites with tremendous amounts of content. Mental Floss certainly fits that bill. Would Google ever admit they tossed a site merely for having a single spam comment? Probably not. Would Google actually toss a site for having a single spam comment? It's possible. What the robots crawl might "look" like a hacked page when in fact it is not.
Am I suggesting that Google kicked an entire popular website out of the index for having a single spammy comment identical to the Viagra-laden gobbledygook above, which Matt Cutts claims was "hidden text in a web page"? Why yes, I am. Short of a better explanation from the people who run Mental Floss, I don't buy that a single two-year-old web page on a huge website with perhaps thousands of pages of content was "hacked" to suddenly include "hidden text". Google never explains how that could have been done or who did it, and having seen plenty of hacked pages on the Web, I know for a fact that when pages are hacked it's usually to include not just hidden text but tons of links to spammy websites that are just waiting to profit from the hacking.


Comments
You'll need to use "view source" to see the hidden content. There's a lot of hidden text with a bunch of hidden links. As mentioned in our guidelines at http://www.google.com/support/webmasters/b
I could not figure out who you were even after visiting your OpenID URL, so forgive me if I, ah, "googled" you until LinkedIn gave me my answer.
I don't know how I missed Matt Cutts's additional comment on TechDirt with the link to the cached results from Yahoo!, but I did. Was TechDirt screening comments the day I wrote the post? I just don't recall seeing it. So thanks for the link.
But I have a question:
I didn't like what I saw when I viewed MentalFloss' page source. Every script the hacker(s) called upon to insert spam into Mental Floss is being stored on other sites that were also hacked - those sites seem to be serving as "drone" sites that inject spam into any other site the hackers want to infiltrate.
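If you'd rather not eyeball a page source by hand, the kind of check I did (looking for scripts loaded from other people's domains, plus tags styled to be invisible) can be roughly automated. What follows is only a minimal sketch in Python using the standard library, not anything Google or Mental Floss actually uses: the class `InjectionScanner`, the `scan` helper, and the `.example` hostnames are made up for illustration, and it assumes hidden text is hidden with inline `display:none` or `visibility:hidden` styles, so anything hidden through external CSS or JavaScript will slip past it.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class InjectionScanner(HTMLParser):
    """Hypothetical helper: flags scripts loaded from foreign hosts and
    elements hidden with inline CSS in a page's raw HTML source."""

    def __init__(self, own_host):
        super().__init__()
        self.own_host = own_host.lower()
        self.foreign_script_hosts = set()
        self.hidden_elements = 0

    def _is_own(self, host):
        # Treat the site itself and its subdomains (e.g. www.) as "own".
        return host == self.own_host or host.endswith("." + self.own_host)

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        src = attrs.get("src")
        if tag == "script" and src:
            host = urlparse(src).netloc.lower()
            # Relative URLs have no host and are assumed to be local.
            if host and not self._is_own(host):
                self.foreign_script_hosts.add(host)
        # Only inline styles are checked; external stylesheets are ignored.
        style = (attrs.get("style") or "").replace(" ", "").lower()
        if "display:none" in style or "visibility:hidden" in style:
            self.hidden_elements += 1


def scan(html, own_host):
    """Return (sorted foreign script hosts, count of inline-hidden tags)."""
    scanner = InjectionScanner(own_host)
    scanner.feed(html)
    scanner.close()
    return sorted(scanner.foreign_script_hosts), scanner.hidden_elements


# Hypothetical sample page: one hidden spam block and one "drone" script.
page = ('<div style="display: none"><a href="http://spam.example/">pills</a></div>'
        '<script src="http://drone.example/inject.js"></script>'
        '<script src="/js/local.js"></script>')
print(scan(page, "mentalfloss.com"))  # (['drone.example'], 1)
```

Anything it reports still needs a human look: a foreign script host can just as easily be a legitimate widget or ad network, which is part of why drone sites like the ones above are easy to overlook.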
I checked the home pages of the first ten or so sites used by the hackers to deliver "injection scripts" and they are, with the exception of one website, people's personal blogs and websites, perfectly legitimate and not at all spammy.
Then I checked those hacked websites against Google's results: with the exception of one website, all of them are still indexed, and your cache of them is still crawling with the hackers' spam and the lists of redirected pages they used.
Here are results for five of the hacked websites that were used by the hackers to inject scripts into Mentalfloss.com:
http://www.google.com/search?hl=en&client=firefox-a&rls=org.mozilla%3Aen-US%3Aofficial&q=http%3A%2F%2Fwww.lifecoachingsecret.co
http://www.google.com/search?hl=en&client=firefox-a&rls=org.mozilla%3Aen-US%3Aofficial&q=http%3A%2F%2Fwww.island-bound.ca&btnG=Search
http://www.google.com/search?hl=en&client=firefox-a&rls=org.mozilla%3Aen-US%3Aofficial&q=http%3A%2F%2Fwww.newalberta.net%2F&btnG=Search
http://www.google.com/search?hl=en&client=firefox-a&rls=org.mozilla%3Aen-US%3Aofficial&q=http%3A%2F%2Fwww.birgittes.eu%2Fblog%2
http://www.google.com/search?hl=en&client=firefox-a&rls=org.mozilla%3Aen-US%3Aofficial&q=http%3A%2F%2Fwww.misterrosco.com&btnG=Search
I could go on cutting every URL from Mental Floss' hacked page source, pasting it into Google and copying the result URLs into this reply, but I think you get my point: you took Mental Floss out of search results completely - totally blacklisted them - yet you left hundreds of other sites used to hack into Mental Floss alone. Why?
Because most of those sites are not as popular as Mental Floss is, so it's not as important if people stumble upon those tainted sites and tainted results while searching Google? Those sites were hacked too - so why weren't they treated exactly the same way Mental Floss was?
Edited at 2008-07-28 06:45 pm (UTC)
If a hacker can add hidden text and links, he can also modify existing links to point to bad sites or even add code to infect visitors with malware. That would be a big problem for us, and given that all we had seen was that the hacker was able to add content, something like that must be expected at any time. If a site is as popular as mentalfloss.com, the number of users affected by something like this could be very high, so it's possible that we react more strongly when a site like that is found to have hacked content.
We see hacked sites with malware on a daily basis in our webmaster help groups. It's a big problem, and I'm glad mentalfloss.com was able to resolve the issue before it went that far.
Thanks for the clarification.
I wondered where Google stood on this issue since, as some of your US co-workers might recall, last year I was responsible for the removal of some 850,000 pages of redirected content from Google search results, from sites all over the Web including Yahoo!, community.livejournal, and my other journal on LJ (Anti-AOL).
I also pinpointed the website hacking all of us and reported them to your company along with the search results they infected, which got their website permanently blacklisted and their AdSense account canceled.
While I may come off as slightly anti-Google, I am even more against spammers, hackers, and malware when it comes down to it. I intensely despise that sort of stuff.
I also wondered why Google removed Mental Floss from search results in the first place - even if they were hacked, so what? Your explanation is exactly the one I guessed at, though: that you removed Mental Floss from your index to reduce the chance of large numbers of visitors becoming infected by malware. Good enough. Thank you.