Since moving the ELI site to new servers and working through bugs, glitches and other items, I have had the opportunity to look at the log files, and I made a couple of interesting discoveries.
Something somewhere was triggering mod-security to block IPs. The last issue I looked into led to the discovery of 46 triggered alerts. Yes, 46 different alert strings! I immediately suspected some sort of bot, crawler or scraper. As it turns out, a crawler named "magpie crawler" was hammering the server relentlessly.
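If you want to spot this kind of traffic in your own logs, here is a minimal Python sketch that finds the peak number of requests per second for each user agent. It assumes Apache's "combined" log format, where the timestamp is recorded to the second and the user agent is the last quoted field. The sample lines (IP addresses, paths, and the `magpie-crawler/1.1` agent string) are hypothetical, not copied from the ELI logs.

```python
import re
from collections import Counter

# Apache "combined" log format: the user agent is the last quoted field.
COMBINED = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "[^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"$'
)

def requests_per_agent_second(lines):
    """Count requests per (user agent, timestamp) pair; timestamps are per second."""
    counts = Counter()
    for line in lines:
        m = COMBINED.match(line)
        if m:
            counts[(m.group("agent"), m.group("time"))] += 1
    return counts

def peak_rate_by_agent(lines):
    """Return the highest requests-per-second seen for each user agent."""
    peaks = {}
    for (agent, _second), n in requests_per_agent_second(lines).items():
        peaks[agent] = max(peaks.get(agent, 0), n)
    return peaks

# Hypothetical sample lines, for illustration only.
sample = [
    '203.0.113.5 - - [10/Oct/2013:13:55:36 +0000] "GET /forum/ HTTP/1.1" 200 512 "-" "magpie-crawler/1.1"',
    '203.0.113.5 - - [10/Oct/2013:13:55:36 +0000] "GET /forum/t1 HTTP/1.1" 200 512 "-" "magpie-crawler/1.1"',
    '203.0.113.5 - - [10/Oct/2013:13:55:36 +0000] "GET /forum/t2 HTTP/1.1" 200 512 "-" "magpie-crawler/1.1"',
    '198.51.100.7 - - [10/Oct/2013:13:55:36 +0000] "GET / HTTP/1.1" 200 1024 "-" "Mozilla/5.0"',
]

print(peak_rate_by_agent(sample))
# {'magpie-crawler/1.1': 3, 'Mozilla/5.0': 1}
```

Point it at a real access log with `peak_rate_by_agent(open(path))`. Any agent whose peak stays far above everyone else's is worth a closer look.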
Some quick research told me that magpie crawler is owned and operated by a company named "Brandwatch", and there are plenty of Google results slamming them for being "bad bots" and sucking away bandwidth like crazy.
Who are Brandwatch?
"We are a social media monitoring company, helping our customers find useful and relevant comments and discussion on the web. We crawl blogs, forums, news sites, and all kinds of social media content. The content is indexed, much like a search engine, allowing our users to find the pages that mention the words they are interested in."
Brandwatch is apparently involved in "Online Reputation Management and Brand Tracking in Social Media". They have a crawler that will just swamp a reasonably small site: no pacing, no maximum number of requests per second, it just blasts away at the fastest rate it can.
They perform no useful function for ELI or our users / readers. They sell information to their clients on who is talking about them. That's right, they are trawling ELI in order to sell information to someone who is worried we might be saying something negative.
Don't be surprised if someone from Brandwatch appears in this thread, apologizing for and justifying their bot's behavior; it is their typical MO. At this time they "claim" they adhere to robots.txt. I have my doubts, so they will be monitored closely, and in the worst case their IP range will be blocked from ELI.
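For reference, telling a crawler to stay away via robots.txt takes only a couple of lines. The sketch below assumes the bot identifies itself with the token `magpie-crawler` (an assumption, check your own logs for the exact string), uses a hypothetical `example.org` URL, and uses Python's standard `urllib.robotparser` to confirm the rule blocks that agent while leaving everyone else alone. Keep in mind robots.txt is purely advisory: a bot that ignores it has to be stopped at the server, for example by blocking its IP range.

```python
from urllib import robotparser

# Hypothetical robots.txt: "magpie-crawler" is an assumed user-agent token.
ROBOTS_TXT = """\
User-agent: magpie-crawler
Disallow: /

User-agent: *
Disallow:
"""

parser = robotparser.RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse() also marks the rules as loaded

for agent in ("magpie-crawler/1.1", "Mozilla/5.0"):
    allowed = parser.can_fetch(agent, "https://example.org/forum/")
    print(agent, "allowed" if allowed else "blocked")
# magpie-crawler/1.1 blocked
# Mozilla/5.0 allowed
```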