Click Official ELI Links
Get Help With Your Extortion Letter | ELI Phone Support | ELI Legal Representation Program
Show your support of the ELI website & ELI Forums through a PayPal Contribution. Thank you for supporting the ongoing fight and reporting of Extortion Settlement Demand Letters.

Author Topic: off topic thread for the tech geeks (Lucia)  (Read 10470 times)

Robert Krausankas (BuddhaPi)

  • ELI Defense Team Member
  • Administrator
  • Hero Member
  • *****
  • Posts: 3354
    • View Profile
    • ExtortionLetterInfo
off topic thread for the tech geeks (Lucia)
« on: January 23, 2013, 10:58:03 AM »
Since moving the ELI site to new servers and working on bugs, glitches, and other items, I have had the opportunity to look at the log files and made a couple of discoveries of interest.

Something somewhere was triggering ModSecurity to block IPs. The last issue I looked into led to the discovery of 46 triggered alerts! Yes, 46 different alert strings! I immediately suspected some sort of bot, crawler, or scraper. As it turns out, a crawler named "magpie crawler" was hammering the server relentlessly.

Some quick research told me that magpie crawler is owned and operated by a company named "Brandwatch", and there are plenty of Google entries slamming them for being "bad bots" and sucking away bandwidth like crazy.

Who are Brandwatch?

"We are a social media monitoring company, helping our customers find useful and relevant comments and discussion on the web. We crawl blogs, forums, news sites, and all kinds of social media content. The content is indexed, much like a search engine, allowing our users to find the pages that mention the words they are interested in."

Brandwatch is apparently involved in "Online Reputation Management and Brand Tracking in Social Media". They have a crawler that will just swamp a reasonably small site. No pacing, no maximum number of requests per second; it just blasts away at the fastest rate it can.
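For anyone who wants to put a number on "no pacing", here is a minimal Python sketch that tallies the peak requests per second for each user agent in an access log. It assumes an Apache combined-format log where the user agent is the last quoted field; the sample lines, the IP addresses, and the "magpie-crawler/1.1" agent string are made up for illustration and are not taken from the ELI logs.

```python
import re
from collections import Counter

# Captures the bracketed timestamp and the last quoted field (the user agent).
# Assumes the combined log format; adjust if your server logs differently.
LINE_RE = re.compile(r'\[(?P<ts>[^\]]+)\].*"(?P<ua>[^"]*)"\s*$')


def peak_requests_per_second(lines):
    """Return {user_agent: highest request count seen within any single second}."""
    per_second = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            # Timestamps have one-second resolution, so (agent, timestamp) is one bucket.
            per_second[(match.group("ua"), match.group("ts"))] += 1
    peaks = {}
    for (agent, _timestamp), count in per_second.items():
        peaks[agent] = max(peaks.get(agent, 0), count)
    return peaks


# Made-up sample lines for illustration only.
SAMPLE_LOG = [
    '203.0.113.5 - - [23/Jan/2013:10:00:01 -0500] "GET /forum/a HTTP/1.1" 200 512 "-" "magpie-crawler/1.1"',
    '203.0.113.5 - - [23/Jan/2013:10:00:01 -0500] "GET /forum/b HTTP/1.1" 200 512 "-" "magpie-crawler/1.1"',
    '203.0.113.5 - - [23/Jan/2013:10:00:01 -0500] "GET /forum/c HTTP/1.1" 200 512 "-" "magpie-crawler/1.1"',
    '203.0.113.5 - - [23/Jan/2013:10:00:02 -0500] "GET /forum/d HTTP/1.1" 200 512 "-" "magpie-crawler/1.1"',
    '198.51.100.7 - - [23/Jan/2013:10:00:01 -0500] "GET /forum/a HTTP/1.1" 200 512 "-" "Googlebot/2.1"',
    '198.51.100.7 - - [23/Jan/2013:10:00:09 -0500] "GET /forum/b HTTP/1.1" 200 512 "-" "Googlebot/2.1"',
]

if __name__ == "__main__":
    for agent, peak in sorted(peak_requests_per_second(SAMPLE_LOG).items()):
        print(agent, peak)
```

A well-behaved crawler should show a low peak; an agent that stacks many hits into the same second is the one to look at first.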

They perform no useful function for ELI or our users / readers. They sell information to their clients on who is talking about them. That's right, they are trawling ELI in order to sell information to someone who is worried we might be saying something negative.

Don't be surprised if someone from Brandwatch appears in this thread, apologizing and justifying their bot's behavior; it is their typical MO. At this time they "claim" they adhere to robots.txt. I have my doubts. They will be monitored closely, and worst case they will get their IP range blocked from ELI.
Most questions have already been addressed in the forums, get yourself educated before making decisions.

Any advice is strictly that, and anything I may state is based on my opinions, and observations.
Robert Krausankas

I have a few friends around here..

lucia

  • Hero Member
  • *****
  • Posts: 767
    • View Profile
Re: off topic thread for the tech geeks (Lucia)
« Reply #1 on: January 23, 2013, 12:17:21 PM »
Quote from: Robert Krausankas (BuddhaPi)
> Since moving the ELI site to new servers and working on bugs, glitches, and other items, I have had the opportunity to look at the log files and made a couple of discoveries of interest.
>
> Something somewhere was triggering ModSecurity to block IPs. The last issue I looked into led to the discovery of 46 triggered alerts! Yes, 46 different alert strings! I immediately suspected some sort of bot, crawler, or scraper. As it turns out, a crawler named "magpie crawler" was hammering the server relentlessly.

Yep.  Been there. Seen that. :)

magpie is blocked by default with ZBblock. It's really worth adding ZBblock to the front end of SMF, which uses PHP. That will block a bunch of other SEO bots.

Quote from: Robert Krausankas (BuddhaPi)
> Some quick research told me that magpie crawler is owned and operated by a company named "Brandwatch", and there are plenty of Google entries slamming them for being "bad bots" and sucking away bandwidth like crazy.

Yep. These are the sorts of things that need to be blocked. After I saw the image crawlers racing, I watched more and realized that there is just too much c*ap out there that does your site no good and needs to be blocked. As far as positive SEO goes, Google and maybe Bing are worth letting in. Most other SEO bots exist for "research" (i.e., party A reads your site B to sell information to site C; there is nothing positive in it for site B).

Quote from: Robert Krausankas (BuddhaPi)
> Don't be surprised if someone from Brandwatch appears in this thread, apologizing and justifying their bot's behavior; it is their typical MO. At this time they "claim" they adhere to robots.txt. I have my doubts. They will be monitored closely, and worst case they will get their IP range blocked from ELI.

Don't even wait. Just block them. I advise getting ZBblock: http://www.spambotsecurity.com/zbblock_download.php  It's pretty easy to use, and you'll be glad you did. Zaphod has pretty good directions for installing the thing, and once it's installed, for most CMSs you add one line to the top of a PHP file. (For WordPress it's the config.php or something like that.)

If you use that, you'll block all sorts of other things: majestic12, 80legs, mobsters in Russia, and a fair number of spambots. I'm up to speed enough that I can suggest custom things if you need quick help.
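To illustrate the general idea of signature-based blocking at the front of a script, here is a rough Python sketch. It is not ZBblock's code or its rule set (ZBblock itself is PHP); the substrings in the list and the function names are hypothetical and chosen only to mirror the bots mentioned above.

```python
# Hypothetical signature list for illustration; NOT ZBblock's actual rules.
BLOCKED_AGENT_SUBSTRINGS = ("magpie-crawler", "mj12bot", "80legs")


def is_blocked_agent(user_agent):
    """Return True if the user agent contains any blocked substring (case-insensitive)."""
    lowered = user_agent.lower()
    return any(signature in lowered for signature in BLOCKED_AGENT_SUBSTRINGS)


def status_for(user_agent):
    """Return the HTTP status a front-end check would answer with: 403 if blocked, else 200."""
    return 403 if is_blocked_agent(user_agent) else 200


if __name__ == "__main__":
    for agent in ("magpie-crawler/1.1 (U; Linux amd64)",
                  "Mozilla/5.0 (compatible; MJ12bot/v1.4.3)",
                  "Mozilla/5.0 (compatible; Googlebot/2.1)"):
        print(status_for(agent), agent)
```

The check runs before the page does any real work, which is why this kind of filter saves CPU and memory on dynamic sites. User agents are trivially spoofed, so this only stops bots that identify themselves honestly.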

This can be in addition to any blocking you do at your router.   (I'm on shared hosting... so ZBblock is a big thing for me. I now use Cloudflare as a CDN and automatically transfer blocks so they happen at Cloudflare. That's great too.  But I think since you have control of the server you don't necessarily need to have the Cloudflare bit. )

FWIW: If you end up using ZBblock, I'm going to ask you to let me read your killed_log.txt files.  I want more data!! :)

Robert Krausankas (BuddhaPi)

  • ELI Defense Team Member
  • Administrator
  • Hero Member
  • *****
  • Posts: 3354
    • View Profile
    • ExtortionLetterInfo
Re: off topic thread for the tech geeks (Lucia)
« Reply #2 on: January 23, 2013, 12:22:13 PM »
Make no mistake, they are blocked, albeit at this time through robots.txt. I'll be watching in case I need to be more proactive. I'll also be looking at ZBblock.
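As a sanity check on a robots.txt block, the rules can be tested with Python's standard `urllib.robotparser`. This is a sketch under assumptions: the robots.txt content below is an example and not ELI's actual file, "magpie-crawler" is assumed to be the token the crawler matches on, and example.com is a placeholder host. It only shows what a compliant crawler should do; robots.txt cannot force anything.

```python
from urllib.robotparser import RobotFileParser

# Example rules only; not ELI's actual robots.txt.
ROBOTS_TXT = """\
User-agent: magpie-crawler
Disallow: /

User-agent: *
Disallow:
"""


def can_crawl(robots_txt, user_agent, url):
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("magpie-crawler", "Googlebot"):
        print(agent, can_crawl(ROBOTS_TXT, agent, "http://example.com/forum/"))
```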

lucia

  • Hero Member
  • *****
  • Posts: 767
    • View Profile
Re: off topic thread for the tech geeks (Lucia)
« Reply #3 on: January 23, 2013, 12:30:00 PM »
Carnac predicts: They will violate robots.txt. :)

Robert Krausankas (BuddhaPi)

  • ELI Defense Team Member
  • Administrator
  • Hero Member
  • *****
  • Posts: 3354
    • View Profile
    • ExtortionLetterInfo
Re: off topic thread for the tech geeks (Lucia)
« Reply #4 on: January 23, 2013, 12:32:23 PM »
Quote from: lucia
> Carnac predicts: They will violate robots.txt. :)

I totally agree, but when they do I can call them out on it and slam Brandwatch for being liars and shifty in their practices!!
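Catching a violation comes down to finding requests from the disallowed agent for anything other than robots.txt itself. A minimal Python sketch follows, assuming a combined-format access log, a site-wide Disallow for that agent, and the hypothetical "magpie-crawler" token; the sample log lines are invented.

```python
import re

# Captures the requested path and the last quoted field (the user agent).
REQUEST_RE = re.compile(
    r'"(?:GET|HEAD|POST) (?P<path>\S+)[^"]*".*"(?P<ua>[^"]*)"\s*$'
)


def find_violations(lines, banned_token="magpie-crawler"):
    """Return paths requested by the banned agent, excluding /robots.txt itself."""
    violations = []
    for line in lines:
        match = REQUEST_RE.search(line)
        if not match:
            continue
        agent = match.group("ua").lower()
        path = match.group("path")
        if banned_token in agent and path != "/robots.txt":
            violations.append(path)
    return violations


# Invented sample lines for illustration only.
SAMPLE_LOG = [
    '203.0.113.5 - - [23/Jan/2013:12:40:00 -0500] "GET /robots.txt HTTP/1.1" 200 64 "-" "magpie-crawler/1.1"',
    '203.0.113.5 - - [23/Jan/2013:12:40:05 -0500] "GET /forum/index.php?topic=1 HTTP/1.1" 200 900 "-" "magpie-crawler/1.1"',
    '203.0.113.5 - - [23/Jan/2013:12:40:06 -0500] "GET /forum/index.php?topic=2 HTTP/1.1" 200 900 "-" "magpie-crawler/1.1"',
    '198.51.100.7 - - [23/Jan/2013:12:41:00 -0500] "GET /forum/ HTTP/1.1" 200 900 "-" "Googlebot/2.1"',
]

if __name__ == "__main__":
    for path in find_violations(SAMPLE_LOG):
        print("violation:", path)
```

Any hit listed after the crawler has already fetched robots.txt is the evidence needed to call them out.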


jot

  • Jr. Member
  • **
  • Posts: 25
    • View Profile
Re: off topic thread for the tech geeks (Lucia)
« Reply #6 on: January 23, 2013, 06:05:35 PM »
I feel that if a bot provides no valuable service to a website, it does not need to be crawling it. As I am finding out the hard way, most malicious bots ignore the robots.txt file. I'm with lucia and would just go ahead and block them.

I wish I could use ZBblock, but our website is on an IIS 7.5 server, so all I have to work with is the rewrite rule add-on tool, and I am still learning how to use it properly. Lucia, any recommendations on whether what ZBblock is doing can be done with a rewrite rule?

lucia

  • Hero Member
  • *****
  • Posts: 767
    • View Profile
Re: off topic thread for the tech geeks (Lucia)
« Reply #7 on: January 23, 2013, 09:55:23 PM »
ZBblock can be made to work with *anything* that uses PHP. The ELI forum uses SMF, which is a PHP script. So, with respect to the forum, ZBblock would do a great job keeping crawlers off. Reading the killed_log.txt files could also help Robert & Matt identify "things" they might want to block in .htaccess (which is for Apache).

ZBblock doesn't protect static files (unless you do really fancy stuff). So, for that, those on Apache can use .htaccess or some sort of firewall. (I end up using Cloudflare as my "firewall".)

I don't know anything about how to protect servers running software that doesn't permit you to use .htaccess. But *in principle* you might be able to do with .htaccess a lot of what you can do with ZBblock. ZBblock is easier, though, because Zaphod already wrote it, updates it, etc. ZBblock may also be faster, and it permits you to come up with quite a few "tailored" rules if you so desire.

Are you trying to protect anything dynamic? (A blog? Shopping cart? Etc.)

jot

  • Jr. Member
  • **
  • Posts: 25
    • View Profile
Re: off topic thread for the tech geeks (Lucia)
« Reply #8 on: January 24, 2013, 08:20:18 AM »
We don't use PHP, but IIS 7.5 uses .htaccess and web.config, so I can use a lot of items that can be written for those types of files. I was just hoping for some software that would help automate it a bit and make it easier. From some more research last night, it looks like URL Rewrite will do what I want; it is just time-consuming, and I have an entire company network to manage along with being the webmaster and the network security specialist.

You know, I was so curious as to why our website, which I had not worked on much, had all of a sudden almost doubled its traffic hits and bandwidth last year, only to realize later that it was scanning and trolling bots making my life a living hell. Our site is static, but when we started posting more on our Twitter and FB pages with links back to our website, it seems the "trolls" got interested in us. I suppose popularity comes at a price  :(

lucia

  • Hero Member
  • *****
  • Posts: 767
    • View Profile
Re: off topic thread for the tech geeks (Lucia)
« Reply #9 on: January 24, 2013, 08:32:29 AM »
There are lots of Twitter bots too; I think their goal is advertising/SEO for their customers (not the site visited). If you post a link to Twitter, a swarm comes, and it comes instantaneously. Because my site is only a blog, I ban most of those too. ELI probably should too (though I don't think anyone is tweeting ELI's address much. But if they do, most Twitter bots are useless. A few might be useful; someone could let them in and block the others.)

Robert Krausankas (BuddhaPi)

  • ELI Defense Team Member
  • Administrator
  • Hero Member
  • *****
  • Posts: 3354
    • View Profile
    • ExtortionLetterInfo
Re: off topic thread for the tech geeks (Lucia)
« Reply #10 on: January 24, 2013, 08:41:55 AM »
Quote from: jot
> We don't use PHP, but IIS 7.5 uses .htaccess and web.config, so I can use a lot of items that can be written for those types of files. I was just hoping for some software that would help automate it a bit and make it easier. From some more research last night, it looks like URL Rewrite will do what I want; it is just time-consuming, and I have an entire company network to manage along with being the webmaster and the network security specialist.
>
> You know, I was so curious as to why our website, which I had not worked on much, had all of a sudden almost doubled its traffic hits and bandwidth last year, only to realize later that it was scanning and trolling bots making my life a living hell. Our site is static, but when we started posting more on our Twitter and FB pages with links back to our website, it seems the "trolls" got interested in us. I suppose popularity comes at a price  :(

Ahhhhh, there are a number of bots that come running when things are posted on Twitter. It's known as a Twitter swarm, and most of them seem to come from amazonaws.com IPs...

Robert Krausankas (BuddhaPi)

  • ELI Defense Team Member
  • Administrator
  • Hero Member
  • *****
  • Posts: 3354
    • View Profile
    • ExtortionLetterInfo
Re: off topic thread for the tech geeks (Lucia)
« Reply #11 on: January 24, 2013, 08:45:59 AM »
Quote from: lucia
> There are lots of Twitter bots too; I think their goal is advertising/SEO for their customers (not the site visited). If you post a link to Twitter, a swarm comes, and it comes instantaneously. Because my site is only a blog, I ban most of those too. ELI probably should too (though I don't think anyone is tweeting ELI's address much. But if they do, most Twitter bots are useless. A few might be useful; someone could let them in and block the others.)

No, ELI's address is not getting tweeted often. I would like to change that, however, and have been concentrating some effort on getting a bit more exposure via Twitter. I'm not going to go nuts blocking bots, as I don't have the time to invest, but I will make the time if server resources are affected enough in a negative way. This could easily be a full-time job, and I already have 2 or 3 of those.

lucia

  • Hero Member
  • *****
  • Posts: 767
    • View Profile
Re: off topic thread for the tech geeks (Lucia)
« Reply #12 on: January 24, 2013, 10:43:09 AM »
Quote from: Robert Krausankas (BuddhaPi)
> Ahhhhh, there are a number of bots that come running when things are posted on Twitter. It's known as a Twitter swarm, and most of them seem to come from amazonaws.com IPs...

ZBblock blocks most of amazonaws.com, with a few bypasses for the Wayback Machine and other services popular with hosts. One can then block Wayback in customsigs.inc. But many people like the Wayback Machine, so Zap has a bypass for that.

That Twitter swarm can wreak havoc on a dynamic site with cheap hosting. I escalate amazonaws.com IP blocks to Cloudflare and never unblock those that got blocked. It's just too much CPU/memory for my hobby site.
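The "block a range, with a few bypasses" pattern can be sketched in Python with the standard `ipaddress` module. The ranges and the bypass address below are placeholders taken from the reserved documentation blocks, not real Amazon or Wayback Machine addresses, and this is not how ZBblock or Cloudflare implement it.

```python
import ipaddress

# Placeholder ranges from the reserved documentation blocks; real hosting-provider
# ranges would have to come from the provider's own published list.
BLOCKED_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in ("198.51.100.0/24", "203.0.113.0/24")
]

# Hypothetical bypass: one address inside a blocked range that should still get in.
ALLOWED_ADDRESSES = {ipaddress.ip_address("203.0.113.10")}


def is_blocked_ip(address):
    """Return True if address falls in a blocked network and has no bypass."""
    ip = ipaddress.ip_address(address)
    if ip in ALLOWED_ADDRESSES:
        return False
    return any(ip in network for network in BLOCKED_NETWORKS)


if __name__ == "__main__":
    for address in ("203.0.113.77", "203.0.113.10", "192.0.2.1"):
        print(address, is_blocked_ip(address))
```

Checking the bypass list first keeps the exceptions explicit, so a broad range block does not silently take out a service you want to keep.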

lucia

  • Hero Member
  • *****
  • Posts: 767
    • View Profile
Re: off topic thread for the tech geeks (Lucia)
« Reply #13 on: January 24, 2013, 10:48:56 AM »
Quote from: Robert Krausankas (BuddhaPi)
> No, ELI's address is not getting tweeted often. I would like to change that, however, and have been concentrating some effort on getting a bit more exposure via Twitter. I'm not going to go nuts blocking bots, as I don't have the time to invest, but I will make the time if server resources are affected enough in a negative way. This could easily be a full-time job, and I already have 2 or 3 of those.

The "don't have time to do it full time" problem is where ZBblock can be useful to people running smaller sites, especially hobby sites and forums. It ends up saving time and resources (including human ones). But it's not necessarily the solution for everything. You have access to the logs and are thus situated to know whether excess bot traffic is a problem for ELI. If it is a big problem, ZBblock is a good thing to add quickly. If it's not, then no.

jot

  • Jr. Member
  • **
  • Posts: 25
    • View Profile
Re: off topic thread for the tech geeks (Lucia)
« Reply #14 on: January 24, 2013, 09:57:22 PM »
Quote from: lucia
> There are lots of Twitter bots too; I think their goal is advertising/SEO for their customers (not the site visited). If you post a link to Twitter, a swarm comes, and it comes instantaneously. Because my site is only a blog, I ban most of those too. ELI probably should too (though I don't think anyone is tweeting ELI's address much. But if they do, most Twitter bots are useless. A few might be useful; someone could let them in and block the others.)

Quote from: Robert Krausankas (BuddhaPi)
> No, ELI's address is not getting tweeted often. I would like to change that, however, and have been concentrating some effort on getting a bit more exposure via Twitter. I'm not going to go nuts blocking bots, as I don't have the time to invest, but I will make the time if server resources are affected enough in a negative way. This could easily be a full-time job, and I already have 2 or 3 of those.

LOL, I hear you, and I know the feeling.

It seems that since the whole Getty thing, I have turned into more of a security specialist, working on hardening our network even more. We host our own web server on a DMZ, so I have the luxury of blocking large swaths of IP ranges too, but the sheer number going after our web server is ridiculous. I have over 4000 attempts on our network every month! The good news is that I have learned a lot about URL Rewrite and the .htaccess and web.config configurations over the last few days, so maybe I can slow some of them down. I am compiling a database and some instructions on how to deal with some of them for future postings.  :)

 
