Detecting and refusing crawlers/robots

by David Ringsmuth

A4D v3+

 

After reviewing my web logs I can see that a range of IP addresses has been
crawling my customer's sites.

I can think of one way to detect them and keep them from causing too much
activity: keep arrays of IP addresses, request counters, and request times,
and when SamePageRequests > HourlyCounterLimit (maybe 45) for an IP address,
refuse further responses to that IP address for 1 hour.
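
A minimal sketch of that counter idea, written in Python rather than Active4D so it stays language-neutral. The class name CrawlerThrottle, the method allow, and the in-memory dictionaries are assumptions for illustration only; the limit of 45 and the 1-hour block come from the description above.

```python
import time
from collections import defaultdict

HOURLY_COUNTER_LIMIT = 45   # the "maybe 45" threshold from the post
WINDOW_SECONDS = 3600       # counting window: 1 hour
BLOCK_SECONDS = 3600        # refusal period: 1 hour


class CrawlerThrottle:
    """Counts same-page requests per IP and blocks IPs that exceed the limit."""

    def __init__(self, limit=HOURLY_COUNTER_LIMIT):
        self.limit = limit
        self.hits = defaultdict(list)   # (ip, page) -> timestamps of recent requests
        self.blocked_until = {}         # ip -> time at which the block expires

    def allow(self, ip, page, now=None):
        """Return True if the request should be served, False if refused."""
        now = time.time() if now is None else now

        # Refuse while an earlier block is still active.
        if self.blocked_until.get(ip, 0) > now:
            return False

        # Keep only requests from the last hour, then record this one.
        recent = [t for t in self.hits[(ip, page)] if now - t < WINDOW_SECONDS]
        recent.append(now)
        self.hits[(ip, page)] = recent

        # SamePageRequests > HourlyCounterLimit: block the IP for one hour.
        if len(recent) > self.limit:
            self.blocked_until[ip] = now + BLOCK_SECONDS
            return False
        return True
```

The handler would call allow(ip, page) once per request and send a refusal whenever it returns False. Everything lives in memory here, so a real setup would still need to purge old entries and decide whether blocks should survive a restart.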

 

Is there a better way to solve this problem besides using the HTML meta tags
and robots.txt?

<META NAME="ROBOTS" CONTENT="NOINDEX">

<META NAME="ROBOTS" CONTENT="NOFOLLOW">

 

TIA!

 

David

_______________________________________________
Active4D-dev mailing list
Active4D-dev@...
http://mailman.aparajitaworld.com/mailman/listinfo/active4d-dev
Archives: http://mailman.aparajitaworld.com/archive/active4d-dev/

Re: Detecting and refusing crawlers/robots

by Peter Schumacher-2

David,

that only works for the well-behaved ones. Check who is reading the
robots.txt file and automatically put them on your "Is_a_bot" list. Save
the IP address _and_ the browser client. For my own system I built up a
list of possible robots this way and just send them a "bad-robot-page"
from the OWA.
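
A rough sketch of the recording half of this, in Python rather than Active4D. The class BotRegistry and its method note_request are hypothetical names, and plain sets stand in for whatever storage the real system uses.

```python
class BotRegistry:
    """Remembers every client that has requested robots.txt."""

    def __init__(self):
        self.bot_ips = set()      # IP addresses seen reading robots.txt
        self.bot_agents = set()   # browser client (User-Agent) strings seen

    def note_request(self, path, ip, user_agent):
        """Call once per incoming request; records robots.txt readers only."""
        # Ignore any query string and compare case-insensitively.
        if path.split("?", 1)[0].lower() == "/robots.txt":
            self.bot_ips.add(ip)
            if user_agent:
                self.bot_agents.add(user_agent)
```

One caution on the design: a robot that sends the User-Agent string of a common browser would put that string on the list, so matching on the browser client alone could also catch real visitors.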

On 11.04.2009 at 18:15, David Ringsmuth wrote:

> Is there a better way to solve this problem beside using the HTML and
> robots.txt?
>
> <META NAME="ROBOTS" CONTENT="NOINDEX">
>
> <META NAME="ROBOTS" CONTENT="NOFOLLOW">

Kind regards
[4D-Consulting.com]eK, Wiesbaden
Peter Schumacher
--------------------------------------------------------
Web: http://www.4D-Consulting.com/
FreeCall:  0800-434 636 7
Tel.:      0611-9406.850 - Fax: 0611-9406.744
iChat/Skype: PeterInWiesbaden
4D-Consulting.com eK - Scharnhorststr. 36 - 65195 Wiesbaden
HR Wiesbaden: HRA 4867
Member of the developer network http://www.die4dwerkstatt.de

Re: Detecting and refusing crawlers/robots

by B. Perkins

Peter,

Don't the 'bad robots' typically skip the robots.txt file? If they don't
read the file, how would your proposal work?

I would think the only way you could do this is to compare your robots.txt
readers to bot-like requests in your log files to find the bots that
aren't playing by the rules, then automatically blacklist them. Or is that
what you are proposing?
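
One possible reading of that comparison, sketched in Python. It assumes the web log is in Common/Combined Log Format and treats "bot-like" as nothing more than a high request count; the function name find_rule_breakers and the min_requests threshold are made up for illustration.

```python
import re
from collections import Counter

# Assumed log format: Common/Combined Log Format, e.g.
# 192.0.2.7 - - [11/Apr/2009:18:15:00 +0200] "GET /index.html HTTP/1.1" 200 512
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "\S+ (\S+)[^"]*"')


def find_rule_breakers(log_lines, min_requests=100):
    """Return IPs with bot-like request volume that never fetched robots.txt."""
    requests_per_ip = Counter()
    robots_readers = set()
    for line in log_lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that do not fit the assumed format
        ip, path = match.groups()
        requests_per_ip[ip] += 1
        if path.split("?", 1)[0].lower() == "/robots.txt":
            robots_readers.add(ip)
    # Heavy requesters that never read robots.txt are the blacklist candidates.
    return sorted(
        ip for ip, count in requests_per_ip.items()
        if count >= min_requests and ip not in robots_readers
    )
```

This only finds robots that skip robots.txt entirely. A robot that reads the file and then ignores its rules would need a further check of its requests against the disallowed paths.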

-- Brad Perkins


Re: Detecting and refusing crawlers/robots

by Peter Schumacher-2

Exactly.

1. The request comes in
2. Compare the IP address against the bot list
3. Compare the browser client against the bot list
4. If 2 or 3 matches -> goto-my-own-bad-robot-page
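
The four steps could look roughly like this in Python (not Active4D). The function name page_for_request, the BAD_ROBOT_PAGE path, and the two sets of known robot IPs and browser clients are all assumptions for illustration.

```python
BAD_ROBOT_PAGE = "/bad-robot.html"   # hypothetical path of the bad-robot page


def page_for_request(ip, user_agent, requested_page, bot_ips, bot_agents):
    """Steps 1-4: return the page to serve for one incoming request."""
    ip_matches = ip in bot_ips                 # step 2: compare IP address
    agent_matches = user_agent in bot_agents   # step 3: compare browser client
    if ip_matches or agent_matches:            # step 4: either match is enough
        return BAD_ROBOT_PAGE
    return requested_page
```

For example, with bot_ips = {"192.0.2.2"} and bot_agents = {"ExampleBot/1.0"}, a request from 192.0.2.2 gets the bad-robot page whatever browser client it reports.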

On 11.04.2009 at 19:50, Bradley D. Perkins wrote:

> Peter,
>
> Don't the 'bad robots' typically skip the robots.txt file? If they don't
> read the file, how would your proposal work?
>
> I would think the only way you could do this is compare your robots.txt
> readers to bot-like requests in your log files to find the bots that
> aren't playing by the rules then automatically blacklist them. Or is that
> what you are proposing?
>
> -- Brad Perkins
>

Kind regards
Peter Schumacher