Detecting and refusing crawlers/robots
A4D v3+

After reviewing my web logs, I can see that a range of IP addresses has been crawling my customer's sites.

I can think of a way to detect them and keep them from causing too much activity: keep arrays of IP addresses, request counters, and time of day, and when SamePageRequests > HourlyCounterLimit (maybe 45), refuse further responses to that IP address for 1 hour.

Is there a better way to solve this problem besides using the HTML and robots.txt?

<META NAME="ROBOTS" CONTENT="NOINDEX">
<META NAME="ROBOTS" CONTENT="NOFOLLOW">

TIA!
David

_______________________________________________
Active4D-dev mailing list
Active4D-dev@...
http://mailman.aparajitaworld.com/mailman/listinfo/active4d-dev
Archives: http://mailman.aparajitaworld.com/archive/active4d-dev/

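The counter idea in the post above could be sketched roughly as follows. This is Python rather than Active4D, purely for illustration; the class name `CrawlerThrottle`, the rolling one-hour window, and the choice to block the whole IP once any single page goes over the limit are assumptions, not something the post specifies.

```python
import time
from collections import defaultdict

HOURLY_COUNTER_LIMIT = 45   # "maybe 45" from the post
WINDOW_SECONDS = 3600       # assumption: requests are counted over a rolling hour
BLOCK_SECONDS = 3600        # refuse further responses for 1 hour


class CrawlerThrottle:
    """Hypothetical helper: counts same-page requests per IP address."""

    def __init__(self, limit=HOURLY_COUNTER_LIMIT,
                 window=WINDOW_SECONDS, block=BLOCK_SECONDS):
        self.limit = limit
        self.window = window
        self.block = block
        self.hits = defaultdict(list)   # (ip, page) -> request timestamps
        self.blocked_until = {}         # ip -> time at which the block ends

    def allow(self, ip, page, now=None):
        """Return True if the request should be served, False if refused."""
        now = time.time() if now is None else now
        if self.blocked_until.get(ip, 0) > now:
            return False                # still inside the block period
        key = (ip, page)
        # Keep only the timestamps that fall inside the counting window.
        recent = [t for t in self.hits[key] if now - t < self.window]
        recent.append(now)
        self.hits[key] = recent
        if len(recent) > self.limit:    # SamePageRequests > HourlyCounterLimit
            self.blocked_until[ip] = now + self.block
            self.hits[key] = []
            return False
        return True
```

A request handler would call `allow(ip, page)` first and send a refusal when it returns False. The dictionaries grow with every new IP address, so a real system would also need to purge old entries.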
Re: Detecting and refusing crawlers/robots

David,

that's for the good ones. Check who's reading the robots.txt file and put them automatically on your "Is_a_bot-list". Save the IP address _and_ the browser client. For my own system I generated a list of possible robots this way, and I just send them a "bad-robot-page" from the OWA.

On 11.04.2009 at 18:15, David Ringsmuth wrote:
> Is there a better way to solve this problem beside using the HTML and
> robots.txt?
>
> <META NAME="ROBOTS" CONTENT="NOINDEX">
>
> <META NAME="ROBOTS" CONTENT="NOFOLLOW">

Kind regards
[4D-Consulting.com]eK, Wiesbaden
Peter Schumacher
--------------------------------------------------------
Web: http://www.4D-Consulting.com/
FreeCall: 0800-434 636 7
Tel.: 0611-9406.850 - Fax: 0611-9406.744
iChat/Skype: PeterInWiesbaden
4D-Consulting.com eK - Scharnhorststr. 36 - 65195 Wiesbaden
HR Wiesbaden: HRA 4867
Member of the developer network http://www.die4dwerkstatt.de

Re: Detecting and refusing crawlers/robots

Peter,

Don't the 'bad robots' typically skip the robots.txt file? If they don't read the file, how would your proposal work?

I would think the only way you could do this is to compare your robots.txt readers to bot-like requests in your log files, find the bots that aren't playing by the rules, and then automatically blacklist them. Or is that what you are proposing?

-- Brad Perkins

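The comparison Brad describes might look something like this. It is a Python sketch under stated assumptions: the log has already been parsed into `(ip, path)` pairs, "bot-like" is reduced to a plain request count, and the function name `find_rule_breakers` and the `min_requests` threshold are made up for illustration.

```python
from collections import Counter


def find_rule_breakers(requests, min_requests=100):
    """Return IPs with bot-like volume that never fetched robots.txt.

    requests: iterable of (ip, path) pairs taken from the web log.
    min_requests: assumed threshold for what counts as bot-like.
    """
    counts = Counter()
    robots_readers = set()
    for ip, path in requests:
        counts[ip] += 1
        # Ignore any query string when checking for robots.txt.
        if path.lower().split("?", 1)[0] == "/robots.txt":
            robots_readers.add(ip)
    return sorted(ip for ip, n in counts.items()
                  if n >= min_requests and ip not in robots_readers)
```

The returned addresses are only candidates for a blacklist. A busy proxy or office network can produce the same pattern, so a manual check before blocking is probably wise.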
Re: Detecting and refusing crawlers/robots

Exactly.

1. The request comes in
2. Compare IP address
3. Compare browser client
4. If 2 or 3 matches -> goto-my-own-bad-robot-page

On 11.04.2009 at 19:50, Bradley D. Perkins wrote:
> Peter,
>
> Don't the 'bad robots' typically skip the robots.txt file? If they don't
> read the file, how would your proposal work?
>
> I would think the only way you could do this is compare your robots.txt
> readers to bot-like requests in your log files to find the bots that
> aren't playing by the rules then automatically blacklist them. Or is that
> what you are proposing?
>
> -- Brad Perkins

Kind regards
Peter Schumacher

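The four steps above, together with the robots.txt bookkeeping from the earlier reply, can be sketched as follows. This is Python rather than Active4D, and `BotList`, `handle_request`, and the returned page names are hypothetical stand-ins for whatever the real request handler does.

```python
class BotList:
    """Hypothetical "Is_a_bot-list": stores IP addresses and browser clients."""

    def __init__(self):
        self.ips = set()
        self.user_agents = set()

    def record_robots_txt_reader(self, ip, user_agent):
        # Anyone who reads robots.txt is put on the list automatically.
        self.ips.add(ip)
        if user_agent:
            self.user_agents.add(user_agent)

    def is_bot(self, ip, user_agent):
        # Steps 2 and 3: a match on either value is enough.
        return ip in self.ips or user_agent in self.user_agents


def handle_request(bots, ip, user_agent, path):
    """Step 1: the request comes in. Returns the name of the page to send."""
    if path == "/robots.txt":
        bots.record_robots_txt_reader(ip, user_agent)
        return "robots.txt"
    if bots.is_bot(ip, user_agent):
        return "bad-robot-page"     # step 4
    return "normal-page"
```

Matching on the browser client alone catches a bot that moves to a new IP address, but it also blocks any real visitor who sends the same user agent string, so exact-match lists like this need occasional review.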