referrer spam detection

Hi list,

here is a very rough referrer spam detection and blocking script I wrote for antville.org. I think it may be useful for other big antville installations. It's very rough in its current state, and not at all integrated into the antville app infrastructure. It needs to be polished and should probably be integrated into the antville SysMgr.

Attached you will find the file refspam, which provides a global object containing two functions: Refspam.track(), which should be called as the first thing in HopObject.onRequest(), and Refspam.dump(), which should be called from Root.refspam_action() and outputs the current referrer blocking state and the blocked requests.

The way referrer detection and blocking works is very simple; it's described here <http://www.henso.com/log/2006.05.28/1154/>. We keep a least-recently-used Hashtable of size 128 in app.data.refspam, keyed by the host names of the referrer headers we get. As soon as we see more than 20 requests with a given referrer host, we check whether the number of IP addresses the requests came from is below a given ratio, and whether the number of referrer path names is above a certain ratio (this is to prevent valid intranet links from being classified as spam). If so, requests are redirected to the /refspam action, which displays a message and provides a link to continue to the original target. Referrer bots won't follow the redirect, so it's a good safety net.

The script also contains a hardcoded whitelist for host names, which currently contains ".antville.org" and ".google.". The parameters and the whitelist should probably be configurable through antville's management interface, and there should probably also be a configurable blacklist.

I hope this will be useful for somebody, and that somebody is going to integrate this into the antville code base.

hannes

_______________________________________________
Antville-dev mailing list
Antville-dev@...
http://helma.org/mailman/listinfo/antville-dev
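
The heuristic described in the message above can be sketched roughly as follows. This is not the attached refspam script, only a minimal illustration in plain JavaScript. The cache size (128), the request threshold (20) and the two whitelist entries come from the message; the two ratio values, the choice to measure both ratios against the request count, the whitelist matching rule and all function names are assumptions.

```javascript
// Hypothetical sketch of the detection heuristic; not the original refspam code.
const SKETCH_MAX_HOSTS = 128;      // cache size mentioned in the message
const SKETCH_MIN_REQUESTS = 20;    // checks start after more than 20 requests
const SKETCH_MAX_IP_RATIO = 0.1;   // assumed value: distinct IPs per request
const SKETCH_MIN_PATH_RATIO = 0.5; // assumed value: distinct referrer paths per request
const SKETCH_WHITELIST = [".antville.org", ".google."];

// A Map keeps insertion order, so delete + set moves a key to the "newest" end.
const sketchHosts = new Map();

function sketchIsWhitelisted(host) {
  // Prefix a dot so that "antville.org" itself also matches ".antville.org" (assumed rule).
  const dotted = "." + host;
  return SKETCH_WHITELIST.some((entry) => dotted.includes(entry));
}

// Records one request and returns true if the referrer host currently looks like spam.
function sketchTrack(referrerHost, referrerPath, ip) {
  if (!referrerHost || sketchIsWhitelisted(referrerHost)) return false;
  let entry = sketchHosts.get(referrerHost);
  if (entry) {
    sketchHosts.delete(referrerHost); // re-inserted below as most recently used
  } else {
    entry = { requests: 0, ips: new Set(), paths: new Set() };
    if (sketchHosts.size >= SKETCH_MAX_HOSTS) {
      // Evict the least recently used host (the oldest key in the Map).
      sketchHosts.delete(sketchHosts.keys().next().value);
    }
  }
  sketchHosts.set(referrerHost, entry);
  entry.requests += 1;
  entry.ips.add(ip);
  entry.paths.add(referrerPath);
  if (entry.requests <= SKETCH_MIN_REQUESTS) return false;
  const ipRatio = entry.ips.size / entry.requests;
  const pathRatio = entry.paths.size / entry.requests;
  // Few distinct clients but many distinct referrer paths: treat as spam.
  return ipRatio < SKETCH_MAX_IP_RATIO && pathRatio > SKETCH_MIN_PATH_RATIO;
}
```

With these assumed ratios, a single client sending many requests with ever-changing referrer paths is flagged from the 21st request on, while many hits from one intranet page (one path) are not.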

Re: referrer spam detection

Hannes Wallnoefer wrote:
> Hi list,

Hi,

> here is a very rough referrer spam detection and blocking script I
> wrote for antville.org. I think it may be useful for other big
> antville installations. It's very rough in its current state, and not
> at all integrated into the antville app infrastructure. It needs to be
> polished and should probably be integrated into the antville
> SysMgr.
<snip />

Thanks for that. I thought of writing such a "thing" (global spam detection) myself, but didn't have the time.

I will test-drive your script and let you know how it works.

Thanks again

cu Philipp
--
XML is the ASCII for the new millenium
(Cocoon Documentation)

Re: referrer spam detection

I just found there's an error in the original script that makes the spam redirect go into an infinite loop. I'm attaching the new script with a fix. Also, the detection check thresholds have changed a little since the last version.

Actually, I think the approach I chose is not the ideal one. For instance, one person clicking through a (large) blogroll on his/her weblog looks very much like referrer spam to this script. From what I know now, I would suggest the following approach:

* Track referrer hosts like my script currently does.
* Whenever one host is *definitely* referrer spam, automatically add it to a permanent blacklist and send the site admin a mail about it.
* Offer the site admin a list of referrer hosts sorted by requests/IP address ratio and let him/her manually add sites to the permanent blacklist.
* A request with a referrer host that is blacklisted is redirected to the refspam page, so if it's a valid request, users can still click through.

If anybody is interested in implementing this, you're welcome. I'm available for any questions you may have.

hannes

2006/5/29, Franz Philipp Moser <philipp.moser@...>:
> <snip />
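
The proposed workflow (automatic blacklisting of clear-cut cases plus a ranked list for the admin) could look roughly like this. It is only a sketch: the shape of the per-host statistics, the cutoff for "definitely spam" and all names are assumptions, and sending the notification mail is left to the caller.

```javascript
// Hypothetical sketch of the blacklist workflow; cutoffs and names are assumptions.
const RANK_DEFINITE_SPAM_RATIO = 50; // assumed: 50 or more requests per distinct IP
const RANK_MIN_REQUESTS = 20;        // assumed: ignore hosts with very few requests

function requestsPerIp(stats) {
  return stats.requests / Math.max(stats.ipCount, 1);
}

// Hosts for the admin view, highest requests-per-IP ratio first.
function rankHosts(statsByHost) {
  return Object.keys(statsByHost)
    .map((host) => ({ host, ratio: requestsPerIp(statsByHost[host]) }))
    .sort((a, b) => b.ratio - a.ratio);
}

// Adds clear-cut offenders to the blacklist (a Set) and returns the newly
// added hosts so that the caller can mail the site admin about them.
function autoBlacklist(statsByHost, blacklist) {
  const added = [];
  for (const { host, ratio } of rankHosts(statsByHost)) {
    const enoughData = statsByHost[host].requests > RANK_MIN_REQUESTS;
    if (enoughData && ratio >= RANK_DEFINITE_SPAM_RATIO && !blacklist.has(host)) {
      blacklist.add(host);
      added.push(host);
    }
  }
  return added;
}
```

Hosts below the automatic cutoff still show up in rankHosts(), which is where the admin would pick further candidates by hand.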

Re: referrer spam detection

Hi,

I am not much of a JavaScript developer, but would it be possible to also add a check against the referrer spam filter list that already exists in Antville? I have often thought that this would be a nice add-on, because I am suffering from one particular spammer that keeps "referring" links from only a few domains, usually with obvious words like 'viagra' or 'casino' in the URL. It is actually pretty easy to filter those, but they're still being tracked, which I would like to avoid as well.

On 5/29/06, Hannes Wallnoefer <hannesw@...> wrote:
> <snip />
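
A keyword check of the kind asked for here could be run before a referrer is recorded at all. The sketch below is independent of Antville's existing filter list, whose format is not shown in this thread; the pattern list and the function names are made up for illustration.

```javascript
// Hypothetical sketch; the pattern list and function names are assumptions.
const KEYWORD_PATTERNS = ["viagra", "casino"];

// True if the referrer URL contains any of the patterns (case-insensitive).
function matchesKeywordFilter(referrerUrl, patterns) {
  const url = String(referrerUrl || "").toLowerCase();
  return patterns.some((pattern) => url.includes(pattern.toLowerCase()));
}

// Only referrers that are present and pass the filter should be logged.
function shouldLogReferrer(referrerUrl) {
  return Boolean(referrerUrl) && !matchesKeywordFilter(referrerUrl, KEYWORD_PATTERNS);
}
```

Calling shouldLogReferrer() before writing the access entry would keep filtered referrers out of the tracking data as well as out of the displayed lists.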

Re: referrer spam detection

Hi,

Hannes Wallnoefer wrote:
> I just found there's an error in the original script that makes the
> spam redirect go into an infinite loop. I'm attaching the new script
> with a fix. Also, the detection check thresholds changed a little bit
> since the last version.

I started to implement some things like skins and so on, and found another problem:

*) What to do with weblogs that have their own domain? req.path doesn't work here, or am I wrong?

> Actually, I think the approach I chose is not the ideal one. For
> instance, one person clicking through a (large) blogroll on his/her
> weblog looks very much like referrer spam to this script. From what I
> know now, I would suggest the following approach:
>
> * Track referrer hosts like my script currently does
> * Whenever one host is *definitely* referrer spam, automatically add
> it to a permanent blacklist and send the site admin a mail about it
> * Offer the site admin a list of referrer hosts sorted by requests/ip
> address ratio and let him/her manually add sites to the permanent
> blacklist.
> * A request with a referrer host that is blacklisted is redirected to
> the refspam page so if it's a valid request, users can still click
> through.

your suggestions. We also should not add hosts to the cache that are already on the blacklist/whitelist; that doesn't make sense.

> If anybody is interested in implementing this you're welcome. I'm
> available for any questions you may have.

I am, and I have started on some things, as you can see in the attached file. I got this implementation working on our weblogs. logAccess may not be the right place for the check, but it worked well for me.

> hannes
<snip />

cu Philipp

Re: referrer spam detection

Hannes Wallnoefer wrote:
<snip />
> * Track referrer hosts like my script currently does

Done ;)

> * Whenever one host is *definitely* referrer spam, automatically add
> it to a permanent blacklist and send the site admin a mail about it

Done

> * Offer the site admin a list of referrer hosts sorted by requests/ip
> address ratio and let him/her manually add sites to the permanent
> blacklist.

Done

> * A request with a referrer host that is blacklisted is redirected to
> the refspam page so if it's a valid request, users can still click
> through.

Done. I added a security feature so that not every URL is accepted.

> If anybody is interested in implementing this you're welcome. I'm
> available for any questions you may have.

I hope this looks the way you want it. I also implemented a whitelist, with manual adding/removing of hosts. As I said, I added the track function to the Global/logAccess() function. The whole thing is encapsulated in an AntiSpamRefMgr mounted on root. I added support for domains, which may not be needed for antville.org.

Please take a look. I tested it on our weblogs and it worked out of the box.

> hannes
<snip />

I wonder whether app.data objects are stored (serialized) when the app is restarted? Otherwise we would lose the whitelist and blacklist.

cu Philipp
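
The message does not say how the "not every URL is accepted" check works. One plausible reading is that the "continue" link on the refspam page must only lead back into the site itself, so the page cannot be abused as an open redirect. The sketch below shows that idea under this assumption; the function name and the fallback value are hypothetical.

```javascript
// Hypothetical sketch of validating the "continue" target; not Philipp's actual check.
function safeContinueTarget(target, fallback = "/") {
  if (typeof target !== "string") return fallback;
  const isLocalPath =
    target.startsWith("/") &&            // site-local absolute path such as "/stories/3808/"
    !target.startsWith("//") &&          // "//host" would be a protocol-relative URL
    !target.includes("\\") &&            // some browsers treat "\" like "/"
    !/[\u0000-\u001f]/.test(target);     // no control characters
  return isLocalPath ? target : fallback;
}
```

Anything that is not a plain local path falls back to the front page instead of being echoed into the link.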

Re: referrer spam detection

One last question: why this redirect? We could just skip the Access entry?

cu Philipp

Franz Philipp Moser wrote:
<snip />

Re: referrer spam detection

I guess it saves the load of generating the HTML source. In particular, those spammers often send a lot of requests at once and don't even bother to read the answer from the web server, so it's really just a waste of CPU power that could be avoided.

On 5/30/06, Franz Philipp Moser <philipp.moser@...> wrote:
> One last question, why this redirect? We could just skip the Access entry?
>
> cu Philipp
>
> Franz Philipp Moser wrote:
> <snip />

Re: referrer spam detection

nighthawk wrote:
> I guess it saves the load of generating the HTML source. Esp. those
> spammers often send a lot of requests at once and don't even bother
> read the answer from the webserver - so its really just a waste of cpu
> power that could be avoided.
<snip />

Oh, sorry, yes, I missed that ;) You are right, of course. So logAccess is not the right place to redirect; it should be onRequest() instead. Where is it integrated in antville?

cu Philipp

Re: referrer spam detection

I have now added it to HopObject/onRequest(), just after res.handlers.site is filled, and it works fine. We need res.handlers.site on our blogs, but for antville.org I think you can just add "root.refspam.track()" as the first instruction in HopObject/onRequest().

It works well; thanks to hns for thinking about this and finding a quick solution.

I put the whole thing under the GPL on my blog, in case anybody needs it:

http://weblogs.brandnews.at/pm/stories/3808/

cu Philipp

nighthawk wrote:
> <snip />
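
To illustrate why the check belongs at the very start of request handling, here is a stand-alone sketch of such a hook. It does not use the real Helma objects: req and res are plain stand-ins, and the property names, the "/refspam" path, the target parameter and the isBlocked callback are all assumptions. The early return for the notice page shows one way to avoid a redirect loop; the thread does not say what caused the loop in the first script.

```javascript
// Hypothetical stand-alone sketch; req/res are plain stand-ins, not Helma's objects.
function onRequestSketch(req, res, isBlocked) {
  // Never intercept the notice page itself, otherwise the redirect could loop.
  if (req.path.startsWith("/refspam")) return "served";
  let host = "";
  try {
    host = new URL(req.referrer || "").hostname;
  } catch (e) {
    // Missing or malformed referrer: nothing to check.
  }
  if (host && isBlocked(host)) {
    // Redirect before any page is rendered, so blocked requests stay cheap.
    res.redirect("/refspam?target=" + encodeURIComponent(req.path));
    return "redirected";
  }
  return "served";
}
```

A usage example with a fake response object:

```javascript
const redirects = [];
const fakeRes = { redirect: (url) => redirects.push(url) };
const blockedHosts = (host) => host === "spam.example";
onRequestSketch({ path: "/stories/1/", referrer: "http://spam.example/x" }, fakeRes, blockedHosts);
```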

Re: referrer spam detection

Franz Philipp Moser wrote:
<snip />
> I put the whole thing under GPL on my blog if anybody needs it:
>
> http://weblogs.brandnews.at/pm/stories/3808/
<snip />

Sorry about that; I have now released it under the antville licence, so everybody can use it.

cu Philipp

Re: referrer spam detection

Hi list,

I don't know why, but the implementation is causing trouble. As I mentioned on the helma-user list:

http://helma.org/pipermail/helma-user/2006-May/006533.html

I keep getting these strange "too many open files" errors from Java, so I think something is buggy.

Another strange thing: I tried to add the blacklist/whitelist to the root object so that it would survive a restart. First of all, it doesn't get stored, and this morning the blacklist/whitelist on the root object had disappeared. They were simply null. Maybe I should use another Java object to store the lists, or simply a HopObject.

Can somebody help?

The current version can be downloaded from my weblog:

http://weblogs.brandnews.at/pm/stories/3808/

cu Philipp

<snip />

Re: referrer spam detection

For the record, here's the current and probably final (as far as I'm concerned) version of my refspam.js file. It's been working very well for over a week on antville.org, with most of the blocking being done by the blacklist and occasional attacks from new spammers being detected quite reliably.

hannes

2006/5/29, Hannes Wallnoefer <hannesw@...>:
> <snip />