The "clean out spam from archives" effort is lagging

As one can see on http://wiki.debian.org/DebianInstaller/SpamClean, this effort, initiated by Frans back in April, is lagging.

The last 3 months of debian-boot archives have been reviewed by only 3 people (Frans, Giacomo Catenazzi and me) and thus still need at least two more reviewers, so that spam gets nominated and can later be processed in the second, cleaning step.

Old archives are also missing reviews, particularly a few from 2005 and nearly all from 2004, not to mention older archives.

Please take some time to do this work. It is not that time consuming: one month can be reviewed in about 10-15 minutes, or even less once you're used to the methods for spotting spam.
|
|
Re: The "clean out spam from archives" effort is lagging

On Sun, Nov 1, 2009 at 10:02 AM, Christian Perrier <bubulle@...> wrote:
> As one can see on http://wiki.debian.org/DebianInstaller/SpamClean,
> this effort initiated by Frans back in April is lagging.
>
> Last 3 months of debian-boot archives have been reviewed by 3 persons
> only (Frans, Giacomo Catenazzi and me) and are thus missing at least
> two more people to review them so that spams are nominated...and can
> later be processed in the cleaning second step.

I did the most recent three months of 2009, but the spam density was pretty low.

> Old archives are also missing reviews, particularly a few from 2005
> and nearly all from 2004, not to mention older archives.

So I started at the beginning (part of 1998) and went to the end of 2002. If I have time this week I will look at 2003-2005.

> Please take some time to do this work. This is not that time
> consuming: one month can be reviewed in about 10-15 minutes....even
> less when you're used to methods for spotting spams.

The work is pretty tedious, and reviewing non-spam emails five times is extremely inefficient. Consider a solution that would allow one person to scan the archive to generate a list of spam targets. If the other four reviewers only had to review the listed spam candidates, they would not have to waste their time reviewing non-spam.

-- 
Lee

-- 
To UNSUBSCRIBE, email to debian-boot-REQUEST@... with a subject of "unsubscribe". Trouble? Contact listmaster@...
|
|
Re: The "clean out spam from archives" effort is lagging

Quoting Lee Winter (lee.j.i.winter@...):
> I did the most recent three months of 2009, but the density was pretty low.

I haven't checked the wiki and I'm not online right now, but please take care to register this in the page.

> > Old archives are also missing reviews, particularly a few from 2005
> > and nearly all from 2004, not to mention older archives.
>
> So I started at the beginning (part of 1998) and went to the end of
> 2002. If I have time this week I will look at 2003-2005.

Ditto.

> > Please take some time to do this work. This is not that time
> > consuming: one month can be reviewed in about 10-15 minutes....even
> > less when you're used to methods for spotting spams.
>
> The work is pretty tedious and reviewing non-spam emails five times is
> extremely inefficient. Consider a solution that would allow one
> person to scan the archive to generate a list of spam targets. If the
> other four reviewers only had to review the listed spam candidates
> they would not have to waste their time reviewing non-spam.

I'm sure the listmasters would welcome such improvements but, well, we already have a very good tool.

Also, restricting the list to what the first person has identified would increase the risk of missing some spam.

When I worked on the entire archive, I finally dropped the web interface and used an alternative method:

- download the list archives as mailboxes
- pass them through my CRM114 spam filter
- open them in my MUA (mutt)
- tag spam messages (since the mailboxes have been through CRM114, most spam is already identified by CRM114 markers)
- bounce them to the spam report mail address (report-listspam@...) with the following key macro:

macro index \eL "breport-listspam@...\no\nq" "report as spam to Debian lists"

I found this much more efficient.

Downloading list archives as mailboxes is only possible for Debian developers, but I can provide them to people who might need them.
|
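For readers who want to try the mailbox-based workflow described in this thread, here is a minimal sketch of the "list what the filter already marked" step. It is not the script anyone in the thread used. It assumes the filter stamps each message with a header such as `X-CRM114-Status: SPAM ( pR: ... )`; the header name, the `SPAM` prefix and the function names are assumptions to adapt to your own setup. A human still has to look at every candidate before bouncing it to the report address.

```python
import mailbox
from email.message import Message
from typing import List, Tuple

# Assumption: the spam filter adds a status header like
# "X-CRM114-Status: SPAM ( pR: -12.3 )". Change both constants if yours differs.
STATUS_HEADER = "X-CRM114-Status"
SPAM_PREFIX = "SPAM"


def is_flagged(msg: Message) -> bool:
    """True if the filter's status header marks this message as spam."""
    status = str(msg.get(STATUS_HEADER, ""))
    return status.strip().upper().startswith(SPAM_PREFIX)


def spam_candidates(mbox_path: str) -> List[Tuple[str, str]]:
    """Return (Message-ID, Subject) for every flagged message in an mbox file."""
    box = mailbox.mbox(mbox_path)
    try:
        return [
            (str(msg.get("Message-ID", "")), str(msg.get("Subject", "")))
            for msg in box
            if is_flagged(msg)
        ]
    finally:
        box.close()
```

Calling `spam_candidates("debian-boot-2004-01.mbox")` (a hypothetical file name) gives a short list to check by hand in the MUA, which keeps the manual review the thread insists on while skipping the obvious non-spam.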
|
Re: The "clean out spam from archives" effort is lagging

On Mon, Nov 2, 2009 at 1:01 AM, Christian Perrier <bubulle@...> wrote:
> Quoting Lee Winter (lee.j.i.winter@...):
>
>> I did the most recent three months of 2009, but the density was pretty low.
>
> I haven't checked the wiki and I'm not online right now, but please
> take care to register this in the page.

I am a little hesitant to edit the page because I don't understand the process and found no doc or howto.

>> > Old archives are also missing reviews, particularly a few from 2005
>> > and nearly all from 2004, not to mention older archives.
>>
>> So I started at the beginning (part of 1998) and went to the end of
>> 2002. If I have time this week I will look at 2003-2005.
>
> Ditto.
>
>> > Please take some time to do this work. This is not that time
>> > consuming: one month can be reviewed in about 10-15 minutes....even
>> > less when you're used to methods for spotting spams.
>>
>> The work is pretty tedious and reviewing non-spam emails five times is
>> extremely inefficient. Consider a solution that would allow one
>> person to scan the archive to generate a list of spam targets. If the
>> other four reviewers only had to review the listed spam candidates
>> they would not have to waste their time reviewing non-spam.
>
> I'm sure the listmasters would welcome such improvements but, well, we
> already have a very good tool.
>
> Also, restricting the list to what the first person has identified
> would increase the risk of missing some spams.
>
> When I worked on the entire archive, I finally dropped the web
> interface and used an alternative method:
>
> - download the list archives as mailboxes
> - pass them through my CRM114 spam filter
> - open them in my MUA (mutt)
> - tag spam messages (being processed by CRM114, most spams are already
>   identified by CRM114 markers)
> - bounce them to the spam report mail address
>   (report-listspam@...) with the following key macro:
>
> macro index \eL "breport-listspam@...\no\nq" "report as spam to Debian lists"
>
> I found this much more efficient.

Sounds like the beginning/foundation of an automation script. If the candidates can be found mechanically, then there is a potential tradeoff available. We have 11 years = 132 months; times 5 reviewers = 660 reviewer-months. At 10-15 min each, that is 110-165 man-hours. That's a lot of manual effort.

Just how important are the last few messages that would make it through a (purposefully loose) mechanical filter? If the whole mess could be 98% cleaned up with, say, 5 man-hours, then it would be a tremendous efficiency improvement.

> Downloading list archives as mailboxes is only accessible to Debian
> developers but I can provide them to people who might need them.

In the '80s I spent a lot of time doing natural language processing software, so I may be more tuned up than the typical reviewer. But I find it more efficient to review the author/subject/thread indices and inspect message content only to confirm the presence of spam in a suspect message. So offline access to the archive would not help me.

-- 
Lee
|
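The effort estimate in the message above is easy to re-check. The sketch below only reproduces the figures given in the thread (11 years of archives, 5 reviewers per month, 10-15 minutes per monthly review); the function name and the returned dictionary keys are illustrative, not part of any existing tool.

```python
def review_effort(years: int, reviewers: int, minutes_low: int, minutes_high: int) -> dict:
    """Estimate the total manual review effort for a list archive."""
    months = years * 12                    # one archive per calendar month
    reviewer_months = months * reviewers   # every month is read by each reviewer
    return {
        "months": months,
        "reviewer_months": reviewer_months,
        "hours_low": reviewer_months * minutes_low / 60,
        "hours_high": reviewer_months * minutes_high / 60,
    }


if __name__ == "__main__":
    print(review_effort(years=11, reviewers=5, minutes_low=10, minutes_high=15))
```

With the thread's numbers this gives 132 months, 660 reviewer-months, and between 110 and 165 hours, which matches the figures quoted. Part of that work had already been done when the estimate was posted, so the remaining effort was smaller.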
|
Re: The "clean out spam from archives" effort is lagging

Quoting Lee Winter (lee.j.i.winter@...):
> > I haven't checked the wiki and I'm not online right now, but please
> > take care to register this in the page.
>
> I am a little hesitant to edit the page because I don't understand the
> process and found no doc or howto.

Well, that's a wiki, so basically enter edit mode and make the required changes.

> > macro index \eL "breport-listspam@...\no\nq" "report as spam to Debian lists"
> >
> > I found this much more efficient.
>
> Sounds like the beginning/foundation of an automation script. If the
> candidates can be found mechanically, then there is a potential
> tradeoff available. We have 11 years = 132 months; times 5 reviewers
> = 660 reviewer-months. At 10-15 min each that is 110-165 man-hours.
> That's a lot of manual effort.

We have done a big part of the effort already. The main point is that automated recognition is not reliable enough and manual review is still needed...

> Just how important are the last few messages that would make it
> through a (purposefully loose) mechanical filter? If the whole mess
> could be 98% cleaned up with say, 5 man-hours then it would be a
> tremendous efficiency improvement.

If someone is considering investing some time in this, maybe. However, I'm not sure we'll find such a volunteer.

Please also note that processing the current traffic that flows through the list is even easier: if a few people just commit themselves to bouncing every spam they find in debian-boot to the reporting address while they read the list, then the incoming traffic is processed on the fly.

For instance, when I registered that I "processed" October 2009, I actually just recorded that, during the entire month, I bounced every incoming spam mail on the list to the spam reporting address.
|
|
Re: The "clean out spam from archives" effort is lagging

On Mon, Nov 2, 2009 at 1:01 AM, Christian Perrier <bubulle@...> wrote:
> I haven't checked the wiki and I'm not online right now, but please
> take care to register this in the page.

Done.
|
|
Re: The "clean out spam from archives" effort is lagging

Hi,

Christian Perrier <bubulle@...> wrote:
> As one can see on http://wiki.debian.org/DebianInstaller/SpamClean,
> this effort initiated by Frans back in April is lagging.
>
> Last 3 months of debian-boot archives have been reviewed by 3 persons
> only (Frans, Giacomo Catenazzi and me) and are thus missing at least
> two more people to review them so that spams are nominated...and can
> later be processed in the cleaning second step.

October has now been reviewed by 5 people. I will work on the other targets soon.

Lee: when you change the wiki page to add your name to a month, please remember to increase the number of reviewers for that month, too (that's the number in the second column). I have already done that for the entries you have made so far.

-- 
Created with Sylpheed 2.5.0 under DEBIAN GNU/LINUX 5.0.0 - Lenny
Registered LinuxUser #311290 - http://counter.li.org/
|
|
The "clean out spam from archives" effort is *no longer* lagging

Quoting Christian Perrier (bubulle@...):
> As one can see on http://wiki.debian.org/DebianInstaller/SpamClean,
> this effort initiated by Frans back in April is lagging.

Wow. After this mail and one week of work, I found 447 proposed spam messages to review this morning, after the weekly script run (this script collects signalled spam and adds every message that has been signalled at least 5 times to a list of spam to review).

So, now, the DDs among us have to review those messages and confirm that they're spam (I confirmed *all* of them!). At least 3 people need to do this for the messages to really be removed at the next run of the weekly script.

I guess that Frans will do this review, so only one more DD is needed for us to have more than 400 spam messages removed from the archive next Sunday.

Congratulations to everyone who worked on this. Keep up the good work!
|
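The two-stage rule described above (a message is proposed for review once it has been signalled at least 5 times, and removed once at least 3 DDs have confirmed it) can be sketched as follows. This is only an illustration of the thresholds mentioned in the thread, not the listmasters' actual weekly script: the data shapes and function names are hypothetical, and counting each reporter once per message is an assumption.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple

NOMINATION_THRESHOLD = 5    # reports needed before a message is proposed for review
CONFIRMATION_THRESHOLD = 3  # DD confirmations needed before the message is removed


def nominated(reports: Iterable[Tuple[str, str]]) -> List[str]:
    """Message IDs signalled by enough reporters.

    `reports` holds (reporter, message_id) pairs. Assumption: a reporter
    counts only once per message, so duplicate pairs are dropped first.
    """
    counts = Counter(message_id for _reporter, message_id in set(reports))
    return sorted(mid for mid, n in counts.items() if n >= NOMINATION_THRESHOLD)


def removable(candidates: Iterable[str], confirmations: Dict[str, Set[str]]) -> List[str]:
    """Nominated message IDs that enough DDs have confirmed as spam."""
    return [
        mid
        for mid in candidates
        if len(confirmations.get(mid, set())) >= CONFIRMATION_THRESHOLD
    ]
```

Under this reading, a nominated message that only two DDs have confirmed stays in the archive until a third confirmation arrives before a later weekly run.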
|
The "clean out spam from archives" effort is *no longer* lagging (UPDATE)

Quoting Christian Perrier (bubulle@...):
> I guess that Frans will do such review so it needs only another DD to
> do it so that we have more than 400 spams removed from the archive
> next Sunday.

And that was apparently done. This Sunday, 364 more spam mails were removed, so the total number of removed posts is now 3986. See the statistics at the bottom of http://wiki.debian.org/DebianInstaller/SpamClean

208 more spam "nominations" were collected and are proposed to reviewers this week (I already did my review).

Thanks again to those who resumed that work. We're now not that far from being able to say "we reviewed the entire archive of debian-boot and had NNNN spams removed".

Kudos!
|
|
"Clean out spam from archives": November 22nd update

This week, some more reporting work happened, though a little less actively than last week.

As a consequence, when I ran my review step today, I "only" had 7 more nominated posts to review. I suspect this is because a few months listed in http://wiki.debian.org/DebianInstaller/SpamClean are still missing one or two people to process them.

The bump that happened when Lee entered the game is slowing down, as we now need people *other than him* to review the list archives (particularly the old months).

Frans, maybe consider looking at the 2001 archives? You've been very busy coding these last weeks, so I'm reluctant to distract you with this... Franklin, Giacomo, maybe? Holger Wansing can't do more, as he has processed everything or nearly everything...

Concerning the final removal step, this week saw a great bump, which is, as expected, the result of the effort during the week before. 253 spam mails were thus removed from the archive.

It means that the "review by a DD" step seems to be working nearly nominally. I'm doing such reviews. I suspect that Frans is, also. And, apparently, a third DD is doing DD reviews as well (Bastian?). Still, there are about 60 mails that I reviewed the week before that weren't removed. So I think that one of the other 2 DDs didn't review everything (s)he had to review. No shame in this, of course..:-)

Again, please continue the good work. We're close to announcing that we cleaned out the list archive entirely (or as entirely as we can with semi-manual methods).