short log with dcc

by Bokhan Artem-2

Hello.

I want to find out whether there is a way (maybe a dirty one) to log to a
file or syslog the "email_address message-id checksum_type checksum" fields
of messages passed through dccm+dccd, without logging the whole body.

With the help of feedback from users (a "this is spam" button), I want to
use this log to find messages that have already been delivered to user
mailboxes and mark them with a spam flag.

If there is no standard way, could anybody point me to the best place
(perhaps variable names) where I could inject my own code? Any other help
is also appreciated!
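
As a rough illustration of how such a short log could be used (a sketch only, not something DCC provides): assuming each log line holds the four whitespace-separated fields named above, a script could index the lines by checksum and look up every mailbox that received a message reported as spam. The line layout, the sample values, and the function names `build_index` and `recipients_for` are all hypothetical.

```python
from collections import defaultdict


def build_index(lines):
    """Map (checksum_type, checksum) -> list of (email_address, message_id).

    Assumes a hypothetical log layout of four whitespace-separated fields:
    email_address message-id checksum_type checksum
    """
    index = defaultdict(list)
    for line in lines:
        fields = line.split()
        if len(fields) != 4:
            continue  # skip blank or malformed lines
        address, message_id, ck_type, ck_value = fields
        index[(ck_type, ck_value)].append((address, message_id))
    return index


def recipients_for(index, ck_type, ck_value):
    """Return the mailboxes that received a message with this checksum."""
    return [address for address, _ in index.get((ck_type, ck_value), [])]


# Made-up sample data, for illustration only.
sample_log = [
    "alice@example.com <1@example.net> Body 0123abcd",
    "bob@example.com <2@example.net> Body 0123abcd",
    "carol@example.com <3@example.net> Body 89ef4567",
]
index = build_index(sample_log)
print(recipients_for(index, "Body", "0123abcd"))
```

A line without a message-ID would have only three fields and be skipped by this sketch, so a real log would need a placeholder for missing values.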
_______________________________________________
DCC mailing list      DCC@...
http://www.rhyolite.com/mailman/listinfo/dcc

Re: short log with dcc

by Vernon Schryver

> From: Artem Bokhan <aptem@...>

> I want to find out whether there is a way (maybe a dirty one) to log to a
> file or syslog the "email_address message-id checksum_type checksum" fields
> of messages passed through dccm+dccd, without logging the whole body.
>
> With the help of feedback from users (a "this is spam" button), I want to
> use this log to find messages that have already been delivered to user
> mailboxes and mark them with a spam flag.
>
> If there is no standard way, could anybody point me to the best place
> (perhaps variable names) where I could inject my own code? Any other help
> is also appreciated!

What is the purpose of not logging the entire message body?  Are you
trying to minimize disk space used for log files or are there privacy
issues?  Building dccm with `./configure --with-max-log-size=1` would
limit log files to 1 KByte of message body.

For a "this is spam" button, I would use something like the "this is
not spam; stop greylist" button in proof-of-concept cgi scripts in the
DCC source.  That mechanism feeds checksum lines from log files to the
dccsight program.

Note that message-IDs are not a reliable key for incoming mail
messages.  Not only does plenty of spam lack message-ID headers,
but so does mail from systems using qmail.  If you use dccm+sendmail,
your users will see message-ID headers in all mail, but only
because sendmail will have added them.  Because sendmail adds the
message-ID headers after dccm sees the message, they will not be in
dccm log files.

Note also that sendmail IDs in syslog are mostly distinct from SMTP
message-IDs.


Vernon Schryver    vjs@...

Re: short log with dcc

by Bokhan Artem-2


> What is the purpose of not logging the entire message body?  Are you
> trying to minimize disk space used for log files or are there privacy
> issues?  Building dccm with `./configure --with-max-log-size=1` would
> limit log files to 1 KByte of message body.

The reason is wasted resources: the servers are quite busy with email
traffic. Writing files to disk is expensive (everything is in memory now,
with no disk I/O at all). Writing the files into memory and frequently
post-processing them with a script is an alternative, but it does not look
elegant and needs more memory.

> For a "this is spam" button, I would use something like the "this is
> not spam; stop greylist" button in proof-of-concept cgi scripts in the
> DCC source.  That mechanism feeds checksum lines from log files to the
> dccsight program.

I will look, thanks.

> Note that message-IDs are not a reliable key for incoming mail
> messages.  Not only does plenty of spam lack message-ID headers,
> but so does mail from systems using qmail.

I understand that. I did not know about qmail.

> If you use dccm+sendmail,

I use postfix+dccm, and I do not know yet whether postfix writes the
message-ID before or after the milter runs. I do not see any other
appropriate keys. I could probably create one with a milter that runs
before dccm. The DCC checksum itself could probably serve as the key.

Any advice on where to place a code hook?

Re: short log with dcc

by Vernon Schryver

> From: Bokhan Artem <APTEM@...>

> > Building dccm with `./configure --with-max-log-size=1` would
> > limit log files to 1 KByte of message body.
>
> The reason is wasted resources: the servers are quite busy with email
> traffic.

I don't think you have a local DCC server, and you have not attracted
attention by using the public DCC servers to more than 100K msgs/day.
Therefore it seems likely that your mail systems are handling
fewer than 200K messages per day.

20 years ago 200K msgs/day was a big deal.  (I'll spare you war stories
of days when computers and networks were 1000 times and more slower.)
Today 200K msgs/day is not trivial, but not worth mentioning.  I now run
spam traps that feed 30K spam/day through sendmail+dccm in about 1% of
a cheap computer.
If your mail system is quite busy with less than 200K msgs/day, it might
pay to look at your other spam filters that use lots of CPU cycles
such as DNSBLs, ClamAV, and SpamAssassin.  
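
To put those daily volumes in perspective, here is a quick back-of-the-envelope conversion to average per-second rates. These are averages only; real mail traffic is bursty, so peaks will be higher. The labels are just descriptive names chosen for this sketch.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86400


def per_second(messages_per_day):
    """Average rate in messages per second for a given daily volume."""
    return messages_per_day / SECONDS_PER_DAY


# Daily volumes mentioned above.
volumes = [
    ("spam traps", 30_000),
    ("public server attention threshold", 100_000),
    ("estimated upper bound", 200_000),
]
for label, per_day in volumes:
    print(f"{label}: {per_day} msgs/day is about {per_second(per_day):.2f} msgs/s")
```

Even the 200K figure averages out to only a little over two messages per second.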


> Writing files to disk is expensive (everything is in memory now,
> with no disk I/O at all). Writing the files into memory and frequently
> post-processing them with a script is an alternative, but it does not look
> elegant and needs more memory.

If you don't have spare resources to write a 4K Byte log file, then you
surely do not have the larger resources needed to fork(), exec(), parse,
and run a script.  Just creating the u area and the stack for the new
process for the script probably involves more than 4KBytes of I/O (of
course generally not to the disk).

It is likely that there is no difference between writing a new log file
of 100 bytes and writing a new log file of 4 KBytes, whether you
use a memory file system or classic disk.
Both will use at most one data block and the same amount of inode and
indirect I/O in a classic filesystem.  In a journaling filesystem, you
are also unlikely to be able to measure a difference between 100 bytes
and 4 KBytes.
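
The block-granularity argument can be checked with simple arithmetic. The sketch below assumes a 4096-byte block size, which is common but not universal, and ignores inode and other metadata overhead; `blocks_needed` is a name made up for this illustration.

```python
import math


def blocks_needed(size_bytes, block_size=4096):
    """Number of data blocks a file of size_bytes occupies.

    block_size=4096 is an assumed, commonly used value; real filesystems vary.
    """
    return math.ceil(size_bytes / block_size)


for size in (100, 4096):
    print(size, "bytes ->", blocks_needed(size), "block(s)")
```

Under that assumption both a 100-byte file and a 4096-byte file occupy a single data block, so the unit of I/O is the same.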

Yes, I've encountered byte copy issues, bus occupancy, cache thrashing,
and other issues.  However, they don't apply to the relatively small
amounts of data handled even by a busy mail system.


> > If you use dccm+sendmail,
>
> I use postfix+dccm, and I do not know yet whether postfix writes the
> message-ID before or after the milter runs.

How are you using postfix+dccm?  The last time I checked, I found
the postfix milter interface incompatible with the sendmail milter
interface as far as dccm is concerned.

Why not use postfix with dccifd as a before-queue filter?  That's
the recommended DCC configuration with postfix.


> Any advice on where to place a code hook?

The best thing about open source is that you can read the source and
make needed changes.  That is also the worst thing about open source.
People with much experience try to make as few changes as if the source
were secret.  One reason is that local changes break the warranty; admit
that you've changed the code and you'll find that any and all problems
you encounter are blamed on your changes.  Another reason is that
integrating local changes into the next version, the version after that,
and the version after that, and so on is no fun at all after you've
done it a few times.

Over the decades, I've accumulated a big box of tools to make it easier
to port my improvements to successive versions of other people's programs.
However, my most powerful and most often used tool today is resisting
the urge to make changes.
I predict that if you do change dccm, then in 6 months or a year from
now you or your successor will discard those changes and probably stop
using DCC.  But of course, few who have not been on the open source
merry-go-round for decades see it that way.


Vernon Schryver    vjs@...

Re: short log with dcc

by Bokhan Artem-2

Vernon Schryver wrote:
> I don't think you have a local DCC server, and you have not attracted
> attention by using the public DCC servers to more than 100K msgs/day.
> Therefore it seems likely that your mail systems are handling
> fewer than 200K messages per day.

At the moment DCC is used only for outgoing traffic, with a local DCC
server. Daily averages for incoming traffic are 10M recipients, 4.5M
connections, and 400K messages passed to mailboxes. I do not use the
global DCC servers because a commercial filter does the checksum-based
filtering job and does it well. But we have a special type of spam aimed
only at our users, which is why I started this topic.

>> Writing files to disk is expensive (everything is in memory now,
>> with no disk I/O at all). Writing the files into memory and frequently
>> post-processing them with a script is an alternative, but it does not look
>> elegant and needs more memory.
>
> If you don't have spare resources to write a 4K Byte log file, then you
> surely do not have the larger resources needed to fork(), exec(), parse,
> and run a script.

The script does a batch job, so things are not as bad as you describe.

> Just creating the u area and the stack for the new
> process for the script probably involves more than 4KBytes of I/O (of
> course generally not to the disk).
>
> It is likely that there is no difference between writing a new log file
> of 100 bytes and writing a new log file of 4 KBytes, whether you
> use a memory file system or classic disk.

Writes are buffered, so I believe the short log is about 4k/100=40 times
faster.

> How are you using postfix+dccm?  The last time I checked, I found
> the postfix milter interface incompatible with the sendmail milter
> interface as far as dccm is concerned.

I have tried a lot of different milters with current versions of postfix,
and they all work as they should. The only difference is that you should
always use extended SMTP codes for replies.

> Why not use postfix with dccifd as a before-queue filter?  That's
> the recommended DCC configuration with postfix.

A milter is a before-queue filter too. With a milter it is easier to track
connections, because all log records for a particular connection always
have the same ID (inode name). It is also easier to manage the system
because all the other filters are milters too.

> I predict that if you do change dccm, then in 6 months or a year from
> now you or your successor will discard those changes and probably stop
> using DCC.

That is not my case, sorry :)


Re: short log with dcc

by Vernon Schryver

> From: Bokhan Artem <APTEM@...>

> At the moment DCC is used only for outgoing traffic, with a local DCC
> server. Daily averages for incoming traffic are 10M recipients, 4.5M
> connections, and 400K messages passed to mailboxes. I do not use the
> global DCC servers because a commercial filter does the checksum-based
> filtering job and does it well. But we have a special type of spam aimed
> only at our users, which is why I started this topic.

The license on the free version of the DCC software clearly requires
that you share the DCC checksums you compute with the rest of the
world with these words:

 * This agreement is not applicable to any entity which sells anti-spam
 * solutions to others or provides an anti-spam solution as part of a
 * security solution sold to other entities, or to a private network
 * which employs the DCC or uses data provided by operation of the DCC
 * but does not provide corresponding data to other users.

Because you are not sharing the checksums of the spam sent by your
users, you are violating the license on the free DCC source.  Please
stop using the DCC software.


Vernon Schryver    vjs@...

Re: short log with dcc

by Bokhan Artem-2

Vernon Schryver wrote:
> Because you are not sharing the checksums of the spam sent by your
> users, you are violating the license on the free DCC source.  Please
> stop using the DCC software.

The system is in the "proof of concept" stage now. And your behavior does
not look friendly. Instead of asking me to share checksums, you are asking
me to stop using DCC. Probably this mailing list is not the place where
people try to help each other. Sorry if I caused any inconvenience.
