On 2011-06-01 17.16, Christiano F. Haesbaert wrote:
> On 1 June 2011 11:01, LeviaComm Networks <NOC@...> wrote:
>> On 01-Jun-11 05:46, Benny Lofgren wrote:
>>> On 2011-05-31 14.45, Artur Grabowski wrote:
>>>> The load average is a decaying average of the number of processes in
>>>> the runnable state or currently running on a cpu or in the process of
>>>> being forked or that have spent less than a second in a sleep state
>>>> with sleep priority lower than PZERO, which includes waiting for
>>>> memory resources, disk I/O, filesystem locks and a bunch of other
>>>> things. You could say it's a very vague estimate of how much work the
>>>> cpu might need to be doing soon, maybe. Or it could be completely
>>>> wrong because of sampling bias. It's not very important so it's not
>>>> really critical for the system to do a good job guessing this number,
>>>> so the system doesn't really try too hard.
>>>> This number may tell you something useful, or it might be totally
>>>> misleading. Or both.
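(As an aside: the "decaying average" Artur describes behaves like an exponential moving average. Here is a toy sketch, purely for illustration; the kernel of course does this in fixed-point arithmetic with separate decay constants for the 1-, 5- and 15-minute figures, and the 5-second sampling interval here is just an assumption:)

```python
import math

def update_load(load, nrun, interval=5.0, period=60.0):
    # One sampling step of an exponentially decaying average: the old
    # value is damped, and the current count of runnable (or recently
    # short-sleeping) processes is blended in.
    decay = math.exp(-interval / period)
    return load * decay + nrun * (1.0 - decay)

# Five busy minutes with 4 runnable processes, then five idle minutes.
load = 0.0
for _ in range(60):
    load = update_load(load, 4)
busy_load = load            # climbs toward 4
for _ in range(60):
    load = update_load(load, 0)
idle_load = load            # decays back toward 0
```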
>>> One thing that often bites me in the butt is that cron relies on the
>>> load average to decide if it should let batch(1) jobs run or not.
>>> The default is that if cron sees a loadavg > 1.5, it keeps the batch job

>>> enqueued until it drops below that value. As I often see much, much
>>> higher loads on my systems, invariably I find myself wondering why my
>>> batch jobs never finish, just to discover that they have yet to run.
>>> So whenever I remember to, on every new system I set up I configure a
>>> different load threshold value for cron. But I tend to forget, so...
>>> I have no really good suggestion for how else cron should handle this,
>>> otherwise I would have submitted a patch ages ago...
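(For reference, the gating check cron does amounts to something like the sketch below; this uses Python's os.getloadavg() just to illustrate the logic, the real cron naturally does the equivalent in C against its configured threshold:)

```python
import os

BATCH_LOADAVG_MAX = 1.5  # cron's historical default threshold

def batch_may_run(threshold=BATCH_LOADAVG_MAX):
    # A threshold of 0 (or less) disables the check entirely, i.e.
    # batch jobs always run -- the "default to 0" idea raised below.
    if threshold <= 0:
        return True
    one_min, _five, _fifteen = os.getloadavg()
    return one_min < threshold
```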
>> I had tinkered with a solution for this:
>> Cron wakes up a minute before the batch job is scheduled to run. Cron will
>> then copy a random 4 KB sector from the hard disk to RAM, then run either an
>> MD5 or SHA hash against it. The whole process would be timed, and if it
>> completed within a reasonable amount of time for the system, it would
>> kick off a batch job.
>> This was the easiest way I thought of measuring the actual performance of
>> the system at any given time since it measures the entire system and
>> emulates actual work.
>> While this isn't really the right thing to do, I found it to be the most
>> effective on my systems.
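(That benchmark idea could be sketched roughly as follows. This is a hypothetical illustration only: it reads from /dev/urandom rather than a raw disk sector, since raw-disk access needs privileges, and the threshold is a made-up, machine-dependent number, which is exactly the tuning problem I object to below:)

```python
import hashlib
import time

def system_feels_fast(threshold_s):
    # Read 4 KB, hash it, and time the whole thing as a crude probe
    # of how loaded the machine is right now.
    with open("/dev/urandom", "rb") as f:
        block = f.read(4096)
    start = time.monotonic()
    hashlib.sha256(block).hexdigest()
    return (time.monotonic() - start) < threshold_s
```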
> You really think cron should be doing its own calculation? I don't
> like that *at all*.
> Can't we just have a higher default threshold for cron?
> Can't we default to 0?
> I think this is something that should be looked at: if we admit the load
> average is a shitty measure, we shouldn't rely on it for running cron jobs.
> I hereby vote for default to 0. (Thank god this isn't a democracy :-) )
I didn't really like Christopher's suggestion either.
For one thing, *any* kind of attempt at userland performance measurement
will over time (as hardware gets faster) become less accurate to the
point of not being usable unless tuned, and we really DON'T want to have
to tune cron (or anything else userland for that matter) for different
architectures and/or generations of systems.
Also, what kind of metric should cron measure? What if the batch job is
CPU-bound only, but will take two weeks to run and it's simply most
convenient to start it using batch(1)? Or what if a second batch job is
I/O-bound and doesn't get to run because I just started the two-week
CPU-bound job and cron only measures that?
In fact I really don't feel the load average is such a bad metric for
cron to use; it's just that the default was probably set a millennium
ago and hasn't changed since then.
Easiest is to set the default to 0.0 as you suggest, disabling the
feature altogether; more complicated, but perhaps better in this world
of multi-core systems, would be to set it to the number of cores.
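(The core-count variant would look something like this sketch, again in Python for brevity, with os.cpu_count() standing in for however cron would actually learn the core count:)

```python
import os

def batch_may_run():
    # Allow batch jobs unless the 1-minute load average exceeds the
    # number of CPUs, so a fully loaded 8-core box can still run batch
    # work that a single-core box at the same loadavg could not.
    threshold = float(os.cpu_count() or 1)
    one_min, _, _ = os.getloadavg()
    return one_min < threshold
```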
(Which also reminds me, sendmail has a similar feature using the load
average, which has also bugged me from time to time. There might be
others as well, but none come to mind right now.)