Hey,
The question of which make parallelism level to use comes up repeatedly. However, the answer is usually driven by anecdotal evidence rather than empirical data. To this end, I ran a small benchmark to add one data point. I have no idea about confidence intervals, so somebody will have to chime in here.
Experimental setup
==================
Machine: Dell Precision T3400
CPU: Intel(R) Core(TM)2 Quad CPU Q9550 @ 2.83GHz (2826.24-MHz 686-class CPU)
Memory: avail memory = 2063409152 (2015048K bytes)
HDD: da0: <SATA Hitachi HDP72505 GM4O> Fixed Direct Access SCSI-4 device (via AHCI)
filesystem: HAMMER v2
/usr/src: v2.5.1-77-gd894b0e
/usr/obj: flags nohistory, nullfs mount
executed command: make -j $j_level buildworld buildkernel
-j levels used: 1-10
repetitions: 5
There were no other tasks performed during the tests, although Xorg, windowmaker, terminals, xmms, firefox and thunderbird were running (idling). Standard background jobs were not disabled.
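For anyone who wants to repeat this, the runs can be driven by a loop roughly like the one below. This is only a sketch, not the script I used: the function name run_bench, the BUILD_CMD override and the log format "jlevel rep seconds" are made up for illustration, it records wall-clock time only, and it assumes a date(1) that supports +%s (the BSD and GNU ones do).

```sh
#!/bin/sh
# Sketch of a benchmark driver (hypothetical; not the original script).

# run_bench MAXJ REPS LOGFILE
# Runs the build REPS times for every -j level from 1 to MAXJ and appends
# one line "jlevel rep elapsed_seconds" per run to LOGFILE.
# BUILD_CMD is an assumed override so the loop can be dry-run without make.
run_bench() {
    maxj=$1
    reps=$2
    log=$3
    j=1
    while [ "$j" -le "$maxj" ]; do
        rep=1
        while [ "$rep" -le "$reps" ]; do
            start=$(date +%s)
            # Abort on a failed build: its timing would be meaningless.
            ${BUILD_CMD:-make} -j "$j" buildworld buildkernel || return 1
            end=$(date +%s)
            echo "$j $rep $((end - start))" >> "$log"
            rep=$((rep + 1))
        done
        j=$((j + 1))
    done
}
```

With the setup above this would be called from /usr/src as `run_bench 10 5 /tmp/jbench.log` (the log path is just an example).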
Discussion
==========
The plot shows the median build time as a line; the error bars show the min/max build times. The max spike at -j4 is probably due to that run coinciding with the 3am hammer cleanup.
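The median and min/max per -j level can be computed with sort and awk, roughly as sketched here. It assumes a log with one line per run in the form "jlevel rep seconds"; that format and the function name summarize are hypothetical, chosen only for this example.

```sh
#!/bin/sh
# summarize LOGFILE
# Reads lines "jlevel rep seconds" (assumed format) and prints one line
# "jlevel min median max" per -j level.
summarize() {
    # Sort by -j level, then by time, so each level's times arrive in order.
    sort -k1,1n -k3,3n "$1" | awk '
        function emit() {
            if (n == 0) return
            if (n % 2) med = t[(n + 1) / 2]              # odd count: middle value
            else med = (t[n / 2] + t[n / 2 + 1]) / 2     # even count: mean of middle two
            print cur, t[1], med, t[n]
        }
        $1 != cur { emit(); cur = $1; n = 0 }            # new -j level starts
        { t[++n] = $3 }
        END { emit() }
    '
}
```

With 5 repetitions per level the median is simply the third-fastest run, so a single outlier shows up in the max error bar but does not move the line.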
We can see a monotonic drop in total run time from -j1 to -j5. After that, the run time plateaus. User and sys times increase over the same range and also plateau beyond -j5. This shows that increased parallelism in make adds slightly to the total CPU overhead (sys+user), while total run time is significantly reduced. Beyond -j ncpu+1 we cannot see any improvement in run time.
A -j 2 build does not offer significant benefit over -j 1, which is not intuitive and might need some further investigation.
The -j 5 build achieves a 42% reduction in build time relative to the -j 1 baseline.
Compared to the -j 4 (i.e. -j ncpu) build, the -j 5 (i.e. -j ncpu+1) build reduces run time by an additional 5.4%. This shows that not all CPU cores can be kept busy if there is only a parallelism level of ncpu.
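For reference, the percentages here are plain relative reductions, (base - new) / base * 100. A small helper along these lines does the arithmetic; the function name reduction is hypothetical, and the numbers in the usage line are placeholders, not the measured build times.

```sh
#!/bin/sh
# reduction BASE NEW
# Prints the percent reduction in run time when going from BASE to NEW,
# rounded to one decimal place.
reduction() {
    awk -v base="$1" -v new="$2" \
        'BEGIN { printf "%.1f\n", (base - new) / base * 100 }'
}

# Placeholder values only: a build going from 200s to 150s.
# reduction 200 150    # prints 25.0
```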
Conclusion
==========
I advise running builds at -j ncpu+1 on 4-CPU systems. Until we have numbers for 2-CPU and UP systems, we cannot give conclusive advice; however, I would try -j3 for those two cases.
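If you want to script this, something like the following picks the -j level from the CPU count. Treat it as a sketch: the function names are made up, only the 4-CPU case is backed by the measurements above, -j3 for 1 or 2 CPUs is my untested guess, and ncpu+1 for any other CPU count is an extrapolation.

```sh
#!/bin/sh
# detect_ncpu
# Prints the number of CPUs: hw.ncpu on the BSDs, getconf as a fallback
# on other systems, 1 if neither works.
detect_ncpu() {
    sysctl -n hw.ncpu 2>/dev/null ||
        getconf _NPROCESSORS_ONLN 2>/dev/null ||
        echo 1
}

# jobs_for NCPU
# Prints the suggested make -j level for NCPU CPUs.
jobs_for() {
    if [ "$1" -le 2 ]; then
        echo 3                # untested guess for UP and 2-CPU systems
    else
        echo $(( $1 + 1 ))    # measured for 4 CPUs only; extrapolated otherwise
    fi
}
```

Usage would then be `make -j "$(jobs_for "$(detect_ncpu)")" buildworld buildkernel`.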
cheers
simon