HPL low performance result.

by martin cech

Dear Everyone,
I used PelicanHPC to build my own cluster. I had 5 nodes including the front node. They are all identical PCs: Intel Pentium Core 2 Duo 1.8 GHz with 1024 MB RAM, connected through a 100 Mbps 5-port switch. I ran HPL on the first node with the default settings in HPL.dat, and the result was 2.476 Gflops (7.2 s). Then I added nodes one at a time: two nodes gave 2.34 Gflops (7.71 s), three gave 1.893 Gflops (9.3 s), four gave 1.88 Gflops (9.56 s), and finally the whole 5-node cluster gave 1.893 Gflops (9.52 s). The performance decreases as more nodes are added :( Do you have any solutions for this?



Best Regards Martin Cech.
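
One way to quantify the slowdown described above is to compute speedup and parallel efficiency relative to the single-node run. The following is a minimal sketch using only the Gflops figures quoted in the post; the function name and table layout are illustrative assumptions, not part of HPL or PelicanHPC.

```python
# Gflops reported in the post above, keyed by the number of nodes used.
measured_gflops = {1: 2.476, 2: 2.34, 3: 1.893, 4: 1.88, 5: 1.893}


def scaling_table(gflops_by_nodes):
    """Return (nodes, speedup, efficiency) tuples relative to the 1-node run."""
    base = gflops_by_nodes[1]
    rows = []
    for nodes in sorted(gflops_by_nodes):
        speedup = gflops_by_nodes[nodes] / base   # >1 means faster than one node
        efficiency = speedup / nodes              # 1.0 would be perfect scaling
        rows.append((nodes, speedup, efficiency))
    return rows


if __name__ == "__main__":
    for nodes, speedup, efficiency in scaling_table(measured_gflops):
        print(f"{nodes} node(s): speedup {speedup:.2f}, efficiency {efficiency:.0%}")
```

With these figures every multi-node speedup is below 1; the 5-node run works out to a speedup of about 0.76, or roughly 15% efficiency.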

Re: HPL low performance result.

by Michael Creel

The benchmark requires tuning to get good numbers. I really don't recall what the default tuning is on the released versions, and I don't make any effort to ensure that the results will be good. Please see the forum post http://www.nabble.com/How-to-get-big-numbers-with-the-HPL-benchmark-td19685268.html for more information.

If you or anyone else comes up with a good tuning, I'd be happy to make it the default.

Cheers, M.
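
As a starting point for tuning HPL.dat, a commonly quoted rule of thumb is to choose the problem size N so that the N x N double-precision matrix fills roughly 80% of the cluster's total memory, and to choose a process grid P x Q that is close to square with P <= Q. The sketch below applies that rule. The 80% fraction, the block size NB = 128, and both function names are assumptions for illustration; they are not values from this thread or from Pelican's defaults.

```python
import math


def hpl_problem_size(nodes, mem_mib_per_node, nb=128, mem_fraction=0.8):
    """Largest multiple of nb whose N x N matrix of doubles fits in
    mem_fraction of the total RAM (rule of thumb, assumed values)."""
    total_bytes = nodes * mem_mib_per_node * 1024 * 1024
    max_elements = int(total_bytes * mem_fraction) // 8  # 8 bytes per double
    n = math.isqrt(max_elements)
    return (n // nb) * nb


def process_grid(processes):
    """Most nearly square factorisation P x Q of the process count, P <= Q."""
    p = math.isqrt(processes)
    while processes % p != 0:
        p -= 1
    return p, processes // p


if __name__ == "__main__":
    n = hpl_problem_size(nodes=5, mem_mib_per_node=1024)
    p, q = process_grid(5 * 2)  # two cores per node
    print(f"N = {n}, NB = 128, P x Q = {p} x {q}")
```

For the 5-node cluster described in the first post (1024 MB per node, two cores each) this gives N = 23168 and a 2 x 5 grid. If the nodes also hold the operating system image in RAM, a smaller memory fraction may be needed.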

Re: HPL low performance result.

by martin cech

Thank you for the answer, it's better now. I will try tuning HPL.dat and then send you some results. I get 12 Gflops on 8 machines.

One other question: is it possible to use Octave MPITB for benchmarking and run it on a cluster (multiple machines)?

Re: HPL low performance result.

by Michael Creel

Sure, you can use MPITB for benchmarking. I have a few academic papers that do just this, see below. The MPITB site has additional references. On Pelican, after setting up the cluster, if you open a terminal, enter octave, and type "parallel_performance", you'll get some results that could be used to make a simple benchmark.

I strongly encourage you to see the MPITB page for a broader perspective - my own work is biased towards certain types of models and is certainly not representative of the general nature of applications of MPITB for Octave. I only cite the papers as examples that can give clues about how things can be done.

M. Creel (2008). Using Parallelization to Solve a Macroeconomic Model: A Parallel Parameterized Expectations Algorithm. Computational Economics, 32(4), 343-352.

M. Creel (2007). I ran four million probits last night: HPC clustering with ParallelKnoppix. Journal of Applied Econometrics, 22(1), 215-223.

M. Creel (2005). User-Friendly Parallel Computations with Econometric Examples. Computational Economics, 26(2), 107-128.

Re: HPL low performance result.

by martin cech

Dear Michael,
I ran parallel_performance in Octave on a single computer and then on a 3-node cluster, but the result was the same (about 44 s on 1 node and 22 s on 2 nodes). First only one core was fully used, then both cores were computing, but no other cores in the cluster were used. Are there any important parameters I need to set to get better results?

Re: HPL low performance result.

by Michael Creel

While in octave, type "edit parallel_performance". At the bottom of the file you'll see

# loop over several cluster sizes
printf("Sample size: %d burnin: %d  maxiters %d\n", T, burnin, maxiters);
for nodes = 0:1
        pea_args{6} = nodes;
        pea(model, model_params, exp_model, exp_params, pea_args);
endfor



Just edit the line "for nodes = 0:1" to increase the maximum number of nodes.



Re: HPL low performance result.

by martin cech

Dear Michael,
here are some results of my measurements. Can you look at them and tell me why there is such a big difference in total performance between 32-bit and 64-bit, especially with a low number of PCs (about 60%)?

Thanks Martin

Re: HPL low performance result.

by martin cech

The results once again: HERE

Re: HPL low performance result.

by Michael Creel

Hi Martin,
Interesting results. I'm especially glad to see the results for PEA using MPITB: that's a real-world problem, and seeing a good speedup there using MPITB and GNU Octave is something that I believe will interest people.

Why is 64-bit faster than 32-bit? I don't worry about that; I just use 64-bit Linux for all my work. I'm sure that an explanation is somewhere out there on the Internet. With modern CPUs, almost everyone should be using the 64-bit version. I don't understand why the 32-bit version gets downloaded more than the 64-bit version. Either people don't realize that they can use the 64-bit version, or there are a lot of people clustering old computers (which is a waste of money, except for possible educational benefits).

Are you going to publish this work somewhere? I'm sure that the developer of MPITB would like to know about your results.

Re: HPL low performance result.

by martin cech

Hi Michael, I am writing a diploma thesis focused on computer clustering. I will present these results there. If they are useful for the MPITB developers, of course you can send them these results.

Do you know anybody who does similar tests with HPL or MPITB? I would like to compare my results with someone else's. I tried to contact mukarram (by email) but there was no answer :(

Martin

Re: HPL low performance result.

by Michael Creel

The MPITB page is at http://atc.ugr.es/javier-bin/mpitb. You'll also find references to other work that uses MPITB there (http://atc.ugr.es/~javier/investigacion/papers/mpitb_octave_papers.html). One of the papers listed there is mine, and it benchmarks the PEA similarly to what you have done.

Please note "Please, use the ICCS'06 conference paper below to cite MPITB for Octave. Thanks!" at the top of the MPITB papers page. You should definitely cite that paper - keeping projects like MPITB (and PelicanHPC) going requires continued funding, and citations help a lot.

Re: HPL low performance result.

by martin cech

Yes, I will cite the ICCS'06 conference paper and the others. Is it possible to download some of these documents for free for student use?

Re: HPL low performance result.

by martin cech

Thank you for the material you sent me, Michael. Can I ask what versions of BLAS, ATLAS, and MPI you are using for PelicanHPC? I found (you probably know it) an optimized BLAS library called GOTO BLAS. With this library you should get better performance in the HPL benchmark. http://www.tacc.utexas.edu/general/staff/goto/

I got an email from the HPL developers after I sent them my results. They say that the decrease in performance is probably because the 100 Mbps Ethernet network does not have enough bandwidth (throughput) for these Intel C2D processors.

Re: HPL low performance result.

by Michael Creel

Hi Martin,
To get the versions of packages, type "dpkg -l atlas*" while running Pelican, and it will tell you the version. The package name for MPI is "openmpi". I'm not sure if BLAS is used, since ATLAS is installed. When making an image, I use whatever version is in Debian at the time.

I have heard of GOTO BLAS, but I believe that this has a non-free license.

I agree that network latency and bandwidth are probably the reason for the HPL results. HPL is used to test the performance of top level supercomputers, so it has to be sensitive to these things, and it's not surprising that lowly 100Mb/s ethernet drags things down. I really don't worry at all about HPL, because it tests things that one would not expect Pelican to do well on, while running on commodity hardware and run of the mill networking. I put a lot more weight on benchmarks like the one you did using parallel_performance.m. Those are a lot more representative of a real-world situation, and they show that good speedups are possible. HPL on Pelican provides a well-known example that can be used simply to show that the cluster is working, albeit not too well by Top500 standards.

Cheers, M.

Re: HPL low performance result.

by martin cech

Hi Michael,
I tried HPL and the Octave parallel_performance benchmark on a 1 Gbps Ethernet LAN. There are changes for HPL: the decrease in performance is not so critical. Efficiency goes from 72% (1 node = 2 CPUs) to 54% (25 nodes = 50 CPUs). But there is no difference for MPITB; the times are very similar. My explanation is that parallel_performance does not need to communicate much over the network, so 100 Mbps is good enough. Is that correct?

Regards Martin

Re: HPL low performance result.

by Michael Creel

Hi Martin,
For HPL, good, that seems to be the expected behavior. For MPITB, I'm surprised that there is no improvement. That benchmark does include internode communication. I guess that with T=200000, internode communication is unimportant relative to the pure number crunching, so there is little difference. For smaller values of T, I would guess that you would see a difference depending on the network bandwidth. Latency is also important. Possibly the latency of the 1 Gb/s network is high enough that the increased bandwidth gives little benefit.

Thanks for the information, I will definitely want to read your paper when it's done.
Michael
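
The point about latency versus bandwidth can be illustrated with the simple model "time per message = latency + size / bandwidth". This is a rough sketch only: the 100 microsecond latency and the message sizes are hypothetical values chosen for illustration, not measurements from either network in this thread.

```python
def transfer_time(message_bytes, latency_s, bandwidth_bits_per_s):
    """Time for one message: fixed latency plus size divided by bandwidth."""
    return latency_s + message_bytes * 8 / bandwidth_bits_per_s


def gain_from_faster_link(message_bytes, latency_s, slow_bps=100e6, fast_bps=1e9):
    """How many times faster one message travels on the fast link."""
    slow = transfer_time(message_bytes, latency_s, slow_bps)
    fast = transfer_time(message_bytes, latency_s, fast_bps)
    return slow / fast


if __name__ == "__main__":
    latency = 100e-6  # hypothetical per-message latency, same on both links
    for size in (100, 10_000, 1_000_000):  # hypothetical message sizes in bytes
        gain = gain_from_faster_link(size, latency)
        print(f"{size:>9} bytes: 1 Gb/s is {gain:.2f}x faster than 100 Mb/s")
```

Under these assumptions small messages see almost no benefit from the tenfold bandwidth increase, because the fixed latency dominates, while large messages approach the full factor of ten.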