glxsb(4) doesn't appear to be working for me (was: AMD Geode LX Security Block)


by Greg A. Woods; Planix, Inc.

["Jared D. McNeill" wrote some time ago:]

>
> Ok, thanks to a bunch of helpful hints on and off list, here we go:
>
> swcrypto:
>
>   aes-128-cbc 3688.28k 4064.06k 4185.64k 4216.48k 4221.59k
>
> hwcrypto:
>
>   aes-128-cbc 372.70k 1422.76k 5098.58k 13612.23k 26804.31k

I've got NetBSD-4 running here on a PC Engines ALIX.2d3 board.

My dmesg shows:

cpu0: AMD Geode LX (586-class), 498.08 MHz, id 0x5a2
cpu0: features 88a93d<FPU,DE,PSE,TSC,MSR,CX8,SEP>
cpu0: features 88a93d<PGE,CMOV,MPC,MMX>
cpu0: "Geode(TM) Integrated Processor by AMD PCS"
cpu0: I-cache 64 KB 32B/line 16-way, D-cache 64 KB 32B/line 16-way
cpu0: L2 cache 128 KB 32B/line 4-way
cpu0: ITLB 16 4 KB entries fully associative
cpu0: DTLB 16 4 KB entries fully associative
cpu0: 8 page colors
[[....]]
glxsb0 at pci0 dev 1 function 2: revision 0: RNG AES


OpenSSL seems to say what I'm told I should expect it to say:

        # openssl version
        OpenSSL 0.9.8e 23 Feb 2007

        # openssl engine -c
        (cryptodev) BSD cryptodev engine
         [RSA, DSA, DH, AES-128-CBC]
        (padlock) VIA PadLock (no-RNG, no-ACE)
        (dynamic) Dynamic engine loading support
        (4758cca) IBM 4758 CCA hardware engine support
         [RSA, RAND]
        (aep) Aep hardware engine support
         [RSA, DSA, DH]
        (atalla) Atalla hardware engine support
         [RSA, DSA, DH]
        (cswift) CryptoSwift hardware engine support
         [RSA, DSA, DH, RAND]
        (chil) CHIL hardware engine support
         [RSA, DH, RAND]
        (nuron) Nuron hardware engine support
         [RSA, DSA, DH]
        (sureware) SureWare hardware engine support
         [RSA, DSA, DH, RAND]
        (ubsec) UBSEC hardware engine support
         [RSA, DSA, DH]

However, unlike in Jared's report above, when I run "openssl speed
aes-128-cbc" in any of various ways I never see any difference in
performance between the crypto(4) device being enabled and disabled, and
I certainly don't see the accelerated speeds Jared reported.

My best numbers from the average of 10 runs of the following command on
an idle system:

        openssl speed -multi 10 aes-128-cbc -elapsed

are:

    # sysctl -w kern.usercrypto=1

    aes-128 cbc   5303.27k   5654.65k   5722.45k   5753.15k   8364.36k

    # sysctl -w kern.usercrypto=0

    aes-128 cbc   5200.41k   5698.54k   5746.44k   5764.66k   8201.28k


FreeBSD-7 with an identical version of OpenSSL seems slightly slower
(again this is an average of 10 runs on an idle system):

    aes-128 cbc   4567.62k   5015.47k   5151.20k   5239.94k   6543.29k

(and it's supposedly got the same driver for the AMD Geode LX block
enabled too!)


What the heck am I doing wrong?  Or is something busted?  How do I
figure out what's going on with the hardware device short of adding
printfs to it?

Where are the kern.*crypt* sysctl settings documented!?!?!?!?

--
                                                Greg A. Woods
                                                Planix, Inc.

<woods@...>       +1 416 218 0099        http://www.planix.com/



Re: glxsb(4) doesn't appear to be working for me (was: AMD Geode LX Security Block)

by Thor Lancelot Simon

On Thu, Oct 29, 2009 at 09:20:16PM -0400, Greg A. Woods wrote:

> ["Jared D. McNeill" wrote some time ago:]
> >
> > Ok, thanks to a bunch of helpful hints on and off list, here we go:
> >
> > swcrypto:
> >
> >   aes-128-cbc 3688.28k 4064.06k 4185.64k 4216.48k 4221.59k
> >
> > hwcrypto:
> >
> >   aes-128-cbc 372.70k 1422.76k 5098.58k 13612.23k 26804.31k
>
>
> I've got NetBSD-4 running here on a PC Engines ALIX.2d3 board.
>
> My dmesg shows:
>
> cpu0: AMD Geode LX (586-class), 498.08 MHz, id 0x5a2
> cpu0: features 88a93d<FPU,DE,PSE,TSC,MSR,CX8,SEP>
> cpu0: features 88a93d<PGE,CMOV,MPC,MMX>
> cpu0: "Geode(TM) Integrated Processor by AMD PCS"
> cpu0: I-cache 64 KB 32B/line 16-way, D-cache 64 KB 32B/line 16-way
> cpu0: L2 cache 128 KB 32B/line 4-way
> cpu0: ITLB 16 4 KB entries fully associative
> cpu0: DTLB 16 4 KB entries fully associative
> cpu0: 8 page colors
> [[....]]
> glxsb0 at pci0 dev 1 function 2: revision 0: RNG AES
>
>
> OpenSSL seems to say what I'm told I should expect it to say:
>
> # openssl version
> OpenSSL 0.9.8e 23 Feb 2007
>
> # openssl engine -c
> (cryptodev) BSD cryptodev engine

You may need to explicitly specify -engine cryptodev, and note that you
will not get *any* acceleration from openssl speed for any cipher
unless you specify it as an "evp" instead of by the shortcut name:

openssl speed -engine cryptodev -elapsed -evp aes-128-cbc

FWIW, glxsb is not very efficient and the syscall overhead will just
kill you for all but very large requests.  You may see better results
with -multi 32 to get some parallelism going to hide the latency.
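
A rough way to see the effect of per-request overhead is to divide the hardware figures by the software figures column by column. The sketch below does this for the two lines Jared posted (quoted earlier in this thread). The helper name `speedup` is made up for illustration, and the columns are assumed to be the usual openssl speed block sizes of 16, 64, 256, 1024 and 8192 bytes.

```shell
#!/bin/sh
# speedup (hypothetical helper): read two "openssl speed" summary lines from
# stdin, software first and hardware second, and print hardware/software
# for each block-size column. Assumes the cipher name is a single field.
speedup() {
    awk '
        # "+ 0" makes awk use the numeric prefix, dropping the trailing "k"
        NR == 1 { for (i = 2; i <= NF; i++) sw[i] = $i + 0 }
        NR == 2 {
            for (i = 2; i <= NF; i++)
                printf "%s%.2f", (i > 2 ? " " : ""), ($i + 0) / sw[i]
            printf "\n"
        }
    '
}

# Jared's figures: software line first, then hardware.
printf '%s\n' \
    'aes-128-cbc 3688.28k 4064.06k 4185.64k 4216.48k 4221.59k' \
    'aes-128-cbc 372.70k 1422.76k 5098.58k 13612.23k 26804.31k' | speedup
```

Ratios below 1 mean the hardware path is slower than software at that request size, which for these figures is the case in the 16 and 64 byte columns.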

Thor

Re: glxsb(4) doesn't appear to be working for me (was: AMD Geode LX Security Block)

by Patrick Lamaiziere

On Thu, 29 Oct 2009 22:30:23 -0400,
Thor Lancelot Simon <tls@...> wrote:

Hello,

> openssl speed -engine cryptodev -elapsed -evp aes-128-cbc

I always prefer to measure the throughput with dd and openssl enc

dd if=/dev/zero bs=1k count=100000 | \
    openssl enc -e -aes-128-cbc -k abcd -out /dev/null [-engine cryptodev]

Here (FreeBSD 8) without cryptodev:
102400000 bytes transferred in 19.881321 secs (5150563 bytes/sec)
=> 39 Mbit/s

With cryptodev => 120 Mbit/s
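
As a sanity check on the units, the rate dd reports can be converted by hand. The sketch below turns bytes per second into megabits per second; the helper name `to_mbit` is made up, and a megabit is assumed here to mean 2^20 bits.

```shell
#!/bin/sh
# to_mbit (hypothetical helper): convert a bytes/sec figure, as printed by
# dd, into Mbit/s. Assumes 1 Mbit = 2^20 bits; use 1000000 for decimal units.
to_mbit() {
    awk -v bps="$1" 'BEGIN { printf "%.1f\n", bps * 8 / 1048576 }'
}

to_mbit 5150563    # the dd rate measured without cryptodev
```

For 5150563 bytes/sec this comes out at roughly 39, so the dd figure corresponds to about 39 Mbit/s, or about 5 MBytes/s.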

> FWIW, glxsb is not very efficient and the syscall overhead will just
> kill you for all but very large requests.  You may see better results
> with -multi 32 to get some parallelism going to hide the latency.

Yes, but it's not so bad IMHO. The throughput of 40 Mbit/s (i.e. the
same as openssl without glxsb) is already reached with requests larger
than 256 bytes.

http://user.lamaiziere.net/patrick/glxsb-171108/glxsb-perf.pdf


While I'm here: there is a small mistake in glxsb.c in NetBSD (and
OpenBSD), though it does no harm.

#define SB_AI_AES_A_COMPLETE 0x0100
#define SB_AI_AES_B_COMPLETE 0x0200
#define SB_AI_EEPROM_COMPLETE 0x0400

Should be:
#define SB_AI_AES_A_COMPLETE   0x10000
#define SB_AI_AES_B_COMPLETE   0x20000
#define SB_AI_EEPROM_COMPLETE  0x40000

Source:
http://support.amd.com/us/Embedded_TechDocs/33234H_LX_databook.pdf
6.12.3.3 SB AES Interrupt (SB_AES_INT) (page 522)
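
To make the difference between the two sets of values concrete, the sketch below prints which bit each constant selects. It is plain shell arithmetic and says nothing about the hardware itself; `bitpos` is a made-up helper name.

```shell
#!/bin/sh
# bitpos (hypothetical helper): print the index of the single set bit in a
# mask given as a hex or decimal constant.
bitpos() {
    mask=$(($1))
    pos=0
    while [ "$mask" -gt 1 ]; do
        mask=$((mask >> 1))
        pos=$((pos + 1))
    done
    echo "$pos"
}

for m in 0x0100 0x0200 0x0400 0x10000 0x20000 0x40000; do
    printf '%-8s bit %s\n' "$m" "$(bitpos "$m")"
done
```

The values currently in glxsb.c select bits 8 to 10, while the proposed ones select bits 16 to 18.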

(I've sent a bug report to OpenBSD)

Regards.

Re: glxsb(4) doesn't appear to be working for me (was: AMD Geode LX Security Block)

by Greg A. Woods; Planix, Inc.

At Thu, 29 Oct 2009 22:30:23 -0400, Thor Lancelot Simon <tls@...> wrote:
Subject: Re: glxsb(4) doesn't appear to be working for me (was: AMD Geode LX Security Block)
>
> You may need to explicitly specify -engine cryptodev, and note that you
> will not get *any* acceleration from openssl speed for any cipher
> unless you specify it as an "evp" instead of by the shortcut name:
>
> openssl speed -engine cryptodev -elapsed -evp aes-128-cbc

I'm not sure I understand.  None of the examples I saw on the NetBSD
lists show this (and it's not explained at all in the manual page).

It looks like the algorithm can also be given on the command line:

  openssl speed -engine cryptodev -elapsed -evp aes-128-cbc aes-128-cbc

and then the program seems to run the test twice, once in a way that
will make use of /dev/crypto.

"-engine cryptodev" does now indeed make the huge difference I was
expecting, and I see the same kinds of stats others have posted.

I've since found similar examples using "-evp aes-128-cbc" on the
FreeBSD lists (regarding the same driver and device), as well as other
tests that make use of the device such as:

  # dd if=/dev/zero bs=4k count=100000 | \
    openssl enc -aes-128-cbc -e -out /dev/null -nosalt -k abcdefhij -engine cryptodev
  10000+0 records in
  10000+0 records out
  81920000 bytes transferred in 5.465 secs (14989935 bytes/sec)

I can also confirm that on NetBSD-4 with the native OpenSSL 0.9.8e the
"cryptodev" engine must be specified in order to make use of the device.

# for i in 1 2 3 4 5 6 7 8 9 0 ; do
        openssl speed -multi 10 -evp aes-128-cbc -elapsed 2>/dev/null | tail -1;
   done | awk '
        {n1=$1; t1+=$2; t2+=$3; t3+=$4; t4+=$5; t5+=$6;}
        END{printf("%-13s %11.2fk %11.2fk %11.2fk %11.2fk %11.2fk  (%d runs)\n",
                n1, t1/NR, t2/NR, t3/NR, t4/NR, t5/NR, NR)}'
evp                310.16k     1229.65k     4354.50k    10540.60k    62369.80k  (10 runs)

# sysctl -w kern.usercrypto=0
evp               4917.08k     5519.23k     5746.64k     5808.70k     8549.20k  (10 runs)


For comparison my Dell PE2650 2*2.4GHz HTT server gets:

evp              22753.38k    26595.67k    31588.73k    31056.11k    35666.74k  (10 runs)


> FWIW, glxsb is not very efficient and the syscall overhead will just
> kill you for all but very large requests.  You may see better results
> with -multi 32 to get some parallelism going to hide the latency.

Indeed.

Thank you very much!


--
                                                Greg A. Woods
                                                Planix, Inc.

<woods@...>       +1 416 218 0099        http://www.planix.com/



Re: glxsb(4) doesn't appear to be working for me (was: AMD Geode LX Security Block)

by Thor Lancelot Simon

On Fri, Oct 30, 2009 at 12:18:00PM -0400, Greg A. Woods wrote:

> At Thu, 29 Oct 2009 22:30:23 -0400, Thor Lancelot Simon <tls@...> wrote:
> Subject: Re: glxsb(4) doesn't appear to be working for me (was: AMD Geode LX Security Block)
> >
> > You may need to explicitly specify -engine cryptodev, and note that you
> > will not get *any* acceleration from openssl speed for any cipher
> > unless you specify it as an "evp" instead of by the shortcut name:
> >
> > openssl speed -engine cryptodev -elapsed -evp aes-128-cbc
>
> I'm not sure I understand.  None of the examples I saw on the NetBSD
> lists show this (and it's not explained at all in the manual page).

I can't say why people would post wrong examples to the NetBSD lists.  I
do often wish that if people didn't know what they were talking about,
they'd pipe down already with the "helpful" advice on the lists...

I can say why the manual page is wrong: OpenSSL manual pages in general
just plain suck.

Here is what is going on: the OpenSSL "engine" interface is jammed in at
their abstract-algorithm layer (fsvo "layer") which lies between their
SSL-record-handling layer and the raw encryption routines.  This layer
is called "EVP".

The openssl 'speed' utility calls the raw encryption routines when you
tell it to do a speed test for a cipher.  So the cryptodev engine never
sees the requests.  However, it calls the EVP routines when you tell it
to do a speed test for any other kind of algorithm, such as a hash
function like MD5 or SHA!  This can be extremely confusing.

The workaround is to trick it into thinking it's testing some other
kind of block-oriented algorithm by telling it to look up the cipher
*by its EVP* which forces it to use the EVP layer, so the engine layer
sees the requests.  This is what the -evp switch on the command line
accomplishes.
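
Put side by side, the invocations discussed in this thread differ only in whether the cipher is named via -evp and whether an engine is requested. The sketch below just prints the three command lines. `speed_cmd` and its mode names are made up for illustration, and the commands are adapted from the ones quoted in this thread.

```shell
#!/bin/sh
# speed_cmd (hypothetical helper): print the openssl speed command line for
# one of three paths. Nothing is executed here.
speed_cmd() {
    case "$1" in
        # cipher by shortcut name: raw routines, the engine never sees it
        raw) echo "openssl speed -elapsed aes-128-cbc" ;;
        # cipher by EVP name, no engine requested
        evp) echo "openssl speed -elapsed -evp aes-128-cbc" ;;
        # cipher by EVP name, routed through the cryptodev engine
        hw)  echo "openssl speed -engine cryptodev -elapsed -evp aes-128-cbc" ;;
        *)   echo "usage: speed_cmd raw|evp|hw" >&2; return 1 ;;
    esac
}

for mode in raw evp hw; do
    speed_cmd "$mode"
done
```

Running the printed `raw` and `hw` commands back to back on the same machine is one way to check that the engine is actually being used.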

Thor

Re: glxsb(4) doesn't appear to be working for me (was: AMD Geode LX Security Block)

by Thor Lancelot Simon

On Fri, Oct 30, 2009 at 04:12:26PM -0400, Thor Lancelot Simon wrote:
>
> The workaround is to trick it into thinking it's testing some other
> kind of block-oriented algorithm by telling it to look up the cipher
> *by its EVP* which forces it to use the EVP layer, so the engine layer
> sees the requests.  This is what the -evp switch on the command line
> accomplishes.

The other thing is, using the "cryptodev" engine causes most of the
actual work to be done in the kernel.  So you need -elapsed on the
openssl speed command line or you'll get false, insanely high results
because it will track only the amount of time spent in the userspace
openssl process.

When you use -multi N I think it also forces the use of -elapsed.
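
The size of the error is easy to see with made-up numbers. The sketch below computes a rate in the same thousands-of-bytes-per-second units that openssl speed prints, once against wall-clock time and once against userspace CPU time only. The helper name `rate_k` and all three input values are hypothetical, not measurements.

```shell
#!/bin/sh
# rate_k (hypothetical helper): bytes processed divided by seconds, printed
# in thousands of bytes per second like the openssl speed columns.
rate_k() {
    awk -v bytes="$1" -v secs="$2" 'BEGIN { printf "%.2fk\n", bytes / secs / 1000 }'
}

# Hypothetical run: 30000000 bytes encrypted in 3.00 s of wall-clock time,
# of which only 0.10 s was spent in the userspace openssl process.
rate_k 30000000 3.00    # measured against elapsed time
rate_k 30000000 0.10    # measured against user CPU time only
```

With these inputs the second figure is thirty times the first, which is the kind of inflation -elapsed avoids.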

--
Thor Lancelot Simon                                   tls@...
    "Even experienced UNIX users occasionally enter rm *.* at the UNIX
     prompt only to realize too late that they have removed the wrong
     segment of the directory structure." - Microsoft WSS whitepaper