OpenALsoft CPU usage?

by Christian Ohm

> Interesting. I wouldn't have expected OpenAL Soft to create such a large
> difference between versions. Were these tests done on the same system with
> the same configuration? What is the system's and OpenAL Soft's configuration?
> The more info you can provide, the better. :)

Hello,

I did the tests mentioned before. I repeated them after updating to ALSA
1.0.20, the results are below (just copied the test notes, I hope it is clear
enough). I'll also attach my OpenAL config, though I think it's just the
defaults.

I also bisected to find what caused the difference between 1.5 and 1.6. The
smaller differences are not that easy to find, since the results vary some
percentage points depending on the in-game action. I can try to isolate those
as well, but that needs some more time (need to do several test runs per
version to get a good average, and a good cut-off point).

If you need some more info, or want me to do other tests (of a more predictable
nature perhaps), feel free to ask.

Best regards,
Christian Ohm


System:
Athlon X2 5000+ (2x2.6GHz)
8 GB RAM, Radeon X800GTO
M-Audio Audiophile 24/96

Debian unstable 64bit

Kernel: homemade 2.6.29.4

gcc (Debian 4.3.3-10) 4.3.3

libasound2: 1.0.20-1, added dmix to the default config for the Audiophile

Recompiled for the debug symbols with "./configure && make" from "apt-get source libasound2"
build-deps of libasound2 but NOT installed:
gcc-4.3-multilib gcc-multilib libcxxtools-dev libcxxtools6 python2.4 python2.4-dev python2.4-minimal

Warzone 2100: branches/2.2, r7548, built with --disable-debug
640x480 windowed, music disabled
http://www.filefactory.com/file/ag148d4/n/autogame2_7z
sleep 300 && killall warzone2100 & LD_PRELOAD="/tmp/openal-soft-1.7.411/libopenal.so" src/warzone2100 --savegame=autogame2.gam
Sticky the window (fvwm), press space

OpenAL Soft archives from the website, compiled with "cmake . && make"

Results from oprofile with per-thread profiles via "opreport -l -g -m all
.../warzone2100" (only one run per version).

1.7:
samples  %        linenr info                 app name                 symbol name
59265    64.0232  ALu.c:895                   libopenal.so.1.7.411     aluMixData
1978      2.1368  astar.c:203                 warzone2100              fpathCompare
1315      1.4206  pcm_dmix_x86_64.h:132       libasound.so.2.0.0       mix_areas_32_smp
1157      1.2499  visibility.c:134            warzone2100              rayTerrainCallback
805       0.8696  mapgrid.c:264               warzone2100              gridIterate

1.6:
samples  %        linenr info                 app name                 symbol name
44039    42.5263  ALu.c:619                   libopenal.so.1.6.372     aluMixData
4106      3.9650  astar.c:203                 warzone2100              fpathCompare
2040      1.9699  pcm_dmix_x86_64.h:132       libasound.so.2.0.0       mix_areas_32_smp
1971      1.9033  visibility.c:134            warzone2100              rayTerrainCallback
1631      1.5750  r300_emit.c:345             r300_dri.so              r300EmitArrays

1.5:
samples  %        linenr info                 app name                 symbol name
15733    20.5776  ALu.c:628                   libopenal.so.1.5.304     aluMixData
3698      4.8367  astar.c:203                 warzone2100              fpathCompare
2244      2.9350  pcm_dmix_x86_64.h:132       libasound.so.2.0.0       mix_areas_32_smp
1840      2.4066  visibility.c:134            warzone2100              rayTerrainCallback
1819      2.3791  piedraw.c:131               warzone2100              pie_Draw3DShape2

1.4:
samples  %        linenr info                 app name                 symbol name
6567     10.0185  ALu.c:585                   libopenal.so.1.4.272     aluMixData
4189      6.3906  astar.c:203                 warzone2100              fpathCompare
2078      3.1701  pcm_dmix_x86_64.h:132       libasound.so.2.0.0       mix_areas_32_smp
1801      2.7476  visibility.c:134            warzone2100              rayTerrainCallback
1648      2.5141  piedraw.c:131               warzone2100              pie_Draw3DShape2

1.3:
samples  %        linenr info                 app name                 symbol name
5533      8.9315  ALu.c:581                   libopenal.so.1.3.253     aluMixData
4551      7.3464  astar.c:203                 warzone2100              fpathCompare
1888      3.0477  visibility.c:134            warzone2100              rayTerrainCallback
1584      2.5569  pcm_dmix_x86_64.h:132       libasound.so.2.0.0       mix_areas_32_smp
1578      2.5473  mapgrid.c:264               warzone2100              gridIterate

1.2:
samples  %        linenr info                 app name                 symbol name
12945    23.5999  ALu.c:591                   libopenal.so.1.2.218     aluMixData
2521      4.5960  astar.c:203                 warzone2100              fpathCompare
1434      2.6143  pcm_dmix_x86_64.h:132       libasound.so.2.0.0       mix_areas_32_smp
1315      2.3974  visibility.c:134            warzone2100              rayTerrainCallback
1018      1.8559  piedraw.c:131               warzone2100              pie_Draw3DShape2

1.1:
samples  %        linenr info                 app name                 symbol name
16683    23.8057  ALu.c:484                   libopenal.so.1.1.93      aluMixData
3954      5.6421  astar.c:203                 warzone2100              fpathCompare
2150      3.0679  (no location information)   libpthread-2.9.so        pthread_mutex_lock
1714      2.4458  pcm_dmix_x86_64.h:132       libasound.so.2.0.0       mix_areas_32_smp
1465      2.0905  visibility.c:134            warzone2100              rayTerrainCallback

1.0:
samples  %        linenr info                 app name                 symbol name
13184    21.9653  ALu.c:384                   libopenal.so.1.0.38      aluMixData
3681      6.1328  astar.c:203                 warzone2100              fpathCompare
1770      2.9489  (no location information)   libpthread-2.9.so        pthread_mutex_lock
1734      2.8889  pcm_dmix_x86_64.h:132       libasound.so.2.0.0       mix_areas_32_smp
1388      2.3125  visibility.c:134            warzone2100              rayTerrainCallback
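
To compare the mixer's share across versions at a glance, the aluMixData rows above can be pulled out with a few lines of Python. This is only a sketch: the `ALU_ROWS` string and the `mixer_share` helper are made up for illustration, and the rows are copied by hand from the tables above (one run per version, so the usual caveat about run-to-run variation applies).

```python
# Hypothetical helper: extract the aluMixData percentage per OpenAL Soft
# version from opreport rows copied by hand from the tables above.
# Columns: version, samples, percent, source location, library, symbol.
ALU_ROWS = """\
1.7 59265 64.0232 ALu.c:895 libopenal.so.1.7.411 aluMixData
1.6 44039 42.5263 ALu.c:619 libopenal.so.1.6.372 aluMixData
1.5 15733 20.5776 ALu.c:628 libopenal.so.1.5.304 aluMixData
1.4 6567 10.0185 ALu.c:585 libopenal.so.1.4.272 aluMixData
1.3 5533 8.9315 ALu.c:581 libopenal.so.1.3.253 aluMixData
1.2 12945 23.5999 ALu.c:591 libopenal.so.1.2.218 aluMixData
1.1 16683 23.8057 ALu.c:484 libopenal.so.1.1.93 aluMixData
1.0 13184 21.9653 ALu.c:384 libopenal.so.1.0.38 aluMixData
"""

def mixer_share(rows):
    """Return {version: percent of samples spent in aluMixData}."""
    share = {}
    for line in rows.splitlines():
        version, _samples, percent, _loc, _lib, symbol = line.split()
        if symbol == "aluMixData":
            share[version] = float(percent)
    return share

share = mixer_share(ALU_ROWS)
lowest = min(share, key=share.get)    # version with the smallest mixer share
highest = max(share, key=share.get)   # version with the largest mixer share
for version in sorted(share, reverse=True):
    print(f"{version}: {share[version]:.1f}%")
```

Here `lowest` and `highest` name the versions with the smallest and largest mixer share in this single set of runs.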


Result of git bisect to find the jump between 1.5 and 1.6:

commit 22557070ec4852d64ad153f5cac907f84119702c
Author: Chris Robinson <chris.kcat@...>
Date:   Thu Aug 14 05:43:52 2008 -0700

    Ramp channel gains to remove pops and clicks from abrupt changes
    Thanks to Christopher Fitzgerald for helping me work on it
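
For readers unfamiliar with the technique named in the commit message: ramping a gain means stepping it toward its new value over a number of samples instead of switching at once. The following Python sketch shows only the general idea. It is not the code from the commit, and `mix_with_ramp` and its parameters are invented for illustration.

```python
def mix_with_ramp(samples, old_gain, new_gain, ramp_len):
    """Scale samples while moving the gain linearly from old_gain to new_gain.

    The gain changes by a fixed step per sample for the first ramp_len
    samples, then stays at new_gain. While the ramp is active this costs
    one extra addition per sample for each ramped gain.
    """
    ramp_len = max(1, ramp_len)
    step = (new_gain - old_gain) / ramp_len
    gain = old_gain
    out = []
    for i, s in enumerate(samples):
        if i < ramp_len:
            gain += step          # the extra per-sample addition
        else:
            gain = new_gain       # avoid accumulated rounding drift
        out.append(s * gain)
    return out

# A constant input makes the gain curve directly visible in the output.
ramped = mix_with_ramp([1.0] * 8, old_gain=0.0, new_gain=1.0, ramp_len=4)
print(ramped)
```

With a constant input of 1.0 the output is simply the gain curve: it rises in four equal steps and then holds at the target.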

_______________________________________________
Openal-devel mailing list
Openal-devel@...
http://opensource.creative.com/mailman/listinfo/openal-devel

Re: OpenALsoft CPU usage?

by Christian Ohm

And of course I forgot to attach the config file.

# OpenAL config file. Options that are not under a block or are under the
# [general] block are for general, non-backend-specific options. Blocks may
# appear multiple times, and duplicated options will take the last value
# specified.
# The system-wide settings can be put in /etc/openal/alsoft.conf and user-
# specific override settings in ~/.alsoftrc.
# For Windows, these settings should go into %AppData%\alsoft.ini
# The environment variable ALSOFT_CONF can be used to specify another config
# override

# Option and block names are case-insensitive. The supplied values are only
# hints and may not be honored (though generally it'll try to get as close as
# possible). These are the current available settings:

format = AL_FORMAT_STEREO16  # Sets the output format. Can be one of:
                             # AL_FORMAT_MONO8    (8-bit mono)
                             # AL_FORMAT_STEREO8  (8-bit stereo)
                             # AL_FORMAT_QUAD8    (8-bit 4-channel)
                             # AL_FORMAT_51CHN8   (8-bit 5.1 output)
                             # AL_FORMAT_61CHN8   (8-bit 6.1 output)
                             # AL_FORMAT_71CHN8   (8-bit 7.1 output)
                             # AL_FORMAT_MONO16   (16-bit mono)
                             # AL_FORMAT_STEREO16 (16-bit stereo)
                             # AL_FORMAT_QUAD16   (16-bit 4-channel)
                             # AL_FORMAT_51CHN16  (16-bit 5.1 output)
                             # AL_FORMAT_61CHN16  (16-bit 6.1 output)
                             # AL_FORMAT_71CHN16  (16-bit 7.1 output)
                             # Default is AL_FORMAT_STEREO16

cf_level = 0  # Sets the crossfeed level for stereo output. Valid values are:
              # 0 - No crossfeed
              # 1 - Low crossfeed
              # 2 - Middle crossfeed
              # 3 - High crossfeed (virtual speakers are closer to each other)
              # 4 - Low easy crossfeed
              # 5 - Middle easy crossfeed
              # 6 - High easy crossfeed
              # Default is 0. Users of headphones may want to try various
              # settings. Has no effect on non-stereo modes.

frequency = 44100  # Sets the output frequency. Default is 44100

refresh = 4096  # Sets the buffer size, in frames. Default is 4096. Note that
                # the actual granularity may or may not be less than this.

sources = 256  # Sets the maximum number of allocatable sources. Lower values
               # may help for systems with apps that try to play more sounds
               # than the CPU can handle. Default is 256

stereodup =  # Sets whether to duplicate stereo sounds on the rear and side
             # speakers for 4+ channel output. This can make stereo sources
             # substantially louder than mono or even 4+ channel sources, but
             # provides a "fuller" playback quality. True, yes, on, and non-0
             # values will duplicate stereo sources. 0 and anything else will
             # cause stereo sounds to only play out the front speakers.
             # Default is false

drivers =  # Sets the backend driver list order, comma-separated. Unknown
           # backends and duplicated names are ignored, and unlisted backends
           # won't be considered for use. An empty list means the default.
           # Default is:
           # alsa,oss,solaris,dsound,winmm,port,wave

excludefx =  # Sets which effects to exclude, preventing apps from using them.
             # This can help for apps that try to use effects which are too CPU
             # intensive for the system to handle. Available effects are:
             # reverb
             # Default is empty (all available effects enabled)

layout_STEREO =  # Sets the speaker layout when using stereo output. Values are
                 # specified in degrees, where 0 is straight in front, negative
                 # goes left, and positive goes right. The values must define a
                 # circular pattern, starting with the back-left at the most
                 # negative, around the front to back-center. Unspecified
                 # speakers will remain at their default position. Available
                 # speakers are front-left(fl) and front-right(fr).
                 # The default is:
                   fl=-90, fr=90

layout_QUAD =  # Sets the speaker layout when using quadriphonic output.
               # Available speakers are back-left(bl), front-left(fl),
               # front-right(fr), and back-right(br).
               # The default is:
                 bl=-135, fl=-45, fr=45, br=135

layout_51CHN =  # Sets the speaker layout when using 5.1 output. Available
                # speakers are back-left(bl), front-left(fl), front-center(fc),
                # front-right(fr), and back-right(br).
                # The default is:
                  bl=-110, fl=-30, fc=0, fr=30, br=110

layout_61CHN =  # Sets the speaker layout when using 6.1 output. Available
                # speakers are side-left(sl), front-left(fl), front-center(fc),
                # front-right(fr), side-right(sr), and back-center(bc).
                # The default is:
                  sl=-90, fl=-30, fc=0, fr=30, sr=90, bc=180

layout_71CHN =  # Sets the speaker layout when using 7.1 output. Available
                # speakers are back-left(bl), side-left(sl), front-left(fl),
                # front-center(fc), front-right(fr), side-right(sr), and
                # back-right(br).
                # The default is:
                  bl=-150, sl=-90, fl=-30, fc=0, fr=30, sr=90, br=150


[alsa]  # ALSA backend stuff
device = default  # Sets the device name for the default playback device.
                  # Default is default

periods = 0  # Sets the number of update buffers for playback. A value of 0
             # means auto-select. Default is 0

capture = default  # Sets the device name for the default capture device.
                   # Default is default

mmap = true  # Sets whether to try using mmap mode (helps reduce latencies and
             # CPU consumption). If mmap isn't available, it will automatically
             # fall back to non-mmap mode. True, yes, on, and non-0 values will
             # attempt to use mmap. 0 and anything else will force mmap off.
             # Default is true.

[oss]  # OSS backend stuff
device = /dev/dsp  # Sets the device name for OSS output. Default is /dev/dsp

periods = 4  # Sets the number of update buffers. Default is 4

capture = /dev/dsp  # Sets the device name for OSS capture. Default is /dev/dsp

[solaris]  # Solaris backend stuff
device = /dev/audio  # Sets the device name for Solaris output. Default is
                     # /dev/audio

[dsound]  # DirectSound backend stuff
periods = 4  # Sets the number of updates for the output buffer. Default is 4

[winmm]  # Windows Multimedia backend stuff
         # Nothing yet...

[port]  # PortAudio backend stuff
device = -1  # Sets the device index for output. Negative values will use the
             # default as given by PortAudio itself. Default is -1

periods = 4  # Sets the number of update buffers. Default is 4

[wave]  # Wave File Writer stuff
file =  # Sets the filename of the wave file to write to. An empty name
        # prevents the backend from opening, even when explicitly requested.
        # THIS WILL OVERWRITE EXISTING FILES WITHOUT QUESTION!
        # Default is empty

Re: OpenALsoft CPU usage?

by Chris Robinson

On Tuesday 26 May 2009 7:57:14 am Christian Ohm wrote:

> I did the tests mentioned before. I repeated them after updating to ALSA
> 1.0.20, the results are below (just copied the test notes, I hope it is
> clear enough). I'll also attach my OpenAL config, though I think it's just
> the defaults.
>
> I also bisected to find what caused the difference between 1.5 and 1.6. The
> smaller differences are not that easy to find, since the results vary some
> percentage points depending on the in-game action. I can try to isolate
> those as well, but that needs some more time (need to do several test runs
> per version to get a good average, and a good cut-off point).
>
> If you need some more info, or want me to do other tests (of a more
> predictable nature perhaps), feel free to ask.

Hi, thanks for the info. :)

Those results are pretty odd. All that patch does is add 8 floating-point
additions per sample, which is minor compared to what the mixer already did.

Maybe because the sends are being updated through a pointer to an array in the
source struct, the compiler isn't optimizing it as well as when it was on the
stack. I'll have to see if putting them back on the stack helps at all.

Also FYI, I made a bit of a change in the GIT version to the ALSA backend
which may help the mixer's efficiency a bit. Not sure how much of an effect
it'll have if the sample mixing loop is being a killer, though.

Re: OpenALsoft CPU usage?

by Christian Ohm

On Tuesday, 26 May 2009 at  9:06, Chris Robinson wrote:

> Those results are pretty odd. All that patch does is add 8 floating-point
> additions per sample, which is minor compared to what the mixer already did.
>
> Maybe because the sends are being updated through a pointer to an array in the
> source struct, the compiler isn't optimizing it as well as when it was on the
> stack. I'll have to see if putting them back on the stack helps at all.
>
> Also FYI, I made a bit of a change in the GIT version to the ALSA backend
> which may help the mixer's efficiency a bit. Not sure how much of an effect
> it'll have if the sample mixing loop is being a killer, though.

Hello,

I finally got around to doing some more testing. I used altest from
svn://connect.creativelabs.com/OpenAL/trunk/contrib/tests/altest (a bit
modified to remove useless output, use the default device, start the multiple
sources test automatically, and change the number of sources it uses, see
http://pastebin.com/f24b34374). System is the same as last time, 64 bit Debian
unstable, 2 x 2.6 GHz X2, fixed to 2.6 GHz (forgot that last time, though with
oprofile's relative percentages it didn't matter that much), CPU usage as shown
by htop.

1.7 (the version from Debian) uses up to 20% CPU at up to 36 sources. Beyond
that, each additional source adds a significant amount: at 41 sources it's at
80%, and above that at 95% (but I think that's just because Linux doesn't give
it the full 100%). Current git does 36 sources at 10% and scales linearly, with
128 sources at 30% and the maximum of 256 sources at 60%.
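
As a rough check of the "scales linearly" remark, the three data points for current git can be turned into a per-source cost. The numbers are the htop readings quoted above (so they are approximate), and the variable names are just for illustration.

```python
# (sources, CPU %) pairs for current git, as read from htop above.
git_points = [(36, 10.0), (128, 30.0), (256, 60.0)]

# Incremental cost per extra source between neighbouring measurements.
slopes = [
    (cpu_b - cpu_a) / (n_b - n_a)
    for (n_a, cpu_a), (n_b, cpu_b) in zip(git_points, git_points[1:])
]
for (n_a, _), (n_b, _), slope in zip(git_points, git_points[1:], slopes):
    print(f"{n_a} -> {n_b} sources: {slope:.3f}% CPU per source")

# For comparison, 1.7 went from about 20% at 36 sources to 80% at 41.
slope_17 = (80.0 - 20.0) / (41 - 36)
print(f"1.7, 36 -> 41 sources: {slope_17:.1f}% CPU per source")
```

The two slopes for current git differ by less than 10%, which fits a roughly linear cost per source, while 1.7 pays far more for each source past 36.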

Nice work so far. I hope you can release this soon, since we currently get
quite a few complaints about Warzone's CPU usage due to people using OpenAL
Soft 1.7, and we have to tell them to downgrade to fix this. More performance
optimizations would of course be welcome (I think Warzone uses one source per
unit, with 8 x 300 units maximum - we should probably fix this), but the
current git will at least not overload (fast) CPUs anymore.

Best regards,
Christian Ohm

Re: OpenALsoft CPU usage?

by Jason Daly

Christian Ohm wrote:
> I think Warzone uses one source per unit, with 8 x 300 units maximum


Wow, 2400 sources is an awful lot of mixing to do, no matter how well the mixing loop is optimized.  Furthermore, once you get above 30-60 sources (depending on content), the benefits of adding more and more sources to the soundfield start to diminish.

It's not too difficult to implement a management scheme that can pick the n most important sources (nearest, loudest, whatever your app needs) and only render those sources.  My experience has been that it's difficult to tell the difference in a practical application (except for the CPU time you recover :-) ).
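
A minimal sketch of such a scheme in Python, assuming each sound emitter has a 2D position and a gain, and that "importance" is approximated by gain divided by distance to the listener. The `Emitter` class, `pick_audible`, and the scoring rule are all hypothetical; a real application would choose its own metric and then bind only the selected emitters to actual OpenAL sources.

```python
import heapq
import math
from dataclasses import dataclass

@dataclass
class Emitter:
    name: str
    x: float
    y: float
    gain: float  # 0.0 (silent) to 1.0 (full volume)

def importance(emitter, listener_xy):
    """Hypothetical score: louder and nearer emitters rank higher."""
    dist = math.hypot(emitter.x - listener_xy[0], emitter.y - listener_xy[1])
    return emitter.gain / (1.0 + dist)

def pick_audible(emitters, listener_xy, max_sources):
    """Return the max_sources most important emitters, best first."""
    return heapq.nlargest(
        max_sources, emitters, key=lambda e: importance(e, listener_xy)
    )

emitters = [
    Emitter("tank", 1.0, 0.0, 0.8),
    Emitter("far_explosion", 50.0, 0.0, 1.0),
    Emitter("factory", 3.0, 4.0, 0.5),
    Emitter("quiet_unit", 0.5, 0.0, 0.05),
]
chosen = pick_audible(emitters, listener_xy=(0.0, 0.0), max_sources=2)
print([e.name for e in chosen])
```

Re-running the selection every few frames, rather than every frame, would keep the bookkeeping cost small compared to the mixing it saves.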

--"J"


Re: OpenALsoft CPU usage?

by Christian Ohm

On Friday,  5 June 2009 at 18:28, Jason Daly wrote:
> Wow, 2400 sources is an awful lot of mixing to do, no matter how well the
> mixing loop is optimized.  Furthermore, once you get above 30-60 sources
> (depending on content), the benefits of adding more and more sources to
> the soundfield start to diminish.

Oh, I'm sure reducing the number of sources would be good (especially since
OpenAL Soft's maximum is 256, so we drop random sources now anyway). It "just"
needs someone to do it.

And I didn't want to imply that OpenAL Soft _needs_ to be optimized more. I
tested 1.3 (the version that used the least CPU in my earlier tests), and while
it does 42 sources at 13% CPU, it also chokes somewhere below 64 sources. So
current git is the best OpenAL Soft so far, and if it is released as it is and
replaces 1.7 everywhere I'm quite happy with that.

But I did the same tests with OpenAL-Sample from
svn://connect.creativelabs.com/OpenAL/trunk, and it does 1024 sources at 60%
CPU, scaling down more or less linearly (doing 256 sources at 14%). So if
OpenAL Soft's goal is to be better than the Sample Implementation in every way,
it isn't there yet.

Best regards,
Christian Ohm

Re: OpenALsoft CPU usage?

by Chris Robinson

On Saturday 06 June 2009 11:21:57 am Christian Ohm wrote:
> And I didn't want to imply that OpenAL Soft _needs_ to be optimized more. I
> tested 1.3 (the version that used the least CPU in my earlier tests), and
> while it does 42 sources at 13% CPU, it also chokes somewhere below 64
> sources. So current git is the best OpenAL Soft so far, and if it is
> released as it is and replaces 1.7 everywhere I'm quite happy with that.

I'll see about getting a release together sometime this weekend (today or
tomorrow). There are some problems compiling the current code under MSVC, and
I'd like to try to get them fixed in a reasonable way, but that's difficult to
do without access to that compiler. It does compile under MinGW though, and
Win32 binaries will be provided, so I don't know how important it would be for
release.