20 Messages
ANN: LuaJIT 1.1.0

Hi,

LuaJIT is a Just-In-Time (JIT) Compiler for Lua 5.1.
LuaJIT is light-weight, efficient and extensible.

LuaJIT 1.1.0 is based on Lua 5.1 (final). The performance has been improved in many areas: more specialization and inlining for operators and library functions, adaptive deoptimization, better type hinting, optional SSE2 code generation and many other small optimizations.

It supports many popular x86 based operating systems: Linux, *BSD, Mac OS X on Intel, Solaris x86 and Windows (MSVC or MinGW).

Please visit the project home page for more info:
  http://luajit.luaforge.net/

You can find the full changelog and performance comparisons here:
  http://luajit.luaforge.net/luajit_changes.html
  http://luajit.luaforge.net/luajit_performance.html

Here is a direct link to the download page:
  http://luajit.luaforge.net/download.html

Bye,
     Mike
|
|
Re: ANN: LuaJIT 1.1.0

Hallo,

On 3/13/06, Mike Pall <mikelu-0603@...> wrote:
> Hi,
>
> LuaJIT is a Just-In-Time (JIT) Compiler for Lua 5.1.
> LuaJIT is light-weight, efficient and extensible.
>

Thank you very much for LuaJIT.

Some benchmarks: using plain Lua I can achieve 122 FPS visualising a 512x512 CLOD terrain. Using LuaJIT I can achieve 256 FPS with the same terrain. Visualising a 3200x3200 terrain runs at 31 FPS with plain Lua and at 35 FPS with LuaJIT. But with large terrains the CLOD algorithm dominates, which is entirely in C.

--
-alex
http://www.ventonegro.org/
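Alex's four numbers also bound how much of a large-terrain frame is still spent in Lua. The sketch below is a back-of-the-envelope Amdahl's-law estimate, not a measurement, and it rests on two assumptions of ours: LuaJIT speeds up only the Lua part of a frame (the C part costs the same in both runs), and it speeds that part up by the same factor at both terrain sizes. Because the 512x512 overall speedup can only underestimate the speedup of the Lua part alone, plugging it in yields an upper bound. All function names are hypothetical.

```c
#include <stdio.h>

/* Speedup of LuaJIT over plain Lua, computed from frames per second. */
double fps_speedup(double fps_lua, double fps_jit)
{
    return fps_jit / fps_lua;
}

/* Frame time in milliseconds for a given frame rate. */
double frame_ms(double fps)
{
    return 1000.0 / fps;
}

/*
 * Amdahl's law: overall = 1 / ((1 - f) + f / s), where f is the share of
 * the frame spent in Lua and s is the speedup of that share alone.
 * Solving for f gives f = (1 - 1/overall) / (1 - 1/s).
 * Passing a lower bound for s therefore gives an upper bound for f.
 */
double lua_share_upper_bound(double overall, double s_lower_bound)
{
    return (1.0 - 1.0 / overall) / (1.0 - 1.0 / s_lower_bound);
}

/* Prints frame times, speedups and the Lua-share bound for Alex's data. */
void report_clod_numbers(void)
{
    double s_small = fps_speedup(122.0, 256.0);   /* 512x512 terrain   */
    double s_large = fps_speedup(31.0, 35.0);     /* 3200x3200 terrain */

    printf("512x512:   %.2f ms -> %.2f ms (x%.2f)\n",
           frame_ms(122.0), frame_ms(256.0), s_small);
    printf("3200x3200: %.2f ms -> %.2f ms (x%.2f)\n",
           frame_ms(31.0), frame_ms(35.0), s_large);
    printf("Lua share of a 3200x3200 frame: at most %.0f%%\n",
           100.0 * lua_share_upper_bound(s_large, s_small));
}
```

Called from a main(), report_clod_numbers() prints 8.20 ms -> 3.91 ms (x2.10) and 32.26 ms -> 28.57 ms (x1.13), and an upper bound of 22% for the Lua share of a 3200x3200 frame, which fits the remark that the C-side CLOD algorithm dominates there.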
|
|
Re: ANN: LuaJIT 1.1.0

Mike Pall wrote:
> LuaJIT is a Just-In-Time (JIT) Compiler for Lua 5.1.
> LuaJIT is light-weight, efficient and extensible.

It's a great piece of work (and a really interesting architecture, small footprint, nice docs). In some narrow cases it even makes Lua rather competitive versus optimized GCC code.

LuaJIT's maths speed is excellent and probably where it comes closest to C's performance. Loops are good too. Table access is a lot slower than C array access, but to some extent that's the price to be paid for safe and dynamic arrays, though it's a shame since of course many interesting applications of Lua will naturally use table access with abandon. I haven't compared string handling or call speed.

I'm comparing LuaJIT to C because for me, LuaJIT is a lot more interesting for its ability to displace future C code than for its ability to run existing Lua code faster. I think that LuaJIT (this version in particular) starts to open Lua up to some domains in which Lua was not performant enough previously.

Regards,
--Adam
|
|
Re: ANN: LuaJIT 1.1.0

Mike,

What are your future plans for LuaJIT, and how fast do you think a just-in-time compiler for Lua could be?

Also, I'm curious: what are the real sources of slowness for a dynamically-typed language like Lua - is it mostly instruction decoding, or is it having to resolve things at run time (like figuring out what function to call for the expression 'a + b'), the lack of inlined functions (I mean pure Lua functions), function call overhead, or what?

What do you think the performance limits are for just-in-time compilation in Lua?

-Paul
|
|
Re: test Lula

Maria Lucia Agostini (LULA) wrote:
> Mike,
>
> What are your future plans for LuaJIT, and how fast do you think a
> just-in-time compiler for Lua could be? Also, I'm curious: what are
> the real sources of slowness for a dynamically-typed language like Lua
> - is it mostly instruction decoding, or is it having to resolve
> things at run time (like figuring out wha
>
|
|
Re: ANN: LuaJIT 1.1.0

Hi,

Paul Chiusano wrote:
> What are your future plans for LuaJIT,

This depends on the feedback I'll receive from LuaJIT users.

* "Make it produce faster code" is one rather obvious goal. But I have to know which area to target first.

  E.g. Adam made some comparisons between Lua code and equivalent C code. He sent me a few code snippets which show exactly what's slow and what needs to be tuned. This is very helpful and I encourage other users of LuaJIT to do the same.

  Please note that I cannot analyze complete applications -- small, to-the-point code snippets (without complex dependencies) are best.

* Another goal is better portability (to non-x86 CPUs). I think embedded CPUs would benefit most.

  I had a cheap Linux-based DSL/VoIP router here for a few days (switched my parents' home over to VoIP). This cute little thing (size of a sandwich loaf) runs Linux on a 200 MHz MIPS32 CPU with 8 or 16 MB RAM. It's adequate when used with compiled C code, but interpreted Lua runs really slowly. The tiny cache and the lack of out-of-order execution are a killer for interpreters. IMHO Lua is the only scripting alternative due to severe size constraints (2 or 4 MB flash is really tight).

  MIPS32 code would also run on the PS2 or PSP, which will still play a role in the game market for a while. ARM is an interesting target for other embedded devices and PDAs (XScale).

  This box and other embedded systems would benefit greatly from LuaJIT.

  I'm self-employed and would rather work on LuaJIT than other (less interesting) projects. So this is the plea: I'm actively looking for sponsors who want to see LuaJIT ported to their favourite CPU. If you are a big company or have the necessary funds to pay a developer for several months, please contact me by mail. I will keep all negotiations confidential. The result of the port has to be available as open source of course.

  [Another option is the GPL + commercial license route (like MySQL), but I'm not sure this would work out.]
> and how fast do you think a just-in-time compiler for Lua could be?

Only the sky is the limit. No, seriously, it's more a matter of how much work one is able to put into the compiler. GCC and other top performing compilers have seen many years of coordinated development effort. And there are lots of research papers on how to optimize C or Java code. But the good papers on optimizing dynamic languages are few and far between.

Right now LuaJIT is at the point where all the low-hanging fruit have been picked. Any further performance gains will only be incremental, but take comparatively more work.

The real limit is how much free (or paid) time I can spend working on LuaJIT. I just don't know at this point in time. And I have some other Lua projects on the back-burner, too.

> Also, I'm curious: what are
> the real sources of slowness for a dynamically-typed language like Lua
> - is it mostly instruction decoding,

This is only relevant for the interpreter.

> or is it having to resolve things at run time (like figuring
> out what function to call for the expression 'a + b'),

This is quite easy in Lua because most opcodes have only one dominant receiver class. Even the interpreter inlines the number case for arithmetic opcodes. The LuaJIT optimizer is pretty good at detecting monomorphism.

The new adaptive deoptimization support in LuaJIT 1.1.0 makes backing down in case of undetected polymorphism relatively cheap. Aggressive optimizations can be done without compromising Lua semantics. I think I've covered all of the commonly used monomorphic cases for opcodes now.

> the lack of inlined functions (I mean pure Lua functions),

This depends on the coding style. I'm not sure about the overall effect in most Lua apps. It's probably not so dominant for the Lua interpreter because other overhead shadows it. OTOH in typical OO-intensive Smalltalk or Self programs one really needs to do function inlining to reach acceptable speeds.
It's on my TODO list for LuaJIT, but I think other optimizations would pay off more and should be done first. Inlining many standard library functions (C functions) in LuaJIT 1.1.0 paid off a lot. But this is partly due to the reduced call overhead, partly due to specialization and partly because of direct access to internal structures.

> function call overhead,

This is pretty low for an interpreter (if compared to other interpreters). But it's relatively high when you compare LuaJIT to other compilers.

The main reason is that LuaJIT still uses the Lua frame and stack structures. This makes it easy to switch between interpreted and compiled code. And most of the debug support can be reused, too.

Reducing the function call overhead any further is hard without major conceptual changes. Inlining short Lua functions may be easier (and is potentially faster).

> What do you think the performance limits are for just-in-time
> compilation in Lua?

* Lua has only a single number type. This simplifies many things and even using a double doesn't make much of a difference for the interpreter. But now that many other things have been optimized, it shows in LuaJIT.

  Array indexing is slow (compared to C) because it needs too many type conversions (double <-> int) and bounds checks. Narrowing numbers to integers with help from the optimizer is one way to go.

  Dual number support (int + double) would have benefits for embedded CPUs (lacking FPU hardware). But it's tricky to get this fast for the interpreter and even more so for compiled code. I guess pure integer support is too limiting for most embedded projects (but would be really fast).

  [I need feedback on this topic from people who use Lua on embedded devices.]

* Lua has only a single generic container type (tables). Again this simplifies many things and has little impact on the interpreter. But it puts a limit on what can be optimized in a JIT compiler with only local knowledge.
  Struct accesses (obj.foo, obj.bar) always need a hash lookup (unlike in languages with static typing). The full metamethod semantics come at a price, too.

* Caching globals and method lookups is difficult. A seemingly trivial statement like y = math.sqrt(x) needs two hash table lookups and several type checks and contract verifications to come to the point where the FP square root instruction (fsqrt) can be safely inlined.

  This overhead cannot be avoided without compromising language semantics (maybe the semantics need to be augmented). Manually caching often used functions is common practice in Lua (local sqrt = math.sqrt). But this doesn't work out so well for obj:method() calls.

* Type checks and other contract verifications are cheap on modern x86 CPUs. They execute in the integer unit parallel to the FP intensive main code with out-of-order execution. But the overhead would be noticeable on embedded CPUs. Many redundant checks could be removed or hoisted out of loops. Arithmetic operations could be combined.

* Garbage collection and heap allocation put Lua at a speed disadvantage to languages with manual memory management. The impact is less in Lua than other dynamic languages because of typed-value storage and immutable shared strings. Adding a custom memory allocator to the Lua core could be beneficial. Complex solutions like escape analysis are not on my radar for LuaJIT (yet).

Bye,
     Mike
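Mike's remark that even the interpreter inlines the number case of arithmetic opcodes can be pictured with a minimal sketch. This is not Lua's actual lvm.c code: the value layout, op_add and arith_add_slow are hypothetical simplifications. The dominant number + number case costs one tag test per operand; every other combination drops into a slow path, which in real Lua would try string-to-number coercion and then the __add metamethod (only a simplified coercion is modelled here).

```c
#include <stdlib.h>

enum Tag { T_NUMBER, T_STRING, T_TABLE };

/* Hypothetical, simplified tagged value (real Lua uses a union). */
struct Value { enum Tag tag; double n; const char *s; };

long slow_path_calls = 0;   /* how often the fast path was missed */

/* Slow path: simplified string coercion; real Lua 5.1 would also accept
   surrounding whitespace and hex, then try the __add metamethod. */
int arith_add_slow(struct Value *ra, const struct Value *rb,
                   const struct Value *rc)
{
    double x[2];
    const struct Value *ops[2] = { rb, rc };
    slow_path_calls++;
    for (int i = 0; i < 2; i++) {
        if (ops[i]->tag == T_NUMBER) {
            x[i] = ops[i]->n;
        } else if (ops[i]->tag == T_STRING) {
            char *end;
            x[i] = strtod(ops[i]->s, &end);
            if (end == ops[i]->s || *end != '\0')
                return 0;              /* not a numeric string */
        } else {
            return 0;                  /* real Lua: look up __add here */
        }
    }
    ra->tag = T_NUMBER; ra->n = x[0] + x[1]; ra->s = NULL;
    return 1;
}

/* ADD opcode: inline fast path for the dominant number + number case. */
int op_add(struct Value *ra, const struct Value *rb, const struct Value *rc)
{
    if (rb->tag == T_NUMBER && rc->tag == T_NUMBER) {
        ra->tag = T_NUMBER; ra->n = rb->n + rc->n; ra->s = NULL;
        return 1;
    }
    return arith_add_slow(ra, rb, rc);
}
```

Roughly speaking, a JIT that specializes on the monomorphic case keeps the same two tag tests as a guard but can move the fallback out of the hot code; that is the kind of "backing down" that adaptive deoptimization is meant to make cheap.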
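The array-indexing cost under "a single number type" can be made concrete. Roughly, for t[i] with a double i, Lua 5.1 converts the key to an int, converts it back to check that it was integral, and only then does an unsigned bounds check against the array part, where C would emit a single indexed load. A minimal sketch of that path, modelling only the array part of a table (struct ArrayPart and array_get are hypothetical names):

```c
#include <limits.h>
#include <stddef.h>

/* Simplified table: only the array part, 1-based from Lua's point of view. */
struct ArrayPart { double *slots; unsigned size; };

/* t[key] for a numeric key. Returns NULL where real Lua would fall back
   to the hash part (or yield nil). */
const double *array_get(const struct ArrayPart *t, double key)
{
    /* Keeps this C sketch free of undefined behaviour for huge keys and
       NaN; Lua 5.1 relies on its lua_number2int macro instead. */
    if (!(key >= (double)INT_MIN && key <= (double)INT_MAX))
        return NULL;
    int k = (int)key;                        /* double -> int             */
    if ((double)k != key)                    /* int -> double: integral?  */
        return NULL;                         /* no: hash part in real Lua */
    if ((unsigned)k - 1u < t->size)          /* one unsigned bounds check */
        return &t->slots[(unsigned)k - 1u];
    return NULL;                             /* outside the array part    */
}
```

If the optimizer can prove that i is an integer loop counter within the array bounds, the conversion, the integrality test and the bounds check could in principle all disappear, which is the narrowing Mike mentions.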
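The dual-number idea (int + double) could look roughly like the following. It is a hypothetical sketch, not LuaJIT code: a number carries a tag, integer addition is used while both operands are ints and the sum fits in 32 bits, and everything else falls back to double. On an FPU-less CPU the integer path avoids soft-float calls, but the extra tag test and overflow check on every operation hint at why this is tricky to make fast.

```c
#include <stdint.h>

enum NumTag { N_INT, N_DOUBLE };

/* Hypothetical dual-number value: either a 32-bit int or a double. */
struct Number {
    enum NumTag tag;
    union { int32_t i; double d; } u;
};

struct Number num_int(int32_t i)
{
    struct Number n;
    n.tag = N_INT;
    n.u.i = i;
    return n;
}

struct Number num_double(double d)
{
    struct Number n;
    n.tag = N_DOUBLE;
    n.u.d = d;
    return n;
}

double num_to_double(struct Number n)
{
    return n.tag == N_INT ? (double)n.u.i : n.u.d;
}

/* Stay in integer arithmetic while both operands are ints and the
   result fits; otherwise fall back to floating point. */
struct Number num_add(struct Number a, struct Number b)
{
    if (a.tag == N_INT && b.tag == N_INT) {
        int64_t sum = (int64_t)a.u.i + b.u.i;   /* cannot overflow int64 */
        if (sum >= INT32_MIN && sum <= INT32_MAX)
            return num_int((int32_t)sum);
        /* 32-bit overflow: fall through to the double path */
    }
    return num_double(num_to_double(a) + num_to_double(b));
}
```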
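The cost of y = math.sqrt(x) inside a loop, and the benefit of caching the function in a local, can be counted with a toy model. Everything here is hypothetical: tables are linear arrays standing in for hash tables, math.abs stands in for math.sqrt (so the sketch needs no libm), and error handling is reduced to a sentinel. The uncached loop repeats two lookups and two type checks per element; the cached one, the hand-written equivalent of local abs = math.abs, does them once before the loop. The compiler cannot do that hoisting on its own unless it knows nothing in the loop reassigns math or math.abs.

```c
#include <string.h>

typedef double (*NumFn)(double);
enum Tag { T_NIL, T_NUMBER, T_TABLE, T_FUNCTION };

struct Table;
struct Value {
    enum Tag tag;
    union { double n; struct Table *t; NumFn f; } u;
};
struct Entry { const char *key; struct Value val; };
struct Table { struct Entry e[8]; int count; };

long lookups = 0;       /* simulated hash lookups performed */
long type_checks = 0;   /* simulated type checks performed  */

/* Linear search standing in for a hash lookup; counts every call. */
struct Value table_get(const struct Table *t, const char *key)
{
    struct Value nil = { T_NIL, { 0 } };
    lookups++;
    for (int i = 0; i < t->count; i++)
        if (strcmp(t->e[i].key, key) == 0)
            return t->e[i].val;
    return nil;
}

int check_tag(const struct Value *v, enum Tag tag)
{
    type_checks++;
    return v->tag == tag;
}

double num_abs(double x) { return x < 0 ? -x : x; }

/* y = math.abs(x) resolved on every iteration. Returns -1.0 where Lua
   would raise an error (a valid sum of absolute values is never < 0). */
double sum_abs_uncached(const struct Table *globals, const double *xs, int n)
{
    double sum = 0.0;
    for (int i = 0; i < n; i++) {
        struct Value math = table_get(globals, "math");
        if (!check_tag(&math, T_TABLE)) return -1.0;
        struct Value fn = table_get(math.u.t, "abs");
        if (!check_tag(&fn, T_FUNCTION)) return -1.0;
        sum += fn.u.f(xs[i]);
    }
    return sum;
}

/* local abs = math.abs: lookups and checks hoisted out of the loop. */
double sum_abs_cached(const struct Table *globals, const double *xs, int n)
{
    struct Value math = table_get(globals, "math");
    if (!check_tag(&math, T_TABLE)) return -1.0;
    struct Value fn = table_get(math.u.t, "abs");
    if (!check_tag(&fn, T_FUNCTION)) return -1.0;
    NumFn abs_fn = fn.u.f;
    double sum = 0.0;
    for (int i = 0; i < n; i++)
        sum += abs_fn(xs[i]);
    return sum;
}
```

For n elements the uncached version performs 2n lookups and 2n checks, the cached one 2 of each. The same trick does not carry over to obj:method() calls, because the method may differ from object to object.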
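On the custom allocator point: Lua 5.1 already lets the host plug in an allocator through the lua_Alloc function passed to lua_newstate, with the shape void *f(void *ud, void *ptr, size_t osize, size_t nsize). Because Lua passes the old size on every resize and free, a free list for small blocks needs no per-block header. A minimal sketch under those assumptions; the single 32-byte size class, struct Pool and pool_release are hypothetical, and the pool is not thread-safe.

```c
#include <stdlib.h>
#include <string.h>

#define SMALL_BLOCK 32   /* hypothetical size class; >= sizeof(void *) */

struct FreeBlock { struct FreeBlock *next; };
struct Pool { struct FreeBlock *free_list; };

int is_small(size_t n) { return n > 0 && n <= SMALL_BLOCK; }

/* Same shape as Lua 5.1's lua_Alloc; ud points to a struct Pool. */
void *pool_alloc(void *ud, void *ptr, size_t osize, size_t nsize)
{
    struct Pool *pool = ud;

    if (nsize == 0) {                      /* free request */
        if (ptr != NULL) {
            if (is_small(osize)) {         /* Lua tells us the old size */
                struct FreeBlock *blk = ptr;
                blk->next = pool->free_list;
                pool->free_list = blk;
            } else {
                free(ptr);
            }
        }
        return NULL;
    }
    if (ptr == NULL) {                     /* fresh allocation */
        if (!is_small(nsize))
            return malloc(nsize);
        if (pool->free_list != NULL) {
            struct FreeBlock *blk = pool->free_list;
            pool->free_list = blk->next;
            return blk;
        }
        return malloc(SMALL_BLOCK);
    }
    if (is_small(osize) && is_small(nsize))
        return ptr;                        /* block already SMALL_BLOCK bytes */
    if (!is_small(osize) && !is_small(nsize))
        return realloc(ptr, nsize);
    /* Moving between the pool and malloc: allocate, copy, release old. */
    void *fresh = pool_alloc(ud, NULL, 0, nsize);
    if (fresh == NULL)
        return NULL;                       /* old block stays valid */
    memcpy(fresh, ptr, osize < nsize ? osize : nsize);
    pool_alloc(ud, ptr, osize, 0);
    return fresh;
}

/* Returns every pooled block to the system (call after lua_close). */
void pool_release(struct Pool *pool)
{
    while (pool->free_list != NULL) {
        struct FreeBlock *next = pool->free_list->next;
        free(pool->free_list);
        pool->free_list = next;
    }
}
```

With the Lua headers available, it would be installed as lua_newstate(pool_alloc, &pool), with pool outliving the state. Whether this actually beats the system malloc depends on the platform and would need measuring.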
|
|
Re: ANN: LuaJIT 1.1.0

Very nice work. I downloaded it and integrated it into my modified Lua shell in about 5 minutes. The speed improvement was startling and obvious. So far no bugs observed.

I've been working on my own compiler back end. I'm primarily interested in optimization techniques and performance. I'm somewhat neutral on the front end language as long as it's interactive and has garbage collection. I'll have to dig into your code some to see what you've been doing.
|
|
Speed

Hi!

I have Lua 5.1 embedded in my application, and I have changed double to float in luaconf.h.

My question: is there anything I can change or patch to speed Lua up?

Thanks for any suggestions. (Sorry for my simple English.)

----- Original Message -----
From: "SevenThunders" <mattcbro@...>
To: <lua@...>
Sent: Wednesday, March 15, 2006 2:37 AM
Subject: Re: ANN: LuaJIT 1.1.0

> Very nice work. I downloaded it and integrated it into my modified Lua
> shell in about 5 minutes. The speed improvement was startling and obvious.
> So far no bugs observed.
>
> I've been working on my own compiler back end. I'm primarily interested in
> optimization techniques and performance. I'm somewhat neutral on the front
> end language as long as it's interactive and has garbage collection. I'll
> have to dig into your code some to see what you've been doing.
> --
> View this message in context:
> http://www.nabble.com/ANN%3A-LuaJIT-1.1.0-t1273815.html#a3408769
> Sent from the Lua - General forum at Nabble.com.
|
|
Re: ANN: LuaJIT 1.1.0

Mike Pall wrote:
> [snip]
> The main reason is that LuaJIT still uses the Lua frame and stack
> structures. This makes it easy to switch between interpreted and
> compiled code. And most of the debug support can be reused, too.
>
> Reducing the function call overhead any further is hard without
> major conceptual changes. Inlining short Lua functions may be
> easier (and is potentially faster).
>
>> What do you think the performance limits are for just-in-time
>> compilation in Lua?
> [snip]

I think up to a certain point, there is only so much one can do to speed things up without sacrificing something. If a JIT is to function exactly like interpreted Lua, one cannot produce code approaching the speed of C or bare-metal code. It's a tradeoff -- for top speed, we'd have to start cutting some functionality off.

What would achieve top speed? We'd need a lite-Lua profile that is largely procedural and where most data types can be made static. Things like metamethod checking will need to be dropped unless explicitly needed. There cannot be strict semantics; integers must be used where appropriate so that one can forget about conversion.

Where applications are concerned, however, a lite-Lua profile would not be appropriate where one wants to JIT the entire Lua source code, but it would greatly accelerate a subset of functions. So it assumes the application is mostly fast enough on interpreted Lua, but there are a few processing-intensive functions that badly need accelerating.

--
Cheers,
Kein-Hong Man (esq.)
Kuala Lumpur, Malaysia
|
|
Re: ANN: LuaJIT 1.1.0

In message <20060315005045.GA9206@...> you wrote:
> > What are your future plans for LuaJIT,
>
> This depends on the feedback I'll receive from LuaJIT users.
> * Another goal is better portability (to non x86 CPUs).
> ................
> ARM is an interesting target for other
> embedded devices and PDAs (XScale).

And for RISC OS - the OS which was designed around the ARM by the group who originally created the ARM architecture. There are still a few thousand users. I would love to see an integer-only ARM LuaJIT.

The type-theoretic questions that LuaJIT raises are interesting. What languages are there, in the neighbourhood of Lua (in that "space" of languages that has not yet been given a formal definition), that have more static typing - enough to make faster compiled code possible - but yet retain the character and appeal of Lua?

--
Gavin Wraith (gavin@...)
Home page: http://www.wra1th.plus.com/
|
|
Re: ANN: LuaJIT 1.1.0

> This depends on the feedback I'll receive from LuaJIT users.

I tried to use LuaJIT in our game engine today. Unfortunately, it seems there is no support for float instead of double:

    #ifndef LUA_NUMBER_DOUBLE
    #error "No support for other number types on x86 (yet)"
    #endif

At this point, this is keeping me from using LuaJIT since I prefer to use float-s over double-s. It would be great to see float support in LuaJIT!

Thanks,
Hugo
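One thing to keep in mind before making lua_Number a float: a float has a 24-bit significand, so integers are exact only up to 2^24 = 16777216. Beyond that, loop counters, table indices or IDs stored as Lua numbers silently round. A minimal check (the function names are hypothetical):

```c
#include <stdint.h>

/* Nonzero if n survives a round trip through float. The comparison is
   done in double, which holds every int32_t exactly. */
int exact_in_float(int32_t n)
{
    return (double)(float)n == (double)n;
}

/* Largest N such that every integer 0..N is exact in float. */
int32_t float_exact_limit(void)
{
    int32_t n = 0;
    while (exact_in_float(n + 1))
        n++;
    return n;
}
```

float_exact_limit() returns 16777216 (2^24); the corresponding limit for double is 2^53, far beyond any 32-bit integer.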
|
|
Re: ANN: LuaJIT 1.1.0

Hallo,

On 3/15/06, Framework Studios: Hugo <hugo@...> wrote:
>
> At this point, this is keeping me from using LuaJIT since I prefer to use
> float-s over double-s. It would be great to see float support in LuaJIT!
>

Why is that? Have you seen this: http://lua-users.org/wiki/FloatingPoint ? Or are you using SSE instructions?

--
-alex
http://www.ventonegro.org/
|
|
Re: ANN: LuaJIT 1.1.0

Yes, I am aware of the difference between float and double.

It's because we're using DirectX... it sets the FPU to single precision mode, so either we have to use float-s or Lua starts acting really weird after init of DirectX.

Note the DirectX documentation claims DirectX to be slower when preserving the FPU flag; about "D3DCREATE_FPU_PRESERVE" it says:

"Indicates that the application needs either double-precision floating-point unit (FPU) or FPU exceptions enabled. Microsoft® Direct3D® sets the FPU state each time it is called. By default, the pipeline uses single precision. Be sure to use this flag to get double precision. Setting the flag will reduce Direct3D performance."

The reason to use LuaJIT of course would be to gain speed, not lose it because of forcing double precision onto DirectX.

Bye,
Hugo

----- Original Message -----
From: "Alex Queiroz" <asandroq@...>
To: "Lua list" <lua@...>
Sent: Wednesday, March 15, 2006 3:40 PM
Subject: Re: ANN: LuaJIT 1.1.0

Hallo,

On 3/15/06, Framework Studios: Hugo <hugo@...> wrote:
>
> At this point, this is keeping me from using LuaJIT since I prefer to use
> float-s over double-s. It would be great to see float support in LuaJIT!
>

Why is that? Have you seen this: http://lua-users.org/wiki/FloatingPoint ? Or are you using SSE instructions?

--
-alex
http://www.ventonegro.org/
|
|
Re: ANN: LuaJIT 1.1.0

> Note the DirectX documentation claims DirectX to be slower when
> preserving the FPU flag;

I'm just curious, does anyone have actual numbers to back this up?

andras
|
|
Re: ANN: LuaJIT 1.1.0

Hi,

Framework Studios: Hugo wrote:
> Note the DirectX documentation claims DirectX to be slower when preserving
> the FPU flag; about "D3DCREATE_FPU_PRESERVE" it says:

AFAIK this only applies to really old CPUs. Does your game even run on a more than 5-year-old PC without a 3D-capable GPU?

I suggest you try to turn on the flag and check whether it makes any difference.

Bye,
     Mike
|
|
Re: ANN: LuaJIT 1.1.0

On Wed, Mar 15, 2006 at 04:02:16PM +0100, Framework Studios: Hugo wrote:
> Yes, I am aware of the difference between float and double.
>
> It's because we're using DirectX... it sets the FPU to single precision
> mode so either we have to use float-s or Lua starts acting really weird
> after init of DirectX.
>
> Note the DirectX documentation claims DirectX to be slower when preserving
> the FPU flag; about "D3DCREATE_FPU_PRESERVE" it says:

You don't need to switch Lua to floats to avoid this problem. If you really want to keep DirectX as it is, you can recompile Lua to use the default definition for lua_number2integer:

    luaconf.h:544
    - #if defined(LUA_NUMBER_DOUBLE) && !defined(LUA_ANSI) && !defined(__SSE2__) && \
    -     (defined(__i386) || defined (_M_IX86) || defined(__i386__))
    + #if 0

--
Roberto
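Why the patch helps, assuming the fast path it disables is the "magic number" conversion that the stock Lua 5.1 luaconf.h uses on x86 builds not compiled with MSVC (its comments note a clash with DirectX): the trick relies on the FPU rounding an intermediate sum to a full 53-bit double significand. In single-precision mode the sum is rounded to 24 bits instead, and the integer that should sit in the low bits is lost. The sketch emulates that mode by rounding through float, since actually switching the x87 control word is platform-specific; MAGIC and all function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* 2^52 + 2^51. For |d| < 2^31, d + MAGIC lands in [2^52, 2^53), where
   doubles are spaced exactly 1 apart, so the low 32 bits of the result's
   significand hold d rounded to an integer (two's complement). */
#define MAGIC 6755399441055744.0

/* Low 32 bits of a double's bit pattern, as a signed int (the
   unsigned -> signed step is implementation-defined but two's
   complement on mainstream compilers). */
int32_t low_bits(double x)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);
    return (int32_t)(uint32_t)bits;
}

/* The trick with full double precision. */
int32_t trick_number2int(double d)
{
    volatile double sum = d + MAGIC;   /* volatile: force a rounded double */
    return low_bits(sum);
}

/* The same trick with the sum rounded to a 24-bit significand,
   emulating an x87 FPU switched to single-precision mode. */
int32_t trick_number2int_single(double d)
{
    volatile float sum = (float)(d + MAGIC);
    return low_bits(sum);              /* float widens to double exactly */
}

/* What "#if 0" selects instead: a plain C conversion, which truncates
   and does not rely on an intermediate sum keeping 53 bits. */
int32_t cast_number2int(double d)
{
    return (int32_t)d;
}
```

With full precision trick_number2int(5.0) yields 5, but the single-precision variant yields 0: a silently wrong integer of the kind that could make Lua start "acting really weird", for example when indexing tables. The plain cast returns 5 in either mode.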
|
|
Re: ANN: LuaJIT 1.1.0

Although I don't know much about DirectX, it is the SSE2 instructions that see a large boost using 32-bit floats, since they can do twice as many multiplies per clock cycle. Moreover, if the modern DirectX drivers are using the CPU, I would be surprised if they are not using the more modern SSE and SSE2 instructions over the old x86 FPU. Thus one would never have to employ the nasty switch to single precision on the FPU (which probably sucks up a lot of clock cycles in its own right).

Perhaps the question is: what version of DirectX are you using? Actually a Google search produces this link:
http://blogs.msdn.com/tmiller/archive/2004/06/01/145596.aspx

Tell your DirectX to leave the deprecated FPU alone!
|
|
Re: ANN: LuaJIT 1.1.0 / Re: Lua x DirectX

Hi,

I'm using the DirectX9c SDK from Feb. 2006 (the latest).

IMHO maybe we can assume the people behind DirectX have a good reason to do something as potentially 'dangerous' to other libraries as setting the FPU to single precision. A Google search can also produce pages about the loss of speed when using the 'preserve FPU' flag, like
http://discuss.microsoft.com/SCRIPTS/WA-MSD.EXE?A2=ind0504b&L=directxdev&D=1&P=4524
(quote: "For games, you typically don't want the performance hit of having the FP unit working in double-precision.").

With SSE of course it can be avoided, but what about DX drivers on a CPU without SSE? Btw, can 32-bit float-s really benefit from SSE2 over 64-bit double-s? Or maybe double-s are faster than float-s on 64-bit CPUs?

So for now I'm sticking with having DirectX set the (perhaps not so very deprecated :) FPU to single precision and preferring float over double in our game engine until DirectX itself converts to using double-s.

Anyway, today I got LuaJIT to work, big thanks to Roberto for this tip!

    luaconf.h:544
    - #if defined(LUA_NUMBER_DOUBLE) && !defined(LUA_ANSI) && !defined(__SSE2__) && \
    -     (defined(__i386) || defined (_M_IX86) || defined(__i386__))
    + #if 0

cheers,
Hugo

----- Original Message -----
From: "SevenThunders" <mattcbro@...>
To: <lua@...>
Sent: Thursday, March 16, 2006 6:27 AM
Subject: Re: ANN: LuaJIT 1.1.0

>
> Although I don't know much about DirectX, it is the SSE2 instructions that
> see a large boost using 32-bit floats, since they can do twice as many
> multiplies per clock cycle. Moreover, if the modern DirectX drivers are
> using the CPU, I would be surprised if they are not using the more modern
> SSE and SSE2 instructions over the old x86 FPU. Thus one would never have
> to employ the nasty switch to single precision on the FPU (which probably
> sucks up a lot of clock cycles in its own right).
>
> Perhaps the question is: what version of DirectX are you using? Actually a
> Google search produces this link:
> http://blogs.msdn.com/tmiller/archive/2004/06/01/145596.aspx
>
> Tell your DirectX to leave the deprecated FPU alone!
> --
> View this message in context:
> http://www.nabble.com/ANN%3A-LuaJIT-1.1.0-t1273815.html#a3430294
> Sent from the Lua - General forum at Nabble.com.
>
|
|
Re: ANN: LuaJIT 1.1.0 / Re: Lua x DirectX

I still don't buy this. Give me numbers! Should be easy, right? Set the flag and read the FPS counter! I'd test it myself, but I'm using OpenGL, and thus never had to change the FPU control word.

Andras

Thursday, March 16, 2006, 1:35:08 AM, you wrote:

> A Google search can also produce pages about the loss of speed when using
> the 'preserve FPU' flag, like
> http://discuss.microsoft.com/SCRIPTS/WA-MSD.EXE?A2=ind0504b&L=directxdev&D=1&P=4524
> (quote: "For games, you typically don't want the performance hit of having
> the FP unit working in double-precision.").
|
|
Re: ANN: LuaJIT 1.1.0 / Re: Lua x DirectX

Well, one number I found on the decrease of Direct3D's speed with and without the FPU preserve flag:
http://discuss.microsoft.com/SCRIPTS/WA-MSD.EXE?A2=ind0504b&L=directxdev&D=1&P=3121

    with:    560 fps
    without: 580 fps

However, I think it is a bit beside the point to 'prove' this with numbers, since DirectX more or less already chose single precision for us (for a good reason, I trust). Also, it seems logical for a 3D API to be faster when using float-s instead of double-s, because twice the data can be pushed to the GPU with the same bandwidth / stored in VRAM. Isn't this the same for OpenGL?

Looking at the performance of double vs float on modern CPU-s should be interesting though. Are double-s faster, slower or the same compared to float-s on 32-bit and 64-bit CPU architectures? What about the CPU-s people are actually using on average at the moment? (To sell games we need to look at what is average on the market, not only at what is top-notch :)

Cheers,
Hugo

----- Original Message -----
From: "Andras Balogh" <andras.balogh@...>
To: "Lua list" <lua@...>
Sent: Thursday, March 16, 2006 2:51 PM
Subject: Re: ANN: LuaJIT 1.1.0 / Re: Lua x DirectX

> I still don't buy this. Give me numbers! Should be easy, right? Set
> the flag and read the FPS counter! I'd test it myself, but I'm using
> OpenGL, and thus never had to change the FPU control word.
>
>
> Andras
>
> Thursday, March 16, 2006, 1:35:08 AM, you wrote:
>
>> A Google search can also produce pages about the loss of speed when using
>> the 'preserve FPU' flag, like
>> http://discuss.microsoft.com/SCRIPTS/WA-MSD.EXE?A2=ind0504b&L=directxdev&D=1&P=4524
>> (quote: "For games, you typically don't want the performance hit of
>> having the FP unit working in double-precision.").
>
>