|
View:
New views
17 Messages
—
Rating Filter:
Alert me
|
|
|
Lockfree ringbuffer review Hi!
I've just completed writing a lock free ringbuffer to get a JACK driver implemented. The only special thing is that it is frame based, because I saw that the JACK driver code itself really got messy if you always have to ensure that you don't by accident read or write half frames from/to the ringbuffer. So here you can create a ringbuffer with 1024 frames of float audio data, where each frame contains two floats (for the channels), by constructing a FrameRingBuffer<float> (1024, 2), and the ringbuffer itself will ensure that you only deal with complete frames afterwards. I would really like to get some review for the code. It works within the JACK driver, but that of course doesn't say that its bug-free. template<class T> class FrameRingBuffer { //BIRNET_PRIVATE_COPY (FrameRingBuffer); private: vector<T> buffer; typedef typename vector<T>::iterator BufferIterator; int read_frame_pos; int write_frame_pos; guint elements_per_frame; public: FrameRingBuffer (guint n_frames = 0, guint elements_per_frame = 1) { resize (n_frames, elements_per_frame); } /** * checks available read space in the ringbuffer * * @returns the number of frames that are available for reading */ guint read_space() { int wpos = Atomic::int_get (&write_frame_pos); int rpos = Atomic::int_get (&read_frame_pos); int size = buffer.size() / elements_per_frame; if (wpos < rpos) /* wpos == rpos -> empty ringbuffer */ wpos += size; return wpos - rpos; } /** * reads data from the ringbuffer * * @returns the number of successfully read frames */ guint read (guint n_frames, T* frames) { int rpos = Atomic::int_get (&read_frame_pos); guint size = buffer.size() / elements_per_frame; guint can_read = min (read_space(), n_frames); BufferIterator start = buffer.begin() + rpos * elements_per_frame; guint read1 = min (can_read, size - rpos) * elements_per_frame; copy (start, start + read1, frames); guint read2 = can_read * elements_per_frame - read1; copy (buffer.begin(), buffer.begin() + read2, frames + read1); Atomic::int_set (&read_frame_pos, (rpos + can_read) % size); return can_read; } /** * checks available write space in the ringbuffer * * @returns the number of frames that can be written */ guint write_space() { int wpos = Atomic::int_get (&write_frame_pos); int rpos = Atomic::int_get (&read_frame_pos); int size = buffer.size() / elements_per_frame; if (rpos <= wpos) /* wpos == rpos -> empty ringbuffer */ rpos += size; /* the extra element allows us to see the difference between * - empty ringbuffer * - full ringbuffer */ return rpos - wpos - 1; } /** * writes data to the ringbuffer * * @returns the number of successfully written frames */ guint write (guint n_frames, const T* frames) { int wpos = Atomic::int_get (&write_frame_pos); guint size = buffer.size() / elements_per_frame; guint can_write = min (write_space(), n_frames); BufferIterator start = buffer.begin() + wpos * elements_per_frame; guint write1 = min (can_write, size - wpos) * elements_per_frame; copy (frames, frames + write1, start); guint write2 = can_write * elements_per_frame - write1; copy (frames + write1, frames + write1 + write2, buffer.begin()); Atomic::int_set (&write_frame_pos, (wpos + can_write) % size); return can_write; } /** * returns the maximum number of frames that the ringbuffer can contain */ guint size() const { return (buffer.size() - 1) / elements_per_frame; } /** * clears the ringbuffer * this function is not! threadsafe */ void clear() { Atomic::int_set (&read_frame_pos, 0); Atomic::int_set (&write_frame_pos, 0); } /** * resizes and clears the ringbuffer * this function is not! threadsafe */ void resize (guint n_frames, guint elements_per_frame = 1) { this->elements_per_frame = elements_per_frame; buffer.resize ((n_frames + 1) * elements_per_frame); clear(); } }; Cu... Stefan -- Stefan Westerfeld, Hamburg/Germany, http://space.twc.de/~stefan _______________________________________________ beast mailing list beast@... http://mail.gnome.org/mailman/listinfo/beast |
|
|
Re: Lockfree ringbuffer reviewOn Tue, 30 May 2006, Stefan Westerfeld wrote:
> Hi! well, for the most part, i can't make too much sense of the code, because i don't know how it's expected to be used which is also not documented anywhere. a blurb about what thread model and communication patterns you're using seems to be neccessary, and it probably should go into the code directly. also, when doing that, you should add comments next to functions explaining which thread/entity/etc. is supposed to call it. so, i'm going to make mostly stylistic comments here. in particular i can't say anything about the atomic ops. i just noted that you're only using int_set()/int_get() instead of the usual int_get()/int_cas() (note that int_set() is just provided for initialization purposes). and that doesn't make too much sense to me, at least not without the cmmunication model blurb i sked for above. > template<class T> > class FrameRingBuffer { > //BIRNET_PRIVATE_COPY (FrameRingBuffer); > private: > vector<T> buffer; > typedef typename vector<T>::iterator BufferIterator; > int read_frame_pos; > int write_frame_pos; these pointers should be volatile if you mean to access them atomically. > guint elements_per_frame; members should be prefixed with "m_". > public: > FrameRingBuffer (guint n_frames = 0, > guint elements_per_frame = 1) > { > resize (n_frames, elements_per_frame); to avoid shadowing of members via local variables or function arguments like here. > } > /** > * checks available read space in the ringbuffer > * > * @returns the number of frames that are available for reading i think read_space() is a very irritating funciton name for this, just like write_space() below. the "verb object" form indicates that this function *does* something, i.e. read/write. simply renaming to get_read_space() would fix that. but then, "space" si allmost a dummy word in that it could mean nearly anything (e.g. bytes, bits, pages, whatever). afaiu, you mean to calculate the available number of frames here, so get_readable_frames() and get_writable_frames() could be possible function names. > */ > guint > read_space() > { > int wpos = Atomic::int_get (&write_frame_pos); > int rpos = Atomic::int_get (&read_frame_pos); > int size = buffer.size() / elements_per_frame; > > if (wpos < rpos) /* wpos == rpos -> empty ringbuffer */ > wpos += size; if just wpos == rpos means the ringbuffer is empty, how do you distinguish that from a full ring buffer then? > > return wpos - rpos; > } > /** > * reads data from the ringbuffer > * > * @returns the number of successfully read frames > */ > guint > read (guint n_frames, T* frames) > { > int rpos = Atomic::int_get (&read_frame_pos); > guint size = buffer.size() / elements_per_frame; > guint can_read = min (read_space(), n_frames); > > BufferIterator start = buffer.begin() + rpos * elements_per_frame; > guint read1 = min (can_read, size - rpos) * elements_per_frame; > copy (start, start + read1, frames); you should use bseblockuitls.h functions to copy floats. > > guint read2 = can_read * elements_per_frame - read1; > copy (buffer.begin(), buffer.begin() + read2, frames + read1); > > Atomic::int_set (&read_frame_pos, (rpos + can_read) % size); > return can_read; > } > /** > * checks available write space in the ringbuffer > * > * @returns the number of frames that can be written > */ > guint > write_space() > { > int wpos = Atomic::int_get (&write_frame_pos); > int rpos = Atomic::int_get (&read_frame_pos); > int size = buffer.size() / elements_per_frame; > > if (rpos <= wpos) /* wpos == rpos -> empty ringbuffer */ > rpos += size; > > /* the extra element allows us to see the difference between > * - empty ringbuffer > * - full ringbuffer > */ > return rpos - wpos - 1; > } > /** > * writes data to the ringbuffer > * > * @returns the number of successfully written frames > */ > guint > write (guint n_frames, const T* frames) > { > int wpos = Atomic::int_get (&write_frame_pos); > guint size = buffer.size() / elements_per_frame; > guint can_write = min (write_space(), n_frames); > > BufferIterator start = buffer.begin() + wpos * elements_per_frame; > guint write1 = min (can_write, size - wpos) * elements_per_frame; > copy (frames, frames + write1, start); > > guint write2 = can_write * elements_per_frame - write1; > copy (frames + write1, frames + write1 + write2, buffer.begin()); > > Atomic::int_set (&write_frame_pos, (wpos + can_write) % size); > return can_write; > } > /** > * returns the maximum number of frames that the ringbuffer can contain > */ > guint > size() const > { > return (buffer.size() - 1) / elements_per_frame; > } > /** > * clears the ringbuffer > * this function is not! threadsafe in general, if you mean to put docuemtnation comments next to your functions, please use regular sentences and punctuation. emphasis can be added with @emph{}, not an exclamation mark in the middle of a sentence. also, functions that take arguments and return values need extra docs, like: * @param file_pattern wildcard pattern for file names * @return a singly linked list with newly allocated strings > */ > void > clear() > { > Atomic::int_set (&read_frame_pos, 0); > Atomic::int_set (&write_frame_pos, 0); > } > /** > * resizes and clears the ringbuffer > * this function is not! threadsafe > */ > void > resize (guint n_frames, > guint elements_per_frame = 1) > { > this->elements_per_frame = elements_per_frame; > buffer.resize ((n_frames + 1) * elements_per_frame); > clear(); > } > }; > > Cu... Stefan > -- > Stefan Westerfeld, Hamburg/Germany, http://space.twc.de/~stefan > --- ciaoTJ _______________________________________________ beast mailing list beast@... http://mail.gnome.org/mailman/listinfo/beast |
|
|
Re: Lockfree ringbuffer review Hi!
On Tue, May 30, 2006 at 05:13:10PM +0200, Tim Janik wrote: > On Tue, 30 May 2006, Stefan Westerfeld wrote: > well, for the most part, i can't make too much sense of the code, > because i don't know how it's expected to be used which is also not > documented anywhere. > a blurb about what thread model and communication patterns you're > using seems to be neccessary, and it probably should go into the > code directly. > also, when doing that, you should add comments next to functions > explaining which thread/entity/etc. is supposed to call it. Ok, I added quite a bit of text now. Let me know if it needs to be extended / fixed. > so, i'm going to make mostly stylistic comments here. in particular > i can't say anything about the atomic ops. i just noted that you're > only using int_set()/int_get() instead of the usual int_get()/int_cas() > (note that int_set() is just provided for initialization purposes). > and that doesn't make too much sense to me, at least not without the > cmmunication model blurb i sked for above. I designed the code with only two things that I need from the atomic ints in mind: they should be atomic (i.e. no half integers), and setting them should actually force the memory really to be written (i.e. they should act as memory barriers). In the old implementation, I've forgotten one situation where a memory barrier is needed. You'll find it in the new code, explicitely marked as write_memory_barrier. > > int read_frame_pos; > > int write_frame_pos; > > these pointers should be volatile if you mean to access them atomically. Fixed. > > guint elements_per_frame; > > members should be prefixed with "m_". Fixed. > > } > > /** > > * checks available read space in the ringbuffer > > * > > * @returns the number of frames that are available for reading > > i think read_space() is a very irritating funciton name for this, > just like write_space() below. the "verb object" form indicates that > this function *does* something, i.e. read/write. > simply renaming to get_read_space() would fix that. but then, "space" > si allmost a dummy word in that it could mean nearly anything (e.g. > bytes, bits, pages, whatever). afaiu, you mean to calculate the > available number of frames here, so get_readable_frames() and > get_writable_frames() could be possible function names. Fixed. Although I found space in other ringbuffers. But your names are not too bad, either. > > guint > > read_space() > > { > > int wpos = Atomic::int_get (&write_frame_pos); > > int rpos = Atomic::int_get (&read_frame_pos); > > int size = buffer.size() / elements_per_frame; > > > > if (wpos < rpos) /* wpos == rpos -> empty ringbuffer */ > > wpos += size; > > if just wpos == rpos means the ringbuffer is empty, how do you > distinguish that from a full ring buffer then? I use one extra frame, and the ringbuffer will never be completely filled. So a 1024 element ringbuffer actually uses 1025 elements, while at most 1024 of them can be filled in any given situation. I spread the comment about it in the code now, so that everywhere where a (x + 1) or (x - 1) computation can be found in the code due to the extra element, you'll also find a comment about it. > > BufferIterator start = buffer.begin() + rpos * elements_per_frame; > > guint read1 = min (can_read, size - rpos) * elements_per_frame; > > copy (start, start + read1, frames); > > you should use bseblockuitls.h functions to copy floats. Well, I thought of it, but copy can copy anything, be it bytes or std::strings. The ringbuffer is a template (so you can instantiate it with floats or bytes or anything else), so just plugging bseblockutils.h won't work. I am not sure in how far the current ringbuffer code is useful for moving larger objects between threads (like smart pointers to big chunks of memory that have been filled by a reader thread), especially because when they are copied out from the ringbuffer, they are not explicitely destroyed. But the copies and construction/destruction is at least done properly (so you will never "forget" to execute a copy constructor like you would with just using memcpy) eventually. As a comment on efficiency, for floats copy calls memmove which in turn calls memcpy, and bseblockutils.h uses memcpy. So there is no actual efficiency loss right now. We could use template specialization to enforce that if you're instantiating a float ringbuffer, then bseblockutils will be used. But it wouldn't make things faster. The benchmarks on my AMD64 suggest that replacing memcpy in bseblockutils.h by SSE instructions will not have a positive effect on performance. I don't know about the other Intel/AMD systems though. Finally, if we were to optimize the JACK driver for maximum performance, a more important factor would probably be that the ringbuffer even has this API. It means that we need to deinterleave and reinterleave the data JACK supplies on the stack, and then read/write it to the ringbuffer. If the ringbuffer would offer an API to access its memory directly, the extra copy would disappear. But it makes the code harder to maintain and for now my first priority has been getting things implemented correctly. > > /** > > * clears the ringbuffer > > * this function is not! threadsafe > > in general, if you mean to put docuemtnation comments next to your > functions, > please use regular sentences and punctuation. emphasis can be added with > @emph{}, not an exclamation mark in the middle of a sentence. > also, functions that take arguments and return values need extra docs, like: > * @param file_pattern wildcard pattern for file names > * @return a singly linked list with newly allocated strings I fixed some of it, although using special tags is hard to get right if you don't have a web browser open which points to the same documentation you're writing in that moment. Anyway, finally, after all this discussion, here is the new code. namespace { /** * The FrameRingBuffer class implements a ringbuffer for the communication * between two threads. One thread - the producer thread - may only write * data to the ringbuffer. The other thread - the consumer thread - may * only read data from the ringbuffer. * * Given that these two threads only use the appropriate functions, no * other synchronization is required to ensure that the data gets safely * from the producer thread to the consumer thread. However, all operations * that are provided by the ringbuffer are non-blocking, so that you may * need a condition or other synchronization primitive if you want the * producer and/or consumer to block if the ringbuffer is full/empty. * * Implementation: the synchronization between the two threads is only * implemented by two index variables (read_frame_pos and write_frame_pos) * for which atomic integer reads and writes are required. Since the * producer thread only modifies the write_frame_pos and the consumer thread * only modifies the read_frame_pos, no compare-and-swap or similar * operations are needed to avoid concurrent writes. */ template<class T> class FrameRingBuffer { //BIRNET_PRIVATE_COPY (FrameRingBuffer); private: typedef typename vector<T>::iterator BufferIterator; vector<T> m_buffer; volatile int m_read_frame_pos; volatile int m_write_frame_pos; guint m_elements_per_frame; void write_memory_barrier() { static volatile int dummy = 0; /* * writing this dummy integer should ensure that all prior writes * are committed to memory */ Atomic::int_set (&dummy, 0x12345678); } public: FrameRingBuffer (guint n_frames = 0, guint elements_per_frame = 1) { resize (n_frames, elements_per_frame); } /** * Checks available read space in the ringbuffer. * This function should be called from the consumer thread. * * @returns the number of frames that are available for reading */ guint get_readable_frames() { int wpos = Atomic::int_get (&m_write_frame_pos); int rpos = Atomic::int_get (&m_read_frame_pos); int size = m_buffer.size() / m_elements_per_frame; if (wpos < rpos) /* wpos == rpos -> empty ringbuffer */ wpos += size; return wpos - rpos; } /** * Reads data from the ringbuffer; if there is not enough data * in the ringbuffer, the function will not block. * This function should be called from the consumer thread. * * @returns the number of successfully read frames */ guint read (guint n_frames, T *frames) { int rpos = Atomic::int_get (&m_read_frame_pos); guint size = m_buffer.size() / m_elements_per_frame; guint can_read = min (get_readable_frames(), n_frames); BufferIterator start = m_buffer.begin() + rpos * m_elements_per_frame; guint read1 = min (can_read, size - rpos) * m_elements_per_frame; copy (start, start + read1, frames); guint read2 = can_read * m_elements_per_frame - read1; copy (m_buffer.begin(), m_buffer.begin() + read2, frames + read1); Atomic::int_set (&m_read_frame_pos, (rpos + can_read) % size); return can_read; } /** * Checks available write space in the ringbuffer. * This function should be called from the producer thread. * * @returns the number of frames that can be written */ guint get_writable_frames() { int wpos = Atomic::int_get (&m_write_frame_pos); int rpos = Atomic::int_get (&m_read_frame_pos); guint size = m_buffer.size() / m_elements_per_frame; if (rpos <= wpos) /* wpos == rpos -> empty ringbuffer */ rpos += size; // the extra element allows us to see the difference between an empty/full ringbuffer return rpos - wpos - 1; } /** * Writes data to the ringbuffer; if there is not enough data * in the ringbuffer, the function will not block. * This function should be called from the producer thread. * * @returns the number of successfully written frames */ guint write (guint n_frames, const T *frames) { int wpos = Atomic::int_get (&m_write_frame_pos); guint size = m_buffer.size() / m_elements_per_frame; guint can_write = min (get_writable_frames(), n_frames); BufferIterator start = m_buffer.begin() + wpos * m_elements_per_frame; guint write1 = min (can_write, size - wpos) * m_elements_per_frame; copy (frames, frames + write1, start); guint write2 = can_write * m_elements_per_frame - write1; copy (frames + write1, frames + write1 + write2, m_buffer.begin()); // It is important that the data from the previous writes get committed // to memory *before* the index variable is updated. Otherwise, the // consumer thread could be reading invalid data, if the index variable // got written before the rest of the data (when unordered writes are // performed). write_memory_barrier(); Atomic::int_set (&m_write_frame_pos, (wpos + can_write) % size); return can_write; } /** * Returns the maximum number of frames that the ringbuffer can contain. * * This function can be called from any thread. */ guint size() const { // the extra element allows us to see the difference between an empty/full ringbuffer return (m_buffer.size() - 1) / m_elements_per_frame; } /** * Clears the ringbuffer. * * This function is @emph{not} threadsafe, and can not be used while * either the producer thread or the consumer thread are modifying * the ringbuffer. */ void clear() { Atomic::int_set (&m_read_frame_pos, 0); Atomic::int_set (&m_write_frame_pos, 0); } /** * Resizes and clears the ringbuffer. * * This function is @emph{not} threadsafe, and can not be used while * either the producer thread or the consumer thread are modifying * the ringbuffer. */ void resize (guint n_frames, guint elements_per_frame = 1) { m_elements_per_frame = elements_per_frame; // the extra element allows us to see the difference between an empty/full ringbuffer m_buffer.resize ((n_frames + 1) * m_elements_per_frame); clear(); } }; Cu... Stefan -- Stefan Westerfeld, Hamburg/Germany, http://space.twc.de/~stefan _______________________________________________ beast mailing list beast@... http://mail.gnome.org/mailman/listinfo/beast |
|
|
Re: Lockfree ringbuffer reviewStefan Westerfeld <stefan@...> writes:
> a lock free ringbuffer to get a JACK driver Really excellent;) > I would really like to get some review for the code. Why not post it to jackit-devel? -- Esben Stien is b0ef@e s a http://www. s t n m irc://irc. b - i . e/%23contact sip:b0ef@ e e jid:b0ef@ n n _______________________________________________ beast mailing list beast@... http://mail.gnome.org/mailman/listinfo/beast |
|
|
Re: Lockfree ringbuffer reviewOn Wed, 31 May 2006, Stefan Westerfeld wrote:
> On Tue, May 30, 2006 at 05:13:10PM +0200, Tim Janik wrote: >>> BufferIterator start = buffer.begin() + rpos * elements_per_frame; >>> guint read1 = min (can_read, size - rpos) * elements_per_frame; >>> copy (start, start + read1, frames); >> >> you should use bseblockuitls.h functions to copy floats. > > Well, I thought of it, but copy can copy anything, be it bytes or > std::strings. The ringbuffer is a template (so you can instantiate it > with floats or bytes or anything else), so just plugging bseblockutils.h > won't work. i really think using a template here is overkill. it's highly unlikely that we'll use this ringbuffer with ints or doubles. and using floats allowes optimizations that currently are not possible. > As a comment on efficiency, for floats copy calls memmove which in turn > calls memcpy, and bseblockutils.h uses memcpy. So there is no actual > efficiency loss right now. if you judge bseblockutils.h by some default implementation of it, you completely missed the point of having it. the functions it provides are in place to make use of machine specific extensions to improve performance. trading that for memcpy() will definitely be a loss. and as an aside, the default implementation doesn't use memcpy() but wmemcpy() which is faster due to alignment constraints. > We could use template specialization to > enforce that if you're instantiating a float ringbuffer, then > bseblockutils will be used. But it wouldn't make things faster. well, what's the use case for a non-float ringbuffer? > The benchmarks on my AMD64 suggest that replacing memcpy in > bseblockutils.h by SSE instructions will not have a positive effect on > performance. I don't know about the other Intel/AMD systems though. that is unrelated to the ring buffer. if SIMD copying was slower than non-SIMD copying, this has to be fixed as part of an optimization of the blockutils implementation, not by the ring buffer (or by the ring buffer not using blockutils). > Finally, if we were to optimize the JACK driver for maximum performance, > a more important factor would probably be that the ringbuffer even has > this API. sorry? what API are you talking about? void bse_block_copy_float (guint n_values, gfloat *values, const gfloat *ivalues); copying? > It means that we need to deinterleave and reinterleave the > data JACK supplies on the stack, and then read/write it to the > ringbuffer. If the ringbuffer would offer an API to access its memory > directly, the extra copy would disappear. But it makes the code harder > to maintain and for now my first priority has been getting things > implemented correctly. i don't quite understand this point. are you saying that replacing memcpy() by bse_block_copy_float() can't be done due to interleaving? :-O > namespace { > > /** > * The FrameRingBuffer class implements a ringbuffer for the communication > * between two threads. One thread - the producer thread - may only write > * data to the ringbuffer. The other thread - the consumer thread - may > * only read data from the ringbuffer. > * > * Given that these two threads only use the appropriate functions, no > * other synchronization is required to ensure that the data gets safely > * from the producer thread to the consumer thread. However, all operations > * that are provided by the ringbuffer are non-blocking, so that you may > * need a condition or other synchronization primitive if you want the > * producer and/or consumer to block if the ringbuffer is full/empty. s/if/on the event that/ ? > * > * Implementation: the synchronization between the two threads is only > * implemented by two index variables (read_frame_pos and write_frame_pos) > * for which atomic integer reads and writes are required. Since the > * producer thread only modifies the write_frame_pos and the consumer thread > * only modifies the read_frame_pos, no compare-and-swap or similar > * operations are needed to avoid concurrent writes. > */ good comment btw, i now have an idea on why/how things are supposed to work and can scrutinize your implementaiton accordingly ;) > template<class T> > class FrameRingBuffer { > //BIRNET_PRIVATE_COPY (FrameRingBuffer); what's this comment for? > private: > typedef typename vector<T>::iterator BufferIterator; > > vector<T> m_buffer; > volatile int m_read_frame_pos; > volatile int m_write_frame_pos; > guint m_elements_per_frame; > > void > write_memory_barrier() > { > static volatile int dummy = 0; > > /* > * writing this dummy integer should ensure that all prior writes > * are committed to memory > */ > Atomic::int_set (&dummy, 0x12345678); > } > public: > FrameRingBuffer (guint n_frames = 0, > guint elements_per_frame = 1) > { > resize (n_frames, elements_per_frame); > } > /** > * Checks available read space in the ringbuffer. s/Checks/Check/ > * This function should be called from the consumer thread. s/should/may only/ since i suppose the consumer thread doesn't _have_ to call it ;) (same for all other functions) > * > * @returns the number of frames that are available for reading hm, conventionally we put argument/return value blurbs before the actual funciton doc paragraph. > */ > guint > get_readable_frames() > { > int wpos = Atomic::int_get (&m_write_frame_pos); > int rpos = Atomic::int_get (&m_read_frame_pos); > int size = m_buffer.size() / m_elements_per_frame; > > if (wpos < rpos) /* wpos == rpos -> empty ringbuffer */ > wpos += size; > > return wpos - rpos; > } > /** > * Reads data from the ringbuffer; if there is not enough data > * in the ringbuffer, the function will not block. > * This function should be called from the consumer thread. > * > * @returns the number of successfully read frames > */ > guint > read (guint n_frames, > T *frames) > { > int rpos = Atomic::int_get (&m_read_frame_pos); > guint size = m_buffer.size() / m_elements_per_frame; > guint can_read = min (get_readable_frames(), n_frames); > > BufferIterator start = m_buffer.begin() + rpos * m_elements_per_frame; > guint read1 = min (can_read, size - rpos) * m_elements_per_frame; > copy (start, start + read1, frames); > > guint read2 = can_read * m_elements_per_frame - read1; > copy (m_buffer.begin(), m_buffer.begin() + read2, frames + read1); > > Atomic::int_set (&m_read_frame_pos, (rpos + can_read) % size); > return can_read; > } > /** > * Checks available write space in the ringbuffer. > * This function should be called from the producer thread. > * > * @returns the number of frames that can be written > */ > guint > get_writable_frames() > { > int wpos = Atomic::int_get (&m_write_frame_pos); > int rpos = Atomic::int_get (&m_read_frame_pos); > guint size = m_buffer.size() / m_elements_per_frame; > > if (rpos <= wpos) /* wpos == rpos -> empty ringbuffer */ > rpos += size; > > // the extra element allows us to see the difference between an empty/full ringbuffer s/element/frame/ right? > return rpos - wpos - 1; > } > /** > * Writes data to the ringbuffer; if there is not enough data s/enough data/enough free space/ if i understand correctly. > * in the ringbuffer, the function will not block. hm, it be better to say what the function does rather than saying "it will not block", i.e. what it not does. afaiu, this should be: * Write data to the ringbuffer; if there is not enough free space * in the ringbuffer, the function will return with the amount of * frames consumed by a partial write. > * This function should be called from the producer thread. > * > * @returns the number of successfully written frames > */ > guint > write (guint n_frames, > const T *frames) > { > int wpos = Atomic::int_get (&m_write_frame_pos); > guint size = m_buffer.size() / m_elements_per_frame; > guint can_write = min (get_writable_frames(), n_frames); > > BufferIterator start = m_buffer.begin() + wpos * m_elements_per_frame; > guint write1 = min (can_write, size - wpos) * m_elements_per_frame; > copy (frames, frames + write1, start); > > guint write2 = can_write * m_elements_per_frame - write1; > copy (frames + write1, frames + write1 + write2, m_buffer.begin()); > > // It is important that the data from the previous writes get committed > // to memory *before* the index variable is updated. Otherwise, the > // consumer thread could be reading invalid data, if the index variable > // got written before the rest of the data (when unordered writes are > // performed). > write_memory_barrier(); > > Atomic::int_set (&m_write_frame_pos, (wpos + can_write) % size); > return can_write; > } > /** > * Returns the maximum number of frames that the ringbuffer can contain. > * > * This function can be called from any thread. > */ > guint > size() const > { > // the extra element allows us to see the difference between an empty/full ringbuffer > return (m_buffer.size() - 1) / m_elements_per_frame; > } ugh this is mighty confusing. you should get rid of this function or at least clearly rename it. seeing it, i had to go back and read your code all over to figure whether you used size() in every place or m_buffer.size(). if you need a name, call it total_n_frames(), allthough it seems this function isn't used/needed at all. > /** > * Clears the ringbuffer. s/Clears/Clear/, applies to lots of other comments as well. > * > * This function is @emph{not} threadsafe, and can not be used while well, depends on what you mean by "threadsafe". i'd simply leave that out and just write: * Clear the ringbuffer. * This function may not be used while * either the producer thread or the consumer thread are modifying * the ringbuffer. (also fixed "can" vs. "may") > * either the producer thread or the consumer thread are modifying > * the ringbuffer. > */ > void > clear() > { > Atomic::int_set (&m_read_frame_pos, 0); > Atomic::int_set (&m_write_frame_pos, 0); > } is this really needed outside of resize() though? > /** > * Resizes and clears the ringbuffer. > * > * This function is @emph{not} threadsafe, and can not be used while comment from above applies as well. > * either the producer thread or the consumer thread are modifying > * the ringbuffer. > */ > void > resize (guint n_frames, > guint elements_per_frame = 1) > { > m_elements_per_frame = elements_per_frame; > // the extra element allows us to see the difference between an empty/full ringbuffer s/element/frame/ to stick to your terminology > m_buffer.resize ((n_frames + 1) * m_elements_per_frame); > clear(); looks to me like you could simply inline resetting of the indices here. > } > }; hm, don't you want review on the rest of the driver as well? (that'd prolly also be the part that is more interesting for Paul) > > Cu... Stefan --- ciaoTJ _______________________________________________ beast mailing list beast@... http://mail.gnome.org/mailman/listinfo/beast |
|
|
Beast Jack driver (Re: Lockfree ringbuffer review) Hi!
On Thu, Jun 01, 2006 at 05:14:15PM +0200, Tim Janik wrote: > On Wed, 31 May 2006, Stefan Westerfeld wrote: > >On Tue, May 30, 2006 at 05:13:10PM +0200, Tim Janik wrote: > >Well, I thought of it, but copy can copy anything, be it bytes or > >std::strings. The ringbuffer is a template (so you can instantiate it > >with floats or bytes or anything else), so just plugging bseblockutils.h > >won't work. > > i really think using a template here is overkill. it's highly unlikely > that we'll use this ringbuffer with ints or doubles. and using floats > allowes optimizations that currently are not possible. > > [...] > > >We could use template specialization to > >enforce that if you're instantiating a float ringbuffer, then > >bseblockutils will be used. But it wouldn't make things faster. > > well, what's the use case for a non-float ringbuffer? I've just read my way through the new jack midi API, and we'll need a ring buffer to get the midi events from the realtime thread, where they will be delivered by jack, and our midi thread. So yes, there is a use case, which will probably have unsigned char as data type, or even something like struct MidiEvent. > >The benchmarks on my AMD64 suggest that replacing memcpy in > >bseblockutils.h by SSE instructions will not have a positive effect on > >performance. I don't know about the other Intel/AMD systems though. > > that is unrelated to the ring buffer. if SIMD copying was slower than > non-SIMD copying, this has to be fixed as part of an optimization of > the blockutils implementation, not by the ring buffer (or by the ring > buffer not using blockutils). Ok, ok... new version uses template specialization, so we have fast_copy now, which will use Bse::Block::copy if the data type permits doing so. > >Finally, if we were to optimize the JACK driver for maximum performance, > >a more important factor would probably be that the ringbuffer even has > >this API. > > sorry? what API are you talking about? > void bse_block_copy_float (guint n_values, > gfloat *values, > const gfloat *ivalues); > copying? > > >It means that we need to deinterleave and reinterleave the > >data JACK supplies on the stack, and then read/write it to the > >ringbuffer. If the ringbuffer would offer an API to access its memory > >directly, the extra copy would disappear. But it makes the code harder > >to maintain and for now my first priority has been getting things > >implemented correctly. > > i don't quite understand this point. > are you saying that replacing memcpy() by bse_block_copy_float() can't > be done due to interleaving? :-O No. It can be replaced, and the new version has it. However, our data flow looks like this: input jack thread process() callback supplies N separate buffers | V interleave on the stack (manually implemented, in process()) *jack-thread* | V copy into the input ringbuffer (input ringbuffer write) *jack-thread* | V copy in bse engine supplied buffer (input ringbuffer read) *engine-thread* | V [ beast processing ] *engine-thread* | V copy from bse engine supplied buffer (output ringbuffer write) *engine-thread* | V copy onto stack buffer (output ringbuffer read) *jack-thread* | V deinterleave into seperate buffers (manually) *jack-thread* | V end of jack thread process() callback output What I was pointing out to is that if the ringbuffer provided an API to access the data which is stored in the ringbuffer directly, then we could avoid some copies (if you wonder how such an API would look, read http://www.portaudio.com/trac/browser/portaudio/branches/v19-devel/src/hostapi/coreaudio/ringbuffer.c and look at RingBuffer_GetWriteRegions). However if we get everything we want - with the number of copies we're making right now - I am perfectly happy. Every sensible bse network will eat much more cycles than we can possibly spend in those extra Bse::Block::copy() calls. > >namespace { > > > >/** > >* The FrameRingBuffer class implements a ringbuffer for the communication > >* between two threads. One thread - the producer thread - may only write > >* data to the ringbuffer. The other thread - the consumer thread - may > >* only read data from the ringbuffer. > >* > >* Given that these two threads only use the appropriate functions, no > >* other synchronization is required to ensure that the data gets safely > >* from the producer thread to the consumer thread. However, all operations > >* that are provided by the ringbuffer are non-blocking, so that you may > >* need a condition or other synchronization primitive if you want the > >* producer and/or consumer to block if the ringbuffer is full/empty. > > s/if/on the event that/ ? I think not (but I am no native speaker). What I want to say is this: Suppose the ringbuffer is empty. The consumer will, when calling the read function, just read 0 frames. A common scenario is that the consumer will want to wait until data is available. To implement this, the consumer will need to use a synchonization mechanism (i.e. condition). > >* > >* Implementation: the synchronization between the two threads is only > >* implemented by two index variables (read_frame_pos and write_frame_pos) > >* for which atomic integer reads and writes are required. Since the > >* producer thread only modifies the write_frame_pos and the consumer thread > >* only modifies the read_frame_pos, no compare-and-swap or similar > >* operations are needed to avoid concurrent writes. > >*/ > > good comment btw, i now have an idea on why/how things are supposed > to work and can scrutinize your implementaiton accordingly ;) > > >template<class T> > >class FrameRingBuffer { > > //BIRNET_PRIVATE_COPY (FrameRingBuffer); > > what's this comment for? I wanted to call BIRNET_PRIVATE_COPY, but it didn't work, so I put // marks there, so it compiled. So can I use BIRNET_PRIVATE_COPY or do I need to put the code BIRNET_PRIVATE_COPY would put there if it worked here manually? > > // the extra element allows us to see the difference between an > > empty/full ringbuffer > > s/element/frame/ right? Right. > > /** > > * Writes data to the ringbuffer; if there is not enough data > > s/enough data/enough free space/ if i understand correctly. > > > * in the ringbuffer, the function will not block. > > hm, it be better to say what the function does rather than saying > "it will not block", i.e. what it not does. > afaiu, this should be: > > * Write data to the ringbuffer; if there is not enough free space > * in the ringbuffer, the function will return with the amount of > * frames consumed by a partial write. Well, it's similar now: * Write data to the ringbuffer; if there is not enough free space * in the ringbuffer, the function will return the amount of frames * consumed by a partial write (without blocking). I wanted to keep the remark that it doesn't block there. I also adapted the read description in a similar way: * Read data from the ringbuffer; if there is not enough data * in the ringbuffer, the function will return the number of frames * that could be read without blocking. > > /** > > * Returns the maximum number of frames that the ringbuffer can contain. > > * > > * This function can be called from any thread. > > */ > > guint > > size() const > > { > > // the extra element allows us to see the difference between an > > empty/full ringbuffer > > return (m_buffer.size() - 1) / m_elements_per_frame; > > } > > ugh this is mighty confusing. > you should get rid of this function or at least clearly rename it. > seeing it, i had to go back and read your code all over to figure whether > you > used size() in every place or m_buffer.size(). > if you need a name, call it total_n_frames(), allthough it seems this > function > isn't used/needed at all. Well, its not used for internal purposes. But its provided for the code which uses the ringbuffer. Basically if I am implementing C++ code for some data structure that has a size (as the ringbuffer does), then I should also provide an accessor that can be used to determine the size. Just as std::vector has a resize() and size() function. > > /** > > * Clears the ringbuffer. > > s/Clears/Clear/, applies to lots of other comments as well. > > > * > > * This function is @emph{not} threadsafe, and can not be used while > > well, depends on what you mean by "threadsafe". i'd simply leave that > out and just write: > > * Clear the ringbuffer. > * This function may not be used while > * either the producer thread or the consumer thread are modifying > * the ringbuffer. > > (also fixed "can" vs. "may") Oh yes, your comment sounds better. Definitely. Fixed. > > void > > clear() > > { > > Atomic::int_set (&m_read_frame_pos, 0); > > Atomic::int_set (&m_write_frame_pos, 0); > > } > > is this really needed outside of resize() though? The jack driver needs it (to get into a sane state after a dropout). > > m_buffer.resize ((n_frames + 1) * m_elements_per_frame); > > clear(); > > looks to me like you could simply inline resetting of the > indices here. If there is a function anyway, I might as well use it. > hm, don't you want review on the rest of the driver as well? > (that'd prolly also be the part that is more interesting for Paul) Definitely. So here is an updated version of the ringbuffer and the whole driver source. I think the driver already works pretty well, its just that there are quite a few FIXMEs in there. As I am only posting the C++ source to the list, if anyone is so brave and wants to compile the driver (you need beast CVS for this), I've put the whole driver to: http://space.twc.de/~stefan/download/20060601-preview-bse-jack-0.7.0.tar.gz (sorry that it extracts to the wrong directory, but I just used make dist...). Anyway, here is the C++ source (the interesting part, for the review): /* BSE - Bedevilled Sound Engine * Copyright (C) 2004 Tim Janik * Copyright (C) 2006 Stefan Westerfeld * * This program is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2 of the License, or (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General * Public License along with this program; if not, write to the * Free Software Foundation, Inc., 59 Temple Place, Suite 330, * Boston, MA 02111-1307, USA. */ #include "configure.h" #include "bsepcmdevice-jack.hh" #include <bse/bseblockutils.hh> #include <bse/gsldatautils.h> #include <jack/jack.h> #include <string.h> #include <errno.h> #include <signal.h> #include <set> #include <string> #include <map> #include <vector> using namespace Birnet; using std::vector; using std::set; using std::map; using std::string; using std::min; using std::max; using std::copy; using Bse::Block; namespace { /** * This function uses std::copy to copy the n_values of data from ivalues * to ovalues. If a specialized version is available in bseblockutils, * then this - usually faster - version will be used. * * The data in ivalues and ovalues may not overlap. */ template<class Data> void fast_copy (guint n_values, Data *ovalues, const Data *ivalues) { copy (ivalues, ivalues + n_values, ovalues); } template<> void fast_copy (guint n_values, float *ovalues, const float *ivalues) { Block::copy (n_values, ovalues, ivalues); } template<> void fast_copy (guint n_values, guint32 *ovalues, const guint32 *ivalues) { Block::copy (n_values, ovalues, ivalues); } /** * The FrameRingBuffer class implements a ringbuffer for the communication * between two threads. One thread - the producer thread - may only write * data to the ringbuffer. The other thread - the consumer thread - may * only read data from the ringbuffer. * * Given that these two threads only use the appropriate functions, no * other synchronization is required to ensure that the data gets safely * from the producer thread to the consumer thread. However, all operations * that are provided by the ringbuffer are non-blocking, so that you may * need a condition or other synchronization primitive if you want the * producer and/or consumer to block if the ringbuffer is full/empty. * * Implementation: the synchronization between the two threads is only * implemented by two index variables (read_frame_pos and write_frame_pos) * for which atomic integer reads and writes are required. Since the * producer thread only modifies the write_frame_pos and the consumer thread * only modifies the read_frame_pos, no compare-and-swap or similar * operations are needed to avoid concurrent writes. */ template<class T> class FrameRingBuffer { //BIRNET_PRIVATE_COPY (FrameRingBuffer); private: vector<T> m_buffer; volatile int m_read_frame_pos; volatile int m_write_frame_pos; guint m_elements_per_frame; void write_memory_barrier() { static volatile int dummy = 0; /* * writing this dummy integer should ensure that all prior writes * are committed to memory */ Atomic::int_set (&dummy, 0x12345678); } public: FrameRingBuffer (guint n_frames = 0, guint elements_per_frame = 1) { resize (n_frames, elements_per_frame); } /** * @returns the number of frames that are available for reading * * Check available read space in the ringbuffer. * This function may only be called from the consumer thread. */ guint get_readable_frames() { int wpos = Atomic::int_get (&m_write_frame_pos); int rpos = Atomic::int_get (&m_read_frame_pos); int size = m_buffer.size() / m_elements_per_frame; if (wpos < rpos) /* wpos == rpos -> empty ringbuffer */ wpos += size; return wpos - rpos; } /** * @returns the number of successfully read frames * * Read data from the ringbuffer; if there is not enough data * in the ringbuffer, the function will return the number of frames * that could be read without blocking. * * This function should be called from the consumer thread. */ guint read (guint n_frames, T *frames) { int rpos = Atomic::int_get (&m_read_frame_pos); guint size = m_buffer.size() / m_elements_per_frame; guint can_read = min (get_readable_frames(), n_frames); guint read1 = min (can_read, size - rpos) * m_elements_per_frame; fast_copy (read1, frames, &m_buffer[rpos * m_elements_per_frame]); guint read2 = can_read * m_elements_per_frame - read1; fast_copy (read2, frames + read1, &m_buffer[0]); Atomic::int_set (&m_read_frame_pos, (rpos + can_read) % size); return can_read; } /** * @returns the number of frames that can be written * * Check available write space in the ringbuffer. * This function should be called from the producer thread. */ guint get_writable_frames() { int wpos = Atomic::int_get (&m_write_frame_pos); int rpos = Atomic::int_get (&m_read_frame_pos); guint size = m_buffer.size() / m_elements_per_frame; if (rpos <= wpos) /* wpos == rpos -> empty ringbuffer */ rpos += size; // the extra frame allows us to see the difference between an empty/full ringbuffer return rpos - wpos - 1; } /** * @returns the number of successfully written frames * * Write data to the ringbuffer; if there is not enough free space * in the ringbuffer, the function will return the amount of frames * consumed by a partial write (without blocking). * * This function may only be called from the producer thread. */ guint write (guint n_frames, const T *frames) { int wpos = Atomic::int_get (&m_write_frame_pos); guint size = m_buffer.size() / m_elements_per_frame; guint can_write = min (get_writable_frames(), n_frames); guint write1 = min (can_write, size - wpos) * m_elements_per_frame; fast_copy (write1, &m_buffer[wpos * m_elements_per_frame], frames); guint write2 = can_write * m_elements_per_frame - write1; fast_copy (write2, &m_buffer[0], frames + write1); // It is important that the data from the previous writes get committed // to memory *before* the index variable is updated. Otherwise, the // consumer thread could be reading invalid data, if the index variable // got written before the rest of the data (when unordered writes are // performed). write_memory_barrier(); Atomic::int_set (&m_write_frame_pos, (wpos + can_write) % size); return can_write; } /** * @returns the maximum number of frames that the ringbuffer can contain * * This function can be called from any thread. */ guint get_total_n_frames() const { // the extra frame allows us to see the difference between an empty/full ringbuffer return (m_buffer.size() - 1) / m_elements_per_frame; } /** * @returns the number of elements that are part of one frame * * This function can be called from any thread. */ guint get_elements_per_frame() const { return m_elements_per_frame; } /** * Clear the ringbuffer. * * This function may not be used while either the producer thread or * the consumer thread are modifying the ringbuffer. */ void clear() { Atomic::int_set (&m_read_frame_pos, 0); Atomic::int_set (&m_write_frame_pos, 0); } /** * Resize and clear the ringbuffer. * * This function may not be used while either the producer thread or * the consumer thread are modifying the ringbuffer. */ void resize (guint n_frames, guint elements_per_frame = 1) { m_elements_per_frame = elements_per_frame; // the extra frame allows us to see the difference between an empty/full ringbuffer m_buffer.resize ((n_frames + 1) * m_elements_per_frame); clear(); } }; static BIRNET_MSG_TYPE_DEFINE (debug_pcm, "pcm", BIRNET_MSG_DEBUG, NULL); #define DEBUG(...) sfi_debug (debug_pcm, __VA_ARGS__) /* --- JACK PCM handle --- */ enum { CALLBACK_STATE_INACTIVE = 0, CALLBACK_STATE_ACTIVE = 1, CALLBACK_STATE_PLEASE_TERMINATE = 2, CALLBACK_STATE_TERMINATED = 3 }; struct JackPcmHandle { BsePcmHandle handle; JackPcmHandle (jack_client_t *jack_client) : jack_client (jack_client) { memset (&handle, 0, sizeof (handle)); callback_state = ixruns = pixruns = oxruns = poxruns = 0; is_down = false; } vector<jack_port_t *> input_ports; vector<jack_port_t *> output_ports; guint buffer_frames; /* input/output ringbuffer size in frames */ Cond cond; Mutex cond_mutex; jack_client_t *jack_client; FrameRingBuffer<float> input_ringbuffer; FrameRingBuffer<float> output_ringbuffer; int callback_state; int ixruns; int oxruns; int pixruns; int poxruns; bool is_down; }; } /* --- prototypes --- */ static void bse_pcm_device_jack_class_init (BsePcmDeviceJACKClass *klass); static void bse_pcm_device_jack_init (BsePcmDeviceJACK *self); static gsize jack_device_read (BsePcmHandle *handle, gfloat *values); static void jack_device_write (BsePcmHandle *handle, const gfloat *values); static gboolean jack_device_check_io (BsePcmHandle *handle, glong *tiumeoutp); static guint jack_device_latency (BsePcmHandle *handle); static void jack_device_retrigger (JackPcmHandle *jack); /* --- define object type and export to BSE --- */ static const char type_blurb[] = ("PCM driver implementation for JACK Audio Connection Kit " "(http://jackaudio.org)"); BSE_REGISTER_OBJECT (BsePcmDeviceJACK, BsePcmDevice, NULL, "", type_blurb, NULL, bse_pcm_device_jack_class_init, NULL, bse_pcm_device_jack_init); BSE_DEFINE_EXPORTS (__FILE__); /* --- variables --- */ static gpointer parent_class = NULL; /* --- functions --- */ static void bse_pcm_device_jack_init (BsePcmDeviceJACK *self) { /* FIXME: move signal handling somewhere else */ sigset_t signal_mask; sigemptyset (&signal_mask); sigaddset (&signal_mask, SIGPIPE); int rc = pthread_sigmask (SIG_BLOCK, &signal_mask, NULL); if (rc != 0) g_printerr ("jack driver: pthread_sigmask for SIGPIPE failed: %s\n", strerror (errno)); jack_status_t status; self->jack_client = jack_client_open ("beast", JackNoStartServer, &status); /* FIXME: translations(?) */ DEBUG ("attaching to JACK server returned status: %d\n", status); /* FIXME: check status for proper error reporting * FIXME: what about server name? (necessary if more than one jackd instance is running) * FIXME: get rid of device unloading, so that the jackd connection can be kept initialized * * maybe move this to a different function */ } namespace { struct DeviceDetails { guint ports, input_ports, output_ports, physical_ports, terminal_ports; vector<string> input_port_names; vector<string> output_port_names; DeviceDetails() : ports (0), input_ports (0), output_ports (0), physical_ports (0), terminal_ports (0) { } }; } static map<string, DeviceDetails> query_jack_devices (BsePcmDeviceJACK *self) { map<string, DeviceDetails> devices; /* FIXME: we need to make a difference here between * - jackd is running, but no output ports can be found * - jackd is not running */ const char **jack_ports = self->jack_client ? jack_get_ports (self->jack_client, NULL, NULL, 0) : NULL; if (jack_ports) { for (guint i = 0; jack_ports[i]; i++) { const char *end = strchr (jack_ports[i], ':'); if (end) { string device_name (jack_ports[i], end); DeviceDetails &details = devices[device_name]; details.ports++; jack_port_t *jack_port = jack_port_by_name (self->jack_client, jack_ports[i]); if (jack_port) { int flags = jack_port_flags (jack_port); if (flags & JackPortIsInput) { details.input_ports++; details.input_port_names.push_back (jack_ports[i]); } if (flags & JackPortIsOutput) { details.output_ports++; details.output_port_names.push_back (jack_ports[i]); } if (flags & JackPortIsPhysical) details.physical_ports++; if (flags & JackPortIsTerminal) details.terminal_ports++; } } } free (jack_ports); } return devices; } static SfiRing* bse_pcm_device_jack_list_devices (BseDevice *device) { BsePcmDeviceJACK *self = BSE_PCM_DEVICE_JACK (device); map<string, DeviceDetails> devices = query_jack_devices (self); SfiRing *ring = NULL; for (map<string, DeviceDetails>::iterator di = devices.begin(); di != devices.end(); di++) { const string &device_name = di->first; DeviceDetails &details = di->second; BseDeviceEntry *entry; entry = bse_device_group_entry_new (device, g_strdup (device_name.c_str()), g_strdup ("JACK Devices"), g_strdup_printf ("%s (%d*input %d*output)%s", device_name.c_str(), details.input_ports, details.output_ports, details.physical_ports == details.ports ? " [ hardware ]" : "")); /* ensure that alsa_pcm always is the first item listed (it is the default device) */ if (device_name == "alsa_pcm") ring = sfi_ring_prepend (ring, entry); else ring = sfi_ring_append (ring, entry); } if (!ring) ring = sfi_ring_append (ring, bse_device_error_new (device, g_strdup_printf ("No devices found"))); return ring; } static void jack_shutdown_callback (void *jack_handle_ptr) { JackPcmHandle *jack = static_cast <JackPcmHandle *> (jack_handle_ptr); AutoLocker cond_locker (jack->cond_mutex); jack->is_down = true; jack->cond.signal(); } static int jack_process_callback (jack_nframes_t n_frames, void *jack_handle_ptr) { typedef jack_default_audio_sample_t sample_t; JackPcmHandle *jack = static_cast <JackPcmHandle *> (jack_handle_ptr); int callback_state = Atomic::int_get (&jack->callback_state); /* still waiting for initialization to complete */ if (callback_state == CALLBACK_STATE_ACTIVE) { bool have_cond_lock = jack->cond_mutex.trylock(); guint n_channels = jack->handle.n_channels; float buffer[n_frames * n_channels]; if (jack->handle.readable) { /* interleave input data for processing in the engine thread */ g_assert (jack->input_ports.size() == n_channels); for (guint ch = 0; ch < n_channels; ch++) { sample_t *values = (sample_t *) jack_port_get_buffer (jack->input_ports[ch], n_frames); float *p = &buffer[ch]; for (guint i = 0; i < n_frames; i++) { *p = *values++; p += n_channels; } } guint frames_written = jack->input_ringbuffer.write (n_frames, buffer); if (frames_written != n_frames) Atomic::int_add (&jack->ixruns, 1); /* input underrun detected */ } if (jack->handle.writable) { guint read_frames = jack->output_ringbuffer.read (n_frames, buffer); if (read_frames != n_frames) { Atomic::int_add (&jack->oxruns, 1); /* output underrun detected */ Block::fill ((n_frames - read_frames) * n_channels, &buffer[read_frames * n_channels], 0.0); } /* deinterleave output data for processing in the engine thread */ g_assert (jack->output_ports.size() == n_channels); for (guint ch = 0; ch < n_channels; ch++) { sample_t *values = (sample_t *) jack_port_get_buffer (jack->output_ports[ch], n_frames); sample_t *values_end = values + n_frames; float *p = &buffer[ch]; while (values < values_end) { *values++ = *p; p += n_channels; } } } /* * as we can't (always) lock our data structures from the jack realtime * thread, using a condition introduces the possibility for a race: * * normal operation: * * [bse thread] writes some data to output_ringbuffer * [bse thread] checks remaining space, there is no room left * [bse thread] sleeps on the condition * [jack thread] reads some data from output_ringbuffer * [jack thread] signals the condition * [bse thread] continues to write some data to output_ringbuffer * * race condition: * * [bse thread] writes some data to output_ringbuffer * [bse thread] checks remaining space, there is no room left * [jack thread] reads some data from output_ringbuffer * [jack thread] signals the condition * [bse thread] sleeps on the condition * * since the jack callback gets called periodically, the bse thread will * never starve, though, in the worst case it will just miss a frame * * so we absolutely exclude the possibility for priority inversion (by * using trylock instead of lock), by introducing extra latency in * some extremely rare cases */ if (have_cond_lock) jack->cond_mutex.unlock(); } else { for (guint ch = 0; ch < jack->input_ports.size(); ch++) { sample_t *values = (sample_t *) jack_port_get_buffer (jack->input_ports[ch], n_frames); Block::fill (n_frames, values, 0.0); } for (guint ch = 0; ch < jack->output_ports.size(); ch++) { sample_t *values = (sample_t *) jack_port_get_buffer (jack->output_ports[ch], n_frames); Block::fill (n_frames, values, 0.0); } } jack->cond.signal(); if (callback_state == CALLBACK_STATE_PLEASE_TERMINATE) { Atomic::int_set (&jack->callback_state, CALLBACK_STATE_TERMINATED); return 1; } return 0; } static void terminate_and_free_jack (JackPcmHandle *jack) { g_return_if_fail (jack->jack_client != NULL); Atomic::int_set (&jack->callback_state, CALLBACK_STATE_PLEASE_TERMINATE); while (Atomic::int_get (&jack->callback_state) == CALLBACK_STATE_PLEASE_TERMINATE) { AutoLocker cond_locker (jack->cond_mutex); if (jack->is_down) /* if jack is already gone, then forget it */ Atomic::int_set (&jack->callback_state, CALLBACK_STATE_TERMINATED); else jack->cond.wait_timed (jack->cond_mutex, 100000); } for (guint ch = 0; ch < jack->input_ports.size(); ch++) jack_port_disconnect (jack->jack_client, jack->input_ports[ch]); for (guint ch = 0; ch < jack->output_ports.size(); ch++) jack_port_disconnect (jack->jack_client, jack->output_ports[ch]); jack_deactivate (jack->jack_client); delete jack; } static BseErrorType bse_pcm_device_jack_open (BseDevice *device, gboolean require_readable, gboolean require_writable, guint n_args, const gchar **args) { BsePcmDeviceJACK *self = BSE_PCM_DEVICE_JACK (device); if (!self->jack_client) return BSE_ERROR_FILE_OPEN_FAILED; JackPcmHandle *jack = new JackPcmHandle (self->jack_client); BsePcmHandle *handle = &jack->handle; handle->readable = require_readable; handle->writable = require_writable; handle->n_channels = BSE_PCM_DEVICE (device)->req_n_channels; /* try setup */ BseErrorType error = BSE_ERROR_NONE; const char *dname = (n_args == 1) ? args[0] : "alsa_pcm"; handle->mix_freq = jack_get_sample_rate (self->jack_client); Atomic::int_set (&jack->callback_state, CALLBACK_STATE_INACTIVE); for (guint i = 0; i < handle->n_channels; i++) { const int port_name_size = jack_port_name_size(); char port_name[port_name_size]; jack_port_t *port; snprintf (port_name, port_name_size, "in_%u", i); port = jack_port_register (self->jack_client, port_name, JACK_DEFAULT_AUDIO_TYPE, JackPortIsInput, 0); if (port) jack->input_ports.push_back (port); else error = BSE_ERROR_FILE_OPEN_FAILED; snprintf (port_name, port_name_size, "out_%u", i); port = jack_port_register (self->jack_client, port_name, JACK_DEFAULT_AUDIO_TYPE, JackPortIsOutput, 0); if (port) jack->output_ports.push_back (port); else error = BSE_ERROR_FILE_OPEN_FAILED; } /* activate */ jack_set_process_callback (self->jack_client, jack_process_callback, jack); jack_on_shutdown (self->jack_client, jack_shutdown_callback, jack); jack_activate (self->jack_client); /* connect ports */ map<string, DeviceDetails> devices = query_jack_devices (self); map<string, DeviceDetails>::const_iterator di; di = devices.find (dname); if (di != devices.end()) { const DeviceDetails &details = di->second; for (guint ch = 0; ch < handle->n_channels; ch++) { if (details.output_ports > ch) jack_connect (self->jack_client, details.output_port_names[ch].c_str(), jack_port_name (jack->input_ports[ch])); if (details.input_ports > ch) jack_connect (self->jack_client, jack_port_name (jack->output_ports[ch]), details.input_port_names[ch].c_str()); } } /* setup buffer size */ // keep at least two jack callback sizes for dropout free audio guint min_buffer_frames = jack_get_buffer_size (self->jack_client) * 2; // keep an extra engine buffer size (this compensates also for cases where the // engine buffer size is not a 2^N value, which would otherwise cause the // buffer never to be fully filled with 2 periods of data) min_buffer_frames += BSE_PCM_DEVICE (device)->req_block_length; // honor the user defined latency specification // // FIXME: should we try to tweak local buffering so that it corresponds // to the user defined latency, or global buffering (including the jack // buffer)? we do the former, since it is easier guint user_buffer_frames = BSE_PCM_DEVICE (device)->req_latency_ms * handle->mix_freq / 1000; guint buffer_frames = max (min_buffer_frames, user_buffer_frames); jack->input_ringbuffer.resize (buffer_frames, handle->n_channels); jack->output_ringbuffer.resize (buffer_frames, handle->n_channels); jack->buffer_frames = jack->output_ringbuffer.get_writable_frames(); // the ringbuffer should be exactly as big as requested g_assert (jack->buffer_frames == buffer_frames); DEBUG ("ringbuffer size = %.3fms", jack->buffer_frames / double (handle->mix_freq) * 1000); /* setup PCM handle or shutdown */ if (!error) { bse_device_set_opened (device, dname, handle->readable, handle->writable); if (handle->readable) handle->read = jack_device_read; if (handle->writable) handle->write = jack_device_write; handle->check_io = jack_device_check_io; handle->latency = jack_device_latency; BSE_PCM_DEVICE (device)->handle = handle; /* will prevent dropouts at initialization, when no data is there at all */ jack_device_retrigger (jack); jack_device_latency (handle); // debugging only: print latency values } else { terminate_and_free_jack (jack); } DEBUG ("JACK: opening PCM \"%s\" readupble=%d writable=%d: %s", dname, require_readable, require_writable, bse_error_blurb (error)); return error; } static void bse_pcm_device_jack_close (BseDevice *device) { JackPcmHandle *jack = (JackPcmHandle*) BSE_PCM_DEVICE (device)->handle; BSE_PCM_DEVICE (device)->handle = NULL; terminate_and_free_jack (jack); } static void bse_pcm_device_jack_finalize (GObject *object) { BsePcmDeviceJACK *self = BSE_PCM_DEVICE_JACK (object); if (self->jack_client) { jack_client_close (self->jack_client); self->jack_client = NULL; } /* chain parent class' handler */ G_OBJECT_CLASS (parent_class)->finalize (object); } static void jack_device_retrigger (JackPcmHandle *jack) { g_return_if_fail (jack->jack_client != NULL); /* * we need to reset the active flag to false here, as we modify the * buffers in a non threadsafe way; this is why we also wait for * the condition, to ensure that the jack callback really isn't * active any more */ Atomic::int_set (&jack->callback_state, CALLBACK_STATE_INACTIVE); /* usually should not timeout, but be notified by the jack callback */ jack->cond_mutex.lock(); if (!jack->is_down) jack->cond.wait_timed (jack->cond_mutex, 100000); jack->cond_mutex.unlock(); /* jack_ringbuffer_reset is not threadsafe! */ jack->input_ringbuffer.clear(); jack->output_ringbuffer.clear(); /* initialize output ringbuffer with silence */ vector<float> silence (jack->buffer_frames * jack->handle.n_channels); guint frames_written = jack->output_ringbuffer.write (jack->buffer_frames, &silence[0]); g_assert (frames_written == jack->buffer_frames); DEBUG ("jack_device_retrigger: %d frames written", frames_written); } static gboolean jack_device_check_io (BsePcmHandle *handle, glong *timeoutp) { JackPcmHandle *jack = (JackPcmHandle*) handle; g_return_val_if_fail (jack->jack_client != NULL, false); /* int ixruns = Atomic::int_get (&jack->ixruns); int oxruns = Atomic::int_get (&jack->oxruns); if (jack->poxruns != oxruns || jack->pixruns != ixruns) { printf ("beast caused jack input xruns %d, output xruns %d\n", ixruns, oxruns); jack->poxruns = oxruns; jack->pixruns = ixruns; jack_device_retrigger (jack); } */ /* (FIXME?) * * we can not guarantee that beast will read the data from the jack ringbuffer, * since this depends on the presence of a module that actually reads data in * the synthesis graph : so we simply ignore the ixruns variable; I am not sure * if thats the right thing to do, though */ int oxruns = Atomic::int_get (&jack->oxruns); if (jack->poxruns != oxruns) { g_printerr ("%d beast jack driver xruns\n", oxruns); jack->poxruns = oxruns; jack_device_retrigger (jack); } Atomic::int_set (&jack->callback_state, CALLBACK_STATE_ACTIVE); guint n_frames_avail = min (jack->output_ringbuffer.get_writable_frames(), jack->input_ringbuffer.get_readable_frames()); /* check whether data can be processed */ if (n_frames_avail >= handle->block_length) return TRUE; /* need processing */ /* calculate timeout until processing is possible or needed */ guint diff_frames = handle->block_length - n_frames_avail; *timeoutp = diff_frames * 1000 / handle->mix_freq; return FALSE; } static guint jack_device_latency (BsePcmHandle *handle) { JackPcmHandle *jack = (JackPcmHandle*) handle; jack_nframes_t rlatency = 0, wlatency = 0; g_return_val_if_fail (jack->jack_client != NULL, 0); /* FIXME: the API of this function is broken, because you can't use its result * to sync for instance the play position pointer with the screen * * why? because it doesn't return you rlatency and wlatency seperately, but * only the sum of both * * when using jack, there is no guarantee that rlatency and wlatency are equal */ for (guint i = 0; i < jack->input_ports.size(); i++) { jack_nframes_t latency = jack_port_get_total_latency (jack->jack_client, jack->input_ports[i]); rlatency = max (rlatency, latency); } for (guint i = 0; i < jack->output_ports.size(); i++) { jack_nframes_t latency = jack_port_get_total_latency (jack->jack_client, jack->output_ports[i]); wlatency = max (wlatency, latency); } guint total_latency = 2 * jack->buffer_frames + rlatency + wlatency; DEBUG ("rlatency=%.3f ms wlatency=%.3f ms ringbuffer=%.3f ms total_latency=%.3f ms", rlatency / double (handle->mix_freq) * 1000, wlatency / double (handle->mix_freq) * 1000, jack->buffer_frames / double (handle->mix_freq) * 1000, total_latency / double (handle->mix_freq) * 1000); return total_latency; } static gsize jack_device_read (BsePcmHandle *handle, gfloat *values) { JackPcmHandle *jack = (JackPcmHandle*) handle; g_return_val_if_fail (jack->jack_client != NULL, 0); g_return_val_if_fail (Atomic::int_get (&jack->callback_state) == CALLBACK_STATE_ACTIVE, 0); guint frames_left = handle->block_length; while (frames_left) { guint frames_read = jack->input_ringbuffer.read (frames_left, values); frames_left -= frames_read; values += frames_read * handle->n_channels; if (frames_left) { AutoLocker cond_locker (jack->cond_mutex); /* usually should not timeout, but be notified by the jack callback */ if (!jack->input_ringbuffer.get_readable_frames()) jack->cond.wait_timed (jack->cond_mutex, 100000); if (jack->is_down) { /* * FIXME: we need a way to indicate an error here; beast should provide * an adequate reaction in case the JACK server is down (it should stop * playing the file, and show a dialog, if JACK can not be reconnected) * * if we have a way to abort processing, this if can be moved above * the condition wait; however, right now moving it there means that * beast will render the output as fast as possible when jack dies, and * this will look like a machine lockup */ Block::fill (frames_left * handle->n_channels, values, 0.0); /* <- remove me once we can indicate errors */ return handle->block_length * handle->n_channels; } } } return handle->block_length * handle->n_channels; } static void jack_device_write (BsePcmHandle *handle, const gfloat *values) { JackPcmHandle *jack = (JackPcmHandle*) handle; g_return_if_fail (jack->jack_client != NULL); g_return_if_fail (Atomic::int_get (&jack->callback_state) == CALLBACK_STATE_ACTIVE); guint frames_left = handle->block_length; while (frames_left) { guint frames_written = jack->output_ringbuffer.write (frames_left, values); frames_left -= frames_written; values += frames_written * handle->n_channels; if (frames_left) { AutoLocker cond_locker (jack->cond_mutex); /* usually should not timeout, but be notified by the jack callback */ if (!jack->output_ringbuffer.get_writable_frames()) jack->cond.wait_timed (jack->cond_mutex, 100000); if (jack->is_down) { /* * FIXME: we need a way to indicate an error here; beast should provide * an adequate reaction in case the JACK server is down (it should stop * playing the file, and show a dialog, if JACK can not be reconnected) * * if we have a way to abort processing, this if can be moved above * the condition wait; however, right now moving it there means that * beast will render the output as fast as possible when jack dies, and * this will look like a machine lockup */ return; } } } } static void bse_pcm_device_jack_class_init (BsePcmDeviceJACKClass *klass) { GObjectClass *gobject_class = G_OBJECT_CLASS (klass); BseDeviceClass *device_class = BSE_DEVICE_CLASS (klass); parent_class = g_type_class_peek_parent (klass); gobject_class->finalize = bse_pcm_device_jack_finalize; device_class->list_devices = bse_pcm_device_jack_list_devices; const gchar *name = "jack"; const gchar *syntax = _("device"); const gchar *info = g_intern_printf (/* TRANSLATORS: keep this text to 70 chars in width */ _("JACK Audio Connection Kit PCM driver (http://jackaudio.org)\n")); /* Being BSE_RATING_PREFFERED + 1, should do the right thing: if jackd is running, * use it, if jackd isn't running (we'll get an error upon initialization, * then), use the next best driver. * * Right now we will never start the jackd process ourselves. */ bse_device_class_setup (klass, BSE_RATING_PREFERRED + 1, name, syntax, info); device_class->open = bse_pcm_device_jack_open; device_class->close = bse_pcm_device_jack_close; } Cu... Stefan -- Stefan Westerfeld, Hamburg/Germany, http://space.twc.de/~stefan _______________________________________________ beast mailing list beast@... http://mail.gnome.org/mailman/listinfo/beast |
|
|
Re: Beast Jack driver (Re: Lockfree ringbuffer review)On Thu, 1 Jun 2006, Stefan Westerfeld wrote:
> On Thu, Jun 01, 2006 at 05:14:15PM +0200, Tim Janik wrote: >> well, what's the use case for a non-float ringbuffer? > > I've just read my way through the new jack midi API, and we'll need a > ring buffer to get the midi events from the realtime thread, where they > will be delivered by jack, and our midi thread. So yes, there is a use > case, which will probably have unsigned char as data type, or even > something like struct MidiEvent. ok, that's interesting indeed. >>> Finally, if we were to optimize the JACK driver for maximum performance, >>> a more important factor would probably be that the ringbuffer even has >>> this API. >> >> sorry? what API are you talking about? >> void bse_block_copy_float (guint n_values, >> gfloat *values, >> const gfloat *ivalues); >> copying? >> >>> It means that we need to deinterleave and reinterleave the >>> data JACK supplies on the stack, and then read/write it to the >>> ringbuffer. If the ringbuffer would offer an API to access its memory >>> directly, the extra copy would disappear. But it makes the code harder >>> to maintain and for now my first priority has been getting things >>> implemented correctly. >> >> i don't quite understand this point. >> are you saying that replacing memcpy() by bse_block_copy_float() can't >> be done due to interleaving? :-O > > No. It can be replaced, and the new version has it. However, our data > flow looks like this: > > input > > jack thread process() callback supplies N separate buffers > | > V > interleave on the stack (manually implemented, in process()) *jack-thread* > | > V > copy into the input ringbuffer (input ringbuffer write) *jack-thread* > | > V > copy in bse engine supplied buffer (input ringbuffer read) *engine-thread* > | > V > [ beast processing ] *engine-thread* note that this is processing in BSE, not beast (the GUI). and BSE does deinterleave its data, process it and then reinterleave because standard audio drivers want that. for jack this doesn't make sense though, so the data flow should instead be: n-jack-buffers -> ringbuffer-write -> || || -> ringbuffer-read -> n-bse-channel-blocks [...] > and look at RingBuffer_GetWriteRegions). However if we get everything we > want - with the number of copies we're making right now - I am perfectly > happy. Every sensible bse network will eat much more cycles than we can > possibly spend in those extra Bse::Block::copy() calls. i'm not happy with this. with every unneccessary copy you're doing in the audio driver, you reduce the number of cycles the engine can spend on audio processing eventhough it needs so much, like you just stated. >>> namespace { >>> >>> /** >>> * The FrameRingBuffer class implements a ringbuffer for the communication >>> * between two threads. One thread - the producer thread - may only write >>> * data to the ringbuffer. The other thread - the consumer thread - may >>> * only read data from the ringbuffer. >>> * >>> * Given that these two threads only use the appropriate functions, no >>> * other synchronization is required to ensure that the data gets safely >>> * from the producer thread to the consumer thread. However, all operations >>> * that are provided by the ringbuffer are non-blocking, so that you may >>> * need a condition or other synchronization primitive if you want the >>> * producer and/or consumer to block if the ringbuffer is full/empty. >> >> s/if/on the event that/ ? > > I think not (but I am no native speaker). What I want to say is this: > Suppose the ringbuffer is empty. The consumer will, when calling the > read function, just read 0 frames. A common scenario is that the > consumer will want to wait until data is available. To implement this, > the consumer will need to use a synchonization mechanism (i.e. > condition). that would need the suggested phrase "on the event that" then. hoever, the scenario you describe is absolutely *not* common in our application of the ringbuffer. the BSE engine does precisely *not* want to wait on audio. it'll just come back later and read or write again. for that, it just needs to be able to figure from the driver how many samples later that would ideally be. >>> * >>> * Implementation: the synchronization between the two threads is only >>> * implemented by two index variables (read_frame_pos and write_frame_pos) >>> * for which atomic integer reads and writes are required. Since the >>> * producer thread only modifies the write_frame_pos and the consumer thread >>> * only modifies the read_frame_pos, no compare-and-swap or similar >>> * operations are needed to avoid concurrent writes. >>> */ >> >> good comment btw, i now have an idea on why/how things are supposed >> to work and can scrutinize your implementaiton accordingly ;) >> >>> template<class T> >>> class FrameRingBuffer { >>> //BIRNET_PRIVATE_COPY (FrameRingBuffer); >> >> what's this comment for? > > I wanted to call BIRNET_PRIVATE_COPY, but it didn't work, so I put // > marks there, so it compiled. So can I use BIRNET_PRIVATE_COPY or do I > need to put the code BIRNET_PRIVATE_COPY would put there if it worked > here manually? i've not seen it "not working" yet. if that is indeed the case, you need to investigate the cause. i can't say give you advice just on the notion of "it didn't work" though... ;) >>> /** >>> * Returns the maximum number of frames that the ringbuffer can contain. >>> * >>> * This function can be called from any thread. >>> */ >>> guint >>> size() const >>> { >>> // the extra element allows us to see the difference between an >>> empty/full ringbuffer >>> return (m_buffer.size() - 1) / m_elements_per_frame; >>> } >> >> ugh this is mighty confusing. >> you should get rid of this function or at least clearly rename it. >> seeing it, i had to go back and read your code all over to figure whether >> you >> used size() in every place or m_buffer.size(). >> if you need a name, call it total_n_frames(), allthough it seems this >> function >> isn't used/needed at all. > > Well, its not used for internal purposes. But its provided for the code > which uses the ringbuffer. Basically if I am implementing C++ code for > some data structure that has a size (as the ringbuffer does), then I > should also provide an accessor that can be used to determine the size. > > Just as std::vector has a resize() and size() function. i think it's still bad because its very confusing in this context. so yes, it should still be renamed. (and no, i don't think the libc or std:: APIs picked good method names in every place; consequently they may very well do you a disservice as API naming reference) >>> void >>> clear() >>> { >>> Atomic::int_set (&m_read_frame_pos, 0); >>> Atomic::int_set (&m_write_frame_pos, 0); >>> } >> >> is this really needed outside of resize() though? > > The jack driver needs it (to get into a sane state after a dropout). how are you avoiding concurrent access when clearing due to a drop out then? >> hm, don't you want review on the rest of the driver as well? >> (that'd prolly also be the part that is more interesting for Paul) > > Definitely. So here is an updated version of the ringbuffer and the > whole driver source. I think the driver already works pretty well, its > just that there are quite a few FIXMEs in there. > > As I am only posting the C++ source to the list, if anyone is so brave > and wants to compile the driver (you need beast CVS for this), I've put > the whole driver to: > > http://space.twc.de/~stefan/download/20060601-preview-bse-jack-0.7.0.tar.gz > > (sorry that it extracts to the wrong directory, but I just used make > dist...). > > Anyway, here is the C++ source (the interesting part, for the review): well, i think this is better handled by another thread. it doesn't help to make this email even longer ;) > Cu... Stefan --- ciaoTJ _______________________________________________ beast mailing list beast@... http://mail.gnome.org/mailman/listinfo/beast |
|
|
bsepcmdevice-jack.cc (Re: Beast Jack driver)On Thu, 1 Jun 2006, Stefan Westerfeld wrote:
> Anyway, here is the C++ source (the interesting part, for the review): > > /* BSE - Bedevilled Sound Engine > * Copyright (C) 2004 Tim Janik > * Copyright (C) 2006 Stefan Westerfeld > * > * This program is free software; you can redistribute it and/or > * modify it under the terms of the GNU Lesser General Public > * License as published by the Free Software Foundation; either > * version 2 of the License, or (at your option) any later version. > * > * This program is distributed in the hope that it will be useful, > * but WITHOUT ANY WARRANTY; without even the implied warranty of > * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > * Lesser General Public License for more details. > * > * You should have received a copy of the GNU Lesser General > * Public License along with this program; if not, write to the > * Free Software Foundation, Inc., 59 Temple Place, Suite 330, > * Boston, MA 02111-1307, USA. > */ > #include "configure.h" > #include "bsepcmdevice-jack.hh" > #include <bse/bseblockutils.hh> > #include <bse/gsldatautils.h> > #include <jack/jack.h> > #include <string.h> > #include <errno.h> > #include <signal.h> > #include <set> > #include <string> > #include <map> > #include <vector> > > using namespace Birnet; > using std::vector; > using std::set; > using std::map; > using std::string; > using std::min; > using std::max; > using std::copy; > using Bse::Block; > > namespace { > > /** > * This function uses std::copy to copy the n_values of data from ivalues > * to ovalues. If a specialized version is available in bseblockutils, > * then this - usually faster - version will be used. well, you could even say "suppsoed to be faster", because if the blockutils version is indeed slower, that's worthy of a bug report. > * > * The data in ivalues and ovalues may not overlap. > */ > template<class Data> void > fast_copy (guint n_values, > Data *ovalues, > const Data *ivalues) > { > copy (ivalues, ivalues + n_values, ovalues); > } > > template<> void > fast_copy (guint n_values, > float *ovalues, > const float *ivalues) > { > Block::copy (n_values, ovalues, ivalues); > } > > template<> void > fast_copy (guint n_values, > guint32 *ovalues, > const guint32 *ivalues) > { > Block::copy (n_values, ovalues, ivalues); > } > > /** > * The FrameRingBuffer class implements a ringbuffer for the communication [ring buffer code that has already been reviewed...] the ring buffer code is probably best split off into an extra tcc file, especially if chances are that you need it in the MIDI driver as well. > static BIRNET_MSG_TYPE_DEFINE (debug_pcm, "pcm", BIRNET_MSG_DEBUG, NULL); > #define DEBUG(...) sfi_debug (debug_pcm, __VA_ARGS__) > > /* --- JACK PCM handle --- */ there should be an explanatory comment in this file on the jack thread and when it enters what states, etc. > enum > { > CALLBACK_STATE_INACTIVE = 0, > CALLBACK_STATE_ACTIVE = 1, > CALLBACK_STATE_PLEASE_TERMINATE = 2, > CALLBACK_STATE_TERMINATED = 3 minor cosmetic, we usually align '='-symbols in enums. > }; you should provide a type name for this enum, which can be used above to tell apart valid states of your callback_state variable. > > struct JackPcmHandle > { > BsePcmHandle handle; > JackPcmHandle (jack_client_t *jack_client) > : jack_client (jack_client) cosmetic, the ':' usually goes next to the ')', i.e.: JackPcmHandle (jack_client_t *jack_client) : jack_client (jack_client) {} > { > memset (&handle, 0, sizeof (handle)); > callback_state = ixruns = pixruns = oxruns = poxruns = 0; > is_down = false; hm, afaik, it's better style to intiialize all these after JackPcmHandle() : > } > vector<jack_port_t *> input_ports; > vector<jack_port_t *> output_ports; > > guint buffer_frames; /* input/output ringbuffer size in frames */ > > Cond cond; > Mutex cond_mutex; erk, what exactly is protected by this cond and this mutex? something like this definitely needs to be outlined in code (in comments or by naming prefixes, look for BirnetMutex in bseengineutils.c to see examples for that) > jack_client_t *jack_client; > FrameRingBuffer<float> input_ringbuffer; > FrameRingBuffer<float> output_ringbuffer; > > int callback_state; ^^^^^^^^^^ proper typing is missing here, you may take a look at bsemidireceiver.cc and search for "enum" to find examples for internally used enums. > int ixruns; > int oxruns; > int pixruns; > int poxruns; > > bool is_down; > }; > > } > > /* --- prototypes --- */ > static void bse_pcm_device_jack_class_init (BsePcmDeviceJACKClass *klass); > static void bse_pcm_device_jack_init (BsePcmDeviceJACK *self); > static gsize jack_device_read (BsePcmHandle *handle, > gfloat *values); > static void jack_device_write (BsePcmHandle *handle, > const gfloat *values); > static gboolean jack_device_check_io (BsePcmHandle *handle, > glong *tiumeoutp); > static guint jack_device_latency (BsePcmHandle *handle); > static void jack_device_retrigger (JackPcmHandle *jack); > > /* --- define object type and export to BSE --- */ > static const char type_blurb[] = ("PCM driver implementation for JACK Audio Connection Kit " > "(http://jackaudio.org)"); > BSE_REGISTER_OBJECT (BsePcmDeviceJACK, BsePcmDevice, NULL, "", type_blurb, NULL, bse_pcm_device_jack_class_init, NULL, bse_pcm_device_jack_init); hm, i guess this could be nicer as BsePcmDeviceJack, espcially for the class name BsePcmDeviceJackClass. allthoguh JACK is often upper cased like the abbreviation BSE, we also lower case it to Bse in namespaces for readability. and the http://jackaudio.org/ also has a few examples of "Jack" ;) > BSE_DEFINE_EXPORTS (__FILE__); > > /* --- variables --- */ > static gpointer parent_class = NULL; hm, i should fix BSE_REGISTER_OBJECT() to act like G_DEFINE_TYPE and provide a parent_class for you. > > /* --- functions --- */ > static void > bse_pcm_device_jack_init (BsePcmDeviceJACK *self) > { > /* FIXME: move signal handling somewhere else */ needs to be moved into birnetthread.c in particular. > sigset_t signal_mask; > sigemptyset (&signal_mask); > sigaddset (&signal_mask, SIGPIPE); > > int rc = pthread_sigmask (SIG_BLOCK, &signal_mask, NULL); > if (rc != 0) > g_printerr ("jack driver: pthread_sigmask for SIGPIPE failed: %s\n", strerror (errno)); this is what you'd usually use g_warning() for, since it's unexpected. > > jack_status_t status; > > self->jack_client = jack_client_open ("beast", JackNoStartServer, &status); /* FIXME: translations(?) */ what's this "beast" being used for? (especially since you quesiton translation) e.g. if it's similar to window titles, it should be "BEAST", if it's going to be used as a program name in continueous text, it should prolly be _("Beast") and if its meant as a technical identifier (e.g. for serialization), it should be "beast". maybe Paul can shed light on this? > DEBUG ("attaching to JACK server returned status: %d\n", status); > /* FIXME: check status for proper error reporting doesn't teh jack API have something like: const char* jack_status_to_string (jack_status_t status); ? > * FIXME: what about server name? (necessary if more than one jackd instance is running) > * FIXME: get rid of device unloading, so that the jackd connection can be kept initialized altering the device unloading will have to be done differently, the attempt to connect to the jack server will have to be made much earlier in that case. (in other words, i don't think you need this FIXME in this place, altering the driver loading + startup logic will be an entirely seperate undertaking) > * > * maybe move this to a different function that can be done once needed. > */ > } > > namespace { > struct DeviceDetails { > guint ports, input_ports, output_ports, physical_ports, terminal_ports; > vector<string> input_port_names; > vector<string> output_port_names; > DeviceDetails() : ports (0), input_ports (0), output_ports (0), physical_ports (0), terminal_ports (0) please put a newline after the ':' > { > } > }; > } > > static map<string, DeviceDetails> > query_jack_devices (BsePcmDeviceJACK *self) > { > map<string, DeviceDetails> devices; > > /* FIXME: we need to make a difference here between > * - jackd is running, but no output ports can be found curious, why exactly could that be the case? > * - jackd is not running that should/will actually be determined by jack_client_open(), right? > */ > const char **jack_ports = self->jack_client ? jack_get_ports (self->jack_client, NULL, NULL, 0) : NULL; > if (jack_ports) > { > for (guint i = 0; jack_ports[i]; i++) > { > const char *end = strchr (jack_ports[i], ':'); > if (end) > { > string device_name (jack_ports[i], end); hm, please don't do "using std::string;" (further above) but instead use std::string or just String which is provided by using Birnet. > DeviceDetails &details = devices[device_name]; > details.ports++; why are you incrementing here already, if the port could possibly not be looked up by name? > > jack_port_t *jack_port = jack_port_by_name (self->jack_client, jack_ports[i]); > if (jack_port) > { > int flags = jack_port_flags (jack_port); > if (flags & JackPortIsInput) > { > details.input_ports++; > details.input_port_names.push_back (jack_ports[i]); > } > if (flags & JackPortIsOutput) > { > details.output_ports++; > details.output_port_names.push_back (jack_ports[i]); > } > if (flags & JackPortIsPhysical) > details.physical_ports++; > if (flags & JackPortIsTerminal) > details.terminal_ports++; why are you counting physical_ports and terminal_ports here, regardless of whether the port in question is an input, output or neither port? and, if this is purely statistical/informative counting, why are you only counting ports that can be looked up by name? /me scratches head... > } > } > } > free (jack_ports); > } > > return devices; ugh, isn't this a full fledged recursive map+strings+DeviceDetails copy? > } > > static SfiRing* > bse_pcm_device_jack_list_devices (BseDevice *device) > { > BsePcmDeviceJACK *self = BSE_PCM_DEVICE_JACK (device); > map<string, DeviceDetails> devices = query_jack_devices (self); > > SfiRing *ring = NULL; > for (map<string, DeviceDetails>::iterator di = devices.begin(); di != devices.end(); di++) > { > const string &device_name = di->first; > DeviceDetails &details = di->second; > BseDeviceEntry *entry; > entry = bse_device_group_entry_new (device, g_strdup (device_name.c_str()), > g_strdup ("JACK Devices"), > g_strdup_printf ("%s (%d*input %d*output)%s", > device_name.c_str(), > details.input_ports, > details.output_ports, > details.physical_ports == details.ports ? > " [ hardware ]" : "")); hm, can you please elaborate the details.physical_ports == details.ports ? " [ hardware ]" : "" condition for me? ;) > > /* ensure that alsa_pcm always is the first item listed (it is the default device) */ hm, doesn't jack provide some kind of rating for the devices to pick? > if (device_name == "alsa_pcm") > ring = sfi_ring_prepend (ring, entry); > else > ring = sfi_ring_append (ring, entry); > } > > if (!ring) > ring = sfi_ring_append (ring, bse_device_error_new (device, g_strdup_printf ("No devices found"))); > return ring; > } > > static void > jack_shutdown_callback (void *jack_handle_ptr) > { > JackPcmHandle *jack = static_cast <JackPcmHandle *> (jack_handle_ptr); > > AutoLocker cond_locker (jack->cond_mutex); > jack->is_down = true; > jack->cond.signal(); > } > > static int this misses a comment on the purpose of the return value. > jack_process_callback (jack_nframes_t n_frames, this, along with most other functions, should have a comment that says which thread executes the function. e.g. like the /* UserThread */ and /* EngineThread */ comments in bsepcmmodule.c (or possibly even more elaborate) > void *jack_handle_ptr) > { > typedef jack_default_audio_sample_t sample_t; we use mixed case typenames, so this should be: typedef jack_default_audio_sample_t Sample; or: typedef jack_default_audio_sample_t SampleType; > > JackPcmHandle *jack = static_cast <JackPcmHandle *> (jack_handle_ptr); > > int callback_state = Atomic::int_get (&jack->callback_state); hm, callback_state is not declared as volatile above... we had this in the ringbuffer already. > > /* still waiting for initialization to complete */ > if (callback_state == CALLBACK_STATE_ACTIVE) > { > bool have_cond_lock = jack->cond_mutex.trylock(); > > guint n_channels = jack->handle.n_channels; > float buffer[n_frames * n_channels]; > > if (jack->handle.readable) > { > /* interleave input data for processing in the engine thread */ > g_assert (jack->input_ports.size() == n_channels); > > for (guint ch = 0; ch < n_channels; ch++) > { > sample_t *values = (sample_t *) jack_port_get_buffer (jack->input_ports[ch], n_frames); > float *p = &buffer[ch]; > for (guint i = 0; i < n_frames; i++) > { > *p = *values++; > p += n_channels; > } > } > > guint frames_written = jack->input_ringbuffer.write (n_frames, buffer); > if (frames_written != n_frames) > Atomic::int_add (&jack->ixruns, 1); /* input underrun detected */ > } > if (jack->handle.writable) > { > guint read_frames = jack->output_ringbuffer.read (n_frames, buffer); > if (read_frames != n_frames) > { > Atomic::int_add (&jack->oxruns, 1); /* output underrun detected */ was declared non-volatile oxruns. > Block::fill ((n_frames - read_frames) * n_channels, &buffer[read_frames * n_channels], 0.0); > } > > /* deinterleave output data for processing in the engine thread */ > g_assert (jack->output_ports.size() == n_channels); > > for (guint ch = 0; ch < n_channels; ch++) > { > sample_t *values = (sample_t *) jack_port_get_buffer (jack->output_ports[ch], n_frames); > sample_t *values_end = values + n_frames; > float *p = &buffer[ch]; > while (values < values_end) > { > *values++ = *p; > p += n_channels; > } > } > } > > /* commanets shouldn't start with an empty line unless its a comment for the documentation system. > * as we can't (always) lock our data structures from the jack realtime > * thread, using a condition introduces the possibility for a race: this type of comment should better be reworded in a general comment on how things are supposed to work and communicate. then it can go before the declaration of BsePcmDeviceJACK. > * > * normal operation: > * > * [bse thread] writes some data to output_ringbuffer > * [bse thread] checks remaining space, there is no room left > * [bse thread] sleeps on the condition BSE never sleeps, waiting for audio drivers. take a look at alsa_device_write() and alsa_device_check_io() in bsepcmdevice-alsa.c. i.e. allow full buffer writes if at all possible and provide timeout information on how long it takes for enough buffer space to be available. that's the kind of semantic that needs to be implemented. > * [jack thread] reads some data from output_ringbuffer > * [jack thread] signals the condition > * [bse thread] continues to write some data to output_ringbuffer basically, the above can work if you s/sleeps on the condition/continues with audio processing/. BSE can do that, once it has the required timeout information. > * > * race condition: > * > * [bse thread] writes some data to output_ringbuffer > * [bse thread] checks remaining space, there is no room left > * [jack thread] reads some data from output_ringbuffer > * [jack thread] signals the condition > * [bse thread] sleeps on the condition such races are only possible when not using conditions properly. you always have to follow the sequence: lock(); check(); wait(); unlock(); and on the pther side: lock(); update(); signal(); unlock(); this ensures that no update/signal goes by unnoticed by the check. see: http://developer.gnome.org/doc/API/2.0/glib/glib-Threads.html#GCond > * > * since the jack callback gets called periodically, the bse thread will > * never starve, though, in the worst case it will just miss a frame > * > * so we absolutely exclude the possibility for priority inversion (by > * using trylock instead of lock), by introducing extra latency in > * some extremely rare cases apart from excessive newlines ;) this argumentation is faulty. think of jack exiting after you hit your signal-lost period. BSE will hang forever, unacceptable. > */ > if (have_cond_lock) > jack->cond_mutex.unlock(); > } > else > { when exactly is this branch tirggered? (lost track due to the excessive inlined commenting above.) isn't this the !trylock() case? the purpose of having an atomic ringbuffer in the first place was to not be forced into using lock/trylock/unlock in the jack thread. so this if-branch shouldn't occour in the implementaiton at all. as an aside, if we would be willing to pay lock/unlock in the jack thread, we could as well pass around list nodes that maintain audio buffers. you could get entirely rid fo the value copying in the ring buffer that way... > for (guint ch = 0; ch < jack->input_ports.size(); ch++) > { > sample_t *values = (sample_t *) jack_port_get_buffer (jack->input_ports[ch], n_frames); > Block::fill (n_frames, values, 0.0); > } > for (guint ch = 0; ch < jack->output_ports.size(); ch++) > { > sample_t *values = (sample_t *) jack_port_get_buffer (jack->output_ports[ch], n_frames); > Block::fill (n_frames, values, 0.0); > } > } > jack->cond.signal(); ah, here's the racy signalling *outside* of the mutex lock being held... > > if (callback_state == CALLBACK_STATE_PLEASE_TERMINATE) > { > Atomic::int_set (&jack->callback_state, CALLBACK_STATE_TERMINATED); > return 1; > } > return 0; this misses a comment on the purpose of the return value. > } > > static void > terminate_and_free_jack (JackPcmHandle *jack) > { > g_return_if_fail (jack->jack_client != NULL); > > Atomic::int_set (&jack->callback_state, CALLBACK_STATE_PLEASE_TERMINATE); > while (Atomic::int_get (&jack->callback_state) == CALLBACK_STATE_PLEASE_TERMINATE) > { > AutoLocker cond_locker (jack->cond_mutex); > > if (jack->is_down) /* if jack is already gone, then forget it */ > Atomic::int_set (&jack->callback_state, CALLBACK_STATE_TERMINATED); > else > jack->cond.wait_timed (jack->cond_mutex, 100000); > } is the following for real? you disconnect the ports *after* the connection is terminated? or, is "terminated" wrongly worded here? > > for (guint ch = 0; ch < jack->input_ports.size(); ch++) > jack_port_disconnect (jack->jack_client, jack->input_ports[ch]); > > for (guint ch = 0; ch < jack->output_ports.size(); ch++) > jack_port_disconnect (jack->jack_client, jack->output_ports[ch]); > > jack_deactivate (jack->jack_client); > > delete jack; > } > > static BseErrorType > bse_pcm_device_jack_open (BseDevice *device, > gboolean require_readable, > gboolean require_writable, > guint n_args, > const gchar **args) > { > BsePcmDeviceJACK *self = BSE_PCM_DEVICE_JACK (device); > if (!self->jack_client) > return BSE_ERROR_FILE_OPEN_FAILED; > > JackPcmHandle *jack = new JackPcmHandle (self->jack_client); > BsePcmHandle *handle = &jack->handle; > > handle->readable = require_readable; > handle->writable = require_writable; > handle->n_channels = BSE_PCM_DEVICE (device)->req_n_channels; > > /* try setup */ > BseErrorType error = BSE_ERROR_NONE; > > const char *dname = (n_args == 1) ? args[0] : "alsa_pcm"; > handle->mix_freq = jack_get_sample_rate (self->jack_client); > > Atomic::int_set (&jack->callback_state, CALLBACK_STATE_INACTIVE); > > for (guint i = 0; i < handle->n_channels; i++) > { > const int port_name_size = jack_port_name_size(); > char port_name[port_name_size]; > jack_port_t *port; > > snprintf (port_name, port_name_size, "in_%u", i); please use g_snprintf() instead. > port = jack_port_register (self->jack_client, port_name, JACK_DEFAULT_AUDIO_TYPE, JackPortIsInput, 0); what is JACK_DEFAULT_AUDIO_TYPE? shouldn't we request floats here (i dunno the jack API though). > if (port) > jack->input_ports.push_back (port); > else > error = BSE_ERROR_FILE_OPEN_FAILED; > > snprintf (port_name, port_name_size, "out_%u", i); > port = jack_port_register (self->jack_client, port_name, JACK_DEFAULT_AUDIO_TYPE, JackPortIsOutput, 0); > if (port) > jack->output_ports.push_back (port); > else > error = BSE_ERROR_FILE_OPEN_FAILED; > } > > /* activate */ > > jack_set_process_callback (self->jack_client, jack_process_callback, jack); > jack_on_shutdown (self->jack_client, jack_shutdown_callback, jack); > jack_activate (self->jack_client); you're activating here without checking for if (error)? > > /* connect ports */ > > map<string, DeviceDetails> devices = query_jack_devices (self); querying the devices could have been done before registering the ports above. > map<string, DeviceDetails>::const_iterator di; > > di = devices.find (dname); > if (di != devices.end()) > { > const DeviceDetails &details = di->second; > > for (guint ch = 0; ch < handle->n_channels; ch++) > { > if (details.output_ports > ch) > jack_connect (self->jack_client, details.output_port_names[ch].c_str(), jack_port_name (jack->input_ports[ch])); > if (details.input_ports > ch) > jack_connect (self->jack_client, jack_port_name (jack->output_ports[ch]), details.input_port_names[ch].c_str()); > } > } you're not setting error in case you failed to find the required devices to connect the channels. > > /* setup buffer size */ > > // keep at least two jack callback sizes for dropout free audio > guint min_buffer_frames = jack_get_buffer_size (self->jack_client) * 2; > > // keep an extra engine buffer size (this compensates also for cases where the > // engine buffer size is not a 2^N value, which would otherwise cause the yeah, engine buffers are usually not 2^n sizes. see bse_engine_constrain() in bseengine.c. > // buffer never to be fully filled with 2 periods of data) "periods"? do you mean "elements" or "frames" here? (using your ringbuffer terminology) or do you mean buffer sizes here? jack buffers or bse buffers? > min_buffer_frames += BSE_PCM_DEVICE (device)->req_block_length; couldn't you instead just align min_buffer_frames to 2 * bse buffer size, to still fullfil the requirements you outlined, and yet yield a smaller ring buffer (thus optimize latency)? > > // honor the user defined latency specification > // > // FIXME: should we try to tweak local buffering so that it corresponds > // to the user defined latency, or global buffering (including the jack > // buffer)? we do the former, since it is easier we should do the latter. that shouldn't be too hard, Paul said that you could query (and inform) jack about the latency on a port. that then just needs to be subtracted from BSE_PCM_DEVICE (device)->req_latency_ms. > guint user_buffer_frames = BSE_PCM_DEVICE (device)->req_latency_ms * handle->mix_freq / 1000; > guint buffer_frames = max (min_buffer_frames, user_buffer_frames); > > jack->input_ringbuffer.resize (buffer_frames, handle->n_channels); > jack->output_ringbuffer.resize (buffer_frames, handle->n_channels); the way this is written, you're doubling the latency because you apply it to the input and the output buffer. you should just apply it to the output buffer (and also subtract the input buffer latency before doing that) > jack->buffer_frames = jack->output_ringbuffer.get_writable_frames(); what is jack->buffer_frames good for? and why is that different from total_n_frames() here? > > // the ringbuffer should be exactly as big as requested > g_assert (jack->buffer_frames == buffer_frames); > DEBUG ("ringbuffer size = %.3fms", jack->buffer_frames / double (handle->mix_freq) * 1000); hm, you should better use driver prefixing in DEBUG() statements like the ALSA driver does it, i.e.: DEBUG ("JACK: ringbuffer size = %.3fms", jack->buffer_frames / double (handle->mix_freq) * 1000); of course that applies to all your DEBUG() calls. > > /* setup PCM handle or shutdown */ > if (!error) > { > bse_device_set_opened (device, dname, handle->readable, handle->writable); > if (handle->readable) > handle->read = jack_device_read; > if (handle->writable) > handle->write = jack_device_write; > handle->check_io = jack_device_check_io; > handle->latency = jack_device_latency; > BSE_PCM_DEVICE (device)->handle = handle; > > /* will prevent dropouts at initialization, when no data is there at all */ why would this "prevent dropouts"? the jack thread got started already, so a dropout could possibly have occoured _already_. granted, retriggering may be neccessary, but the comment doesn't make sense to me here. > jack_device_retrigger (jack); > jack_device_latency (handle); // debugging only: print latency values > } > else > { > terminate_and_free_jack (jack); > } > DEBUG ("JACK: opening PCM \"%s\" readupble=%d writable=%d: %s", dname, require_readable, require_writable, bse_error_blurb (error)); > > return error; > } > > static void > bse_pcm_device_jack_close (BseDevice *device) > { > JackPcmHandle *jack = (JackPcmHandle*) BSE_PCM_DEVICE (device)->handle; > BSE_PCM_DEVICE (device)->handle = NULL; > > terminate_and_free_jack (jack); > } > > static void > bse_pcm_device_jack_finalize (GObject *object) > { > BsePcmDeviceJACK *self = BSE_PCM_DEVICE_JACK (object); > if (self->jack_client) > { > jack_client_close (self->jack_client); > self->jack_client = NULL; > } > > /* chain parent class' handler */ > G_OBJECT_CLASS (parent_class)->finalize (object); > } > > static void > jack_device_retrigger (JackPcmHandle *jack) > { > g_return_if_fail (jack->jack_client != NULL); > > /* extraneous newline at coment start. > * we need to reset the active flag to false here, as we modify the > * buffers in a non threadsafe way; this is why we also wait for > * the condition, to ensure that the jack callback really isn't > * active any more > */ > Atomic::int_set (&jack->callback_state, CALLBACK_STATE_INACTIVE); > > /* usually should not timeout, but be notified by the jack callback */ > jack->cond_mutex.lock(); > if (!jack->is_down) > jack->cond.wait_timed (jack->cond_mutex, 100000); > jack->cond_mutex.unlock(); hm, is_down is set by the jack thread onkly volountarily, i.e. if it exits prematurely, you get the stale BSE thread i mntioned above. > > /* jack_ringbuffer_reset is not threadsafe! */ yeah, i didn't mention this in the ringbuffer review, but i'd agree that "reset" would be a better name than "clear" for the resetting of the pointers. > jack->input_ringbuffer.clear(); > jack->output_ringbuffer.clear(); > > /* initialize output ringbuffer with silence */ > vector<float> silence (jack->buffer_frames * jack->handle.n_channels); this is expensive, you can simply use bse_engine_const_zeros() which provides at least BSE_STREAM_MAX_VALUES many values in constant time. and yes, this should be fixed in the ALSA driver as well. > guint frames_written = jack->output_ringbuffer.write (jack->buffer_frames, &silence[0]); > g_assert (frames_written == jack->buffer_frames); > DEBUG ("jack_device_retrigger: %d frames written", frames_written); hm, at the end of retrigger, the jack thread should go ACTIVE. just like we call snd_pcm_prepare() in bsepcmdevice-alsa.c:alsa_device_retrigger(). > } > > static gboolean > jack_device_check_io (BsePcmHandle *handle, > glong *timeoutp) > { > JackPcmHandle *jack = (JackPcmHandle*) handle; > g_return_val_if_fail (jack->jack_client != NULL, false); > > /* > int ixruns = Atomic::int_get (&jack->ixruns); > int oxruns = Atomic::int_get (&jack->oxruns); > > if (jack->poxruns != oxruns || jack->pixruns != ixruns) > { > printf ("beast caused jack input xruns %d, output xruns %d\n", ixruns, oxruns); debugging prints (if you already failed to use DEBUG()) should go to stderr, otherwise they cause major breakage in programs that generate data on stdout (of which we have quite bunch in CVS). > jack->poxruns = oxruns; > jack->pixruns = ixruns; > > jack_device_retrigger (jack); > } > */ > > /* (FIXME?) > * > * we can not guarantee that beast will read the data from the jack ringbuffer, > * since this depends on the presence of a module that actually reads data in > * the synthesis graph : so we simply ignore the ixruns variable; I am not sure > * if thats the right thing to do, though sounds reasonably at first glance. > */ > int oxruns = Atomic::int_get (&jack->oxruns); > > if (jack->poxruns != oxruns) > { > g_printerr ("%d beast jack driver xruns\n", oxruns); > jack->poxruns = oxruns; > > jack_device_retrigger (jack); you shouldn't need to retrigger after xruns have occoured. it has already happened and extra padding was provided already. if you do this, you just make sure that there really is a loud and noticable click in the output (while a simply 1-block xrun often doesn't produce too bad artefacts) > } > > Atomic::int_set (&jack->callback_state, CALLBACK_STATE_ACTIVE); this should have gone into retrigger, see alsa_device_retrigger() and alsa_device_check_io(). > > guint n_frames_avail = min (jack->output_ringbuffer.get_writable_frames(), jack->input_ringbuffer.get_readable_frames()); > > /* check whether data can be processed */ > if (n_frames_avail >= handle->block_length) > return TRUE; /* need processing */ > > /* calculate timeout until processing is possible or needed */ > guint diff_frames = handle->block_length - n_frames_avail; > *timeoutp = diff_frames * 1000 / handle->mix_freq; > return FALSE; > } > > static guint > jack_device_latency (BsePcmHandle *handle) > { > JackPcmHandle *jack = (JackPcmHandle*) handle; > jack_nframes_t rlatency = 0, wlatency = 0; > > g_return_val_if_fail (jack->jack_client != NULL, 0); > > /* FIXME: the API of this function is broken, because you can't use its result > * to sync for instance the play position pointer with the screen > * > * why? because it doesn't return you rlatency and wlatency seperately, but > * only the sum of both > * > * when using jack, there is no guarantee that rlatency and wlatency are equal good point. and more importantly, for song pointer syncs, we want to use just the wlatency. so for a first fix, we can change oss_device_latency() and alsa_device_latency() to just return the wlatency. we can later improve the API to also support rlatency queries. > */ > for (guint i = 0; i < jack->input_ports.size(); i++) > { > jack_nframes_t latency = jack_port_get_total_latency (jack->jack_client, jack->input_ports[i]); > rlatency = max (rlatency, latency); > } > > for (guint i = 0; i < jack->output_ports.size(); i++) > { > jack_nframes_t latency = jack_port_get_total_latency (jack->jack_client, jack->output_ports[i]); > wlatency = max (wlatency, latency); > } > > guint total_latency = 2 * jack->buffer_frames + rlatency + wlatency; ah, this is where buffer_frames gets used? can't you simply use jack->input_ringbuffer.n_total_frames() + jack->output_ringbuffer.n_total_frames(). (also be more accurate if the buffers have different lengths, e.g. the input buffer could possibly be shorter than the output buffer). > DEBUG ("rlatency=%.3f ms wlatency=%.3f ms ringbuffer=%.3f ms total_latency=%.3f ms", > rlatency / double (handle->mix_freq) * 1000, > wlatency / double (handle->mix_freq) * 1000, > jack->buffer_frames / double (handle->mix_freq) * 1000, > total_latency / double (handle->mix_freq) * 1000); > return total_latency; > } > > static gsize > jack_device_read (BsePcmHandle *handle, > gfloat *values) > { > JackPcmHandle *jack = (JackPcmHandle*) handle; > g_return_val_if_fail (jack->jack_client != NULL, 0); > g_return_val_if_fail (Atomic::int_get (&jack->callback_state) == CALLBACK_STATE_PACTIVE, 0); > > guint frames_left = handle->block_length; > while (frames_left) > { > guint frames_read = jack->input_ringbuffer.read (frames_left, values); > > frames_left -= frames_read; > values += frames_read * handle->n_channels; > > if (frames_left) > { > AutoLocker cond_locker (jack->cond_mutex); > > /* usually should not timeout, but be notified by the jack callback */ > if (!jack->input_ringbuffer.get_readable_frames()) > jack->cond.wait_timed (jack->cond_mutex, 100000); we can't afford to wait on jack here. engine locks have to be super-short (examples can be found in bseengineutils.c). i said this above already, we can't actually afford locking at all, which is why we have an atomic ringbuffer in the first place. > > if (jack->is_down) > { > /* extraneous newline at comment start. > * FIXME: we need a way to indicate an error here; beast should provide > * an adequate reaction in case the JACK server is down (it should stop > * playing the file, and show a dialog, if JACK can not be reconnected) we don't need error indication here (see alsa_device_read()). it's good enough if we return gracefully here. error checking code (and returns) can be executed by check_io() and/or retrigger(). > * > * if we have a way to abort processing, this if can be moved above > * the condition wait; however, right now moving it there means that > * beast will render the output as fast as possible when jack dies, and > * this will look like a machine lockup no, endless loops don't look like lockups ;) in the first case, your CPU meter runs at 100% and toggling Num-lock/etc. keys still work (usually also the X server etc.) while in the latter case *nothing* goes anymore. > */ > Block::fill (frames_left * handle->n_channels, values, 0.0); /* <- remove me once we can indicate errors */ nope should stay (the same way alsa_device_read() copeswith errors). > return handle->block_length * handle->n_channels; > } > } > } > return handle->block_length * handle->n_channels; > } > > static void > jack_device_write (BsePcmHandle *handle, > const gfloat *values) > { > JackPcmHandle *jack = (JackPcmHandle*) handle; > g_return_if_fail (jack->jack_client != NULL); > g_return_if_fail (Atomic::int_get (&jack->callback_state) == CALLBACK_STATE_ACTIVE); > > guint frames_left = handle->block_length; > > while (frames_left) > { > guint frames_written = jack->output_ringbuffer.write (frames_left, values); > > frames_left -= frames_written; > values += frames_written * handle->n_channels; > > if (frames_left) > { > AutoLocker cond_locker (jack->cond_mutex); > > /* usually should not timeout, but be notified by the jack callback */ > if (!jack->output_ringbuffer.get_writable_frames()) > jack->cond.wait_timed (jack->cond_mutex, 100000); > > if (jack->is_down) > { > /* > * FIXME: we need a way to indicate an error here; beast should provide > * an adequate reaction in case the JACK server is down (it should stop > * playing the file, and show a dialog, if JACK can not be reconnected) > * > * if we have a way to abort processing, this if can be moved above > * the condition wait; however, right now moving it there means that > * beast will render the output as fast as possible when jack dies, and > * this will look like a machine lockup > */ hrm, /* FIXME: see jack_device_read() */ would have been easier here. my comments there also apply here of course. > return; > } > } > } > } > > static void > bse_pcm_device_jack_class_init (BsePcmDeviceJACKClass *klass) > { > GObjectClass *gobject_class = G_OBJECT_CLASS (klass); > BseDeviceClass *device_class = BSE_DEVICE_CLASS (klass); > > parent_class = g_type_class_peek_parent (klass); > > gobject_class->finalize = bse_pcm_device_jack_finalize; > > device_class->list_devices = bse_pcm_device_jack_list_devices; > const gchar *name = "jack"; > const gchar *syntax = _("device"); "device" is too short of a syntax description for users. i don't know jack devices, but i suppose you need to say something like _("JACK_DEVICE_NAME") here. again, see how the alsa driver tries to guide the user here. > const gchar *info = g_intern_printf (/* TRANSLATORS: keep this text to 70 chars in width */ > _("JACK Audio Connection Kit PCM driver (http://jackaudio.org)\n")); jack version printing? again, see what the alsa driver does here. > > /* Being BSE_RATING_PREFFERED + 1, should do the right thing: if jackd is running, > * use it, if jackd isn't running (we'll get an error upon initialization, > * then), use the next best driver. > * > * Right now we will never start the jackd process ourselves. > */ > bse_device_class_setup (klass, BSE_RATING_PREFERRED + 1, name, syntax, info); > device_class->open = bse_pcm_device_jack_open; > device_class->close = bse_pcm_device_jack_close; > } > > Cu... Stefan for a few closing words: first, thanks for starting out on the implementation of a jack driver for BSE. the task in itself is definitely non-trivial. on the first draft, it looks like the driver still needs a lot of work to be basically usable. some portions already look good, but in others... it really was disappointing for me, how much of perfectly good template code / logic provided by bsepcmdevice-alsa.c simply got lost or ignored in the transition to a jack driver. re-doing these just adds up to the list of TODOs. the single most important issue that has to be fixed here is the mutex and condition. neither the BSE nor the JACK thread can afford acquiring it and wait for the other thread. that's why we need a ringbuffer based only on atomic ops. we got that now, so the next step is to modify the code to use it up to its fullest potential. basically, to get there, the jack thread needs to be modified to act more similarl to e.g. the kernel ALSA driver wrg to xruns. the kernel driver also has fixed-size ring buffers and still deals with drop outs perfectly well. that kind of API semantic has to be provided for the BSE thread in the bse-alsa driver. then, the implementations of check_io(), retrigger(), read() and write() can be much closer to the bse-alsa driver implementation. also, this will eliminate the need for the mutex and the condition. --- ciaoTJ _______________________________________________ beast mailing list beast@... http://mail.gnome.org/mailman/listinfo/beast |
|
|
Re: Beast Jack driver (Re: Lockfree ringbuffer review) Hi!
I am still working on a new version of the code, where I merge your comments, so I'll just reply to points I can immediately reply, and post new code when done. On Thu, Jun 01, 2006 at 10:53:31PM +0200, Tim Janik wrote: > >>>Finally, if we were to optimize the JACK driver for maximum performance, > >>>a more important factor would probably be that the ringbuffer even has > >>>this API. > >> > >>sorry? what API are you talking about? > >>void bse_block_copy_float (guint n_values, > >> gfloat *values, > >> const gfloat *ivalues); > >>copying? > >> > >>>It means that we need to deinterleave and reinterleave the > >>>data JACK supplies on the stack, and then read/write it to the > >>>ringbuffer. If the ringbuffer would offer an API to access its memory > >>>directly, the extra copy would disappear. But it makes the code harder > >>>to maintain and for now my first priority has been getting things > >>>implemented correctly. > >> > >>i don't quite understand this point. > >>are you saying that replacing memcpy() by bse_block_copy_float() can't > >>be done due to interleaving? :-O > > > >No. It can be replaced, and the new version has it. However, our data > >flow looks like this: > > > > input > > > >jack thread process() callback supplies N separate buffers > > | > > V > >interleave on the stack (manually implemented, in process()) > >*jack-thread* > > | > > V > >copy into the input ringbuffer (input ringbuffer write) > >*jack-thread* > > | > > V > >copy in bse engine supplied buffer (input ringbuffer read) > >*engine-thread* > > | > > V > > >[ beast processing ] > >*engine-thread* > > note that this is processing in BSE, not beast (the GUI). > and BSE does deinterleave its data, process it and then reinterleave > because standard audio drivers want that. for jack this doesn't > make sense though, so the data flow should instead be: > n-jack-buffers -> ringbuffer-write -> || > || -> ringbuffer-read -> n-bse-channel-blocks Ok, the new version gets rid of the interleaving/deinterleaving. That requires changes in the driver API of BSE, though. > >>>void > >>>clear() > >>>{ > >>> Atomic::int_set (&m_read_frame_pos, 0); > >>> Atomic::int_set (&m_write_frame_pos, 0); > >>>} > >> > >>is this really needed outside of resize() though? > > > >The jack driver needs it (to get into a sane state after a dropout). > > how are you avoiding concurrent access when clearing due to a drop > out then? The process callback checks an atomic integer before accessing any data structures (its called callback_state). Before resetting the ringbuffer, this callback state will be set to "inactive" (in the engine thread), and one condition wait will be done to ensure that if a process() callback was still running (and thus accessing data structures), it terminated. Thus, when the actual clear() is executed, it is guaranteed that the jack process callback can't interfere, and - on the other hand - the engine thread can also not interfere, because it is the thread calling the clear() method. Afterwards, the atomic integer is set back to "active" to reactivate the jack thread callback audio processing. Cu... Stefan -- Stefan Westerfeld, Hamburg/Germany, http://space.twc.de/~stefan _______________________________________________ beast mailing list beast@... http://mail.gnome.org/mailman/listinfo/beast |
|
|
Re: bsepcmdevice-jack.cc (Re: Beast Jack driver) Hi!
Same comment as for the other reply: I'll post fixed code later, so I'll just answer to discussion stuff. On Fri, Jun 02, 2006 at 01:45:01AM +0200, Tim Janik wrote: > >enum > >{ > > CALLBACK_STATE_INACTIVE = 0, > > CALLBACK_STATE_ACTIVE = 1, > > CALLBACK_STATE_PLEASE_TERMINATE = 2, > > CALLBACK_STATE_TERMINATED = 3 > > minor cosmetic, we usually align '='-symbols in enums. > > >}; > > you should provide a type name for this enum, which > can be used above to tell apart valid states of your > callback_state variable. The problem is that callback_state is an atomic integer, so I think it needs to be volatile int, and I can't declare it of the enum type; or am I wrong? > > vector<jack_port_t *> input_ports; > > vector<jack_port_t *> output_ports; > > > > guint buffer_frames; /* input/output > > ringbuffer size in frames */ > > > > Cond cond; > > Mutex cond_mutex; > > erk, what exactly is protected by this cond and this mutex? > something like this definitely needs to be outlined in code > (in comments or by naming prefixes, look for BirnetMutex > in bseengineutils.c to see examples for that) Well, I'll put a comment there, but if you want to know what it does right now: the condition is signalled whenever the realtime callback has been called and is thus be used to wait for events that depend on the realtime callback (like waiting for termination or new data). The mutex is mostly useless - its there because conditions require it (you can't wait on a condition without one), but all code needs to work without the mutex, because the realtime thread can not use lock() (only trylock()) due to possible priority inversion. > > jack_status_t status; > > > > self->jack_client = jack_client_open ("beast", JackNoStartServer, > > &status); /* FIXME: translations(?) */ > > what's this "beast" being used for? (especially since you > quesiton translation) > e.g. if it's similar to window titles, it should be "BEAST", > if it's going to be used as a program name in continueous text, > it should prolly be _("Beast") and if its meant as a technical > identifier (e.g. for serialization), it should be "beast". > > maybe Paul can shed light on this? Its the name of the client, which means that for the command line utils for instance you'll use beast:in_1 as port name. So this suggests that you may not want to translate it. However, for qjackctl, its also the label that is displayed in the GUI, in the widget where the clients (and ports) are listed. So this suggests that you may want to translate it. However then, if you store a session (somehow, I don't know the details), and try to restore it in a different language, it will break. So probably its just that jack has no model that strings might need to be translated, and just treats the world as if there was only one language, so the best we can do is to not translate the name. > > DEBUG ("attaching to JACK server returned status: %d\n", status); > > /* FIXME: check status for proper error reporting > > doesn't teh jack API have something like: > const char* jack_status_to_string (jack_status_t status); > ? No. > >static map<string, DeviceDetails> > >query_jack_devices (BsePcmDeviceJACK *self) > >{ > > map<string, DeviceDetails> devices; > > > > /* FIXME: we need to make a difference here between > > * - jackd is running, but no output ports can be found > > curious, why exactly could that be the case? I am not sure it can happen, as you need to load some driver to be able to start jack at all, and the driver will probably always register some terminal ports. However, in theory it could be possible to start jack only to connect two applications; for instance one that produces audio (beast) and another one that sends it over the net somewhere. Then, which application you start first is arbitary, and it might as well be that beast gets started first. But... well... I don't know if jack is ever going to support starting without a driver (it would need to do something to get the timing, sample rate and so on setup; normally probably the driver does this). > > * - jackd is not running > > that should/will actually be determined by jack_client_open(), right? Right. Then self->jack_client will be NULL. Or not, if jackd was once running but crashed later... > > DeviceDetails &details = devices[device_name]; > > details.ports++; > > why are you incrementing here already, if the port > could possibly not be looked up by name? Nothing of significance. If jack_port_by_name returns NULL, the port doesn't exist any longer (or jackd crashed), and it will not be used anyway, since the other flags don't say that it is usable either as input or as output. > > jack_port_t *jack_port = jack_port_by_name (self->jack_client, > > jack_ports[i]); > > if (jack_port) > > { > > int flags = jack_port_flags (jack_port); > > if (flags & JackPortIsInput) > > { > > details.input_ports++; > > details.input_port_names.push_back (jack_ports[i]); > > } > > if (flags & JackPortIsOutput) > > { > > details.output_ports++; > > details.output_port_names.push_back (jack_ports[i]); > > } > > if (flags & JackPortIsPhysical) > > details.physical_ports++; > > if (flags & JackPortIsTerminal) > > details.terminal_ports++; > > why are you counting physical_ports and terminal_ports here, > regardless of whether the port in question is an input, > output or neither port? > and, if this is purely statistical/informative counting, > why are you only counting ports that can be looked up by name? > > /me scratches head... I can only gather statistics if I get the jack_port (because only then jack_port_flags can be called). I gather statistics to display it later. Intersting is the physical flag "hardware ports, like an alsa device", and the input/output flag. I am just gathering terminal for the sake of completeness. > > } > > } > > } > > free (jack_ports); > > } > > > > return devices; > > ugh, isn't this a full fledged recursive map+strings+DeviceDetails copy? Yes. So? Its not in any time critical code, so I don't bother to optimize it. > >} > > > >static SfiRing* > >bse_pcm_device_jack_list_devices (BseDevice *device) > >{ > > BsePcmDeviceJACK *self = BSE_PCM_DEVICE_JACK (device); > > map<string, DeviceDetails> devices = query_jack_devices (self); > > > > SfiRing *ring = NULL; > > for (map<string, DeviceDetails>::iterator di = devices.begin(); di != > > devices.end(); di++) > > { > > const string &device_name = di->first; > > DeviceDetails &details = di->second; > > BseDeviceEntry *entry; > > entry = bse_device_group_entry_new (device, g_strdup > > (device_name.c_str()), > > g_strdup ("JACK Devices"), > > g_strdup_printf ("%s > > (%d*input %d*output)%s", > > device_name.c_str(), > > details.input_ports, > > details.output_ports, > > details.physical_ports == details.ports ? > > " [ > > hardware ]" : "")); > > hm, can you please elaborate the > details.physical_ports == details.ports ? " [ hardware ]" : "" > condition for me? ;) Well... if all ports of a client are physical, then its got to be a hardware device. For instance alsa_pcm has only physical ports here. I am not sure what a client is that has *some* physical ports... I've yet to see one. > > > > /* ensure that alsa_pcm always is the first item listed (it is the > > default device) */ > > hm, doesn't jack provide some kind of rating for the devices to pick? No. People hardcode alsa_pcm as default all day long. > > * > > * normal operation: > > * > > * [bse thread] writes some data to output_ringbuffer > > * [bse thread] checks remaining space, there is no room left > > * [bse thread] sleeps on the condition > > BSE never sleeps, waiting for audio drivers. take a look at > alsa_device_write() and alsa_device_check_io() in bsepcmdevice-alsa.c. > i.e. allow full buffer writes if at all possible and provide > timeout information on how long it takes for enough buffer space > to be available. > that's the kind of semantic that needs to be implemented. Yes, it does block in the ALSA driver. However, I think I'll come back to it in a seperate mail, since you said it shouldn't or maybe should, so I'll investigate. > > * [bse thread] writes some data to output_ringbuffer > > * [bse thread] checks remaining space, there is no room left > > * [jack thread] reads some data from output_ringbuffer > > * [jack thread] signals the condition > > * [bse thread] sleeps on the condition > > such races are only possible when not using conditions properly. > you always have to follow the sequence: lock(); check(); wait(); unlock(); > and on the pther side: lock(); update(); signal(); unlock(); > this ensures that no update/signal goes by unnoticed by the check. see: > http://developer.gnome.org/doc/API/2.0/glib/glib-Threads.html#GCond Cool. Just that I can't lock() in the RT thread, because of priority inversion. I've read a little kernel code today, and fs/pipe.c has a PIPE_MUTEX(...) which is being locked. So I am not sure I can use write() to wakeup the master thread either. However pthread_cond_signal in glibc also seems to have a kind of a mutex (so its also a problem), although I am not sure if I interpret the code correctly. But if I do, then I am running out of ideas how to get this right at all. Well, I think I have to think/read/ask others more about it until I know whats the right thing to do, to get the synchronization between the engine thread and the jack realtime callback right. > > * since the jack callback gets called periodically, the bse thread > > will > > * never starve, though, in the worst case it will just miss a frame > > * > > * so we absolutely exclude the possibility for priority inversion (by > > * using trylock instead of lock), by introducing extra latency in > > * some extremely rare cases > > apart from excessive newlines ;) this argumentation is faulty. > think of jack exiting after you hit your signal-lost period. > BSE will hang forever, unacceptable. Won't because all conditions are timeout bound and there is a shutdown handler that libjack will call if jack dies, so that eventually jack->is_down gets set, upon which nothing will hang (look at the code again to see for yourself). > > > */ > > if (have_cond_lock) > > jack->cond_mutex.unlock(); > > } > > else > > { > > when exactly is this branch tirggered? (lost track due to the excessive > inlined commenting above.) There are different activation states. Thats the case where the callback has marked to be inactive (shouldn't touch ringbuffers and such), so it will just write out zero blocks. > isn't this the !trylock() case? > the purpose of having an atomic ringbuffer in the first place was to > not be forced into using lock/trylock/unlock in the jack thread. > so this if-branch shouldn't occour in the implementaiton at all. Trylock is there to make some stuff (like updating the ringbuffers) happen within a lock, so that in principle, things that wait on new ringbuffer state will not fall in the condition trap you mentioned above. However, since its only a *try*lock, it can't prevent the condition trap entierly. Thus the lengthly comment. > as an aside, if we would be willing to pay lock/unlock in the jack thread, > we could as well pass around list nodes that maintain audio buffers. > you could get entirely rid fo the value copying in the ring buffer that > way... No, because the engine block size and the jack block size are not necessarily dividable, so you've got to do some copying to make get map the "half engine blocks" onto jack blocks, and the "half jack blocks" onto engine blocks. Also the jack buffers get invalid after the process() callback, and the engine buffers get invalid out of the engine process() so you've got to copy the data somewhere. The only way to get rid of copying entierly would be to run the engine process cycle in the jack thread, either directly via lock, or with two blocking context switches back and forth. But that'd probably violate a million restrictions that apply within the jack callback. > > for (guint ch = 0; ch < jack->input_ports.size(); ch++) > > { > > sample_t *values = (sample_t *) jack_port_get_buffer > > (jack->input_ports[ch], n_frames); > > Block::fill (n_frames, values, 0.0); > > } > > for (guint ch = 0; ch < jack->output_ports.size(); ch++) > > { > > sample_t *values = (sample_t *) jack_port_get_buffer > > (jack->output_ports[ch], n_frames); > > Block::fill (n_frames, values, 0.0); > > } > > > > } > > jack->cond.signal(); > > ah, here's the racy signalling *outside* of the mutex lock being held... > > > > > if (callback_state == CALLBACK_STATE_PLEASE_TERMINATE) > > { > > Atomic::int_set (&jack->callback_state, CALLBACK_STATE_TERMINATED); > > return 1; > > } > > return 0; > > this misses a comment on the purpose of the return value. FYI: 0 = run on 1 = die now But I'll add a comment. > >static void > >terminate_and_free_jack (JackPcmHandle *jack) > >{ > > g_return_if_fail (jack->jack_client != NULL); > > > > Atomic::int_set (&jack->callback_state, CALLBACK_STATE_PLEASE_TERMINATE); > > while (Atomic::int_get (&jack->callback_state) == > > CALLBACK_STATE_PLEASE_TERMINATE) > > { > > AutoLocker cond_locker (jack->cond_mutex); > > > > if (jack->is_down) /* if jack is already gone, then forget it */ > > Atomic::int_set (&jack->callback_state, CALLBACK_STATE_TERMINATED); > > else > > jack->cond.wait_timed (jack->cond_mutex, 100000); > > } > > is the following for real? > you disconnect the ports *after* the connection is terminated? > or, is "terminated" wrongly worded here? Its not the connection that terminated. Its just that the process callback will no longer called (i.e. the client has become inactive). In principle, you could start new process callback cycles, by calling jack_activate again. But can't happen with the driver model bse gives me. It can happen, that jackd has crashed inbetween, so I am disconnecting ports that are no longer there at all. But it seems that libjack can handle it (by seeing that writing the disconnect requests to the jack pipe will fail), so in this case it will return an error which I'll ignore. > > for (guint ch = 0; ch < jack->input_ports.size(); ch++) > > jack_port_disconnect (jack->jack_client, jack->input_ports[ch]); > > > > for (guint ch = 0; ch < jack->output_ports.size(); ch++) > > jack_port_disconnect (jack->jack_client, jack->output_ports[ch]); > > > > jack_deactivate (jack->jack_client); > > > > delete jack; > >} > > > > for (guint i = 0; i < handle->n_channels; i++) > > { > > const int port_name_size = jack_port_name_size(); > > char port_name[port_name_size]; > > jack_port_t *port; > > > > snprintf (port_name, port_name_size, "in_%u", i); > > please use g_snprintf() instead. > > > port = jack_port_register (self->jack_client, port_name, > > JACK_DEFAULT_AUDIO_TYPE, JackPortIsInput, 0); > > what is JACK_DEFAULT_AUDIO_TYPE? shouldn't we request floats here (i > dunno the jack API though). JACK_DEFAULT_AUDIO_TYPE is float. There is no other audio type at all, i.e. there is no negotaition or anything. It also corresponds to my sample_t typedef above (is now called JackSample typedef to be UpperLowerCaseNamed), which is also float. Otherwise, the whole thing will break into pieces anyway, because I'll just copy things. > > /* activate */ > > > > jack_set_process_callback (self->jack_client, jack_process_callback, > > jack); > > jack_on_shutdown (self->jack_client, jack_shutdown_callback, jack); > > jack_activate (self->jack_client); > > you're activating here without checking for if (error)? Good point. Will fix. > > /* connect ports */ > > > > map<string, DeviceDetails> devices = query_jack_devices (self); > > querying the devices could have been done before registering > the ports above. So what? I can only connect the ports anyway. > > map<string, DeviceDetails>::const_iterator di; > > > > di = devices.find (dname); > > if (di != devices.end()) > > { > > const DeviceDetails &details = di->second; > > > > for (guint ch = 0; ch < handle->n_channels; ch++) > > { > > if (details.output_ports > ch) > > jack_connect (self->jack_client, > > details.output_port_names[ch].c_str(), jack_port_name > > (jack->input_ports[ch])); > > if (details.input_ports > ch) > > jack_connect (self->jack_client, jack_port_name > > (jack->output_ports[ch]), details.input_port_names[ch].c_str()); > > } > > } > > you're not setting error in case you failed to find > the required devices to connect the channels. I don't want to. If you for instance start beast with -pjack=jamin to use jamin for mastering, and later the user chooses to terminate jamin, then we can still run. Its just that then the ports will be initially unconnected, which is not a problem, because it can be easily fixed in qjackctl. > > /* setup buffer size */ > > > > // keep at least two jack callback sizes for dropout free audio > > guint min_buffer_frames = jack_get_buffer_size (self->jack_client) * 2; > > > > // keep an extra engine buffer size (this compensates also for cases > > where the > > // engine buffer size is not a 2^N value, which would otherwise cause the > > yeah, engine buffers are usually not 2^n sizes. see bse_engine_constrain() > in bseengine.c. > > > // buffer never to be fully filled with 2 periods of data) > > "periods"? do you mean "elements" or "frames" here? (using your ringbuffer > terminology) > or do you mean buffer sizes here? jack buffers or bse buffers? Jack buffer sizes. > > > min_buffer_frames += BSE_PCM_DEVICE (device)->req_block_length; > > couldn't you instead just align min_buffer_frames to 2 * bse buffer size, > to still fullfil the requirements you outlined, and yet yield a smaller > ring buffer (thus optimize latency)? Don't know. Trouble comes when we miss the - racy - condition. That produces one period of extra latency if things go badly wrong. Thus a stateful synchronization which can't miss something (such as a pipe()) would be superior. Then I suppose we could do as you suggested. > > // honor the user defined latency specification > > // > > // FIXME: should we try to tweak local buffering so that it corresponds > > // to the user defined latency, or global buffering (including the jack > > // buffer)? we do the former, since it is easier > > we should do the latter. that shouldn't be too hard, Paul said that you > could query (and inform) jack about the latency on a port. > that then just needs to be subtracted from BSE_PCM_DEVICE > (device)->req_latency_ms. It can change. If the beast output is jamin, that'd be adding latency. If the user reconnects jamin to a convolution which adds latency, then the total output latency query would change while we're running. Thats why I am a bit reluctant to do it, as its results are not well-defined. It could even happen that we are not connected initially (latency = 0), and only get connected afterwards by a patchbay, where the user is loading a session. Thus in such a case we would always misbehave. > > > guint user_buffer_frames = BSE_PCM_DEVICE (device)->req_latency_ms * > > handle->mix_freq / 1000; > > guint buffer_frames = max (min_buffer_frames, user_buffer_frames); > > > > jack->input_ringbuffer.resize (buffer_frames, handle->n_channels); > > jack->output_ringbuffer.resize (buffer_frames, handle->n_channels); > > the way this is written, you're doubling the latency because you apply > it to the input and the output buffer. you should just apply it to the > output buffer (and also subtract the input buffer latency before doing > that) The input buffer should not produce latency. Ideally, the filled frames of the input buffer and the filled frames of the output buffer should add up to exactly buffer_frames. Note that the input_buffer is *empty* initially while the output_buffer is *full* initially, and the jack callback methodically only adds as many frames to the input_buffer as it takes from the output_buffer, whereas the engine process cycle does the inverse. > > jack->buffer_frames = jack->output_ringbuffer.get_writable_frames(); > > what is jack->buffer_frames good for? and why is that different from > total_n_frames() here? Its not different. Don't know. Maybe I'll get rid of it in the new version. > > // the ringbuffer should be exactly as big as requested > > g_assert (jack->buffer_frames == buffer_frames); > > DEBUG ("ringbuffer size = %.3fms", jack->buffer_frames / double > > (handle->mix_freq) * 1000); > > hm, you should better use driver prefixing in DEBUG() statements like > the ALSA driver does it, i.e.: > DEBUG ("JACK: ringbuffer size = %.3fms", jack->buffer_frames / double > (handle->mix_freq) * 1000); > of course that applies to all your DEBUG() calls. I think the DEBUG() automagically does this. At least I saw that the output contained JACK already, so I didn't bother to add it. > > /* setup PCM handle or shutdown */ > > if (!error) > > { > > bse_device_set_opened (device, dname, handle->readable, > > handle->writable); > > if (handle->readable) > > handle->read = jack_device_read; > > if (handle->writable) > > handle->write = jack_device_write; > > handle->check_io = jack_device_check_io; > > handle->latency = jack_device_latency; > > BSE_PCM_DEVICE (device)->handle = handle; > > > > /* will prevent dropouts at initialization, when no data is there at > > all */ > > why would this "prevent dropouts"? the jack thread got started already, > so a dropout could possibly have occoured _already_. No the jack callback is still operating in the "inactive" state, so it will not touch our data yet. Yes. It will write zeros. And no. Normally you will not hear this. The point here is to fill the output ringbuffer fully with zeros (the alsa driver does the same), so that beast has some time to produce some audio that will be played. Otherwise it could be that there are dropouts at song start because jack is consuming the data about as fast at beast can produce it initially. > granted, retriggering may be neccessary, but the comment doesn't make > sense to me here. > > > jack_device_retrigger (jack); > > jack_device_latency (handle); // debugging only: print latency > > values > > } > > else > > { > > terminate_and_free_jack (jack); > > } > > DEBUG ("JACK: opening PCM \"%s\" readupble=%d writable=%d: %s", dname, > > require_readable, require_writable, bse_error_blurb (error)); > > > > return error; > >} > > > >static void > >bse_pcm_device_jack_close (BseDevice *device) > >{ > > JackPcmHandle *jack = (JackPcmHandle*) BSE_PCM_DEVICE (device)->handle; > > BSE_PCM_DEVICE (device)->handle = NULL; > > > > terminate_and_free_jack (jack); > >} > > > >static void > >bse_pcm_device_jack_finalize (GObject *object) > >{ > > BsePcmDeviceJACK *self = BSE_PCM_DEVICE_JACK (object); > > if (self->jack_client) > > { > > jack_client_close (self->jack_client); > > self->jack_client = NULL; > > } > > > > /* chain parent class' handler */ > > G_OBJECT_CLASS (parent_class)->finalize (object); > >} > > > >static void > >jack_device_retrigger (JackPcmHandle *jack) > >{ > > g_return_if_fail (jack->jack_client != NULL); > > > > /* > > extraneous newline at coment start. > > > * we need to reset the active flag to false here, as we modify the > > * buffers in a non threadsafe way; this is why we also wait for > > * the condition, to ensure that the jack callback really isn't > > * active any more > > */ > > Atomic::int_set (&jack->callback_state, CALLBACK_STATE_INACTIVE); > > > > /* usually should not timeout, but be notified by the jack callback */ > > jack->cond_mutex.lock(); > > if (!jack->is_down) > > jack->cond.wait_timed (jack->cond_mutex, 100000); > > jack->cond_mutex.unlock(); > > hm, is_down is set by the jack thread onkly volountarily, i.e. if > it exits prematurely, you get the stale BSE thread i mntioned above. Nope. Its set by the shutdown handler. > > > > > /* jack_ringbuffer_reset is not threadsafe! */ > > yeah, i didn't mention this in the ringbuffer review, but > i'd agree that "reset" would be a better name than "clear" > for the resetting of the pointers. Although the comment refers back to the old days, where I used jacks ringbuffer. Thus it also is called jack_ringbuffer_reset, and not FrameRingBuffer<float>::clear() in the comment. > >static gsize > >jack_device_read (BsePcmHandle *handle, > > gfloat *values) > >{ > [...] > > > > > if (jack->is_down) > > { > > /* > > extraneous newline at comment start. > > > * FIXME: we need a way to indicate an error here; beast > > should provide > > * an adequate reaction in case the JACK server is down (it > > should stop > > * playing the file, and show a dialog, if JACK can not be > > reconnected) > > we don't need error indication here (see alsa_device_read()). it's > good enough if we return gracefully here. > error checking code (and returns) can be executed by check_io() > and/or retrigger(). Yes, we do. If JACK dies, I want to know as user. Otherwise I'll just see that nothing happens. Why? And why is beast continuing to play if still nothing happens. Why should it...? > > * > > * if we have a way to abort processing, this if can be moved > > above > > * the condition wait; however, right now moving it there > > means that > > * beast will render the output as fast as possible when jack > > dies, and > > * this will look like a machine lockup > > no, endless loops don't look like lockups ;) > in the first case, your CPU meter runs at 100% and toggling Num-lock/etc. > keys still work (usually also the X server etc.) while in the latter case > *nothing* goes anymore. But it isn't a situation I would want to throw the user into either. > > */ > > Block::fill (frames_left * handle->n_channels, values, 0.0); > > /* <- remove me once we can indicate errors */ > > nope should stay (the same way alsa_device_read() copeswith errors). Its uncommon for a hardware device to go away while the application runs. Its unfortunately common for the jack server to go away. That might not be sloppy programming. The jack server might as well go away intentionally because we missed a deadline. For instance because the user was running jack and beast without rt prio enabled but had a semi-low latency. Just stopping in such cases is rather uninformative, and people may think its a beast bug or something. We should be able to present a reason why this happened to the user inside a nice dialog and then stop trying to play audio. Its pointless after jackd terminated our connection or terminated itself or crashed or something. > > const gchar *info = g_intern_printf (/* TRANSLATORS: keep this text to 70 > > chars in width */ > > _("JACK Audio Connection Kit PCM > > driver (http://jackaudio.org)\n")); > > jack version printing? again, see what the alsa driver does here. There is no jack version I can ask for in the API, so I can't print one. I could print the version of libjack I was compiled against (i.e. something that pkg-config produced or so), but the server version might be different, as long as the protocol version of the server and the protocol version of libjack match. > the single most important issue that has to be fixed here is > the mutex and condition. neither the BSE nor the JACK thread > can afford acquiring it and wait for the other thread. > that's why we need a ringbuffer based only on atomic ops. we > got that now, so the next step is to modify the code to use > it up to its fullest potential. > > basically, to get there, the jack thread needs to be modified > to act more similarl to e.g. the kernel ALSA driver wrg to > xruns. the kernel driver also has fixed-size ring buffers and > still deals with drop outs perfectly well. that kind of API > semantic has to be provided for the BSE thread in the bse-alsa > driver. then, the implementations of check_io(), retrigger(), > read() and write() can be much closer to the bse-alsa driver > implementation. also, this will eliminate the need for the > mutex and the condition. I think I don't know enough yet to give a qualified answer on what to do: especially I want to know, if * write to pipe * pthread_condition_signal * (maybe semaphores) * (maybe something else I forgot) are equally capable in their realtimeness properties, and suitable for notifying the bse master thread from the jack realtime thread, and so which is the one to choose to let the bse thread know when to compute date. Timeouts are a bad solution because they either occur too soon (in which case the engine thread may wake up, but can't do the computation), or too late, because there is always jitter. In any case you loose precious time or waste precious CPU cycles or both. I know that for practical purposes, there is no other solution the bse framework is capable than write-to-pipe, but I have seen in kernel code that the implentation is not lock-free. Can this lead to priority inversion? Can conditions do the same? What is safe and what is not? Right now, I don't know. Cu... Stefan -- Stefan Westerfeld, Hamburg/Germany, http://space.twc.de/~stefan _______________________________________________ beast mailing list beast@... http://mail.gnome.org/mailman/listinfo/beast |
|
|
Re: bsepcmdevice-jack.cc (Re: Beast Jack driver)On Mon, 5 Jun 2006, Stefan Westerfeld wrote:
> Hi! > > Same comment as for the other reply: I'll post fixed code later, so I'll > just answer to discussion stuff. > > On Fri, Jun 02, 2006 at 01:45:01AM +0200, Tim Janik wrote: >>> enum >>> { >>> CALLBACK_STATE_INACTIVE = 0, >>> CALLBACK_STATE_ACTIVE = 1, >>> CALLBACK_STATE_PLEASE_TERMINATE = 2, >>> CALLBACK_STATE_TERMINATED = 3 >> >> minor cosmetic, we usually align '='-symbols in enums. >> >>> }; >> >> you should provide a type name for this enum, which >> can be used above to tell apart valid states of your >> callback_state variable. > > The problem is that callback_state is an atomic integer, so I think it > needs to be volatile int, and I can't declare it of the enum type; or am > I wrong? nope, that's a good point, C++ doesn't guarantee that enums are stored with int size, so you should indeed use a volatile int there to use the atomic functions. however, the decl line should then also have a comment, outlining the real type. >>> vector<jack_port_t *> input_ports; >>> vector<jack_port_t *> output_ports; >>> >>> guint buffer_frames; /* input/output >>> ringbuffer size in frames */ >>> >>> Cond cond; >>> Mutex cond_mutex; >> >> erk, what exactly is protected by this cond and this mutex? >> something like this definitely needs to be outlined in code >> (in comments or by naming prefixes, look for BirnetMutex >> in bseengineutils.c to see examples for that) > > Well, I'll put a comment there, but if you want to know what it does > right now: the condition is signalled whenever the realtime callback has > been called and is thus be used to wait for events that depend on the > realtime callback (like waiting for termination or new data). > > The mutex is mostly useless - its there because conditions require it > (you can't wait on a condition without one), but all code needs to work > without the mutex, because the realtime thread can not use lock() (only > trylock()) due to possible priority inversion. ok. note that beast doesn't 100% perfectly prevent priority inversion in other scenarios either. this issue is slowly being worked on (the idea is to move all communication to atomic ops guarded by BirnetGuard, but that's an entire new thread ;) until then, the jack RT thread should use a pipe write() to notify the engine and sequencer threads, which both create their own pipe fds that they also poll() on. >>> jack_status_t status; >>> >>> self->jack_client = jack_client_open ("beast", JackNoStartServer, >>> &status); /* FIXME: translations(?) */ >> >> what's this "beast" being used for? (especially since you >> quesiton translation) >> e.g. if it's similar to window titles, it should be "BEAST", >> if it's going to be used as a program name in continueous text, >> it should prolly be _("Beast") and if its meant as a technical >> identifier (e.g. for serialization), it should be "beast". >> >> maybe Paul can shed light on this? > > Its the name of the client, which means that for the command line utils > for instance you'll use > > beast:in_1 > > as port name. So this suggests that you may not want to translate it. > However, for qjackctl, its also the label that is displayed in the GUI, > in the widget where the clients (and ports) are listed. ok, it has to be untranslated "beast" then. >>> DEBUG ("attaching to JACK server returned status: %d\n", status); >>> /* FIXME: check status for proper error reporting >> >> doesn't teh jack API have something like: >> const char* jack_status_to_string (jack_status_t status); >> ? > > No. fine, then we need to roll our own function that does this. for existing example code that stringifies foreign errors, you can take a look at gslvorbis-cutter.c:ov_error_blurb(). it might be agood idea to submit this to jack once done btw. >>> static map<string, DeviceDetails> >>> query_jack_devices (BsePcmDeviceJACK *self) >>> { >>> map<string, DeviceDetails> devices; >>> >>> /* FIXME: we need to make a difference here between >>> * - jackd is running, but no output ports can be found >> >> curious, why exactly could that be the case? > > I am not sure it can happen, as you need to load some driver to be able > to start jack at all, and the driver will probably always register some > terminal ports. However, in theory it could be possible to start jack > only to connect two applications; for instance one that produces audio > (beast) and another one that sends it over the net somewhere. Then, > which application you start first is arbitary, and it might as well be > that beast gets started first. > > But... well... I don't know if jack is ever going to support starting > without a driver (it would need to do something to get the timing, > sample rate and so on setup; normally probably the driver does this). ok, then the querying logic should simply treat a running jackd without output ports as if no jackd was running. >>> * - jackd is not running >> >> that should/will actually be determined by jack_client_open(), right? > > Right. Then self->jack_client will be NULL. Or not, if jackd was once > running but crashed later... that should be taken care of by the cleanup handler then (afaiu). >>> } >>> } >>> free (jack_ports); >>> } >>> >>> return devices; >> >> ugh, isn't this a full fledged recursive map+strings+DeviceDetails copy? > > Yes. So? Its not in any time critical code, so I don't bother to > optimize it. well, it's not exactly "hard to optimize" here. just return a pointer. >>> static SfiRing* >>> bse_pcm_device_jack_list_devices (BseDevice *device) >>> { >>> BsePcmDeviceJACK *self = BSE_PCM_DEVICE_JACK (device); >>> map<string, DeviceDetails> devices = query_jack_devices (self); >>> >>> SfiRing *ring = NULL; >>> for (map<string, DeviceDetails>::iterator di = devices.begin(); di != >>> devices.end(); di++) >>> { >>> const string &device_name = di->first; >>> DeviceDetails &details = di->second; >>> BseDeviceEntry *entry; >>> entry = bse_device_group_entry_new (device, g_strdup >>> (device_name.c_str()), >>> g_strdup ("JACK Devices"), >>> g_strdup_printf ("%s >>> (%d*input %d*output)%s", >>> device_name.c_str(), >>> details.input_ports, >>> details.output_ports, >>> details.physical_ports == details.ports ? >>> " [ >>> hardware ]" : "")); >> >> hm, can you please elaborate the >> details.physical_ports == details.ports ? " [ hardware ]" : "" >> condition for me? ;) > > Well... if all ports of a client are physical, then its got to be a > hardware device. For instance alsa_pcm has only physical ports here. > > I am not sure what a client is that has *some* physical ports... I've > yet to see one. a combined virtual alsa device? (combing a virtual and a hw device) >>> * normal operation: >>> * >>> * [bse thread] writes some data to output_ringbuffer >>> * [bse thread] checks remaining space, there is no room left >>> * [bse thread] sleeps on the condition >> >> BSE never sleeps, waiting for audio drivers. take a look at >> alsa_device_write() and alsa_device_check_io() in bsepcmdevice-alsa.c. >> i.e. allow full buffer writes if at all possible and provide >> timeout information on how long it takes for enough buffer space >> to be available. >> that's the kind of semantic that needs to be implemented. > > Yes, it does block in the ALSA driver. However, I think I'll come back > to it in a seperate mail, since you said it shouldn't or maybe should, > so I'll investigate. the rationale provided here: http://mail.gnome.org/archives/beast/2006-June/msg00005.html for why writes shouldn't block applies also to the bse-alsa driver. the fact that bse-alsa could indeed block in its write function since we set the device to nonoblock=0 is merely the guarding i'm talking about at the end of the above email. >>> * [bse thread] writes some data to output_ringbuffer >>> * [bse thread] checks remaining space, there is no room left >>> * [jack thread] reads some data from output_ringbuffer >>> * [jack thread] signals the condition >>> * [bse thread] sleeps on the condition >> >> such races are only possible when not using conditions properly. >> you always have to follow the sequence: lock(); check(); wait(); unlock(); >> and on the pther side: lock(); update(); signal(); unlock(); >> this ensures that no update/signal goes by unnoticed by the check. see: >> http://developer.gnome.org/doc/API/2.0/glib/glib-Threads.html#GCond > > Cool. Just that I can't lock() in the RT thread, because of priority > inversion. > > I've read a little kernel code today, and fs/pipe.c has a > PIPE_MUTEX(...) which is being locked. So I am not sure I can use > write() to wakeup the master thread either. well, forget about priority inversion for the moment. it's very hard to trigger anyway. a much stronger reason to use a pipe is that the engine and synthesizer threads use poll() and not cond_wait. > However pthread_cond_signal in glibc also seems to have a kind of a > mutex (so its also a problem), although I am not sure if I interpret the > code correctly. But if I do, then I am running out of ideas how to get > this right at all. > > Well, I think I have to think/read/ask others more about it until I know > whats the right thing to do, to get the synchronization between the > engine thread and the jack realtime callback right. just use a pipe wirte() and start to concentrate on other stuff ;) >>> */ >>> if (have_cond_lock) >>> jack->cond_mutex.unlock(); >>> } >>> else >>> { >> >> when exactly is this branch tirggered? (lost track due to the excessive >> inlined commenting above.) > > There are different activation states. Thats the case where the callback > has marked to be inactive (shouldn't touch ringbuffers and such), so it > will just write out zero blocks. this (and similar things) really need to go into a state machine description comment at the start of the source file. >>> port = jack_port_register (self->jack_client, port_name, >>> JACK_DEFAULT_AUDIO_TYPE, JackPortIsInput, 0); >> >> what is JACK_DEFAULT_AUDIO_TYPE? shouldn't we request floats here (i >> dunno the jack API though). > > JACK_DEFAULT_AUDIO_TYPE is float. There is no other audio type at all, > i.e. there is no negotaition or anything. odd, should have a comment like: /* request the _only_ audio type jack has, which is float */ so rationale is preserved for future maintenance. >>> /* connect ports */ >>> >>> map<string, DeviceDetails> devices = query_jack_devices (self); >> >> querying the devices could have been done before registering >> the ports above. > > So what? I can only connect the ports anyway. "So what?" ? there's no point in connectiong ports if the devices can't be found. >>> map<string, DeviceDetails>::const_iterator di; >>> >>> di = devices.find (dname); >>> if (di != devices.end()) >>> { >>> const DeviceDetails &details = di->second; >>> >>> for (guint ch = 0; ch < handle->n_channels; ch++) >>> { >>> if (details.output_ports > ch) >>> jack_connect (self->jack_client, >>> details.output_port_names[ch].c_str(), jack_port_name >>> (jack->input_ports[ch])); >>> if (details.input_ports > ch) >>> jack_connect (self->jack_client, jack_port_name >>> (jack->output_ports[ch]), details.input_port_names[ch].c_str()); >>> } >>> } >> >> you're not setting error in case you failed to find >> the required devices to connect the channels. > > I don't want to. If you for instance start beast with > > -pjack=jamin > > to use jamin for mastering, and later the user chooses to terminate > jamin, then we can still run. Its just that then the ports will be > initially unconnected, which is not a problem, because it can be easily > fixed in qjackctl. sorry, would you care to enlighten me here please? what's -pjack=jamin supposed to mean? should the beast output ports be auto-connected to input ports named "jamin"? and, if input ports named "jamin" are not present, that'll silently be ignored? will the jack callback thread still properly run and request BSE engine data if the beast output port is unconnected? if that's the case, i think at least the behaviour of silently ignoring unsuccessfull connections has to be changed. we also throw proper errors in other drivers if unconectable devices are being adressed. (if unsuccessful connections really are common usage with jack, we can still add device flags, like we do with OSS, so you could say -pjack=jamin,auto for automated connections) >>> min_buffer_frames += BSE_PCM_DEVICE (device)->req_block_length; >> >> couldn't you instead just align min_buffer_frames to 2 * bse buffer size, >> to still fullfil the requirements you outlined, and yet yield a smaller >> ring buffer (thus optimize latency)? > > Don't know. Trouble comes when we miss the - racy - condition. That > produces one period of extra latency if things go badly wrong. Thus a > stateful synchronization which can't miss something (such as a pipe()) > would be superior. sorry? what do you mean by "stateful synchronization"? > Then I suppose we could do as you suggested. fine. >>> // honor the user defined latency specification >>> // >>> // FIXME: should we try to tweak local buffering so that it corresponds >>> // to the user defined latency, or global buffering (including the jack >>> // buffer)? we do the former, since it is easier >> >> we should do the latter. that shouldn't be too hard, Paul said that you >> could query (and inform) jack about the latency on a port. >> that then just needs to be subtracted from BSE_PCM_DEVICE >> (device)->req_latency_ms. > > It can change. If the beast output is jamin, that'd be adding latency. > If the user reconnects jamin to a convolution which adds latency, then > the total output latency query would change while we're running. Thats > why I am a bit reluctant to do it, as its results are not well-defined. i see. at the very least the latency tooltip has to be adapted then to reflect this. > It could even happen that we are not connected initially (latency = 0), > and only get connected afterwards by a patchbay, where the user is > loading a session. Thus in such a case we would always misbehave. what's that about being non-connected. does jack still request audio from beast when unconnected? and, you said jack clients would always auto connect to alsa_pcm and that alsa_pcm should always be present. so why would we start up disconnected? >>> guint user_buffer_frames = BSE_PCM_DEVICE (device)->req_latency_ms * >>> handle->mix_freq / 1000; >>> guint buffer_frames = max (min_buffer_frames, user_buffer_frames); >>> >>> jack->input_ringbuffer.resize (buffer_frames, handle->n_channels); >>> jack->output_ringbuffer.resize (buffer_frames, handle->n_channels); >> >> the way this is written, you're doubling the latency because you apply >> it to the input and the output buffer. you should just apply it to the >> output buffer (and also subtract the input buffer latency before doing >> that) > > The input buffer should not produce latency. Ideally, the filled frames > of the input buffer and the filled frames of the output buffer should > add up to exactly buffer_frames. Note that the input_buffer is *empty* > initially while the output_buffer is *full* initially, and the jack > callback methodically only adds as many frames to the input_buffer as it > takes from the output_buffer, whereas the engine process cycle does the > inverse. that means input latency increases with every output underrun, because in that case you're faking extra output values. basically, in duplex scenrios under latency constraints you have to properly resync input and output and can't just pad output values. >>> jack->buffer_frames = jack->output_ringbuffer.get_writable_frames(); >> >> what is jack->buffer_frames good for? and why is that different from >> total_n_frames() here? > > Its not different. Don't know. Maybe I'll get rid of it in the new > version. "Don't know."? ;) ok, if you need help here, take my word that if it's useless you should get rid of it ;) >>> * FIXME: we need a way to indicate an error here; beast >>> should provide >>> * an adequate reaction in case the JACK server is down (it >>> should stop >>> * playing the file, and show a dialog, if JACK can not be >>> reconnected) >> >> we don't need error indication here (see alsa_device_read()). it's >> good enough if we return gracefully here. >> error checking code (and returns) can be executed by check_io() >> and/or retrigger(). > > Yes, we do. If JACK dies, I want to know as user. Otherwise I'll just > see that nothing happens. Why? And why is beast continuing to play if > still nothing happens. Why should it...? i already said that. but i can say it again if that helps you. you should gracefully return and ignore the error in write(). you can check for error conditions in check_io() and there we can decide and undertake appropriate measures to deal with the error. e.g. by resyncing the stream (that's at least interesting for ALSA/OSS or by notifying the user, etc.) >>> * if we have a way to abort processing, this if can be moved >>> above >>> * the condition wait; however, right now moving it there >>> means that >>> * beast will render the output as fast as possible when jack >>> dies, and >>> * this will look like a machine lockup >> >> no, endless loops don't look like lockups ;) >> in the first case, your CPU meter runs at 100% and toggling Num-lock/etc. >> keys still work (usually also the X server etc.) while in the latter case >> *nothing* goes anymore. > > But it isn't a situation I would want to throw the user into either. right. simply fix check_io() then, beast won't busily write audio data if check_io() doesn't tell it so. >>> Block::fill (frames_left * handle->n_channels, values, 0.0); >>> /* <- remove me once we can indicate errors */ >> >> nope should stay (the same way alsa_device_read() copeswith errors). > > Its uncommon for a hardware device to go away while the application > runs. Its unfortunately common for the jack server to go away. That > might not be sloppy programming. The jack server might as well go away > intentionally because we missed a deadline. For instance because the > user was running jack and beast without rt prio enabled but had a > semi-low latency. Just stopping in such cases is rather uninformative, > and people may think its a beast bug or something. > > We should be able to present a reason why this happened to the user > inside a nice dialog and then stop trying to play audio. Its pointless > after jackd terminated our connection or terminated itself or crashed or > something. ok, then put an informative g_printerr() containing all the relevant error scenario information into check_io() with a FIXME next to it. we can then start discussing ways to treat processing errors in the engine and possible recovery from such scenrios. (basically processing errors are currently not supported by the engine design) >>> const gchar *info = g_intern_printf (/* TRANSLATORS: keep this text to 70 >>> chars in width */ >>> _("JACK Audio Connection Kit PCM >>> driver (http://jackaudio.org)\n")); >> >> jack version printing? again, see what the alsa driver does here. > > There is no jack version I can ask for in the API, so I can't print one. > I could print the version of libjack I was compiled against (i.e. > something that pkg-config produced or so), but the server version might > be different, as long as the protocol version of the server and the > protocol version of libjack match. ok, if jack doesn't provide us more, we'll use the compiled-against version then. after all, the runtime libjack should be compatible with that (ELF ensures that) and also compatible with the running jackd (libjack checks that). >> the single most important issue that has to be fixed here is >> the mutex and condition. neither the BSE nor the JACK thread >> can afford acquiring it and wait for the other thread. >> that's why we need a ringbuffer based only on atomic ops. we >> got that now, so the next step is to modify the code to use >> it up to its fullest potential. >> >> basically, to get there, the jack thread needs to be modified >> to act more similarl to e.g. the kernel ALSA driver wrg to >> xruns. the kernel driver also has fixed-size ring buffers and >> still deals with drop outs perfectly well. that kind of API >> semantic has to be provided for the BSE thread in the bse-alsa >> driver. then, the implementations of check_io(), retrigger(), >> read() and write() can be much closer to the bse-alsa driver >> implementation. also, this will eliminate the need for the >> mutex and the condition. > > I think I don't know enough yet to give a qualified answer on what to > do: especially I want to know, if > > * write to pipe > * pthread_condition_signal > * (maybe semaphores) > * (maybe something else I forgot) > > are equally capable in their realtimeness properties, and suitable for > notifying the bse master thread from the jack realtime thread, and so > which is the one to choose to let the bse thread know when to compute > date. well, let me simply tell you that the only viable way is to use pipe. if not for any of the various reasons you've already heared from me on the topic, just for the simple fact that the BSE engine thread and the BSE sequencer thread *already* support them. that is not the case for any of the "alternatives" you list. > Timeouts are a bad solution because they either occur too soon (in which > case the engine thread may wake up, but can't do the computation), or > too late, because there is always jitter. In any case you loose precious > time or waste precious CPU cycles or both. note that the BSE audio driver model *and* the BSE engine model are based on timeouts. that's what check_io() provides and what the engine needs. you won't get any other programming model with BSE, regardless of whether you like it or not ;) > I know that for practical purposes, there is no other solution the bse > framework is capable than write-to-pipe, but I have seen in kernel code > that the implentation is not lock-free. Can this lead to priority > inversion? you can ask on lkml to figure if/what liunx prevents priority inversion for pipe access. > Can conditions do the same? yeah, signaling via conditions also involves locks/syncronization *somewhere* in the path between two threads. > What is safe and what is not? > Right now, I don't know. simply stop worrying, like i recommended above ;) > Cu... Stefan --- ciaoTJ _______________________________________________ beast mailing list beast@... http://mail.gnome.org/mailman/listinfo/beast |
|
|
Re: bsepcmdevice-jack.cc (Re: Beast Jack driver) Hi!
On Mon, Jun 05, 2006 at 04:31:23AM +0200, Tim Janik wrote: > >>>vector<jack_port_t *> input_ports; > >>>vector<jack_port_t *> output_ports; > >>> > >>>guint buffer_frames; /* input/output > >>>ringbuffer size in frames */ > >>> > >>>Cond cond; > >>>Mutex cond_mutex; > >> > >>erk, what exactly is protected by this cond and this mutex? > >>something like this definitely needs to be outlined in code > >>(in comments or by naming prefixes, look for BirnetMutex > >>in bseengineutils.c to see examples for that) > > > >Well, I'll put a comment there, but if you want to know what it does > >right now: the condition is signalled whenever the realtime callback has > >been called and is thus be used to wait for events that depend on the > >realtime callback (like waiting for termination or new data). > > > >The mutex is mostly useless - its there because conditions require it > >(you can't wait on a condition without one), but all code needs to work > >without the mutex, because the realtime thread can not use lock() (only > >trylock()) due to possible priority inversion. > > ok. note that beast doesn't 100% perfectly prevent priority inversion in > other scenarios either. this issue is slowly being worked on (the idea > is to move all communication to atomic ops guarded by BirnetGuard, but > that's an entire new thread ;) However, these are two kinds of separate issues: (1) priority inversion occuring within the jack realtime thread (2) priority inversion occuring elsewhere in beast When (1) is being triggered, the jackd server can get dropouts while writing audio, and might decide that we are the client that is badly implemented. So it might shutdown the connection we have, leaving the other clients running. When (2) is being triggered, we miss the opportunity to refill the ringbuffer in time, so there will be a click in the output, but at least the audio processing still continues; but we also may have still enough data buffered in the rinbuffer, and nothing will happen; that will also depend on how the user configured the beast latency. So I am trying harder to avoid (1) than (2). If beast would not be likely to produce more underruns in (2) than in (1), we might as well run the beast processing within the jack RT thread. But we don't and thats good as it is. I still agree with your concluding remarks, though, which are: > until then, the jack RT thread should use a pipe write() to notify the > engine and sequencer threads, which both create their own pipe fds that > they also poll() on. After thinking some more about it, and also reading Paul's oppinion: http://permalink.gmane.org/gmane.linux.audio.devel/11725 I think that pipe write() is probably ok for a solution running under linux. Its simply that linux is no realtime operating system, and so whatever we do in our case, we can get no guarantee that it will work in 100% of the cases. So if this was an realtime airplane control software, we should probably use some other operating systen. However, for an audio software its probably enough if it works reliable enough for daily music production, and it pipe writes should deliver this degree of reliability. > >>>DEBUG ("attaching to JACK server returned status: %d\n", status); > >>>/* FIXME: check status for proper error reporting > >> > >>doesn't teh jack API have something like: > >> const char* jack_status_to_string (jack_status_t status); > >>? > > > >No. > > fine, then we need to roll our own function that does this. > for existing example code that stringifies foreign errors, > you can take a look at gslvorbis-cutter.c:ov_error_blurb(). > > it might be agood idea to submit this to jack once done btw. Its not that writing our own function will help us a lot though. Its not that every function returns a jack_status_t. Just the open function. The usual way for getting stringified errors in other situations is to set the error callback with jack_set_error_function. > >>> } > >>> } > >>> free (jack_ports); > >>> } > >>> > >>>return devices; > >> > >>ugh, isn't this a full fledged recursive map+strings+DeviceDetails copy? > > > >Yes. So? Its not in any time critical code, so I don't bother to > >optimize it. > > well, it's not exactly "hard to optimize" here. just return a pointer. I am trying to avoid manual new/delete in all code I write, because its a potential source of errors. So returning a pointer is not so good. It also makes map accesses somewhat strange to read, i.e. (*devices)["foo"] = bar; instead of devices["foo"] = bar; Under some circumstances, the C++ compiler will also construct objects that are returned from a function within the memory of the caller, so that no copy will done at all. But I don't know if the device map case is one of these cases, or not. So if you insist that it must be "optimized", I would suggest changing the API to static void query_jack_devices (BsePcmDeviceJack *self, map<string, DeviceDetails> &devices) then I can still keep my automatic deletion feature and nice to read map access code. > >>> * normal operation: > >>> * > >>> * [bse thread] writes some data to output_ringbuffer > >>> * [bse thread] checks remaining space, there is no room left > >>> * [bse thread] sleeps on the condition > >> > >>BSE never sleeps, waiting for audio drivers. take a look at > >>alsa_device_write() and alsa_device_check_io() in bsepcmdevice-alsa.c. > >>i.e. allow full buffer writes if at all possible and provide > >>timeout information on how long it takes for enough buffer space > >>to be available. > >>that's the kind of semantic that needs to be implemented. > > > >Yes, it does block in the ALSA driver. However, I think I'll come back > >to it in a seperate mail, since you said it shouldn't or maybe should, > >so I'll investigate. > > the rationale provided here: > http://mail.gnome.org/archives/beast/2006-June/msg00005.html > for why writes shouldn't block applies also to the bse-alsa driver. > the fact that bse-alsa could indeed block in its write function > since we set the device to nonoblock=0 is merely the guarding > i'm talking about at the end of the above email. Ok, investigating the question pointed out to in the mail you linked there is still on my todo. > >Cool. Just that I can't lock() in the RT thread, because of priority > >inversion. > > > >I've read a little kernel code today, and fs/pipe.c has a > >PIPE_MUTEX(...) which is being locked. So I am not sure I can use > >write() to wakeup the master thread either. > > well, forget about priority inversion for the moment. it's very hard > to trigger anyway. a much stronger reason to use a pipe is that the > engine and synthesizer threads use poll() and not cond_wait. Yes, I changed my mind. I will use write(). > >>>/* connect ports */ > >>> > >>>map<string, DeviceDetails> devices = query_jack_devices (self); > >> > >>querying the devices could have been done before registering > >>the ports above. > > > >So what? I can only connect the ports anyway. > > "So what?" ? there's no point in connectiong ports if the > devices can't be found. I will not "connect" the points if the device can not be found, since then the (di != devices.end()) condition (4 lines below from here) will be false. However, I still need to register the beast output ports, so that beast can write output, although then it will be unconnected when writing output, which is perfectly valid in the jack world. It happens all the time. The user can just connect it afterwards using a patchbay. > >>>map<string, DeviceDetails>::const_iterator di; > >>> > >>>di = devices.find (dname); > >>>if (di != devices.end()) > >>> { > >>> const DeviceDetails &details = di->second; > >>> > >>> for (guint ch = 0; ch < handle->n_channels; ch++) > >>> { > >>> if (details.output_ports > ch) > >>> jack_connect (self->jack_client, > >>> details.output_port_names[ch].c_str(), jack_port_name > >>> (jack->input_ports[ch])); > >>> if (details.input_ports > ch) > >>> jack_connect (self->jack_client, jack_port_name > >>> (jack->output_ports[ch]), details.input_port_names[ch].c_str()); > >>> } > >>> } > >> > >>you're not setting error in case you failed to find > >>the required devices to connect the channels. > > > >I don't want to. If you for instance start beast with > > > >-pjack=jamin > > > >to use jamin for mastering, and later the user chooses to terminate > >jamin, then we can still run. Its just that then the ports will be > >initially unconnected, which is not a problem, because it can be easily > >fixed in qjackctl. > > sorry, would you care to enlighten me here please? > what's -pjack=jamin supposed to mean? > should the beast output ports be auto-connected to input ports named > "jamin"? > and, if input ports named "jamin" are not present, that'll silently be > ignored? Yes. > will the jack callback thread still properly run and request BSE engine data > if the beast output port is unconnected? Yes. > if that's the case, i think at least the behaviour of silently ignoring > unsuccessfull connections has to be changed. we also throw proper errors > in other drivers if unconectable devices are being adressed. > (if unsuccessful connections really are common usage with jack, we can > still add device flags, like we do with OSS, so you could say > -pjack=jamin,auto for automated connections) I don't know from your example what you envision the syntax to be like. There are four good cases as I see the situation. (1) request beast to autoconnect, and request beast to find a suitable device to autoconnect to without asking -> this is what -pjack does right now, and I think it should be kept this way (2) request beast to connect to a specific client if possible, but continue if that doesn't work (we could have some [X] report errors on jack connection problems message, then the user could disable the error, or we could make this configurable) -> this is what -pjack=jamin does right now (3) request beast to connect to a specific client, and let it fail and report an error if it doesn't work (4) request beast to start without connecting a client -> this is what -pjack=unconnected does right now, as long as there is no jack client called "unconnected", in which case this behaves like (2) What I wasn't too sure about when designing the syntax is whether the position of the argument should play a role. Suppose we also decided some day to add ro (readonly) and wo (writeonly) as options. Should then -p jack=auto,ro,alsa_pcm be the same like -p jack=alsa_pcm,ro,auto This rules out the possibility of connecting clients called "auto" or "ro", which is maybe a bad decision. So maybe we should start with something like -p jack=connect:alsa_pcm -p jack=connect:alsa_pcm,wo or so? I am really not sure what would be a good and clean syntax here, that handles all cases that are needed. > >>>min_buffer_frames += BSE_PCM_DEVICE (device)->req_block_length; > >> > >>couldn't you instead just align min_buffer_frames to 2 * bse buffer size, > >>to still fullfil the requirements you outlined, and yet yield a smaller > >>ring buffer (thus optimize latency)? > > > >Don't know. Trouble comes when we miss the - racy - condition. That > >produces one period of extra latency if things go badly wrong. Thus a > >stateful synchronization which can't miss something (such as a pipe()) > >would be superior. > > sorry? what do you mean by "stateful synchronization"? By stateful synchronization I mean that writing to a pipe will definitely wakeup the engine thread later, whereas signalling a condition will only wakeup the engine thread if it has been sleeping on the condition. The former is better. > >It could even happen that we are not connected initially (latency = 0), > >and only get connected afterwards by a patchbay, where the user is > >loading a session. Thus in such a case we would always misbehave. > > what's that about being non-connected. > does jack still request audio from beast when unconnected? Yes. > and, you said jack clients would always auto connect to alsa_pcm > and that alsa_pcm should always be present. so why would we start > up disconnected? Thats not what I wanted to say. I just pointed out that lots of people hardcode alsa_pcm somewhere in their code. However, whether a jack application should startup connected or unconnected, or how defaulting or auto connecting should work for "normal" jack applications is an open issue as far as I understand from reading the mailing list. There have been some threads suggesting this or that (like api extensions, extra default flags on ports and the like), and some people have been pointing out that they like this or that behaviour better. So I think until people have figured out what they want, I'd like to see beast to support both: connect on open and open unconnected (so that some external program can do the work). > >>>guint user_buffer_frames = BSE_PCM_DEVICE (device)->req_latency_ms * > >>>handle->mix_freq / 1000; > >>>guint buffer_frames = max (min_buffer_frames, user_buffer_frames); > >>> > >>>jack->input_ringbuffer.resize (buffer_frames, handle->n_channels); > >>>jack->output_ringbuffer.resize (buffer_frames, handle->n_channels); > >> > >>the way this is written, you're doubling the latency because you apply > >>it to the input and the output buffer. you should just apply it to the > >>output buffer (and also subtract the input buffer latency before doing > >>that) > > > >The input buffer should not produce latency. Ideally, the filled frames > >of the input buffer and the filled frames of the output buffer should > >add up to exactly buffer_frames. Note that the input_buffer is *empty* > >initially while the output_buffer is *full* initially, and the jack > >callback methodically only adds as many frames to the input_buffer as it > >takes from the output_buffer, whereas the engine process cycle does the > >inverse. > > that means input latency increases with every output underrun, because > in that case you're faking extra output values. basically, in duplex > scenrios under latency constraints you have to properly resync input > and output and can't just pad output values. I don't think it can happen. After an output underrun, the jack_device_retrigger code clears both ringbuffers (input and output), and then refills the output buffer with zeros. > >>> * FIXME: we need a way to indicate an error here; beast > >>> should provide > >>> * an adequate reaction in case the JACK server is down (it > >>> should stop > >>> * playing the file, and show a dialog, if JACK can not be > >>> reconnected) > >> > >>we don't need error indication here (see alsa_device_read()). it's > >>good enough if we return gracefully here. > >>error checking code (and returns) can be executed by check_io() > >>and/or retrigger(). > > > >Yes, we do. If JACK dies, I want to know as user. Otherwise I'll just > >see that nothing happens. Why? And why is beast continuing to play if > >still nothing happens. Why should it...? > > i already said that. but i can say it again if that helps you. > you should gracefully return and ignore the error in write(). > you can check for error conditions in check_io() and there we > can decide and undertake appropriate measures to deal with the > error. e.g. by resyncing the stream (that's at least interesting > for ALSA/OSS or by notifying the user, etc.) Yes, right. If I implement check_io properly, one of the problems goes away. > >Timeouts are a bad solution because they either occur too soon (in which > >case the engine thread may wake up, but can't do the computation), or > >too late, because there is always jitter. In any case you loose precious > >time or waste precious CPU cycles or both. > > note that the BSE audio driver model *and* the BSE engine model are based > on timeouts. that's what check_io() provides and what the engine needs. > you won't get any other programming model with BSE, regardless of whether > you like it or not ;) Well, pipes are another programming model. I don't think we'll need timeouts when we can use pipes. But right now I can only make theoretical claims about it: I will have to work on the code to see whether the code can be written in a way so that check_io will rely on the notifications of pipes (and the lockless ringbuffers) only. Then it will be able to return an arbitary timeout, but the pipe will still ensure that everything goes well. Cu... Stefan -- Stefan Westerfeld, Hamburg/Germany, http://space.twc.de/~stefan _______________________________________________ beast mailing list beast@... http://mail.gnome.org/mailman/listinfo/beast |
|
|
Re: bsepcmdevice-jack.cc (Re: Beast Jack driver)On Tue, 6 Jun 2006, Stefan Westerfeld wrote:
> On Mon, Jun 05, 2006 at 04:31:23AM +0200, Tim Janik wrote: >> ok. note that beast doesn't 100% perfectly prevent priority inversion in >> other scenarios either. this issue is slowly being worked on (the idea >> is to move all communication to atomic ops guarded by BirnetGuard, but >> that's an entire new thread ;) > > However, these are two kinds of separate issues: > (1) priority inversion occuring within the jack realtime thread > (2) priority inversion occuring elsewhere in beast > > When (1) is being triggered, the jackd server can get dropouts while > writing audio, and might decide that we are the client that is badly > implemented. So it might shutdown the connection we have, leaving the > other clients running. > > When (2) is being triggered, we miss the opportunity to refill the > ringbuffer in time, so there will be a click in the output, but at least > the audio processing still continues; but we also may have still enough > data buffered in the rinbuffer, and nothing will happen; that will also > depend on how the user configured the beast latency. > > So I am trying harder to avoid (1) than (2). these consideraitons shouldn't be made without taking into account the probability for priority inversion scenrios which is very close to 0 in practice. so this is an academic concern for the most part. > If beast would not be > likely to produce more underruns in (2) than in (1), we might as well > run the beast processing within the jack RT thread. But we don't and > thats good as it is. > > I still agree with your concluding remarks, though, which are: great! >> until then, the jack RT thread should use a pipe write() to notify the >> engine and sequencer threads, which both create their own pipe fds that >> they also poll() on. > > After thinking some more about it, and also reading Paul's oppinion: > > http://permalink.gmane.org/gmane.linux.audio.devel/11725 > > I think that pipe write() is probably ok for a solution running under > linux. Its simply that linux is no realtime operating system, and so > whatever we do in our case, we can get no guarantee that it will work in > 100% of the cases. with the recent low latency in linux kernels, it's starting to become a very good pseudo realtime operating system: http://lac.zkm.de/2006/papers/lac2006_lee_revell.pdf (Realtime Audio vs. Linux 2.6 paper by Lee Revell) > So if this was an realtime airplane control software, > we should probably use some other operating systen. exactly! ;) i'm so glad you recognize we're _not_ coding realtime airplane control software here which runs redundantly in 4 self monitoring processors ;) > However, for an audio software its probably enough if it works reliable > enough for daily music production, and it pipe writes should deliver > this degree of reliability. yes, i'd hope so. if not it's time for filing a kernel bug. [...] > Cu... Stefan --- ciaoTJ _______________________________________________ beast mailing list beast@... http://mail.gnome.org/mailman/listinfo/beast |
|
|
Re: bsepcmdevice-jack.cc (Re: Beast Jack driver)i'm taking Paul off the Cc: list, since we're probably just pestering with most stuff. Paul, FYI this is also all available through gmane: http://thread.gmane.org/gmane.comp.gnome.beast/171/focus=178 or of course our list archives: http://mail.gnome.org/archives/beast/ On Tue, 6 Jun 2006, Stefan Westerfeld wrote: > On Mon, Jun 05, 2006 at 04:31:23AM +0200, Tim Janik wrote: >>>>> DEBUG ("attaching to JACK server returned status: %d\n", status); >>>>> /* FIXME: check status for proper error reporting >>>> >>>> doesn't teh jack API have something like: >>>> const char* jack_status_to_string (jack_status_t status); >>>> ? >>> >>> No. >> >> fine, then we need to roll our own function that does this. >> for existing example code that stringifies foreign errors, >> you can take a look at gslvorbis-cutter.c:ov_error_blurb(). >> >> it might be agood idea to submit this to jack once done btw. > > Its not that writing our own function will help us a lot though. Its not > that every function returns a jack_status_t. Just the open function. it will help to improve the debugging output. we've done that for all other drivers, we should also do it here. and it prolly wouldn't hurt JACK to export status stringification. > The usual way for getting stringified errors in other situations is to > set the error callback with jack_set_error_function. dunno how that's applicable here... > Cu... Stefan --- ciaoTJ _______________________________________________ beast mailing list beast@... http://mail.gnome.org/mailman/listinfo/beast |
|
|
Re: bsepcmdevice-jack.cc (Re: Beast Jack driver)On Tue, 6 Jun 2006, Stefan Westerfeld wrote:
> On Mon, Jun 05, 2006 at 04:31:23AM +0200, Tim Janik wrote: >>>>> free (jack_ports); >>>>> } >>>>> >>>>> return devices; >>>> >>>> ugh, isn't this a full fledged recursive map+strings+DeviceDetails copy? >>> >>> Yes. So? Its not in any time critical code, so I don't bother to >>> optimize it. >> >> well, it's not exactly "hard to optimize" here. just return a pointer. > > I am trying to avoid manual new/delete in all code I write, because its > a potential source of errors. So returning a pointer is not so good. It > also makes map accesses somewhat strange to read, i.e. > > (*devices)["foo"] = bar; > > instead of > > devices["foo"] = bar; > > Under some circumstances, the C++ compiler will also construct objects > that are returned from a function within the memory of the caller, so > that no copy will done at all. But I don't know if the device map case > is one of these cases, or not. well, it'd be good to find out about that then ;) i.e. does std::map do the kind of internal data refcounting that std::string does? > So if you insist that it must be "optimized", I would suggest changing > the API to > > static void > query_jack_devices (BsePcmDeviceJack *self, > map<string, DeviceDetails> &devices) > > then I can still keep my automatic deletion feature and nice to read > map access code. sure. it'd still be interesting to know how expensive map copies are though ;) >>>> you're not setting error in case you failed to find >>>> the required devices to connect the channels. >>> >>> I don't want to. If you for instance start beast with >>> >>> -pjack=jamin >>> >>> to use jamin for mastering, and later the user chooses to terminate >>> jamin, then we can still run. Its just that then the ports will be >>> initially unconnected, which is not a problem, because it can be easily >>> fixed in qjackctl. >> >> sorry, would you care to enlighten me here please? >> what's -pjack=jamin supposed to mean? >> should the beast output ports be auto-connected to input ports named >> "jamin"? > Yes. > >> and, if input ports named "jamin" are not present, that'll silently be >> ignored? > Yes. > >> will the jack callback thread still properly run and request BSE engine data >> if the beast output port is unconnected? > Yes. > >> if that's the case, i think at least the behaviour of silently ignoring >> unsuccessfull connections has to be changed. we also throw proper errors >> in other drivers if unconectable devices are being adressed. >> (if unsuccessful connections really are common usage with jack, we can >> still add device flags, like we do with OSS, so you could say >> -pjack=jamin,auto for automated connections) > > I don't know from your example what you envision the syntax to be like. > There are four good cases as I see the situation. > > (1) request beast to autoconnect, and request beast to find a suitable > device to autoconnect to without asking > > -> this is what -pjack does right now, and I think it should be kept > this way you mean, the shorthand -p jack is equivalent to -p jack=alsa_pcm ? i'd say that makes sense. > (2) request beast to connect to a specific client if possible, but > continue if that doesn't work (we could have some [X] report errors > on jack connection problems message, then the user could disable > the error, or we could make this configurable) > > -> this is what -pjack=jamin does right now i don't think continuation is a good idea here, as i already outlined. we don't accept failing destinations in the other drivers either. in general, we don't by default silently ignore warnings. and shouldn't do that here either. about making the warning/error dialog optional, that is always a good idea anyway. the point is about having the warning/ error in the first place though, and yes, that should be the case as it is with other drivers. however, you said it would make sense in some scenrios to let -p jack=port effectively behave like -p jack=unconnected in case "port" can't be found. that's what i suggested the "auto" flag for. i.e. -p jack=port,auto could either succeed in connecting to "port" or behave like -p jack=unconnected if "port" couldn#t be connected to. > (3) request beast to connect to a specific client, and let it fail and > report an error if it doesn't work that's exactly like (2), since reporting an error/warning will be a dialog with a [x] deduping option. > (4) request beast to start without connecting a client > > -> this is what -pjack=unconnected does right now, as long as there is > no jack client called "unconnected", in which case this behaves like > (2) is "unconnected" some kind of 'standard' in the jack world? what do other programs use as identifier for this mode? i'm asking because i think "unconnected" is maybe a bit more tedious than just "open". what do you think? > What I wasn't too sure about when designing the syntax is whether the > position of the argument should play a role. definitely, take a peek at beast --bse-driver-list. note that i've not seen anything about your "syntax" yet though, you just wrote "jack=PORT" in the docs. in any case, you should basically pick up what the OSS driver does, for jack that'd be (mockup): jack jack=PORT,MODE // or s/MODE/FLAGS/ if you prefer JACK sound daemon PCM driver: PORT - Jack port name (use "open" for unconnected ports) MODE - may contain "rw" or "wo" for read-write or write-only access; adding "hs" forces hard sync on underruns. Adding "auto" allows failing port connects and behaves as if "open" was specified as PORT. Devices: > alsa_pcm (read-write) > Suppose we also decided > some day to add ro (readonly) and wo (writeonly) as options. Should then > > -p jack=auto,ro,alsa_pcm > > be the same like > > -p jack=alsa_pcm,ro,auto nope, like OSS, you're supposed to use jack=RHS with: RHS := DEVICEorPORT [ ',' FLAG ]* DEVICEorPORT := 'alsa_pcm' | 'open' FLAG := 'rw' | 'wo' | 'auto' | 'hs' > This rules out the possibility of connecting clients called "auto" or > "ro", which is maybe a bad decision. So maybe we should start with > something like > > -p jack=connect:alsa_pcm > -p jack=connect:alsa_pcm,wo > > or so? I am really not sure what would be a good and clean syntax here, > that handles all cases that are needed. i don't think we have to do that just yet without any use cases. using the port name as first arg is good and easy enough and allowes for arbitrary (albeit not comma-comprised) port names. >>> It could even happen that we are not connected initially (latency = 0), >>> and only get connected afterwards by a patchbay, where the user is >>> loading a session. Thus in such a case we would always misbehave. >> >> what's that about being non-connected. >> does jack still request audio from beast when unconnected? > > Yes. ok, that's where you'd then usually want to use -p jack=PATCHBAY-INPUT,auto if i understand correctly. right? >> and, you said jack clients would always auto connect to alsa_pcm >> and that alsa_pcm should always be present. so why would we start >> up disconnected? > > Thats not what I wanted to say. I just pointed out that lots of people > hardcode alsa_pcm somewhere in their code. However, whether a jack > application should startup connected or unconnected, or how defaulting > or auto connecting should work for "normal" jack applications is an open > issue as far as I understand from reading the mailing list. There have > been some threads suggesting this or that (like api extensions, extra > default flags on ports and the like), and some people have been pointing > out that they like this or that behaviour better. > > So I think until people have figured out what they want, I'd like to see > beast to support both: connect on open and open unconnected (so that > some external program can do the work). ok, i see. well, ... and default to "alsa_pcm" i presume? >>>>> guint user_buffer_frames = BSE_PCM_DEVICE (device)->req_latency_ms * >>>>> handle->mix_freq / 1000; >>>>> guint buffer_frames = max (min_buffer_frames, user_buffer_frames); >>>>> >>>>> jack->input_ringbuffer.resize (buffer_frames, handle->n_channels); >>>>> jack->output_ringbuffer.resize (buffer_frames, handle->n_channels); >>>> >>>> the way this is written, you're doubling the latency because you apply >>>> it to the input and the output buffer. you should just apply it to the >>>> output buffer (and also subtract the input buffer latency before doing >>>> that) >>> >>> The input buffer should not produce latency. Ideally, the filled frames >>> of the input buffer and the filled frames of the output buffer should >>> add up to exactly buffer_frames. Note that the input_buffer is *empty* >>> initially while the output_buffer is *full* initially, and the jack >>> callback methodically only adds as many frames to the input_buffer as it >>> takes from the output_buffer, whereas the engine process cycle does the >>> inverse. >> >> that means input latency increases with every output underrun, because >> in that case you're faking extra output values. basically, in duplex >> scenrios under latency constraints you have to properly resync input >> and output and can't just pad output values. > > I don't think it can happen. After an output underrun, the > jack_device_retrigger code clears both ringbuffers (input and output), > and then refills the output buffer with zeros. well, note that for the sake of making resyncronization/retriggering less nasty when you encounter underruns, we might want to do less than fully retriggerring the device. e.g. we could "soft-sync" by writing an extra 0-buffer into jack and by throwing away a single input buffer (to keep the input latency). doing that will need the jack thread to write-access the read-pointer of the ring buffer though, which you code currently doesn't account for. or, it might also be possible to have an atomic throw-away-input-blocks counter that the BSE driver trhead could decrement and ignore incoming buffers.... that's a bit hackish in the presence of a limited ring buffer size though, so adjusting the read-pointer from the jack thread would be much better. reducing underrun impact by using soft-sync vs. hard-sync is also done by the OSS driver ("hs" flag). >>> Timeouts are a bad solution because they either occur too soon (in which >>> case the engine thread may wake up, but can't do the computation), or >>> too late, because there is always jitter. In any case you loose precious >>> time or waste precious CPU cycles or both. >> >> note that the BSE audio driver model *and* the BSE engine model are based >> on timeouts. that's what check_io() provides and what the engine needs. >> you won't get any other programming model with BSE, regardless of whether >> you like it or not ;) > > Well, pipes are another programming model. I don't think we'll need > timeouts when we can use pipes. it's not a question of "need". we *have* timeout behaviour already. > But right now I can only make > theoretical claims about it: I will have to work on the code to see > whether the code can be written in a way so that check_io will rely on > the notifications of pipes (and the lockless ringbuffers) only. fine. if we switched to pipe based notification, we should do that for *all* drivers. since the changes related to this are not going to be trivial, i really want to seperate this from the general jack driver development. so please lets concentrate on finishing up bse-jack with the current timing model and after that look into timing alternatives. > Then it > will be able to return an arbitary timeout, but the pipe will still > ensure that everything goes well. > > Cu... Stefan --- ciaoTJ _______________________________________________ beast mailing list beast@... http://mail.gnome.org/mailman/listinfo/beast |
|
|
Re: bsepcmdevice-jack.cc (Re: Beast Jack driver)Tim Janik <timj@...> writes:
> you said jack clients would always auto connect to alsa_pcm This is wrong. The general consensus is that JACK applications should never auto connect to anything. Connection management is handled externally. There is talk of a new addition to the API, where the user may set a flag to indicate if the application should auto connect to a specific set of ports, but it's not there yet. -- Esben Stien is b0ef@e s a http://www. s t n m irc://irc. b - i . e/%23contact sip:b0ef@ e e jid:b0ef@ n n _______________________________________________ beast mailing list beast@... http://mail.gnome.org/mailman/listinfo/beast |
|
|
Re: bsepcmdevice-jack.cc (Re: Beast Jack driver)On Wed, 7 Jun 2006, Esben Stien wrote:
> Tim Janik <timj@...> writes: > >> you said jack clients would always auto connect to alsa_pcm > > This is wrong. The general consensus is that JACK applications should > never auto connect to anything. Connection management is handled > externally. There is talk of a new addition to the API, where the user > may set a flag to indicate if the application should auto connect to a > specific set of ports, but it's not there yet. a flag for auto-connect settable by the user? we could certainly add that to our preferences. > -- > Esben Stien is b0ef@e s a > http://www. s t n m > irc://irc. b - i . e/%23contact > sip:b0ef@ e e > jid:b0ef@ n n --- ciaoTJ _______________________________________________ beast mailing list beast@... http://mail.gnome.org/mailman/listinfo/beast |
| Free embeddable forum powered by Nabble | Forum Help |