contiguous DMA while transferring data over IP

by Eugene Grayver

Hello,

I am using a Verdex video interface to capture data from a custom CCD processor.  The interface goes directly into a small FIFO on the PXA270.  I set up a DMA transfer to move data from the FIFO into bulk memory.  After an image has been transferred, it is sent over the network to a server.  It takes too long to capture the image and then send it out.  I'd like to start sending data out while it is still being received.  Unfortunately, the DMA buffer is locked and cannot be accessed by the user-space program while the DMA is running.

My idea is to break up the transfer into N smaller DMAs.  As each DMA completes, I would send that chunk while the next DMA writes into a new buffer.  Because of the way the custom hardware is set up, I cannot tolerate pauses in emptying the FIFO (it will overrun).  Thus, I need the DMAs to run (almost) contiguously.  Is there a way to queue up all N DMA requests and have the kernel start each one as soon as the previous one has completed?

Thanks.

------------------------------------------------------------------------------
_______________________________________________
gumstix-users mailing list
gumstix-users@...
https://lists.sourceforge.net/lists/listinfo/gumstix-users

Re: contiguous DMA while transferring data over IP

by Ned Forrester

On 06/26/2009 11:01 PM, Eugene Grayver wrote:

> Hello,
>
> I am using a Verdex video interface to capture data from a custom CCD
> processor.  The interface goes directly into a small FIFO on the
> PXA270.  I setup a DMA to get data from the FIFO into bulk memory.
> After an image has been transferred, it is sent over the network to a
> server.  It takes too long to capture the image and then send it out.
> I'd like to start sending data out while it is being received.
> Unfortunately, the DMA buffer is locked and cannot be accessed by the
> user-space program while DMA is running.
>
> My idea is to break up the transfer into N smaller DMAs.  As each DMA
> completes, I transfer that chunk, while the next DMA is writing to a
> new buffer.  The way the custom hardware is setup, I cannot tolerate
> pauses in emptying the FIFO (it will overrun).  Thus, I need the DMAs
> to run (almost) contiguously.  Is there a way to queue up all N DMA
> requests and have the kernel take care of starting each one after the
> previous one has completed?

Yes, but you will likely have to re-write the kernel driver to support
that.  The PXA series has support for descriptor-fetch DMA, in which the
essential DMA registers are re-loaded from memory by the DMA hardware,
itself, so that DMA buffers can be filled one after another without
intervention by the CPU.  I have used this successfully in a re-write of
pxa2xx_spi.c, which is the driver for the SPI interface.

The maximum length of a single DMA transfer (one descriptor) in the
PXA2xx processors is only 8191 bytes, so if you are transferring images
I find it hard to believe that you would want each buffer to be any
smaller than that (or possibly 4096 bytes).

--
Ned Forrester                                       nforrester@...
Oceanographic Systems Lab                                  508-289-2226
Applied Ocean Physics and Engineering Dept.
Woods Hole Oceanographic Institution          Woods Hole, MA 02543, USA
http://www.whoi.edu/
http://www.whoi.edu/sbl/liteSite.do?litesiteid=7212
http://www.whoi.edu/hpb/Site.do?id=1532
http://www.whoi.edu/page.do?pid=10079


Re: contiguous DMA while transferring data over IP

by Eugene Grayver

Hmm... I must be entirely ignorant about the details of PXA DMA transfers.  We modified the kernel to have 16MB of coherent memory space and were doing a single LONG DMA transfer from the FIFO to the memory.  I did not realize that the transfer was already being broken up into 8k chunks.  I would be entirely happy with chunks of ~100kB.  However, when I tried to break the transfer up into multiple DMAs from user mode, I got overruns.

Re: contiguous DMA while transferring data over IP

by Ned Forrester

On 06/30/2009 08:37 PM, Eugene Grayver wrote:
> Hmm... I must be entirely ignorant about the details of PXA DMA
> transfers.  We modified the kernel to have 16MB of coherent memory
> space and were doing a single LONG DMA transfer from the FIFO to the
> memory.  I did not realize that the transfer was already being broken
> up into 8k chunks.  I would be entirely happy with chunks of ~100kB.
> However, when I tried to break the transfer up into multiple DMAs
> from user mode, I got overruns.

Hmmm...  The kernel driver must be breaking the transfer into smaller
bits.  I suggest you take a look at the driver code and see if you can
figure out what it is doing for you with regard to partitioning the DMA
and how it is managing to keep up with the data input.  I gather 16MB is
enough for a whole picture.

Exactly which peripheral on the PXA270 are you using?  Which driver?
What speed (bytes/sec) does this device require?

If you are using the Quick Capture Interface, I see a reference in section
27.4.4.3 of the Developer's Manual to setting up a descriptor chain for
an entire frame.  That is what I would expect to be required to capture
data from an external device that supplies the transfer clock.
Perhaps this is what the kernel driver is doing.

Do you really have to get a single frame faster, or is it a matter of
keeping up with a particular frame rate?  If the latter, could you just
use two buffers so that you can be transmitting one while the other is
being filled by the driver?

Re: contiguous DMA while transferring data over IP

by Eugene Grayver

Hi Ned,
Thanks for the ideas.  Looking into the driver, pxa_camera.c, I see that it was written for V4L and is therefore frame-oriented.  I am reading single frames from a large CCD over the QuickCapture interface, and it takes a few seconds to complete a read.  It then takes another few seconds to transfer the data.  The overall latency is too high.  So I really do need to get a single frame out faster, rather than keep up with a frame rate.  I may only get one frame a minute, driven by an external trigger signal.
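A rough latency model makes the trade-off concrete. It assumes, as an illustration rather than a measurement, that capture and transmission each run at a constant rate, that the frame is split into N equal chunks, and that per-chunk overhead is negligible. With T_c the time to capture the whole frame and T_x the time to send it:

```latex
T_{\text{serial}} = T_c + T_x
\qquad
T_{\text{pipelined}} \approx \max(T_c, T_x) + \frac{\min(T_c, T_x)}{N}
```

For example, taking "a few seconds" to mean T_c = T_x = 3 s and N = 16, the serial case takes 6 s while the pipelined case takes about 3 + 3/16 ≈ 3.19 s, so overlapping the two stages can at best roughly halve the latency when they take similar times.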

I tried fooling the driver by telling it that each frame is a fraction of the actual frame.  Unfortunately, I ended up with bad data at the subframe boundaries (I guess the FIFO overflowed).  There should be a way for the driver to provide 'updates' to the application -- I got x% of the data, go ahead and read it...  This is much lower level code than I am used to, so any suggestions are very welcome.  Thanks for pointing me to the right section of the dev manual, I'll see if I can make some sense of it.


Thanks.

Re: contiguous DMA while transferring data over IP

by Ned Forrester

On 07/03/2009 08:33 PM, Eugene Grayver wrote:
> Hi Ned, Thanks for the ideas.  Looking into the driver, pxa_camera.c,
> I see that it was written for V4L and is therefore frame oriented.  

Yup, that driver is breaking the big image buffer into a series of
chained DMA descriptors.  That happens in pxa_init_dma_channel().  That
is how it keeps up with the incoming data.

> I am reading single frames from a large CCD over the QuickCapture
> interface and it takes a few seconds to complete a read.  It then
> takes another few seconds to transfer the data.  The overall latency
> is too high.  So, I really do need to get a single frame out faster
> rather than keeping up with a frame rate.  I may only get a frame a
> minute, driven by an external trigger signal.

I don't follow why, if you are taking a picture per minute, 6 seconds is
too long to get the image, but you know your application and I don't.

> I tried fooling the driver by telling it that each frame is a
> fraction of the actual frame.  Unfortunately, I ended up with bad
> data at the subframe boundaries (I guess the FIFO overflowed).  

That is why the whole frame is collected with a single chain of
descriptors: otherwise the FIFO will overflow between transfers.

> There should be a way for the driver to provide 'updates' to the
> application -- I got x% of the data, go ahead and read it...

I guess the driver does not happen to be written that way, but, in
principle, it could be.  If that is the functionality you require, then
you will likely have to rewrite the driver.  (If you want your version
to be maintained as part of Linux, you will also need to work with the
maintainer and submit patches; see below.)

> This is much lower level code than I am used to, so any suggestions
> are very welcome.  Thanks for pointing me to the right section of the
> dev manual, I'll see if I can make some sense of it.

I can't really help you from here.  If you want to rewrite the kernel
driver, I can tell you that I have done that for another driver, but,
never having written a device driver before, I found it a very steep
learning curve.  I highly recommend "Linux Device Drivers" by Corbet et
al., O'Reilly.  It is also available online (I'm not sure why), but I
find the paper copy easier to read, though admittedly harder to search.

http://lwn.net/Kernel/LDD3/

The kernel is a very different environment from user space.  Not only
are the available calls mostly unfamiliar, but, unlike libc (the
interface between user-space programs and the kernel), which is kept
stable intentionally, the interfaces within the kernel are intentionally
subject to constant change.  Drivers that are maintained within the
mainline kernel get updated by anyone who changes an internal kernel
interface; private drivers, on the other hand, quickly become obsolete
unless a lot of continuing effort goes into them.
