FreeBSD ARM network speed

by Yohanes Nugroho

Hi All,

I am continuing my work on CNX11XX/STR91XX (more info about the work:
http://tinyhack.com/2009/09/28/cnx11xxstr91xx-freebsd-progress/). The two
important things left are the Flash/CFI driver and a network problem.
The Flash/CFI driver should in theory be easy, but I will read more about
it to make sure that I don't mess up the boot loader part. Now, about
the network.

The network speed is currently around half of what Linux achieves on the
same hardware. FTP-ing a 30 MB file from the device to my computer, I get
about 1.6-2 MB/s (the higher figure is on the second run, when the data is
already cached). On Linux, I can upload the same file at about 3-4 MB/s.
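To compare these figures with the link rate, they can be converted to bits per second. A minimal sketch, assuming 1 MB = 10^6 bytes and a 100 Mbit/s link (the IP101A is a 10/100 PHY); the function names are illustrative only:

```c
#include <assert.h>

/* Convert megabytes per second to megabits per second.
 * Assumes 1 MB = 10^6 bytes, so this is a plain factor of 8. */
static double mbytes_to_mbits(double mbytes_per_s)
{
    assert(mbytes_per_s >= 0.0);
    return mbytes_per_s * 8.0;
}

/* Percentage of a link's raw bit rate used by a transfer. */
static double link_utilization_pct(double mbytes_per_s, double link_mbit_per_s)
{
    assert(link_mbit_per_s > 0.0);
    return 100.0 * mbytes_to_mbits(mbytes_per_s) / link_mbit_per_s;
}
```

Under these assumptions, 2 MB/s is 16 Mbit/s (16% of a 100 Mbit/s link) and 4 MB/s is 32 Mbit/s, so neither system is close to saturating the link.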

Some info about the device: RAM: 64 MB; CPU: FA526 (ARMv4, no Thumb
instructions) at 200 MHz. The MAC is part of the SoC; the PHY is an IC Plus IP101A.

I have two questions:
1. Is the network speed in FreeBSD ARM currently slower than Linux ARM?

If it is slower, how much slower? I cannot find a comparison of network
speed between FreeBSD ARM and Linux ARM. I would be interested if anyone
could provide a comparison between Linux and FreeBSD on the NSLU2 or some
other device.

Just for information: changing some kernel options in the Linux version
(such as the scheduler) makes the network speed vary greatly (I think the
variation is more than 30%, so in certain configurations it can be as slow
as the current FreeBSD version). Networking in the Linux 2.4 kernel is
faster than in Linux 2.6.

2. What should I do to make the network faster (especially sending from
the device, so that it is usable as a media server)?

Here are the things that I have done:
- using the scatter/gather feature of the hardware (this improves the
speed a little bit)
- using the checksum offloading feature of the hardware (this improves the
speed a little bit)
- using a taskqueue for sending (this improves the speed a lot)
- disabling spinlock debugging and all other debugging except DDB
- using the -O2 optimization flag
- checking that there are no errors/retransmissions (using
Wireshark), so all the packets are sent and received correctly
- disabling IPv6
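For reference, the per-packet work that checksum offloading moves from the CPU to the controller is the standard one's-complement Internet checksum (RFC 1071). A minimal software version, for illustration only (the function name is hypothetical):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One's-complement Internet checksum (RFC 1071) over a byte buffer.
 * The result is the 16-bit value as it would appear in network order. */
static uint16_t inet_cksum(const uint8_t *buf, size_t len)
{
    uint32_t sum = 0;

    assert(buf != NULL || len == 0);
    /* Add the data as a sequence of big-endian 16-bit words. */
    while (len > 1) {
        sum += ((uint32_t)buf[0] << 8) | buf[1];
        buf += 2;
        len -= 2;
    }
    /* A trailing odd byte is padded with a zero byte. */
    if (len == 1)
        sum += (uint32_t)buf[0] << 8;
    /* Fold the carries back into the low 16 bits. */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}
```

Without offloading, a loop like this touches every byte of every packet sent, which is why offloading helps at all on a 200 MHz core.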

(Here is my current kernel configuration:
http://p4db.freebsd.org/fileViewer.cgi?FSPC=//depot/projects/str91xx/src/sys/arm/conf/CNS11XXNAS&REV=3)

The specification for the STR9104 SoC is available on the Cavium website
for those who are interested, but it is not very clear, so in developing
the network driver I followed the logic used by the Linux driver (the
initialization sequence, etc.). The current code is at
http://p4db.freebsd.org/fileViewer.cgi?FSPC=//depot/projects/str91xx/src/sys/arm/econa/if_ece.c&REV=4

Here is how the sending part works on STR9104:

- In the initialization part, I allocate a ring of 256 entries (same as
the Linux version).
- When asked to send a packet, the driver does the following:
  - stop the network TX DMA
  - put the address of each segment of the packet into the ring, and set
a flag so that the entry in the ring will be sent by the hardware
  - start the network TX DMA
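For contrast with the stop/fill/start sequence above, here is a minimal user-space sketch of the usual alternative: fill the descriptors while the TX DMA keeps running, and hand the chain over by setting the OWN bit of its first descriptor last. The descriptor layout, ring size, and all names are hypothetical and do not follow the STR9104 datasheet; bus_dmamap_sync(9) and register access appear only as comments.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TX_RING_SIZE 64   /* hypothetical ring size */

/* Hypothetical descriptor layout; the real STR9104 format differs. */
struct tx_desc {
    uint32_t addr;   /* bus address of one packet segment */
    uint16_t len;    /* segment length in bytes */
    bool     last;   /* set on the final segment of a frame */
    bool     own;    /* true: the controller owns this descriptor */
};

struct tx_ring {
    struct tx_desc desc[TX_RING_SIZE];
    unsigned prod;   /* next slot the driver will fill */
    unsigned cnt;    /* slots currently handed to the controller */
};

struct seg {
    uint32_t addr;
    uint16_t len;
};

/* Queue one frame of nseg segments while TX DMA keeps running.
 * Returns false, leaving the ring untouched, if there is not enough room. */
static bool tx_enqueue(struct tx_ring *r, const struct seg *segs, unsigned nseg)
{
    unsigned first = r->prod;

    assert(nseg > 0);
    if (nseg > TX_RING_SIZE - r->cnt)
        return false;

    /* Fill all descriptors; hand over every one except the first. */
    for (unsigned i = 0; i < nseg; i++) {
        unsigned idx = (first + i) % TX_RING_SIZE;

        r->desc[idx].addr = segs[i].addr;
        r->desc[idx].len = segs[i].len;
        r->desc[idx].last = (i == nseg - 1);
        r->desc[idx].own = (i != 0);
    }
    /* Setting OWN on the first descriptor last means a controller that
     * walks the ring in order and stops at the first descriptor it does
     * not own can never pick up a half-built chain. A real driver must
     * make the writes above visible first (bus_dmamap_sync(9) with
     * BUS_DMASYNC_PREWRITE on the descriptor ring) and then poke a TX
     * "kick" register, if the hardware has one, instead of stopping and
     * restarting the DMA engine. */
    r->desc[first].own = true;
    r->prod = (first + nseg) % TX_RING_SIZE;
    r->cnt += nseg;
    return true;
}
```

The controller never has to be stopped: it either has not reached the new chain yet, or it stops at the not-yet-owned first descriptor until the driver flips that one bit.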

Obviously there is a cleanup part (freeing the mbufs) that must be
done. The network controller can generate an interrupt when a packet has
been sent (but it can't tell which entry was sent). The Linux version does
not use this interrupt; the cleanup is done just after starting the TX
DMA, at the end of the sending function, and I do the same in the
FreeBSD driver. Usually only one entry needs to be removed, so
it is quite fast.
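The cleanup step can be sketched as a scan from the driver's own consumer index that stops at the first descriptor still owned by the controller; this works even though the hardware does not report which entry was sent. It is a user-space model with hypothetical names, where free() stands in for m_freem(9), and it can be called from the send function, the TX-done interrupt, or a poll.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define TX_RING_SIZE 64   /* hypothetical ring size */

/* Minimal hypothetical descriptor: only the OWN bit matters here. */
struct tx_desc {
    bool  own;   /* true while the controller still owns the slot */
    void *buf;   /* stands in for the mbuf attached to this slot */
};

struct tx_ring {
    struct tx_desc desc[TX_RING_SIZE];
    unsigned cons;   /* oldest slot not yet reclaimed */
    unsigned cnt;    /* slots currently handed to the controller */
};

/* Reclaim every slot the controller has finished with, oldest first,
 * stopping at the first slot whose OWN bit is still set.
 * Returns the number of slots freed. */
static unsigned tx_reclaim(struct tx_ring *r)
{
    unsigned freed = 0;

    /* Real driver: bus_dmamap_sync(9) with BUS_DMASYNC_POSTREAD on the
     * descriptor ring here, or the CPU may read a stale OWN bit. */
    while (r->cnt > 0 && !r->desc[r->cons].own) {
        /* Real driver: BUS_DMASYNC_POSTWRITE and bus_dmamap_unload() on
         * the buffer's map, then m_freem(). */
        free(r->desc[r->cons].buf);
        r->desc[r->cons].buf = NULL;
        r->cons = (r->cons + 1) % TX_RING_SIZE;
        r->cnt--;
        freed++;
    }
    return freed;
}
```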

Is there something obvious (or not so obvious) that I've missed?

--
Regards
Yohanes
http://yohan.es/
_______________________________________________
freebsd-arm@... mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-arm
To unsubscribe, send any mail to "freebsd-arm-unsubscribe@..."

Re: FreeBSD ARM network speed

by Rui Paulo

On 2 Oct 2009, at 06:58, Yohanes Nugroho wrote:

> Hi All,
>
> I am continuing my work on CNX11XX/STR91XX (more info about the work:
> http://tinyhack.com/2009/09/28/cnx11xxstr91xx-freebsd-progress/). The two
> important things left are the Flash/CFI driver and a network problem.
> The Flash/CFI driver should in theory be easy, but I will read more about
> it to make sure that I don't mess up the boot loader part. Now, about
> the network.
>
> The network speed is currently around half of what Linux achieves on the
> same hardware. FTP-ing a 30 MB file from the device to my computer, I get
> about 1.6-2 MB/s (the higher figure is on the second run, when the data is
> already cached). On Linux, I can upload the same file at about 3-4 MB/s.
>
> Some info about the device: RAM: 64 MB; CPU: FA526 (ARMv4, no Thumb
> instructions) at 200 MHz. The MAC is part of the SoC; the PHY is an IC Plus IP101A.
>
> I have two questions:
> 1. Is the network speed in FreeBSD ARM currently slower than Linux
> ARM?

I see no problems on my ARM boards running FreeBSD.

--
Rui Paulo

Re: FreeBSD ARM network speed

by Stanislav Sedov

On Fri, 2 Oct 2009 12:58:38 +0700
Yohanes Nugroho <yohanes@...> mentioned:

> I have two questions:
> 1. Is the network speed in FreeBSD ARM currently slower than Linux ARM?
>

I don't think so.  Our network stack is arch-independent and should perform
equally well on all platforms.  I've been able to achieve speeds up to
70 Mbps on my 180 MHz AT91-based board, which uses a very plain and dumb
Ethernet controller (although DMA is supported).

> Here is how the sending part works on STR9104:
>
> - In the initialization part, I allocate a ring of 256 entries (same as
> the Linux version).
> - When asked to send a packet, the driver does the following:
>   - stop the network TX DMA
>   - put the address of each segment of the packet into the ring, and set
> a flag so that the entry in the ring will be sent by the hardware
>   - start the network TX DMA
>
This looks weird.  Why do you stop the TX engine to add more packets
to the ring?  This can definitely kill network performance, since the
controller is unable to transmit anything while you're filling the
ring.  You also should not generally transmit only one packet at a
time, as in this case your driver will do a lot of extra work and,
considering that you're stopping the TX engine when filling the ring,
will prevent the adapter from doing any useful work.

The main strategy of the driver should be to keep the ring filled,
waking up when a reasonable amount of space in the ring becomes
available, and sleeping the rest of the time while the adapter is working.
I'm not sure why Linux doesn't use the interrupt, but this looks really
wrong.
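The fill/sleep/wake strategy described above can be modelled with counters alone. In this minimal sketch the `stopped` flag plays the role IFF_DRV_OACTIVE plays in FreeBSD drivers; the ring size, wake threshold, per-frame segment limit, and function names are hypothetical tuning values, not taken from any existing driver.

```c
#include <assert.h>
#include <stdbool.h>

#define TX_RING_SIZE   64   /* hypothetical ring size */
#define TX_MAX_SEGS     4   /* hypothetical worst-case segments per frame */
#define TX_WAKE_THRESH 16   /* hypothetical: resume once this many slots are free */

struct txq_state {
    unsigned used;      /* slots owned by the controller */
    bool     stopped;   /* plays the role of IFF_DRV_OACTIVE */
};

/* Start routine: queue frames (nseg slots each) until the ring cannot
 * take another worst-case frame, then stop instead of spinning.
 * Returns how many of the pending frames were queued. */
static unsigned txq_fill(struct txq_state *q, unsigned pending, unsigned nseg)
{
    unsigned queued = 0;

    assert(nseg >= 1 && nseg <= TX_MAX_SEGS);
    while (!q->stopped && pending > 0) {
        if (TX_RING_SIZE - q->used < TX_MAX_SEGS) {
            q->stopped = true;   /* ring nearly full: sleep until space frees up */
            break;
        }
        q->used += nseg;
        pending--;
        queued++;
    }
    return queued;
}

/* Completion path (interrupt or poll) after nfreed slots were reclaimed. */
static void txq_completed(struct txq_state *q, unsigned nfreed)
{
    assert(nfreed <= q->used);
    q->used -= nfreed;
    if (q->stopped && TX_RING_SIZE - q->used >= TX_WAKE_THRESH)
        q->stopped = false;      /* enough room again: restart the start routine */
}
```

The gap between "stop when fewer than TX_MAX_SEGS slots are free" and "wake when TX_WAKE_THRESH slots are free" is deliberate hysteresis: it keeps the start routine from being woken for every single completed frame.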

I'd suggest analyzing the performance of the network driver
either with the available profiling tools (kgmon, hardware
counters, if any) and/or with system monitoring tools (top, etc.).
Top, in particular, will let you see where all the CPU time
went.

--
Stanislav Sedov
ST4096-RIPE

Re: FreeBSD ARM network speed

by Yohanes Nugroho

On Fri, Oct 2, 2009 at 4:35 PM, Stanislav Sedov <stas@...> wrote:

> On Fri, 2 Oct 2009 12:58:38 +0700
> Yohanes Nugroho <yohanes@...> mentioned:
>
>> I have two questions:
>> 1. Is the network speed in FreeBSD ARM currently slower than Linux ARM?
>>
>
> I don't think so.  Our network stack is arch-independent and should perform
> equally well on all platforms.  I've been able to achieve speeds up to
> 70 Mbps on my 180 MHz AT91-based board, which uses a very plain and dumb
> Ethernet controller (although DMA is supported).

OK, glad to hear that :) The first time I asked about a USB problem,
it turned out to be a bug in the latest busdma source code, and I had
spent several days thinking it was my bug.

>
> This looks weird.  Why do you stop the TX engine to add more packets
> to the ring?  This can definitely kill network performance

Yes, you are right, that is weird. I will have another look at it.

> The main strategy of the driver should be to keep the ring filled,
> waking up when a reasonable amount of space in the ring becomes
> available, and sleeping the rest of the time while the adapter is working.

Thank you for your enlightenment.

> I'm not sure why Linux doesn't use the interrupt, but this looks really
> wrong.

It is because the driver comes from a vendor (very messy code) and is not
in the mainline kernel yet.

The background story:
- I have a cheap Chinese NAS device (Agestar NCB3AST, which cost about $50;
now you can get it for about $40). It came with Linux kernel 2.4, and no
source code was provided. The SoC is a Starsemi 9104.
- I found out that there is Linux source code for this SoC, but for a
different device (with different hardware around the SoC).
- Based on that source code, I ported it to Linux kernel 2.6. I didn't
bother trying to clean up the source code.
- When I considered submitting my code to the mainline kernel, I
realized that I didn't understand most of the source code.
- Bruce Simpson offered a device with the same SoC (NSD-100), and I tried
to port FreeBSD to it, thinking that I would understand the SoC better
by rewriting the code.
- Starsemi was bought by Cavium, the SoC was renamed Econa CNS1102,
and the datasheet was released. The datasheet is not very clear, so I
am still basing some of my code on the Linux code (just the logic, not
copy-pasting; I understand the license implications).

> I'd suggest analyzing the performance of the network driver
> either with the available profiling tools (kgmon, hardware
> counters, if any) and/or with system monitoring tools (top, etc.).
> Top, in particular, will let you see where all the CPU time
> went.

I am testing in single-user mode. The last time I profiled with kgmon,
it didn't show any particular area that might cause the slowdown.

Once again, thank you, I now have some ideas on what to do this weekend.

--
Regards
Yohanes
http://yohan.es/

Re: FreeBSD ARM network speed

by Pyun YongHyeon

On Fri, Oct 02, 2009 at 12:58:38PM +0700, Yohanes Nugroho wrote:
> Hi All,
>

Hi,


[...]

> The specification for the STR9104 SoC is available on the Cavium website
> for those who are interested, but it is not very clear, so in developing
> the network driver I followed the logic used by the Linux driver (the
> initialization sequence, etc.). The current code is at
> http://p4db.freebsd.org/fileViewer.cgi?FSPC=//depot/projects/str91xx/src/sys/arm/econa/if_ece.c&REV=4
>
> Here is how the sending part works on STR9104:
>
> - In the initialization part, I allocate a ring of 256 entries (same as
> the Linux version).

If the Ethernet controller does not support 1000baseT (I think it's
Fast Ethernet, because the IC Plus IP101A is a 10/100 PHY), allocating
256 descriptors is a waste of resources, especially on a 64MB system,
I think.
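To put a rough number on the resource cost, here is a minimal sketch. It assumes that each slot of a receive ring keeps one standard 2048-byte mbuf cluster (MCLBYTES) loaded, plus a descriptor whose 16-byte size is a hypothetical value; the function name is illustrative.

```c
#include <assert.h>
#include <stddef.h>

#define CLUSTER_BYTES 2048u   /* standard FreeBSD mbuf cluster size (MCLBYTES) */

/* Bytes pinned by a ring in which every slot holds one descriptor of
 * desc_bytes (a hypothetical size) and one loaded receive cluster. */
static size_t ring_pinned_bytes(size_t nslots, size_t desc_bytes)
{
    assert(nslots > 0);
    return nslots * (desc_bytes + CLUSTER_BYTES);
}
```

Under these assumptions, 256 slots pin 528384 bytes (516 KiB), against 132096 bytes (129 KiB) for 64 slots; DMA maps and any TX-side mbufs come on top of that.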

> - When asked to send a packet, the driver does the following:
>   - stop the network TX DMA
>   - put the address of each segment of the packet into the ring, and set
> a flag so that the entry in the ring will be sent by the hardware
>   - start the network TX DMA
>
> Obviously there is a cleanup part (freeing the mbufs) that must be
> done. The network controller can generate an interrupt when a packet has
> been sent (but it can't tell which entry was sent). The Linux version does
> not use this interrupt; the cleanup is done just after starting the TX
> DMA, at the end of the sending function, and I do the same in the
> FreeBSD driver. Usually only one entry needs to be removed, so
> it is quite fast.
>
> Is there something obvious (or not so obvious) that I've missed?
>

I briefly looked over the driver code, and I can see missing
bus_dmamap_sync(9) calls in several places as well as incorrect use of
bus_dma(9). This may also affect performance, because the OWN bit will
not be seen correctly from the CPU's side without bus_dmamap_sync(9).
Another source of poor performance might be m_devget(9). I don't know
whether the controller really needs this type of copying (sorry, I have
no time to read the data sheet), but m_devget(9) is a really slow and
time-consuming operation, because it has to copy the entire frame into a
new mbuf. If you had to use m_devget(9) to align buffers on an ETHER_ALIGN
boundary, I guess you can pass the alignment restriction to bus_dma(9)
instead. Of course, this requires that the controller be able to receive
frames on an even address boundary, or have no Rx buffer alignment
limitation.
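The alignment issue behind the m_devget(9) copy is simple arithmetic: the Ethernet header is 14 bytes, so a frame received at the start of a 4-byte-aligned buffer leaves the IP header only 2-byte aligned, which matters on a strict-alignment CPU such as the FA526. Starting the frame ETHER_ALIGN (2) bytes into the buffer fixes this without copying. The constants below mirror the values in FreeBSD's <net/ethernet.h>; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ETHER_HDR_LEN 14   /* same value as in FreeBSD's <net/ethernet.h> */
#define ETHER_ALIGN    2   /* same value as in FreeBSD's <net/ethernet.h> */

/* Address of the IP header when the controller starts writing the
 * received frame rx_offset bytes into a buffer at address buf. */
static uintptr_t ip_hdr_addr(uintptr_t buf, unsigned rx_offset)
{
    return buf + rx_offset + ETHER_HDR_LEN;
}

/* True if an address is suitable for 32-bit loads on a strict-alignment CPU. */
static bool is_aligned4(uintptr_t a)
{
    return (a & 3u) == 0;
}
```

In a FreeBSD driver this usually means trimming the front of the cluster with m_adj(m, ETHER_ALIGN) before loading it for DMA, which only works if the controller accepts a receive buffer that starts on a 2-byte boundary, as noted above.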

I believe you should not stop DMA before sending another frame, as
you did in the Rx handler. Basically, you should keep the controller as
busy as you can to get maximum performance, and you should reclaim
transmitted buffers as soon as you notice they are done. Stopping DMA may
take time, since the controller may have to drain active DMA cycles. If
the controller does not generate a Tx completion interrupt after sending
a frame, which is not likely, you may have to implement some kind of
polling in a separate thread or use polling(9).
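If completions have to be polled, the poll interval bounds how many TX slots are needed to keep the link busy between polls, since finished slots are not reclaimed until the next poll. A rough sketch, assuming one descriptor per full-size frame and counting the 8-byte preamble, 4-byte FCS, and 12-byte inter-frame gap (1538 bytes on the wire per 1500-byte payload); the names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Wire bytes of a maximum-size frame: 1500 payload + 14 header + 4 FCS
 * + 8 preamble/SFD + 12 inter-frame gap. */
#define WIRE_BYTES_MAX_FRAME 1538u

/* Nanoseconds the link needs to transmit wire_bytes. */
static uint64_t frame_time_ns(uint32_t wire_bytes, uint32_t link_mbps)
{
    assert(link_mbps > 0);
    /* bits / (Mbit/s) gives microseconds; multiply by 1000 for ns. */
    return (uint64_t)wire_bytes * 8u * 1000u / link_mbps;
}

/* Smallest TX ring (one slot per full-size frame) that keeps the link
 * busy for poll_ns between two completion polls. */
static uint64_t min_tx_slots(uint64_t poll_ns, uint32_t link_mbps)
{
    uint64_t t = frame_time_ns(WIRE_BYTES_MAX_FRAME, link_mbps);

    return (poll_ns + t - 1) / t;   /* round up */
}
```

At 100 Mbit/s one full-size frame takes about 123 microseconds, so under these assumptions a 1 ms poll needs about 9 slots and a 10 ms poll about 82, both well under 256.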

Good luck!