[PATCH] Enable K8 GART as an IOMMU

View: New views
8 Messages — Rating Filter:   Alert me  

[PATCH] Enable K8 GART as an IOMMU

by Mark Langsdorf :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

AMD Opteron and Athlon 64 processors have an AGP aperture
and GART built into the processor.  Linux has used the
AGP GART as an IOMMU to improve the performance of 32-bit
only PCI devices when DMA'ing into physical memory above
0xffffffff.  This patch provides a similar capability
for Xen dom0.

Most of the code simply migrates the native Linux
aperture.c and pci-gart.c to dom0.  A new memory op
is added to clear the aperture mapping from the
hypervisor's page tables, which is necessary to
prevent cache alias issues resulting from processor
speculation.

My thanks to Ulrich Meis [um@...], who was
instrumental in developing and testing this patch.

Signed-off-by: Mark Langsdorf <mark.langsdorf@...>


xen-gart.patch (59K) Download Attachment

Re: [PATCH] Enable K8 GART as an IOMMU

by Jan Beulich :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

>--- a/linux-2.6-xen-sparse/arch/i386/mm/hypervisor.c Fri Feb 09 10:48:41 2007 +0000
>+++ b/linux-2.6-xen-sparse/arch/i386/mm/hypervisor.c Fri Feb 09 16:32:04 2007 -0600
>@@ -252,7 +252,7 @@ static void contiguous_bitmap_clear(
> }
>
> /* Protected by balloon_lock. */
>-#define MAX_CONTIG_ORDER 9 /* 2MB */
>+#define MAX_CONTIG_ORDER 16 /* 256MB */
> static unsigned long discontig_frames[1<<MAX_CONTIG_ORDER];
> static multicall_entry_t cr_mcl[1<<MAX_CONTIG_ORDER];

This seems dangerous to me.

>--- a/linux-2.6-xen-sparse/include/asm-x86_64/mach-xen/asm/dma-mapping.h Fri Feb 09 10:48:41 2007 +0000
>+++ b/linux-2.6-xen-sparse/include/asm-x86_64/mach-xen/asm/dma-mapping.h Fri Feb 09 16:32:04 2007 -0600
>@@ -62,7 +62,12 @@ static inline int valid_dma_direction(in
> (dma_direction == DMA_FROM_DEVICE));
> }
>
>-#if 0
>+#ifdef CONFIG_XEN
>+#define global_need_iommu() 1
>+#else
>+#define global_need_iommu() (HYPERVISOR_memory_op(XENMEM_maximum_ram_page, NULL)>MAX_DMA32_PFN)
>+#endif
>+
> static inline int dma_mapping_error(dma_addr_t dma_addr)
> {
> if (dma_ops->mapping_error)

HYPERVISOR_memory_op() if CONFIG_XEN is undefined?

>--- a/linux-2.6-xen-sparse/lib/Makefile Fri Feb 09 10:48:41 2007 +0000
>+++ b/linux-2.6-xen-sparse/lib/Makefile Fri Feb 09 16:32:04 2007 -0600
>@@ -51,8 +51,7 @@ obj-$(CONFIG_SMP) += percpu_counter.o
> obj-$(CONFIG_SMP) += percpu_counter.o
> obj-$(CONFIG_AUDIT_GENERIC) += audit.o
>
>-obj-$(CONFIG_SWIOTLB) += swiotlb.o
>-swiotlb-$(CONFIG_XEN) := ../arch/i386/kernel/swiotlb.o
>+obj-$(CONFIG_SWIOTLB) += swiotlb-xen.o
>
> hostprogs-y := gen_crc32table
> clean-files := crc32table.h

This seems very unlikely to have been tested in a native build. You should
use cherry-pick-xen in the file.
I generally welcome moving arch/i386/kernel/swiotlb.c to lib/swiotlb-xen.c,
but would appreciate if you then also removed the original file (and perhaps
this should be done as a separate patch, so that it'd be clear that the move
itself doesn't change the file in any way (and if you need changes to it for
the IOMMU patch, have only those changes in the patch here).

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@...
http://lists.xensource.com/xen-devel

Re: [PATCH] Enable K8 GART as an IOMMU

by Muli Ben-Yehuda-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Mon, Feb 12, 2007 at 11:47:09AM -0600, Langsdorf, Mark wrote:

> AMD Opteron and Athlon 64 processors have an AGP aperture and GART
> built into the processor.  Linux has used the AGP GART as an IOMMU
> to improve the performance of 32-bit only PCI devices when DMA'ing
> into physical memory above 0xffffffff.  This patch provides a
> similar capability for Xen dom0.
>
> Most of the code simply migrates the native Linux aperture.c and
> pci-gart.c to dom0.  A new memory op is added to clear the aperture
> mapping from the hypervisor's page tables, which is necessary to
> prevent cache alias issues resulting from processor speculation.

Could you please split it up into two patches, one of which does the
bulk movement around and the other which adds GART support, to make
reviewing easier?

Thanks,
Muli



_______________________________________________
Xen-devel mailing list
Xen-devel@...
http://lists.xensource.com/xen-devel

RE: [PATCH] Enable K8 GART as an IOMMU

by Mark Langsdorf :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

> >--- a/linux-2.6-xen-sparse/arch/i386/mm/hypervisor.c Fri Feb
> 09 10:48:41 2007 +0000
> >+++ b/linux-2.6-xen-sparse/arch/i386/mm/hypervisor.c Fri Feb
> 09 16:32:04 2007 -0600
> >@@ -252,7 +252,7 @@ static void contiguous_bitmap_clear(
> > }
> >
> > /* Protected by balloon_lock. */
> >-#define MAX_CONTIG_ORDER 9 /* 2MB */
> >+#define MAX_CONTIG_ORDER 16 /* 256MB */
> > static unsigned long discontig_frames[1<<MAX_CONTIG_ORDER];
> > static multicall_entry_t cr_mcl[1<<MAX_CONTIG_ORDER];
>
> This seems dangerous to me.

We need at least 64MB of contiguous memory for the aperture.


> a/linux-2.6-xen-sparse/include/asm-x86_64/mach-xen/asm/dma-ma
> pping.h Fri Feb 09 10:48:41 2007 +0000
> >+++
> b/linux-2.6-xen-sparse/include/asm-x86_64/mach-xen/asm/dma-ma
> pping.h Fri Feb 09 16:32:04 2007 -0600
> >@@ -62,7 +62,12 @@ static inline int valid_dma_direction(in
> > (dma_direction == DMA_FROM_DEVICE));
> > }
> >
> >-#if 0
> >+#ifdef CONFIG_XEN
> >+#define global_need_iommu() 1
> >+#else
> >+#define global_need_iommu()
> (HYPERVISOR_memory_op(XENMEM_maximum_ram_page, NULL)>MAX_DMA32_PFN)
> >+#endif
> >+
> > static inline int dma_mapping_error(dma_addr_t dma_addr)
> > {
> > if (dma_ops->mapping_error)
>
> HYPERVISOR_memory_op() if CONFIG_XEN is undefined?

Will fix.

> >--- a/linux-2.6-xen-sparse/lib/Makefile Fri Feb 09
> 10:48:41 2007 +0000
> >+++ b/linux-2.6-xen-sparse/lib/Makefile Fri Feb 09
> 16:32:04 2007 -0600
> >@@ -51,8 +51,7 @@ obj-$(CONFIG_SMP) += percpu_counter.o
> > obj-$(CONFIG_SMP) += percpu_counter.o
> > obj-$(CONFIG_AUDIT_GENERIC) += audit.o
> >
> >-obj-$(CONFIG_SWIOTLB) += swiotlb.o
> >-swiotlb-$(CONFIG_XEN) := ../arch/i386/kernel/swiotlb.o
> >+obj-$(CONFIG_SWIOTLB) += swiotlb-xen.o
> >
> > hostprogs-y := gen_crc32table
> > clean-files := crc32table.h
>
> This seems very unlikely to have been tested in a native
> build. You should use cherry-pick-xen in the file.

Okay.

> I generally welcome moving arch/i386/kernel/swiotlb.c to
> lib/swiotlb-xen.c, but would appreciate if you then also
> removed the original file (and perhaps this should be done
> as a separate patch, so that it'd be clear that the move
> itself doesn't change the file in any way (and if you need
> changes to it for the IOMMU patch, have only those changes
> in the patch here).

I'll seperate the reorg a bit and make it clearer.  Thanks for
looking at this.

-Mark Langsdorf
AMD, Inc.



_______________________________________________
Xen-devel mailing list
Xen-devel@...
http://lists.xensource.com/xen-devel

RE: [PATCH] Enable K8 GART as an IOMMU

by Jan Beulich :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

>>> "Langsdorf, Mark" <mark.langsdorf@...> 14.02.07 00:19 >>>
>> >--- a/linux-2.6-xen-sparse/arch/i386/mm/hypervisor.c Fri Feb
>> 09 10:48:41 2007 +0000
>> >+++ b/linux-2.6-xen-sparse/arch/i386/mm/hypervisor.c Fri Feb
>> 09 16:32:04 2007 -0600
>> >@@ -252,7 +252,7 @@ static void contiguous_bitmap_clear(
>> > }
>> >
>> > /* Protected by balloon_lock. */
>> >-#define MAX_CONTIG_ORDER 9 /* 2MB */
>> >+#define MAX_CONTIG_ORDER 16 /* 256MB */
>> > static unsigned long discontig_frames[1<<MAX_CONTIG_ORDER];
>> > static multicall_entry_t cr_mcl[1<<MAX_CONTIG_ORDER];
>>
>> This seems dangerous to me.
>
>We need at least 64MB of contiguous memory for the aperture.

I think this should then be established by some other means. 256Mb
(and even the minimum of 64Mb that you say you need) is, for
contiguous memory, almost as good as no limit at all. Among other
reservations I have - how do you intend to make sure the hypervisor
even has a chance of fulfilling this big a request?

But I would certainly like to know Keir's opinion here, too.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@...
http://lists.xensource.com/xen-devel

Re: [PATCH] Enable K8 GART as an IOMMU

by Keir Fraser :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On 13/2/07 23:19, "Langsdorf, Mark" <mark.langsdorf@...> wrote:

>>> /* Protected by balloon_lock. */
>>> -#define MAX_CONTIG_ORDER 9 /* 2MB */
>>> +#define MAX_CONTIG_ORDER 16 /* 256MB */
>>> static unsigned long discontig_frames[1<<MAX_CONTIG_ORDER];
>>> static multicall_entry_t cr_mcl[1<<MAX_CONTIG_ORDER];
>>
>> This seems dangerous to me.
>
> We need at least 64MB of contiguous memory for the aperture.

Not that I know anything much about the K8 GART, but I assume the aperture
is an address range that the GART takes control of and dynamically aliases
other RAM pages into? Is it necessary to burn 64MB of RAM (which is
presumably inaccessible when the GART is turned on)? Will the BIOS not
already have conveniently piointed the aperture into a RAM hole (e.g., just
below 4GB)?

 -- Keir



_______________________________________________
Xen-devel mailing list
Xen-devel@...
http://lists.xensource.com/xen-devel

Re: [PATCH] Enable K8 GART as an IOMMU

by Uli-6 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Wed 14.02.07 10:04, Keir Fraser wrote:

> On 13/2/07 23:19, "Langsdorf, Mark" <mark.langsdorf@...> wrote:
>
> >>> /* Protected by balloon_lock. */
> >>> -#define MAX_CONTIG_ORDER 9 /* 2MB */
> >>> +#define MAX_CONTIG_ORDER 16 /* 256MB */
> >>> static unsigned long discontig_frames[1<<MAX_CONTIG_ORDER];
> >>> static multicall_entry_t cr_mcl[1<<MAX_CONTIG_ORDER];
> >>
> >> This seems dangerous to me.
> >
> > We need at least 64MB of contiguous memory for the aperture.
>
> Not that I know anything much about the K8 GART, but I assume the aperture
> is an address range that the GART takes control of and dynamically aliases
> other RAM pages into? Is it necessary to burn 64MB of RAM (which is
> presumably inaccessible when the GART is turned on)? Will the BIOS not
> already have conveniently piointed the aperture into a RAM hole (e.g., just
> below 4GB)?

Usually, yes, the BIOS should allocate the aperture. However, on all
systems I've tested on the BIOS allocated only 32MB (probably because
they had no AGP). Sometimes it would even keep the memory location that
was set on the last boot---potentially pointing to usable RAM.

The patch will only allocate the aperture from RAM if the BIOS reserved
less than 64MB. In that case, it will also make the call to
xen_create_contiguous_region. The memory is lost. There's also a boot
mesage telling people that.

Concerning the availability of 64MB contiguous RAM: The hypervisor keeps
if dom0_mem is unspecified 1/16th of memory free, which is for >4GB
systems (where one needs a GART) at least 256MB. Therefore, the
allocation should always succeed unless someone's tweaking that
parameter.

Uli


_______________________________________________
Xen-devel mailing list
Xen-devel@...
http://lists.xensource.com/xen-devel

Re: [PATCH] Enable K8 GART as an IOMMU

by Keir Fraser-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On 14/2/07 14:33, "Uli Meis" <um@...> wrote:

> Usually, yes, the BIOS should allocate the aperture. However, on all
> systems I've tested on the BIOS allocated only 32MB (probably because
> they had no AGP). Sometimes it would even keep the memory location that
> was set on the last boot---potentially pointing to usable RAM.
>
> The patch will only allocate the aperture from RAM if the BIOS reserved
> less than 64MB. In that case, it will also make the call to
> xen_create_contiguous_region. The memory is lost. There's also a boot
> mesage telling people that.

Might it typically be possible to find some free space in the hole just
below 4GB? I don't believe most BIOSes make effort to keep the RAM hole
small.

Looking back to the particular approach you have taken, expanding the static
argument arrays for this one particular user is not really desirable. It
would be better to do a dynamic GFP_ATOMIC allocation in the case that order
> MAX_CONTIG_ORDER. We want to avoid that in the general case because
GFP_ATOMIC allocations can fail, but this will be happening at start of day
when there should be no memory pressure.

 -- Keir


_______________________________________________
Xen-devel mailing list
Xen-devel@...
http://lists.xensource.com/xen-devel