Monday, 27 August 2012

Xen Part 13: VGA Passthrough: Another failed attempt

Preamble: this post has been sitting around since 20th May, waiting for me to finally get things working. Yet, due to other commitments, I simply haven't found the time to invest in Xen. I'm posting this just in case it helps somebody out of a particular problem, given the level of interest I'm seeing in passthrough. Personally, I've now migrated back to a Fedora dom0 (which worked better for me OOTB), and am waiting for 4.2 to be released before trying again - hopefully with more success.

Warning: the following doesn't result in a working VGA passthrough setup.

Setting up VGA passthrough as per the xen wiki (detailed in my posts Part 9: PCI Passthrough and Part 11: ATi Graphics Drivers on the domU) got me to the stage where I thought it should be working - but I simply didn't get any graphical output on the monitors when the time came.

The only oddities I could see on the domU were in Xorg.0.log:

[    54.071] (EE) fglrx(0): V_BIOS address 0x0 out of range
[    54.071] (II) fglrx(0): Invalid ATI BIOS from int10, the adapter is not VGA-enabled
... a seemingly random period of time passes (seconds to minutes), then everything comes up roses...
[    57.325] (II) fglrx(0): ATI Video BIOS revision 9 or later detected

This occurred both on Ubuntu 11.10 running the latest stable 3.2.13 kernel, and on Windows XP, both using latest AMD proprietary graphics drivers.

I was therefore left with the inescapable conclusion that Xen 4.1.2 was to blame. Thankfully, I stumbled upon Jean David Techer's instructions for applying a collection of VGA passthrough patches to Xen unstable, which handle the provision of the VGA BIOS and the setting of the BARs. Many thanks to Jean for posting the walkthrough, and for saving everybody the trouble of porting the VGA passthrough patches to the latest Xen revisions.

Before We Begin

Let's just make sure that your graphics card is detected and initialised correctly in the dom0. There's little point proceeding if it isn't.

1) A quick check to make sure you don't need Debian's firmware-linux-nonfree package:

$ dmesg | grep ni_cp | grep "Failed to load firmware" && echo "You need to install firmware-linux-nonfree" || echo "Looks OK, proceed to point 2"

If the check tells you to, install the package:

# apt-get install firmware-linux-nonfree

2) You may need to set up some PCI quirks for your card. This is a check for a problem I encountered with my HD6970:

$ dmesg | grep "Driver tried to write to a read-only configuration space" && echo "You need to setup a PCI quirk" || echo "Looks OK, proceed to point 3"

$ dmesg | grep -A 2 "Driver tried to write to a read-only configuration space"

[927513.834633] pciback 0000:01:00.0: Driver tried to write to a read-only configuration space field at offset 0xa2, size 2. This may be harmless, but if you have problems with your device:
[927513.834635] 1) see permissive attribute in sysfs
[927513.834636] 2) report problems to the xen-devel mailing list along with details of your device obtained from lspci.


To add a PCI quirk, you need the vendor and device ID for your device. It's the last pair in square brackets on the line, e.g. [1002:6718]:
$ lspci -nn | grep VGA

00:02.0 VGA compatible controller [0300]: Intel Corporation 2nd Generation Core Processor Family Integrated Graphics Controller [8086:0102] (rev 09)
01:00.0 VGA compatible controller [0300]: ATI Technologies Inc Cayman XT [Radeon HD 6970] [1002:6718]
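
If you'd rather not pick the ID out by eye, something like the following should do it. This is only a sketch: vga_ids is a name I've made up, and it assumes the vendor:device pair is printed as two groups of four lowercase hex digits in square brackets, as in the output above.

```shell
# vga_ids (hypothetical helper): read `lspci -nn` lines on stdin and print
# the vendor:device pair found on each line.
vga_ids() {
    sed -n 's/.*\[\([0-9a-f]\{4\}:[0-9a-f]\{4\}\)\].*/\1/p'
}

# Usage (on the dom0):
#   lspci -nn | grep VGA | vga_ids
```

The class code in brackets (e.g. [0300]) is skipped because it has no colon in it.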


# vim /etc/xen/xend-pci-quirks.sxp

(HD6970
   (pci_ids
      ('1002:6718')
   )

   (pci_config_space_fields
      ('000000a2:2:00000000')
   )
)


Replace HD6970 with any name you like to identify your card, replace 1002:6718 with the vendor/device ID you retrieved from lspci, replace 000000a2 with the offset from dmesg, and replace 2 with the size from dmesg.
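
As a rough sketch, the pci_config_space_fields entry can be built straight from the pciback warning. quirk_field is a made-up name, the wording it matches is taken from the dmesg line quoted above, and the trailing 00000000 is simply copied from my own entry rather than derived from anything.

```shell
# quirk_field (hypothetical helper): given one pciback warning line, print
# the matching pci_config_space_fields entry as 'offset:size:00000000'.
quirk_field() {
    # Pull "offset size" out of the warning, e.g. "a2 2".
    set -- $(printf '%s\n' "$1" |
        sed -n 's/.*field at offset 0x\([0-9a-fA-F]*\), size \([0-9]*\).*/\1 \2/p')
    # Anything other than exactly two fields means the line did not match.
    [ $# -eq 2 ] || return 1
    # Pad the offset to eight hex digits, as in the entry shown earlier.
    printf "'%08x:%s:00000000'\n" "0x$1" "$2"
}

# Usage:
#   quirk_field "$(dmesg | grep 'read-only configuration space' | head -n 1)"
```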

3) Search dmesg for the logs pertaining to your graphics card. You'll have to amend the greps below to correctly identify your graphics card's PCI ID (I'm using the 6970 grep to find my HD6970).

$ dmesg | grep `lspci | grep VGA | grep 6970 | awk '{ print $1 }'`

Look over these logs to identify any further problems, and correct any obvious faults before proceeding.

4) Verify that running lspci on the domU returns your card. If not, check the output of dmesg | grep -i pci for clues.

If you see:


XENBUS: Device with no driver: device/pci/0

verify that the domU's kernel has pcifront loaded.
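
A quick way to check from inside the domU is sketched below. It assumes pcifront was built as a module; a driver compiled into the kernel won't show up in /proc/modules, so a failure here isn't conclusive. pcifront_loaded is a made-up name.

```shell
# pcifront_loaded (hypothetical helper): succeed if a module whose name
# contains "pcifront" is listed. The optional argument (default
# /proc/modules) is only there so the check can be run on a saved copy.
pcifront_loaded() {
    grep -q 'pcifront' "${1:-/proc/modules}"
}

# Usage (on the domU):
#   pcifront_loaded && echo "pcifront is loaded" || echo "pcifront not listed"
```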


Extract the BIOS from the Graphics Card

ATI cards are handled in this section, whilst NVIDIA card users should follow step 1 in Jean's instructions.

Find out how to extract your graphics card BIOS. If you determine that ATIFlash is the way you want to go, then first obtain it (ATIFlash 3.95) and find a USB drive without any important data on it. Insert it, find out its /dev/XXX node and ensure it's unmounted before proceeding.

# apt-get install unetbootin
# mkdosfs -F32 /dev/XXX
# mount /dev/XXX /mnt

Run UNetbootin and install FreeDOS to the USB drive. Don't reboot when prompted.

$ unzip atiflash_395.zip
# cp atiflash.exe /mnt
# umount /mnt

Reboot to the USB drive

> c:
> atiflash -i
adapter bn dn dID      asic           flash     romsize
======= == == ==== ============== ============= =======
   0    01 00 6718 Cayman         M25P10/c      20000     
> atiflash -s 0 bios0.rom

Reboot, copy bios0.rom onto a HDD and rename it to vgabios-pt.bin
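
Before going any further, it may be worth a quick sanity check on the dump. PCI expansion ROM images begin with the two bytes 0x55 0xAA, so a file that starts with anything else is unlikely to be a usable BIOS. The sketch below only checks that signature and nothing more; rom_has_signature is a made-up name.

```shell
# rom_has_signature (hypothetical helper): succeed if the file named in $1
# starts with the 0x55 0xAA expansion ROM signature.
rom_has_signature() {
    # Dump the first two bytes as hex and strip od's padding.
    [ "$(od -An -tx1 -N2 "$1" | tr -d ' \n')" = "55aa" ]
}

# Usage:
#   rom_has_signature vgabios-pt.bin && echo "signature OK" || echo "not a ROM image?"
```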

Obtain a Patchable Xen Unstable

This is really just following steps 2-7 at Jean's site; I reproduce them below mostly for my own benefit for the specific case of a HD6970.

Here I'm using Xen unstable revision 25099. This is, at time of writing, the most recent version explicitly supported by the VGA passthrough patches that Jean David Techer maintains. If you want to use a later revision, you would have to recreate the patch diffs accordingly, or wait for Jean to diligently provide a newer collection of patches.

# apt-get install mercurial libglib2.0-dev libyajl-dev
$ mkdir -p Downloads/xen-unstable
$ cd Downloads/xen-unstable
$ rev=25099;hg clone -r $rev http://xenbits.xensource.com/staging/xen-unstable.hg/ xen-unstable.hg-rev-${rev}
$ cd xen-unstable.hg-rev-25099
$ hg summary
parent: 25099:4bd752a4cdf3 tip
 x86_emulate: Do not push an error code onto a #UD exception stack
branch: default
commit: (clean)
update: (current)
$ ./configure

$ cd tools

Make sure you run the next command as a normal user rather than root, as the $ prompt indicates.

$ make
$ make clean
$ cd ..
$ xenpatches=xen-4.2_rev24798_gfx-passthrough-patchs
$ wget -q http://www.davidgis.fr/download/${xenpatches}.tar.bz2
$ tar xjf ${xenpatches}.tar.bz2 

BAR Configuration

Now to set up the Base Address Registers (BARs) specific to your graphics card.

$ lspci | grep VGA
00:02.0 VGA compatible controller: Intel Corporation 2nd Generation Core Processor Family Integrated Graphics Controller (rev 09)
01:00.0 VGA compatible controller: ATI Technologies Inc Cayman XT [Radeon HD 6970]

Locate the correct PCI ID from the above output, as usual...

$ dmesg | grep XX:XX.X | grep "mem 0x"
[    4.120860] pci 0000:01:00.0: reg 10: [mem 0xc0000000-0xcfffffff 64bit pref]
[    4.120878] pci 0000:01:00.0: reg 18: [mem 0xfbe20000-0xfbe3ffff 64bit]
[    4.120912] pci 0000:01:00.0: reg 30: [mem 0xfbe00000-0xfbe1ffff pref]

In the above output, there are 3 memory ranges that Xen needs to know about. The start and end of each range are provided in hex (e.g. the first range starts at 0xc0000000 and ends at 0xcfffffff).

As Jean explains, we also need to know the size of each range. Jean uses hex->dec and dec->hex conversion for the calculations, but I think figuring it out purely in hex is easier. Just remember your basic rules of hexadecimal, and you should find this calculation pretty simple.  

If you got a bit lost here, use Jean's method instead.

To recap: in decimal, the highest digit is 9 before we wrap around to 0 again, so 0 to 9 is a total of 10 values. In hex, the highest digit is 0xf (==15), so 0x0 to 0xf is a total of 0x10 values. Switching back to memory ranges, this means that a range starting at 0xc0 and ending at 0xcf would have a size of 0x10. In general, size = end - start + 1.

Applying this to the first example above, the total size of memory range 0xc0000000 to 0xcfffffff would be 0x10000000 (the number of values in 0x0000000 -> 0xfffffff).

Start        End          Size
0xC0000000   0xCFFFFFFF   0x10000000
0xFBE20000   0xFBE3FFFF   0x00020000
0xFBE00000   0xFBE1FFFF   0x00020000
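
If you don't trust your hex arithmetic, the shell can do it for you. The following is a sketch only: bar_table is a made-up name, and it assumes dmesg prints the ranges in the "[mem 0xSTART-0xEND ...]" form shown earlier. The size is simply end - start + 1.

```shell
# bar_table (hypothetical helper): read dmesg lines on stdin and print
# "start end size" in hex for every "[mem 0x...-0x...]" range found.
bar_table() {
    sed -n 's/.*\[mem \(0x[0-9a-f]*\)-\(0x[0-9a-f]*\).*/\1 \2/p' |
    while read -r start end; do
        # Shell arithmetic understands the 0x prefix directly.
        printf '0x%08X 0x%08X 0x%08X\n' "$start" "$end" "$(( $end - $start + 1 ))"
    done
}

# Usage (substitute your own PCI ID for XX:XX.X):
#   dmesg | grep XX:XX.X | grep "mem 0x" | bar_table
```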

Now let's change the relevant patch file to match these BARs.

$ vim ${xenpatches}/patch_dsdt.asl

Modify the first three DWordMemory function calls, such that the second and third hex values are set to the start and end addresses, and the fifth (final) value is the size. For example,

  DWordMemory(
      ResourceProducer, PosDecode, MinFixed, MaxFixed,
      Cacheable, ReadWrite,
      0x00000000,
-     0xF0000000,
-     0xF4FFFFFF,
+     0xF4000000,
+     0xF5FFFFFF,
      0x00000000,
-     0x05000000,
-     ,, _Y01)
+     0x02000000)

would change to

  DWordMemory(
      ResourceProducer, PosDecode, MinFixed, MaxFixed,
      Cacheable, ReadWrite,
      0x00000000,
-     0xF0000000,
-     0xF4FFFFFF,
+     0xC0000000,
+     0xCFFFFFFF,
      0x00000000,
-     0x05000000,
-     ,, _Y01)
+     0x10000000)

and

+ DWordMemory(
+     ResourceProducer, PosDecode, MinFixed, MaxFixed,
+     Cacheable, ReadWrite,
+     0x00000000,
+     0xF4000000,
+     0xF5FFFFFF,
+     0x00000000,
+     0x02000000)

would change to

+ DWordMemory(
+     ResourceProducer, PosDecode, MinFixed, MaxFixed,
+     Cacheable, ReadWrite,
+     0x00000000,
+     0xFBE20000,
+     0xFBE3FFFF,
+     0x00000000,
+     0x00020000)

The third follows the same pattern. Leave the final function call as-is.

Reinstating PCI Passthrough Config via pciback

Think back to Xen Part 9: PCI Passthrough. Did you amend the /etc/init.d/xencommons script to enable passthrough for one or more PCI devices? If you did, heads up: reinstalling Xen is about to overwrite your code.

If you used some custom code, just copy it into tools/hotplug/Linux/init.d/xencommons.

If you used the bog standard code in the tutorial and just amended the BDF ID, then to make things simpler you may want to add this xencommons patch to your patch set (NB: this is built against revision 25099), and amend your BDF ID in it as before. That should make maintenance easier, and remind you to update that file if/when you build a newer version of Xen in the future.

Patch Xen Unstable

$ for file in "${xenpatches}"/*; do patch -N -p1 < "$file"; done

Check the output to confirm that every patch applied without rejected hunks. Then copy the graphics card's BIOS, which you extracted earlier, to the vgabios folder:

$ cp /home/ace/vgabios-pt.bin tools/firmware/vgabios/

Compile & Install

$ make xen && make tools && make stubdom

Now time for installation.

# make install-xen && make install-tools PYTHON_PREFIX_ARG= \
&& make install-stubdom
# update-grub

Reboot

# shutdown -r now

root@ace2x1:~# dmesg | grep "mem 0x"
[ 0.669673] pci_bus 0000:00: root bus resource [mem 0x00000000-0xfffffffff]
[ 0.673606] pci 0000:00:00.0: reg 10: [mem 0xc0000000-0xcfffffff 64bit pref]
[ 0.673606] pci 0000:00:00.0: reg 18: [mem 0xfbe20000-0xfbe3ffff 64bit]
[ 0.673606] pci 0000:00:00.0: reg 30: [mem 0xfbe00000-0xfbffffff pref]
[ 0.732491] pci 0000:00:00.0: address space collision: [mem 0xfbe00000-0xfbffffff pref] conflicts with 0000:00:00.0 [mem 0xfbe20000-0xfbe3ffff 64bit]

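This is where it should be working. Instead, dmesg reports an erroneous BAR that contradicts the ranges I provided: reg 30 comes up as 0xfbe00000-0xfbffffff rather than 0xfbe00000-0xfbe1ffff, which collides with reg 18, and I get no further.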
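
To make the collision concrete, here is a small sketch of the overlap test. It is just interval arithmetic, not anything taken from the kernel, and ranges_overlap is a made-up name. Two inclusive ranges overlap when each one starts no later than the other one ends.

```shell
# ranges_overlap (hypothetical helper): succeed if the inclusive ranges
# [$1, $2] and [$3, $4] share at least one address.
ranges_overlap() {
    [ "$(( $1 <= $4 && $3 <= $2 ))" -eq 1 ]
}

# The reg 30 range that was reported overlaps reg 18:
#   ranges_overlap 0xfbe00000 0xfbffffff 0xfbe20000 0xfbe3ffff   (succeeds)
# The reg 30 range I configured would not have:
#   ranges_overlap 0xfbe00000 0xfbe1ffff 0xfbe20000 0xfbe3ffff   (fails)
```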