Monday 27 August 2012

Xen Part 13: VGA Passthrough: Another failed attempt

Preamble: this post has been sitting around since 20th May, waiting for me to finally get things working. Yet, due to other commitments, I simply haven't found the time to invest in Xen. I'm posting this just in case it helps somebody out of a particular problem, given the level of interest I'm seeing in passthrough. Personally, I've now migrated back to a Fedora dom0 (which worked better for me OOTB), and am waiting for 4.2 to be released before trying again - hopefully with more success.

Warning: the following doesn't result in a working VGA passthrough setup.

Setting up VGA passthrough as per the xen wiki (detailed in my posts Part 9: PCI Passthrough and Part 11: ATi Graphics Drivers on the domU) got me to the stage where I thought it should be working - but I simply didn't get any graphical output on the monitors when the time came.

The only oddities I could see on the domU were in Xorg.0.log:

[    54.071] (EE) fglrx(0): V_BIOS address 0x0 out of range
[    54.071] (II) fglrx(0): Invalid ATI BIOS from int10, the adapter is not VGA-enabled
... a seemingly random period of time passes (seconds to minutes), then everything comes up roses...
[    57.325] (II) fglrx(0): ATI Video BIOS revision 9 or later detected

This occurred both on Ubuntu 11.10 running the latest stable 3.2.13 kernel, and on Windows XP, both using the latest AMD proprietary graphics drivers.

I was therefore left with the inescapable conclusion that Xen 4.1.2 was to blame. Thankfully, I stumbled upon Jean David Techer's instructions for applying a collection of VGA passthrough patches to Xen unstable, which handle the provision of the VGA BIOS and setting the BARs. Many thanks to Jean for posting the walkthrough, and also for saving everybody the trouble of porting the VGA passthrough patches to the latest Xen revisions.

Before We Begin

Let's just make sure that your graphics card is detected and initialised correctly in the dom0. There's little point proceeding if it isn't.

1) A quick check to make sure you don't need Debian's firmware-linux-nonfree package:

$ dmesg | grep ni_cp | grep "Failed to load firmware" && echo "You need to install firmware-linux-nonfree" || echo "Looks OK, proceed to point 2"

If it reports that the firmware failed to load, install the package:

# apt-get install firmware-linux-nonfree

2) You may need to set up some PCI quirks for your card. This is a check for a problem I encountered with my HD6970:

$ dmesg | grep "Driver tried to write to a read-only configuration space" && echo "You need to setup a PCI quirk" || echo "Looks OK, proceed to point 3"

$ dmesg | grep -A 2 "Driver tried to write to a read-only configuration space"

[927513.834633] pciback 0000:01:00.0: Driver tried to write to a read-only configuration space field at offset 0xa2, size 2. This may be harmless, but if you have problems with your device:
[927513.834635] 1) see permissive attribute in sysfs
[927513.834636] 2) report problems to the xen-devel mailing list along with details of your device obtained from lspci.


To add a PCI quirk, you need the vendor and device ID for your device (it's the last bracketed entry on the line, e.g. [1002:6718], ahead of any revision suffix):
$ lspci -nn | grep VGA

00:02.0 VGA compatible controller [0300]: Intel Corporation 2nd Generation Core Processor Family Integrated Graphics Controller [8086:0102] (rev 09)
01:00.0 VGA compatible controller [0300]: ATI Technologies Inc Cayman XT [Radeon HD 6970] [1002:6718]
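
If you'd rather not pick the ID out by eye, here is a minimal sketch that pulls the vendor:device pair out of a single line of lspci -nn output. The helper name pci_id_from_line is my own invention for illustration, and the sample line is copied from the output above.

```shell
# Print the vendor:device ID from one line of `lspci -nn` output.
# The ID is the last bracketed xxxx:xxxx pair on the line; a trailing
# "(rev NN)" suffix, if present, is ignored.
pci_id_from_line() {
    printf '%s\n' "$1" |
        sed -n 's/.*\[\([0-9a-fA-F]\{4\}:[0-9a-fA-F]\{4\}\)\].*/\1/p'
}

# Sample line copied from the lspci output above.
line='01:00.0 VGA compatible controller [0300]: ATI Technologies Inc Cayman XT [Radeon HD 6970] [1002:6718]'
pci_id_from_line "$line"   # prints 1002:6718
```

On a live system you would feed it a real line instead, for example the one returned by lspci -nn | grep VGA | grep 6970.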


# vim /etc/xen/xend-pci-quirks.sxp

(HD6970
   (pci_ids
      ('1002:6718')
   )

   (pci_config_space_fields
      ('000000a2:2:00000000')
   )
)


Replace HD6970 with any name you like to identify your card, replace 1002:6718 with the vendor/device ID you retrieved from lspci, replace 000000a2 with the offset from dmesg, and replace 2 with the size from dmesg.
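
To avoid typos when filling in those fields, the stanza can be generated instead. This is only a sketch: make_quirk is a hypothetical helper, and the trailing 00000000 field is copied unchanged from the entry above rather than derived from anything.

```shell
# Print a xend-pci-quirks.sxp stanza in the same layout as the entry above.
#   $1 = label for the card, $2 = vendor:device ID from lspci -nn,
#   $3 = offset from dmesg (e.g. 0xa2), $4 = size from dmesg (e.g. 2)
make_quirk() {
    q_name=$1 q_id=$2 q_offset=$3 q_size=$4
    # The offset is zero-padded to 8 hex digits; the last field is copied
    # verbatim from the hand-written example (an assumption, not computed).
    q_field=$(printf '%08x:%d:00000000' "$q_offset" "$q_size")
    printf "(%s\n   (pci_ids\n      ('%s')\n   )\n\n   (pci_config_space_fields\n      ('%s')\n   )\n)\n" \
        "$q_name" "$q_id" "$q_field"
}

make_quirk HD6970 1002:6718 0xa2 2
```

The output can be compared against the hand-written entry, then pasted into /etc/xen/xend-pci-quirks.sxp.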

3) Search dmesg for the logs pertaining to your graphics card. You'll have to amend the greps below to correctly identify your graphics card's PCI ID (I'm using the 6970 grep to find my HD6970).

$ dmesg | grep "$(lspci | grep VGA | grep 6970 | awk '{ print $1 }')"

Look over these logs to identify any further problems, and correct any obvious faults before proceeding.

4) Verify that running lspci on the domU returns your card. If not, check the output of dmesg | grep -i pci for clues.

If you see:


XENBUS: Device with no driver: device/pci/0

verify that the domU's kernel has pcifront loaded.


Extract the BIOS from the Graphics Card

ATI cards are handled in this section, whilst NVIDIA card users should follow step 1 in Jean's instructions.

Find out how to extract your graphics card BIOS. If you determine that ATIFlash is the way you want to go, then first obtain it (ATIFlash 3.95) and find a USB drive without any important data on it. Insert it, find out its /dev/XXX node and ensure it's unmounted before proceeding.

# apt-get install unetbootin
# mkdosfs -F32 /dev/XXX
# mount /dev/XXX /mnt

Run UNetbootin and install FreeDOS to the USB drive. Don't reboot when prompted.

$ unzip atiflash_395.zip
# cp atiflash.exe /mnt
# umount /mnt

Reboot to the USB drive

> c:
> atiflash -i
adapter bn dn dID      asic           flash     romsize
======= == == ==== ============== ============= =======
   0    01 00 6718 Cayman         M25P10/c      20000     
> atiflash -s 0 bios0.rom

Reboot, copy bios0.rom onto a HDD and rename it to vgabios-pt.bin
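
Before going further, it may be worth a quick sanity check on the dump. PCI option ROM images begin with the two signature bytes 0x55 0xAA, so a file that doesn't start with them is unlikely to be a usable VGA BIOS. The sketch below tests only that signature and nothing else about the image; has_rom_signature is a hypothetical helper name.

```shell
# Succeed if the file starts with the 0x55 0xAA option-ROM signature.
has_rom_signature() {
    # Dump the first two bytes as hex and strip od's padding.
    sig=$(od -An -tx1 -N2 "$1" 2>/dev/null | tr -d ' \n')
    [ "$sig" = "55aa" ]
}

# Only runs the check if the dump is in the current directory.
if [ -f vgabios-pt.bin ]; then
    if has_rom_signature vgabios-pt.bin; then
        echo "vgabios-pt.bin starts with 55 aa"
    else
        echo "vgabios-pt.bin does not look like an option ROM"
    fi
fi
```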

Obtain a Patchable Xen Unstable

This is really just following steps 2-7 at Jean's site; I reproduce them below mostly for my own benefit for the specific case of a HD6970.

Here I'm using Xen unstable revision 25099. This is, at time of writing, the most recent version explicitly supported by the VGA passthrough patches that Jean David Techer maintains. If you want to use a later revision, you would have to recreate the patch diffs accordingly, or wait for Jean to diligently provide a newer collection of patches.

# apt-get install mercurial libglib2.0-dev libyajl-dev
$ mkdir -p Downloads/xen-unstable
$ cd Downloads/xen-unstable
$ rev=25099;hg clone -r $rev http://xenbits.xensource.com/staging/xen-unstable.hg/ xen-unstable.hg-rev-${rev}
$ cd xen-unstable.hg-rev-25099
$ hg summary
parent: 25099:4bd752a4cdf3 tip
 x86_emulate: Do not push an error code onto a #UD exception stack
branch: default
commit: (clean)
update: (current)
$ ./configure

$ cd tools

Ensure you actually do run this command as a normal user - as indicated.

$ make
$ make clean
$ cd ..
$ xenpatches=xen-4.2_rev24798_gfx-passthrough-patchs
$ wget -q http://www.davidgis.fr/download/${xenpatches}.tar.bz2
$ tar xjf ${xenpatches}.tar.bz2 

BAR Configuration

Now to set up the Base Address Registers (BARs) specific to your graphics card.

$ lspci | grep VGA
00:02.0 VGA compatible controller: Intel Corporation 2nd Generation Core Processor Family Integrated Graphics Controller (rev 09)
01:00.0 VGA compatible controller: ATI Technologies Inc Cayman XT [Radeon HD 6970]

Locate the correct PCI ID from the above output, as usual...

$ dmesg | grep XX:XX.X | grep "mem 0x"
[    4.120860] pci 0000:01:00.0: reg 10: [mem 0xc0000000-0xcfffffff 64bit pref]
[    4.120878] pci 0000:01:00.0: reg 18: [mem 0xfbe20000-0xfbe3ffff 64bit]
[    4.120912] pci 0000:01:00.0: reg 30: [mem 0xfbe00000-0xfbe1ffff pref]

In the above output, there are 3 memory ranges that Xen needs to know about. The start and end of each range are provided in hex (e.g. the first range starts at 0xc0000000 and ends at 0xcfffffff).

As Jean explains, we also need to know the size of each range. Jean uses hex->dec and dec->hex conversion for the calculations, but I think figuring it out purely in hex is easier. Just remember your basic rules of hexadecimal, and you should find this calculation pretty simple.

To recap: in decimal, the highest digit is 9 before we wrap around to 0 again, so 0 to 9 is a total of 10 values. In hex, the highest digit is 0xf (== 15), so 0x0 to 0xf is a total of 0x10 values. Switching back to memory ranges, this means that a range starting at 0xc0 and ending at 0xcf would have a size of 0x10.

Applying this to the first example above, the total size of memory range 0xc0000000 to 0xcfffffff would be 0x10000000 (the number of values in 0x0000000 -> 0xfffffff). Put another way, size = end - start + 1.

If you got a bit lost here, use Jean's method instead.

Start        End          Size
0xC0000000   0xCFFFFFFF   0x10000000
0xFBE20000   0xFBE3FFFF   0x00020000
0xFBE00000   0xFBE1FFFF   0x00020000
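
If you'd rather let the shell do the hex arithmetic, here is a minimal sketch. Both bar_size and bar_table are hypothetical helper names; the only rule they apply is size = end - start + 1 on the inclusive ranges that dmesg prints.

```shell
# Size of an inclusive address range: end - start + 1, as 8 hex digits.
bar_size() {
    printf '0x%08X\n' $(( $2 - $1 + 1 ))
}

# Read dmesg-style "reg NN: [mem 0xSTART-0xEND ...]" lines on stdin and
# print "START END SIZE" for each one.
bar_table() {
    sed -n 's/.*\[mem \(0x[0-9a-fA-F]*\)-\(0x[0-9a-fA-F]*\).*/\1 \2/p' |
    while read -r start end; do
        printf '0x%08X 0x%08X %s\n' "$start" "$end" "$(bar_size "$start" "$end")"
    done
}

bar_size 0xC0000000 0xCFFFFFFF   # prints 0x10000000
bar_size 0xFBE20000 0xFBE3FFFF   # prints 0x00020000
```

Piping the earlier command into it (dmesg | grep XX:XX.X | grep "mem 0x" | bar_table) should give the same three rows as the table.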

Now let's change the relevant patch file to match these BARs.

$ vim ${xenpatches}/patch_dsdt.asl

Modify the first three DWordMemory function calls, such that the second and third hex values are set to the start and end addresses, and the fifth (final) value is the size. For example,

 DWordMemory(
     ResourceProducer, PosDecode, MinFixed, MaxFixed,
     Cacheable, ReadWrite,
     0x00000000,
-    0xF0000000,
-    0xF4FFFFFF,
+    0xF4000000,
+    0xF5FFFFFF,
     0x00000000,
-    0x05000000,
-    ,, _Y01)
+    0x02000000)

would change to

 DWordMemory(
     ResourceProducer, PosDecode, MinFixed, MaxFixed,
     Cacheable, ReadWrite,
     0x00000000,
-    0xF0000000,
-    0xF4FFFFFF,
+    0xC0000000,
+    0xCFFFFFFF,
     0x00000000,
-    0x05000000,
-    ,, _Y01)
+    0x10000000)

and

+ DWordMemory(
+     ResourceProducer, PosDecode, MinFixed, MaxFixed,
+     Cacheable, ReadWrite,
+     0x00000000,
+     0xF4000000,
+     0xF5FFFFFF,
+     0x00000000,
+     0x02000000)

would change to

+ DWordMemory(
+     ResourceProducer, PosDecode, MinFixed, MaxFixed,
+     Cacheable, ReadWrite,
+     0x00000000,
+     0xFBE20000,
+     0xFBE3FFFF,
+     0x00000000,
+     0x00020000)

The third follows the same pattern. Leave the final function call as-is.

Reinstating PCI Passthrough Config via pciback

Think back to Xen Part 9: PCI Passthrough. Did you amend the /etc/init.d/xencommons script to enable passthrough for one or more PCI devices? If you did, heads up: reinstalling Xen is about to overwrite your code.

If you used some custom code, just copy it into tools/hotplug/Linux/init.d/xencommons.

If you used the bog standard code in the tutorial and just amended the BDF ID, then to make things simpler you may want to add this xencommons patch to your patch set (NB: this is built against revision 25099), and amend your BDF ID in it as before. That should make maintenance easier, and remind you to update that file if/when you build a newer version of Xen in the future.

Patch Xen Unstable

$ for file in "${xenpatches}"/*; do patch -N -p1 < "$file"; done

Check that every patch applied without errors. Then copy the graphics card's BIOS, which you extracted earlier, to the vgabios folder:

$ cp /home/ace/vgabios-pt.bin tools/firmware/vgabios/
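
One way to check that the patch loop above went through cleanly is to look for reject files: patch writes a FILE.rej next to any file with hunks it could not apply. The sketch below only searches for those leftovers; no_rejects is a hypothetical helper name.

```shell
# Succeed only if no *.rej files exist under the given directory.
# Any that are found are listed on stderr.
no_rejects() {
    rejects=$(find "$1" -name '*.rej' -print)
    if [ -n "$rejects" ]; then
        printf 'Rejected hunks:\n%s\n' "$rejects" >&2
        return 1
    fi
    return 0
}
```

Run from the top of the source tree, no_rejects . && echo "no rejects found" gives a quick yes/no. It's still worth reading the output of patch itself, since a skipped or fuzzy hunk is reported there too.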

Compile & Install

$ make xen && make tools && make stubdom

Now time for installation.

# make install-xen && make install-tools PYTHON_PREFIX_ARG= \
&& make install-stubdom
# update-grub

Reboot

# shutdown -r now

root@ace2x1:~# dmesg | grep "mem 0x"
[ 0.669673] pci_bus 0000:00: root bus resource [mem 0x00000000-0xfffffffff]
[ 0.673606] pci 0000:00:00.0: reg 10: [mem 0xc0000000-0xcfffffff 64bit pref]
[ 0.673606] pci 0000:00:00.0: reg 18: [mem 0xfbe20000-0xfbe3ffff 64bit]
[ 0.673606] pci 0000:00:00.0: reg 30: [mem 0xfbe00000-0xfbffffff pref]
[ 0.732491] pci 0000:00:00.0: address space collision: [mem 0xfbe00000-0xfbffffff pref] conflicts with 0000:00:00.0 [mem 0xfbe20000-0xfbe3ffff 64bit]

This is where it should be working. Instead, I see an erroneous BAR contrary to the ranges I provided: reg 30 now ends at 0xfbffffff rather than 0xfbe1ffff, so it collides with reg 18, and I get no further.
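
For anyone comparing their own output, a collision like this can be confirmed without squinting at the hex. The sketch below applies the usual test for two inclusive ranges (they overlap when each one starts at or before the other's end); ranges_overlap is a hypothetical helper name, and the addresses are the ones from the logs in this post.

```shell
# Succeed if inclusive ranges [$1,$2] and [$3,$4] share at least one address.
ranges_overlap() {
    [ $(( $1 <= $4 && $3 <= $2 )) -eq 1 ]
}

# reg 30 as reported after the rebuild, against reg 18.
if ranges_overlap 0xfbe00000 0xfbffffff 0xfbe20000 0xfbe3ffff; then
    echo "reg 30 (after rebuild) collides with reg 18"
fi

# reg 30 as originally reported in the dom0, against reg 18.
if ! ranges_overlap 0xfbe00000 0xfbe1ffff 0xfbe20000 0xfbe3ffff; then
    echo "reg 30 (as configured) does not collide with reg 18"
fi
```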

Saturday 25 August 2012

Apple vs Samsung: The Farce

I've always been amazed this made it to trial. It seemed like an open and shut case; one that should have been thrown out long before a jury was convened and it made headline news. Apple claimed that Samsung's mobile devices violated 6 of its patents. Today, the jury sided with Apple on 5 out of 6 of these patents, and awarded it $1bn in damages.

Has there ever been a clearer demonstration of the urgent need for patent reform? It's a system which, for hardware and software, offers little if any protection for true innovation, and has simply descended into a messy lawyer's dream of suits, counter-suits, gross monopolization, the growth of patent trolls, and the ousting of the "little guy". Which is more or less the opposite of its claimed raison d'être.

Groklaw describes the verdict as "preposterous"; a "farce". I describe it as a disgrace.

Take, for example, Apple patent D677. It is a patent for the design of the iPhone. It describes the front as black; flat; rectangular; 4-cornered; round-edged; containing a screen; with thin side borders; larger top and bottom borders; a top speaker; button area beneath.

Did Samsung use a similar design for some of their mobile devices? I believe so, yes. But, then again, so would almost any smartphone manufacturer. What D677 describes is a blueprint for what a phone with a touchscreen almost has to be.

Think about it. To manufacture a smartphone (and remember that Apple was not the first), you need a touchscreen (which is black, flat, and rectangular). You need to house it in a container (which will obviously need 4 corners, and unless you want your end-users to stab their hands every time they unpocket it, those corners will need to be round-edged).

The thin side borders are a simple case of ergonomics. You need to hold the device, but need to be able to touch all areas of the screen. Wide borders would make it harder to reach all areas of the touchscreen. You can't sensibly add any other useful functionality (like buttons) on the sides, since you'd accidentally hit them when using the touchscreen. Finally, small bezels look better; monitor manufacturers have promoted this as a feature for years, as have TV manufacturers, laptop manufacturers, etc.

The larger top and bottom borders are also required, because there's a lot that needs to fit into a smartphone "under the hood" - and if you make one dimension shorter, you need to make the other dimension longer, just to fit everything in.

The speaker at the top; now, that's surely something that could have been placed elsewhere? That must have been copied.

Not exactly. Remembering these are phones, consider: where is your ear in relation to your mouth? It's another case of basic human requirements.

That leaves us with the button area at the bottom. Android (like iOS) requires some hard buttons, like the home button. They have to go somewhere. For the principal buttons, the sides are awkward to use (and we've ruled them out already), and the top is too far to reach for most people's hands. Where else is left?

D677 simply specifies what any smartphone manufacturer would be likely to work out for themselves within the first few days, or hours, of the design process. You need a touchscreen, buttons, speaker, mic, camera(s), battery, processor, memory, lights, connectors, etc. There are requirements posed by the OS. There are human factors to consider. Putting them all together for both Android and iOS, and with presently available hardware, you end up with something similar to D677.

The same holds for many of the other patents Apple has used to secure this $1bn ruling.

The long and short of it? Apple, somehow, holds some patents which describe obvious design points for the classes of devices called smartphones and tablet PCs. It's tantamount to a PC manufacturer waving a patent describing the design for a computer case, keyboard and monitor, and asking all the other PC manufacturers in the world to cough up royalties. Or, for a non-technological example, it's tantamount to a clothing manufacturer taking out a patent for a small handbag; tapered; with a latch in the top-centre; a long adjustable strap; and a reinforced bottom.

This case considered the similarity of the external aesthetics of the hardware, for which manufacturers of smartphones have very few choices, as I've already described. The similarity is by necessity, much like the similarity in most QWERTY keyboard designs is by necessity. The case neglected to consider the extreme dissimilarity in every other aspect of the devices: from internal hardware, to the OSs, to the applications and services on top, to the UX, et cetera.

This particular ruling seeks to ensure that Apple alone is allowed to manufacture and sell smartphones and tablets in the US.

How? It forces other manufacturers to modify the external aesthetics of their devices to sub-standard designs, in order to differentiate them from Apple's "patents" sufficiently such that juries no longer complain. Indeed, Samsung has already started to do so, with its release of the Galaxy Tab 2 - more or less identical to the Galaxy Tab, just with an uglier and less practical external design.

I don't blame the jury. They simply affirmed that the Galaxy S3 has a speaker at the top, a screen in the middle, some buttons at the bottom, and non-lethal corners.

It was the job of the patent examiners to ensure the validity of the patent claims at issuance; to properly inspect the claims for prior art and non-obviousness.

It would have helped if the judge had permitted Samsung the right to demonstrate invalidity by displaying the prior art.

The damage claims might have sounded less ridiculous if all the damages awarded related to the claimed violations (some figures, in the millions, were requested for Samsung devices deemed non-violating).

Finally, the verdict might have been more believable if the jury had deliberated for a length of time which befitted the complexity of the case.

Appeals will undoubtedly follow.

Ctrl+Space (Content Assist) doesn't work in Eclipse

Navigate to Window -> Preferences -> General -> Keys. Find Content Assist. Delete Ctrl+Space from the binding field, and hold down Ctrl and press space:

  • If only Ctrl+ is displayed, then something is intercepting the Ctrl+Space key binding before it reaches Eclipse. For me, it was IBus (in XFCE, go to Settings -> Input Method Selector, Use IBus -> Preferences*, and check the Enable or disable keyboard shortcut. If it's set to Ctrl+Space, clear the field, click Apply, and restart Eclipse).
  • On the other hand, if Ctrl+Space is displayed, then Eclipse is able to receive the key combination. Go to Window -> Preferences -> Java -> Editor -> Content Assist -> Advanced and make sure all relevant proposals are enabled. Other than that, just ensure dodgy import statements aren't affecting content assist's ability to recommend completions.
* Update: under Xubuntu 12.10, this seems to be slightly different, and Preferences is no longer available in this dialog. No matter; fire up a terminal and start ibus-setup. Delete Ctrl+Space from the keyboard shortcuts. There was no need to restart Eclipse.