Sunday, 29 April 2012

cannot open shared object file

error while loading shared libraries: cannot open shared object file: No such file or directory

You can use strace to debug the program you're trying to execute; this should show you exactly which directory paths it's searching for the library. You can then look for the file with find /usr -name* or an apt-file search. Typically though, you can just install or reinstall the appropriate glx package, for example:

# apt-get install --reinstall libgl1-mesa-glx
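
If you'd rather not wade through strace output, ldd gives a quicker answer. The following is a minimal sketch: the function names filter_missing and missing_libs are made up for illustration, and it assumes the usual glibc ldd output, where an unresolved dependency is printed as "name => not found".

```shell
# Read ldd output on stdin and print the names of unresolved libraries.
filter_missing() {
    awk '/=> not found/ { print $1 }'
}

# Print the shared libraries that the dynamic linker cannot find for a binary.
missing_libs() {
    ldd "$1" | filter_missing
}
```

Running missing_libs against the failing program prints one library name per line, which you can then hand to apt-file search.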

Thursday, 26 April 2012

FreeBSD: TIMEOUTs under high I/O throughput

ace1 kernel: ad40: TIMEOUT - FLUSHCACHE48 retrying (1 retry left)
ace1 kernel: ata20: port is not ready (timeout 15000ms) tfd = 00000080
ace1 kernel: ata20: hardware reset timeout
ace1 kernel: unknown: TIMEOUT - FLUSHCACHE48 retrying (0 retries left)
ace1 kernel: unknown: TIMEOUT - READ_DMA48 retrying (1 retry left) LBA=3062893328

On a *nix fileserver purpose-built to host a 20TB zpool, this is not what you expect to see. Yet, I've been seeing a lot of this recently, and it always ends the worst possible way: in a hard kernel crash.

ZFS has performed its job admirably throughout, with no data corruption ever observed or recorded, but that doesn't make up for my fileserver crashing increasingly frequently. It used to be once a month or so; in the past week that worsened significantly to more than once a day. Furthermore, the timeouts are highly dependent on I/O load: the more data I shift around simultaneously, the more likely I'm going to be holding the power button for 4 seconds.
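
To put numbers on how often this happens, the log lines can be tallied by ATA command. This is a rough sketch that assumes the messages look like the ones quoted above; count_timeouts is a made-up name, and /var/log/messages is the usual FreeBSD default rather than anything specific to this box.

```shell
# Read kernel log lines on stdin and count TIMEOUT messages per ATA command.
count_timeouts() {
    awk '/TIMEOUT - / {
             # The command name is the word following "TIMEOUT -"
             for (i = 2; i < NF; i++)
                 if ($i == "-" && $(i - 1) == "TIMEOUT") n[$(i + 1)]++
         }
         END { for (c in n) print c, n[c] }' | sort
}

# Typical use:
#   count_timeouts < /var/log/messages
```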

Looking into it, I read that the timeout issue could be mitigated by patching the ATA driver to accept a configurable timeout value, and increasing it to a value which was non-problematic. Details can be accessed via the FreeBSD wiki's ATA page, under the heading ATA/SATA DMA Timeout Issues.

Whilst this may work (and kudos to Volker for testing it out), it didn't sound like an ideal solution; a bit like papering over the hole in the wall instead of re-plastering. Increasing a timeout doesn't address why the timeout is being hit in the first place.

The best solution seemed to be to upgrade from the prehistoric ATA driver to the newer, leaner AHCI driver, which I heard is far superior. Seeing as that would require a kernel rebuild, it seemed like an excellent opportunity to upgrade FreeBSD from 8.2 (STABLE) to 9.0 (RELEASE). I spent the rest of the week backing up my ZFS pool again, just in case something inexplicable occurred...

As it transpired, the upgrade process was dead easy. I was running STABLE, so I needed to do a source upgrade. This may take a little longer than firing off freebsd-update, but it isn't a whole lot harder (just so long as you allow yourself enough time to sort out any pesky port update issues and to check over the changes to /etc). If anybody's interested I can post the upgrade procedure.

To cut a short story shorter, bringing the system back up with 9.0 was painless. I was a little concerned that the device renaming due to the AHCI driver migration (/dev/adXX devices changing to /dev/adaXX) might cause problems, but that too passed completely without incident, due to two little niceties:
  1. ZFS doesn't rely on device names 
  2. FreeBSD created symlinks to the new device names, meaning even my other filesystems mounted from fstab were unaffected
But the best bits were yet to come. All the old timeout problems have disappeared, leaving me with a rock-solid FreeBSD server (just as it should be). Furthermore, disk I/O throughput is noticeably improved. Now there's a handy bonus!

However, the mystery remains: what caused the ATA driver to shift, all of a sudden, from logging occasional timeout errors to consistently logging many and causing hard crashes multiple times per day?

Monday, 9 April 2012

Xen Part 12: Windows Guest Installation

Create a logical volume for Windows.

# lvcreate -L 50G -n winxp-disk xendomu

Create a Xen configuration file, specifying the location of your Windows installation media.

# vim /etc/xen/ace2x2

kernel = '/usr/lib/xen/boot/hvmloader'
device_model = '/usr/lib/xen/bin/qemu-dm'
builder = 'hvm'
memory  = '2048'
cdrom = 'file:/os/Windows XP/windowsxp.iso'
disk    = [ 'phy:/dev/xendomu/winxp-disk,hdc,w', 'file:/os/Windows XP/windowsxp.iso,hdb:cdrom,r' ]
name    = 'ace2x2'
vif  = [ 'bridge=guestbr' ]
boot='dc'  #CDROM=d, Disk=c.
vnc=1 # Can remove this after installation
on_poweroff = 'destroy'
on_reboot = 'restart'
on_crash = 'restart'

Start the domain, which should boot straight into the installation media.

# xen create /etc/xen/ace2x2
# xen vncviewer ace2x2

The Cursor Doesn't Track Mouse Movement Properly 

You'll probably find that the cursor doesn't align with your mouse movement within the VNC window. You just need to add usbdevice = 'tablet' to your xen config file.
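
If you'd rather script that change than edit the file by hand, something like the following would do. It's only a sketch: add_xen_setting is a hypothetical helper, and it assumes the config uses the simple key = 'value' layout shown above.

```shell
# Append a quoted setting to a Xen config file unless the key is already set.
add_xen_setting() {
    cfg=$1 key=$2 value=$3
    grep -Eq "^[[:space:]]*${key}[[:space:]]*=" "$cfg" ||
        printf "%s = '%s'\n" "$key" "$value" >> "$cfg"
}

# Typical use (as root):
#   add_xen_setting /etc/xen/ace2x2 usbdevice tablet
```

Running it twice is harmless, since the second call sees the existing key and leaves the file alone.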

Next > VGA Passthrough: Another failed attempt

Xen Part 11: ATi Graphics drivers on the domU

If you're looking to run a raft of monitors on your domU, and require fast 3D rendering and more, you're probably looking to run a proprietary driver. I have a HD 6970 which I'm using for passthrough, so the following discusses the AMD/ATI driver only.

First things first: follow my instructions for installing the proprietary ATI driver.

When you try to start it, you'll more than likely find it fails with the following error.

# modprobe -v fglrx

insmod /lib/modules/3.2.13/kernel/drivers/acpi/button.ko 
WARNING: Error inserting button (/lib/modules/3.2.13/kernel/drivers/acpi/button.ko): No such device
FATAL: Error inserting fglrx (/lib/modules/3.2.13/updates/dkms/fglrx.ko): No such device

Unfortunately, this isn't awfully easy to fix. You'll find the AMD fglrx module has a single dependency: the same module that gave the WARNING above.

# modinfo fglrx | grep depends
depends:        button

Predictably, if you try loading this dependency, you get the same fatal error: "No such device".

# modprobe -v button

insmod /lib/modules/3.2.13/kernel/drivers/acpi/button.ko 
FATAL: Error inserting button (/lib/modules/3.2.13/kernel/drivers/acpi/button.ko): No such device

Further investigation doesn't yield many results. There are no errors reported in dmesg. What can the problem be?

# modinfo button | grep description
description:    ACPI Button Driver
# ls /proc/acpi
ls: cannot access /proc/acpi: No such file or directory
# dmesg | grep ACPI | head -n 1

[    0.000000] ACPI in unprivileged domain disabled

It's an ACPI driver, but ACPI is disabled in the domU, so we've found the problem. If we look in arch/x86/xen/setup.c we find what's responsible for disabling ACPI:

        if (!(xen_start_info->flags & SIF_INITDOMAIN)) {
                printk(KERN_INFO "ACPI in unprivileged domain disabled\n");

In the domU we won't have ACPI, and we won't have /proc/acpi/*. Which means that modprobe button isn't ever going to work, and by extension nor is modprobe fglrx. So how can we install the proprietary ATI driver if we can't satisfy its dependency?
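
The dmesg check above can be wrapped up for reuse on other guests. A minimal sketch, assuming the kernel prints exactly the message quoted from setup.c; acpi_disabled_by_xen is a made-up name.

```shell
# Read kernel log text on stdin; succeed if Xen disabled ACPI for this domain.
acpi_disabled_by_xen() {
    grep -q 'ACPI in unprivileged domain disabled'
}

# Typical use on the domU:
#   dmesg | acpi_disabled_by_xen && echo "no ACPI here, button will not load"
```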

I'm sure brighter people than me have stumbled across this little irritation and figured out a sensible solution. After an hour or so I still didn't have any bright ideas, which left me with only one terrible option: make a stub ACPI Button driver. After all, I don't think I care whether the graphics driver knows when the metaphorical power button ("xen shutdown") is pressed.

All this really entails is stripping the ACPI includes and much of the code from drivers/acpi/button.c, and finally playing around to get the thing compiling. The resulting "stub" driver (based on kernel 3.2.13) is here:

Instructions: First, look over the file above to check you're not downloading nefarious code. (Obviously it's clean, but it's important to get into that habit. Nobody else is looking over your shoulder checking these things for you). Second, take a backup of the original drivers/acpi/button.c file in your kernel source tree. Third, replace the original with this bastardised stub copy. Finally, recompile the kernel and copy the module over to domU. Or, in code:

$ cp drivers/acpi/button.c drivers/acpi/button.c.original
$ wget -O drivers/acpi/button.c
$ make clean
$ make -j 10 modules
# xen shutdown ace2x1
# mount /dev/xendomu/ace2x1-disk /mnt
# mv /mnt/lib/modules/3.2.13/kernel/drivers/acpi/button.ko /mnt/lib/modules/3.2.13/kernel/drivers/acpi/button.ko.original
# cp ./drivers/acpi/button.ko /mnt/lib/modules/3.2.13/kernel/drivers/acpi/button.ko
# umount /mnt
# xen create /etc/xen/ace2x1

Connect to the domU

root@ace2x1:~# lsmod | grep -E "(button|fglrx)"
fglrx                3150992  3 
button                 12535  1 fglrx
root@ace2x1:~# dmesg | grep fglrx
[    4.011571] fglrx: module license 'Proprietary. (C) 2002 - ATI Technologies, Starnberg, GERMANY' taints kernel.
[    4.047300] [fglrx] Maximum main memory to use for locked dma buffers: 1876 MBytes.
[    4.047320] [fglrx:firegl_init_device_list] *ERROR* No supported display adapters were found
[    4.047331] [fglrx:firegl_init_module] *ERROR* firegl_init_devices failed
[    5.966928] [fglrx] Maximum main memory to use for locked dma buffers: 1876 MBytes.
[    5.966974] [fglrx]   vendor: 1002 device: 6718 count: 1
[    5.967189] [fglrx] ioport: bar 4, base 0xe000, size: 0x100
[    5.967503] [fglrx] Kernel PAT support is enabled
[    5.967527] [fglrx] module loaded - fglrx 8.95.3 [Mar  8 2012] with 1 minors
[   11.422975] [fglrx] ACPI is disabled on this system
[   11.877991] [fglrx] Firegl kernel thread PID: 921
[   11.878106] [fglrx] Firegl kernel thread PID: 922
[   11.878223] [fglrx] Firegl kernel thread PID: 923
[   11.878359] [fglrx] IRQ 58 Enabled
root@ace2x1:~# aticonfig --initial

root@ace2x1:~# grep fglrx /etc/X11/xorg.conf
Driver      "fglrx"

Finally, we have to ensure the xen-pcifront driver loads before fglrx, otherwise we'll get errors about fglrx being unable to find any display adapters. The ordering of the following is important.

root@ace2x1:~# vim /etc/modules
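
Whatever else is in the file, the ordering can be checked mechanically. A small sketch, assuming one module name per line in /etc/modules and that the passthrough frontend module is called xen-pcifront as above; loads_before is a hypothetical helper.

```shell
# Succeed only if module $2 is listed before module $3 in modules file $1.
loads_before() {
    awk -v a="$2" -v b="$3" '
        $1 == a && !pa { pa = NR }   # first line naming module a
        $1 == b && !pb { pb = NR }   # first line naming module b
        END { exit !(pa && pb && pa < pb) }
    ' "$1"
}

# Typical use:
#   loads_before /etc/modules xen-pcifront fglrx || echo "check the ordering"
```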

Next > Windows Guest Installation

Sunday, 1 April 2012

Xen Part 10: Compiling a Custom DomU Kernel

I upgraded the kernel on the Debian dom0, and switched domU to the newer kernel. Suddenly things started to break, including PCI passthrough.

Xen uses PV kernels residing on the dom0, but the modules obviously need to reside on the domU. In this case, it looks like module compatibility broke when I upgraded the kernel for domU without replacing its modules.
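
A quick way to spot this mismatch is to check whether the guest has a modules tree for the kernel it was booted with. This is a sketch only; has_modules_for is a made-up name, and the optional second argument (a mount point such as /mnt) is an assumption for checking from dom0.

```shell
# Succeed if a modules tree for the given kernel release exists under a root.
has_modules_for() {
    release=$1 root=${2:-/}
    [ -d "$root/lib/modules/$release" ]
}

# On the domU itself:
#   has_modules_for "$(uname -r)" || echo "no modules for the running kernel"
```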

In my opinion, this separation between the kernel on dom0 and the modules on domU is rather odd and inconvenient. It seems like one should mount the domU partition in ro mode and load the kernel from there; in that way, the domU would be in near-complete control of its own kernel.

However, I've never seen a method like that mentioned. The other obvious workaround is to compile a domU kernel without any modules, but that sounded like a terrible idea. So I figured I'd try a compile-on-dom0 solution first.

Remember, do not use root for kernel compilation. The example below is for 3.2.13.

Get the kernel sources for the version you want from

$ VERSION=3.2.13
$ wget$VERSION.tar.bz2
$ tar xjf linux-$VERSION.tar.bz2
$ cd linux-$VERSION

Here I'm basing the Ubuntu domU kernel config on my Debian Wheezy dom0 kernel config; whether that's wise is another question.
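
The copy itself isn't shown here, so treat the following as one possible way of seeding the config rather than the exact step used. It assumes the dom0 keeps its config at /boot/config-<release>, as Debian does; seed_config is a hypothetical helper.

```shell
# Seed a kernel tree's .config from an existing config without clobbering one.
seed_config() {
    src=$1 tree=$2
    [ -f "$src" ] || { echo "no config at $src" >&2; return 1; }
    [ -e "$tree/.config" ] || cp "$src" "$tree/.config"
}

# Typical use, from the top of the kernel tree:
#   seed_config "/boot/config-$(uname -r)" .
```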

$ make oldconfig
Unless you have special requirements, just hold down enter to accept the defaults
$ make menuconfig
Make any customisations you require
$ vim .config
Further customisations; e.g., search for XEN and check everything you require is enabled ('y' or 'm')
$ make -j 10 
$ make -j 10 modules
# VERSION=3.2.13
# cp arch/x86_64/boot/bzImage /boot/bzImage-$VERSION
# make modules_install
# update-initramfs -c -k $VERSION 
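
For the "search for XEN" step, it can be quicker to list what's switched off than to scroll through .config. A minimal sketch, assuming the standard "# CONFIG_FOO is not set" convention for disabled options; xen_options_unset is a made-up name.

```shell
# Print the Xen-related options that are disabled in a kernel .config.
xen_options_unset() {
    sed -n 's/^# \(CONFIG_XEN[A-Z0-9_]*\) is not set$/\1/p' "$1"
}

# Typical use, from the top of the kernel tree:
#   xen_options_unset .config
```

Anything it prints is worth a second look before kicking off the build.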

Mount your domU partition/image (you'd be well advised to stop your domU first, otherwise... well, you can figure it out)

# xen shutdown ace2x1
# mount /dev/xendomu/ace2x1-disk /mnt

Copy the modules and source to domU. The second command in particular may take a while.

# make INSTALL_MOD_PATH=/mnt modules_install
# cp -ru . /mnt/usr/src/linux-$VERSION
# cp .config /mnt/boot/config-$VERSION
# umount /mnt

Use the compiled kernel for the domU

# vim /etc/xen/ace2x1
Set the kernel and ramdisk variables to /boot/bzImage-3.2.13 and /boot/initrd.img-3.2.13 respectively 

Back on the domU, just finish tidying up after the hacks.

# xen create /etc/xen/ace2x1
# xen console ace2x1 / $ ssh root@ace2x1
root@ace2x1# cd /lib/modules/3.2.13
root@ace2x1# rm build source
root@ace2x1# ln -s /usr/src/linux-3.2.13 build
root@ace2x1# ln -s /usr/src/linux-3.2.13 source

Next > ATi Graphics drivers on the domU