Sunday, 19 February 2012

Xen Part 6: Guest Installation

I haven't managed to solve the DPMS issue yet, but the workaround (disabling monitor power saving) seems to be doing the trick in the interim. So, time to plough on and get a domU up and running.

Guest Network Configuration

Debian advises the manual creation of a network bridge rather than using Xen's scripts, so we'll follow that path.

# apt-get install bridge-utils
# brctl addbr guestbr
# brctl addif guestbr eth0

Replace eth0 with the interface that you wish to bridge to (if you don't know, check ifconfig). You can add multiple interfaces if you like.

This is all well and good, but if you're using NetworkManager for your connection, you're likely to have just lost Internet connectivity! We're going to need to take your interface out of the hands of NetworkManager, and configure the bridge to load automatically.

# vim /etc/network/interfaces

It'll need to look something like this, substituting the appropriate interface name(s), etc:

auto lo guestbr
iface lo inet loopback
iface eth0 inet manual
iface guestbr inet dhcp
  bridge_ports eth0

Finally, perform this step to avoid an error later on when starting the domU ("missing vif-script"):


# chmod +x /etc/xen/scripts/*


Time to reboot and reclaim your networking capabilities. Hopefully!


If you're uncertain what to put in the interfaces file, there are countless resources online which can help. If you encounter networking issues with Xen specifically, the XenSource networking page is an invaluable reference.


PCI Passthrough with VT-d

I intend to dedicate a graphics card to one of the guests via VT-d, the preferred method. For this we need xen-pciback in the dom0 kernel. Fedora has xen-pciback compiled as a module, but on Debian it's built into the kernel. There are pluses and minuses to each approach, but from our perspective the Debian method probably makes configuration marginally simpler.


Add the following to the GRUB_CMDLINE_LINUX entry in /etc/default/grub, where the PCI ID (in BDF notation) comes from lspci:

$ lspci | grep VGA
00:02.0 VGA compatible controller: Intel Corporation Sandy Bridge Integrated Graphics Controller (rev 09)
01:00.0 VGA compatible controller: ATI Technologies Inc Device 6718
# vim /etc/default/grub


xen-pciback.hide=(01:00.0)

Then issue the necessary:

# update-grub

Reboot and verify it worked (you'll get some output if it did):

# xl pci-list-assignable-devices
0000:01:00.0


LVM Preparation

I'm using LVM2 for my domu partitions because a) I want high performance, and b) I want to be able to grow the size of the domus. Using partitions would give me a) but not b). Using loopback images, i.e. files, would give me b) but not a). Using LVM gives me the best of both worlds.

The partition I have dedicated to domus (/dev/md3) needs to be configured to use LVM (pvcreate), with a volume group (vgcreate), and the necessary logical volumes (lvcreate), which will act like normal partitions for each domu.

# pvcreate /dev/md3
  Physical volume "/dev/md3" successfully created
# pvdisplay | grep "PV Size"
  PV Size               833.72 GiB / not usable 888.00 KiB
# vgcreate xendomu /dev/md3
  Volume group "xendomu" successfully created
# vgdisplay | grep "VG Size"
  VG Size               833.72 GiB

LV Preparation

It's not necessary to create a LV for the installation method we're using below, but I include this as a reference. Nothing to see here; skip to the next section. 

# lvcreate -L 50G -n ubuntu xendomu
  Logical volume "ubuntu" created
# lvdisplay | grep "LV Size"
  LV Size                50.00 GiB

There's a 50GB partition for my first domu: Ubuntu. Before the OS can be installed, the LV needs to be formatted. I'm going to be using ext3. I could have chosen ext4, the successor filesystem with superior performance in key aspects, but I've heard about a lot of odd bugs relating to PV guests on ext4, so I'm staying safe and sticking with ext3. 

# mkfs.ext3 /dev/xendomu/ubuntu
mke2fs 1.41.12 (17-May-2010)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=0 blocks, Stripe width=0 blocks
3276800 inodes, 13107200 blocks
655360 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=4294967296
400 block groups
32768 blocks per group, 32768 fragments per group
8192 inodes per group
Superblock backups stored on blocks: 
32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208, 
4096000, 7962624, 11239424

Writing inode tables: done                            
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done

This filesystem will be automatically checked every 22 mounts or
180 days, whichever comes first.  Use tune2fs -c or -i to override.

If we wanted, we could mount the new LV as such:

# mkdir -p /domu/ubuntu
# mount /dev/xendomu/ubuntu /domu/ubuntu

To permanently delete the LV:

# umount /domu/ubuntu
# lvchange -an /dev/xendomu/ubuntu
# lvremove /dev/xendomu/ubuntu 

Guest Installation Methods

There are different ways we could go about installing the domu systems. Here are a few methods:
  • Manually burn an installation CD, reboot the system and install it into the LV. Of course, no seasoned Linux user would consider a solution involving a reboot. Reboots are those things performed daily by users of Microsoft Windows.
  • Manually install Debian with debootstrap, or CentOS with rpmstrap, etc.
Either of those manual methods would then require you to manually create a config file for the installation, and issue an xm create -f /path/to/config command.  This is unnecessarily laborious, so here are some simpler methods:
  • Use a GUI: virt-manager is a very cool solution. It's based on libvirt, a virtualisation abstraction layer which can sit on top of either KVM or Xen. I played with this under Fedora. The problem was, I found it to be quite buggy. Most of the bugs can be worked around, but in some senses I was left wondering if it wasn't simpler to use the CLI in the first place
  • Use the complementary CLI tool, virt-install. It's developed by the same team as virt-manager, and uses the same backend, so can be seen as virt-manager, with neither the GUI nor the bugs
  • Use xen-create-image, part of xen-tools. This is what we'll be doing below.

First, make sure the prerequisites are installed:

# apt-get install xen-tools debootstrap

xen-create-image can be run with a mass of command line arguments, which define all the aspects of our domu. A more sensible approach is to amend its configuration file, which defines all the default parameters for the command. This is a much more workable solution. When we finally run the command, it will use the amended configuration file for its settings, and any parameters we supply as overrides. 

xen-create-image Configuration

# vim /etc/xen-tools/xen-tools.conf

1) Storage type. Tell the script that we're using LVM storage and provide the VG name:

lvm = xendomu

Note that EVMS is supported. I haven't tried this yet, but it sounds like a superb way of managing your storage. 

2) Installation method. The installation method section gives us a number of interesting options:
  • debootstrap (for Debian systems)
  • rpmstrap (for CentOS, Fedora etc.)
  • rinse (an alternative for CentOS, Fedora etc.) 
  • from an existing installation directory
  • from an existing tar file
As I'm installing Ubuntu, which is a Debian-based OS, I'll be using debootstrap, which happens to be the default - so nothing to change here.

3) Disk. The tool is going to create a Logical Volume (LV) on the VG we specified of the specified size, i.e. the below will result in xen-create-image running a command like lvcreate -L 50G -n <hostname>-disk xendomu

size = 50Gb

4) Memory. Choose how much RAM to allocate to the domu. I'll give Ubuntu 8GB for the time being - this can be changed later. Also, since I'm allocating so much RAM, I don't see the point in enabling a swap file. If you're allocating much less RAM, you may want to leave this enabled and possibly even increase its size (swap = 1024Mb, for example). Note that swap space is assigned by creating a dedicated LV on the provided VG.

memory = 8192Mb
noswap = 1

5) Distribution. Pick the distribution to install. This field takes the lowercase codename of the distribution version, e.g. oneiric for Ubuntu 11.10 Oneiric Ocelot. Other examples are centos5 for CentOS 5 and squeeze for Debian 6 stable. You can find the distributions supported by your copy of xen-tools in the hook script directory: /usr/lib/xen-tools

dist = oneiric

6) Networking. Standard settings, complete these as appropriate for your network:

gateway = 192.168.2.1
netmask = 255.255.255.0
broadcast = 192.168.2.255
dhcp = 1

7) Root password. By default, a generated root password is set. I'd rather set my own.

passwd = 1

8) Architecture. I doubt this is necessary, but just in case I'm setting this explicitly. If you don't have a 64-bit system & dom0 OS, ignore this.

arch = amd64

Guest Installation

Now that's all set up, all we need to do is invoke the command. Note that I've provided a few parameters:
  • hostname is mandatory, and according to the manpage should really be fully qualified, e.g. hostname.domain.com. I'm being rebellious.
  • vcpus sets the number of virtual CPUs to expose to the guest (the default is 1). I have a 4-core HT system, exposing 8 cores. I'll probably pin one of them to dom0 later on, so for the time being I'll just expose 7 to this domu.
  • ip sets the guest's IP address; use this if DHCP is disabled

# xen-create-image --hostname=ace2x1 --vcpus=7

This will take some time, as it sets up the LV and downloads reams of data from the mirror you specified.

It Didn't Work!

It didn't work for me either. Look in /var/log/xen-tools for some logs explaining what went wrong.

1) If you find the error: 

"We are trying to configure an installation of <distribution> in <location> - but there is no hook directory for us to use. This means that we would not know how to configure this installation."

Then you've provided a distribution value for which xen-tools doesn't have the necessary hook files. Let's see which distributions xen-tools supports:

# dpkg --status xen-tools | grep Version
Version: 4.2-1
# ls /usr/lib/xen-tools/ | grep natty
natty.d
# ls /usr/lib/xen-tools/ | grep oneiric

So, xen-tools 4.2-1 can install Ubuntu Natty (11.04), but doesn't know about Ubuntu Oneiric (11.10). As it so happens, the version of debootstrap I have doesn't support Oneiric either. Luckily, this is all easy to resolve:

# ln -s /usr/lib/xen-tools/karmic.d /usr/lib/xen-tools/oneiric.d
# ln -s /usr/share/debootstrap/scripts/gutsy /usr/share/debootstrap/scripts/oneiric

2) If you find the error:

"E: Failed getting release file <URL>"

then you should try setting the mirror location manually.

# vim /etc/xen-tools/xen-tools.conf

For Ubuntu distributions, comment out (add a # before) the line

mirror = `xt-guess-suite-and-mirror --mirror`

and uncomment (remove the # before) the line

mirror = http://gb.archive.ubuntu.com/ubuntu/

For other distributions, you're going to have to find out the URL and set it yourself.

3) If you find other errors, alas. You're on your own.

It Worked!

Great stuff!

The domU's Xen configuration file will reside at /etc/xen/<hostname>.cfg. It's worth having a look at. Note that you can set the kernel/initrd, amount of RAM, number of CPUs, disk devices, networking settings (including MAC address and hostname), and poweroff/reboot/crash behaviour.

# vim /etc/xen/<hostname>.cfg

We need to add the name of the network bridge we created, something like this:

vif        = [ 'mac=xx:xx:xx:xx:xx:xx,bridge=guestbr' ]

After that, fire her up!

# xm create -c /etc/xen/<hostname>.cfg
# xm console <hostname>

Sunday, 12 February 2012

Xen Part 5: Dom0 X Instability

After a little time running the Debian dom0, it became clear that something was amiss. Sometimes, when I returned to the PC, I would find it unresponsive. The screen simply refused to come out of power saving mode, or would dump me straight back at the gdm login screen.

However, I was able to SSH into the box, and 'top' showed that my BOINC applications were still alive and well - a sure sign that the problems were limited to X. I could always regain control over the box by restarting Gnome Display Manager:

# /etc/init.d/gdm3 stop && sleep 2 && /etc/init.d/gdm3 start


A tail of /var/log/Xorg.0.log.old showed I was getting a segmentation fault, but didn't give much else to go by. In an effort to discover what the problem was, I decided to debug the X session remotely. 


ace@remote# apt-get install xserver-xorg-core-dbg xserver-xorg-video-intel-dbg gdb
ace@remote# gdb /usr/bin/Xorg $(pidof Xorg)
[...snip...]
(gdb) handle SIGPIPE nostop
(gdb) cont
Program received signal SIGSEGV, Segmentation fault.

0x00007f88988ec775 in outl (port=61440, val=323652) at ../../../../hw/xfree86/common/compiler.h:438



(gdb) bt
#0  0x00007f88988ec775 in outl (port=61440, val=323652) at ../../../../hw/xfree86/common/compiler.h:438
#1  x_outl (port=61440, val=323652) at ../../../../hw/xfree86/int10/helper_exec.c:423
#2  0x00007f88988f3613 in x86emuOp_out_word_DX_AX (op1=<value optimized out>)
    at ../../../../hw/xfree86/int10/../x86emu/ops.c:9847
#3  0x00007f88988ff31c in X86EMU_exec () at ../../../../hw/xfree86/int10/../x86emu/decode.c:122
#4  0x00007f88988eded5 in xf86ExecX86int10 (pInt=0x1810780) at ../../../../hw/xfree86/int10/xf86x86emu.c:40
#5  0x00007f8898b098c9 in VBEDPMSSet (pVbe=0x180e3f0, mode=323652) at ../../../../hw/xfree86/vbe/vbe.c:1072
#6  0x000000000047382f in DPMSSet (client=<value optimized out>, level=<value optimized out>)
    at ../../../../hw/xfree86/common/xf86DPMS.c:167
#7  0x00007f889a636b7f in ProcDPMSForceLevel (client=0xf000) at ../../Xext/dpms.c:188
#8  ProcDPMSDispatch (client=0xf000) at ../../Xext/dpms.c:236

The Xorg server source code can be downloaded from the X.Org Foundation. Looking through the files involved in the stack trace, it seems that the monitor is being restored from power saving mode via DPMS when the segfault occurs. As a test, I tried disabling sleep mode for my monitor, via System -> Preferences -> Power Management -> Put display to sleep when inactive for <Never>. This solved the problem - but that's hardly what I'd call a solution. Best get back to it.

At frame #1, the x86 emulator has already decided to send value 323652 to port 61440 (0xf000). That's probably the port allocated to my onboard graphics card; let's double-check.

$ dmesg | grep 0xf000
[    4.001927] pci 0000:00:02.0: reg 20 io port: [0xf000-0xf03f]
$ lspci | grep 00:02.0
00:02.0 VGA compatible controller: Intel Corporation Sandy Bridge Integrated Graphics Controller (rev 09)

Indeed it is (remember that I'm using my onboard graphics controller for dom0, with the intention of using PCI passthrough to dedicate my HD6970 to a domU). So we're using outl to write a long (a 32-bit value) to port 0xf000, the first port in the 64-byte I/O range 0xf000 - 0xf03f, and receiving a segfault. Damn; I was hoping for something a little simpler.

Here's the code in compiler.h (obtained from the Xorg source linked to above) where the segfault is occurring:

static __inline__ void
outl(unsigned short port, unsigned int val)
{
   __asm__ __volatile__("outl %0,%1" : :"a" (val), "d" (port));
}

If the above seems a little unfamiliar to you, you're not alone. That function contains an example of extended inline assembly code. Sandeep has an excellent explanation in his howto guide on inline assembly. A read through of that short howto in its entirety will give you a good understanding of what that line does.

In brief, it executes assembly command "out", with the source operand set to the 32-bit value in C variable $val, and the destination operand set to the value in C variable $port.  "a" and "d" are constraints: they specify that $val should be placed into the eax register and $port should be placed into the edx (well, port is a word value, so dx) register. I'd imagine the resulting AT&T assembly code would look something like this:

movl val,%eax
movw port,%dx
outl %eax,%dx

And the assembly command "out"? That just writes data in the source register to the port provided in the destination register. So far, everything looks fine: we're not really getting to the root of the problem. 

The O'Reilly book Linux Device Drivers, 2nd Edition, mentions in chapter 8 that ioperm or iopl should be called prior to inX/outX calls being made. ioperm can't assign permissions to ports as high as 61440, so it should be using iopl(3) instead to assign permissions to all ports. There is indeed such a call:

$ grep -n -r "iopl(3)" *
hw/xfree86/os-support/linux/lnx_video.c:523:        if (ioperm(0, 1024, 1) || iopl(3)) {

This is in xf86EnableIO(). I checked that it was being called by starting Xorg directly with gdb, and breaking on iopl - it hit. Seemingly another dead end. 

Time to try things for myself.

$ vim outl_test.c
#include <stdio.h>

static __inline__ void
outl(unsigned short port, unsigned int val)
{
   __asm__ __volatile__("outl %0,%1" : :"a" (val), "d" (port));
}

int main() {
  printf("Testing outl\n");
  outl(61140,323652);
  printf("Done\n");
  return 0;
}
$ gcc -O -g -o outl_test outl_test.c
$ gdb ./outl_test
GNU gdb (GDB) 7.0.1-debian
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from <stripped>/outl_test...done.
(gdb) run

Starting program: <stripped>/outl_test 
Testing outl
Done

Program exited normally.

(gdb) run
Starting program: <stripped>/outl_test 
Testing outl

Program received signal SIGSEGV, Segmentation fault.
0x00000000004004fc in outl () at outl_test2.c:6
6 outl_test.c: No such file or directory.
in outl_test.c
(gdb)

That's right - this test failed intermittently. It wasn't uncommon to have to run the program 50 times before it segfaulted, or vice versa. The same behaviour was exhibited when running as root. Why did it fail intermittently? 

Well, that isn't really the question... the question is, why did it ever succeed? I should need to request IO port access via iopl(level), otherwise it runs at the default level 0. Odd. I expanded my test case.

#include <stdio.h>
#include <stdlib.h>   // atoi
#include <errno.h>
#include <sys/io.h>   // iopl

static int requestPortAccess = 1; // runs iopl(3)
static int clearPortAccess = 0; // runs iopl(0)

static int RUN_TIMES = 1000;
static unsigned short PORT = 61140;
static unsigned int VAL = 323652;

static __inline__ void
outl(unsigned short port, unsigned int val)
{
   __asm__ __volatile__("outl %0,%1" : :"a" (val), "d" (port));
}

int main(int argc, char *argv[]) {
  int i = 1;
  int ret = 0;
  // Handle args (yes it's ugly as heck, but it's 3am and I need my dinner)
  if (argc>1) {
    if (atoi(argv[1]) >=0  && atoi(argv[1]) <=1) {
      requestPortAccess=atoi(argv[1]);
    }
  }
  if (argc>2) {
    if (atoi(argv[2]) >=0  && atoi(argv[2]) <=1) {
      clearPortAccess=atoi(argv[2]);
    }
  }
  if (clearPortAccess > 0) {
    // Simulate forgetting to call iopl(3)
    ret = iopl(0);
    printf("iopl(0) returned %i\n",ret);
  }
  if (requestPortAccess > 0) {
    printf("iopl(3)\n");
    ret = iopl(3);
    printf("iopl(3) returned %i\n", ret);
  }
  if (ret < 0 && errno == ENODEV) {
    printf("No I/O ports found\n");
  } else if (ret < 0) {
    printf("iopl failed to get access to the IO ports, errno: %i\n", errno);
  }
  // Failure can be intermittent, so run plenty of times
  for (;i<=RUN_TIMES;i++) {
    printf("ATTEMPT %i, outl(%i,%i)\n",i,PORT,VAL);
    outl(PORT,VAL);
    printf("Done\n");
  }
  return 0;
}

This allowed for a few more tests. 

1) $ ./outl_test 0 1
2) $ ./outl_test 1 0
3) # ./outl_test 0 1 
4) # ./outl_test 1 0
5) $ ./outl_test 0 0
6) # ./outl_test 0 0

1) iopl(0) as user [outl always segfaulted]
2) iopl(3) as user [outl always segfaulted]
3) iopl(0) as root [outl always segfaulted]
4) iopl(3) as root [outl never segfaulted]
5) no iopl as user [outl intermittently segfaulted]
6) no iopl as root [outl intermittently segfaulted]

The results of tests 1-4 are as one would expect, but the results of tests 5 and 6 don't seem to comply with the spec.  The process should be running with io privilege level 0 by default (== iopl(0)), so outl should always segfault.

And to recap, this occurs only when running the 2.6.32-5-xen-amd64 kernel under the Xen hypervisor, i.e. in dom0 mode. It doesn't occur when booting 2.6.32-5-xen-amd64 without Xen, or when running 2.6.32-5-amd64.

Investigations continue...


Sunday, 5 February 2012

Autostart Synergy on Debian

If, like me, you use multiple computers, chances are you've come across Synergy - probably the best application for controlling multiple PCs from one keyboard and mouse. (For the uninitiated, throw away your KVM switches - a superior solution exists!)

To get the synergy client to autostart on Debian, you'll need to modify three files. The first starts synergyc as root for the gdm login screen. The second stops synergyc after you've logged in, and the third starts synergyc as your user.

# vim /etc/gdm3/Init/Default

Add the following almost at the end, just above the "exit 0":

/usr/bin/killall synergyc
while pgrep -x synergyc > /dev/null; do sleep 0.1; done
/usr/bin/synergyc <synergy server IP address>

The second file can be based on the sample which Debian provides:

# mv /etc/gdm3/PostLogin/Default.sample /etc/gdm3/PostLogin/Default
# vim /etc/gdm3/PostLogin/Default

Add the following at the end:

/usr/bin/killall synergyc
while pgrep -x synergyc > /dev/null; do sleep 0.1; done

For the final file:

# vim /etc/X11/Xsession.d/80synergyc

Add the following:

/usr/bin/killall synergyc
while pgrep -x synergyc > /dev/null; do sleep 0.1; done
/usr/bin/synergyc <synergy server IP address>

Modifying gdm's scripts and duplicating server IP addresses all over the place is obviously a terrible means to an end. Better implementations are left as an exercise for the reader.