Sunday, 5 February 2012

Xen Part 4: Booting into dom0

It's been a while since I found any time to devote to Xen. Unfortunately, real life is no easier than hypervisor configuration.

Winding back a couple of weeks, we had a standard installation of Debian, which was to become our "Domain 0" (dom0) OS. This is the privileged operating system which we will be using for VM management, general system monitoring, and very little else. It's important that dom0 is respected and not sullied by the day-to-day web browsing, e-mail access and other risky tasks we perform on a daily basis, since dom0 is the all-powerful base of our system. Think of it like a standard Linux system's root account. Much like a single OS system is "owned" when root is compromised, all the VMs on our Xen system can be "owned" through a compromise of dom0.

Dom0 comprises of the Xen kernel and our host OS. The Xen kernel contains the hypervisor code responsible for sharing resources between multiple VMs, with a modified version of the Linux kernel running on top of it. Our goal today is to install the xen kernel and boot into dom0.

Xen can be downloaded from www.xen.org. Compilation and installation is very simple nowadays, with the most complex part being the identification of all the prerequisite packages required for the build.

We don't even need to bother with that. Debian provides a one-stop-shop package which includes a xen kernel and all the necessary xen tools (xend, xenstored, xm/xl, etc). Whilst not as up-to-date as the latest code from Xen, we'll give this a try first, in order to reap the benefits of Debian's package management. There is an official wiki page which we will be using as the basis for this process.

First, let's install Xen.

I'd best take a moment to explain the convention I use for code segments. A line prefixed by $ should be run using your standard user account (but can be run by root too). A line prefixed by # should be run as the root user, i.e. using sudo or su. Lines not prefixed by either $ or # show output.

# apt-get install xen-linux-system xen-qemu-dm-4.0
You might encounter the following error:

Setting up xen-utils-common (4.0.0-1) ...
insserv: Service xenstored has to be enabled to start service xend
insserv: Service xenconsoled has to be enabled to start service xend
insserv: exiting now!
update-rc.d: error: insserv rejected the script header
dpkg: error processing xen-utils-common (--configure):
subprocess installed post-installation script returned error exit status 1
configured to not write apport reports
Errors were encountered while processing:
xen-utils-common
E: Sub-process /usr/bin/dpkg returned an error code (1) 

If so, fix it with:

# insserv xencommons
# apt-get install -f 

Technically, that should be enough to boot the dom0 kernel. Let's have a look and see what the grub bootloader has listed (your output may differ):

$ grep menuentry /boot/grub/grub.cfg | grep -v recovery
menuentry 'Debian GNU/Linux, with Linux 2.6.32-5-xen-amd64' --class debian --class gnu-linux --class gnu --class os {
menuentry 'Debian GNU/Linux, with Linux 2.6.32-5-amd64' --class debian --class gnu-linux --class gnu --class os {
menuentry 'Debian GNU/Linux, with Linux 2.6.32-5-xen-amd64 and XEN 4.0-amd64' --class debian --class gnu-linux --class gnu --class os --class xen {
The first entry is for the xen kernel, but isn't going to start essential xen services. The second entry is the current kernel - a safe bet to fall back to if you can't boot your dom0 kernel. Finally, the third entry starts the xen kernel and boots all the xen services, resulting in dom0. We can forget about the first entry, and concentrate on our standard kernel and our dom0 kernel.

I personally advise starting an ssh server at this point, so you have a way into your system should the dom0 kernel freeze on boot. You should always be able to restart the machine and select your standard kernel via GRUB, but being able to log in to your box and try fixes at the time can be invaluable. To get ssh running, install openssh-server and customise /etc/ssh/sshd_config to your needs. 

At this stage, I'd try booting into your dom0 kernel. Let's see if it works.

If that fell over spectacularly, this is where you'll really have fun. If it falls over at the hypervisor or early kernel stage, you may find it easier to use a serial card for debugging - there are many good howtos on the web explaining how to configure such a setup. 

I encountered a problem with a blank screen at the point when gdm (the graphical login) usually starts. Looking at the logs over a ssh connection (see, it is useful), I quickly encountered the following error in /var/log/Xorg.0.log:


(EE) XKB: Could not invoke xkbcomp(EE) XKB: Couldn't compile keymapKeyboard initialization failed. This could be a missing or incorrect setup of xkeyboard-config. 
Fatal server error:Failed to activate core devices.

A bug to this effect has already been filed at Debian's bug tracker, without much by way of conclusion. After some digging, the problem seems to be buggy PAT code in the old(-ish) kernel version we're using. A simple workaround is to apply the boot parameter "nopat" to the kernel command line.

Command line text editors are a touchy topic, with everybody having their own personal preference. If you're new to command line text editors, now's the time to pick one up and learn the basics. Like many, I use vim (vi to more traditional folks), which isn't considered the simplest to pick up. You may be more comfortable using e.g. nano. On the flipside, if you're already an emacs fan, I emplore you to resist the urge to flame me in the comments. Just try to perform the mental gymnastics required to substitute your favourite for the placeholder string "vim"...

# vim /etc/default/grub
GRUB_CMDLINE_LINUX="nopat"
Whilst you're there, now might be a good time to update the default OS to our dom0 kernel.

$ grep -n menuentry /boot/grub/grub.cfg
Find your dom0 kernel in the resultant output, subtract 1 from the first number on the line, and enter that in place of the X below:

# vim /etc/default/grub
GRUB_DEFAULT=X
After modifying the grub config file, we need to regenerate /boot/grub/grub.cfg based on the new settings:

# update-grub
Reboot back into dom0:
# shutdown -r now
Hopefully, this time, you can boot into dom0, complete with that welcoming GNOME GUI. At this stage, everything should be set - xend should be running, and we should be ready to create some domUs. Let's check. This command should show us a line for each OS running on the hypervisor, including one for dom0:
# xm list
Error: Unable to connect to xend: Connection refused. Is xend running?
If you got a line showing your dom0, hurrah - go away and grab a beer. If you see something annoyingly similar to the above, your work is not yet done...

At this stage, a solid bet is to check the xen logs under /var/log/xen/*. Unfortunately, that wasn't terribly helpful in my case, so I had to do some more digging. I noticed a couple of instances of /etc/init.d/xencommons were still running, as were many instances of "xenstore-read -s /":
$ ps aux | grep xen
So I tried to start xencommons manually:
# /etc/init.d/xencommons start

This didn't work, or error... it just hang indefinitely.
I added echo statements throughout this init script to monitor its progress and identify when and where it hangs (you may wish to do the same). It turns out it was hanging on the call to xenstore-read.
As a hacky workaround, I commented that call out (it only appears to be used as a bizarre test of whether xenstored is already running):

# vim /etc/init.d/xencommons
change this line:
if ! `xenstore-read -s / >/dev/null 2>&1`
to this:
if true #! `xenstore-read -s / >/dev/null 2>&1` 
That got past that little problem, but the script then spat out an error about "evtchn". This is easy, we're missing the kernel module xen_evtchn, which is easily remedied:

# echo "xen_evtchn" >> /etc/modules
# modprobe xen_evtchn

Starting xencommons should now be possible:

# /etc/init.d/xencommons start

Once you can start it manually, reboot your box once more and check everything automatically starts as expected. You should now be able to run xm list and get the proper output signifying dom0 is fully operative:

# xm list
Name                    ID   Mem VCPUs      State   Time(s)
Domain-0                 0 13703     8     r-----  42092.8

Do let us know how you get on.
Next > Dom0 X Instability