Q1+Q2 2008, 10 ECTS
Live Migration of Virtual Machines
by Jacob Gorm Hansen
Virtual machines (VMs) have become a popular way of multiplexing commodity hardware in corporate data centers. Compared to traditional processes, VMs offer stronger performance and security isolation and have fewer dependencies on the software that must be installed to support them. When the hardware abstraction on which VMs run is sufficiently well-defined, it becomes possible to migrate a VM between different physical hosts, e.g., when the original host needs to be taken down for repair, or for load-balancing purposes. In 2002, we designed and implemented the NomadBIOS paravirtualized hypervisor, and were the first to show how a VM running a production-class operating system could be migrated between two hosts with almost imperceptible downtime of less than 100 ms. Such functionality, now commonly referred to as "Live Migration", has since become a hallmark of VM systems; it is supported in products from VMware, Citrix, and Oracle, and is planned for a future product announced by Microsoft.
The drive towards commodity hardware has affected not only corporate data centers, but also the world of scientific computing. For many workloads, a cluster of modern PCs is able to match the performance of a custom-built supercomputer at a fraction of the price. Grid and Utility computing promise to make computing resources available to anyone over the network, but contemporary cluster management systems have lacked crucial features, such as pre-emptive scheduling and strong isolation between jobs, resulting in suboptimal resource utilization and increased concerns about security.
Building on our previous live VM migration work, we propose to use VMs as the cornerstone of a secure cluster management system that can deliver on the promises of Grid and Utility computing. VMs provide strong isolation between users, and VM migration allows a job (a collection of VMs spread over multiple physical machines) to be suspended or moved to a different set of machines without loss of computational progress. Because we are concerned about security, we design our system to have the smallest possible attack surface, minimizing the amount of trusted code that must communicate with the outside world. Live VM migration has previously been implemented as a complex feature of the trusted Virtual Machine Monitor, but we show how this feature can be moved entirely inside the VM, an approach that we refer to as "Self-Migration". Using this and other novel techniques, we were able to implement a truly minimal network control plane service, with cryptographic authentication of incoming VMs and lease-based resource reservations, in only a few hundred lines of C code.
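To give a rough idea of what a lease-based resource reservation can look like, here is a minimal sketch. It is not the code from the system described here (which is written in C); it is written in Python for brevity, and every name in it (`Lease`, `LeaseTable`, `reserve`, `renew`, `expire`) is hypothetical. The sketch assumes that a host hands out memory for a limited time, that a lease lapses unless it is renewed, and that lapsed leases free their memory for other VMs. The current time is passed in explicitly so that the behaviour is deterministic.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Lease:
    owner: str         # hypothetical identifier of the VM owner
    memory_mb: int     # memory reserved for the VM on this host
    expires_at: float  # absolute time (seconds) at which the lease lapses


class LeaseTable:
    """Tracks memory leases on a single host (illustrative only)."""

    def __init__(self, total_memory_mb: int) -> None:
        self.total_memory_mb = total_memory_mb
        self.leases: Dict[str, Lease] = {}

    def expire(self, now: float) -> List[str]:
        """Drop lapsed leases; return the owners whose VMs lose their slot."""
        lapsed = [o for o, l in self.leases.items() if l.expires_at <= now]
        for owner in lapsed:
            del self.leases[owner]
        return lapsed

    def free_mb(self, now: float) -> int:
        """Memory not covered by any live lease at time `now`."""
        self.expire(now)
        used = sum(l.memory_mb for l in self.leases.values())
        return self.total_memory_mb - used

    def reserve(self, owner: str, memory_mb: int,
                duration: float, now: float) -> bool:
        """Grant a new lease if the owner has none and enough memory is free."""
        if memory_mb <= 0 or duration <= 0:
            return False
        if memory_mb > self.free_mb(now) or owner in self.leases:
            return False
        self.leases[owner] = Lease(owner, memory_mb, now + duration)
        return True

    def renew(self, owner: str, duration: float, now: float) -> bool:
        """Extend a live lease to `now + duration`; fail if it already lapsed."""
        self.expire(now)
        lease = self.leases.get(owner)
        if lease is None or duration <= 0:
            return False
        lease.expires_at = now + duration
        return True
```

The appeal of leases in this setting is that the host never needs to be told that a job has finished or that its owner has gone away: a reservation that is not renewed simply lapses, which keeps the trusted control code small.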
The talk will focus mostly on self-migration and our "Evil Man" cluster management system ("Evil Man" derives from the "On Demand" slogan previously used for utility computing at IBM), but I will also spend some time describing some of our other contributions, such as the Blink secure 3D display system for desktop VMs.
