When this was being written, distributors were working quickly to ship kernel updates fixing the local root vulnerabilities in the vmsplice() system call. Unlike a number of other recent vulnerabilities which have required special situations (such as the presence of specific hardware) to exploit, these vulnerabilities are trivially exploited and the code to do so is circulating on the net. The author found himself wondering how such a wide hole could find its way into the core kernel code, so he set himself the task of figuring out just what was going on - a task which took rather longer than he had expected.
The splice() system call, remember, is a mechanism for creating data flow plumbing within the kernel. It can be used to join two file descriptors; the kernel will then read data from one of those descriptors and write it to the other in the most efficient way possible. So one can write a trivial file copy program which opens the source and destination files, then splices the two together. The vmsplice() variant connects a file descriptor (which must be a pipe) to a region of user memory; it is in this system call that the problems came to be.
The first step in understanding this vulnerability is that, in fact, it is three separate bugs. When the word of this problem first came out, it was thought to only affect 2.6.23 and 2.6.24 kernels. Changes to the vmsplice() code had caused the omission of a couple of important permissions checks. In particular, if the application had requested that vmsplice() move the contents of a pipe into a range of memory, the kernel didn't check whether that application had the right to write to that memory. So the exploit could simply write a code snippet of its choice into a pipe, then ask the kernel to copy it into a piece of kernel memory. Think of it as a quick-and-easy rootkit installation mechanism.
If the application is, instead, splicing a memory range into a pipe, the kernel must, first, read in one or more iovec structures describing that memory range. The 2.6.23 vmsplice() changes omitted a check on whether the purported iovec structures were in readable memory. This looks more like an information disclosure vulnerability than anything else - though, as we will see, it can be hard to tell sometimes.