When a program has to kill a process, it may well want to perform some safety checks before invoking
kill(), to ensure that the process really is the one that it is expecting. For example, it might check that the
/proc/$PID/exe symlink matches the executable it originally used to start it. Or it might want to check the uid/gid the process is running as from
/proc/$PID/status. Or any number of other attributes about a process that are exposed in
This is really a similar pattern to the checks that a process might do when operating on a file. For example, it may stat() the file to check it has the required uid/gid ownership. It has long been known that doing
stat() followed by
open() is subject to a race condition which can have serious security implications in many circumstances. For this reason, well written apps will actually do
open() and then
fstat() on the file descriptor, so they’re guaranteed the checks are performed against the file they’re actually operating on.
If the kernel ever wraps around on PID numbers, it should be obvious that apps that do checks on
/proc/$PID before killing a process are also subject to a race condition with potential security implications. This leads to the questions of whether it is possible to design a race free way of checking & killing a process, similar to the
fstat() pattern used for files. AFAICT there is not, but there are a couple of ways that one could be constructed without too much extra support from the kernel.
To attack this, we first need an identifier for the process which is unique and not reusable if a process dies & new one takes it place. As mentioned above, if the kernel allows PID numbers to wrap, the PID is unsuitable for this purpose. A file handle for the
/proc/$PID directory though, it potentially suitable for this purpose. We still can’t simply open files like
/proc/$PID/exe directly, since that’s relying on the potentially reusable PID number. Here the
readlinkat() system calls come to rescuse. They allow you to pass a file descriptor for a directory and a relative filename. This guarantees that the file opened is in the directory expected, even if the original directory path changes / is replaced. All that is now missing is a way to kill a process, given a FD for
/proc/$PID. There are two ideas for this, first a new system call
fkill($fd, $signum), or a new proc file
/proc/$PID/kill where you can write signal numbers.
With these pieces, a race free way to validate a process’ executable before killing it would look like
int pidfd = open("/proc/1234", O_RDONLY); char buf[PATH_MAX]; char *exe = readlinkat(pidfd, "exe", buf, sizeof(buf)); if (strcmp(exe, "/usr/bin/mybinary") != 0) return -1; fkill(pidfd, SIGTERM);
Alternatively the last line could look like
char signum; snprintf(signum, sizeof(signum), "%d", SIGTERM); int sigfd = openat(pidfd, "kill", O_WRONLY); write(sigfd, signum, strlen(signum)) close(sigfd)
Though I prefer the
fkill() variant for reasons of brevity.
After reading this, you might wonder how something like systemd safely kills processes it manages without this kind of functionality. Well, it uses cgroups to confine processes associated with each service. Therefore it does not need to care about safety checking arbitrary PIDs, instead it can merely iterate over every PID in the service’s cgroup and kill them. The downside of cgroups is that it is Linux specific, but that doesn’t matter for systemd’s purposes. I believe that a
fkill() syscall, however, is something that could be more easily used in a variety of existing apps without introducing too much portability code. Apps could simply use
fkill() if available, but fallback to regular
NB I’ve no intention of actually trying to implement this, since I like to stay out of kernel code. It is just an idea I wanted to air in case it interests anyone else.