The impossible EFAULT

I have written a program; suppose it's called worker. (While the program is written in Haskell, I don't think that's particularly relevant to this post.)

(EDIT: Reproducer can be found here.)

(EDIT 2: Diagnosis by int-e on irc here.)

When run, worker starts a bunch of copies of a script. Under normal circumstances this script sets up a container using Linux cgroups and Linux user namespaces, but none of that is relevant because the strange behaviour in question occurs just fine without all of that -- in fact, we'll let it start the following script, say ./sleep.sh:

#!/bin/bash
sleep 10

Clearly, there is no weird behaviour here, assuming that the system has bash under /bin, and mine does.

The copies of sleep.sh are started by passing ./sleep.sh to posix_spawnp(3). (The Haskell process library does this for me.) The thing is, occasionally (roughly once every 5 to 10 invocations of ./worker), posix_spawnp returns EFAULT ("Bad address"). The manpage for posix_spawnp says that:

ERRORS

The posix_spawn() and posix_spawnp() functions fail only in the case where the underlying fork(2), vfork(2) or clone(2) call fails; in these cases, these functions return an error number, which will be one of the errors described for fork(2), vfork(2) or clone(2).

In addition, these functions fail if:

ENOSYS Function not supported on this system.

Okay, so I should look for EFAULT in fork(2), vfork(2) and clone(2) to figure out what goes wrong, right? Wrong. Or, in any case, none of those manpages mention EFAULT. I've looked through the source code of posix_spawnp in glibc and it at least doesn't return EFAULT directly; presumably, one of the subroutines it calls does. glibc is large and I don't think looking through the entire call tree will be very productive, so I tried to diagnose the issue from the outside instead.
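For concreteness, here is a minimal C sketch of the call in question. It is not the code the Haskell process library runs; the helper name spawn_and_wait is made up for illustration, and the only thing it is meant to show is that posix_spawnp hands back an error number as its return value (it does not set errno), which is where a value like EFAULT would surface.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <errno.h>
#include <spawn.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/* Hypothetical helper: spawn `file` with no extra arguments, wait for the
 * child, and return 0 on success or an error number on failure.
 * posix_spawnp reports failure through its return value, not errno. */
int spawn_and_wait(const char *file)
{
    pid_t pid;
    char *const argv[] = { (char *)file, NULL };

    int err = posix_spawnp(&pid, file, NULL, NULL, argv, environ);
    if (err != 0) {
        if (err == EFAULT)
            fprintf(stderr, "Oops EFAULT\n");
        else
            fprintf(stderr, "posix_spawnp: %s\n", strerror(err));
        return err;
    }

    /* Reap the child so it does not linger as a zombie. */
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return errno;
    return 0;
}
```

Calling spawn_and_wait("./sleep.sh") in a loop mirrors what worker does at a very small scale. On its own this sketch should not be expected to reproduce the EFAULT.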

And this is where the weirdness starts. Whenever my program encounters EFAULT from posix_spawnp, it prints Oops EFAULT; hence grepping for EFAULT gives output precisely if the error occurred in this run. I get the following observations:

("errors occur" means that once every few executions I get output indicating that EFAULT occurred; in the negative case I've run it for >20x the number of invocations that are necessary to produce EFAULT in the other cases, without any EFAULT.)

The only situation in which posix_spawnp seems to always succeed is when the stdout of the process that worker's output is piped to is block-buffered. But this makes no sense: there shouldn't be any reasonable way for worker to determine whether this is the case! Sure, it can distinguish between ./worker | cat and ./worker (using isatty(3); this is precisely what grep does when not passed --line-buffered), but in all of the above cases the output is piped to another process anyway.
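To make that point concrete, the following C sketch shows roughly what a process can learn about its own output descriptor. The enum and the function name classify_fd are invented for this example. It can tell a terminal from a pipe from a regular file, but nothing in it says anything about how the process on the other end of a pipe buffers its own stdout.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical classification of what a file descriptor is attached to. */
enum fd_kind { FD_TTY, FD_PIPE, FD_FILE, FD_OTHER };

enum fd_kind classify_fd(int fd)
{
    struct stat st;

    /* isatty(3) returns 1 only if fd refers to a terminal. */
    if (isatty(fd))
        return FD_TTY;

    /* fstat(2) distinguishes pipes/FIFOs from regular files. */
    if (fstat(fd, &st) != 0)
        return FD_OTHER;
    if (S_ISFIFO(st.st_mode))
        return FD_PIPE;
    if (S_ISREG(st.st_mode))
        return FD_FILE;
    return FD_OTHER;
}
```

classify_fd(STDOUT_FILENO) would give FD_PIPE for both ./worker | cat and ./worker | grep EFAULT, which is why the buffering mode of the downstream process should be invisible to worker.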

This is already spooky, but it gets even spookier: if I replace the invocation of ./sleep.sh by an invocation of sleep (i.e. removing the indirection of the shell script), errors occur in none of the above setups. Somehow, starting a script is different from starting a native process (and changing bash to dash in sleep.sh doesn't change anything). posix_spawnp shouldn't care what it is starting! That's the job of the loader, as far as I know. So what gives?

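I'll try to reduce my own program to a minimal reproducer, and if I find anything I'll post an update to this post. In the meantime, spookiness.

The cause

snap-server modifies the environment to set the locale, and setenv(3) is not atomic. In particular, a setenv call that races with execve(2) can break the execve, and that is what happens here. (execve(2) does list EFAULT, for the case where one of the pointers in argv or envp points outside the accessible address space, which fits an environment that is being modified while it is read.) All possible solutions to this problem are hacks.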
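As an illustration of what one such hack might look like (this is not necessarily what was done in worker), the C sketch below takes a private deep copy of the environment and passes that copy to posix_spawnp, so the exec reads an array that setenv never touches. All function names here are hypothetical. It assumes the snapshot is taken at a moment when no other thread is calling setenv, for example at startup, and it does nothing about the PATH lookup that posix_spawnp performs for file names without a slash.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <spawn.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

extern char **environ;

/* Deep-copy the current environment into a NULL-terminated array.
 * Assumption: no other thread modifies the environment during the copy.
 * Returns NULL on allocation failure. */
char **snapshot_environ(void)
{
    size_t n = 0;
    while (environ != NULL && environ[n] != NULL)
        n++;

    char **copy = calloc(n + 1, sizeof *copy);
    if (copy == NULL)
        return NULL;

    for (size_t i = 0; i < n; i++) {
        copy[i] = strdup(environ[i]);
        if (copy[i] == NULL) {
            while (i > 0)
                free(copy[--i]);
            free(copy);
            return NULL;
        }
    }
    return copy; /* copy[n] is already NULL thanks to calloc */
}

/* Look up `name` in a snapshot; returns the value or NULL if absent. */
const char *snapshot_get(char *const *snap, const char *name)
{
    size_t len = strlen(name);
    for (; *snap != NULL; snap++)
        if (strncmp(*snap, name, len) == 0 && (*snap)[len] == '=')
            return *snap + len + 1;
    return NULL;
}

/* Spawn using the private snapshot instead of the live `environ`. */
int spawn_with_snapshot(pid_t *pid, const char *file,
                        char *const argv[], char *const snap[])
{
    return posix_spawnp(pid, file, NULL, NULL, argv, snap);
}
```

The trade-off is that children no longer see later environment changes (such as the locale that snap-server sets), so this only helps when the spawned scripts do not depend on them.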