linux-sandbox: make delayed kill send SIGKILL instead of SIGALRM.

Before this change, the kill delay feature caused SIGALRM, which can be caught.
The intent appears to be to use SIGLARM to decide when to upgrade a SIGTERM to
something that can't be caught, but of course SIGALRM _can_ be caught. This is
a problem because linux-sandbox-pid1.cc does catch SIGALRM (and does the wrong
thing with it in certain conditions: http://b/168848201).

The only reason test_child_ignores_sigterm passes is because SIGALRM's default
action is also to terminate the process, and the bash script used here as the
grandchild process didn't trap that signal. Because we muck around with the
exit status of the child (https://bit.ly/2Hc7IX1) and therefore grandchild
(https://bit.ly/2FS8EQ1), the test saw an exit status for SIGTERM as it
expected.

In this change I've changed the test to also trap SIGLARM, and indeed without
the changes to the code under test it now no longer passes, instead just
getting stuck forever.

The fix is to actually raise SIGKILL, as expected. To make the code involved
less baffling I've gone from a "enumerate every signal" approach to "mention
only those signals we have a reason to mention". I will do the same for
linux-sandbox-pid1.cc in a future change.

Behavior change: a kill delay of zero seconds is no longer a special case:
instead of waiting forever for the child to deal with SIGTERM we now just
SIGKILL it immediately, as the name implies.

Behavior change: we no longer attempt to massage the exit status of the child
process (which is what hid this bug from the test). Instead we return it
directly. In particular this means that we may return zero even after a timeout
(see test_signal_catcher), but this seems fine: this program is supposed to be
a pass-through, and if the application says it was successful then who are we
to disagree?

PiperOrigin-RevId: 333251901
diff --git a/src/main/tools/linux-sandbox.cc b/src/main/tools/linux-sandbox.cc
index a025e0f..58673f1 100644
--- a/src/main/tools/linux-sandbox.cc
+++ b/src/main/tools/linux-sandbox.cc
@@ -58,7 +58,7 @@
 #include <sys/wait.h>
 #include <unistd.h>
 
-#include <string>
+#include <atomic>
 #include <vector>
 
 #include "src/main/tools/linux-sandbox-options.h"
@@ -69,13 +69,17 @@
 int global_outer_uid;
 int global_outer_gid;
 
-static int global_child_pid;
+// The PID of our child process, for use in signal handlers.
+static std::atomic<pid_t> global_child_pid{0};
 
-// The signal that will be sent to the child when a timeout occurs.
-static volatile sig_atomic_t global_next_timeout_signal = SIGTERM;
+// Must we politely ask the child to exit before we send it a SIGKILL (once we
+// want it to exit)? Holds only zero or one.
+static std::atomic<int> global_need_polite_sigterm{false};
 
-// The signal that caused us to kill the child (e.g. on timeout).
-static volatile sig_atomic_t global_signal;
+#if __cplusplus >= 201703L
+static_assert(global_child_pid.is_always_lock_free);
+static_assert(global_need_polite_sigterm.is_always_lock_free);
+#endif
 
 // Make sure the child process does not inherit any accidentally left open file
 // handles from our parent.
@@ -116,18 +120,32 @@
   }
 }
 
-static void OnTimeoutOrTerm(int sig) {
-  if (global_signal == 0) {
-    global_signal = sig;
+static void OnTimeoutOrTerm(int) {
+  // Find the PID of the child, which main set up before installing us as a
+  // signal handler.
+  const pid_t child_pid = global_child_pid.load(std::memory_order_relaxed);
+
+  // Figure out whether we should send a SIGTERM here. If so, we won't want to
+  // next time we're called.
+  const bool need_polite_sigterm =
+      global_need_polite_sigterm.fetch_and(0, std::memory_order_relaxed);
+
+  // If we're not supposed to ask politely, simply forcibly kill the child.
+  if (!need_polite_sigterm) {
+    kill(child_pid, SIGKILL);
+    return;
   }
-  kill(global_child_pid, global_next_timeout_signal);
-  if (global_next_timeout_signal == SIGTERM && opt.kill_delay_secs > 0) {
-    global_next_timeout_signal = SIGKILL;
-    alarm(opt.kill_delay_secs);
-  }
+
+  // Otherwise make a polite request, then arrange to be called again after a
+  // delay, at which point we'll send SIGKILL.
+  //
+  // Note that main sets us up as the signal handler for SIGALRM, and arranges
+  // for this code path to be taken only if kill_delay_secs > 0.
+  kill(child_pid, SIGTERM);
+  alarm(opt.kill_delay_secs);
 }
 
-static void SpawnPid1() {
+static pid_t SpawnPid1() {
   const int kStackSize = 1024 * 1024;
   std::vector<char> child_stack(kStackSize);
 
@@ -152,13 +170,14 @@
   // https://lkml.org/lkml/2015/7/28/833).
   PRINT_DEBUG("calling clone(2)...");
 
-  global_child_pid =
+  const pid_t child_pid =
       clone(Pid1Main, child_stack.data() + kStackSize, clone_flags, sync_pipe);
-  if (global_child_pid < 0) {
+
+  if (child_pid < 0) {
     DIE("clone");
   }
 
-  PRINT_DEBUG("linux-sandbox-pid1 has PID %d", global_child_pid);
+  PRINT_DEBUG("linux-sandbox-pid1 has PID %d", child_pid);
 
   // We close the write end of the sync pipe, read a byte and then close the
   // pipe. This proves to the linux-sandbox-pid1 process that we still existed
@@ -176,95 +195,43 @@
   }
 
   PRINT_DEBUG("done manipulating pipes");
+
+  return child_pid;
 }
 
-static int WaitForPid1() {
-  int err, status;
+static int WaitForPid1(const pid_t child_pid) {
+  // Wait for the child to exit, obtaining usage information. Restart in the
+  // case of a signal interrupting us.
+  int child_status;
+  struct rusage child_rusage;
+  while (true) {
+    const int ret = wait4(child_pid, &child_status, 0, &child_rusage);
+    if (ret > 0) {
+      break;
+    }
+
+    if (errno == EINTR) {
+      continue;
+    }
+
+    DIE("wait4");
+  }
+
+  // If we're supposed to write stats to a file, do so now.
   if (!opt.stats_path.empty()) {
-    struct rusage child_rusage;
-    do {
-      err = wait4(global_child_pid, &status, 0, &child_rusage);
-    } while (err < 0 && errno == EINTR);
-    if (err < 0) {
-      DIE("wait4");
-    }
     WriteStatsToFile(&child_rusage, opt.stats_path);
-  } else {
-    do {
-      err = waitpid(global_child_pid, &status, 0);
-    } while (err < 0 && errno == EINTR);
-    if (err < 0) {
-      DIE("waitpid");
-    }
   }
 
-  if (global_signal > 0) {
-    // The child exited because we killed it due to receiving a signal
-    // ourselves. Do not trust the exitcode in this case, just calculate it from
-    // the signal.
-    PRINT_DEBUG("child exited due to us catching signal: %s",
-                strsignal(global_signal));
-    return 128 + global_signal;
-  } else if (WIFSIGNALED(status)) {
-    PRINT_DEBUG("child exited due to receiving signal: %s",
-                strsignal(WTERMSIG(status)));
-    return 128 + WTERMSIG(status);
-  } else {
-    PRINT_DEBUG("child exited normally with exitcode %d", WEXITSTATUS(status));
-    return WEXITSTATUS(status);
+  // We want to exit in the same manner as the child.
+  if (WIFSIGNALED(child_status)) {
+    const int signal = WTERMSIG(child_status);
+    PRINT_DEBUG("child exited due to receiving signal: %s", strsignal(signal));
+    return 128 + signal;
   }
-}
 
-static void ForwardSignal(int signum) {
-  PRINT_DEBUG("ForwardSignal(%d)", signum);
-  kill(global_child_pid, signum);
-}
-
-static void SetupSignalHandlers() {
-  ClearSignalMask();
-
-  for (int signum = 1; signum < NSIG; signum++) {
-    switch (signum) {
-      // Some signals should indeed kill us and not be forwarded to the child,
-      // thus we can use the default handler.
-      case SIGABRT:
-      case SIGBUS:
-      case SIGFPE:
-      case SIGILL:
-      case SIGSEGV:
-      case SIGSYS:
-      case SIGTRAP:
-        break;
-      // It's fine to use the default handler for SIGCHLD, because we use
-      // waitpid() in the main loop to wait for children to die anyway.
-      case SIGCHLD:
-        break;
-      // One does not simply install a signal handler for these two signals
-      case SIGKILL:
-      case SIGSTOP:
-        break;
-      // Ignore SIGTTIN and SIGTTOU, as we hand off the terminal to the child in
-      // SpawnChild().
-      case SIGTTIN:
-      case SIGTTOU:
-        IgnoreSignal(signum);
-        break;
-      case SIGTERM:
-        InstallSignalHandler(signum, OnTimeoutOrTerm);
-        break;
-      case SIGINT:
-        if (opt.sigint_sends_sigterm) {
-          InstallSignalHandler(signum, OnTimeoutOrTerm);
-        } else {
-          InstallSignalHandler(signum, ForwardSignal);
-        }
-        break;
-      // All other signals should be forwarded to the child.
-      default:
-        InstallSignalHandler(signum, ForwardSignal);
-        break;
-    }
-  }
+  const int exit_code = WEXITSTATUS(child_status);
+  PRINT_DEBUG("child exited normally with code %d", exit_code);
+  return exit_code;
 }
 
 int main(int argc, char *argv[]) {
@@ -273,32 +240,64 @@
     DIE("prctl");
   }
 
+  // Start with default signal actions and a clear signal mask.
+  ClearSignalMask();
+
+  // Ignore SIGTTIN and SIGTTOU, as we hand off the terminal to the child in
+  // SpawnChild.
+  IgnoreSignal(SIGTTIN);
+  IgnoreSignal(SIGTTOU);
+
+  // Parse our command-line options and set up a global variable used by
+  // PRINT_DEBUG.
   ParseOptions(argc, argv);
   global_debug = opt.debug;
 
+  // Redirect output as requested.
   Redirect(opt.stdout_path, STDOUT_FILENO);
   Redirect(opt.stderr_path, STDERR_FILENO);
 
+  // Set up two globals used by the child process.
   global_outer_uid = getuid();
   global_outer_gid = getgid();
 
+  // Ensure we don't pass on any FDs from our parent to our child.
   CloseFds();
 
   // Spawn the child that will fork the sandboxed progam with fresh namespaces
   // etc.
-  SpawnPid1();
+  const pid_t child_pid = SpawnPid1();
 
-  // Set up signal handlers to manipulate the child as desired.
-  SetupSignalHandlers();
+  // Let the signal handlers installed below know the PID of the child.
+  global_child_pid.store(child_pid, std::memory_order_relaxed);
 
-  // If we've been asked to automatically time out after a certain amount of
-  // time, arrange for SIGALRM to handle the timeout and then set up an interval
-  // timer that will raise SIGALRM.
-  if (opt.timeout_secs > 0) {
-    InstallSignalHandler(SIGALRM, OnTimeoutOrTerm);
-    SetTimeout(opt.timeout_secs);
+  // If a kill delay has been configured, let the signal handlers installed
+  // below know that it needs to be respected.
+  if (opt.kill_delay_secs > 0) {
+    global_need_polite_sigterm.store(1, std::memory_order_relaxed);
   }
 
-  // Wait for the child to finish.
-  return WaitForPid1();
+  // OnTimeoutOrTerm, which is used for other signals below, assumes that it
+  // handles SIGALRM. We also explicitly invoke it after the timeout using
+  // alarm(2).
+  InstallSignalHandler(SIGALRM, OnTimeoutOrTerm);
+
+  // If requested, arrange for the child to be killed (optionally after being
+  // asked politely to terminate) once the timeout expires.
+  //
+  // Note that it's important to set this up before support for SIGTERM and
+  // SIGINT. Otherwise one of those signals could arrive before we get here, and
+  // then we would reset its opt.kill_delay_secs interval timer.
+  if (opt.timeout_secs > 0) {
+    alarm(opt.timeout_secs);
+  }
+
+  // Also ask/tell the child to quit on SIGTERM, and optionally for SIGINT too.
+  InstallSignalHandler(SIGTERM, OnTimeoutOrTerm);
+  if (opt.sigint_sends_sigterm) {
+    InstallSignalHandler(SIGINT, OnTimeoutOrTerm);
+  }
+
+  // Wait for the child to exit, returning an appropriate status.
+  return WaitForPid1(child_pid);
 }
diff --git a/src/test/shell/integration/linux-sandbox_test.sh b/src/test/shell/integration/linux-sandbox_test.sh
index c2e4fc0..7848f51 100755
--- a/src/test/shell/integration/linux-sandbox_test.sh
+++ b/src/test/shell/integration/linux-sandbox_test.sh
@@ -82,13 +82,25 @@
   assert_equals 134 "$code" # SIGNAL_BASE + SIGABRT = 128 + 6
 }
 
-# Tests that even when the child catches SIGTERM and exits with code 0, that the sandbox exits with
-# code 142 (telling us about the expired timeout).
 function test_signal_catcher() {
-  $linux_sandbox $SANDBOX_DEFAULT_OPTS -T 2 -t 3 -- /bin/bash -c \
-    'trap "echo later; exit 0" SIGINT SIGTERM SIGALRM; sleep 1000' &> $TEST_log || code=$?
-  assert_equals 142 "$code" # SIGNAL_BASE + SIGALRM = 128 + 14
-  expect_log "^later$"
+  # Run the sandbox with a child that catches SIGTERM and exits successfully,
+  # and a kill delay that ensures it gets the chance to see the signal.
+  $linux_sandbox $SANDBOX_DEFAULT_OPTS -t 30 -- /bin/bash -c \
+    'trap "exit 0" SIGINT SIGTERM SIGALRM; \
+     touch marker; \
+     sleep 10000' \
+    &> $TEST_log &
+  local sandbox_pid=$!
+
+  # Synchronize on the child having registered its signal handler.
+  until test -f "$SANDBOX_DIR/marker"; do sleep 1; done
+
+  # Send SIGTERM to the sandbox.
+  kill -SIGTERM "${sandbox_pid}"
+
+  # The sandbox should exit successfully: if the child says it succeeded, who
+  # are we to disagree?
+  wait "${sandbox_pid}"
 }
 
 function test_basic_timeout() {
@@ -96,40 +108,55 @@
   expect_log "^before$" ""
 }
 
-function test_timeout_grace() {
-  $linux_sandbox $SANDBOX_DEFAULT_OPTS -T 2 -t 3 -- /bin/bash -c \
-    'trap "echo -n before; sleep 1; echo -n after; exit 0" SIGINT SIGTERM SIGALRM; sleep 1000' &> $TEST_log || code=$?
-  assert_equals 142 "$code" # SIGNAL_BASE + SIGALRM = 128 + 14
+function test_timeout_exceeded_with_large_kill_delay() {
+  # Run the sandbox with a child that catches signals, waits while, then exits
+  # with a canned code. use a kill delay that gives it a chance to do so.
+  $linux_sandbox $SANDBOX_DEFAULT_OPTS -T 2 -t 30 -- /bin/bash -c \
+    'trap "echo -n before; sleep 1; echo -n after; exit 17" SIGINT SIGTERM SIGALRM; sleep 1000' &> $TEST_log || code=$?
+
+  # It should have been given the chance to get to its "clean shutdown" code,
+  # even despite taking a second.
+  assert_equals 17 "$code"
   expect_log "^beforeafter$"
 }
 
-function test_timeout_kill() {
+function test_timeout_and_kill_delay_exceeded() {
+  # Run the sandbox with a child that ignores SIGTERM, and with both a short
+  # timeout and a short kill delay.
   $linux_sandbox $SANDBOX_DEFAULT_OPTS -T 2 -t 3 -- /bin/bash -c \
-    'trap "echo before; sleep 1000; echo after; exit 0" SIGINT SIGTERM SIGALRM; sleep 1000' &> $TEST_log || code=$?
-  assert_equals 142 "$code" # SIGNAL_BASE + SIGALRM = 128 + 14
-  expect_log "^before$"
+    'trap "" SIGTERM; sleep 1000' &> $TEST_log || code=$?
+
+  # Once the timeout expired, the child should have been forcibly killed.
+  assert_equals 137 "$code" # SIGNAL_BASE + SIGKILL = 128 + 9
 }
 
 function test_sigint_sends_sigterm() {
+  # Run the sandbox with a child whose SIGTERM handler exits with a canned error
+  # code.
   $linux_sandbox $SANDBOX_DEFAULT_OPTS -T 100000 -i -- /bin/bash -c \
-    'trap "echo ignoring signal" SIGTERM; \
+    'trap "exit 17" SIGTERM; \
      trap "echo should not get here" SIGINT SIGALRM; \
-     for i in $(seq 5); do sleep 1; done; echo after' \
+     touch marker; \
+     sleep 10000' \
     &> $TEST_log &
-  local pid=$!
-  sleep 1
-  kill -INT "${pid}"
-  local code=0
-  wait "${pid}" || code=$?
-  assert_equals 130 "$code"
-  expect_log 'ignoring signal
-after'
+  local sandbox_pid=$!
+
+  # Synchronize on the child having registered its signal handler.
+  until test -f "$SANDBOX_DIR/marker"; do sleep 1; done
+
+  # Send SIGINT to the sandbox.
+  kill -SIGINT "${sandbox_pid}"
+
+  # That should be converted to SIGTERM by the sandbox, causing the child's
+  # handler to run.
+  wait "${sandbox_pid}" || code=$?
+  assert_equals 137 "$code" # SIGNAL_BASE + SIGTERM = 128 + 9
 }
 
 function test_debug_logging() {
   touch ${TEST_TMPDIR}/testfile
   $linux_sandbox $SANDBOX_DEFAULT_OPTS -D -- /bin/true &> $TEST_log || code=$?
-  expect_log "child exited normally with exitcode 0"
+  expect_log "child exited normally with code 0"
 }
 
 function test_mount_additional_paths_success() {
@@ -151,7 +178,7 @@
   expect_log "bind mount: ${TEST_TMPDIR}/bar -> ${TEST_TMPDIR}/bar\$"
   # mount a file to a customized path inside the sandbox
   expect_log "bind mount: ${TEST_TMPDIR}/testfile -> ${MOUNT_TARGET_ROOT}/sandboxed_testfile\$"
-  expect_log "child exited normally with exitcode 0"
+  expect_log "child exited normally with code 0"
   rm -rf ${MOUNT_TARGET_ROOT}/foo
   rm -rf ${MOUNT_TARGET_ROOT}/sandboxed_testfile
 }
@@ -212,7 +239,7 @@
   expect_log "bind mount: ${TEST_TMPDIR}/foo -> ${MOUNT_TARGET_ROOT}/foo\$"
   # mount a new source directory to the same target, which will overwrite the previous source path
   expect_log "bind mount: ${TEST_TMPDIR}/bar -> ${MOUNT_TARGET_ROOT}/foo\$"
-  expect_log "child exited normally with exitcode 0"
+  expect_log "child exited normally with code 0"
   rm -rf ${MOUNT_TARGET_ROOT}/foo
 }
 
@@ -290,25 +317,69 @@
   assert_linux_sandbox_exec_time 10 25 10 25
 }
 
-function test_child_receives_sigterm() {
-  $linux_sandbox $SANDBOX_DEFAULT_OPTS -- /bin/bash -c \
-    'trap "echo childsigterm; exit 1" SIGTERM; touch marker; sleep 1000' &> $TEST_log &
+function test_sigterm_is_forwarded_to_child() {
+  # Run the sandbox with a child that lets us know it saw SIGTERM and then exits
+  # gracefully. Use a kill delay that implies it will have a chance to see it.
+  $linux_sandbox $SANDBOX_DEFAULT_OPTS -t 30 -- /bin/bash -c \
+    'trap "exit 17" SIGTERM; \
+     touch marker; \
+     sleep 10000' \
+    &> $TEST_log &
   local sandbox_pid=$!
+
+  # Synchronize on the child having registered its signal handler.
   until test -f "$SANDBOX_DIR/marker"; do sleep 1; done
+
+  # Send SIGTERM to the sandbox, which should pass it on to the child.
   kill -SIGTERM "${sandbox_pid}"
+
+  # The sandbox should soon exit with the child's exit code.
   wait "${sandbox_pid}" || code=$?
-  assert_equals 143 "$code" # SIGNAL_BASE + SIGTERM = 128 + 15
-  expect_log "childsigterm"
+  assert_equals 17 "$code"
 }
 
-function test_child_ignores_sigterm() {
+function test_child_ignores_sigterm_and_sigalrm() {
+  # Run the sandbox with a child that ignores SIGTERM (the polite request to
+  # shut down that we send below) and SIGALRM (which we use internally and which
+  # was previously erroneously forwarded to the child).
+  #
+  # Set a kill delay of 2 seconds.
   $linux_sandbox $SANDBOX_DEFAULT_OPTS -t 2 -- /bin/bash -c \
-    'trap "" SIGTERM; touch marker; sleep 1000' &> $TEST_log &
+    'trap "" SIGTERM SIGALRM; touch marker; sleep 1000' &> $TEST_log &
   local sandbox_pid=$!
+
+  # Synchronize on the child having registered its signal handler.
   until test -f "$SANDBOX_DIR/marker"; do sleep 1; done
+
+  # Send SIGTERM to the sandbox to ask it nicely to shut down.
   kill -SIGTERM "${sandbox_pid}"
+
+  # The sandbox should soon exit with a code that indicates it was due to
+  # SIGKILL, from exceeding the kill delay.
   wait "${sandbox_pid}" || code=$?
-  assert_equals 143 "$code" # SIGNAL_BASE + SIGTERM = 128 + 15
+  assert_equals 137 "$code" # SIGNAL_BASE + SIGTERM = 128 + 9
+}
+
+function test_child_ignores_sigterm_and_sigalrm_no_kill_delay() {
+  # Run the sandbox with a child that ignores SIGTERM (the polite request to
+  # shut down that we send below) and SIGALRM (which we use internally and which
+  # was previously erroneously forwarded to the child).
+  #
+  # Unlike in the test above, don't enable a kill delay.
+  $linux_sandbox $SANDBOX_DEFAULT_OPTS -- /bin/bash -c \
+    'trap "" SIGTERM SIGALRM; touch marker; sleep 1000' &> $TEST_log &
+  local sandbox_pid=$!
+
+  # Synchronize on the child having registered its signal handler.
+  until test -f "$SANDBOX_DIR/marker"; do sleep 1; done
+
+  # Send SIGTERM to the sandbox to ask it nicely to shut down.
+  kill -SIGTERM "${sandbox_pid}"
+
+  # The sandbox should soon exit with a code that indicates it was due to
+  # SIGKILL, from exceeding the kill delay (of zero).
+  wait "${sandbox_pid}" || code=$?
+  assert_equals 137 "$code" # SIGNAL_BASE + SIGTERM = 128 + 9
 }
 
 # The test shouldn't fail if the environment doesn't support running it.