Fix block_for_lock.

On two fronts: First, it should follow standard command line semantics. Second, it should work as intended: --noblock_for_lock means the client will not wait for another command to finish, but will exit eagerly. It can be useful for preventing hanging in applications that are non-interactively calling bazel.

It should have standard startup-option semantics: the default value is accepted as a no-op or can be provided to override a previous value.

The next issue involves 2 different locks - the client lock, and the server-side command lock. This duality exists because we would like, one day, to be able to run certain commands, like info or help, at the same time, so multiple commands would need specialized locks that allow some duality but blocks others. This can only be done at the server level, so as soon as the client gets the "we're connected" grpc message from the server, it releases the client lock and lets the server manage  multiple requests.

There are basically 3 possible states that are relevant to this option:

1) no other client is active, so no one holds the client lock or the command lock - the server can be used, shutdown or started as needed. - no blocking, but no need to block, either, so we're safe
2) another client (client1) holds the lock, but it is currently using a server that we want to reuse. If client1 still holds the client lock, we fail fast. Same thing if client1 is holding the server-side lock: we will exit gracefully when the BlazeCommandDispatcher responds with a failure.
3) client1 holds the lock but its server cannot be reused. (batch clients also fall into this category, as there is no server to reuse - but in this case, the client lock is still in play). However, for server mode, this is broken - the following happens:
   - Server is occupied with client1's request, holds the command lock
   - client2 wants to restart the server, so sends the old server a "shutdown" command
   - the BlazeCommandDispatcher says - nuh-uh, this is busy, and you said you didn't want to wait for the lock
   - client2 absorbs this response
   - waits (blocks...)
   - for a minute
   - then force shuts-down the old server.

So we had 2 problems - we block, and we shutdown a server that we truly intended to keep going. Now, if the server responds saying another action is using it, the client will exit correctly, and leave the old server to do its thing.

RELNOTES: None.
PiperOrigin-RevId: 205671817
diff --git a/src/main/cpp/blaze.cc b/src/main/cpp/blaze.cc
index fe44107..84abd69 100644
--- a/src/main/cpp/blaze.cc
+++ b/src/main/cpp/blaze.cc
@@ -1515,6 +1515,8 @@
       new GrpcBlazeServer(globals->options->connect_timeout_secs));
 
   globals->command_wait_time = blaze_server->AcquireLock();
+  BAZEL_LOG(INFO) << "Acquired the client lock, waited "
+                  << globals->command_wait_time << " milliseconds";
 
   WarnFilesystemType(globals->options->output_base);
 
@@ -1744,12 +1746,37 @@
   request.set_client_description("pid=" + blaze::GetProcessIdAsString() +
                                  " (for shutdown)");
   request.add_arg("shutdown");
+  BAZEL_LOG(INFO) << "Shutting running server with request ["
+                  << request.ShortDebugString() << "]";
   std::unique_ptr<grpc::ClientReader<command_server::RunResponse>> reader(
       client_->Run(&context, request));
 
+  // TODO(b/111179585): Swallowing these responses loses potential messages from
+  // the server, which may be useful in understanding why a shutdown failed.
+  // However, we don't want to spam the user in case the shutdown works
+  // perfectly fine, so we discard the information. For --noblock_for_lock, this
+  // means that we don't output the PID of the competing client, which isn't
+  // great. We could either store the stderr_output returned by the server and
+  // output it in the case of a failed shutdown, or we could add a
+  // special-cased field in RunResponse for this purpose.
   while (reader->Read(&response)) {
   }
 
+  // Check the final message from the server to see if it exited because another
+  // command holds the client lock.
+  if (response.finished()) {
+    if (response.exit_code() == blaze_exit_code::LOCK_HELD_NOBLOCK_FOR_LOCK) {
+      assert(!globals->options->block_for_lock);
+      BAZEL_DIE(blaze_exit_code::LOCK_HELD_NOBLOCK_FOR_LOCK)
+          << "Exiting because the lock is held and --noblock_for_lock was "
+             "given.";
+    }
+  }
+
+  // If for any reason the shutdown request failed to initiate a termination,
+  // this is a bug. Yes, this means the server won't be forced to shut down,
+  // which might be the preferred behavior, but it will help identify the bug.
+  assert(response.termination_expected());
   // Wait for the server process to terminate (if we know the server PID).
   // If it does not terminate itself gracefully within 1m, terminate it.
   if (globals->server_pid > 0 &&
@@ -1812,6 +1839,8 @@
   // Release the server lock because the gRPC handles concurrent clients just
   // fine. Note that this may result in two "waiting for other client" messages
   // (one during server startup and one emitted by the server)
+  BAZEL_LOG(INFO)
+      << "Releasing client lock, let the server manage concurrent requests.";
   blaze::ReleaseLock(&blaze_lock_);
 
   std::thread cancel_thread(&GrpcBlazeServer::CancelThread, this);
diff --git a/src/main/cpp/blaze_util_posix.cc b/src/main/cpp/blaze_util_posix.cc
index 79ebbb0..7cb33f3 100644
--- a/src/main/cpp/blaze_util_posix.cc
+++ b/src/main/cpp/blaze_util_posix.cc
@@ -656,7 +656,7 @@
     }
 
     if (!block) {
-      BAZEL_DIE(blaze_exit_code::BAD_ARGV)
+      BAZEL_DIE(blaze_exit_code::LOCK_HELD_NOBLOCK_FOR_LOCK)
           << "Exiting because the lock is held and --noblock_for_lock was "
              "given.";
     }
diff --git a/src/main/cpp/blaze_util_windows.cc b/src/main/cpp/blaze_util_windows.cc
index 19ce3af..e851ef0 100644
--- a/src/main/cpp/blaze_util_windows.cc
+++ b/src/main/cpp/blaze_util_windows.cc
@@ -1155,14 +1155,15 @@
     }
     if (GetLastError() == ERROR_SHARING_VIOLATION) {
       // Someone else has the lock.
+      BAZEL_LOG(USER) << "Another command holds the client lock";
       if (!block) {
-        BAZEL_DIE(blaze_exit_code::BAD_ARGV)
-            << "Another command is running. Exiting immediately.";
+        BAZEL_DIE(blaze_exit_code::LOCK_HELD_NOBLOCK_FOR_LOCK)
+            << "Exiting because the lock is held and --noblock_for_lock was "
+               "given.";
       }
       if (first_lock_attempt) {
         first_lock_attempt = false;
-        BAZEL_LOG(USER)
-            << "Another command is running. Waiting for it to complete...";
+        BAZEL_LOG(USER) << "Waiting for it to complete...";
         fflush(stderr);
       }
       Sleep(/* dwMilliseconds */ 200);
diff --git a/src/main/cpp/startup_options.cc b/src/main/cpp/startup_options.cc
index 7b5625c..9f0589d 100644
--- a/src/main/cpp/startup_options.cc
+++ b/src/main/cpp/startup_options.cc
@@ -209,18 +209,21 @@
   } else if (GetNullaryOption(arg, "--nodeep_execroot")) {
     deep_execroot = false;
     option_sources["deep_execroot"] = rcfile;
+  } else if (GetNullaryOption(arg, "--block_for_lock")) {
+    block_for_lock = true;
+    option_sources["block_for_lock"] = rcfile;
   } else if (GetNullaryOption(arg, "--noblock_for_lock")) {
     block_for_lock = false;
     option_sources["block_for_lock"] = rcfile;
   } else if (GetNullaryOption(arg, "--host_jvm_debug")) {
     host_jvm_debug = true;
     option_sources["host_jvm_debug"] = rcfile;
-  } else if ((value = GetUnaryOption(arg, next_arg,
-                                     "--host_jvm_profile")) != NULL) {
+  } else if ((value = GetUnaryOption(arg, next_arg, "--host_jvm_profile")) !=
+             NULL) {
     host_jvm_profile = value;
     option_sources["host_jvm_profile"] = rcfile;
-  } else if ((value = GetUnaryOption(arg, next_arg,
-                                     "--host_javabase")) != NULL) {
+  } else if ((value = GetUnaryOption(arg, next_arg, "--host_javabase")) !=
+             NULL) {
     // TODO(bazel-team): Consider examining the javabase and re-execing in case
     // of architecture mismatch.
     host_javabase = blaze::AbsolutePathFromFlag(value);
diff --git a/src/main/cpp/util/exit_code.h b/src/main/cpp/util/exit_code.h
index be6f8de..fc095d8 100644
--- a/src/main/cpp/util/exit_code.h
+++ b/src/main/cpp/util/exit_code.h
@@ -32,6 +32,10 @@
   // The user interrupted the build, most probably with Ctrl-C.
   INTERRUPTED = 8,
 
+  // The client or server lock is held, and --noblock_for_lock was passed, so
+  // this command fails fast.
+  LOCK_HELD_NOBLOCK_FOR_LOCK = 9,
+
   // Something is wrong with the host Bazel is running on and a re-run of the
   // same command probably will not help.
   LOCAL_ENVIRONMENTAL_ERROR = 36,
diff --git a/src/main/java/com/google/devtools/build/lib/runtime/BlazeCommandDispatcher.java b/src/main/java/com/google/devtools/build/lib/runtime/BlazeCommandDispatcher.java
index f10a603..92c1370 100644
--- a/src/main/java/com/google/devtools/build/lib/runtime/BlazeCommandDispatcher.java
+++ b/src/main/java/com/google/devtools/build/lib/runtime/BlazeCommandDispatcher.java
@@ -177,7 +177,7 @@
           case ERROR_OUT:
             outErr.printErrLn(String.format("Another command (" + currentClientDescription + ") is "
                 + "running. Exiting immediately."));
-            return BlazeCommandResult.exitCode(ExitCode.COMMAND_LINE_ERROR);
+            return BlazeCommandResult.exitCode(ExitCode.LOCK_HELD_NOBLOCK_FOR_LOCK);
 
           default:
             throw new IllegalStateException();
diff --git a/src/main/java/com/google/devtools/build/lib/util/ExitCode.java b/src/main/java/com/google/devtools/build/lib/util/ExitCode.java
index b51e720..d5f2378 100644
--- a/src/main/java/com/google/devtools/build/lib/util/ExitCode.java
+++ b/src/main/java/com/google/devtools/build/lib/util/ExitCode.java
@@ -49,6 +49,8 @@
   public static final ExitCode RUN_FAILURE = ExitCode.create(6, "RUN_FAILURE");
   public static final ExitCode ANALYSIS_FAILURE = ExitCode.create(7, "ANALYSIS_FAILURE");
   public static final ExitCode INTERRUPTED = ExitCode.create(8, "INTERRUPTED");
+  public static final ExitCode LOCK_HELD_NOBLOCK_FOR_LOCK =
+      ExitCode.create(9, "LOCK_HELD_NOBLOCK_FOR_LOCK");
 
   public static final ExitCode REMOTE_ENVIRONMENTAL_ERROR =
       ExitCode.createInfrastructureFailure(32, "REMOTE_ENVIRONMENTAL_ERROR");
diff --git a/src/test/cpp/test_util.cc b/src/test/cpp/test_util.cc
index c9e2698..e39c96c 100644
--- a/src/test/cpp/test_util.cc
+++ b/src/test/cpp/test_util.cc
@@ -18,6 +18,10 @@
 
 namespace blaze {
 
+// TODO(b/31290599): We should not only check that the options are registered,
+// but also that they are parsed. StartupOptions* options would need to be
+// non-const to call ProcessArgs and test that the value is recognized by the
+// command line.
 void ExpectIsNullaryOption(const StartupOptions* options,
                            const std::string& flag_name) {
   EXPECT_TRUE(options->IsNullary("--" + flag_name));