Use ForkJoinPool for the CPU-heavy executor.

Using ForkJoinPool (FJP) for the CPU-heavy pool yields better wall time than ThreadPoolExecutor on machines with many cores. The regression previously observed on the 72-core machine is no longer present.

Benchmark results with the flag value set to HOST_CPUS:
* 12 cores: -12.37% CPU, -8.88% wall
* 16 cores: -5.82% CPU, +0.48% wall (~1s)
* 72 cores: -2.98% CPU, +1.61% wall (~1s, statistically insignificant)
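
For readers unfamiliar with the two executor kinds, here is a minimal sketch of the kind of switch the useForkJoinPool argument selects. ExecutorSketch, create, and runTasks are hypothetical names used only for illustration; this is not the body of AbstractQueueVisitor.createExecutorService, whose actual configuration (thread naming, queue type, and so on) may differ.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical helper for illustration only; not Bazel code. */
final class ExecutorSketch {
  private ExecutorSketch() {}

  /** Returns a pool with the given parallelism, backed by either executor kind. */
  static ExecutorService create(int parallelism, boolean useForkJoinPool) {
    if (useForkJoinPool) {
      // Work-stealing pool: each worker thread keeps its own task queue
      // and idle workers steal from busy ones.
      return new ForkJoinPool(parallelism);
    }
    // Fixed-size pool: every worker polls the same shared blocking queue.
    return new ThreadPoolExecutor(
        parallelism, parallelism, 0L, TimeUnit.SECONDS, new LinkedBlockingQueue<>());
  }

  /** Submits {@code tasks} trivial jobs, waits for them, and returns how many ran. */
  static int runTasks(ExecutorService pool, int tasks) {
    AtomicInteger done = new AtomicInteger();
    for (int i = 0; i < tasks; i++) {
      pool.execute(done::incrementAndGet);
    }
    pool.shutdown(); // Already-submitted tasks still run after shutdown().
    try {
      if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
        throw new IllegalStateException("pool did not terminate in time");
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return done.get();
  }
}
```

Both pools expose the same ExecutorService interface, so callers are unaffected by the switch; only the scheduling strategy changes. Any performance difference between them depends on the workload and should be measured, as in the benchmark above.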

PiperOrigin-RevId: 374620567
diff --git a/src/main/java/com/google/devtools/build/skyframe/AbstractParallelEvaluator.java b/src/main/java/com/google/devtools/build/skyframe/AbstractParallelEvaluator.java
index cdd12d4..e19fca7 100644
--- a/src/main/java/com/google/devtools/build/skyframe/AbstractParallelEvaluator.java
+++ b/src/main/java/com/google/devtools/build/skyframe/AbstractParallelEvaluator.java
@@ -134,7 +134,8 @@
                     AbstractQueueVisitor.createExecutorService(
                         /*parallelism=*/ cpuHeavySkyKeysThreadPoolSize,
                         "skyframe-evaluator-cpu-heavy",
-                        /*useForkJoinPool=*/ false), // FJP resulted in a small regression.
+                        // FJP performs much better on machines with many cores.
+                        /*useForkJoinPool=*/ true),
                     /*failFastOnException=*/ true,
                     NodeEntryVisitor.NODE_ENTRY_VISITOR_ERROR_CLASSIFIER)
             : () ->