Use ForkJoinPool for the CPU-heavy executor.
Using ForkJoinPool (FJP) for the CPU-heavy pool yields better wall time than ThreadPoolExecutor on machines with many cores. The regression previously observed on the 72-core machine is no longer present.
Benchmark results with the flag set to HOST_CPUS:
* 12 cores: -12.37% CPU, -8.88% wall
* 16 cores: -5.82% CPU, +0.48% wall (~1s)
* 72 cores: -2.98% CPU, +1.61% wall (~1s, statistically insignificant)
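
For readers unfamiliar with the two executor types being compared, here is a minimal sketch of what a boolean like useForkJoinPool might select between. The class ExecutorChoiceSketch and its create method are hypothetical names for illustration; this is not the body of AbstractQueueVisitor.createExecutorService, and the queue type and keep-alive settings are assumptions.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only; not Bazel's actual implementation.
final class ExecutorChoiceSketch {
  private ExecutorChoiceSketch() {}

  /** Returns an executor with the given parallelism, backed by either pool type. */
  static ExecutorService create(int parallelism, boolean useForkJoinPool) {
    if (useForkJoinPool) {
      // ForkJoinPool is a work-stealing pool: idle workers take tasks
      // from the queues of other workers.
      return new ForkJoinPool(parallelism);
    }
    // Fixed-size ThreadPoolExecutor: all workers poll one shared queue.
    // The unbounded LinkedBlockingQueue is an assumption made for this sketch.
    return new ThreadPoolExecutor(
        parallelism, parallelism, 0L, TimeUnit.SECONDS, new LinkedBlockingQueue<>());
  }

  public static void main(String[] args) throws Exception {
    // Analogous to the HOST_CPUS flag value used in the benchmarks above.
    int hostCpus = Runtime.getRuntime().availableProcessors();
    for (boolean useForkJoinPool : new boolean[] {false, true}) {
      ExecutorService executor = create(hostCpus, useForkJoinPool);
      try {
        // Submit one trivial task to show both pools share the same interface.
        Future<Integer> result = executor.submit(() -> 6 * 7);
        System.out.println(executor.getClass().getSimpleName() + ": " + result.get());
      } finally {
        executor.shutdown();
      }
    }
  }
}
```

Both branches return an ExecutorService, so a caller can switch pool types by changing a single argument, which is the shape of the change in this commit.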
PiperOrigin-RevId: 374620567
diff --git a/src/main/java/com/google/devtools/build/skyframe/AbstractParallelEvaluator.java b/src/main/java/com/google/devtools/build/skyframe/AbstractParallelEvaluator.java
index cdd12d4..e19fca7 100644
--- a/src/main/java/com/google/devtools/build/skyframe/AbstractParallelEvaluator.java
+++ b/src/main/java/com/google/devtools/build/skyframe/AbstractParallelEvaluator.java
@@ -134,7 +134,8 @@
AbstractQueueVisitor.createExecutorService(
/*parallelism=*/ cpuHeavySkyKeysThreadPoolSize,
"skyframe-evaluator-cpu-heavy",
- /*useForkJoinPool=*/ false), // FJP resulted in a small regression.
+ // FJP performs much better on machines with many cores.
+ /*useForkJoinPool=*/ true),
/*failFastOnException=*/ true,
NodeEntryVisitor.NODE_ENTRY_VISITOR_ERROR_CLASSIFIER)
: () ->