Diff - 07b15e6d996609129c5bd42d7669519cd959e4d5^! - bazel

commit	07b15e6d996609129c5bd42d7669519cd959e4d5
author	adonovan <adonovan@google.com>	Thu Apr 09 18:32:33 2020 -0700
committer	Copybara-Service <copybara-worker@google.com>	Thu Apr 09 18:34:00 2020 -0700
tree	96024089ac25cf4120ae9c8aa1782e71b36c3107
parent	1cd84ecf25ba495b70b5601babc45423427c7c9c

bazel syntax: fine-grained syntax locations

This change improves the precision with which the locations
of source tokens are recorded in the syntax tree. Prior to
this change, every Node held a single LexerLocation object
that recorded the start and end offsets of the node, plus
a reference to the shared LineNumberTable (LNT), which maps
these offsets to Locations. This had a cost of one reference
and one LexerLocation object per node.
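
To illustrate how a shared table can map character offsets to
line:column positions, here is a minimal sketch. It is not the
LineNumberTable implementation; the class LineTableSketch and its
method locate are hypothetical names chosen for this example, and
it assumes '\n' line endings and 1-based lines and columns.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of an offset-to-position table; not Bazel's LineNumberTable. */
public class LineTableSketch {
  // lineStarts[i] is the 0-based offset of the first character of line i+1.
  private final int[] lineStarts;

  public LineTableSketch(String source) {
    List<Integer> starts = new ArrayList<>();
    starts.add(0);
    for (int i = 0; i < source.length(); i++) {
      if (source.charAt(i) == '\n') {
        starts.add(i + 1);
      }
    }
    lineStarts = starts.stream().mapToInt(Integer::intValue).toArray();
  }

  /** Returns "line:column" (both 1-based) for a 0-based character offset. */
  public String locate(int offset) {
    int i = Arrays.binarySearch(lineStarts, offset);
    // On a miss, binarySearch returns -(insertionPoint) - 1; the line
    // containing the offset is the one just before the insertion point.
    int line = i >= 0 ? i : -i - 2;
    return (line + 1) + ":" + (offset - lineStarts[line] + 1);
  }

  public static void main(String[] args) {
    LineTableSketch table = new LineTableSketch("x = 1\ny = f().g() + 1\n");
    System.out.println(table.locate(18)); // offset of the '+' operator
  }
}
```

Because the table is built once per file and shared, a node only
needs to keep integer offsets and a reference to the table.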

This change causes every Node to record the offsets only of
its salient tokens, plus a reference to the LNT. For example,
in the expression "1 + 2", the only salient token is the plus
operator; the start and end offsets of the whole expression
can be computed inductively by delegating to its operands x
and y, using x.getStartLocation and y.getEndLocation.
Similarly, in f(x), the salient tokens are '(' and ')'.
This has a cost of 1 word plus approximately 1 int per Node.
Consequently, we can record the exact position of operators
that fail, and do so using less memory than before.
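
The inductive scheme described above can be sketched as follows.
This is an illustration under stated assumptions, not the code in
this change: the classes OffsetSketch, IntLiteral, and Binary and
the method getOperatorOffset are hypothetical, and the reference
to the LNT is omitted for brevity. End offsets are exclusive.

```java
/** Hypothetical sketch of nodes that store only salient token offsets. */
public class OffsetSketch {
  public interface Node {
    int getStartOffset();

    int getEndOffset(); // exclusive
  }

  /** A leaf stores its start offset and derives its end from the raw token text. */
  public static final class IntLiteral implements Node {
    private final int startOffset;
    private final String raw;

    public IntLiteral(int startOffset, String raw) {
      this.startOffset = startOffset;
      this.raw = raw;
    }

    @Override
    public int getStartOffset() {
      return startOffset;
    }

    @Override
    public int getEndOffset() {
      return startOffset + raw.length();
    }
  }

  /** A binary expression stores only the operator's offset. */
  public static final class Binary implements Node {
    private final Node x;
    private final int opOffset;
    private final Node y;

    public Binary(Node x, int opOffset, Node y) {
      this.x = x;
      this.opOffset = opOffset;
      this.y = y;
    }

    public int getOperatorOffset() {
      return opOffset;
    }

    @Override
    public int getStartOffset() {
      return x.getStartOffset(); // delegate to the left operand
    }

    @Override
    public int getEndOffset() {
      return y.getEndOffset(); // delegate to the right operand
    }
  }

  public static void main(String[] args) {
    // "1 + 2": '1' at offset 0, '+' at offset 2, '2' at offset 4.
    Binary e = new Binary(new IntLiteral(0, "1"), 2, new IntLiteral(4, "2"));
    System.out.println(e.getStartOffset() + ".." + e.getEndOffset());
    System.out.println("operator at " + e.getOperatorOffset());
  }
}
```

In this sketch the Binary node holds a single int of its own, yet
its full extent and the position of its operator are recoverable.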

Now, when an expression such as 'f().g() + 1' fails,
the location in the error message will refer to the '+'
operator or one of the two '(' tokens. Before, all
three errors would be wrongly reported at the same place:
f, since it is the start of all three subexpressions.

Overview:
- Every Node has a reference to the LNT, set immediately
  after construction. (Morally it is part of the constructor
  but it's fussy to set it that way.)
- Every node defines getStartOffset and getEndOffset,
  typically by delegating to its left and right subtrees.
- Node end offsets are exclusive again. CL 170723732 was a mistake:
  half-open intervals are mathematically simpler.
  A client that wants an inclusive end offset may subtract one,
  but currently no client does.
- Comprehension.{For,If} are now true Nodes.
- StarlarkFile's extent is now (correctly) the entire file,
  not just the range from the first statement to the last.
- The parser provides offsets of salient tokens to the Node constructors.
- IntegerLiteral now retains the raw token text in addition to the value.
- Token is gone. Its four fields are now embedded in the Lexer.
- Eval uses the following token positions in run-time error messages:

     x+y   f(x)   x[i]   x.y   x[i:j]   k: v
      ^     ^      ^      ^     ^        ^

- Location is final. LexerLocation and LineAndColumn are gone.
- Misparsed source represented as an Identifier now has the text of the
  source instead of "$error$". This is more faithful and causes
  the offsets to be correct.
- The offsets of the orig Identifier in load("module", local="orig")
  coincide with the text 'orig', sans quotation marks.

Benchmark: saves about 65MB (1% of live RAM) retained by the
Usual Benchmark, a deps query.

RELNOTES: N/A
PiperOrigin-RevId: 305803031

diff --git a/src/test/java/com/google/devtools/build/lib/analysis/CircularDependencyTest.java b/src/test/java/com/google/devtools/build/lib/analysis/CircularDependencyTest.java
index bb9fed6..77fdbda 100644
--- a/src/test/java/com/google/devtools/build/lib/analysis/CircularDependencyTest.java
+++ b/src/test/java/com/google/devtools/build/lib/analysis/CircularDependencyTest.java

@@ -98,7 +98,7 @@
       }
     }
     assertThat(foundEvent).isNotNull();
-    assertThat(foundEvent.getLocation().toString()).isEqualTo("/workspace/cycle/BUILD:3:1");
+    assertThat(foundEvent.getLocation().toString()).isEqualTo("/workspace/cycle/BUILD:3:14");
   }
 
   /**