commit | 07b15e6d996609129c5bd42d7669519cd959e4d5 | |
---|---|---|
author | adonovan <adonovan@google.com> | Thu Apr 09 18:32:33 2020 -0700 |
committer | Copybara-Service <copybara-worker@google.com> | Thu Apr 09 18:34:00 2020 -0700 |
tree | 96024089ac25cf4120ae9c8aa1782e71b36c3107 | |
parent | 1cd84ecf25ba495b70b5601babc45423427c7c9c | |
bazel syntax: fine-grained syntax locations

This change improves the precision with which the locations of source tokens are recorded in the syntax tree.

Prior to this change, every Node held a single LexerLocation object that recorded the start and end offsets of the node, plus a reference to the shared LineNumberTable (LNT), that maps these offsets to Locations. This had a cost of one reference and one LexerLocation object per node.

This change causes every Node to record the offsets only of its salient tokens, plus a reference to the LNT. For example, in the expression "1 + 2", the only salient token is the plus operator; the start and end offsets can be computed inductively by delegating to x.getStartLocation and y.getEndLocation. Similarly, in f(x), the salient tokens are '(' and ')'. This has a cost of 1 word plus approximately 1 int per Node.

Consequently, we can record the exact position of operators that fail, and do so using less memory than before. Now, when an expression such as 'f().g() + 1' fails, the location in the error message will refer to the '+' operator or one of the two '(' tokens. Before, all three errors would be wrongly reported at the same place: f, since it is the start of all three subexpressions.

Overview:
- Every Node has a reference to the LNT, set immediately after construction. (Morally it is part of the constructor but it's fussy to set it that way.)
- Every node defines getStartOffset and getEndOffset, typically by delegating to its left and right subtrees.
- Node end offsets are exclusive again. CL 170723732 was a mistake: half-open intervals are mathematically simpler. A client that wants to subtract one may do that. But there are none.
- Comprehension.{For,If} are now true Nodes.
- StarlarkFile's extent is now (correctly) the entire file, not just the range from the first statement to the last.
- The parser provides offsets of salient tokens to the Node constructors.
- IntegerLiteral now retains the raw token text in addition to the value.
- Token is gone. Its four fields are now embedded in the Lexer.
- Eval uses the following token positions in run-time error messages:
    x+y     f(x)     x[i]     x.y     x[i:j]     k: v
     ^       ^        ^        ^       ^          ^
- Location is final. LexerLocation and LineAndColumn are gone.
- Misparsed source represented as an Identifier now has the text of the source instead of "$error$". This is more faithful and causes the offsets to be correct.
- The offsets of the orig Identifier in load("module", local="orig") coincide with the text 'orig', sans quotation marks.

Benchmark: saves about 65MB (1% of live RAM) retained by the Usual Benchmark, a deps query.

RELNOTES: N/A
PiperOrigin-RevId: 305803031
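The scheme described above can be illustrated with a minimal Java sketch. It is not the code from this change: the class names OffsetSketch, LineTable, BinaryOp, and Call, and the locate method, are hypothetical stand-ins, and only getStartOffset and getEndOffset are names taken from the commit message. Each node stores just the offsets of its salient tokens, computes its extent by delegating to its subtrees, and uses exclusive end offsets.

```java
import java.util.Arrays;

// Illustrative sketch only; these are not Bazel's actual classes.
class OffsetSketch {
  /** Hypothetical stand-in for the shared line number table: char offset to 1-based line:column. */
  static final class LineTable {
    private final int[] lineStarts; // offset of the first char of each line

    LineTable(String src) {
      int[] starts = new int[src.length() + 1];
      int n = 0;
      starts[n++] = 0;
      for (int i = 0; i < src.length(); i++) {
        if (src.charAt(i) == '\n') starts[n++] = i + 1;
      }
      lineStarts = Arrays.copyOf(starts, n);
    }

    String locate(int offset) {
      int i = Arrays.binarySearch(lineStarts, offset);
      int line = i >= 0 ? i : -i - 2; // index of the last line start <= offset
      return (line + 1) + ":" + (offset - lineStarts[line] + 1);
    }
  }

  abstract static class Node {
    LineTable lnt; // set right after construction rather than through the constructor

    abstract int getStartOffset();

    abstract int getEndOffset(); // exclusive, so [start, end) is a half-open interval
  }

  static final class Identifier extends Node {
    final String name;
    final int offset;

    Identifier(String name, int offset) {
      this.name = name;
      this.offset = offset;
    }

    int getStartOffset() { return offset; }

    int getEndOffset() { return offset + name.length(); }
  }

  /** x op y: the only salient token is the operator. */
  static final class BinaryOp extends Node {
    final Node x, y;
    final int opOffset;

    BinaryOp(Node x, int opOffset, Node y) {
      this.x = x;
      this.opOffset = opOffset;
      this.y = y;
    }

    int getStartOffset() { return x.getStartOffset(); } // delegate to the left subtree

    int getEndOffset() { return y.getEndOffset(); } // delegate to the right subtree
  }

  /** fn(...): the salient tokens are '(' and ')'. Arguments are omitted for brevity. */
  static final class Call extends Node {
    final Node fn;
    final int lparenOffset, rparenOffset;

    Call(Node fn, int lparenOffset, int rparenOffset) {
      this.fn = fn;
      this.lparenOffset = lparenOffset;
      this.rparenOffset = rparenOffset;
    }

    int getStartOffset() { return fn.getStartOffset(); }

    int getEndOffset() { return rparenOffset + 1; } // one past the ')'
  }

  public static void main(String[] args) {
    String src = "a = 1\nf(x) + y";
    LineTable lnt = new LineTable(src);
    Call call = new Call(new Identifier("f", 6), 7, 9);
    BinaryOp sum = new BinaryOp(call, 11, new Identifier("y", 13));
    sum.lnt = lnt;
    // The whole expression spans [6, 14), recovered without storing either bound.
    System.out.println(src.substring(sum.getStartOffset(), sum.getEndOffset())); // f(x) + y
    System.out.println(lnt.locate(sum.opOffset)); // 2:6, the '+'
    System.out.println(lnt.locate(call.lparenOffset)); // 2:2, the '('
  }
}
```

In this sketch a failure in the addition and a failure in the call are reported at different columns (2:6 and 2:2), whereas reporting each node's start offset would place both at 2:1. Because the end offset is exclusive, the node's text is simply src.substring(start, end), with no adjustment by one.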
{Fast, Correct} - Choose two
Build and test software of any size, quickly and reliably.
Speed up your builds and tests: Bazel rebuilds only what is necessary. With advanced local and distributed caching, optimized dependency analysis and parallel execution, you get fast and incremental builds.
One tool, multiple languages: Build and test Java, C++, Android, iOS, Go, and a wide variety of other language platforms. Bazel runs on Windows, macOS, and Linux.
Scalable: Bazel helps you scale your organization, codebase, and continuous integration solution. It handles codebases of any size, in multiple repositories or a huge monorepo.
Extensible to your needs: Easily add support for new languages and platforms with Bazel's familiar extension language. Share and re-use language rules written by the growing Bazel community.
See CONTRIBUTING.md