var map = {
  key: long
      - value
}
// block
{
  long
  - value
}
The newlines should be ignored inside the map literal, but not inside the block. A correct semicolon insertion for this should be:
var map = {
  key: long // <-- no semicolon here
      - value
};
// block
{
  long; // <-- but there is one here
  - value;
}
But the lexer doesn't know when curlies are maps and when they are blocks. The parser does, so you could theoretically define the semicolon insertion rules in the grammar. That's likely what we'd have to do, but it makes the grammar much more complex.
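To make the problem concrete, here is a toy sketch in Python of a line-based inserter that only tracks brace depth, which is roughly all a lexer can see. The function name `terminate_lines` and its flag are made up for illustration, and the closing brace is left alone for simplicity. Whichever policy you pick for newlines inside braces, one of the two inputs comes out wrong.

```python
def terminate_lines(source: str, ignore_newlines_in_braces: bool) -> list[str]:
    """Append ';' to statement lines using nothing but brace depth.

    Hypothetical helper: it cannot tell a map literal from a block,
    because both are just '{' ... '}' at this level.
    """
    result = []
    depth = 0
    for raw in source.splitlines():
        line = raw.strip()
        if line.endswith("{"):
            depth += 1
            result.append(line)          # an opening line never gets ';'
        elif line == "}":
            depth -= 1
            result.append(line)          # closing brace left alone (simplification)
        elif depth > 0 and ignore_newlines_in_braces:
            result.append(line)          # newline treated as plain whitespace
        else:
            result.append(line + ";")    # newline treated as a terminator
    return result


MAP = "var map = {\nkey: long\n- value\n}"
BLOCK = "{\nlong\n- value\n}"

# Policy 1: ignore newlines inside braces. Fine for the map, wrong for the block.
print(terminate_lines(MAP, True))
print(terminate_lines(BLOCK, True))

# Policy 2: terminate at every newline. Fine for the block, breaks the map.
print(terminate_lines(MAP, False))
print(terminate_lines(BLOCK, False))
```

Neither policy gets both inputs right, which is why the decision ends up needing parser context.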
Ah OK, that makes sense. So Python doesn't have that issue because it doesn't overload braces for hashes and blocks.
Oil does overload them, but it has separate lexing modes for statements and expressions. It switches when it sees, say, the 'var' keyword, so the right-hand side of var x = {a: 3} is lexed as an expression rather than a statement. This sort of fell out of compatibility constraints with shell, but ended up being pretty convenient and powerful.
The lexing mode relies pretty strongly on having the "first word", e.g. 'var' or 'func', Pascal-style. So yeah, I can see how it would be more complex with Java-style syntax.
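A rough sketch of the idea in Python, for anyone following along. This is not Oil's actual lexer; the token names (`LBRACE_MAP`, `LBRACE_BLOCK`, and so on) and the `lex` function are invented for illustration. The first word of a statement picks the mode: after 'var' the lexer is in expression mode, where braces get map token IDs and newlines inside them are dropped. Otherwise braces are block tokens and every newline is kept.

```python
import re

# Words, numbers, newlines, or any single punctuation character.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\n|[^\s\w]")


def lex(source: str) -> list[tuple[str, str]]:
    tokens = []
    mode = "stmt"        # "stmt" or "expr"
    at_start = True      # are we at the first word of a statement?
    depth = 0            # brace nesting while in expression mode
    for text in TOKEN_RE.findall(source):
        if mode == "stmt" and at_start and text == "var":
            mode = "expr"                      # the first word switches the mode
        if text == "\n":
            if mode == "expr" and depth > 0:
                continue                       # newline inside a map literal: ignored
            if not at_start:
                tokens.append(("NEWLINE", text))
            mode, at_start = "stmt", True      # next statement starts in stmt mode
            continue
        if text == "{":
            if mode == "expr":
                depth += 1
                tokens.append(("LBRACE_MAP", text))
            else:
                tokens.append(("LBRACE_BLOCK", text))
        elif text == "}":
            if mode == "expr":
                depth -= 1
                tokens.append(("RBRACE_MAP", text))
            else:
                tokens.append(("RBRACE_BLOCK", text))
        elif text[0].isalnum() or text[0] == "_":
            tokens.append(("WORD", text))
        else:
            tokens.append(("OP", text))
        at_start = False
    return tokens


SOURCE = "var map = {\nkey: long\n- value\n}\n{\nlong\n- value\n}\n"
for kind, text in lex(SOURCE):
    print(kind, repr(text))
```

With this input, no NEWLINE token shows up between the map braces, while the block keeps one after every line. A parser could then treat NEWLINE as a statement terminator without caring what kind of brace it is inside.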
> So Python doesn't have that issue because it doesn't overload braces for hashes and blocks.
Yes, and also because lambdas in Python can only have expression bodies, not statements. That means you can never have a statement nested inside an expression. This is important because Python's rule to ignore all newlines between parentheses would fall apart if you could stuff a statement-bodied function inside a parenthesized expression.
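You can watch both halves of this with the standard library. A small sketch: the `tokenize` module reports a newline inside parentheses as `NL` (ignored by the grammar) and a logical line end as `NEWLINE`, and `compile` rejects a lambda whose body is a statement. The helper names `newline_kinds` and `parses` are just for this example.

```python
import io
import token
import tokenize


def newline_kinds(source: str) -> list[str]:
    """Return the kind of each newline token Python's tokenizer emits."""
    kinds = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in (tokenize.NEWLINE, tokenize.NL):
            kinds.append(token.tok_name[tok.type])
    return kinds


def parses(source: str) -> bool:
    """True if Python accepts the source as a module."""
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:
        return False


# The line break inside the parentheses is NL; the other two end logical lines.
print(newline_kinds("total = (1 +\n         2)\nprint(total)\n"))

# A lambda body must be an expression, so a statement body is a syntax error.
print(parses("inc = lambda x: x + 1\n"))
print(parses("inc = lambda x: return x + 1\n"))
```

Since no statement can appear inside the parentheses, the tokenizer never has to ask whether a newline in there should end a statement.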
Yes, the lambda issue is something I ran into for Oil. Although this part of the language is deferred, the parser is implemented:
# Ruby/Rust style lambda, with the Python constraint that the body can only be an expression
var inc = |x| x + 1
# NOT possible in Oil because | isn't a distinct enough token to change the lexer mode
var inc = |x| {
  return x + 1
}
# This is possible but I didn't implement it.
# "func" can change the lexer mode so { is known to start a block.
# In other words, { and } each have two distinct token IDs.
var inc = func(x) {
  return x + 1
}
I think this is a decent tradeoff, but it's indeed tricky and something I probably spent too much time thinking about ... the perils of "familiar" syntax :-/