-
Notifications
You must be signed in to change notification settings - Fork 46
Stop parsing Junk on lines which look like Entries #211
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -70,7 +70,7 @@ | |
{ | ||
"type": "Junk", | ||
"annotations": [], | ||
"content": "mixed-case-callee = {Function()}\n" | ||
"content": "mixed-case-callee = {Function()}\n\n" | ||
}, | ||
{ | ||
"type": "Comment", | ||
|
@@ -88,7 +88,7 @@ | |
{ | ||
"type": "Junk", | ||
"annotations": [], | ||
"content": "variable-callee = {$variable()}\n" | ||
"content": "variable-callee = {$variable()}\n\n" | ||
}, | ||
{ | ||
"type": "GroupComment", | ||
|
@@ -323,7 +323,7 @@ | |
{ | ||
"type": "Junk", | ||
"annotations": [], | ||
"content": "shuffled-args = {FUN(1, x: 1, \"a\", y: \"Y\", msg)}\n" | ||
"content": "shuffled-args = {FUN(1, x: 1, \"a\", y: \"Y\", msg)}\n\n" | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Asking about these ^^^ There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Great question! I'm not happy with the answer, but the good news is that this PR will help in making this right. The reference fixtures in So what you're seeing in The actual output of the tooling parser includes those trailing blank lines. You can verify that in the Playground. The |
||
}, | ||
{ | ||
"type": "Comment", | ||
|
@@ -332,7 +332,7 @@ | |
{ | ||
"type": "Junk", | ||
"annotations": [], | ||
"content": "duplicate-named-args = {FUN(x: 1, x: \"X\")}\n" | ||
"content": "duplicate-named-args = {FUN(x: 1, x: \"X\")}\n\n\n" | ||
}, | ||
{ | ||
"type": "GroupComment", | ||
|
@@ -1063,7 +1063,17 @@ | |
{ | ||
"type": "Junk", | ||
"annotations": [], | ||
"content": "one-argument = {FUN(1,,)}\nmissing-arg = {FUN(,)}\nmissing-sparse-arg = {FUN( , )}\n" | ||
"content": "one-argument = {FUN(1,,)}\n" | ||
}, | ||
{ | ||
"type": "Junk", | ||
"annotations": [], | ||
"content": "missing-arg = {FUN(,)}\n" | ||
}, | ||
{ | ||
"type": "Junk", | ||
"annotations": [], | ||
"content": "missing-sparse-arg = {FUN( , )}\n\n\n" | ||
}, | ||
{ | ||
"type": "GroupComment", | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,4 +1,21 @@ | ||
## Two adjacent Junks. | ||
err01 = {1x} | ||
err02 = {2x} | ||
|
||
# A single Junk. | ||
err03 = {1x | ||
2 | ||
|
||
# A single Junk. | ||
ą=Invalid identifier | ||
ć=Another one | ||
|
||
key01 = { | ||
# The COMMENT ends this junk. | ||
err04 = { | ||
# COMMENT | ||
|
||
# The COMMENT ends this junk. | ||
# The closing brace is a separate Junk. | ||
err04 = { | ||
# COMMENT | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,15 +1,68 @@ | ||
{ | ||
"type": "Resource", | ||
"body": [ | ||
{ | ||
"type": "GroupComment", | ||
"content": "Two adjacent Junks." | ||
}, | ||
{ | ||
"type": "Junk", | ||
"annotations": [], | ||
"content": "err01 = {1x}\n" | ||
}, | ||
{ | ||
"type": "Junk", | ||
"annotations": [], | ||
"content": "err02 = {2x}\n\n" | ||
}, | ||
{ | ||
"type": "Comment", | ||
"content": "A single Junk." | ||
}, | ||
{ | ||
"type": "Junk", | ||
"annotations": [], | ||
"content": "err03 = {1x\n2\n\n" | ||
}, | ||
{ | ||
"type": "Comment", | ||
"content": "A single Junk." | ||
}, | ||
{ | ||
"type": "Junk", | ||
"annotations": [], | ||
"content": "ą=Invalid identifier\nć=Another one\n" | ||
"content": "ą=Invalid identifier\nć=Another one\n\n" | ||
}, | ||
{ | ||
"type": "Comment", | ||
"content": "The COMMENT ends this junk." | ||
}, | ||
{ | ||
"type": "Junk", | ||
"annotations": [], | ||
"content": "err04 = {\n" | ||
}, | ||
{ | ||
"type": "Comment", | ||
"content": "COMMENT" | ||
}, | ||
{ | ||
"type": "Comment", | ||
"content": "The COMMENT ends this junk.\nThe closing brace is a separate Junk." | ||
}, | ||
{ | ||
"type": "Junk", | ||
"annotations": [], | ||
"content": "err04 = {\n" | ||
}, | ||
{ | ||
"type": "Comment", | ||
"content": "COMMENT" | ||
}, | ||
{ | ||
"type": "Junk", | ||
"annotations": [], | ||
"content": "key01 = {\n" | ||
"content": "}\n" | ||
} | ||
] | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This rendering to ebnf doesn't make any sense, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it does. We're using the EBNF syntax as defined in the XML spec, with an extension of allowing regexes in a few places. The XML one reads:
So
junk_line - "#"
matches a line of junk which doesn't start with a#
. I think that's exactly what we want to say here.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To me,
-
is defined usefully only on single character productions in the XML spec. Or literally using matches any string that matches A but does not match B,# foo\n
does matchjunkline
, but it doesn't match#
, so it's a junk line.Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you rephrase this please?
To me, the EBNF in this PR clearly expresses the intent. This is already a slippery slope because we're trying to define how to parse unparsed content. I don't want to overthink it. I'm also not sure how you'd like to write this differently. I could look at a PR if you'd like to prepare one :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just took the literal quote from the xml spec, and replaced
A
andB
withjunk_line
and#
, resp. And tested that against a candidate line line# foo\n
.I know that you don't believe in the value of the EBNF, and I don't care much either about this one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see what you mean, thanks.
#
matches# foo\n
partially and that's enough for a negative lookahead to work here. I guess we could try to refactor this into something likesequence(and(not("#"), any_char), junk_line)
but it would require special handling of blank lines inside of junk. (any_char
doesn't parse newlines.) All in all, I favor the expressiveness of the approach I implemented in this PR.