Problem
The syntax I want to parse looks like this:
# This is a comment
# This is a block. It starts with \begin{} and ends with \end{}
\begin{document}
# Within the document block other kinds of blocks can exist, and within them yet other kinds.
# Comments can exist anywhere in the code.
This is another block within the block. It is a paragraph, no formal \begin{} and \end{} are needed.
The parser infers its type as a ParagraphBlock. The block ends with the newline.
\end{document}
I'm learning how to use PEG, and this is the grammar I have written so far for this syntax:
Start
  = (Newline / Comment / DocumentBlock)*

Comment
  = '#' value: (!Newline .)* Newline? {
      return {
        type: "comment",
        value: value.map(y => y[1]).join('').trim()
      }
    }

Newline
  = [\n\r\t]

DocumentBlock
  = "\\begin\{document\}"
    (!"\\end\{document\}" DocumentChildren)*
    "\\end\{document\}"

DocumentChildren
  = NewlineBlock / ParagraphBlock

NewlineBlock
  = value: Newline*
    {
      return {
        type: "newline",
        value: value.length
      }
    }

ParagraphBlock
  = (!Newline .)* Newline
Building the parser fails with this error:
Line 19, column 5: Possible infinite loop when parsing (repetition used with an expression that may not consume any input).
What is the correct implementation of the simple syntax above?
Solution
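This is caused by the Kleene star on Newline in the NewlineBlock rule.
In DocumentBlock you repeat DocumentChildren with *, and one alternative of DocumentChildren is NewlineBlock. NewlineBlock is itself a repetition, Newline*, so it can always succeed by matching '', the empty string. The outer repetition could then keep matching an empty DocumentChildren at the same position without ever consuming input, which is exactly the infinite loop the error warns about.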
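To see why this loops, here is a minimal sketch of PEG-style repetition written as hand-written TypeScript parsing functions. It is not the code PEG.js generates; charIn, star, plus and tryParse are hypothetical helpers. A naive e* keeps calling e until it fails; if e succeeds without consuming anything, that loop would spin forever at the same offset, so this sketch throws at runtime instead, which is roughly the situation PEG.js reports statically.

```typescript
// A parsing function takes the input and a start offset and returns
// the offset after the match on success, or null on failure.
type ParseFn = (input: string, pos: number) => number | null;

// Matches one character from `set`, like the character class [\n\r\t].
const charIn = (set: string): ParseFn => (input, pos) =>
  pos < input.length && set.includes(input[pos]) ? pos + 1 : null;

// e* : repeat e until it fails. A match that consumes nothing would make a
// naive loop spin forever at the same offset, so this sketch throws instead.
const star = (e: ParseFn): ParseFn => (input, pos) => {
  for (;;) {
    const next = e(input, pos);
    if (next === null) return pos;
    if (next === pos) throw new Error(`repetition made no progress at offset ${pos}`);
    pos = next;
  }
};

// e+ : one required match of e, followed by e*.
const plus = (e: ParseFn): ParseFn => (input, pos) => {
  const first = e(input, pos);
  return first === null ? null : star(e)(input, first);
};

const newline = charIn("\n\r\t");
const newlineBlockStar = star(newline); // NewlineBlock = Newline*
const newlineBlockPlus = plus(newline); // NewlineBlock = Newline+

// Wraps a block in an outer repetition, like (... DocumentChildren)* in DocumentBlock.
function tryParse(block: ParseFn, input: string): string {
  try {
    return `consumed ${star(block)(input, 0)} chars`;
  } catch (err) {
    return (err as Error).message;
  }
}

console.log(tryParse(newlineBlockStar, "\n\nText")); // repetition made no progress at offset 2
console.log(tryParse(newlineBlockPlus, "\n\nText")); // consumed 2 chars
```

The first call to Newline* does consume the two newlines; it is the second call, which matches the empty string in front of "Text", that makes no progress.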
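Changing the * in NewlineBlock to + fixes the problem: NewlineBlock must then match at least one Newline, so it no longer has the option of returning the empty string.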
NewlineBlock
  = value: Newline+
    {
      return {
        type: "newline",
        value: value.length
      }
    }
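As a cross-check of the fixed grammar, here is a hand-written recursive-descent sketch in TypeScript that mirrors its rules one by one. It is not what PEG.js generates; the DocParser class, the BlockNode type and the "paragraph"/"document" node shapes are hypothetical, since the grammar has no actions for ParagraphBlock or DocumentBlock. The child method also shows a second reason + matters: DocumentChildren is an ordered choice, so if NewlineBlock could succeed on empty input, ParagraphBlock would never be tried.

```typescript
// Hypothetical AST shapes: "comment" and "newline" mirror the grammar's actions;
// "paragraph" and "document" are assumptions (those rules have no actions in the grammar).
type BlockNode =
  | { type: "comment"; value: string }
  | { type: "newline"; value: number }
  | { type: "paragraph"; value: string }
  | { type: "document"; children: BlockNode[] };

const BEGIN = "\\begin{document}";
const END = "\\end{document}";

// Newline = [\n\r\t]
const isNewline = (c: string): boolean => c === "\n" || c === "\r" || c === "\t";

class DocParser {
  private pos = 0;
  private readonly src: string;

  constructor(src: string) {
    this.src = src;
  }

  // Start = (Newline / Comment / DocumentBlock)*, and the whole input must be consumed.
  // Bare newlines are skipped here instead of being returned as nodes.
  parse(): BlockNode[] {
    const out: BlockNode[] = [];
    while (this.pos < this.src.length) {
      if (isNewline(this.src[this.pos])) this.pos++;
      else if (this.src[this.pos] === "#") out.push(this.comment());
      else if (this.src.startsWith(BEGIN, this.pos)) out.push(this.document());
      else throw new Error(`unexpected input at offset ${this.pos}`);
    }
    return out;
  }

  // Comment = '#' (!Newline .)* Newline?
  private comment(): BlockNode {
    const start = ++this.pos; // skip '#'
    while (this.pos < this.src.length && !isNewline(this.src[this.pos])) this.pos++;
    const value = this.src.slice(start, this.pos).trim();
    if (this.pos < this.src.length) this.pos++; // optional trailing Newline
    return { type: "comment", value };
  }

  // DocumentBlock = BEGIN (!END DocumentChildren)* END
  private document(): BlockNode {
    this.pos += BEGIN.length;
    const children: BlockNode[] = [];
    while (!this.src.startsWith(END, this.pos)) {
      if (this.pos >= this.src.length) throw new Error("missing \\end{document}");
      children.push(this.child()); // always consumes input, so this loop terminates
    }
    this.pos += END.length;
    return { type: "document", children };
  }

  // DocumentChildren = NewlineBlock / ParagraphBlock (ordered choice)
  private child(): BlockNode {
    const start = this.pos;
    // NewlineBlock = Newline+ : succeeds only if at least one newline was consumed
    while (this.pos < this.src.length && isNewline(this.src[this.pos])) this.pos++;
    if (this.pos > start) return { type: "newline", value: this.pos - start };
    // ParagraphBlock = (!Newline .)* Newline
    while (this.pos < this.src.length && !isNewline(this.src[this.pos])) this.pos++;
    if (this.pos >= this.src.length) throw new Error("paragraph must end with a newline");
    const value = this.src.slice(start, this.pos);
    this.pos++; // consume the terminating Newline
    return { type: "paragraph", value };
  }
}

const sample = [
  "# This is a comment",
  "\\begin{document}",
  "# A comment inside the document",
  "A paragraph line.",
  "\\end{document}",
  "",
].join("\n");

console.log(JSON.stringify(new DocParser(sample).parse()));
```

With this grammar, a # line inside the document comes out as a paragraph, because DocumentChildren has no Comment alternative; allowing comments anywhere would mean adding Comment to that choice.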