How to create a simple parser with PEG.js

Question

The syntax I want to parse looks like this:

# This is a comment

# This is a block. It starts with \begin{} and ends with \end{}
\begin{document}

# Within the document block other kinds of blocks can exist, and within them yet other kinds.
# Comments can exist anywhere in the code.

This is another block within the block. It is a paragraph, no formal \begin{} and \end{} are needed.
The parser infers its type as a ParagraphBlock. The block ends with the newline.

\end{document}

I am learning how to use PEG, and this is the grammar I have developed so far for the syntax above:

Start
  = (Newline / Comment / DocumentBlock)*
  
Comment
  = '#' value: (!Newline .)* Newline? {
    return {
      type: "comment", value: value.map(y => y[1]).join('').trim()
    }
  } 
  
Newline
  = [\n\r\t]
  
DocumentBlock
  = "\\begin\{document\}"
  
    (!"\\end\{document\}" DocumentChildren)*
    
    "\\end\{document\}"
    
DocumentChildren
  = NewlineBlock / ParagraphBlock
    
NewlineBlock
  = value: Newline*
  {
    return {
      type: "newline", value: value.length
    }
  }
    
ParagraphBlock
  = (!Newline .)* Newline

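I am running into a problem with infinite loops. The current grammar produces this error:

Line 19, column 5: Possible infinite loop when parsing (repetition used with an expression that may not consume any input).

What is the correct implementation of the simple syntax above?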

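Answer

I think this is caused by the Kleene star applied to the Newline rule inside NewlineBlock.

In DocumentBlock you have a repeated DocumentChildren. In NewlineBlock you have a repeated Newline, which means NewlineBlock can always succeed by matching '' (the empty string). The outer repetition would then keep matching an empty NewlineBlock at the same position without ever advancing, which is the infinite loop the generator warns about.

Changing the * in NewlineBlock to + fixes the problem. The rule must then consume at least one Newline, so it no longer has the option of matching the empty string.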

NewlineBlock
  = value: Newline+
  {
    return {
      type: "newline", value: value.length
    }
  }
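
To make the difference between * and + concrete, here is a minimal sketch in plain JavaScript. It does not use PEG.js at all: newline, star, plus and tryParse are hypothetical hand-written helpers that only imitate how a repetition behaves, and the explicit "consumed no input" check stands in for the situation the generator warns about at build time.

```javascript
// Hypothetical mini-combinators (NOT the PEG.js API).
// A parser takes (input, pos) and returns the new position, or -1 on failure.

// Mirrors the Newline rule from the question: [\n\r\t]
const newline = (input, pos) =>
  pos < input.length && /[\n\r\t]/.test(input[pos]) ? pos + 1 : -1;

// e* : apply e until it fails. Always succeeds, possibly consuming nothing.
const star = (e) => (input, pos) => {
  for (;;) {
    const next = e(input, pos);
    if (next === -1) return pos;
    // Without this guard the loop would spin forever at the same position.
    if (next === pos) {
      throw new Error("infinite loop: inner expression consumed no input");
    }
    pos = next;
  }
};

// e+ : like e*, but the first application of e must succeed.
const plus = (e) => (input, pos) => {
  const first = e(input, pos);
  return first === -1 ? -1 : star(e)(input, first);
};

const newlineBlockStar = star(newline); // Newline*  (original rule)
const newlineBlockPlus = plus(newline); // Newline+  (fixed rule)

// Runs a parser from position 0 and returns either the end position
// or the error message if the loop guard fired.
function tryParse(parser, input) {
  try {
    return parser(input, 0);
  } catch (err) {
    return err.message;
  }
}

// Repeating the starred block: the inner rule succeeds without advancing.
const loopResult = tryParse(star(newlineBlockStar), "abc");
// Repeating the plus block: the inner rule fails cleanly, so the loop stops.
const fixedResult = tryParse(star(newlineBlockPlus), "abc");

console.log(loopResult);  // "infinite loop: inner expression consumed no input"
console.log(fixedResult); // 0
```

The same reasoning applies to any rule placed under a repetition: each alternative should be unable to succeed without consuming at least one character.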