How to create a simple parser with PEG.js

Problem description

The syntax I want to parse looks like this:

# This is a comment

# This is a block. It starts with \begin{} and ends with \end{}
\begin{document}

# Within the document block other kinds of blocks can exist, and within them yet other kinds.
# Comments can exist anywhere in the code.

This is another block within the block. It is a paragraph, no formal \begin{} and \end{} are needed.
The parser infers its type as a ParagraphBlock. The block ends with the newline.

\end{document}

I'm learning PEG, and this is the grammar I have written so far for this syntax:

Start
  = (Newline / Comment / DocumentBlock)*
  
Comment
  = '#' value: (!Newline .)* Newline? {
    return {
      type: "comment", value: value.map(y => y[1]).join('').trim()
    }
  } 
  
Newline
  = [\n\r\t]
  
DocumentBlock
  = "\\begin\{document\}"
  
    (!"\\end\{document\}" DocumentChildren)*
    
    "\\end\{document\}"
    
DocumentChildren
  = NewlineBlock / ParagraphBlock
    
NewlineBlock
  = value: Newline*
  {
    return {
      type: "newline", value: value.length
    }
  }
    
ParagraphBlock
  = (!Newline .)* Newline

I'm running into problems with infinite loops. The current grammar produces this error:

Line 19, column 5: Possible infinite loop when parsing (repetition used with an expression that may not consume any input).

What is the correct way to implement the simple syntax above?

Solution

I think this is caused by the Kleene star applied to the Newline rule in NewlineBlock.

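In DocumentBlock you repeat DocumentChildren, and DocumentChildren tries NewlineBlock first. In NewlineBlock you repeat Newline with *, which means NewlineBlock can always succeed without consuming any input (it matches the empty string). DocumentChildren can therefore also succeed without consuming anything, so the repetition in DocumentBlock could keep matching at the same position forever. PEG.js detects this when it compiles the grammar and reports the error above instead of looping.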
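To see why the star on Newline is a problem, here is a minimal sketch in JavaScript built from hypothetical hand-written combinators (charClass and many are assumptions for this illustration, not part of the PEG.js API). It assumes a parser is a function that returns the new input position on success or -1 on failure. The many helper throws as soon as an iteration succeeds without advancing, standing in for the loop that would otherwise never end.

```javascript
// Hypothetical mini PEG combinators, used only to illustrate the problem
// (not PEG.js code). A parser is a function (input, pos) that returns the
// new position on success, or -1 on failure.

// Like the grammar's Newline = [\n\r\t]: match one character from a set.
const charClass = (chars) => (input, pos) =>
  pos < input.length && chars.includes(input[pos]) ? pos + 1 : -1;

// e* : repeat e greedily. It always succeeds, possibly consuming nothing.
// If an iteration succeeds WITHOUT advancing, a plain loop would spin
// forever at the same position, so this sketch throws instead.
const many = (p) => (input, pos) => {
  for (;;) {
    const next = p(input, pos);
    if (next === -1) return pos; // e failed: the repetition ends here
    if (next === pos) throw new Error(`no progress at position ${pos}`);
    pos = next;
  }
};

const Newline = charClass("\n\r\t");
const NewlineBlock = many(Newline); // NewlineBlock = Newline*
const Children = many(NewlineBlock); // stands in for (... DocumentChildren)*

// NewlineBlock never fails: on "abc" it succeeds by matching zero newlines.
console.log(NewlineBlock("abc", 0)); // 0

// Hence the outer repetition cannot end normally, whatever the input.
for (const input of ["abc", "\n\nabc", ""]) {
  try {
    Children(input, 0);
  } catch (e) {
    console.log(JSON.stringify(input), "->", e.message);
  }
}
```

For every input, Children eventually reaches a position where NewlineBlock succeeds but consumes nothing, so the outer repetition has no way to stop.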

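Changing the * in NewlineBlock to + fixes the problem: NewlineBlock must now match at least one newline, so it can no longer succeed on the empty string.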

NewlineBlock
  = value: Newline+
  {
    return {
      type: "newline", value: value.length
    }
  }
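For comparison, the corrected structure can be sketched with the same kind of hypothetical combinators (again not PEG.js code; lit, seq, choice, many, many1 and the other helpers are assumptions made for this illustration). It mirrors the question's grammar with NewlineBlock = Newline+ and runs it over the sample input. As in the original grammar, # lines inside the document are parsed as paragraphs, because DocumentChildren has no Comment alternative.

```javascript
// Hypothetical mini PEG combinators (not PEG.js code). A parser is a function
// (input, pos) that returns the new position on success, or -1 on failure.
const lit = (s) => (input, pos) => (input.startsWith(s, pos) ? pos + s.length : -1);
const charClass = (chars) => (input, pos) =>
  pos < input.length && chars.includes(input[pos]) ? pos + 1 : -1;
const anyChar = (input, pos) => (pos < input.length ? pos + 1 : -1); // .
const not = (p) => (input, pos) => (p(input, pos) === -1 ? pos : -1); // !e, consumes nothing
const optional = (p) => (input, pos) => {
  const r = p(input, pos);
  return r === -1 ? pos : r; // e?
};
const seq = (...ps) => (input, pos) => {
  for (const p of ps) {
    pos = p(input, pos);
    if (pos === -1) return -1;
  }
  return pos;
};
const choice = (...ps) => (input, pos) => {
  for (const p of ps) {
    const r = p(input, pos); // ordered choice: the first success wins
    if (r !== -1) return r;
  }
  return -1;
};
// e* : throws if an iteration succeeds without consuming input.
const many = (p) => (input, pos) => {
  for (;;) {
    const next = p(input, pos);
    if (next === -1) return pos;
    if (next === pos) throw new Error(`no progress at position ${pos}`);
    pos = next;
  }
};
const many1 = (p) => seq(p, many(p)); // e+ : at least one match

// The question's grammar, with NewlineBlock changed from Newline* to Newline+.
const Newline = charClass("\n\r\t");
const restOfLine = many(seq(not(Newline), anyChar)); // (!Newline .)*
const Comment = seq(lit("#"), restOfLine, optional(Newline));
const NewlineBlock = many1(Newline);
const ParagraphBlock = seq(restOfLine, Newline);
const DocumentChildren = choice(NewlineBlock, ParagraphBlock);
const END = "\\end{document}";
const DocumentBlock = seq(
  lit("\\begin{document}"),
  many(seq(not(lit(END)), DocumentChildren)),
  lit(END)
);
const Start = many(choice(Newline, Comment, DocumentBlock));

// Abridged version of the sample input from the question.
const sample = [
  "# This is a comment",
  "",
  "# This is a block. It starts with \\begin{} and ends with \\end{}",
  "\\begin{document}",
  "",
  "# Comments can exist anywhere in the code.",
  "",
  "This is another block within the block. It is a paragraph.",
  "The block ends with the newline.",
  "",
  "\\end{document}",
  "",
].join("\n");

console.log(Start(sample, 0) === sample.length); // true
```

The no-progress guard in many never fires here: every expression that is repeated (one character of restOfLine, each DocumentChildren alternative, each Start alternative) consumes at least one character whenever it succeeds, which is the property PEG.js checks for at compile time.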
