Why does the expr parser only parse its first alternative?

Problem

package app
import scala.util.parsing.combinator._


class MyParser extends JavaTokenParsers {
  import MyParser._
  
  def expr =
    plus | sub | multi | divide | num
  
  def num = floatingPointNumber ^^ (x => Value(x.toDouble).e)

  def plus = num ~ rep("+" ~> num) ^^ {
    case num ~ nums => nums.foldLeft(num.e) {
      (x,y) => Operation("+",x,y)
    }
  }

  def sub = num ~ rep("-" ~> num) ^^ {
    case num ~ nums => nums.foldLeft(num.e){
      (x,y) => Operation("-",x,y)
    }
  }

  def multi = num ~ rep("*" ~> num) ^^ {
    case num ~ nums => nums.foldLeft(num.e){
      (x,y) => Operation("*",x,y)
    }
  }

  def divide = num ~ rep("/" ~> num) ^^ {
    case num ~ nums => nums.foldLeft(num.e){
      (x,y) => Operation("/",x,y)
    }
  }
}

object MyParser {
  sealed trait Expr {
    def e = this.asInstanceOf[Expr]
    def compute: Double = this match {
      case Value(x) => x
      case Operation(op,left,right) => (op : @unchecked) match {
        case "+" => left.compute + right.compute
        case "-" => left.compute - right.compute
        case "*" => left.compute * right.compute
        case "/" => left.compute / right.compute
      }
    }
  }

  case class Value(x: Double) extends Expr
  case class Operation(op: String,left: Expr,right: Expr) extends Expr
}

I use it to parse an expression:

package app

object Runner extends App {
  val p = new MyParser
  println(p.parseAll(p.expr,"1 * 11"))
}

This prints:

[1.3] failure: end of input expected

1 * 11
  ^

But if I parse the expression 1 + 11, it parses successfully:

[1.7] parsed: Operation(+,Value(1.0),Value(11.0))

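Each of the plus, multi, divide, num and sub combinators parses its own input when I use it directly, but expr only ever parses input that matches the first alternative of the | chain. Why does expr only handle its first alternative, and how can I change the parser definition so that parsing succeeds?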

Solution

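The problem is that you use rep, which matches zero or more times. Because zero repetitions are allowed, plus already succeeds on the single number 1, so the remaining alternatives are never tried.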

def rep[T](p: => Parser[T]): Parser[List[T]] = rep1(p) | success(List())

You need to use rep1 instead, which requires at least one match.

If you replace every rep with rep1, your code works.

See the changes on Scastie.
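For readers without the Scastie link, here is a minimal sketch of what that change could look like. The `chain` helper and the class name `Rep1Parser` are hypothetical names introduced for brevity, the AST is moved to the top level, and the `//> using dep` line is a scala-cli directive whose version is an assumption (with sbt, declare the dependency in the build instead).

```scala
//> using dep org.scala-lang.modules::scala-parser-combinators:2.3.0
import scala.util.parsing.combinator._

sealed trait Expr
case class Value(x: Double) extends Expr
case class Operation(op: String, left: Expr, right: Expr) extends Expr

class Rep1Parser extends JavaTokenParsers {
  // Same alternative order as in the question.
  def expr: Parser[Expr] = plus | sub | multi | divide | num

  def num: Parser[Expr] = floatingPointNumber ^^ (x => Value(x.toDouble))

  // Hypothetical helper: a number followed by AT LEAST ONE "op number".
  // With rep1, the parser fails when the operator is missing, so `|`
  // moves on to the next alternative.
  private def chain(op: String): Parser[Expr] =
    num ~ rep1(literal(op) ~> num) ^^ {
      case first ~ rest => rest.foldLeft(first)((l, r) => Operation(op, l, r))
    }

  def plus: Parser[Expr]   = chain("+")
  def sub: Parser[Expr]    = chain("-")
  def multi: Parser[Expr]  = chain("*")
  def divide: Parser[Expr] = chain("/")
}
```

Note that this only covers inputs with a single kind of operator. An input such as 1 + 2 * 3 is still rejected by parseAll, because plus succeeds on 1 + 2 and | does not go back to try another alternative afterwards.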

Run an experiment with the original definition:

println(p.parseAll(p.expr,"1 + 11"))
println(p.parseAll(p.expr,"1 - 11"))
println(p.parseAll(p.expr,"1 * 11"))
println(p.parseAll(p.expr,"1 / 11"))

What happens?

[1.7] parsed: Operation(+,Value(1.0),Value(11.0))
[1.3] failure: end of input expected
1 - 11
  ^
[1.3] failure: end of input expected
1 * 11
  ^
[1.3] failure: end of input expected
1 / 11
  ^

The + case is parsed, but every other case fails. Let's change the definition of expr:

  def expr =
    multi | plus | sub | divide | num

[1.3] failure: end of input expected
1 + 11
  ^
[1.3] failure: end of input expected
1 - 11
  ^
[1.7] parsed: Operation(*,Value(1.0),Value(11.0))
[1.3] failure: end of input expected
1 / 11
  ^

With multi moved to the front, the * case passes, but + now fails.

  def expr =
    num | multi | plus | sub | divide

[1.3] failure: end of input expected
1 + 11
  ^
[1.3] failure: end of input expected
1 - 11
  ^
[1.3] failure: end of input expected
1 * 11
  ^
[1.3] failure: end of input expected
1 / 11
  ^

With num as the first alternative, everything fails. Now it is clear that this code

num | multi | plus | sub | divide

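does not pick whichever alternative matches the whole input. The result of the first alternative that succeeds is final, even if that alternative consumes only part of the input.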
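The following sketch makes that behaviour visible. It uses parse, which accepts a match on a prefix of the input, to show that a rep-based plus succeeds on 1 * 11 after reading only the leading 1. It also contrasts | with the library's ||| combinator, which tries both sides and keeps the longer match. The names `ChoiceDemo`, `ordered` and `longest` are hypothetical, the parsers compute a Double directly to keep the example short, and the `//> using dep` version is an assumption.

```scala
//> using dep org.scala-lang.modules::scala-parser-combinators:2.3.0
import scala.util.parsing.combinator._

class ChoiceDemo extends JavaTokenParsers {
  def num: Parser[Double] = floatingPointNumber ^^ (_.toDouble)

  // Same shape as in the question: rep accepts zero repetitions.
  def plus: Parser[Double] =
    num ~ rep("+" ~> num) ^^ { case n ~ ns => ns.foldLeft(n)(_ + _) }
  def multi: Parser[Double] =
    num ~ rep("*" ~> num) ^^ { case n ~ ns => ns.foldLeft(n)(_ * _) }

  def ordered: Parser[Double] = plus | multi   // first success wins
  def longest: Parser[Double] = plus ||| multi // longest match wins
}

object ChoiceDemoApp {
  def main(args: Array[String]): Unit = {
    val p = new ChoiceDemo
    // plus succeeds on its own, but stops before the `*`.
    val prefix = p.parse(p.plus, "1 * 11")
    println(s"plus succeeded: ${prefix.successful}, stopped at offset ${prefix.next.offset}")
    // parseAll then rejects the leftover input; multi is never tried.
    println(p.parseAll(p.ordered, "1 * 11"))
    // With |||, multi wins because it consumes more input.
    println(p.parseAll(p.longest, "1 * 11"))
  }
}
```

Using ||| removes the dependence on the order of the alternatives, at the cost of running both parsers at every choice point.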

What does the documentation say about this?

   /** A parser combinator for alternative composition.
     *
     *  `p | q` succeeds if `p` succeeds or `q` succeeds.
     *   Note that `q` is only tried if `p`s failure is non-fatal (i.e.,back-tracking is allowed).
     *
     * @param q a parser that will be executed if `p` (this parser) fails (and allows back-tracking)
     * @return a `Parser` that returns the result of the first parser to succeed (out of `p` and `q`)
     *         The resulting parser succeeds if (and only if)
     *         - `p` succeeds,''or''
     *         - if `p` fails allowing back-tracking and `q` succeeds.
     */
  def | [U >: T](q: => Parser[U]): Parser[U] = append(q).named("|")

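The important point: backtracking must be allowed. If it is not, a failure of the first parser makes the whole alternative fail, and the second parser is never tried.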
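As a small illustration of the fatal versus non-fatal distinction in that comment, the sketch below uses the library's commit combinator, which turns a failure into an error. The object name `BacktrackDemo` and the parser names `soft` and `hard` are hypothetical, and the `//> using dep` version is an assumption.

```scala
//> using dep org.scala-lang.modules::scala-parser-combinators:2.3.0
import scala.util.parsing.combinator._

object BacktrackDemo extends RegexParsers {
  // A plain failure is non-fatal, so `|` falls through to the second alternative.
  def soft: Parser[String] = literal("a") | literal("b")

  // commit turns the failure of "a" into an Error, which is fatal:
  // `|` returns it as is and never tries "b".
  def hard: Parser[String] = commit(literal("a")) | literal("b")

  def main(args: Array[String]): Unit = {
    println(parseAll(soft, "b")) // second alternative is used
    println(parseAll(hard, "b")) // error from the first alternative
  }
}
```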

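How do you get a parser that backtracks? One option in the library is PackratParsers, which implements backtracking recursive-descent parsing with memoization. The other is to rewrite your code so that it does not rely on backtracking in the first place.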
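One way to avoid relying on backtracking is to drop the one-parser-per-operator alternatives and layer the grammar instead. The following is a minimal sketch under that assumption, not the original author's code: `LayeredParser` and the `chain` helper are hypothetical names, the AST is moved to the top level, and the `//> using dep` version is an assumption.

```scala
//> using dep org.scala-lang.modules::scala-parser-combinators:2.3.0
import scala.util.parsing.combinator._

sealed trait Expr
case class Value(x: Double) extends Expr
case class Operation(op: String, left: Expr, right: Expr) extends Expr

class LayeredParser extends JavaTokenParsers {
  // expr handles + and -, term handles * and /. The operator that actually
  // appears in the input selects the branch, so nothing has to be retried.
  def expr: Parser[Expr] = chain(term, literal("+") | literal("-"))
  def term: Parser[Expr] = chain(num, literal("*") | literal("/"))
  def num: Parser[Expr]  = floatingPointNumber ^^ (x => Value(x.toDouble))

  // Hypothetical helper: operand (op operand)*, folded left-associatively.
  private def chain(operand: Parser[Expr], op: Parser[String]): Parser[Expr] =
    operand ~ rep(op ~ operand) ^^ {
      case first ~ rest =>
        rest.foldLeft(first) { case (left, o ~ right) => Operation(o, left, right) }
    }
}
```

Here rep is fine, because a bare number is a legitimate term. As a side effect, the layering gives * and / higher precedence than + and -, so mixed input such as 1 + 2 * 3 parses as well.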

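Personally, I would recommend not using Scala Parser Combinators at all, and instead using a library that lets you decide explicitly where backtracking is still allowed and where it is not, such as fastparse.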
