Parsing a single-quoted string with backslash-escaped single quotes using nom

Problem description

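This is a variation of "Parsing single-quoted string with escaped quotes with Nom 5" and "Parse string with escaped single quotes". I want to parse strings like '1 \' 2 \ 3 \\ 4' (a raw sequence of characters) into "1 \\' 2 \\ 3 \\\\ 4" (a Rust string), so I am not concerned with any escaping other than the possibility of \' inside the string. Attempts using code from the linked questions: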

use nom::{
  branch::alt,
  bytes::complete::{escaped, tag},
  character::complete::none_of,
  combinator::recognize,
  multi::{many0, separated_list0},
  sequence::delimited,
  IResult,
};

fn parse_quoted_1(input: &str) -> IResult<&str, &str> {
  delimited(
    tag("'"),
    alt((escaped(none_of("\\\'"), '\\', tag("'")), tag(""))),
    tag("'"),
  )(input)
}

fn parse_quoted_2(input: &str) -> IResult<&str, &str> {
  delimited(
    tag("'"),
    recognize(separated_list0(tag("\\'"), many0(none_of("'")))),
    tag("'"),
  )(input)
}

fn main() {
  println!("{:?}", parse_quoted_1(r#"'1'"#));
  println!("{:?}", parse_quoted_2(r#"'1'"#));
  println!("{:?}", parse_quoted_1(r#"'1 \' 2'"#));
  println!("{:?}", parse_quoted_2(r#"'1 \' 2'"#));
  println!("{:?}", parse_quoted_1(r#"'1 \' 2 \ 3'"#));
  println!("{:?}", parse_quoted_2(r#"'1 \' 2 \ 3'"#));
  println!("{:?}", parse_quoted_1(r#"'1 \' 2 \ 3 \\ 4'"#));
  println!("{:?}", parse_quoted_2(r#"'1 \' 2 \ 3 \\ 4'"#));
}

/*
Ok(("", "1"))
Ok(("", "1"))
Ok(("", "1 \\' 2"))
Ok((" 2'", "1 \\"))
Err(Error(Error { input: "1 \\' 2 \\ 3'", code: Tag }))
Ok((" 2 \\ 3'", "1 \\"))
Err(Error(Error { input: "1 \\' 2 \\ 3 \\\\ 4'", code: Tag }))
Ok((" 2 \\ 3 \\\\ 4'", "1 \\"))
*/

Only the first 3 cases work as expected.

Solution

A not-so-nice, imperative solution:

use nom::{bytes::complete::take, character::complete::char, sequence::delimited, IResult};

fn parse_quoted(input: &str) -> IResult<&str, &str> {
  // Takes everything up to the first single quote not preceded by a backslash.
  fn escaped(input: &str) -> IResult<&str, &str> {
    let mut pc = '\0'; // previous character
    let mut n = 0; // number of characters to take
    for (i, c) in input.chars().enumerate() {
      if c == '\'' && pc != '\\' {
        break;
      }
      pc = c;
      n = i + 1;
    }
    take(n)(input)
  }
  delimited(char('\''), escaped, char('\''))(input)
}

fn main() {
  println!("{:?}", parse_quoted(r#"'' ..."#));
  println!("{:?}", parse_quoted(r#"'1' ..."#));
  println!("{:?}", parse_quoted(r#"'1 \' 2' ..."#));
  println!("{:?}", parse_quoted(r#"'1 \' 2 \ 3' ..."#));
  println!("{:?}", parse_quoted(r#"'1 \' 2 \ 3 \\ 4' ..."#));
}

/*
Ok((" ...", ""))
Ok((" ...", "1"))
Ok((" ...", "1 \\' 2"))
Ok((" ...", "1 \\' 2 \\ 3"))
Ok((" ...", "1 \\' 2 \\ 3 \\\\ 4"))
*/

To also allow '...\\' (a string ending in an escaped backslash), we can similarly keep track of more of the preceding characters:

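    let mut pc = '\0'; // previous character
    let mut ppc = '\0'; // character before pc
    let mut pppc = '\0'; // character before ppc
    let mut n = 0;
    for (i, c) in input.chars().enumerate() {
      // Stop at a quote that is either not preceded by a backslash,
      // or preceded by exactly two backslashes (an escaped backslash).
      if (c == '\'' && pc != '\\') || (c == '\'' && pc == '\\' && ppc == '\\' && pppc != '\\') {
        break;
      }
      pppc = ppc;
      ppc = pc;
      pc = c;
      n = i + 1;
    }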
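
The fixed lookback above covers one escaped backslash before the closing quote, but a longer run such as four backslashes followed by a quote would still be misread. One possible generalization is sketched below, under the assumption that a backslash always escapes whatever character follows it: a single boolean flag records whether the current character is escaped, which works for any number of consecutive backslashes. The helper name unescaped_len is hypothetical and the sketch uses only the standard library. It returns a character count, so the result could stand in for n before the call to take(n).

```rust
/// Hypothetical helper (not part of nom): returns the number of characters
/// before the first unescaped single quote. Assumes that a backslash escapes
/// the character that follows it, whatever that character is.
fn unescaped_len(input: &str) -> usize {
    let mut escaped = false; // true if the previous char was an active backslash
    let mut n = 0;
    for (i, c) in input.chars().enumerate() {
        if c == '\'' && !escaped {
            break;
        }
        // A backslash starts an escape unless it is itself escaped.
        escaped = c == '\\' && !escaped;
        n = i + 1;
    }
    n
}

fn main() {
    // Inputs are what follows the opening quote.
    assert_eq!(unescaped_len(r"' ..."), 0); // empty string
    assert_eq!(unescaped_len(r"1 \' 2' ..."), 6); // escaped quote is skipped
    assert_eq!(unescaped_len(r"1 \\' ..."), 4); // escaped backslash, then end
    assert_eq!(unescaped_len(r"1 \\\\' ..."), 6); // two escaped backslashes
    assert_eq!(unescaped_len(r"1 \' 2 \ 3 \\ 4' ..."), 15);
    println!("all cases passed");
}
```

A backslash followed by an ordinary character, as in \ 3, simply marks that character as escaped and has no effect on where the string ends, so the returned slice still contains the raw, unprocessed characters.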