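JS regex to match text between the same character pairs

Problem description

I am currently trying to convert wikitext tables to HTML. (Parsoid is not an option.)

Tables are written in the format below. I want to use a regex on the markup for speed, but I need a way to capture the text between recurring search terms.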

{| class=\"wikitable\"
|-
|'''Ruler'''
|'''Stopwatch'''
|'''Magnifying Glass'''
|-
|[[File:Ruler30cmDiagonal.png|center|200px]]
|[[File:Stopwatch.png|center|200px]]
|[[File:MagnifyingGlass.png|center|200px]]
|-
|A ruler is a piece of '''equipment''' used to measure length.
|A scientist came '''equip''' with a [[stopwatch]].
|A magnifying glass is a useful piece of '''equipment''' for looking at very small things.
|}

From the table above I need to match the text between the "|-" substrings, with the last match ending at "|}".

So the matches would be

|'''Ruler'''
|'''Stopwatch'''
|'''Magnifying Glass'''

|A ruler is a piece of '''equipment''' used to measure length.
|A scientist came '''equip''' with a [[stopwatch]].
|A magnifying glass is a useful piece of '''equipment''' for looking at very small things.

|[[File:Ruler30cmDiagonal.png|center|200px]]
|[[File:Stopwatch.png|center|200px]]
|[[File:MagnifyingGlass.png|center|200px]]

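As you can see, a lone "|" cannot serve as the delimiter, which complicates things: the matching has to be done on character pairs. (I also need to match on '\n|' in a later match/replace call.)

After spending several hours on this, I know I need lookahead and lookbehind (with an alternation for |- and }). I thought /((?=(\|\-))[.]*)(?!(\|\-|\|\}))/mg was the most likely candidate, but no joy.

Any suggestions?

Solution

I think regular expressions are a good fit for this task. As an added bonus, they are much faster than a Lex and Yacc approach. The code below renders the wiki text as HTML using several regular expressions: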

let input = `{| class=\"wikitable\"
|-
|'''Ruler'''
|'''Stopwatch'''
|'''Magnifying Glass'''
|-
|[[File:Ruler30cmDiagonal.png|center|200px]]
|[[File:Stopwatch.png|center|200px]]
|[[File:MagnifyingGlass.png|center|200px]]
|-
|A ruler is a piece of '''equipment''' used to measure length.
|A scientist came '''equip''' with a [[stopwatch]].
|A magnifying glass is a useful piece of '''equipment''' for looking at very small things.
|}`;

let classAttr = '';
let html = '<table>\n  ' + input
  .split(/[\r\n]+\|[\-\}]/)
  .filter((row,idx) => {
    if(idx === 0) {
      // class row on first line
      let m = row.match(/class=.?"([a-zA-Z_\- ]+)/);
      if(m) {
        // save the table class attribute for later use
        classAttr = ' class="' + m[1] + '"';
      }
      return false;
    } else if(row.length) {
      return true;
    }
    return false; // remove empty rows
  })
  .map((row) => {
    row = row
      .split(/[\r\n]+\|/)
      .slice(1) // drop the first (empty) item, which is not a cell
      .map((cell) => {
        cell = '\n    <td> '
          + cell // do additional cell rendering as needed
          + ' </td>';
        return cell;
      })
      .join('');
    return '<tr>' + row + '\n  </tr>';
  })
  .join('\n  ') + '\n</table>';
// insert the table class attribute (if any)
html = html.replace(/(?<=<table)/,classAttr);

console.log(html);

Result:

<table class="wikitable">
  <tr>
    <td> '''Ruler''' </td>
    <td> '''Stopwatch''' </td>
    <td> '''Magnifying Glass''' </td>
  </tr>
  <tr>
    <td> [[File:Ruler30cmDiagonal.png|center|200px]] </td>
    <td> [[File:Stopwatch.png|center|200px]] </td>
    <td> [[File:MagnifyingGlass.png|center|200px]] </td>
  </tr>
  <tr>
    <td> A ruler is a piece of '''equipment''' used to measure length. </td>
    <td> A scientist came '''equip''' with a [[stopwatch]]. </td>
    <td> A magnifying glass is a useful piece of '''equipment''' for looking at very small things. </td>
  </tr>
</table>
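
If you would rather capture the row bodies directly with one regex, as the question originally asked, a lazy capture followed by a lookahead is one way to do it. The following is only a sketch: extractRows and extractCells are hypothetical helper names, and it assumes every row starts with a "|-" line, is non-empty, and that the table is closed by "|}". The attempt in the question fails mainly because [.]* matches literal dots rather than any character, and because the leading lookahead never consumes the "|-" marker.

```javascript
// Hypothetical helper: returns the raw text of each table row.
function extractRows(wikitext) {
  // Match a "|-" line, then lazily capture everything up to (but not
  // including) the newline before the next "|-" or the closing "|}".
  const rowPattern = /^\|-[^\r\n]*\r?\n([\s\S]*?)(?=\r?\n\|[-}])/gm;
  return Array.from(wikitext.matchAll(rowPattern), (m) => m[1]);
}

// Hypothetical helper: splits one row body into its cell texts.
function extractCells(row) {
  // Every cell starts with "|" at the beginning of a line, so split on
  // '\n|' and strip the leading "|" that remains on the first cell.
  return row
    .split(/\r?\n\|/)
    .map((cell, idx) => (idx === 0 ? cell.replace(/^\|/, '') : cell));
}

const table = [
  '{| class="wikitable"',
  '|-',
  "|'''Ruler'''",
  "|'''Stopwatch'''",
  '|-',
  '|A ruler is used to measure length.',
  '|A scientist came with a [[stopwatch]].',
  '|}',
].join('\n');

console.log(extractRows(table).map(extractCells));
```

Because the lookahead does not consume the next "|-", that marker is still available as the start of the following match, which is what makes the same delimiter usable on both sides of a row.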

See the // do additional cell rendering as needed comment; that is where you can handle further rendering concerns such as bold text and links.
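
As an illustration of what that cell rendering step might look like, here is a minimal sketch. renderCell and escapeHtml are hypothetical helpers, not part of the code above, and the src and href values are placeholders: how file names and page titles map to real URLs depends on your wiki, so treat those as assumptions. Image options such as center or 200px are simply ignored here.

```javascript
// Hypothetical helper: escape characters that are special in HTML.
// Single quotes are left alone so the ''' bold markers survive.
function escapeHtml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Hypothetical helper: render a small subset of wikitext inside one cell.
function renderCell(cell) {
  return escapeHtml(cell)
    // [[File:Name.png|option|...]] -> <img>; options after the name are dropped.
    // Assumption: the file name is used directly as the image source.
    .replace(/\[\[File:([^\]|]+)(?:\|[^\]]*)?\]\]/g, '<img src="$1" alt="$1">')
    // [[target|label]] or [[target]] -> <a>.
    // Assumption: the page title is used directly as the link target.
    .replace(/\[\[([^\]|]+)(?:\|([^\]]+))?\]\]/g,
      (match, target, label) => '<a href="' + target + '">' + (label || target) + '</a>')
    // '''bold''' -> <b>bold</b>
    .replace(/'''(.+?)'''/g, '<b>$1</b>');
}

console.log(renderCell("A scientist came '''equip''' with a [[stopwatch]]."));
console.log(renderCell('[[File:Stopwatch.png|center|200px]]'));
```

The File: pattern has to run before the generic link pattern, otherwise images would be turned into ordinary links. In the answer's code, the call would replace the bare cell in the '<td> ' + cell + ' </td>' concatenation.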