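Problem description
I'm currently trying to convert wikitext tables to HTML. (Parsoid is not an option.)
The tables are written in the format below. I'd like to process the markup with regular expressions for speed, but I need a way to capture the text between recurring delimiters.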
{| class="wikitable"
|-
|'''Ruler'''
|'''Stopwatch'''
|'''Magnifying Glass'''
|-
|[[File:Ruler30cmDiagonal.png|center|200px]]
|[[File:Stopwatch.png|center|200px]]
|[[File:MagnifyingGlass.png|center|200px]]
|-
|A ruler is a piece of '''equipment''' used to measure length.
|A scientist came '''equip''' with a [[stopwatch]].
|A magnifying glass is a useful piece of '''equipment''' for looking at very small things.
|}
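From the table above, I need to match the text between the "|-" substrings, with the last match ending at "|}".
So the matches would be: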
|'''Ruler'''
|'''Stopwatch'''
|'''Magnifying Glass'''
and
|A ruler is a piece of '''equipment''' used to measure length.
|A scientist came '''equip''' with a [[stopwatch]].
|A magnifying glass is a useful piece of '''equipment''' for looking at very small things.
and
|[[File:Ruler30cmDiagonal.png|center|200px]]
|[[File:Stopwatch.png|center|200px]]
|[[File:MagnifyingGlass.png|center|200px]]
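As you can see, a single "|" is not a usable delimiter on its own, which complicates things: matching has to be done on character pairs ("|-", "|}"). (In later match/replace calls I will also need to match on '\n|'.)
I've spent hours on this. I know I need a lookahead and a lookbehind (with an alternation for "|-" and "|}"). I thought /((?=(\|\-))[.]*)(?!(\|\-|\|\}))/mg
was the most likely candidate, but no joy.
Any suggestions?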
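For reference, the attempt above has two problems: [.] is a character class that matches only a literal period, and even an unbracketed . does not cross line breaks without the s flag, so the pattern can never capture a multi-line row. A minimal sketch of a working capture uses a lazy [\s\S]*? followed by a lookahead, with no lookbehind needed. It assumes every "|-" and "|}" starts its own line (a cell whose text begins with "-" or "}" would be misread as a delimiter); captureRows is a hypothetical name.

```javascript
// Sketch: capture the body of each row between a "|-" line and the next
// "|-" or "|}" line. Assumes both delimiters always start their own line.
const rowPattern = /^\|-\r?\n([\s\S]*?)(?=\r?\n\|[-}])/gm;

// Hypothetical helper: returns the captured row bodies as strings.
function captureRows(wikitext) {
  // matchAll requires the g flag; m[1] is the lazily captured row body
  return Array.from(wikitext.matchAll(rowPattern), (m) => m[1]);
}

const table = `{| class="wikitable"
|-
|'''Ruler'''
|'''Stopwatch'''
|'''Magnifying Glass'''
|-
|[[File:Ruler30cmDiagonal.png|center|200px]]
|[[File:Stopwatch.png|center|200px]]
|[[File:MagnifyingGlass.png|center|200px]]
|-
|A ruler is a piece of '''equipment''' used to measure length.
|A scientist came '''equip''' with a [[stopwatch]].
|A magnifying glass is a useful piece of '''equipment''' for looking at very small things.
|}`;

const rows = captureRows(table);

// Each row can then be split into cells on "\n|"; the "|" inside
// [[File:...|center|200px]] is left alone because it is not after a newline.
const cells = rows.map((row) => row.replace(/^\|/, "").split(/\r?\n\|/));
console.log(cells);
```

The lookahead stops each capture just before the next delimiter line without consuming it, so the following "|-" is still available to start the next match.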
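Solution
I think regular expressions are well suited to this task. As a bonus, this should be much faster than a Lex and Yacc approach. The code below renders the wikitext table as HTML using several regular expressions: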
let input = `{| class=\"wikitable\"
|-
|'''Ruler'''
|'''Stopwatch'''
|'''Magnifying Glass'''
|-
|[[File:Ruler30cmDiagonal.png|center|200px]]
|[[File:Stopwatch.png|center|200px]]
|[[File:MagnifyingGlass.png|center|200px]]
|-
|A ruler is a piece of '''equipment''' used to measure length.
|A scientist came '''equip''' with a [[stopwatch]].
|A magnifying glass is a useful piece of '''equipment''' for looking at very small things.
|}`;
let classAttr = '';
let html = '<table>\n ' + input
.split(/[\r\n]+\|[\-\}]/)
.filter((row,idx) => {
if(idx === 0) {
// class row on first line
let m = row.match(/class=.?"([a-zA-Z_\- ]+)/);
if(m) {
// save the table class attribute for later use
classAttr = ' class="' + m[1] + '"';
}
return false;
} else if(row.length) {
return true;
}
return false; // remove empty rows
})
.map((row) => {
row = row
.split(/[\r\n]+\|/)
.filter((row,idx) => {
if(idx === 0) {
return false; // remove the first (empty) item; it is not a cell
}
return true;
})
.map((cell) => {
cell = '\n <td> '
+ cell // do additional cell rendering as needed
+ ' </td>';
return cell;
})
.join('');
return '<tr>' + row + '\n </tr>';
})
.join('\n ') + '\n</table>';
// insert the table class attribute (if any); the (?<=...) lookbehind needs an ES2018+ engine
html = html.replace(/(?<=<table)/,classAttr);
console.log(html);
Result:
<table class="wikitable">
<tr>
<td> '''Ruler''' </td>
<td> '''Stopwatch''' </td>
<td> '''Magnifying Glass''' </td>
</tr>
<tr>
<td> [[File:Ruler30cmDiagonal.png|center|200px]] </td>
<td> [[File:Stopwatch.png|center|200px]] </td>
<td> [[File:MagnifyingGlass.png|center|200px]] </td>
</tr>
<tr>
<td> A ruler is a piece of '''equipment''' used to measure length. </td>
<td> A scientist came '''equip''' with a [[stopwatch]]. </td>
<td> A magnifying glass is a useful piece of '''equipment''' for looking at very small things. </td>
</tr>
</table>
See the // do additional cell rendering as needed
comment: that is where you can handle other rendering concerns, such as bold text and links.
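As a hypothetical example of what could go at that point, a small renderCell helper (an illustrative name, not part of the answer above) might convert '''bold''', ''italic'', [[links]] and [[File:...]] images. This is a rough sketch, not MediaWiki's renderer: the /wiki/ link prefix and the use of the bare file name as the image src are assumptions, and the cell text is not HTML-escaped.

```javascript
// Hypothetical helper: a rough approximation of a few inline wikitext
// constructs, not MediaWiki's actual renderer.
function renderCell(text) {
  return text
    // [[File:Name.png|opt|...|200px]] -> <img>; only an NNNpx width is used.
    // Using the bare file name as src is an assumption; MediaWiki resolves real URLs.
    .replace(/\[\[File:([^|\]]+)((?:\|[^\]]*)?)\]\]/g, (_, file, opts) => {
      const width = /\|(\d+)px/.exec(opts);
      return `<img src="${file}"${width ? ` width="${width[1]}"` : ""}>`;
    })
    // [[Target]] or [[Target|label]] -> <a>; the /wiki/ prefix is assumed
    .replace(/\[\[([^|\]]+)(?:\|([^\]]+))?\]\]/g, (_, target, label) =>
      `<a href="/wiki/${encodeURIComponent(target)}">${label ?? target}</a>`)
    // Bold ('''x''') must be replaced before italic (''x'')
    .replace(/'''(.+?)'''/g, "<b>$1</b>")
    .replace(/''(.+?)''/g, "<i>$1</i>");
}

console.log(renderCell("A scientist came '''equip''' with a [[stopwatch]]."));
// A scientist came <b>equip</b> with a <a href="/wiki/stopwatch">stopwatch</a>.
console.log(renderCell("[[File:Stopwatch.png|center|200px]]"));
// <img src="Stopwatch.png" width="200">
```

In the answer's code this would be wired in as + renderCell(cell) in place of + cell. File links are handled before ordinary links so that their |center|200px options are not mistaken for a link label.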