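Problem description
With the help of some plugins, I obtained a .bib file containing information about scientific articles. Sometimes the same key turns out to appear in different records.
For example: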
@inproceedings{Hosseini_2016,
  doi = {10.1109/ism.2016.0028},
  url = {https://doi.org/10.1109%2Fism.2016.0028},
  year = 2016,
  month = {dec},
  publisher = {{IEEE}},
  author = {Mohammad Hosseini and Viswanathan Swaminathan},
  title = {Adaptive 360 {VR} Video Streaming: Divide and Conquer},
  booktitle = {2016 {IEEE} International Symposium on Multimedia ({ISM})}
}
@inproceedings{Hosseini_2016,
  doi = {10.1109/ism.2016.0093},
  url = {https://doi.org/10.1109%2Fism.2016.0093},
  title = {Adaptive 360 {VR} Video Streaming Based on {MPEG}-{DASH} {SRD}},
  booktitle = {2016 {IEEE} International Symposium on Multimedia ({ISM})}
}
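I am using the pybtex library to parse the file. The library ignores duplicate entries that share a key, so before using it I need to preprocess the file so that every key in it is different. How can I do that?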
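Before rewriting anything, it can help to see which keys actually collide. The following is a minimal sketch using only the standard library. The function name find_duplicate_keys and the regular expression are illustrative assumptions, not part of pybtex, and the pattern only handles ordinary entries of the form @type{key, ...}.

```python
import re
from collections import Counter

# Assumed pattern: "@type{key," with optional whitespace around the key.
ENTRY_KEY = re.compile(r"@(\w+)\s*\{\s*([^,{}\s]+)\s*,")


def find_duplicate_keys(bibtex_text):
    """Return a dict mapping each repeated key to how often it occurs."""
    counts = Counter(m.group(2) for m in ENTRY_KEY.finditer(bibtex_text))
    return {key: n for key, n in counts.items() if n > 1}


sample = """
@inproceedings{Hosseini_2016, title = {A}}
@inproceedings{Hosseini_2016, title = {B}}
@article{Other_2017, title = {C}}
"""
print(find_duplicate_keys(sample))  # {'Hosseini_2016': 2}
```

If the result is empty, the file presumably needs no preprocessing at all.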
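Solution
I decided to use a regular expression. There may be a more convenient solution. I simply replaced every key with a random ID generated by nanoid.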
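import re

from nanoid import generate

# Matches the start of an entry, e.g. "@inproceedings{Hosseini_2016,"
ENTRY_START = re.compile(r"@(\w+)\s*\{\s*([^,{}\s]+)\s*,")

def process_bibtex(fn):
    """Rewrite the file in place, replacing every key with a fresh nanoid."""
    with open(fn, encoding="utf-8") as r_file:
        bibtex = r_file.read()

    def callback(matchobj):
        # Keep the entry type (group 1) and swap the key for a random ID
        return f"@{matchobj.group(1)}{{{generate()},"

    with open(fn, "w", encoding="utf-8") as w_file:
        w_file.write(ENTRY_START.sub(callback, bibtex))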
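Replacing every key with a random ID also changes keys that were already unique, which breaks any existing citations that refer to them. As a possible alternative, here is a sketch that keeps the first occurrence of each key and only renames the repeats by appending a numeric suffix. The function name dedupe_keys and the _2, _3 suffix scheme are assumptions made for illustration. It works on a string rather than a file and, like the pattern above, assumes ordinary @type{key, ...} entries.

```python
import re

# Assumed pattern: "@type{key," with optional whitespace around the key.
ENTRY_START = re.compile(r"@(\w+)\s*\{\s*([^,{}\s]+)\s*,")


def dedupe_keys(bibtex_text):
    """Return the text with repeated keys renamed to key_2, key_3, ..."""
    seen = {}  # how many times each original key has appeared so far
    # Every key present in the input, so new names never collide with them.
    used = {m.group(2) for m in ENTRY_START.finditer(bibtex_text)}

    def callback(m):
        entry_type, key = m.group(1), m.group(2)
        seen[key] = seen.get(key, 0) + 1
        if seen[key] == 1:
            return m.group(0)  # first occurrence stays untouched
        n = seen[key]
        new_key = f"{key}_{n}"
        while new_key in used:  # skip suffixes that are already taken
            n += 1
            new_key = f"{key}_{n}"
        used.add(new_key)
        return f"@{entry_type}{{{new_key},"

    return ENTRY_START.sub(callback, bibtex_text)


sample = "@a{k, x = {1}}\n@b{k, x = {2}}\n@c{k, x = {3}}\n"
print(dedupe_keys(sample))
```

On the sample, the first entry keeps the key k and the other two become k_2 and k_3. The result can then be written back to the file or passed to the parser.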