Lexical analyzer classifies every line of code read from an external txt file as an identifier instead of splitting it into tokens

Problem description

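I am writing a lexical analyzer that should recognize identifiers, operators, integers, and data types in code (text) read from an external txt file. Instead of recognizing the input token by token, however, it classifies every whole line as an identifier.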

[Image: output of the Python lexical analyzer code]

**Python code for a small lexical analyzer**

```python
import re

tokens = []
sample_code = open("file.txt", "r")

for word in sample_code:

    if re.match("[a-z]", word) or re.match("[A-Z]", word):
        tokens.append(['IDENTIFIER', word])

    elif re.match(".[0-9]", word):
        if word[len(word) - 1] == ';':
            tokens.append(["INTEGER", word[:-1]])
            tokens.append(['END_STATEMENT', ';'])
        else:
            tokens.append(["INTEGER", word])

    elif word in ['str', 'int', 'bool']:
        tokens.append(['DATATYPE', word])

    elif word in '*-/+%=':
        tokens.append(['OPERATOR', word])

print(tokens, '\n')
```

The output is shown in the screenshot.

Text (code) in file.txt:

```
#Pythonprogramtofindthefactorialofanumberprovidedbytheuser.
num=7
factorial=1
# starts
ifnum<0:
print("Sorry,factorialdoesnotexistfornegativenumbers")
elifnum==0:
print("Thefactorialof0is1")
else:
foriinrange(1,num+1):
factorial=factorial*i
print("Thefactorialof",num,"is",factorial)
```

Solution

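You are iterating over the file one line at a time, when you should be iterating over it one token at a time. To get individual tokens, first call the `.read` method on the file object returned by `open` to get its contents as a single string, then call `.split` to break that string on whitespace: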

```python
sample_code = open("file.txt", "r").read().split()
```
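As a minimal sketch of what this produces, the snippet below uses an inline string in place of `file.txt` (an assumption, so it runs on its own). `str.split()` with no arguments breaks only on whitespace, so it yields one token per element only if the source puts spaces between tokens; `num=7` stays a single chunk. If the code is not spaced out, one option (not part of the original answer) is `re.findall` with an alternation of token patterns; `TOKEN_PATTERN` is a hypothetical example covering identifiers, integers, and single-character operators.

```python
import re

# Stand-in for open("file.txt").read(); assumes spaces between tokens.
source = "num = 7\nfactorial = 1\n"

# With no argument, str.split() splits on any run of whitespace (spaces, tabs, newlines).
words = source.split()
print(words)  # ['num', '=', '7', 'factorial', '=', '1']

# Without spaces, split() leaves "num=7" as a single chunk.
print("num=7".split())  # ['num=7']

# Hypothetical pattern: an identifier, an integer, or one operator/punctuation character.
TOKEN_PATTERN = r"[A-Za-z_][A-Za-z0-9_]*|[0-9]+|[*\-/+%=;<>():,]"
regex_tokens = re.findall(TOKEN_PATTERN, "num=7;")
print(regex_tokens)  # ['num', '=', '7', ';']
```

`re.findall` scans left to right and returns every non-overlapping match, so characters that match none of the alternatives (such as spaces) are simply skipped.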

There are also some errors in the regular expressions.

To match a run of letters, use the regular expression "[a-zA-Z]+". For a run of digits, use "[0-9]+" (this actually allows numbers with a leading zero, so you may prefer "([1-9][0-9]*)|0").
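A minimal sketch of a per-token classifier built on these patterns follows. It assumes the input has already been split into tokens; `classify`, `tokenize`, and the `UNKNOWN` label are hypothetical names. Two details matter beyond the patterns themselves: `re.match` only checks that the start of the string matches, so `re.match("[a-zA-Z]+", "num=7")` still succeeds, whereas `re.fullmatch` requires the whole token to match; and the data-type check has to come before the identifier check, because `int` also matches `[a-zA-Z]+`.

```python
import re

DATATYPES = {"str", "int", "bool"}
OPERATORS = {"*", "-", "/", "+", "%", "="}

def classify(word):
    # Check data types first: "int" would otherwise match the identifier pattern.
    if word in DATATYPES:
        return ["DATATYPE", word]
    # fullmatch requires the whole token to match; re.match would only test a prefix.
    if re.fullmatch(r"[a-zA-Z]+", word):
        return ["IDENTIFIER", word]
    # Integers without a leading zero, or a lone 0.
    if re.fullmatch(r"([1-9][0-9]*)|0", word):
        return ["INTEGER", word]
    # Set membership, unlike `word in '*-/+%='`, which is a substring test.
    if word in OPERATORS:
        return ["OPERATOR", word]
    return ["UNKNOWN", word]  # hypothetical fallback label

def tokenize(words):
    tokens = []
    for word in words:
        # Split a trailing ';' into its own END_STATEMENT token, as the question's code does.
        if word.endswith(";"):
            if word[:-1]:
                tokens.append(classify(word[:-1]))
            tokens.append(["END_STATEMENT", ";"])
        else:
            tokens.append(classify(word))
    return tokens

print(tokenize("int num = 7;".split()))
# [['DATATYPE', 'int'], ['IDENTIFIER', 'num'], ['OPERATOR', '='], ['INTEGER', '7'], ['END_STATEMENT', ';']]
```

Fed with `open("file.txt", "r").read().split()`, `tokenize` takes the place of the line-by-line loop from the question.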
