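Problem description
After creating a TAGS file for my project (find . -name "*.py" | xargs etags), I can use M-. to jump to the definition of a function. That's great. But if I want the definition of a global constant, say x = 3, Emacs does not know where to find it.
Is there any way to teach Emacs about the definitions of constants, not just functions? I don't need this for anything defined inside a function (or a for loop or the like), only for globals.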
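More details
An earlier incarnation of this question used "top-level" instead of "global", but with @Thomas's help I realized that was imprecise. By a global definition I mean anything that the module defines. So in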

    import m
    if m.foo:
        def f():
            x = 3
            return x
        y, z = 1, 2
    else:
        def f():
            x = 4
            return x
        y, z = 2, 3
    del(z)

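the things the module defines are f and y, even though the sites of those definitions are indented to the right. x is a local variable, and the definition of z is deleted before the end of the module.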
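
To make "what the module defines" concrete, here is a minimal sketch that executes the example and lists the names left in its namespace. The stub module m and the helper names_defined_by are assumptions made purely for illustration; they are not part of the question.

```python
import sys
import types

# The example module from the question, as a string.
SAMPLE = """\
import m
if m.foo:
    def f():
        x = 3
        return x
    y, z = 1, 2
else:
    def f():
        x = 4
        return x
    y, z = 2, 3
del(z)
"""


def names_defined_by(source, foo):
    """Execute source with a stub module m and list the names it leaves behind."""
    stub = types.ModuleType("m")  # hypothetical stand-in for the real module m
    stub.foo = foo
    sys.modules["m"] = stub
    namespace = {}
    try:
        exec(source, namespace)
    finally:
        del sys.modules["m"]
    # Drop the implicit __builtins__ entry and the imported module itself.
    return sorted(n for n in namespace if n not in ("__builtins__", "m"))


print(names_defined_by(SAMPLE, True))
print(names_defined_by(SAMPLE, False))
```

Both calls should report only f and y: x never leaves the function bodies, and z is removed by the final del.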
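I believe a sufficient rule for capturing all global assignments is to simply ignore them inside def expressions (noting that the def keyword itself may be indented to any level), and otherwise to parse every symbol to the left of = (noting that there may be more than one, since Python supports tuple assignment).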
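Solution
Etags does not seem to be able to generate this kind of information for Python files, which you can easily verify by running it on a simple test file:

    x = 3
    def fun():
        pass

Running etags test.py produces a TAGS file with the following content:

    /tmp/test.py,13
    def fun(3,7

As you can see, x is completely absent from this file, so Emacs has no chance of finding it.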
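
The listing above hides two control characters: each section starts with a form feed (0x0c), and a DEL byte (0x7f) separates the matched text from the line number and byte offset. The following is a minimal sketch of a reader for that layout, handy for checking which tags a TAGS file actually contains. The function name parse_tags is my own, and the sketch assumes every entry carries both a line number and an offset.

```python
def parse_tags(data):
    """Parse the text of a TAGS file into {filename: [(text, name, line, offset)]}."""
    result = {}
    # Each section starts with a form feed followed by a newline.
    for section in data.split("\x0c\n")[1:]:
        header, _, body = section.partition("\n")
        # The header is "<filename>,<size of the entries in bytes>".
        filename, _, _size = header.rpartition(",")
        entries = []
        for entry in body.split("\n"):
            if "\x7f" not in entry:
                continue
            text, _, rest = entry.partition("\x7f")
            # An explicit tag name, if present, is terminated by \x01.
            name, sep, position = rest.rpartition("\x01")
            line, _, offset = position.partition(",")
            entries.append((text, name if sep else None, int(line), int(offset)))
        result[filename] = entries
    return result


# The TAGS content shown above, with the control characters made explicit.
TAGS = "\x0c\n/tmp/test.py,13\ndef fun(\x7f3,7\n"
print(parse_tags(TAGS))
```

The only entry is the one for fun; nothing in the section mentions x.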
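The etags man page informs us that there is an option --globals:

    --globals
           Create tag entries for global variables in Perl and Makefile.
           This is the default in C and derived languages.

However, this seems to be one of those sad cases where documentation and implementation are out of sync, since this option does not appear to exist. (etags -h does not list it either, only --no-globals, probably because --globals is the default, as stated above.)
Yet even if --globals is the default, the documentation excerpt suggests that it only applies to Perl, Makefiles, C, and derived languages. We can check whether this is the case by creating another simple test file, this time for C: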
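
    int x = 3;
    void fun() {
    }

Indeed, running etags test.c produces the following TAGS file:

    /tmp/test.c,26
    int x 1,0
    void fun(3,12

As you can see, x is correctly recognized for C. So it seems that etags simply does not support global variables for Python.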
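However, since Python uses whitespace for block structure, identifying global variable definitions in a source file is not too hard: you can basically grep for all lines that do not start with whitespace but contain an = sign (there are exceptions, of course).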
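
The same heuristic can be sketched in a few lines of Python, which makes it easy to see what it catches and what it misses. The pattern and the name unindented_globals are illustrative assumptions, not the exact expression used by any script in this thread. It only recognizes a single bare name followed by =, so tuple assignments are skipped and a call such as foo(a=1) at column 0 would be a false positive.

```python
import re

# An unindented line that is not a comment or decorator and has a name before '='.
GLOBAL_ASSIGNMENT = re.compile(r"^([^@#\s=]+)\s*=(?!=)")


def unindented_globals(source):
    """Return (line number, name) pairs for unindented 'name = ...' lines."""
    found = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        match = GLOBAL_ASSIGNMENT.match(line)
        if match:
            found.append((lineno, match.group(1)))
    return found


SAMPLE = """\
x = 3
def fun():
    y = 4
if True:
    z = 5
"""

print(unindented_globals(SAMPLE))
```

For this sample only x on line 1 is reported. The local y is correctly ignored, but so is z, which is a global defined inside an if block.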
I therefore wrote the following script to do just that. You can use it as a drop-in replacement for etags, since it calls etags internally:

    #!/bin/bash

    # make sure that some input files are provided, or else there's
    # nothing to parse
    if [ $# -eq 0 ]; then
        # the following message is just a copy of etags' error message
        echo "$(basename ${0}): no input files specified."
        echo " Try '$(basename ${0}) --help' for a complete list of options."
        exit 1
    fi

    # extract all non-flag parameters as the actual filenames to consider
    TAGS2="TAGS2"
    argflags=($(etags -h | grep '^-' | sed 's/,.*$//' | grep ' ' | awk '{print $1}'))
    files=()
    skip=0
    for arg in "${@}"; do
        # the variable 'skip' signals arguments that should not be
        # considered as filenames, even though they don't start with a
        # hyphen
        if [ ${skip} -eq 0 ]; then
            # arguments that start with a hyphen are considered flags and
            # thus not added to the 'files' array
            if [ "${arg:0:1}" = '-' ]; then
                if [ "${arg:0:9}" = "--output=" ]; then
                    TAGS2="${arg:9}2"
                else
                    # however, since some flags take a parameter, we also
                    # check whether we should skip the next command line
                    # argument: the arguments for which this is the case are
                    # contained in 'argflags'
                    for argflag in "${argflags[@]}"; do
                        if [ "${argflag}" = "${arg}" ]; then
                            # we need to skip the next 'arg', but in case the
                            # current flag is '-o' we should still look at the
                            # next 'arg' so as to update the path to the
                            # output file of our own parsing below
                            if [ "${arg}" = "-o" ]; then
                                # the next 'arg' will be etags' output file
                                skip=2
                            else
                                skip=1
                            fi
                            break
                        fi
                    done
                fi
            else
                files+=("${arg}")
            fi
        else
            # the current 'arg' is not an input file, but it may be the
            # path to the etags output file
            if [ "${skip}" = 2 ]; then
                TAGS2="${arg}2"
            fi
            skip=0
        fi
    done

    # create a separate TAGS file specifically for global variables
    for file in "${files[@]}"; do
        # find all lines that are not indented, are not comments or
        # decorators, do not start with '[', and contain a '=' character,
        # then turn them into TAGS format, except that the filename is
        # prepended
        grep -P -Hbn '^[^[@# \t].*=' "${file}" | sed -E 's/([0-9]+):([0-9]+):([^= \t]+)\s*=.*$/\3\x7f\1,\2/'
    done |\
    # count the bytes of each entry - this is needed for the TAGS
    # specification
    while read line; do
        echo "$(echo $line | sed 's/^.*://' | wc -c):$line"
    done |\
    # turn the information above into the correct TAGS file format
    awk -F: '
        BEGIN { filename=""; numlines=0 }
        {
            if (filename != $2) {
                if (numlines > 0) {
                    print "\x0c\n" filename "," bytes+1
                    for (i in lines) {
                        print lines[i]
                        delete lines[i]
                    }
                }
                filename=$2
                numlines=0
                bytes=0
            }
            lines[numlines++] = $3;
            bytes += $1;
        }
        END {
            if (numlines > 0) {
                print "\x0c\n" filename "," bytes+1
                for (i in lines)
                    print lines[i]
            }
        }' > "${TAGS2}"

    # now run the actual etags, instructing it to include the global
    # variables information
    if ! etags -i "${TAGS2}" "${@}"; then
        # if etags failed to create the TAGS file, also delete the TAGS2
        # file
        /bin/rm -f "${TAGS2}"
    fi

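Store this script under a convenient name somewhere on your $PATH (I suggest something like etags+), and then call it exactly as you would call etags.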
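In addition to the TAGS file, the script creates a TAGS2 file containing all global variable definitions, and adds a line to the original TAGS file that references the latter.
From Emacs's point of view, there is no difference in usage.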
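
For readers who want to see what such a file looks like on disk, here is a minimal sketch of the section layout. The helper names build_section and include_section are hypothetical. The size field is taken to be the byte length of the entries, which matches the 13 and 26 in the etags output shown earlier; the include line reflects my understanding of what etags -i writes.

```python
def build_section(filename, entries):
    """Return one TAGS section for filename.

    entries is a list of (text, line, byte_offset) tuples.
    """
    body = "".join(f"{text}\x7f{line},{offset}\n" for text, line, offset in entries)
    # The header records the size of the entries in bytes, not characters.
    size = len(body.encode("utf-8"))
    return f"\x0c\n{filename},{size}\n{body}"


def include_section(tags_file):
    """Return a section that points at another tags file (assumed format)."""
    return f"\x0c\n{tags_file},include\n"


# Reproduces the test.c section shown earlier, control characters included.
print(repr(build_section("/tmp/test.c", [("int x ", 1, 0), ("void fun(", 3, 12)])))
print(repr(include_section("TAGS2")))
```

Because the main TAGS file only points at TAGS2, Emacs follows the reference transparently when looking up a tag.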
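
The other answer considers only unindented lines when looking for global variable declarations. While this effectively excludes the bodies of function and class definitions, it misses global variables defined inside if statements. Such definitions are not uncommon, for example for constants that differ depending on the operating system in use.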
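As pointed out in the comments under the question, any static analysis is necessarily imperfect: Python's dynamic nature makes it impossible to decide with complete accuracy which variables are defined globally unless the program is actually executed.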
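
A small, made-up example illustrates the limitation: the module below defines two globals at runtime, yet no plain assignment to either name appears anywhere in its parse tree. The constants WIDTH and HEIGHT and the helper statically_assigned_names are hypothetical.

```python
import ast

# A module that creates its globals dynamically.
SOURCE = """\
for name, value in [("WIDTH", 80), ("HEIGHT", 24)]:
    globals()[name] = value
"""


def statically_assigned_names(source):
    """Names that appear as plain assignment targets anywhere in the tree."""
    return sorted(
        target.id
        for node in ast.walk(ast.parse(source))
        if isinstance(node, ast.Assign)
        for target in node.targets
        if isinstance(target, ast.Name)
    )


namespace = {}
exec(SOURCE, namespace)
print(statically_assigned_names(SOURCE))             # what a static scan sees
print(sorted(n for n in namespace if n.isupper()))   # what actually exists
```

The static scan comes back empty, while executing the module leaves HEIGHT and WIDTH in its namespace.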
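The following is therefore also only an approximation. It does, however, take into account global variable definitions inside if statements, as described above. Since this is best done by analyzing the parse tree of the source file, a bash script is no longer the right tool. Conveniently, though, Python itself gives easy access to the parse tree through the ast package, which is used here.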

    from argparse import ArgumentParser, SUPPRESS
    import ast
    from collections import Counter
    from re import match as re_startswith
    import os
    import subprocess
    import sys

    # extract variable information from assign statements
    def process_assign(target, results):
        if isinstance(target, ast.Name):
            results.append((target.lineno, target.col_offset, target.id))
        elif isinstance(target, ast.Tuple):
            for child in ast.iter_child_nodes(target):
                process_assign(child, results)

    # extract variable information from delete statements
    def process_delete(target, results):
        if isinstance(target, ast.Name):
            # drop every earlier definition of the deleted name
            results[:] = filter(lambda t: t[2] != target.id, results)
        elif isinstance(target, ast.Tuple):
            for child in ast.iter_child_nodes(target):
                process_delete(child, results)

    # recursively walk the parse tree of the source file
    def process_node(node, results):
        if isinstance(node, ast.Assign):
            for target in node.targets:
                process_assign(target, results)
        elif isinstance(node, ast.Delete):
            for target in node.targets:
                process_delete(target, results)
        elif type(node) not in [ast.FunctionDef, ast.ClassDef]:
            # do not descend into function or class bodies
            for child in ast.iter_child_nodes(node):
                process_node(child, results)

    def get_arg_parser():
        # create the parser to configure
        parser = ArgumentParser(usage=SUPPRESS, add_help=False)

        # run etags to find out about the supported command line parameters
        dashlines = list(filter(
            lambda line: re_startswith('\\s*-', line),
            subprocess.check_output(['etags', '-h'], encoding='utf-8').split('\n')))

        # ignore lines that start with a dash but don't have the right
        # indentation
        most_common_indent = max([(v, k) for k, v in
                                  Counter([line.index('-') for line in dashlines]).items()])[1]
        arglines = filter(lambda line: line.index('-') == most_common_indent, dashlines)

        for argline in arglines:
            # the various 'argline' entries contain the command line
            # arguments for etags, sometimes more than one separated by
            # commas.
            for arg in argline.split(','):
                if 'or' in arg:
                    arg = arg[:arg.index('or')]
                if ' ' in arg or '=' in arg:
                    arg = arg[:min(arg.index(' ') if ' ' in arg else len(arg),
                                   arg.index('=') if '=' in arg else len(arg))]
                    action = 'store'
                else:
                    action = 'store_true'
                arg = arg.strip()
                if arg and not (arg == '-h' or arg == '--help'):
                    parser.add_argument(arg, action=action)

        # we know we need files to run on
        parser.add_argument('files', nargs='*', metavar='file')

        # the parser is configured now to accept all of etags' arguments
        return parser

    if __name__ == '__main__':
        # construct a parser for the command line arguments, unless
        # -h/-help/--help is given in which case we just print the help
        # screen
        etags_args = sys.argv[1:]
        if '-h' in etags_args or '-help' in etags_args or '--help' in etags_args:
            unknown_args = True
        else:
            argparser = get_arg_parser()
            known_ns, unknown_args = argparser.parse_known_args()

        # if something's wrong with the command line arguments, print
        # etags' help screen and exit
        if unknown_args:
            subprocess.run(['etags', '-h'], encoding='utf-8')
            sys.exit(1)

        # we base the output filename on the TAGS file name. Other than
        # that, we only care about the actual filenames to parse, and all
        # other command line arguments are simply passed to etags later on
        output = getattr(known_ns, 'o', None)
        tags_file = 'TAGS2' if output is None else output + '2'
        filenames = known_ns.files

        if filenames:
            # TAGS file sections, one per source file
            sections = []

            # process all files to populate the 'sections' list
            for filename in filenames:
                # read source file, recording the byte offset of each line
                offsets, lines = [0], []
                with open(filename, 'r') as f:
                    for line in f.readlines():
                        offsets.append(offsets[-1] + len(bytes(line, 'utf-8')))
                        lines.append(line)
                offsets = offsets[:-1]

                # parse source file
                source = ''.join(lines)
                root_node = ast.parse(source, filename)

                # extract global variable definitions
                vardefs = []
                process_node(root_node, vardefs)

                # create TAGS file section
                sections.append("")
                for lineno, column, varname in vardefs:
                    line = lines[lineno-1]
                    offset = offsets[lineno-1]
                    end = line.index('=') if '=' in line else -1
                    sections[-1] += f"{line[:end]}\x7f{varname}\x01{lineno},{offset + column - 1}\n"

            # write TAGS file
            with open(tags_file, 'w') as f:
                for filename, section in zip(filenames, sections):
                    if section:
                        f.write("\x0c\n")
                        f.write(filename)
                        f.write(",")
                        f.write(str(len(bytes(section, 'utf-8'))))
                        f.write("\n")
                        f.write(section)
                        f.write("\n")

            # make sure etags includes the newly created file
            etags_args += ['-i', tags_file]

        # now run the actual etags to take care of all other definitions
        try:
            cp = subprocess.run(['etags'] + etags_args, encoding='utf-8')
            status = cp.returncode
        except Exception:
            status = 1

        # if etags did not finish successfully, remove the tags_file
        if status != 0:
            try:
                os.remove(tags_file)
            except FileNotFoundError:
                # nothing to be removed
                pass

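Like the script in the other answer, this script is meant as a replacement for the standard etags, since it calls the latter internally. It therefore also accepts all of etags' command line arguments (but currently does not respect -a).
I recommend adding an alias to your shell's init file, for example by adding the following line to ~/.bashrc:

    alias etags+="python3 -u /path/to/script.py"

where /path/to/script.py is the path to the file in which you saved the code above. With such an alias in place, you can simply call

    etags+ /path/to/file

and so on.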