Java: get/skip all comment lines in a Python file

Problem description

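How would I programmatically parse a Python file and get all the comment lines, both those enclosed in triple quotes (''') and ordinary # comment lines, so that I can skip them and speed up parsing?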

Solution

This is written in Java 8 for parsing Python 3, but it should also work with other Java and Python versions (possibly with some adjustments).

--- JAVA CODE ---

At the top of the file:

import java.io.*;
import java.util.*;

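In your main method (the code does not have to go in main, but if this is a standalone Java file, i.e., no other .java file calls it, it will need a main method). Because new Scanner(File) can throw FileNotFoundException, the enclosing method must declare throws FileNotFoundException or catch it: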

String PathToPythonFileAsString="C:\\Users\\myUsername\\thisIsAnExamplePath\\pythonfile.py";
File pyFile = new File(PathToPythonFileAsString);

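// a HashSet makes the per-line lookup below O(1) instead of scanning the whole list for every line
Set<Integer> LineNumsThatAreInTripleQuotes = new HashSet<>(getLinesInTripleQuotes(pyFile));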

// try-with-resources closes the Scanner automatically
try (Scanner scan = new Scanner(pyFile)) {
    int CurrentLineNumber = 0;
    // hasNextLine() (unlike hasNext()) also reads trailing whitespace-only lines
    while (scan.hasNextLine()) {
        String lineInCurrentPythonFile = scan.nextLine();
        CurrentLineNumber = CurrentLineNumber + 1;

        // skip these lines right away to speed up parsing
        // (print calls and imports are specific to my use case; remove these checks if you need those lines)
        if (lineInCurrentPythonFile.contains("print") && lineInCurrentPythonFile.contains("(")) {
            continue;
        }
        if (lineInCurrentPythonFile.contains("import ")) {
            continue;
        }

        // skip every line that belongs to a ''' block
        if (LineNumsThatAreInTripleQuotes.contains(CurrentLineNumber)) {
            continue;
        }

        String lineWithBeginningWhitespaceTrimmed = lineInCurrentPythonFile.trim();
        if (lineWithBeginningWhitespaceTrimmed.isEmpty()) {
            // line is empty or all whitespace
            continue;
        }
        if (lineWithBeginningWhitespaceTrimmed.startsWith("#")) {
            // the whole line is a # comment
            continue;
        }
        int PoundIdx = lineInCurrentPythonFile.indexOf('#');
        if (PoundIdx >= 0) {
            // line CONTAINS a comment, but PART of the line is NOT a comment:
            // keep only the code before the '#'
            // (caveat: this also cuts at a '#' inside a string literal such as "a#b")
            lineInCurrentPythonFile = lineInCurrentPythonFile.substring(0, PoundIdx);
        }

        // now that comments and triple-quoted lines have been skipped, do stuff with the actual code

    }
}
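The `#` handling above truncates at the first `#` on the line, so a `#` that sits inside a string literal (for example `url = "page#anchor"`) is treated as the start of a comment. A minimal sketch of a quote-aware alternative, assuming single-line `'...'` and `"..."` strings with backslash escapes; the class `HashCommentStripper` and method `stripHashComment` are hypothetical names, not part of the original answer:

```java
public class HashCommentStripper {

    // Hypothetical helper: returns the line with any trailing # comment removed,
    // ignoring '#' characters that appear inside '...' or "..." string literals.
    // Assumes each string literal opens and closes on this line (multi-line
    // triple-quoted blocks are handled separately, by line number).
    public static String stripHashComment(String line) {
        char quote = 0; // 0 = not inside a string, otherwise the opening quote char
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (quote != 0) {
                if (c == '\\') {
                    i++; // skip the escaped character, e.g. \" or \'
                } else if (c == quote) {
                    quote = 0; // string literal closed
                }
            } else if (c == '\'' || c == '"') {
                quote = c; // string literal opened
            } else if (c == '#') {
                return line.substring(0, i); // a real comment starts here
            }
        }
        return line; // no '#' outside a string literal
    }

    public static void main(String[] args) {
        System.out.println("[" + stripHashComment("x = 1  # set x") + "]");       // prints [x = 1  ]
        System.out.println("[" + stripHashComment("url = \"page#anchor\"") + "]"); // prints [url = "page#anchor"]
        System.out.println("[" + stripHashComment("# whole-line comment") + "]"); // prints []
    }
}
```

It could stand in for the `PoundIdx` truncation step in the loop above. It is still a line-by-line heuristic, not a Python tokenizer, so unusual constructs (for example nested quotes in f-strings) may need extra handling.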

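The getLinesInTripleQuotes method returns the line numbers of the lines enclosed in triple quotes. If you want it to return the lines themselves instead of their numbers, change List<Integer> to List<String> in the method below, and change LinesThatAreInTripleQuotes.add to add CurrentLine instead of lineNum. I found line numbers more reliable, because a file sometimes contains duplicate lines: matching by content would also skip an identical line that appears outside a triple-quoted block.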

public static List<Integer> getLinesInTripleQuotes(File pyFile) throws FileNotFoundException {

    List<Integer> LinesThatAreInTripleQuotes = new ArrayList<>();
    // true while we are between an opening ''' and its closing '''
    // (only ''' is handled here; """ blocks would need the same treatment)
    boolean insideTripleQuotes = false;

    // try-with-resources closes the Scanner when we are done
    try (Scanner scan = new Scanner(pyFile)) {
        int lineNum = 0;
        while (scan.hasNextLine()) {
            String CurrentLine = scan.nextLine();
            lineNum = lineNum + 1;

            int delimiters = countOccurrences(CurrentLine, "'''");

            // record the opening line, the closing line and every line in between
            // (a one-line '''docstring''' has two delimiters and is recorded on its own)
            if (insideTripleQuotes || delimiters > 0) {
                LinesThatAreInTripleQuotes.add(lineNum);
            }

            // an odd number of ''' on a line opens or closes a block;
            // an even number opens and closes it on the same line
            if (delimiters % 2 == 1) {
                insideTripleQuotes = !insideTripleQuotes;
            }
        }
    }
    return LinesThatAreInTripleQuotes;
}

// counts non-overlapping occurrences of token in line
private static int countOccurrences(String line, String token) {
    int count = 0;
    int idx = line.indexOf(token);
    while (idx != -1) {
        count++;
        idx = line.indexOf(token, idx + token.length());
    }
    return count;
}
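The method above only recognizes `'''`, while Python docstrings are just as often written with `"""`. A minimal sketch of a variant that tracks both delimiters, assuming the file's lines have already been read into a `List<String>`; the class `TripleQuoteLines` and method `tripleQuotedLineNumbers` are hypothetical names, not from the original answer. Only the delimiter that opened a block can close it, so a `"""` inside a `'''` block is ignored:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TripleQuoteLines {

    // Hypothetical variant of getLinesInTripleQuotes: returns the 1-based numbers
    // of every line that opens, closes, or sits inside a ''' or """ block.
    public static List<Integer> tripleQuotedLineNumbers(List<String> lines) {
        List<Integer> result = new ArrayList<>();
        String openDelimiter = null; // null = not inside a triple-quoted block
        for (int i = 0; i < lines.size(); i++) {
            String line = lines.get(i);
            boolean touched = openDelimiter != null; // line starts inside a block
            int pos = 0;
            while (true) {
                if (openDelimiter == null) {
                    // find whichever delimiter opens first in the rest of the line
                    int single = line.indexOf("'''", pos);
                    int dbl = line.indexOf("\"\"\"", pos);
                    if (single < 0 && dbl < 0) break;
                    if (dbl < 0 || (single >= 0 && single < dbl)) {
                        openDelimiter = "'''";
                        pos = single + 3;
                    } else {
                        openDelimiter = "\"\"\"";
                        pos = dbl + 3;
                    }
                    touched = true;
                } else {
                    // only the same delimiter can close the block
                    int close = line.indexOf(openDelimiter, pos);
                    if (close < 0) break;
                    openDelimiter = null;
                    pos = close + 3;
                }
            }
            if (touched) {
                result.add(i + 1); // 1-based, like lineNum in getLinesInTripleQuotes
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "def f():",
                "    \"\"\"One-line docstring.\"\"\"",
                "    x = 1",
                "    '''",
                "    block with \"\"\" inside",
                "    '''",
                "    return x");
        System.out.println(tripleQuotedLineNumbers(lines)); // prints [2, 4, 5, 6]
    }
}
```

To use it on a file, the lines can be read with `java.nio.file.Files.readAllLines(path)` (available since Java 8) and the result wrapped in a `HashSet` for fast lookups. Like the original, this is line-based: a delimiter inside an ordinary string such as `"'''"` or inside a `#` comment would still be mistaken for the start of a block.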
