My program that takes the set_intersection of two text files of ~479k words each is really slow. Is it my code?

Problem description

I wrote a program to compare two text files that each contain every word in a dictionary: one spelled forwards, one spelled backwards. The idea is that when the file of reversed words is compared against the file of forward words, any match indicates a word that can be spelled both forwards and backwards. This should return all palindromes, plus any word that spells another word when reversed.

The program works, and I have tested it on three different file sizes. The first set contains just two words, purely for testing. The second contains 10,000 English words (in each text file), and the third contains all English words (~479k words). When I run the program against the first set of files, the result is almost instant. Against the 10k-word files, it takes a couple of hours. But against the largest files (479k words), it ran for a whole day and returned only about 30 words, when it should return thousands. It never came close to finishing (and this is on a fairly decent gaming PC).

I suspect this comes down to my code. It must be inefficient.

Two things I have noticed:

  1. When I run cout << "token: " << *it << std::endl; it prints endlessly inside the loop and never stops. Could that be eating processing power?

  2. I commented out the sorts, since all of my data is already sorted. After doing that, the run against the 10,000-word files sped up noticeably.

However, even after these changes, there was no real difference when running against the largest files. Any suggestions? I'm fairly new to this. Thanks!

*If you would like copies of the text files, let me know and I'll be happy to upload them. Thanks.

#include <iostream>
#include <string>
#include <fstream>
#include <vector>
#include <iterator>
#include <algorithm>
#include <boost/tokenizer.hpp>
typedef boost::char_separator<char> separator_type;

using namespace std;
using namespace boost;

int main()
{
    fstream file1; //fstream variable files
    fstream file2; //fstream variable files
    string dictionary1;
    string dictionary2;
    string words1; 
    string words2;

    dictionary1 = "Dictionary.txt";
    // dictionary1 = "Dictionarytenthousand.txt";
    // dictionary1 = "Twoworddictionary.txt"; //this dictionary contains only two words separated by a comma as a test 
    dictionary2 = "Backwardsdictionary.txt";
    // dictionary2 = "Backwardsdictionarytenthousand.txt";
    // dictionary2 = "Backwardstwoworddictionary.txt"; //this dictionary contains only two words separated by a comma as a test
    
    file1.open(dictionary1.c_str()); //opening Dictionary.txt
    file2.open(dictionary2.c_str()); //opening Backwardsdictionary.txt

    if (!file1)
    {
        cout << "Unable to open file1"; //terminate with error
        exit(1);
    }

    if (!file2)
    {
        cout << "Unable to open file2"; //terminate with error
        exit(1);
    }

    while (getline(file1,words1))
    {
        while (getline(file2,words2))
        {
            boost::tokenizer<separator_type> tokenizer1(words1,separator_type(",")); //separates string in Twoworddictionary.txt into individual words for compiler (comma as delimiter)            

            auto it = tokenizer1.begin();
            while (it != tokenizer1.end())
            {
                std::cout << "token: " << *it << std::endl; //test to see if tokenizer works before program continues              

                vector<string> words1Vec; // vector to store Twoworddictionary.txt strings in              
                words1Vec.push_back(*it++); // adds elements dynamically onto the end of the vector 
                
                boost::tokenizer<separator_type> tokenizer2(words2,separator_type(",")); //separates string in Backwardstwoworddictionary.txt into individual words for compiler (comma as delimiter) 
               
                auto it2 = tokenizer2.begin();
                while (it2 != tokenizer2.end())
                {
                    std::cout << "token: " << *it2 << std::endl; //test to see if tokenizer works before program continues
                    
                    vector<string> words2Vec; //vector to store Backwardstwoworddictionary.txt strings in                 
                    words2Vec.push_back(*it2++); //adds elements dynamically onto the end of the vector 
                    
                    vector<string> matchingwords(words1Vec.size() + words2Vec.size()); //vector to store elements from both dictionary text files (and ultimately to store the intersection of both,i.e. the matching words)                   

                    //sort(words1Vec.begin(),words1Vec.end()); //set intersection requires its inputs to be sorted                 
                    //sort(words2Vec.begin(),words2Vec.end()); //set intersection requires its inputs to be sorted
                    
                    vector<string>::iterator it3 = set_intersection(words1Vec.begin(),words1Vec.end(),words2Vec.begin(),words2Vec.end(),matchingwords.begin()); //finds the matching words from both dictionaries                 

                    matchingwords.erase(it3,matchingwords.end());  

                    for (vector<string>::iterator it4 = matchingwords.begin(); it4 < matchingwords.end(); ++it4) cout << *it4 << endl; // returns matching words                                    
                }
            }
        }
    }

    file1.close();
    file2.close();

    return 0;
}
