Problem description
I wrote a program that compares two text files containing every word in the dictionary (one forwards, one backwards). The idea is that when the file of reversed words is compared against the forward words, any match indicates a word that can be spelled both forwards and backwards, so the program should return all palindromes plus any words that spell another word in reverse.
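For context, the comparison described above does not strictly need two files: a single pass over the forward word list with a hash set gives the same result, since a word matches exactly when its reversal is also in the dictionary. A minimal sketch of that idea (not the program below; the function name `findReversible` and the in-memory word list are illustrative):

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Return every word whose reversal is also in the word list
// (palindromes included, since a palindrome is its own reversal).
std::vector<std::string> findReversible(const std::vector<std::string>& words) {
    std::unordered_set<std::string> dict(words.begin(), words.end());
    std::vector<std::string> matches;
    for (const std::string& w : words) {
        std::string rev(w.rbegin(), w.rend());     // reversed copy of the word
        if (dict.count(rev)) matches.push_back(w); // match: spellable both ways
    }
    return matches;
}
```

With ~479k words this is a few hundred thousand hash lookups, one per word, rather than a pairwise comparison of every word against every other.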
The program works, and I have tested it on three different file sizes. The first set contains just two words, purely for testing purposes. The second contains 10,000 English words (in each text file), and the third contains all English words (~479k words). When I run the program against the first set of text files, the result is almost instant. Against the 10k-word files it takes several hours. But against the largest files (479k words), it ran for a day and returned only about 30 words, when it should return thousands. It did not finish, or come anywhere close to finishing (and this is on a fairly decent gaming PC).
I suspect this is down to my code; it must be inefficient.
I've noticed two things:

- When I run:
  cout << "token: " << *it << std::endl;
  it runs forever in the loop and never stops. Could that be consuming processing power?
- I commented out the sorts, since all my data is already sorted. After doing this, I noticed the run against the 10,000-word files got faster.

However, even after these changes there seems to be no real difference in the run against the largest files. Any suggestions? I'm somewhat new at this. Thanks!
*If you'd like copies of the text files, let me know and I'll be happy to upload them. Thanks.
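As an aside on the tokenizing step: splitting a comma-separated line does not strictly require boost::tokenizer; plain `std::getline` over a `std::istringstream` with `','` as the delimiter does the same job with no external dependency. A minimal sketch (the helper name `splitCommas` is chosen here for illustration):

```cpp
#include <sstream>
#include <string>
#include <vector>

// Split one line on commas, e.g. "stop,pots" -> {"stop", "pots"}.
std::vector<std::string> splitCommas(const std::string& line) {
    std::vector<std::string> tokens;
    std::istringstream in(line);
    std::string token;
    while (std::getline(in, token, ','))  // read up to each comma
        if (!token.empty()) tokens.push_back(token);
    return tokens;
}
```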
#include <iostream>
#include <string>
#include <fstream>
#include <vector>
#include <iterator>
#include <algorithm>
#include <boost/tokenizer.hpp>

typedef boost::char_separator<char> separator_type;

using namespace std;
using namespace boost;

int main()
{
    fstream file1; //fstream variable files
    fstream file2; //fstream variable files
    string dictionary1;
    string dictionary2;
    string words1;
    string words2;

    dictionary1 = "Dictionary.txt";
    // dictionary1 = "Dictionarytenthousand.txt";
    // dictionary1 = "Twoworddictionary.txt"; //this dictionary contains only two words separated by a comma as a test
    dictionary2 = "Backwardsdictionary.txt";
    // dictionary2 = "Backwardsdictionarytenthousand.txt";
    // dictionary2 = "Backwardstwoworddictionary.txt"; //this dictionary contains only two words separated by a comma as a test

    file1.open(dictionary1.c_str()); //opening Dictionary.txt
    file2.open(dictionary2.c_str()); //opening Backwardsdictionary.txt

    if (!file1)
    {
        cout << "Unable to open file1"; //terminate with error
        exit(1);
    }
    if (!file2)
    {
        cout << "Unable to open file2"; //terminate with error
        exit(1);
    }

    while (getline(file1, words1))
    {
        while (getline(file2, words2))
        {
            boost::tokenizer<separator_type> tokenizer1(words1, separator_type(",")); //separates string in Twoworddictionary.txt into individual words (comma as delimiter)
            auto it = tokenizer1.begin();
            while (it != tokenizer1.end())
            {
                std::cout << "token: " << *it << std::endl; //test to see if tokenizer works before program continues
                vector<string> words1Vec; //vector to store Twoworddictionary.txt strings in
                words1Vec.push_back(*it++); //adds elements dynamically onto the end of the vector
                boost::tokenizer<separator_type> tokenizer2(words2, separator_type(",")); //separates string in Backwardstwoworddictionary.txt into individual words (comma as delimiter)
                auto it2 = tokenizer2.begin();
                while (it2 != tokenizer2.end())
                {
                    std::cout << "token: " << *it2 << std::endl; //test to see if tokenizer works before program continues
                    vector<string> words2Vec; //vector to store Backwardstwoworddictionary.txt strings in
                    words2Vec.push_back(*it2++); //adds elements dynamically onto the end of the vector
                    vector<string> matchingwords(words1Vec.size() + words2Vec.size()); //vector to store elements from both dictionary text files (and ultimately the intersection of both, i.e. the matching words)
                    //sort(words1Vec.begin(), words1Vec.end()); //set_intersection requires its inputs to be sorted
                    //sort(words2Vec.begin(), words2Vec.end()); //set_intersection requires its inputs to be sorted
                    vector<string>::iterator it3 = set_intersection(words1Vec.begin(), words1Vec.end(), words2Vec.begin(), words2Vec.end(), matchingwords.begin()); //finds the matching words from both dictionaries
                    matchingwords.erase(it3, matchingwords.end());
                    for (vector<string>::iterator it4 = matchingwords.begin(); it4 < matchingwords.end(); ++it4)
                        cout << *it4 << endl; //returns matching words
                }
            }
        }
    }
    file1.close();
    file2.close();
    return 0;
}
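For reference, `std::set_intersection` is designed to be called once on two complete, sorted ranges, rather than inside a token loop where the vectors are rebuilt one element at a time on every iteration. A minimal sketch of that pattern, using small sample vectors rather than the files above (`intersectSorted` is an illustrative name):

```cpp
#include <algorithm>
#include <iterator>
#include <string>
#include <vector>

// Intersect two word lists in a single linear pass over sorted data.
std::vector<std::string> intersectSorted(std::vector<std::string> a,
                                         std::vector<std::string> b) {
    std::sort(a.begin(), a.end()); // set_intersection requires sorted inputs
    std::sort(b.begin(), b.end());
    std::vector<std::string> out;
    std::set_intersection(a.begin(), a.end(),
                          b.begin(), b.end(),
                          std::back_inserter(out)); // grows out as matches appear
    return out;
}
```

Using `std::back_inserter` also avoids pre-sizing the output vector and erasing the unused tail afterwards.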