Why does set_intersection return matching strings only when both text files list the words in exactly the same order?

Problem description

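I'm new to C++, so please bear with me if my formatting isn't perfect. I'm writing a program called Palindrome's Cousin that finds every word in a dictionary that spells another word when written backwards (e.g. dog, god). It reads two separate text files: one is the dictionary spelled forwards, and the other is the same dictionary with every word spelled backwards. The idea is that when a word in the backwards dictionary also appears in the forwards dictionary, that word spells a valid word in both directions.

Take the word dog as an example. In the backwards dictionary, dog started out as god, but since it now reads dog, it matches the word dog in the forwards dictionary, which tells us the word is valid both backwards and forwards. The words are separated by commas so they can be tokenized. I'm using the Boost tokenizer, by the way.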
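To make the idea above concrete, here is a minimal sketch of the same lookup done in memory, without a second, pre-reversed file. It is not the program from this post: the helper name `reversible_words` is hypothetical, and it assumes the dictionary has already been read into a vector of clean, whitespace-free words.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: returns every word whose reversal is also in `words`.
// Palindromes such as "did" match themselves.
std::vector<std::string> reversible_words(const std::vector<std::string>& words)
{
    // A std::set gives ordered storage and O(log n) lookups.
    const std::set<std::string> dictionary(words.begin(), words.end());

    std::vector<std::string> result;
    for (const std::string& word : dictionary)
    {
        // Build the reversed spelling from the word's reverse iterators.
        const std::string reversed(word.rbegin(), word.rend());
        if (dictionary.count(reversed) != 0)
            result.push_back(word);
    }
    return result; // sorted, because std::set iterates in order
}
```

Because each word is looked up in the set, the order of the input words does not affect which matches are found.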

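The program works, but unfortunately it only returns matching words if the words are in exactly the same order in both text files. If they are in any other order, there are no results (even though matches exist). For example, it works when both text files contain "dog, god" in the same order, but not when one contains "dog, god" and the other contains "god, dog".

It isn't just a sorting problem. Even when all the words are in exactly the same (alphabetical) order, using the whole dictionary as the text file rather than just a two-word dictionary, it still returns no matches.

Does anyone know how I can fix this? See the end of this post for sample text file contents. Thanks.

Edit: Thanks for all the help, everyone! I was able to get the program running. See the results below!

Code: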

#include <iostream>
#include <string>
#include <fstream>
#include <vector>
#include <iterator>
#include <algorithm>
#include <boost/tokenizer.hpp>
typedef boost::char_separator<char> separator_type;

using namespace std;
using namespace boost;

int main()
{
    //fstream variable files
    fstream file1;
    fstream file2;
    string dictionary1;
    string dictionary2;
    string words1; 
    string words2;

    //filename of the files
    dictionary1 = "Twoworddictionary.txt"; 
    //this dictionary contains only two words separated by a comma as a test
    dictionary2 = "Backwardstwoworddictionary.txt"; 
    //this dictionary contains only two words separated by a comma as a test

    //opening files
    file1.open(dictionary1.c_str()); 
    //opening Twoworddictionary.txt
    file2.open(dictionary2.c_str()); 
    //opening Backwardsdictionary.txt

    if (!file1)
    {
        cout << "Unable to open file1"; 
        //terminate with error
        exit(1);
    }

    if (!file2)
    {
        cout << "Unable to open file2"; 
        //terminate with error
        exit(1);
    }

    while (getline(file1,words1))
    {
        while (getline(file2,words2))
        {
            boost::tokenizer<separator_type> tokenizer1(words1,separator_type(",")); 
            //separates the string from Twoworddictionary.txt into individual words (comma as delimiter)

            auto it = tokenizer1.begin();
            while (it != tokenizer1.end())
            {
                std::cout << "token: " << *it << std::endl; 
                //test to see if tokenizer works before program continues

                vector<string> words1Vec; 
                // vector to store Twoworddictionary.txt strings in
                words1Vec.push_back(*it++); 
                // adds elements dynamically onto the end of the vector 

                boost::tokenizer<separator_type> tokenizer2(words2,separator_type(",")); 
                //separates the string from Backwardstwoworddictionary.txt into individual words (comma as delimiter)

                auto it2 = tokenizer2.begin();
                while (it2 != tokenizer2.end())
                {
                    std::cout << "token: " << *it2 << std::endl; 
                    //test to see if tokenizer works before program continues

                    vector<string> words2Vec; 
                    // vector to store Backwardstwoworddictionary.txt strings in
                    words2Vec.push_back(*it2++); 
                    // adds elements dynamically onto the end of the vector 

                    vector<string> matchingwords(words1Vec.size() + words2Vec.size()); 
                    //vector to store elements from both dictionary text files (and ultimately to store the intersection of both,i.e. the matching words)

                    sort(words1Vec.begin(),words1Vec.end()); 
                    //set intersection requires its inputs to be sorted
                    sort(words2Vec.begin(),words2Vec.end()); 
                    //set intersection requires its inputs to be sorted

                    vector<string>::iterator it3 = set_intersection(words1Vec.begin(),words1Vec.end(),words2Vec.begin(),words2Vec.end(),matchingwords.begin()); 
                    // finds the matching words from both dictionaries

                    matchingwords.erase(it3,matchingwords.end());  

                    for (vector<string>::iterator it4 = matchingwords.begin(); it4 < matchingwords.end(); ++it4) cout << *it4 << endl; 
                    // returns matching words

                    //Uh-oh. It only matches when they're in the same order (i.e. dog,fig in both text files)
                }
            }
        }
    }

    file1.close();
    file2.close();

    return 0;
}
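Two details in the code above may explain the symptom. First, `words1Vec` and `words2Vec` are declared inside the loops, so each holds exactly one token whenever `set_intersection` runs. Second, the output below prints `token:  god` with two spaces, which suggests that tokens after a comma keep a leading space, so " dog" never equals "dog". The following is a sketch under those assumptions, not a confirmed fix. The helper names `split_words` and `common_words` are hypothetical, and `std::getline` with a comma delimiter stands in for the Boost tokenizer only to keep the example self-contained.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits one comma-separated line into words and strips
// surrounding whitespace, so " god" and "god" compare equal.
std::vector<std::string> split_words(const std::string& line)
{
    std::vector<std::string> words;
    std::istringstream stream(line);
    std::string token;
    const std::string whitespace = " \t\r\n";
    while (std::getline(stream, token, ','))
    {
        const std::size_t first = token.find_first_not_of(whitespace);
        if (first == std::string::npos)
            continue; // skip empty tokens, e.g. after a trailing comma
        const std::size_t last = token.find_last_not_of(whitespace);
        words.push_back(token.substr(first, last - first + 1));
    }
    return words;
}

// Hypothetical helper: sorts both complete word lists, then intersects them.
// The vectors are taken by value so the caller's data is left untouched.
std::vector<std::string> common_words(std::vector<std::string> first,
                                      std::vector<std::string> second)
{
    std::sort(first.begin(), first.end());
    std::sort(second.begin(), second.end());

    std::vector<std::string> matches;
    std::set_intersection(first.begin(), first.end(),
                          second.begin(), second.end(),
                          std::back_inserter(matches));
    return matches;
}
```

With this shape, each file would be read once, every token collected into a single vector per file, and `common_words` called once after both loops have finished. Using `std::back_inserter` also avoids pre-sizing the output vector and erasing the unused tail afterwards.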

This is the output I get when the words are in the same order in both text files:

token: dog
token: dog
dog
token:  god
token:  god
token: dog
token:  god
 god

This is the output I get when the words are in a different order in each text file:

token: fig
token: dog
token:  fig
token:  dog
token: dog
token:  fig

Twoworddictionary.txt contains:

dog,god

Backwardstwoworddictionary.txt contains:

god,dog

Dictionary.txt contains (10 words): *Even though the words are sorted, no results are returned.

gnome,gnu,go,goal,goals,goat,god,gods,goes,going,

Palindrome's Cousin results - 10,000 words

a,aa,aaa,ab,ac,ad,ada,ae,af,ag,ages,ah,ai,aim,aj,ak,aka,al,ala,am,an,ana,and,anna,ap,ar,are,as,at,ata,av,ave,avon,aw,az,b,ba,bat,bb,bc,bd,bg,bk,bl,bm,bo,bob,boob,bp,br,bs,bt,bus,but,bw,c,ca,cam,cap,cb,cc,cd,ce,cf,cfr,cg,ch,ci,civic,cj,cl,cm,cn,co,cod,col,corp,cos,cp,cpu,cr,craps,cs,ct,cu,cv,cw,d,da,dad,dam,das,db,dc,dd,de,deer,def,del,dem,der,devil,df,dg,dh,di,dial,did,dim,dir,div,dj,dl,dm,dna,doc,dod,dog,dom,doom,dp,dr,draw,ds,dt,dts,dvd,e,ea,ec,ed,edit,ee,ef,eg,eh,el,em,en,ep,er,era,erp,es,et,ev,eva,eve,evil,eye,f,fa,fc,fd,fe,fed,ff,fi,fig,fl,flow,fm,fo,fp,fr,fs,ft,g,ga,gb,gc,gd,ge,gel,gg,gif,gig,gis,gl,gm,go,god,gp,gr,gs,gsm,h,ha,hc,hd,he,hh,ho,hp,hr,hs,ht,hu,i,ia,ic,id,if,ii,iii,il,im,in,ip,ir,irs,is,isp,it,iv,ix,j,ja,jc,jd,jj,jm,jp,jr,k,ka,kb,ko,ks,l,la,laid,lap,law,lb,lc,ld,le,led,leg,let,level,lf,lg,li,lil,lit,live,lived,ll,lm,ln,lo,loc,lol,loop,los,lp,ls,lu,m,ma,mac,mad,man,map,maps,mar,mas,mb,mc,md,me,med,mem,mf,mg,mi,mia,mid,mit,mj,ml,mm,mn,mo,mod,mom,mood,mp,mr,ms,msg,mt,mu,mw,n,na,nam,nat,nav,nc,ne,net,ni,nl,nm,nn,no,non,noon,nor,not,nov,nova,now,np,nr,ns,nt,nu,nw,ny,o,ob,oc,of,og,oh,ok,ol,om,on,oo,ooo,op,or,os,ot,p,pa,pac,pal,pam,par,part,parts,pas,pat,pb,pc,pct,pd,pe,per,pets,pf,pg,pgp,ph,php,pi,pit,pj,pl,pm,pn,po,pool,pop,pot,pp,pr,pre,proc,ps,psi,psp,pt,q,r,ra,radar,ram,rap,rat,rats,raw,rb,rc,rd,re,red,reed,refer,rep,res,rev,rf,rfc,rg,rh,ri,rid,rj,rm,rn,ro,ron,rp,rr,rs,rt,ru,rw,s,sa,sad,sam,sap,sas,saw,sb,sc,sd,se,sees,sega,ser,sf,sg,sh,si,sig,sk,sl,sm,sms,sn,so,soc,sol,sp,spam,sparc,spot,sr,sri,ss,st,star,stats,std,step,stop,strap,su,sub,sv,sw,sys,t,ta,tab,tan,tap,tar,tb,tc,tcp,td,te,tel,ten,tf,tft,th,ti,tide,til,tim,tip,tit,tm,tn,to,ton,top,tops,tp,tr,trap,ts,tt,tu,tub,tv,u,uc,uh,ul,um,un,upc,ur,us,ut,uw,v,va,van,vc,ve,ver,vi,vid,von,vs,vt,w,wa,wal,war,ward,was,wb,wc,wm,wn,wolf,won,wow,wr,ws,wu,ww,www,x,xanax,xi,xx,xxx,y,yn,z,z


boost c++ set-intersection visual-studio