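How do I write the dictionary file correctly? (Huffman encoder)

Problem description

I am working on a Huffman encoder in C++. It has two main functions: compress and decompress.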

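The compress function reads the input file, counts character frequencies, builds the Huffman tree, writes the dictionary to a separate file, encodes the input file, and saves the result to the output file.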

The decompress function reads the encoded file, reads the dictionary file, and replaces the codes with the corresponding characters.
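The code-to-character replacement can be sketched like this. It is an illustration under assumptions, not the original implementation: `decodeBits` is a hypothetical name, and the dictionary type is assumed to be the `map<char, vector<bool>>` used in the question. Because Huffman codes are prefix-free, the first match found while growing the current bit sequence is the only possible one.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical helper: turns a bit sequence back into text using a
// character-to-code dictionary.
std::string decodeBits(const std::vector<bool>& bits,
                       const std::map<char, std::vector<bool>>& dict) {
    // Invert the dictionary once so each candidate code is a single lookup.
    std::map<std::vector<bool>, char> byCode;
    for (const auto& entry : dict) byCode[entry.second] = entry.first;

    std::string text;
    std::vector<bool> current;  // bits collected since the last match
    for (bool bit : bits) {
        current.push_back(bit);
        auto match = byCode.find(current);
        if (match != byCode.end()) {
            text += match->second;
            current.clear();
        }
    }
    return text;  // bits left in `current` that match no code are ignored
}
```

Padding bits at the end of the last byte can still happen to form a valid code. Storing the number of encoded characters alongside the data is one common way to know where to stop; that is not part of this sketch.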

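One problem: the input sometimes contains characters such as \n, \t and \r, and they do not show up in the dictionary as source characters. Instead, once they are written, their effect takes place inside the dictionary file itself, so it looks like this: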

я011000010

110011 // here it should be a newline character,not a newline itself!

110100
 111 // here it should be a space character,not a space itself!
'00010100,100110
.11001000
:011000011
;011000100
?011000101
A11001001
B00010101
I11001010
S00010110
T11001011
W00010111
Y011000110
a10110
b000100
c110101
d10010
e010

For example, line 6 is parsed not as "[space]:111" but as "1:11", which causes decoding errors.

UPD: the code

// standard libraries
#include <iostream> // input / output operations
#include <fstream> // operations with file streams
#include <string> // character strings
#include <vector> // operations with vectors
#include <map> // operations with maps
#include <list> // operations with lists


// my own header files
#include "functions.h" // declarations of functions


/*This converts a string of '0'/'1' characters to a vector<bool>
(I need it in function DECOMPRESS)*/
auto stringToVector(string&& s) {

    vector<bool> v;//vector we want to get after converting

    for (auto i : s) //for each char in the string
        v.push_back(i == '1');//add it to the vector as a separate element
    
    return v;//Our final vector!
}


vector<bool> code;// Binary code of the node currently being visited
map<char,vector<bool> > dict;// associative array of character and its binary code

Node* root;// pointer to the top of the tree

list<Node*> nodePointersList;//list of pointers to nodes (which have freq and char taken from the map of frequencies)


//Function that creates the "Character-Code" pairs
void DictionaryGenerator(Node* root)
{
    if (root->left != NULL)//If there is a left child
    {
        code.push_back(0);//Then we write '0'
        DictionaryGenerator(root->left);//Recurse into the left child
    }

    if (root->right != NULL)//If there is a right child
    {
        code.push_back(1);//Then we write '1'
        DictionaryGenerator(root->right);//Recurse into the right child
    }

    /*If we reached a leaf (a character) we have to assign the current binary code to it:*/
    if (root->left == NULL && root->right == NULL) dict[root->character] = code;


    if (!code.empty()) {
        code.pop_back();//drop the last bit before returning to the parent
    }

    //After this function finishes I'll write the dict table to a file

}


void compress(string In,string Out,string Dict)
{

    /*=======================================================================================
     * Here we count the frequency of each char in file using associative array "MAP"
     *=====================================================================================*/

    ifstream inFile(In,ios::in | ios::binary);//Open file for binary reading

    map <char,int> freqTable;//associative array where 1st element is a character, 2nd - its frequency

    char characterToRead;
    while (inFile.get(characterToRead))//Read char by char (whitespace included) until the read fails at end of file
    {
        freqTable[characterToRead]++;//increment the counter of the character just read
        //Testing eof() before get() would also count the EOF marker as a character
    }



    /*=======================================================================================
     * Here we write start nodes to our list
     *=====================================================================================*/

    /*From the beginning of the map to its end*/
    for (map <char,int>::iterator itr = freqTable.begin(); itr != freqTable.end(); ++itr)
    {
        /*Put elements FROM the map INTO our list as nodes*/

        Node* p = new Node;//Create a new node
        p->character = itr->first;//Field "character" of class Node gets the map key
        p->freq = itr->second;//Field "freq" of class Node gets the map value
        nodePointersList.push_back(p);//add this pointer (which includes char & frequency) to our list
    }



    /*=======================================================================================
     * Here we create the Huffman's tree
     *=====================================================================================*/

    while (nodePointersList.size() > 1)//While more than one element is left in our list
    {
        /*The last iteration of the loop will give us one single element - 
        the root of the tree*/

        nodePointersList.sort(compare());//sort the list using the compare structure

        /*We need 2 nodes to sum them:*/
        Node* leftChild = nodePointersList.front();//Get the first element from the beginning of nodePointersList
        nodePointersList.pop_front();//Delete the copied element from nodePointersList
        Node* rightChild = nodePointersList.front();//Get the next element from the beginning of nodePointersList
        nodePointersList.pop_front();//Delete the copied element from nodePointersList

        /*Here we sum those 2 elements:*/
        Node* Parent = new Node(leftChild,rightChild);//Create a parent (the children's frequencies will be added) 

        nodePointersList.push_back(Parent); //and put it on the list as a new node.

        //Now go to sort again
        //until we have just one root element
    }


    /*Here we create a variable for that root. It is a pointer at the top of the tree (Node* root) */
    root = nodePointersList.front();

    /*==================== End of creating the Huffman's tree  ============================*/




    /*=======================================================================================
     * Here we write our dictionary to the dictionary file
     *=====================================================================================*/

    //First create the dictionary
    DictionaryGenerator(root);//Create pairs "Character-Code". In other words - the Huffman dictionary

    //Then write it to a file
    ofstream DictFile(Dict,ios::out | ios::binary);//open file in binary mode
    DictFile.clear();

    //cout << "Dictionary size: " << dict.size() << endl;

    /*   --------------------------- TYPE---ITER---FROM BEGIN--------TO_END-----INCREMENT ITR*/
    /*     first     second   */
    for (map<char,vector<bool> >::iterator ii = dict.begin(); ii != dict.end(); ++ii) {

        /*          ii is an iterator; (*ii).first is the char*/
        DictFile << (*ii).first; //Writes the ii-th char from the map as a raw character


        /*                     (*ii).second is the vector<bool>*/
        vector <bool> inVect = (*ii).second; //assigns the ii-th vector from the map to a local vector

        for (unsigned j = 0; j < inVect.size(); j++) {// Writes the code bits for the current ii-th char
            DictFile << inVect[j]; 
        }
        DictFile << endl;
    }

    DictFile.close();// Close the DICTIONARY file



    /*=======================================================================================
     * Here we create & write our binary (Compressed) code to Output file
     *=====================================================================================*/

    inFile.clear();
    inFile.seekg(0); //move our cursor to the beginning of the INPUT file (to read it from the beginning)

    ofstream outFile(Out,ios::out | ios::binary);//open OUTPUT file for writing in binary mode
    
    int count = 0; // Counter in range 0-7: position of the next bit inside the current byte
    char buf = 0; // Here we collect 8 bits before writing them out as one byte

    char c;
    while (inFile.get(c)) // Read the INPUT file again, char by char, until the read fails at end of file
    {
        vector<bool> x = dict[c]; //look up its binary code in the dictionary

        /*This is how I “pack” my stream of zeros and ones into bytes*/
        for (unsigned n = 0; n < x.size(); n++)
        {
            /* Write the code into the buffer variable. At the beginning buf=00000000 and then with
            each new iteration we copy one bit from the dictionary code into our buffer (using
            bitwise OR)*/
            /* << shifts the bit left by (7-count). We do that in order to
            fill the byte left to right*/
            buf = buf | x[n] << (7 - count);// '|' is bitwise OR; x[n] is our current bit
            count++;

            /* if we have collected 8 bits, that means we have 1 byte. Then we write that byte 
            from the buffer (buf) to the Output File. */
            if (count == 8) { count = 0;   outFile << buf; buf = 0; }

            /*In the end we will get an approximately 50% smaller file with nonsense characters.
            By the way, this file may not even have an extension! It can be either BIN or TXT, doesn't matter. */
        }
    }

    /*If the number of bits is not a multiple of 8, write the last partial byte too
    (it is padded with zero bits, which the decoder will also see)*/
    if (count != 0) outFile << buf;

    /*Close all of the files (DICTIONARY file is already closed) */
    inFile.close();// Close the INPUT file
    outFile.close();// Close the OUTPUT file

}



void decompress(string In,string Out,string Dict)//Out is used below to open the output file
{

    /*=======================================================================================
     * Translate the dictionary file to a map <char : vector<bool> >
     *=====================================================================================*/

    ifstream Dictionary(Dict,ios::in | ios::binary);

    map<char,vector<bool>> dict;//associative array of character and its binary code (Dictionary)

    /*Here we read each token and translate it to a vector<bool> of code bits
    (operator>> skips leading whitespace and stops at the next whitespace)*/
    for (string str; Dictionary >> str;)
    {
        if (str.length() > 1)
        {
            dict[str[0]] = stringToVector(str.substr(1));
        }
    }
        

    /*This prints the dictionary we just read, for checking*/
    for (map<char,vector<bool> >::iterator ii = dict.begin(); ii != dict.end(); ++ii) {

        /*          ii is an iterator; (*ii).first is the char*/
        cout << (*ii).first; //Prints the ii-th char from the map

        /*                     (*ii).second is the vector<bool>*/
        vector <bool> inVect = (*ii).second; //assigns the ii-th vector from the map to a local vector

        for (unsigned j = 0; j < inVect.size(); j++) {// Prints the code bits for the current ii-th char
            cout << inVect[j];
        }
        cout << endl;
    }
    Dictionary.close();




    /*=======================================================================================
     * Translate the input file to a vector<bool>
     *=====================================================================================*/

    ifstream inFile(In,ios::in | ios::binary);

    char c;
    bool currentBit;
    vector<bool> sourceCode;

    while (inFile.get(c))
    {
        for (int i = 7; i >= 0; i--)
        {
            currentBit = ((c >> i) & 1);
            sourceCode.push_back(currentBit);
            cout << currentBit;
        }
    }
    inFile.close();


    /*=======================================================================================
    * Todo: Decode & output to a file
    *=====================================================================================*/

    ofstream outFile(Out,ios::out | ios::binary);//open OUTPUT file for writing in binary mode

    //Vector which holds the part of the encoded bits we want to check at this iteration
    vector<bool> currentCheck;
    
    cout << endl << endl;
    /*Here we iterate through the vector of encoded bits*/
    for (std::vector<bool>::iterator it = sourceCode.begin(); it != sourceCode.end(); ++it) {

        currentCheck.push_back(*it);/*Append the next bit; currentCheck grows until it matches a code*/

        /*Here we iterate through the map
        We want to find the same code as our currentCheck in the map*/
        for (map<char,vector<bool> >::iterator ii = dict.begin(); ii != dict.end(); ++ii) {

            //is it legal to compare vectors of bools in such a way?? UPD: Yes
            if (currentCheck == (*ii).second)//If we found the same code in the dictionary map
            {
                outFile << (*ii).first;//then write this character

                //And we need to continue our checking from that new position
                currentCheck.clear();
                break;//no other code can match, so stop searching the map
            }
        }
    }
    outFile.close();// Close the OUTPUT file
}
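
Packing a bit stream into bytes, as compress does, needs care with the last partial byte. The following is a minimal sketch of that step on its own, assuming most-significant-bit-first order as in the listing. `packBits` is a hypothetical helper that returns the bytes in a `std::string` instead of writing to a file.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: packs bits into bytes, most significant bit first,
// and pads the last byte with zero bits if needed.
std::string packBits(const std::vector<bool>& bits) {
    std::string bytes;
    unsigned char buf = 0;  // byte being assembled
    int count = 0;          // number of bits already placed in buf (0..7)
    for (bool bit : bits) {
        if (bit) buf |= static_cast<unsigned char>(1u << (7 - count));
        if (++count == 8) {
            bytes.push_back(static_cast<char>(buf));
            buf = 0;
            count = 0;
        }
    }
    // Flush the last partial byte so trailing bits are not lost.
    if (count != 0) bytes.push_back(static_cast<char>(buf));
    return bytes;
}
```

The padding bits are indistinguishable from data, so a decoder needs some extra information, such as the original character count, to know where the real data ends.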
