找回密码
 立即注册
查看: 145|回复: 0

C++实现哈夫曼编码

[复制链接]
发表于 2023-2-18 15:25 | 显示全部楼层 |阅读模式
文章目录

    思路相关功能实现
      结构体的定义函数申明读入文件统计次数展示次数创建哈夫曼树选择权重最小的两个节点创建哈夫曼编码将读入的文件编码,写到txt文件中读入编码文件,解码
    完整代码实现测试样例

代码转自涂然哥,只做整理和略微修改

思路

1.输入一个字符串后统计每个字符出现的次数(频率)
2.接下来按照规则构造一棵哈夫曼树
3.然后01编码(即给每个字符赋予一个01组成的字符串)
4.将我们输入的字符串按照编码规则生成01字符串
5.解码:将字符串的01编码按照规则还原并输出

相关功能实现

结构体的定义
  1. /*记录每个字符出现的次数*/typedefstructwordcnt{char ch;int cnt=0;}Count;/*整个字符串的字符构成*/typedefstructNumCount{
  2.     Count count[MaxSize];int length=0;}NumCount;/*核心哈夫曼树*/typedefstructHTree{char data;int weight;int parent,lchild,rchild;}HTNode,*HuffmanTree;/*树的节点的编码*/typedefstructHCode{char data;char* str;}*HuffmanCode;
复制代码
函数申明
  1. voidReadData(char*source);// 读入文件 voidWordCount(char*data,NumCount *paraCnt);// 统计次数 voidShow(NumCount *paraCnt);// 展示次数 voidCreateHuffmanTree(HuffmanTree &HT,int length,NumCount cntarray);// 创建哈夫曼树 voidselect(HuffmanTree HT,int top,int*s1,int*s2);// 选择权重最小的两个节点 voidCreateHuffmanCode(HuffmanTree HT,HuffmanCode &HC,int length);// 创建哈夫曼编码 voidEncode(char*data,HuffmanCode HC,int length);// 将读入的文件编码,写到txt文件 voidDecode(HuffmanTree HT,int length);//读入编码文件,解码
复制代码
读入文件

将我们需要压缩的字符串从文件读入到数组中
  1. voidReadData(char*source){//打开文件读入数据
  2.         ifstream infile;
  3.         infile.open("in.txt");
  4.         cout<<"Reading..."<<endl;
  5.         cout<<"the input file is:"<<endl;
  6.         infile.getline(source,MaxSize);
  7.         cout<<source<<endl;
  8.         infile.close();
  9.         cout<<endl;}
复制代码
统计次数

统计读入的字符串中每个字符分别出现多少次
  1. voidWordCount(char* data,NumCount* paraCnt){int flag;int len=strlen(data);for(int i=0;i<len;++i){
  2.         flag=0;for(int j=0;j<paraCnt->length;++j){if(data[i]==paraCnt->count[j].ch){++paraCnt->count[j].cnt;
  3.                 flag=1;break;}}if(flag==0){
  4.             paraCnt->count[paraCnt->length].ch=data[i];++paraCnt->count[paraCnt->length].cnt;++paraCnt->length;}}}
复制代码
展示次数

将每个字符及其出现的次数输出
  1. voidShow(NumCount* paraCnt){
  2.         cout<<"the length is "<<paraCnt->length<<endl;for(int i =0;i < paraCnt->length;++i){
  3.                 cout<<"The character "<<paraCnt->count[i].ch<<"  appears  "<<paraCnt->count[i].cnt<<endl;}
  4.         cout<<endl;}
复制代码
创建哈夫曼树

核心哈夫曼树的构造,此处会多次调用select函数找到权重最小的两个节点
  1. voidCreateHuffmanTree(HuffmanTree &HT,int length,NumCount cntarray){if(length<=1)printf("ERROR!!!\r\n");int s1,s2;int m = length*2-1;// 没有度为1的节点,则总结点是2*叶子节点数-1个
  2.         HT = new HTNode[m+1];for(int i =1;i <= m;++i)// 初始化 {
  3.                 HT[i].parent =0;
  4.                 HT[i].lchild =0;
  5.                 HT[i].rchild =0;}for(int i =1;i <= length;++i){
  6.                 HT[i].data = cntarray.count[i-1].ch;
  7.                 HT[i].weight = cntarray.count[i-1].cnt;}for(int i = length +1;i <= m;++i){select(HT,i-1,&s1,&s2);// 从前面的范围里选择权重最小的两个节点
  8.                 HT[s1].parent = i;
  9.                 HT[s2].parent = i;
  10.                 HT[i].lchild = s1;
  11.                 HT[i].rchild = s2;
  12.                 HT[i].weight = HT[s1].weight + HT[s2].weight;// 得到一个新节点 }}
复制代码
选择权重最小的两个节点
  1. voidselect(HuffmanTree HT,int top,int* s1,int* s2){int min = INT_MAX;for(int i =1;i <= top;++i)// 选择没有双亲的节点中,权重最小的节点 {if(HT[i].weight < min && HT[i].parent ==0){
  2.                         min = HT[i].weight;*s1 = i;}}
  3.        
  4.         min = INT_MAX;for(int i =1;i <= top;++i)// 选择没有双亲的节点中,权重次小的节点 {if(HT[i].weight < min && i !=*s1 && HT[i].parent ==0){
  5.                         min = HT[i].weight;*s2 = i;}}}
复制代码
创建哈夫曼编码

将哈夫曼树中的每一个非空节点生成一个对应的01编码
  1. voidCreateHuffmanCode(HuffmanTree HT,HuffmanCode &HC,int length){
  2.         HC = new HCode[length+1];char*cd = new char[length];// 存储编码的临时空间
  3.         cd[length-1]='\0';// 方便之后调用strcpy函数 int c,f,start;for(int i =1;i <= length;++i){
  4.                 start = length-1;// start表示编码在临时空间内的起始下标,由于是从叶子节点回溯,所以是从最后开始
  5.                 c = i;
  6.                 f = HT[c].parent;while(f !=0){--start;// 由于是回溯,所以从临时空间的最后往回计 if(HT[f].lchild == c)
  7.                                 cd[start]='0';else
  8.                                 cd[start]='1';
  9.                         c = f;
  10.                         f = HT[c].parent;}
  11.                 HC[i].str = new char[length-start];// 最后,实际使用的编码空间大小是length-start
  12.                 HC[i].data = HT[i].data;strcpy(HC[i].str,&cd[start]);// 从实际起始地址开始,拷贝到编码结构中 }
  13.         delete[] cd;}
复制代码
将读入的文件编码,写到txt文件中
  1. voidEncode(char* data,HuffmanCode HC,int length){
  2.         ofstream outfile;
  3.         outfile.open("code.txt");for(int i =0;i <strlen(data);++i)// 依次读入数据,查找对应的编码,写入编码文件 {for(int j =1;j <= length;++j){if(data[i]== HC[j].data){
  4.                                 outfile<<HC[j].str;}}}
  5.         outfile.close();
  6.         cout<<"the code txt has been written"<<endl;
  7.         cout<<endl;}
复制代码
读入编码文件,解码
  1. voidDecode(HuffmanTree HT,int length){char codetxt[MaxSize*length];
  2.         ifstream infile;
  3.         infile.open("code.txt");
  4.         infile.getline(codetxt,MaxSize*length);
  5.         infile.close();
  6.        
  7.         ofstream outfile;
  8.            outfile.open("out.txt");int root =2*length-1;// 从根节点开始遍历 for(int i =0;i <strlen(codetxt);++i){if(codetxt[i]=='0') root = HT[root].lchild;//为0表示向左遍历 elseif(codetxt[i]=='1') root = HT[root].rchild;//为1表示向右遍历 if(HT[root].lchild ==0&& HT[root].rchild ==0)// 如果已经是叶子节点,输出到输出文件中,然后重新回到根节点 {
  9.                         outfile<<HT[root].data;
  10.                         root =2*length-1;}}
  11.         outfile.close();
  12.         cout<<"the output txt has been written"<<endl;
  13.         cout<<endl;}
复制代码
完整代码实现
  1. #include<iostream>#include<fstream>#include<string.h>
  2. using namespace std;#defineMaxSize1024typedefstructwordcnt{char ch;int cnt=0;}Count;typedefstructNumCount{
  3.     Count count[MaxSize];int length=0;}NumCount;typedefstructHTree{char data;int weight;int parent,lchild,rchild;}HTNode,*HuffmanTree;typedefstructHCode{char data;char* str;}*HuffmanCode;voidReadData(char*source);// 读入文件 voidWordCount(char*data,NumCount *paraCnt);// 统计次数 voidShow(NumCount *paraCnt);// 展示次数 voidCreateHuffmanTree(HuffmanTree &HT,int length,NumCount cntarray);// 创建哈夫曼树 voidselect(HuffmanTree HT,int top,int*s1,int*s2);// 选择权重最小的两个节点 voidCreateHuffmanCode(HuffmanTree HT,HuffmanCode &HC,int length);// 创建哈夫曼编码 voidEncode(char*data,HuffmanCode HC,int length);// 将读入的文件编码,写到txt文件 voidDecode(HuffmanTree HT,int length);//读入编码文件,解码 intmain(int argc,char** argv){char data[MaxSize];  
  4.         NumCount Cntarray;ReadData(data);// 读入数据 WordCount(data,&Cntarray);// 统计次数 Show(&Cntarray);//可以查看每个单词出现的对应次数
  5.         HuffmanTree tree;CreateHuffmanTree(tree,Cntarray.length,Cntarray);// 建树
  6.         HuffmanCode code;CreateHuffmanCode(tree,code,Cntarray.length);// 创建编码 Encode(data,code,Cntarray.length);// 生成编码文件 Decode(tree,Cntarray.length);// 解码
  7.         cout<<"Please view the generated TXT file to check the result"<<endl;return0;}voidReadData(char*source){//打开文件读入数据
  8.         ifstream infile;
  9.         infile.open("in.txt");
  10.         cout<<"Reading..."<<endl;
  11.         cout<<"the input file is:"<<endl;
  12.         infile.getline(source,MaxSize);
  13.         cout<<source<<endl;
  14.         infile.close();
  15.         cout<<endl;}voidWordCount(char*data,NumCount *paraCnt){int flag;// 标识是否已经记录 int len =strlen(data);for(int i =0;i < len;++i){
  16.                 flag =0;for(int j =0;j < paraCnt->length;++j){if(paraCnt->count[j].ch == data[i])// 若已有记录,直接++ {++paraCnt->count[j].cnt;
  17.                                 flag =1;break;}}if(!flag)// 没有记录,则新增 {
  18.                         paraCnt->count[paraCnt->length].ch = data[i];++paraCnt->count[paraCnt->length].cnt;++paraCnt->length;}}}voidShow(NumCount* paraCnt){
  19.         cout<<"the length is "<<paraCnt->length<<endl;for(int i =0;i < paraCnt->length;++i){
  20.                 cout<<"The character "<<paraCnt->count[i].ch<<"  appears  "<<paraCnt->count[i].cnt<<endl;}
  21.         cout<<endl;}voidCreateHuffmanTree(HuffmanTree &HT,int length,NumCount cntarray){if(length<=1)printf("ERROR!!!\r\n");int s1,s2;int m = length*2-1;// 没有度为1的节点,则总结点是2*叶子节点数-1个
  22.         HT = new HTNode[m+1];for(int i =1;i <= m;++i)// 初始化 {
  23.                 HT[i].parent =0;
  24.                 HT[i].lchild =0;
  25.                 HT[i].rchild =0;}for(int i =1;i <= length;++i){
  26.                 HT[i].data = cntarray.count[i-1].ch;
  27.                 HT[i].weight = cntarray.count[i-1].cnt;}for(int i = length +1;i <= m;++i){select(HT,i-1,&s1,&s2);// 从前面的范围里选择权重最小的两个节点
  28.                 HT[s1].parent = i;
  29.                 HT[s2].parent = i;
  30.                 HT[i].lchild = s1;
  31.                 HT[i].rchild = s2;
  32.                 HT[i].weight = HT[s1].weight + HT[s2].weight;// 得到一个新节点 }}voidselect(HuffmanTree HT,int top,int* s1,int* s2){int min = INT_MAX;for(int i =1;i <= top;++i)// 选择没有双亲的节点中,权重最小的节点 {if(HT[i].weight < min && HT[i].parent ==0){
  33.                         min = HT[i].weight;*s1 = i;}}
  34.        
  35.         min = INT_MAX;for(int i =1;i <= top;++i)// 选择没有双亲的节点中,权重次小的节点 {if(HT[i].weight < min && i !=*s1 && HT[i].parent ==0){
  36.                         min = HT[i].weight;*s2 = i;}}}voidCreateHuffmanCode(HuffmanTree HT,HuffmanCode &HC,int length){
  37.         HC = new HCode[length+1];char*cd = new char[length];// 存储编码的临时空间
  38.         cd[length-1]='\0';// 方便之后调用strcpy函数 int c,f,start;for(int i =1;i <= length;++i){
  39.                 start = length-1;// start表示编码在临时空间内的起始下标,由于是从叶子节点回溯,所以是从最后开始
  40.                 c = i;
  41.                 f = HT[c].parent;while(f !=0){--start;// 由于是回溯,所以从临时空间的最后往回计 if(HT[f].lchild == c)
  42.                                 cd[start]='0';else
  43.                                 cd[start]='1';
  44.                         c = f;
  45.                         f = HT[c].parent;}
  46.                 HC[i].str = new char[length-start];// 最后,实际使用的编码空间大小是length-start
  47.                 HC[i].data = HT[i].data;strcpy(HC[i].str,&cd[start]);// 从实际起始地址开始,拷贝到编码结构中 }
  48.         delete[] cd;}voidEncode(char* data,HuffmanCode HC,int length){
  49.         ofstream outfile;
  50.         outfile.open("code.txt");for(int i =0;i <strlen(data);++i)// 依次读入数据,查找对应的编码,写入编码文件 {for(int j =1;j <= length;++j){if(data[i]== HC[j].data){
  51.                                 outfile<<HC[j].str;}}}
  52.         outfile.close();
  53.         cout<<"the code txt has been written"<<endl;
  54.         cout<<endl;}voidDecode(HuffmanTree HT,int length){char codetxt[MaxSize*length];
  55.         ifstream infile;
  56.         infile.open("code.txt");
  57.         infile.getline(codetxt,MaxSize*length);
  58.         infile.close();
  59.        
  60.         ofstream outfile;
  61.            outfile.open("out.txt");int root =2*length-1;// 从根节点开始遍历 for(int i =0;i <strlen(codetxt);++i){if(codetxt[i]=='0') root = HT[root].lchild;//为0表示向左遍历 elseif(codetxt[i]=='1') root = HT[root].rchild;//为1表示向右遍历 if(HT[root].lchild ==0&& HT[root].rchild ==0)// 如果已经是叶子节点,输出到输出文件中,然后重新回到根节点 {
  62.                         outfile<<HT[root].data;
  63.                         root =2*length-1;}}
  64.         outfile.close();
  65.         cout<<"the output txt has been written"<<endl;
  66.         cout<<endl;}
复制代码
测试样例

输入:in.txt
In my lifetime, I will be a sincere person and never give up my love and dedication to life. I will live an infinite life in a limited time and space.
输出
the input file is:
In my lifetime, I will be a sincere person and never give up my love and dedication to life. I will live an infinite life in a limited time and space.
the length is 24
The character I appears 3
The character n appears 12
The character appears 30
The character m appears 5
The character y appears 2
The character l appears 10
The character i appears 18
The character f appears 4
The character e appears 18
The character t appears 6
The character , appears 1
The character w appears 2
The character b appears 1
The character a appears 8
The character s appears 3
The character c appears 3
The character r appears 3
The character p appears 3
The character o appears 4
The character d appears 6
The character v appears 4
The character g appears 1
The character u appears 1
The character . appears 2
the code txt has been written
the output txt has been written
Please view the generated TXT file to check the result



本帖子中包含更多资源

您需要 登录 才可以下载或查看,没有账号?立即注册

×
懒得打字嘛,点击右侧快捷回复 【右侧内容,后台自定义】
您需要登录后才可以回帖 登录 | 立即注册

本版积分规则

小黑屋|手机版|Unity开发者联盟 ( 粤ICP备20003399号 )

GMT+8, 2024-6-27 14:19 , Processed in 0.103329 second(s), 26 queries .

Powered by Discuz! X3.5 Licensed

© 2001-2024 Discuz! Team.

快速回复 返回顶部 返回列表