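Problem description
I am trying to compress a string in Python the same way a particular piece of C# code does, but I get a different result. It seems I have to add a header to the compressed result, and I don't know how to prepend a header to compressed data in Python. This is the C# line I don't know how to reproduce in Python: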
memoryStream.Read(compressedBytes,CompressedMessageHeaderLength,(int)memoryStream.Length);
Here is the complete, runnable C# code:
using System;
using System.IO;
using System.IO.Compression;
using System.Text;
namespace Rextester
{
    /// <summary>Handles compressing and decompressing API requests and responses.</summary>
    public class Compression
    {
        #region Member Variables
        /// <summary>The compressed message header length.</summary>
        private const int CompressedMessageHeaderLength = 4;
        #endregion
        #region Methods
        /// <summary>Compresses the XML string.</summary>
        /// <param name="data">The XML string to compress.</param>
        public static string CompressData(string data)
        {
            using (MemoryStream memoryStream = new MemoryStream())
            {
                byte[] plainBytes = Encoding.UTF8.GetBytes(data);
                using (GZipStream zipStream = new GZipStream(memoryStream, CompressionMode.Compress, leaveOpen: true))
                {
                    zipStream.Write(plainBytes, 0, plainBytes.Length);
                }
                memoryStream.Position = 0;
                byte[] compressedBytes = new byte[memoryStream.Length + CompressedMessageHeaderLength];
                // Add the header, which is the length of the uncompressed message.
                Buffer.BlockCopy(
                    BitConverter.GetBytes(plainBytes.Length), 0, compressedBytes, 0, CompressedMessageHeaderLength
                );
                // Copy the gzip stream in right after the header.
                memoryStream.Read(compressedBytes, CompressedMessageHeaderLength, (int)memoryStream.Length);
                string compressedXml = Convert.ToBase64String(compressedBytes);
                return compressedXml;
            }
        }
        #endregion
    }
    public class Program
    {
        public static void Main(string[] args)
        {
            //Your code goes here
            string data = "Hello World!";
            Console.WriteLine( Compression.CompressData(data) );
            // result would be DAAAAB+LCAAAAAAABADzSM3JyVcIzy/KSVEEAKMcKRwMAAAA
        }
    }
}
Here is the Python code I wrote:
import base64
import gzip
data = 'Hello World!'
print(base64.b64encode(gzip.compress(data.encode('utf-8'))))
# I expect DAAAAB+LCAAAAAAABADzSM3JyVcIzy/KSVEEAKMcKRwMAAAA
# but I get H4sIACwuuWAC//NIzcnJVwjPL8pJUQQAoxwpHAwAAAA=
Solutions
You can use to_bytes to convert the length of the encoded string:
import sys
enc = data.encode('utf-8')
zipped = gzip.compress(enc)
# sys.byteorder follows the host CPU; pass 'little' or 'big' instead to pin the byte order
print(base64.b64encode(len(enc).to_bytes(4, sys.byteorder) + zipped))
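Also, gzip.compress(enc) produces a slightly different result from its C# counterpart (so the overall output differs too), but that should not be a problem: decompression should still handle everything correctly.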
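To illustrate why the differing gzip bytes should not matter, here is a minimal round-trip sketch. The function names compress_data and decompress_data are hypothetical, and the 4-byte header is assumed to be little-endian, which matches the C# output quoted in the question.

```python
import base64
import gzip


def compress_data(text: str) -> str:
    """Hypothetical Python counterpart of the C# CompressData method."""
    raw = text.encode("utf-8")
    # Assumption: the 4-byte length header is little-endian.
    header = len(raw).to_bytes(4, "little")
    return base64.b64encode(header + gzip.compress(raw)).decode("ascii")


def decompress_data(encoded: str) -> str:
    """Strip the 4-byte header, gunzip the rest, and check the length."""
    blob = base64.b64decode(encoded)
    expected_len = int.from_bytes(blob[:4], "little")
    raw = gzip.decompress(blob[4:])
    if len(raw) != expected_len:
        raise ValueError("length header does not match the decompressed size")
    return raw.decode("utf-8")


# The string printed by the C# program in the question.
csharp_output = "DAAAAB+LCAAAAAAABADzSM3JyVcIzy/KSVEEAKMcKRwMAAAA"

print(decompress_data(csharp_output))                   # Hello World!
print(decompress_data(compress_data("Hello World!")))   # Hello World!
```

Both strings decode to the same text even though their Base64 forms are not identical, so a receiver that decompresses properly should accept either one.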
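One thing I'd start with: the C# code is not well suited to cross-platform use. The byte order of the length header depends on the underlying architecture, because BitConverter.GetBytes returns the bytes in the architecture's native order.
However, with C# we are probably talking about Windows, and therefore probably Intel, so little-endian is very likely.
So what you need to do is prepend the length of the original data, in little-endian order, to the compressed data. Exactly 4 bytes.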
bdata = data.encode('utf-8')
compressed = gzip.compress(bdata)
header = len(bdata).to_bytes(4,'little')
Then you need to concatenate and convert to Base64:
print(base64.b64encode(header + compressed))
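The little-endian assumption can be checked against the expected output from the question. The sketch below decodes that string and reads its first four bytes in both byte orders; only the little-endian reading gives 12, the length of "Hello World!". The variable names are illustrative only.

```python
import base64

# Expected output of the C# program, taken from the question.
expected = "DAAAAB+LCAAAAAAABADzSM3JyVcIzy/KSVEEAKMcKRwMAAAA"

blob = base64.b64decode(expected)
header, payload = blob[:4], blob[4:]

little = int.from_bytes(header, "little")
big = int.from_bytes(header, "big")

print(header.hex())        # 0c000000
print(little, big)         # 12 201326592
print(payload[:2].hex())   # 1f8b, the gzip magic number
```

Since the payload starts with the gzip magic number right after byte 4, the header is exactly 4 bytes long and nothing else sits between it and the compressed stream.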
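As others have mentioned, the difference comes from the header you prepend in the C# version.
Also, note that gzip compression can be performed in several ways. In C#, for example, you can specify a CompressionLevel of Optimal, Fastest or NoCompression. See: https://docs.microsoft.com/en-us/dotnet/api/system.io.compression.compressionlevel?view=net-5.0
I'm not familiar enough with Python to say how it handles gzip compression by default (perhaps Fastest in C# uses a more or less aggressive algorithm than Python does).
Here is your C# code with the header length set to 0, producing output at all three CompressionLevels. Note that the strings it outputs are "very close" to the value you got in Python.
You should also ask whether it really matters that the values differ. Isn't it enough that you can encode and decode?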
using System;
using System.IO;
using System.IO.Compression;
using System.Text;
public class Program
{
    public static void Main()
    {
        string data = "Hello World!";
        Console.WriteLine( Compression.CompressData(data, CompressionLevel.Fastest) );
        Console.WriteLine( Compression.CompressData(data, CompressionLevel.NoCompression) );
        Console.WriteLine( Compression.CompressData(data, CompressionLevel.Optimal) );
        // result would be DAAAAB+LCAAAAAAABADzSM3JyVcIzy/KSVEEAKMcKRwMAAAA
        // but I get H4sIACwuuWAC//NIzcnJVwjPL8pJUQQAoxwpHAwAAAA=
    }
}
public class Compression
{
    #region Member Variables
    /// <summary>The compressed message header length.</summary>
    private const int CompressedMessageHeaderLength = 0; // changed to zero
    #endregion
    #region Methods
    /// <summary>Compresses the XML string.</summary>
    /// <param name="data">The XML string to compress.</param>
    /// <param name="compressionLevel">The gzip compression level to use.</param>
    public static string CompressData(string data, CompressionLevel compressionLevel)
    {
        using (MemoryStream memoryStream = new MemoryStream())
        {
            byte[] plainBytes = Encoding.UTF8.GetBytes(data);
            using (GZipStream zipStream = new GZipStream(memoryStream, compressionLevel, leaveOpen: true))
            {
                zipStream.Write(plainBytes, 0, plainBytes.Length);
            }
            memoryStream.Position = 0;
            byte[] compressedBytes = new byte[memoryStream.Length + CompressedMessageHeaderLength];
            // Add the header (zero bytes long here), which is the length of the uncompressed message.
            Buffer.BlockCopy(
                BitConverter.GetBytes(plainBytes.Length), 0, compressedBytes, 0, CompressedMessageHeaderLength
            );
            // Copy the gzip stream in right after the header.
            memoryStream.Read(compressedBytes, CompressedMessageHeaderLength, (int)memoryStream.Length);
            string compressedXml = Convert.ToBase64String(compressedBytes);
            return compressedXml;
        }
    }
    #endregion
}
Output:
H4sIAAAAAAAEA/NIzcnJVwjPL8pJUQQAoxwpHAwAAAA=
H4sIAAAAAAAEAwEMAPP/SGVsbG8gV29ybGQhoxwpHAwAAAA=
H4sIAAAAAAAAAA/NIzcnJVwjPL8pJUQQAoxwpHAwAAAA=
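To see where the "very close" strings actually differ, the sketch below unpacks the fixed 10-byte gzip header (layout per RFC 1952) of the Python output from the question and of the first C# line above. The helper name gzip_header_fields is hypothetical. For this input, everything after the header turns out to be byte-for-byte identical; only the timestamp, extra-flags and OS fields differ.

```python
import base64
import gzip
import struct

python_out = "H4sIACwuuWAC//NIzcnJVwjPL8pJUQQAoxwpHAwAAAA="   # from the question
csharp_out = "H4sIAAAAAAAEA/NIzcnJVwjPL8pJUQQAoxwpHAwAAAA="   # CompressionLevel.Fastest


def gzip_header_fields(raw: bytes) -> dict:
    """Unpack the fixed 10-byte gzip header: ID1, ID2, CM, FLG, MTIME, XFL, OS."""
    _id1, _id2, _method, _flags, mtime, xfl, os_code = struct.unpack("<BBBBIBB", raw[:10])
    return {"mtime": mtime, "xfl": xfl, "os": os_code}


py_raw = base64.b64decode(python_out)
cs_raw = base64.b64decode(csharp_out)

print(gzip_header_fields(py_raw))   # non-zero mtime: the time of compression
print(gzip_header_fields(cs_raw))   # mtime is 0 in the C# output

# The deflate data and the CRC/length trailer are the same in both.
print(py_raw[10:] == cs_raw[10:])                           # True
print(gzip.decompress(py_raw) == gzip.decompress(cs_raw))   # True
```

Because Python's gzip.compress writes the current time into the header by default, its output changes from run to run. If reproducible output is wanted, gzip.compress accepts an mtime keyword argument in Python 3.8 and later, although the extra-flags and OS bytes may still differ from what .NET writes.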