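Introduction to Protobuf
Protobuf (Protocol Buffers) is Google's open-source tool for serializing and deserializing structured data. Its basic syntax looks like this: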
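// xxx.proto
syntax = "proto3";  // use the proto3 version of the protobuf language
package interface.xxx.yyy;
message People {
  optional string name = 1;  // every field needs a unique field number
  optional int32 age = 2;
}
The above is the most basic protobuf syntax; for more, refer to the official tutorials and the Protobuf source code.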
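Compile the .proto file above with the following command:
protoc -I=$input_path --cpp_out=$output_path xxx.proto
This generates two source files, xxx.pb.h and xxx.pb.cc, in the directory passed to --cpp_out. In essence, protoc compiles each message definition into a class that provides accessors for reading and modifying each field, together with functions for serialization and deserialization. The code base at my company does a lot of protobuf serialization and deserialization, so I set out to study it here. The generated class offers many serialization and deserialization functions; the serialization ones, which come from the MessageLite base class, mainly include the following: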
class PROTOBUF_EXPORT MessageLite {
// Serialization ---------------------------------------------------
// Methods for serializing in protocol buffer format. Most of these
// are just simple wrappers around ByteSize() and SerializeWithCachedSizes().
// Write a protocol buffer of this message to the given output. Returns
// false on a write error. If the message is missing required fields,
// this may GOOGLE_CHECK-fail.
bool SerializeToCodedStream(io::CodedOutputStream* output) const;
// Like SerializeToCodedStream(), but allows missing required fields.
bool SerializePartialToCodedStream(io::CodedOutputStream* output) const;
// Write the message to the given zero-copy output stream. All required
// fields must be set.
bool SerializeToZeroCopyStream(io::ZeroCopyOutputStream* output) const;
// Like SerializeToZeroCopyStream(), but allows missing required fields.
bool SerializePartialToZeroCopyStream(io::ZeroCopyOutputStream* output) const;
// Serialize the message and store it in the given string. All required
// fields must be set.
bool SerializeToString(std::string* output) const;
// Like SerializeToString(), but allows missing required fields.
bool SerializePartialToString(std::string* output) const;
// Serialize the message and store it in the given byte array. All required
// fields must be set.
bool SerializeToArray(void* data, int size) const;
// Like SerializeToArray(), but allows missing required fields.
bool SerializePartialToArray(void* data, int size) const;
// Make a string encoding the message. Is equivalent to calling
// SerializeToString() on a string and using that. Returns the empty
// string if SerializeToString() would have returned an error.
// Note: If you intend to generate many such strings, you may
// reduce heap fragmentation by instead re-using the same string
// object with calls to SerializeToString().
std::string SerializeAsString() const;
// Like SerializeAsString(), but allows missing required fields.
std::string SerializePartialAsString() const;
// Serialize the message and write it to the given file descriptor. All
// required fields must be set.
bool SerializeToFileDescriptor(int file_descriptor) const;
// Like SerializeToFileDescriptor(), but allows missing required fields.
bool SerializePartialToFileDescriptor(int file_descriptor) const;
// Serialize the message and write it to the given C++ ostream. All
// required fields must be set.
bool SerializeToOstream(std::ostream* output) const;
// Like SerializeToOstream(), but allows missing required fields.
bool SerializePartialToOstream(std::ostream* output) const;
};
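To make concrete what the Serialize* functions above produce, here is a minimal hand-rolled sketch of the protobuf wire format for the People message. It assumes that name is field 1 and age is field 2 (the field numbers are an assumption), and the helpers AppendVarint, AppendTag and EncodePeople are hypothetical names, not part of protobuf. Real code would simply call SerializeToString() on the generated class.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Appends `value` as a base-128 varint: 7 payload bits per byte,
// with the high bit set on every byte except the last one.
void AppendVarint(uint64_t value, std::string* out) {
  while (value >= 0x80) {
    out->push_back(static_cast<char>((value & 0x7F) | 0x80));
    value >>= 7;
  }
  out->push_back(static_cast<char>(value));
}

// Appends a field tag, which is (field_number << 3) | wire_type.
void AppendTag(uint32_t field_number, uint32_t wire_type, std::string* out) {
  assert(field_number > 0 && wire_type < 8);
  AppendVarint((static_cast<uint64_t>(field_number) << 3) | wire_type, out);
}

// Hypothetical encoder for People; field numbers 1 and 2 are assumed.
std::string EncodePeople(const std::string& name, int32_t age) {
  std::string out;
  AppendTag(1, 2, &out);  // field 1, wire type 2 (length-delimited)
  AppendVarint(name.size(), &out);
  out.append(name);
  AppendTag(2, 0, &out);  // field 2, wire type 0 (varint)
  // An int32 is sign-extended to 64 bits before varint encoding.
  AppendVarint(static_cast<uint64_t>(static_cast<int64_t>(age)), &out);
  return out;
}
```

For example, EncodePeople("Tom", 30) yields the 7 bytes 0A 03 54 6F 6D 10 1E: a tag and length for the string, the three characters of "Tom", then a tag and a one-byte varint for the age.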
Serialization (ZeroCopyOutputStream)
This section focuses on the SerializeToCodedStream function. The constructor of the CodedOutputStream it takes is:
explicit CodedOutputStream(ZeroCopyOutputStream* input);
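That is, it is constructed from a ZeroCopyOutputStream object. So what is a ZeroCopyOutputStream? As the name suggests, it is protobuf's mechanism for zero-copy output (the term is a little ambiguous here; in practice it means reducing the number of copies). ZeroCopyOutputStream is an abstract interface class that lets users customize where and how serialized output is written. Its definition is shown below; a custom output stream must implement its pure virtual functions.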
// Abstract interface similar to an output stream but designed to minimize
// copying.
class PROTOBUF_EXPORT ZeroCopyOutputStream {
public:
ZeroCopyOutputStream() {}
virtual ~ZeroCopyOutputStream() {}
// Obtains a buffer into which data can be written. Any data written
// into this buffer will eventually (maybe instantly, maybe later on)
// be written to the output.
//
// Preconditions:
// * "size" and "data" are not NULL.
//
// Postconditions:
// * If the returned value is false, an error occurred. All errors are
// permanent.
// * Otherwise, "size" points to the actual number of bytes in the buffer
// and "data" points to the buffer.
// * Ownership of this buffer remains with the stream, and the buffer
// remains valid only until some other method of the stream is called
// or the stream is destroyed.
// * Any data which the caller stores in this buffer will eventually be
// written to the output (unless BackUp() is called).
// * It is legal for the returned buffer to have zero size, as long
// as repeatedly calling Next() eventually yields a buffer with non-zero
// size.
virtual bool Next(void** data, int* size) = 0;
// Backs up a number of bytes, so that the end of the last buffer returned
// by Next() is not actually written. This is needed when you finish
// writing all the data you want to write, but the last buffer was bigger
// than you needed. You don't want to write a bunch of garbage after the
// end of your data, so you use BackUp() to back up.
//
// Preconditions:
// * The last method called must have been Next().
// * count must be less than or equal to the size of the last buffer
// returned by Next().
// * The caller must not have written anything to the last "count" bytes
// of that buffer.
//
// Postconditions:
// * The last "count" bytes of the last buffer returned by Next() will be
// ignored.
virtual void BackUp(int count) = 0;
// Returns the total number of bytes written since this object was created.
virtual int64_t ByteCount() const = 0;
// Write a given chunk of data to the output. Some output streams may
// implement this in a way that avoids copying. Check AllowsAliasing() before
// calling WriteAliasedRaw(). It will GOOGLE_CHECK fail if WriteAliasedRaw() is
// called on a stream that does not allow aliasing.
//
// NOTE: It is caller's responsibility to ensure that the chunk of memory
// remains live until all of the data has been consumed from the stream.
virtual bool WriteAliasedRaw(const void* data, int size);
virtual bool AllowsAliasing() const { return false; }
private:
GOOGLE_DISALLOW_EVIL_CONSTRUCTORS(ZeroCopyOutputStream);
};
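The Next function
The two main functions are Next and BackUp. Start with Next: it hands the caller a buffer so that the caller can write data directly into it. The data parameter is a pointer supplied by the caller; inside Next, the address of the buffer is assigned to *data. In short: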
bool Next(void** data, int* size) {
  ...
  // buffer_: uint8_t[cap_]
  // cap_ is the capacity of buffer_
  // buffer_used_ is the number of bytes of buffer_ already in use
  *data = buffer_ + buffer_used_;  // address of the first unused byte
  *size = cap_ - buffer_used_;     // number of bytes the caller may write
  ...
}
The implementation can then decide, according to its own needs, when to write the buffered data out to a file, for example flushing the internal buffer to disk once it is full:
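bool Next(void** data, int* size) {
  ...
  if (buffer_used_ == cap_) {
    Flush();  // buffer is full: write it out before handing out more space
  }
  // buffer_: uint8_t[cap_]
  // cap_ is the capacity of buffer_
  // buffer_used_ is the number of bytes of buffer_ already in use
  *data = buffer_ + buffer_used_;  // address of the first unused byte
  *size = cap_ - buffer_used_;     // number of bytes the caller may write
  ...
}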
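This way you can implement your own customized behavior. (PS: the name ZeroCopy is honestly a bit puzzling. As I understand it, the main benefit is that users can customize how serialized output is handled, and the main job of Next is to let the caller work directly on the stream's internal buffer. But from reading the source, a copy still happens: the data being serialized is copied into that internal buffer. So I remain somewhat unsure about what ZeroCopy means here.)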
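One possible reading of the name, offered only as a hedged sketch: because Next() exposes the stream's own buffer, a writer can encode values straight into it instead of building them in a temporary buffer and copying them over afterwards. The ArraySink class below is a hypothetical stand-in that exposes only Next(), BackUp() and ByteCount() over a fixed array; it does not derive from the real ZeroCopyOutputStream, and WriteVarintDirect is likewise a made-up helper for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in: hands out the unused tail of a caller-owned array.
class ArraySink {
 public:
  ArraySink(uint8_t* buf, int cap) : buf_(buf), cap_(cap) {}

  bool Next(void** data, int* size) {
    if (pos_ == cap_) return false;  // out of space
    *data = buf_ + pos_;
    *size = last_size_ = cap_ - pos_;
    pos_ = cap_;  // the whole tail now belongs to the caller
    return true;
  }

  void BackUp(int count) {
    assert(count >= 0 && count <= last_size_);
    pos_ -= count;
    last_size_ = 0;
  }

  int64_t ByteCount() const { return pos_; }

 private:
  uint8_t* buf_;
  int cap_;
  int pos_ = 0;
  int last_size_ = 0;
};

// Encodes a varint directly into the buffer returned by Next():
// no temporary buffer and no memcpy between writer and stream.
bool WriteVarintDirect(ArraySink* sink, uint64_t value) {
  void* data = nullptr;
  int size = 0;
  if (!sink->Next(&data, &size)) return false;
  if (size < 10) {  // a 64-bit varint needs at most 10 bytes
    sink->BackUp(size);
    return false;
  }
  uint8_t* p = static_cast<uint8_t*>(data);
  int n = 0;
  while (value >= 0x80) {
    p[n++] = static_cast<uint8_t>((value & 0x7F) | 0x80);
    value >>= 7;
  }
  p[n++] = static_cast<uint8_t>(value);
  sink->BackUp(size - n);  // give back the bytes that were not written
  return true;
}
```

Field payloads such as strings still have to be copied once from the message into the buffer, which matches the copy observed in the source; what this interface shape avoids is a second copy between the writer and the stream.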
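The BackUp function
Next comes BackUp, which returns unused buffer space. The buffer handed out by Next may be larger than the caller needs, so whatever the caller did not use has to be given back. A simple version is: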
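void BackUp(int count) {
  // count may be 0, but must not exceed what Next() handed out
  CHECK_GE(count, 0);
  CHECK_LE(count, buffer_used_);
  buffer_used_ -= count;  // the last `count` bytes are treated as unwritten
}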
In other words, it simply gives unused buffer memory back to the stream.
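Putting Next and BackUp together, below is a minimal self-contained sketch of a flushing output stream and a caller that uses it. FlushingStream, WriteAll and RoundTrip are hypothetical names. The class only mimics the shape of Next()/BackUp() and does not inherit from protobuf's ZeroCopyOutputStream, and a std::string stands in for the file or socket that a real stream would flush to.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

class FlushingStream {
 public:
  FlushingStream(std::string* sink, int cap) : sink_(sink), buffer_(cap) {
    assert(cap > 0);
  }
  FlushingStream(const FlushingStream&) = delete;
  ~FlushingStream() { Flush(); }

  // Hands out the unused tail of the internal buffer, flushing first if full.
  bool Next(void** data, int* size) {
    if (used_ == Cap()) Flush();
    *data = buffer_.data() + used_;
    *size = Cap() - used_;
    used_ = Cap();  // the whole tail now belongs to the caller
    return true;
  }

  // Returns the last `count` bytes of the buffer handed out by Next().
  void BackUp(int count) {
    assert(count >= 0 && count <= used_);
    used_ -= count;
  }

  // Moves the buffered bytes to the sink and empties the buffer.
  void Flush() {
    sink_->append(buffer_.data(), used_);
    used_ = 0;
  }

 private:
  int Cap() const { return static_cast<int>(buffer_.size()); }
  std::string* sink_;
  std::vector<char> buffer_;
  int used_ = 0;
};

// Caller side: write `payload` into the buffers the stream hands out,
// then back up over the part of the last buffer that was not needed.
bool WriteAll(FlushingStream* stream, const std::string& payload) {
  std::size_t done = 0;
  while (done < payload.size()) {
    void* data = nullptr;
    int size = 0;
    if (!stream->Next(&data, &size)) return false;
    std::size_t n =
        std::min(static_cast<std::size_t>(size), payload.size() - done);
    std::memcpy(data, payload.data() + done, n);
    done += n;
    if (n < static_cast<std::size_t>(size)) {
      stream->BackUp(size - static_cast<int>(n));
    }
  }
  return true;
}

// Writes `payload` through a stream with a `cap`-byte buffer and
// returns what reached the sink.
std::string RoundTrip(const std::string& payload, int cap) {
  std::string sink;
  {
    FlushingStream stream(&sink, cap);
    WriteAll(&stream, payload);
  }  // the destructor flushes the remaining bytes
  return sink;
}
```

Without the BackUp call, the final Flush would also write the unused tail of the last buffer, so the sink would end with garbage bytes after the payload.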
The above uses ZeroCopyOutputStream as the example; ZeroCopyInputStream works similarly, except that it reads instead of writes.
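For the reading side, here is a comparable sketch under the same caveats. ChunkedSource and ReadBytes are hypothetical names, and the class is a stand-in that mirrors only Next(), BackUp() and ByteCount(). The real ZeroCopyInputStream also declares a Skip() function, which is left out here. The main difference from the output side is that Next() hands out a pointer to const data, and BackUp() un-reads bytes so that a later Next() returns them again.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

class ChunkedSource {
 public:
  ChunkedSource(const std::string& data, int chunk)
      : data_(data), chunk_(chunk) {
    assert(chunk > 0);
  }

  // Points *data at the next unread chunk; returns false at end of input.
  bool Next(const void** data, int* size) {
    if (pos_ == data_.size()) {
      last_size_ = 0;
      return false;
    }
    int n = static_cast<int>(
        std::min<std::size_t>(chunk_, data_.size() - pos_));
    *data = data_.data() + pos_;
    *size = n;
    pos_ += n;
    last_size_ = n;
    return true;
  }

  // Un-reads the last `count` bytes of the chunk returned by Next().
  void BackUp(int count) {
    assert(count >= 0 && count <= last_size_);
    pos_ -= count;
    last_size_ = 0;
  }

  // Number of bytes consumed so far.
  int64_t ByteCount() const { return static_cast<int64_t>(pos_); }

 private:
  std::string data_;  // owns a copy of the input, for simplicity
  int chunk_;
  std::size_t pos_ = 0;
  int last_size_ = 0;
};

// Reads up to `want` bytes, backing up over whatever part of the
// last chunk was not needed.
std::string ReadBytes(ChunkedSource* src, int want) {
  std::string out;
  const void* data = nullptr;
  int size = 0;
  while (static_cast<int>(out.size()) < want && src->Next(&data, &size)) {
    int take = std::min(size, want - static_cast<int>(out.size()));
    out.append(static_cast<const char*>(data), take);
    if (take < size) src->BackUp(size - take);
  }
  return out;
}
```

With a source over "hello world" and 4-byte chunks, ReadBytes(&src, 5) returns "hello" and leaves the stream positioned at byte 5, so the next read starts at the space.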
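Summary
Protobuf offers users many serialization and deserialization functions. By subclassing ZeroCopyOutputStream and ZeroCopyInputStream, it is easy to implement custom behavior, such as network streams or reading and writing files in special formats.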