Protobuf ZeroCopyStream

Carina8090 · Posted on 2023-4-1 13:03

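Introduction to Protobuf

Protobuf is an open-source serialization and deserialization tool from Google. Its basic syntax looks like this:
// xxx.proto
syntax = "proto3"; // use the proto3 version of the protobuf language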

package interface.xxx.yyy;

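message People {
  optional string name = 1;
  optional int32 age = 2;
}
The above is the most basic protobuf syntax. For more tutorials, see:
Protobuf source code:
Compile the .proto file above with the following command:
protoc -I=$input_path --cpp_out=$output_path xxx.proto
This generates two source files in the cpp_out directory: xxx.pb.h and xxx.pb.cc. Protobuf works by compiling each message definition into a class, which provides functions to read and modify each field as well as functions to serialize and deserialize the message. The code at my company does a lot of protobuf serialization and deserialization, so I took the time to study it here. The generated class offers many serialization and deserialization functions, which it inherits from MessageLite. The serialization functions are mainly the following: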
class PROTOBUF_EXPORT MessageLite {
  // Serialization ---------------------------------------------------
  // Methods for serializing in protocol buffer format.  Most of these
  // are just simple wrappers around ByteSize() and SerializeWithCachedSizes().

  // Write a protocol buffer of this message to the given output.  Returns
  // false on a write error.  If the message is missing required fields,
  // this may GOOGLE_CHECK-fail.
  bool SerializeToCodedStream(io::CodedOutputStream* output) const;
  // Like SerializeToCodedStream(), but allows missing required fields.
  bool SerializePartialToCodedStream(io::CodedOutputStream* output) const;
  // Write the message to the given zero-copy output stream.  All required
  // fields must be set.
  bool SerializeToZeroCopyStream(io::ZeroCopyOutputStream* output) const;
  // Like SerializeToZeroCopyStream(), but allows missing required fields.
  bool SerializePartialToZeroCopyStream(io::ZeroCopyOutputStream* output) const;
  // Serialize the message and store it in the given string.  All required
  // fields must be set.
  bool SerializeToString(std::string* output) const;
  // Like SerializeToString(), but allows missing required fields.
  bool SerializePartialToString(std::string* output) const;
  // Serialize the message and store it in the given byte array.  All required
  // fields must be set.
  bool SerializeToArray(void* data, int size) const;
  // Like SerializeToArray(), but allows missing required fields.
  bool SerializePartialToArray(void* data, int size) const;

  // Make a string encoding the message. Is equivalent to calling
  // SerializeToString() on a string and using that.  Returns the empty
  // string if SerializeToString() would have returned an error.
  // Note: If you intend to generate many such strings, you may
  // reduce heap fragmentation by instead re-using the same string
  // object with calls to SerializeToString().
  std::string SerializeAsString() const;
  // Like SerializeAsString(), but allows missing required fields.
  std::string SerializePartialAsString() const;

  // Serialize the message and write it to the given file descriptor.  All
  // required fields must be set.
  bool SerializeToFileDescriptor(int file_descriptor) const;
  // Like SerializeToFileDescriptor(), but allows missing required fields.
  bool SerializePartialToFileDescriptor(int file_descriptor) const;
  // Serialize the message and write it to the given C++ ostream.  All
  // required fields must be set.
  bool SerializeToOstream(std::ostream* output) const;
  // Like SerializeToOstream(), but allows missing required fields.
  bool SerializePartialToOstream(std::ostream* output) const;
};
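Serialization (ZeroCopyOutputStream)

This section focuses on SerializeToCodedStream. The constructor of the CodedOutputStream it takes is:
explicit CodedOutputStream(ZeroCopyOutputStream* stream);
So constructing one requires a ZeroCopyOutputStream object. What is a ZeroCopyOutputStream? As the name suggests, it is what protobuf uses to implement zero copy (the term is a little ambiguous here; in practice it means reducing the number of copies). ZeroCopyOutputStream is an abstract interface class that lets users customize where serialized output goes. Its definition is shown below. A custom output stream must implement its pure virtual functions: Next, BackUp, and ByteCount.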
// Abstract interface similar to an output stream but designed to minimize
// copying.
class PROTOBUF_EXPORT ZeroCopyOutputStream {
public:
  ZeroCopyOutputStream() {}
  virtual ~ZeroCopyOutputStream() {}

  // Obtains a buffer into which data can be written.  Any data written
  // into this buffer will eventually (maybe instantly, maybe later on)
  // be written to the output.
  //
  // Preconditions:
  // * "size" and "data" are not NULL.
  //
  // Postconditions:
  // * If the returned value is false, an error occurred.  All errors are
  // permanent.
  // * Otherwise, "size" points to the actual number of bytes in the buffer
  // and "data" points to the buffer.
  // * Ownership of this buffer remains with the stream, and the buffer
  // remains valid only until some other method of the stream is called
  // or the stream is destroyed.
  // * Any data which the caller stores in this buffer will eventually be
  // written to the output (unless BackUp() is called).
  // * It is legal for the returned buffer to have zero size, as long
  // as repeatedly calling Next() eventually yields a buffer with non-zero
  // size.
  virtual bool Next(void** data, int* size) = 0;

  // Backs up a number of bytes, so that the end of the last buffer returned
  // by Next() is not actually written.  This is needed when you finish
  // writing all the data you want to write, but the last buffer was bigger
  // than you needed.  You don't want to write a bunch of garbage after the
  // end of your data, so you use BackUp() to back up.
  //
  // Preconditions:
  // * The last method called must have been Next().
  // * count must be less than or equal to the size of the last buffer
  // returned by Next().
  // * The caller must not have written anything to the last "count" bytes
  // of that buffer.
  //
  // Postconditions:
  // * The last "count" bytes of the last buffer returned by Next() will be
  // ignored.
  virtual void BackUp(int count) = 0;

  // Returns the total number of bytes written since this object was created.
  virtual int64_t ByteCount() const = 0;

  // Write a given chunk of data to the output.  Some output streams may
  // implement this in a way that avoids copying. Check AllowsAliasing() before
  // calling WriteAliasedRaw(). It will GOOGLE_CHECK fail if WriteAliasedRaw() is
  // called on a stream that does not allow aliasing.
  //
  // NOTE: It is caller's responsibility to ensure that the chunk of memory
  // remains live until all of the data has been consumed from the stream.
  virtual bool WriteAliasedRaw(const void* data, int size);
  virtual bool AllowsAliasing() const { return false; }

private:
  GOOGLE_DISALLOW_EVIL_CONSTRUCTORS(ZeroCopyOutputStream);
};
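The Next function

The two main functions are Next and BackUp. Start with Next: its main job is to hand a buffer to the caller so that the caller can write data directly into it. The data parameter is a pointer supplied by the caller. Inside Next, the address of the buffer is assigned to *data and the number of writable bytes to *size. Put simply:
bool Next(void** data, int* size) {
  ...
  // buffer_: uint8_t[cap_]
  // cap_ is the max size of buffer_
  // buffer_used_ is the used bytes of buffer_
  *data = &buffer_[buffer_used_];  // address of the first unused byte
  *size = cap_ - buffer_used_;
  buffer_used_ = cap_;  // the bytes handed out now count as written
  ...
}
Internally, the implementation can then write the buffered data to a file as it sees fit, for example by flushing the buffer to disk once it is full: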
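bool Next(void** data, int* size) {
  ...
  if (buffer_used_ == cap_) {
    Flush();  // assumed to write buffer_ out and reset buffer_used_ to 0
  }
  // buffer_: uint8_t[cap_]
  // cap_ is the max size of buffer_
  // buffer_used_ is the used bytes of buffer_
  *data = &buffer_[buffer_used_];  // address of the first unused byte
  *size = cap_ - buffer_used_;
  buffer_used_ = cap_;  // the bytes handed out now count as written
  ...
}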
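The two Next snippets above can be combined into a small working class. The following is a minimal stand-alone sketch, not protobuf's own code: FlushingBufferStream, its std::string sink (standing in for a file), and the explicit Flush() method are all names assumed for illustration. The class deliberately does not include protobuf headers or derive from ZeroCopyOutputStream, so that it compiles on its own.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in: follows the Next()/BackUp()/ByteCount() contract but
// does NOT derive from google::protobuf::io::ZeroCopyOutputStream.
class FlushingBufferStream {
 public:
  // `sink` stands in for a file or socket; `cap` is the internal buffer size.
  FlushingBufferStream(std::string* sink, int cap)
      : sink_(sink), buffer_(cap), buffer_used_(0), flushed_(0) {}

  // Hands out the unused tail of the internal buffer. If the buffer is
  // already full, its contents are moved to the sink first.
  bool Next(void** data, int* size) {
    if (buffer_used_ == cap()) Flush();
    *data = buffer_.data() + buffer_used_;
    *size = cap() - buffer_used_;
    buffer_used_ = cap();  // everything handed out counts as written
    return true;
  }

  // Gives back the last `count` bytes of the buffer handed out by Next().
  void BackUp(int count) {
    assert(count >= 0 && count <= buffer_used_);
    buffer_used_ -= count;
  }

  // Total bytes written so far: flushed bytes plus bytes still buffered.
  int64_t ByteCount() const { return flushed_ + buffer_used_; }

  // Moves the buffered bytes to the sink and empties the buffer. The caller
  // is expected to call this once more after the final write.
  void Flush() {
    sink_->append(reinterpret_cast<const char*>(buffer_.data()), buffer_used_);
    flushed_ += buffer_used_;
    buffer_used_ = 0;
  }

 private:
  int cap() const { return static_cast<int>(buffer_.size()); }

  std::string* sink_;
  std::vector<uint8_t> buffer_;
  int buffer_used_;
  int64_t flushed_;
};
```

With a 4-byte buffer, for instance, a second call to Next() finds the buffer full and pushes the first four bytes to the sink before handing the buffer out again.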
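In this way you can define your own custom behavior. (PS: the name ZeroCopy is honestly a bit confusing. As I understand it, its biggest benefit is that it lets users customize serialization output, and the main point of Next is to let the caller operate directly on the stream's internal buffer. But looking at the source, the serialized data is still copied into that internal buffer, so I remain somewhat puzzled by the ZeroCopy name.)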
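One possible reading of the name, consistent with the interface comment "designed to minimize copying", is that the saving is relative to a classic Write(buffer, size) style API rather than absolute. The sketch below contrasts the two styles under that assumption. CopyingSink, LendingSink, EncodeFixed32, WriteClassic, and WriteLending are hypothetical names written for this comparison and are not protobuf classes or functions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Writes v as 4 little-endian bytes starting at dst.
inline void EncodeFixed32(uint32_t v, uint8_t* dst) {
  for (int i = 0; i < 4; ++i) dst[i] = static_cast<uint8_t>(v >> (8 * i));
}

// Classic style (hypothetical): the caller owns the bytes, so Write() has to
// copy them into the stream's own storage.
class CopyingSink {
 public:
  void Write(const void* data, int size) {
    const uint8_t* p = static_cast<const uint8_t*>(data);
    storage_.insert(storage_.end(), p, p + size);  // staging -> stream copy
    bytes_copied_ += size;
  }
  const std::vector<uint8_t>& storage() const { return storage_; }
  int64_t bytes_copied() const { return bytes_copied_; }

 private:
  std::vector<uint8_t> storage_;
  int64_t bytes_copied_ = 0;
};

// Next()/BackUp() style (hypothetical): the stream lends out its own storage.
class LendingSink {
 public:
  bool Next(void** data, int* size) {
    const std::size_t old_size = storage_.size();
    storage_.resize(old_size + kChunk);
    *data = storage_.data() + old_size;
    *size = kChunk;
    last_size_ = kChunk;
    return true;
  }
  void BackUp(int count) {
    assert(count >= 0 && count <= last_size_);
    storage_.resize(storage_.size() - count);
    last_size_ = 0;
  }
  const std::vector<uint8_t>& storage() const { return storage_; }

 private:
  static constexpr int kChunk = 16;
  std::vector<uint8_t> storage_;
  int last_size_ = 0;
};

// Encode into a local staging array, then let the stream copy it.
void WriteClassic(uint32_t v, CopyingSink* out) {
  uint8_t staging[4];
  EncodeFixed32(v, staging);
  out->Write(staging, 4);
}

// Encode straight into the stream's buffer, then return the unused tail.
bool WriteLending(uint32_t v, LendingSink* out) {
  void* data = nullptr;
  int size = 0;
  if (!out->Next(&data, &size) || size < 4) return false;
  EncodeFixed32(v, static_cast<uint8_t*>(data));
  out->BackUp(size - 4);
  return true;
}
```

Both functions leave the same four bytes in the stream. In the first style the bytes are produced in a staging array and then copied; in the second they are produced directly in the stream's buffer, so each byte is still written once during encoding but the extra staging copy disappears.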
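The BackUp function

Next comes BackUp, which returns the unused part of a buffer. The buffer handed out by Next may be larger than the caller needs, so whatever the caller did not use has to be given back. A simple version:
void BackUp(int count) {
  CHECK_GE(count, 0);  // backing up zero bytes is allowed
  buffer_used_ -= count;
}
It is really just a simple operation that gives buffer memory back.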
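To see where BackUp fits on the caller's side, here is a minimal sketch of a write loop. ArraySink is a hypothetical stand-in, loosely modeled on an array-backed stream that hands out at most block_size bytes per call, and WriteAll is an illustrative helper. Neither is a protobuf API, and the code does not depend on protobuf headers.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical array-backed output stream that hands out at most
// block_size bytes per Next() call.
class ArraySink {
 public:
  ArraySink(uint8_t* data, int size, int block_size)
      : data_(data), size_(size), block_size_(block_size) {}

  bool Next(void** data, int* size) {
    if (position_ == size_) return false;  // the array is full
    last_returned_ = std::min(block_size_, size_ - position_);
    *data = data_ + position_;
    *size = last_returned_;
    position_ += last_returned_;
    return true;
  }

  void BackUp(int count) {
    assert(count >= 0 && count <= last_returned_);
    position_ -= count;
    last_returned_ = 0;  // BackUp() may only follow a Next()
  }

  int64_t ByteCount() const { return position_; }

 private:
  uint8_t* data_;
  int size_;
  int block_size_;
  int position_ = 0;
  int last_returned_ = 0;
};

// Copies n bytes into the stream chunk by chunk and returns the unused tail
// of the final chunk with BackUp(). Returns false if the stream runs out.
bool WriteAll(ArraySink* out, const void* src, int n) {
  const uint8_t* p = static_cast<const uint8_t*>(src);
  while (n > 0) {
    void* chunk = nullptr;
    int chunk_size = 0;
    if (!out->Next(&chunk, &chunk_size)) return false;
    const int used = std::min(n, chunk_size);
    std::memcpy(chunk, p, used);
    p += used;
    n -= used;
    if (used < chunk_size) out->BackUp(chunk_size - used);
  }
  return true;
}
```

Writing 6 bytes with a block size of 4 takes two Next() calls. The second chunk is only half used, so the remaining 2 bytes are backed up and ByteCount() reports 6 rather than 8.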
The above covers ZeroCopyOutputStream. ZeroCopyInputStream is similar, except that it reads instead of writes.
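On the input side, protobuf's ZeroCopyInputStream declares Next with a const void** parameter, and BackUp pushes unread bytes back so that a later Next returns them again. The sketch below illustrates that behavior with a hypothetical ArraySource class and ReadAll helper, written without protobuf headers so that it stands alone.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical array-backed input stream that exposes at most block_size
// bytes of the underlying array per Next() call.
class ArraySource {
 public:
  ArraySource(const uint8_t* data, int size, int block_size)
      : data_(data), size_(size), block_size_(block_size) {}

  // Points *data at the next unread bytes; no bytes are copied.
  bool Next(const void** data, int* size) {
    if (position_ == size_) return false;  // end of input
    last_returned_ = std::min(block_size_, size_ - position_);
    *data = data_ + position_;
    *size = last_returned_;
    position_ += last_returned_;
    return true;
  }

  // Pushes the last `count` bytes back so that they are returned again.
  void BackUp(int count) {
    assert(count >= 0 && count <= last_returned_);
    position_ -= count;
    last_returned_ = 0;
  }

  // Total bytes consumed so far.
  int64_t ByteCount() const { return position_; }

 private:
  const uint8_t* data_;
  int size_;
  int block_size_;
  int position_ = 0;
  int last_returned_ = 0;
};

// Reads everything that is left in the stream into a string.
std::string ReadAll(ArraySource* in) {
  std::string out;
  const void* chunk = nullptr;
  int size = 0;
  while (in->Next(&chunk, &size)) {
    out.append(static_cast<const char*>(chunk), size);
  }
  return out;
}
```

A reader that needs only part of a chunk, such as a parser that has reached the end of one message, can call BackUp with the number of bytes it did not consume and leave them for the next reader.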
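Summary

Protobuf gives users many serialization and deserialization functions. Among them, inheriting from ZeroCopyOutputStream and ZeroCopyInputStream makes it easy to implement custom behavior, such as network streams or reading and writing files in special formats.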
