Protobuf ZeroCopyStream

Posted on 2023-4-1 13:03
Introduction to Protobuf

Protobuf is a serialization and deserialization tool open-sourced by Google. Its basic syntax looks like this:
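// xxx.proto
syntax = "proto3"; // use the proto3 version of the protobuf language

package interface.xxx.yyy;

message People {
  // Every field needs a unique field number.
  optional string name = 1;
  optional int32 age = 2;
}
The above is the most basic protobuf syntax. For more tutorials and the source code, see the official Protobuf documentation and repository.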
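Compile the protobuf definition above with the following command:
protoc -I=$input_path --cpp_out=$output_path xxx.proto
This generates two source files in the cpp_out directory: xxx.pb.h and xxx.pb.cc. In essence, protobuf compiles each defined message into a class that provides functions to read and modify every data member, plus functions for serialization and deserialization. The code base at my company does a lot of protobuf serialization and deserialization, so I took the time to study it here. The generated class offers many serialization and deserialization functions. The serialization side mainly consists of the following: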
class PROTOBUF_EXPORT MessageLite {
  // Serialization ---------------------------------------------------
  // Methods for serializing in protocol buffer format.  Most of these
  // are just simple wrappers around ByteSize() and SerializeWithCachedSizes().

  // Write a protocol buffer of this message to the given output.  Returns
  // false on a write error.  If the message is missing required fields,
  // this may GOOGLE_CHECK-fail.
  bool SerializeToCodedStream(io::CodedOutputStream* output) const;
  // Like SerializeToCodedStream(), but allows missing required fields.
  bool SerializePartialToCodedStream(io::CodedOutputStream* output) const;
  // Write the message to the given zero-copy output stream.  All required
  // fields must be set.
  bool SerializeToZeroCopyStream(io::ZeroCopyOutputStream* output) const;
  // Like SerializeToZeroCopyStream(), but allows missing required fields.
  bool SerializePartialToZeroCopyStream(io::ZeroCopyOutputStream* output) const;
  // Serialize the message and store it in the given string.  All required
  // fields must be set.
  bool SerializeToString(std::string* output) const;
  // Like SerializeToString(), but allows missing required fields.
  bool SerializePartialToString(std::string* output) const;
  // Serialize the message and store it in the given byte array.  All required
  // fields must be set.
  bool SerializeToArray(void* data, int size) const;
  // Like SerializeToArray(), but allows missing required fields.
  bool SerializePartialToArray(void* data, int size) const;

  // Make a string encoding the message. Is equivalent to calling
  // SerializeToString() on a string and using that.  Returns the empty
  // string if SerializeToString() would have returned an error.
  // Note: If you intend to generate many such strings, you may
  // reduce heap fragmentation by instead re-using the same string
  // object with calls to SerializeToString().
  std::string SerializeAsString() const;
  // Like SerializeAsString(), but allows missing required fields.
  std::string SerializePartialAsString() const;

  // Serialize the message and write it to the given file descriptor.  All
  // required fields must be set.
  bool SerializeToFileDescriptor(int file_descriptor) const;
  // Like SerializeToFileDescriptor(), but allows missing required fields.
  bool SerializePartialToFileDescriptor(int file_descriptor) const;
  // Serialize the message and write it to the given C++ ostream.  All
  // required fields must be set.
  bool SerializeToOstream(std::ostream* output) const;
  // Like SerializeToOstream(), but allows missing required fields.
  bool SerializePartialToOstream(std::ostream* output) const;
};
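To make the "message becomes a class" idea concrete, here is a hand-written sketch of roughly what such a class offers. It is not the code protoc generates: PeopleSketch and its members are hypothetical names, and the field numbers 1 and 2 are an assumption. The byte layout does follow the protobuf wire format, where each field starts with a tag of (field_number << 3) | wire_type, integers are varints, and strings are length-prefixed.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical stand-in for a protoc-generated class; not real generated code.
class PeopleSketch {
 public:
  const std::string& name() const { return name_; }
  void set_name(const std::string& v) { name_ = v; has_name_ = true; }
  int32_t age() const { return age_; }
  void set_age(int32_t v) { age_ = v; has_age_ = true; }

  // Overwrites *output with the wire-format encoding of the fields that
  // have been set (assumed field numbers: name = 1, age = 2).
  bool SerializeToString(std::string* output) const {
    output->clear();
    if (has_name_) {
      output->push_back(static_cast<char>((1 << 3) | 2));  // length-delimited
      AppendVarint(name_.size(), output);
      output->append(name_);
    }
    if (has_age_) {
      output->push_back(static_cast<char>((2 << 3) | 0));  // varint
      // int32 values are sign-extended to 64 bits before varint encoding.
      AppendVarint(static_cast<uint64_t>(static_cast<int64_t>(age_)), output);
    }
    return true;
  }

 private:
  // Base-128 varint: 7 payload bits per byte, high bit set on all but the last.
  static void AppendVarint(uint64_t v, std::string* out) {
    while (v >= 0x80) {
      out->push_back(static_cast<char>((v & 0x7F) | 0x80));
      v >>= 7;
    }
    out->push_back(static_cast<char>(v));
  }

  std::string name_;
  int32_t age_ = 0;
  bool has_name_ = false;
  bool has_age_ = false;
};
```

The real generated class has many more members (parsing, Clear, reflection hooks and so on); the sketch only shows the accessor-plus-serializer shape described above.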
Serialization (ZeroCopyOutputStream)

This section focuses on the SerializeToCodedStream function. The constructor of CodedOutputStream is:
explicit CodedOutputStream(ZeroCopyOutputStream* input);
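In other words, it needs a ZeroCopyOutputStream object to be constructed. So what is ZeroCopyOutputStream? As the name suggests, it is what protobuf uses to implement zero copy (the term is somewhat ambiguous here; the goal is really to reduce the number of copies). ZeroCopyOutputStream is an abstract interface class that lets users customize where serialized output goes. Its definition is shown below; a user-defined output stream has to implement its pure virtual functions.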
// Abstract interface similar to an output stream but designed to minimize
// copying.
class PROTOBUF_EXPORT ZeroCopyOutputStream {
public:
  ZeroCopyOutputStream() {}
  virtual ~ZeroCopyOutputStream() {}

  // Obtains a buffer into which data can be written.  Any data written
  // into this buffer will eventually (maybe instantly, maybe later on)
  // be written to the output.
  //
  // Preconditions:
  // * "size" and "data" are not NULL.
  //
  // Postconditions:
  // * If the returned value is false, an error occurred.  All errors are
  //   permanent.
  // * Otherwise, "size" points to the actual number of bytes in the buffer
  //   and "data" points to the buffer.
  // * Ownership of this buffer remains with the stream, and the buffer
  //   remains valid only until some other method of the stream is called
  //   or the stream is destroyed.
  // * Any data which the caller stores in this buffer will eventually be
  //   written to the output (unless BackUp() is called).
  // * It is legal for the returned buffer to have zero size, as long
  //   as repeatedly calling Next() eventually yields a buffer with non-zero
  //   size.
  virtual bool Next(void** data, int* size) = 0;

  // Backs up a number of bytes, so that the end of the last buffer returned
  // by Next() is not actually written.  This is needed when you finish
  // writing all the data you want to write, but the last buffer was bigger
  // than you needed.  You don't want to write a bunch of garbage after the
  // end of your data, so you use BackUp() to back up.
  //
  // Preconditions:
  // * The last method called must have been Next().
  // * count must be less than or equal to the size of the last buffer
  //   returned by Next().
  // * The caller must not have written anything to the last "count" bytes
  //   of that buffer.
  //
  // Postconditions:
  // * The last "count" bytes of the last buffer returned by Next() will be
  //   ignored.
  virtual void BackUp(int count) = 0;

  // Returns the total number of bytes written since this object was created.
  virtual int64_t ByteCount() const = 0;

  // Write a given chunk of data to the output.  Some output streams may
  // implement this in a way that avoids copying. Check AllowsAliasing() before
  // calling WriteAliasedRaw(). It will GOOGLE_CHECK fail if WriteAliasedRaw() is
  // called on a stream that does not allow aliasing.
  //
  // NOTE: It is caller's responsibility to ensure that the chunk of memory
  // remains live until all of the data has been consumed from the stream.
  virtual bool WriteAliasedRaw(const void* data, int size);
  virtual bool AllowsAliasing() const { return false; }


private:
  GOOGLE_DISALLOW_EVIL_CONSTRUCTORS(ZeroCopyOutputStream);
};
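The contract in these comments is easier to follow from the caller's side. The sketch below is self-contained and deliberately does not include the protobuf headers: OutputStreamSketch is a hypothetical stand-in that copies only the Next/BackUp/ByteCount signatures, FixedArrayStream is a made-up implementation over a caller-owned array, and WriteAll is a made-up helper playing the role of a writer such as CodedOutputStream.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical stand-in mirroring three methods of ZeroCopyOutputStream.
class OutputStreamSketch {
 public:
  virtual ~OutputStreamSketch() {}
  virtual bool Next(void** data, int* size) = 0;
  virtual void BackUp(int count) = 0;
  virtual int64_t ByteCount() const = 0;
};

// Hands out a caller-owned array in chunks of at most block_size bytes.
class FixedArrayStream : public OutputStreamSketch {
 public:
  FixedArrayStream(char* data, int size, int block_size)
      : data_(data), size_(size), block_size_(block_size) {}

  bool Next(void** data, int* size) override {
    if (position_ == size_) return false;  // array exhausted
    last_returned_ = std::min(block_size_, size_ - position_);
    *data = data_ + position_;
    *size = last_returned_;
    position_ += last_returned_;  // the whole chunk counts as written for now
    return true;
  }

  void BackUp(int count) override {
    assert(count >= 0 && count <= last_returned_);
    position_ -= count;  // give back the unused tail of the last chunk
    last_returned_ = 0;  // BackUp is only valid right after Next
  }

  int64_t ByteCount() const override { return position_; }

 private:
  char* data_;
  int size_;
  int block_size_;
  int position_ = 0;
  int last_returned_ = 0;
};

// Copies `bytes` into the stream; returns false if the stream runs out.
bool WriteAll(OutputStreamSketch* out, const std::string& bytes) {
  std::size_t done = 0;
  while (done < bytes.size()) {
    void* buf = nullptr;
    int size = 0;
    if (!out->Next(&buf, &size)) return false;
    std::size_t n = std::min(static_cast<std::size_t>(size), bytes.size() - done);
    std::memcpy(buf, bytes.data() + done, n);
    done += n;
    if (n < static_cast<std::size_t>(size)) {
      out->BackUp(size - static_cast<int>(n));  // return the unused tail
    }
  }
  return true;
}
```

Writing six bytes through a stream with a block size of four takes two Next calls, and the final BackUp(2) is what keeps ByteCount at six rather than eight.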
The Next function

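The two main functions are Next and BackUp. Start with Next: it hands the caller a buffer so that the caller can write data directly into it. The data parameter is a pointer passed in by the caller; inside Next, the address of the buffer is assigned to *data. Put simply: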
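bool Next(void** data, int* size) {
  ...
  // buffer_: uint8_t[cap_]
  // cap_ is the max size of buffer_
  // buffer_used_ is the number of bytes of buffer_ already handed out
  *data = buffer_ + buffer_used_;  // address of the first free byte
  *size = cap_ - buffer_used_;
  buffer_used_ = cap_;  // the rest of the buffer now belongs to the caller
  ...
  return true;
}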
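Inside the stream, the implementer can then write the buffered data to a file as needed. For example, once the internal buffer is full, Flush it to disk, or perform a similar operation.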
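bool Next(void** data, int* size) {
  ...
  if (buffer_used_ == cap_) {
    // Assumption: Flush() writes buffer_ out and resets buffer_used_ to 0.
    Flush();
  }
  // buffer_: uint8_t[cap_]
  // cap_ is the max size of buffer_
  // buffer_used_ is the number of bytes of buffer_ already handed out
  *data = buffer_ + buffer_used_;  // address of the first free byte
  *size = cap_ - buffer_used_;
  buffer_used_ = cap_;  // the rest of the buffer now belongs to the caller
  ...
  return true;
}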
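This way you can define your own custom behavior. (P.S. The name ZeroCopy is honestly a bit confusing. As I understand it, its main benefit is letting users customize serialization, and the main job of Next is to let the caller operate directly on the stream's internal buffer. But from reading the source, a copy still happens: the serialized data is copied into the internal buffer. So I am still somewhat puzzled by the ZeroCopy name.)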
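Filling in the elided parts of the Next pseudo-code, a flush-on-full stream might look like the following. It is a sketch only: FlushingStream, Flush, and the std::string sink (standing in for a file on disk) are all assumptions, and the class mirrors the Next/BackUp/ByteCount signatures without deriving from the real protobuf class so that it compiles on its own. The capacity is assumed to be greater than zero.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

class FlushingStream {
 public:
  // `sink` must outlive the stream; `cap` is the internal buffer size.
  FlushingStream(std::string* sink, int cap) : sink_(sink), buffer_(cap) {}
  ~FlushingStream() { Flush(); }

  bool Next(void** data, int* size) {
    if (buffer_used_ == cap()) Flush();  // buffer full: drain it first
    *data = buffer_.data() + buffer_used_;
    *size = cap() - buffer_used_;
    last_size_ = *size;
    buffer_used_ = cap();  // the whole remainder now belongs to the caller
    return true;
  }

  void BackUp(int count) {
    assert(count >= 0 && count <= last_size_);
    buffer_used_ -= count;  // the caller did not use the last `count` bytes
    last_size_ = 0;         // BackUp is only valid right after Next
  }

  // Bytes already flushed plus bytes pending in the buffer.
  int64_t ByteCount() const { return flushed_ + buffer_used_; }

  // Moves pending bytes to the sink and empties the buffer.
  void Flush() {
    sink_->append(buffer_.data(), buffer_used_);
    flushed_ += buffer_used_;
    buffer_used_ = 0;
    last_size_ = 0;
  }

 private:
  int cap() const { return static_cast<int>(buffer_.size()); }

  std::string* sink_;
  std::vector<char> buffer_;
  int buffer_used_ = 0;
  int last_size_ = 0;
  int64_t flushed_ = 0;
};
```

Flushing in the destructor is one possible design choice; it means bytes handed out by the last Next call reach the sink only if the caller has already backed up the unused tail.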
The BackUp function

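Next comes BackUp. Its job is to give back the unused part of a buffer: Next may hand out more space than the caller ends up using, so the caller returns the unused tail. A simple version can be written as: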
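void BackUp(int count) {
  CHECK_GE(count, 0);             // backing up zero bytes is legal
  CHECK_LE(count, buffer_used_);  // cannot give back more than was handed out
  buffer_used_ -= count;
}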
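It is really just a simple operation that gives buffer memory back.
The above covers ZeroCopyOutputStream. ZeroCopyInputStream works in a similar way, except that it reads instead of writes.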
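For the read side, a comparable sketch follows. In the real ZeroCopyInputStream interface, Next takes a const void** and BackUp means "un-read the last count bytes so that the following Next returns them again". ArrayInputSketch and ReadAll are hypothetical names, the class does not derive from the protobuf class, and the Skip method of the real interface is left out for brevity.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>

class ArrayInputSketch {
 public:
  ArrayInputSketch(const char* data, int size, int block_size)
      : data_(data), size_(size), block_size_(block_size) {}

  // Exposes the next chunk of the array; returns false at end of input.
  bool Next(const void** data, int* size) {
    if (position_ == size_) return false;
    last_returned_ = std::min(block_size_, size_ - position_);
    *data = data_ + position_;
    *size = last_returned_;
    position_ += last_returned_;
    return true;
  }

  // Un-reads the last `count` bytes of the chunk returned by Next().
  void BackUp(int count) {
    assert(count >= 0 && count <= last_returned_);
    position_ -= count;
    last_returned_ = 0;
  }

  // Total number of bytes consumed so far.
  int64_t ByteCount() const { return position_; }

 private:
  const char* data_;
  int size_;
  int block_size_;
  int position_ = 0;
  int last_returned_ = 0;
};

// Drains the stream into a string, reading directly from each chunk.
std::string ReadAll(ArrayInputSketch* in) {
  std::string result;
  const void* chunk = nullptr;
  int size = 0;
  while (in->Next(&chunk, &size)) {
    result.append(static_cast<const char*>(chunk), size);
  }
  return result;
}
```

A parser typically calls BackUp when it has consumed only part of a chunk, for example after reaching the end of one message in the middle of a buffer.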
Summary

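Protobuf provides many serialization and deserialization functions. Among them, subclassing ZeroCopyOutputStream and ZeroCopyInputStream makes it easy to implement custom behavior, such as network streams or reading and writing files in special formats.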