
All you need to know about ProtoBuf


Google's ProtoBuf, aka Protocol Buffers, has been around for a good while now. It is a free, open-source data serialization format.

But why use this? 

Because it is language-agnostic and platform-neutral. Compared to text-based formats like JSON, it is considerably faster to parse and more compact on the wire. Lastly, it eases your development and maintenance efforts too.

How easy is it to work with? 

  1. Start by defining your DTO (Data Transfer Object), the data to be passed over the network, in a `.proto` file. 
  2. Use the protobuf compiler (`protoc`) to generate the corresponding source code for the DTO in the language of your choice (protobuf supports many popular languages). 
  3. You are now ready to use the generated source code in your project. 
  4. If you need to change the DTO structure, repeat steps 1-3.
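
As a rough sketch of steps 1 and 2, the snippet below writes a small schema to disk and builds the `protoc` command line that would generate Python code from it. The `StockTick` message, its fields, the file name `stock_tick.proto`, and both helper functions are hypothetical and chosen only for illustration. `--proto_path` and `--python_out` are standard `protoc` options; other target languages use their own output flag, such as `--java_out`.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path

# Step 1: a hypothetical DTO, defined once in a .proto schema.
STOCK_TICK_PROTO = """\
syntax = "proto3";

message StockTick {
  string symbol      = 1;
  int64  price_cents = 2;
  uint32 volume      = 3;
}
"""


def write_schema_and_build_command(work_dir):
    """Write the schema into work_dir and return the protoc command for it."""
    work_dir = Path(work_dir)
    proto_file = work_dir / "stock_tick.proto"
    proto_file.write_text(STOCK_TICK_PROTO)
    # Step 2: protoc reads the schema and emits stock_tick_pb2.py.
    return [
        "protoc",
        f"--proto_path={work_dir}",
        f"--python_out={work_dir}",
        str(proto_file),
    ]


def run_protoc(command):
    """Run protoc if it is installed; return True when it exits cleanly."""
    if shutil.which(command[0]) is None:
        return False  # protoc is not on PATH, so nothing is generated
    return subprocess.run(command).returncode == 0


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        cmd = write_schema_and_build_command(tmp)
        print(" ".join(cmd))
        print("code generated:", run_protoc(cmd))
```

Step 3 would then be importing the generated `stock_tick_pb2` module in your project, which also requires the `protobuf` runtime package to be installed.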

When to use it?

I prefer REST/JSON for most occasions, to make the most of the HTTP infrastructure. But there are occasions where I prefer ProtoBuf: when traffic volume is high and latency and network bandwidth matter, as in stockbroking, where the server sends stock ticks to millions of interested subscribers. In such cases, companies typically use WebSockets, sending either text messages (such as JSON with trimmed keys, or CSV strings in a predefined format) or binary messages (such as data packed in a predefined order into a byte array, for maximum performance). The smartest choice in such cases is ProtoBuf. It gives you the benefit of binary transmission, since the data is not serialized to text, and the benefit of publishing the API contract to your end users by sharing the human-readable `.proto` files (from which they can generate DTOs in their own language of choice). This is a win-win, and it shows how committed you are to your end users in building developer-friendly APIs.

ProtoBuf is a cleaner approach than the legacy way of hand-packing data into a `byte[]` array. The ProtoBuf schema (the `.proto` file) is the data contract.
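
To give a feel for the size difference, here is a minimal sketch that hand-encodes one stock tick in the protobuf wire format using only the Python standard library, and compares it with compact JSON. The tick fields and their field numbers (1, 2, 3) are hypothetical. In a real project the generated code does this encoding for you; the helper functions below are written out only to show what ends up on the wire.

```python
import json

VARINT, LENGTH_DELIMITED = 0, 2  # protobuf wire types used here


def encode_varint(value):
    """Encode a non-negative integer as a protobuf base-128 varint."""
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def tag(field_number, wire_type):
    """Field key: (field number << 3) | wire type, itself a varint."""
    return encode_varint((field_number << 3) | wire_type)


def encode_stock_tick(symbol, price_cents, volume):
    """Hand-encode the hypothetical tick: symbol=1, price_cents=2, volume=3."""
    raw_symbol = symbol.encode("utf-8")
    return (
        tag(1, LENGTH_DELIMITED) + encode_varint(len(raw_symbol)) + raw_symbol
        + tag(2, VARINT) + encode_varint(price_cents)
        + tag(3, VARINT) + encode_varint(volume)
    )


tick = {"symbol": "ACME", "price_cents": 123456, "volume": 500}
as_json = json.dumps(tick, separators=(",", ":")).encode("utf-8")
as_protobuf = encode_stock_tick(**tick)

print(len(as_json), "bytes as compact JSON")
print(len(as_protobuf), "bytes in protobuf wire format")
```

For this particular tick the JSON form is 51 bytes and the binary form is 13, because field names are replaced by small numeric tags and numbers are not spelled out as text. The exact ratio depends on the data.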

Caveats?

  1. Got a single byte, or a set of individual bytes, that must be packed alongside other data types (e.g., [Byte, Int, Byte, Float, String])? Then ProtoBuf is not a natural fit, because the protobuf schema has no single-byte scalar type (its `bytes` type carries a whole byte sequence, not one typed byte field). There is a simple workaround: use an `int` type instead of `byte` for wire transfer.
  2. Protobuf doesn't support sophisticated enums like those in Java, where each enum constant can carry additional values of primitive types (int, float, string, etc.). It supports only basic enums, whose first value must be zero (in proto3), and this cannot be customized. The workaround is to suppress generation of the enum class and substitute your own enum definition, adding the methods that the generated classes expect so that your enum works with the generated code. Alternatively, keep all generated code in its own namespace; this is the default behavior, as protobuf creates outer (wrapper) classes to avoid naming conflicts with existing code.
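
Both caveats can be sketched in a few lines. The first part shows what carrying a byte in an integer field costs on the wire: integer fields are varint-encoded, so the overhead is usually small, but the choice of integer type matters for negative values. The second part shows a simpler variant of the enum workaround than the one described above: instead of suppressing class generation, it keeps the basic protobuf enum on the wire and maps its number to a hand-written rich enum. It is written in Python rather than Java, and `OrderSide`, its members, and the helper functions are all hypothetical names.

```python
from enum import Enum


def varint_size(value):
    """Number of bytes a non-negative integer occupies as a varint."""
    size = 1
    while value > 0x7F:
        value >>= 7
        size += 1
    return size


def zigzag32(value):
    """ZigZag mapping used by sint32: small magnitudes stay small."""
    return (value << 1) ^ (value >> 31)


def int32_wire_value(value):
    """int32 negatives are sign-extended to 64 bits before varint encoding."""
    return value & 0xFFFFFFFFFFFFFFFF


# Caveat 1: payload size of a byte carried in an integer field.
print(varint_size(100))                    # unsigned byte below 128 as uint32
print(varint_size(200))                    # unsigned byte of 128 or more as uint32
print(varint_size(int32_wire_value(-1)))   # signed byte -1 as int32
print(varint_size(zigzag32(-1)))           # signed byte -1 as sint32


class OrderSide(Enum):
    """Hand-written rich enum; the first element is the protobuf enum number."""
    UNSPECIFIED = (0, "n/a", 0)
    BUY = (1, "Buy", +1)
    SELL = (2, "Sell", -1)

    def __init__(self, number, label, sign):
        self.number = number
        self.label = label
        self.sign = sign

    @classmethod
    def from_number(cls, number):
        """Caveat 2: map a number read off the wire to the rich enum member."""
        for member in cls:
            if member.number == number:
                return member
        return cls.UNSPECIFIED  # unknown numbers fall back to the zero value


side = OrderSide.from_number(2)
print(side.name, side.label, side.sign)
```

The sizes show why `sint32` is the safer pick for signed bytes: a negative value stored as plain `int32` takes ten bytes on the wire, while `uint32` or `sint32` keep any byte value within one or two bytes.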

How easy is it to get started and get going?

It actually is very easy. Beyond the basics, there are a number of customizations you can make; I list them below as good practices to follow in general:

  • While you can bundle all DTOs as messages in one `.proto` file, I recommend defining each in a separate file named after it.
  • Keep all `.proto` files under your project directory, and have your build tool (like Maven/Gradle in Java) generate the classes from them and add them to the classpath. 
  • Automate publishing these `.proto` files to a shared location that is accessible to your API end users.
  • Some protobuf files can be excluded from code generation and kept for semantic purposes only. This is especially useful when you are the end user of an API: you use its protobuf files to generate classes, but you don't want to generate every DTO defined in the data contract.
  • It supports nested data types, arrays (repeated fields), maps, etc.

The cheatsheet below should serve as a handy reference guide.


Happy maintenance-free scaling!