Recently I ran into a bug while using Google protobuf to serialize and deserialize messages.
I had a code segment like this:
    class MyMessage; // A message class generated by protobuf

    void process() {
        MyMessage msg;
        std::string msg_buffer = some_other_func(); // receive binary format "string"
        msg.ParseFromString(msg_buffer.data());
    }
In this code, std::string::data() is combined with the MyMessage::ParseFromString() API in a weird way. It was a mistake due to my negligence. Originally I used the MyMessage::ParseFromArray() API and msg_buffer was a std::vector<char>, so I called std::vector::data() to get a pointer of type const char*. When I changed the parsing method from ParseFromArray to ParseFromString, I didn't think much about the argument that the new API expects.
This code compiles without any warning, even with the compiler flag -Wall, so I didn't realize that I was misusing the API. Later the program crashed in code that assumed the msg object already held certain data.
In fact, the API MyMessage::ParseFromString() never accepts an argument of type const char*. Since std::string has a non-explicit constructor basic_string(const CharT *, const Allocator &), the pointer is implicitly converted to a temporary std::string, which is then passed into the function. However, the string constructed in this way is not what we want, because that constructor reads only up to the first '\0'. Let's see a simple example.
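    #include <cstddef>
    #include <cstdio>
    #include <string>

    // Takes a std::string, so a const char* argument is implicitly converted
    // into a temporary std::string that stops at the first '\0'.
    std::size_t fake_strlen(const std::string & s) {
        return s.size();
    }

    int main() {
        // Build the string with an explicit length so it really holds all 12 bytes.
        std::string fraud_str("I'm\0a fraud.", 12);
        std::printf("length: %zu\n", fake_strlen(fraud_str.data()));
        return 0;
    }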
Compile and run the code above and you will get the output "length: 3", even though the original data has 12 bytes. Since the C-style string (pointed to by the const char*) contains a terminating character '\0' right after "I'm", the newly constructed std::string ends there: it holds only "I'm", not the data we expected.