c++ – Binary IO inspired by struct pack

I’m writing a header-only template library for helping with binary IO. It is inspired by pack-d, which is in turn inspired by Python’s struct module. I’ve only finished the output part and posted it here for review. You can find the source (~300 lines of code) at the bottom of the post.

It’s basically like a printf for binary IO and packs its arguments into a byte stream depending on the format string. The stream can be a memory stream backed by a std::vector of bytes, or represent packing directly to a file, however you wish. It supports extension for user types.

For the implementation I’ve used C++20 and modern C++ features (such as concepts and constexpr if). The code compiles without warnings using g++ 10.2 and clang++ 11 with the flags -std=c++20 -Wall -Wpedantic -Wextra. There are no dependencies outside of the standard library.

I’m writing it primarily for learning purposes and for my own use, though I’d be happy to publish it if others find it useful. I like this interface, and I didn’t come across anything similar for C++ except this, which is for C, uses C-style variadics instead of variadic templates, and, as far as I can tell, can’t be extended to user types.

I am interested in any constructive criticism or opinion you have to offer. The code works for the simple cases I’ve thrown at it, but it is possible it is buggy for complicated ones. Also one of my goals is for the code generated by template expansion to be as close as possible performance-wise to hand-written code based on fwrite. I’m also interested if any other modern C++ features would fit this kind of thing better than the way I’ve chosen to implement it.

// 6 byte string followed by null-byte
ctx.packf("6Bx", "Hello!"); // --> "Hello!\0"

// You can control endianness
// (also whitespace in the format string is ignored)
ctx.packf(">I 3x <I", 0xdeadbeef, 0xbaadf00d); // --> 0x deadbeef 00 00 00 0df0adba

// You can pack structs raw
struct S {
    uint8_t  x;
    uint32_t y;
};

// Pad bytes are packed, and the endianness option has no effect
ctx.packf(">r", S{ 0xff, 0xbaadc0de }); // --> 0x ff __ __ __ de c0 ad ba

// But you can also extend the system by specializing pack_custom
void pack_custom(auto& ctx, const S& s) {
    ctx.packf("BI", s.x, s.y);
}

// And then use generic packing
ctx.packf(">g", S{ 0xff, 0xbaadc0de }); // --> 0x ff ba ad c0 de

// Arrays are supported, by supplying either a constant width (see first example)
// or passing a width as a parameter
std::vector<S> v{ S{ 0xaa, 0xbaadc0de }, S{ 0xbb, 0xdeadbeef } };

ctx.packf(">*g", v.size(), v.data()); // --> 0x aa baadc0de bb deadbeef

// Or you can just specialize pack_custom for vectors as well
template <typename T>
void pack_custom(auto& ctx, const std::vector<T>& v) {
    // First pack the size as a 32-bit int, then the data generically
    ctx.packf("I*g", v.size(), v.size(), v.data());
}

// Then just call
ctx.packf(">g", v); // --> 0x 00000002 aa baadc0de bb deadbeef
ctx.packf("<g", v); // --> 0x 02000000 aa dec0adba bb efbeadde

// For basic types, "g" defaults to inferring the type
// and packing as you'd expect for arithmetic types
// and packing raw for structs that don't have pack_custom
uint16_t a = 0xdead;
uint16_t b = 0xbeef;
ctx.packf("<gg", a, b); // --> 0x adde efbe

// And if all your format specifiers are generic you can drop the format string
ctx.packf("gg", a, b);
// is the same as
ctx.pack(a, b);
// as well as this (you have to cast the literals in this case though)
ctx.pack((uint16_t)0xdead, (uint16_t)0xbeef);

// There's also  ctx.packle(...)  and  ctx.packbe(...)  for specifying endianness
// without the use of a format string.

The user needs to have a stream type that implements a fwrite-style method. A context is then created to wrap this stream type. All the packing functions are methods of the context object.

struct File_Stream {
    FILE *fp;

    size_t write(const void *ptr, size_t size, size_t nmemb) {
        return fwrite(ptr, size, nmemb, fp);
    }
};


FILE *fp = fopen("test.bin", "wb");
File_Stream stream{fp};
pack::Context ctx(stream); // Context stores a reference, so it needs an lvalue that outlives it
// Use pack functions here
// ...
fclose(fp);
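
The stream does not have to be a file. Here is a minimal sketch of a memory stream backed by a std::vector of bytes, assuming only the fwrite-style write method that the Stream concept asks for. The name Vector_Stream is hypothetical and not part of the library.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical in-memory stream; only the write() signature comes from the library.
struct Vector_Stream {
    std::vector<std::uint8_t> bytes;

    // Appends nmemb elements of `size` bytes each, mirroring fwrite.
    // Returns the number of elements written (always nmemb here).
    std::size_t write(const void *ptr, std::size_t size, std::size_t nmemb) {
        const auto *p = static_cast<const std::uint8_t *>(ptr);
        bytes.insert(bytes.end(), p, p + size * nmemb);
        return nmemb;
    }
};
```

Since Context keeps a reference to its stream, such an object would be declared first and passed as an lvalue, and the packed bytes could then be read back from its bytes member.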

Standard packing for arithmetic types covers calls like packf("4B", b), whose meaning depends on the type of b.

  • If b is a uint8_t or convertible to uint8_t, it means pack b (converted if necessary to uint8_t), repeating it 4 times.
  • If b is a pointer to a uint8_t or to a type convertible to uint8_t, it treats b as an array and packs 4 of its elements (converted if necessary to uint8_t).

Both cases are handled by the function pack_helper<TPack_As, T>, where TPack_As depends on the format specifier (in this case it’s uint8_t because of the B), and T is whatever the type of b is.
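
The two cases can be illustrated with a simplified stand-in for pack_helper. This is only a sketch: it appends to a byte vector instead of a stream, ignores endianness, and takes the repetition count as a parameter. The name pack_as_bytes is hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>
#include <vector>

// Simplified stand-in for pack_helper<TPack_As, T> (hypothetical name).
// `count` plays the role of the repetition count, e.g. 4 in "4B".
template <typename TPack_As, typename T>
void pack_as_bytes(std::vector<std::uint8_t>& out, const T& data, std::size_t count) {
    // Appends the object representation of one converted value.
    auto emit = [&out](TPack_As v) {
        const auto *p = reinterpret_cast<const std::uint8_t *>(&v);
        out.insert(out.end(), p, p + sizeof v);
    };
    if constexpr (std::is_convertible_v<T, TPack_As>) {
        // Single value: convert it and repeat it `count` times.
        for (std::size_t i = 0; i < count; ++i) emit(static_cast<TPack_As>(data));
    } else if constexpr (std::is_pointer_v<T> || std::is_array_v<T>) {
        // Pointer or array: pack the first `count` elements, converting each one.
        for (std::size_t i = 0; i < count; ++i) emit(static_cast<TPack_As>(data[i]));
    } else {
        static_assert(sizeof(T) == 0, "Type not convertible.");
    }
}
```

With this sketch, pack_as_bytes<std::uint8_t>(out, 0x41, 4) appends the byte 0x41 four times, while passing a const char* appends the first four characters it points to.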

If necessary, standard packing will also do endian conversion.
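
The library does the conversion with the compiler builtins __builtin_bswap16/32/64. As a point of comparison, here is a portable sketch of the same byte reversal that works for any arithmetic type. The name byte_swapped is hypothetical, and this is an alternative technique rather than what the code below does.

```cpp
#include <algorithm>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Returns a copy of x with the order of its bytes reversed.
template <typename T>
T byte_swapped(T x) {
    static_assert(std::is_arithmetic_v<T>, "byte_swapped expects an arithmetic type");
    unsigned char bytes[sizeof(T)];
    std::memcpy(bytes, &x, sizeof(T));       // copy out the object representation
    std::reverse(bytes, bytes + sizeof(T));  // reverse the byte order
    std::memcpy(&x, bytes, sizeof(T));       // copy it back
    return x;
}
```

A value would only be passed through such a function when the requested endianness differs from the native one, which is the check pack_value_endian performs.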

The function pack_raw handles the packing of raw data, and the function pack_generic decides which packing mode to use (custom, raw, or standard) in the generic case.
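
The order in which pack_generic picks a mode for non-array types can be sketched in isolation. The sketch below uses the classic detection idiom instead of the Custom_Packable concept from the library, and Mode, mode_for, Demo_Context, Point and Blob are all hypothetical names introduced for illustration.

```cpp
#include <type_traits>
#include <utility>

enum class Mode { Standard, Custom, Raw };

// Detects whether a call pack_custom(ctx, value) is well-formed for T.
template <typename TContext, typename T, typename = void>
struct has_pack_custom : std::false_type {};

template <typename TContext, typename T>
struct has_pack_custom<TContext, T,
    std::void_t<decltype(pack_custom(std::declval<TContext&>(),
                                     std::declval<const T&>()))>>
    : std::true_type {};

// Same priority as pack_generic: arithmetic first, then custom, then raw.
template <typename TContext, typename T>
constexpr Mode mode_for() {
    if constexpr (std::is_arithmetic_v<T>) {
        return Mode::Standard;
    } else if constexpr (has_pack_custom<TContext, T>::value) {
        return Mode::Custom;
    } else {
        return Mode::Raw;
    }
}

// Hypothetical types for illustration only.
struct Demo_Context {};
struct Point { int x, y; };
struct Blob  { char data[8]; };
void pack_custom(Demo_Context&, const Point&) {}  // Point opts in to custom packing

static_assert(mode_for<Demo_Context, int>()   == Mode::Standard);
static_assert(mode_for<Demo_Context, Point>() == Mode::Custom);
static_assert(mode_for<Demo_Context, Blob>()  == Mode::Raw);
```

The overload of pack_custom is found by argument-dependent lookup, so it has to live in a namespace associated with the context type or the packed type.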

Whenever possible (for example, when no endian conversion is necessary), the implementation packs an array to the stream in a single write instead of one element at a time (see the implementation of pack_array_endian).
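
The difference between the two paths can be made visible with a stream that counts its write calls. This is a sketch restricted to 16-bit values; Counting_Stream and write_u16_array are hypothetical names, and the swap flag stands in for the check endian != Endian::native.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stream that records the bytes and counts write() calls.
struct Counting_Stream {
    std::vector<std::uint8_t> bytes;
    std::size_t calls = 0;

    std::size_t write(const void *ptr, std::size_t size, std::size_t nmemb) {
        const auto *p = static_cast<const std::uint8_t *>(ptr);
        bytes.insert(bytes.end(), p, p + size * nmemb);
        ++calls;
        return nmemb;
    }
};

// Writes n 16-bit values, either in one call or element by element.
inline void write_u16_array(Counting_Stream& s, const std::uint16_t *data,
                            std::size_t n, bool swap) {
    if (!swap) {
        s.write(data, sizeof data[0], n);  // one call for the whole array
        return;
    }
    for (std::size_t i = 0; i < n; ++i) {  // one call per swapped element
        const auto v = static_cast<std::uint16_t>((data[i] << 8) | (data[i] >> 8));
        s.write(&v, sizeof v, 1);
    }
}
```

For an array of three values, the unswapped path issues one write and the swapped path issues three, which is the overhead the single-write path avoids.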

The error handling is a WIP. For now I just throw runtime_error or logic_error whenever an error occurs. I plan to create my own exception classes which will also carry context information (such as how many bytes were packed when the error occurred, and how many parameters were parsed).
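
One possible shape for such an exception class is sketched below. The class name, its fields and the message format are all hypothetical, and the context would first have to track both counters, which the current code does not do.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical exception type carrying packing context; illustrative only.
class Pack_Error : public std::runtime_error {
public:
    Pack_Error(const std::string& what, std::size_t bytes_packed, std::size_t args_parsed)
        : std::runtime_error(what + " (bytes packed: " + std::to_string(bytes_packed) +
                             ", arguments parsed: " + std::to_string(args_parsed) + ")"),
          bytes_packed_(bytes_packed),
          args_parsed_(args_parsed) {}

    // Number of bytes written to the stream before the error.
    std::size_t bytes_packed() const noexcept { return bytes_packed_; }
    // Number of arguments consumed from the argument list before the error.
    std::size_t args_parsed() const noexcept { return args_parsed_; }

private:
    std::size_t bytes_packed_;
    std::size_t args_parsed_;
};
```

Deriving from runtime_error would keep existing catch blocks working, while callers that care could catch Pack_Error and inspect the counters.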

I am also considering implementing convenience functions such as:

               void fpackf(FILE *fp, const char *format, const auto& value, auto... args);
    vector<uint8_t> vpackf(          const char *format, const auto& value, auto... args);

which wrap context creation for typical usage.

#pragma once
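#include <type_traits>
#include <concepts>
#include <stdexcept>
#include <bit>
#include <string>
#include <cstddef>  // size_t
#include <cstdint>  // uint8_t, uint16_t, ...
#include <cstdlib>  // strtol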

namespace pack {

using std::runtime_error;
using std::logic_error;
using std::convertible_to;
using std::same_as;
using std::remove_reference_t;
using Endian = std::endian;

// Utility concepts
template <typename T> concept Pointer    = std::is_pointer<T>::value;
template <typename T> concept Array      = std::is_array<T>::value;
template <typename T> concept Arithmetic = std::is_arithmetic<T>::value;

// Pack requires a fwrite-like interface
template <typename TStream>
concept Stream = requires(TStream stream, const void *ptr, size_t size, size_t nmemb) {
    { stream.write(ptr, size, nmemb) } -> convertible_to<size_t>;
};

// The extension mechanism
template <typename TContext, typename T>
concept Custom_Packable = requires(TContext& ctx, const T& a) {
    { pack_custom(ctx, a) };
};

template <Stream TStream>
struct Context {
private:
    TStream& stream;
    Endian   endian = Endian::native;
    size_t   len    = 1; // Repetition/array count

public:
    Context(TStream& s): stream(s) {}

private:
    void skip() {
        uint8_t x = 0;
        for (size_t i=0; i<len; ++i) {
            stream.write(&x, 1, sizeof x);
        }
    }

    void pack_value(const auto& value) {
        size_t n = stream.write(&value, sizeof value, 1);
        if (n != 1) {
            throw runtime_error("Failed writing to stream.");
        }
    }

    void pack_array(const auto& array) {
        size_t n = stream.write(array, sizeof array[0], len);
        if (n != len) {
            throw runtime_error("Failed writing to stream.");
        }
    }

    // Just packs raw data.
    // Will not consider pad bytes inside structs or endianness conversions.
    template <typename T>
    void pack_raw(const T& data) {
        if constexpr (Pointer<T> || Array<T>) {
            pack_array(data);
        } else {
            pack_value(data);
        }
    }

    template <Arithmetic TFrom, Arithmetic TTo>
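    TTo transmute(TFrom x) {
        // Copy the bytes instead of casting pointers, which would violate strict aliasing.
        static_assert(sizeof(TFrom) == sizeof(TTo));
        TTo out;
        __builtin_memcpy(&out, &x, sizeof out);
        return out;
    }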

    template <Arithmetic T>
    void pack_value_endian(const T& value) {
        if (endian != Endian::native) {
            if constexpr (sizeof(T) == 1) {
                pack_value(value);
            } else if constexpr (sizeof(T) == 2) {
                pack_value(__builtin_bswap16(transmute<T,uint16_t>(value)));
            } else if constexpr (sizeof(T) == 4) {
                pack_value(__builtin_bswap32(transmute<T,uint32_t>(value)));
            } else if constexpr (sizeof(T) == 8) {
                pack_value(__builtin_bswap64(transmute<T,uint64_t>(value)));
            } else {
                throw logic_error("Cannot perform endian conversion on this type.");
            }
        } else {
            pack_value(value);
        }
    }

    template <typename T>
    void pack_array_endian(const T& array) {
        if (endian != Endian::native) {
            for (size_t i=0; i<len; ++i) {
                pack_value_endian(array[i]);
            }
        } else {
            pack_array(array);
        }
    }

    template<Arithmetic TPack_As, typename T>
    void pack_helper(const T& data) {
        if constexpr (convertible_to<T, TPack_As>) {
            for (size_t i=0; i<len; ++i) {
                pack_value_endian(static_cast<TPack_As>(data));
            }
        } else if constexpr (same_as<T, TPack_As*> || same_as<T, TPack_As[]>) {
            pack_array_endian(data);
        } else if constexpr (Pointer<T> || Array<T>) {
            using TElem = decltype(data[0]);
            if constexpr (convertible_to<TElem, TPack_As>) {
                for (size_t i=0; i<len; ++i) {
                    pack_value_endian(static_cast<TPack_As>(data[i]));
                }
            } else {
                throw logic_error("Type not convertible.");
            }
        } else {
            throw logic_error("Type not convertible.");
        }
    }

    // Decides whether to use pack_raw, pack_helper or pack_custom
    template <typename T>
    void pack_generic(const T& data) {
        using TContext = remove_reference_t<decltype(*this)>;
        if constexpr (Arithmetic<T>) {
            pack_helper<T>(data);
        } else if constexpr (Custom_Packable<TContext, T>) {
            for (size_t i=0; i<len; ++i) {
                size_t len_save    = len;     // Save state prior to pack_custom call
                Endian endian_save = endian;
                len = 1;                      // Reset len to 1 for pack_custom
                pack_custom(*this, data);
                len    = len_save;            // Restore state prior to pack_custom call
                endian = endian_save;
            }
        } else if constexpr (Pointer<T> || Array<T>) {
            using TElem = remove_reference_t<decltype(data[0])>;
            if constexpr (Arithmetic<TElem>) { // Test the element type; T itself is a pointer or array here
                pack_helper<TElem>(data);
            } else if constexpr (Custom_Packable<TContext, TElem>) {
                for (size_t i=0; i<len; ++i) {
                    size_t len_save    = len;     // Save state
                    Endian endian_save = endian;
                    len = 1;                      // Reset len
                    pack_custom(*this, data[i]);
                    len    = len_save;            // Restore state
                    endian = endian_save;
                }
            } else {
                pack_raw(data);
            }
        } else {
            pack_raw(data);
        }
    }

public:
    void pack() {}

    void pack(const auto& value, auto... args) {
        pack_generic(value);
        pack(args...);
    }

    void packle(const auto& value, auto... args) {
        endian = Endian::little;
        pack_generic(value);
        pack(args...);
    }

    void packbe(const auto& value, auto... args) {
        endian = Endian::big;
        pack_generic(value);
        pack(args...);
    }

    void packne(const auto& value, auto... args) {
        if (Endian::native == Endian::little) {
            endian = Endian::big;
        } else {
            endian = Endian::little;
        }

        pack_generic(value);
        pack(args...);
    }

    void packf(const char *format) {
        while (*format) {
            switch (*format) {
                case '>': endian = Endian::big;    format++; continue;
                case '<': endian = Endian::little; format++; continue;
                case '=': endian = Endian::native; format++; continue;
                case '!': endian = (Endian::native == Endian::little)
                              ? Endian::big
                              : Endian::little;
                          format++; continue;
                case '0': case '1': case '2':
                case '3': case '4': case '5':
                case '6': case '7': case '8':
                case '9': {
                    char *endptr;
                    len = static_cast<size_t>( strtol(format, &endptr, 10) );
                    format = endptr;
                    continue;
                }
                case ' ': case '\t': case '\n': ++format; continue;
                // Every specifier that consumes an argument is an error here,
                // because this overload is only reached once the arguments run out.
                case 'B': case 'H': case 'I': case 'L':
                case 'b': case 'h': case 'i': case 'l':
                case 'f': case 'd': case 'r': case 'g':
                case '*':
                    throw logic_error("Not enough arguments for format string.");
                case 'x': skip(); break;
                default:
                    throw logic_error("Unexpected character (" + std::string{*format} + ") in format string.");
            }
            len = 1;
            packf(++format);
            return;
        }
    }

    void packf(const char *format, const auto& value, auto... args) {
        while (*format) {
            switch (*format) {
                case '>': endian = Endian::big;    format++; continue;
                case '<': endian = Endian::little; format++; continue;
                case '=': endian = Endian::native; format++; continue;
                case '!': endian = (Endian::native == Endian::little)
                              ? Endian::big
                              : Endian::little;
                          format++; continue;
                case '0': case '1': case '2':
                case '3': case '4': case '5':
                case '6': case '7': case '8':
                case '9': {
                    char *endptr;
                    len = static_cast<size_t>( strtol(format, &endptr, 10) );
                    format = endptr;
                    continue;
                }
                case ' ': case '\t': case '\n': ++format; continue;
                case 'B': pack_helper<uint8_t> (value); break;
                case 'H': pack_helper<uint16_t>(value); break;
                case 'I': pack_helper<uint32_t>(value); break;
                case 'L': pack_helper<uint64_t>(value); break;
                case 'b': pack_helper<int8_t>  (value); break;
                case 'h': pack_helper<int16_t> (value); break;
                case 'i': pack_helper<int32_t> (value); break;
                case 'l': pack_helper<int64_t> (value); break;
                case 'f': pack_helper<float>   (value); break;
                case 'd': pack_helper<double>  (value); break;
                case 'r': pack_raw             (value); break;
                case 'g': pack_generic         (value); break;
                case 'x': skip(); len=1; format++; continue;
                case '*': {
                    if constexpr (convertible_to<decltype(value), size_t>) {
                        len = static_cast<size_t>(value);
                        packf(++format, args...);
                        return;
                    } else {
                        throw logic_error("Invalid size type.");
                    }
                }
                default:
                    throw logic_error("Unexpected character (" + std::string{*format} + ") in format string.");
            }
            len = 1;
            packf(++format, args...);
            return;
        }
        throw logic_error("Too many arguments for format string.");
    }
};

} // end namespace pack