RFC-0021

codec - serialization and deserialization

Status: draft | Target: α2

RFC-0013: codec

summary

codecs are records that encode values to bytes and decode bytes to values. the design separates the what (the codec interface) from the how (format implementations), enabling format-agnostic code and user-defined formats.

motivation

serialization is everywhere:

api responses (json)
configuration files (toml, yaml)
binary protocols (protobuf, msgpack)
database storage
ipc and rpc

most languages couple serialization to specific formats or require runtime reflection. ferrule can do better:

format-agnostic code: write once, serialize to any format
no runtime reflection: derive at compile time
user-extensible: implement your own formats
zero-copy where possible: views into buffers
streaming support: don't require full buffers

what's wrong with existing approaches

serde (rust):

excellent design, but proc macros are complex
attribute soup for customization
compile times suffer

json libraries (most languages):

format-specific, can't switch to msgpack easily
often use runtime reflection (slow, no compile-time safety)

protobuf/flatbuffers:

require external schema files and codegen
not native to the language

ferrule's approach: codecs are just records. derive generates them at comptime. no macros, no external tools.

detailed design

core types

/// encodes T to bytes
type Encoder<T> = {
    encode: (T, Writer) -> Unit error EncodeError,
};

/// decodes T from bytes  
type Decoder<T> = {
    decode: (Reader) -> T error DecodeError,
};

/// bidirectional codec
type Codec<T> = Encoder<T> & Decoder<T>;

/// writer abstraction (buffer, stream, etc.)
type Writer = {
    write: (View<u8>) -> Unit error IoError,
    write_byte: (u8) -> Unit error IoError,
};

/// reader abstraction
type Reader = {
    read: (usize) -> View<u8> error IoError,
    read_byte: () -> u8 error IoError,
    peek: (usize) -> View<u8> error IoError,
};
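
the record shapes above can be modelled in an existing language to check that they compose. the following is a minimal python sketch, not ferrule code: BufferWriter, BufferReader and u16_be are hypothetical names introduced for illustration, and python exceptions stand in for ferrule's error returns.

```python
from dataclasses import dataclass
from typing import Callable, Generic, TypeVar

T = TypeVar("T")


class BufferWriter:
    """In-memory stand-in for the Writer record (hypothetical name)."""

    def __init__(self) -> None:
        self.data = bytearray()

    def write(self, chunk: bytes) -> None:
        self.data += chunk

    def write_byte(self, byte: int) -> None:
        self.data.append(byte)


class BufferReader:
    """In-memory stand-in for the Reader record (hypothetical name)."""

    def __init__(self, data: bytes) -> None:
        self.data = data
        self.pos = 0

    def peek(self, n: int) -> bytes:
        # Look ahead without consuming; fail if fewer than n bytes remain.
        if self.pos + n > len(self.data):
            raise EOFError("unexpected end of input")
        return self.data[self.pos:self.pos + n]

    def read(self, n: int) -> bytes:
        chunk = self.peek(n)
        self.pos += n
        return chunk

    def read_byte(self) -> int:
        return self.read(1)[0]


@dataclass(frozen=True)
class Codec(Generic[T]):
    """A codec is just a record holding two functions."""
    encode: Callable[[T, BufferWriter], None]
    decode: Callable[[BufferReader], T]


def encode_u16(value: int, w: BufferWriter) -> None:
    w.write(value.to_bytes(2, "big"))


def decode_u16(r: BufferReader) -> int:
    return int.from_bytes(r.read(2), "big")


# An illustrative codec value: a 16-bit big-endian unsigned integer.
u16_be = Codec(encode=encode_u16, decode=decode_u16)

writer = BufferWriter()
u16_be.encode(513, writer)
encoded = bytes(writer.data)  # b"\x02\x01"
decoded = u16_be.decode(BufferReader(encoded))
```

because the codec is an ordinary value, it can be passed to functions, stored in tables, or wrapped by other codecs without any special language support.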

format markers

formats are zero-size types used for disambiguation:

type Json;
type Msgpack;
type Toml;
type Bincode;

these enable multiple codecs per type:

const User.json: Codec<User> = { ... };
const User.msgpack: Codec<User> = { ... };
const User.bincode: Codec<User> = { ... };

primitive codecs

stdlib provides codecs for primitives:

// json format
const i32.json: Codec<i32> = json.int_codec();
const String.json: Codec<String> = json.string_codec();
const Bool.json: Codec<Bool> = json.bool_codec();

// msgpack format
const i32.msgpack: Codec<i32> = msgpack.int_codec();
const String.msgpack: Codec<String> = msgpack.string_codec();
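
to illustrate one type carrying several codecs, here is a small python model with two integer codecs. I32_JSON and I32_MSGPACK are hypothetical stand-ins for i32.json and i32.msgpack. the msgpack variant always emits the int 32 form (tag 0xd2 followed by four big-endian bytes) rather than the most compact encoding, and the json variant assumes the input holds exactly one number.

```python
import io
import struct
from dataclasses import dataclass
from typing import Callable, Generic, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class Codec(Generic[T]):
    encode: Callable[[T, io.BytesIO], None]
    decode: Callable[[io.BytesIO], T]


def _json_int_encode(value: int, w: io.BytesIO) -> None:
    w.write(str(value).encode("ascii"))


def _json_int_decode(r: io.BytesIO) -> int:
    # Simplification: treat the rest of the input as one JSON number.
    return int(r.read().decode("ascii"))


def _msgpack_int_encode(value: int, w: io.BytesIO) -> None:
    # Always use the int 32 form: tag 0xd2 + 4 big-endian signed bytes.
    w.write(b"\xd2" + struct.pack(">i", value))


def _msgpack_int_decode(r: io.BytesIO) -> int:
    if r.read(1) != b"\xd2":
        raise ValueError("expected msgpack int 32 tag")
    return struct.unpack(">i", r.read(4))[0]


I32_JSON = Codec(_json_int_encode, _json_int_decode)
I32_MSGPACK = Codec(_msgpack_int_encode, _msgpack_int_decode)


def to_bytes(value, codec):
    """Format-agnostic helper: the caller picks the codec."""
    buf = io.BytesIO()
    codec.encode(value, buf)
    return buf.getvalue()


as_json = to_bytes(-42, I32_JSON)        # b"-42"
as_msgpack = to_bytes(-42, I32_MSGPACK)  # b"\xd2\xff\xff\xff\xd6"
```

to_bytes never mentions a format. switching from json to msgpack is a change of argument, which is the property the format-agnostic goal asks for.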

derived codecs

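use derive to generate codecs at comptime. inside derive, Codec<Json> selects the format to generate a codec for; the generated constants still have type Codec<User>: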

type User = derive(Codec<Json>, Codec<Msgpack>) {
    id: u64,
    name: String,
    email: String,
    age: u32?,  // optional field
};

// generates:
// const User.json: Codec<User> = { ... };
// const User.msgpack: Codec<User> = { ... };
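
one way to picture what derive could generate is a record codec composed from the codecs of its fields, applied in declaration order. the python sketch below hand-writes that composition for a simple length-prefixed binary layout. the layout, the helper names (int_codec, optional, USER_FIELDS, USER_BINARY) and the use of getattr are assumptions for illustration; a comptime derive would resolve field access statically rather than by name at run time.

```python
import io
import struct
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Codec:
    encode: Callable
    decode: Callable


def read_exact(r, n):
    data = r.read(n)
    if len(data) != n:
        raise EOFError("unexpected end of input")
    return data


def int_codec(fmt):
    """Fixed-width integer codec for a struct format such as "<Q"."""
    size = struct.calcsize(fmt)
    return Codec(
        encode=lambda v, w: w.write(struct.pack(fmt, v)),
        decode=lambda r: struct.unpack(fmt, read_exact(r, size))[0],
    )


U64 = int_codec("<Q")
U32 = int_codec("<I")


def _encode_str(v, w):
    raw = v.encode("utf-8")
    U32.encode(len(raw), w)  # length prefix, then the UTF-8 bytes
    w.write(raw)


def _decode_str(r):
    length = U32.decode(r)
    return read_exact(r, length).decode("utf-8")


STRING = Codec(_encode_str, _decode_str)


def optional(inner):
    """Wrap a codec: one tag byte (0 = None, 1 = present), then the value."""
    def encode(v, w):
        if v is None:
            w.write(b"\x00")
        else:
            w.write(b"\x01")
            inner.encode(v, w)

    def decode(r):
        return inner.decode(r) if read_exact(r, 1) == b"\x01" else None

    return Codec(encode, decode)


@dataclass
class User:
    id: int
    name: str
    email: str
    age: Optional[int]


# What a derive step could emit: one (field name, field codec) pair per
# field, in declaration order.
USER_FIELDS = [("id", U64), ("name", STRING), ("email", STRING), ("age", optional(U32))]


def _encode_user(user, w):
    for name, codec in USER_FIELDS:
        codec.encode(getattr(user, name), w)


def _decode_user(r):
    return User(**{name: codec.decode(r) for name, codec in USER_FIELDS})


USER_BINARY = Codec(_encode_user, _decode_user)

buf = io.BytesIO()
USER_BINARY.encode(User(7, "ada", "ada@example.com", None), buf)
round_tripped = USER_BINARY.decode(io.BytesIO(buf.getvalue()))
```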

field customization

attributes control serialization behavior:

type ApiResponse = derive(Codec<Json>) {
    @rename("user_id")
    id: u64,
    
    @skip_if_none
    metadata: Metadata?,
    
    @flatten
    common: CommonFields,
    
    @rename_all("camelCase")
    user_data: UserData,
};

available attributes:

@rename("name") - use different name in output
@skip - never serialize this field
@skip_if_none - omit if None
@flatten - inline nested struct fields
@default(value) - use default if missing on decode
@rename_all("style") - camelCase, snake_case, etc.
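
the attributes can be read as per-field options that the generated codec consults. the python sketch below models @rename, @skip, @skip_if_none and @default for a json object; FieldSpec, encode_record and decode_record are hypothetical names, records are plain dicts, and @flatten and @rename_all are left out. leaving a skipped field out of the decoded record is an assumption, since the list above only defines @skip for encoding.

```python
import json
from dataclasses import dataclass
from typing import Any, Optional

_MISSING = object()  # sentinel meaning "no @default given"


@dataclass(frozen=True)
class FieldSpec:
    name: str                      # field name in the record
    rename: Optional[str] = None   # models @rename("...")
    skip: bool = False             # models @skip
    skip_if_none: bool = False     # models @skip_if_none
    default: Any = _MISSING        # models @default(value)

    @property
    def wire_name(self) -> str:
        return self.rename or self.name


def encode_record(record: dict, specs: list) -> str:
    out = {}
    for spec in specs:
        value = record[spec.name]
        if spec.skip or (spec.skip_if_none and value is None):
            continue
        out[spec.wire_name] = value
    return json.dumps(out)


def decode_record(text: str, specs: list) -> dict:
    raw = json.loads(text)
    record = {}
    for spec in specs:
        if spec.skip:
            continue  # assumption: skipped fields are not populated on decode
        if spec.wire_name in raw:
            record[spec.name] = raw[spec.wire_name]
        elif spec.default is not _MISSING:
            record[spec.name] = spec.default
        else:
            raise KeyError(f"missing field: {spec.wire_name}")
    return record


SPECS = [
    FieldSpec("id", rename="user_id"),
    FieldSpec("metadata", skip_if_none=True, default=None),
    FieldSpec("retries", default=3),
    FieldSpec("session_token", skip=True),
]

encoded = encode_record(
    {"id": 7, "metadata": None, "retries": 5, "session_token": "s3cret"}, SPECS
)
decoded = decode_record('{"user_id": 7}', SPECS)
```

here encoded omits metadata (it is None) and session_token (it is skipped), while decoded fills retries and metadata from their defaults.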

manual implementation

for custom logic, implement the codec manually:

const SpecialType.json: Codec<SpecialType> = {
    encode: function(value: SpecialType, w: Writer) -> Unit error EncodeError {
        // custom encoding logic; check propagates a failed write
        check w.write(value.custom_format());
    },
    decode: function(r: Reader) -> SpecialType error DecodeError {
        // custom decoding logic; EXPECTED_SIZE is the fixed encoded size of SpecialType
        const bytes = check r.read(EXPECTED_SIZE);
        return SpecialType.parse(bytes);
    },
};

usage

// encode to bytes
const bytes = check json.encode(user, User.json);

// decode from bytes
const user = check json.decode(bytes, User.json);

// with writer/reader (streaming)
check json.encode_to(user, User.json, writer);
const streamed_user = check json.decode_from(User.json, reader);

format implementations

each format implements the encoding/decoding logic:

// json module
pub function encode<T>(value: T, codec: Codec<T>) -> Bytes error EncodeError {
    const buf = buffer.new();
    const writer = buf.writer();
    // propagate codec failures instead of returning a partial buffer
    check codec.encode(value, writer);
    return buf.to_bytes();
}

pub function decode<T>(bytes: Bytes, codec: Codec<T>) -> T error DecodeError {
    const reader = bytes.reader();
    return codec.decode(reader);
}

streaming and incremental

codecs work with readers/writers, so large values can be encoded incrementally:

// stream a json array without buffering the whole thing
function stream_users(users: Iterator<User>, w: Writer, cap io: Io) -> Unit error IoError {
    check json.array_start(w);
    for user in users {
        check json.encode_element(user, User.json, w);
    }
    check json.array_end(w);
}
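
the same idea can be run today in python. this sketch writes a json array one element at a time from a lazy iterator, so memory use does not grow with the number of elements. stream_json_array and generate_users are hypothetical names, and python's json module stands in for the element codec.

```python
import io
import json
from typing import Iterable, Iterator


def stream_json_array(items: Iterable[dict], w: io.BytesIO) -> None:
    """Write items as one JSON array, one element at a time."""
    w.write(b"[")
    for index, item in enumerate(items):
        if index > 0:
            w.write(b",")  # separator goes before every element but the first
        w.write(json.dumps(item).encode("utf-8"))
    w.write(b"]")


def generate_users(count: int) -> Iterator[dict]:
    # A generator: users are produced lazily, never held in a list.
    for i in range(count):
        yield {"id": i, "name": f"user{i}"}


out = io.BytesIO()
stream_json_array(generate_users(3), out)
result = out.getvalue()
```

the writer could equally be a socket or file. the function only needs write, which is the point of routing codecs through a writer abstraction.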

error handling

decode errors are precise:

error DecodeError {
    UnexpectedType { expected: String, got: String, path: String },
    MissingField { name: String, path: String },
    InvalidValue { message: String, path: String },
    UnexpectedEnd,
    InvalidUtf8 { position: usize },
}

the path field records where in the input the error occurred, as a JSONPath-style string (e.g., $.users[0].email).
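
path tracking can be done by threading a path string through the decoder and extending it at every field or index. a minimal python sketch follows, assuming hypothetical helpers get_field and decode_emails. a python exception stands in for the DecodeError variants, and the assumption here is that MissingField reports the path of the enclosing object.

```python
import json


class DecodeError(Exception):
    """Carries the error kind and the path at which decoding failed."""

    def __init__(self, kind: str, detail: str, path: str) -> None:
        super().__init__(f"{kind} at {path}: {detail}")
        self.kind = kind
        self.path = path


def get_field(obj, name: str, path: str):
    """Return (value, child_path), or raise with the parent object's path."""
    if not isinstance(obj, dict):
        raise DecodeError("UnexpectedType", "expected Object", path)
    if name not in obj:
        raise DecodeError("MissingField", name, path)
    return obj[name], f"{path}.{name}"


def decode_emails(text: str) -> list:
    root = json.loads(text)
    users, users_path = get_field(root, "users", "$")
    if not isinstance(users, list):
        raise DecodeError("UnexpectedType", "expected Array", users_path)
    emails = []
    for index, user in enumerate(users):
        # Extend the path with the array index before descending.
        email, email_path = get_field(user, "email", f"{users_path}[{index}]")
        if not isinstance(email, str):
            raise DecodeError("UnexpectedType", "expected String", email_path)
        emails.append(email)
    return emails


failure = None
try:
    decode_emails('{"users": [{"email": 42}]}')
except DecodeError as err:
    failure = err  # failure.path is "$.users[0].email"
```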

schema introspection (future)

codecs can expose their schema for documentation/validation:

type Schema = 
    | Object { fields: Array<FieldSchema> }
    | Array { items: Schema }
    | String
    | Number
    | Boolean
    | Null
    | OneOf { variants: Array<Schema> };

type SchemaProvider<T> = {
    schema: () -> Schema,
};

// for openapi generation
const User.schema: Schema = User.json.schema();

drawbacks

derive requires comptime (α2)
attribute syntax adds complexity
multiple format codecs per type could be confusing
streaming api is more complex than a simple to_json() call

alternatives

runtime reflection

use runtime type info for serialization.

rejected: slow, no compile-time guarantees, doesn't fit ferrule's philosophy.

schema-first (protobuf style)

define schemas in separate files, generate code.

rejected: external tooling, extra build step, not ergonomic.

single format

just support json, add others later.

considered: simpler initially, but the abstraction is worth it for format-agnostic code.

trait-based (rust serde)

use trait system with compiler magic.

not applicable: ferrule uses records, not traits, though the underlying concept is similar.

prior art

system	approach	lesson
rust serde	trait + derive macro	excellent abstraction, but complex macros
go encoding/json	struct tags + reflection	simple but runtime overhead
haskell aeson	typeclass + generics	clean but requires typeclass machinery
python pydantic	class + runtime validation	great developer experience, but checks happen at runtime
typescript zod	schema-first	type inference is powerful

serde's visitor pattern is elegant. this proposal trades it for a simpler model in which codecs read from and write to a reader/writer directly.

unresolved questions

how to handle recursive types (tree structures)?
should there be a Codec<T, Format> with format as type parameter?
how to handle versioning (schema evolution)?
should derive be built-in or a stdlib comptime function?

future possibilities

binary format optimization (zero-copy, memory-mapped)
schema evolution and migration
json schema / openapi generation
graphql type generation
database orm integration
rpc stub generation (grpc, json-rpc)