· Isabel Frolick

Marshalling and Serialization

Serialization: Converting an object or data structure into a sequence of bytes (or another transferable format) so it can be stored or transmitted.

Example: turning a struct into a binary stream.
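
As a minimal sketch of that example, the Python snippet below packs a small record into a fixed binary layout and reads it back. The Point record and the chosen layout (two little-endian 32-bit integers) are assumptions made for illustration only.

```python
import struct
from dataclasses import dataclass


# Hypothetical record used only for illustration.
@dataclass
class Point:
    x: int
    y: int


# "<ii" = little-endian, two signed 32-bit integers, no padding.
POINT_FORMAT = "<ii"


def serialize_point(p: Point) -> bytes:
    """Turn the in-memory record into an 8-byte binary stream."""
    return struct.pack(POINT_FORMAT, p.x, p.y)


def deserialize_point(data: bytes) -> Point:
    """Rebuild the record from the 8-byte stream."""
    x, y = struct.unpack(POINT_FORMAT, data)
    return Point(x, y)


encoded = serialize_point(Point(2, 300))
print(encoded.hex())               # 020000002c010000
print(deserialize_point(encoded))  # Point(x=2, y=300)
```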

Marshalling: the process of transforming structured in-memory data (objects, structs, messages) into a portable, serialized representation suitable for transmission over a network or storage.

<br><br> Compared to serialization, marshalling is a broader concept that includes serialization of multiple parameters together into
one transferable representation. It ensures that complex function calls (with multiple arguments) can be
transmitted correctly across different systems.
<br><br> <strong>In distributed systems: </strong> Marshalling prepares data for transmission by converting its representation (serializing multiple parameters into a single representation) so that the receiving system can interpret it correctly.

<br> Distributed systems need marshalling because communicating parties may run on different architectures and runtimes (endianness, word size, language runtime, etc.), so data must be represented in a standard, agreed-upon format. Communication typically happens through a mechanism such as <A href="distributedSystems_RPC.php"> Remote Procedure Calls </A>.
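
To illustrate why an agreed-upon format matters, the short Python sketch below packs the same integer in both byte orders and shows what happens when the reader assumes the wrong one. The value 300 is arbitrary and chosen only for illustration.

```python
import struct

value = 300  # 0x0000012C

little = struct.pack("<I", value)  # least significant byte first
big = struct.pack(">I", value)     # most significant byte first (network order)

print(little.hex())  # 2c010000
print(big.hex())     # 0000012c

# A receiver that assumes little-endian but is handed big-endian bytes
# reconstructs a completely different number.
(misread,) = struct.unpack("<I", big)
print(misread)       # 738263040
```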

In distributed systems, marshalling is crucial:

The sender marshals parameters so they can be sent over the network.
The receiver unmarshals (deserializes) them to reconstruct the original data in its local representation.
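
The round trip above can be sketched in a few lines of Python. The message layout here (a 16-bit length prefix, the procedure name, then two 32-bit integers, all in network byte order) is a made-up format for illustration, not the layout of any particular RPC system.

```python
import struct


def marshal_call(proc_name: str, a: int, b: int) -> bytes:
    """Sender side: pack a procedure name and two int arguments into one message."""
    name = proc_name.encode("utf-8")
    # "!H" = big-endian uint16 length prefix; "!ii" = two big-endian int32 values.
    return struct.pack("!H", len(name)) + name + struct.pack("!ii", a, b)


def unmarshal_call(message: bytes):
    """Receiver side: rebuild the name and arguments in local representation."""
    (name_len,) = struct.unpack_from("!H", message, 0)
    name = message[2:2 + name_len].decode("utf-8")
    a, b = struct.unpack_from("!ii", message, 2 + name_len)
    return name, a, b


message = marshal_call("add", 2, 300)
print(len(message))             # 13
print(unmarshal_call(message))  # ('add', 2, 300)
```

Both sides only need to agree on the layout, not on each other's native integer size or byte order.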

ASN.1 (Abstract Syntax Notation One)

ASN.1 is a formal standard for describing and encoding structured data. It defines both abstract data models and multiple encoding rules (BER, DER, PER, and others). It is older, widely used in telecom and security systems, and very flexible, though its notation can be verbose.
Its BER and DER encodings use a Tag, Length, Value (TLV) format:
- Tag: type of data (e.g., integer, string, sequence).
- Length: number of bytes in the value.
- Value: actual data.
Efficient because it uses binary encoding (more compact than text formats like JSON or XML).

Example: Interface/Rect(Point(2,300), Point(65537,2)) is serialized into a compact TLV sequence.
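
A rough sketch of what that TLV sequence could look like, using DER-style tags in Python. The mapping is an assumption, since the notes do not give the ASN.1 definition: Rect is treated as a SEQUENCE of two Points, and each Point as a SEQUENCE of two INTEGERs. Only short lengths and non-negative integers are handled.

```python
TAG_INTEGER = 0x02   # universal tag for INTEGER
TAG_SEQUENCE = 0x30  # universal tag for a constructed SEQUENCE


def encode_tlv(tag: int, value: bytes) -> bytes:
    """Tag byte, one length byte, then the value (short-form length only)."""
    if len(value) >= 128:
        raise ValueError("long-form lengths are not handled in this sketch")
    return bytes([tag, len(value)]) + value


def encode_integer(n: int) -> bytes:
    """Non-negative INTEGER as minimal big-endian two's complement."""
    if n < 0:
        raise ValueError("negative integers are not handled in this sketch")
    length = n.bit_length() // 8 + 1  # leaves room for a zero sign bit
    return encode_tlv(TAG_INTEGER, n.to_bytes(length, "big"))


def encode_sequence(*items: bytes) -> bytes:
    """A SEQUENCE is a TLV whose value is the concatenation of its members."""
    return encode_tlv(TAG_SEQUENCE, b"".join(items))


def encode_point(x: int, y: int) -> bytes:
    return encode_sequence(encode_integer(x), encode_integer(y))


def encode_rect(p1, p2) -> bytes:
    return encode_sequence(encode_point(*p1), encode_point(*p2))


encoded = encode_rect((2, 300), (65537, 2))
print(encoded.hex(" "))
# 30 13 30 07 02 01 02 02 02 01 2c 30 08 02 03 01 00 01 02 01 02
```

The whole rectangle fits in 21 bytes, and each integer takes only as many value bytes as it needs (1 for 2, 2 for 300, 3 for 65537).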

Protocol Buffers (Protobuf)

A Google-developed serialization format optimized for performance and cross-language compatibility. It has been used inside Google since the early 2000s, was open-sourced in 2008, and is well documented.

Works by defining a schema in a .proto file.
A compiler (protoc) generates code in the target language (C++, Java, Python, etc.).

Example .proto file:


syntax = "proto3";
package E472Example;
message Person {
    int32 id = 1;
    string email = 2;
}

The numbers (1, 2) are field numbers (tags) that uniquely identify each field in the binary encoding; the field names themselves are not sent on the wire.

Generated code can serialize a Person object into binary and deserialize it back.
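
In real projects the protoc-generated classes do this work. Purely to show what the resulting bytes look like, here is a hand-written Python sketch of the proto3 wire encoding for the Person message above. The helper names are hypothetical, and only non-negative ids are handled.

```python
WIRE_VARINT = 0  # wire type for int32 and other varint-encoded scalars
WIRE_LEN = 2     # wire type for length-delimited data such as strings


def encode_varint(n: int) -> bytes:
    """7 bits per byte, low group first; the high bit means more bytes follow."""
    if n < 0:
        raise ValueError("negative values are not handled in this sketch")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def encode_key(field_number: int, wire_type: int) -> bytes:
    """Each field starts with a key: (field number << 3) | wire type."""
    return encode_varint((field_number << 3) | wire_type)


def encode_person(person_id: int, email: str) -> bytes:
    out = b""
    if person_id != 0:  # proto3 omits fields left at their default value
        out += encode_key(1, WIRE_VARINT) + encode_varint(person_id)
    if email:
        payload = email.encode("utf-8")
        out += encode_key(2, WIRE_LEN) + encode_varint(len(payload)) + payload
    return out


encoded = encode_person(150, "a@b.c")
print(encoded.hex(" "))  # 08 96 01 12 05 61 40 62 2e 63
```

The first byte 08 is the key for field 1 (varint), and 12 is the key for field 2 (length-delimited); neither field name appears in the output.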

Features:

Cross-language: same .proto file works for many programming languages.
Compact: encodes data in binary (smaller than JSON or XML).
Fast: optimized serialization and deserialization.

Backwards/Forwards compatibility:

Old clients can ignore new fields: a parser skips field numbers it does not recognize.
New clients can handle missing fields gracefully: an absent field reads as its default value.
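
The compatibility behaviour can be sketched with a hand-written Python reader for the original Person schema (fields 1 and 2). The third field in the test message is a hypothetical addition by a newer writer; generated Protobuf code handles all of this automatically.

```python
def read_varint(buf: bytes, pos: int):
    """Decode one varint starting at pos; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def decode_person_v1(buf: bytes) -> dict:
    """Reader that only knows field 1 (id) and field 2 (email)."""
    person = {"id": 0, "email": ""}  # defaults used when a field is missing
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field_number, wire_type = key >> 3, key & 0x07
        if wire_type == 0:    # varint
            value, pos = read_varint(buf, pos)
        elif wire_type == 2:  # length-delimited
            length, pos = read_varint(buf, pos)
            value = buf[pos:pos + length]
            pos += length
        else:
            raise ValueError("wire type not handled in this sketch")
        if field_number == 1 and wire_type == 0:
            person["id"] = value
        elif field_number == 2 and wire_type == 2:
            person["email"] = value.decode("utf-8")
        # Any other field number is unknown to this reader and is skipped.
    return person


# Newer writer: id = 150, email = "a@b.c", plus a hypothetical field 3 ("Ann").
new_message = bytes.fromhex("08 96 01 12 05 61 40 62 2e 63 1a 03 41 6e 6e")
# Older writer: only id = 150 is present.
old_message = bytes.fromhex("08 96 01")

print(decode_person_v1(new_message))  # {'id': 150, 'email': 'a@b.c'}
print(decode_person_v1(old_message))  # {'id': 150, 'email': ''}
```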

Special Keywords:

required: Field must always be present (proto2 only; not available in proto3).
optional: Field may or may not appear; in proto3 it also lets code check whether the field was explicitly set.
repeated: Field can appear zero or more times (like an array or list).
oneof: Defines a set of alternative fields; only one may be set at a time.

Example with repeated + oneof:


message Person {
    repeated string emails = 1;
    oneof id_type {
        int32 user_id = 2;
        string username = 3;
    }
}

Why Protobuf is better than JSON/XML

Binary, not text → smaller size and faster to parse.
Cross-platform → compiler generates efficient code for many languages.
Schema-based → ensures type safety and structure.
Efficient → avoids redundancy in field names (unlike JSON).
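
As a rough size comparison for one small record, the Python sketch below encodes the same data as compact JSON and compares it with the proto3 wire bytes for a Person with id = 150 and email = "a@b.c". The Protobuf bytes are written out by hand here rather than produced by generated code, and the exact ratio depends heavily on the data.

```python
import json

person = {"id": 150, "email": "a@b.c"}

# Compact JSON still carries the field names and punctuation as text.
as_json = json.dumps(person, separators=(",", ":")).encode("utf-8")

# Wire bytes for the same record: key 08 + varint 150, key 12 + length 5 + text.
as_protobuf = bytes.fromhex("08 96 01 12 05 61 40 62 2e 63")

print(as_json)           # b'{"id":150,"email":"a@b.c"}'
print(len(as_json))      # 26
print(len(as_protobuf))  # 10
```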

Compilation Flow


.proto file → protoc (compiler) → language-specific code (C++/Java/etc.)
            → compiled with project → executable with Protobuf support

This makes Protobuf (and serialization in general) a backbone of RPC systems and distributed architectures, since data must move seamlessly between heterogeneous systems.