
Marshalling and Serialization


Serialization: Converting an object or data structure into a sequence of bytes (or another transferable format) so it can be stored or transmitted.

Example: turning a struct into a binary stream.
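A minimal sketch of that example, using Python's standard `struct` module. The `Point` type and the choice of two little-endian 32-bit integers are assumptions made for illustration, not something the notes specify.

```python
import struct
from dataclasses import dataclass


@dataclass
class Point:
    x: int
    y: int


# "<ii": little-endian, two signed 32-bit integers, no padding.
POINT_FORMAT = "<ii"


def serialize_point(p: Point) -> bytes:
    """Turn the in-memory struct into a fixed-size byte string."""
    return struct.pack(POINT_FORMAT, p.x, p.y)


def deserialize_point(data: bytes) -> Point:
    """Rebuild the struct from the byte string."""
    x, y = struct.unpack(POINT_FORMAT, data)
    return Point(x, y)


raw = serialize_point(Point(2, 300))
print(raw.hex())  # 020000002c010000
print(deserialize_point(raw))
```

The byte layout is only meaningful to a reader that knows the same format string, which is why sender and receiver have to agree on a representation.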

Marshalling: the process of transforming structured in-memory data (objects, structs, messages) into a portable, serialized representation suitable for transmission over a network or storage.

Compared to serialization, marshalling is the broader concept: it serializes multiple parameters together into one transferable representation, so that a complex function call (with multiple arguments) can be transmitted correctly between different systems.

In distributed systems: marshalling prepares data for transmission by converting its representation (serializing multiple parameters into a single representation) so that the receiving system can interpret it correctly.

Distributed systems need marshalling because communicating parties may run on different architectures (endianness, word size, language runtimes, etc.), so data must be represented in a standard, agreed-upon format. Communication is typically done through something like Remote Procedure Calls (distributedSystems_RPC.php).

In distributed systems, marshalling is crucial:

  • The sender marshals the parameters so they can be sent over the network.
  • The receiver can unmarshal (deserialize) them to reconstruct the original data in its local representation.
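The two steps above can be sketched as follows. The message layout (a procedure id, an integer argument, and a length-prefixed string, all in network byte order) and the names `marshal_call` and `unmarshal_call` are hypothetical, chosen only to show several parameters being packed into one message.

```python
import struct

# "!" = network byte order (big-endian), H = uint16, i = int32.
# Layout: procedure id, integer argument, length of the string argument.
HEADER_FORMAT = "!HiH"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 8 bytes, no padding


def marshal_call(proc_id: int, count: int, name: str) -> bytes:
    """Pack all parameters of one call into a single byte string."""
    name_bytes = name.encode("utf-8")
    header = struct.pack(HEADER_FORMAT, proc_id, count, len(name_bytes))
    return header + name_bytes


def unmarshal_call(message: bytes):
    """Recover the parameters in the receiver's local representation."""
    proc_id, count, name_len = struct.unpack(HEADER_FORMAT, message[:HEADER_SIZE])
    name = message[HEADER_SIZE:HEADER_SIZE + name_len].decode("utf-8")
    return proc_id, count, name


message = marshal_call(7, 300, "bob")
print(message.hex())
print(unmarshal_call(message))
```

Fixing the byte order in the format string is what lets a little-endian sender and a big-endian receiver read the same integers.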

ASN.1 (Abstract Syntax Notation One)

  • A formal standard that defines both abstract data models and multiple sets of encoding rules for structured data.
  • It predates formats like Protobuf, is widely used in telecom and security systems, and is very flexible, but sometimes annoyingly verbose.
  • Its binary encoding rules (such as BER and DER) use a Tag, Length, Value (TLV) format:
    • Tag: type of data (e.g., integer, string, sequence).
    • Length: number of bytes in the value.
    • Value: actual data.
  • Efficient because these encodings are binary (more compact than text formats like JSON or XML).

Example: Interface/Rect(Point(2,300), Point(65537,2)) is serialized into a compact TLV sequence.
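A rough sketch of what such a TLV sequence could look like, assuming DER-style rules and assuming that Rect and Point are both modelled as a SEQUENCE of INTEGERs (the notes do not show the actual ASN.1 schema). The helper names are made up for this example, and only non-negative integers are handled.

```python
TAG_INTEGER = 0x02   # universal tag for INTEGER
TAG_SEQUENCE = 0x30  # universal tag for SEQUENCE (constructed)


def encode_length(n: int) -> bytes:
    """Short form for lengths below 128, long form otherwise."""
    if n < 0x80:
        return bytes([n])
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body


def tlv(tag: int, value: bytes) -> bytes:
    """Tag byte, then length, then the value bytes."""
    return bytes([tag]) + encode_length(len(value)) + value


def encode_integer(n: int) -> bytes:
    if n < 0:
        raise ValueError("this sketch only handles non-negative integers")
    # One extra bit of room so the top bit (the sign bit) stays 0.
    size = n.bit_length() // 8 + 1
    return tlv(TAG_INTEGER, n.to_bytes(size, "big"))


def encode_sequence(*items: bytes) -> bytes:
    return tlv(TAG_SEQUENCE, b"".join(items))


def encode_point(x: int, y: int) -> bytes:
    return encode_sequence(encode_integer(x), encode_integer(y))


rect = encode_sequence(encode_point(2, 300), encode_point(65537, 2))
print(rect.hex(" "))
# 30 13 30 07 02 01 02 02 02 01 2c 30 08 02 03 01 00 01 02 01 02
```

Each integer takes only as many value bytes as it needs (1 byte for 2, 3 bytes for 65537), which is where the compactness comes from.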


    
[Figure: ASN.1]

Protocol Buffers (Protobuf)

A Google-developed serialization format optimized for performance and cross-language compatibility. It has been used inside Google since the early 2000s, was open-sourced in 2008, and is well documented.

  • Works by defining a schema in a .proto file.
  • A compiler (protoc) generates code in the target language (C++, Java, Python, etc.).

Example .proto file:


syntax = "proto3";
package E472Example;

message Person {
  int32 id = 1;
  string email = 2;
}

The numbers (1, 2) are field numbers (often called tags). They, rather than the field names, identify each field in the binary encoding.

Generated code can serialize a Person object into binary and deserialize it back.
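To make the binary form concrete, here is a hand-written sketch of the Protobuf wire format for the Person message above. It is not the code protoc generates; the function names are made up, and it only covers non-negative `id` values and these two fields.

```python
WIRE_VARINT = 0  # wire type for int32 and other varint-encoded scalars
WIRE_LEN = 2     # wire type for length-delimited data such as strings


def encode_varint(n: int) -> bytes:
    """Base-128 varint: 7 bits per byte, high bit set on all but the last byte."""
    if n < 0:
        raise ValueError("this sketch only handles non-negative values")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(data: bytes, pos: int):
    result, shift = 0, 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_person(person_id: int, email: str) -> bytes:
    out = b""
    # proto3 serializers leave out fields that hold their default value.
    if person_id != 0:
        out += encode_varint((1 << 3) | WIRE_VARINT) + encode_varint(person_id)
    if email:
        email_bytes = email.encode("utf-8")
        out += encode_varint((2 << 3) | WIRE_LEN)
        out += encode_varint(len(email_bytes)) + email_bytes
    return out


def decode_person(data: bytes) -> dict:
    person = {"id": 0, "email": ""}  # defaults for fields that are absent
    pos = 0
    while pos < len(data):
        key, pos = decode_varint(data, pos)
        field, wire_type = key >> 3, key & 0x07
        if field == 1 and wire_type == WIRE_VARINT:
            person["id"], pos = decode_varint(data, pos)
        elif field == 2 and wire_type == WIRE_LEN:
            length, pos = decode_varint(data, pos)
            person["email"] = data[pos:pos + length].decode("utf-8")
            pos += length
        else:
            raise ValueError("field not handled in this sketch")
    return person


encoded = encode_person(150, "a@b.c")
print(encoded.hex(" "))  # 08 96 01 12 05 61 40 62 2e 63
print(decode_person(encoded))
```

Note that the field names `id` and `email` never appear in the output; only the field numbers 1 and 2 do, packed into the key bytes `08` and `12`.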

Features:

  • Cross-language: same .proto file works for many programming languages.
  • Compact: encodes data in binary (smaller than JSON or XML).
  • Fast: optimized serialization and deserialization.

Backwards/Forwards compatibility:

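  • Old clients skip fields whose numbers they do not recognize, so new fields can be added safely.
  • New clients treat a missing field as its default value, so messages from old clients still parse.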
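Both directions can be illustrated with a small hand-rolled parser. This is a sketch of the idea, not the Protobuf library: `parse_known_fields` is a hypothetical helper, the field `age = 3` is an invented addition to the schema, and only varint and length-delimited wire types are handled.

```python
def read_varint(data: bytes, pos: int):
    """Decode one base-128 varint starting at pos; return (value, next_pos)."""
    result, shift = 0, 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def parse_known_fields(data: bytes, known: dict) -> dict:
    """Keep fields listed in `known` (field number -> name), skip the rest."""
    result = {}
    pos = 0
    while pos < len(data):
        key, pos = read_varint(data, pos)
        field, wire_type = key >> 3, key & 0x07
        if wire_type == 0:  # varint
            value, pos = read_varint(data, pos)
        elif wire_type == 2:  # length-delimited, kept as raw bytes here
            length, pos = read_varint(data, pos)
            value = data[pos:pos + length]
            pos += length
        else:
            raise ValueError("wire type not handled in this sketch")
        if field in known:
            result[known[field]] = value
        # Unknown field numbers fall through and are simply ignored.
    return result


OLD_SCHEMA = {1: "id", 2: "email"}
NEW_SCHEMA = {1: "id", 2: "email", 3: "age"}  # "age" is a made-up new field

# id = 150, email = "a@b.c", and the new field 3 = 42.
new_message = bytes.fromhex("08 96 01 12 05 61 40 62 2e 63 18 2a")
# The same message as an old client would have written it.
old_message = bytes.fromhex("08 96 01 12 05 61 40 62 2e 63")

old_view = parse_known_fields(new_message, OLD_SCHEMA)
new_view = parse_known_fields(old_message, NEW_SCHEMA)
print(old_view)                 # field 3 was skipped
print(new_view.get("age", 0))   # missing field falls back to its default
```

Skipping works because every field carries its wire type in the key, so a reader knows how many bytes to jump over even for a field number it has never seen.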

Special Keywords:

  • required: Field must always be present (proto2 only; not supported in proto3).
  • optional: Field may or may not be set, and the generated code can tell whether it was.
  • repeated: Field can appear zero or more times (like an array or list).
  • oneof: Defines a set of alternative fields; only one may be set at a time.

Example with repeated + oneof:


message Person {
  repeated string emails = 1;
  oneof id_type {
    int32 user_id = 2;
    string username = 3;
  }
}
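The "only one may be set at a time" rule of oneof can be modelled in a few lines. The class below is a hypothetical stand-in written for these notes, not the generated Protobuf class; it only mimics the behaviour that setting one member clears the other.

```python
class IdType:
    """Toy model of the id_type oneof: holds at most one member at a time."""

    MEMBERS = ("user_id", "username")

    def __init__(self):
        self._which = None
        self._value = None

    def set(self, member: str, value):
        if member not in self.MEMBERS:
            raise KeyError(member)
        # Overwriting the single slot is what clears any previous member.
        self._which, self._value = member, value

    def which(self):
        """Name of the member currently set, or None."""
        return self._which

    def get(self, member: str):
        return self._value if member == self._which else None


id_type = IdType()
id_type.set("user_id", 42)
id_type.set("username", "bob")   # replaces user_id
print(id_type.which())           # username
print(id_type.get("user_id"))    # None
```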

Why Protobuf is better than JSON/XML

  • Binary, not text → smaller size and faster to parse.
  • Cross-platform → compiler generates efficient code for many languages.
  • Schema-based → ensures type safety and structure.
  • Efficient → avoids redundancy in field names (unlike JSON).

Compilation Flow


.proto file → protoc (compiler) → language-specific code (C++/Java/etc.)
            → compiled with project → executable with Protobuf support

This makes Protobuf (and serialization in general) a backbone of RPC systems and distributed architectures, since data must move seamlessly between heterogeneous systems.
