How it Works

ChronoLog is a distributed shared log for activity data with concurrent multi-writer and multi-reader access.

To understand ChronoLog, start with its foundation. A shared log is one of the most powerful abstractions in distributed systems: a durable data store, a consensus mechanism, an execution history for deterministic replay, and a data integration hub. ChronoLog provides this primitive at scale for HPC, scientific, and AI workloads.

It all starts with two core research ideas:

Physical time as ordering

Events are ordered by physical timestamps, with no centralized sequencer or locks. Multiple writers append concurrently from any node.

Automatic multi-tier storage

Data flows automatically from fast compute-node storage through intermediate tiers to long-term archival. Capacity scales elastically.

Explore the research behind these ideas →

What Makes ChronoLog Different

With these principles in place, what sets ChronoLog apart? Traditional distributed logs rely on centralized sequencers or consensus protocols to order events. ChronoLog uses physical time itself, which enables lock-free appends, immediate visibility, and elastic capacity, but it introduces three challenges that ChronoLog solves.

Clock Uncertainty

Different machines have different clock offsets and drift rates. ChronoLog synchronizes server nodes with ChronoVisor during initialization and periodically thereafter. Clients use ChronoTicks as relative time distances from a base clock, eliminating the need for globally synchronized wall clocks.
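To make the idea concrete, here is a minimal sketch of how a client-side tick source could derive ChronoTicks as relative distances from a base clock. The class, field, and method names are hypothetical illustrations, not libchronolog's actual interface.

```python
import time

class ChronoTickSource:
    """Illustrative sketch only: a ChronoTick expressed as the elapsed
    nanoseconds since a base clock value captured at synchronization time,
    rather than an absolute wall-clock reading."""

    def __init__(self, base_ns: int):
        # base_ns: reference point established when this node synchronized
        # with ChronoVisor (hypothetical representation).
        self.base_ns = base_ns

    def chrono_tick(self) -> int:
        # A relative distance from the base clock; clients only need to agree
        # on the base, not on globally synchronized wall clocks.
        return time.monotonic_ns() - self.base_ns
```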

Late / Backdated Events

Network non-determinism means an event can arrive after events with later timestamps, violating time order. ChronoLog defines an Acceptance Time Window (ATW), a moving window equal to twice the measured network latency, within which out-of-order events are gracefully absorbed and correctly ordered.
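The following sketch shows one way an acceptance window of width 2 × measured latency could buffer and reorder late arrivals before they are committed. The names and structure are illustrative assumptions, not ChronoLog's actual code.

```python
import heapq
import itertools

class AcceptanceTimeWindow:
    """Illustrative sketch of the ATW idea: hold events inside a moving
    window of width 2 * measured network latency, then release them in
    timestamp order once no later-arriving event can precede them."""

    def __init__(self, measured_latency_ns: int):
        self.window_ns = 2 * measured_latency_ns
        self.newest_ns = 0
        self._seq = itertools.count()   # tie-breaker for equal timestamps
        self._buffer = []               # min-heap keyed by timestamp

    def ingest(self, timestamp_ns: int, payload: bytes) -> None:
        self.newest_ns = max(self.newest_ns, timestamp_ns)
        heapq.heappush(self._buffer, (timestamp_ns, next(self._seq), payload))

    def drain(self):
        # Everything older than the window boundary is safe to emit in order;
        # anything newer could still be overtaken by a late arrival.
        boundary = self.newest_ns - self.window_ns
        while self._buffer and self._buffer[0][0] <= boundary:
            ts, _, payload = heapq.heappop(self._buffer)
            yield ts, payload
```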

Timestamp Collisions

At coarser time granularities, multiple events may share the same ChronoTick. ChronoLog disambiguates using (clientId, index) pairs and configurable collision semantics (idempotency, redundancy, ordering, or sequentiality), chosen per workload.
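As a rough illustration of the tie-breaking, the sketch below orders events by ChronoTick first and then by a (clientId, index) pair; the field names are hypothetical, and the configurable collision semantics would decide how tied events are ultimately treated.

```python
from dataclasses import dataclass

@dataclass(order=True, frozen=True)
class EventKey:
    # Compared field by field: ChronoTick first, then the (client_id, index)
    # pair disambiguates events that share the same tick.
    chrono_tick: int
    client_id: int
    index: int

# Two events colliding on tick 42 still get a deterministic total order.
colliding = [EventKey(42, client_id=7, index=0), EventKey(42, client_id=3, index=5)]
print(sorted(colliding))  # the event from client 3 sorts before client 7's
```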

The Result

Lock-free log tail — no contention at the append point
I/O isolation — writers and readers on separate paths
Elastic capacity — auto-tiering without manual intervention
No hot zones — time-based distribution avoids hotspots
Immediate visibility — entries readable the moment they are written
Zero per-entry overhead — no metadata tax on individual entries

For detailed comparisons with Kafka, BookKeeper, Corfu, and other systems, see the Research page.

Data Model

With the core ideas and their implications in place, let's look at how data is structured. ChronoLog organizes data through three core concepts: Chronicles, Stories, and Events. Together these abstractions define the logical structure of the log.

1. Chronicle

A named collection of related Stories that share a common context or namespace. Chronicles are the top-level unit used to organize data and manage access.

2. Story

A logical, time-ordered stream of Events representing a single topic, task, or activity. Stories preserve the chronological order of every Event they contain.

3. Event

The smallest unit of data in ChronoLog — immutable, timestamped, and uniquely identified. Events are generated by clients and attributed to a specific Story.

The diagram below shows how these three concepts nest in a real-world example: a monitoring chronicle containing three sensor stories, each holding a sequence of events.

Chronicle "IIT_UC_Ares_Monitoring" Story "Test" E1 E5 E8 Story "Precipitation" E2 E4 E7 E10 Story "Temperature" E3 E6 E9 E11 E12 time t1 t2 t3 t4 t5 t6 t7 t8 t9 t10 t11 t12

Learn more in the docs →

Software Architecture

The data model defines what ChronoLog stores — the architecture defines how. Five distributed services form a pipeline from ingestion to archival. Data flows through time-ordered tiers automatically.

Client App

Your application. Uses libchronolog (C++ or Python) to create chronicles, record events, and replay history.

ChronoVisor

Central coordination: chronicle metadata, client connections, and distributed clock synchronization across all nodes.

ChronoKeeper (Hot tier)

Fast ingestion on compute nodes via RDMA. Serves record() and real-time playback() with microsecond latency.

ChronoGrapher (Warm tier)

DAG pipeline: event collection, story building, and continuous flushing to lower tiers. Real-time and elastic.

ChronoStore (Cold tier)

Persistent archival in HDF5 containers. Elastic capacity with device-aware access optimization.

ChronoPlayer (All tiers)

Reads span all tiers transparently. ChronoPlayer serves replay() requests from hot, warm, or cold storage and merges results into a single time-ordered stream. Fully decoupled from the write path.
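Conceptually, this read path boils down to a k-way merge of already time-ordered streams, one per tier. The following sketch illustrates that idea with hypothetical names; it is not ChronoPlayer's actual interface.

```python
import heapq

def replay_across_tiers(hot, warm, cold):
    """Illustrative sketch: merge the time-ordered event streams coming from
    the hot, warm, and cold tiers into a single time-ordered stream, the way
    a replay() spanning tiers conceptually behaves. Names are hypothetical."""
    # heapq.merge performs a lazy k-way merge over sorted iterables,
    # keyed here by each event's timestamp.
    yield from heapq.merge(hot, warm, cold, key=lambda event: event.timestamp)
```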

C++17: core implementation
RDMA: zero-copy transport
HDF5: persistent backend
Docker: containerized deployment

Deep dive into each component in the docs →

Software Ecosystem

There is a lot going on both inside ChronoLog and around it. Its software ecosystem spans three layers: the core distributed services (ChronoVisor, ChronoKeeper, ChronoGrapher, ChronoPlayer, ChronoStore), a client library (libchronolog) providing the chronicle API in C++ and Python, and a plugin framework enabling higher-level data systems to be built on top of the shared log.

Core Services

Five distributed services forming the ingestion-to-archival pipeline. Implemented in C++17 with RDMA and TCP transport backends and HDF5 for persistent storage.

Client Library

libchronolog exposes the full chronicle API: connect, create chronicles, acquire stories, record events, and replay history. Available in C++ and Python.

Plugin Framework

A modular architecture that allows new plugins to be developed independently: SQL queries, streaming analytics, key-value stores, ML pipelines, and more.

Part of a broader ecosystem

ChronoLog serves as foundational infrastructure for other projects, AI systems, and research frameworks.

IOWarp: internal logging and pub/sub abstractions
AI Agents: MCP-based agent memory and audit trails
Parsl Workflows: task execution logging and provenance
DOE Labs: available at national lab partner sites
IFSH Genomics: genomic sequencing pipeline data at IIT

Explore all integrations → | Gnosis Research Center →

Dive Deeper

Now that you have the full picture, it's time to get hands-on. The documentation has everything else: architecture deep dives, API references, deployment guides, and tutorials.