NSF-Funded Cyberinfrastructure

Distributed Shared Tiered Log Store

Activity data describe things that happen rather than things that are.

ChronoLog is a distributed log storage ecosystem that uses physical time as the ordering mechanism -- eliminating centralized sequencers and enabling auto-tiered storage across multiple layers. It is built to capture the velocity and variety of modern activity data, from scientific instruments producing terabytes per second to AI agent audit trails.

NSF CSSI Funded | C++ & Python APIs | Plugin Ecosystem | MCP for AI Agents | Docker Available | BSD-2 License

Two Key Innovations

ChronoLog rethinks distributed logging from first principles, introducing two ideas that set it apart from existing systems.

Physical Time as the Ordering Mechanism

Traditional distributed logs rely on centralized sequencers or locking protocols to order entries -- creating bottlenecks at scale. ChronoLog uses physical time itself as the natural ordering principle, enabling lock-free concurrent writes with immediate entry visibility and no coordination overhead.
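The idea can be illustrated with a minimal Python sketch. It is not ChronoLog's implementation: the `writer` function, the tuple layout, and the per-writer buffers are hypothetical, and it assumes every writer reads a sufficiently synchronized clock. Each writer appends to its own buffer without a shared lock or sequencer, and the global order is recovered afterwards from the timestamps alone.

```python
import threading
import time

def writer(writer_id, n_events, out):
    """Append (timestamp_ns, writer_id, seq, payload) tuples to a private buffer."""
    for seq in range(n_events):
        out.append((time.time_ns(), writer_id, seq, f"event-{seq}"))

# One private buffer per writer: no shared lock and no central sequencer.
buffers = {wid: [] for wid in range(4)}
threads = [
    threading.Thread(target=writer, args=(wid, 100, buf))
    for wid, buf in buffers.items()
]
for t in threads:
    t.start()
for t in threads:
    t.join()

# The global order comes from the timestamps; writer_id and seq only break ties.
timeline = sorted(entry for buf in buffers.values() for entry in buf)
print(len(timeline), "entries ordered by physical time")
```

In this sketch the writers never coordinate with each other; the cost of ordering is paid once, at read time, by comparing timestamps.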

Automatic Multi-Tier Storage

Log data flows automatically from fast ingestion nodes through intermediate tiers to persistent archival storage -- balancing access latency with capacity. This 3D distribution (horizontal across nodes, vertical across tiers, temporal by time) enables elastic capacity scaling without manual data management.
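A toy model of age-based tiering, as a hedged sketch: the `TieredLog` class, its window parameters, and the `tick` method are illustrative assumptions, not ChronoLog's actual policy or API. Entries start in the hot tier and are demoted purely by age, so the caller never moves data by hand.

```python
from dataclasses import dataclass, field

@dataclass
class TieredLog:
    hot_window: int = 10    # hypothetical: max age (seconds) kept in the hot tier
    warm_window: int = 60   # hypothetical: max age (seconds) kept in the warm tier
    hot: list = field(default_factory=list)
    warm: list = field(default_factory=list)
    cold: list = field(default_factory=list)

    def record(self, ts, payload):
        """New entries always land in the hot tier."""
        self.hot.append((ts, payload))

    def tick(self, now):
        """Demote entries by age: hot -> warm -> cold."""
        self.warm += [e for e in self.hot if now - e[0] >= self.hot_window]
        self.hot = [e for e in self.hot if now - e[0] < self.hot_window]
        self.cold += [e for e in self.warm if now - e[0] >= self.warm_window]
        self.warm = [e for e in self.warm if now - e[0] < self.warm_window]

log = TieredLog()
for ts in (0, 55, 95):
    log.record(ts, f"event@{ts}")
log.tick(now=100)
print(log.hot, log.warm, log.cold)
```

After `tick(now=100)` the entry recorded at time 95 is still hot, the one from time 55 is warm, and the one from time 0 has reached the cold tier.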

An Extensible Platform

ChronoLog is designed as a foundation that other systems build upon. Its plugin architecture enables diverse workloads through a common log infrastructure.

SQL Query Plugin

Query log data with familiar SQL semantics. Time-based ordering enables efficient range scans without auxiliary indices.
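Why time ordering helps range scans can be shown with a small Python sketch. It does not use the SQL plugin; `range_scan` and the sample data are hypothetical. Because the log is already sorted by timestamp, a binary search finds both ends of a time range without any secondary index.

```python
from bisect import bisect_left, bisect_right

def range_scan(timestamps, payloads, start, end):
    """Return payloads with start <= timestamp <= end from a time-sorted log."""
    lo = bisect_left(timestamps, start)
    hi = bisect_right(timestamps, end)
    return payloads[lo:hi]

timestamps = [100, 105, 110, 120, 135, 150]
payloads = ["boot", "open", "write", "write", "close", "exit"]

# Rough analogue of: SELECT payload FROM log WHERE ts BETWEEN 105 AND 135
result = range_scan(timestamps, payloads, 105, 135)
print(result)
```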

Pub/Sub & Streaming

Publish-subscribe patterns and real-time streaming built on ChronoGrapher's DAG pipeline. Used internally by IOWarp.

Key-Value Store

ChronoKVS provides time-series key-value semantics on top of the ordered log, with built-in consistency guarantees.
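As a rough illustration of key-value semantics layered on an ordered log, consider the sketch below. The `LogBackedKV` class and its methods are hypothetical and are not the ChronoKVS API; the sketch only shows how a "value as of time t" lookup falls out of a time-ordered, append-only log.

```python
class LogBackedKV:
    """Toy time-series key-value view over an append-only, time-ordered log."""

    def __init__(self):
        self.log = []  # list of (ts, key, value), kept in timestamp order

    def put(self, ts, key, value):
        if self.log and ts < self.log[-1][0]:
            raise ValueError("timestamps must be non-decreasing")
        self.log.append((ts, key, value))

    def get(self, key, at=None):
        """Latest value of key with timestamp <= at (or the latest overall)."""
        for ts, k, v in reversed(self.log):
            if k == key and (at is None or ts <= at):
                return v
        return None

kv = LogBackedKV()
kv.put(1, "temp", 20.5)
kv.put(2, "temp", 21.0)
kv.put(3, "pressure", 1.0)
print(kv.get("temp"), kv.get("temp", at=1), kv.get("temp", at=0))
```

Nothing is ever overwritten in this model: an update is a new log entry, and older values stay reachable by passing an earlier `at` timestamp.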

ML & Training

TensorFlow integration for feeding time-ordered data streams directly into training and inference pipelines.

ChronoLog for AI Agent Memory

AI agents need persistent, time-ordered memory. ChronoLog provides exactly that -- a distributed log that serves as the shared memory backend for autonomous agents, LLM-based systems, and agentic workflows. Every agent action, observation, and reasoning step can be recorded as a time-stamped chronicle entry, creating a queryable audit trail across conversations and sessions.

The official ChronoLog MCP server, part of the IOWarp Agent Toolkit, exposes chronicle operations as Model Context Protocol tools -- so any MCP-compatible AI agent can create logs, record events, replay history, and query by time range natively.

ChronoLog MCP Server

// Agent memory via MCP

tools:

chronicle.create - new memory log

chronicle.record - store event

chronicle.replay - retrieve history

chronicle.query - search by time

// Use cases

- Conversation logging

- Agent audit trails

- Cross-session memory

- System monitoring
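For orientation, MCP clients invoke tools with a JSON-RPC 2.0 `tools/call` request. The sketch below builds such a request for the `chronicle.record` tool named above. The argument names (`chronicle`, `event`) and the `tool_call` helper are assumptions made for illustration; the server's actual input schema may differ.

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC request ids

def tool_call(name, arguments):
    """Build an MCP tools/call request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }

# Argument keys below are hypothetical, not the server's documented schema.
req = tool_call(
    "chronicle.record",
    {"chronicle": "agent-42", "event": "tool_result: ok"},
)
print(json.dumps(req, indent=2))
```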

Design Philosophy

Existing distributed log systems (Kafka, BookKeeper, Corfu) were designed for different eras and constraints. ChronoLog explores a fundamentally different point in the design space -- one where time itself provides ordering, tiering provides capacity, and plugins provide versatility.

This isn't about replacing existing systems. It's about understanding what becomes possible when you rethink the log abstraction from the ground up for modern HPC, scientific, and AI workloads.

ChronoLog's Approach

Lock-free log tail -- physical time eliminates contention at the append point
I/O isolation -- writers and readers operate on separate paths
Elastic capacity -- auto-tiering grows storage without manual intervention
No hot zones -- time-chunked distribution avoids hotspots
Immediate visibility -- entries are readable the moment they are written
Zero per-entry overhead -- no metadata tax on individual log entries

Software Architecture

Five distributed services form a pipeline from ingestion to archival. Data flows through physical-time-ordered tiers automatically -- no manual data movement required.

Client App

C++ or Python application using libchronolog to create chronicles, record events, and replay history.

ChronoVisor

Central coordination: chronicle metadata, client connections, and distributed clock synchronization across all nodes.

ChronoKeeper (hot tier)

Fast ingestion on compute nodes via RDMA. Serves record() and real-time playback() with microsecond latency.

ChronoGrapher (warm tier)

DAG pipeline: event collection, story building, and continuous flushing to lower tiers. Real-time and elastic.

ChronoStore (cold tier)

Persistent archival in HDF5 containers. Elastic capacity with device-aware access optimization.

ChronoPlayer (all tiers)

Reads span all tiers transparently. ChronoPlayer serves replay() requests from hot, warm, or cold storage and merges results into a single time-ordered stream. Fully decoupled from the write path.
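The merge step can be sketched in a few lines of Python. This is an illustrative model, not ChronoPlayer's code: the `replay` function and the sample tier contents are hypothetical. Each tier is assumed to hold entries already sorted by timestamp, so a k-way merge yields one ordered stream without re-sorting everything.

```python
import heapq

def replay(tiers, start, end):
    """Merge per-tier time-sorted streams and keep entries in [start, end]."""
    merged = heapq.merge(*tiers, key=lambda entry: entry[0])
    return [entry for entry in merged if start <= entry[0] <= end]

# Each tier holds (timestamp, payload) pairs sorted by timestamp.
cold = [(1, "a"), (2, "b")]
warm = [(3, "c"), (5, "e")]
hot = [(4, "d"), (6, "f")]

stream = replay([cold, warm, hot], start=2, end=5)
print(stream)
```

The caller sees a single ordered stream and does not need to know which tier each entry came from.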

C++17 -- core implementation
RDMA -- zero-copy transport
HDF5 -- persistent backend
Docker -- containerized deployment

Part of a Broader Ecosystem

ChronoLog serves as foundational infrastructure for other projects, AI systems, and research frameworks.

IOWarp -- internal logging and pub/sub abstractions
AI Agents -- MCP-based agent memory and audit trails
Parsl Workflows -- task execution logging and provenance
DOE Labs -- available at national lab partner sites
IFSH Genomics -- genomic sequencing pipeline data at IIT

Community & Adoption

ChronoLog's impact extends beyond its codebase. From graduate classrooms to national lab clusters, the project has a growing community of researchers, students, and collaborators.

300+ attendees at ChronoLog presentations, webinars, and classroom sessions
2,000+ GitHub visitors per year, with growing adoption across research communities
15+ PhD researchers using ChronoLog as a backend for distributed logging
5+ cluster installations, including GRC and DOE partner sites
3 active community domains: FaaS, scientific data repositories, and system telemetry

Taught in the Classroom

Used as teaching infrastructure in distributed systems and HPC graduate courses at IIT. Students learn real-world distributed log design using ChronoLog's APIs and deployment tools.

Cross-Disciplinary Research

Beyond HPC, ChronoLog has been applied in nutrition analysis (with IIT's Department of Food Science) and genomic sequencing pipelines, demonstrating its versatility across domains.

Deployed on Clusters

Active installations on the GRC research cluster and available at DOE partner sites. Used for continuous research workloads including HPC system monitoring and provenance tracking.

Selected Publications

Peer-reviewed research behind ChronoLog, published at top-tier HPC, systems, and parallel computing venues.

MSST 2020

ChronoLog: A Distributed Shared Tiered Log Store with Time-based Data Ordering

A. Kougkas, H. Devarajan, K. Bateman, J. Cernuda, N. Rajesh, X.-H. Sun

36th Intl. Conference on Massive Storage Systems and Technology

SC'24

MegaMmap: Blurring the Boundary Between Memory and Storage for Data-Intensive Workloads

L. Logan, A. Kougkas, X.-H. Sun

Intl. Conference for High Performance Computing, Networking, Storage, and Analysis

CLUSTER'24

DaYu: Optimizing Distributed Scientific Workflows by Decoding Dataflow Semantics and Dynamics

M. Tang, J. Cernuda, J. Ye, L. Guo, et al.

IEEE Intl. Conference on Cluster Computing

IPDPS'25

Characterizing the Behavior and Impact of KV Caching on Transformer Inferences under Concurrency

J. Ye, J. Cernuda, A. Maurya, X.-H. Sun, A. Kougkas, B. Nicolae

39th IEEE Intl. Parallel & Distributed Processing Symposium

ICS'25

WisIO: Automated I/O Bottleneck Detection with Multi-Perspective Views for HPC Workflows

I. Yildirim, H. Devarajan, A. Kougkas, X.-H. Sun, K. Mohror

39th Intl. Conference on Supercomputing

Get Involved

ChronoLog is open source and welcoming collaborators. Whether you're a researcher exploring distributed log abstractions, a developer building on the plugin ecosystem, or an institution looking for scalable logging infrastructure -- we'd like to hear from you.

NSF

Supported by a $4M National Science Foundation CSSI Grant

NSF CSSI-2104013

Gnosis Research Center