The Science Behind ChronoLog

Modern scientific instruments, IoT networks, and AI systems generate massive volumes of activity data — things that happen rather than things that are.

Distributed log stores are the natural infrastructure for capturing, ordering, and retrieving this data. But existing systems face fundamental trade-offs between total ordering, concurrent access, and capacity scaling. ChronoLog explores a different point in the design space: using physical time itself as the ordering principle.

For a system overview and architecture walkthrough, see How it works. This page covers the research questions, theoretical foundations, and evaluated results.

Research Questions

The scientific questions driving ChronoLog's design and evaluation.

1. Can physical time replace central sequencers?

When can bounded-skew physical clocks provide total ordering without consensus protocols or centralized sequencers? What are the assumptions and trade-offs?

2. Total ordering with immediate visibility at scale

Can a distributed log guarantee both total event ordering and immediate read-after-write visibility without stalling writers or requiring global coordination?

3. 3D data distribution

How can capacity and performance be scaled by distributing data horizontally across nodes, vertically across storage tiers, and temporally into time-bounded chunks?

4. Decoupled ingestion and persistence

How can fast write paths be separated from durable archival without data loss, and which batching strategies best optimize throughput while preserving ordering guarantees?

Dealing with Physical Time

Using physical time as the ordering mechanism is powerful but introduces three fundamental challenges. ChronoLog addresses each with specific mechanisms and formal guarantees.

Clock Model & Assumptions

ChronoLog assumes that node clocks are synchronized within a bounded skew using NTP or similar protocols. Rather than requiring globally accurate wall clocks, ChronoLog introduces ChronoTicks — relative time distances measured from a base clock established during initialization by ChronoVisor. This eliminates dependency on absolute wall-clock accuracy while preserving ordering guarantees within the bounded-skew envelope.

(Diagram: ChronoTicks CT₀–CT₃ progressing on Nodes A and B, whose clocks remain within the bounded skew δ.)

Periodic re-synchronization ensures drift stays within bounds. The key invariant: if two events are separated by more than δ (the skew bound), their physical-time ordering is guaranteed correct across all nodes.
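The skew-bound invariant above can be expressed as a small predicate. This is a minimal illustrative sketch, not ChronoLog code: the names `Event`, `chronotick`, `delta`, and `definitely_ordered` are assumptions introduced here, and ChronoTicks stand in as plain relative floats.

```python
# Sketch of the bounded-skew ordering invariant: two events whose
# ChronoTicks differ by more than the skew bound delta have a
# physical-time ordering that holds on every node.
from dataclasses import dataclass

@dataclass
class Event:
    chronotick: float  # relative time since the ChronoVisor base clock
    node: str

def definitely_ordered(a: Event, b: Event, delta: float) -> bool:
    """True when the skew bound guarantees a fixed ordering of a and b."""
    return abs(a.chronotick - b.chronotick) > delta

e1 = Event(chronotick=10.0, node="A")
e2 = Event(chronotick=10.3, node="B")
print(definitely_ordered(e1, e2, delta=0.5))  # False: inside the skew envelope
print(definitely_ordered(e1, e2, delta=0.1))  # True: gap exceeds delta
```

Events whose gap falls inside δ are exactly the collision cases handled by the semantics described below under "Collision Semantics".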

Acceptance Time Window (ATW)

Network non-determinism means events may arrive at ingestion nodes after later-timestamped events. The Acceptance Time Window is defined as twice the measured network latency (ATW = 2λ). Within this window, out-of-order events are absorbed and correctly positioned in the time-ordered sequence. After the window closes, the ordering becomes immutable.

(Diagram: events E1–E7 inside the Acceptance Time Window (2λ); a late-arriving E5 is absorbed into time order, while an event arriving after the window closes is rejected because the ordering is already immutable.)

The ATW creates a trade-off: a wider window absorbs more out-of-order events but delays the point at which ordering becomes immutable. ChronoLog sizes the ATW dynamically based on measured network conditions.
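The ATW mechanics can be sketched as a reordering buffer. This is a toy model under stated assumptions, not ChronoLog's implementation: the class name `AcceptanceWindow`, the `sealed_until` watermark, and the timestamps are all invented for illustration.

```python
# Toy reordering buffer: late events are absorbed while they fall inside
# the acceptance window (ATW = 2 * lambda) and rejected once the ordering
# for their timestamp has become immutable.
import heapq

class AcceptanceWindow:
    def __init__(self, network_latency: float):
        self.atw = 2 * network_latency  # ATW = 2 * lambda
        self.buffer = []                # min-heap of (timestamp, event)
        self.sealed_until = 0.0         # ordering before this time is immutable

    def ingest(self, timestamp: float, event: str, now: float):
        """Accept an event; return any events whose order just became final."""
        if timestamp < self.sealed_until:
            return []                   # too late: ordering already immutable
        heapq.heappush(self.buffer, (timestamp, event))
        # Events older than now - ATW can no longer be preceded by stragglers.
        self.sealed_until = max(self.sealed_until, now - self.atw)
        finalized = []
        while self.buffer and self.buffer[0][0] < self.sealed_until:
            finalized.append(heapq.heappop(self.buffer))
        return finalized

w = AcceptanceWindow(network_latency=1.0)   # ATW = 2.0
w.ingest(5.0, "E1", now=5.1)
w.ingest(4.8, "E2", now=5.2)                # late arrival, absorbed
print(w.ingest(6.0, "E3", now=7.5))         # [(4.8, 'E2'), (5.0, 'E1')]
```

The final call advances the watermark to 5.5, so E2 and E1 are emitted in correct time order despite arriving out of order; widening `network_latency` delays that emission, which is exactly the trade-off described above.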

Collision Semantics

At coarser time granularities, multiple events from different writers may share the same ChronoTick. ChronoLog disambiguates using (clientId, index) pairs and provides four configurable collision semantics, chosen per workload:

Idempotent

Last writer wins. Duplicate timestamps from the same client overwrite previous entries.

Redundancy

All entries kept. Every event is stored regardless of timestamp overlap.

Ordering

Deterministic tiebreak. Events with identical timestamps are ordered by (clientId, index).

Sequentiality

Serialized access. Concurrent same-tick writes are serialized to produce a strict total order.

This flexibility allows applications to select the semantics that match their consistency and performance requirements, rather than forcing a one-size-fits-all approach.
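The four semantics can be sketched as alternative resolution rules over a set of events sharing one ChronoTick. This is an illustrative sketch only; the function `resolve`, the `mode` strings, and the tuple layout `(client_id, index, payload)` are assumptions, and real sequentiality semantics would serialize the concurrent writes at ingest time rather than sort after the fact.

```python
def resolve(events, mode):
    """Resolve events that share one ChronoTick, keyed by (clientId, index)."""
    if mode == "idempotent":
        # Last writer wins per client: a higher index overwrites lower ones.
        latest = {}
        for cid, idx, payload in events:
            if cid not in latest or idx > latest[cid][1]:
                latest[cid] = (cid, idx, payload)
        return list(latest.values())
    if mode == "redundancy":
        return list(events)             # keep every entry regardless of overlap
    if mode in ("ordering", "sequentiality"):
        # Deterministic (clientId, index) tiebreak yields a strict total order.
        return sorted(events, key=lambda e: (e[0], e[1]))
    raise ValueError(mode)

same_tick = [("c2", 0, "x"), ("c1", 1, "y"), ("c1", 0, "z")]
print(resolve(same_tick, "ordering"))
# [('c1', 0, 'z'), ('c1', 1, 'y'), ('c2', 0, 'x')]
```

Under "redundancy" all three entries survive; under "idempotent" only c1's latest write and c2's single write remain.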

Architecture Rationale

Why ChronoLog's architecture is shaped the way it is — three design-space arguments.

Decoupled Server-Pull

Writers push events to ChronoKeeper (hot tier) and return immediately. ChronoGrapher pulls asynchronously for story building and flushing to lower tiers. This decouples ingestion latency from persistence latency — writers are never stalled by slow storage.
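The decoupling can be shown with a minimal producer/consumer sketch, assuming an in-memory queue stands in for ChronoKeeper's hot tier and a background thread stands in for ChronoGrapher; none of these names reflect the actual ChronoLog API.

```python
# Decoupled server-pull in miniature: write() enqueues and returns
# immediately; a background puller absorbs the slow persistence latency.
import queue
import threading
import time

hot_tier = queue.Queue()   # stands in for ChronoKeeper's in-memory buffer
archived = []              # stands in for the lower storage tiers

def write(event):
    """Writer path: enqueue and return; never blocks on storage."""
    hot_tier.put(event)

def grapher():
    """Background pull loop standing in for ChronoGrapher."""
    while True:
        event = hot_tier.get()
        if event is None:
            break                  # shutdown sentinel
        time.sleep(0.01)           # simulate slow durable storage
        archived.append(event)

t = threading.Thread(target=grapher)
t.start()
for i in range(5):
    write(f"event-{i}")            # returns instantly despite slow archival
hot_tier.put(None)
t.join()
print(archived)                    # ['event-0', ..., 'event-4']
```

The writer loop finishes long before archival does, which is the point: ingestion latency is bounded by the enqueue, not by the storage tier.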

StoryChunks as Throughput Unit

The unit that moves through the storage pipeline is a StoryChunk — a time-bounded batch of events, not individual entries. This amortizes per-event overhead and enables efficient bulk I/O. Chunk boundaries are temporal, not size-based.
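Temporal chunk boundaries can be sketched in a few lines. This is an illustrative grouping function only; `chunk_by_time` and `chunk_seconds` are names invented here, and real StoryChunks carry more structure than a list of tuples.

```python
# Group (timestamp, payload) events into time-bounded batches: the chunk
# boundary is temporal (timestamp // chunk_seconds), never size-based.
from collections import defaultdict

def chunk_by_time(events, chunk_seconds):
    chunks = defaultdict(list)
    for ts, payload in events:
        chunks[int(ts // chunk_seconds)].append((ts, payload))
    return dict(chunks)

events = [(0.2, "a"), (0.9, "b"), (1.1, "c"), (3.5, "d")]
print(chunk_by_time(events, chunk_seconds=1.0))
# {0: [(0.2, 'a'), (0.9, 'b')], 1: [(1.1, 'c')], 3: [(3.5, 'd')]}
```

Note that chunk 2 simply does not exist: an idle interval produces no chunk, while a burst makes one chunk large rather than producing more chunks.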

I/O Path Separation

Writes flow through ChronoKeeper; reads flow through ChronoPlayer. These are fully decoupled paths, eliminating the read-write contention that plagues systems with shared log-tail access and enabling independent scaling of ingestion and query workloads.

For the full architecture walkthrough with component descriptions, see How it works.

How ChronoLog Compares

ChronoLog occupies a different point in the design space compared to partition-based systems (Kafka, BookKeeper) and sequencer-based systems (Corfu, SloG, ZLog).

Feature                              | BookKeeper / Kafka / DLog | Corfu / SloG / ZLog | ChronoLog
Locating the log-tail                | Locking                   | Locking             | Lock-free
I/O isolation                        | Yes                       | No                  | Yes
I/O parallelism (readers-to-servers) | 1-to-N                    | M-to-N              | M-to-N
Storage elasticity                   | Manual                    | Manual              | Automatic
Log hot zones                        | Yes                       | Yes                 | No
Log capacity                         | Limited                   | Limited             | Infinite
Operation parallelism                | Limited                   | Limited             | Full
Granularity of data distribution     | Coarse (stripe)           | Fine (entry)        | Fine (time-chunk)
Log total ordering                   | Eventual                  | Immediate           | Total
Log entry visibility                 | End of epoch              | After sequencing    | Immediate
Storage overhead per entry           | Moderate                  | High                | None
Tiered storage                       | No                        | No                  | Yes

Key Differentiators

Lock-free log tail — no sequencer or lock at the append point
M-to-N I/O parallelism — always, not 1-to-1 or 1-to-N
Vertical + horizontal elasticity — not horizontal-only or fixed-capacity
Per-event distribution — not partition or page-level granularity
Immediate visibility — not eventual or epoch-delayed
Zero per-entry metadata — no per-event overhead tax

From: Kougkas et al., "ChronoLog: A Distributed Shared Tiered Log Store with Time-based Data Ordering," MSST 2020.

Evaluation Highlights

High-level takeaways from experimental evaluation.

Scalable MWMR Ingestion

Write throughput scales with the number of concurrent writers and ChronoKeeper nodes. StoryChunk-level batching amortizes coordination overhead, enabling sustained high-throughput multi-writer, multi-reader workloads.

Tiering Cost / Performance

Automatic migration from hot (DRAM/NVMe) through warm (SSD) to cold (HDF5/PFS) trades access latency for capacity without manual intervention. Warm-tier reads remain sub-millisecond for recent data.

RDMA Transport

Zero-copy RDMA transport reduces per-event ingestion overhead to the microsecond range. A TCP fallback is available for environments without RDMA fabric, maintaining the same API semantics.

For detailed benchmarks and methodology, see the publications below.

Selected Publications

Peer-reviewed research behind ChronoLog, published at top-tier HPC, systems, and parallel computing venues.

Core ChronoLog

ChronoLog: A Distributed Shared Tiered Log Store with Time-based Data Ordering

A. Kougkas, H. Devarajan, K. Bateman, J. Cernuda, N. Rajesh, X.-H. Sun

MSST 2020

Ecosystem & Related

MegaMmap: Blurring the Boundary Between Memory and Storage for Data-Intensive Workloads

L. Logan, A. Kougkas, X.-H. Sun

SC'24

DaYu: Optimizing Distributed Scientific Workflows by Decoding Dataflow Semantics and Dynamics

M. Tang, J. Cernuda, J. Ye, L. Guo, et al.

CLUSTER'24

Characterizing the Behavior and Impact of KV Caching on Transformer Inferences under Concurrency

J. Ye, J. Cernuda, A. Maurya, X.-H. Sun, A. Kougkas, B. Nicolae

IPDPS'25

WisIO: Automated I/O Bottleneck Detection with Multi-Perspective Views for HPC Workflows

I. Yildirim, H. Devarajan, A. Kougkas, X.-H. Sun, K. Mohror

ICS'25


Collaborate With Us

ChronoLog is an active, NSF-funded research project at the Gnosis Research Center. We welcome collaborations with research labs, national facilities, and industry partners.