Version: 2.8.0

Performance Tuning

This page covers the configuration knobs that most directly affect ChronoLog's throughput, latency, and memory usage. All settings below live in default_conf.json under the relevant component block.

Story Chunk Settings

A story chunk is the unit of data that flows through the pipeline: ChronoKeeper buffers events into chunks and drains them to ChronoGrapher, which persists them to storage. Four parameters in DataStoreInternals govern chunk behavior.

max_story_chunk_size

Maximum size of a single story chunk, measured in number of events.

| Component | Default |
| --- | --- |
| ChronoKeeper | 4096 |
| ChronoGrapher | 4096 |
| ChronoPlayer | 4096 |

Effect: larger chunks reduce per-chunk overhead (fewer RPC calls, fewer I/O operations) but increase memory usage and the minimum latency before an event reaches persistent storage. Tune this based on your expected event payload size.
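The memory side of this trade-off can be estimated with simple arithmetic. The sketch below is illustrative only: `estimate_chunk_bytes` and its `per_event_overhead_bytes` default are hypothetical, not part of ChronoLog, and the real per-event overhead depends on your build.

```python
def estimate_chunk_bytes(max_events: int, avg_payload_bytes: int,
                         per_event_overhead_bytes: int = 32) -> int:
    """Rough upper bound on the memory held by one full story chunk.

    per_event_overhead_bytes is an assumed placeholder for timestamps,
    identifiers, and container bookkeeping; measure it for your build.
    """
    return max_events * (avg_payload_bytes + per_event_overhead_bytes)


if __name__ == "__main__":
    # Compare the default chunk size with the larger values a
    # throughput-oriented deployment might try.
    for size in (4096, 16384, 65536):
        mib = estimate_chunk_bytes(size, avg_payload_bytes=256) / 2**20
        print(f"max_story_chunk_size={size}: ~{mib:.1f} MiB per full chunk")
```

Multiply the result by the number of concurrently active stories to get a rough sense of the total buffering cost on a node.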

story_chunk_duration_secs

How long (in seconds) a chunk remains open, accumulating events, before it is sealed and drained downstream.

| Component | Default |
| --- | --- |
| ChronoKeeper | 10 |
| ChronoGrapher | 60 |
| ChronoPlayer | 60 |

Effect: shorter durations reduce end-to-end latency from write to persistent storage at the cost of more frequent RPC round-trips. Longer durations improve batching efficiency. The ChronoKeeper value is the most latency-sensitive; the ChronoGrapher/ChronoPlayer values affect how quickly data becomes queryable.
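As a back-of-the-envelope model of this trade-off, the sketch below assumes that an event may wait a full chunk duration at each stage and ignores transfer and write time. Both helper functions are hypothetical and only illustrate the arithmetic; they do not describe ChronoLog's internal scheduling.

```python
def worst_case_staging_secs(keeper_duration_secs: float,
                            grapher_duration_secs: float) -> float:
    """Upper bound on time an event spends in open chunks.

    Assumes the event arrives just after a chunk opens at each stage,
    so it waits the full duration in both Keeper and Grapher.
    """
    return keeper_duration_secs + grapher_duration_secs


def drains_per_minute(duration_secs: float) -> float:
    """How many chunk drains per story one stage issues each minute."""
    return 60.0 / duration_secs


if __name__ == "__main__":
    for keeper, grapher in ((10, 60), (5, 30)):
        print(f"keeper={keeper}s grapher={grapher}s: "
              f"<= {worst_case_staging_secs(keeper, grapher)}s staged, "
              f"{drains_per_minute(keeper):.0f} Keeper drains/min per story")
```

Under these assumptions, halving both durations halves the staging bound and doubles the drain rate, which is the cost side of the latency gain.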

acceptance_window_secs

Maximum age (in seconds) of an incoming event timestamp, relative to the current wall clock, for the event to be accepted. Events older than this are rejected.

| Component | Default |
| --- | --- |
| ChronoKeeper | 15 |
| ChronoGrapher | 180 |

Effect: a wider window accommodates clock skew between client hosts and the ChronoKeeper nodes, and allows late-arriving events to be stored correctly. Narrowing the window tightens the ordering guarantee but rejects events from slow or clock-skewed producers.
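The acceptance rule described above amounts to an age check against the wall clock. The following is a minimal sketch of that check, not ChronoLog's actual implementation; the function name and the use of seconds as floats are assumptions made for illustration.

```python
import time
from typing import Optional


def is_within_acceptance_window(event_time_secs: float,
                                window_secs: float,
                                now_secs: Optional[float] = None) -> bool:
    """Return True if the event is young enough to be accepted.

    The event's age is measured against the receiver's wall clock, so a
    producer whose clock runs behind appears to send older events.
    """
    if now_secs is None:
        now_secs = time.time()
    age_secs = now_secs - event_time_secs
    return age_secs <= window_secs


if __name__ == "__main__":
    now = 1_000.0
    skewed_event = now - 20.0  # producer clock 20 s behind the Keeper
    print(is_within_acceptance_window(skewed_event, 15, now))  # rejected
    print(is_within_acceptance_window(skewed_event, 60, now))  # accepted
```

In this model, an event from a producer whose clock lags by 20 seconds falls outside the 15-second ChronoKeeper default but inside a 60-second window.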

inactive_story_delay_secs

How long (in seconds) a story with no new events is kept in memory before being evicted.

| Component | Default |
| --- | --- |
| ChronoKeeper | 120 |
| ChronoGrapher | 300 |

Effect: longer delays keep story state warm, reducing re-initialization cost when a story becomes active again. Shorter delays free memory sooner in workloads with many short-lived or bursty stories.
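To make the eviction rule concrete, here is a small sketch that selects idle stories from a table of last-event times. The data structure and function name are hypothetical; ChronoLog's own bookkeeping may look quite different.

```python
from typing import Dict, List


def stories_to_evict(last_event_secs: Dict[str, float],
                     now_secs: float,
                     inactive_story_delay_secs: float) -> List[str]:
    """Return the names of stories idle for longer than the delay.

    last_event_secs maps a story name to the time of its latest event.
    """
    return sorted(
        name
        for name, last_seen in last_event_secs.items()
        if now_secs - last_seen > inactive_story_delay_secs
    )


if __name__ == "__main__":
    last_seen = {"sensor-a": 100.0, "sensor-b": 290.0, "sensor-c": 10.0}
    # With the ChronoKeeper default of 120 s, only stories idle for more
    # than two minutes are evicted.
    print(stories_to_evict(last_seen, now_secs=300.0,
                           inactive_story_delay_secs=120))
```

With a shorter delay, more of the idle stories are evicted on each pass, which is how the setting trades warm state for memory.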

Ingestion Thread Count

Each component's ingestion RPC service exposes an IngestionThreadCount knob that controls how many worker threads serve incoming requests.

| Setting | Default | What it tunes |
| --- | --- | --- |
| KeeperRecordingService.IngestionThreadCount | 4 | Concurrent client → Keeper event ingestion. Raise for high client fan-in. |
| KeeperGrapherDrainService.IngestionThreadCount | 1 | Concurrent Keeper → Grapher chunk drain. Raise for multi-Keeper deployments. |
| PlaybackQueryService.IngestionThreadCount | 1 | Concurrent client → Player playback queries. Raise for concurrent readers. |

Effect: more threads improve RPC concurrency at the cost of CPU and contention on internal queues. The Keeper default (4) reflects the heavier ingestion workload; Grapher and Player default to 1 and only need raising when the corresponding traffic is concurrent.
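One way to apply such a change without hand-editing the file is to patch the parsed JSON. The sketch below is a hedged illustration: `with_override` is a hypothetical helper, and the `SAMPLE` fragment assumes a nesting (component block, then service block) that may not match the real default_conf.json, so check the path against your file first.

```python
import copy
import json


def with_override(conf: dict, dotted_path: str, value) -> dict:
    """Return a deep copy of conf with the value at dotted_path replaced.

    Raises KeyError if any key on the path is missing, so a typo cannot
    silently create a new, ignored setting.
    """
    result = copy.deepcopy(conf)
    node = result
    *parents, leaf = dotted_path.split(".")
    for key in parents:
        node = node[key]
    if leaf not in node:
        raise KeyError(dotted_path)
    node[leaf] = value
    return result


# Hypothetical, trimmed-down fragment; the real layout may differ.
SAMPLE = json.loads("""
{
  "chrono_keeper": {
    "KeeperRecordingService": {"IngestionThreadCount": 4}
  },
  "chrono_grapher": {
    "KeeperGrapherDrainService": {"IngestionThreadCount": 1}
  }
}
""")

if __name__ == "__main__":
    tuned = with_override(
        SAMPLE, "chrono_keeper.KeeperRecordingService.IngestionThreadCount", 8)
    print(json.dumps(tuned, indent=2))
```

Failing on unknown keys is a deliberate choice here: a misspelled setting would otherwise be written to the file and leave the default in force.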

Extraction Pipeline

The ExtractionModule block in chrono_keeper and chrono_grapher controls how retired StoryChunks are drained downstream. Two knobs affect throughput directly:

extraction_stream_count

Number of parallel worker threads draining the extraction queue through the configured extractor chain.

| Component | Default |
| --- | --- |
| ChronoKeeper | 2 |
| ChronoGrapher | 2 |

Effect: higher values increase chunk throughput when downstream extractors (CSV writes, RDMA fan-out, HDF5 archive) are the bottleneck. The benefit plateaus once storage or network bandwidth saturates.
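The idea of several streams draining one queue can be modeled in a few lines. This is a conceptual sketch only, not ChronoLog's C++ implementation: `drain_chunks` and the `extract` callback are hypothetical names, with `extract` standing in for the configured extractor chain.

```python
import queue
import threading
from typing import Callable, Iterable, List


def drain_chunks(chunks: Iterable, stream_count: int,
                 extract: Callable) -> List:
    """Drain chunks using stream_count worker threads.

    Returns the extractor results in completion order, which is not
    guaranteed to match the input order.
    """
    pending: queue.Queue = queue.Queue()
    for chunk in chunks:
        pending.put(chunk)

    results: List = []
    results_lock = threading.Lock()

    def worker() -> None:
        while True:
            try:
                chunk = pending.get_nowait()
            except queue.Empty:
                return  # queue drained, so this stream exits
            output = extract(chunk)
            with results_lock:
                results.append(output)

    threads = [threading.Thread(target=worker) for _ in range(stream_count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


if __name__ == "__main__":
    done = drain_chunks(range(10), stream_count=2, extract=lambda c: c * 2)
    print(sorted(done))
```

Adding streams helps only while `extract` is the slow step; once the shared resource behind it is saturated, extra workers simply wait on it.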

extractors

The composition of the extractor chain itself is a tuning surface: chains can be shortened (drop CSV mirroring), lengthened (add a parallel RDMA target), or swapped (use dual_endpoint_rdma_extractor to deliver chunks to both ChronoGrapher and ChronoPlayer in one pass). See Server Configuration → ExtractionModule for the schema.

Recording Groups

"RecordingGroup": 7

Applies to: chrono_keeper, chrono_grapher, chrono_player.

Recording groups are logical partitions that pair ChronoKeeper and ChronoGrapher instances. In a multi-node deployment, recording group IDs determine which ChronoGrapher drains which ChronoKeeper. The default value of 7 is a single-group configuration. When deploying multiple ChronoKeeper+ChronoGrapher pairs, give both members of a pair the same group ID and ensure that no two pairs share an ID.
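The two rules above (matching IDs within a pair, unique IDs across pairs) are easy to check before a rollout. The sketch below assumes you have already collected the RecordingGroup value from each Keeper and Grapher configuration; `check_recording_groups` is a hypothetical helper, not a ChronoLog tool.

```python
from collections import Counter
from typing import List, Tuple


def check_recording_groups(pairs: List[Tuple[int, int]]) -> List[str]:
    """Validate (keeper_group_id, grapher_group_id) pairs.

    Returns a list of human-readable problems; an empty list means the
    assignment follows both rules.
    """
    problems = []
    for index, (keeper_id, grapher_id) in enumerate(pairs):
        if keeper_id != grapher_id:
            problems.append(
                f"pair {index}: keeper group {keeper_id} "
                f"!= grapher group {grapher_id}")
    # Count how many pairs claim each group ID on the Keeper side.
    counts = Counter(keeper_id for keeper_id, _ in pairs)
    for group_id, count in sorted(counts.items()):
        if count > 1:
            problems.append(f"group {group_id} is used by {count} pairs")
    return problems


if __name__ == "__main__":
    print(check_recording_groups([(7, 7), (8, 8)]))  # valid assignment
    print(check_recording_groups([(7, 7), (7, 7)]))  # duplicate group ID
```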

Clock Source

"clock": {
  "clocksource_type": "CPP_STYLE",
  "drift_cal_sleep_sec": 10,
  "drift_cal_sleep_nsec": 0
}

clocksource_type

Controls the mechanism used to generate event timestamps across all components.

| Value | Enum | Description |
| --- | --- | --- |
| "C_STYLE" | C_STYLE (0) | Uses C gettimeofday. Portable but lowest resolution. |
| "CPP_STYLE" | CPP_STYLE (1) | Uses C++ std::chrono::high_resolution_clock. Default. Good balance of portability and precision. |
| "TSC" | TSC (2) | Reads the CPU timestamp counter directly. Lowest latency and highest resolution, but requires a stable, invariant TSC across all sockets. Not suitable for systems with dynamic CPU frequency scaling unless constant_tsc is set. |

Caution: TSC mode requires that all CPUs in the system have synchronized, invariant TSCs. Verify with grep -m1 constant_tsc /proc/cpuinfo on Linux. Do not use TSC on virtual machines or heterogeneous CPU clusters.
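The grep check can also be scripted so that it covers every CPU listed rather than only the first match. The sketch below parses the text of /proc/cpuinfo on x86 Linux; the function name is hypothetical, and also requiring the nonstop_tsc flag is an extra precaution added here, not something the ChronoLog documentation asks for.

```python
from typing import Iterable


def tsc_flags_present(cpuinfo_text: str,
                      required: Iterable[str] = ("constant_tsc",
                                                 "nonstop_tsc")) -> bool:
    """Return True if every 'flags' line lists all required flags.

    Returns False when no 'flags' line is found, for example on
    non-x86 systems where /proc/cpuinfo uses different field names.
    """
    wanted = set(required)
    flag_lines = [line for line in cpuinfo_text.splitlines()
                  if line.startswith("flags")]
    if not flag_lines:
        return False
    for line in flag_lines:
        # Each line looks like "flags : fpu vme ... constant_tsc ...".
        _, _, flag_text = line.partition(":")
        if not wanted <= set(flag_text.split()):
            return False
    return True
```

On a Linux host, pass it the contents of /proc/cpuinfo, for example `tsc_flags_present(open("/proc/cpuinfo").read())`. A True result is necessary but not sufficient: it does not show that TSCs are synchronized across sockets, and it does not lift the warning about virtual machines.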

drift_cal_sleep_sec / drift_cal_sleep_nsec

Interval between clock drift calibration runs, given as seconds plus nanoseconds (default: every 10 seconds). Drift calibration is intended to correct for slow drift between the ChronoLog clock and the system wall clock: a shorter interval corrects more often at a small CPU cost, while a longer interval reduces CPU overhead but lets more drift accumulate between corrections. In ChronoLog v2.8.0, these settings currently have no effect.

Shutdown Grace Period

"chrono_visor": {
  "delayed_data_admin_exit_in_secs": 3
}

Number of seconds ChronoVisor waits after receiving a shutdown signal before terminating the data administration service. This grace period allows in-flight RPC calls and pending data acknowledgements to complete cleanly. Increase this value if you observe data loss at shutdown under heavy write load.

Summary of Tuning Recommendations

| Goal | Recommended change |
| --- | --- |
| Lower write-to-query latency | Reduce story_chunk_duration_secs on ChronoKeeper (e.g., 5) and ChronoGrapher (e.g., 30) |
| Higher throughput / better batching | Increase max_story_chunk_size (e.g., 16384 or 65536) |
| Tolerate high clock skew across nodes | Increase acceptance_window_secs on ChronoKeeper (e.g., 60) |
| Reduce memory usage with many stories | Decrease inactive_story_delay_secs on Keeper and Grapher |
| Minimum timestamp latency on bare-metal HPC | Set clocksource_type to "TSC" (verify invariant TSC first) |
| Prevent data loss at shutdown under heavy load | Increase delayed_data_admin_exit_in_secs to 10 or more |
| High client fan-in into ChronoKeeper | Raise KeeperRecordingService.IngestionThreadCount (e.g., 8 or 16) |
| Many ChronoKeepers draining one ChronoGrapher | Raise KeeperGrapherDrainService.IngestionThreadCount to match concurrent drain RPCs |
| Extraction-bound ChronoKeeper or ChronoGrapher | Raise ExtractionModule.extraction_stream_count until storage/network saturates |