nodix: Real-Time Compute Graphs for Robotics

Real-time compute graph engine for robotics applications. DAG-based execution with deterministic scheduling.

[Figure: Vision Pipeline]

Robotics systems process data from multiple sensors simultaneously. Cameras, IMUs, and LiDAR each produce streams at different rates that must be transformed, filtered, and fused before generating control signals. A compute graph models this naturally as a directed acyclic graph where nodes represent processing units and edges represent data dependencies. The challenge is executing these graphs with real-time guarantees when your camera runs at 30 fps, your IMU at 500 Hz, and your control loop requires 1000 Hz updates.

Nodix executes compute graphs with pluggable schedulers that provide deterministic timing guarantees. Nodes declare their deadlines and data dependencies. The scheduler, whether Earliest Deadline First (EDF) for dynamic priorities or Rate Monotonic for static priority assignment, orders ready nodes so that critical ones meet their deadlines. Users choose between single-threaded execution for predictability or parallel execution for throughput. This explicit choice acknowledges the fundamental tradeoff between determinism and performance in real-time systems.

The architecture defines four node types that cover typical robotics pipelines. Source nodes produce data from sensors or timers. Filter nodes transform single inputs through operations like object detection or preprocessing. Fusion nodes combine multiple inputs for tasks like sensor fusion or state estimation. Sink nodes consume data for actuation or logging. Nodes connect through typed ports that validate compatibility at graph construction time. An image port cannot connect to an IMU port. Type errors appear when building the graph, not during execution.
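To make the construction-time check concrete, here is a minimal sketch of how port compatibility could be validated when an edge is added. It is not nodix's actual API: `PortType`, `NodeKind`, `NodeSpec`, `Graph`, and `ConnectError` are hypothetical names, and modelling port types as an enum is an assumption made for brevity.

```rust
/// Hypothetical payload tags; nodix's real port types are not shown in the text.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PortType {
    Image,
    Imu,
    PointCloud,
    Pose,
}

/// The four node roles described above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum NodeKind {
    Source,
    Filter,
    Fusion,
    Sink,
}

/// Static description of a node: its role, input port types, and output type.
#[derive(Debug, Clone)]
pub struct NodeSpec {
    pub name: &'static str,
    pub kind: NodeKind,
    pub inputs: Vec<PortType>,
    pub output: Option<PortType>, // sinks have no output
}

#[derive(Debug, PartialEq, Eq)]
pub enum ConnectError {
    NoOutput { from: &'static str },
    NoSuchInput { to: &'static str, index: usize },
    TypeMismatch { expected: PortType, found: PortType },
}

#[derive(Default)]
pub struct Graph {
    nodes: Vec<NodeSpec>,
    edges: Vec<(usize, usize, usize)>, // (from node, to node, input index)
}

impl Graph {
    /// Registers a node and returns its index.
    pub fn add(&mut self, spec: NodeSpec) -> usize {
        self.nodes.push(spec);
        self.nodes.len() - 1
    }

    /// Adds an edge only if the producer's output type matches the consumer's input type.
    pub fn connect(&mut self, from: usize, to: usize, input: usize) -> Result<(), ConnectError> {
        let found = self.nodes[from]
            .output
            .ok_or(ConnectError::NoOutput { from: self.nodes[from].name })?;
        let expected = *self.nodes[to]
            .inputs
            .get(input)
            .ok_or(ConnectError::NoSuchInput { to: self.nodes[to].name, index: input })?;
        if found != expected {
            return Err(ConnectError::TypeMismatch { expected, found });
        }
        self.edges.push((from, to, input));
        Ok(())
    }

    pub fn edge_count(&self) -> usize {
        self.edges.len()
    }
}
```

In this sketch, connecting a camera source to a fusion node's IMU input returns `ConnectError::TypeMismatch` from `connect`, so the mistake surfaces while the graph is being built and no edge is recorded.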

[Figure: Sensor Fusion Pipeline]

Two executor implementations provide different performance characteristics. The single-threaded executor processes nodes in topological order, which gives deterministic execution but leaves CPU cores idle. The parallel executor distributes ready nodes across worker threads, maximizing throughput at the cost of scheduling jitter. Both executors respect the same node interface, so switching requires minimal code changes. The choice depends on whether your application prioritizes timing predictability or computational throughput.
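The ordering step of the single-threaded executor can be illustrated with Kahn's algorithm. This is a sketch under assumptions, not the nodix implementation: the function name `topological_order` and the edge-list representation are hypothetical, and the text does not say which topological sort nodix uses.

```rust
use std::collections::VecDeque;

/// Returns node indices in a topological order, or `None` if the edges contain a cycle.
/// Ready nodes are visited in FIFO order, so the same graph always yields the same order.
pub fn topological_order(node_count: usize, edges: &[(usize, usize)]) -> Option<Vec<usize>> {
    let mut indegree = vec![0usize; node_count];
    let mut successors: Vec<Vec<usize>> = vec![Vec::new(); node_count];
    for &(from, to) in edges {
        successors[from].push(to);
        indegree[to] += 1;
    }

    // Start with every node that has no unmet dependencies (sources).
    let mut ready: VecDeque<usize> = (0..node_count).filter(|&n| indegree[n] == 0).collect();
    let mut order = Vec::with_capacity(node_count);

    while let Some(node) = ready.pop_front() {
        order.push(node);
        // Finishing `node` satisfies one dependency of each successor.
        for &next in &successors[node] {
            indegree[next] -= 1;
            if indegree[next] == 0 {
                ready.push_back(next);
            }
        }
    }

    // If some node never became ready, the graph has a cycle and is not a DAG.
    (order.len() == node_count).then_some(order)
}
```

A parallel executor could start from the same bookkeeping: instead of popping one ready node at a time, it would hand every node in `ready` to a worker thread, which is where the scheduling jitter comes from.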

Design Decisions and Tradeoffs

Zero-copy data flow eliminates expensive memory operations for large data types. Images, point clouds, and tensors are wrapped in Arc smart pointers, so nodes receive shared handles to the same buffer rather than owned copies. A 1080p image flowing through five processing nodes requires zero memcpy operations on the pixel data. The overhead is a single atomic reference count increment per node. This design choice matters when processing megabytes of sensor data per frame at high frame rates.
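The sharing behaviour comes straight from the standard library's `Arc`, and a short sketch shows it. The `Image` struct and the helper functions `mean_intensity` and `fan_out` are hypothetical; only `Arc::clone`, `Arc::ptr_eq`, and `Arc::strong_count` are real standard-library calls.

```rust
use std::sync::Arc;

/// Hypothetical image payload; the field layout is an assumption.
pub struct Image {
    pub width: u32,
    pub height: u32,
    pub pixels: Vec<u8>,
}

/// A read-only stage. It owns an `Arc` handle, but the pixel buffer behind it
/// is the same allocation every other stage sees.
pub fn mean_intensity(frame: Arc<Image>) -> f64 {
    let sum: u64 = frame.pixels.iter().map(|&p| u64::from(p)).sum();
    sum as f64 / frame.pixels.len() as f64
}

/// Hands one frame to `stages` consumers. Returns the reference count while all
/// handles are alive and whether every handle points at the original buffer.
pub fn fan_out(frame: &Arc<Image>, stages: usize) -> (usize, bool) {
    // Each clone is one atomic increment; no pixel data is copied.
    let handles: Vec<Arc<Image>> = (0..stages).map(|_| Arc::clone(frame)).collect();
    let same_buffer = handles.iter().all(|h| Arc::ptr_eq(h, frame));
    (Arc::strong_count(frame), same_buffer)
}
```

With five stages, `fan_out` reports a count of six (the original handle plus five clones), and every handle passes `Arc::ptr_eq`. Once the handles are dropped, the count returns to one.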

Pluggable scheduling allows different timing strategies without changing node implementations. EDF prioritizes work by absolute deadline, so the most urgent work always runs first. Rate Monotonic assigns static priorities based on task period (the shorter the period, the higher the priority), which simplifies analysis but can miss deadlines at processor utilizations that EDF would still handle. FIFO and priority-based schedulers exist for simpler use cases. The scheduler is a trait, so swapping implementations requires changing one line during graph construction. This abstraction proved essential because optimal scheduling policy varies significantly between applications.
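A sketch of what a scheduler trait with two interchangeable policies might look like follows. The trait shape (`push`/`pop`), the `Task` fields, and the `drain` helper are assumptions for illustration; the text only states that the scheduler is a trait.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// A unit of ready work. Field names are assumptions for this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Task {
    pub node: usize,
    pub deadline_us: u64, // absolute deadline in microseconds
    pub period_us: u64,   // activation period in microseconds
}

/// Hypothetical scheduler interface: the executor pushes ready tasks
/// and pops whichever one the policy says should run next.
pub trait Scheduler {
    fn push(&mut self, task: Task);
    fn pop(&mut self) -> Option<Task>;
}

/// Earliest Deadline First: the smallest absolute deadline runs first.
#[derive(Default)]
pub struct Edf {
    heap: BinaryHeap<Reverse<(u64, usize, u64)>>, // (deadline, node, period)
}

impl Scheduler for Edf {
    fn push(&mut self, t: Task) {
        self.heap.push(Reverse((t.deadline_us, t.node, t.period_us)));
    }
    fn pop(&mut self) -> Option<Task> {
        self.heap
            .pop()
            .map(|Reverse((deadline_us, node, period_us))| Task { node, deadline_us, period_us })
    }
}

/// Rate Monotonic: the shortest period runs first, regardless of deadline.
#[derive(Default)]
pub struct RateMonotonic {
    heap: BinaryHeap<Reverse<(u64, usize, u64)>>, // (period, node, deadline)
}

impl Scheduler for RateMonotonic {
    fn push(&mut self, t: Task) {
        self.heap.push(Reverse((t.period_us, t.node, t.deadline_us)));
    }
    fn pop(&mut self) -> Option<Task> {
        self.heap
            .pop()
            .map(|Reverse((period_us, node, deadline_us))| Task { node, deadline_us, period_us })
    }
}

/// Runs every task through the given policy and returns the node order it chose.
pub fn drain(mut scheduler: Box<dyn Scheduler>, tasks: &[Task]) -> Vec<usize> {
    for &t in tasks {
        scheduler.push(t);
    }
    let mut order = Vec::new();
    while let Some(t) = scheduler.pop() {
        order.push(t.node);
    }
    order
}
```

Because `drain` takes a `Box<dyn Scheduler>`, switching policy is a one-line change at the call site: pass `Box::new(Edf::default())` or `Box::new(RateMonotonic::default())`.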

Type safety across dynamic connections required careful design. Nodes use trait objects for dynamic dispatch since graphs are constructed at runtime based on configuration. Ports use generics for zero-cost type checking within nodes. The compromise is runtime type validation when connecting ports, but zero overhead after graph construction. Attempting to connect incompatible types returns an error immediately rather than causing runtime panics during execution.
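One way to combine generic ports with type-erased edges is to record a `TypeId` per port and compare the two when connecting. The sketch below assumes that approach; `PortInfo`, `Payload`, `check_connection`, and `read` are hypothetical names, and the text does not confirm that nodix uses `std::any` internally.

```rust
use std::any::{Any, TypeId};
use std::sync::Arc;

/// Type-erased payload that can travel along any edge of a runtime-built graph.
pub type Payload = Arc<dyn Any + Send + Sync>;

/// Hypothetical message types.
pub struct Image(pub Vec<u8>);
pub struct ImuSample(pub [f32; 3]);

/// Runtime descriptor of a port, captured from the generic parameter when the node is built.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct PortInfo {
    pub type_id: TypeId,
    pub type_name: &'static str,
}

impl PortInfo {
    pub fn of<T: Any>() -> Self {
        PortInfo {
            type_id: TypeId::of::<T>(),
            type_name: std::any::type_name::<T>(),
        }
    }
}

/// Construction-time check: fail fast if the two ends disagree on the payload type.
pub fn check_connection(output: PortInfo, input: PortInfo) -> Result<(), String> {
    if output.type_id == input.type_id {
        Ok(())
    } else {
        Err(format!("cannot connect {} to {}", output.type_name, input.type_name))
    }
}

/// Inside a node: recover the concrete type from the erased payload without copying it.
pub fn read<T: Any + Send + Sync>(payload: &Payload) -> Option<Arc<T>> {
    Arc::clone(payload).downcast::<T>().ok()
}
```

Note that this sketch still performs a `TypeId` comparison on every `downcast`. It is cheap, but how nodix reaches zero overhead after construction is not described in the text.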

HDR histograms track execution latency for every node. The runtime records p50, p99, and maximum latencies with minimal overhead using the HdrHistogram library. When a deadline is missed, you know exactly which node exceeded its budget and by how much. Chrome tracing export visualizes execution timelines, showing when each node ran and where scheduling gaps occurred. This observability proved critical for debugging performance issues in complex graphs with dozens of nodes.
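The bookkeeping behind those numbers can be sketched with the standard library alone. This is a stand-in, not HdrHistogram: it stores every sample in a `Vec` and sorts on demand, which a real HDR histogram avoids by bucketing. `LatencyLog` and its methods are hypothetical names.

```rust
use std::time::Duration;

/// Per-node latency log. A plain Vec stands in for an HDR histogram here.
#[derive(Default)]
pub struct LatencyLog {
    samples_us: Vec<u64>,
    deadline_misses: Vec<(usize, u64)>, // (sample index, overrun in microseconds)
}

impl LatencyLog {
    /// Records one execution and notes by how much it exceeded its budget, if at all.
    pub fn record(&mut self, elapsed: Duration, budget: Duration) {
        if elapsed > budget {
            let overrun_us = (elapsed - budget).as_micros() as u64;
            self.deadline_misses.push((self.samples_us.len(), overrun_us));
        }
        self.samples_us.push(elapsed.as_micros() as u64);
    }

    /// Nearest-rank percentile in microseconds, for `q` in (0, 100].
    pub fn percentile(&self, q: f64) -> Option<u64> {
        if self.samples_us.is_empty() {
            return None;
        }
        let mut sorted = self.samples_us.clone();
        sorted.sort_unstable();
        let rank = (q * sorted.len() as f64 / 100.0).ceil() as usize;
        Some(sorted[rank.clamp(1, sorted.len()) - 1])
    }

    pub fn max(&self) -> Option<u64> {
        self.samples_us.iter().copied().max()
    }

    pub fn misses(&self) -> &[(usize, u64)] {
        &self.deadline_misses
    }
}
```

Keeping one such log per node is what lets a missed deadline be attributed to a specific node together with the size of the overrun.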

Performance benchmarks show a throughput of 5,000 iterations per second with sub-millisecond p99 latency. Zero-copy semantics enable this performance for typical robotics workloads processing camera frames and sensor data. For graphs with sufficient parallelism, the parallel executor's throughput scales roughly linearly with CPU cores until synchronization overhead dominates.

Current limitations point toward refinements rather than architectural changes. The executor abstraction requires boilerplate when switching between single-threaded and parallel execution. A builder pattern would streamline this common operation. The system provides the execution framework but users implement all processing nodes. A library of common operations like rate limiting, buffering, and basic transforms would reduce initial development friction. These are additions rather than redesigns, which suggests the core architecture is sound.
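As an illustration of the builder idea, the sketch below shows one possible shape. Nothing here exists in nodix today: `RuntimeBuilder`, `RuntimeConfig`, `ExecutorKind`, and `SchedulerKind` are hypothetical names, and the defaults are assumptions.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ExecutorKind {
    SingleThreaded,
    Parallel { workers: usize },
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SchedulerKind {
    Fifo,
    Edf,
    RateMonotonic,
}

/// The resolved choices a runtime would be constructed from.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct RuntimeConfig {
    pub executor: ExecutorKind,
    pub scheduler: SchedulerKind,
}

/// Hypothetical builder that collects executor and scheduler choices in one place.
pub struct RuntimeBuilder {
    executor: ExecutorKind,
    scheduler: SchedulerKind,
}

impl RuntimeBuilder {
    /// Assumed defaults: the most predictable configuration.
    pub fn new() -> Self {
        RuntimeBuilder {
            executor: ExecutorKind::SingleThreaded,
            scheduler: SchedulerKind::Fifo,
        }
    }

    pub fn single_threaded(mut self) -> Self {
        self.executor = ExecutorKind::SingleThreaded;
        self
    }

    /// Requests parallel execution; a worker count of zero is raised to one.
    pub fn parallel(mut self, workers: usize) -> Self {
        self.executor = ExecutorKind::Parallel { workers: workers.max(1) };
        self
    }

    pub fn scheduler(mut self, scheduler: SchedulerKind) -> Self {
        self.scheduler = scheduler;
        self
    }

    pub fn build(self) -> RuntimeConfig {
        RuntimeConfig {
            executor: self.executor,
            scheduler: self.scheduler,
        }
    }
}
```

With a shape like this, moving from single-threaded to parallel execution would mean swapping `.single_threaded()` for `.parallel(4)` in one call chain, leaving node code untouched.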

Working on nodix reinforced that determinism and throughput are fundamentally at odds in real-time systems. Single-threaded execution provides perfect determinism but wastes hardware. Parallel execution maximizes throughput but introduces scheduling variability. Making this tradeoff explicit and letting users choose based on their constraints proved more valuable than optimizing for one case.

Type systems can enforce correctness at compile time for static structures, but dynamic graph construction requires runtime validation. The key is failing fast during construction rather than during execution.

Real-time systems need excellent observability because timing bugs are subtle and workload-dependent. Investing in tracing and histogram collection paid dividends when debugging complex timing issues.

