Self-Hosting, Self-Healing Execution Substrate for DAGs, Agents, and Services.
Titan is a zero-dependency distributed runtime built from first principles to solve the "Physical Execution" problem. It bridges the gap between static orchestrators (Airflow), dynamic AI agents, and long-running micro-services.
What can Titan run?
- Static Pipelines: Deterministic ETL and DevOps workflows.
- Long-Running Services: Persistent APIs and servers with auto-restart.
- Runtime-Defined DAGs: Execution graphs built on-the-fly via logic.
- Agentic Workflows: Complex graphs generated dynamically by LLMs.
Titan handles the heavy lifting of distributed systems:
- Parallelism: Executing massive fan-out workloads across the cluster with non-blocking task distribution.
- Locality: Enforcing strict data-to-node affinity.
- Elasticity: Reactive auto-scaling with autonomous "Inception" events.
- Resilience: Zero-loss state recovery via the integrated TitanStore AOF.
- Capability: Hardware-aware routing (Strict GPU vs. CPU matching).
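To make the Resilience bullet concrete, here is a minimal Python sketch of the general append-only-file technique: every write is logged to disk before it is applied, and a restarted process rebuilds its state by replaying the log from the top. This is not TitanStore's actual format or API. The class name `AppendOnlyStore`, the JSON-lines encoding, and the `job:N` keys are all assumptions made for illustration.

```python
import json
import os
import tempfile


class AppendOnlyStore:
    """Toy key-value store (hypothetical, not TitanStore) backed by an append-only log."""

    def __init__(self, path):
        self.path = path
        self.state = {}
        self._replay()  # rebuild in-memory state from any existing log
        # Append mode: existing history is never overwritten.
        self._log = open(path, "a", encoding="utf-8")

    def _replay(self):
        # Assumes every line in the log is a complete JSON entry.
        if not os.path.exists(self.path):
            return
        with open(self.path, encoding="utf-8") as f:
            for line in f:
                if line.strip():
                    self._apply(json.loads(line))

    def _apply(self, entry):
        if entry["op"] == "set":
            self.state[entry["key"]] = entry["value"]
        elif entry["op"] == "del":
            self.state.pop(entry["key"], None)

    def _append(self, entry):
        # Persist first, then mutate memory, so the log is never behind the state.
        self._log.write(json.dumps(entry) + "\n")
        self._log.flush()
        os.fsync(self._log.fileno())
        self._apply(entry)

    def set(self, key, value):
        self._append({"op": "set", "key": key, "value": value})

    def delete(self, key):
        self._append({"op": "del", "key": key})

    def close(self):
        self._log.close()


def demo_recovery():
    """Write some state, drop the store, and recover it from the log alone."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "example.aof")
        store = AppendOnlyStore(path)
        store.set("job:1", "RUNNING")
        store.set("job:2", "QUEUED")
        store.set("job:1", "DONE")
        store.delete("job:2")
        store.close()  # stands in for the process dying

        recovered = AppendOnlyStore(path)  # replays the log on startup
        state = dict(recovered.state)
        recovered.close()
        return state
```

Calling `demo_recovery()` returns the state rebuilt purely from the log. A production log would also need to handle a partially written final entry and periodic compaction, which this sketch leaves out.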
Titan scales with your complexity, from a simple script runner to an autonomous agent host:
- Level 1: Distributed Cron (The Scheduler)
- Distributed `crontab` for Python/Shell scripts. Run tasks in sequence or parallel across a cluster.
- Level 2: Service Orchestrator (The Platform)
- Self-hosted Micro-PaaS (like Nomad or PM2). Deploy APIs and keep them alive with auto-restarts on crash.
- Level 3: Agentic Execution Runtime (Autonomous Mode)
- Infrastructure-aware substrate where software agents programmatically spawn compute tasks based on LLM decisions.
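The "keep them alive with auto-restarts on crash" behaviour in Level 2 can be illustrated with a small supervisor loop. The sketch below is a generic Python pattern, not Titan's implementation. The function name `supervise`, the `max_restarts` limit, and the rule "restart only on a non-zero exit code" are assumptions chosen to keep the example short.

```python
import subprocess
import sys
import time


def supervise(cmd, max_restarts=3, backoff_seconds=0.0):
    """Run cmd and restart it while it exits with a non-zero code.

    Returns the list of exit codes observed, one per launch.
    Hypothetical helper for illustration only.
    """
    exit_codes = []
    for attempt in range(max_restarts + 1):
        code = subprocess.run(cmd).returncode
        exit_codes.append(code)
        if code == 0:
            break  # clean exit: nothing to restart
        if attempt < max_restarts:
            time.sleep(backoff_seconds)  # optional pause before the next launch
    return exit_codes


if __name__ == "__main__":
    # A child process that always crashes, to show the restart limit being hit.
    crashing = [sys.executable, "-c", "import sys; sys.exit(1)"]
    print(supervise(crashing, max_restarts=2))
```

A real service supervisor would typically also restart on unexpected clean exits, cap restart frequency, and forward signals to the child.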
| Feature | Static Workflows (DevOps) | Runtime-Defined (Agentic) |
|---|---|---|
| Requirement | YAML Definition + Java Binary | Titan Python SDK |
| Definition | Deterministic DAGs defined beforehand | Graphs constructed at runtime |
| Use Case | Nightly ETL, Backups, Reporting | AI Agents, Self-Healing Loops |
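The difference between the two columns is easiest to see in code. The following sketch builds a dependency graph at runtime and executes it in dependency order using Python's standard-library `graphlib` (Python 3.9+). It does not use the Titan Python SDK, whose API is not shown here. The task names, `build_graph`, `run_graph`, and the `needs_gpu_step` flag are hypothetical stand-ins for a decision made by application logic or an LLM.

```python
from graphlib import TopologicalSorter


def build_graph(needs_gpu_step):
    """Return {task: set of tasks it depends on}; the shape is decided at call time."""
    graph = {
        "extract": set(),
        "transform": {"extract"},
    }
    if needs_gpu_step:
        # Extra stage inserted only when the runtime decision asks for it.
        graph["embed"] = {"transform"}
        graph["report"] = {"embed"}
    else:
        graph["report"] = {"transform"}
    return graph


def run_graph(graph, run_task):
    """Execute tasks so that every task runs after all of its dependencies."""
    order = []
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    while sorter.is_active():
        # Every task in this batch has all dependencies satisfied,
        # so a real scheduler could dispatch the batch in parallel.
        for task in sorted(sorter.get_ready()):
            run_task(task)
            order.append(task)
            sorter.done(task)
    return order


if __name__ == "__main__":
    for flag in (False, True):
        print(flag, run_graph(build_graph(flag), lambda name: None))
```

A static workflow corresponds to the graph being fixed before the run starts. In the runtime-defined case the same resolution step applies, but the graph is produced by code immediately before execution.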
Orchestrate ephemeral scripts, long-running services, and hybrid DAGs (e.g., Python script → Java Service → Shell cleanup) in a single zero-dependency binary.
- Permanent vs. Ephemeral Nodes: Protect core infrastructure while allowing burst workers to decommission automatically after 45s of idle time.
- Capability Routing: Tag workers with skills (e.g., `GPU`). Titan ensures hardware-heavy tasks land only on capable nodes.
- TITAN_PROTO: Custom binary TCP wire format for <50ms latency without JSON overhead.
- Inception Scaling: Saturated workers can autonomously spawn "child" worker processes to handle traffic spikes.
- Least-Connection Routing: Master intelligently routes jobs to the node with the lowest active thread load.
- Zombie Process Reaping: Workers automatically clean up orphaned PIDs from previous crashes upon startup.
- TitanStore (AOF): Built-in Redis-like persistence. If the Master dies, it replays the Append-Only File to reconstruct the cluster state perfectly.
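Capability Routing and Least-Connection Routing compose naturally: filter the workers down to those that have every required skill, then pick the least-loaded one. The sketch below shows that combination in plain Python. It is an illustration of the idea, not the Master's actual algorithm. The `Worker` class, the `active_jobs` counter, the subset-based matching rule, and the tie-break by name are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Worker:
    """Hypothetical view of a worker as seen by a scheduler."""
    name: str
    capabilities: frozenset
    active_jobs: int = 0


def pick_worker(workers, required):
    """Return the least-loaded worker that has every required capability, or None."""
    required = frozenset(required)
    # Strict matching: a worker is eligible only if it has all required tags.
    eligible = [w for w in workers if required <= w.capabilities]
    if not eligible:
        return None
    # Least-connection: fewest active jobs wins; name breaks ties deterministically.
    return min(eligible, key=lambda w: (w.active_jobs, w.name))


def dispatch(workers, required):
    """Assign one job and record the extra load on the chosen worker."""
    worker = pick_worker(workers, required)
    if worker is None:
        raise LookupError(f"no worker offers capabilities {sorted(required)}")
    worker.active_jobs += 1
    return worker.name


if __name__ == "__main__":
    cluster = [
        Worker("cpu-1", frozenset({"CPU"})),
        Worker("gpu-1", frozenset({"CPU", "GPU"}), active_jobs=3),
        Worker("gpu-2", frozenset({"CPU", "GPU"}), active_jobs=1),
    ]
    print(dispatch(cluster, {"GPU"}))  # a GPU job never lands on cpu-1
    print(dispatch(cluster, {"CPU"}))
```

Filtering before ranking means an idle CPU-only node is never chosen for a GPU job, even when every GPU node is busier.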
Watch Titan resolve dependencies and execute a multi-stage workflow where the path is decided at runtime.
dynamic_dag.mp4
Watch the cluster detect load, spawn a new Worker process automatically, and distribute tasks.
titan_load_scaling.mp4
🎬 View More Scenarios (GPU Routing, Fanout, Full Scale Cycle)
GPU Affinity Routing
GPU_Affinity_yaml.mp4
Parallel Execution (Fanout)
fanout_yaml_dag.mp4
Full Load Cycle (Scale Up & Descale)
titan_load_descaling.mp4
Titan Orchestrator is licensed under the Apache License 2.0. © 2026 Ram Narayanan A S. Open for contributions.
Engineered from first principles to deconstruct the fundamental primitives of distributed orchestration.
