Remote Workers
Not every node belongs in the orchestrator process. GPU inference, native binaries, code in another language, or work that must run inside a specific network zone can all be delegated to an external worker while BLOGE keeps owning the graph, durability, and resilience.
Marking a node as remote
Add execution_mode = remote and a worker_topic to any node:
    node gpuInference : InferenceOperator {
      depends_on = [prepareBatch]
      execution_mode = remote
      worker_topic = "gpu-inference"
      timeout = 5m
      retry = { attempts: 3, backoff: 2s, strategy: exponential }
    }

At compile time, BLOGE materializes an embedded RemoteWorkerOperator that:
- Serializes a JSON payload envelope (operatorRef, workerTopic, input, retry policy, and grouped execution metadata in RemoteWorkerEnvelope.ExecutionContext) into a durable EXECUTE_NODE work item.
- Suspends the node until the worker reports completion.
- Uses the node's resilience timeout as the suspend deadline.
The business operatorRef is preserved on NodeSpec, and local operator-registry validation is skipped for remote-only operators (you do not need the operator class on the orchestrator's classpath). Sub-graph nodes cannot be combined with remote execution.
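To make the envelope more concrete, here is a minimal Python sketch of how a worker might decode it. Only operatorRef, workerTopic, input, the retry policy, and a grouped execution context come from the description above. The JSON key names `retryPolicy` and `executionContext`, the context fields `runId`, `nodeId`, and `attempt`, and all sample values are assumptions for illustration, not BLOGE's confirmed wire format.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any, Dict


@dataclass
class ExecutionContext:
    # Hypothetical fields: the docs only say execution metadata is grouped here.
    runId: str
    nodeId: str
    attempt: int


@dataclass
class RemoteWorkerEnvelope:
    operatorRef: str
    workerTopic: str
    input: Dict[str, Any]
    retryPolicy: Dict[str, Any]         # assumed key name
    executionContext: ExecutionContext  # assumed key name


def parse_envelope(raw: str) -> RemoteWorkerEnvelope:
    """Decode a JSON payload into a typed envelope; raises KeyError on missing keys."""
    data = json.loads(raw)
    return RemoteWorkerEnvelope(
        operatorRef=data["operatorRef"],
        workerTopic=data["workerTopic"],
        input=data["input"],
        retryPolicy=data["retryPolicy"],
        executionContext=ExecutionContext(**data["executionContext"]),
    )


# Sample envelope mirroring the gpuInference node (values are illustrative).
example = RemoteWorkerEnvelope(
    operatorRef="InferenceOperator",
    workerTopic="gpu-inference",
    input={"batchId": "b-1"},
    retryPolicy={"attempts": 3, "backoff": "2s", "strategy": "exponential"},
    executionContext=ExecutionContext(runId="run-1", nodeId="gpuInference", attempt=1),
)

raw = json.dumps(asdict(example))  # what the orchestrator side might serialize
decoded = parse_envelope(raw)      # what the worker side reads back
```

Typing the envelope on the worker side means a malformed payload fails at decode time rather than halfway through the business logic.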
The control plane
The standalone bloge-examples-graph-engine/ project hosts bloge-graph-engine-service and bloge-graph-engine-server, which expose a stateless worker control plane under /api/v1/remote-workers:
| Flow | Purpose |
|---|---|
| register | A worker announces itself |
| poll + claim | Atomically claim a durable EXECUTE_NODE work item |
| heartbeat | Keep a claim alive while long work runs |
| complete | Report a result and resume the suspended node |
| fail | Report failure; the item transitions to RETRY_WAIT or DEAD_LETTER |
- Polling claims work items sharded by worker_topic, so a pool of GPU workers only pulls gpu-inference work.
- Failure callbacks respect the node's retry budget: retryable failures move the item to RETRY_WAIT, exhausted budgets move it to DEAD_LETTER.
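The retry-budget rule can be sketched as a small decision function. This is an illustration, not BLOGE's actual implementation. It assumes that `attempts` counts total attempts including the first, that a non-retryable failure is dead-lettered immediately, and that the exponential strategy doubles the base backoff on each attempt. The names `on_failure` and `backoff_seconds` are hypothetical.

```python
from enum import Enum


class ItemState(Enum):
    RETRY_WAIT = "RETRY_WAIT"
    DEAD_LETTER = "DEAD_LETTER"


def on_failure(attempts_made: int, max_attempts: int, retryable: bool) -> ItemState:
    """Pick the next state for a failed work item.

    Assumptions: max_attempts counts total attempts (first try included),
    and a non-retryable failure goes straight to DEAD_LETTER.
    """
    if retryable and attempts_made < max_attempts:
        return ItemState.RETRY_WAIT
    return ItemState.DEAD_LETTER


def backoff_seconds(attempts_made: int, base_seconds: float = 2.0) -> float:
    """Delay before the next attempt, assuming 'exponential' means doubling."""
    return base_seconds * (2 ** (attempts_made - 1))
```

Under these assumptions, a node configured with attempts: 3 and backoff: 2s would wait 2 seconds after the first failure, 4 seconds after the second, and be dead-lettered after the third.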
A worker's lifecycle
    orchestrator               worker pool
    │ enqueue EXECUTE_NODE     │
    │ ───────────────────────► │ poll + claim (worker_topic)
    │ (node suspended)         │ ── run business logic ──
    │                          │ heartbeat … heartbeat …
    │ ◄─────────────────────── │ complete(result)
    │ resume node + continue   │

If the worker never reports back before the suspend deadline, the node times out and follows its normal resilience path (retry, then fallback or failure).
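The lifecycle above can be sketched from the worker's side as a single claim-and-run step. The `ControlPlane` interface below is hypothetical: its method names mirror the control-plane flows, but the real HTTP paths and payloads under /api/v1/remote-workers are not shown here, and the `claimId` and `input` item keys are assumptions. Any HTTP client wrapper that satisfies the interface could be plugged in.

```python
import threading
from typing import Any, Callable, Optional, Protocol


class ControlPlane(Protocol):
    """Hypothetical client interface; method names mirror the control-plane flows."""

    def poll_and_claim(self, topic: str) -> Optional[dict]: ...
    def heartbeat(self, claim_id: str) -> None: ...
    def complete(self, claim_id: str, result: Any) -> None: ...
    def fail(self, claim_id: str, error: str) -> None: ...


def process_one(
    plane: ControlPlane,
    topic: str,
    handler: Callable[[dict], Any],
    heartbeat_every: float = 10.0,
) -> bool:
    """Claim at most one work item and run it. Returns False if none was available."""
    item = plane.poll_and_claim(topic)
    if item is None:
        return False

    claim_id = item["claimId"]  # assumed field name
    done = threading.Event()

    def beat() -> None:
        # Event.wait returns False on timeout, so this heartbeats until `done` is set.
        while not done.wait(heartbeat_every):
            plane.heartbeat(claim_id)

    beater = threading.Thread(target=beat, daemon=True)
    beater.start()

    result: Any = None
    error: Optional[str] = None
    try:
        result = handler(item["input"])  # run the business logic
    except Exception as exc:
        error = str(exc)
    finally:
        done.set()      # stop heartbeating before reporting the outcome
        beater.join()

    if error is not None:
        plane.fail(claim_id, error)
    else:
        plane.complete(claim_id, result)
    return True
```

A worker process would typically call `process_one` in a loop and sleep briefly whenever it returns False. Running the heartbeat on a background thread keeps the claim alive without requiring the business logic to cooperate.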
When to use remote workers
- Heterogeneous compute — a node needs a GPU, a native library, or a non-JVM runtime.
- Network isolation — a step must run inside a different security zone.
- Independent scaling — scale the worker pool for one expensive step without scaling the orchestrator.
Dead-lettered items can be inspected and replayed from the embedded Ops Console.
Next steps
- Persist suspended nodes with Durable Flows.
- Replay dead letters from the Event Journal & Ops Console.
- Run agent tools remotely — see AI Agents & LLM Operators.