
Remote Workers

Not every node belongs in the orchestrator process. GPU inference, native binaries, code in another language, or work that must run inside a specific network zone can all be delegated to an external worker while BLOGE keeps owning the graph, durability, and resilience.

Marking a node as remote

Add execution_mode = remote and a worker_topic to any node:

bloge
node gpuInference : InferenceOperator {
  depends_on     = [prepareBatch]
  execution_mode = remote
  worker_topic   = "gpu-inference"
  timeout        = 5m
  retry = { attempts: 3, backoff: 2s, strategy: exponential }
}
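
To make the retry line concrete, the Python sketch below works out the delay schedule that attempts: 3, backoff: 2s, strategy: exponential might produce. It rests on three assumptions that BLOGE's documentation here does not confirm: that attempts counts the first try, that exponential means doubling the base delay, and that every attempt gets the full 5m timeout. The helper names backoff_delays and worst_case_wait are hypothetical.

```python
from datetime import timedelta


def backoff_delays(attempts: int, base: timedelta) -> list[timedelta]:
    # Assumption: `attempts` includes the initial try, so there are attempts - 1 retries.
    retries = max(attempts - 1, 0)
    # Assumption: "exponential" doubles the base delay on every retry.
    return [base * (2 ** i) for i in range(retries)]


def worst_case_wait(attempts: int, base: timedelta, timeout: timedelta) -> timedelta:
    # Every attempt runs into its timeout, plus the pauses between attempts.
    return timeout * attempts + sum(backoff_delays(attempts, base), timedelta())


delays = backoff_delays(3, timedelta(seconds=2))
print([d.total_seconds() for d in delays])  # [2.0, 4.0]
print(worst_case_wait(3, timedelta(seconds=2), timedelta(minutes=5)))  # 0:15:06
```

Under those assumptions, a node configured as above could stay suspended for a little over fifteen minutes before its retry budget is exhausted.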

At compile time, BLOGE materializes an embedded RemoteWorkerOperator that:

  1. Serializes a JSON payload envelope — operatorRef, workerTopic, input, retry policy, and grouped execution metadata in RemoteWorkerEnvelope.ExecutionContext — into a durable EXECUTE_NODE work item.
  2. Suspends the node until the worker reports completion.
  3. Uses the node's resilience timeout as the suspend deadline.

The business operatorRef is preserved on NodeSpec, and local operator-registry validation is skipped for remote-only operators, so the operator class does not need to be on the orchestrator's classpath. Sub-graph nodes cannot be combined with remote execution.
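
The envelope described in step 1 could look roughly like the following Python sketch. Only the names operatorRef, workerTopic, input, the retry policy, RemoteWorkerEnvelope, ExecutionContext, and EXECUTE_NODE come from the text above. The fields inside ExecutionContext, the shape of the work-item record, and the to_work_item helper are assumptions made for illustration, not BLOGE's actual wire format.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class ExecutionContext:
    # Hypothetical fields; the source only says execution metadata is grouped here.
    executionId: str
    nodeId: str
    attempt: int


@dataclass
class RemoteWorkerEnvelope:
    operatorRef: str
    workerTopic: str
    input: dict
    retry: dict
    executionContext: ExecutionContext


def to_work_item(envelope: RemoteWorkerEnvelope) -> dict:
    # Wrap the JSON payload in a record typed EXECUTE_NODE (record layout is assumed).
    return {"type": "EXECUTE_NODE", "payload": json.dumps(asdict(envelope))}


envelope = RemoteWorkerEnvelope(
    operatorRef="InferenceOperator",
    workerTopic="gpu-inference",
    input={"batchId": "b-1"},
    retry={"attempts": 3, "backoff": "2s", "strategy": "exponential"},
    executionContext=ExecutionContext("exec-1", "gpuInference", 1),
)
work_item = to_work_item(envelope)
print(work_item["payload"])
```

A worker would parse the payload, use operatorRef to choose its business logic, and use the execution context to correlate its result with the suspended node.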

The control plane

The standalone bloge-examples-graph-engine/ project hosts bloge-graph-engine-service and bloge-graph-engine-server, which expose a stateless worker control plane under /api/v1/remote-workers:

Flow          Purpose
register      A worker announces itself
poll + claim  Atomically claim a durable EXECUTE_NODE work item
heartbeat     Keep a claim alive while long work runs
complete      Report a result and resume the suspended node
fail          Report failure; the item transitions to RETRY_WAIT or DEAD_LETTER

  • Polling claims work items sharded by worker_topic, so a pool of GPU workers only pulls gpu-inference work.
  • Failure callbacks respect the node's retry budget: retryable failures move the item to RETRY_WAIT, exhausted budgets move it to DEAD_LETTER.
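
The two rules above, topic-sharded claims and budget-aware failure handling, can be modeled in a few lines of Python. This is an in-memory sketch, not the control plane's implementation: WorkItem, WorkQueue, and the PENDING and CLAIMED state names are hypothetical, and sending non-retryable failures straight to DEAD_LETTER is an assumption. Only RETRY_WAIT and DEAD_LETTER are named in the text.

```python
import threading
from dataclasses import dataclass
from typing import Optional


@dataclass
class WorkItem:
    item_id: str
    worker_topic: str
    max_attempts: int
    attempts_used: int = 0
    state: str = "PENDING"  # hypothetical initial state name
    claimed_by: Optional[str] = None


class WorkQueue:
    def __init__(self) -> None:
        self._items: list[WorkItem] = []
        self._lock = threading.Lock()  # makes claim() atomic within this process

    def enqueue(self, item: WorkItem) -> None:
        with self._lock:
            self._items.append(item)

    def claim(self, worker_id: str, topic: str) -> Optional[WorkItem]:
        # Only items on the worker's own topic are eligible.
        with self._lock:
            for item in self._items:
                if item.state == "PENDING" and item.worker_topic == topic:
                    item.state = "CLAIMED"
                    item.claimed_by = worker_id
                    item.attempts_used += 1
                    return item
        return None

    def fail(self, item: WorkItem, retryable: bool) -> None:
        with self._lock:
            has_budget = item.attempts_used < item.max_attempts
            # Assumption: non-retryable failures are dead-lettered immediately.
            item.state = "RETRY_WAIT" if retryable and has_budget else "DEAD_LETTER"
            item.claimed_by = None


queue = WorkQueue()
queue.enqueue(WorkItem("w1", "cpu-etl", max_attempts=3))
queue.enqueue(WorkItem("w2", "gpu-inference", max_attempts=1))
claimed = queue.claim("gpu-worker-1", "gpu-inference")
print(claimed.item_id)  # w2
queue.fail(claimed, retryable=True)
print(claimed.state)  # DEAD_LETTER
```

The second item is dead-lettered even though the failure is retryable, because its single attempt has already been used.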

A worker's lifecycle

text
orchestrator                worker pool
     │  enqueue EXECUTE_NODE     │
     │ ───────────────────────► │  poll + claim (worker_topic)
     │      (node suspended)     │  ── run business logic ──
     │                           │  heartbeat … heartbeat …
     │ ◄─────────────────────── │  complete(result)
     │  resume node + continue   │

If the worker never reports back before the suspend deadline, the node times out and follows its normal resilience path (retry, then fallback or failure).
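
On the worker side, the lifecycle above amounts to running the business logic while a background thread keeps the claim alive, then reporting the outcome. The Python sketch below shows one way to structure that. RecordingClient is a hypothetical stand-in that only records calls; the real endpoints under /api/v1/remote-workers and their request shapes are not shown in this page, so the method names and arguments are assumptions.

```python
import threading
import time
from typing import Any, Callable


class RecordingClient:
    """Stand-in for a control-plane client; it only records calls (hypothetical API)."""

    def __init__(self) -> None:
        self.calls: list[tuple] = []

    def heartbeat(self, claim_id: str) -> None:
        self.calls.append(("heartbeat", claim_id))

    def complete(self, claim_id: str, result: Any) -> None:
        self.calls.append(("complete", claim_id, result))

    def fail(self, claim_id: str, error: str) -> None:
        self.calls.append(("fail", claim_id, error))


def run_claimed_item(client, claim_id: str, payload: dict,
                     handler: Callable[[dict], Any],
                     heartbeat_interval: float = 10.0) -> bool:
    stop = threading.Event()

    def beat() -> None:
        # Event.wait returns False on timeout, so this fires once per interval
        # until stop is set.
        while not stop.wait(heartbeat_interval):
            client.heartbeat(claim_id)

    thread = threading.Thread(target=beat, daemon=True)
    thread.start()
    try:
        result = handler(payload)
        error = None
    except Exception as exc:
        result, error = None, str(exc)
    finally:
        # Stop heartbeating before reporting, so the outcome is the last call.
        stop.set()
        thread.join()
    if error is None:
        client.complete(claim_id, result)
        return True
    client.fail(claim_id, error)
    return False


def slow_double(payload: dict) -> int:
    time.sleep(0.2)  # simulate long-running work
    return payload["x"] * 2


client = RecordingClient()
run_claimed_item(client, "claim-1", {"x": 21}, slow_double, heartbeat_interval=0.02)
print(client.calls[-1])  # ('complete', 'claim-1', 42)
```

In a real worker the heartbeat interval would presumably be chosen to sit well inside whatever claim lease the control plane enforces, and the total run time has to stay within the node's suspend deadline.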

When to use remote workers

  • Heterogeneous compute — a node needs a GPU, a native library, or a non-JVM runtime.
  • Network isolation — a step must run inside a different security zone.
  • Independent scaling — scale the worker pool for one expensive step without scaling the orchestrator.

Dead-lettered items can be inspected and replayed from the embedded Ops Console.
