A simple asynchronous job queue for distributed AI inference
Send your LLM inference request via API or SDK. Specify the model and parameters.
Job enters our distributed queue and waits for an available worker with the right GPU.
A connected worker picks up the job, runs the inference, and submits results back.
Retrieve your completed results via webhook or polling. Simple and efficient.
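The four steps above can be sketched end to end with an in-memory stand-in for the hosted queue. Everything here is illustrative: the function names (`submit_job`, `poll`) and the model name are assumptions, not the real MicroDC SDK.

```python
import queue
import threading
import uuid

# In-memory stand-ins for the distributed queue and the result store.
job_queue = queue.Queue()
results = {}

def submit_job(model, prompt, **params):
    """Steps 1-2: package the request and place it on the queue (hypothetical API)."""
    job = {"id": str(uuid.uuid4()), "model": model, "prompt": prompt, "params": params}
    job_queue.put(job)
    return job["id"]

def worker():
    """Step 3: a worker pulls a job, runs inference, and submits the result.
    A real worker would run the model on a GPU; here we fake the output."""
    job = job_queue.get()
    results[job["id"]] = {"status": "completed", "output": f"echo: {job['prompt']}"}

def poll(job_id):
    """Step 4: check whether the result is ready."""
    return results.get(job_id, {"status": "pending"})

job_id = submit_job("llama-3-8b", "Hello!", temperature=0.7)
t = threading.Thread(target=worker)
t.start()
t.join()
print(poll(job_id)["status"])  # completed
```

In production the queue, workers, and result store live on separate machines; the shape of the flow is the same.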
The key benefits of our distributed approach
Save up to 90% compared to dedicated GPU instances. Pay only for actual compute time, not idle resources.
Process thousands of jobs in parallel across our distributed worker network. No infrastructure to manage.
Simple API, comprehensive SDKs, and detailed documentation. Start integrating in minutes.
A deeper look at how MicroDC processes your jobs
When you submit a job through our API or SDK, it's validated and added to our distributed queue. Each job includes the model specification, input prompt, and any configuration parameters.
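A submitted job might look like the payload below. The field names and the validation helper are a sketch of the idea, not the platform's actual schema.

```python
import json

REQUIRED_FIELDS = ("model", "prompt")

def validate_job(payload: dict) -> dict:
    """Minimal validation sketch: reject jobs missing required fields."""
    missing = [field for field in REQUIRED_FIELDS if field not in payload]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return payload

job = validate_job({
    "model": "llama-3-8b",            # model specification (hypothetical name)
    "prompt": "Summarize this text",  # input prompt
    "params": {"max_tokens": 256, "temperature": 0.2},  # configuration parameters
})
print(json.dumps(job, indent=2))
```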
Our scheduler matches jobs with the most suitable available workers based on model requirements, GPU capabilities, and current load, keeping GPUs busy and queue times short.
Workers pull jobs from the queue, load the required model (if not already cached), execute the inference, and submit results back to the platform. All communication is encrypted and authenticated.
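One iteration of that worker loop looks roughly like this. The `pull_job`/`submit_result` callables stand in for the platform's (encrypted, authenticated) transport, and the model stub stands in for loading real GPU weights; all names are illustrative.

```python
MODEL_CACHE = {}

def load_model(name):
    """Load a model only if it isn't already cached (stub for real weight loading)."""
    if name not in MODEL_CACHE:
        MODEL_CACHE[name] = lambda prompt: f"[{name}] {prompt.upper()}"
    return MODEL_CACHE[name]

def run_worker_once(pull_job, submit_result):
    """One pass of the loop: pull a job, load the model, infer, submit the result."""
    job = pull_job()
    if job is None:
        return
    model = load_model(job["model"])   # skipped on a cache hit
    output = model(job["prompt"])      # execute the inference
    submit_result(job["id"], output)   # report back to the platform

# Tiny in-memory harness standing in for the queue and result channel.
jobs = [{"id": "j1", "model": "demo", "prompt": "hi"}]
out = {}
run_worker_once(lambda: jobs.pop() if jobs else None,
                lambda jid, res: out.update({jid: res}))
print(out)  # {'j1': '[demo] HI'}
```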
Once complete, results are stored and you're notified via webhook (if configured) or you can poll the status. Results remain available for retrieval for a configurable period.
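For the polling path, a client typically loops with a small delay until the job reaches a terminal state. This is a generic sketch, not the official SDK; `get_status` is assumed to return a dict with a `"status"` key.

```python
import time

def wait_for_result(get_status, timeout=60.0, interval=1.0):
    """Poll until the job completes or fails, or the timeout expires (sketch)."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = get_status()
        if status["status"] in ("completed", "failed"):
            return status
        time.sleep(interval)
    raise TimeoutError("job did not finish before the timeout")

# Stub status endpoint that completes on the third poll.
calls = {"n": 0}
def fake_status():
    calls["n"] += 1
    done = calls["n"] >= 3
    return {"status": "completed" if done else "running"}

print(wait_for_result(fake_status, timeout=5, interval=0.01)["status"])  # completed
```

Webhooks invert this flow: instead of the client polling, the platform calls a URL you configure when the job finishes, so no loop is needed.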
Create a free account and start processing AI jobs in minutes.