Blog — Writeups on Affordable LLM Inference

Notes on affordable LLM inference, distributed GPU compute, OpenAI-compatible APIs, and the engineering behind the platform. Newest first.

2026-05-21 TUTORIALS

Use MicroDC.ai as the LLM backend for Hermes.

Hermes is NousResearch's open-source personal AI agent — long-lived memory, skills, tools, voice. It also burns through tokens. Point its provider: custom path at MicroDC.ai's OpenAI-compatible endpoint and cut per-call cost by an order of magnitude. Step-by-step config.

Read →

2026-05-05 TUTORIALS

Use MicroDC.ai as the LLM backend for OpenClaw.

OpenClaw is "the AI that actually does things." It also makes a lot of model calls. Point its gateway at MicroDC.ai's OpenAI-compatible endpoint and cut per-call cost by an order of magnitude. Step-by-step config.

Read →

2026-04-29 OPERATIONS

Async batch inference: when not needing real-time saves you 90%.

If your workload is overnight document processing, dataset enrichment, or content generation, you're paying a real-time premium for nothing. Here's the math behind why batch costs a fraction of the same work on dedicated GPUs.

Read →

2026-04-08 INTEGRATION

A drop-in OpenAI-compatible API at a fractional cost.

Point your existing openai client at a different base_url and keep the rest of your code. Multimodal content lists, LangChain, LlamaIndex — all work. Where it does and doesn't replace OpenAI, with code.

Read →

2026-03-12 ECONOMICS

Why GPU inference is expensive — and how a distributed network makes it cheap.

Hyperscaler GPU pricing is a function of utilization, CapEx amortization, and a real-time premium you pay even when you don't need it. A distributed marketplace flips all three. Numbers and tradeoffs.

Read →

§01 · ENGAGE

Ship work, not infrastructure.

Free credits to start. No credit card. Five minutes from signup to first job result.

Create Free Account → Developer Guide