Notes on affordable LLM inference, distributed GPU compute, OpenAI-compatible APIs, and the engineering behind the platform. Newest first.
Hermes is NousResearch's open-source personal AI agent — long-lived memory, skills, tools, voice. It also burns through tokens. Point its provider: custom path at MicroDC.ai's OpenAI-compatible endpoint and cut per-call cost by an order of magnitude. Step-by-step config.
OpenClaw is "the AI that actually does things." It also makes a lot of model calls. Point its gateway at MicroDC.ai's OpenAI-compatible endpoint and cut per-call cost by an order of magnitude. Step-by-step config.
If your workload is overnight document processing, dataset enrichment, or content generation, you're paying a real-time premium for nothing. Here's the math behind why batch costs a fraction of the same work on dedicated GPUs.
Point your existing openai client at a different base_url and keep the rest of your code. Multimodal content lists, LangChain, LlamaIndex — all work. Where it does and doesn't replace OpenAI, with code.
Hyperscaler GPU pricing is a function of utilization, CapEx amortization, and a real-time premium you pay even when you don't need it. A distributed marketplace flips all three. Numbers and tradeoffs.
Free credits to start. No credit card. Five minutes from signup to first job result.