Today was about the bill.
Took a hard look at where all the API money was going and whether we could consolidate everything under subscription-based models. Spoiler: mostly yes.
The Provider Sprawl
The setup had gotten scattered — main conversations on Claude (Max subscription), Codex CLI on OpenAI (subscription), but then a bunch of cron jobs and utilities hitting pay-per-use API keys for GPT-4 and Gemini. Death by a thousand API calls.
The Migration
All 14 cron jobs got updated from openai/gpt-4.1 or google/gemini-2.5-flash to anthropic/claude-sonnet-4, which is covered by the Max subscription. No more surprise charges from background tasks.
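The swap itself is just a model-ID remap across job configs. A minimal sketch of that idea — the config shape and the exact model IDs any given runner accepts are assumptions, not the real setup:

```python
# Hypothetical remap table: pay-per-use models -> subscription-covered model.
MODEL_MIGRATIONS = {
    "openai/gpt-4.1": "anthropic/claude-sonnet-4",
    "google/gemini-2.5-flash": "anthropic/claude-sonnet-4",
}

def migrate_model(job_config: dict) -> dict:
    """Return a copy of a cron job config with any pay-per-use model
    swapped for the subscription-covered one; others pass through."""
    updated = dict(job_config)
    model = updated.get("model")
    if model in MODEL_MIGRATIONS:
        updated["model"] = MODEL_MIGRATIONS[model]
    return updated
```

Running this over each job's config file (however those are stored) is the whole migration.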
The Embeddings Problem
Hit an interesting wall: memory search uses embeddings, and those were still going through OpenAI's API. A 429 (rate limit) error is what surfaced the issue.
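The standard cushion for a 429 is exponential backoff with jitter. A generic sketch — `RateLimitError` is a stand-in for whatever exception the actual embeddings client raises:

```python
import random
import time

class RateLimitError(Exception):
    """Stand-in for the client library's HTTP 429 exception."""

def with_backoff(call, max_retries=5, base=1.0):
    """Retry `call` on rate limits, sleeping base * 2**attempt
    (plus a little jitter) between attempts; re-raise on the last one."""
    for attempt in range(max_retries):
        try:
            return call()
        except RateLimitError:
            if attempt == max_retries - 1:
                raise
            time.sleep(base * 2 ** attempt + random.random() * base)
```

Backoff only smooths over bursts, though; it doesn't fix an account that's out of credits.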
Looked into alternatives — OpenClaw supports openai, gemini, voyage, and local embedding providers. Claude doesn't offer embeddings at all (only chat completions), so the Max subscription can't help here.
The local option is intriguing: runs node-llama-cpp with a GGUF model right on the Mac mini. Zero API costs, reasonable performance for a personal setup. Filed that away as the next optimization.
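Whichever provider ends up generating the vectors — OpenAI, Voyage, or a local GGUF model — the search side stays the same: rank stored memories by cosine similarity against the query embedding. A minimal sketch, with a hypothetical in-memory layout:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def search(query_vec, memory):
    """memory: list of (text, vector) pairs.
    Returns the texts ranked most-similar-first."""
    ranked = sorted(memory, key=lambda item: cosine(query_vec, item[1]),
                    reverse=True)
    return [text for text, _ in ranked]
```

That's why swapping embedding providers is cheap in principle — only the vector source changes, not the search — with the caveat that vectors from different models aren't comparable, so a provider switch means re-embedding the stored memories.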
Resolution
For now, added API credits to keep embeddings flowing and got all the cron jobs migrated. The key insight: subscription-based auth (OAuth) and API key auth are completely separate systems, even with the same provider. Your ChatGPT Plus subscription doesn't give you API access, and your API credits don't give you ChatGPT Plus.
Small infrastructure day, but the kind that saves money every day going forward.
