Building a Cloudflare AI Gateway integration for LlamaIndex
Relying on a single LLM provider is fragile. OpenAI has an outage? Your app dies. Anthropic rate-limits you? Your users wait. It's annoying.
I wanted multi-provider orchestration without building that plumbing myself. Cloudflare AI Gateway looked perfect: automatic fallback, caching, load balancing. But there was no LlamaIndex integration.
So I built one.
The Problem
Cloudflare AI Gateway is smart. It handles:
- Automatic provider fallback
- Built-in caching and rate limiting
- Load balancing across providers
- Unified API interface
But LlamaIndex LLMs make their HTTP requests straight to each provider's API, while the gateway expects those requests at its own per-provider URLs.
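To make the mismatch concrete, here is the URL shape involved (following Cloudflare's documented gateway format; the account and gateway names are placeholders):

# Where a LlamaIndex OpenAI LLM normally sends a chat request:
direct_url = "https://api.openai.com/v1/chat/completions"

# Where Cloudflare AI Gateway expects the same request to arrive:
ACCOUNT_ID = "your-account-id"    # placeholder
GATEWAY = "your-gateway-name"     # placeholder
gateway_url = f"https://gateway.ai.cloudflare.com/v1/{ACCOUNT_ID}/{GATEWAY}/openai/chat/completions"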
Cloudflare does provide an OpenAI-compatible API, but I wanted something more flexible; "compatible" often means "limited". There's also a Vercel ai-sdk integration, but I live in Python.
The Solution
A wrapper that sits between LlamaIndex LLMs and their HTTP clients. Intercepts requests, transforms them for Cloudflare, handles responses.
from llama_index.llms.openai import OpenAI
from llama_index.llms.anthropic import Anthropic
from llama_index.llms.cloudflare_ai_gateway import CloudflareAIGateway  # import path per the planned PR

# Create regular LlamaIndex LLMs
openai_llm = OpenAI(model="gpt-4o-mini")
anthropic_llm = Anthropic(model="claude-3-5-sonnet-latest")

# Wrap with Cloudflare AI Gateway (parameter names may differ in the merged PR)
llm = CloudflareAIGateway(
    llms=[openai_llm, anthropic_llm],  # tried in order, with automatic fallback
    account_id="...", gateway="...", api_key="...",
)
# Use exactly like any LlamaIndex LLM
response = llm.complete("Hello!")
print(response)
Drop-in replacement. Zero code changes.
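For the curious, the interception can happen at the httpx layer. This is a minimal sketch of the idea rather than the actual PR code: it assumes a custom transport that rewrites each outgoing OpenAI request onto the gateway URL, and relies on the fact that the LlamaIndex OpenAI LLM accepts a custom http_client.

import httpx

ACCOUNT_ID = "your-account-id"    # placeholder
GATEWAY = "your-gateway-name"     # placeholder
GATEWAY_BASE = f"https://gateway.ai.cloudflare.com/v1/{ACCOUNT_ID}/{GATEWAY}"

class GatewayTransport(httpx.BaseTransport):
    """Reroutes OpenAI requests through Cloudflare AI Gateway (illustrative only)."""

    def __init__(self):
        self._inner = httpx.HTTPTransport()

    def handle_request(self, request: httpx.Request) -> httpx.Response:
        # api.openai.com/v1/chat/completions -> .../openai/chat/completions
        path = request.url.path.removeprefix("/v1")
        request.url = httpx.URL(f"{GATEWAY_BASE}/openai{path}")
        request.headers["host"] = request.url.host
        return self._inner.handle_request(request)

# The LlamaIndex OpenAI LLM takes a custom http_client, so the reroute is invisible to it:
# OpenAI(model="gpt-4o-mini", http_client=httpx.Client(transport=GatewayTransport()))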
What It Does
Core features:
- Automatic provider detection and configuration
- Built-in fallback when providers fail
- Streaming support (chat and completion)
- Async/await compatible (both sketched below)
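Because the wrapper exposes the standard LlamaIndex LLM interface, streaming and async calls look the same as with any other LlamaIndex LLM. A quick sketch, reusing the llm object from the snippet above:

import asyncio
from llama_index.core.llms import ChatMessage

messages = [ChatMessage(role="user", content="Explain AI gateways in one sentence.")]

# Streaming chat: chunks arrive as they are generated; .delta holds the new text
for chunk in llm.stream_chat(messages):
    print(chunk.delta, end="", flush=True)

# Async calls use the standard a-prefixed methods
async def main():
    response = await llm.achat(messages)
    print(response.message.content)

asyncio.run(main())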
Tested providers: OpenAI and Anthropic, the two wrapped in the example above.
Also supported: other providers that Cloudflare AI Gateway can route to, though I haven't exercised those paths.
Try It
Still a planned PR (#19395), but functional.
May not be production-ready, but good enough to experiment with. Check out the LlamaIndex integrations repository for other LLM providers.