AI Gateway

One control plane for every AI provider.


Cache

Identical prompts return from cache — zero provider cost, sub-50ms response. TTL and bypass rules are configurable per route.


Who this is for

Engineering teams building AI-powered applications and exposing LLM endpoints from Cloudflare Workers, Pages, or external infrastructure.

Platform and FinOps teams that need request-level cost attribution and budget controls for multi-provider LLM usage.

Security teams routing AI agent traffic, internal copilots, and AI-assisted coding (Vibe Coding) through a unified control point.

Organizations integrating multiple AI providers and looking to standardise observability, rate limiting, and guardrails across them.

What Cloudflare AI Gateway solves

Unpredictable LLM costs

Per-request and per-token billing makes budgets hard to forecast without request-level visibility, caching, and token-aware rate limits.

Provider rate-limit failures

Hitting an OpenAI or Anthropic rate limit takes the dependent feature down. AI Gateway lets you rate-limit and cache traffic before it reaches the provider, and fail over to a secondary provider when the primary is throttled.

No standard observability

Each provider has its own dashboard. There is no unified place to audit what prompts your app sent, who initiated them, and what came back.

Sensitive data leaking into prompts

Prompts often carry PII, source code, or credentials. The gateway is the right inline point to detect and redact that data before it leaves your network.

Prompt-level threats

Jailbreak attempts, code-abuse requests, and data exfiltration via prompt injection need detection that is independent of the model provider.

Ungoverned agent traffic

AI agents and AI-assisted coding tools emit unbounded LLM traffic. Putting them behind an AI Gateway gives you token-level limits and an audit trail.

How Nanosek deploys AI Gateway

Phase 1

Inventory AI traffic

Map current LLM API integrations: providers, endpoints, request patterns, token volumes, latency budgets, and cost shape.
Identify which traffic is application-driven, agent-driven, or developer-tool driven (e.g. AI-assisted coding).

Phase 2

Configure provider endpoints

Set up AI Gateway endpoints for each provider with authentication, base URL rewriting, and routing logic.
Update application code to call the gateway base URL instead of the provider directly — usually a one-line change.
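As a minimal sketch of that one-line change, the helper below builds a gateway base URL. It assumes the URL pattern Cloudflare documents at the time of writing (`https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/{provider}`) and the provider slugs shown; `gatewayBaseUrl`, `ACCOUNT_ID`, and `my-gateway` are placeholders, so check the pattern and slugs against current documentation before relying on them.

```typescript
// Hypothetical helper. Assumes the documented pattern
// https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/{provider};
// provider slugs below are assumptions to verify against current docs.
type Provider = "openai" | "anthropic" | "workers-ai";

const GATEWAY_ORIGIN = "https://gateway.ai.cloudflare.com/v1";

function gatewayBaseUrl(accountId: string, gatewayId: string, provider: Provider): string {
  if (!accountId || !gatewayId) {
    throw new Error("accountId and gatewayId are required");
  }
  // encodeURIComponent guards against stray slashes in IDs copied from config.
  return `${GATEWAY_ORIGIN}/${encodeURIComponent(accountId)}/${encodeURIComponent(gatewayId)}/${provider}`;
}

// Before: the client points at the provider directly (e.g. https://api.openai.com/v1).
// After: only the base URL handed to the SDK or HTTP client changes.
const openaiBase = gatewayBaseUrl("ACCOUNT_ID", "my-gateway", "openai");
const chatCompletionsUrl = `${openaiBase}/chat/completions`;
```

In practice the returned string is what you pass as the base URL option of an existing SDK or HTTP client; the rest of the request is typically unchanged.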

Phase 3

Add caching for deterministic prompts

Enable exact-match caching for prompts that are genuinely reusable (system-prompt boilerplate, classification, repeated queries).
Avoid caching non-deterministic conversational prompts; tune cache TTL and selectivity per route.
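The caching rule in this phase can be illustrated with a short sketch. It assumes a simplified chat request shape and treats only requests with an explicit `temperature` of 0 on a cache-enabled route as reusable; `canonicalize`, `cacheDecision`, and `TtlCache` are hypothetical names, and the code shows the idea of exact-match keys with per-route TTLs rather than how AI Gateway implements its cache.

```typescript
// Illustrative only: request shape and cacheability rule are assumptions.
interface ChatMessage { role: "system" | "user" | "assistant"; content: string; }
interface ChatRequest { model: string; messages: ChatMessage[]; temperature?: number; }
interface RoutePolicy { cacheEnabled: boolean; ttlSeconds: number; }

// Serialise with sorted object keys so logically identical requests get the same key.
function canonicalize(value: unknown): string {
  if (Array.isArray(value)) return `[${value.map(canonicalize).join(",")}]`;
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const keys = Object.keys(obj).filter((k) => obj[k] !== undefined).sort();
    return `{${keys.map((k) => `${JSON.stringify(k)}:${canonicalize(obj[k])}`).join(",")}}`;
  }
  return JSON.stringify(value);
}

function cacheDecision(req: ChatRequest, policy: RoutePolicy): { cacheable: boolean; key?: string; ttlSeconds?: number } {
  // Assumption: only explicit temperature 0 counts as "genuinely reusable".
  const cacheable = policy.cacheEnabled && req.temperature === 0;
  if (!cacheable) return { cacheable: false };
  return { cacheable: true, key: canonicalize(req), ttlSeconds: policy.ttlSeconds };
}

// Minimal in-memory exact-match cache; the caller supplies the clock.
class TtlCache {
  private readonly entries = new Map<string, { value: string; expiresAt: number }>();

  get(key: string, nowMs: number): string | undefined {
    const hit = this.entries.get(key);
    if (!hit) return undefined;
    if (nowMs >= hit.expiresAt) {
      this.entries.delete(key); // expired entries are evicted lazily
      return undefined;
    }
    return hit.value;
  }

  set(key: string, value: string, ttlSeconds: number, nowMs: number): void {
    this.entries.set(key, { value, expiresAt: nowMs + ttlSeconds * 1000 });
  }
}
```

Sorting keys means two byte-different but logically identical requests share a cache entry; a stricter exact-match scheme would key on the raw request body instead.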

Phase 4

Set rate limits and budgets

Apply token-level and request-level rate limits per API key, route, or user segment.
Set per-route budget alerts to catch runaway agents or abuse before the monthly bill arrives.
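Token-aware limiting differs from request counting in that each call is charged by the number of LLM tokens it uses. The sketch below assumes a token bucket per API key plus a simple monthly budget check with a warning ratio; `TokenBudgetLimiter`, `budgetState`, and the 80% default are illustrative assumptions, and state is kept in process memory only for brevity.

```typescript
// Hypothetical in-memory limiter; a real deployment would keep state in shared storage.
interface BucketConfig { capacityTokens: number; refillTokensPerSecond: number; }

class TokenBudgetLimiter {
  private readonly config: BucketConfig;
  private readonly buckets = new Map<string, { tokens: number; updatedAt: number }>();

  constructor(config: BucketConfig) {
    this.config = config;
  }

  // Deducts `llmTokens` and returns true if the caller's bucket can cover them.
  tryConsume(apiKey: string, llmTokens: number, nowMs: number): boolean {
    const { capacityTokens, refillTokensPerSecond } = this.config;
    const bucket = this.buckets.get(apiKey) ?? { tokens: capacityTokens, updatedAt: nowMs };
    // Refill in proportion to elapsed time, never above capacity.
    const elapsedSeconds = Math.max(0, nowMs - bucket.updatedAt) / 1000;
    bucket.tokens = Math.min(capacityTokens, bucket.tokens + elapsedSeconds * refillTokensPerSecond);
    bucket.updatedAt = nowMs;
    this.buckets.set(apiKey, bucket);
    if (llmTokens > bucket.tokens) return false; // rejected calls consume nothing
    bucket.tokens -= llmTokens;
    return true;
  }
}

type BudgetState = "ok" | "warn" | "exceeded";

// Assumption: alert at warnRatio of the monthly budget, escalate once it is exceeded.
function budgetState(monthSpendUsd: number, monthlyBudgetUsd: number, warnRatio = 0.8): BudgetState {
  if (monthSpendUsd >= monthlyBudgetUsd) return "exceeded";
  if (monthSpendUsd >= monthlyBudgetUsd * warnRatio) return "warn";
  return "ok";
}
```

A token bucket lets short bursts through up to `capacityTokens` while holding sustained usage to the refill rate, which is what keeps cost shape bounded rather than just request count.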

Phase 5

Add guardrails and DLP

Configure prompt guardrails to detect and block jailbreak attempts, code-abuse requests, and PII requests.
Apply sensitive-data redaction to outbound prompts so secrets, PII, and source code don't leave the network.
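A regex-based redactor shows the shape of outbound redaction, although managed DLP detection is considerably broader. The three patterns here (email addresses, AWS access key IDs, PEM private keys) are assumptions chosen for illustration, and `redactPrompt` is a hypothetical helper, not a Cloudflare API.

```typescript
// Illustrative detectors only; managed DLP profiles cover far more data types.
interface Detector { label: string; pattern: RegExp; }

const DETECTORS: Detector[] = [
  { label: "EMAIL", pattern: /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g },
  { label: "AWS_ACCESS_KEY_ID", pattern: /\bAKIA[0-9A-Z]{16}\b/g },
  { label: "PRIVATE_KEY", pattern: /-----BEGIN [A-Z ]*PRIVATE KEY-----[\s\S]*?-----END [A-Z ]*PRIVATE KEY-----/g },
];

// Replaces each match with a typed placeholder and records what was found,
// so the policy action can be logged without logging the secret itself.
function redactPrompt(prompt: string): { text: string; findings: string[] } {
  const findings: string[] = [];
  let text = prompt;
  for (const { label, pattern } of DETECTORS) {
    text = text.replace(pattern, () => {
      findings.push(label);
      return `[REDACTED:${label}]`;
    });
  }
  return { text, findings };
}
```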

Phase 6

Wire observability

Use Logpush to send request-level logs to your SIEM, including prompt, response (or hash), tokens, latency, cost, and policy actions.
Build dashboards for cost per route, cache hit ratio, top consumers, and policy events.
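Once logs land in the SIEM, the per-route dashboards reduce to simple aggregations. This sketch assumes a hypothetical log record with `route`, `cached`, `costUsd`, and token-count fields; the real Logpush field names will differ, so map them accordingly.

```typescript
// Hypothetical record shape; not the actual AI Gateway log schema.
interface GatewayLogRecord {
  route: string;
  cached: boolean;
  costUsd: number;
  inputTokens: number;
  outputTokens: number;
}

interface RouteSummary { requests: number; costUsd: number; tokens: number; cacheHitRatio: number; }

// Groups records by route and computes the figures behind cost and cache dashboards.
function summarizeByRoute(records: GatewayLogRecord[]): Map<string, RouteSummary> {
  const acc = new Map<string, { requests: number; hits: number; costUsd: number; tokens: number }>();
  for (const r of records) {
    const a = acc.get(r.route) ?? { requests: 0, hits: 0, costUsd: 0, tokens: 0 };
    a.requests += 1;
    if (r.cached) a.hits += 1;
    a.costUsd += r.costUsd;
    a.tokens += r.inputTokens + r.outputTokens;
    acc.set(r.route, a);
  }
  const out = new Map<string, RouteSummary>();
  acc.forEach((a, route) => {
    out.set(route, { requests: a.requests, costUsd: a.costUsd, tokens: a.tokens, cacheHitRatio: a.hits / a.requests });
  });
  return out;
}
```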

AI Gateway architecture pattern

The application or AI agent calls the AI Gateway base URL instead of the provider directly. AI Gateway is provider-agnostic: the same pattern works for OpenAI, Anthropic, Workers AI, Cohere, and others.

Each route can have its own cache policy, rate-limit budget, guardrail config, and provider failover order. Prompts sent to an unnecessarily expensive model can be re-routed to a cheaper one.

Outbound prompts are inspected for sensitive data (PII, source code, credentials) and redacted in place. Responses are scanned for the same and rewritten where required.

Logpush carries every request to your SIEM with sufficient detail to audit who asked what, what came back, what it cost, and what policy decisions fired.

Workers can sit in front of AI Gateway for custom routing logic (e.g. tenant-aware model selection, multi-tenant cost attribution).
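A Worker in front of the gateway typically needs a per-route policy table and a way to pick a model per tenant. The sketch below assumes hypothetical route names, tenant tiers, model identifiers, and a `costTag` string used for attribution; it resolves the policy for a request and deliberately leaves the actual gateway call out.

```typescript
// All names and values here are placeholders for illustration.
interface RouteConfig {
  cacheTtlSeconds: number;            // 0 disables caching for the route
  tokensPerMinute: number;            // rate-limit budget
  guardrails: "monitor" | "enforce";
  providerOrder: string[];            // failover order; first entry is the primary
}

type TenantTier = "free" | "standard" | "enterprise";

const ROUTES: Record<string, RouteConfig> = {
  "support-chat": { cacheTtlSeconds: 0, tokensPerMinute: 50_000, guardrails: "enforce", providerOrder: ["anthropic", "openai"] },
  "ticket-classify": { cacheTtlSeconds: 3600, tokensPerMinute: 200_000, guardrails: "monitor", providerOrder: ["workers-ai", "openai"] },
};

// Tenant-aware model selection: lower tiers get a cheaper model.
const MODEL_BY_TIER: Record<TenantTier, string> = {
  free: "small-model",
  standard: "medium-model",
  enterprise: "large-model",
};

interface ResolvedRequest { route: string; tenantId: string; model: string; config: RouteConfig; costTag: string; }

function resolveRequest(route: string, tenantId: string, tier: TenantTier): ResolvedRequest {
  const config = ROUTES[route];
  if (!config) throw new Error(`unknown route: ${route}`);
  // costTag lets downstream logs attribute spend to a tenant and route pair.
  return { route, tenantId, model: MODEL_BY_TIER[tier], config, costTag: `${tenantId}:${route}` };
}
```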

AI Gateway controls Nanosek operationalises

Provider endpoints

OpenAI, Anthropic, Workers AI, Cohere, and self-hosted endpoints configured behind a single gateway URL.

Caching (exact-match)

Used for deterministic prompts to reduce cost and latency. Tunable TTL per route.

Token rate limits

Per API key, route, IP, or user segment — token-aware so cost shape is bounded, not just request count.

Budget alerts

Triggered when monthly token spend on a route exceeds a set threshold; useful for catching runaway agents.

Prompt guardrails

Inline detection of jailbreak attempts, code-abuse requests, and PII requests in prompts and responses.

DLP redaction

Strip PII, source code, secrets, and credentials from outbound prompts before they reach the provider.

Provider failover

Fall back to a secondary provider when the primary returns 429/5xx — keeps AI features available during provider incidents.

Logpush

Request-level logs to SIEM with prompt hash, tokens, latency, provider, cost, and policy actions.

Workers integration

Optional — for tenant-aware routing, custom auth, or pre-/post-processing logic in front of AI Gateway.
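The provider failover control comes down to one decision rule: move to the next provider when the current one returns 429 or a 5xx status. The sketch below expresses that rule client-side for illustration; `send` is a hypothetical injected function standing in for the HTTP call, and the provider names are placeholders.

```typescript
// Illustrative failover loop; `send` abstracts the real request so no network is needed.
interface ProviderResponse { status: number; body: string; }
type SendFn = (provider: string) => Promise<ProviderResponse>;

// 429 (throttled) and 5xx (provider-side failure) are worth retrying elsewhere.
function shouldFailover(status: number): boolean {
  return status === 429 || (status >= 500 && status <= 599);
}

async function callWithFailover(
  providers: string[],
  send: SendFn,
): Promise<{ provider: string; response: ProviderResponse }> {
  let last: { provider: string; response: ProviderResponse } | undefined;
  for (const provider of providers) {
    const response = await send(provider);
    last = { provider, response };
    // Success or a non-retryable 4xx: stop here.
    if (!shouldFailover(response.status)) return last;
  }
  if (!last) throw new Error("no providers configured");
  return last; // every provider failed; surface the last response
}
```

Other 4xx responses are not retried because they usually indicate a problem with the request itself, which a second provider would likely reject as well.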

Where AI Gateway fits

Internal copilots / chat features

Without AI Gateway

Costs surprise the FinOps team; no audit of what employees asked

With AI Gateway

Token rate limits per user; logs flow to SIEM; cache covers repeated questions

AI agents (MCP, autonomous)

Without AI Gateway

Unbounded LLM spend; no audit of agent decisions

With AI Gateway

Token budgets per agent identity; prompt-level audit trail; DLP on tool-call args

AI-assisted coding (Vibe Coding)

Without AI Gateway

Source code and credentials sent to provider; no spend governance

With AI Gateway

DLP redaction on code prompts; per-developer rate limits; cache common boilerplate

Customer-facing chatbots

Without AI Gateway

Abuse via prompt injection; unclear cost of goods

With AI Gateway

Prompt guardrails block jailbreaks; per-session limits; provider failover on 5xx

RAG / retrieval pipelines

Without AI Gateway

Same embeddings recomputed; latency stacks up

With AI Gateway

Cache embedding calls; rate-limit the retrieval step independently

Multi-provider strategy

Without AI Gateway

Different SDKs, different dashboards, no portability

With AI Gateway

Single gateway URL; switch providers with a config change; unified logs

Deployment steps

01 Inventory AI provider integrations and request patterns.
02 Configure AI Gateway endpoints and authentication.
03 Update application/agent code to call the gateway base URL instead of providers directly.
04 Enable caching for deterministic prompt patterns.
05 Set token and request rate limits and budget alerts.
06 Add prompt guardrails and DLP redaction.
07 Configure Logpush to SIEM and build cost/abuse dashboards.
08 Tune cache hit ratio, rate limits, and guardrails after initial rollout.

Risks and mitigations

Risk

Cached responses reused for non-deterministic prompts, returning stale or wrong answers.

Mitigation

Only cache prompts where the response is genuinely reusable. Use exact-match caching by default and short TTLs.

Risk

Rate limiting blocking legitimate burst traffic.

Mitigation

Baseline traffic first; set limits above observed peak; alert before enforcing blocks.

Risk

Prompt guardrails flagging legitimate requests (false positives).

Mitigation

Start in monitor mode; tune patterns against real traffic before enforcing block actions.

Risk

Streaming responses interacting badly with gateway buffering.

Mitigation

Confirm streaming support per provider; test latency end-to-end before cutting over production traffic.
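The monitor-then-enforce approach can be sketched as a mode switch on the guardrail verdict. The jailbreak patterns and the `evaluatePrompt` helper are hypothetical and far narrower than a production guardrail; the point is that monitor mode records matches without blocking, so false positives can be reviewed before enforcement.

```typescript
// Hypothetical guardrail; patterns are illustrative, not a real detection set.
type GuardrailMode = "monitor" | "enforce";
interface GuardrailVerdict { flagged: boolean; action: "allow" | "block"; matched: string[]; }

const JAILBREAK_PATTERNS: RegExp[] = [
  /ignore (all )?previous instructions/i,
  /you are now in developer mode/i,
];

function evaluatePrompt(prompt: string, mode: GuardrailMode): GuardrailVerdict {
  const matched = JAILBREAK_PATTERNS.filter((p) => p.test(prompt)).map((p) => p.source);
  const flagged = matched.length > 0;
  // Monitor mode logs the match (via `flagged`/`matched`) but still allows the request.
  const action = flagged && mode === "enforce" ? "block" : "allow";
  return { flagged, action, matched };
}
```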

Deliverables

AI Gateway configuration covering all current LLM provider integrations.
Cache policy documentation per route with TTL rationale.
Rate-limit and budget-alert configuration per route and per API key.
Prompt guardrail and DLP policy with monitor → enforce migration plan.
Logpush configuration delivering request-level logs to SIEM.
Cost, cache, and abuse dashboards.
Runbook for adding a new AI provider or route.

Frequently asked questions

Does AI Gateway support streaming responses?

Yes. Cloudflare AI Gateway supports streaming for providers that offer streaming APIs, such as OpenAI and Anthropic. The gateway path can introduce some buffering, so we test latency end-to-end before cutover.
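For that end-to-end test, it helps to record when each streamed chunk arrives and compare a few figures before and after cutover. `summarizeStream` is a hypothetical helper that assumes you have already captured millisecond timestamps while reading the stream; if the gateway path were buffering, time to first chunk would move close to total time and the gaps between chunks would collapse.

```typescript
// Hypothetical helper: summarises chunk arrival times captured during a streaming call.
interface StreamTiming { timeToFirstChunkMs: number; totalMs: number; maxGapMs: number; }

function summarizeStream(startMs: number, chunkArrivalsMs: number[]): StreamTiming {
  if (chunkArrivalsMs.length === 0) throw new Error("no chunks received");
  let maxGapMs = 0;
  for (let i = 1; i < chunkArrivalsMs.length; i++) {
    // Largest pause between consecutive chunks.
    maxGapMs = Math.max(maxGapMs, chunkArrivalsMs[i] - chunkArrivalsMs[i - 1]);
  }
  return {
    timeToFirstChunkMs: chunkArrivalsMs[0] - startMs,
    totalMs: chunkArrivalsMs[chunkArrivalsMs.length - 1] - startMs,
    maxGapMs,
  };
}
```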

Can AI Gateway be used without Cloudflare Workers?

Yes. AI Gateway works as a standalone proxy — your application simply changes the API base URL. Workers are optional and only needed for custom routing logic (tenant-aware model selection, pre/post-processing).

Which providers does AI Gateway support?

OpenAI, Anthropic, Google AI, Workers AI, Cohere, Mistral, Perplexity, Replicate, AWS Bedrock, Azure OpenAI, and others. Provider coverage continues to grow.

Does AI Gateway protect against prompt injection?

AI Gateway provides prompt guardrails that detect known patterns (jailbreak attempts, code-abuse requests, PII requests). It complements but does not replace application-level input validation.

How does AI Gateway relate to AI Agent Governance (MCP)?

AI Gateway controls LLM API traffic; MCP Server Portals govern AI agent activity reaching other corporate systems. Both belong in the same Safe AI Adoption stack — see /cloudflare-mcp-governance and /safe-ai-adoption.

A unified control point for AI traffic

Nanosek deploys Cloudflare AI Gateway with cost controls, guardrails, DLP, and SIEM-ready logs — sized to whatever LLM providers and agents your environment uses today.

Deliver Cloudflare without surprises.