One control plane for every AI provider.
Cache
Identical prompts return from cache — zero provider cost, sub-50ms response. TTL and bypass rules are configurable per route.
Nanosek delivers Cloudflare AI Gateway implementation services. AI Gateway is a unified control point for LLM API traffic that adds caching, token rate limiting, prompt guardrails, sensitive-data redaction, logging, and cost controls, usually with no application change beyond pointing clients at the gateway URL. Nanosek implements AI Gateway as part of AI application infrastructure on Cloudflare and as the control plane for AI agent and AI-assisted coding (Vibe Coding) traffic.
Who this is for
What Cloudflare AI Gateway solves
Unpredictable LLM costs
Per-request and per-token billing makes budgets hard to forecast without request-level visibility, caching, and token-aware rate limits.
Provider rate-limit failures
Hitting an OpenAI or Anthropic rate limit takes the dependent feature down. AI Gateway lets you shape traffic before it reaches the provider.
No standard observability
Each provider has its own dashboard. There is no unified place to audit what prompts your app sent, who initiated them, and what came back.
Sensitive data leaking into prompts
Prompts often carry PII, source code, or credentials. The gateway is the right inline point to detect and redact before traffic leaves your network.
Prompt-level threats
Jailbreak attempts, code-abuse requests, and data-exfiltration via prompt injection need detection independent of the model provider.
Ungoverned agent traffic
AI agents and AI-assisted coding tools emit unbounded LLM traffic. Putting them behind an AI Gateway gives you token-level limits and an audit trail.
How Nanosek deploys AI Gateway
Inventory AI traffic
- Map current LLM API integrations: providers, endpoints, request patterns, token volumes, latency budgets, and cost shape.
- Identify which traffic is application-driven, agent-driven, or developer-tool driven (e.g. AI-assisted coding).
Configure provider endpoints
- Set up AI Gateway endpoints for each provider with authentication, base URL rewriting, and routing logic.
- Update application code to call the gateway base URL instead of the provider directly — usually a one-line change.
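As a rough illustration of the base-URL change described above, the sketch below builds a gateway URL and shows it next to a direct provider URL. The host and path layout follow Cloudflare's documented pattern as we understand it and should be checked against the current documentation; `ACCOUNT_ID`, `example-gateway`, and the helper name `gateway_base_url` are placeholders invented for this example.

```python
from urllib.parse import quote

# Host and path layout as described in Cloudflare's AI Gateway docs;
# verify against the current documentation before relying on it.
GATEWAY_HOST = "https://gateway.ai.cloudflare.com/v1"


def gateway_base_url(account_id: str, gateway_id: str, provider: str) -> str:
    """Build the base URL a client uses in place of the provider's own URL."""
    segments = [quote(s, safe="") for s in (account_id, gateway_id, provider)]
    return "/".join([GATEWAY_HOST, *segments])


# Before: the client talks to the provider directly.
DIRECT_BASE_URL = "https://api.openai.com/v1"

# After: the same client is pointed at the gateway (placeholder IDs).
GATEWAY_BASE_URL = gateway_base_url("ACCOUNT_ID", "example-gateway", "openai")

if __name__ == "__main__":
    print(GATEWAY_BASE_URL)
```

Most provider SDKs accept a base URL at client construction, so in practice the change tends to be confined to one configuration value.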
Add caching for deterministic prompts
- Enable exact-match caching for prompts that are genuinely reusable (system-prompt boilerplate, classification, repeated queries).
- Avoid caching non-deterministic conversational prompts; tune cache TTL and selectivity per route.
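To make the exact-match and TTL behaviour concrete, here is a minimal in-memory sketch. It is not the gateway's implementation: the class name `ExactMatchCache`, the key derivation (a SHA-256 over provider, model, and a canonicalised request body), and the injectable clock are all assumptions chosen for illustration.

```python
import hashlib
import json
import time
from typing import Callable, Dict, Optional, Tuple


class ExactMatchCache:
    """Toy exact-match cache: identical requests share one entry until the TTL expires."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store: Dict[str, Tuple[float, str]] = {}  # key -> (expiry time, response)

    @staticmethod
    def key(provider: str, model: str, body: dict) -> str:
        # Assumption: requests are equal when provider, model and body are equal.
        # Keys are sorted so dict ordering does not produce spurious misses.
        canonical = json.dumps(
            {"provider": provider, "model": model, "body": body},
            sort_keys=True,
            separators=(",", ":"),
        )
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def get(self, key: str) -> Optional[str]:
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, response = entry
        if self.clock() >= expires_at:
            del self._store[key]  # expired: treat as a miss
            return None
        return response

    def put(self, key: str, response: str) -> None:
        self._store[key] = (self.clock() + self.ttl, response)
```

A short TTL bounds how long a stale answer can be served, which is why the per-route TTL matters more for frequently changing content than for classification or boilerplate prompts.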
Set rate limits and budgets
- Apply token-level and request-level rate limits per API key, route, or user segment.
- Set per-route budget alerts to catch runaway agents or abuse before the monthly bill arrives.
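The difference between request-level and token-level limits can be sketched as a fixed-window counter that tracks both. This is a conceptual illustration only, not how the gateway enforces limits; `Limit`, `TokenRateLimiter`, `budget_alert`, and the 80% default threshold are hypothetical names and values.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import DefaultDict, List, Tuple


@dataclass(frozen=True)
class Limit:
    max_tokens: int
    max_requests: int
    window_seconds: int


class TokenRateLimiter:
    """Fixed-window limiter that bounds both tokens and requests per key."""

    def __init__(self, limit: Limit):
        self.limit = limit
        # (key, window index) -> [tokens used, requests made].
        # Old windows are never evicted here; a real limiter would expire them.
        self._usage: DefaultDict[Tuple[str, int], List[int]] = defaultdict(lambda: [0, 0])

    def allow(self, key: str, tokens: int, now: float) -> bool:
        window = int(now // self.limit.window_seconds)
        used = self._usage[(key, window)]
        over_tokens = used[0] + tokens > self.limit.max_tokens
        over_requests = used[1] + 1 > self.limit.max_requests
        if over_tokens or over_requests:
            return False
        used[0] += tokens
        used[1] += 1
        return True


def budget_alert(spend_so_far: float, monthly_budget: float, threshold: float = 0.8) -> bool:
    """Return True once spend reaches the given fraction of the monthly budget."""
    return spend_so_far >= monthly_budget * threshold
```

Keying the limiter by API key, route, or user segment is a matter of what string is passed as `key`; alerting at a fraction of the budget leaves time to act before the limit itself is reached.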
Add guardrails and DLP
- Configure prompt guardrails to detect and block jailbreak attempts, code-abuse requests, and attempts to solicit PII.
- Apply sensitive-data redaction to outbound prompts so secrets, PII, and source code don't leave the network.
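Outbound redaction can be pictured as a pattern pass over the prompt before it is forwarded. The sketch below is deliberately simplistic and assumes two illustrative regular expressions (an email address and an AWS-style access key ID); the pattern list and the `redact` helper are hypothetical, and real DLP profiles cover far more than regex matching.

```python
import re
from typing import List, Tuple

# Illustrative patterns only; production DLP needs broader, validated detectors.
PATTERNS: List[Tuple[str, "re.Pattern[str]"]] = [
    ("EMAIL", re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")),
    ("AWS_ACCESS_KEY", re.compile(r"\bAKIA[0-9A-Z]{16}\b")),
]


def redact(prompt: str) -> Tuple[str, List[str]]:
    """Replace each match with a labelled placeholder and report which labels fired."""
    hits: List[str] = []
    for label, pattern in PATTERNS:
        prompt, count = pattern.subn(f"[REDACTED:{label}]", prompt)
        if count:
            hits.append(label)
    return prompt, hits


if __name__ == "__main__":
    cleaned, labels = redact("Contact jane@example.com about the deploy key")
    print(cleaned, labels)
```

Returning the list of labels alongside the cleaned prompt lets the same pass feed the policy-action field in request logs.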
Wire observability
- Push request-level logs to your SIEM via Logpush, including prompt, response (or hash), tokens, latency, cost, and policy actions.
- Build dashboards for cost-per-route, cache hit-ratio, top consumers, and policy events.
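The dashboard metrics named above reduce to simple aggregations over request-level records. The following sketch assumes a hypothetical log shape (`RequestLog`) with invented field names; the actual Logpush schema will differ and should be taken from the provider's documentation.

```python
import hashlib
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class RequestLog:
    # Hypothetical record shape for illustration, not the real log schema.
    route: str
    prompt_sha256: str
    tokens_in: int
    tokens_out: int
    latency_ms: float
    cost_usd: float
    cache_hit: bool
    policy_action: str  # e.g. "allow", "redact", "block"


def prompt_hash(prompt: str) -> str:
    """Hash the prompt so logs can correlate requests without storing the text."""
    return hashlib.sha256(prompt.encode("utf-8")).hexdigest()


def summarise(logs: Iterable[RequestLog]) -> Dict[str, Dict[str, float]]:
    """Compute cost per route and cache hit ratio per route."""
    totals = defaultdict(lambda: {"requests": 0, "cost_usd": 0.0, "cache_hits": 0})
    for log in logs:
        t = totals[log.route]
        t["requests"] += 1
        t["cost_usd"] += log.cost_usd
        t["cache_hits"] += int(log.cache_hit)
    return {
        route: {
            "cost_usd": t["cost_usd"],
            "cache_hit_ratio": t["cache_hits"] / t["requests"],
        }
        for route, t in totals.items()
    }
```

Logging a hash instead of the raw prompt is one way to keep an audit trail while limiting how much sensitive text reaches the SIEM.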
AI Gateway architecture pattern
AI Gateway controls Nanosek operationalises
| Control | When Nanosek uses it |
|---|---|
| Provider endpoints | OpenAI, Anthropic, Workers AI, Cohere, and self-hosted endpoints configured behind a single gateway URL. |
| Caching (exact-match) | Used for deterministic prompts to reduce cost and latency. Tunable TTL per route. |
| Token rate limits | Per API key, route, IP, or user segment — token-aware so cost shape is bounded, not just request count. |
| Budget alerts | Triggered when monthly token spend on a route exceeds a set threshold; useful for catching runaway agents. |
| Prompt guardrails | Inline detection of jailbreak attempts, code-abuse requests, and attempts to solicit PII in prompts and responses. |
| DLP redaction | Strip PII, source code, secrets, and credentials from outbound prompts before they reach the provider. |
| Provider failover | Fall back to a secondary provider when the primary returns 429/5xx — keeps AI features available during provider incidents. |
| Logpush | Request-level logs to SIEM with prompt hash, tokens, latency, provider, cost, and policy actions. |
| Workers integration | Optional — for tenant-aware routing, custom auth, or pre-/post-processing logic in front of AI Gateway. |
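The provider failover row in the table above can be illustrated with a small loop that moves to the next provider only on 429 or 5xx. This is a sketch of the idea rather than the gateway's own fallback mechanism; the `Provider` alias, `is_retryable`, and `call_with_failover` are names invented here, and each provider is modelled as a plain function returning a status code and body.

```python
from typing import Callable, Optional, Sequence, Tuple

# A provider is a (name, send) pair; send returns (HTTP status, body).
Provider = Tuple[str, Callable[[str], Tuple[int, str]]]


def is_retryable(status: int) -> bool:
    """Rate limiting (429) and server errors (5xx) justify trying another provider."""
    return status == 429 or 500 <= status <= 599


def call_with_failover(providers: Sequence[Provider], prompt: str) -> Tuple[str, int, str]:
    """Try providers in order; return (provider name, status, body) of the final attempt."""
    if not providers:
        raise ValueError("at least one provider is required")
    result: Optional[Tuple[str, int, str]] = None
    for name, send in providers:
        status, body = send(prompt)
        result = (name, status, body)
        if not is_retryable(status):
            break  # success or a client error: do not mask it by failing over
    assert result is not None
    return result
```

Client errors such as 400 are returned as-is, since sending a malformed request to a second provider would only add cost and latency.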
Where AI Gateway fits
| Use case | Without AI Gateway | With AI Gateway |
|---|---|---|
| Internal copilots / chat features | Costs surprise the FinOps team; no audit of what employees asked | Token rate limits per user; logs flow to SIEM; cache covers repeated questions |
| AI agents (MCP, autonomous) | Unbounded LLM spend; no audit of agent decisions | Token budgets per agent identity; prompt-level audit trail; DLP on tool-call args |
| AI-assisted coding (Vibe Coding) | Source code and credentials sent to provider; no spend governance | DLP redaction on code prompts; per-developer rate limits; cache common boilerplate |
| Customer-facing chatbots | Abuse via prompt injection; cost-of-goods unclear | Prompt guardrails block jailbreaks; per-session limits; provider failover on 5xx |
| RAG / retrieval pipelines | Same embeddings recomputed; latency stacks up | Cache embedding calls; rate-limit the retrieval step independently |
| Multi-provider strategy | Different SDKs, different dashboards, no portability | Single gateway URL; switch providers with a config change; unified logs |
Deployment steps
- 01 Inventory AI provider integrations and request patterns.
- 02 Configure AI Gateway endpoints and authentication.
- 03 Update application/agent code to call the gateway base URL instead of providers directly.
- 04 Enable caching for deterministic prompt patterns.
- 05 Set token and request rate limits and budget alerts.
- 06 Add prompt guardrails and DLP redaction.
- 07 Configure Logpush to SIEM and build cost/abuse dashboards.
- 08 Tune cache hit ratio, rate limits, and guardrails after initial rollout.
Risks and mitigations
| Risk | Mitigation |
|---|---|
| Cache staleness on non-deterministic prompts producing wrong responses. | Only cache prompts where the response is genuinely reusable. Use exact-match caching by default and short TTLs. |
| Rate limiting blocking legitimate burst traffic. | Baseline traffic first; set limits above observed peak; alert before enforcing blocks. |
| Prompt guardrails produce false positives on legitimate requests. | Start in monitor mode; tune patterns against real traffic before enforcing block actions. |
| Streaming responses interacting badly with gateway buffering. | Confirm streaming support per provider; test latency end-to-end before cutting over production traffic. |
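The monitor-then-enforce mitigation for guardrail false positives can be sketched as one evaluation function whose only mode-dependent behaviour is whether a match blocks the request. The rule, the `Mode` enum, and the `Decision` shape below are hypothetical and stand in for whatever policy engine is actually configured.

```python
import re
from dataclasses import dataclass
from enum import Enum
from typing import List, Sequence, Tuple


class Mode(Enum):
    MONITOR = "monitor"  # record matches, never block
    ENFORCE = "enforce"  # block when any rule matches


@dataclass(frozen=True)
class Decision:
    allowed: bool
    matched_rules: List[str]


# Single illustrative rule; real guardrails use far richer detection.
RULES: Sequence[Tuple[str, "re.Pattern[str]"]] = (
    ("ignore-instructions", re.compile(r"ignore (all )?(previous|prior) instructions", re.I)),
)


def evaluate(prompt: str, mode: Mode, rules: Sequence[Tuple[str, "re.Pattern[str]"]] = RULES) -> Decision:
    """Match every rule; the mode decides whether a match blocks the request."""
    matched = [name for name, pattern in rules if pattern.search(prompt)]
    allowed = mode is Mode.MONITOR or not matched
    return Decision(allowed=allowed, matched_rules=matched)
```

Because matches are recorded in both modes, the monitor-mode logs show what enforcement would have blocked, which is the evidence needed to tune patterns before switching.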
Deliverables
- AI Gateway configuration covering all current LLM provider integrations.
- Cache policy documentation per route with TTL rationale.
- Rate-limit and budget-alert configuration per route and per API key.
- Prompt guardrail and DLP policy with monitor → enforce migration plan.
- Logpush configuration delivering request-level logs to SIEM.
- Cost, cache, and abuse dashboards.
- Runbook for adding a new AI provider or route.
Frequently asked questions
Does AI Gateway support streaming responses?
Can AI Gateway be used without Cloudflare Workers?
Which providers does AI Gateway support?
Does AI Gateway protect against prompt injection?
How does AI Gateway relate to AI Agent Governance (MCP)?
A unified control point for AI traffic
Nanosek deploys Cloudflare AI Gateway with cost controls, guardrails, DLP, and SIEM-ready logs — sized to whatever LLM providers and agents your environment uses today.