Implementing AI Response Caching

Avoid redundant API calls with smart caching strategies.

8 min read · Updated Feb 1, 2026
Expected Savings: 40-60% cost reduction
Basic Implementation
```typescript
import { createHash } from 'crypto';
import { Redis } from 'ioredis';
import OpenAI from 'openai';

const redis = new Redis();
const openai = new OpenAI();
const TTL = 3600; // 1 hour

// Derive a stable cache key from the prompt text
function hash(prompt: string): string {
  return createHash('sha256').update(prompt).digest('hex');
}

async function cachedQuery(prompt: string) {
  const cacheKey = `ai:${hash(prompt)}`;

  // Check cache first
  const cached = await redis.get(cacheKey);
  if (cached) return JSON.parse(cached);

  // Miss: call the API
  const response = await openai.chat.completions.create({
    model: 'gpt-4',
    messages: [{ role: 'user', content: prompt }]
  });

  // Cache the serialized response
  await redis.setex(cacheKey, TTL, JSON.stringify(response));
  return response;
}
```
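Callers use it like a direct API call; a repeated prompt inside the TTL is served from Redis instead of hitting the API. A quick usage sketch (the prompt is arbitrary):

```typescript
// First call goes to the API and populates the cache;
// an identical call within the TTL returns the cached response.
const answer = await cachedQuery('Summarize HTTP caching in two sentences.');
console.log(answer.choices[0].message.content);
```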
Semantic Caching

For similar (not identical) queries:
```typescript
const cache = new SemanticCache({
  similarity: 0.95,
  embedding: 'text-embedding-3-small'
});

// "What's the weather in NYC?"
// matches "NYC weather today?"
```
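The `SemanticCache` client above is illustrative; the underlying mechanics are an embedding call plus a nearest-neighbor check against previously cached prompts. A minimal sketch of that lookup, assuming OpenAI's embeddings API and a plain in-memory store (`semanticLookup`, `cosine`, and `store` are hypothetical names; a production system would use a vector index instead of a linear scan):

```typescript
// Sketch: embed each prompt and serve the closest cached response
// at or above a similarity threshold. In-memory store for illustration only.
import OpenAI from 'openai';

const openai = new OpenAI();
const SIMILARITY = 0.95;
const store: { vector: number[]; response: string }[] = [];

// Cosine similarity between two embedding vectors
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

async function semanticLookup(prompt: string): Promise<string | null> {
  const { data } = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: prompt
  });
  const vector = data[0].embedding;

  // Return the best match at or above the threshold, if any
  let best: { score: number; response: string } | null = null;
  for (const entry of store) {
    const score = cosine(vector, entry.vector);
    if (score >= SIMILARITY && (!best || score > best.score)) {
      best = { score, response: entry.response };
    }
  }
  return best?.response ?? null;
}
```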
Cache Strategies

| Strategy | Use Case | TTL |
|---|---|---|
| Exact match | Identical prompts | 24h |
| Semantic | Similar meaning | 1h |
| Prefix | Same context | 30m |
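The table's TTLs can be centralized in a single config object so each strategy's expiry is defined in one place. A sketch (`TTL_BY_STRATEGY` is a hypothetical name; values mirror the table, in seconds):

```typescript
// Illustrative TTLs matching the table above, in seconds
const TTL_BY_STRATEGY = {
  exact: 24 * 60 * 60,  // identical prompts: 24h
  semantic: 60 * 60,    // similar meaning: 1h
  prefix: 30 * 60       // same context: 30m
} as const;
```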
Monitoring
Track these metrics (a minimal hit-rate counter is sketched after this list):
- Cache hit rate (target: >50%)
- Cache miss latency
- Memory usage
- Invalidation frequency
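A lightweight starting point for the hit-rate metric is a pair of in-process counters wired into `cachedQuery`. A sketch (`recordLookup` and `hitRate` are hypothetical names; a production setup would typically export these as Prometheus counters or similar):

```typescript
// Hypothetical in-process counters for cache hit rate
let hits = 0;
let misses = 0;

// Call with true on a cache hit, false on a miss
function recordLookup(hit: boolean): void {
  if (hit) hits++;
  else misses++;
}

function hitRate(): number {
  const total = hits + misses;
  return total === 0 ? 0 : hits / total; // target: > 0.5
}
```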