Choosing an AI API Provider
Most web developers integrate AI through APIs rather than building models from scratch. Several providers offer powerful AI capabilities you can access with a few lines of code.
Major AI API Providers
- OpenAI – the most popular choice; the GPT family (gpt-4o, gpt-4-turbo, gpt-3.5-turbo) powers the examples below
- Anthropic – the Claude models, known for strong reasoning
- Google – the Gemini models, with a generous free tier
- Open-source models (self-hosted or via a hosting provider) – maximum privacy and control
Getting Started with OpenAI API
Let's walk through a complete example using OpenAI's API (the most popular choice).
Step 1: Get API Key
- Sign up at platform.openai.com
- Navigate to the API keys section
- Create a new API key
- Store it securely (never commit it to GitHub!)
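A common way to keep the key out of your code is an environment variable. A minimal sketch, assuming Node.js and the dotenv package (the .env file itself must also be listed in .gitignore):

```javascript
// .env (never committed) contains a single line:
// OPENAI_API_KEY=<your key>
import 'dotenv/config'; // loads .env into process.env

console.log(Boolean(process.env.OPENAI_API_KEY)); // true if the key loaded
```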
Step 2: Install SDK
```bash
# Node.js
npm install openai

# Python
pip install openai
```
Step 3: Basic Implementation
```javascript
// Node.js/JavaScript example
import OpenAI from 'openai';

const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY, // store in an environment variable
});

async function generateResponse(userMessage) {
  try {
    const completion = await openai.chat.completions.create({
      model: "gpt-4o",
      messages: [
        {
          role: "system",
          content: "You are a helpful customer support assistant."
        },
        {
          role: "user",
          content: userMessage
        }
      ],
      temperature: 0.7,
      max_tokens: 500,
    });
    return completion.choices[0].message.content;
  } catch (error) {
    console.error('OpenAI API error:', error);
    throw error;
  }
}

// Usage (top-level await works in ES modules)
const response = await generateResponse("How do I reset my password?");
console.log(response);
```
Key Parameters Explained
- model – Which AI model to use (gpt-4o, gpt-4-turbo, gpt-3.5-turbo, etc.)
- messages – Array of conversation messages with roles (system, user, assistant)
- temperature – Randomness/creativity (0 = near-deterministic, 2 = very random)
- max_tokens – Maximum length of the response in tokens (also caps cost)
- top_p – Nucleus-sampling alternative to temperature (adjust one or the other, not both)
- presence_penalty – Encourages talking about new topics (-2 to 2)
- frequency_penalty – Reduces repetition (-2 to 2)
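To make these knobs concrete, here is a sketch of two request configurations, one tuned for predictable answers and one for brainstorming (the model choice and the exact values are illustrative, not prescriptive):

```javascript
// Predictable, low-variance output: good for support answers or extraction
const factual = await openai.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "Summarize our refund policy." }],
  temperature: 0,    // minimize randomness
  max_tokens: 200,   // short answers, bounded cost
});

// Varied, creative output: good for brainstorming
const creative = await openai.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "Suggest ten product taglines." }],
  temperature: 1.2,        // more randomness
  presence_penalty: 0.6,   // nudge toward new topics
  frequency_penalty: 0.4,  // damp verbatim repetition
  max_tokens: 400,
});
```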
Architecture Patterns for AI Integration
Pattern 1: Client-Side Direct (Simple but Limited)
In this pattern, the browser calls the AI provider's API directly, with your API key embedded in the frontend bundle.
Why avoid it: your API key is exposed in client code, so anyone can extract it and use (and abuse) your quota.
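For illustration only, this is roughly what the anti-pattern looks like (the OpenAI SDK even makes you opt in with a flag whose name is itself a warning):

```javascript
// DON'T do this in production code
import OpenAI from 'openai';

const openai = new OpenAI({
  apiKey: 'sk-...', // hard-coded key, shipped to every visitor's browser
  dangerouslyAllowBrowser: true, // the SDK requires this opt-in for a reason
});
```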
Pattern 2: Backend Proxy (Recommended)
Here the frontend calls your own server, which holds the API key and forwards requests to the AI provider. This chokepoint is where you add authentication, rate limiting, input validation, and usage logging.
Implementation example:
```javascript
// Backend API endpoint (Express.js)
import express from 'express';
import OpenAI from 'openai';

const app = express();
app.use(express.json()); // required so req.body is parsed

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

app.post('/api/chat', async (req, res) => {
  // 1. Authenticate user (authenticateUser is your app's auth helper)
  const user = await authenticateUser(req);
  if (!user) {
    return res.status(401).json({ error: 'Unauthorized' });
  }

  // 2. Rate limiting (isRateLimited is sketched below)
  if (await isRateLimited(user.id)) {
    return res.status(429).json({ error: 'Too many requests' });
  }

  // 3. Validate and sanitize input
  const { message } = req.body;
  if (!message || message.length > 1000) {
    return res.status(400).json({ error: 'Invalid message' });
  }

  try {
    // 4. Call OpenAI API
    const completion = await openai.chat.completions.create({
      model: "gpt-4o",
      messages: [
        { role: "system", content: "You are a helpful assistant." },
        { role: "user", content: message }
      ],
      max_tokens: 500,
    });

    // 5. Log usage for billing (logUsage is sketched below)
    await logUsage(user.id, completion.usage);

    // 6. Return response
    res.json({
      response: completion.choices[0].message.content,
      usage: completion.usage
    });
  } catch (error) {
    console.error('OpenAI error:', error);
    res.status(500).json({ error: 'AI service unavailable' });
  }
});

app.listen(3000); // start the server
```
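The authenticateUser, isRateLimited, and logUsage calls above are placeholders for your own infrastructure. A minimal in-memory sketch of the latter two, fine for a demo but not for production (use Redis or a database there):

```javascript
// Hypothetical helpers -- swap in Redis or a database for real deployments
const requestCounts = new Map(); // userId -> timestamps of recent requests

async function isRateLimited(userId, limit = 50, windowMs = 15 * 60 * 1000) {
  const now = Date.now();
  const recent = (requestCounts.get(userId) ?? []).filter(t => now - t < windowMs);
  recent.push(now);
  requestCounts.set(userId, recent);
  return recent.length > limit;
}

async function logUsage(userId, usage) {
  // usage carries prompt_tokens, completion_tokens, and total_tokens
  console.log(`[usage] user=${userId} total_tokens=${usage?.total_tokens}`);
}
```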
```javascript
// Frontend code
async function askAI(message) {
  const response = await fetch('/api/chat', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ message }),
  });
  if (!response.ok) throw new Error('Request failed');
  return response.json();
}
```
Pattern 3: Streaming Responses
For a better user experience, stream AI responses token by token instead of making users wait for the complete response.
```javascript
// Backend - streaming endpoint (Server-Sent Events)
app.post('/api/chat/stream', async (req, res) => {
  res.setHeader('Content-Type', 'text/event-stream');
  res.setHeader('Cache-Control', 'no-cache');
  res.setHeader('Connection', 'keep-alive');

  const stream = await openai.chat.completions.create({
    model: "gpt-4o",
    messages: [{ role: "user", content: req.body.message }],
    stream: true,
  });

  for await (const chunk of stream) {
    const content = chunk.choices[0]?.delta?.content || '';
    if (content) {
      res.write(`data: ${JSON.stringify({ content })}\n\n`);
    }
  }

  res.write('data: [DONE]\n\n');
  res.end();
});
```
```javascript
// Frontend - receive streaming response
async function streamAIResponse(message) {
  const response = await fetch('/api/chat/stream', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ message }),
  });

  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let buffer = ''; // an event can be split across network chunks

  while (true) {
    const { value, done } = await reader.read();
    if (done) break;

    buffer += decoder.decode(value, { stream: true });
    const lines = buffer.split('\n');
    buffer = lines.pop(); // keep any incomplete line for the next chunk

    for (const line of lines) {
      if (line.startsWith('data: ')) {
        const data = line.slice(6);
        if (data === '[DONE]') return;
        const parsed = JSON.parse(data);
        updateUI(parsed.content); // app-specific incremental UI update
      }
    }
  }
}
```
Cost Management Strategies
AI API costs can add up quickly. Implement these strategies to control expenses:
1. Caching
```javascript
import { createHash } from 'node:crypto';
import Redis from 'ioredis';

const redis = new Redis();
const hash = (s) => createHash('sha256').update(s).digest('hex');

async function getCachedAIResponse(prompt) {
  // Check cache first
  const cached = await redis.get(`ai:${hash(prompt)}`);
  if (cached) return JSON.parse(cached);

  // Call AI if not cached
  const response = await openai.chat.completions.create({
    model: "gpt-4o",
    messages: [{ role: "user", content: prompt }],
  });
  const result = response.choices[0].message.content;

  // Cache for 1 hour
  await redis.setex(`ai:${hash(prompt)}`, 3600, JSON.stringify(result));
  return result;
}
```
2. Token Limits & Truncation
```javascript
import { encoding_for_model } from 'tiktoken';

function truncateToTokenLimit(text, maxTokens = 4000) {
  const encoding = encoding_for_model('gpt-4');
  try {
    const tokens = encoding.encode(text);
    if (tokens.length <= maxTokens) return text;

    // Truncate, then decode the UTF-8 bytes back to a string
    const truncated = tokens.slice(0, maxTokens);
    return new TextDecoder().decode(encoding.decode(truncated));
  } finally {
    encoding.free(); // tiktoken encoders hold WASM memory
  }
}
```
3. Model Selection
```javascript
// Use cheaper models for simple tasks
// (prices below are indicative; check the provider's current pricing page)
function chooseModel(taskComplexity) {
  if (taskComplexity === 'simple') {
    return 'gpt-3.5-turbo'; // ~$0.0005 per 1K input tokens
  } else {
    return 'gpt-4o'; // ~$0.0025-$0.01 per 1K tokens (input/output)
  }
}
```
4. Rate Limiting
```javascript
import rateLimit from 'express-rate-limit';

const aiRateLimiter = rateLimit({
  windowMs: 15 * 60 * 1000, // 15 minutes
  max: 50, // limit each client (keyed by IP unless you set keyGenerator) to 50 requests per window
  message: 'Too many AI requests, please try again later.',
});

app.post('/api/chat', aiRateLimiter, async (req, res) => {
  // ... handle request
});
```
Error Handling Best Practices
```javascript
async function robustAICall(prompt, options = {}) {
  const maxRetries = 3;
  let lastError;

  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      const response = await openai.chat.completions.create({
        model: "gpt-4o", // default; callers can override via options
        ...options,
        messages: [{ role: "user", content: prompt }],
      });
      return response.choices[0].message.content;
    } catch (error) {
      lastError = error;

      // Don't retry on certain errors
      if (error.status === 400) {
        throw new Error('Invalid request: ' + error.message);
      }

      // Retry on rate limits with exponential backoff
      if (error.status === 429) {
        const delay = Math.pow(2, attempt) * 1000;
        console.log(`Rate limited, retrying in ${delay}ms...`);
        await new Promise(resolve => setTimeout(resolve, delay));
        continue;
      }

      // Retry on server errors
      if (error.status >= 500) {
        console.log(`Server error, retrying attempt ${attempt + 1}...`);
        await new Promise(resolve => setTimeout(resolve, 1000));
        continue;
      }

      throw error;
    }
  }

  throw new Error(`Failed after ${maxRetries} attempts: ${lastError.message}`);
}
```
Key Takeaways
- Major AI providers: OpenAI (most popular), Anthropic (best reasoning), Google (free tier), open source (privacy/control).
- Never expose API keys in client code—always use a backend proxy.
- Backend proxy pattern: frontend → your server → AI API (enables auth, rate limiting, input validation).
- Stream responses for better UX—show text as it's generated instead of waiting.
- Manage costs: cache responses, set token limits, choose cheaper models for simple tasks, implement rate limiting.
- Robust error handling: retry with exponential backoff for rate limits, don't retry on client errors (400s).
- Key parameters: model, temperature (randomness), max_tokens (cost control), messages (conversation context).
- Monitor usage and set budget alerts to avoid surprise bills.
- Log all AI requests for debugging, analytics, and compliance.
- Implement timeouts to prevent hanging requests (see the sketch below).
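A minimal timeout sketch for the fetch-based frontend helper shown earlier, using AbortController (the 15-second budget is an arbitrary choice):

```javascript
async function askAIWithTimeout(message, timeoutMs = 15_000) {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const response = await fetch('/api/chat', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ message }),
      signal: controller.signal, // aborts the request when the timer fires
    });
    if (!response.ok) throw new Error('Request failed');
    return await response.json();
  } finally {
    clearTimeout(timer); // always clear, on success or failure
  }
}
```

On the server side, the official openai Node SDK also accepts a timeout option (in milliseconds) when you construct the client, if you prefer to bound the upstream call there.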
Next, let's explore AI-powered development tools that can accelerate your workflow.