# Parallel Prompt Execution

When steps don't depend on each other, running them in parallel dramatically reduces latency. This lesson covers patterns for effective parallel execution.
## When to Parallelize

Parallel execution is appropriate when:

- **Steps are independent:** no step needs another's output
- **Latency matters:** total time should be max(steps), not sum(steps)
- **Resources allow:** API rate limits and system capacity can handle concurrent calls
## Basic Parallel Pattern

Run independent steps simultaneously:

```javascript
async function parallelAnalysis(input) {
  // These don't depend on each other - run in parallel
  const [sentiment, entities, topics] = await Promise.all([
    analyzeSentiment(input),
    extractEntities(input),
    identifyTopics(input)
  ]);
  return { sentiment, entities, topics };
}
```
```
        ┌→ Sentiment Analysis ─┐
        │                      │
Input ──┼→ Entity Extraction ──┼→ Combined Output
        │                      │
        └→ Topic Identification┘
```
### Time Comparison

Sequential (3 steps × 2 seconds each):

```
├── Step 1 ──────│
                 ├── Step 2 ──────│
                                  ├── Step 3 ──────│
Total: 6 seconds
```

Parallel (3 steps × 2 seconds each):

```
├── Step 1 ──────│
├── Step 2 ──────│ → Combined
├── Step 3 ──────│
Total: 2 seconds (3× faster)
```
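The difference is easy to measure directly. A minimal sketch, using a hypothetical `sleep`-based step in place of real model calls:

```javascript
const sleep = ms => new Promise(resolve => setTimeout(resolve, ms));

// Stand-in for a model call that takes ~100 ms
const step = () => sleep(100);

async function timeIt(fn) {
  const start = Date.now();
  await fn();
  return Date.now() - start;
}

async function compare() {
  const sequential = await timeIt(async () => {
    await step();
    await step();
    await step();
  });
  const parallel = await timeIt(() => Promise.all([step(), step(), step()]));
  // sequential is roughly 3x the single-step time; parallel is roughly 1x
  return { sequential, parallel };
}
```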
## Parallel Patterns

### Fan-Out Pattern

Send the same input to multiple processors:
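A minimal sketch of the pattern; the processor functions here are hypothetical stand-ins for real prompt calls:

```javascript
// Fan-out: one input, several independent processors, all run concurrently
async function fanOut(input, processors) {
  return Promise.all(processors.map(process => process(input)));
}

// Hypothetical usage - each processor receives the same input:
// const [summary, tags] = await fanOut(text, [summarize, extractTags]);
```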
### Map Pattern

Apply the same operation to multiple items:

```javascript
async function parallelMap(items, processItem) {
  const results = await Promise.all(
    items.map(item => processItem(item))
  );
  return results;
}

// Example: Summarize multiple documents
const summaries = await parallelMap(documents, summarizeDocument);
```
### Concurrent Limit Pattern

Control how many operations run simultaneously:

```javascript
async function parallelWithLimit(items, processor, limit = 5) {
  const results = [];
  for (let i = 0; i < items.length; i += limit) {
    const batch = items.slice(i, i + limit);
    // Note: each batch waits for its slowest item before the next batch starts
    const batchResults = await Promise.all(
      batch.map(item => processor(item))
    );
    results.push(...batchResults);
  }
  return results;
}
```
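Batching is simple, but fast items sit idle while the slowest item in each batch finishes. A sliding-window alternative (a sketch, not tied to any particular library) keeps `limit` operations in flight at all times:

```javascript
async function pooledMap(items, processor, limit = 5) {
  const results = new Array(items.length);
  let next = 0;

  // Each worker repeatedly claims the next unclaimed index.
  // JavaScript is single-threaded, so `next++` between awaits is safe.
  async function worker() {
    while (next < items.length) {
      const i = next++;
      results[i] = await processor(items[i]);
    }
  }

  const workers = Array.from(
    { length: Math.min(limit, items.length) },
    () => worker()
  );
  await Promise.all(workers);
  return results;
}
```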
## Handling Parallel Results

### Wait for All (Promise.all)

All must succeed:

```javascript
try {
  const [a, b, c] = await Promise.all([stepA(), stepB(), stepC()]);
  return combineResults(a, b, c);
} catch (error) {
  // If ANY step fails, Promise.all rejects with the first error
  console.error('At least one step failed:', error);
}
```
### Wait for All, Allow Failures (Promise.allSettled)

Continue even if some fail:

```javascript
const results = await Promise.allSettled([stepA(), stepB(), stepC()]);

const successful = results
  .filter(r => r.status === 'fulfilled')
  .map(r => r.value);

const failed = results
  .filter(r => r.status === 'rejected')
  .map(r => r.reason);

if (successful.length >= 2) {
  // Proceed with partial results
  return combinePartialResults(successful);
}
```
### Race Pattern (First to Complete)

Use the first result to settle. Note that `Promise.race` settles with the first promise to settle, whether it fulfills or rejects; a fast failure wins the race too.

```javascript
// Try multiple models, use the first response
const result = await Promise.race([
  queryModelA(prompt),
  queryModelB(prompt),
  queryModelC(prompt)
]);
```
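If you want the first *fulfilled* result, with rejections ignored unless every promise rejects, `Promise.any` (ES2021) is the better fit. A minimal sketch:

```javascript
// Resolves with the first task to fulfill; rejects with an AggregateError
// only if every task rejects
async function firstSuccess(tasks) {
  return Promise.any(tasks.map(task => task()));
}
```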
### Race with Timeout

```javascript
async function withTimeout(promise, ms) {
  // Note: the timer keeps running even if `promise` wins; in a long-lived
  // process, capture the timer ID and clear it
  const timeout = new Promise((_, reject) =>
    setTimeout(() => reject(new Error('Timeout')), ms)
  );
  return Promise.race([promise, timeout]);
}

const result = await withTimeout(longRunningStep(), 10000);
```
## Combining Parallel and Sequential

Most real chains mix both patterns:

```javascript
async function mixedChain(input) {
  // Sequential: must extract before analyzing
  const extracted = await extract(input);

  // Parallel: independent analyses
  const [sentiment, entities, summary] = await Promise.all([
    analyzeSentiment(extracted),
    extractEntities(extracted),
    generateSummary(extracted)
  ]);

  // Sequential: combine results
  const combined = await combineAnalyses({ sentiment, entities, summary });

  // Parallel: generate outputs
  const [report, email, slack] = await Promise.all([
    formatAsReport(combined),
    formatAsEmail(combined),
    formatAsSlack(combined)
  ]);

  return { report, email, slack };
}
```
```
Input → Extract → ┌→ Sentiment ─┐              ┌→ Report ─┐
                  ├→ Entities ──┼→ Combine →   ├→ Email ──┼→ Outputs
                  └→ Summary ───┘              └→ Slack ──┘
```
## Rate Limiting and Throttling

### Token Bucket Pattern

```javascript
// Small helper used by the examples below
const sleep = ms => new Promise(resolve => setTimeout(resolve, ms));

class TokenBucket {
  constructor(capacity, refillRate) {
    this.tokens = capacity;
    this.capacity = capacity;
    this.refillRate = refillRate; // tokens per second
    // Note: clear this interval when the bucket is no longer needed
    this.timer = setInterval(() => {
      this.tokens = Math.min(this.capacity, this.tokens + this.refillRate);
    }, 1000);
  }

  async acquire() {
    while (this.tokens < 1) {
      await sleep(100);
    }
    this.tokens--;
  }
}

const limiter = new TokenBucket(10, 2); // 10 max, refill 2/sec

async function rateLimitedParallel(items, processor) {
  return Promise.all(items.map(async item => {
    await limiter.acquire();
    return processor(item);
  }));
}
```
### Adaptive Throttling

```javascript
class AdaptiveThrottler {
  constructor() {
    this.delay = 100;     // current delay between calls (ms)
    this.minDelay = 50;
    this.maxDelay = 5000;
  }

  async execute(fn) {
    await sleep(this.delay); // assumes a Promise-based sleep(ms) helper
    try {
      const result = await fn();
      // Success: reduce delay
      this.delay = Math.max(this.minDelay, this.delay * 0.9);
      return result;
    } catch (error) {
      if (error.code === 'RATE_LIMITED') {
        // Rate limited: increase delay
        this.delay = Math.min(this.maxDelay, this.delay * 2);
      }
      throw error;
    }
  }
}

// Usage: wrap each call so the delay adapts to the API's responses
// const throttler = new AdaptiveThrottler();
// const result = await throttler.execute(() => queryModel(prompt));
```
## Exercise: Design Parallel Execution

Design a parallel execution plan for a workflow of your own: identify which steps are independent, which must run sequentially, and where a concurrency limit is needed.
## Key Takeaways

- Parallelize independent steps to reduce latency
- Use `Promise.all` for all-or-nothing parallel execution
- Use `Promise.allSettled` when partial results are acceptable
- Use `Promise.race` for first-to-complete scenarios
- Mix parallel and sequential patterns as dependencies require
- Implement rate limiting to respect API constraints
- Batch large item sets with concurrency limits
- Handle partial failures gracefully

Next, we'll explore fan-out and fan-in patterns for more complex parallel workflows.

