# Parallel Prompt Execution

When steps don't depend on each other, running them in parallel dramatically reduces latency. This lesson covers patterns for effective parallel execution.
## When to Parallelize

Parallel execution is appropriate when:

- **Steps are independent:** no step needs another's output
- **Latency matters:** total time should be max(steps), not sum(steps)
- **Resources allow:** API rate limits and system capacity can handle concurrent calls
## Basic Parallel Pattern

Run independent steps simultaneously:

```javascript
async function parallelAnalysis(input) {
  // These don't depend on each other - run in parallel
  const [sentiment, entities, topics] = await Promise.all([
    analyzeSentiment(input),
    extractEntities(input),
    identifyTopics(input)
  ]);
  return { sentiment, entities, topics };
}
```
```
        ┌→ Sentiment Analysis ─┐
        │                      │
Input ──┼→ Entity Extraction ──┼→ Combined Output
        │                      │
        └→ Topic Identification┘
```
### Time Comparison

Sequential (3 steps × 2 seconds each):

```
├── Step 1 ──────│
                 ├── Step 2 ──────│
                                  ├── Step 3 ──────│
Total: 6 seconds
```

Parallel (3 steps × 2 seconds each):

```
├── Step 1 ──────│
├── Step 2 ──────│ → Combined
├── Step 3 ──────│
Total: 2 seconds (3× faster)
```
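The difference is easy to measure directly. A minimal sketch, using a hypothetical `sleep`-based step in place of real model calls:

```javascript
const sleep = ms => new Promise(resolve => setTimeout(resolve, ms));

// Stand-in for a model call that takes ~100 ms
const step = () => sleep(100);

async function timeIt(fn) {
  const start = Date.now();
  await fn();
  return Date.now() - start;
}

async function compare() {
  const sequential = await timeIt(async () => {
    await step();
    await step();
    await step();
  });
  const parallel = await timeIt(() => Promise.all([step(), step(), step()]));
  // sequential is roughly 3x the single-step time; parallel is roughly 1x
  return { sequential, parallel };
}
```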
## Parallel Patterns

### Fan-Out Pattern

Send the same input to multiple processors:
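A minimal sketch of the pattern; the processor functions here are hypothetical stand-ins for real prompt calls:

```javascript
// Fan-out: one input, several independent processors, all run concurrently
async function fanOut(input, processors) {
  return Promise.all(processors.map(process => process(input)));
}

// Hypothetical usage - each processor receives the same input:
// const [summary, tags] = await fanOut(text, [summarize, extractTags]);
```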
### Map Pattern

Apply the same operation to multiple items:

```javascript
async function parallelMap(items, processItem) {
  const results = await Promise.all(
    items.map(item => processItem(item))
  );
  return results;
}

// Example: Summarize multiple documents
const summaries = await parallelMap(documents, summarizeDocument);
```
### Concurrent Limit Pattern

Control how many operations run simultaneously:

```javascript
async function parallelWithLimit(items, processor, limit = 5) {
  const results = [];
  for (let i = 0; i < items.length; i += limit) {
    const batch = items.slice(i, i + limit);
    // Note: each batch waits for its slowest item before the next batch starts
    const batchResults = await Promise.all(
      batch.map(item => processor(item))
    );
    results.push(...batchResults);
  }
  return results;
}
```
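Batching is simple, but fast items sit idle while the slowest item in each batch finishes. A sliding-window alternative (a sketch, not tied to any particular library) keeps `limit` operations in flight at all times:

```javascript
async function pooledMap(items, processor, limit = 5) {
  const results = new Array(items.length);
  let next = 0;

  // Each worker repeatedly claims the next unclaimed index.
  // JavaScript is single-threaded, so `next++` between awaits is safe.
  async function worker() {
    while (next < items.length) {
      const i = next++;
      results[i] = await processor(items[i]);
    }
  }

  const workers = Array.from(
    { length: Math.min(limit, items.length) },
    () => worker()
  );
  await Promise.all(workers);
  return results;
}
```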
## Handling Parallel Results

### Wait for All (Promise.all)

All must succeed:

```javascript
try {
  const [a, b, c] = await Promise.all([stepA(), stepB(), stepC()]);
  return combineResults(a, b, c);
} catch (error) {
  // If ANY step fails, Promise.all rejects with the first error
  console.error('At least one step failed:', error);
}
```
### Wait for All, Allow Failures (Promise.allSettled)

Continue even if some fail:

```javascript
const results = await Promise.allSettled([stepA(), stepB(), stepC()]);

const successful = results
  .filter(r => r.status === 'fulfilled')
  .map(r => r.value);

const failed = results
  .filter(r => r.status === 'rejected')
  .map(r => r.reason);

if (successful.length >= 2) {
  // Proceed with partial results
  return combinePartialResults(successful);
}
```
### Race Pattern (First to Complete)

Use the first result to settle. Note that `Promise.race` settles with the first promise to settle, whether it fulfills or rejects; a fast failure wins the race too.

```javascript
// Try multiple models, use the first response
const result = await Promise.race([
  queryModelA(prompt),
  queryModelB(prompt),
  queryModelC(prompt)
]);
```
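If you want the first *fulfilled* result, with rejections ignored unless every promise rejects, `Promise.any` (ES2021) is the better fit. A minimal sketch:

```javascript
// Resolves with the first task to fulfill; rejects with an AggregateError
// only if every task rejects
async function firstSuccess(tasks) {
  return Promise.any(tasks.map(task => task()));
}
```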
### Race with Timeout

```javascript
async function withTimeout(promise, ms) {
  // Note: the timer keeps running even if `promise` wins; in a long-lived
  // process, capture the timer ID and clear it
  const timeout = new Promise((_, reject) =>
    setTimeout(() => reject(new Error('Timeout')), ms)
  );
  return Promise.race([promise, timeout]);
}

const result = await withTimeout(longRunningStep(), 10000);
```
## Combining Parallel and Sequential

Most real chains mix both patterns:

```javascript
async function mixedChain(input) {
  // Sequential: must extract before analyzing
  const extracted = await extract(input);

  // Parallel: independent analyses
  const [sentiment, entities, summary] = await Promise.all([
    analyzeSentiment(extracted),
    extractEntities(extracted),
    generateSummary(extracted)
  ]);

  // Sequential: combine results
  const combined = await combineAnalyses({ sentiment, entities, summary });

  // Parallel: generate outputs
  const [report, email, slack] = await Promise.all([
    formatAsReport(combined),
    formatAsEmail(combined),
    formatAsSlack(combined)
  ]);

  return { report, email, slack };
}
```
```
Input → Extract → ┌→ Sentiment ─┐              ┌→ Report ─┐
                  ├→ Entities ──┼→ Combine →   ├→ Email ──┼→ Outputs
                  └→ Summary ───┘              └→ Slack ──┘
```
## Rate Limiting and Throttling

### Token Bucket Pattern

```javascript
// Small helper used by the examples below
const sleep = ms => new Promise(resolve => setTimeout(resolve, ms));

class TokenBucket {
  constructor(capacity, refillRate) {
    this.tokens = capacity;
    this.capacity = capacity;
    this.refillRate = refillRate; // tokens per second
    // Note: clear this interval when the bucket is no longer needed
    this.timer = setInterval(() => {
      this.tokens = Math.min(this.capacity, this.tokens + this.refillRate);
    }, 1000);
  }

  async acquire() {
    while (this.tokens < 1) {
      await sleep(100);
    }
    this.tokens--;
  }
}

const limiter = new TokenBucket(10, 2); // 10 max, refill 2/sec

async function rateLimitedParallel(items, processor) {
  return Promise.all(items.map(async item => {
    await limiter.acquire();
    return processor(item);
  }));
}
```
### Adaptive Throttling

```javascript
class AdaptiveThrottler {
  constructor() {
    this.delay = 100;     // current delay between calls (ms)
    this.minDelay = 50;
    this.maxDelay = 5000;
  }

  async execute(fn) {
    await sleep(this.delay); // assumes a Promise-based sleep(ms) helper
    try {
      const result = await fn();
      // Success: reduce delay
      this.delay = Math.max(this.minDelay, this.delay * 0.9);
      return result;
    } catch (error) {
      if (error.code === 'RATE_LIMITED') {
        // Rate limited: increase delay
        this.delay = Math.min(this.maxDelay, this.delay * 2);
      }
      throw error;
    }
  }
}

// Usage: wrap each call so the delay adapts to the API's responses
// const throttler = new AdaptiveThrottler();
// const result = await throttler.execute(() => queryModel(prompt));
```
## Exercise: Design Parallel Execution

Design a parallel execution plan for a workflow of your own: identify which steps are independent, which must run sequentially, and where a concurrency limit is needed.
## Key Takeaways

- Parallelize independent steps to reduce latency
- Use `Promise.all` for all-or-nothing parallel execution
- Use `Promise.allSettled` when partial results are acceptable
- Use `Promise.race` for first-to-complete scenarios
- Mix parallel and sequential patterns as dependencies require
- Implement rate limiting to respect API constraints
- Batch large item sets with concurrency limits
- Handle partial failures gracefully

Next, we'll explore fan-out and fan-in patterns for more complex parallel workflows.

