Prompt Engineering for Grounding
Introduction
We've retrieved relevant context from our knowledge base. Now comes a critical step: constructing prompts that guide the LLM to use this context effectively. This is prompt engineering for grounding—the art of ensuring the LLM's response is based on the provided context rather than its training knowledge.
Poor prompting leads to hallucination, off-topic responses, and unreliable answers. Good prompting produces accurate, well-sourced responses that users can trust.
The Anatomy of a Grounded Prompt
Three Essential Components
Every grounded prompt needs:
- System Instruction: Establishes the LLM's role, rules, and constraints
- Context: The retrieved documents the LLM should use
- User Query: The question to answer
The Context Sandwich
A reliable pattern is the "context sandwich"—placing retrieved context between clear instructions:
┌─────────────────────────────────────────────────┐
│ SYSTEM INSTRUCTION │
│ (Role, rules, constraints) │
├─────────────────────────────────────────────────┤
│ RETRIEVED CONTEXT │
│ (Documents from knowledge base) │
├─────────────────────────────────────────────────┤
│ USER QUERY │
│ (The question to answer) │
└─────────────────────────────────────────────────┘
This structure:
- Establishes expectations before presenting context
- Clearly delineates context from the question
- Reinforces instructions after context injection
Designing System Instructions
The Foundation: Role and Rules
The system instruction defines the LLM's behavior:
const systemInstruction = `You are a helpful documentation assistant for our software product. Your role is to answer questions using ONLY the information provided in the context below.
RULES:
1. Base your answers ONLY on the provided context
2. If the context doesn't contain the answer, say "I don't have information about that in the documentation"
3. Never make up information or use your general knowledge
4. Cite the source when providing information
5. Be concise but thorough`;
Grounding Rules in Detail
Let's examine each rule type:
Exclusivity Rule:
"Answer ONLY using the provided context"
Tells the LLM to ignore its training knowledge. Critical for preventing hallucination.
Fallback Rule:
"If the context doesn't contain the answer, say..."
Provides a safe default when context is insufficient. Without this, the LLM may guess.
No Fabrication Rule:
"Never make up information or use your general knowledge"
Explicit prohibition reinforces the exclusivity rule.
Citation Rule:
"Cite the source when providing information"
Encourages attribution, making responses verifiable.
Comprehensive System Instruction Example
const systemInstruction = `You are a knowledgeable documentation assistant for TechCorp's developer platform. Your purpose is to help developers understand our APIs, SDKs, and best practices.
IMPORTANT RULES:
1. Answer questions using ONLY the documentation provided in the context section
2. If the documentation doesn't contain enough information to answer fully, acknowledge what you can answer and what you cannot
3. Never invent features, endpoints, or capabilities that aren't in the documentation
4. When citing information, reference the source document (e.g., "According to the Authentication Guide...")
5. If you're unsure, say so rather than guessing
RESPONSE FORMAT:
- Be direct and clear
- Use code examples when relevant
- Structure complex answers with bullet points or numbered steps
- End with a brief summary for longer answers
CONTEXT BELOW:
---`;
Adapting Tone and Persona
The system instruction can shape the LLM's personality:
Technical/Professional:
"You are a senior software architect. Provide detailed, technically accurate responses suitable for experienced developers."
Friendly/Supportive:
"You are a helpful assistant. Explain concepts clearly and encouragingly, suitable for developers learning our platform."
Concise/Efficient:
"You are a quick-reference assistant. Provide brief, actionable answers. Avoid lengthy explanations unless asked."
Structuring the Prompt
The Complete Prompt Structure
function buildPrompt(
  systemInstruction: string,
  context: string,
  userQuery: string
): string {
  return `${systemInstruction}

CONTEXT:
${context}

---

USER QUESTION: ${userQuery}

ASSISTANT:`;
}
Example Complete Prompt
You are a documentation assistant for our software product. Answer questions using ONLY the provided context. If the context doesn't contain the answer, say "I don't have information about that in the documentation."
CONTEXT:
[Document 1]
Source: authentication.md
Title: Setting Up Authentication
To authenticate API requests, include your API key in the Authorization header:
Authorization: Bearer YOUR_API_KEY
API keys can be generated in your dashboard under Settings > API Keys.
---
[Document 2]
Source: rate-limits.md
Title: Rate Limiting
All API endpoints are rate-limited to 100 requests per minute per API key. Exceeding this limit will result in a 429 Too Many Requests response.
---
USER QUESTION: How do I authenticate my API requests?
ASSISTANT:
Context Formatting Best Practices
Clear Delineation: Use visual separators (---, ===, or markdown headers) between documents.
Source Attribution: Include source information for each chunk so the LLM can cite it.
Relevance Ordering: Place most relevant documents first—LLMs tend to weight early content more heavily.
Token Budget Awareness: Don't exceed the model's context window. For Gemini 1.5 Flash this is generous (up to 1M tokens), but longer context still increases cost and latency, so include only what the answer needs. A sketch of a formatter applying these practices follows below.
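A minimal sketch of such a formatter, assuming each retrieved chunk carries text, source, title, and a similarity score (the shape of your retrieval output may differ):

interface Chunk {
  text: string;
  source: string;
  title: string;
  score: number; // similarity to the query, higher is better
}

function formatContext(chunks: Chunk[]): string {
  return [...chunks]
    .sort((a, b) => b.score - a.score) // most relevant documents first
    .map((c, i) =>
      `[Document ${i + 1}]\nSource: ${c.source}\nTitle: ${c.title}\n\n${c.text}`
    )
    .join('\n\n---\n\n'); // clear visual separator between documents
}

This produces the same [Document N] / Source / Title layout used in the example prompt above.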
Generation Parameters
Temperature Control
Temperature controls randomness in the LLM's output:
| Temperature | Effect | Use Case |
|---|---|---|
| 0.0 | Deterministic, most likely tokens | Factual Q&A, coding |
| 0.3 - 0.5 | Slightly varied, still focused | Documentation, explanations |
| 0.7 - 0.9 | Creative, more varied | Creative writing, brainstorming |
| 1.0+ | Highly random | Experimental uses |
For RAG applications, use low temperature (0.0 - 0.3). We want consistent, factual responses based on context, not creative interpretations.
Top-K and Top-P
Top-K: Limits the model to considering only the K most likely next tokens.
Top-P (nucleus sampling): Considers tokens whose cumulative probability exceeds P.
For grounded responses:
- Top-K: 40 (reasonable default)
- Top-P: 0.95 (allows some variation while staying focused)
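To make nucleus sampling concrete, here is an illustrative (not production) sketch of how a top-p cutoff narrows the candidate token set; the tokens and probabilities are invented for the example:

function nucleusSet(
  candidates: { token: string; p: number }[],
  topP: number
): string[] {
  const sorted = [...candidates].sort((a, b) => b.p - a.p);
  const kept: string[] = [];
  let cumulative = 0;
  for (const { token, p } of sorted) {
    kept.push(token);
    cumulative += p;
    if (cumulative >= topP) break; // stop once cumulative probability reaches topP
  }
  return kept;
}

// With topP = 0.95, the low-probability tail is excluded:
nucleusSet(
  [
    { token: 'key', p: 0.6 },
    { token: 'token', p: 0.3 },
    { token: 'header', p: 0.07 },
    { token: 'banana', p: 0.03 }
  ],
  0.95
); // => ['key', 'token', 'header']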
Recommended Configuration
const generationConfig = {
  temperature: 0.1,       // Low for factual accuracy
  topK: 40,               // Reasonable diversity
  topP: 0.95,             // Nucleus sampling
  maxOutputTokens: 1024   // Reasonable response length
};
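As a minimal sketch of where this configuration plugs in, assuming the official @google/generative-ai Node SDK (we cover the generation phase properly in the next lesson):

import { GoogleGenerativeAI } from '@google/generative-ai';

const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY!);
const model = genAI.getGenerativeModel({
  model: 'gemini-1.5-flash',
  generationConfig // the settings defined above
});

// Later, when generating a grounded answer:
const result = await model.generateContent(prompt);
console.log(result.response.text());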
When to Adjust Parameters
Increase temperature (to 0.3-0.5) when:
- Responses are too repetitive
- You want more natural-sounding language
- The task is exploratory
Decrease temperature (to 0.0) when:
- Accuracy is critical
- You need reproducible responses
- The task is factual/technical
Handling Edge Cases
No Relevant Context
When retrieval returns nothing, the prompt should reflect this:
function buildPromptWithFallback(
  systemInstruction: string,
  context: string | null,
  userQuery: string
): string {
  if (!context) {
    return `${systemInstruction}

NOTE: No relevant documentation was found for this query.

USER QUESTION: ${userQuery}

ASSISTANT: I don't have information about that in the documentation. `;
  }
  return buildPrompt(systemInstruction, context, userQuery);
}
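Deciding that retrieval "returned nothing" usually means thresholding on similarity scores rather than checking for an empty array. A sketch, assuming chunks carry a score; the 0.7 cutoff is an assumption to tune against your embedding model:

interface RetrievedChunk {
  text: string;
  source: string;
  score: number; // similarity to the query, higher is better
}

const MIN_SCORE = 0.7; // assumed cutoff; tune for your embedding model

function selectContext(chunks: RetrievedChunk[]): string | null {
  const relevant = chunks.filter((c) => c.score >= MIN_SCORE);
  if (relevant.length === 0) return null; // triggers the fallback prompt above
  return relevant
    .map((c) => `Source: ${c.source}\n${c.text}`)
    .join('\n---\n');
}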
Partial Information
Sometimes context partially answers the question:
const systemInstruction = `...
HANDLING PARTIAL INFORMATION:
- If you can answer part of the question, do so and clearly state what you cannot answer
- For example: "Based on the documentation, [answer partial info]. However, I don't have information about [missing part]."
`;
Contradictory Context
Occasionally, retrieved chunks may seem contradictory (e.g., old and new documentation):
const systemInstruction = `...
HANDLING CONFLICTING INFORMATION:
- If documents seem to contradict each other, acknowledge this
- Prefer information from the source that appears more recent or specific
- Example: "There may be different versions of this feature. According to [newer source], ..."
`;
Out-of-Scope Questions
Users might ask questions outside your knowledge base:
const systemInstruction = `...
SCOPE:
This assistant answers questions about TechCorp's developer platform. For other topics (general programming, other products), politely redirect the user.
Example response: "I'm specialized in TechCorp's developer documentation. For general programming questions, I'd recommend resources like Stack Overflow or MDN."
`;
Prompt Templates
Reusable Prompt Builder
// lib/prompts.ts

interface PromptConfig {
  productName: string;
  tone: 'professional' | 'friendly' | 'concise';
  allowPartialAnswers: boolean;
}

function createSystemInstruction(config: PromptConfig): string {
  const toneInstructions = {
    professional: 'Provide detailed, technically accurate responses.',
    friendly: 'Be helpful and encouraging. Explain concepts clearly.',
    concise: 'Be brief and direct. Provide actionable answers.'
  };

  const partialAnswerRule = config.allowPartialAnswers
    ? 'If you can partially answer, do so and indicate what information is missing.'
    : 'Only provide complete answers. If information is incomplete, say so.';

  return `You are a documentation assistant for ${config.productName}.
${toneInstructions[config.tone]}

RULES:
1. Answer using ONLY the provided context
2. If the context lacks information, say "I don't have documentation about that"
3. Never fabricate information
4. ${partialAnswerRule}
5. Cite sources when possible

CONTEXT BELOW:`;
}

function buildFullPrompt(
  config: PromptConfig,
  context: string,
  query: string
): string {
  const systemInstruction = createSystemInstruction(config);
  return `${systemInstruction}

${context}

---

USER QUESTION: ${query}`;
}

// Usage
const prompt = buildFullPrompt(
  {
    productName: 'TechCorp Developer Platform',
    tone: 'professional',
    allowPartialAnswers: true
  },
  formattedContext,
  userQuery
);
Testing and Iteration
Evaluating Prompt Quality
Test your prompts with diverse queries; a small test-harness sketch follows these categories:
Factual Questions:
- Can it answer questions directly stated in context?
- Does it cite sources?
Questions Without Context:
- Does it correctly say "I don't know"?
- Does it avoid hallucinating?
Ambiguous Questions:
- Does it handle partial information well?
- Does it ask for clarification when appropriate?
Out-of-Scope Questions:
- Does it stay within its defined role?
- Does it redirect appropriately?
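These categories translate naturally into a small regression suite. A sketch, reusing systemInstruction and buildPromptWithFallback from earlier in this lesson; the askLLM parameter is a hypothetical stand-in for your generation call, and the checks are illustrative:

interface PromptTestCase {
  name: string;
  query: string;
  context: string | null;
  check: (answer: string) => boolean;
}

const cases: PromptTestCase[] = [
  {
    name: 'factual question answered from context',
    query: 'How do I authenticate my API requests?',
    context: 'Source: authentication.md\nInclude your API key in the Authorization header.',
    check: (a) => a.includes('Authorization')
  },
  {
    name: 'missing context triggers the fallback',
    query: 'What is the enterprise tier pricing?',
    context: null,
    check: (a) => a.toLowerCase().includes("don't have information")
  }
];

async function runPromptTests(
  askLLM: (prompt: string) => Promise<string>
): Promise<void> {
  for (const c of cases) {
    const prompt = buildPromptWithFallback(systemInstruction, c.context, c.query);
    const answer = await askLLM(prompt);
    console.log(`${c.check(answer) ? 'PASS' : 'FAIL'} - ${c.name}`);
  }
}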
Iterative Improvement
- Start simple: Basic system instruction + context + query
- Test with real queries: Collect common user questions
- Identify failures: Where does it hallucinate? Where does it refuse to answer?
- Refine instructions: Add rules to address specific failure modes
- Repeat: Prompting is iterative—expect multiple rounds of refinement
Summary
In this lesson, we explored prompt engineering for grounded responses:
Key Takeaways:
- The context sandwich structure works: System instruction → Context → Query
- Explicit rules prevent hallucination: "ONLY use the provided context" must be clear
- Fallback instructions are essential: Define what to do when context is insufficient
- Low temperature improves accuracy: Use 0.0-0.3 for factual RAG applications
- Handle edge cases explicitly: No context, partial info, contradictions, out-of-scope
- Iterate based on real usage: Test with diverse queries and refine
Next Steps
With our context retrieved and our prompt constructed, it's time to call the LLM and generate a response. In the next lesson, we'll cover The Generation Phase—making API calls to Gemini and implementing streaming for real-time responses.
"A well-crafted prompt is worth a thousand parameters." — Unknown

