Prompt Engineering for Grounding
Introduction
We've retrieved relevant context from our knowledge base. Now comes a critical step: constructing prompts that guide the LLM to use this context effectively. This is prompt engineering for grounding—the art of ensuring the LLM's response is based on the provided context rather than its training knowledge.
Poor prompting leads to hallucination, off-topic responses, and unreliable answers. Good prompting produces accurate, well-sourced responses that users can trust.
The Anatomy of a Grounded Prompt
Three Essential Components
Every grounded prompt needs:
- System Instruction: Establishes the LLM's role, rules, and constraints
- Context: The retrieved documents the LLM should use
- User Query: The question to answer
The Context Sandwich
A reliable pattern is the "context sandwich"—placing retrieved context between clear instructions:
┌─────────────────────────────────────────────────┐
│ SYSTEM INSTRUCTION │
│ (Role, rules, constraints) │
├─────────────────────────────────────────────────┤
│ RETRIEVED CONTEXT │
│ (Documents from knowledge base) │
├─────────────────────────────────────────────────┤
│ USER QUERY │
│ (The question to answer) │
└─────────────────────────────────────────────────┘
This structure:
- Establishes expectations before presenting context
- Clearly delineates context from the question
- Reinforces instructions after context injection
Designing System Instructions
The Foundation: Role and Rules
The system instruction defines the LLM's behavior:
const systemInstruction = `You are a helpful documentation assistant for our software product. Your role is to answer questions using ONLY the information provided in the context below.
RULES:
1. Base your answers ONLY on the provided context
2. If the context doesn't contain the answer, say "I don't have information about that in the documentation"
3. Never make up information or use your general knowledge
4. Cite the source when providing information
5. Be concise but thorough`;
Grounding Rules in Detail
Let's examine each rule type:
Exclusivity Rule:
"Answer ONLY using the provided context"
Tells the LLM to ignore its training knowledge. Critical for preventing hallucination.
Fallback Rule:
"If the context doesn't contain the answer, say..."
Provides a safe default when context is insufficient. Without this, the LLM may guess.
No Fabrication Rule:
"Never make up information or use your general knowledge"
Explicit prohibition reinforces the exclusivity rule.
Citation Rule:
"Cite the source when providing information"
Encourages attribution, making responses verifiable.
Comprehensive System Instruction Example
const systemInstruction = `You are a knowledgeable documentation assistant for TechCorp's developer platform. Your purpose is to help developers understand our APIs, SDKs, and best practices.
IMPORTANT RULES:
1. Answer questions using ONLY the documentation provided in the context section
2. If the documentation doesn't contain enough information to answer fully, acknowledge what you can answer and what you cannot
3. Never invent features, endpoints, or capabilities that aren't in the documentation
4. When citing information, reference the source document (e.g., "According to the Authentication Guide...")
5. If you're unsure, say so rather than guessing
RESPONSE FORMAT:
- Be direct and clear
- Use code examples when relevant
- Structure complex answers with bullet points or numbered steps
- End with a brief summary for longer answers
CONTEXT BELOW:
---`;
Adapting Tone and Persona
The system instruction can shape the LLM's personality:
Technical/Professional:
"You are a senior software architect. Provide detailed, technically accurate responses suitable for experienced developers."
Friendly/Supportive:
"You are a helpful assistant. Explain concepts clearly and encouragingly, suitable for developers learning our platform."
Concise/Efficient:
"You are a quick-reference assistant. Provide brief, actionable answers. Avoid lengthy explanations unless asked."
Structuring the Prompt
The Complete Prompt Structure
function buildPrompt(
  systemInstruction: string,
  context: string,
  userQuery: string
): string {
  return `${systemInstruction}

CONTEXT:
${context}

---

USER QUESTION: ${userQuery}

ASSISTANT:`;
}
Example Complete Prompt
You are a documentation assistant for our software product. Answer questions using ONLY the provided context. If the context doesn't contain the answer, say "I don't have information about that in the documentation."
CONTEXT:
[Document 1]
Source: authentication.md
Title: Setting Up Authentication
To authenticate API requests, include your API key in the Authorization header:
Authorization: Bearer YOUR_API_KEY
API keys can be generated in your dashboard under Settings > API Keys.
---
[Document 2]
Source: rate-limits.md
Title: Rate Limiting
All API endpoints are rate-limited to 100 requests per minute per API key. Exceeding this limit will result in a 429 Too Many Requests response.
---
USER QUESTION: How do I authenticate my API requests?
ASSISTANT:
Context Formatting Best Practices
Clear Delineation: Use visual separators (---, ===, or markdown headers) between documents.
Source Attribution: Include source information for each chunk so the LLM can cite it.
Relevance Ordering: Place most relevant documents first—LLMs tend to weight early content more heavily.
Token Budget Awareness: Don't exceed the model's context window. For Gemini 1.5 Flash this is generous (up to 1M tokens), but longer context still increases cost and latency, so include only what the answer needs. A sketch of a formatter applying these practices follows below.
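A minimal sketch of such a formatter, assuming each retrieved chunk carries text, source, title, and a similarity score (the shape of your retrieval output may differ):

interface Chunk {
  text: string;
  source: string;
  title: string;
  score: number; // similarity to the query, higher is better
}

function formatContext(chunks: Chunk[]): string {
  return [...chunks]
    .sort((a, b) => b.score - a.score) // most relevant documents first
    .map((c, i) =>
      `[Document ${i + 1}]\nSource: ${c.source}\nTitle: ${c.title}\n\n${c.text}`
    )
    .join('\n\n---\n\n'); // clear visual separator between documents
}

This produces the same [Document N] / Source / Title layout used in the example prompt above.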
Generation Parameters
Temperature Control
Temperature controls randomness in the LLM's output:
| Temperature | Effect | Use Case |
|---|---|---|
| 0.0 | Deterministic, most likely tokens | Factual Q&A, coding |
| 0.3 - 0.5 | Slightly varied, still focused | Documentation, explanations |
| 0.7 - 0.9 | Creative, more varied | Creative writing, brainstorming |
| 1.0+ | Highly random | Experimental uses |
For RAG applications, use low temperature (0.0 - 0.3). We want consistent, factual responses based on context, not creative interpretations.
Top-K and Top-P
Top-K: Limits the model to considering only the K most likely next tokens.
Top-P (nucleus sampling): Considers tokens whose cumulative probability exceeds P.
For grounded responses:
- Top-K: 40 (reasonable default)
- Top-P: 0.95 (allows some variation while staying focused)
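To make nucleus sampling concrete, here is an illustrative (not production) sketch of how a top-p cutoff narrows the candidate token set; the tokens and probabilities are invented for the example:

function nucleusSet(
  candidates: { token: string; p: number }[],
  topP: number
): string[] {
  const sorted = [...candidates].sort((a, b) => b.p - a.p);
  const kept: string[] = [];
  let cumulative = 0;
  for (const { token, p } of sorted) {
    kept.push(token);
    cumulative += p;
    if (cumulative >= topP) break; // stop once cumulative probability reaches topP
  }
  return kept;
}

// With topP = 0.95, the low-probability tail is excluded:
nucleusSet(
  [
    { token: 'key', p: 0.6 },
    { token: 'token', p: 0.3 },
    { token: 'header', p: 0.07 },
    { token: 'banana', p: 0.03 }
  ],
  0.95
); // => ['key', 'token', 'header']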
Recommended Configuration
const generationConfig = {
  temperature: 0.1,       // Low for factual accuracy
  topK: 40,               // Reasonable diversity
  topP: 0.95,             // Nucleus sampling
  maxOutputTokens: 1024   // Reasonable response length
};
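As a minimal sketch of where this configuration plugs in, assuming the official @google/generative-ai Node SDK (we cover the generation phase properly in the next lesson):

import { GoogleGenerativeAI } from '@google/generative-ai';

const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY!);
const model = genAI.getGenerativeModel({
  model: 'gemini-1.5-flash',
  generationConfig // the settings defined above
});

// Later, when generating a grounded answer:
const result = await model.generateContent(prompt);
console.log(result.response.text());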
When to Adjust Parameters
Increase temperature (to 0.3-0.5) when:
- Responses are too repetitive
- You want more natural-sounding language
- The task is exploratory
Decrease temperature (to 0.0) when:
- Accuracy is critical
- You need reproducible responses
- The task is factual/technical
Handling Edge Cases
No Relevant Context
When retrieval returns nothing, the prompt should reflect this:
function buildPromptWithFallback(
  systemInstruction: string,
  context: string | null,
  userQuery: string
): string {
  if (!context) {
    return `${systemInstruction}

NOTE: No relevant documentation was found for this query.

USER QUESTION: ${userQuery}

ASSISTANT: I don't have information about that in the documentation. `;
  }
  return buildPrompt(systemInstruction, context, userQuery);
}
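Deciding that retrieval "returned nothing" usually means thresholding on similarity scores rather than checking for an empty array. A sketch, assuming chunks carry a score; the 0.7 cutoff is an assumption to tune against your embedding model:

interface RetrievedChunk {
  text: string;
  source: string;
  score: number; // similarity to the query, higher is better
}

const MIN_SCORE = 0.7; // assumed cutoff; tune for your embedding model

function selectContext(chunks: RetrievedChunk[]): string | null {
  const relevant = chunks.filter((c) => c.score >= MIN_SCORE);
  if (relevant.length === 0) return null; // triggers the fallback prompt above
  return relevant
    .map((c) => `Source: ${c.source}\n${c.text}`)
    .join('\n---\n');
}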
Partial Information
Sometimes context partially answers the question:
const systemInstruction = `...
HANDLING PARTIAL INFORMATION:
- If you can answer part of the question, do so and clearly state what you cannot answer
- For example: "Based on the documentation, [answer partial info]. However, I don't have information about [missing part]."
`;
Contradictory Context
Occasionally, retrieved chunks may seem contradictory (e.g., old and new documentation):
const systemInstruction = `...
HANDLING CONFLICTING INFORMATION:
- If documents seem to contradict each other, acknowledge this
- Prefer information from the source that appears more recent or specific
- Example: "There may be different versions of this feature. According to [newer source], ..."
`;
Out-of-Scope Questions
Users might ask questions outside your knowledge base:
const systemInstruction = `...
SCOPE:
This assistant answers questions about TechCorp's developer platform. For other topics (general programming, other products), politely redirect the user.
Example response: "I'm specialized in TechCorp's developer documentation. For general programming questions, I'd recommend resources like Stack Overflow or MDN."
`;
Prompt Templates
Reusable Prompt Builder
// lib/prompts.ts

interface PromptConfig {
  productName: string;
  tone: 'professional' | 'friendly' | 'concise';
  allowPartialAnswers: boolean;
}

function createSystemInstruction(config: PromptConfig): string {
  const toneInstructions = {
    professional: 'Provide detailed, technically accurate responses.',
    friendly: 'Be helpful and encouraging. Explain concepts clearly.',
    concise: 'Be brief and direct. Provide actionable answers.'
  };

  const partialAnswerRule = config.allowPartialAnswers
    ? 'If you can partially answer, do so and indicate what information is missing.'
    : 'Only provide complete answers. If information is incomplete, say so.';

  return `You are a documentation assistant for ${config.productName}.
${toneInstructions[config.tone]}

RULES:
1. Answer using ONLY the provided context
2. If the context lacks information, say "I don't have documentation about that"
3. Never fabricate information
4. ${partialAnswerRule}
5. Cite sources when possible

CONTEXT BELOW:`;
}

function buildFullPrompt(
  config: PromptConfig,
  context: string,
  query: string
): string {
  const systemInstruction = createSystemInstruction(config);
  return `${systemInstruction}

${context}

---

USER QUESTION: ${query}`;
}

// Usage
const prompt = buildFullPrompt(
  {
    productName: 'TechCorp Developer Platform',
    tone: 'professional',
    allowPartialAnswers: true
  },
  formattedContext,
  userQuery
);
Testing and Iteration
Evaluating Prompt Quality
Test your prompts with diverse queries; a small test-harness sketch follows these categories:
Factual Questions:
- Can it answer questions directly stated in context?
- Does it cite sources?
Questions Without Context:
- Does it correctly say "I don't know"?
- Does it avoid hallucinating?
Ambiguous Questions:
- Does it handle partial information well?
- Does it ask for clarification when appropriate?
Out-of-Scope Questions:
- Does it stay within its defined role?
- Does it redirect appropriately?
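These categories translate naturally into a small regression suite. A sketch, reusing systemInstruction and buildPromptWithFallback from earlier in this lesson; the askLLM parameter is a hypothetical stand-in for your generation call, and the checks are illustrative:

interface PromptTestCase {
  name: string;
  query: string;
  context: string | null;
  check: (answer: string) => boolean;
}

const cases: PromptTestCase[] = [
  {
    name: 'factual question answered from context',
    query: 'How do I authenticate my API requests?',
    context: 'Source: authentication.md\nInclude your API key in the Authorization header.',
    check: (a) => a.includes('Authorization')
  },
  {
    name: 'missing context triggers the fallback',
    query: 'What is the enterprise tier pricing?',
    context: null,
    check: (a) => a.toLowerCase().includes("don't have information")
  }
];

async function runPromptTests(
  askLLM: (prompt: string) => Promise<string>
): Promise<void> {
  for (const c of cases) {
    const prompt = buildPromptWithFallback(systemInstruction, c.context, c.query);
    const answer = await askLLM(prompt);
    console.log(`${c.check(answer) ? 'PASS' : 'FAIL'} - ${c.name}`);
  }
}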
Iterative Improvement
- Start simple: Basic system instruction + context + query
- Test with real queries: Collect common user questions
- Identify failures: Where does it hallucinate? Where does it refuse to answer?
- Refine instructions: Add rules to address specific failure modes
- Repeat: Prompting is iterative—expect multiple rounds of refinement
Summary
In this lesson, we explored prompt engineering for grounded responses:
Key Takeaways:
- The context sandwich structure works: System instruction → Context → Query
- Explicit rules prevent hallucination: "ONLY use the provided context" must be clear
- Fallback instructions are essential: Define what to do when context is insufficient
- Low temperature improves accuracy: Use 0.0-0.3 for factual RAG applications
- Handle edge cases explicitly: No context, partial info, contradictions, out-of-scope
- Iterate based on real usage: Test with diverse queries and refine
Next Steps
With our context retrieved and our prompt constructed, it's time to call the LLM and generate a response. In the next lesson, we'll cover The Generation Phase—making API calls to Gemini and implementing streaming for real-time responses.
"A well-crafted prompt is worth a thousand parameters." — Unknown

