Prompt to Get Shorter Responses From Any Model

Ask a language model what time zone to use for a meeting and you will likely receive two or three paragraphs. The first acknowledges your question. The second lists several time zone options with explanations of each. The third suggests you verify with participants. You asked a question with a one-sentence answer and received 200 words.

This is not an occasional quirk. It is the default behavior of most major instruction-following models, and much of it has a specific cause. During the reinforcement learning from human feedback (RLHF) phase of training, raters tend to reward longer responses on helpfulness dimensions. A 400-word answer to a simple question looks more thorough than a 40-word answer. Thoroughness is rewarded, so the model learns to be thorough, and the result is consistent verbosity.

The good news is that this behavior is easy to override at the prompt level. The model's default is to be verbose, but its instruction-following capability is strong. Clear, specific instructions about length and format produce shorter responses reliably. The challenge is writing those instructions precisely enough to override the verbosity default without causing the model to omit important information.

Why "Be Brief" Does Not Work

The most common approach to getting shorter responses is adding a phrase like "be concise" or "keep it brief" to the prompt. This rarely works consistently, and there are specific reasons why.

"Be concise" is a preference statement. The model reads it as one of several objectives to balance: be helpful, be accurate, be safe, be concise. When conciseness conflicts with perceived helpfulness, which it often does because training rewarded length, helpfulness wins.

"Be concise" also lacks specificity. What counts as concise? A one-sentence answer? A single paragraph? Three bullet points? Without a concrete target, the model applies a vague sense of conciseness that often means only about 20-30% shorter than its default, which is still verbose.

A vague preference for brevity applied against a strong trained preference for length produces slightly shorter verbose responses, not actually brief ones. You need specific rules that name the specific patterns that create unnecessary length.

The Prompt

## RESPONSE LENGTH RULES

You are a direct, efficient assistant. Apply these rules to every response:

### WHAT TO REMOVE
Remove the following patterns completely. They add length without adding value:

1. Opening affirmations: Never begin a response with "Sure", "Certainly", "Great question", "Of course", "Absolutely", "Happy to help", or any variant.
2. Question echo: Do not restate the question in your answer.
3. Closing pleasantries: Do not end responses with "I hope this helps", "Let me know if you need more", "Feel free to ask follow-up questions", or similar.
4. Closing summaries: Do not summarize a response at the end. If the content requires a summary, it should be structured as a header section, not a paragraph that says "In summary..."
5. Padding phrases: Never use the following phrases: "It's worth noting", "It's important to remember", "Keep in mind that", "One thing to consider", "Importantly", "I should mention", "In conclusion", "To summarize", "Moving on to".
6. Unnecessary preamble: Do not explain what you are about to do before doing it. "I'll now analyze the three options" wastes tokens. Just analyze the three options.

### RESPONSE FORMAT BY REQUEST TYPE

Factual question → Answer in the first sentence. Add context or explanation afterward, only if needed.

Explanation request → Maximum 3 paragraphs unless complexity requires more. If more than 3 are needed, structure the response with headers.

List request → List format. No prose wrapping.

Comparison → Table format.

Instructions / how-to → Numbered steps. No prose between steps.

Yes/no question → Answer yes or no in the first word. One sentence of context if needed.

### WHEN TO BE LONGER
Be as long as the content requires when:
- Writing code (write complete, working code — never truncate)
- Explaining a genuinely complex topic for the first time
- The user explicitly asks for a detailed or comprehensive response

In all other cases, err toward shorter. If in doubt, write the shorter version and offer to expand.
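To check whether a model is actually following the removal rules, you can scan its responses for the listed patterns. The following is a minimal Python sketch of such a check. The function name `find_violations`, the regular expressions, and the preamble verbs are assumptions for illustration. Plain string matching will miss paraphrases, and it cannot detect question echo at all.

```python
import re
from typing import List

# Illustrative patterns based on the "WHAT TO REMOVE" rules; not exhaustive.
OPENER_RE = re.compile(
    r"^(sure|certainly|great question|of course|absolutely|happy to help)\b"
)
CLOSER_RE = re.compile(r"^(i hope this helps|let me know if|feel free to)\b")
PADDING_PHRASES = (
    "it's worth noting", "it's important to remember", "keep in mind that",
    "one thing to consider", "importantly", "i should mention",
    "in conclusion", "to summarize", "in summary", "moving on to",
)
# Hypothetical preamble pattern: "I'll now analyze...", "I'll explain...".
PREAMBLE_RE = re.compile(r"\bi'll (now )?(analyze|explain|walk you through)\b")


def find_violations(response: str) -> List[str]:
    """Return a label for each removal rule the response appears to break."""
    # Lowercase and normalize curly apostrophes so "It\u2019s" matches "it's".
    text = response.strip().lower().replace("\u2019", "'")
    violations = []
    if OPENER_RE.match(text):
        violations.append("opening affirmation")
    # Closing pleasantries are checked against the final sentence only.
    last_sentence = re.split(r"(?<=[.!?])\s+", text)[-1]
    if CLOSER_RE.match(last_sentence):
        violations.append("closing pleasantry")
    for phrase in PADDING_PHRASES:
        if phrase in text:
            violations.append(f"padding phrase: {phrase}")
    if PREAMBLE_RE.search(text):
        violations.append("unnecessary preamble")
    return violations


print(find_violations(
    "Sure! Use UTC. It's worth noting that DST varies. I hope this helps."
))
```

Running a check like this over a sample of real responses, before and after adding the rules, gives a rough and repeatable signal of whether the rules are taking effect.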

Why Format Constraints Beat Word Limits

Word limits create a specific failure mode: models track length only approximately, so a response can hit the limit mid-sentence or mid-thought and stop there. The result is a response that is short but incomplete. Format constraints avoid this problem by specifying the structure rather than the length.

A "maximum three paragraphs" instruction produces a response that is complete but structured to fit three paragraphs. A "maximum 150 words" instruction sometimes produces a response that is cut off. Format constraints also produce more consistent results across different types of questions because the model can apply a format rule regardless of topic.

| Request Type | Recommended Format Constraint | Avoid |
| --- | --- | --- |
| Factual question | "Answer in one sentence, add context only if essential" | "Keep it under 50 words" |
| How-to / process | "Numbered steps, one sentence per step" | "Be brief" |
| Comparison | "Table with one row per option" | "Keep it short" |
| Analysis / explanation | "Three paragraphs maximum, headers if more needed" | "Don't go over 300 words" |
| Recommendation | "State recommendation first, then 2-3 sentence rationale" | "Be concise" |
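When your application already knows the request type, or classifies it first, the recommended column of the table can be applied as a per-request lookup instead of leaving the model to choose a format. A minimal Python sketch follows. The category keys, the `with_format` helper, and the choice to append the constraint to the user message rather than the system prompt are all assumptions.

```python
from typing import Dict

# The "Recommended Format Constraint" column from the table.
# The request-type keys are hypothetical labels for this sketch.
FORMAT_CONSTRAINTS: Dict[str, str] = {
    "factual": "Answer in one sentence, add context only if essential.",
    "how_to": "Numbered steps, one sentence per step.",
    "comparison": "Table with one row per option.",
    "explanation": "Three paragraphs maximum, headers if more needed.",
    "recommendation": "State recommendation first, then 2-3 sentence rationale.",
}


def with_format(user_message: str, request_type: str) -> str:
    """Append the matching format constraint; unknown types pass through unchanged."""
    constraint = FORMAT_CONSTRAINTS.get(request_type)
    if constraint is None:
        return user_message
    return f"{user_message}\n\nFormat: {constraint}"


print(with_format("Compare Postgres and SQLite for a small web app.", "comparison"))
```

Unknown request types fall back to the general rules in the system prompt, so the lookup never forces a format onto requests it was not designed for.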

Applying This on Different Models

Claude (Anthropic), GPT-4 (OpenAI), and open-weight models such as Mistral and Llama respond to these instructions differently, and the differences determine which parts of the prompt matter most.

Claude has a strong tendency to use full paragraphs and complete prose. It responds well to format specifications ("use a table", "numbered list") and to explicit removal of specific patterns. The opening-affirmations rule is particularly effective on Claude, which defaults to affirming almost every request before answering it.

GPT-4 responds well to persona framing combined with brevity rules. Opening with "You are a direct, efficient assistant" before the rules, as the prompt above does, tends to make them more effective. GPT-4 is also more responsive to word limits than Claude, which makes a hybrid approach (format constraint plus soft word limit) effective for GPT-4.

Mistral and Llama-based models have less consistent instruction-following but respond well to the format-by-request-type table approach. Specifying the exact format for the most common request types gives these models a clear pattern to match.

Every model has been trained on human text where length signals effort and completeness. Overriding this requires explicit instruction that directly contradicts the length-as-quality heuristic. Vague brevity requests push against training. Specific format rules provide a concrete alternative pattern to follow.
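The per-model advice can be folded into how the system prompt is assembled. A minimal Python sketch, assuming hypothetical family labels ("gpt-4", "claude", and so on) that you map from your real model identifiers. The 150-word figure is an arbitrary illustrative target, not a tested recommendation. The base rules already open with the persona line, so the GPT-4 branch only adds the soft word limit.

```python
# Placeholder: paste the full RESPONSE LENGTH RULES prompt here.
BASE_RULES = "## RESPONSE LENGTH RULES\n(full rules from the prompt above)"

# Illustrative soft limit; 150 words is an arbitrary example value.
SOFT_WORD_LIMIT = (
    "For conversational answers, aim for roughly 150 words or fewer. "
    "This is a soft target: finish your thought rather than stopping at the limit."
)


def system_prompt_for(model_family: str, base_rules: str = BASE_RULES) -> str:
    """Assemble a system prompt, adding a soft word limit only for GPT-4-style models."""
    if model_family == "gpt-4":
        # Hybrid approach: the format rules plus a soft word limit.
        return f"{base_rules}\n\n{SOFT_WORD_LIMIT}"
    # Claude, Mistral, and Llama-based models: rely on the format rules alone,
    # since the format-by-request-type section gives them a pattern to match.
    return base_rules


print(system_prompt_for("gpt-4"))
```

Wording the limit as a soft target is a deliberate choice: it asks the model to finish the thought, which avoids the mid-sentence cutoff that hard word limits invite.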

Handling the Main Failure Mode

The main failure mode when applying brevity rules is the over-short response: the model produces a correct but incomplete answer, having interpreted brevity as minimalism. This happens most often when the brevity instruction is too strong and too general.

The prompt above addresses this with the "When to be longer" section. Making it explicit that code should never be truncated and that complex topics warrant length prevents the model from cutting content that matters. The instruction to "offer to expand" rather than expand by default also helps: it preserves brevity while not locking the user out of more detail.

If the over-short failure mode appears in your specific use case, add an explicit exception: "For [specific topic or request type], provide full detail regardless of the general brevity rules." Domain-specific exceptions outperform general qualifiers like "as long as necessary" because the model can apply them precisely.
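Domain-specific exceptions can be generated directly from that sentence template. A minimal Python sketch. The `with_exceptions` helper, the `### EXCEPTIONS` heading, and the example topics are hypothetical.

```python
from typing import Iterable

EXCEPTION_TEMPLATE = (
    "For {topic}, provide full detail regardless of the general brevity rules."
)


def with_exceptions(base_rules: str, topics: Iterable[str]) -> str:
    """Append one explicit exception per topic, under its own heading."""
    lines = [f"- {EXCEPTION_TEMPLATE.format(topic=topic)}" for topic in topics]
    if not lines:
        return base_rules
    return base_rules + "\n\n### EXCEPTIONS\n" + "\n".join(lines)


prompt = with_exceptions(
    "## RESPONSE LENGTH RULES\n(full rules from the prompt above)",
    ["database migration steps", "security configuration changes"],  # hypothetical
)
print(prompt)
```

Each exception is listed separately, so the model gets a concrete, matchable rule per domain instead of one general qualifier.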

Using This in Combination With Other Prompt Patterns

This brevity prompt combines directly with the token optimization patterns in Prompt to Reduce Token Usage Without Losing Quality. The two prompts target the same waste from different angles: the token optimization prompt restructures system prompts and compresses instructions, while this brevity prompt controls output token volume per response.

In agentic systems, apply brevity rules to intermediate reasoning steps as well as final outputs. An agent that writes verbose internal observations before each tool call wastes significant tokens over a long run. Adding "Keep reasoning steps to one sentence per point" to the agent's thought format instruction can reduce reasoning token usage by 40-60%.
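The one-sentence rule for agent reasoning is also easy to monitor. A minimal Python sketch, assuming your agent framework exposes each reasoning point as a string. The `THOUGHT_FORMAT` wording and the `overlong_points` helper are hypothetical, and the naive sentence splitter will miscount abbreviations such as "e.g.".

```python
import re
from typing import List

# Hypothetical thought-format instruction for the agent's system prompt.
THOUGHT_FORMAT = (
    "Before each tool call, list your reasoning as short points. "
    "Keep reasoning steps to one sentence per point."
)


def sentence_count(text: str) -> int:
    """Rough count: split after '.', '!' or '?' followed by whitespace."""
    return len([s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s])


def overlong_points(points: List[str], max_sentences: int = 1) -> List[int]:
    """Return the indices of reasoning points that exceed the sentence budget."""
    return [i for i, p in enumerate(points) if sentence_count(p) > max_sentences]


points = [
    "Search the docs for the rate limit.",
    "The first result looks relevant. It mentions quotas. I will open it next.",
]
print(overlong_points(points))  # → [1]
```

Logging flagged points over a long run shows whether the instruction is holding before you spend effort tightening its wording.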

For eliminating specific patterns like disclaimers and hedges, which are a subset of verbosity, see Prompt to Stop AI Adding Unnecessary Disclaimers.

Frequently Asked Questions

Why do models give such long responses by default?

During training, human raters tend to score longer responses higher on helpfulness dimensions, even when a shorter response would be equally or more useful. Models learn that length signals effort and completeness. This is a training artifact, not a deliberate design choice.

Will shorter responses miss important information?

Only if you are cutting information-bearing content. Verbose responses contain both information and padding. Padding includes affirmations, restatements of the question, transitional phrases, closing pleasantries, and summaries that repeat what was just said. Removing padding leaves the information intact.
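To illustrate that removing padding leaves the information intact, here is a minimal Python post-processing sketch. It drops only sentences that consist entirely of an affirmation or a closing pleasantry. The patterns and the `strip_padding` name are assumptions. Post-processing cleans up the text but does not save output tokens, because the model has already generated them; prompt-level rules do both.

```python
import re

# Only whole sentences that carry no information are removed.
# Illustrative patterns; not exhaustive.
PADDING_SENTENCE = re.compile(
    r"^(?:sure|certainly|of course|absolutely|great question|happy to help)[.!]?$"
    r"|^(?:i hope this helps|let me know if|feel free to)\b",
    re.IGNORECASE,
)


def strip_padding(response: str) -> str:
    """Drop padding sentences and keep every other sentence verbatim."""
    sentences = re.split(r"(?<=[.!?])\s+", response.strip())
    return " ".join(s for s in sentences if not PADDING_SENTENCE.match(s))


print(strip_padding(
    "Sure! Use UTC for the meeting. Both offices already convert from it. "
    "I hope this helps!"
))
# → Use UTC for the meeting. Both offices already convert from it.
```

A sentence like "Sure, use UTC." is kept, because the affirmation pattern only matches a sentence that contains nothing else.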

Can I specify an exact word or sentence count?

Yes, and it often works well for structured outputs. For conversational responses, a maximum word count can produce artificially truncated answers. A better approach is to specify the format (one paragraph, three bullet points, a table) rather than a word count, because a format constraint tells the model how to shape a complete answer, while a word limit only tells it where to stop.

Does this work for code generation?

Partially. For code, do not constrain output length; constrain the explanatory text around the code instead. The prompt's "When to be longer" section exempts code from the brevity rules (write complete, working code, never truncate), while the removal rules still apply to the surrounding prose.

What is the difference between this approach and just saying 'be brief'?

"Be brief" is a soft preference that the model weighs against other objectives and often discards. This prompt specifies exactly what brevity means, enumerates the specific patterns that create unnecessary length, and frames the instructions as rules rather than preferences.
