Chain-of-Thought vs. Chain-of-Draft: Balancing Accuracy and Efficiency in Prompting

Abstract

Large Language Models can solve complex problems more effectively when guided by structured prompts. Two notable strategies are Chain of Thought (CoT) prompting – which leads models through detailed step-by-step reasoning – and the newer Chain of Draft (CoD) prompting – which emphasizes concise intermediate drafts. This paper provides a qualitative comparison of CoT and CoD, analyzing how each approach impacts reasoning accuracy and token efficiency. Through illustrative examples and case studies, we examine their similarities and differences, highlighting that CoD can match the accuracy of CoT while using only a fraction of the tokens ([2502.18600] Chain of Draft: Thinking Faster by Writing Less). We discuss the strengths and weaknesses of both techniques and propose an optimized prompting paradigm that retains high accuracy with significantly reduced token consumption. Finally, we offer recommendations for applying these insights across diverse AI tasks to achieve fast, cost-effective, and reliable model performance.

Introduction

Prompt engineering techniques have become critical for steering large language models (LLMs) to produce accurate and relevant outputs. A prominent method, Chain-of-Thought (CoT) prompting, asks models to generate an explicit sequence of reasoning steps leading to an answer. Prior work has shown that CoT prompting can significantly improve LLM performance on complex reasoning tasks ([2201.11903] Chain-of-Thought Prompting Elicits Reasoning in Large Language Models) such as arithmetic word problems, commonsense reasoning, and logic puzzles. By having the model “think aloud” through intermediate steps, CoT helps break down complex problems and often yields more correct solutions than direct answers. However, a clear drawback of CoT is its verbosity – the model’s step-by-step explanations can be lengthy, consuming many tokens ([2502.18600] Chain of Draft: Thinking Faster by Writing Less). In practical deployments (e.g., chatbots, mobile assistants), this high token usage can translate to increased latency and cost, as well as the risk of hitting context length limits.

Recently, researchers have proposed the Chain-of-Draft (CoD) prompting technique to address the efficiency limitations of CoT ([2502.18600] Chain of Draft: Thinking Faster by Writing Less). Inspired by how humans might scribble quick notes or drafts when solving a problem, CoD guides the model to produce minimalistic intermediate reasoning steps instead of detailed explanations. The goal is to preserve the logical progression of CoT while drastically cutting down the amount of text generated at each step. Initial studies indicate that CoD can match or even surpass CoT’s accuracy with dramatically fewer tokens ([2502.18600] Chain of Draft: Thinking Faster by Writing Less). By reducing verbosity to only the essential information, CoD achieves comparable results using as little as 7.6% of the tokens required by CoT in some tasks ([2502.18600] Chain of Draft: Thinking Faster by Writing Less). This balance of accuracy and efficiency is extremely appealing for real-world AI applications where response speed and API usage costs are important considerations.

In this paper, we qualitatively compare CoT and CoD prompting, with a focus on how each method balances reasoning thoroughness against token economy. We outline a framework for evaluating prompting techniques on two key metrics: (1) Accuracy – the model’s success rate or correctness on tasks, and (2) Efficiency – the computational cost measured in tokens generated (which correlates with runtime and expense). Through this lens, we analyze the similarities and differences between the verbose reasoning of CoT and the terse drafting of CoD. We then present case studies applying both methods to various reasoning problems to illustrate their performance. Our results, drawn from previously published experiments and illustrative examples, demonstrate that CoD often maintains accuracy within a few percentage points of CoT while using far fewer tokens (What is Chain of Draft?: The End of Chain-of-Thought? – Ai505) (Chain of Draft: Thinking Faster by Writing Less – DEV Community). We discuss when the extra detail of CoT is (or isn’t) worth the cost, and highlight scenarios where CoD’s efficiency provides a clear advantage. Finally, we propose an optimized prompting approach informed by these findings and provide practical recommendations for AI practitioners to maximize accuracy per token in their systems.

Methodology

Our analysis uses a qualitative framework to compare Chain-of-Thought and Chain-of-Draft prompting. We begin by characterizing each technique and then establish criteria for comparison. The key aspects we examine are: reasoning format, token usage, and outcome accuracy. By understanding how CoT and CoD differ in guiding an LLM’s reasoning process, we can evaluate their strengths and weaknesses in various contexts.

Chain-of-Thought Prompting (CoT): CoT prompting involves instructing the model to produce a detailed, stepwise explanation before giving a final answer ([2201.11903] Chain-of-Thought Prompting Elicits Reasoning in Large Language Models). This can be done by including examples of thought processes in the prompt or by simply urging the model to “think step by step.” The CoT approach effectively serves as an explicit reasoning scratchpad – the model writes down each inference or calculation it makes. For example, given a question “If Alice has 5 apples and buys 7 more, how many apples does she have?”, a CoT prompt might lead the model to respond with a reasoning chain: “Alice starts with 5. She buys 7 more, so add 5 + 7 = 12. Therefore, Alice has 12 apples.” followed by the final answer “12”. This step-by-step narration helps ensure the model handles each part of the problem correctly. In terms of our criteria, CoT provides maximal transparency and reasoning detail, often resulting in high accuracy on complex problems ([2201.11903] Chain-of-Thought Prompting Elicits Reasoning in Large Language Models). However, the verbose nature of CoT means it uses a lot of tokens – in the above trivial example, the model expended many words to eventually output “12.” In more complex scenarios, CoT solutions can span multiple sentences or even paragraphs. We will later quantify this cost in experiments, but qualitatively it’s clear that CoT trades efficiency for clarity and thoroughness.
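To make the CoT setup concrete, the following Python sketch shows how such a prompt might be assembled for the apple example. The instruction wording and the call_model placeholder are illustrative assumptions, not a fixed template or a specific vendor API.

```python
# Minimal sketch of a zero-shot Chain-of-Thought prompt (wording is an assumption).

def build_cot_prompt(question: str) -> str:
    """Ask the model to write out its reasoning before giving the final answer."""
    return (
        "Think step by step and explain each step of your reasoning. "
        "State the final answer on the last line.\n\n"
        f"Question: {question}"
    )


def call_model(prompt: str) -> str:
    """Hypothetical stand-in for an actual LLM API call (OpenAI, Anthropic, etc.)."""
    raise NotImplementedError("plug in your LLM client here")


question = "If Alice has 5 apples and buys 7 more, how many apples does she have?"
print(build_cot_prompt(question))
# A CoT response would contain the full reasoning chain, e.g.
# "Alice starts with 5. She buys 7 more, so 5 + 7 = 12. Therefore ... 12"
```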

Chain-of-Draft Prompting (CoD): CoD prompting modifies the approach by constraining the model to produce only brief “draft” thoughts for each step (What is Chain of Draft?: The End of Chain-of-Thought? – Ai505). The model still works through the problem step by step, but at each stage it is instructed to output only the essential information (for instance, a minimal equation or conclusion) rather than a full explanation. A typical CoD instruction is: “Think step by step, but only keep a minimum draft for each thinking step, with 5 words at most. Return the answer at the end of the response after a separator ####.” (What is Chain of Draft?: The End of Chain-of-Thought? – Ai505). This prompt tells the model to internally reason stepwise (like CoT) but externally write down only a very compact note of each step (no more than five words), and then provide the final answer after a clear separator. For the same apple problem above, a CoD-style reasoning might look like: “5 + 7 = 12” as the concise draft (instead of a lengthy sentence) and then, after a separator, the final answer “12.” In practice, the model’s output might be formatted as:

5+7=12
####
12

This indicates the intermediate draft calculation 5+7=12, followed by the answer “12” after the separator ####. By design, CoD drastically curtails the token usage at each reasoning step – only critical numbers or concepts are output – while still preserving the logical structure of a multi-step solution (What is Chain of Draft?: The End of Chain-of-Thought? – Ai505). The CoD methodology often involves a three-stage process: (1) an initial sketch answer or formula, (2) iterative refinement where the model can correct or improve the sketch (still in minimal words), and (3) a final polished answer after the reasoning drafts (What is Chain of Draft?: The End of Chain-of-Thought? – Ai505). In many implementations, this happens within a single prompt/response: the model effectively “thinks faster by writing less,” refining its reasoning internally or in brief notations, and then outputting the answer. Our qualitative criteria for CoD expect it to greatly reduce token count per solution while aiming to preserve accuracy close to CoT levels. There is an inherent tension here: by limiting each step to a few words, the model must omit explanatory detail, which could risk losing context or making leaps. CoD’s success hinges on whether the model can still carry the necessary information forward internally without explicitly writing it out. We will explore this through examples and discuss how well it works across different tasks.
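Continuing the same illustrative Python setup, the sketch below builds a CoD prompt from the instruction quoted above and shows one way the #### separator could be parsed. The parsing helper and the example output string are assumptions about the response shape, not a specification.

```python
# Sketch of a Chain-of-Draft prompt plus a parser for its "drafts #### answer" shape.
# The instruction text mirrors the CoD prompt quoted above; the parser is an assumption.

COD_INSTRUCTION = (
    "Think step by step, but only keep a minimum draft for each thinking step, "
    "with 5 words at most. Return the answer at the end of the response after "
    "a separator ####."
)


def build_cod_prompt(question: str) -> str:
    return f"{COD_INSTRUCTION}\n\nQuestion: {question}"


def parse_cod_output(text: str) -> tuple[list[str], str]:
    """Split a CoD response into its draft lines and the final answer."""
    drafts_part, _, answer_part = text.partition("####")
    drafts = [line.strip() for line in drafts_part.splitlines() if line.strip()]
    return drafts, answer_part.strip()


example_output = "5+7=12\n####\n12"  # shape of the response shown above
drafts, answer = parse_cod_output(example_output)
print(drafts, answer)  # -> ['5+7=12'] 12
```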

Evaluation Criteria: We compare CoT and CoD on their accuracy and efficiency in solving problems. Accuracy is judged by whether the final answer is correct (or the success rate on a set of questions). Efficiency is assessed by counting the number of tokens (words or subword pieces) in the model’s output, which correlates with the computational expense. In addition, we qualitatively consider the clarity of the reasoning (CoT provides human-readable reasoning chains, whereas CoD provides minimal insight per step but still some trace of logic) and the ease of use in different applications. For example, an interactive assistant might benefit from CoD’s brevity, whereas an educational tool might prefer CoT’s detailed explanations for the user’s benefit. Our methodology does not rely on heavy mathematical formalisms or specific numeric metrics beyond these general observations; instead, we use case studies and existing experiment reports to illustrate how each prompting method performs. In the next section, we describe the experimental setups and example tasks used to compare CoT and CoD in practice.
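For completeness, a minimal sketch of these two criteria follows. The whitespace-based token count is only a rough proxy (a model-specific tokenizer such as tiktoken would be used for exact numbers), and the function and field names are our own.

```python
# Sketch of the two evaluation criteria: answer accuracy and token efficiency.
# Token counting is approximated by whitespace splitting; a real tokenizer
# (e.g., tiktoken) would give exact counts.

def count_tokens(text: str) -> int:
    return len(text.split())  # rough proxy for subword token count


def evaluate(outputs: list[str], answers: list[str], gold: list[str]) -> dict:
    """Score a batch of model runs on correctness and average output length."""
    n = len(gold)
    correct = sum(a.strip() == g.strip() for a, g in zip(answers, gold))
    avg_tokens = sum(count_tokens(o) for o in outputs) / n
    return {
        "accuracy": correct / n,
        "avg_output_tokens": avg_tokens,
        "accuracy_per_token": (correct / n) / avg_tokens,
    }
```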

Experiments

To investigate the practical differences between chain-of-thought and chain-of-draft prompting, we consider several representative reasoning tasks and observe how each prompting method performs. The focus is on tasks that require multi-step reasoning, where CoT is known to be beneficial, and on how CoD handles the same problems. We draw on case studies from recent literature as well as illustrative examples. The tasks include:

  • Arithmetic Word Problems (Math Reasoning): These are problems (such as the GSM8K benchmark of grade-school math word questions) that typically require multiple calculations or logical steps. For example, a problem might involve adding and subtracting various numbers given in a story. Under CoT prompting, the model is asked to work out each step (e.g., interpret the text, set up equations, perform calculations) with a detailed explanation. Under CoD prompting, the model would instead produce terse drafts of the equations or key results at each step, aiming to get to the answer with minimal text. This task evaluates whether CoD can handle iterative arithmetic reasoning without losing track of intermediate values. We use results reported on the GSM8K dataset for a quantitative reference (What is Chain of Draft?: The End of Chain-of-Thought? – Ai505).
  • Temporal / Date Reasoning: This is a commonsense reasoning task involving dates and events (for instance, “If today is Wednesday, what day will it be 5 days from now?”). Such problems require keeping track of increments or comparisons over an implicit calendar logic. With CoT prompting, a model might enumerate each step (“Wednesday plus 5 days is… Thursday (1), Friday (2), Saturday (3), Sunday (4), Monday (5) so the answer is Monday.”). With CoD, the model might condense the reasoning to short notes (“Wed + 5 days = Mon”) before giving the answer. We look at a “date understanding” task from prior work to compare how efficiently each method can reach the correct answer (What is Chain of Draft?: The End of Chain-of-Thought? – Ai505).
  • Commonsense Question (Sports Understanding): We also consider a less arithmetic and more knowledge-based example to see how the techniques fare on general reasoning. For instance, a question like “The team scored 3 goals in the first half and 2 in the second. How many goals did they score in total?” is simple addition, but a more complex example might require reasoning about a short narrative or combining facts (e.g., “If a basketball player has 5 fouls, he is disqualified from the game. John already has 4 fouls. How many more fouls before he is disqualified?”). CoT would detail: “John has 4 fouls, the limit is 5, so one more foul reaches 5, meaning he is disqualified on the next foul.” CoD might just output “4 + 1 = 5” as the draft and then answer “1”. In the experiments we reference, a “sports understanding” task was used to gauge performance; it represents scenarios requiring a mix of basic math and understanding a situation. This helps test whether CoD’s minimal drafts still capture the needed logic in a small context scenario (What is Chain of Draft?: The End of Chain-of-Thought? – Ai505).

For each task type, we apply both prompting strategies to the same underlying model and examine: (1) whether the final answers are correct (accuracy), and (2) how many tokens the model output in its reasoning + answer (efficiency). In our case studies, we rely on a state-of-the-art model (comparable to GPT-4) as the testbed, as well as observations from an alternative model (Anthropic’s Claude) to ensure generality (Chain of Draft: Thinking Faster by Writing Less – DEV Community). The CoT prompt generally consists of either an instruction like “Let’s think step by step” or providing an example reasoning sequence, prompting the model to follow suit. The CoD prompt, as described earlier, explicitly instructs the model to keep each reasoning step under a few words and to use a special separator before giving the final answer (What is Chain of Draft?: The End of Chain-of-Thought? – Ai505). By holding the model architecture and the query constant and only changing the prompting method, we can attribute differences in performance to CoT vs CoD style.
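This comparison can be sketched as a small harness, shown below. It reuses the illustrative helpers from the Methodology section (build_cot_prompt, build_cod_prompt, parse_cod_output, count_tokens) and again assumes a call_model wrapper around whichever LLM is under test; none of these names come from a specific library.

```python
# Sketch of the comparison: same model, same question, only the prompting style varies.
# Reuses the illustrative helpers sketched earlier; `call_model` wraps the LLM under test.

def compare_prompting(question: str, gold: str, call_model) -> dict:
    results = {}
    for style, builder in (("CoT", build_cot_prompt), ("CoD", build_cod_prompt)):
        output = call_model(builder(question))
        if style == "CoD":
            _, answer = parse_cod_output(output)
        else:
            answer = output.splitlines()[-1].strip()  # assume answer on the last line
        results[style] = {
            "correct": answer.strip() == gold.strip(),
            "output_tokens": count_tokens(output),
        }
    return results
```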

It’s worth noting that our experiments are qualitative in nature – we illustrate the outcomes with examples and cite reported figures for accuracy and token counts, rather than performing new large-scale quantitative evaluations. This approach is sufficient to understand broad trends: where CoD saves tokens, where it might falter, and how it compares to CoT on delivering correct answers. The next section presents the results from these comparisons, highlighting key findings on efficiency gains and any impact on accuracy.

Results

The comparative experiments on CoT and CoD prompting yield several insightful findings about accuracy and token efficiency. Below, we summarize the key results from the case studies:

  • Dramatic Reduction in Token Usage: Chain-of-Draft prompting consistently used far fewer tokens than Chain-of-Thought prompting for the same tasks. In one benchmark math dataset (GSM8K), a CoT solution from a GPT-4-level model used on average about 205 tokens, whereas the CoD solution used only about 44 tokens – roughly five times less text (What is Chain of Draft?: The End of Chain-of-Thought? – Ai505). This trend held across other tasks as well. For a date reasoning question, CoT used ~76 tokens vs CoD’s ~30 tokens (What is Chain of Draft?: The End of Chain-of-Thought? – Ai505). These examples confirm that CoD achieves its goal of conciseness: experiments report token usage reductions on the order of 80–90% in certain tasks when using CoD instead of CoT (Chain of Draft: Thinking Faster by Writing Less – DEV Community). Such a drastic decrease can directly translate to lower inference costs and faster responses.
  • Comparable Accuracy on Reasoning Tasks: Despite the large differences in output length, CoD’s final answers were mostly as accurate as CoT’s. In the arithmetic word problem test, CoT prompting achieved about 95.4% accuracy versus 91.1% with CoD (What is Chain of Draft?: The End of Chain-of-Thought? – Ai505). In the date calculation task, CoT got 90.2% correct vs 88.1% for CoD (What is Chain of Draft?: The End of Chain-of-Thought? – Ai505). These drops are relatively small, suggesting that CoD preserved the essential reasoning needed to get correct answers in the vast majority of cases. In some scenarios, CoD even slightly outperformed CoT – for a certain sports understanding task, CoD prompting yielded 98.3% accuracy, a bit higher than CoT’s 95.9% (What is Chain of Draft?: The End of Chain-of-Thought? – Ai505). This particular outcome indicates that shorter reasoning steps did not hinder the model and might have kept it more focused. Overall, the results support that CoD can maintain accuracy on par with CoT in many reasoning-heavy tasks (Chain of Draft: Thinking Faster by Writing Less – DEV Community). Any minor accuracy trade-off is often offset by the huge gains in efficiency.
  • Reduced Latency: An implicit but important consequence of fewer tokens is faster inference. Although our analysis is qualitative, reported benchmarks note significantly lower response times with CoD prompting due to the smaller output. For example, one experiment observed the model took about 4.2 seconds on average to produce a CoT answer vs only ~1.0 second for a CoD answer on the same problem (What is Chain of Draft?: The End of Chain-of-Thought? – Ai505). In latency-sensitive applications (like real-time assistants or interactive tools), this speedup is a major advantage. CoD’s ability to get to the point quickly makes it well-suited for scenarios where every second (or token) counts.
  • Cross-Model Effectiveness: The efficiency and performance improvements with CoD were observed across different LLMs, not just a single model. For instance, both an OpenAI GPT-4 model and Anthropic’s Claude 3.5 model achieved about 91% accuracy with CoD while using roughly 40 tokens per response, dramatically lower than their CoT outputs (Chain of Draft: Thinking Faster by Writing Less – DEV Community). This suggests that CoD prompting taps into a general capability of large models to reason internally with minimal outward expression, rather than being a quirk of one particular model. It is encouraging that multiple model families can benefit from the technique, implying broad applicability in AI systems.

In summary, the results demonstrate that Chain-of-Draft prompting lives up to its promise of efficiency, slashing token usage by a large factor and thereby reducing runtime, with only a minimal impact on accuracy (and in some cases no impact at all). CoT prompting still achieves slightly higher peak accuracy on very complex problems (e.g., a few extra percentage points on the hardest math questions), but the difference is small. These findings suggest that many applications could switch from CoT to CoD prompting to gain substantial efficiency benefits while still obtaining correct and high-quality answers. The table below summarizes this comparison aspect by aspect. The next section delves deeper into what these similarities and differences mean, examining the strengths and weaknesses of each approach and the potential trade-offs when choosing one over the other.

| Aspect | Chain-of-Thought (CoT) | Chain-of-Draft (CoD) |
| --- | --- | --- |
| Level of Detail | Detailed, step-by-step explanations; all intermediate steps are articulated | Concise, minimalistic notes per step; only essential information is output |
| Token Consumption | High – extensive explanations result in high token usage | Significantly lower – only key information is output |
| Answer Accuracy | Very high, especially for complex problems | Comparable, with minimal trade-offs on very complex tasks |
| Transparency | High transparency, as the entire thought process is visible | Lower transparency – internal reasoning often remains hidden |
| Speed/Latency | Slower due to the larger volume of text generated | Faster because fewer tokens are processed |
| Use Cases | Ideal when explainability and traceability are critical | Optimal for applications prioritizing efficiency, cost, and quick responses |
| Error Susceptibility | Lower risk of errors due to explicit intermediate steps | Slightly higher risk on very complex or multi-step tasks |
| Scalability | Less scalable due to high token consumption | Highly scalable – reduced resource requirements enable broader deployment |
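As a quick sanity check on the reported GSM8K figures, the arithmetic behind the headline comparison can be reproduced directly. The accuracy-per-token ratio is simply our own shorthand for the trade-off, not a metric from the cited work.

```python
# Back-of-the-envelope check on the reported GSM8K figures:
# CoT ~205 tokens at 95.4% accuracy vs. CoD ~44 tokens at 91.1% accuracy.

cot_tokens, cod_tokens = 205, 44
cot_acc, cod_acc = 0.954, 0.911

print(f"token reduction:    {1 - cod_tokens / cot_tokens:.1%}")  # ~78.5%
print(f"accuracy drop:      {cot_acc - cod_acc:.1%}")            # ~4.3 percentage points
print(f"accuracy per token: CoT {cot_acc / cot_tokens:.4f}, "
      f"CoD {cod_acc / cod_tokens:.4f}")                         # CoD roughly 4.4x higher
```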

Discussion

Our comparative analysis of CoT and CoD prompting techniques highlights a classic accuracy-efficiency trade-off, along with strategies to mitigate it. Both methods ultimately aim to help LLMs arrive at correct answers through multi-step reasoning, but they do so with different philosophies: “write everything down” vs “only write down the minimum necessary.” Here we discuss the implications of our findings, examining where each approach shines and where it may stumble. We also consider how these insights could inform an optimized prompting strategy that blends the best of both worlds.

Strengths and Weaknesses of Each Approach:

  • Chain-of-Thought (CoT) Strengths: The CoT approach excels in transparency and thoroughness. Each intermediate step is spelled out, which not only helps the model stay on track but also allows humans to follow the logic. This verbosity can act as a safeguard on complex tasks – if a conclusion is wrong, one can often pinpoint which step was incorrect. CoT has proven especially powerful on tasks that inherently require multiple steps of reasoning (mathematical derivations, logical puzzles, etc.), yielding state-of-the-art accuracy in those domains ([2201.11903] Chain-of-Thought Prompting Elicits Reasoning in Large Language Models). The detailed trace of reasoning can also be valuable for applications that require explainability or verification of the model’s thought process.
  • Chain-of-Thought (CoT) Weaknesses: The flip side of CoT’s detail is its inefficiency. Writing out every step incurs a heavy token cost and increased latency, which can be impractical in production settings (What is Chain of Draft?: The End of Chain-of-Thought? – Ai505). The verbosity might include redundant or obvious steps, and in some cases can even introduce opportunities for the model to go off track by “over-thinking.” Additionally, lengthy responses may overwhelm users or clutter the important parts of an answer in user-facing applications. CoT is also dependent on the model’s ability to not lose context over a long output; while the reasoning is explicit, the model must remember all prior details, which can be challenging if the chain is very long.
  • Chain-of-Draft (CoD) Strengths: The CoD approach’s primary advantage is exceptional efficiency – it dramatically reduces tokens and thus runtime and cost (Chain of Draft: Thinking Faster by Writing Less – DEV Community). By forcing brevity, it encourages the model to focus only on critical information, which can sometimes streamline the reasoning. Our results showed that CoD often preserves accuracy very well despite the brevity, indicating that the model can internally carry the reasoning with minimal external text. CoD’s iterative drafting process (sketch, refine, finalize) also mirrors a natural human problem-solving method, potentially making the model’s reasoning more robust: the model has a chance to correct itself in the refinement stage, but still without verbose output. Another benefit is that CoD’s outputs are succinct and to-the-point, which is useful for user-facing answers or constrained interfaces (e.g., voice assistants with limited time to speak). In scenarios where computational resources or bandwidth are limited – such as deploying an LLM on a mobile device or handling thousands of requests per second – CoD’s token savings are crucial.
  • Chain-of-Draft (CoD) Weaknesses: The aggressive brevity of CoD can be a double-edged sword. By not explicitly writing down intermediate reasoning, there is a risk that the model might omit a necessary piece of information or context, leading to errors on very complex problems. For example, if a problem requires keeping track of multiple variables or conditions, the five-word draft limit might be too sparse to capture everything, and the model could forget a detail by the time it reaches the final answer. Indeed, a noted limitation is that CoD may struggle on tasks requiring extensive context to be preserved across steps (Chain of Draft: Streamlining LLM Reasoning with Minimal Token Generation : r/artificial) – because each draft is so minimal, the model must rely heavily on its internal state to remember context, which might degrade as the number of reasoning steps grows. Additionally, CoD prompts are somewhat more complex to craft (they involve special instructions and separators), which might require careful tuning to get right. There is also a slight observed drop in accuracy on some of the hardest tasks (as seen in the GSM8K math results), suggesting that CoD is not entirely free of trade-off: in critical applications where every last percentage of accuracy matters, one might still prefer CoT. Finally, since CoD hides the reasoning details, it might be less useful in scenarios where the explanation is required or where the user needs to be convinced of the answer with a full argument.

Trade-offs and Use Cases: The choice between CoT and CoD prompting should be guided by the needs of the application. If maximum accuracy or transparency is the priority (for example, in a scientific reasoning task, legal reasoning, or debugging a solution), CoT’s detailed approach may be warranted despite the token cost. On the other hand, if efficiency, speed, or cost are paramount (for instance, responding to a user query in real time, or operating within a limited API quota), CoD is highly attractive. Our analysis suggests that for many standard tasks, CoD can be used without significant loss in performance, immediately yielding benefits in scalability and response time (What is Chain of Draft?: The End of Chain-of-Thought? – Ai505). Notably, CoD makes advanced reasoning more feasible on smaller devices or for larger volumes of queries, which broadens the accessibility of complex LLM reasoning in practical deployments (What is Chain of Draft?: The End of Chain-of-Thought? – Ai505). There may be a middle ground as well: a hybrid strategy where the model initially uses CoD to get a quick, efficient draft solution, and only falls back to a CoT-style elaboration if needed (for instance, if the CoD answer has low confidence or if a user asks for an explanation) (What is Chain of Draft?: The End of Chain-of-Thought? – Ai505). Such an approach could maximize efficiency by default, but still allow detailed reasoning when necessary, combining the strengths of both techniques.
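A sketch of this hybrid fallback, again using the illustrative helpers from earlier sections, might look as follows; the confidence check is a deliberately naive placeholder that only marks where a real heuristic (e.g., answer validation or self-consistency) would go.

```python
# Sketch of the hybrid strategy: answer with CoD by default, fall back to CoT
# when the draft answer looks unreliable or the user explicitly wants an explanation.
# Reuses the illustrative prompt builders and parser sketched earlier.

def answer_question(question: str, call_model, explain: bool = False) -> str:
    output = call_model(build_cod_prompt(question))
    drafts, answer = parse_cod_output(output)
    low_confidence = not answer or not drafts  # placeholder for a real confidence check
    if explain or low_confidence:
        return call_model(build_cot_prompt(question))  # verbose reasoning on demand
    return answer
```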

Overall, the emergence of Chain-of-Draft prompting expands the prompt engineering toolkit with a powerful option for reducing verbosity. It challenges the notion that an AI model must “show all its work” to achieve high performance – apparently, with the right prompting, models can solve problems just as well with minimalistic notes as they can with verbose essays. This has important implications for the future of AI system design, suggesting that we can trim the fat from model outputs without starving the model of the reasoning process. The encouraging results with CoD also hint at a direction for model training: if models were trained or fine-tuned on more concise reasoning patterns, we might further improve their inherent efficiency and even their generalization (by focusing on essential reasoning). In the next section, we conclude with a summary of our findings and propose best-practice guidelines for employing these prompting techniques in real-world applications.

Conclusion

Balancing accuracy and efficiency is a core challenge in deploying AI language models for complex tasks. In this work, we compared the Chain-of-Thought and Chain-of-Draft prompting techniques as two different solutions to this challenge. Chain-of-Thought (CoT) prompting provides detailed, human-readable reasoning steps that can boost accuracy on hard problems, but at the cost of generating a lot of text. Chain-of-Draft (CoD) prompting, in contrast, is a recently introduced method that drastically condenses those reasoning steps into brief drafts, managing to retain high accuracy while using only a small fraction of the tokens required by CoT ([2502.18600] Chain of Draft: Thinking Faster by Writing Less). Our qualitative analysis, supported by examples and published experiments, found that CoD often achieves nearly the same performance as CoT on tasks like math and logical reasoning, with efficiency gains on the order of 5× to 10× fewer tokens (What is Chain of Draft?: The End of Chain-of-Thought? – Ai505) (Chain of Draft: Thinking Faster by Writing Less – DEV Community). This represents a significant advancement in prompt engineering – enabling models to “think” effectively but “speak” concisely.

Through our comparison, we identified the respective strengths and weaknesses of CoT and CoD. CoT remains invaluable for transparency and maximum problem-solving power, whereas CoD offers remarkable speed and cost advantages with only minor sacrifices in thoroughness. Recognizing these trade-offs, we recommend an optimized prompting approach that leverages the best of both: use Chain-of-Draft style prompts by default to handle the bulk of reasoning in a token-efficient way, and reserve Chain-of-Thought style elaboration for cases that truly demand it (such as when an explanation is required or a problem is extraordinarily complex). In practice, this could mean deploying CoD in production systems for its efficiency, and falling back to CoT or a hybrid prompt for edge cases or for generating human-friendly explanations of the answer (What is Chain of Draft?: The End of Chain-of-Thought? – Ai505). Such a strategy would maximize accuracy-per-token – getting accurate results with minimal verbosity.

Looking ahead, the concepts behind CoT and CoD may inspire further innovations in prompting. One avenue is developing dynamic prompts that can adjust the level of detail on the fly: for example, a prompt that instructs the model to start with a draft and progressively add detail only if needed. Another possibility is fine-tuning models on datasets of concise reasoning traces, so that they inherently learn to reason in a compact form. Ultimately, our study underscores that efficient reasoning is possible: large language models do not always need to articulate long chains of thought to solve complex tasks – they can be guided to solve problems with brief drafts that are just as effective (Chain of Draft: Thinking Faster by Writing Less – DEV Community).

Recommendations for Practical Use:
Based on our analysis, we offer the following recommendations for practitioners looking to apply these prompting techniques in various AI applications:

  • Use CoD for Efficiency-Critical Applications: In applications like real-time assistants, high-volume question answering services, or mobile/on-device AI, we recommend adopting Chain-of-Draft prompting to drastically reduce token usage and latency. Our findings show that CoD can cut token counts by 80–90% with negligible impact on answer quality (Chain of Draft: Thinking Faster by Writing Less – DEV Community), leading to faster responses and lower costs. This makes CoD an excellent default choice for production systems where every millisecond or API call matters.
  • Leverage CoT (or Hybrid) for Complex or Explainable Tasks: If the task at hand is extremely complex, or if the end-users require a clear explanation of the reasoning (for trust or educational purposes), a Chain-of-Thought prompt or a hybrid approach should be used. CoT will ensure the model’s reasoning is fully laid out, aiding both accuracy and transparency. A hybrid strategy could involve running a CoD prompt first (to get a quick answer) and then asking the model to produce a CoT explanation for that answer if needed. This way, you get the efficiency benefit upfront and the detailed reasoning on demand.
  • Iteratively Refine Prompts and Monitor Outputs: When implementing CoD, pay attention to how the model handles the brevity constraint. It may require some prompt tuning (adjusting the wording or the allowed draft length) to ensure the model doesn’t omit critical information; exposing the word budget as a parameter, as sketched after this list, makes such tuning easy. Always test the prompts on a sample of target queries and monitor the accuracy of the answers – if certain categories of questions fail under CoD but would succeed under CoT, adjust your strategy for those cases (e.g., increase the draft word limit or switch to CoT).
  • Continuously Balance Accuracy vs. Cost: Finally, treat prompting strategy as a tunable dial between accuracy and efficiency. If model or prompt updates lead to improved reasoning abilities, you might be able to push for even more concise drafts (further reducing tokens). Conversely, if you deploy on a smaller or less capable model, you might need to allow slightly more verbose reasoning to maintain accuracy. The optimal balance can evolve, so gather feedback and data from your application to adjust the prompting approach over time.
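The sketch below shows the parameterized CoD prompt referred to in the third recommendation. The default budget of five words follows the instruction quoted in the Methodology section; the function name and example are illustrative assumptions.

```python
# Parameterized Chain-of-Draft prompt: the draft word budget is exposed so it can be
# tuned per task. The default of 5 words follows the CoD instruction quoted earlier.

def build_cod_prompt_tunable(question: str, max_words: int = 5) -> str:
    return (
        "Think step by step, but only keep a minimum draft for each thinking "
        f"step, with {max_words} words at most. Return the answer at the end "
        "of the response after a separator ####.\n\n"
        f"Question: {question}"
    )


# Example: allow slightly longer drafts for problems with many intermediate values.
print(build_cod_prompt_tunable(
    "A train leaves at 3 pm and arrives at 7 pm. How long is the trip?",
    max_words=8,
))
```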

In conclusion, Chain-of-Thought and Chain-of-Draft prompting represent two ends of a spectrum in guiding AI reasoning. Our comparative study shows that it is possible to obtain the “best of both” – high accuracy and low token usage – by intelligently choosing and combining these techniques. We hope this analysis provides useful insights for AI researchers and developers to build systems that are not only smart and correct in their reasoning, but also efficient and practical to deploy at scale. The continued refinement of prompting methods will be key to unlocking the full potential of large language models across a wide range of real-world applications.
