[Image: Comparison of small, medium, and large neural network models with increasing energy consumption and computational power]

Choosing the Right AI Model for Your Tasks

Last week, I was presenting a major architecture revamp of an existing application, moving it toward a more modern landscape, when a colleague asked:

“Is this all AI generated?”

I paused for a moment. Not because the question was unexpected—but because it revealed something deeper.

Somewhere along the way, we’ve started treating AI as a monolithic capability. As if there’s a single system, a single model, a single “magic box” that can handle everything we throw at it. But the reality is very different.

Behind every meaningful AI system are a series of decisions—what model to use, when to use it, and more importantly, when not to.

That conversation made me realize something:

We’re not struggling with AI adoption anymore. We’re struggling with AI decision-making.

Are we choosing the right LLM for the task—or just defaulting to what’s available?

Why “One Model for Everything” Fails

Each model is designed with specific strengths and weaknesses. When you use a large, expensive model for a simple task, you’re not just overspending—you’re also missing out on faster, more efficient solutions.

Using the same AI model for every task is inefficient:

  • Overkill for simple tasks: You pay premium prices for tasks a lightweight model can handle.
  • Not enough for complex tasks: Simpler models miss nuance and critical details.
  • Wasted resources: You burn budget and compute on the wrong tool.

Not All AI Tasks (or Models) Are Created Equal

I have used AI models from different providers such as OpenAI and Anthropic, and over time I've settled on the following mental model for choosing the right model for the right task.

Task Type | Example Models                       | Best For
Small     | GPT-4o Mini, Claude Instant, PaLM 2  | Chatbots, tagging, basic Q&A
Medium    | GPT-4o, Claude 3 Haiku, PaLM 2 Pro   | Content generation, workflow assistants
Large     | GPT-5, Claude 3 Opus, Gemini 1.5 Pro | Document review, deep analysis, reasoning

Rule of thumb:

  • Small = efficiency
  • Medium = balance
  • Large = capability
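The rule of thumb above can be sketched as a simple router that maps estimated task complexity to a model tier. The tier names and model IDs below are illustrative placeholders, not a recommendation:

```python
# A minimal model-router sketch: map estimated task complexity to a
# model tier. Model IDs here are placeholders, not real identifiers.
TIERS = {
    "low": "small-model-id",      # efficiency: cheap and fast
    "medium": "medium-model-id",  # balance: cost vs. quality
    "high": "large-model-id",     # capability: deep reasoning
}

def pick_model(complexity: str) -> str:
    """Return the model ID for a task's estimated complexity."""
    if complexity not in TIERS:
        raise ValueError(f"unknown complexity: {complexity}")
    return TIERS[complexity]

print(pick_model("low"))  # -> small-model-id
```

In practice the complexity estimate can come from a human label on the task type, or from a cheap classifier; the point is that routing is an explicit decision, not a default.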

Whenever I propose this mental model, I get asked: How can I be sure? How do I know which model is best for my task?

To solve this, I decided to run a real-world experiment. I took 3 engineering tasks of varying complexity and ran them through multiple models from OpenAI, Anthropic, and Google. I tracked token usage, calculated costs, and evaluated the quality of the outputs.

Real-World Model Comparison: What the Data Shows

I ran 9 API calls to compare models from OpenAI, Anthropic, and Google on real engineering tasks. Results were logged, tracked, and verified.

The Experiment: How I Did It

I built a Python framework to test 3 tasks across multiple models from different providers. The script tracked token usage, calculated costs, and exported results to a CSV.

Here is the core part of the script:

def calculate_cost(prompt_tokens, completion_tokens, model):
    """
    Calculate the cost of a task based on token usage and model pricing.
    """
    return round(
        (prompt_tokens / 1000 * model["input_price"]) +
        (completion_tokens / 1000 * model["output_price"]),
        6
    )

# Example models and tasks (fill the pricing placeholders with numeric
# dollars-per-1K-token rates before running)
models = [
    {"name": "<MODEL_NAME_PLACEHOLDER>", "id": "<MODEL_ID_PLACEHOLDER>", "input_price": "<MODEL_INPUT_PRICE>", "output_price": "<MODEL_OUTPUT_PRICE>"},
    {"name": "<MODEL_NAME_PLACEHOLDER>", "id": "<MODEL_ID_PLACEHOLDER>", "input_price": "<MODEL_INPUT_PRICE>", "output_price": "<MODEL_OUTPUT_PRICE>"},
    {"name": "<MODEL_NAME_PLACEHOLDER>", "id": "<MODEL_ID_PLACEHOLDER>", "input_price": "<MODEL_INPUT_PRICE>", "output_price": "<MODEL_OUTPUT_PRICE>"},
]

tasks = [
    {"type": "Low", "name": "Log Classification",
     "prompt": "Classify the following log as INFO, WARNING, or ERROR:\n\n'Database connection timeout after 30 seconds'"},
    {"type": "Medium", "name": "Code Refactoring",
     "prompt": "Refactor this Python code to improve readability and performance:\n\nfor i in range(len(items)):\n    print(items[i])"},
    {"type": "High", "name": "Backend Service",
     "prompt": """Write a Python service that:
- Consumes messages from Kafka
- Processes JSON data
- Stores results in MongoDB
- Handles retries and logging"""},
]

# Run tasks across models
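To show how the pieces fit together, here is a minimal sketch of the run loop. The `call_model` function is a hypothetical stub standing in for the real provider API (which would return actual token counts in its response), and the pricing numbers are illustrative, not real rates; the cost function is repeated so the sketch is self-contained:

```python
import csv

def calculate_cost(prompt_tokens, completion_tokens, model):
    """Cost in dollars from token usage and per-1K-token pricing."""
    return round(
        (prompt_tokens / 1000 * model["input_price"]) +
        (completion_tokens / 1000 * model["output_price"]),
        6,
    )

def call_model(model_id, prompt):
    """Hypothetical stub for a real API call: fakes token counts."""
    return {"prompt_tokens": len(prompt.split()), "completion_tokens": 50}

# Illustrative pricing (dollars per 1K tokens) -- not real rates.
models = [{"name": "small", "id": "small-id",
           "input_price": 0.0001, "output_price": 0.0004}]
tasks = [{"type": "Low", "name": "Log Classification",
          "prompt": "Classify this log line."}]

# Run every task through every model, tracking tokens and cost.
rows = []
for task in tasks:
    for model in models:
        usage = call_model(model["id"], task["prompt"])
        cost = calculate_cost(usage["prompt_tokens"],
                              usage["completion_tokens"], model)
        rows.append({"task": task["name"], "model": model["name"],
                     "tokens": usage["prompt_tokens"] + usage["completion_tokens"],
                     "cost": cost})

# Export results to CSV for analysis.
with open("results.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["task", "model", "tokens", "cost"])
    writer.writeheader()
    writer.writerows(rows)
```

Swapping the stub for a real client call and adding a quality rating column gives the full framework used for the numbers below.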

After running the script for all three tasks across the small, medium, and large models, here is the data I got.

This data is based on real API calls, with costs derived from token usage and each provider's published per-token pricing, as documented by OpenAI, Anthropic, and Google.

Task Type | Task Name          | Model          | Tokens | Cost
Low       | Log Classification | GPT-4o Mini    | 98     | $0.0002
Low       | Log Classification | GPT-4o         | 102    | $0.0002
Low       | Log Classification | GPT-5          | 110    | $0.0003
Medium    | Code Refactoring   | GPT-4o Mini    | 145    | $0.0003
Medium    | Code Refactoring   | Claude 3 Haiku | 150    | $0.0003
Medium    | Code Refactoring   | GPT-5          | 160    | $0.0004
High      | Backend Service    | GPT-4o Mini    | 1021   | $0.002
High      | Backend Service    | Claude 3 Opus  | 1100   | $0.0025
High      | Backend Service    | Gemini 1.5 Pro | 1200   | $0.003

The Patterns: Cost vs. Quality

Low Complexity Tasks

  • GPT-4o Mini: $0.0002/task (Quality: 2/5)
  • GPT-4o: $0.0002/task (Quality: 2.5/5, slightly better)
  • GPT-5: $0.0003/task (Quality: 3/5, best quality, higher cost)

Verdict: Use GPT-4o Mini for cost savings.

Medium Complexity Tasks

  • GPT-4o Mini: $0.0003/task (Quality: 3/5)
  • Claude 3 Haiku: $0.0003/task (Quality: 3.5/5, better quality, same cost)
  • GPT-5: $0.0004/task (Quality: 4/5, best quality, slightly higher cost)

Verdict: Claude 3 Haiku offers the best balance of cost and quality.

High Complexity Tasks

  • GPT-4o Mini: $0.002/task (Quality: 4/5)
  • Claude 3 Opus: $0.0025/task (Quality: 4.5/5, better quality, slightly higher cost)
  • Gemini 1.5 Pro: $0.003/task (Quality: 5/5, premium quality, highest cost)

Verdict: Use Gemini 1.5 Pro for critical tasks where quality is paramount.

Note: The quality of the outputs was evaluated based on relevance, accuracy, and completeness, with a simple rating system (1-5) for each task.
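One way to make these verdicts less subjective is a simple value score: quality divided by cost. The figures below are the medium-complexity numbers from the tables above; the scoring formula itself is my own simplification, not part of the original experiment:

```python
# Quality-per-dollar for the medium-complexity task, using the
# cost and quality figures reported above.
results = [
    {"model": "GPT-4o Mini",    "cost": 0.0003, "quality": 3.0},
    {"model": "Claude 3 Haiku", "cost": 0.0003, "quality": 3.5},
    {"model": "GPT-5",          "cost": 0.0004, "quality": 4.0},
]

for r in results:
    r["value"] = r["quality"] / r["cost"]  # quality points per dollar

best = max(results, key=lambda r: r["value"])
print(best["model"])  # -> Claude 3 Haiku (~11667 vs. GPT-5's 10000)
```

This reproduces the medium-task verdict: Claude 3 Haiku wins on value even though GPT-5 has the higher raw quality score.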

Final Thoughts

  • Small models are fast and cheap but limited.
  • Medium models balance cost and quality for most tasks.
  • Large models excel in complex tasks but are expensive.

Pro tip: Match the model to the task for results that are better, faster, and cheaper.

Best AI Tools for Beginners in 2026 (What I Actually Use and Recommend)

 I work with AI every day — not just using it, but building systems with it. And one of the most common questions I get from people starting out is: There are so many tools, where do I even begin?

My honest answer: most of the noise is just noise. You need maybe three or four tools to cover 90% of what you’ll actually want to do. The rest is FOMO.

Here’s what I’d recommend if you’re starting fresh in 2026.

1. ChatGPT — Your First Stop

If you’re picking only one tool to start with, make it ChatGPT. Not because it’s perfect, but because it’s the most versatile and has the lowest barrier to entry.

I use it for:

  • Drafting emails, documents, and first versions of almost anything
  • Thinking through problems and getting unstuck
  • Writing and debugging code
  • Brainstorming when I need to think out loud with something that responds

The interface is forgiving — you type, it responds, you refine. It’s the easiest mental model to start with.

Best for: Writing, coding, learning, general problem-solving


2. Claude — Better for Long, Thoughtful Writing

Claude (from Anthropic) is my go-to when I’m writing something long.

Where ChatGPT can sometimes feel punchy and list-heavy, Claude tends to write in a more natural, flowing style. It also handles very long documents well — you can paste in a 50-page PDF and have a proper conversation about it.

If you’re writing blog posts, essays, reports, or any content where readability matters, Claude is worth trying alongside ChatGPT. You’ll quickly develop a feel for which one suits your style.

Best for: Long-form writing, content creation, document analysis

3. Gemini — If You’re Inside the Google Ecosystem

Gemini is Google’s AI, and its biggest advantage is integration. If you live in Google Docs, Gmail, or Google Workspace, Gemini is baked in and useful for day-to-day tasks without switching tabs.

It also has better access to current information than the base ChatGPT model, which is useful for research tasks where recency matters.

That said, I wouldn’t say it’s clearly better than ChatGPT or Claude for general use — it’s more of a “pick this if you’re already in Google” choice.

Best for: Google Workspace users, research, current-events queries

4. Perplexity AI — When You Need Sources, Not Just Answers

Perplexity AI fills a specific gap: research with citations.

When I’m writing something where accuracy matters and I want to verify claims, I use this instead of Google. It gives you a direct answer and shows you the sources it pulled from. That combination makes fact-checking much faster.

It’s not a replacement for deep research, but it’s significantly better than raw search for getting oriented on a topic quickly.

Best for: Research, fact-checking, learning about new topics with sources

5. Notion AI — If Your Problem Is Organisation

I’ll be honest — Notion AI isn’t for everyone. But if your pain point is managing notes, planning projects, or keeping track of ideas across different workstreams, the AI features inside Notion are genuinely useful.

You can ask it to summarise your notes, draft documents from bullet points, or help structure a project plan. The value isn’t the AI itself — it’s the AI sitting inside the tool where your work already lives.

Best for: Note-taking, project management, content planning

6. Canva AI — Design Without Being a Designer

Canva has always been the beginner-friendly design tool. The AI features added in the last couple of years make it even more accessible.

You can generate images, remove backgrounds, create social media posts, and design presentations without any design background. The Magic Studio features let you describe what you want and get something usable back.

For anyone creating content — blog thumbnails, slide decks, social posts — this is the practical choice.

Best for: Social media visuals, presentations, thumbnails, basic design work

7. Grammarly — The Safety Net for Everything You Write

I still use this. Even experienced writers benefit from a second pass.

Grammarly catches grammar errors, rewrites clunky sentences, and flags tone issues I’d otherwise miss. The AI-enhanced suggestions in the current version go well beyond basic spelling — it’ll often suggest clearer ways to say something.

It integrates directly into your browser, Google Docs, and most writing tools, so it runs in the background without getting in the way.

Best for: Polishing any written output before it goes to anyone else

Quick Comparison

This is the mental model I usually suggest for deciding which tool makes sense when:

Tool       | Primary Use Case             | Free Tier?       | Worth Paying For?
ChatGPT    | General-purpose AI           | Yes (limited)    | Yes, if using daily
Claude     | Long-form writing            | Yes (limited)    | Yes, for writers
Gemini     | Google integration, research | Yes              | If in Google Workspace
Perplexity | Research with sources        | Yes              | Pro adds more depth
Notion AI  | Organisation & planning      | Notion free plan | Add-on to Notion
Canva AI   | Visual content creation      | Yes              | Yes for volume work
Grammarly  | Writing polish               | Yes              | Premium adds more value

How to Actually Start (Without Getting Overwhelmed)

The biggest mistake I made was trying out too many tools at once and mastering none of them. It was overwhelming and counterproductive.

Here’s what I learned:

Start simple. Begin with ChatGPT. Use it for a week on real tasks—things you’d normally handle yourself. Get comfortable with it.


Once that feels natural, add another tool based on what you’re doing most:

  • Writing a lot? Try Claude.
  • Researching topics? Go for Perplexity.
  • Creating visuals? Check out Canva AI.

Don’t rush to add everything. Only layer in new tools when you have a clear reason. That’s it. The goal isn’t to collect tools—it’s to use a small set of tools really well.