LLM | CoffeeWithShiva - An Analytics Blog

All Posts in "LLM"

Software Is Changing (Again) – The Dawn of Software 3.0

When Andrej Karpathy titled his recent keynote “Software Is Changing (Again),” it wasn’t just a nice slogan. It marks what he argues is a fundamental shift in the way we build, think about, and interact with software. Based on his talk at AI Startup School (June 2025), here’s what “Software 3.0” means, why it matters, and how you can prepare.

What Is Changing: The Three Eras

Karpathy frames software evolution in three eras:

Era	What it meant	Key Characteristics
Software 1.0	Traditional code-first era: developers write explicit instructions, rules, algorithms. Think C++, Java, manual logic.	Highly deterministic, rule-based; heavy human specification; hard to scale certain tasks especially with unstructured or subtle data.
Software 2.0	Rise of machine learning / neural nets: train models on data vs hand-coding every condition. The model “learns” patterns.	Better handling of unstructured data (images, text), but still needs labeled data, training, testing, deployment. Not always interpretable.
Software 3.0	The new shift: large language models (LLMs) + natural language / prompt-driven interfaces become first-class means of programming. “English as code,” vibe coding, agents, prompt/context engineering.	You describe what you want; software (via LLMs) helps shape it. More autonomy, more natural interfaces. Faster prototyping. But also new risks (hallucinations, brittleness, security, lack of memory) and need for human oversight.

Let’s break down what he means.

The “Old” World: Software 1.0

For the last 50+ years, we’ve lived in the era of what Karpathy calls “Software 1.0.” This is the software we all know and love (or love to hate). It’s built on a simple, deterministic principle:

A human programmer writes explicit rules in a programming language like Python, C++, or Java.
The computer compiles these rules into binary instructions.
The CPU executes these instructions, producing a predictable output for a given input.

Think of a tax calculation function. The programmer defines the logic: if income > X, then tax = Y. It’s precise, debuggable, and entirely human-written. The programmer’s intellect is directly encoded into the logic. The problem? Its capabilities are limited by the programmer’s ability to foresee and explicitly code for every possible scenario. Teaching a computer to recognize a cat using Software 1.0 would require writing millions of lines of code describing edges, textures, and shapes—a nearly impossible task.
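
To make the Software 1.0 idea concrete, here is a minimal sketch of such a tax function. The brackets and rates are invented purely for illustration (they are not real tax rules); the point is that every case is spelled out by hand.

```python
# Hypothetical progressive brackets: (upper limit of bracket, rate).
# These numbers are made up for illustration only.
BRACKETS = [
    (10_000, 0.00),
    (40_000, 0.10),
    (float("inf"), 0.20),
]


def calculate_tax(income: float) -> float:
    """Apply each hand-written bracket rule in order."""
    if income < 0:
        raise ValueError("income must be non-negative")
    tax = 0.0
    lower = 0.0
    for upper, rate in BRACKETS:
        if income > lower:
            # Only the slice of income inside this bracket is taxed at its rate.
            taxable = min(income, upper) - lower
            tax += taxable * rate
        lower = upper
    return tax
```

Anything the programmer did not anticipate (a new bracket, a deduction) simply does not exist until someone writes another rule.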

The Emerging World: Software 2.0

The new paradigm, “Software 2.0,” is a complete inversion of this process. Instead of writing the rules, we curate data and specify a goal.

A human programmer (or, increasingly, an “AI Engineer”) gathers a large dataset (e.g., millions of images of cats and “not-cats”).
They define a flexible, neural network architecture—a blank slate capable of learning complex patterns.
They specify a goal or a “loss function” (e.g., “minimize the number of incorrect cat identifications”).
Using massive computational power (GPUs/TPUs), an optimization algorithm (such as gradient descent, with gradients computed by backpropagation) searches the vast space of possible neural network weights to find a setting that best maps the inputs to the desired outputs.

The “code” of Software 2.0 isn’t a set of human-readable if/else statements. It’s the learned weights and parameters of the neural network—a massive matrix of numbers that is completely inscrutable to a human. We didn’t write it; we grew it from data.

As Karpathy famously put it, think of the dataset and the neural network architecture as the source code, and the process of training as “compiling” the data into an executable model: the final trained network.
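
As a rough illustration of "growing" a program from data, the sketch below fits a two-parameter logistic model with plain gradient descent. The dataset, learning rate, and step count are all made-up assumptions, and a real Software 2.0 system would use a deep network and a framework rather than hand-written loops.

```python
import math

# Toy labeled dataset (made up): one feature x, label 1 when x is "large".
DATA = [(0.0, 0), (1.0, 0), (2.0, 0), (3.0, 1), (4.0, 1), (5.0, 1)]


def predict(w: float, b: float, x: float) -> float:
    """Logistic model: probability that the label is 1."""
    return 1.0 / (1.0 + math.exp(-(w * x + b)))


def train(data, steps: int = 5000, lr: float = 0.1):
    """Fit w and b by gradient descent on the average log loss."""
    w, b = 0.0, 0.0
    n = len(data)
    for _ in range(steps):
        grad_w = grad_b = 0.0
        for x, y in data:
            error = predict(w, b, x) - y  # gradient of the log loss w.r.t. the logit
            grad_w += error * x
            grad_b += error
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


w, b = train(DATA)
# The learned "program" is just these two numbers; nobody wrote an if/else.
labels = [round(predict(w, b, x)) for x, _ in DATA]
```

Nobody typed the decision threshold; it emerges from the data and the loss.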

The Rise of the “AI Engineer” and the LLM Operating System: Software 3.0

Karpathy’s most recent observations take this a step further with the explosion of Large Language Models (LLMs) like GPT-4. He describes the modern AI stack as a new kind of operating system.

In this analogy:

The LLM is the CPU—the core processor, but for cognitive tasks.
The context window is the RAM—the working memory.
Prompting is the programming—the primary way we “instruct” this new computer.
Tools and APIs (web search, code execution, calculators) are the peripherals and I/O.
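
One way to picture the "context window as RAM" part of the analogy: working memory is limited, so something has to decide what stays in it. The sketch below is my own illustration, not something from the talk. It counts whitespace-separated words as a crude stand-in for real tokens, and the function names are hypothetical.

```python
def count_tokens(text: str) -> int:
    """Crude stand-in for a tokenizer: one token per whitespace-separated word."""
    return len(text.split())


def fit_to_window(system_prompt: str, history: list[str], budget: int) -> list[str]:
    """Keep the system prompt plus the most recent messages that fit the budget."""
    remaining = budget - count_tokens(system_prompt)
    if remaining < 0:
        raise ValueError("system prompt alone exceeds the context budget")
    kept: list[str] = []
    for message in reversed(history):  # walk from newest to oldest
        cost = count_tokens(message)
        if cost > remaining:
            break  # older messages no longer fit in "RAM"
        kept.append(message)
        remaining -= cost
    return [system_prompt] + kept[::-1]  # restore chronological order
```

For example, `fit_to_window("You are helpful", ["a b c", "d e", "f"], 6)` keeps the system prompt and the two newest messages and drops the oldest one.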

This reframes the role of the “AI Engineer.” They are now orchestrating these powerful, pre-trained models, “programming” them through sophisticated prompting, retrieval-augmented generation (RAG), and fine-tuning to build complex applications. This is the practical, applied side of the Software 2.0 revolution that is currently creating a gold rush in the tech industry.

Core Themes from Karpathy’s Keynote

Here are some of the biggest insights:

Natural Language as Programming Interface: Instead of writing verbose code, developers (and increasingly non-developers) can prompt LLMs in English (or any human language) to generate code, UI, and workflows. Karpathy demos “MenuGen,” a vibe-coded app prototype, as an example of how quickly one can build via prompts.
LLMs as the New Platform / OS: Karpathy likens current LLMs to utilities or operating systems: infrastructure layers that provide default capabilities and can be built upon. Labs like OpenAI and Anthropic become “model fabs” producing foundational layers that others build on.
Vibe Coding & Prompt Engineering: He popularizes the idea of “vibe coding,” where the code itself is less visible and you work through prompts and edits at a higher level of abstraction. With that comes the need for better prompt and context engineering to reduce errors.
Jagged Intelligence: LLMs are powerful in some domains and weak in others. They may hallucinate, err at basic math, or make logically inconsistent decisions. Working well with this new paradigm means designing for those imperfections with human-in-the-loop review, verification, and testing.
Building Infrastructure for Agents: Karpathy argues that software needs to be architected so LLMs and agents can interact with it: consume documentation and knowledge bases, keep memory and context, and manage feedback loops. Examples include llms.txt, agent-friendly docs, and file and knowledge storage that is easy for agents to read and interpret.

Final Thoughts: Is This Another Shift – Or the Same One Again?

Karpathy would say this is not just incremental: “Software is changing again” implies something qualitatively different. In many ways, Software 3.0 combines the lessons of both Software 1.0 (like performance, correctness, architectural rigor) and Software 2.0 (learning from data, dealing with unstructured inputs), and adds a layer where language, agents, and human-AI collaboration become central.

In a nutshell: we’re not just upgrading the tools; we’re redefining what software means.

by Shiva — October 5, 2025 in AI Trends & News 0

LLM, RAG, AI Agent & Agentic AI – Explained Simply with Use Cases

As AI continues to dominate tech conversations, several buzzwords have emerged – LLM, RAG, AI Agent, and Agentic AI. But what do they really mean, and how are they transforming industries?

This article demystifies these concepts, explains how they’re connected, and showcases real-world applications in business.

1. What Is an LLM (Large Language Model)?

A Large Language Model (LLM) is an AI model trained on massive text datasets to understand and generate human-like language.

Think: ChatGPT, Claude, Gemini, or Meta’s LLaMA. These models can write emails, summarize reports, answer questions, translate languages, and more.

Key Applications:

Customer support: Chatbots that understand and respond naturally
Marketing: Generating content, email copy, product descriptions
Legal: Drafting contracts or summarizing case laws
Healthcare: Medical coding, summarizing patient records

2. What Is RAG (Retrieval-Augmented Generation)?

RAG is a technique that improves LLMs by giving them access to real-time or external data.

LLMs like GPT-4 are trained on data until a certain point in time. What if you want to ask about today’s stock price or use your company’s internal documents?

RAG = LLM + Search Engine + Brain.

It retrieves relevant data from a knowledge source (like a database or PDFs) and then lets the LLM use that data to generate better, factual answers.
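
Here is a minimal sketch of that retrieve-then-generate flow. It makes several simplifying assumptions: the "knowledge source" is a hard-coded list of made-up sentences, retrieval is plain word overlap rather than embeddings or a vector index, and the final LLM call is left out, so the code stops at building the augmented prompt.

```python
import re

# Hypothetical in-memory knowledge source; a real system would use a database or vector index.
DOCUMENTS = [
    "Refunds are processed within 5 business days of approval.",
    "Support is available by email from 9am to 6pm on weekdays.",
    "Premium plans include priority support and a dedicated manager.",
]


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, documents: list[str], k: int = 2) -> list[str]:
    """Rank documents by how many words they share with the question."""
    q = tokenize(question)
    ranked = sorted(documents, key=lambda d: len(q & tokenize(d)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, documents: list[str]) -> str:
    """Augment the question with retrieved context before it goes to the LLM."""
    context = "\n".join(f"- {d}" for d in retrieve(question, documents))
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}"
    )


prompt = build_prompt("How long do refunds take?", DOCUMENTS)
```

The resulting `prompt` string is what would be sent to the model, so the answer can draw on the retrieved lines instead of only on training data.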

Key Applications:

Enterprise Search: Ask a question, get answers from your company’s own documents
Financial Services: Summarize latest filings or regulatory changes
Customer Support: Dynamic FAQ bots that refer to live documentation
Healthcare: Generate answers using latest research or hospital guidelines

3. What Is an AI Agent?

An AI Agent is like an employee with a brain (LLM), memory (RAG), and hands (tools).

Unlike a chatbot that only replies, an AI Agent takes action—booking a meeting, updating a database, sending emails, placing orders, and more. It can follow multi-step logic to complete a task with minimal instructions.
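
The "brain plus hands" idea can be sketched as a loop that turns a plan into tool calls. Everything here is a stand-in: the tools only return strings, and `fake_planner` replaces the LLM with a fixed plan, so treat it as the shape of an agent rather than a working one.

```python
from typing import Callable


# Hypothetical tools; in a real agent these would call calendars, databases, or email APIs.
def book_meeting(topic: str) -> str:
    return f"meeting booked: {topic}"


def send_email(to: str) -> str:
    return f"email sent to {to}"


TOOLS: dict[str, Callable[[str], str]] = {
    "book_meeting": book_meeting,
    "send_email": send_email,
}


def fake_planner(task: str) -> list[tuple[str, str]]:
    """Stand-in for the LLM: returns a fixed plan of (tool name, argument) steps."""
    return [("book_meeting", task), ("send_email", "team@example.com")]


def run_agent(task: str) -> list[str]:
    """Execute each planned step with the matching tool and collect the results."""
    results = []
    for tool_name, argument in fake_planner(task):
        tool = TOOLS.get(tool_name)
        if tool is None:
            results.append(f"unknown tool: {tool_name}")
            continue
        results.append(tool(argument))
    return results
```

Calling `run_agent("Q3 review")` runs both steps in order and returns their results as a list.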

Key Applications:

Travel: Book your flight, hotel, and taxi – all with one prompt
HR: Automate onboarding workflows or employee helpdesk
IT: Auto-resolve tickets by diagnosing system issues
Retail: Reorder stock, answer queries, adjust prices autonomously

4. What Is Agentic AI?

Agentic AI is the next step in evolution. It refers to AI systems that show autonomy, memory, reflection, planning, and goal-setting – not just completing a single task but managing long-term objectives like a project manager.

While today’s AI agents follow rules, Agentic AI acts like a team member, learning from outcomes and adapting to achieve better results over time.

Key Applications:

Sales: An AI sales rep that plans outreach, revises tactics, and nurtures leads
Healthcare: Virtual health coach that tracks vitals, adjusts suggestions, and nudges you daily
Finance: AI wealth advisor that monitors markets, rebalances portfolios
Enterprise Productivity: Multi-agent teams that run and monitor full business workflows

Similarities & Differences

Feature	LLM	RAG	AI Agent	Agentic AI
Generates text	✅	✅	✅	✅
Accesses external data	❌ (alone)	✅	✅	✅
Takes actions	❌	❌	✅	✅
Plans over time	❌	❌	Basic	✅ (complex, reflective)
Has memory / feedback loop	❌	Partial	✅	✅ (adaptive)

I came across a simpler explanation written by Diwakar on LinkedIn –

Consider LLM → RAG → AI Agent → Agentic AI …… as 4 very different types of friends planning your weekend getaway:

📌 LLM Friend – The “ideas” guy.
Always full of random suggestions, but doesn’t know you at all.
“Bro, go skydiving!” (You’re scared of heights.)

📌 RAG Friend – Knows your tastes and history.
Pulls up better, fresher plans based on what you’ve enjoyed before.
“Bro, let’s go to Goa- last time you enjoyed a lot!”

📌 AI Agent Friend – The one who gets things done.
Tickets? Done. Snacks? Done. Hotel? Done.
But you need to ask for each task (if you miss, he misses!)

📌 Agentic AI Friend – That Superman friend!
You just say “Yaar, is weekend masti karni hai” (“Buddy, let’s have some fun this weekend”),
And boom! He surprises you with a perfectly planned trip, playlist, bookings, and even a cover story for your parents 😉

⚡ First two friends (LLM & RAG) = give ideas
⚡ Last two friends (AI Agent & Agentic AI) = execute them – with increasing level of autonomy

Here is another visualization, published by Brij, explaining how these four layers relate – not as competing technologies, but as an evolving intelligence architecture –

Conclusion: Why This Matters to You

These aren’t just technical terms – they’re shaping the future of work and industry:

Businesses are using LLMs to scale creativity and support
RAG systems turn chatbots into domain experts
AI Agents automate work across departments
And Agentic AI could someday run entire business units with minimal human input

The future of work isn’t human vs. AI—it’s human + AI agents working smarter, together.

by Shiva — June 17, 2025 in AI in Business 0

The Future of AI: Top Trends to Watch in 2025

As we approach 2025, the landscape of artificial intelligence (AI) is poised for transformative advancements that will significantly impact various sectors. Here are the top AI trends to watch in the coming year:

Agentic AI: AI systems that can reason, plan, and take action will become increasingly sophisticated, driven by improved inference time compute and chain-of-thought training for enhanced logical reasoning and handling of complex scenarios.

Inference Time Compute: AI models are being developed to dedicate more processing time to “thinking” before providing an answer. This allows for more complex reasoning and problem-solving without retraining the entire model.

Very Large Models: The next generation of large language models is projected to exceed 50 trillion parameters, pushing the boundaries of AI capabilities.

Very Small Models: Efficient models with a few billion parameters are becoming powerful enough to run on personal devices, making AI more accessible.

Advanced Enterprise Use Cases: AI applications in businesses will evolve beyond basic tasks to include sophisticated customer service bots, proactive IT network optimization, and adaptive cybersecurity tools.

Near-Infinite Memory: LLMs with context windows capable of retaining vast amounts of information will enable personalized customer service experiences and seamless interactions by remembering every previous conversation.

Human-in-the-Loop Augmentation: The focus will shift toward seamlessly integrating AI into human workflows and improving collaboration by developing intuitive prompting techniques and interfaces.
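
To give a feel for the Inference Time Compute trend above, here is one very simple way to spend more compute per question: sample several answers and take a majority vote. Note this is a different technique from the chain-of-thought training mentioned earlier, and the sampler below is a hypothetical stand-in that replays a fixed list of made-up answers instead of calling a model.

```python
from collections import Counter

# Stand-in for repeated model calls: a fixed list of noisy answers (made up).
CANNED_SAMPLES = ["42", "41", "42", "42", "40"]


def sample_answer(question: str, i: int) -> str:
    """Hypothetical sampler; a real one would call a model with temperature > 0."""
    return CANNED_SAMPLES[i % len(CANNED_SAMPLES)]


def answer_with_votes(question: str, n_samples: int) -> str:
    """Draw n samples and return the most common answer."""
    votes = Counter(sample_answer(question, i) for i in range(n_samples))
    return votes.most_common(1)[0][0]
```

The model itself is unchanged; only the amount of work done at answer time goes up with `n_samples`.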

You can go through this video for additional details –

The video concludes by inviting audience input on other significant AI trends for 2025, emphasizing the dynamic nature of the field and the value of diverse perspectives.

by Shiva — December 26, 2024 in AI Trends & News 0

Prompt Engineering for Developers: Leveraging AI as Your Coding Assistant

Gartner predicts: “By 2027, 50% of developers will use ML-powered coding tools, up from less than 5% today.”

In the age of AI, developers have an invaluable tool to enhance productivity: prompt engineering. This is the art and science of crafting effective inputs (prompts) for AI models, enabling them to understand, process, and deliver high-quality outputs. By leveraging prompt engineering, developers can guide AI to assist with coding, from generating modules to optimizing code structures, creating a whole new dynamic for AI-assisted development.

What is Prompt Engineering?

Prompt engineering involves designing specific, concise instructions to communicate clearly with an AI, like OpenAI’s GPT. By carefully wording prompts, developers can guide AI to produce responses that meet their goals, from completing code snippets to debugging.

Why is Prompt Engineering Important for Developers?

For developers, prompt engineering can mean the difference between an AI providing useful assistance or producing vague or off-target responses. With the right prompts, developers can get AI to help in tasks like:

Generating boilerplate code
Writing documentation
Translating code from one language to another
Offering suggestions for optimization

How Developers Can Leverage Prompt Engineering for Coding

Code Generation
Developers can use prompt engineering to generate entire code modules or functions by providing detailed prompts. For example:
- Prompt: “Generate a Python function that reads a CSV file and calculates the average of a specified column.”
Debugging Assistance
AI models can identify bugs or inefficiencies. A well-crafted prompt describing an error or issue can help the AI provide pinpointed debugging tips.
- Prompt: “Review this JavaScript function and identify any syntax errors or inefficiencies.”
Code Optimization
AI can suggest alternative coding approaches that might improve performance.
- Prompt: “Suggest performance optimizations for this SQL query that selects records from a large dataset.”
Documentation and Explanations
Developers can create prompts that generate explanations or documentation for their code, aiding understanding and collaboration.
- Prompt: “Explain what this Python function does and provide inline comments for each step.”
Testing and Validation
AI can help generate test cases by understanding the function’s purpose through prompts.
- Prompt: “Create test cases for this function that checks for valid email addresses.”
Learning New Frameworks or Languages
Developers can use prompts to ask AI for learning resources, tutorials, or beginner-level code snippets for new programming languages or frameworks.
- Prompt: “Explain the basics of using the Databricks framework for data analysis in Python.”
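
For a sense of what the first prompt above (the CSV average) might produce, here is a hand-written sketch of a plausible answer. It is not actual model output, and it makes its own choices: it takes an open file object rather than a path, and it skips blank cells.

```python
import csv
import io


def average_column(csv_file, column: str) -> float:
    """Return the mean of a numeric column, skipping blank cells."""
    reader = csv.DictReader(csv_file)
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise KeyError(f"column not found: {column}")
    values = [
        float(row[column])
        for row in reader
        if (row[column] or "").strip()  # ignore empty or missing cells
    ]
    if not values:
        raise ValueError(f"no numeric values in column: {column}")
    return sum(values) / len(values)


# Usage with an in-memory file; pass open("data.csv", newline="") for a real one.
sample = io.StringIO("name,score\nana,80\nben,90\ncy,\n")
mean_score = average_column(sample, "score")
```

Whatever the AI returns for a prompt like this, it is worth checking the same edge cases by hand: missing columns, blank cells, and non-numeric values.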

Advanced Prompt Engineering Techniques

1. Chain of Thought Prompting

Guide the AI through the development process:

Let's develop a caching system step by step:
1. First, explain the caching strategy you'll use and why
2. Then, outline the main classes/interfaces needed
3. Next, implement the core caching logic
4. Finally, add monitoring and error handling

2. Few-Shot Learning

Provide examples of desired output:

Generate a Python logging decorator following these examples:

Example 1:
@log_execution_time
def process_data(): ...

Example 2:
@log_errors(logger=custom_logger)
def api_call(): ...

Now create a new decorator that combines both features
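
As an illustration of where the few-shot prompt above is heading, this is a hand-written sketch of a decorator that combines both features. The name `log_time_and_errors` is my own, and an AI's answer could look quite different.

```python
import functools
import logging
import time


def log_time_and_errors(logger: logging.Logger = logging.getLogger(__name__)):
    """Decorator factory: logs how long the call took and any exception it raised."""
    def decorator(func):
        @functools.wraps(func)  # keep the wrapped function's name and docstring
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            except Exception:
                logger.exception("%s failed", func.__name__)
                raise  # re-raise so callers still see the error
            finally:
                elapsed = time.perf_counter() - start
                logger.info("%s took %.4f s", func.__name__, elapsed)
        return wrapper
    return decorator


@log_time_and_errors()
def process_data(values):
    return sum(values)
```

The two examples in the prompt tell the model both the naming style and the two calling conventions (with and without arguments) it is expected to respect.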

3. Role-Based Prompting

Act as a security expert reviewing this authentication code:
[paste code]
Identify potential vulnerabilities and suggest improvements

Key Considerations for Effective Prompt Engineering

To maximize AI’s effectiveness as a coding assistant, developers should:

Be Clear and Concise: The more specific a prompt is, the more accurate the response.
Iterate on Prompts: Experiment with different phrasings to improve the AI’s response quality.
Leverage Context: Provide context when necessary. E.g., “In a web development project, write a function…”
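
If you find yourself reusing the same role, context, and examples, it can help to assemble prompts in code. The helper below is a hypothetical sketch of that idea; the function name and the exact layout of the prompt are my own assumptions, not a standard.

```python
def assemble_prompt(
    task: str,
    role: str = "",
    context: str = "",
    examples: tuple[str, ...] = (),
) -> str:
    """Assemble a prompt from an optional role, context, and few-shot examples."""
    parts = []
    if role:
        parts.append(f"Act as {role}.")
    if context:
        parts.append(f"Context: {context}")
    for i, example in enumerate(examples, start=1):
        parts.append(f"Example {i}:\n{example}")
    parts.append(f"Task: {task}")
    return "\n\n".join(parts)


prompt = assemble_prompt(
    task="Review this authentication code and identify potential vulnerabilities.",
    role="a security expert",
    context="In a web development project",
)
```

Keeping the pieces separate makes it easier to iterate on one part (say, the context) without retyping the rest.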

Conclusion

Prompt engineering offers developers a powerful way to work alongside AI as a coding assistant. By mastering the art of crafting precise prompts, developers can unlock new levels of productivity, streamline coding tasks, and tackle complex challenges. As AI’s capabilities continue to grow, so too will the potential for prompt engineering to reshape the way developers build and maintain software.

by Shiva — October 26, 2024 in AI Trends & News 0

The Rise of Large Language Models (LLM)

In the rapidly evolving field of artificial intelligence (AI), Large Language Models (LLMs) have steadily become the cornerstone of numerous advancements. From chatbots to complex analytics, LLMs are redefining how we interact with technology. One of the most noteworthy recent developments is the release of Llama 3 405B, which aims to bridge the gap between closed-source and open-weight models in the LLM category.

Image credit: Maxime Labonne (https://www.linkedin.com/in/maxime-labonne/)

This blog aims to explore the current landscape of LLMs, comparing closed-source and open-weight models, and delve into the unique roles played by small language models. Additionally, we’ll touch on the varied use-cases and applications of these models, culminating in a reasoned conclusion about the merits and drawbacks of closed vs. open-weight models.

Recent Developments in LLMs

Llama 3 405B stands out as a significant breakthrough in the LLM space, especially in the context of open-weight models. With 405 billion parameters, it delivers robust performance that rivals, and in some cases surpasses, leading closed-source models. The shift towards more openly available models like Llama 3 highlights a broader trend in AI towards transparency, collaboration, and reproducibility.

Major players driving the continued evolution of LLMs include:

GPT-4 from OpenAI remains a leading closed-source model, offering general-purpose applications with multi-modal capabilities.
Llama 3 405B, developed by Meta AI, reportedly matches or exceeds the performance of some closed-source models.
Google’s PaLM 2 and Anthropic’s Claude 2 and Claude 3.5 models also show strong performance in various tasks.

Closed-Source vs. Open-Weight Models

Closed-Source Models

Definition: Closed-source models are proprietary and usually not accessible for public scrutiny or modification. The company or organization behind the model keeps the underlying code and often the training data private.

Examples:

GPT-4 (OpenAI)
Claude 3.5 (Anthropic)

Pros:

Performance: Often optimized to achieve peak performance through extensive resources and dedicated teams.
Security: Better control over the model can yield heightened security and compliance with regulations.
Support and Integration: Generally come with robust support options and seamless integration capabilities.

Cons:

Cost: Typically expensive to use, often based on a subscription or pay-per-use model.
Lack of Transparency: Limited insight into the model’s workings, which can be a barrier to trustworthiness.
Dependency: Users become reliant on the provider for updates, fixes, and enhancements.

Open-Weight Models

Definition: Open-weight models, often referred to as open-source models, have their weights accessible to the public. This openness allows researchers and developers to understand, modify, and optimize the models as needed.

Examples:

Llama 3 405B
BERT
GPT-Neo and GPT-J (EleutherAI)

Pros:

Transparency: Enhanced understanding and ability to audit the model.
Cost Efficiency: Often free to use or available at a lower cost.
Innovation: Community-driven improvements and customizations are common.

Cons:

Resource Intensive: May require significant resources to implement and optimize effectively.
Security Risks: More exposure to potential vulnerabilities.
Lack of Support: May lack the direct support and resources of commercial models.

Small Language Models

While much attention is given to LLMs, small language models still play a crucial role, particularly when resources are constrained or specific, narrowly defined tasks are in focus.

Key Characteristics of Small Language Models:

Limited Parameters: Typically fewer parameters, making them lighter and faster.
Resource Efficient: Lower computational requirements, cost-effective.
Targeted Applications: Effective for specific use cases like dialogue systems, sentiment analysis, or keyword extraction.
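
To put "lighter" into numbers, a back-of-the-envelope estimate of the memory needed just to hold a model's weights is parameters times bits per parameter. The sketch below uses the 405-billion-parameter figure mentioned earlier and a hypothetical 1-billion-parameter small model; it ignores activations, caches, and runtime overhead, so real requirements are higher.

```python
def weight_memory_gb(n_params: float, bits_per_param: int) -> float:
    """Approximate memory for the weights alone, in gigabytes (10**9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


# 405e9 comes from the post; the 1B model is a hypothetical small model.
for name, n_params in [("405B model", 405e9), ("1B model", 1e9)]:
    for bits in (16, 4):
        print(f"{name} at {bits}-bit: {weight_memory_gb(n_params, bits):.1f} GB")
```

At 16 bits per weight the large model needs hundreds of gigabytes for weights alone, while the small one fits in a couple of gigabytes, which is why small models are practical on edge devices.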

Popular Small Language Models:

DistilBERT: A distilled version of BERT that is smaller and faster while retaining much of its performance
TinyBERT: Another compressed version of BERT, designed for edge devices
GPT-Neo: A family of open-source models of various sizes, offering a range of performance-efficiency trade-offs

Advantages of Small Language Models:

Reduced computational requirements
Faster inference times
Easier deployment on edge devices or resource-constrained environments
Lower carbon footprint

Conclusion: Closed vs. Open Source

The choice between closed-source and open-source LLMs depends on various factors, including the specific use case, available resources, and organizational priorities. Closed-source models often offer superior performance and ease of use, while open-source models provide greater flexibility, customization, and cost-efficiency.

As the LLM landscape continues to evolve, we can expect to see further convergence between closed-source and open-source models, as well as the emergence of specialized models for specific tasks.

by Shiva — July 27, 2024 in Large Language Model 0

Meta’s Large Language Model – LLaMa 2 released for enterprises

Meta, the parent company of Facebook, unveiled LLaMa 2, the latest version of its large language model, for research and commercial purposes. It is released as open source, unlike OpenAI’s GPT and Google’s Bard, which are proprietary.

What is LLaMa?

LLaMa (Large Language Model Meta AI) is an open-source language model built by Meta’s GenAI team for research. LLaMa 2 is the newly released version, available for both research and commercial use.

Difference between LLaMa and LLaMa 2

The LLaMa 2 model was trained on 40% more data than its predecessor. Al-Dahle (a vice president at Meta who is leading the company’s generative AI work) says there were two sources of training data: data that was scraped online, and a data set fine-tuned and tweaked according to feedback from human annotators to behave in a more desirable way. The company says it did not use Meta user data in LLaMA 2, and excluded data from sites it knew had lots of personal information.

Newly released LLaMa 2 models will not only further accelerate the LLM research work but also enable enterprises to build their own generative AI applications. LLaMa 2 includes 7B, 13B and 70B models, trained on more tokens than LLaMA, as well as the fine-tuned variants for instruction-following and chat.

According to Meta, its LLaMa 2 “pretrained” models are trained on 2 trillion tokens and have a context window of 4,096 tokens (fragments of words). The context window determines the length of the content the model can process at once. Meta also says that the LLaMa 2 fine-tuned models, developed for chat applications similar to ChatGPT, have been trained on “over 1 million human annotations.”
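
In practice, a 4,096-token context window means longer documents have to be split before the model can process them. The sketch below shows one naive way to do that. The 4,096 figure comes from the text above; the 512-token reserve for instructions and output is my own assumption, and whitespace-separated words stand in for real tokens.

```python
def chunk_tokens(tokens: list[str], window: int = 4096, reserve: int = 512) -> list[list[str]]:
    """Split tokens into chunks that leave `reserve` tokens free for instructions and output."""
    size = window - reserve
    if size <= 0:
        raise ValueError("reserve must be smaller than the window")
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]


# Whitespace splitting is only a rough proxy for a real tokenizer.
words = ("lorem ipsum " * 5000).split()  # 10,000 pseudo-tokens
chunks = chunk_tokens(words)
```

Each chunk can then be sent to the model separately, for example to summarize a long report section by section.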

Databricks highlights the salient features of such open-source LLMs:

No vendor lock-in or forced deprecation schedule
Ability to fine-tune with enterprise data, while retaining full access to the trained model
Model behavior does not change over time
Ability to serve a private model instance inside of trusted infrastructure
Tight control over correctness, bias, and performance of generative AI applications

Microsoft says that LLaMa 2 is the latest addition to their growing Azure AI model catalog. The model catalog, currently in public preview, serves as a hub of foundation models and empowers developers and machine learning (ML) professionals to easily discover, evaluate, customize and deploy pre-built large AI models at scale.

OpenAI GPT vs LLaMa

A powerful open-source model like LLaMA 2 poses a considerable threat to OpenAI, says Percy Liang, director of Stanford’s Center for Research on Foundation Models. Liang was part of the team of researchers who developed Alpaca, an open-source competitor to GPT-3, an earlier version of OpenAI’s language model.

“LLaMA 2 isn’t GPT-4,” says Liang. Compared to closed-source models such as GPT-4 and PaLM-2, Meta itself speaks of “a large gap in performance.” However, Llama 2 should reach the level of GPT-3.5, the model behind ChatGPT, in most cases. And, Liang says, for many use cases, you don’t need GPT-4.

A more customizable and transparent model, such as LLaMA 2, might help companies create products and services faster than a big, sophisticated proprietary model, he says.

“To have LLaMA 2 become the leading open-source alternative to OpenAI would be a huge win for Meta,” says Steve Weber, a professor at the University of California, Berkeley.

LLaMA 2 also has the same problems that plague all large language models: a propensity to produce falsehoods and offensive language. The fact that LLaMA 2 is an open-source model will also allow external researchers and developers to probe it for security flaws, which will make it safer than proprietary models, Al-Dahle says.

With that said, Meta is set to make its presence felt in the open-source AI space with the release of the commercial version of its AI model LLaMa. The model will be available for fine-tuning on AWS, Azure and Hugging Face’s AI model hosting platform in pretrained form. And it’ll be easier to run, Meta says: optimized for Windows thanks to an expanded partnership with Microsoft, as well as for smartphones and PCs packing Qualcomm’s Snapdragon system-on-chip. The key advantages of on-device AI are cost reduction (no per-query cloud costs) and data security (as data remains solely on the device).

LLaMa could turn out to be a great alternative to pricey proprietary models such as OpenAI’s ChatGPT and Google’s Bard.

References:

https://ai.meta.com/llama/?utm_pageloadtype=inline_link

https://www.technologyreview.com/2023/07/18/1076479/metas-latest-ai-model-is-free-for-all/

https://blogs.microsoft.com/blog/2023/07/18/microsoft-and-meta-expand-their-ai-partnership-with-llama-2-on-azure-and-windows/

https://www.qualcomm.com/news/releases/2023/07/qualcomm-works-with-meta-to-enable-on-device-ai-applications-usi

https://techcrunch.com/2023/07/18/meta-releases-llama-2-a-more-helpful-set-of-text-generating-models/

https://www.databricks.com/blog/building-your-generative-ai-apps-metas-llama-2-and-databricks

by Shiva — July 19, 2023 in AI in Business 0