Software Is Changing (Again) – The Dawn of Software 3.0

When Andrej Karpathy titled his recent keynote “Software Is Changing (Again),” it wasn’t just a nice slogan. It marks what he argues is a fundamental shift in the way we build, think about, and interact with software. Based on his talk at AI Startup School (June 2025), here’s what “Software 3.0” means, why it matters, and how you can prepare.

What Is Changing: The Three Eras

Karpathy frames software evolution in three eras:

| Era | What it meant | Key characteristics |
| --- | --- | --- |
| Software 1.0 | Traditional code-first era: developers write explicit instructions, rules, and algorithms. Think C++, Java, manual logic. | Highly deterministic and rule-based; heavy human specification; hard to scale to certain tasks, especially with unstructured or subtle data. |
| Software 2.0 | Rise of machine learning / neural nets: train models on data instead of hand-coding every condition. The model “learns” patterns. | Better handling of unstructured data (images, text), but still needs labeled data, training, testing, and deployment. Not always interpretable. |
| Software 3.0 | The new shift: large language models (LLMs) plus natural-language, prompt-driven interfaces become a first-class means of programming. “English as code,” vibe coding, agents, prompt/context engineering. | You describe what you want; software (via LLMs) helps shape it. More autonomy, more natural interfaces, faster prototyping. But also new risks (hallucinations, brittleness, security, lack of memory) and a need for human oversight. |

Let’s break down what he means.

The “Old” World: Software 1.0

For the last 50+ years, we’ve lived in the era of what Karpathy calls “Software 1.0.” This is the software we all know and love (or love to hate). It’s built on a simple, deterministic principle:

  1. A human programmer writes explicit rules in a programming language like Python, C++, or Java.
  2. The computer compiles these rules into binary instructions.
  3. The CPU executes these instructions, producing a predictable output for a given input.

Think of a tax calculation function. The programmer defines the logic: if income > X, then tax = Y. It’s precise, debuggable, and entirely human-written. The programmer’s intellect is directly encoded into the logic. The problem? Its capabilities are limited by the programmer’s ability to foresee and explicitly code for every possible scenario. Teaching a computer to recognize a cat using Software 1.0 would require writing millions of lines of code describing edges, textures, and shapes—a nearly impossible task.
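
In Software 1.0 style, that tax function is nothing but hand-written rules. A minimal sketch (the brackets and rates below are invented for illustration, not real tax law):

```python
def tax_owed(income: float) -> float:
    """Software 1.0: every rule below was written explicitly by a human.
    Brackets and rates are illustrative, not real tax law."""
    if income <= 10_000:
        return 0.0
    elif income <= 40_000:
        return (income - 10_000) * 0.10
    else:
        return 3_000 + (income - 40_000) * 0.25  # 3_000 = tax on the first 40k

print(tax_owed(50_000))  # deterministic: the same input always yields the same output
```

Precise, debuggable, and entirely dependent on the programmer having anticipated every case in advance.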

The Emerging World: Software 2.0

The new paradigm, “Software 2.0,” is a complete inversion of this process. Instead of writing the rules, we curate data and specify a goal.

  1. A human programmer (or, increasingly, an “AI Engineer”) gathers a large dataset (e.g., millions of images of cats and “not-cats”).
  2. They define a flexible, neural network architecture—a blank slate capable of learning complex patterns.
  3. They specify a goal or a “loss function” (e.g., “minimize the number of incorrect cat identifications”).
  4. Using massive computational power (GPUs/TPUs), an optimization algorithm (like backpropagation) searches the vast space of possible neural network configurations to find one that best maps the inputs to the desired outputs.
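
The four steps above can be sketched in miniature. In this deliberately tiny stand-in, the “network” is a single weight `w` fitted to three (x, y) pairs, and plain gradient descent plays the role of backpropagation at scale:

```python
# Step 1: a curated dataset (instead of millions of cat images).
data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2)]

# Step 2: a flexible model -- here just one parameter, y = w * x.
w = 0.0

# Steps 3 and 4: a squared-error loss, minimized by gradient descent.
for _ in range(1000):
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    w -= 0.05 * grad

print(round(w, 2))  # the learned "program" is just this number
```

We never wrote a rule mapping x to y; the optimizer found `w` for us, which is the whole inversion in a nutshell.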

The “code” of Software 2.0 isn’t a set of human-readable if/else statements. It’s the learned weights and parameters of the neural network—a massive matrix of numbers that is completely inscrutable to a human. We didn’t write it; we grew it from data.

As Karpathy famously put it, think of the neural network as the source code, and the process of training as “compiling” the data into an executable model.

The Rise of the “AI Engineer” and the LLM Operating System: Software 3.0

Karpathy’s most recent observations take this a step further with the explosion of Large Language Models (LLMs) like GPT-4. He describes the modern AI stack as a new kind of operating system.

In this analogy:

  • The LLM is the CPU—the core processor, but for cognitive tasks.
  • The context window is the RAM—the working memory.
  • Prompting is the programming—the primary way we “instruct” this new computer.
  • Tools and APIs (web search, code execution, calculators) are the peripherals and I/O.
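
To make the analogy concrete, here is a toy version of that loop in Python. Nothing calls a real model: `fake_llm` is a hard-coded stand-in for the “CPU,” the growing `context` string is the “RAM,” and a one-entry `tools` dict plays the peripherals.

```python
def fake_llm(context: str) -> str:
    # Stand-in for a real model: it asks for the calculator tool once,
    # then reads the tool result back out of its context window.
    if "TOOL_RESULT" not in context:
        return "CALL calculator: 12 * 7"
    return "ANSWER: 84"

tools = {"calculator": lambda expr: str(eval(expr))}  # eval is fine for a toy

context = "User: what is 12 * 7?"             # the prompt, loaded into "RAM"
while True:
    out = fake_llm(context)
    if out.startswith("CALL "):               # the "CPU" requests an I/O device
        name, _, arg = out[5:].partition(": ")
        context += f"\nTOOL_RESULT({name}): {tools[name](arg)}"
    else:
        print(out)                            # -> ANSWER: 84
        break
```

In a real system, `fake_llm` would be an API call to a model, and the loop would also have to decide what stays in the context window as it fills up.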

This reframes the role of the “AI Engineer.” They are now orchestrating these powerful, pre-trained models, “programming” them through sophisticated prompting, retrieval-augmented generation (RAG), and fine-tuning to build complex applications. This is the practical, applied side of the Software 3.0 shift that is currently creating a gold rush in the tech industry.

Core Themes from Karpathy’s Keynote

Here are some of the biggest insights:

  • Natural Language as Programming Interface: Instead of writing verbose code, developers (and increasingly non-developers) can prompt LLMs in English (or any human language) to generate code, UI, and workflows. Karpathy demos “MenuGen,” a vibe-coded app prototype, as an example of how quickly one can build via prompts.
  • LLMs as the New Platform / OS: Karpathy likens current LLMs to utilities or operating systems: infrastructure layers that provide default capabilities and can be built upon. Labs like OpenAI and Anthropic become “model fabs” producing the foundational layers; everyone else builds on top of them.
  • Vibe Coding & Prompt Engineering: He introduces and popularizes the idea of “vibe coding,” where the code itself becomes less visible and you interact via prompts and edits, possibly at higher levels of abstraction. With that comes the need for better prompt and context engineering to reduce errors.
  • Jagged Intelligence: LLMs are powerful in some domains, weak in others. They may hallucinate, err at basic math, or make logically inconsistent decisions. Working well with this new paradigm means designing for those imperfections: human-in-the-loop review, verification, and testing.
  • Building Infrastructure for Agents: Karpathy argues that software needs to be architected so LLMs and agents can interact with it: consume documentation and knowledge bases, maintain memory/context, and manage feedback loops. Think llms.txt, agent-friendly docs, and file and knowledge storage that is easy for agents to read and interpret.
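
The llms.txt idea mentioned above is simply a Markdown file served at a site’s root that hands an agent a short, curated map of the documentation. A minimal sketch, with a hypothetical project and URLs:

```markdown
# ExampleLib

> ExampleLib is a hypothetical data-pipeline library. This file gives LLM
> agents a short, token-friendly map of the documentation.

## Docs

- [Quickstart](https://example.com/docs/quickstart.md): install and run a first pipeline
- [API reference](https://example.com/docs/api.md): every public function

## Optional

- [Changelog](https://example.com/changelog.md): release history
```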

Final Thoughts: Is This Another Shift – Or the Same One Again?

Karpathy would say this is not just incremental – “Software is changing again” implies something qualitatively different. In many ways, Software 3.0 combines the lessons of Software 1.0 (performance, correctness, architectural rigor) and Software 2.0 (learning from data, dealing with unstructured inputs), but adds a layer where language, agents, and human-AI collaboration become central.

In a nutshell: we’re not just upgrading the tools; we’re redefining what software means.

The Smartest AI Models: IQ, Mensa Tests, and Human Context

AI models are constantly surprising us – but how smart are they, really?

A recent infographic from Visual Capitalist ranks 24 leading AI systems by their performance on the Mensa Norway IQ test, revealing that even the best AI can outperform the average human.

AI Intelligence, by the Numbers

Visual Capitalist’s analysis shows AI models scoring across categories:

  • “Highly intelligent” class (>130 IQ)
  • “Genius” level (>140 IQ) with the top performers
  • Models below 100 IQ still fall in average or above-average ranges

For context, the average adult human IQ is 100, with scores between 90–110 considered the norm.

Humans vs. Machines: A Real-World Anecdote

Imagine your colleague, who once passed her undergrad finals with flying colors – she might score around 120 on an IQ test. She’s smart, quick-thinking, adaptable.

Now plug her into a Mensa Norway-style test. She does well but places below the top AI models.

That’s where the surprise comes in: these AI models answer complex reasoning puzzles in seconds, with more consistency than even the smartest human brains. They’re in that “genius” club – but wholly lacking human intuition, creativity, or emotion.

What This IQ Comparison Really Shows

| Insight | Why it matters |
| --- | --- |
| AI excels at structured reasoning tests | Real-world intelligence requires more: creativity, ethics, emotional understanding. |
| AI IQ is a performance metric, not character | Models are powerful tools, not sentient beings. |
| Human + AI = unbeatable combo | Merging machine rigor with human intuition unlocks the best outcomes. |

Caveats: Why IQ Isn’t Everything

  • These AI models are trained on test formats – they’re not “thinking” or “understanding” in a human sense.
  • IQ tests don’t measure emotional intelligence, empathy, or domain-specific creativity.
  • A “genius-level” AI might ace logic puzzles, but still struggle with open-ended tasks or novel situations.

Key Takeaway

AI models are achieving IQ scores that place them alongside the brightest humans – surpassing 140 on standardized Mensa-style tests. But while they shine at structured reasoning, they remain tools, not people.

The real power lies in partnering with them – combining human creativity, ethics, and context with machine precision. That’s where true innovation happens.

OpenAI’s Path to Artificial General Intelligence (AGI)

OpenAI, a leading artificial intelligence research laboratory, has outlined a five-level framework to measure progress towards achieving Artificial General Intelligence (AGI). This framework provides a structured approach to understanding the complexities and potential implications of AI development.

Level 1: Conversational AI – chatbots with conversational language

  • Focus: Developing AI systems capable of engaging in natural and informative conversations.
  • Example: ChatGPT, Google Bard
  • Benefits: Revolutionize customer service, education, and mental health support. Improve accessibility to information and facilitate human-computer interaction.

Level 2: Reasoners – human-level problem solving

  • Focus: Creating AI systems that can solve complex problems, requiring reasoning, planning, and learning.
  • Example: AI systems capable of drafting legal documents, conducting scientific research, or developing complex software.
  • Benefits: Accelerate scientific discovery, increase efficiency in various fields like medicine and engineering.

Level 3: Autonomous Agents – systems that can take actions independently

  • Focus: Building AI systems capable of operating independently in complex environments, making decisions, and taking actions.
  • Example: Self-driving cars, robots capable of performing household tasks, or AI systems managing complex infrastructure.
  • Benefits: Transform transportation, improve quality of life, and enhance efficiency in industries like manufacturing and logistics.

Level 4: Innovators – AI that can aid in invention

  • Focus: Developing AI systems capable of generating new ideas and solutions, demonstrating creativity and adaptability.
  • Example: AI systems designing new drugs, creating innovative products, or composing music.
  • Benefits: Drive economic growth, foster innovation, and potentially lead to breakthroughs in fields like art, science, and technology.

Level 5: Organizational Equivalents – AI that can do the work of an organization

  • Focus: Creating AI systems capable of operating as entire organizations, making strategic decisions, and adapting to changing environments.
  • Example: AI systems managing complex businesses, governments, or non-profit organizations.
  • Benefits: Revolutionize governance, economic systems, and societal structures. However, also raises significant ethical and societal challenges.

According to Bloomberg, OpenAI believes its technology is approaching the second level of five on the path to artificial general intelligence. It’s important to note that this framework is a conceptual roadmap and the exact boundaries between levels may be fluid. Additionally, achieving each level represents a significant technological leap and will likely require substantial advancements in hardware, algorithms, and data.

While the potential benefits of AGI are immense, it’s crucial to address the associated challenges and risks, such as job displacement, bias, and the potential for misuse. OpenAI and other leading AI research organizations are actively working on developing safety protocols and ethical guidelines to ensure that AGI benefits humanity as a whole.

References:

https://www.bloomberg.com/news/articles/2024-07-11/openai-sets-levels-to-track-progress-toward-superintelligent-ai?embedded-checkout=true&sref=HrWXCALa

https://www.forbes.com/sites/jodiecook/2024/07/16/openais-5-levels-of-super-ai-agi-to-outperform-human-capability

Figure Unveiled a Humanoid Robot in Partnership with OpenAI

Yet another milestone in the history of AI and robotics!

Yes, I’m not exaggerating! What you’re about to read is a glimpse of a futuristic world in which humanoid robots could very well serve humanity in many ways (keeping the negatives out of the picture for the time being).

When I first heard this news, movies such as I, Robot and Enthiran, the Robot flashed through my mind! Putting my filmy fantasies aside: the robotics company Figure, in partnership with Microsoft and OpenAI, has released the first general-purpose humanoid robot designed for commercial use – Figure 01.

Here’s the quick video released by the creators –

Figure’s robotics expertise is perfectly augmented by OpenAI’s multimodal models, which understand and respond to visual and audio inputs such as images, speech, and video. The future looks all the more promising: these humanoids could be supplied to manufacturing and commercial operations facing resource shortages that limit the scaling of production.

In the video, the robot demonstrates the ability to recognize objects such as an apple and take appropriate actions. Figure 01 reportedly stands 5 feet 6 inches tall and weighs 132 pounds; it can carry up to 44 pounds and move at a speed of 1.2 meters per second.

Figure is backed by tech giants such as Microsoft, OpenAI Startup Fund, NVIDIA, Jeff Bezos (Bezos Expeditions) and more.

Lots of fascinating innovations are happening around us thanks to Gen AI / LLMs, Copilot, Devin, Sora, and now a glimpse into the reality of humanoid robotics. Isn’t it a great time to be in?!

Meta’s Large Language Model – LLaMa 2 released for enterprises

Meta, the parent company of Facebook, unveiled LLaMa 2, the latest version of its large language model, for research and commercial purposes. Unlike the proprietary OpenAI GPT and Google Bard, it’s released as open source.

What is LLaMa?

LLaMa (Large Language Model Meta AI) is an open-source language model built by Meta’s GenAI team for research. LLaMa 2 is the newly released version, available for both research and commercial use.

Difference between LLaMa and LLaMa 2

LLaMa 2 model was trained on 40% more data than its predecessor. Al-Dahle (vice president at Meta who is leading the company’s generative AI work) says there were two sources of training data: data that was scraped online, and a data set fine-tuned and tweaked according to feedback from human annotators to behave in a more desirable way. The company says it did not use Meta user data in LLaMA 2, and excluded data from sites it knew had lots of personal information. 

Newly released LLaMa 2 models will not only further accelerate LLM research but also enable enterprises to build their own generative AI applications. LLaMa 2 includes 7B, 13B, and 70B models, trained on more tokens than the original LLaMa, as well as fine-tuned variants for instruction following and chat.

According to Meta, its LLaMa 2 “pretrained” models are trained on 2 trillion tokens and have a context window of 4,096 tokens (fragments of words). The context window determines the length of the content the model can process at once. Meta also says that the LLaMa 2 fine-tuned models, developed for chat applications similar to ChatGPT, have been trained on “over 1 million human annotations.”
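
To see what that 4,096-token window means in practice, here is a hedged sketch of packing retrieved documents into the context budget. Whitespace splitting stands in for a real subword tokenizer (which would count tokens differently), and the budget constant mirrors the figure above:

```python
def fit_in_context(documents, budget=4096):
    """Greedily pack documents until the context window is full.
    Whitespace splitting is a crude stand-in for a subword tokenizer."""
    packed, used = [], 0
    for doc in documents:
        cost = len(doc.split())      # fake token count
        if used + cost > budget:
            break                    # window full: remaining docs are dropped
        packed.append(doc)
        used += cost
    return packed, used

docs = ["alpha " * 3000, "beta " * 2000]   # ~3,000 and ~2,000 "tokens"
kept, used = fit_in_context(docs)
print(len(kept), used)                     # only the first document fits
```

This is why context-window size matters: the prompt, any retrieved documents, and the chat history must all fit in that budget at once.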

Databricks highlights the salient features of such open-source LLMs:

  • No vendor lock-in or forced deprecation schedule
  • Ability to fine-tune with enterprise data, while retaining full access to the trained model
  • Model behavior does not change over time
  • Ability to serve a private model instance inside of trusted infrastructure
  • Tight control over correctness, bias, and performance of generative AI applications

Microsoft says that LLaMa 2 is the latest addition to their growing Azure AI model catalog. The model catalog, currently in public preview, serves as a hub of foundation models and empowers developers and machine learning (ML) professionals to easily discover, evaluate, customize and deploy pre-built large AI models at scale.

OpenAI GPT vs LLaMa

A powerful open-source model like LLaMA 2 poses a considerable threat to OpenAI, says Percy Liang, director of Stanford’s Center for Research on Foundation Models. Liang was part of the team of researchers who developed Alpaca, an open-source competitor to GPT-3, an earlier version of OpenAI’s language model. 

“LLaMA 2 isn’t GPT-4,” says Liang. Meta itself acknowledges “a large gap in performance” compared to closed-source models such as GPT-4 and PaLM 2. However, LLaMA 2 should match the GPT-3.5 level behind ChatGPT in most cases. And, Liang says, for many use cases you don’t need GPT-4.

A more customizable and transparent model, such as LLaMA 2, might help companies create products and services faster than a big, sophisticated proprietary model, he says. 

“To have LLaMA 2 become the leading open-source alternative to OpenAI would be a huge win for Meta,” says Steve Weber, a professor at the University of California, Berkeley.   

LLaMA 2 also has the same problems that plague all large language models: a propensity to produce falsehoods and offensive language. The fact that LLaMA 2 is an open-source model will also allow external researchers and developers to probe it for security flaws, which will make it safer than proprietary models, Al-Dahle says. 

With that said, Meta is set to make its presence felt in the open-source AI space with the release of the commercial version of its AI model, LLaMa. The model will be available in pretrained form for fine-tuning on AWS, Azure, and Hugging Face’s model-hosting platform. And it’ll be easier to run, Meta says: it is optimized for Windows thanks to an expanded partnership with Microsoft, as well as for smartphones and PCs packing Qualcomm’s Snapdragon system-on-chip. The key advantages of on-device AI are cost reduction (no per-query cloud costs) and data security (data remains on-device).

LLaMa can turn out to be a great alternative to pricey proprietary models such as OpenAI’s ChatGPT and Google Bard.

References:

https://ai.meta.com/llama/?utm_pageloadtype=inline_link

https://www.technologyreview.com/2023/07/18/1076479/metas-latest-ai-model-is-free-for-all/

https://blogs.microsoft.com/blog/2023/07/18/microsoft-and-meta-expand-their-ai-partnership-with-llama-2-on-azure-and-windows/

https://www.qualcomm.com/news/releases/2023/07/qualcomm-works-with-meta-to-enable-on-device-ai-applications-usi

https://techcrunch.com/2023/07/18/meta-releases-llama-2-a-more-helpful-set-of-text-generating-models/

https://www.databricks.com/blog/building-your-generative-ai-apps-metas-llama-2-and-databricks