John Thornhill humorously noted in a recent article for the Financial Times that predicting the future of technology is a tough gig, even for the brightest minds. He offers a story about Bob Metcalfe, the inventor of Ethernet, who confidently predicted in 1995 that the internet would experience a catastrophic collapse—or a “gigalapse”—the following year.
When his prophecy didn’t come to pass, Metcalfe did the honorable thing: he ate his own words. Literally. Amid chants of “Eat, baby, eat!” at a tech industry event, Metcalfe tore up his failed InfoWorld column, tossed it in a blender, and consumed the resultant paper smoothie. Metcalfe’s humbling moment is just one of many blunders in a long history of misguided predictions: everything from the camera and electricity to airplanes, television, and computers, showcasing how often experts have been spectacularly wrong.
Thornhill wisely observes that reality usually lands somewhere in the middle of the distribution curve: most of the forecasts we’re making about AI today are probably overblown at one extreme or the other. Those who dream of AI ushering in a new era of radical abundance are likely to be disappointed. Similarly, the doomsayers predicting AI-induced human extinction are probably also off the mark. Not that anyone will be around to prove them right if they are.
The broad range of technologies that we call Artificial Intelligence or AI are going to change the world, but not necessarily in the ways many speculate. Many of the great innovations and breakthroughs may be sudden, but implementation and market adoption may also be gradual, transparent and unremarkable. Like most innovation, its impact on our daily lives may have a long tail.
Will this time be different?
Kevin Frazier, Alan Z. Rozenshtein and Peter N. Salib recently observed in a Lawfare.org article that one of the most striking patterns in artificial intelligence isn’t about technology—it’s about psychology. Time and again, experts and observers have underestimated both the pace of progress and the magnitude of breakthroughs. When GPT-3 was launched, it was widely believed that coherent essay-writing AI was still years away. When ChatGPT debuted in late 2022, the consensus was that reasoning and coding abilities were still distant prospects. And when GPT-4 arrived in 2023, it was seen as the pinnacle of current approaches.
Then came OpenAI o3, which continued the trend of surpassing expectations. Just a day before its release, the idea of achieving near-human performance on the ARC-AGI benchmark seemed wildly speculative. Even optimists thought it would take fundamentally new paradigms, years of research, or immense computational resources to improve. They were wrong.
Frazier, Rozenshtein & Salib say that this persistent underestimation can be explained by what psychologists call “exponential growth bias”: our difficulty in grasping exponential progress. Human intuition is linear; we expect tomorrow to resemble today, with improvements accumulating gradually. However, technological progress, particularly in AI, follows exponential curves. Each advance builds on the previous ones, creating accelerating feedback loops. What seems impossible today can become routine tomorrow.
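To see why the bias matters, here is a toy comparison of a linear projection with a compounding one; the 5-percent-per-step rate is purely illustrative, not a measured rate of AI progress.

```python
# Illustrative only: linear intuition vs. compounding growth.
# The 5%-per-step rate is a made-up figure, not a measured AI growth rate.

def linear_projection(start: float, gain_per_step: float, steps: int) -> float:
    """How intuition extrapolates: add a fixed amount of capability each step."""
    return start + gain_per_step * steps

def exponential_projection(start: float, rate: float, steps: int) -> float:
    """How compounding progress behaves: grow by a fixed percentage each step."""
    return start * (1 + rate) ** steps

for steps in (10, 20, 40):
    lin = linear_projection(100.0, gain_per_step=5.0, steps=steps)
    exp = exponential_projection(100.0, rate=0.05, steps=steps)
    print(f"after {steps:2d} steps: linear ~{lin:.0f}, compounding ~{exp:.0f}")
# after 10 steps: linear ~150, compounding ~163
# after 20 steps: linear ~200, compounding ~265
# after 40 steps: linear ~300, compounding ~704
```

The gap is modest at first and then becomes absurd, which is exactly how expectations for GPT-3, ChatGPT and GPT-4 kept getting blindsided.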
In their view, current predictions about AGI timelines are likely too conservative. The convergence of multiple scaling laws, rapid improvements in hardware, and the potential for unexpected breakthroughs suggest that progress could be much faster than anyone anticipates. Betting against rapid progress towards AGI ignores a clear historical pattern.
However, all is not what it seems. Anthropic co-founder Jack Clark reflects in a blog post that o3 is evidence that AI “progress will be faster in 2025 than in 2024.” A key ingredient here is test-time scaling: OpenAI’s use of increased computational power during ChatGPT’s inference phase, the period after a user presses enter on a prompt. Although the specifics remain unclear, OpenAI might be employing additional computer chips, more powerful inference chips, or longer run times, sometimes up to 10 to 15 minutes, before producing an answer. Despite the unknown details of o3’s development, these benchmarks indicate that test-time scaling could enhance AI model performance.
While o3 may renew confidence in AI scaling laws, OpenAI’s latest model employs an unprecedented level of computing power, resulting in a higher cost per response. Barron’s magazine reports that the expenses can add up quickly if users are asking difficult questions. In one of the advanced benchmarks that OpenAI presented in its livestream, the cost per task was estimated at $20, with an average task completion time of 1.3 minutes, even while using a so-called high efficiency version of the model with a ~12% error rate. Other reports claim the normal version costs may be on the order of $1,000 per task. The actual costs and details are unknown. Humans with access to existing aids like search and knowledge bases can solve the same problems for as little as $5.
“Perhaps the only important caveat here is understanding that one reason why o3 is so much better is that it costs more money to run at inference time—the ability to utilize test-time compute means on some problems you can turn compute into a better answer,” Clark writes in his blog. “This is interesting because it has made the costs of running AI systems somewhat less predictable—previously, you could work out how much it cost to serve a generative model by just looking at the model and the cost to generate a given output.”
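A back-of-the-envelope sketch of Clark’s point: with a conventional generative model, cost per answer is roughly a function of visible token counts; with test-time compute, hidden reasoning tokens become a large and highly variable multiplier. The per-token prices and token counts below are hypothetical placeholders, not OpenAI’s actual rates.

```python
# Hypothetical illustration of why test-time compute makes serving costs
# hard to predict. All prices and token counts are made-up placeholders.

def cost_per_answer(prompt_tokens: int, output_tokens: int, reasoning_tokens: int,
                    input_price_per_m: float = 5.0, output_price_per_m: float = 15.0) -> float:
    """Estimate dollars per answer from per-million-token prices."""
    billed_output = output_tokens + reasoning_tokens  # hidden reasoning billed like output
    return (prompt_tokens / 1e6) * input_price_per_m + (billed_output / 1e6) * output_price_per_m

# Conventional model: output length per task is fairly stable, so cost is predictable.
print(round(cost_per_answer(2_000, 800, reasoning_tokens=0), 3))   # ~$0.02 per answer

# Test-time scaling: the same prompt may trigger thousands to millions of
# reasoning tokens, so per-task cost spans orders of magnitude.
for reasoning in (10_000, 100_000, 1_000_000):
    print(reasoning, "->", round(cost_per_answer(2_000, 800, reasoning), 2))
# 10000 -> 0.17, 100000 -> 1.52, 1000000 -> 15.02
```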
It’s impossible to discuss this topic without first couching it in the broader context of the problems we face in 2025. AI technology is shockingly and irresponsibly energy intensive at precisely the moment when most of the world has agreed that this is our last chance to set ambitious decarbonization targets to slow or reverse the most destructive effects of climate change. Nature magazine reports that, according to the International Energy Agency (IEA), data centers consumed 1.65 billion gigajoules of electricity in 2022, about 2% of global demand. By 2026, the agency projects that consumption will have increased by between 35% and 128%. The lower estimate is like adding the annual energy consumption of Sweden; the upper end, that of Germany. What evidence do we have that even the most sophisticated use of AGI for scientific research, or widespread implementation of AI in industry, will offer a return on investment worth the energy costs? The propaganda is that we’re betting on super-intelligence to solve mission-critical problems of energy, economics, food scarcity and global warming just in time to make it all worthwhile. Is that a bet or a Faustian bargain? We already have clear solutions to those problems: renewable and nuclear energy technology.
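For scale, the IEA figure converts to terawatt-hours as follows; the national consumption numbers are rough public estimates included only for comparison.

```python
# Rough sanity check of the figures cited above. Country totals are
# approximate public estimates, included for scale only.

GJ_PER_TWH = 3.6e6  # 1 TWh = 3.6 million gigajoules

data_centers_2022_twh = 1.65e9 / GJ_PER_TWH        # ~458 TWh (IEA, 2022)
added_low  = data_centers_2022_twh * 0.35          # +35% by 2026 (low projection)
added_high = data_centers_2022_twh * 1.28          # +128% by 2026 (high projection)

print(f"2022 data-center demand: ~{data_centers_2022_twh:.0f} TWh")
print(f"added demand by 2026: ~{added_low:.0f} to ~{added_high:.0f} TWh")
print("for comparison: Sweden ~130 TWh/yr, Germany ~500 TWh/yr")
```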
The global target of keeping temperatures within two degrees Celsius of pre-industrial levels is meant to prevent dangerous and cascading tipping-point effects, in which many areas of the world would experience multiple simultaneous impacts such as more intense forest fires, thawing permafrost, and desertification. The shifting of climate zones toward the poles could cause widespread disruption or collapse of agriculture. For example, the US breadbasket regions have the perfect combination of topsoil depth and temperature to grow large quantities of staple crops like wheat and corn; the lands north of that delicate region lack the right topsoil quality. These changes impact ecosystems and societies, and can become irreversible once tipping points are crossed.
The fervor surrounding artificial intelligence is already impacting climate goals. In the United States, plans to decommission polluting coal power plants have slowed by 40%, with politicians and industry lobbyists citing the need to win the “AI war.” Microsoft, which had aimed to be carbon negative by 2030, has walked back that goal after its 2023 emissions were 30% higher than in 2020. Brad Smith, the company’s president, explained that this ambitious goal was set before the “explosion in artificial intelligence,” making the target now seem “five times as far away.”
Meanwhile, Google has also witnessed an increase in emissions and no longer claims to be carbon-neutral, extending its net-zero emissions goal date further into the future. This shift follows the termination of employees who raised concerns about the environmental costs of generative AI.
The unrelenting discourse on AI has exacerbated the existing climate emergency, offering companies a pretext to backpedal on their already precarious environmental commitments. The carbon emitted in this process cannot be reversed, further complicating efforts to mitigate climate change.
However, this massive surge in computational investment may have a rather tarnished silver lining: Big Tech is actively pursuing investment in nuclear energy, and the private sector may have the potential to break the deadlock of public fear and regulatory friction through innovative engineering. The problem is that all of this requires massive amounts of capital over long periods of time. Nuclear power demands sustained investment before any return is achieved, and unlike governments, private capital is limited and may not have the stomach or the pockets to see it through to fruition. It may also simply exhaust the market chasing something that offers little practical value to the average person.
Furthermore, the global political power landscape is beginning to shift toward a new multipolar order that, combined with climate disruption, regional food scarcity, population displacement and nuclear weapons, could produce very negative outcomes. AI itself already throws a massive wrench into existing geopolitical strategic game theory, with its potential applications in autonomous weapons systems and cyber warfare. My generation grew up during the Cold War, constantly reminded of the threat of nuclear annihilation. Generations born after 1990 may not appreciate that we still live in a world where that is not only possible but, under the wrong circumstances, could be likely.
Finally, AI simultaneously promises great growth in productivity while potentially displacing a lot of jobs in the labor market. While technological innovation and labor displacement are nothing new, those shifts often come at the expense of great social conflict and even war. Economic systems are ultimately circular. Demand and supply must meet in the middle.
Are We In a Hype Bubble?
Luciano Floridi of Yale University’s Digital Ethics Center surmises that the current enthusiasm surrounding artificial intelligence bears striking similarities to five previous tech bubbles: the Dot-Com Bubble, the Telecom Bubble, the Chinese Tech Bubble, the Cryptocurrency Boom, and the Tech Stock Bubble. These bubbles share notable features: the advent of potentially disruptive technologies, speculative fervor outpacing practical reality, the rise of new valuation paradigms, substantial involvement from retail investors, and insufficient regulatory oversight.
In the finance sector, sentiment remains strong but cracks are beginning to form. Goldman Sachs reports that in 2024, leading global corporations have markedly intensified their focus on artificial intelligence, particularly generative AI. Tech behemoths such as Google, Amazon, Microsoft, and Meta are at the forefront of this movement, pouring unprecedented financial resources into harnessing AI’s full potential. Projections indicate that global investments in AI will exceed $500 billion by year’s end, spurred by advancements in research and development and the proliferation of AI-enabled products. The scale of expenditure and energy consumption is staggering.
Recently, venture capital firm Sequoia Capital highlighted that “the AI bubble is reaching a tipping point,” after failing to find a satisfactory answer to last year’s question: “Where is all the revenue?” Similarly, Goldman Sachs’ recent report, “Gen AI: too much spend, too little benefit?”, underscores this sentiment. Their global head of equity research remarked, “AI technology is exceptionally expensive, and to justify those costs, the technology must be able to solve complex problems, which it isn’t designed to do.”
The report also notes that even if AI fails to “deliver on its promise,” it may still yield investor returns, as “bubbles take a long time to burst.” Financial experts are pointing out that capital expenditures on components like graphics cards and cloud computing have not been matched by corresponding revenue, and there seems to be no clear solution in sight. This phase is characterized by a gradual decline in the product’s prominence on major stock exchanges, rather than an immediate devaluation.
A widely circulated Gartner report from November notes what many in the software engineering community have been saying for years: “AI” writ large is following the familiar hype cycle that has played out repeatedly throughout the history of technology.
According to Gartner, Generative AI has now moved beyond the Peak of Inflated Expectations. As we approach the end of 2024, the true value of AI is emerging from projects that employ established techniques, either on their own or in tandem with Generative AI, utilizing standardized processes to facilitate implementation. AI leaders are advised by Gartner not to focus exclusively on Generative AI but to explore composite AI techniques that integrate methods from innovations across the entire Hype Cycle.
Similarly, a Boston Consulting Group report titled “Where’s the Value in AI?” surveyed 1,000 CxOs and senior executives from more than 20 sectors and ten major industry groups across 59 countries in Asia, Europe, and North America. Participants were asked to evaluate their companies’ AI maturity across 30 key enterprise capabilities.
BCG’s findings are disappointing. A staggering 74% of companies have yet to realize tangible benefits from their AI initiatives. Only 4% of firms have developed “cutting-edge” AI capabilities across functions and consistently generate significant value. Moreover, a mere 22% have implemented an AI strategy, built advanced capabilities, and are only beginning to see gains.
For those of us who have been on the front lines of the first wave of AI adoption in engineering, the experience has been interesting. A rush to find use cases for the current generation of general-purpose models has led to annoying, invasive and lackluster user experiences that are often worse than previous generations of deterministic, curated systems. For example, critics have ridiculed Microsoft’s and Google’s invasive Copilot and Gemini features as something akin to Microsoft’s despised “Clippy,” but with 20x the energy consumption. There’s also the curious phenomenon that this wave of AI use cases is not improving quality of life for workers. I’ll paraphrase what one critic working as a creative producer put beautifully in terms of Aristotelian eudaimonia: instead of cleaning my house and doing my dishes so I can write and make art, this technology writes and makes art so I can clean my house and do my dishes.
Behind the PR hype, corporate AI implementation can be caricatured like this: “the suits,” incentivized by fear of being left behind in the AI arms race, demand that their companies aggressively find use cases for AI. Personnel begin to notice that any project or initiative with “AI” in the title is immediately highlighted and closely watched by senior management. Any SaaS or ERP solution marketing some AI magic sauce suddenly looks like a greater value proposition. However, engineers and product managers across multiple industries are reporting the same problem: difficulty identifying use cases that are viable, practical and cost-effective. Yes, we can add AI to this product or internal process, but it’s expensive and isn’t objectively better than the alternative non-AI solution we already have, or that our engineers have come up with at a fraction of the cost.
The Real Value is Loading…
Why might this first stage of AI hype be premature? What high value possibilities are beginning to emerge in various business sectors and applied engineering?
Artificial Intelligence research is nothing new. ChatGPT simply provided the “light bulb” or “iPhone” tipping-point moment when a technology passed the Turing-test milestone, captured the public imagination and spawned an investment frenzy. While it is likely a hype bubble, this is net positive for two reasons: first, it represents a powerful “moonshot” moment that unleashes the collective imagination of millions; and second, the flood of interest and investment has dramatically compressed the R&D timescale. The downside of all this speculation and rushed research is that capital allocation chasing magical Utopian unicorns tends to overlook the practical, non-sexy winners that are beginning to emerge and offer real value in unexpected use cases.
While AI may not represent an innovation on the order of nuclear fission, its potential is probably on the order of the Internet in the 1990s, and it will likewise produce a speculative boom-bust cycle. The potential may not be wrong; it will just arrive too early. Lest we forget: many of the great ideas and companies founded in the late ’90s collapsed precipitously, yet most of those early ideas re-emerged in some variation as the world’s most profitable businesses over the next twenty-five years. The rapid clip at which AI breakthroughs on critical benchmarks are occurring can be deceptive. The underlying question is not just the core technology; it’s how the technology can be applied in science and engineering to deliver practical, cost-effective value at scale and without egregious energy consumption.
The current generation of general-purpose LLMs has two fundamental problems: accuracy and fitness for purpose. Their command of human language yields confident responses that are unfortunately often factually incorrect. That is probably acceptable in low-risk applications like consumer information systems, knowledge bases and search engines, but not in medium- to high-risk mission-critical applications. For Silicon Valley companies building consumer products, the worst that can happen is that somebody gets a non-factual answer or clicks on the wrong ad or recommendation. Improving the user experience with an interactive LLM might be worth the trade-off, and these technologies will no doubt improve in order to remain viable. However, in high-availability systems, health diagnostics or industrial applications, a 95% accuracy rate is unacceptable: failures can be catastrophic, costing lives, money and reputation.
The less exciting use cases are composite AI techniques that combine less expensive LLMs with other solutions for a range of domain-specific applications. One example use case I’d like to explore is the domain-expert agent. Two attributes are invaluable in applied engineering and in operationalizing highly specific proprietary knowledge systems: domain expertise and accuracy. It’s often difficult for those outside these industries to appreciate how fundamental this human intellectual capital is to a company’s productivity and growth.
Companies are constantly struggling with several challenges related to human capital: hiring talent of all levels of expertise, training them and retaining their knowledge in some meaningful way when they depart.
For basic entry-level positions that require minimal domain expertise, such as customer service representatives, it’s easy to see the tremendous potential of agents to provide onboarding, real-time ongoing training, assistance, and error-correction of data input. Many companies, Salesforce among them, are already rapidly pursuing this as highly practical low-hanging fruit. In the early 2000s I worked at Dell as a business sales representative, and even after two weeks of on-site deep-dive training and study of the company’s wide offerings, each desk was given a “chat lifeline” to a senior expert who could answer questions in real time so that potential customers received accurate information. Today that expertise can be captured in an AI agent.
The same paradigm applies to high-level domain experts. A new graduate hire with a BS or even a PhD in computer science, finance, electrical engineering, chemical engineering and so forth can readily recite the broad foundational knowledge learned in school, but that rarely prepares them for the specific applied use cases involving a new employer’s proprietary systems. A common complaint from senior engineers and engineering managers is the opportunity cost of “churn”: the best talent, with 10-20 years of experience, ends up spending most of its time training and coaching people on existing proprietary systems instead of doing what it does best. Expert-agent AI is a potential solution to this perennial problem, with tremendous productivity upside.
Intellectual Capital Retention
In my own experience as a software engineering manager, our teams are constantly tasked with reviewing or interfacing with production software systems whose builders have retired or left the organization. Where the original team had the discipline to maintain quality technical documentation, discovery shrinks from weeks or months to days or hours. In other cases, the lack of documentation on a complex system represents so much technical debt that it’s ultimately deemed cheaper to just rebuild it, and companies often do, potentially squandering untold value in a rebuild whose scope will be underestimated and whose mistakes will be repeated. Ask any veteran engineer and they will usually tell you that mature, battle-tested enterprise software is priceless. Antique software written in relatively ancient languages like FORTRAN and COBOL still forms the backbone of many financial, logistics, energy and manufacturing systems. Reg Harbeck noted in a 2021 report that approximately 250 billion lines of COBOL are still in use. Companies are desperate to find developers who know these legacy languages, and they reward them handsomely.
Modeling expertise involves capturing and applying knowledge. Despite its obvious importance, the current machine learning landscape often neglects the “capture” phase, focusing instead on algorithm application. This oversight is particularly evident in how companies fail to document the tacit knowledge of experts.
For instance, capturing the knowledge of a retiring expert can be crucial for solving predictive maintenance problems. This involves getting experts to verbally share their insights, which are then structured and encoded using generative AI. This process transforms implicit knowledge into a format that can be systematically applied, making it accessible for future use.
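As a minimal sketch of what that capture step could look like, the snippet below assumes a hypothetical `llm_complete()` wrapper around any chat-completion API and asks it to turn a raw interview transcript into structured maintenance records; the prompt and schema are illustrative, not a prescribed method.

```python
import json

def llm_complete(prompt: str) -> str:
    """Placeholder for a call to any chat-completion API (hypothetical helper)."""
    raise NotImplementedError("wire this to your model provider of choice")

CAPTURE_PROMPT = """You are helping document a retiring engineer's expertise.
From the interview transcript below, extract each distinct incident as a JSON object
with the fields: symptom, root_cause, diagnostic_steps, resolution, equipment_involved.
Return a JSON list and nothing else.

Transcript:
{transcript}
"""

def capture_expert_knowledge(transcript: str) -> list[dict]:
    """Turn an unstructured interview into structured, searchable records."""
    raw = llm_complete(CAPTURE_PROMPT.format(transcript=transcript))
    return json.loads(raw)

# The resulting records can be stored in a knowledge base or vector store and
# retrieved later by a domain-expert agent.
```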
The ability to seamlessly interface with machines through natural language and other means represents a significant advancement. By effectively capturing and utilizing expert knowledge, companies can develop more robust AI-driven solutions.
Generic models possess vast knowledge but lack specificity regarding individual processes. These models are not sufficiently specialized to address unique languages, systems, and processes. The goal, therefore, is to develop an expert model. An expert model can be layered with industry expertise, then fine-tuned to company-specific, tool-specific, and even process-specific knowledge. However, this specialization alone is insufficient. Expert models also need planning and reasoning capabilities—the ability to iteratively refine and solve problems.
Agentic AI incorporates planning and reasoning capabilities via a combination of component technologies in a system architecture. These technologies may include but are not limited to:
- Retrieval-Augmented Generation, or RAG, which bridges the gap between a model’s static training data and external, up-to-date or proprietary sources, allowing LLMs to retrieve and process that information for more grounded and informative answers (see the sketch after this list).
- Scalable storage allows agents to retrieve relevant context efficiently, preserve state, and maintain history for adaptive planning. Scalable vector and hybrid databases, like Astra DB, can provide the storage and retrieval needs for agentic applications that use widely varying data types and volumes across integrated services.
- Workflows for structured processing via solutions such as LangChain or Dagster.
- Planning ahead involves the ability of agents to forecast potential scenarios, set intermediate goals, and adjust actions dynamically as new information arises. Tools like LangChain, Hugging Face, and the OpenAI API can be used to create planning modules, while workflow orchestrators like Apache Airflow manage complex, multi-step plans effectively.
- Component orchestration via technologies such as Apache Airflow ensures that all parts of an agentic system work together to achieve desired outcomes.
- Guardrail systems – because agentic AI must function independently, it requires advanced guardrails that manage security, monitor ethical behavior, validate actions, and provide transparency through logging and auditing. This oversight ensures agents behave responsibly and ethically.
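To make the RAG component concrete, here is a minimal, framework-free sketch: documents are embedded, the closest matches to a query are retrieved, and the results are stitched into a grounded prompt. The `embed()` function is a stand-in for whatever embedding model a real system uses, and the in-memory store stands in for a scalable database such as Astra DB.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Stand-in for a real embedding model or API call (hypothetical helper)."""
    raise NotImplementedError

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

class TinyVectorStore:
    """Toy in-memory store; production agents would use a scalable vector database."""
    def __init__(self) -> None:
        self.docs: list[tuple[str, np.ndarray]] = []

    def add(self, text: str) -> None:
        self.docs.append((text, embed(text)))

    def top_k(self, query: str, k: int = 3) -> list[str]:
        q = embed(query)
        ranked = sorted(self.docs, key=lambda doc: cosine(q, doc[1]), reverse=True)
        return [text for text, _ in ranked[:k]]

def build_rag_prompt(store: TinyVectorStore, question: str) -> str:
    """Retrieve supporting context and assemble the prompt sent to the LLM."""
    context = "\n\n".join(store.top_k(question))
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {question}"
```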
Essentially, an agent is goal-oriented, capable of iterating over a problem to find a solution. Unlike current LLMs, which lack innate planning and reasoning abilities, agentic AI employs a framework that supplies these capabilities. This is exemplified by the Hierarchical Task Planning model, which breaks a task down into subtasks and evaluates whether each can be solved in one step. If not, the system decomposes it further.
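A stripped-down version of that decomposition loop is sketched below; the three helper functions are hypothetical stand-ins for LLM-backed calls in a real agent framework.

```python
# Minimal sketch of hierarchical task planning. The three helpers are
# hypothetical placeholders for model-backed calls in a real agent framework.

def can_solve_directly(task: str) -> bool:
    """Judge whether the task is solvable in a single step (placeholder)."""
    raise NotImplementedError

def solve(task: str) -> str:
    """Produce the answer for a one-step task (placeholder)."""
    raise NotImplementedError

def decompose(task: str) -> list[str]:
    """Split a task into smaller subtasks (placeholder)."""
    raise NotImplementedError

def solve_hierarchically(task: str, depth: int = 0, max_depth: int = 5):
    """Recursively decompose a task until every piece is solvable in one step."""
    if depth > max_depth:
        raise RuntimeError(f"could not reduce task to solvable steps: {task}")
    if can_solve_directly(task):
        return solve(task)
    return [solve_hierarchically(sub, depth + 1, max_depth) for sub in decompose(task)]
```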
A practical application of this is the OODA loop—Observe, Orient, Decide, Act—commonly used in process engineering and even in critical situations by jet fighter pilots. This iterative loop involves continuously reassessing the environment and available resources, making decisions based on the current context, and taking action accordingly.
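The same pattern can be expressed as an explicit loop; the four stage functions here are placeholders for whatever sensors, models, and actuators a particular system provides.

```python
import time

# Placeholder hooks for a real system's sensors, models, and actuators.
def sense_environment() -> dict: raise NotImplementedError
def interpret(observation: dict, goal: str) -> dict: raise NotImplementedError
def choose_action(situation: dict) -> str: raise NotImplementedError
def execute(action: str) -> None: raise NotImplementedError

def ooda_loop(goal: str, max_iterations: int = 100, interval_s: float = 1.0) -> dict:
    """Observe-Orient-Decide-Act: continuously reassess, decide, act, repeat."""
    for _ in range(max_iterations):
        observation = sense_environment()           # Observe: gather current data
        situation = interpret(observation, goal)    # Orient: place it in context
        if situation.get("goal_reached"):
            return situation
        execute(choose_action(situation))           # Decide and Act
        time.sleep(interval_s)                      # then loop and re-observe
    raise TimeoutError("goal not reached within the iteration budget")
```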
One example of agentic AI in action is the open-source project Open SSA (Small Specialist Agents), which allows for the integration of expert knowledge into AI systems. This approach promotes innovation by enabling the development of proprietary solutions without the constraints of proprietary models. By focusing on planning and reasoning, agentic AI represents a significant advancement in the field.
Silicon Valley startups and traditional tech companies are struggling to find practical applications for generative AI, a situation that has led to a surprising trend: industrial companies are becoming early adopters of this technology. This is unprecedented, as industrial firms are typically seen as slow adopters, or even “stupid” from a Silicon Valley perspective. Industrial companies deal with far more complex high-stakes problems. At Google, a mistake might result in a misplaced ad, but at Panasonic or Eastman, errors can be life-threatening.
Industrials have been cautious, but generative AI allows them to capture and utilize domain expertise more effectively. This ease of capturing and applying specialized knowledge is why I believe the industrial sector will leap ahead in leveraging generative AI.
There’s an old joke that says, if x is the symbol for the unknown and a “spurt” is a drip under pressure, then an expert is an unknown drip under pressure. One challenge often encountered is that experts reach differing conclusions. Even when experts agree, their conclusions can sometimes be wrong. Testing these theories means interacting with physical systems, which is both costly and time-consuming. This issue is frequently discussed in the field.
Interestingly, we are often more accepting of disagreement from machines than from humans. Machines also reach different conclusions, but we perceive them as more deterministic. In these systems, we don’t think in terms of correct versus incorrect, but rather better versus worse. If divergent conclusions occur 5% of the time, the remaining 95% can still be deployed; to handle the 5%, additional models or human intervention can be employed.
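One common way to operationalize that split, sketched with a hypothetical answer type: responses above a confidence threshold are deployed automatically, and the rest are escalated to a secondary model or a human reviewer.

```python
from dataclasses import dataclass

@dataclass
class ModelAnswer:
    value: str
    confidence: float  # 0.0-1.0, however the model or an ensemble estimates it

def route(answer: ModelAnswer, threshold: float = 0.95) -> str:
    """Deploy high-confidence answers automatically; escalate the rest for review."""
    if answer.confidence >= threshold:
        return f"AUTO-DEPLOY: {answer.value}"
    return f"ESCALATE (secondary model or human review): {answer.value}"

print(route(ModelAnswer("replace the valve gasket", 0.98)))      # deployed automatically
print(route(ModelAnswer("recalibrate the sensor array", 0.80)))  # sent for review
```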
In the engineering industry, domain knowledge is captured through standard operating procedures or SOPs and statistical data, which encompass both expert and machine insights. While these methods are advanced, they are not comprehensive. Uncaptured knowledge often emerges in manual interviews, where experts recount unique problem-solving experiences.
For example, a company might set aside a budget for its top domain experts to sit down with an LLM and describe scenarios from across their careers in which, without their intervention, a mission-critical production issue would not have been solved. Like a veteran telling war stories, any engineer with more than ten years of experience can probably recount many such incidents in a high degree of detail.
For instance, an industrial pipeline engineer in Phoenix may have identified yield problems linked to pressure fluctuations in a chamber. The issue traced back to a newly installed piece of equipment on the same gas line. This insight was likely not documented in SOPs but was crucially remembered by the engineer.
The grand opportunity lies in capturing and utilizing this undocumented expertise to enhance AI-driven solutions. This blend of documented procedures and expert knowledge ensures more robust problem-solving capabilities.