The nearest four-year-old just confirmed what we’ve suspected all along: that everyone, and I do mean everyone, knows how bubbles end.
How they begin is familiar, too. Whether it’s tulips, colonialist projects, or crypto, a novel premise and a promise of future returns draw a small crowd. Word spreads and speculators pile on, spurred by confident marketing and a fear of missing out. The crowd begins to buzz.
For bubbles with roots in the tech sector, the premise may have intrinsic merit. Beneath NFTs and alt-coins, blockchains do unlock new applications of trust and distributed computing. Innovations from the dotcom era enabled a more participatory internet. “Big Data” was out-hyped by “Data Science” and then “Machine Learning”, and in a story of successive cycles building on the past, here we are. The hype cycle turns, but technology comes along with it.
Which brings us to the present, with an influx of capital flowing towards anyone with a .ai domain. Some of these will manage quick exits. A few of them will genuinely win. And those founded on nothing more defensible than clever prompts and glossy pitch decks will be outrun by entrenched players with the data, compute, and financial resources for multiple rounds of a very expensive game.
But just as before, the technology–Generative AI (GenAI) as a subset of AI, as a subset of Deep Learning, as a subset of Machine Learning (ML)–has legs. Science fiction has dreamed of interfacing with computers using natural language since before there were practical computers, and despite some important limitations–more on those shortly–the dream is fast becoming reality.
Decentralized applications and even the web itself have physical limits: bandwidth, latency, and the usual constraints of distributed systems. AI does, too, and particularly in the massive number of compute cycles needed for research and development.
The physical requirements of a pre-trained ML model, however, need not exceed a single, local device. We’re already deploying modestly-sophisticated applications onto Android TPUs. They aren’t fast, but they run–and the door is open to advancements in both hardware and software that will make them much faster, cheaper, and more accessible over time.
There are certainly obstacles to an AI-First future. But putting pencil to paper there aren’t fundamental limits. Current-generation AI systems built on years-old tech are already proving their merits, and the Law of Accelerating Returns suggests cheaper, more capable successors ahead.
That said, several factors stand between the current hype cycle and a genuine breakout–a technological revolution, or yet another bubble.
Change takes time. Even without the unusual degree of industrial and social reordering that AI may eventually bring about, the ethical, economic, and political concerns that surround any new technology are already pulling the brake lever on unfettered AI development.
At the coalface, too, there are three major technical challenges left to overcome.
The first of these is data. Though the Internet is quite literally made of the stuff, it’s rarely ready for direct use in training or running online systems. Hygiene and licensing vary; access frequently collides with auth checks, paywalls, or legacy systems; and in an unfortunate case of self-citation, an increasing percentage of the content available online was synthesized by older-generation LLMs. Models are only as good as their training set, and the organizations with existing sources of high-quality data (GitHub/Microsoft for code, or Adobe for images, e.g.) are loath to extend access to others.
The second is trust. “Correctness” in a stochastic process will always involve non-zero error rates, and ML models are both victim and example. Different applications depend on different degrees of correctness, from the near-total intolerance of highly-regulated settings (e.g. finance, medicine) to somewhat lower expectations for marketing content. Human supervision remains an important part of AI deployment across the spectrum, however, and tools and techniques that support it are trailing behind the models themselves.
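As a hedged sketch of what that supervision might look like in practice, consider confidence-based routing, where outputs the model is unsure about go to a person. The 0.9 threshold and the example predictions below are illustrative assumptions, not values from any real deployment:

```python
# Sketch: route low-confidence model output to a human reviewer.
# The 0.9 threshold and example predictions are illustrative only.

def route(prediction: str, confidence: float, threshold: float = 0.9) -> str:
    """Accept high-confidence output automatically; flag the rest."""
    return "auto-accept" if confidence >= threshold else "human-review"

decisions = [route(p, c) for p, c in [("approve", 0.97), ("deny", 0.62)]]
```

In a regulated setting the threshold would be far stricter, and the review queue itself would need auditing; the point is only that the routing logic is trivial compared to the tooling around it.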
Finally, and most fundamentally, is the cost of developing and deploying models at or near the state of the art. Measuring in dollars, GPUs, or kilowatts, you’ll reach the same conclusion: training and running large-scale models is expensive. If there’s a physical limit to overcome, this is it.
Together, these barriers wall off a landscape dominated by a handful of extremely well-funded actors. But as we’ll see shortly, their primary focus is on building infrastructural layers–with the applications atop those layers yet to come.
computer (n): a programmable device that can store, retrieve, and process data – Merriam-Webster
Rewind for a minute.
AI’s key challenges arise from a different sort of computing than the kind we’re used to. This manifests in two ways: in the substitution of Machine Learning (ML) techniques for traditional programming, and in the realization that AI agents are themselves computers. That they don’t walk the same deterministic path from input to output as the computers we’ve had sixty years to get comfortable with is a source of both strength and profound philosophical challenge.
For the technologists and skeptics (if you’re a true believer already or just want the implications, feel free to skip ahead), the story arc goes something like this.
Traditionally, software development has involved translating whatever’s happening between the monitor and the keyboard into a machine language understood by some processing unit (physical or otherwise). This fitting of imprecise, human-language requirements in the real world to formal logic in the digital one has always been fraught, but programmers have developed many tools and techniques to aid in their task. For instance:
- Programming languages and compilers provide varying levels of assurance in a program’s completeness/correctness while closing the gap from natural language to a stream of bits
- Operating systems and virtual machines abstract away the details of the specific hardware that the program will be running on
- A host of tools support monitoring, inspecting, and debugging poor performance and exceptional conditions in production programs
- And semi-standard processes across the SDLC help identify incomplete requirements, increase maintainability, and verify implementation before programs even run
Still, a paradigm where programmers tell computers how to achieve a specific outcome requires a deep understanding of both the problem domain and the tools available within it. We can make it easier to anticipate edge cases and preempt error conditions, or shift the burden elsewhere by moving from a procedural to a declarative approach as a program matures–but the skill and diligence of the programmer ultimately determine whether a program functions as intended.
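To make the procedural/declarative distinction concrete, here is the same tiny computation in both styles (summing even numbers is only a stand-in example):

```python
# Procedural: tell the computer how, step by step.
total = 0
for n in range(10):
    if n % 2 == 0:
        total += n

# Declarative: state what we want; let the runtime work out the steps.
total_declarative = sum(n for n in range(10) if n % 2 == 0)
```

Both produce the same answer; the second simply shifts responsibility for the "how" away from the programmer.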
The programming task changes dramatically with ML in the mix. Andrej Karpathy offered “Software 2.0” as a shorthand for the sort of software that’s implemented by computers themselves. In Software 2.0, a human specifies a program in terms of inputs, outputs, a model architecture, and training configuration. Rather than instructing the computer how to use them, however, a 2.0 program is produced by a training process that iteratively adjusts the parameters in a series of n-dimensional vectors until an input passed through them routinely yields a pre-specified output. The resulting program is totally alien to those of us who don’t think intuitively in model weights, but it works.
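A toy illustration of the idea, shrunk to a single weight: rather than coding y = 2x directly, we specify example inputs and outputs and let an iterative training loop find the parameter. Real models have billions of parameters and far richer architectures; only the mechanics carry over.

```python
# Inputs and desired outputs: the "specification" of a 2.0 program.
inputs  = [1.0, 2.0, 3.0, 4.0]
targets = [2.0, 4.0, 6.0, 8.0]   # implicitly, y = 2x

w = 0.0          # the entire "program" is this one trainable weight
lr = 0.01        # learning rate for gradient descent
for _ in range(1000):
    # Gradient of mean squared error with respect to w.
    grad = sum(2 * (w * x - y) * x for x, y in zip(inputs, targets)) / len(inputs)
    w -= lr * grad
# w has converged to roughly 2.0: the training loop "wrote" y = 2x.
```

Nobody told the computer to multiply by two; the behavior emerged from examples, iteration, and an error signal.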
Creating credible ML models depends on two things: sufficient data to specify behavior across the full input range, and sufficient computational power to train on it. Incomplete datasets (as most are) or shortcuts taken during the training process show up as increased error rates in production. And unlike in traditional programming, where bugs tend to produce concrete exceptions, errors inside an ML model manifest as bias, hallucinatory output, and other less-than-totally-tangible misbehavior.
If this trial-and-error approach to development sounds expensive, it is. Both the volumes of data and the iterations needed to arrive at an acceptable “fit” between inputs and outputs scale with the complexity of the problem. Delivering 2.0 programs was a non-trivial feat back when classification and prediction problems were the order of the day; today’s GenAI problems are orders of magnitude more involved.
In terms of accelerating adoption and performance, then, there are three features of Software 2.0 programs to focus on:
- The astronomical computing resources required to train/author 2.0 programs (and runtime costs aren’t cheap, either)
- The smaller set of operations required to execute a 2.0 program (versus those required by the nearest general-purpose programming language)
- The shift from implementing programs (1.0 “coding”) to using data and prompt engineering to specify them
In one sense, anyone who’s interacted with an AI system is a programmer (or a “prompt engineer,” in the jargon of the day). We still need to specify the problem we want solved, but given a generally-capable model and sufficient training cases the heavy-lift implementation shifts to the machine itself.
Things get really interesting when we turn AI generally (and near-future foundation models in particular) back on the problem of traditional software development. An AI that can implicitly develop programs in natural language should be adaptable to programming languages, too, which is exactly what we’re seeing from Copilot, CodeWhisperer, and their like.
It’s important to underscore, however, that collaborative and automated interactions in the programming task aren’t new concepts so much as decades-long evolutionary trends. Both technology (type systems, syntax highlighting, code completion) and practice (TDD, pair programming) have steadily improved the process of building quality software. The state of the art depends heavily on the prior.
Even what have appeared as massive recent steps forward in capability may not be what they seem. The rush to market during a hype cycle tends to usher otherwise-hidden R&D efforts out of the shadows, while–as Stephen Wolfram memorably put it–capability may be less about better models than problems that are “computationally easier than we thought.”
What we can say, however, is that the compounding benefits of better models, larger data sets, more compute, and steadily-improving human/computer interactions are paying off. Code completion at the functional, module, and even system level remains imperfect–but vastly improved over just a year ago.
Seeing AI systems as computational resources built from (and running) entirely new sorts of “programs” returns us to the three issues impeding a full technological breakout:
- Data – and how it’s obtained, cultivated, and applied
- Trust – and how monitoring, understanding, and verifying model behavior reinforces it
- Cost – and how accessible, reliable, performant hardware is key to widespread adoption
Fail to solve these, and–until the next “computationally easier” problem comes along–AI systems will be just another bubble.
Whether in training or as input to an existing model, data is the essential ingredient in developing and verifying ML programs–and a huge barrier to entry for everyone without access to large, extant datasets. The challenge is compounded for actors complying with data-privacy regulations and copyright law, as the insight tied up in data that are proprietary, personal, or both will remain out of reach. Simultaneously, the corresponding temptation to trade moral scruples and legal liability for short-term gain may prove too much to resist, with severe reputational risk for the actor and industry alike.
Even when we can get the stuff, though, how do we know it’s good? Digital records typically exist for the past few decades, but with no guarantees of complete (or correct) data entry. More recently, content-creating AI systems introduce a path to Model Collapse–and a quick, ironic end to an otherwise promising enterprise.
Ultimately, already-inadequate systems for storing, controlling, and authenticating data need a massive upgrade. Even where technical answers exist (adopting blockchain technology to authenticate data provenance and ownership, for one), the changes needed to adopt them (carrying signing keys secured on our person) require a degree of acclimation that will not happen overnight. Finally, the patchwork of sometimes-contradictory legal and regulatory requirements that exist today to (ostensibly) guide use and deter misuse of sensitive data have yet to be fully tested against innovation on the ground. And as long as actors committed to proceeding responsibly lack definition on what exactly that means, “move fast and break things” will stay the order of the day.
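As one hedged sketch of the technical side, a minimal hash chain binds each data record to everything that came before it, which is the core of blockchain-style provenance. The record fields and sample entries here are assumptions for illustration, not any particular provenance standard:

```python
import hashlib
import json

def record(chain: list, payload: str, author: str) -> list:
    """Append a provenance entry whose hash covers the previous entry."""
    prev = chain[-1]["hash"] if chain else "genesis"
    entry = {"payload": payload, "author": author, "prev": prev}
    digest = hashlib.sha256(json.dumps(entry, sort_keys=True).encode())
    entry["hash"] = digest.hexdigest()
    chain.append(entry)
    return chain

chain: list = []
record(chain, "raw survey data", "alice")
record(chain, "deduplicated survey data", "bob")
# Tampering with the first entry would invalidate every hash after it.
```

The cryptography is the easy part; distributing the keys and getting institutions to agree on the format is where the acclimation problem lives.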
Assume that adequate training data are accessible, though, and even that they were responsibly obtained. A trained model appears to be behaving correctly, at least as far as the training data reserved for verification, but once it’s out in the real world–how do we know?
Programs written in model weights (rather than something near to natural language) defy intuition. Recognizing and handling exceptions in production is a non-trivial task, complicated by our inability to understand the internal “logic” that led up to the event. If we can’t, as frequently we can’t, we laugh off a ludicrous (but transient) behavior, accept the error, and move on. But “acceptable error” means something different in a toy chat application than in medicine, transportation, or financial services, and lower trust in any application of a technology tends to erode trust in all of them.
The bar to trusting AI systems is high, and perhaps unreasonably so. Humans make mistakes, too. The alien nature of AI systems tends to yield failure modes that are unexpected, unfamiliar, and frequently hilarious–but not necessarily more frequent or damaging than those we would introduce ourselves. Despite a ludicrous vulnerability to traffic cones, there’s reasonable evidence to conclude that Autonomous Vehicles are already safer than human operators (though crash data from human-initiated incidents aren’t as accessible as those shared by major AV operators).
Whatever the bar, the state of the art in system verification is a tedious exercise in data collection and statistical analysis. Tools to inspect and tune behavior are crude and often prohibitively expensive. Advancements that let us better understand and “certify” AI systems have the potential to accelerate development and public confidence at a rate that simply isn’t possible with the black-box methods available today.
The last few bubbles have begun with software, and small wonder. Custom programs running on general-purpose infrastructure are much cheaper to design, build, and deploy than those that depend on changes to the physical world. As usage has scaled, however, subsequent hardware adaptations have both increased performance and created commercial opportunities of their own. Cloud Computing after the dotcom boom. The adoption of GPUs and subsequently ASICs in proof-of-work cryptocurrency mining. AI systems are no different.
In fact the convergence between mutable software prototypes and efficient hardware is already underway. AI-accelerator ASICs like Google’s Tensor Processing Units improve the efficiency of programs written within certain constraints (in this case, using the TensorFlow framework), while Cloud providers are racing to expand access to abstract compute resources.
Cost, access, and better techniques for extending base models will push this trend further. As we start to think of AI systems as computational resources (more EC2 than Windows), they will inevitably be deployed and priced as such. Imagine:
- a generally-capable base model is flashed to custom silicon
- purpose-built devices extend the frozen base model using parameter tuning, parameter injection, or another clever technique still awaiting pre-publication on arXiv
- or they don’t, and applications built on top obtain the services of the base model as-is
- new base models are regularly swapped out (akin to old processors today)
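The extension pattern above, in miniature. The function shapes and numbers are assumptions for illustration, not any published tuning technique:

```python
def base_model(x: float) -> float:
    """Stand-in for a large pre-trained model flashed to silicon."""
    return 2.0 * x + 1.0            # frozen: never updated on-device

def adapted(x: float, a: float, b: float) -> float:
    """A cheap, task-specific correction layered on the frozen base."""
    return a * base_model(x) + b    # only a and b are tuned per task

# "Tuning" here just means choosing a and b for the new task;
# the base model's weights are never touched.
a, b = 0.5, -0.5
```

Swapping in a new base model then looks like a hardware upgrade: the adapter is retuned, but the application code above it never changes.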
In other words, the services of a sophisticated AI system would be accessible from a single, dedicated chip inside a mobile handset or home automation hub–or as a managed service available from your favorite Cloud Provider.
Which the Providers know. They’re already offering software-defined outlines of the AI-backed computing services to come. Running them on general-purpose hardware is expensive: both a tremendous opportunity for innovation and the key challenge to using AI at scale.
If history is any lesson, useful technologies tend to grow cheaper over time, and therein lies a danger: both data (a finite resource subject to a complex regulatory environment) and trust (a problem we can address today, however inelegantly) take a backseat to cost. So long as they can afford it, market forces will push less-than-responsible actors to continue rushing less-than-fully-trustworthy models trained on less-than-fully-deidentified-and/or-licensed data to production without adequate verification. These practices risk significant reputational consequences both for those involved and the technology generally. But the cat won’t go back in the bag.
Even with the vexing challenges of data, trust, and cost out of the way, the social and economic consequences of AI systems will still weigh heavily on widespread adoption. Without further progress on technical issues, however, those factors will remain academic.
Bubbles burst when ambitions fall short, reality sinks in, and excitement peters out. The technologies underpinning the current AI hype cycle aren’t anything radical or new–the Transformer architecture dates to 2017; neural nets have existed on paper since the 1940s–but engorged on data and computing resources they’re now delivering results that can feel akin to magic.
They aren’t magic, of course, but the real magic is coming. Data, trust, and cost will be solved (or, by outrunning the relevant authorities, rendered moot), and per Kurzweil’s Law likely sooner than we expect. Increasingly capable AI systems will follow, ushering in opportunities and challenges we may not yet comprehend.
I am not a techno-optimist, far from it. I’m a boring technology guy with enough scar tissue to keep an eye pointed downfield. And what’s down there, in this bubble and beyond, is ever-more capable AI.
Buy that, and it’s time for some serious conversations about what it means to coexist with and leverage the capabilities to come. I’ll stick to implications surrounding my own work (building software and software teams), but suffice it to say that there’s more to be said.
For developers, though, here are some guesses about where the future leads. Caveat emptor.
If you’ve stuck it out this far, you’re in on the secret: AI systems can already write software. There’s a difference between toying around with the best prompt for the job or pair programming with an AI, however, and letting the AI build software itself–and so far the state of the art is… a work in progress.
But a decades-deep history of computers tackling parts of the programming task gives some insight into where things are headed. Historically, better programming tools have let programmers:
- spend more time understanding and improving existing systems
- spend more time fitting their work to user needs
- spend less time programming
In other words, shortening the distance from “making it work” to “making it fast.” And as automated assistants take on even more of the implementation, developers’ time will shift further towards specifying the systems and ensuring the right things are being built. Domain knowledge, communication, and critical thinking skills will be crucial, both for successful interactions with AI systems and for collecting and understanding customers’ needs. Not that these didn’t matter before. But in an AI-enabled future, it’s a pretty safe bet that they’ll constitute the majority of what sets effective developers apart.
Expect more hybrid roles–the developer/designer, the product-managing developer, and so on–as the barrier of learning programming languages evaporates (in some cases, at least; someone will always be running COBOL). Expect more time spent on user interfaces and human-computer interactions–though a machine able to intuit or iteratively settle on appropriate UI decisions may not be far away, either.
Finally, expect extensive, ongoing negotiations about data access, privacy, and interoperability between disparate systems. An AI programmer will implement what we ask of it, but the political and legal wrangling behind the specification should stay in the human realm for the foreseeable future.
First, clear goals. Natural language is an inherently fuzzy medium for specification, and framing intentions with little room to deviate from the desired result has always been important in the programming task. Current-generation AI systems’ minimal intuition and common sense make it even more so, and programmers will need to find clever ways to describe and refine the system’s goals. One possibility may be an extension of multi-step reasoning where the system must implement a human-verifiable test suite before moving on to implementation. TDD, again. But for the machine age.
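A sketch of what that might look like: the human writes the acceptance tests up front, and the implementation (today human-authored, tomorrow perhaps machine-authored) must pass them before work continues. `slugify` and its expected behavior are hypothetical examples, not a real specification:

```python
def spec(slugify) -> bool:
    """Human-verifiable acceptance tests, written before implementation.
    The slugify behavior specified here is a hypothetical example."""
    assert slugify("Hello World") == "hello-world"
    assert slugify("  spaces  ") == "spaces"
    assert slugify("Already-Good") == "already-good"
    return True

# A candidate implementation; today human-written, tomorrow perhaps not.
def slugify(text: str) -> str:
    return "-".join(text.lower().split())

spec(slugify)  # raises if the candidate misses the specification
```

The tests themselves become the contract: small enough for a human to audit, precise enough to leave the machine little room to deviate.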
Second, all systems (including the human kind) deliver better results when operating in known environments. AI systems will do their best work when given familiar patterns and predictable interfaces to work with–just like our human colleagues today. Prioritizing legibility inside proprietary codebases and adopting common standards at the edges improves discoverability and understanding–so long as the behavior described in the documentation accurately reflects the behavior under the hood.
It isn’t a stretch to imagine a future-state Web made up of discovery services, OpenAPI interfaces, and state machines layered atop whatever data they serve–the more logic implemented on top of “known-good” building blocks, the better.
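A state machine in this style can be small enough to audit at a glance. The states and events below are hypothetical, but the shape (an explicit transition table plus a lookup) is the sort of "known-good" building block in question:

```python
# An explicit transition table: every legal move is enumerated up front.
TRANSITIONS = {
    ("draft", "submit"): "review",
    ("review", "approve"): "published",
    ("review", "reject"): "draft",
}

def step(state: str, event: str) -> str:
    """Apply an event, refusing anything the table doesn't allow."""
    if (state, event) not in TRANSITIONS:
        raise ValueError(f"illegal event {event!r} in state {state!r}")
    return TRANSITIONS[(state, event)]
```

An AI (or human) consumer can read the table and know every behavior the service permits, with no hidden states to hallucinate around.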
The take-home, though, is that the things that will benefit AI systems overlap heavily with what we should be doing for our human collaborators today. Keep it simple. Write as little as possible. Machines won’t be the only ones giving you kudos.
Most of a programmer’s hands-on time is spent anticipating and handling error conditions. AI systems are very good at addressing error cases that they’ve been trained to look for, but exceptions–unforeseeable by definition–may not be so easily addressed. Production telemetry of AI-developed software will go in two directions:
- AI systems will deliver code in programming languages and binary formats with existing support
- A new generation of reliability engineering tools will support instrumentation, inspection, and diagnosis of production issues.
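A minimal sketch of what such instrumentation might record, assuming a model call that returns an output and a confidence score. The stand-in model and the 0.8 flagging threshold are illustrative assumptions:

```python
import time

telemetry: list = []

def instrumented(model, prompt: str) -> str:
    """Wrap a model call, recording latency and flagging low confidence."""
    start = time.perf_counter()
    output, confidence = model(prompt)
    telemetry.append({
        "prompt": prompt,
        "latency_s": time.perf_counter() - start,
        "confidence": confidence,
        "flagged": confidence < 0.8,   # candidates for human diagnosis
    })
    return output

# A stand-in "model" for demonstration purposes only.
result = instrumented(lambda p: (p.upper(), 0.65), "hello")
```

Production tooling would add sampling, trace IDs, and storage, but the principle holds: if the model's internals are opaque, its boundary had better not be.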
We’ve hardly scratched the surface of what the second might look like. Many organizations have yet to adopt basic prerequisite steps (even as much as Data and Model Versioning), and model explainability remains a sticky, unsolved problem. Understanding and debugging alien, AI-generated systems at runtime will depend on massive improvements in debugging tools. Or better aliens.
Whether we’re working in existing tools or building better ones, injecting AI systems into the SDLC should also impact operations by shortening the distance between development and production. AI systems that can recognize runtime exceptions, author fixes, and deliver them to production offer a tantalizing path towards “self-healing” properties within established development workflows–no resource-management or runtime hacks required.
And those capabilities may be needed. AI systems deployed as computational resources pose a unique set of operational challenges, from hallucinatory responses to just-over-the-horizon security risks. The specific shape of the task will depend on the systems involved, as ever, but operators in an AI-enabled world won’t be bored.
Imagine working with the fastest, most knowledgeable teammate you’ve had in your career. Multiply that by six. Or 60. The point is that AI systems are already able to search and evolve an entire codebase in near-realtime, and their value to the team will only increase as their capabilities continue to expand.
That’s important in several ways. AI systems’ knowledge is predicated on the information they’re given. Developers with on-demand access to that knowledge can spend less cognitive energy acquiring it themselves. And the nature of onboarding, training, and interacting as a team will evolve significantly as a result.
One safe bet for the future is that keeping process and operational data in machine-accessible form will be even more important than it is today. AI systems’ capacity for context-gathering and discovery activities is limited by the information available–and secrets locked away in team members’ heads don’t count. Natural-language documents do, though, and the ability to connect information from planning documents, support tickets, pull requests, and chat threads with hard operational numbers will make systems like Jira, Slack, and Gitlab even more valuable than they’ve been to date.
Shifting the repository of institutional knowledge from human team members’ brains into indexed, searchable datastores will free up time for everything else. Practices like code review and post-mortems exist in part to share understanding and refine processes; with AI systems picking up the burden on both sides, developers may find these practices re-centered on their primary goals (ensuring code works and root-causes are addressed). Any energy freed up in the doing may shift beyond the codebase–to the problem domain, to user interactions, and to business context–and in connecting the work more deeply with the customers depending on it.
Increasing access to the knowledge embodied in software projects will also reduce technical barriers to entry. Onboarding will be much faster than in the past, and the importance of hiring team members based on domain expertise and customer empathy will increase accordingly.
All of this points towards a future job description that looks rather different from today’s. Software developers will rely less on their mastery of specific codebases and technical skills and more on business sense and problem-solving skills. The responsibilities of the role will look more like what we call “citizen development” today–albeit with vastly more capable tools at their disposal.
And if that’s you, there’s no time like the present to start thinking about how and where you might fit.
The future will doubtless be an interesting place, but we’re not there yet.
We’re still in an expanding bubble that hasn’t yet reached escape velocity.
.ai domains and marketing websites are commanding massive valuations regardless of the value they’re delivering or the costs they incur. We’re still facing down a host of unsolved problems around security, bias, misinformation, labor shifts–and that’s if we trust our models at all.
But the technology beneath it all is much, much more than mostly-capable-and-occasionally-hilarious chat interactions. A toehold in natural language; a computing paradigm well-suited to hardware design; and a rush of venture money into actually applying them are a revolution-in-waiting for how software is designed and built.
What are you thinking? Buy the premise, or take a different view entirely? While I’ve offered a few suggestions about where we might be headed, they’re being challenged (daily) by both new research and my own evolving thinking. Wherever you’re at, I’d love to hear your thoughts and keep the conversation going.