After talking to founders, builders, and researchers involved with AI from the most recognized organizations to the obscure, I've noticed two schools of thought emerging when it comes to how artificial general intelligence will develop:
Integrated: This school of thought believes that there will be one model to rule them all. This integrated view supposes that with sufficient training and research, a general purpose model will be born, ready to handle anything we throw at it.
Modular: A belief that AGI will be modular and through composability we'll have AI that can call upon an arsenal of tools, plugins, and workflows to carry out a generic task.
In this post, I will present the usual assortment of thoughts, analysis, and predictions on this topic. This time, it'll all be pre-summarized for maximum digestibility1.
They can coexist
Just like how we live in a world with integrated Apple and modular Android devices, I believe there's enough room for two approaches to AGI. One thing we should acknowledge is that while AGI is defined as the ability for AI to carry out any intellectual task a person can, it does not mean that the specific flavour of AGI is efficient at doing all tasks. In other words, AI may specialize and we might find some tasks more suited for an integrated approach and others more suited for a modular approach. OpenAI has shown that they have an appetite for juggling both as they've just announced a more powerful GPT-4 model (integrated) and extensible ChatGPT plugins (modular).
It's really a spectrum with tradeoffs
The distinction between the integrated and modular approaches is not actually rigid. As with most spectrums, the extreme ends don't work in practice either. The feasibility of building everything into one model is questionable as baking everything in one model would likely require incredible computational complexity and scaling challenges. Perhaps computation and scaling are achievable with advances in quantum computing but an integrated model will also need vast amounts of data which is becoming more difficult to obtain in a world where everyone is wary about what all the AI models are vacuuming up.
Putting aside the concerns of feasibility, the mere existence of one model to rule them all is impractical. An integrated superintelligent model would need a vast continuous feed of knowledge and the ability to adapt to perform any given task. There are two major issues with this:
It'll be inefficient and expensive to tap an all-knowing AI to turn on your light bulb
It's very dangerous to have one all-knowing and powerful black box model that can't be reasoned with
The other extreme looks elusive as well. A modular approach may suffer from diminishing returns with more components added. In a maximally modular system, the complexity of managing countless specialized components would make it difficult to maintain efficiency and coherence. It’s possible that the individual modules may struggle to communicate effectively or work harmoniously with each other, creating additional bottlenecks and latency.
ChatGPT Plugins are being compared to the App Store but there’s a whole new challenge for these kinds of AI app marketplaces. Plugins open up vectors for prompt injection and attack. A marketplace provider also needs to ensure that all the plugins play nice with each other. Finally, from a user’s perspective, writing effective prompts for an initial model to communicate with and prompt modules introduces a whole new level of complexity – similar to a game of broken telephone.
Modular AGI will come first
My belief is that AGI will appear on our screens from the modular side of the spectrum first. It'll incorporate the best aspects of foundational models, like GPT-4, which can be further extended with retrieval abilities and access to browsers, code, and APIs. This hybrid approach would allow for the adaptability and wide-ranging capabilities of an integrated foundational model while still getting the most out of the focused expertise of modules.
Foundational models will continue to be important in the modular approach as there are a few key reasoning steps needed to orchestrate everything. First, the foundational model needs to interpret and break-down the initial prompt into smaller problems that modules can solve. Then, the foundational model needs to route tasks to the appropriate modules. Finally, the foundational model needs to synthesize and assemble a response or request additional information from the user.
It remains to be seen how important the specific provider of the foundational models will be. Some are claiming that large language models (LLMs) are going through a Stable Diffusion moment as fine-tuned LLM models like Alpaca are prancing out in the wild on phones and laptops rather than being fenced in servers. Apple, the most valuable company in the world, is also throwing its weight into better local hardware and optimizations to run AI models.
It'll be a wild ride
My personal obsession with All the AI started in late November and it hasn't slowed. Companies, both big and small, and hackers from all corners of the world are building and more importantly, they are shipping. There'll be no shortage of progress.
My English-as-a-third-language parents were having a hard time following my last couple crazy posts. This one is for you dad!