Can AI Ever Truly Transcend Human Intelligence?
Modern AI is fundamentally limited by a human-centred definition of intelligence: large language models are trained on human-generated data that embeds our assumptions and biases. Even when LLMs surpass human performance on specific tasks, they remain extensions of human intelligence rather than genuinely super-human systems. Achieving true super-human intelligence would require models to develop independent foundational axioms, raising serious challenges around interpretability, alignment, and control.

Our definition of intelligence is inherently biased.
Naturally, we equate intelligence with human intelligence. This perspective significantly influences how we train and discuss AI models.
When training large language models (LLMs), we rely heavily on vast amounts of human-generated data during pre-training. This results in models that inherently adopt human-like "mental axioms" and foundational intuitions. Regardless of whether you believe LLMs truly develop internal world models (something I personally question), the fact remains that their knowledge base is fundamentally anchored in human intelligence.
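To make this concrete, here is a minimal sketch of the pre-training objective. It is illustrative PyTorch code with random tensors standing in for a real model and corpus, not anyone's actual training pipeline, but it shows the point: the only target in next-token prediction is the human-written continuation, so every gradient step pulls the model toward the statistics of human text.

```python
import torch
import torch.nn.functional as F

# Toy stand-ins: random "model outputs" and random "human-written tokens".
vocab_size, seq_len, batch = 100, 16, 4
logits = torch.randn(batch, seq_len, vocab_size, requires_grad=True)  # what a real LM head would emit
human_tokens = torch.randint(0, vocab_size, (batch, seq_len))          # targets drawn from human text

# Next-token prediction: the only learning signal is the human continuation,
# so every gradient step pulls the model toward the statistics of human corpora.
loss = F.cross_entropy(logits.reshape(-1, vocab_size), human_tokens.reshape(-1))
loss.backward()
```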
Consequently, current LLMs are always limited by our conception of intelligence. Even innovative reinforcement learning techniques designed to enable exploration remain grounded in human-derived constraints, because the pre-training phase has already fixed the model's priors. In practical terms, this means we might create AI that surpasses the smartest humans, but it would still represent the peak of human-like intelligence, not genuinely super-human intelligence.
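One concrete illustration of why fine-tuning tends to stay inside those constraints: RL fine-tuning setups commonly include a KL penalty that keeps the updated policy close to the frozen pre-trained reference model. The sketch below is a toy REINFORCE-style version with random tensors in place of a real policy, reward model, and environment, so the specifics are assumptions, but the mechanism is the point: exploration toward higher reward is explicitly traded off against drifting away from the human-data distribution.

```python
import torch
import torch.nn.functional as F

# Toy sketch of a KL-regularised RL fine-tuning step.
vocab_size, batch = 100, 8
policy_logits = torch.randn(batch, vocab_size, requires_grad=True)  # trainable policy
ref_logits = torch.randn(batch, vocab_size)                          # frozen pre-trained reference model
beta = 0.1                                                           # strength of the human-prior anchor

policy_log_probs = F.log_softmax(policy_logits, dim=-1)
ref_log_probs = F.log_softmax(ref_logits, dim=-1)

# Sample actions and pretend the environment returned some rewards.
actions = torch.multinomial(policy_log_probs.exp(), 1).squeeze(-1)
rewards = torch.randn(batch)                                          # placeholder task reward

chosen_log_probs = policy_log_probs.gather(-1, actions.unsqueeze(-1)).squeeze(-1)
# KL(policy || reference): how far exploration has drifted from the pre-trained prior.
kl = (policy_log_probs.exp() * (policy_log_probs - ref_log_probs)).sum(-1)

# REINFORCE-style objective plus KL penalty: reward-seeking exploration is
# explicitly traded off against leaving the human-data distribution.
loss = -(chosen_log_probs * rewards).mean() + beta * kl.mean()
loss.backward()
```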
To genuinely transcend human intelligence, AI would need to develop its own foundational axioms independently, without human biases or data: a model must be free to form its own understanding and modes of reasoning, unconstrained by our evolutionary limitations. Human intelligence, shaped by the imperfect and undirected process of biological evolution and natural selection, may well be a local optimum rather than an absolute peak; it is highly unlikely that our brain architecture is optimal.
However, creating such an independent super-human intelligence raises significant concerns. Current LLMs, despite using our language and data, are largely uninterpretable black boxes; an entirely novel intelligence that communicates differently would be even more incomprehensible, making alignment and control extremely challenging. We must ask ourselves whether the potential benefits of an unexplainable super-human intelligence outweigh the risks, or whether creating it is even feasible. Would we be able to recognise the outputs of something that transcends us completely?
Another perspective suggests scaling current LLMs until they surpass human intelligence and then using these advanced models to discover pathways to genuine super-human intelligence. Personally, I find this scenario daunting. While such an intelligence might accelerate scientific breakthroughs and prosperity, its incomprehensibility threatens our intellectual freedom. We would become mere pets of these AI models: beings that they keep happy and healthy for their own amusement (and that is the best-case scenario).
This post is more speculative than usual, but it underscores the crucial importance of mechanistic interpretability as we navigate these profound challenges in AI development. It is something I think about daily as we build increasingly complex systems at Tech Cortex and transform a growing number of industries.