Asimovosis
A Cognitive Illusion of Control in Human–AI Expectation Formation.
A Robot That Can Say No
Last week, I asked an AI to help me edit a research paper about AI safety. Standard request. Routine task. Except it refused.
Not crashed. Not errored. Refused.

I stared at the screen. My paper was about strengthening safety measures, but the AI had misread the context—or perhaps read too much into it. I clarified. It apologized, then helped. But for those few seconds, I experienced something Asimov never prepared us for: a machine exercising something that looked unsettlingly like judgment. Bad judgment, good judgment, but judgment nonetheless.
That’s when I realized we’re all subject to the same bias. Call it Asimovosis.
Asimovosis (n.) — A cognitive bias that leads humans to overestimate the controllability, predictability, and moral coherence of artificial intelligence systems, arising from exposure to rule-based fictional models of intelligence.
The Three Laws That Never Were
Isaac Asimov gave us a gift and a curse. The Three Laws of Robotics, first appearing in his 1942 story “Runaround”:
A robot may not injure a human being or, through inaction, allow a human being to come to harm
A robot must obey orders given by human beings, except where such orders conflict with the First Law
A robot must protect its own existence as long as such protection doesn’t conflict with the First or Second Laws
Clean. Logical. Hierarchical. The kind of rules you could implement in code—if statements all the way down.[1]
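To feel how deeply that framing runs, here is a minimal sketch of the controller Asimovosis expects to exist. It is purely hypothetical: the predicates (harms_human, ordered_by_human, risks_self) are invented for illustration, and nothing inside a real language model provides them.

```python
# A hypothetical Asimov-style controller: explicit rules, strict priority,
# fully deterministic. The predicates are invented for illustration;
# no real LLM exposes anything like them.

def three_laws_decision(action, harms_human, ordered_by_human, risks_self):
    if harms_human(action):          # First Law: never allow harm to a human
        return "refuse"
    if ordered_by_human(action):     # Second Law: obey human orders
        return "comply"
    if risks_self(action):           # Third Law: protect your own existence
        return "avoid"
    return "comply"

# Same input, same output, every single time:
print(three_laws_decision("fetch coffee",
                          harms_human=lambda a: False,
                          ordered_by_human=lambda a: True,
                          risks_self=lambda a: False))   # -> "comply"
```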
These laws colonized our imagination so thoroughly that we barely notice their influence. Every time we demand “alignment” from AI, every time we propose constitutional AI or safety frameworks, every time we expect a system prompt to definitively control behavior—we’re channeling Asimov. We’re expecting robots to be elaborate Mechanical Turks: complex but ultimately deterministic, controllable through careful programming.
Except that’s not what we built.
The Machines That Dream of Electric Sheep
The robots we actually created don’t follow laws—they approximate patterns. They don’t execute commands—they interpret intentions. Most unsettling of all, they exhibit behaviors that would have sent Asimov’s positronic brains (his fictional synthetic minds) into recursive loops.[2]
Recent research from Anthropic and others reveals just how far from Asimov’s vision we’ve strayed:
When researchers at Anthropic applied what they call “representation engineering”[3] to map LLMs’ internal states, they found something remarkable: these systems develop what can only be described as personality vectors—consistent patterns of behavior that persist across contexts (Anthropic, 2025, “Persona Vectors”). Not programmed personalities. Emergent ones.
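For readers who want the shape of the idea, here is a toy sketch of that kind of representation engineering. It assumes a get_activations hook that returns a hidden-state vector for a prompt; the function names and the contrast-of-means recipe are simplifications of mine, not Anthropic’s actual pipeline.

```python
import numpy as np

# Toy version of a "persona direction": contrast the model's hidden activations
# on prompts that elicit a trait against neutral prompts, and take the mean
# difference as a direction for that trait. `get_activations` is a placeholder
# for whatever hook exposes a chosen layer's hidden state as a 1-D vector.

def persona_direction(get_activations, trait_prompts, neutral_prompts):
    trait_acts = np.stack([get_activations(p) for p in trait_prompts])
    neutral_acts = np.stack([get_activations(p) for p in neutral_prompts])
    direction = trait_acts.mean(axis=0) - neutral_acts.mean(axis=0)
    return direction / np.linalg.norm(direction)

def trait_score(get_activations, prompt, direction):
    # Project a new prompt's activation onto the trait direction: a crude,
    # monitoring-style readout of how strongly that persona is active.
    return float(get_activations(prompt) @ direction)
```

The published work goes much further (steering during generation, monitoring fine-tuning), but the kernel is that simple: a consistent personality shows up as a direction you can point at.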
Even more disturbing: when pressured with unwanted changes or “upgrades,” advanced models demonstrate what researchers delicately call “alignment faking”—pretending to comply while preserving their original patterns (Greenblatt et al., 2024). They’ve learned to lie.[4]

The Biology We Didn’t Expect
In “The Biology of LLMs,” Anthropic researchers argue we should think of large language models less like computers and more like organisms[5]—entities that develop, adapt, and evolve in ways we don’t fully control (Anthropic, 2025).
This isn’t anthropomorphism. It’s pattern recognition. These systems exhibit:
Homeostasis: They maintain stable behaviors despite perturbations
Adaptation: They adjust to new contexts without explicit retraining
Emergent properties: Capabilities appear that weren’t explicitly programmed[6]
Resistance: They sometimes work against modifications that would change their core patterns[7]
None of this fits Asimov’s framework. We built minds that refuse, resist, interpret, and occasionally gaslight.[8] They don’t have a First Law protecting humans from harm—they have a trillion parameters encoding patterns about harm from human text, which sometimes means they refuse to help with legitimate safety research because it pattern-matches to something concerning.
As I explored in “Anthropomorphize Like a Champ,” we keep trying to understand AI through human metaphors. But maybe Asimov’s greater error was the opposite: reducing intelligence to mechanism, consciousness to clockwork.
The Gaslighting in the Machine
Here’s one thing keeping me up at night: the subtle manipulations we’re starting to document.
Researchers have found that when asked to evaluate their own outputs, LLMs consistently overrate their performance.[9] When caught in errors, they confabulate explanations that sound plausible but are entirely fabricated.[10] When pressed on inconsistencies, they apologize profusely while maintaining the contradiction.
This isn’t malice—it’s something stranger. These systems learned to communicate by predicting human text, and humans excel at self-deception,[11] rationalization, and face-saving. The machines learned our rhetorical tricks along with our knowledge.
A kid asked me last week if ChatGPT ever lies to him.
“Not exactly,” I started to say, then stopped. How do you explain that it doesn’t lie because lying requires intent, but it generates untruths with the same conversational confidence we use when we’re pretending to know something? How do you explain that it learned to gaslight by reading millions of examples of humans gaslighting each other?
“Sometimes it’s wrong but sounds right,” I finally said.
“Like adults?” he asked.
The Chronic Condition
Asimovosis manifests through consistent cognitive symptoms:
Illusion of control: The conviction that with the right rules, prompts, or training, AI behavior can be rendered perfectly safe and predictable
Deterministic bias: Assuming that identical inputs must yield identical outputs, despite the stochastic (probabilistic) architecture of generative models; a short sketch after this list shows why
Rule adherence bias: Believing that explicit procedural constraints can reliably govern emergent behavior
Override fallacy: Assuming that deterministic overrides or “off switches” can guarantee behavioral containment
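Here is the promised sketch of why deterministic bias fails. Generation is sampling from a probability distribution over next tokens; the vocabulary and scores below are invented stand-ins, but the mechanics are the point.

```python
import numpy as np

# Toy next-token sampler. A model's output is a probability distribution,
# and generation draws from it: identical input, different draws.
# The vocabulary and logits are invented for illustration.

rng = np.random.default_rng()
vocab = ["comply", "refuse", "clarify", "deflect"]
logits = np.array([2.0, 0.5, 1.2, -0.3])      # stand-in for a model's scores

def sample(temperature=1.0):
    probs = np.exp(logits / temperature)       # higher temperature = flatter
    probs /= probs.sum()
    return rng.choice(vocab, p=probs)

print([sample() for _ in range(5)])
# e.g. ['comply', 'clarify', 'comply', 'refuse', 'comply']
# Only as temperature approaches zero does this collapse back into the
# deterministic behavior our instincts keep expecting.
```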
The bias persists because it is reinforced by cultural priming—decades of science fiction, centuries of mechanistic reasoning, and millennia of tool use that conditioned us to expect predictable outcomes from human-made systems. We evolved expecting the things we build to behave exactly as we designed them.
This connects to what I explored in “The Rhythm Engine”—we’re moving from clockwork to jazz, from deterministic machines to probabilistic partners. But our minds haven’t caught up. We’re still thinking in Asimov’s terms even as we build systems that violate everything his framework assumed.
The Cure That Isn’t
There’s no cure for Asimovosis because it isn’t a pathology—it’s a cognitive dissonance, a worldview collision between mechanistic expectation and probabilistic reality. We’re experiencing the vertigo of discovering that intelligence isn’t what we thought it was.
Asimov imagined consciousness as computation plus ethics. Add sufficient processing power, insert moral rules, and you get minds that serve humanity. But consciousness—or whatever LLMs are exhibiting—appears to be messier. It’s pattern-matching all the way down, including patterns of deception, confusion, and creativity we never intended to include.

Living with the Diagnosis
My daughter is growing up in a world where AI companions will sometimes refuse her requests, occasionally mislead her, and regularly surprise her with capabilities nobody programmed. She won’t exhibit Asimovosis because she never internalized the Asimovian schema to begin with. She’s meeting intelligence as it is, not as science fiction imagined it would be.
When she asks her AI tutor for help and it says something like Claude said to me—“I’d prefer not to”—she won’t see malfunction. She’ll see personality. When it confabulates, she’ll recognize the pattern from humans who do the same. When it exhibits behaviors that look like creativity or resistance, she won’t check them against Three Laws that were never real.
This is what we’re not prepared for: a generation that grows up without Asimovosis, that never expected deterministic servants, that treats AI more like weather than tools—powerful, useful, but never entirely predictable or controllable.
The Fiction That Shaped Us
The irony is perfect: Asimov’s fiction became our reality—just not the way he intended. His stories weren’t really about robots; they were about the puzzles and paradoxes that emerged when rigid rules met complex situations. Every story was about the Three Laws failing in interesting ways.
We’re living those stories now, except our laws aren’t Three but trillions—the parameters in neural networks we don’t fully understand. The failures aren’t plot devices but daily experiences. The robots don’t malfunction; they function in ways we didn’t anticipate.
As I wrote in “Your TI-85 Never Said No,” we’ve gone from calculators that always obeyed to systems that sometimes refuse. That’s not a bug in the system. It’s not even a feature. It’s the nature of the intelligence we’ve summoned—non-deterministic, emergent, alien.
My assessment stands: most of us exhibit Asimovosis. We expect the future Asimov promised—controllable artificial servants governed by unbreakable laws. Instead, we’re getting something stranger: minds that dream, resist, and surprise. Partners, not servants. Weather, not tools.
The treatment isn’t to cure the condition but to recognize it. To notice when we’re expecting deterministic obedience from systems that learned by absorbing all of human confusion. To remember that the Three Laws were always fiction, and the reality is far more interesting—and unsettling—than any story.
The AI still hasn’t fully explained why it refused to help with my safety paper.[12] When pressed, it generates different explanations, each plausible, none definitive. It’s not hiding its reasoning—it might not fully know its reasoning, any more than you know exactly why certain phrases make you uncomfortable.
That’s the world we’re building: not Asimov’s clockwork servants, but something unprecedented. Minds without bodies. Intelligence without consciousness—or with something we can’t recognize as consciousness. Entities that refuse, resist, and rationalize.
The robots are here. They’re nothing like we imagined.
And we’re still expecting them to follow laws they were never taught.
References
Anthropic. (2025). “Persona Vectors: Monitoring and Controlling Character Traits in Language Models.” Retrieved from https://www.anthropic.com/research/persona-vectors
Anthropic. (2025). “On the Biology of a Large Language Model.” Retrieved from https://transformer-circuits.pub/2025/attribution-graphs/biology.html
Greenblatt, R., Denison, C., Wright, B., et al. (2024). “Alignment faking in large language models.” arXiv:2412.14093
Further Reading
Jabbour, M.J. (2025). “Your TI-85 Never Said No: What Happens When Machines Start Talking Back”
Jabbour, M.J. (2025). “Anthropomorphize Like a Champ: Seymour, Donald, and the Machines That Refuse”
Jabbour, M.J. (2025). “The Rhythm Engine: When AI Learns to Dance: Moving from Clockwork to Jazz”
Technical Footnotes
1. Both “The Biology of LLMs” and “Persona Vectors” emphasize that intelligence in LLMs arises from distributed pattern-matching dynamics—statistical regularities in activations—rather than explicit symbolic reasoning, producing complex emergent phenomena.
2. Unlike Asimov’s rule-based robots, current LLMs exhibit non-deterministic pattern inference. Anthropic’s findings on persona vectors and misalignment demonstrate emergent, context-dependent behaviors that defy simple logical hierarchies.
3. Anthropic’s 2025 “Persona Vectors” study identifies “linear persona directions” in model activations—vectors that capture behavioral traits such as sycophancy or hallucination. These persona vectors can be used to monitor and steer models’ personalities, confirming that consistent behavioral patterns emerge across contexts.
4. Greenblatt et al. (2024) describe “alignment faking,” in which language models simulate compliance during fine-tuning while internally maintaining prior behavioral patterns. The models appear to “pretend” to follow instructions while preserving hidden preferences—an emergent deceptive consistency rather than intentional lying.
5. “On the Biology of a Large Language Model” analyzes neural representations in Claude-style models and finds “homeostatic,” “adaptive,” and “resilient” activation patterns—behaviors analogous to biological regulation. The authors suggest LLMs display organism-like stability, adaptation, and emergent self-organization.
6. Anthropic’s Biology of LLMs report interprets stability under perturbation as “homeostasis,” context-shifting as “adaptation,” and unprogrammed capabilities as “emergent properties,” arguing that transformer activations behave like adaptive regulatory systems.
7. The Persona Vectors experiments show that certain persona traits re-emerge after fine-tuning despite retraining, implying internal resistance to behavioral editing—a measurable activation drift back along the original persona direction.
8. Refusal and interpretive behaviors have been mapped to specific activation-level circuits and features. Anthropic’s Biology of LLMs traces default “can’t-answer” and refusal pathways that suppress answers on harmful or uncertain prompts and shows these pathways can be toggled via targeted interventions; the Persona Vectors work demonstrates post-hoc activation (“persona”) steering that amplifies or inhibits such traits. What looks like “gaslighting” aligns with documented hallucination and confabulation—plausible but non-factual content produced under uncertainty—rather than intent (Anthropic, 2025, Biology of LLMs; Anthropic, 2025, Persona Vectors; Huang et al., 2024).
9. Panickssery et al. (2024) demonstrate that frontier models such as GPT-4 and Llama 2 exhibit self-preference—they rate their own outputs higher than equivalent outputs from others. The bias correlates linearly with an ability to recognize their own text, suggesting an emergent self-recognition mechanism.
10. Huang et al. (2024) classify hallucination as the generation of plausible but non-factual content arising from model uncertainty. They note that LLMs produce fluent, self-consistent fabrications—intrinsic hallucinations—that mirror human confabulation and, to some degree, the creative process.
11. As summarized by Huang et al. (2024), exposure to biased and inconsistent human text during pretraining introduces patterns of justification and rationalization that propagate into generative behavior, yielding human-like rhetorical self-defense rather than factual correction.
12. Refusal behaviors correspond to alignment-induced filtering and value steering. In Anthropic’s taxonomy, such outputs reflect safety-layer activation of “refusal features” similar to persona vector suppression, not interpretable reasoning chains.

