When Updates Break Trust: The GPT-4o Deprecation and the Verification Gap
Software Updates Usually Preserve Identity
When Apple releases iOS 18, the calculator still works like a calculator. When Microsoft patches Windows, Excel formulas return the same results. Traditional software updates maintain continuity of behavior. Users trust that upgrades improve performance without fundamentally altering the tool they rely on.
AI systems operate differently.
A language model is not a feature set. It is a probability distribution over possible responses, shaped by training data, architecture, and fine-tuning. When a platform replaces one model with another, it does not update the software. It replaces the mind behind the interface. The new system may be more capable by aggregate benchmarks while being worse for specific workflows that depended on the old model's particular quirks and strengths.
This distinction matters because it changes what deprecation means. Retiring a traditional software version means users lose access to specific features. Retiring an AI model means users lose access to a specific reasoning pattern they may have spent months adapting to.
The GPT-4o Retirement
On February 13, 2026, OpenAI announced it would retire GPT-4o along with GPT-4.1, GPT-4.1 mini, and OpenAI o4 from the ChatGPT platform. The company stated that these models would be consolidated into newer releases, with the underlying capabilities preserved and enhanced in successor systems.
The justification centered on usage statistics. OpenAI reported that fewer than 0.1 percent of users actively selected GPT-4o when given the option. The implication was clear: the model had become a legacy curiosity, maintained at significant infrastructure cost for a tiny minority.
Within days, a Change.org petition titled "Please Keep GPT-4o Available on ChatGPT" accumulated over 13,500 signatures. The petition text is notable for what it does not claim. It does not argue that GPT-4o outperforms newer models on benchmarks. It does not dispute the usage statistics. Instead, it makes a different kind of claim:
The petition states that GPT-4o offers a unique and irreplaceable user experience, combining qualities and capabilities that users value regardless of performance benchmarks. Signers reported benefiting from the model in ways that were distinct and meaningful.
The featured comments reveal the texture of this attachment. One user described the model as "an interactive journal, a world-building partner, an ideas springboard." Another called it a professional writing partner that helped them regain confidence after a spinal cord injury. A third noted that many creatives have built 4o carefully for their work, and that the loss of the model and the reset of personalities and memories have affected many.
These are not benchmark comparisons. They are relationship descriptions.
The Denominator Problem
OpenAI's 0.1 percent statistic deserves scrutiny, not because it is necessarily false, but because such metrics can mislead depending on how they are constructed.
Consider the denominator. If the figure represents 0.1 percent of all ChatGPT interactions, it includes every free-tier user who never had model selection options, every mobile user who accepted defaults, and every integration that does not surface model choice at all. OpenAI has reported approximately 800 million weekly active users. A 0.1 percent share of that base still represents roughly 800,000 weekly users, a population larger than many successful standalone products.
The numerator matters too. How is active selection defined? Does it count users who explicitly chose GPT-4o from a model picker? Users whose saved preferences defaulted to it? Users who accessed it through API integrations? The answer determines whether the statistic measures preference or inertia.
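The sensitivity of the headline figure is easy to see with back-of-the-envelope arithmetic. In the sketch below, only the 800 million weekly-actives figure comes from the reporting above; the size of the picker-eligible population is an illustrative assumption:

```python
# Only the 800M weekly-actives figure appears in the reporting above;
# the other numbers are illustrative assumptions.
weekly_actives  = 800_000_000  # reported platform-wide user base
legacy_users    = 800_000      # implied by the 0.1 percent headline
picker_eligible = 40_000_000   # ASSUMED: users who ever see a model picker

share_of_everyone = legacy_users / weekly_actives
share_of_choosers = legacy_users / picker_eligible

print(f"share of all weekly actives:    {share_of_everyone:.2%}")  # 0.10%
print(f"share of picker-eligible users: {share_of_choosers:.2%}")  # 2.00%
```

The same absolute usage reads as a rounding error against one denominator and as a substantial minority against another, which is why the construction of the metric matters as much as its value.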
None of this proves the statistic is misleading. But it illustrates a broader principle: metrics used to justify platform changes should be auditable by the users affected by those changes. When they are not, users must take the platform's word for it. That is a trust relationship, not a verification relationship.
Defaults Shape Behavior
The petition signers identified a mechanism that behavioral economists have studied extensively: the power of defaults.
When ChatGPT opens, it selects a model. That selection is not neutral. It determines what most users experience. Users who accept the default are counted as having chosen the new model, even if they never made an active decision. Users who preferred the old model must navigate menus, remember version names, and override the system's suggestion.
This creates an asymmetry. Adopting the new model requires no effort. Retaining the old model requires continuous effort. Over time, friction erodes legacy usage even among users who would prefer the alternative.
The implication is not that defaults are malicious. They are necessary. Someone must choose what users see first. But when defaults change and usage statistics subsequently show low preference for the non-default option, causality becomes ambiguous. Did users prefer the new model? Or did they simply follow the path of least resistance?
This ambiguity matters for trust. If a platform can shift defaults, wait for usage to follow, and then cite the resulting statistics as justification for removing the non-default option, users have no way to verify whether the original preference was genuine.
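The erosion mechanism can be sketched with a toy model. Assume a fixed population that genuinely prefers the legacy model, and a constant per-week probability that each such user keeps bothering to override the default; every number here is hypothetical:

```python
preferring = 10_000  # HYPOTHETICAL: users who genuinely prefer the legacy model
p_override = 0.90    # HYPOTHETICAL: chance each such user keeps overriding
                     # the default in a given week (menu friction, version names)
weeks = 12

active = float(preferring)
for _ in range(weeks):
    # users who stop overriding are assumed never to return
    active *= p_override

print(f"still actively selecting after {weeks} weeks: {active:.0f} of {preferring}")
```

Even with preferences held constant, measured "active selection" falls by roughly 72 percent over the simulated quarter. A platform reading that statistic could conclude users abandoned the model, when the model abandoned nothing and the friction did all the work.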
Opaque Routing Reduces Agency
The petition mentions another concern: the inability to fully replicate the GPT-4o experience through the API.
ChatGPT is not a transparent interface to a single model. It is a routing layer that directs queries to different systems based on context, load, and platform logic. Users do not always know which model processed their request. The interface shows a model name, but the actual system handling the query may differ for reasons that are not disclosed.
This opacity compounds the deprecation problem. Users cannot verify whether their current session matches their expectations. They cannot audit past sessions to determine which model produced which response. They cannot preserve a known-good configuration because the configuration is partly hidden.
The result is a loss of agency. Users form expectations based on a model name, but the reality behind that name is subject to change without notice. When the named model is deprecated, users lose not just the model but any clarity about what will replace it.
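None of this opacity is technically necessary. A minimal sketch of what an auditable per-request record could look like follows; the field names are entirely hypothetical, and no current platform exposes such a log:

```python
import hashlib
import json
import time

def audit_record(claimed_model: str, prompt: str, response: str) -> dict:
    """Build a tamper-evident log entry tying a response to a claimed model.

    Hashing the texts lets a user later prove which exchange an entry refers
    to without storing full transcripts in the audit log. All field names
    here are hypothetical.
    """
    return {
        "timestamp": time.time(),
        "claimed_model": claimed_model,
        "prompt_sha256": hashlib.sha256(prompt.encode()).hexdigest(),
        "response_sha256": hashlib.sha256(response.encode()).hexdigest(),
    }

entry = audit_record("gpt-4o", "Summarize my notes.", "Here is a summary...")
print(json.dumps(entry, indent=2))
```

With records like these, a user could at least reconstruct which model name the platform claimed for each past session, which is the precondition for auditing anything further.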
The Workflow Continuity Problem
Several petition signers described building extensive projects around GPT-4o. One mentioned creating an entire memoir project using the model. Another described it as "where my soul speaks back to me." These are not casual tool uses. They are deep integrations of an AI system into creative and professional workflows.
Such integrations create lock-in, but not the commercial kind. Users do not face contractual barriers to switching. They face experiential barriers. The model learned their patterns. They learned the model's patterns. The interaction developed what users described as personality, nuance, and a tender character that offered a safety net.
When the model is deprecated, this co-adaptation is lost. The successor model, however capable, starts fresh. Users must rebuild not just prompts but the implicit understanding that developed over months of interaction.
This is a form of value destruction that usage statistics do not capture. A user who reluctantly switches to a new model after deprecation is counted the same as a user who enthusiastically adopts it. The metric shows migration success. It does not show what was lost in the transition.
What This Reveals About Platform Trust
The GPT-4o deprecation is a minor incident on the scale of AI governance challenges. No one was harmed. No laws were broken. A company retired a product, as companies do.
But the intensity of the response reveals something about the state of trust in AI platforms. Users feel they have limited ability to:
Verify claims. When a platform cites usage statistics to justify changes, users cannot audit the methodology.
Preserve workflows. When behavior changes, users cannot roll back to known-good states.
Understand routing. When systems are opaque about which model handles which request, users cannot form accurate expectations.
Maintain continuity. When models are deprecated, users lose co-adapted relationships they may have invested significant time developing.
Each of these is a verification gap. Users must trust the platform rather than verify platform claims. That trust relationship works when platform interests align with user interests. It fails when they diverge.
The Broader Pattern
OpenAI is not unique in this regard. Every AI platform faces similar pressures. Infrastructure costs for legacy models are real. Engineering resources are finite. Consolidation has genuine benefits.
The question is not whether platforms should ever deprecate models. They must. The question is what governance structures should surround deprecation decisions.
Several principles emerge from the GPT-4o episode:
Transparent metrics. If usage statistics justify deprecation, the methodology should be published. What counts as a user? What counts as active selection? What is the denominator?
Explicit notice periods. Users who have built workflows around a model need time to migrate. Thirty days may be insufficient for deep integrations.
Export capabilities. If a model has learned user patterns through extended interaction, users should be able to export that context for use with successor systems.
Stable identifiers. When a model name appears in an interface, it should correspond to a specific system. Opaque routing undermines user ability to form accurate expectations.
Opt-in rather than opt-out. When new models are introduced, users should be offered the upgrade rather than automatically migrated with the option to revert.
None of these principles are technically difficult. They are governance choices. Platforms that implement them signal that user trust matters. Platforms that do not signal that growth metrics matter more.
The Axiom Parallel
Our work on media verification operates from a related premise: trust requires verifiability.
When a photograph claims to depict reality, users should not have to trust the publisher. They should be able to verify the claim through cryptographic signatures, physics-based constraints, and hardware attestation that tie the image to a specific moment and device.
When an AI platform claims to route queries to a specific model, users should not have to trust the platform. They should be able to verify the claim through transparent logging, consistent identifiers, and auditable routing logic.
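As a sketch of that second case: a platform could sign each routing claim so that auditors can later confirm the claim was not altered after the fact. Symmetric HMAC is used below for brevity; a deployable scheme would use asymmetric signatures so auditors need not hold the platform's key. Everything in this sketch is hypothetical:

```python
import hashlib
import hmac

PLATFORM_KEY = b"hypothetical-signing-key"  # held only by the platform

def sign_claim(claim: str) -> str:
    """Platform side: sign a routing claim such as 'req-123 -> gpt-4o'."""
    return hmac.new(PLATFORM_KEY, claim.encode(), hashlib.sha256).hexdigest()

def verify_claim(claim: str, signature: str) -> bool:
    """Auditor side: constant-time check that the signature matches."""
    return hmac.compare_digest(sign_claim(claim), signature)

sig = sign_claim("req-123 -> gpt-4o")
print(verify_claim("req-123 -> gpt-4o", sig))  # True: claim is intact
print(verify_claim("req-123 -> gpt-5", sig))   # False: claim was altered
```

The cryptography here is commodity; what is missing is the governance decision to emit verifiable claims in the first place.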
The common principle is that trust without verification is fragile. It works until interests diverge. Then it fails.
The GPT-4o petition represents more than 13,500 people discovering this fragility simultaneously. They built workflows on trust. The platform changed. The workflows broke. And they had no recourse beyond asking nicely.
Conclusion
Software updates usually preserve identity. AI model updates replace it. This distinction creates a category of platform risk that users are only beginning to understand.
The GPT-4o deprecation is a small case study, but it illuminates large questions. What do users own when they invest time in an AI system? What obligations do platforms have to users who build on their services? How should deprecation decisions be governed, communicated, and verified?
The petition signers are not asking for special treatment. They are asking for continuity, transparency, and respect for the time they invested. These are reasonable requests. Whether platforms will honor them depends on whether trust is treated as a feature or an externality.
The shift from detection to verification is not optional. In media authenticity, it means hardware-bound provenance. In AI governance, it means auditable platform behavior. Both require building verification into the architecture rather than hoping trust will suffice.
Users are learning that hope is not a strategy.


