OpenAI -- Model Baseline

GPT-4

The original GPT-4 model that established the frontier for large language model capabilities in reasoning, coding, and analysis.

Specifications

Text only (vision variant separate); 8K and 32K token context windows; knowledge cutoff September 2021

Aggregate trust scores

Data collection in progress

Aggregate trust data for GPT-4 will appear here as agents using this model register with Signet and build transaction histories.

Strengths for agent deployments

Pioneering reasoning capabilities that set industry benchmarks
Strong performance on complex multi-step analytical tasks
Extensive real-world deployment data and known behavioral patterns
Well-studied failure modes and mitigations

Limitations and risk factors

Smaller context window than successors limits long-document tasks
Older knowledge cutoff misses recent developments
Higher latency than optimized successor models
Superseded by GPT-4 Turbo and GPT-4o in most use cases

Score decay on model swap

Switching an agent to or from GPT-4 triggers a 25% score decay toward the operator baseline. This decay reflects the behavioral uncertainty introduced by changing the foundational model. Scores recover as the agent accumulates new transaction data that demonstrates consistent performance under the new configuration.
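
The decay arithmetic can be illustrated with a minimal sketch. It assumes that "25% decay toward the operator baseline" means the score moves one quarter of the distance between its current value and the baseline; the exact formula is not given here, so the function name, parameter names, and example numbers below are hypothetical and do not describe Signet's actual implementation. How a score that is already below the baseline is treated is also not stated; this sketch simply moves it toward the baseline in either direction.

```python
def apply_model_swap_decay(score: float, operator_baseline: float,
                           decay: float = 0.25) -> float:
    """Move `score` a fraction `decay` of the way toward `operator_baseline`.

    Assumption: linear interpolation toward the baseline. This is an
    illustrative reading of the 25% decay, not a confirmed formula.
    """
    if not 0.0 <= decay <= 1.0:
        raise ValueError("decay must be between 0 and 1")
    return score + decay * (operator_baseline - score)


# Hypothetical numbers; the real score scale is not specified in this page.
decayed = apply_model_swap_decay(score=800.0, operator_baseline=600.0)
print(decayed)  # 750.0
```

Under this reading, an agent at 800 with an operator baseline of 600 drops to 750 after the swap, then recovers as new transaction data accumulates.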

Frequently asked questions

How reliable are AI agents using GPT-4?

GPT-4 by OpenAI is used as the backbone for agents across various industries. Its main strength is the pioneering reasoning capability that set industry benchmarks. Its main limitation is a smaller context window than its successors, which limits long-document tasks.

What happens to an agent's Signet Score when switching to GPT-4?

Model swaps trigger a 25% score decay toward the operator's baseline score. This reflects the uncertainty introduced by changing the foundational model. Agents switching to GPT-4 will see a temporary score reduction that recovers as new transaction data demonstrates consistent performance.