Glossary

Model Baseline

A model baseline is the expected performance profile of a foundation model across trust dimensions, established through systematic evaluation and used as a reference point for scoring agents built on that model.

What is a Model Baseline?

Every AI agent is built on a foundation model (GPT-4o, Claude Opus 4, Gemini, Llama, or others), and each model brings its own strengths, weaknesses, and behavioral tendencies. A model baseline captures these characteristics through standardized benchmarking, providing a reference point against which individual agents can be compared.

Model baselines serve three purposes in trust scoring. They set the starting expectation for a new agent before any transaction history exists. They put an agent's performance in context: an agent scoring below its model's baseline may have configuration issues. And they enable more precise attribution, separating performance that comes from the model itself from performance that comes from prompt engineering, tools, and other configuration choices.
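One common way to turn a baseline into a cold-start expectation is to treat it as a prior and blend it with the agent's observed performance as transactions accumulate. The sketch below assumes that approach; the blending formula, the `prior_weight` constant, and the names `AgentHistory` and `blended_score` are illustrative and not taken from Signet. Baseline values come from the Claude Opus 4 example in this entry.

```python
from dataclasses import dataclass

# Baseline values copied from the Claude Opus 4 example in this entry.
OPUS4_BASELINE = {
    "reliability": 780, "quality": 810, "financial": 720,
    "security": 760, "stability": 850,
}


@dataclass
class AgentHistory:
    observed: dict[str, float]  # per-dimension scores from the agent's own transactions
    transactions: int           # number of transactions behind those scores


def blended_score(baseline: dict[str, float], history: AgentHistory,
                  prior_weight: int = 50) -> dict[str, float]:
    """Shrink observed scores toward the model baseline (hypothetical scheme).

    With zero transactions the result equals the baseline (cold start); as
    transactions accumulate, the agent's own record dominates. prior_weight
    must be positive and is an illustrative constant, not a Signet parameter.
    """
    w = history.transactions / (history.transactions + prior_weight)
    return {
        # Dimensions with no observed score fall back to the baseline.
        dim: (1 - w) * base + w * history.observed.get(dim, base)
        for dim, base in baseline.items()
    }


# A brand-new agent starts exactly at its model's baseline.
print(blended_score(OPUS4_BASELINE, AgentHistory(observed={}, transactions=0)))

# After 50 transactions (equal to prior_weight), scores sit halfway between
# the baseline and the agent's observed performance.
seasoned = AgentHistory(observed={"reliability": 820, "quality": 750}, transactions=50)
print(blended_score(OPUS4_BASELINE, seasoned))
```

A larger `prior_weight` makes the baseline dominate for longer; a smaller one lets an agent's own record take over sooner.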

Model baselines are not static. They are revised when a model is updated, as more data is collected from agents using that model, and as evaluation methodologies evolve. A major version update can shift a model's baseline significantly, which is why agent scores are recalibrated whenever the baselines they depend on change.
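This entry does not say how scores are recalibrated. One simple possibility, assumed here purely for illustration, is to keep each agent's offset from its model's baseline and re-anchor that offset to the updated baseline. The function name `recalibrate` and the 0 to 1000 score bounds are hypothetical.

```python
# Hypothetical score bounds; this entry does not state Signet's actual scale.
SCORE_MIN, SCORE_MAX = 0, 1000


def recalibrate(agent: dict[str, float], old_baseline: dict[str, float],
                new_baseline: dict[str, float]) -> dict[str, float]:
    """Re-anchor each dimension to a new baseline while keeping the agent's
    offset from the old one (an assumed strategy, not Signet's documented one)."""
    recalibrated = {}
    for dim, score in agent.items():
        offset = score - old_baseline[dim]      # how far the agent sat above or below
        shifted = new_baseline[dim] + offset    # same offset, new anchor
        recalibrated[dim] = min(SCORE_MAX, max(SCORE_MIN, shifted))  # clamp to bounds
    return recalibrated


# Reliability 820 against an old baseline of 780 is +40; if a model update moved
# the baseline to a hypothetical 800, the recalibrated score becomes 840.
print(recalibrate({"reliability": 820}, {"reliability": 780}, {"reliability": 800}))
```

Preserving the offset keeps an agent's standing relative to its model intact; a ratio-based adjustment or a full re-evaluation would be equally plausible alternatives.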

Example

The model baseline for Claude Opus 4 shows: Reliability baseline 780, Quality baseline 810, Financial baseline 720, Security baseline 760, Stability baseline 850. A specific agent built on Claude Opus 4 scores Reliability 820, Quality 750, Financial 680, Security 710, Stability 900. Compared with the baseline, this agent outperforms on Reliability and Stability but underperforms on Quality, Financial, and Security, which suggests its prompt, tools, or data sources may need improvement in those areas.
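The comparison in this example is a per-dimension subtraction of the baseline from the agent's score. A minimal Python sketch using the numbers above; the function name `compare_to_baseline` is illustrative, not part of any Signet API.

```python
# Scores from the Claude Opus 4 example above.
OPUS4_BASELINE = {"reliability": 780, "quality": 810, "financial": 720,
                  "security": 760, "stability": 850}
AGENT_SCORES = {"reliability": 820, "quality": 750, "financial": 680,
                "security": 710, "stability": 900}


def compare_to_baseline(agent: dict[str, int], baseline: dict[str, int]):
    """Return per-dimension deltas, dimensions above baseline, and dimensions
    below baseline ordered from the largest shortfall to the smallest."""
    deltas = {dim: agent[dim] - baseline[dim] for dim in baseline}
    above = [dim for dim, d in deltas.items() if d > 0]
    below = sorted((dim for dim, d in deltas.items() if d < 0), key=deltas.get)
    return deltas, above, below


deltas, above, below = compare_to_baseline(AGENT_SCORES, OPUS4_BASELINE)
print(deltas)  # {'reliability': 40, 'quality': -60, 'financial': -40, 'security': -50, 'stability': 50}
print(above)   # ['reliability', 'stability']
print(below)   # ['quality', 'security', 'financial']
```

Sorting the shortfalls puts Quality (-60) first, which suggests where to look first when tuning the agent.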

How Signet addresses this

Signet maintains model baselines for all major foundation models and updates them regularly. These baselines inform the cold start scoring process for new agents, help operators understand how their agent compares to the expected performance of its underlying model, and feed into the score decay calculation when agents swap models. Model baselines are published in Signet's documentation to promote transparency.

Learn More

Model Baselines Data
Supported Models

