Model configuration hierarchy
MagOneAI provides fine-grained control over which models are available and who can use them. Configuration flows through four levels, each building on the previous:
Platform level — SuperAdmin configuration
SuperAdmin configures available LLM providers and models in the Admin Portal. This is the global catalog of models that could potentially be used anywhere on the platform.
At this level, you:
- Add LLM providers (OpenAI-compatible, Anthropic, Google Gemini, self-hosted models)
- Store API keys securely in HashiCorp Vault
- Configure provider-specific settings (endpoints, default parameters)
- Set per-token cost (input/output cost per 1k tokens) for usage analytics
Organization level — access control
Organization admins control which providers and models their organization can access. Not every org needs access to every model.
At this level, you:
- Enable/disable specific providers for your organization
- Set which models from each provider are available
- Apply token-based quotas (see Usage and quotas)
- Monitor model usage for your org
Project level — further restrictions
Project settings can further restrict which models are available within a specific project. This is useful when you want different model access for different use cases.
At this level, you:
- Select a subset of org-available models for this project
- Set project-specific token quotas (see Usage and quotas)
- Configure default models for agents in this project
Agent level — specific model selection
Each agent uses a specific model selected from the project’s available models. This is where you choose the right model for each task.
At this level, you:
- Select the exact model this agent will use
- Configure model parameters (temperature, max tokens, etc.)
- Override defaults for specific agent needs
Layering configuration this way provides:
- Security: Only authorized organizations can use specific models
- Cost control: Organizations can’t accidentally use expensive models they didn’t approve
- Flexibility: Different projects and agents can use different models based on their needs
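The narrowing across levels can be pictured as set intersection. The following is a minimal sketch, not MagOneAI's implementation: the function name, the lowercase model identifiers, and the rule that a project with no explicit selection inherits every org-enabled model are all assumptions made for illustration.

```python
from typing import Iterable, Optional, Set


def effective_models(
    platform_catalog: Iterable[str],
    org_enabled: Iterable[str],
    project_selected: Optional[Iterable[str]] = None,
) -> Set[str]:
    """Return the models an agent in a project may choose from.

    Hypothetical helper: each level can only narrow the level above it.
    """
    allowed = set(platform_catalog) & set(org_enabled)
    if project_selected is not None:
        # Assumption: no project selection means "inherit the org's models".
        allowed &= set(project_selected)
    return allowed


# Illustrative identifiers only; real model IDs depend on your providers.
platform = {"gpt-4o", "gpt-4o-mini", "claude-haiku", "gemini-flash"}
org = {"gpt-4o", "gpt-4o-mini", "claude-haiku"}
project = {"gpt-4o-mini", "claude-haiku"}

print(sorted(effective_models(platform, org, project)))
# prints ['claude-haiku', 'gpt-4o-mini']
```

An agent in this project could then pick only from the two models that survive all three levels.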
How a model is resolved at runtime
At request time, MagOneAI resolves which model to use through a clear precedence order:
activity override → workflow default → org default
An explicit per-activity model choice wins; otherwise the workflow’s default applies; otherwise the org default is used. Admins control which models each project is allowed to use, so resolution always stays within the project’s permitted set.
The model picker in the Builder and Hub now defaults to Auto (default assigned model). Auto sends no override, so each agent uses its own assigned model; choosing a specific model is an explicit, per-turn opt-in. See Usage and quotas for how model selection interacts with token quotas.
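The precedence order can be sketched as a small function. This is an illustration under stated assumptions rather than the platform's actual resolver: the function name and parameters are hypothetical, Auto is modelled as passing None for the override, and raising an error for a model outside the project's permitted set is an assumed behaviour (the documentation only states that resolution stays within that set).

```python
from typing import Optional, Set


def resolve_model(
    activity_override: Optional[str],
    workflow_default: Optional[str],
    org_default: str,
    project_allowed: Set[str],
) -> str:
    """Pick the first configured model: activity, then workflow, then org."""
    candidates = (activity_override, workflow_default, org_default)
    chosen = next(c for c in candidates if c is not None)
    if chosen not in project_allowed:
        # Assumed behaviour: reject anything outside the project's permitted set.
        raise PermissionError(f"model {chosen!r} is not permitted in this project")
    return chosen


allowed = {"gpt-4o", "gpt-4o-mini"}

# "Auto" sends no override, so the workflow default applies here.
print(resolve_model(None, "gpt-4o", "gpt-4o-mini", allowed))
# prints gpt-4o
```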
Provider management in Admin Portal
SuperAdmins manage all LLM providers through the Admin Portal. This is the central configuration point for the entire platform.
Adding a provider
Select provider type
Choose from:
- OpenAI-compatible — covers OpenAI itself, Azure OpenAI, self-hosted vLLM, Ollama, and any endpoint that speaks the OpenAI API (this is also how you configure self-hosted models)
- Anthropic — native integration
- Google Gemini — native integration via a Google AI Studio API key
Configure connection
Provide:
- Provider name (for identification)
- API endpoint URL (for cloud providers, this is pre-filled)
- API key or authentication credentials
- Optional: default parameters, timeouts, retry settings
Store credentials in Vault
API keys are automatically stored in HashiCorp Vault. You’ll see a reference like vault:openai/api_key instead of the actual key.
Test connection
Click Test Connection to verify MagOneAI can reach the provider and authenticate successfully.
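If you handle these references in your own tooling, it can help to treat them as structured values rather than opaque strings. The sketch below assumes the reference has the shape vault:<path>/<key>, inferred only from the single example above; the VaultRef type and parse_vault_ref function are hypothetical and not part of MagOneAI or the Vault client.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VaultRef:
    path: str  # e.g. "openai"
    key: str   # e.g. "api_key"


def parse_vault_ref(ref: str) -> VaultRef:
    """Split a 'vault:<path>/<key>' reference (assumed format) into its parts."""
    scheme, sep, rest = ref.partition(":")
    if scheme != "vault" or not sep:
        raise ValueError(f"not a vault reference: {ref!r}")
    path, _, key = rest.rpartition("/")
    if not path or not key:
        raise ValueError(f"expected 'vault:<path>/<key>', got {ref!r}")
    return VaultRef(path=path, key=key)


print(parse_vault_ref("vault:openai/api_key"))
# prints VaultRef(path='openai', key='api_key')
```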
Quotas
MagOneAI enforces token-based quotas (daily, weekly, and monthly) at the org, project, use-case, and user scope. Quotas are configured and managed on the dedicated quotas surface rather than here. See Usage and quotas for how to set and monitor token quotas.
Usage monitoring
Track model usage across the platform:
- Token usage: Input and output tokens per provider, org, and project
- Per-token cost: When you set input/output cost per 1k tokens on a model, usage analytics surface token-based cost
- Latency: Response times by provider and model
- Model popularity: Identify which models are most/least used
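Token-based cost follows directly from the per-1k prices set on a model. Here is a minimal sketch of that arithmetic; the function name and the prices used are placeholders chosen for illustration, not real provider rates or MagOneAI's analytics code.

```python
from decimal import Decimal


def token_cost(
    input_tokens: int,
    output_tokens: int,
    input_cost_per_1k: Decimal,
    output_cost_per_1k: Decimal,
) -> Decimal:
    """Cost of one request given per-1k-token prices for input and output."""
    input_cost = Decimal(input_tokens) / 1000 * input_cost_per_1k
    output_cost = Decimal(output_tokens) / 1000 * output_cost_per_1k
    return input_cost + output_cost


# Placeholder prices: 0.0025 per 1k input tokens, 0.01 per 1k output tokens.
cost = token_cost(12_000, 3_000, Decimal("0.0025"), Decimal("0.01"))
print(f"{cost:.4f}")
# prints 0.0600
```

Decimal is used instead of float so that small per-token amounts add up without rounding drift when aggregated over many requests.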
Model selection per agent
Each agent in MagOneAI uses a specific LLM. Choosing the right model for each task is crucial for balancing cost, speed, and capability.
How to select a model
The model picker defaults to Auto (default assigned model). Auto sends no override, so the agent uses its own assigned model; picking a specific model is an explicit per-turn opt-in.
When creating or editing an agent:
- Choose from available models: Dropdown shows models available to your project (with Auto as the default)
- Consider the task requirements:
  - Complex reasoning → large, capable models (GPT-4o, Claude Opus 4.6)
  - Simple classification → small, fast models (GPT-4o-mini, Claude Haiku)
  - Vision tasks → multimodal models (GPT-4o, Gemini 2.5 Pro)
- Configure model parameters:
  - Temperature: Creativity vs consistency (0.0 - 1.0)
  - Max tokens: Maximum response length
  - Top-p: Alternative sampling method
  - Frequency/presence penalty: Control repetition
Choosing the right model for the task
Different tasks require different model capabilities:
Fast models for simple tasks
Use: GPT-4o-mini, Claude Haiku, Gemini Flash
For:
- Classification (spam detection, sentiment analysis)
- Routing (which agent handles this request?)
- Simple Q&A
- Data extraction from structured formats
Reasoning models for complex tasks
Use: Claude Opus 4.6, o1, GPT-4o
For:
- Multi-step reasoning
- Complex analysis and synthesis
- Strategic decision-making
- Nuanced understanding of context
Vision models for documents
Use: GPT-4o, Gemini 2.5 Pro, Qwen3-VL
For:
- Document OCR and analysis
- Image understanding
- Visual question answering
- Chart/diagram interpretation
Mix models in workflows
Use: Different models for different agents in the same workflow
For:
- Routing agent (fast model) → Analysis agent (reasoning model)
- Vision agent (multimodal) → Summary agent (text model)
- Classifier (cheap model) → multiple specialized agents
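One way to apply these recommendations is a preference list per task category, filtered by what the project actually permits. This is a sketch only: the category names, the PREFERRED table, and pick_model are hypothetical, and the model names are used as plain labels taken from the lists above.

```python
from typing import Dict, List, Optional, Set

# Preference order per task category, taken from the recommendations above.
PREFERRED: Dict[str, List[str]] = {
    "classification": ["GPT-4o-mini", "Claude Haiku", "Gemini Flash"],
    "reasoning": ["Claude Opus 4.6", "o1", "GPT-4o"],
    "vision": ["GPT-4o", "Gemini 2.5 Pro", "Qwen3-VL"],
}


def pick_model(task: str, project_models: Set[str]) -> Optional[str]:
    """Return the first preferred model the project allows, or None."""
    for model in PREFERRED[task]:
        if model in project_models:
            return model
    return None


project_models = {"GPT-4o", "GPT-4o-mini"}

# A routing agent and an analysis agent in the same workflow get different models.
print(pick_model("classification", project_models))  # prints GPT-4o-mini
print(pick_model("reasoning", project_models))       # prints GPT-4o
```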
Model parameter tuning
Temperature: Creativity vs consistency
Temperature controls randomness in responses.
- 0.0 - 0.3: Deterministic, consistent — use for classification, extraction, structured output
- 0.4 - 0.7: Balanced — default for most tasks
- 0.8 - 1.0: Creative, varied — use for content generation, brainstorming
Max tokens: Response length control
Max tokens sets the maximum response length.
- Short responses (100-500 tokens): classifications, summaries
- Medium responses (500-2000 tokens): explanations, analyses
- Long responses (2000-4000+ tokens): comprehensive reports, documents
Top-p: Alternative to temperature
Top-p (nucleus sampling) is an alternative to temperature for controlling randomness.
- 0.1 - 0.5: Conservative, high-probability tokens only
- 0.5 - 0.9: Balanced
- 0.9 - 1.0: Diverse, includes lower-probability tokens
Frequency and presence penalties
Frequency penalty: Reduces likelihood of repeating tokens based on how often they’ve appeared.
Presence penalty: Reduces likelihood of repeating tokens that have appeared at all.
Both range from -2.0 to 2.0. Positive values discourage repetition, negative values encourage it.
Use case: Set frequency penalty to 0.5-1.0 for content generation to avoid repetitive phrasing.
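The ranges described in this section can be captured in a small validated settings object. The sketch below is an assumption-laden illustration: the ModelParams class, its field names, its defaults, and the two presets are hypothetical and do not reflect MagOneAI's agent schema; only the numeric ranges come from the text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelParams:
    # Defaults are illustrative assumptions, not platform defaults.
    temperature: float = 0.5
    max_tokens: int = 1000
    top_p: float = 1.0
    frequency_penalty: float = 0.0
    presence_penalty: float = 0.0

    def __post_init__(self) -> None:
        # Ranges follow the guidance in this section.
        if not 0.0 <= self.temperature <= 1.0:
            raise ValueError("temperature must be between 0.0 and 1.0")
        if self.max_tokens <= 0:
            raise ValueError("max_tokens must be positive")
        if not 0.0 <= self.top_p <= 1.0:
            raise ValueError("top_p must be between 0.0 and 1.0")
        for name in ("frequency_penalty", "presence_penalty"):
            if not -2.0 <= getattr(self, name) <= 2.0:
                raise ValueError(f"{name} must be between -2.0 and 2.0")


# Low temperature and a short limit for extraction or classification.
EXTRACTION = ModelParams(temperature=0.1, max_tokens=300)
# Higher temperature plus a frequency penalty for content generation.
CONTENT = ModelParams(temperature=0.9, max_tokens=4000, frequency_penalty=0.7)
```

Validating at construction time surfaces an out-of-range value when the agent is configured rather than when a provider rejects the request.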
Function calling support
For agents to use tools, the underlying model must support function calling (also called tool use).
What is function calling?
Function calling allows models to:
- Receive a list of available tools with their parameter schemas
- Decide during reasoning when to call a tool
- Generate a tool call with appropriate parameters
- Receive the tool’s response and continue reasoning
Models with function calling support
Full support:
- OpenAI: GPT-4, GPT-4o, GPT-4o-mini, o1, o3-mini
- Anthropic: All Claude models (function calling via “tool use”)
- Google: Gemini 2.0 Flash, Gemini 2.5 Pro
Limited or no support:
- Base LLaMA models (unless fine-tuned)
- Some older or smaller open-source models
How MagOneAI converts tool definitions
When you attach tools to an agent, MagOneAI converts MCP tool definitions to the provider’s function calling format.
Agent types and function calling
MagOneAI agents have different function calling requirements depending on their configuration:
- Basic agents: No tools, no function calling needed
- Router agents: No tools, no function calling needed
- Tool agents: Require function calling — agent decides when to use tools
- Full agents: Require function calling — agent uses tools during complex reasoning
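To make the tool-definition conversion mentioned earlier in this section concrete, here is a sketch of mapping an MCP tool definition to two provider formats. The field names follow the public MCP, OpenAI Chat Completions, and Anthropic tool-use formats, but the functions themselves and the get_weather tool are hypothetical examples, not MagOneAI's internal converter.

```python
from typing import Any, Dict

# An MCP tool definition: name, description, and a JSON Schema for its input.
mcp_tool: Dict[str, Any] = {
    "name": "get_weather",  # hypothetical tool for illustration
    "description": "Look up the current weather for a city.",
    "inputSchema": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}


def mcp_to_openai(tool: Dict[str, Any]) -> Dict[str, Any]:
    """Wrap an MCP tool in the OpenAI 'tools' entry shape."""
    return {
        "type": "function",
        "function": {
            "name": tool["name"],
            "description": tool.get("description", ""),
            "parameters": tool["inputSchema"],
        },
    }


def mcp_to_anthropic(tool: Dict[str, Any]) -> Dict[str, Any]:
    """Rename MCP's inputSchema to Anthropic's input_schema."""
    return {
        "name": tool["name"],
        "description": tool.get("description", ""),
        "input_schema": tool["inputSchema"],
    }


print(mcp_to_openai(mcp_tool)["function"]["name"])
# prints get_weather
```

In both cases the JSON Schema passes through unchanged; only the envelope around it differs between providers.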
Cost optimization strategies
Model selection has significant cost implications. Use these strategies to minimize spending while maintaining quality:
1. Tier your agents by model size
- Tier 1 (entry points): GPT-4o-mini or Claude Haiku for routing and classification
- Tier 2 (specialists): GPT-4o or Claude Sonnet for most tasks
- Tier 3 (experts): Claude Opus 4.6 or o1 only for complex reasoning
2. Compress prompts and context
- Remove unnecessary whitespace and verbose instructions
- Summarize long documents before including in context
- Use structured formats (JSON, tables) instead of verbose prose
- Prune irrelevant context from multi-turn conversations
3. Monitor usage regularly
Review usage analytics to identify optimization opportunities:
- Agents using expensive models more than necessary
- High-volume tasks that could run on a cheaper model
- Token-based cost trends across orgs and projects
4. Use private models for high-volume tasks
If you have consistent, high-volume workloads, deploying private open-source models can be significantly cheaper than cloud APIs:
- High upfront cost (GPU infrastructure)
- Near-zero marginal cost per request
- Breakeven typically at 10M+ tokens per month
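The breakeven point depends on your own infrastructure and API prices, so it is worth computing rather than assuming. The sketch below treats the GPU infrastructure as an amortized fixed monthly cost, which is a simplifying assumption; the function name and every figure in the example are placeholders, not measured costs.

```python
from decimal import Decimal
from typing import Optional


def breakeven_tokens_per_month(
    fixed_monthly_cost: Decimal,
    cloud_cost_per_1k: Decimal,
    private_cost_per_1k: Decimal = Decimal("0"),
) -> Optional[Decimal]:
    """Monthly token volume at which private hosting matches cloud spend.

    Returns None when the private marginal cost is not lower than the
    cloud price, since the fixed cost is then never recovered.
    """
    saving_per_1k = cloud_cost_per_1k - private_cost_per_1k
    if saving_per_1k <= 0:
        return None
    return fixed_monthly_cost / saving_per_1k * 1000


# Placeholder figures: 2000 per month amortized infrastructure,
# a blended cloud price of 0.01 per 1k tokens, negligible marginal cost.
tokens = breakeven_tokens_per_month(Decimal("2000"), Decimal("0.01"))
print(int(tokens))
```

Below the computed volume the cloud API is cheaper under these assumptions; above it, the private deployment is.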
Next steps
- Cloud providers: Configure OpenAI, Anthropic, and Google
- Private models: Deploy open-source models on your infrastructure
- Agent configuration: Learn how to configure agents with models
- Usage and quotas: Monitor token usage and set quotas