Skip to main content

Model configuration hierarchy

MagOneAI provides fine-grained control over which models are available and who can use them. Configuration flows through four levels, each building on the previous:
1

Platform level — SuperAdmin configuration

SuperAdmin configures available LLM providers and models in the Admin Portal. This is the global catalog of models that could potentially be used anywhere on the platform.At this level, you:
  • Add LLM providers (OpenAI-compatible, Anthropic, Google Gemini, self-hosted models)
  • Store API keys securely in HashiCorp Vault
  • Configure provider-specific settings (endpoints, default parameters)
  • Set per-token cost (input/output cost per 1k tokens) for usage analytics
2

Organization level — access control

Organization admins control which providers and models their organization can access. Not every org needs access to every model.At this level, you:
  • Enable/disable specific providers for your organization
  • Set which models from each provider are available
  • Apply token-based quotas (see Usage and quotas)
  • Monitor model usage for your org
3

Project level — further restrictions

Project settings can further restrict which models are available within a specific project. This is useful when you want different model access for different use cases.At this level, you:
  • Select subset of org-available models for this project
  • Set project-specific token quotas (see Usage and quotas)
  • Configure default models for agents in this project
4

Agent level — specific model selection

Each agent uses a specific model selected from the project’s available models. This is where you choose the right model for each task.At this level, you:
  • Select the exact model this agent will use
  • Configure model parameters (temperature, max tokens, etc.)
  • Override defaults for specific agent needs
This hierarchy ensures:
  • Security: Only authorized organizations can use specific models
  • Cost control: Organizations can’t accidentally use expensive models they didn’t approve
  • Flexibility: Different projects and agents can use different models based on their needs

How a model is resolved at runtime

At request time, MagOneAI resolves which model to use through a clear precedence order: activity override → workflow default → org default An explicit per-activity model choice wins; otherwise the workflow’s default applies; otherwise the org default is used. Admins control which models each project is allowed to use, so resolution always stays within the project’s permitted set.
The model picker in the Builder and Hub now defaults to Auto (default assigned model). Auto sends no override, so each agent uses its own assigned model — choosing a specific model is an explicit, per-turn opt-in. See Usage and quotas for how model selection interacts with token quotas.

Provider management in Admin Portal

SuperAdmins manage all LLM providers through the Admin Portal. This is the central configuration point for the entire platform.

Adding a provider

1

Navigate to Providers

In the Admin Portal, open the provider configuration and click Add Provider.
2

Select provider type

Choose from:
  • OpenAI-compatible — covers OpenAI itself, Azure OpenAI, self-hosted vLLM, Ollama, and any endpoint that speaks the OpenAI API (this is also how you configure self-hosted models)
  • Anthropic — native integration
  • Google Gemini — native integration via a Google AI Studio API key
3

Configure connection

Provide:
  • Provider name (for identification)
  • API endpoint URL (for cloud providers, this is pre-filled)
  • API key or authentication credentials
  • Optional: default parameters, timeouts, retry settings
4

Store credentials in Vault

API keys are automatically stored in HashiCorp Vault. You’ll see a reference like vault:openai/api_key instead of the actual key.
5

Test connection

Click Test Connection to verify MagOneAI can reach the provider and authenticate successfully.
6

Assign to organizations

Choose which organizations have access to this provider. You can enable for all orgs or specific ones.

Quotas

MagOneAI enforces token-based quotas (daily, weekly, and monthly) at the org, project, use-case, and user scope. Quotas are configured and managed on the dedicated quotas surface rather than here. See Usage and quotas for how to set and monitor token quotas.

Usage monitoring

Track model usage across the platform:
  • Token usage: Input and output tokens per provider, org, and project
  • Per-token cost: When you set input/output cost per 1k tokens on a model, usage analytics surface token-based cost
  • Latency: Response times by provider and model
  • Model popularity: Identify which models are most/least used
Usage and cost analytics appear in the analytics dashboards — see the Admin Portal.

Model selection per agent

Each agent in MagOneAI uses a specific LLM. Choosing the right model for each task is crucial for balancing cost, speed, and capability.

How to select a model

The model picker defaults to Auto (default assigned model) — Auto sends no override, so the agent uses its own assigned model. Picking a specific model is an explicit per-turn opt-in. When creating or editing an agent:
  1. Choose from available models: Dropdown shows models available to your project (with Auto as the default)
  2. Consider the task requirements:
    • Complex reasoning → large, capable models (GPT-4o, Claude Opus 4.6)
    • Simple classification → small, fast models (GPT-4o-mini, Claude Haiku)
    • Vision tasks → multimodal models (GPT-4o, Gemini 2.5 Pro)
  3. Configure model parameters:
    • Temperature: Creativity vs consistency (0.0 - 1.0)
    • Max tokens: Maximum response length
    • Top-p: Alternative sampling method
    • Frequency/presence penalty: Control repetition

Choosing the right model for the task

Different tasks require different model capabilities:

Fast models for simple tasks

Use: GPT-4o-mini, Claude Haiku, Gemini FlashFor:
  • Classification (spam detection, sentiment analysis)
  • Routing (which agent handles this request?)
  • Simple Q&A
  • Data extraction from structured formats
Why: 10x cheaper, 5x faster, good enough for simple tasks

Reasoning models for complex tasks

Use: Claude Opus 4.6, o1, GPT-4oFor:
  • Multi-step reasoning
  • Complex analysis and synthesis
  • Strategic decision-making
  • Nuanced understanding of context
Why: Superior reasoning, worth the cost for complex tasks

Vision models for documents

Use: GPT-4o, Gemini 2.5 Pro, Qwen3-VLFor:
  • Document OCR and analysis
  • Image understanding
  • Visual question answering
  • Chart/diagram interpretation
Why: Native vision capabilities, no separate OCR step needed

Mix models in workflows

Use: Different models for different agents in the same workflowFor:
  • Routing agent (fast model) → Analysis agent (reasoning model)
  • Vision agent (multimodal) → Summary agent (text model)
  • Classifier (cheap model) → multiple specialized agents
Why: Optimize cost and performance for each step

Model parameter tuning

Temperature controls randomness in responses.
  • 0.0 - 0.3: Deterministic, consistent — use for classification, extraction, structured output
  • 0.4 - 0.7: Balanced — default for most tasks
  • 0.8 - 1.0: Creative, varied — use for content generation, brainstorming
Example: A customer support classifier should use temperature 0.0 for consistent categorization. A marketing copy generator can use 0.8 for creative variety.
Max tokens sets the maximum response length.
  • Short responses (100-500 tokens): classifications, summaries
  • Medium responses (500-2000 tokens): explanations, analyses
  • Long responses (2000-4000+ tokens): comprehensive reports, documents
Tip: Set max tokens appropriately for your use case. Unnecessarily high limits increase cost without benefit.
Top-p (nucleus sampling) is an alternative to temperature for controlling randomness.
  • 0.1 - 0.5: Conservative, high-probability tokens only
  • 0.5 - 0.9: Balanced
  • 0.9 - 1.0: Diverse, includes lower-probability tokens
Generally use temperature OR top-p, not both. Top-p is often more predictable than temperature at extreme values.
Frequency penalty: Reduces likelihood of repeating tokens based on how often they’ve appeared.Presence penalty: Reduces likelihood of repeating tokens that have appeared at all.Both range from -2.0 to 2.0. Positive values discourage repetition, negative values encourage it.Use case: Set frequency penalty to 0.5-1.0 for content generation to avoid repetitive phrasing.

Function calling support

For agents to use tools, the underlying model must support function calling (also called tool use).

What is function calling?

Function calling allows models to:
  1. Receive a list of available tools with their parameter schemas
  2. Decide during reasoning when to call a tool
  3. Generate a tool call with appropriate parameters
  4. Receive the tool’s response and continue reasoning
Without function calling, agents can’t use MCP tools for actions like sending emails, querying databases, or calling APIs.

Models with function calling support

Full support:
  • OpenAI: GPT-4, GPT-4o, GPT-4o-mini, o1, o3-mini
  • Anthropic: All Claude models (function calling via “tool use”)
  • Google: Gemini 2.0 Flash, Gemini 2.5 Pro
No function calling:
  • Base LLaMA models (unless fine-tuned)
  • Some older or smaller open-source models

How MagOneAI converts tool definitions

When you attach tools to an agent, MagOneAI converts MCP tool definitions to the provider’s function calling format: MCP tool definition:
{
  "name": "send_email",
  "description": "Send an email to recipients",
  "parameters": {
    "type": "object",
    "properties": {
      "to": {"type": "array", "items": {"type": "string"}},
      "subject": {"type": "string"},
      "body": {"type": "string"}
    },
    "required": ["to", "subject", "body"]
  }
}
Converted to OpenAI function calling format:
{
  "type": "function",
  "function": {
    "name": "send_email",
    "description": "Send an email to recipients",
    "parameters": {
      "type": "object",
      "properties": {
        "to": {"type": "array", "items": {"type": "string"}},
        "subject": {"type": "string"},
        "body": {"type": "string"}
      },
      "required": ["to", "subject", "body"]
    }
  }
}
This conversion happens automatically. You define tools once in MCP format, and they work across all providers that support function calling.

Agent types and function calling

MagOneAI agents have different function calling requirements depending on their configuration:
  • Basic agents: No tools, no function calling needed
  • Router agents: No tools, no function calling needed
  • Tool agents: Require function calling — agent decides when to use tools
  • Full agents: Require function calling — agent uses tools during complex reasoning
If you select a model without function calling for a Tool or Full agent, you’ll see a warning during configuration.
Use the smallest model that reliably handles the task. A fast model for initial classification, a powerful model for deep analysis, and a vision model for documents — all in the same workflow — optimizes both cost and performance.

Cost optimization strategies

Model selection has significant cost implications. Use these strategies to minimize spending while maintaining quality:

1. Tier your agents by model size

  • Tier 1 (entry points): GPT-4o-mini or Claude Haiku for routing and classification
  • Tier 2 (specialists): GPT-4o or Claude Sonnet for most tasks
  • Tier 3 (experts): Claude Opus 4.6 or o1 only for complex reasoning
Route requests through Tier 1, escalate to higher tiers only when needed.

2. Compress prompts and context

  • Remove unnecessary whitespace and verbose instructions
  • Summarize long documents before including in context
  • Use structured formats (JSON, tables) instead of verbose prose
  • Prune irrelevant context from multi-turn conversations

3. Monitor usage regularly

Review usage analytics to identify optimization opportunities:
  • Agents using expensive models more than necessary
  • High-volume tasks that could run on a cheaper model
  • Token-based cost trends across orgs and projects
See Usage and quotas for token quotas and the Admin Portal for usage analytics.

4. Use private models for high-volume tasks

If you have consistent, high-volume workloads, deploying private open-source models can be significantly cheaper than cloud APIs:
  • High upfront cost (GPU infrastructure)
  • Near-zero marginal cost per request
  • Breakeven typically at 10M+ tokens per month
See Private model deployment for details.

Next steps

Cloud providers

Configure OpenAI, Anthropic, and Google

Private models

Deploy open-source models on your infrastructure

Agent configuration

Learn how to configure agents with models

Usage and quotas

Monitor token usage and set quotas