
Model configuration hierarchy

MagOneAI provides fine-grained control over which models are available and who can use them. Configuration flows through four levels, each building on the previous:

1. Platform level — SuperAdmin configuration

SuperAdmin configures available LLM providers and models in the Admin Portal. This is the global catalog of models that could potentially be used anywhere on the platform. At this level, you:
  • Add LLM providers (OpenAI, Anthropic, Google, private models)
  • Store API keys securely in HashiCorp Vault
  • Configure provider-specific settings (endpoints, default parameters)
  • Monitor provider health and availability

2. Organization level — access control

Organization admins control which providers and models their organization can access. Not every org needs access to every model. At this level, you:
  • Enable/disable specific providers for your organization
  • Set which models from each provider are available
  • Configure rate limits and quotas per organization
  • Monitor model usage and costs for your org

3. Project level — further restrictions

Project settings can further restrict which models are available within a specific project. This is useful when you want different model access for different use cases. At this level, you:
  • Select a subset of org-available models for this project
  • Set project-specific rate limits
  • Configure default models for new agents in this project

4. Agent level — specific model selection

Each agent uses a specific model selected from the project’s available models. This is where you choose the right model for each task. At this level, you:
  • Select the exact model this agent will use
  • Configure model parameters (temperature, max tokens, etc.)
  • Override defaults for specific agent needs
This hierarchy ensures:
  • Security: Only authorized organizations can use specific models
  • Cost control: Organizations can’t accidentally use expensive models they didn’t approve
  • Flexibility: Different projects and agents can use different models based on their needs
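
As a sketch of how the four levels narrow the catalog, the hypothetical configuration fragments below show each level selecting from the one above (the field names are illustrative, not MagOneAI’s actual schema):

{
  "platform_catalog": ["gpt-4o", "gpt-4o-mini", "claude-opus-4.6", "claude-haiku"],
  "organization": {
    "enabled_models": ["gpt-4o", "gpt-4o-mini", "claude-haiku"]
  },
  "project": {
    "allowed_models": ["gpt-4o-mini", "claude-haiku"],
    "default_model": "gpt-4o-mini"
  },
  "agent": {
    "model": "claude-haiku",
    "parameters": {"temperature": 0.2, "max_tokens": 500}
  }
}

Each level can only choose from what the level above has enabled, so removing a model at the organization level automatically removes it from every project and agent below.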

Provider management in Admin Portal

SuperAdmins manage all LLM providers through the Admin Portal. This is the central configuration point for the entire platform.

Adding a provider

1. Navigate to Providers

In Admin Portal, go to Configuration → LLM Providers and click Add Provider.

2. Select provider type

Choose from:
  • OpenAI (cloud)
  • Anthropic (cloud)
  • Google (cloud)
  • OpenAI Compatible (for private models)
  • Custom (for proprietary integrations)

3. Configure connection

Provide:
  • Provider name (for identification)
  • API endpoint URL (for cloud providers, this is pre-filled)
  • API key or authentication credentials
  • Optional: default parameters, timeouts, retry settings

4. Store credentials in Vault

API keys are automatically stored in HashiCorp Vault. You’ll see a reference like vault:openai/api_key instead of the actual key.

5. Test connection

Click Test Connection to verify MagOneAI can reach the provider and authenticate successfully.

6. Assign to organizations

Choose which organizations have access to this provider. You can enable for all orgs or specific ones.
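
Once configured, a provider entry might look like the following sketch (illustrative fields, not the actual Admin Portal schema):

{
  "name": "openai-production",
  "type": "openai",
  "endpoint": "https://api.openai.com/v1",
  "api_key": "vault:openai/api_key",
  "defaults": {"timeout_seconds": 30, "max_retries": 3},
  "enabled_organizations": ["acme-corp", "globex"]
}

Note that the api_key field holds only the Vault reference, never the raw credential.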

Rate limits and quotas per organization

Control model usage to prevent runaway costs and ensure fair resource allocation.

Rate limits:
  • Requests per minute per organization
  • Tokens per minute per organization
  • Concurrent requests per organization
Quotas:
  • Total tokens per day/month
  • Total requests per day/month
  • Total cost per month (in USD)
When an organization exceeds limits, requests are queued or rejected with clear error messages.
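
As an illustration, an organization’s limits might be expressed like this (hypothetical schema):

{
  "organization": "acme-corp",
  "rate_limits": {
    "requests_per_minute": 600,
    "tokens_per_minute": 200000,
    "max_concurrent_requests": 20
  },
  "quotas": {
    "tokens_per_month": 500000000,
    "requests_per_day": 100000,
    "max_cost_per_month_usd": 2000
  },
  "on_limit_exceeded": "queue"
}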

Usage monitoring

Track model usage across the platform:
  • Provider dashboard: See aggregate usage for each provider
  • Organization usage: Break down usage by organization
  • Cost tracking: Monitor spending per provider, per org, per project
  • Model popularity: Identify which models are most/least used
  • Error rates: Track failed requests by provider and reason
Access these metrics in Admin Portal → Analytics → Model Usage.

Model selection per agent

Each agent in MagOneAI uses a specific LLM. Choosing the right model for each task is crucial for balancing cost, speed, and capability.

How to select a model

When creating or editing an agent:
  1. Choose from available models: Dropdown shows models available to your project
  2. Consider the task requirements:
    • Complex reasoning → large, capable models (GPT-4o, Claude Opus 4.6)
    • Simple classification → small, fast models (GPT-4o-mini, Claude Haiku)
    • Vision tasks → multimodal models (GPT-4o, Gemini 2.5 Pro)
  3. Configure model parameters:
    • Temperature: Creativity vs consistency (0.0 - 1.0)
    • Max tokens: Maximum response length
    • Top-p: Alternative sampling method
    • Frequency/presence penalty: Control repetition
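
Put together, a single agent’s model selection might look like this hypothetical configuration (field names are illustrative):

{
  "agent": "document-analyst",
  "model": "gpt-4o",
  "parameters": {
    "temperature": 0.3,
    "max_tokens": 2000,
    "top_p": 1.0,
    "frequency_penalty": 0.0,
    "presence_penalty": 0.0
  }
}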

Choosing the right model for the task

Different tasks require different model capabilities:

Fast models for simple tasks

Use: GPT-4o-mini, Claude Haiku, Gemini Flash

For:
  • Classification (spam detection, sentiment analysis)
  • Routing (which agent handles this request?)
  • Simple Q&A
  • Data extraction from structured formats
Why: Roughly 10x cheaper and 5x faster, and good enough for simple tasks

Reasoning models for complex tasks

Use: Claude Opus 4.6, o1, GPT-4o

For:
  • Multi-step reasoning
  • Complex analysis and synthesis
  • Strategic decision-making
  • Nuanced understanding of context
Why: Superior reasoning, worth the cost for complex tasks

Vision models for documents

Use: GPT-4o, Gemini 2.5 Pro, Qwen3-VL

For:
  • Document OCR and analysis
  • Image understanding
  • Visual question answering
  • Chart/diagram interpretation
Why: Native vision capabilities, no separate OCR step needed

Mix models in workflows

Use: Different models for different agents in the same workflow

For:
  • Routing agent (fast model) → Analysis agent (reasoning model)
  • Vision agent (multimodal) → Summary agent (text model)
  • Classifier (cheap model) → multiple specialized agents
Why: Optimize cost and performance for each step
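
A workflow mixing model tiers might be declared like this (hypothetical schema):

{
  "workflow": "invoice-processing",
  "agents": [
    {"name": "router", "model": "gpt-4o-mini"},
    {"name": "document-reader", "model": "gemini-2.5-pro"},
    {"name": "analyst", "model": "claude-opus-4.6"},
    {"name": "summarizer", "model": "gpt-4o-mini"}
  ]
}

Only the document-reader needs vision and only the analyst needs deep reasoning, so the cheap model handles the remaining steps.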

Model parameter tuning

Temperature controls randomness in responses.
  • 0.0 - 0.3: Deterministic, consistent — use for classification, extraction, structured output
  • 0.4 - 0.7: Balanced — default for most tasks
  • 0.8 - 1.0: Creative, varied — use for content generation, brainstorming
Example: A customer support classifier should use temperature 0.0 for consistent categorization. A marketing copy generator can use 0.8 for creative variety.
Max tokens sets the maximum response length.
  • Short responses (100-500 tokens): classifications, summaries
  • Medium responses (500-2000 tokens): explanations, analyses
  • Long responses (2000-4000+ tokens): comprehensive reports, documents
Tip: Set max tokens appropriately for your use case. Unnecessarily high limits increase cost without benefit.
Top-p (nucleus sampling) is an alternative to temperature for controlling randomness.
  • 0.1 - 0.5: Conservative, high-probability tokens only
  • 0.5 - 0.9: Balanced
  • 0.9 - 1.0: Diverse, includes lower-probability tokens
Generally use temperature OR top-p, not both. Top-p is often more predictable than temperature at extreme values.
Frequency penalty: Reduces the likelihood of repeating tokens based on how often they’ve already appeared.
Presence penalty: Reduces the likelihood of repeating any token that has appeared at all.
Both range from -2.0 to 2.0. Positive values discourage repetition, negative values encourage it.
Use case: Set frequency penalty to 0.5-1.0 for content generation to avoid repetitive phrasing.
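
Combining the guidance above, two contrasting parameter presets might look like this (hypothetical blocks):

{
  "support_classifier": {
    "temperature": 0.0,
    "max_tokens": 200
  },
  "marketing_copywriter": {
    "temperature": 0.8,
    "max_tokens": 1500,
    "frequency_penalty": 0.7
  }
}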

Function calling support

For agents to use tools, the underlying model must support function calling (also called tool use).

What is function calling?

Function calling allows models to:
  1. Receive a list of available tools with their parameter schemas
  2. Decide during reasoning when to call a tool
  3. Generate a tool call with appropriate parameters
  4. Receive the tool’s response and continue reasoning
Without function calling, agents can’t use MCP tools for actions like sending emails, querying databases, or calling APIs.
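
For example, given the send_email tool described later in this section, a model with function calling might emit a tool call like the following (shown in OpenAI’s response format as a sketch; the exact shape varies by provider):

{
  "role": "assistant",
  "tool_calls": [
    {
      "id": "call_abc123",
      "type": "function",
      "function": {
        "name": "send_email",
        "arguments": "{\"to\": [\"ops@example.com\"], \"subject\": \"Daily report\", \"body\": \"...\"}"
      }
    }
  ]
}

The platform executes the tool and returns its result to the model, which then continues reasoning with that output.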

Models with function calling support

Full support:
  • OpenAI: GPT-4, GPT-4o, GPT-4o-mini, o1, o3-mini
  • Anthropic: All Claude models (function calling via “tool use”)
  • Google: Gemini 2.0 Flash, Gemini 2.5 Pro
No function calling:
  • Base LLaMA models (unless fine-tuned)
  • Some older or smaller open-source models

How MagOneAI converts tool definitions

When you attach tools to an agent, MagOneAI converts MCP tool definitions to the provider’s function calling format.

MCP tool definition:
{
  "name": "send_email",
  "description": "Send an email to recipients",
  "parameters": {
    "type": "object",
    "properties": {
      "to": {"type": "array", "items": {"type": "string"}},
      "subject": {"type": "string"},
      "body": {"type": "string"}
    },
    "required": ["to", "subject", "body"]
  }
}
Converted to OpenAI function calling format:
{
  "type": "function",
  "function": {
    "name": "send_email",
    "description": "Send an email to recipients",
    "parameters": {
      "type": "object",
      "properties": {
        "to": {"type": "array", "items": {"type": "string"}},
        "subject": {"type": "string"},
        "body": {"type": "string"}
      },
      "required": ["to", "subject", "body"]
    }
  }
}
This conversion happens automatically. You define tools once in MCP format, and they work across all providers that support function calling.

Agent types and function calling

MagOneAI agents have different function calling requirements depending on their configuration:
  • Basic agents: No tools, no function calling needed
  • Router agents: No tools, no function calling needed
  • Tool agents: Require function calling — agent decides when to use tools
  • Full agents: Require function calling — agent uses tools during complex reasoning
If you select a model without function calling for a Tool or Full agent, you’ll see a warning during configuration.
Use the smallest model that reliably handles the task. Combining a fast model for initial classification, a powerful model for deep analysis, and a vision model for documents in the same workflow optimizes both cost and performance.

Cost optimization strategies

Model selection has significant cost implications. Use these strategies to minimize spending while maintaining quality:

1. Tier your agents by model size

  • Tier 1 (entry points): GPT-4o-mini or Claude Haiku for routing and classification
  • Tier 2 (specialists): GPT-4o or Claude Sonnet for most tasks
  • Tier 3 (experts): Claude Opus 4.6 or o1 only for complex reasoning
Route requests through Tier 1 and escalate to higher tiers only when needed.
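
An escalation policy for these tiers might be sketched as follows (hypothetical schema; the escalation conditions are illustrative):

{
  "tiers": {
    "tier1": {"models": ["gpt-4o-mini", "claude-haiku"], "role": "routing"},
    "tier2": {"models": ["gpt-4o", "claude-sonnet"], "role": "specialists"},
    "tier3": {"models": ["claude-opus-4.6", "o1"], "role": "experts"}
  },
  "escalation": [
    {"from": "tier1", "to": "tier2", "when": "task requires analysis"},
    {"from": "tier2", "to": "tier3", "when": "multi-step reasoning needed"}
  ]
}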

2. Use caching where available

Some providers (Anthropic) support prompt caching. Enable caching for:
  • System prompts (same across many requests)
  • Large context documents (RAG results, knowledge base content)
  • Tool definitions (same for all calls with this agent)
Caching can reduce costs by up to 90% for repeated context.
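
With Anthropic’s Messages API, for example, a content block is marked cacheable with cache_control (a sketch; check the provider’s documentation for current parameters):

{
  "model": "claude-sonnet",
  "system": [
    {
      "type": "text",
      "text": "You are a support agent. <long shared instructions>",
      "cache_control": {"type": "ephemeral"}
    }
  ],
  "messages": [
    {"role": "user", "content": "Where is my order?"}
  ]
}

Subsequent requests that reuse the same cached prefix are billed at a reduced rate for those tokens.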

3. Compress prompts and context

  • Remove unnecessary whitespace and verbose instructions
  • Summarize long documents before including in context
  • Use structured formats (JSON, tables) instead of verbose prose
  • Prune irrelevant context from multi-turn conversations

4. Monitor and alert on usage

Set up alerts for:
  • Daily spend exceeding threshold
  • Specific agents using expensive models excessively
  • Failed requests (you’re paying for them but getting no value)
Regularly review usage in analytics to identify optimization opportunities.
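
Alert rules might be declared like this (hypothetical schema):

{
  "alerts": [
    {"metric": "daily_spend_usd", "threshold": 100, "notify": "org-admins"},
    {"metric": "agent_monthly_cost_usd", "agent": "analyst", "threshold": 50, "notify": "agent-owner"},
    {"metric": "failed_request_rate", "threshold": 0.05, "notify": "on-call"}
  ]
}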

5. Use private models for high-volume tasks

If you have consistent, high-volume workloads, deploying private open-source models can be significantly cheaper than cloud APIs:
  • High upfront cost (GPU infrastructure)
  • Near-zero marginal cost per request
  • Breakeven typically at 10M+ tokens per month
See Private model deployment for details.

Next steps