Model configuration

Model configuration hierarchy

MagOneAI provides fine-grained control over which models are available and who can use them. Configuration flows through four levels, each building on the previous:

Platform level — SuperAdmin configuration

SuperAdmin configures available LLM providers and models in the Admin Portal. This is the global catalog of models that could potentially be used anywhere on the platform.At this level, you:

Add LLM providers (OpenAI, Anthropic, Google, private models)
Store API keys securely in HashiCorp Vault
Configure provider-specific settings (endpoints, default parameters)
Monitor provider health and availability

Organization level — access control

Organization admins control which providers and models their organization can access. Not every org needs access to every model.At this level, you:

Enable/disable specific providers for your organization
Set which models from each provider are available
Configure rate limits and quotas per organization
Monitor model usage and costs for your org

Project level — further restrictions

Project settings can further restrict which models are available within a specific project. This is useful when you want different model access for different use cases.At this level, you:

Select subset of org-available models for this project
Set project-specific rate limits
Configure default models for new agents in this project

Agent level — specific model selection

Each agent uses a specific model selected from the project’s available models. This is where you choose the right model for each task.At this level, you:

Select the exact model this agent will use
Configure model parameters (temperature, max tokens, etc.)
Override defaults for specific agent needs

This hierarchy ensures:

Security: Only authorized organizations can use specific models
Cost control: Organizations can’t accidentally use expensive models they didn’t approve
Flexibility: Different projects and agents can use different models based on their needs

Provider management in Admin Portal

SuperAdmins manage all LLM providers through the Admin Portal. This is the central configuration point for the entire platform.

Adding a provider

Navigate to Providers

In Admin Portal, go to Configuration → LLM Providers and click Add Provider.

Select provider type

Choose from:

OpenAI (cloud)
Anthropic (cloud)
Google (cloud)
OpenAI Compatible (for private models)
Custom (for proprietary integrations)

Configure connection

Provide:

Provider name (for identification)
API endpoint URL (for cloud providers, this is pre-filled)
API key or authentication credentials
Optional: default parameters, timeouts, retry settings

Store credentials in Vault

API keys are automatically stored in HashiCorp Vault. You’ll see a reference like vault:openai/api_key instead of the actual key.

Test connection

Click Test Connection to verify MagOneAI can reach the provider and authenticate successfully.

Assign to organizations

Choose which organizations have access to this provider. You can enable for all orgs or specific ones.

Rate limits and quotas per organization

Control model usage to prevent runaway costs and ensure fair resource allocation: Rate limits:

Requests per minute per organization
Tokens per minute per organization
Concurrent requests per organization

Quotas:

Total tokens per day/month
Total requests per day/month
Total cost per month (in USD)

When an organization exceeds limits, requests are queued or rejected with clear error messages.

Usage monitoring

Track model usage across the platform:

Provider dashboard: See aggregate usage for each provider
Organization usage: Break down usage by organization
Cost tracking: Monitor spending per provider, per org, per project
Model popularity: Identify which models are most/least used
Error rates: Track failed requests by provider and reason

Access these metrics in Admin Portal → Analytics → Model Usage.

Model selection per agent

Each agent in MagOneAI uses a specific LLM. Choosing the right model for each task is crucial for balancing cost, speed, and capability.

How to select a model

When creating or editing an agent:

Choose from available models: Dropdown shows models available to your project
Consider the task requirements:
- Complex reasoning → large, capable models (GPT-4o, Claude Opus 4.6)
- Simple classification → small, fast models (GPT-4o-mini, Claude Haiku)
- Vision tasks → multimodal models (GPT-4o, Gemini 2.5 Pro)
Configure model parameters:
- Temperature: Creativity vs consistency (0.0 - 1.0)
- Max tokens: Maximum response length
- Top-p: Alternative sampling method
- Frequency/presence penalty: Control repetition

Choosing the right model for the task

Different tasks require different model capabilities:

Fast models for simple tasks

Use: GPT-4o-mini, Claude Haiku, Gemini FlashFor:

Classification (spam detection, sentiment analysis)
Routing (which agent handles this request?)
Simple Q&A
Data extraction from structured formats

Why: 10x cheaper, 5x faster, good enough for simple tasks

Reasoning models for complex tasks

Use: Claude Opus 4.6, o1, GPT-4oFor:

Multi-step reasoning
Complex analysis and synthesis
Strategic decision-making
Nuanced understanding of context

Why: Superior reasoning, worth the cost for complex tasks

Vision models for documents

Use: GPT-4o, Gemini 2.5 Pro, Qwen3-VLFor:

Document OCR and analysis
Image understanding
Visual question answering
Chart/diagram interpretation

Why: Native vision capabilities, no separate OCR step needed

Mix models in workflows

Use: Different models for different agents in the same workflowFor:

Routing agent (fast model) → Analysis agent (reasoning model)
Vision agent (multimodal) → Summary agent (text model)
Classifier (cheap model) → multiple specialized agents

Why: Optimize cost and performance for each step

Model parameter tuning

Temperature: Creativity vs consistency

Temperature controls randomness in responses.

0.0 - 0.3: Deterministic, consistent — use for classification, extraction, structured output
0.4 - 0.7: Balanced — default for most tasks
0.8 - 1.0: Creative, varied — use for content generation, brainstorming

Example: A customer support classifier should use temperature 0.0 for consistent categorization. A marketing copy generator can use 0.8 for creative variety.

Max tokens: Response length control

Max tokens sets the maximum response length.

Short responses (100-500 tokens): classifications, summaries
Medium responses (500-2000 tokens): explanations, analyses
Long responses (2000-4000+ tokens): comprehensive reports, documents

Tip: Set max tokens appropriately for your use case. Unnecessarily high limits increase cost without benefit.

Top-p: Alternative to temperature

Top-p (nucleus sampling) is an alternative to temperature for controlling randomness.

0.1 - 0.5: Conservative, high-probability tokens only
0.5 - 0.9: Balanced
0.9 - 1.0: Diverse, includes lower-probability tokens

Generally use temperature OR top-p, not both. Top-p is often more predictable than temperature at extreme values.

Frequency and presence penalties

Frequency penalty: Reduces likelihood of repeating tokens based on how often they’ve appeared.Presence penalty: Reduces likelihood of repeating tokens that have appeared at all.Both range from -2.0 to 2.0. Positive values discourage repetition, negative values encourage it.Use case: Set frequency penalty to 0.5-1.0 for content generation to avoid repetitive phrasing.

Function calling support

For agents to use tools, the underlying model must support function calling (also called tool use).

What is function calling?

Function calling allows models to:

Receive a list of available tools with their parameter schemas
Decide during reasoning when to call a tool
Generate a tool call with appropriate parameters
Receive the tool’s response and continue reasoning

Without function calling, agents can’t use MCP tools for actions like sending emails, querying databases, or calling APIs.

Models with function calling support

Full support:

OpenAI: GPT-4, GPT-4o, GPT-4o-mini, o1, o3-mini
Anthropic: All Claude models (function calling via “tool use”)
Google: Gemini 2.0 Flash, Gemini 2.5 Pro

No function calling:

Base LLaMA models (unless fine-tuned)
Some older or smaller open-source models

How MagOneAI converts tool definitions

When you attach tools to an agent, MagOneAI converts MCP tool definitions to the provider’s function calling format: MCP tool definition:

{
  "name": "send_email",
  "description": "Send an email to recipients",
  "parameters": {
    "type": "object",
    "properties": {
      "to": {"type": "array", "items": {"type": "string"}},
      "subject": {"type": "string"},
      "body": {"type": "string"}
    },
    "required": ["to", "subject", "body"]
  }
}

Converted to OpenAI function calling format:

{
  "type": "function",
  "function": {
    "name": "send_email",
    "description": "Send an email to recipients",
    "parameters": {
      "type": "object",
      "properties": {
        "to": {"type": "array", "items": {"type": "string"}},
        "subject": {"type": "string"},
        "body": {"type": "string"}
      },
      "required": ["to", "subject", "body"]
    }
  }
}

This conversion happens automatically. You define tools once in MCP format, and they work across all providers that support function calling.

Agent types and function calling

MagOneAI agents have different function calling requirements depending on their configuration:

Basic agents: No tools, no function calling needed
Router agents: No tools, no function calling needed
Tool agents: Require function calling — agent decides when to use tools
Full agents: Require function calling — agent uses tools during complex reasoning

If you select a model without function calling for a Tool or Full agent, you’ll see a warning during configuration.

Use the smallest model that reliably handles the task. A fast model for initial classification, a powerful model for deep analysis, and a vision model for documents — all in the same workflow — optimizes both cost and performance.

Cost optimization strategies

Model selection has significant cost implications. Use these strategies to minimize spending while maintaining quality:

1. Tier your agents by model size

Tier 1 (entry points): GPT-4o-mini or Claude Haiku for routing and classification
Tier 2 (specialists): GPT-4o or Claude Sonnet for most tasks
Tier 3 (experts): Claude Opus 4.6 or o1 only for complex reasoning

Route requests through Tier 1, escalate to higher tiers only when needed.

2. Use caching where available

Some providers (Anthropic) support prompt caching. Enable caching for:

System prompts (same across many requests)
Large context documents (RAG results, knowledge base content)
Tool definitions (same for all calls with this agent)

Caching can reduce costs by 90% for repeated context.

3. Compress prompts and context

Remove unnecessary whitespace and verbose instructions
Summarize long documents before including in context
Use structured formats (JSON, tables) instead of verbose prose
Prune irrelevant context from multi-turn conversations

4. Monitor and alert on usage

Set up alerts for:

Daily spend exceeding threshold
Specific agents using expensive models excessively
Failed requests (you’re paying for them but getting no value)

Regularly review usage in analytics to identify optimization opportunities.

5. Use private models for high-volume tasks

If you have consistent, high-volume workloads, deploying private open-source models can be significantly cheaper than cloud APIs:

High upfront cost (GPU infrastructure)
Near-zero marginal cost per request
Breakeven typically at 10M+ tokens per month

See Private model deployment for details.

Next steps

Cloud providers

Configure OpenAI, Anthropic, and Google

Private models

Deploy open-source models on your infrastructure

Agent configuration

Learn how to configure agents with models

Cost analytics

Monitor model usage and costs

Getting Started

Platform Guide

Agents

Workflow Builder

Tools & Integrations

Models & Providers

Cookbooks

Security & Administration

Model configuration