Skip to main content

Overview

MagOneAI supports major cloud LLM providers with native integration. Configure your API keys in the Admin Portal, and start using models immediately across your organization. Cloud providers offer:
  • No infrastructure management — pay per token, no servers to maintain
  • Latest models — access to cutting-edge capabilities as they’re released
  • Scalability — handle any volume without capacity planning
  • High availability — provider-managed uptime and redundancy
This guide covers configuration for OpenAI, Anthropic, and Google Gemini.
MagOneAI supports three provider types: OpenAI-compatible (OpenAI itself, Azure OpenAI, self-hosted vLLM, Ollama, and any OpenAI-compatible endpoint), Anthropic (native), and Google Gemini (native, via a Google AI Studio API key). Amazon Bedrock and Google Vertex AI are not yet available — they are on the roadmap.

OpenAI configuration

OpenAI provides the widely-used GPT family of models, including GPT-4, GPT-4o, and the reasoning-focused o1 and o3-mini models.

Supported models

GPT-4o — Latest GPT-4 with vision, fast and capable
  • Context: 128K tokens
  • Vision: Yes
  • Function calling: Yes
  • Best for: General-purpose tasks, multimodal workflows, production deployments
GPT-4o-mini — Smaller, faster, cheaper version of GPT-4o
  • Context: 128K tokens
  • Vision: Yes
  • Function calling: Yes
  • Best for: Classification, routing, simple extraction, high-volume tasks
GPT-4 — Original GPT-4 (being phased out in favor of GPT-4o)
  • Context: 8K or 32K tokens
  • Vision: No (text-only)
  • Function calling: Yes
  • Best for: Legacy workflows, text-only tasks
o1 — Reasoning model with extended thinking capability
  • Context: 128K tokens
  • Vision: Limited
  • Function calling: Yes
  • Best for: Complex reasoning, mathematics, code generation, strategic analysis
o3-mini — Smaller reasoning model, faster than o1
  • Context: 128K tokens
  • Vision: No
  • Function calling: Yes
  • Best for: Reasoning tasks where o1 is overkill, cost-sensitive reasoning workflows

Setup steps

1

Obtain OpenAI API key

Sign up for an OpenAI account at platform.openai.com and create an API key. You’ll need billing configured on your OpenAI account.
2

Add provider in Admin Portal

In the Admin Portal, add a new provider.Select OpenAI-compatible as the provider type (this also covers Azure OpenAI and any OpenAI-compatible endpoint).
3

Enter API key

Paste your OpenAI API key. Click Store in Vault to securely save it in HashiCorp Vault.You’ll see a reference like vault:openai/api_key — the actual key is never displayed again.
4

Configure default settings (optional)

Set platform-wide defaults:
  • Default organization ID (if using OpenAI org structure)
  • Request timeout (default: 60 seconds)
  • Retry settings (attempts and backoff)
5

Test connection

Click Test Connection. MagOneAI will make a test API call to verify:
  • API key is valid
  • Models are accessible
  • Network connectivity is working
6

Assign to organizations

Choose which MagOneAI organizations can use OpenAI models. You can enable for all orgs or select specific ones.

Function calling

All OpenAI models support native function calling. When you attach tools to an agent using an OpenAI model, MagOneAI automatically formats tool definitions in OpenAI’s function calling format and processes tool calls from the model’s responses.

Vision support

GPT-4o and GPT-4o-mini support vision. Send images as part of the input:
  • Image URLs (must be publicly accessible or signed URLs)
  • Base64-encoded images
  • File uploads processed by MagOneAI
Vision models can:
  • Extract text from documents (OCR)
  • Understand charts and diagrams
  • Answer questions about images
  • Describe image content
Use vision models for document processing, visual QA, and multimodal workflows.

Cost considerations

OpenAI pricing (as of 2024, subject to change):
ModelInput (per 1M tokens)Output (per 1M tokens)Vision (per image)
GPT-4o$5.00$15.00~$0.003
GPT-4o-mini$0.15$0.60~$0.0001
o1$15.00$60.00-
o3-mini$1.10$4.40-
Optimization tips:
  • Use GPT-4o-mini for simple tasks (10x cheaper than GPT-4o)
  • Use o1 only when extended reasoning is truly needed (very expensive)
  • Set per-token cost on each model so usage analytics reflect real spend

Anthropic configuration

Anthropic provides the Claude family of models, known for their extended context windows and strong reasoning capabilities.

Supported models

Claude 4.5 Sonnet — Latest flagship model with strong performance
  • Context: 200K tokens (experimental 1M available)
  • Vision: Yes
  • Function calling: Yes (via “tool use”)
  • Best for: Production workloads, complex analysis, extended context tasks
Claude Opus 4.6 — Most capable Claude model
  • Context: 200K tokens
  • Vision: Yes
  • Function calling: Yes
  • Best for: Most complex reasoning, strategic analysis, nuanced understanding
Claude Haiku — Fast, lightweight Claude model
  • Context: 200K tokens
  • Vision: Yes (select versions)
  • Function calling: Yes
  • Best for: High-volume tasks, classification, fast response needs

Setup steps

1

Obtain Anthropic API key

Sign up for an Anthropic account at console.anthropic.com and create an API key. Configure billing on your account.
2

Add provider in Admin Portal

In the Admin Portal, add a new provider.Select Anthropic as the provider type.
3

Enter API key

Paste your Anthropic API key. Click Store in Vault to securely save it in HashiCorp Vault.
4

Test connection

Click Test Connection to verify API access and model availability.
5

Assign to organizations

Choose which MagOneAI organizations can use Anthropic models.

Function calling via “tool use”

Anthropic models support function calling through their “tool use” feature. It works the same as OpenAI’s function calling from an application perspective — models can call tools during reasoning. Anthropic’s tool use is often praised for:
  • Accurate tool parameter generation
  • Intelligent tool selection
  • Strong reasoning about when tools are needed vs when to answer directly

Extended context windows

All Claude models support 200K token context windows, significantly larger than most other models. Some Claude models offer experimental 1M token context. Use cases for extended context:
  • Process entire codebases or long documents
  • Maintain conversation history over many turns
  • Include extensive reference materials in every request
  • Avoid context summarization and loss of detail
Note: Extended context is powerful but expensive. Only include necessary context.

Cost considerations

Anthropic pricing (as of 2024, subject to change):
ModelInput (per 1M tokens)Output (per 1M tokens)
Claude 4.5 Sonnet$3.00$15.00
Claude Opus 4.6$15.00$75.00
Claude Haiku$0.25$1.25
Optimization tips:
  • Use Claude Haiku for simple tasks (much cheaper than Sonnet/Opus)
  • Use Claude for extended context needs instead of multiple smaller requests
  • Set per-token cost on each model so usage analytics reflect real spend

Google Gemini configuration

Google provides the Gemini family of models with strong multimodal capabilities and competitive pricing. MagOneAI integrates with Gemini natively using a Google AI Studio API key.

Supported models

Gemini 2.0 Flash — Fast, lightweight model with multimodal support
  • Context: 1M tokens
  • Vision: Yes
  • Function calling: Yes
  • Best for: High-volume tasks, long context needs, cost-sensitive workloads
Gemini 2.5 Pro — Most capable Gemini model
  • Context: 2M tokens (largest available)
  • Vision: Yes
  • Function calling: Yes
  • Best for: Extremely long context, multimodal analysis, complex reasoning

Setup steps

1

Obtain Google AI Studio API key

Go to ai.google.dev and create a Google AI Studio API key.
2

Add provider in Admin Portal

In the Admin Portal, add a new provider.Select Google Gemini as the provider type.
3

Enter API key

Paste your Google AI Studio API key. Click Store in Vault to save it securely.
4

Test connection

Click Test Connection to verify access and model availability.
5

Assign to organizations

Choose which MagOneAI organizations can use Google models.

Function calling

Gemini models support native function calling. Google’s function calling implementation is compatible with OpenAI’s format, making integration straightforward.

Multimodal capabilities

All Gemini models are natively multimodal, supporting:
  • Text input and output
  • Image understanding
  • Audio understanding (select models)
  • Video understanding (select models)
Gemini excels at:
  • Document analysis with complex layouts
  • Video content understanding
  • Cross-modal reasoning (e.g., “Describe what’s being said in this video”)

Extended context — up to 2M tokens

Gemini 2.5 Pro supports 2 million token context, the largest available from any major provider. This enables:
  • Processing entire books or documentation sets
  • Analyzing hours of video content
  • Maintaining extremely long conversation histories
  • Including massive reference materials
Cost note: 2M tokens of context is expensive even at Google’s pricing. Use extended context judiciously.

Cost considerations

Google pricing (as of 2024, subject to change):
ModelInput (per 1M tokens)Output (per 1M tokens)Context limit
Gemini 2.0 Flash$0.075$0.301M tokens
Gemini 2.5 Pro$1.25$5.002M tokens
Gemini 2.0 Flash is extremely cost-effective — 50x cheaper than Claude Opus for similar capabilities. Optimization tips:
  • Use Gemini Flash for long-context tasks (very cheap compared to alternatives)
  • Leverage native multimodal capabilities instead of separate OCR + LLM pipeline
  • Use Gemini Pro when you need the 2M context window

API key management

All provider API keys are stored in HashiCorp Vault with military-grade security.

How API keys are stored

1

Encrypted at rest

API keys are encrypted with AES-256 before storage. Vault’s encryption is FIPS 140-2 compliant.
2

Access control

Only the MagOneAI platform service account can read API keys. Individual users never have direct access to keys.
3

Audit logging

Every retrieval of an API key from Vault is logged with timestamp, requesting service, and reason.
4

Automatic rotation support

Vault supports automatic key rotation. Update the API key in Vault, and all services use the new key without configuration changes.

Key reference format

In configuration files and UI, API keys are referenced by Vault path, never stored directly:
vault:openai/api_key
vault:anthropic/production_key
vault:google/gemini_key
This ensures:
  • Keys never appear in logs
  • Configuration files can be version-controlled safely
  • Keys can be rotated without updating configurations

Rotation without disruption

To rotate an API key:
  1. Generate new key from provider (OpenAI, Anthropic, Google)
  2. Update the key in Vault at the existing path
  3. Existing workflows continue with no disruption
  4. (Optional) Revoke old key from provider after confirming new key works
No agent configurations or workflow definitions need updating.

Usage tracking and costs

Monitor model usage across all cloud providers in the analytics dashboards.

What’s tracked

  • Token usage — input tokens and output tokens per provider, org, and project
  • Per-token cost — when you set input/output cost per 1k tokens on a model, usage analytics surface token-based cost
  • Latency — response times by provider and model

Usage analytics

Usage and cost analytics appear in the analytics dashboards — see the Admin Portal. View:
  • Token usage over time — daily, weekly, monthly trends
  • Usage by provider, organization, and project
  • Per-token cost for models that have cost configured

Quotas

MagOneAI enforces token-based quotas (daily, weekly, and monthly) at the org, project, use-case, and user scope. When a quota is exceeded, requests are rejected with a clear error message until the quota resets or is increased. Quotas are configured and managed on a dedicated surface — see Usage and quotas.
You can use multiple providers simultaneously. Use OpenAI for general tasks, Anthropic for complex reasoning with extended context, and Google for multimodal — all within the same project. Mix and match to optimize cost and capability.

Best practices

Create separate API keys for development, staging, and production environments. This allows:
  • Separate cost tracking
  • Isolated quota management
  • Easier debugging (know which environment caused issues)
  • Security isolation (compromise in dev doesn’t affect production)
Apply token-based quotas at the org, project, use-case, and user scope to:
  • Prevent runaway usage from buggy workflows
  • Ensure fair allocation across organizations
See Usage and quotas for configuration.
Regularly review model usage to identify:
  • Agents using expensive models unnecessarily
  • High-volume tasks that could use cheaper models
  • Token-based cost trends across orgs and projects

Troubleshooting

Symptoms: “Invalid API key” or “Authentication failed” errors.Solutions:
  • Verify the API key is correct in the provider’s console
  • Check if the key has been revoked or expired
  • Ensure billing is configured on the provider account
  • Update the key in Vault and test connection again
Symptoms: “Rate limit exceeded” or “Too many requests” errors.Solutions:
  • Check if you’re hitting the provider’s own rate limits
  • Reduce request frequency in workflows
  • Upgrade to a higher-tier plan with the provider for higher limits
Symptoms: “Model not found” or “Model not available” errors.Solutions:
  • Verify the model name is correct (case-sensitive)
  • Check if the model is available in your provider account tier
  • Some models require waitlist access — check provider console
  • Ensure the model hasn’t been deprecated by the provider
Symptoms: Requests taking very long or timing out.Solutions:
  • Check provider status page for outages or degraded performance
  • Reduce max tokens to decrease response generation time
  • Consider using a faster model for time-sensitive tasks
  • Increase timeout settings if legitimate requests are timing out
  • Verify network connectivity to provider endpoints

Next steps

Private models

Deploy your own models for data sovereignty

Model configuration

Learn about model selection and optimization

Usage and quotas

Track token usage and set quotas

Agent configuration

Configure agents with the right models