Overview

MagOneAI supports major cloud LLM providers with native integration. Configure your API keys in the Admin Portal, and start using models immediately across your organization. Cloud providers offer:
  • No infrastructure management — pay per token, no servers to maintain
  • Latest models — access to cutting-edge capabilities as they’re released
  • Scalability — handle any volume without capacity planning
  • High availability — provider-managed uptime and redundancy
This guide covers configuration for OpenAI, Anthropic, and Google AI.

OpenAI configuration

OpenAI provides the widely used GPT family of models, including GPT-4, GPT-4o, and the reasoning-focused o1 and o3-mini models.

Supported models

GPT-4o — Latest GPT-4 with vision, fast and capable
  • Context: 128K tokens
  • Vision: Yes
  • Function calling: Yes
  • Best for: General-purpose tasks, multimodal workflows, production deployments
GPT-4o-mini — Smaller, faster, cheaper version of GPT-4o
  • Context: 128K tokens
  • Vision: Yes
  • Function calling: Yes
  • Best for: Classification, routing, simple extraction, high-volume tasks
GPT-4 — Original GPT-4 (being phased out in favor of GPT-4o)
  • Context: 8K or 32K tokens
  • Vision: No (text-only)
  • Function calling: Yes
  • Best for: Legacy workflows, text-only tasks
o1 — Reasoning model with extended thinking capability
  • Context: 128K tokens
  • Vision: Limited
  • Function calling: Yes
  • Best for: Complex reasoning, mathematics, code generation, strategic analysis
o3-mini — Smaller reasoning model, faster than o1
  • Context: 128K tokens
  • Vision: No
  • Function calling: Yes
  • Best for: Reasoning tasks where o1 is overkill, cost-sensitive reasoning workflows

Setup steps

1. Obtain OpenAI API key

Sign up for an OpenAI account at platform.openai.com and create an API key. You’ll need billing configured on your OpenAI account.

2. Add provider in Admin Portal

Navigate to Admin Portal → Configuration → LLM Providers → Add Provider. Select OpenAI as the provider type.

3. Enter API key

Paste your OpenAI API key. Click Store in Vault to securely save it in HashiCorp Vault. You’ll see a reference like vault:openai/api_key — the actual key is never displayed again.

4. Configure default settings (optional)

Set platform-wide defaults:
  • Default organization ID (if using OpenAI org structure)
  • Request timeout (default: 60 seconds)
  • Retry settings (attempts and backoff)

5. Test connection

Click Test Connection. MagOneAI will make a test API call to verify that:
  • The API key is valid
  • Models are accessible
  • Network connectivity is working

6. Assign to organizations

Choose which MagOneAI organizations can use OpenAI models. You can enable it for all orgs or select specific ones.
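
If you want to sanity-check a key outside MagOneAI before storing it, a minimal call with the official openai Python SDK works; any cheap endpoint such as listing models fails fast on a bad or revoked key:

  from openai import OpenAI

  client = OpenAI(api_key="sk-...")  # paste the key you are about to store
  print([m.id for m in client.models.list()][:5])  # errors here mean a bad key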

Function calling

All OpenAI models support native function calling. When you attach tools to an agent using an OpenAI model, MagOneAI automatically formats tool definitions in OpenAI’s function calling format and processes tool calls from the model’s responses.
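
For illustration, here is a minimal sketch of that format, calling the official openai Python SDK directly; the get_weather tool and its schema are hypothetical:

  # Minimal function-calling sketch with the openai SDK (v1+).
  from openai import OpenAI

  client = OpenAI()  # reads OPENAI_API_KEY from the environment

  tools = [{
      "type": "function",
      "function": {
          "name": "get_weather",  # hypothetical tool
          "description": "Get the current weather for a city",
          "parameters": {
              "type": "object",
              "properties": {"city": {"type": "string"}},
              "required": ["city"],
          },
      },
  }]

  response = client.chat.completions.create(
      model="gpt-4o",
      messages=[{"role": "user", "content": "What's the weather in Paris?"}],
      tools=tools,
  )

  # If the model chose to call the tool, the call(s) appear here:
  for call in response.choices[0].message.tool_calls or []:
      print(call.function.name, call.function.arguments)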

Vision support

GPT-4o and GPT-4o-mini support vision. Send images as part of the input:
  • Image URLs (must be publicly accessible or signed URLs)
  • Base64-encoded images
  • File uploads processed by MagOneAI
Vision models can:
  • Extract text from documents (OCR)
  • Understand charts and diagrams
  • Answer questions about images
  • Describe image content
Use vision models for document processing, visual QA, and multimodal workflows.
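
As a sketch of the underlying request (here via the openai SDK directly; the image URL is a placeholder):

  from openai import OpenAI

  client = OpenAI()

  response = client.chat.completions.create(
      model="gpt-4o-mini",
      messages=[{
          "role": "user",
          "content": [
              {"type": "text", "text": "Summarize the chart in this image."},
              # Must be publicly accessible or a signed URL, as noted above
              {"type": "image_url", "image_url": {"url": "https://example.com/chart.png"}},
          ],
      }],
  )
  print(response.choices[0].message.content)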

Cost considerations

OpenAI pricing (as of 2024, subject to change):
Model        | Input (per 1M tokens) | Output (per 1M tokens) | Vision (per image)
GPT-4o       | $5.00                 | $15.00                 | ~$0.003
GPT-4o-mini  | $0.15                 | $0.60                  | ~$0.0001
o1           | $15.00                | $60.00                 | -
o3-mini      | $1.10                 | $4.40                  | -
Optimization tips:
  • Use GPT-4o-mini for simple tasks (25-30x cheaper than GPT-4o at list prices)
  • Use o1 only when extended reasoning is truly needed (very expensive)
  • Enable prompt caching if available to reduce input token costs
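
A quick worked example using the list prices above shows why model choice dominates cost; the workload numbers are illustrative:

  # Cost of 1,000 requests, each ~2,000 input tokens and ~500 output tokens.
  PRICES = {  # USD per 1M tokens: (input, output)
      "gpt-4o": (5.00, 15.00),
      "gpt-4o-mini": (0.15, 0.60),
  }

  def job_cost(model: str, requests: int, in_tok: int, out_tok: int) -> float:
      p_in, p_out = PRICES[model]
      return requests * (in_tok * p_in + out_tok * p_out) / 1_000_000

  print(job_cost("gpt-4o", 1000, 2000, 500))       # 17.5 -> $17.50
  print(job_cost("gpt-4o-mini", 1000, 2000, 500))  # 0.6  -> $0.60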

Anthropic configuration

Anthropic provides the Claude family of models, known for their extended context windows and strong reasoning capabilities.

Supported models

Claude 4.5 Sonnet — Latest flagship model with strong performance
  • Context: 200K tokens (experimental 1M available)
  • Vision: Yes
  • Function calling: Yes (via “tool use”)
  • Best for: Production workloads, complex analysis, extended context tasks
Claude Opus 4.6 — Most capable Claude model
  • Context: 200K tokens
  • Vision: Yes
  • Function calling: Yes
  • Best for: Most complex reasoning, strategic analysis, nuanced understanding
Claude Haiku — Fast, lightweight Claude model
  • Context: 200K tokens
  • Vision: Yes (select versions)
  • Function calling: Yes
  • Best for: High-volume tasks, classification, fast response needs

Setup steps

1. Obtain Anthropic API key

Sign up for an Anthropic account at console.anthropic.com and create an API key. Configure billing on your account.

2. Add provider in Admin Portal

Navigate to Admin Portal → Configuration → LLM Providers → Add Provider. Select Anthropic as the provider type.

3. Enter API key

Paste your Anthropic API key. Click Store in Vault to securely save it in HashiCorp Vault.

4. Configure prompt caching (optional)

Enable Anthropic’s prompt caching feature to dramatically reduce costs for repeated context:
  • System prompts (same across requests)
  • Large knowledge base documents
  • Tool definitions
Caching can reduce costs by up to 90% for cached content.

5. Test connection

Click Test Connection to verify API access and model availability.

6. Assign to organizations

Choose which MagOneAI organizations can use Anthropic models.

Function calling via “tool use”

Anthropic models support function calling through their “tool use” feature. It works the same as OpenAI’s function calling from an application perspective — models can call tools during reasoning. Anthropic’s tool use is often praised for:
  • Accurate tool parameter generation
  • Intelligent tool selection
  • Strong reasoning about when tools are needed vs when to answer directly
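
A minimal tool-use sketch with the anthropic Python SDK; the model ID and the get_weather tool are illustrative, so confirm the exact model name in your console:

  import anthropic

  client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

  response = client.messages.create(
      model="claude-sonnet-4-5",  # assumed model ID; check your console
      max_tokens=1024,
      tools=[{
          "name": "get_weather",  # hypothetical tool
          "description": "Get the current weather for a city",
          "input_schema": {
              "type": "object",
              "properties": {"city": {"type": "string"}},
              "required": ["city"],
          },
      }],
      messages=[{"role": "user", "content": "What's the weather in Paris?"}],
  )

  # Tool calls come back as tool_use blocks in the response content:
  for block in response.content:
      if block.type == "tool_use":
          print(block.name, block.input)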

Extended context windows

All Claude models support 200K token context windows, significantly larger than most other models. Some Claude models offer experimental 1M token context. Use cases for extended context:
  • Process entire codebases or long documents
  • Maintain conversation history over many turns
  • Include extensive reference materials in every request
  • Avoid context summarization and loss of detail
Note: Extended context is powerful but expensive. Only include necessary context.

Prompt caching

Anthropic’s prompt caching is a significant cost optimization:
  • Cache system prompts that don’t change across requests
  • Cache large knowledge base documents included in every RAG workflow
  • Cache tool definitions (often dozens of KB for complex tool sets)
Cached content costs 90% less on subsequent requests. Enable caching in provider settings.
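
At the API level, caching is requested by marking stable content with cache_control; a minimal sketch with the anthropic SDK (the model ID and content are placeholders):

  import anthropic

  client = anthropic.Anthropic()

  stable_context = "<several KB of stable instructions or reference text>"

  response = client.messages.create(
      model="claude-sonnet-4-5",  # assumed model ID; check your console
      max_tokens=1024,
      system=[{
          "type": "text",
          "text": stable_context,
          "cache_control": {"type": "ephemeral"},  # mark this block for caching
      }],
      messages=[{"role": "user", "content": "Summarize the key points."}],
  )
  # response.usage reports cache_creation_input_tokens / cache_read_input_tokens,
  # so you can confirm that subsequent requests hit the cache.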

Cost considerations

Anthropic pricing (as of 2024, subject to change):
Model             | Input (per 1M tokens) | Output (per 1M tokens) | Cached input (per 1M tokens)
Claude 4.5 Sonnet | $3.00                 | $15.00                 | $0.30
Claude Opus 4.6   | $15.00                | $75.00                 | $1.50
Claude Haiku      | $0.25                 | $1.25                  | $0.03
Optimization tips:
  • Use Claude Haiku for simple tasks (much cheaper than Sonnet/Opus)
  • Enable prompt caching for repeated context (90% cost reduction)
  • Use Claude for extended context needs instead of multiple smaller requests

Google configuration

Google provides the Gemini family of models with strong multimodal capabilities and competitive pricing.

Supported models

Gemini 2.0 Flash — Fast, lightweight model with multimodal support
  • Context: 1M tokens
  • Vision: Yes
  • Function calling: Yes
  • Best for: High-volume tasks, long context needs, cost-sensitive workloads
Gemini 2.5 Pro — Most capable Gemini model
  • Context: 2M tokens (largest available)
  • Vision: Yes
  • Function calling: Yes
  • Best for: Extremely long context, multimodal analysis, complex reasoning

Setup steps

1. Obtain Google AI API key

Go to ai.google.dev and create an API key. Note: For enterprise use, you may want Google Cloud Vertex AI instead of the consumer API.

2. Add provider in Admin Portal

Navigate to Admin Portal → Configuration → LLM Providers → Add Provider. Select Google as the provider type.

3. Enter API key

Paste your Google AI API key. Click Store in Vault to save it securely.

4. Choose API type

Select between:
  • Google AI API (consumer API, simpler setup)
  • Vertex AI (enterprise, requires GCP project configuration)

5. Test connection

Click Test Connection to verify access and model availability.

6. Assign to organizations

Choose which MagOneAI organizations can use Google models.

Function calling

Gemini models support native function calling. Google’s function calling implementation is compatible with OpenAI’s format, making integration straightforward.
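
A minimal sketch with the google-generativeai SDK, which can wrap plain Python functions as tools and call them automatically; the get_weather function is hypothetical:

  import google.generativeai as genai

  genai.configure(api_key="...")  # in MagOneAI this comes from Vault

  def get_weather(city: str) -> str:
      """Get the current weather for a city."""  # hypothetical tool
      return "sunny, 22 C"

  model = genai.GenerativeModel("gemini-2.0-flash", tools=[get_weather])
  chat = model.start_chat(enable_automatic_function_calling=True)
  response = chat.send_message("What's the weather in Paris?")
  print(response.text)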

Multimodal capabilities

All Gemini models are natively multimodal, supporting:
  • Text input and output
  • Image understanding
  • Audio understanding (select models)
  • Video understanding (select models)
Gemini excels at:
  • Document analysis with complex layouts
  • Video content understanding
  • Cross-modal reasoning (e.g., “Describe what’s being said in this video”)
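
For example, image input is a one-liner with the google-generativeai SDK (the file name is a placeholder):

  import PIL.Image
  import google.generativeai as genai

  genai.configure(api_key="...")
  model = genai.GenerativeModel("gemini-2.0-flash")

  img = PIL.Image.open("invoice.png")  # hypothetical local file
  response = model.generate_content([img, "Extract the total amount due."])
  print(response.text)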

Extended context — up to 2M tokens

Gemini 2.5 Pro supports 2 million token context, the largest available from any major provider. This enables:
  • Processing entire books or documentation sets
  • Analyzing hours of video content
  • Maintaining extremely long conversation histories
  • Including massive reference materials
Cost note: 2M tokens of context is expensive even at Google’s pricing. Use extended context judiciously.

Cost considerations

Google pricing (as of 2024, subject to change):
Model            | Input (per 1M tokens) | Output (per 1M tokens) | Context limit
Gemini 2.0 Flash | $0.075                | $0.30                  | 1M tokens
Gemini 2.5 Pro   | $1.25                 | $5.00                  | 2M tokens
Gemini 2.0 Flash is extremely cost-effective, orders of magnitude cheaper than Claude Opus at list prices.
Optimization tips:
  • Use Gemini Flash for long-context tasks (very cheap compared to alternatives)
  • Leverage native multimodal capabilities instead of separate OCR + LLM pipeline
  • Use Gemini Pro when you need the 2M context window

API key management

All provider API keys are stored in HashiCorp Vault, encrypted at rest and gated by strict access controls.

How API keys are stored

1. Encrypted at rest

API keys are encrypted with AES-256 before storage. Vault’s encryption is FIPS 140-2 compliant.

2. Access control

Only the MagOneAI platform service account can read API keys. Individual users never have direct access to keys.

3. Audit logging

Every retrieval of an API key from Vault is logged with timestamp, requesting service, and reason.

4. Automatic rotation support

Vault supports automatic key rotation. Update the API key in Vault, and all services use the new key without configuration changes.

Key reference format

In configuration files and UI, API keys are referenced by Vault path, never stored directly:
vault:openai/api_key
vault:anthropic/production_key
vault:google/vertex_ai_key
This ensures:
  • Keys never appear in logs
  • Configuration files can be version-controlled safely
  • Keys can be rotated without updating configurations
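
As a sketch of how such a reference could be resolved with the hvac client (the Vault address, mount point, and field name are assumptions, not documented MagOneAI internals):

  import hvac

  def resolve_key(reference: str) -> str:
      """Turn a vault:<path> reference into the actual secret value."""
      path = reference.removeprefix("vault:")  # e.g. "openai/api_key"
      client = hvac.Client(url="https://vault.internal:8200")  # hypothetical address
      secret = client.secrets.kv.v2.read_secret_version(
          path=path, mount_point="secret",  # assumed KV v2 mount
      )
      return secret["data"]["data"]["value"]  # assumed field name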

Rotation without disruption

To rotate an API key:
  1. Generate new key from provider (OpenAI, Anthropic, Google)
  2. Update the key in Vault at the existing path
  3. Existing workflows continue with no disruption
  4. (Optional) Revoke old key from provider after confirming new key works
No agent configurations or workflow definitions need updating.
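
With a KV v2 secrets engine, rotation amounts to writing a new version at the same path; a hedged hvac sketch (mount point and field name assumed, as above):

  import hvac

  client = hvac.Client(url="https://vault.internal:8200")  # hypothetical address
  client.secrets.kv.v2.create_or_update_secret(
      path="openai/api_key",           # the existing path in the reference
      secret={"value": "sk-NEW-KEY"},  # the new key becomes the current version
      mount_point="secret",            # assumed KV v2 mount
  )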

Usage tracking and costs

Monitor model usage and costs across all cloud providers in the Admin Portal.

What’s tracked

  • Total requests per provider, per organization, per project
  • Token usage (input tokens, output tokens, cached tokens)
  • Cost calculated based on provider pricing
  • Error rates and failure reasons
  • Latency (p50, p95, p99 response times)

Cost analytics dashboard

Access detailed cost analytics in Admin Portal → Analytics → Model Usage. View:
  • Cost over time — daily, weekly, monthly spend trends
  • Cost by provider — which providers are most expensive
  • Cost by organization — which orgs are spending the most
  • Cost by project — which projects drive costs
  • Cost by agent — identify expensive agents for optimization

Setting budget alerts

Configure alerts to notify you when spending exceeds thresholds:
  • Daily spend exceeds $X
  • Weekly spend exceeds $Y
  • Specific organization exceeds monthly budget
  • Specific project is 80% through monthly quota
Alerts can email admins or post to Slack/Teams channels.

Quota enforcement

Set hard quotas to prevent runaway costs.
Organization quotas:
  • Max tokens per day/month
  • Max requests per day/month
  • Max cost per month
Project quotas:
  • Same options, applied at project level
When a quota is exceeded, requests are rejected with clear error messages until the quota resets or is increased.
You can use multiple providers simultaneously. Use OpenAI for general tasks, Anthropic for complex reasoning with extended context, and Google for multimodal — all within the same project. Mix and match to optimize cost and capability.

Best practices

Create separate API keys for development, staging, and production environments. This allows:
  • Separate cost tracking
  • Isolated quota management
  • Easier debugging (know which environment caused issues)
  • Security isolation (compromise in dev doesn’t affect production)
Even though cloud providers have their own rate limits, set rate limits in MagOneAI to:
  • Prevent runaway costs from buggy workflows
  • Ensure fair allocation across organizations
  • Avoid hitting provider limits (which can affect all requests)
Review model usage weekly to identify:
  • Agents using expensive models unnecessarily
  • High-volume tasks that could use cheaper models
  • Failed requests (you pay for them but get no value)
  • Opportunities to leverage prompt caching
Configure fallback providers for resilience:
  • Primary: OpenAI GPT-4o
  • Fallback 1: Anthropic Claude Sonnet
  • Fallback 2: Google Gemini Pro
If one provider has an outage, workflows automatically use the fallback without manual intervention.
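
The behavior is roughly equivalent to this sketch; the provider callables are stand-ins for real SDK calls:

  from typing import Callable

  def complete_with_fallback(prompt: str, providers: list[Callable[[str], str]]) -> str:
      """Try each provider in priority order; return the first success."""
      last_error: Exception | None = None
      for call in providers:
          try:
              return call(prompt)
          except Exception as err:  # outage, rate limit, timeout, ...
              last_error = err      # fall through to the next provider
      raise RuntimeError("all providers failed") from last_error

  # Usage: pass callables in priority order, e.g.
  # complete_with_fallback("...", [call_gpt4o, call_claude_sonnet, call_gemini_pro])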

Troubleshooting

Authentication errors

Symptoms: “Invalid API key” or “Authentication failed” errors.
Solutions:
  • Verify the API key is correct in the provider’s console
  • Check whether the key has been revoked or expired
  • Ensure billing is configured on the provider account
  • Update the key in Vault and test the connection again

Rate limiting

Symptoms: “Rate limit exceeded” or “Too many requests” errors.
Solutions:
  • Check whether you’re hitting the provider’s rate limits (requests per minute)
  • Reduce request frequency in workflows
  • Upgrade to a higher-tier plan with the provider for higher limits
  • Implement request queuing in MagOneAI to smooth out spikes (see the backoff sketch below)
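
If you implement smoothing yourself, jittered exponential backoff is the standard pattern; a minimal sketch:

  import random
  import time

  def with_backoff(call, max_attempts: int = 5):
      """Retry a callable on failure with jittered exponential backoff."""
      for attempt in range(max_attempts):
          try:
              return call()
          except Exception:  # in practice, catch the SDK's RateLimitError
              if attempt == max_attempts - 1:
                  raise
              time.sleep(2 ** attempt + random.random())  # 1s, 2s, 4s... plus jitter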

Model availability

Symptoms: “Model not found” or “Model not available” errors.
Solutions:
  • Verify the model name is correct (case-sensitive)
  • Check whether the model is available in your provider account tier
  • Some models require waitlist access — check the provider console
  • Ensure the model hasn’t been deprecated by the provider

Slow responses or timeouts

Symptoms: Requests taking very long or timing out.
Solutions:
  • Check the provider status page for outages or degraded performance
  • Reduce max tokens to decrease response generation time
  • Consider using a faster model for time-sensitive tasks
  • Increase timeout settings if legitimate requests are timing out
  • Verify network connectivity to provider endpoints

Next steps