Overview
MagOneAI supports major cloud LLM providers with native integration. Configure your API keys in the Admin Portal and start using models immediately across your organization. Cloud providers offer:
- No infrastructure management — pay per token, no servers to maintain
- Latest models — access to cutting-edge capabilities as they’re released
- Scalability — handle any volume without capacity planning
- High availability — provider-managed uptime and redundancy
OpenAI configuration
OpenAI provides the widely used GPT family of models, including GPT-4, GPT-4o, and the reasoning-focused o1 and o3-mini models.
Supported models
GPT-4o — Latest GPT-4 with vision, fast and capable
- Context: 128K tokens
- Vision: Yes
- Function calling: Yes
- Best for: General-purpose tasks, multimodal workflows, production deployments
GPT-4o-mini — Smaller, faster, low-cost variant of GPT-4o
- Context: 128K tokens
- Vision: Yes
- Function calling: Yes
- Best for: Classification, routing, simple extraction, high-volume tasks
GPT-4 — Previous-generation flagship model
- Context: 8K or 32K tokens
- Vision: No (text-only)
- Function calling: Yes
- Best for: Legacy workflows, text-only tasks
o1 — Reasoning model for complex, multi-step problems
- Context: 128K tokens
- Vision: Limited
- Function calling: Yes
- Best for: Complex reasoning, mathematics, code generation, strategic analysis
o3-mini — Compact reasoning model at lower cost
- Context: 128K tokens
- Vision: No
- Function calling: Yes
- Best for: Reasoning tasks where o1 is overkill, cost-sensitive reasoning workflows
Setup steps
Obtain OpenAI API key
Sign up for an OpenAI account at platform.openai.com and create an API key. You’ll need billing configured on your OpenAI account.
Add provider in Admin Portal
Navigate to Admin Portal → Configuration → LLM Providers → Add Provider. Select OpenAI as the provider type.
Enter API key
Paste your OpenAI API key. Click Store in Vault to securely save it in HashiCorp Vault. You'll see a reference like vault:openai/api_key — the actual key is never displayed again.
Configure default settings (optional)
Set platform-wide defaults:
- Default organization ID (if using OpenAI org structure)
- Request timeout (default: 60 seconds)
- Retry settings (attempts and backoff)
Test connection
Click Test Connection. MagOneAI will make a test API call to verify:
- API key is valid
- Models are accessible
- Network connectivity is working
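Outside the portal, you can run the same checks by hand. A minimal sketch using the official openai Python SDK, assuming OPENAI_API_KEY is set in the environment:

```python
# pip install openai
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# 1. Key validity and model access: listing models fails fast on a bad key.
model_ids = [m.id for m in client.models.list()]
assert "gpt-4o" in model_ids, "gpt-4o not available on this account"

# 2. End-to-end round trip with a tiny, cheap request.
resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "ping"}],
    max_tokens=5,
)
print(resp.choices[0].message.content)
```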
Function calling
All OpenAI models support native function calling. When you attach tools to an agent using an OpenAI model, MagOneAI automatically formats the tool definitions in OpenAI's function calling format and processes tool calls from the model's responses.
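For reference, a tool definition and the resulting tool call look like this against the OpenAI API directly. The get_weather tool is a hypothetical example; MagOneAI generates equivalent payloads from your attached tools:

```python
from openai import OpenAI

client = OpenAI()

# One tool definition in OpenAI's function calling format.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical example tool
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
)

# If the model chose to call the tool, the call arrives as structured JSON.
for call in resp.choices[0].message.tool_calls or []:
    print(call.function.name, call.function.arguments)
```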
Vision support
GPT-4o and GPT-4o-mini support vision. Send images as part of the input (a code sketch follows the lists below):
- Image URLs (must be publicly accessible or signed URLs)
- Base64-encoded images
- File uploads processed by MagOneAI
Common vision use cases:
- Extract text from documents (OCR)
- Understand charts and diagrams
- Answer questions about images
- Describe image content
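A minimal sketch of both image input styles against the OpenAI API directly (the URL and file name are placeholders):

```python
import base64
from openai import OpenAI

client = OpenAI()

# Option A: a publicly accessible (or signed) image URL.
url_part = {"type": "image_url",
            "image_url": {"url": "https://example.com/chart.png"}}

# Option B: a base64-encoded local file sent as a data URL.
with open("chart.png", "rb") as f:
    b64 = base64.b64encode(f.read()).decode()
b64_part = {"type": "image_url",
            "image_url": {"url": f"data:image/png;base64,{b64}"}}

resp = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What trend does this chart show?"},
            url_part,  # or b64_part
        ],
    }],
)
print(resp.choices[0].message.content)
```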
Cost considerations
OpenAI pricing (as of 2024, subject to change):

| Model | Input (per 1M tokens) | Output (per 1M tokens) | Vision (per image) |
|---|---|---|---|
| GPT-4o | $5.00 | $15.00 | ~$0.003 |
| GPT-4o-mini | $0.15 | $0.60 | ~$0.0001 |
| o1 | $15.00 | $60.00 | - |
| o3-mini | $1.10 | $4.40 | - |
Cost optimization tips (see the quick cost check after this list):
- Use GPT-4o-mini for simple tasks (more than 10x cheaper than GPT-4o)
- Use o1 only when extended reasoning is truly needed (very expensive)
- Enable prompt caching if available to reduce input token costs
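A quick sanity check of these tradeoffs using the prices in the table above (subject to change, as noted):

```python
# Prices in USD per 1M tokens, taken from the table above.
PRICES = {
    "gpt-4o":      (5.00, 15.00),
    "gpt-4o-mini": (0.15, 0.60),
}

def cost(model: str, requests: int, in_tok: int, out_tok: int) -> float:
    """Estimated cost of `requests` calls with the given token counts."""
    p_in, p_out = PRICES[model]
    return requests * (in_tok * p_in + out_tok * p_out) / 1_000_000

# 1,000 requests at ~500 input / ~200 output tokens each:
print(cost("gpt-4o", 1000, 500, 200))       # $5.50
print(cost("gpt-4o-mini", 1000, 500, 200))  # about $0.20
```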
Anthropic configuration
Anthropic provides the Claude family of models, known for their extended context windows and strong reasoning capabilities.
Supported models
Claude 4.5 Sonnet — Latest flagship model with strong performance
- Context: 200K tokens (experimental 1M available)
- Vision: Yes
- Function calling: Yes (via “tool use”)
- Best for: Production workloads, complex analysis, extended context tasks
Claude Opus 4.6 — Most capable Claude model for deep reasoning
- Context: 200K tokens
- Vision: Yes
- Function calling: Yes
- Best for: Most complex reasoning, strategic analysis, nuanced understanding
Claude Haiku — Fastest, most affordable Claude model
- Context: 200K tokens
- Vision: Yes (select versions)
- Function calling: Yes
- Best for: High-volume tasks, classification, fast response needs
Setup steps
Obtain Anthropic API key
Sign up for an Anthropic account at console.anthropic.com and create an API key. Configure billing on your account.
Add provider in Admin Portal
Navigate to Admin Portal → Configuration → LLM Providers → Add Provider. Select Anthropic as the provider type.
Enter API key
Paste your Anthropic API key. Click Store in Vault to securely save it in HashiCorp Vault.
Configure prompt caching (optional)
Enable Anthropic’s prompt caching feature to dramatically reduce costs for repeated context:
- System prompts (same across requests)
- Large knowledge base documents
- Tool definitions
Function calling via “tool use”
Anthropic models support function calling through their "tool use" feature. From an application perspective it works the same as OpenAI's function calling: models can call tools during reasoning (a code sketch follows the list below). Anthropic's tool use is often praised for:
- Accurate tool parameter generation
- Intelligent tool selection
- Strong reasoning about when tools are needed vs when to answer directly
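For reference, a tool-use sketch against the Anthropic API directly with the official anthropic SDK. The get_weather tool is hypothetical and the model ID is illustrative; use the ID shown in your Anthropic console:

```python
# pip install anthropic
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY

# Tool definitions use JSON Schema under "input_schema" rather than
# OpenAI's "parameters" key.
tools = [{
    "name": "get_weather",  # hypothetical example tool
    "description": "Get the current weather for a city.",
    "input_schema": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}]

resp = client.messages.create(
    model="claude-sonnet-4-5",  # illustrative model ID
    max_tokens=1024,
    tools=tools,
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
)

# Tool calls appear as "tool_use" blocks in the response content.
for block in resp.content:
    if block.type == "tool_use":
        print(block.name, block.input)
```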
Extended context windows
All Claude models support 200K token context windows, significantly larger than most other models. Some Claude models offer an experimental 1M token context. Use cases for extended context:
- Process entire codebases or long documents
- Maintain conversation history over many turns
- Include extensive reference materials in every request
- Avoid context summarization and loss of detail
Prompt caching
Anthropic’s prompt caching is a significant cost optimization:- Cache system prompts that don’t change across requests
- Cache large knowledge base documents included in every RAG workflow
- Cache tool definitions (often dozens of KB for complex tool sets)
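A sketch of the cache_control markup against the Anthropic API directly, assuming a large local document and an illustrative model ID:

```python
import anthropic

client = anthropic.Anthropic()

with open("knowledge_base.md") as f:  # e.g. a large RAG document
    LARGE_DOC = f.read()

resp = client.messages.create(
    model="claude-sonnet-4-5",  # illustrative model ID
    max_tokens=1024,
    # Marking a system block with cache_control asks Anthropic to cache
    # the prompt prefix up to that point; later requests with an
    # identical prefix are billed at the cached-input rate.
    system=[
        {"type": "text", "text": "You are a support assistant."},
        {"type": "text", "text": LARGE_DOC,
         "cache_control": {"type": "ephemeral"}},
    ],
    messages=[{"role": "user", "content": "Summarize section 2."}],
)
print(resp.usage)  # includes cache creation / cache read token counts
```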
Cost considerations
Anthropic pricing (as of 2024, subject to change):

| Model | Input (per 1M tokens) | Output (per 1M tokens) | Cached input (per 1M tokens) |
|---|---|---|---|
| Claude 4.5 Sonnet | $3.00 | $15.00 | $0.30 |
| Claude Opus 4.6 | $15.00 | $75.00 | $1.50 |
| Claude Haiku | $0.25 | $1.25 | $0.03 |
Cost optimization tips:
- Use Claude Haiku for simple tasks (much cheaper than Sonnet or Opus)
- Enable prompt caching for repeated context (90% cost reduction)
- Use Claude for extended context needs instead of multiple smaller requests
Google configuration
Google provides the Gemini family of models with strong multimodal capabilities and competitive pricing.
Supported models
Gemini 2.0 Flash — Fast, lightweight model with multimodal support
- Context: 1M tokens
- Vision: Yes
- Function calling: Yes
- Best for: High-volume tasks, long context needs, cost-sensitive workloads
Gemini 2.5 Pro — Flagship model with the largest context window
- Context: 2M tokens (largest available)
- Vision: Yes
- Function calling: Yes
- Best for: Extremely long context, multimodal analysis, complex reasoning
Setup steps
Obtain Google AI API key
Go to ai.google.dev and create an API key. Note: For enterprise use, you may want Google Cloud Vertex AI instead of the consumer API.
Add provider in Admin Portal
Navigate to Admin Portal → Configuration → LLM Providers → Add Provider. Select Google as the provider type.
Choose API type
Select between:
- Google AI API (consumer API, simpler setup)
- Vertex AI (enterprise, requires GCP project configuration)
Function calling
Gemini models support native function calling. Google's function calling implementation is compatible with OpenAI's format, making integration straightforward.
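A sketch with the google-generativeai SDK, which can build a function declaration from a plain Python function; get_weather is a hypothetical example:

```python
# pip install google-generativeai
import google.generativeai as genai

genai.configure(api_key="...")  # or set GOOGLE_API_KEY

def get_weather(city: str) -> str:
    """Get the current weather for a city."""  # hypothetical tool
    return f"Sunny in {city}"

# The SDK derives the function declaration from the signature/docstring
# and can run the call-and-respond loop automatically.
model = genai.GenerativeModel("gemini-2.0-flash", tools=[get_weather])
chat = model.start_chat(enable_automatic_function_calling=True)
print(chat.send_message("What's the weather in Paris?").text)
```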
Multimodal capabilities
All Gemini models are natively multimodal, supporting:
- Text input and output
- Image understanding
- Audio understanding (select models)
- Video understanding (select models)
Typical multimodal use cases (a code sketch follows this list):
- Document analysis with complex layouts
- Video content understanding
- Cross-modal reasoning (e.g., "Describe what's being said in this video")
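A minimal multimodal sketch with the google-generativeai SDK; the file name is a placeholder:

```python
import google.generativeai as genai
import PIL.Image

genai.configure(api_key="...")  # or set GOOGLE_API_KEY

model = genai.GenerativeModel("gemini-2.0-flash")

# Images are passed inline alongside the text prompt -- no separate
# OCR pipeline needed.
img = PIL.Image.open("invoice.png")
resp = model.generate_content(["Extract the total amount due.", img])
print(resp.text)
```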
Extended context — up to 2M tokens
Gemini 2.5 Pro supports a 2 million token context window, the largest available from any major provider. This enables:
- Processing entire books or documentation sets
- Analyzing hours of video content
- Maintaining extremely long conversation histories
- Including massive reference materials
Cost considerations
Google pricing (as of 2024, subject to change):

| Model | Input (per 1M tokens) | Output (per 1M tokens) | Context limit |
|---|---|---|---|
| Gemini 2.0 Flash | $0.075 | $0.30 | 1M tokens |
| Gemini 2.5 Pro | $1.25 | $5.00 | 2M tokens |
Cost optimization tips:
- Use Gemini Flash for long-context tasks (very cheap compared to alternatives)
- Leverage native multimodal capabilities instead of separate OCR + LLM pipeline
- Use Gemini Pro when you need the 2M context window
API key management
All provider API keys are stored in HashiCorp Vault with the security controls described below.
How API keys are stored
Encrypted at rest
API keys are encrypted with AES-256 before storage. Vault’s encryption is FIPS 140-2 compliant.
Access control
Only the MagOneAI platform service account can read API keys. Individual users never have direct access to keys.
Audit logging
Every retrieval of an API key from Vault is logged with timestamp, requesting service, and reason.
Key reference format
In configuration files and the UI, API keys are referenced by Vault path and never stored directly. This means:
- Keys never appear in logs
- Configuration files can be version-controlled safely
- Keys can be rotated without updating configurations
Rotation without disruption
To rotate an API key (a Vault sketch follows these steps):
- Generate a new key from the provider (OpenAI, Anthropic, Google)
- Update the key in Vault at the existing path
- Existing workflows continue with no disruption
- (Optional) Revoke old key from provider after confirming new key works
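For illustration, a rotation against Vault's KV v2 engine with the hvac client. The Vault URL and secret path are assumptions that mirror the vault:openai/api_key reference above, not MagOneAI specifics:

```python
# pip install hvac
import hvac

client = hvac.Client(url="https://vault.example.internal", token="...")

# Writing to the same path creates a new secret version; consumers that
# resolve the reference at request time pick it up without a restart.
client.secrets.kv.v2.create_or_update_secret(
    path="openai/api_key",
    secret={"api_key": "sk-NEW-KEY"},
)

# Read back to confirm the new version is live before revoking the old key.
read = client.secrets.kv.v2.read_secret_version(path="openai/api_key")
print(read["data"]["metadata"]["version"])
```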
Usage tracking and costs
Monitor model usage and costs across all cloud providers in the Admin Portal.
What's tracked
- Total requests per provider, per organization, per project
- Token usage (input tokens, output tokens, cached tokens)
- Cost calculated based on provider pricing
- Error rates and failure reasons
- Latency (p50, p95, p99 response times)
Cost analytics dashboard
Access detailed cost analytics in Admin Portal → Analytics → Model Usage. Views include:
- Cost over time — daily, weekly, monthly spend trends
- Cost by provider — which providers are most expensive
- Cost by organization — which orgs are spending the most
- Cost by project — which projects drive costs
- Cost by agent — identify expensive agents for optimization
Setting budget alerts
Configure alerts to notify you when spending exceeds thresholds, for example (see the sketch after this list):
- Daily spend exceeds $X
- Weekly spend exceeds $Y
- Specific organization exceeds monthly budget
- Specific project is 80% through monthly quota
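Behind each alert is a simple threshold check. A hypothetical sketch (these names are illustrative, not MagOneAI API):

```python
from dataclasses import dataclass

@dataclass
class BudgetAlert:
    scope: str           # e.g. "org:acme" or "project:support-bot"
    period: str          # "daily" | "weekly" | "monthly"
    threshold_usd: float

def breached(alert: BudgetAlert, current_spend_usd: float) -> bool:
    """True once spend for the period crosses the configured threshold."""
    return current_spend_usd >= alert.threshold_usd

alert = BudgetAlert(scope="project:support-bot", period="monthly",
                    threshold_usd=500.0)
if breached(alert, current_spend_usd=512.30):
    print(f"ALERT: {alert.scope} exceeded ${alert.threshold_usd:.2f} "
          f"({alert.period})")
```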
Quota enforcement
Set hard quotas to prevent runaway costs.
Organization quotas:
- Max tokens per day/month
- Max requests per day/month
- Max cost per month
Project quotas:
- Same options, applied at the project level
Using multiple providers
You can use multiple providers simultaneously: OpenAI for general tasks, Anthropic for complex reasoning with extended context, and Google for multimodal work, all within the same project. Mix and match to optimize cost and capability.
Best practices
Use separate API keys per environment
Create separate API keys for development, staging, and production environments. This allows:
- Separate cost tracking
- Isolated quota management
- Easier debugging (know which environment caused issues)
- Security isolation (compromise in dev doesn’t affect production)
Enable rate limiting at provider level
Even though cloud providers have their own rate limits, set rate limits in MagOneAI to:
- Prevent runaway costs from buggy workflows
- Ensure fair allocation across organizations
- Avoid hitting provider limits (which can affect all requests)
Monitor and optimize regularly
Review model usage weekly to identify:
- Agents using expensive models unnecessarily
- High-volume tasks that could use cheaper models
- Failed requests (you pay for them but get no value)
- Opportunities to leverage prompt caching
Implement fallbacks across providers
Configure fallback providers for resilience (a sketch of the pattern follows this list):
- Primary: OpenAI GPT-4o
- Fallback 1: Anthropic Claude Sonnet
- Fallback 2: Google Gemini Pro
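MagOneAI applies configured fallbacks automatically; purely for illustration, the pattern looks like this when written by hand against the three official SDKs (model IDs are illustrative):

```python
from openai import OpenAI
import anthropic
import google.generativeai as genai

def ask_openai(prompt: str) -> str:
    r = OpenAI().chat.completions.create(
        model="gpt-4o", messages=[{"role": "user", "content": prompt}])
    return r.choices[0].message.content

def ask_anthropic(prompt: str) -> str:
    r = anthropic.Anthropic().messages.create(
        model="claude-sonnet-4-5", max_tokens=1024,
        messages=[{"role": "user", "content": prompt}])
    return r.content[0].text

def ask_gemini(prompt: str) -> str:
    return genai.GenerativeModel("gemini-2.5-pro").generate_content(prompt).text

def ask_with_fallback(prompt: str) -> str:
    # Try providers in priority order; move on when one fails.
    for provider in (ask_openai, ask_anthropic, ask_gemini):
        try:
            return provider(prompt)
        except Exception as exc:  # in practice, catch provider-specific errors
            print(f"{provider.__name__} failed: {exc}")
    raise RuntimeError("All providers failed")
```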
Troubleshooting
API key invalid or expired
Symptoms: "Invalid API key" or "Authentication failed" errors.
Solutions:
- Verify the API key is correct in the provider’s console
- Check if the key has been revoked or expired
- Ensure billing is configured on the provider account
- Update the key in Vault and test connection again
Rate limit exceeded
Symptoms: "Rate limit exceeded" or "Too many requests" errors.
Solutions (a backoff sketch follows this list):
- Check if you’re hitting the provider’s rate limits (requests per minute)
- Reduce request frequency in workflows
- Upgrade to a higher-tier plan with the provider for higher limits
- Implement request queuing in MagOneAI to smooth out spikes
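Client-side, the standard response to 429s is exponential backoff with jitter. A sketch with the openai SDK:

```python
import random
import time
from openai import OpenAI, RateLimitError

client = OpenAI()

def create_with_backoff(max_retries: int = 5, **kwargs):
    """Retry a chat completion on 429s with capped exponential backoff."""
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(**kwargs)
        except RateLimitError:
            # 1s, 2s, 4s, ... capped at 30s, plus jitter to avoid a thundering herd.
            time.sleep(min(2 ** attempt, 30) + random.random())
    raise RuntimeError("Still rate limited after retries")
```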
Model not available
Symptoms: "Model not found" or "Model not available" errors.
Solutions:
- Verify the model name is correct (case-sensitive)
- Check if the model is available in your provider account tier
- Some models require waitlist access — check provider console
- Ensure the model hasn’t been deprecated by the provider
High latency or timeouts
Symptoms: Requests taking very long or timing out.
Solutions:
- Check provider status page for outages or degraded performance
- Reduce max tokens to decrease response generation time
- Consider using a faster model for time-sensitive tasks
- Increase timeout settings if legitimate requests are timing out
- Verify network connectivity to provider endpoints