Overview
MagOneAI supports major cloud LLM providers with native integration. Configure your API keys in the Admin Portal and start using models immediately across your organization. Cloud providers offer:
- No infrastructure management — pay per token, no servers to maintain
- Latest models — access to cutting-edge capabilities as they’re released
- Scalability — handle any volume without capacity planning
- High availability — provider-managed uptime and redundancy
OpenAI configuration
OpenAI provides the widely used GPT family of models, including GPT-4, GPT-4o, and the reasoning-focused o1 and o3-mini models.
Supported models
GPT-4o — Latest GPT-4 with vision, fast and capable
- Context: 128K tokens
- Vision: Yes
- Function calling: Yes
- Best for: General-purpose tasks, multimodal workflows, production deployments
GPT-4o-mini — Smaller, faster, low-cost variant of GPT-4o
- Context: 128K tokens
- Vision: Yes
- Function calling: Yes
- Best for: Classification, routing, simple extraction, high-volume tasks
GPT-4 — Previous-generation flagship model
- Context: 8K or 32K tokens
- Vision: No (text-only)
- Function calling: Yes
- Best for: Legacy workflows, text-only tasks
o1 — Reasoning model for complex, multi-step problems
- Context: 128K tokens
- Vision: Limited
- Function calling: Yes
- Best for: Complex reasoning, mathematics, code generation, strategic analysis
o3-mini — Compact reasoning model at lower cost
- Context: 128K tokens
- Vision: No
- Function calling: Yes
- Best for: Reasoning tasks where o1 is overkill, cost-sensitive reasoning workflows
Setup steps
Obtain OpenAI API key
Sign up for an OpenAI account at platform.openai.com and create an API key. You’ll need billing configured on your OpenAI account.
Add provider in Admin Portal
Navigate to Admin Portal → Configuration → LLM Providers → Add Provider. Select OpenAI as the provider type.
Enter API key
Paste your OpenAI API key. Click Store in Vault to securely save it in HashiCorp Vault. You'll see a reference like vault:openai/api_key — the actual key is never displayed again.
Configure default settings (optional)
Set platform-wide defaults:
- Default organization ID (if using OpenAI org structure)
- Request timeout (default: 60 seconds)
- Retry settings (attempts and backoff)
Test connection
Click Test Connection. MagOneAI will make a test API call to verify:
- API key is valid
- Models are accessible
- Network connectivity is working
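Outside the portal, you can run the same checks by hand. A minimal sketch using the official openai Python SDK, assuming OPENAI_API_KEY is set in the environment:

```python
# pip install openai
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# 1. Key validity and model access: listing models fails fast on a bad key.
model_ids = [m.id for m in client.models.list()]
assert "gpt-4o" in model_ids, "gpt-4o not available on this account"

# 2. End-to-end round trip with a tiny, cheap request.
resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "ping"}],
    max_tokens=5,
)
print(resp.choices[0].message.content)
```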
Function calling
All OpenAI models support native function calling. When you attach tools to an agent using an OpenAI model, MagOneAI automatically formats the tool definitions in OpenAI's function calling format and processes tool calls from the model's responses.
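For reference, a tool definition and the resulting tool call look like this against the OpenAI API directly. The get_weather tool is a hypothetical example; MagOneAI generates equivalent payloads from your attached tools:

```python
from openai import OpenAI

client = OpenAI()

# One tool definition in OpenAI's function calling format.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical example tool
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
)

# If the model chose to call the tool, the call arrives as structured JSON.
for call in resp.choices[0].message.tool_calls or []:
    print(call.function.name, call.function.arguments)
```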
Vision support
GPT-4o and GPT-4o-mini support vision. Send images as part of the input (a code sketch follows the lists below):
- Image URLs (must be publicly accessible or signed URLs)
- Base64-encoded images
- File uploads processed by MagOneAI
Common vision use cases:
- Extract text from documents (OCR)
- Understand charts and diagrams
- Answer questions about images
- Describe image content
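A minimal sketch of both image input styles against the OpenAI API directly (the URL and file name are placeholders):

```python
import base64
from openai import OpenAI

client = OpenAI()

# Option A: a publicly accessible (or signed) image URL.
url_part = {"type": "image_url",
            "image_url": {"url": "https://example.com/chart.png"}}

# Option B: a base64-encoded local file sent as a data URL.
with open("chart.png", "rb") as f:
    b64 = base64.b64encode(f.read()).decode()
b64_part = {"type": "image_url",
            "image_url": {"url": f"data:image/png;base64,{b64}"}}

resp = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What trend does this chart show?"},
            url_part,  # or b64_part
        ],
    }],
)
print(resp.choices[0].message.content)
```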
Cost considerations
OpenAI pricing (as of 2024, subject to change):

| Model | Input (per 1M tokens) | Output (per 1M tokens) | Vision (per image) |
|---|---|---|---|
| GPT-4o | $5.00 | $15.00 | ~$0.003 |
| GPT-4o-mini | $0.15 | $0.60 | ~$0.0001 |
| o1 | $15.00 | $60.00 | - |
| o3-mini | $1.10 | $4.40 | - |
Cost optimization tips (see the quick cost check after this list):
- Use GPT-4o-mini for simple tasks (more than 10x cheaper than GPT-4o)
- Use o1 only when extended reasoning is truly needed (very expensive)
- Enable prompt caching if available to reduce input token costs
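A quick sanity check of these tradeoffs using the prices in the table above (subject to change, as noted):

```python
# Prices in USD per 1M tokens, taken from the table above.
PRICES = {
    "gpt-4o":      (5.00, 15.00),
    "gpt-4o-mini": (0.15, 0.60),
}

def cost(model: str, requests: int, in_tok: int, out_tok: int) -> float:
    """Estimated cost of `requests` calls with the given token counts."""
    p_in, p_out = PRICES[model]
    return requests * (in_tok * p_in + out_tok * p_out) / 1_000_000

# 1,000 requests at ~500 input / ~200 output tokens each:
print(cost("gpt-4o", 1000, 500, 200))       # $5.50
print(cost("gpt-4o-mini", 1000, 500, 200))  # about $0.20
```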
Anthropic configuration
Anthropic provides the Claude family of models, known for their extended context windows and strong reasoning capabilities.
Supported models
Claude 4.5 Sonnet — Latest flagship model with strong performance
- Context: 200K tokens (experimental 1M available)
- Vision: Yes
- Function calling: Yes (via “tool use”)
- Best for: Production workloads, complex analysis, extended context tasks
Claude Opus 4.6 — Most capable Claude model for deep reasoning
- Context: 200K tokens
- Vision: Yes
- Function calling: Yes
- Best for: Most complex reasoning, strategic analysis, nuanced understanding
Claude Haiku — Fastest, most affordable Claude model
- Context: 200K tokens
- Vision: Yes (select versions)
- Function calling: Yes
- Best for: High-volume tasks, classification, fast response needs
Setup steps
Obtain Anthropic API key
Sign up for an Anthropic account at console.anthropic.com and create an API key. Configure billing on your account.
Add provider in Admin Portal
Navigate to Admin Portal → Configuration → LLM Providers → Add Provider. Select Anthropic as the provider type.
Enter API key
Paste your Anthropic API key. Click Store in Vault to securely save it in HashiCorp Vault.
Configure prompt caching (optional)
Enable Anthropic’s prompt caching feature to dramatically reduce costs for repeated context:
- System prompts (same across requests)
- Large knowledge base documents
- Tool definitions
Function calling via “tool use”
Anthropic models support function calling through their "tool use" feature. From an application perspective it works the same as OpenAI's function calling: models can call tools during reasoning (a code sketch follows the list below). Anthropic's tool use is often praised for:
- Accurate tool parameter generation
- Intelligent tool selection
- Strong reasoning about when tools are needed vs when to answer directly
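For reference, a tool-use sketch against the Anthropic API directly with the official anthropic SDK. The get_weather tool is hypothetical and the model ID is illustrative; use the ID shown in your Anthropic console:

```python
# pip install anthropic
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY

# Tool definitions use JSON Schema under "input_schema" rather than
# OpenAI's "parameters" key.
tools = [{
    "name": "get_weather",  # hypothetical example tool
    "description": "Get the current weather for a city.",
    "input_schema": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}]

resp = client.messages.create(
    model="claude-sonnet-4-5",  # illustrative model ID
    max_tokens=1024,
    tools=tools,
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
)

# Tool calls appear as "tool_use" blocks in the response content.
for block in resp.content:
    if block.type == "tool_use":
        print(block.name, block.input)
```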
Extended context windows
All Claude models support 200K token context windows, significantly larger than most other models. Some Claude models offer an experimental 1M token context. Use cases for extended context:
- Process entire codebases or long documents
- Maintain conversation history over many turns
- Include extensive reference materials in every request
- Avoid context summarization and loss of detail
Prompt caching
Anthropic’s prompt caching is a significant cost optimization:- Cache system prompts that don’t change across requests
- Cache large knowledge base documents included in every RAG workflow
- Cache tool definitions (often dozens of KB for complex tool sets)
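A sketch of the cache_control markup against the Anthropic API directly, assuming a large local document and an illustrative model ID:

```python
import anthropic

client = anthropic.Anthropic()

with open("knowledge_base.md") as f:  # e.g. a large RAG document
    LARGE_DOC = f.read()

resp = client.messages.create(
    model="claude-sonnet-4-5",  # illustrative model ID
    max_tokens=1024,
    # Marking a system block with cache_control asks Anthropic to cache
    # the prompt prefix up to that point; later requests with an
    # identical prefix are billed at the cached-input rate.
    system=[
        {"type": "text", "text": "You are a support assistant."},
        {"type": "text", "text": LARGE_DOC,
         "cache_control": {"type": "ephemeral"}},
    ],
    messages=[{"role": "user", "content": "Summarize section 2."}],
)
print(resp.usage)  # includes cache creation / cache read token counts
```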
Cost considerations
Anthropic pricing (as of 2024, subject to change):

| Model | Input (per 1M tokens) | Output (per 1M tokens) | Cached input (per 1M tokens) |
|---|---|---|---|
| Claude 4.5 Sonnet | $3.00 | $15.00 | $0.30 |
| Claude Opus 4.6 | $15.00 | $75.00 | $1.50 |
| Claude Haiku | $0.25 | $1.25 | $0.03 |
Cost optimization tips:
- Use Claude Haiku for simple tasks (much cheaper than Sonnet or Opus)
- Enable prompt caching for repeated context (90% cost reduction)
- Use Claude for extended context needs instead of multiple smaller requests
Google configuration
Google provides the Gemini family of models with strong multimodal capabilities and competitive pricing.
Supported models
Gemini 2.0 Flash — Fast, lightweight model with multimodal support
- Context: 1M tokens
- Vision: Yes
- Function calling: Yes
- Best for: High-volume tasks, long context needs, cost-sensitive workloads
Gemini 2.5 Pro — Flagship model with the largest context window
- Context: 2M tokens (largest available)
- Vision: Yes
- Function calling: Yes
- Best for: Extremely long context, multimodal analysis, complex reasoning
Setup steps
Obtain Google AI API key
Go to ai.google.dev and create an API key. Note: For enterprise use, you may want Google Cloud Vertex AI instead of the consumer API.
Add provider in Admin Portal
Navigate to Admin Portal → Configuration → LLM Providers → Add Provider. Select Google as the provider type.
Choose API type
Select between:
- Google AI API (consumer API, simpler setup)
- Vertex AI (enterprise, requires GCP project configuration)
Function calling
Gemini models support native function calling. Google's function calling implementation is compatible with OpenAI's format, making integration straightforward.
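A sketch with the google-generativeai SDK, which can build a function declaration from a plain Python function; get_weather is a hypothetical example:

```python
# pip install google-generativeai
import google.generativeai as genai

genai.configure(api_key="...")  # or set GOOGLE_API_KEY

def get_weather(city: str) -> str:
    """Get the current weather for a city."""  # hypothetical tool
    return f"Sunny in {city}"

# The SDK derives the function declaration from the signature/docstring
# and can run the call-and-respond loop automatically.
model = genai.GenerativeModel("gemini-2.0-flash", tools=[get_weather])
chat = model.start_chat(enable_automatic_function_calling=True)
print(chat.send_message("What's the weather in Paris?").text)
```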
Multimodal capabilities
All Gemini models are natively multimodal, supporting:
- Text input and output
- Image understanding
- Audio understanding (select models)
- Video understanding (select models)
Typical multimodal use cases (a code sketch follows this list):
- Document analysis with complex layouts
- Video content understanding
- Cross-modal reasoning (e.g., "Describe what's being said in this video")
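A minimal multimodal sketch with the google-generativeai SDK; the file name is a placeholder:

```python
import google.generativeai as genai
import PIL.Image

genai.configure(api_key="...")  # or set GOOGLE_API_KEY

model = genai.GenerativeModel("gemini-2.0-flash")

# Images are passed inline alongside the text prompt -- no separate
# OCR pipeline needed.
img = PIL.Image.open("invoice.png")
resp = model.generate_content(["Extract the total amount due.", img])
print(resp.text)
```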
Extended context — up to 2M tokens
Gemini 2.5 Pro supports a 2 million token context window, the largest available from any major provider. This enables:
- Processing entire books or documentation sets
- Analyzing hours of video content
- Maintaining extremely long conversation histories
- Including massive reference materials
Cost considerations
Google pricing (as of 2024, subject to change):

| Model | Input (per 1M tokens) | Output (per 1M tokens) | Context limit |
|---|---|---|---|
| Gemini 2.0 Flash | $0.075 | $0.30 | 1M tokens |
| Gemini 2.5 Pro | $1.25 | $5.00 | 2M tokens |
Cost optimization tips:
- Use Gemini Flash for long-context tasks (very cheap compared to alternatives)
- Leverage native multimodal capabilities instead of separate OCR + LLM pipeline
- Use Gemini Pro when you need the 2M context window
API key management
All provider API keys are stored in HashiCorp Vault with the security controls described below.
How API keys are stored
Encrypted at rest
API keys are encrypted with AES-256 before storage. Vault’s encryption is FIPS 140-2 compliant.
Access control
Only the MagOneAI platform service account can read API keys. Individual users never have direct access to keys.
Audit logging
Every retrieval of an API key from Vault is logged with timestamp, requesting service, and reason.
Key reference format
In configuration files and the UI, API keys are referenced by Vault path and never stored directly. This means:
- Keys never appear in logs
- Configuration files can be version-controlled safely
- Keys can be rotated without updating configurations
Rotation without disruption
To rotate an API key (a Vault sketch follows these steps):
- Generate a new key from the provider (OpenAI, Anthropic, Google)
- Update the key in Vault at the existing path
- Existing workflows continue with no disruption
- (Optional) Revoke old key from provider after confirming new key works
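For illustration, a rotation against Vault's KV v2 engine with the hvac client. The Vault URL and secret path are assumptions that mirror the vault:openai/api_key reference above, not MagOneAI specifics:

```python
# pip install hvac
import hvac

client = hvac.Client(url="https://vault.example.internal", token="...")

# Writing to the same path creates a new secret version; consumers that
# resolve the reference at request time pick it up without a restart.
client.secrets.kv.v2.create_or_update_secret(
    path="openai/api_key",
    secret={"api_key": "sk-NEW-KEY"},
)

# Read back to confirm the new version is live before revoking the old key.
read = client.secrets.kv.v2.read_secret_version(path="openai/api_key")
print(read["data"]["metadata"]["version"])
```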
Usage tracking and costs
Monitor model usage and costs across all cloud providers in the Admin Portal.
What's tracked
- Total requests per provider, per organization, per project
- Token usage (input tokens, output tokens, cached tokens)
- Cost calculated based on provider pricing
- Error rates and failure reasons
- Latency (p50, p95, p99 response times)
Cost analytics dashboard
Access detailed cost analytics in Admin Portal → Analytics → Model Usage. Views include:
- Cost over time — daily, weekly, monthly spend trends
- Cost by provider — which providers are most expensive
- Cost by organization — which orgs are spending the most
- Cost by project — which projects drive costs
- Cost by agent — identify expensive agents for optimization
Setting budget alerts
Configure alerts to notify you when spending exceeds thresholds, for example (see the sketch after this list):
- Daily spend exceeds $X
- Weekly spend exceeds $Y
- Specific organization exceeds monthly budget
- Specific project is 80% through monthly quota
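Behind each alert is a simple threshold check. A hypothetical sketch (these names are illustrative, not MagOneAI API):

```python
from dataclasses import dataclass

@dataclass
class BudgetAlert:
    scope: str           # e.g. "org:acme" or "project:support-bot"
    period: str          # "daily" | "weekly" | "monthly"
    threshold_usd: float

def breached(alert: BudgetAlert, current_spend_usd: float) -> bool:
    """True once spend for the period crosses the configured threshold."""
    return current_spend_usd >= alert.threshold_usd

alert = BudgetAlert(scope="project:support-bot", period="monthly",
                    threshold_usd=500.0)
if breached(alert, current_spend_usd=512.30):
    print(f"ALERT: {alert.scope} exceeded ${alert.threshold_usd:.2f} "
          f"({alert.period})")
```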
Quota enforcement
Set hard quotas to prevent runaway costs.
Organization quotas:
- Max tokens per day/month
- Max requests per day/month
- Max cost per month
Project quotas:
- Same options, applied at the project level
Using multiple providers
You can use multiple providers simultaneously: OpenAI for general tasks, Anthropic for complex reasoning with extended context, and Google for multimodal work, all within the same project. Mix and match to optimize cost and capability.
Best practices
Use separate API keys per environment
Create separate API keys for development, staging, and production environments. This allows:
- Separate cost tracking
- Isolated quota management
- Easier debugging (know which environment caused issues)
- Security isolation (compromise in dev doesn’t affect production)
Enable rate limiting at provider level
Even though cloud providers have their own rate limits, set rate limits in MagOneAI to:
- Prevent runaway costs from buggy workflows
- Ensure fair allocation across organizations
- Avoid hitting provider limits (which can affect all requests)
Monitor and optimize regularly
Review model usage weekly to identify:
- Agents using expensive models unnecessarily
- High-volume tasks that could use cheaper models
- Failed requests (you pay for them but get no value)
- Opportunities to leverage prompt caching
Implement fallbacks across providers
Configure fallback providers for resilience (a sketch of the pattern follows this list):
- Primary: OpenAI GPT-4o
- Fallback 1: Anthropic Claude Sonnet
- Fallback 2: Google Gemini Pro
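MagOneAI applies configured fallbacks automatically; purely for illustration, the pattern looks like this when written by hand against the three official SDKs (model IDs are illustrative):

```python
from openai import OpenAI
import anthropic
import google.generativeai as genai

def ask_openai(prompt: str) -> str:
    r = OpenAI().chat.completions.create(
        model="gpt-4o", messages=[{"role": "user", "content": prompt}])
    return r.choices[0].message.content

def ask_anthropic(prompt: str) -> str:
    r = anthropic.Anthropic().messages.create(
        model="claude-sonnet-4-5", max_tokens=1024,
        messages=[{"role": "user", "content": prompt}])
    return r.content[0].text

def ask_gemini(prompt: str) -> str:
    return genai.GenerativeModel("gemini-2.5-pro").generate_content(prompt).text

def ask_with_fallback(prompt: str) -> str:
    # Try providers in priority order; move on when one fails.
    for provider in (ask_openai, ask_anthropic, ask_gemini):
        try:
            return provider(prompt)
        except Exception as exc:  # in practice, catch provider-specific errors
            print(f"{provider.__name__} failed: {exc}")
    raise RuntimeError("All providers failed")
```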
Troubleshooting
API key invalid or expired
Symptoms: "Invalid API key" or "Authentication failed" errors.
Solutions:
- Verify the API key is correct in the provider’s console
- Check if the key has been revoked or expired
- Ensure billing is configured on the provider account
- Update the key in Vault and test connection again
Rate limit exceeded
Symptoms: "Rate limit exceeded" or "Too many requests" errors.
Solutions (a backoff sketch follows this list):
- Check if you’re hitting the provider’s rate limits (requests per minute)
- Reduce request frequency in workflows
- Upgrade to a higher-tier plan with the provider for higher limits
- Implement request queuing in MagOneAI to smooth out spikes
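Client-side, the standard response to 429s is exponential backoff with jitter. A sketch with the openai SDK:

```python
import random
import time
from openai import OpenAI, RateLimitError

client = OpenAI()

def create_with_backoff(max_retries: int = 5, **kwargs):
    """Retry a chat completion on 429s with capped exponential backoff."""
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(**kwargs)
        except RateLimitError:
            # 1s, 2s, 4s, ... capped at 30s, plus jitter to avoid a thundering herd.
            time.sleep(min(2 ** attempt, 30) + random.random())
    raise RuntimeError("Still rate limited after retries")
```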
Model not available
Symptoms: "Model not found" or "Model not available" errors.
Solutions:
- Verify the model name is correct (case-sensitive)
- Check if the model is available in your provider account tier
- Some models require waitlist access — check provider console
- Ensure the model hasn’t been deprecated by the provider
High latency or timeouts
Symptoms: Requests taking very long or timing out.
Solutions:
- Check provider status page for outages or degraded performance
- Reduce max tokens to decrease response generation time
- Consider using a faster model for time-sensitive tasks
- Increase timeout settings if legitimate requests are timing out
- Verify network connectivity to provider endpoints