Skip to main content

Overview

Connect your agents to databases for intelligent data retrieval and analysis. MagOneAI provides two approaches to database access:
  1. Natural language to SQL — agents ask questions in plain language, converted to SQL automatically
  2. Direct SQL execution — agents execute SQL queries directly for precise control
Both approaches support read and write operations, though read-only access is strongly recommended for AI agents to prevent unintended data modifications.

Vanna Text-to-SQL

Vanna is an AI-powered text-to-SQL system that converts natural language questions into SQL queries. This makes databases accessible to agents without requiring them to understand complex schema details.

How it works

1

User or agent asks a question

Natural language question: “What were our top 5 customers by revenue last quarter?”
2

Vanna converts to SQL

Vanna analyzes your database schema and converts the question to SQL:
SELECT customer_name, SUM(order_total) as revenue
FROM orders
WHERE order_date >= '2024-01-01' AND order_date < '2024-04-01'
GROUP BY customer_name
ORDER BY revenue DESC
LIMIT 5;
3

Query executes against database

MagOneAI executes the generated SQL against your configured database connection.
4

Results returned to agent

Query results are returned in structured format. The agent can interpret and present the data.

Supported databases

Vanna supports major SQL databases:
  • PostgreSQL — production-grade relational database
  • MySQL — widely used open-source database
  • Microsoft SQL Server — enterprise database system
  • SQLite — lightweight embedded database
  • MariaDB — MySQL-compatible database
  • Amazon Redshift — cloud data warehouse
  • Google BigQuery — serverless data warehouse
  • Snowflake — cloud data platform

Training Vanna on your schema

For best accuracy, train Vanna on your specific database schema. This helps it understand:
  • Table relationships and foreign keys
  • Common query patterns in your domain
  • Business logic and calculated fields
  • Naming conventions and terminology
Training Vanna improves query accuracy by teaching it about your database:
  1. Automatic schema extraction: Vanna reads your database schema (tables, columns, types, relationships)
  2. Example queries: Provide example question-SQL pairs for common queries
  3. Business terminology: Define domain-specific terms (e.g., “ARR” means “Annual Recurring Revenue”)
  4. Query patterns: Show Vanna how you typically join tables or calculate metrics
Training is done through the MagOneAI Admin Portal under Integrations → Database → Vanna Training.Example training:
{
  "question": "What's our monthly recurring revenue?",
  "sql": "SELECT SUM(subscription_amount) FROM subscriptions WHERE status = 'active'",
  "explanation": "MRR is the sum of all active subscription amounts"
}

Limitations of text-to-SQL

While powerful, text-to-SQL has limitations:
  • Ambiguous questions may generate incorrect SQL
  • Complex analytical queries may require human-written SQL for accuracy
  • Schema changes require retraining for best results
  • Performance depends on training quality and schema complexity
For mission-critical queries, consider using direct SQL execution with predefined, tested queries.

CSV-to-SQL tool

Query CSV files using SQL without setting up a database server. This is useful for:
  • Ad-hoc data analysis on uploaded files
  • Prototyping workflows before connecting to production databases
  • Processing data exports from external systems

How it works

  1. Upload CSV file to MagOneAI (via UI or API)
  2. File is loaded into SQLite automatically with inferred schema
  3. Query with SQL using standard SELECT statements
  4. Results returned in structured format
Example:
SELECT product_category, AVG(price) as avg_price, COUNT(*) as count
FROM products_csv
WHERE stock > 0
GROUP BY product_category
ORDER BY avg_price DESC;

Use cases

  • Analyze sales data exports from e-commerce platforms
  • Process survey results from CSV downloads
  • Join multiple CSV files for cross-dataset analysis
  • Validate data quality before database import
CSV-to-SQL uses an ephemeral SQLite database. Files are not permanently stored — they’re loaded for the duration of the workflow execution.

Database connection configuration

Connect agents to your databases securely using the configuration workflow.
1

Add database connection in project settings

Navigate to Project Settings → Integrations → Databases and click Add Connection.
2

Configure connection string

Provide connection details:
  • Connection type: PostgreSQL, MySQL, etc.
  • Host and port: Database server address
  • Database name: Target database
  • Username and password: Database credentials
Credentials are automatically stored in HashiCorp Vault. You’ll reference the connection by name in workflows.
3

Train text-to-SQL model (optional)

For Vanna text-to-SQL, train the model on your schema and example queries. This improves accuracy significantly. Training is optional but highly recommended.
4

Attach database tools to agents

In agent configuration, enable database tools and specify which database connections the agent can access.
5

Test the connection

Use the built-in query tester to verify connectivity and run a simple query.

Connection string format

Connection strings follow standard database URL formats: PostgreSQL:
postgresql://username:password@host:5432/database_name
MySQL:
mysql://username:password@host:3306/database_name
SQL Server:
mssql://username:password@host:1433/database_name
Connection strings are stored in Vault with the reference format: vault:database/production_db.

Security considerations

Database access by AI agents requires careful security design. Follow these best practices to protect your data.

Use read-only credentials

Always use read-only database credentials for AI agents unless write access is specifically required and approved. Agents can make mistakes or be manipulated by adversarial inputs. Read-only access prevents data loss or corruption.
Create a dedicated read-only database user: PostgreSQL example:
CREATE USER agent_readonly WITH PASSWORD 'secure_password';
GRANT CONNECT ON DATABASE your_database TO agent_readonly;
GRANT USAGE ON SCHEMA public TO agent_readonly;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO agent_readonly;
MySQL example:
CREATE USER 'agent_readonly'@'%' IDENTIFIED BY 'secure_password';
GRANT SELECT ON your_database.* TO 'agent_readonly'@'%';

Connection credentials in Vault

All database credentials are stored in HashiCorp Vault:
  • Encrypted at rest with AES-256
  • Access controlled by role-based permissions
  • Automatic key rotation supported
  • Audit logging for all credential access
Connection strings are never exposed in:
  • Workflow configurations (reference by name only)
  • Agent logs or traces
  • API responses
  • User interfaces

Query audit logging

MagOneAI logs all database queries executed by agents:
  • What query was executed (full SQL text)
  • Who executed it (which agent, which user, which workflow)
  • When it was executed (timestamp)
  • What data was accessed (tables and columns)
  • Query result metadata (row count, execution time)
Audit logs are available in the Admin Portal under Security → Audit Logs → Database Access.

Row-level security

Where supported by your database, implement row-level security (RLS) to restrict which data agents can access. PostgreSQL RLS example:
-- Enable RLS on sensitive table
ALTER TABLE customer_data ENABLE ROW LEVEL SECURITY;

-- Policy: agents can only see non-PII columns
CREATE POLICY agent_access ON customer_data
  FOR SELECT
  TO agent_readonly
  USING (true)  -- Allow all rows
  WITH CHECK (false);  -- Prevent writes

-- Grant access to specific columns only
GRANT SELECT (customer_id, company_name, industry) ON customer_data TO agent_readonly;
This ensures agents can’t accidentally access sensitive columns even if they construct queries requesting them.

Network security

Restrict database access at the network level:
  • Firewall rules: Allow connections only from MagOneAI platform IP addresses
  • VPC peering: For cloud databases, use VPC peering to avoid exposing databases to the public internet
  • SSL/TLS: Require encrypted connections to the database
  • Private endpoints: Use private database endpoints when available

Prevent SQL injection

While MagOneAI uses parameterized queries to prevent SQL injection, additional safeguards:
  • Validate agent-generated SQL before execution (especially with text-to-SQL)
  • Set query timeouts to prevent resource exhaustion
  • Limit result set sizes to prevent memory issues
  • Whitelist tables and columns that agents can access

Usage in workflows

Database tools can be used in both Agent nodes (intelligent query construction) and Tool nodes (predefined queries).

Agent node with text-to-SQL

System prompt:
You are a data analyst assistant. When users ask questions about data:
1. Convert the question to SQL using the database schema
2. Execute the query
3. Interpret the results
4. Present findings in clear, non-technical language

Always explain your reasoning and cite the data sources.
User question:
What were our top 5 products by revenue last month?
The agent:
  1. Uses Vanna to convert the question to SQL
  2. Executes the generated query
  3. Receives results (product names and revenue figures)
  4. Formats the answer: “Last month’s top products by revenue were…”

Tool node with direct SQL

For predefined, recurring queries, use a Tool node with direct SQL: Tool: Execute SQL Query Connection: production_db_readonly Query:
SELECT
  product_name,
  SUM(order_total) as total_revenue,
  COUNT(DISTINCT order_id) as num_orders
FROM orders
WHERE order_date >= DATE_TRUNC('month', CURRENT_DATE - INTERVAL '1 month')
  AND order_date < DATE_TRUNC('month', CURRENT_DATE)
GROUP BY product_name
ORDER BY total_revenue DESC
LIMIT 5;
This query executes exactly as written every time, ensuring consistency.

Combining database and other tools

Database + Web Search:
  • Query internal sales data from database
  • Search web for competitor pricing
  • Compare and generate market positioning analysis
Database + Email:
  • Query database for weekly metrics
  • Format results into report
  • Email report to stakeholders
Database + RAG:
  • Query database for customer information
  • Search internal documents for customer history
  • Synthesize complete customer profile

Performance optimization

Database queries can be slow or resource-intensive. Optimize for agent use:

Query timeouts

Set maximum query execution time to prevent long-running queries:
  • Default timeout: 30 seconds
  • Configurable per connection in project settings
  • Queries exceeding timeout are cancelled automatically

Result set limits

Limit the number of rows returned to prevent memory issues:
  • Default limit: 1,000 rows
  • Configurable per connection
  • Agents should use pagination for large datasets

Query caching

Enable query result caching for frequently repeated queries:
  • Cache TTL: configurable (default 5 minutes)
  • Cache key: query text + parameters
  • Automatic invalidation on cache expiry

Indexing

Ensure database tables are properly indexed for common query patterns:
  • Index foreign keys used in joins
  • Index columns used in WHERE clauses
  • Composite indexes for multi-column filters
  • Analyze query execution plans regularly

Troubleshooting

Symptoms: “Unable to connect to database” or timeout errors.Solutions:
  • Verify connection string is correct (host, port, database name)
  • Check database credentials are valid
  • Ensure MagOneAI platform IPs are whitelisted in firewall
  • Verify database is running and accepting connections
  • Check SSL/TLS settings if required
Symptoms: Query results don’t match the question asked.Solutions:
  • Train Vanna with example question-SQL pairs for your schema
  • Provide clearer, more specific questions
  • Define business terminology in training data
  • For complex queries, use direct SQL execution instead
  • Review generated SQL before execution (enable SQL preview in settings)
Symptoms: “Query timeout” errors or very slow response times.Solutions:
  • Add indexes to tables for columns used in WHERE and JOIN clauses
  • Reduce result set size with LIMIT clauses
  • Optimize query structure (avoid SELECT *, use specific columns)
  • Increase query timeout in connection settings if appropriate
  • Consider materialized views for complex aggregations
Symptoms: “Access denied” or “Permission denied” when executing queries.Solutions:
  • Verify database user has necessary SELECT permissions
  • Check row-level security policies aren’t blocking access
  • Ensure database user can access the specific tables being queried
  • Review database audit logs for detailed permission errors
Symptoms: CSV-to-SQL tool returns errors when loading files.Solutions:
  • Verify CSV is properly formatted (headers in first row)
  • Check for encoding issues (use UTF-8)
  • Ensure column names don’t have special characters
  • Reduce file size if exceeding limits (max 50MB)
  • Validate data types can be inferred correctly

Next steps