How usage governance works

MagOneAI meters AI consumption in tokens — the units a language model reads and writes — not in dollars. Every quota you set is a cap on how many tokens can be consumed within a time window. This keeps governance predictable across providers and models, and it works the same whether a model is expensive or cheap. You use two complementary controls:

Quotas (enforcing)

Hard token limits that block an LLM call when it would push usage over the cap. Use these to contain spend and prevent runaway consumption.

Threshold notifications (notify-only)

Alerts that fire as usage approaches and then reaches a daily token threshold. These never block; they just tell admins it’s time to look.
Setting and changing quotas, granting top-ups, and reviewing limit-increase requests are admin actions. Members can see their own usage and can ask for more when they’re blocked, but they can’t change limits themselves.

Quota periods

Every quota runs on a fixed window in UTC:
| Period | What it caps |
| --- | --- |
| Daily | Tokens used since the start of the current UTC day |
| Weekly | Tokens used since the start of the current UTC week |
| Monthly (default) | Tokens used since the start of the current UTC month |

Windows reset automatically at the period boundary; there’s no manual reset to perform and nothing to remember to clear. When a new period begins, used-token counters return to zero and the full allowance is available again.
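
As a rough illustration of how fixed UTC windows behave, the sketch below computes the start of the current window and the next reset time for each period. The function name `window_bounds` and the period strings are hypothetical, and the Monday week start is an assumption; this page does not say which day a UTC week begins on.

```python
from datetime import datetime, timedelta, timezone

def window_bounds(period: str, now: datetime) -> tuple[datetime, datetime]:
    """Return (window_start, next_reset) in UTC for the window containing `now`.

    `now` must be timezone-aware; it is converted to UTC first.
    """
    now = now.astimezone(timezone.utc)
    day_start = now.replace(hour=0, minute=0, second=0, microsecond=0)

    if period == "daily":
        return day_start, day_start + timedelta(days=1)

    if period == "weekly":
        # Assumption: weeks start on Monday (weekday() == 0).
        start = day_start - timedelta(days=day_start.weekday())
        return start, start + timedelta(days=7)

    if period == "monthly":
        start = day_start.replace(day=1)
        # First day of the following month, rolling the year over in December.
        if start.month == 12:
            next_reset = start.replace(year=start.year + 1, month=1)
        else:
            next_reset = start.replace(month=start.month + 1)
        return start, next_reset

    raise ValueError(f"unknown period: {period!r}")
```

Because the windows are anchored to UTC, a member in another time zone may see a daily reset in the middle of their local day.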

What a quota can apply to

A limit can be attached to any of these scopes, so you can govern broadly or precisely:

Organization

A ceiling for everything an organization consumes.

Project

A cap for a single project inside an organization.

Use case

A cap for an individual use case.

Per user (within an org)

A cap on what each member of an organization can consume.

Limits cascade

Quotas resolve from the broadest setting down to the most specific:
  1. A platform default applies if nothing more specific is set.
  2. An organization default can override the platform default.
  3. A specific scope limit (project, use case, or per-user) can override the defaults for that scope.
This means you can set one sensible default and then carve out exceptions only where you need them.
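
The cascade can be pictured as a lookup that returns the most specific setting that exists. This is a minimal sketch, not MagOneAI's implementation: `resolve_limit` is a hypothetical name, `None` stands for "not set", and `math.inf` stands for an unlimited limit.

```python
import math
from typing import Optional

UNLIMITED = math.inf  # assumption: unlimited modeled as infinity

def resolve_limit(
    platform_default: float,
    org_default: Optional[float] = None,
    scope_limit: Optional[float] = None,
) -> float:
    """Return the limit that applies to a scope, most specific setting first."""
    # Check the specific scope limit, then the organization default.
    for candidate in (scope_limit, org_default):
        if candidate is not None:
            return candidate
    # Nothing more specific is set, so the platform default applies.
    return platform_default
```

For example, `resolve_limit(1_000_000, org_default=500_000)` yields 500,000, and adding `scope_limit=50_000` narrows that one scope to 50,000 without touching the defaults.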

Cap a single model

Any limit can optionally apply to just one model (one LLM configuration) instead of all usage. This lets you put a tight cap on an expensive model while leaving cheaper models more headroom — without restricting overall usage.

Finite or unlimited

Each limit is either:
  • A finite token budget for the period, or
  • Unlimited — usage is still fully tracked and visible, but it is never blocked.
Use unlimited when you want visibility without enforcement — for example, to watch a new team’s consumption before deciding where to set a real cap.

How enforcement decides

When several limits apply to the same request (say an org cap, a project cap, and a per-user cap), MagOneAI applies a “most restrictive wins” rule. The request must satisfy all applicable limits; it’s allowed only if every one of them still has room.
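
The rule can be sketched as a loop that stops at the first limit without room. Everything here is illustrative: the `Limit` fields and the `first_blocking_limit` name are hypothetical, unlimited limits are modeled as `math.inf`, and `requested_tokens` assumes the platform has some estimate of the call's size before it runs, which this page does not describe.

```python
import math
from dataclasses import dataclass
from typing import Iterable, Optional

@dataclass
class Limit:
    scope: str                   # "organization", "project", "use_case", or "user"
    period: str                  # "daily", "weekly", or "monthly"
    cap: float                   # math.inf models an unlimited limit
    used: int                    # tokens consumed so far in the current window
    model: Optional[str] = None  # None means the limit covers all models

def first_blocking_limit(
    limits: Iterable[Limit], model: str, requested_tokens: int
) -> Optional[Limit]:
    """Return the first applicable limit the request would exceed, else None."""
    for limit in limits:
        # A model-specific limit only applies to calls on that model.
        if limit.model is not None and limit.model != model:
            continue
        # Unlimited caps never block because nothing exceeds math.inf.
        if limit.used + requested_tokens > limit.cap:
            return limit
    return None  # every applicable limit still has room
```

A return value of `None` means the request is allowed; any returned `Limit` carries the scope, period, and model needed to explain the block to the member.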

When a request is blocked

If a request would exceed a limit, the LLM call is blocked before it runs, and the member sees a clear message that names:
  • Which scope was hit (organization, project, use case, or their own per-user limit),
  • The limit and the period it applies to, and
  • The model, if the limit was model-specific — so the member knows they can switch to a different model and continue.
A blocked request means no tokens were spent on it — the call simply doesn’t run. The member can wait for the period to reset, switch to an uncapped model (if only one model was capped), or request a higher limit.
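
To make the contents of a block message concrete, here is a small sketch that assembles one from the three pieces listed above. The wording and the `blocked_message` function are invented for illustration; the actual message text in MagOneAI may differ.

```python
from typing import Optional

def blocked_message(
    scope: str, cap: int, period: str, model: Optional[str] = None
) -> str:
    """Build a hypothetical block message naming scope, limit, period, and model."""
    message = f"Blocked by the {scope} limit of {cap:,} tokens ({period})."
    if model is not None:
        # Only model-specific limits name a model, which signals that
        # switching models is a way to continue.
        message += f" This limit applies only to the model '{model}'."
    return message
```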

Fail-open

If the usage-tracking system is briefly unavailable, MagOneAI lets requests through rather than blocking your teams. Governance favors keeping you working over hard-failing during a transient outage; enforcement resumes automatically once tracking is back.
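
In code, fail-open usually amounts to treating a tracking failure as "allowed". The sketch below assumes a hypothetical `TrackingUnavailable` error and a caller-supplied `check_quota` callable; neither name comes from MagOneAI.

```python
from typing import Callable

class TrackingUnavailable(Exception):
    """Hypothetical error raised when usage data cannot be read."""

def allow_request(check_quota: Callable[[], bool]) -> bool:
    """Return the quota decision, or allow the request if tracking is down."""
    try:
        return check_quota()
    except TrackingUnavailable:
        # Fail open: a transient tracking outage must not block the request.
        return True
```

The trade-off is that usage during an outage can exceed a cap, in exchange for teams not being stopped by an infrastructure problem.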

Top-ups

Sometimes a team needs more room within a period without you changing the base limit. An admin can grant a top-up — extra token credits added on top of the base allowance.
  • Top-ups stack on the base limit for the period, raising the effective cap.
  • A top-up can carry an optional expiry date, after which it no longer counts toward the allowance.
Top-ups are the mechanism behind approved limit-increase requests (below), and you can also grant them proactively.
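
The arithmetic behind an effective cap can be sketched as the base limit plus every top-up that has not expired. The `TopUp` type and `effective_cap` function are hypothetical, and treating a top-up as expired exactly at its expiry instant is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterable, Optional

@dataclass
class TopUp:
    tokens: int
    expires_at: Optional[datetime] = None  # None means no expiry

def effective_cap(base_limit: int, top_ups: Iterable[TopUp], now: datetime) -> int:
    """Return the base limit plus all top-ups still active at `now`."""
    active = sum(
        t.tokens
        for t in top_ups
        # Assumption: a top-up stops counting at its expiry instant.
        if t.expires_at is None or now < t.expires_at
    )
    return base_limit + active
```

With a base limit of 100,000, one 20,000-token top-up without expiry, and one 10,000-token top-up that expires next week, the effective cap today is 130,000.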

Self-serve limit-increase requests

When a member is blocked, they don’t have to file a ticket and wait — they can request a higher limit directly on the scope that blocked them. An admin then reviews and decides.
  1. Member hits a limit and requests more. After being blocked, the member submits a request for a higher limit on the scope that stopped them. The request enters a pending state.
  2. Admin reviews. An admin sees the pending request along with the scope and period in question.
  3. Approve or reject. Approving grants a top-up, which immediately raises the effective allowance. Rejecting declines the request with no change.
A request moves through these states:
| State | Meaning |
| --- | --- |
| Pending | Submitted and awaiting an admin decision |
| Approved | Granted; a top-up was added to the allowance |
| Rejected | Declined by an admin; no change |
| Cancelled | Withdrawn before it was decided |
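
The request lifecycle reads naturally as a small state machine. The sketch below is one way to model it, assuming that Pending is the only state a request can leave and that Approved, Rejected, and Cancelled are final; the names are illustrative rather than MagOneAI's API.

```python
from enum import Enum

class RequestState(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"
    CANCELLED = "cancelled"

# Assumption: only pending requests can change state.
ALLOWED_TRANSITIONS = {
    RequestState.PENDING: {
        RequestState.APPROVED,
        RequestState.REJECTED,
        RequestState.CANCELLED,
    },
}

def transition(current: RequestState, new: RequestState) -> RequestState:
    """Return the new state, or raise ValueError if the move is not allowed."""
    if new not in ALLOWED_TRANSITIONS.get(current, set()):
        raise ValueError(f"cannot move from {current.value} to {new.value}")
    return new
```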

Seeing usage

Everyone gets visibility appropriate to their role.

Members

See your own usage — tokens used, your limit, how much remains, and when it resets.

Admins

See usage for an organization, a project, or a use case — each showing tokens used, the limit, remaining, and the reset time.
Because every view shows the reset time, it’s easy to tell whether a team is genuinely under-provisioned or simply close to the end of a busy period.

Threshold notifications

Separate from enforcing quotas, an organization can set a daily token threshold that triggers notifications to platform admins as usage climbs:
| Usage level | What happens |
| --- | --- |
| 75% of threshold | Admins are notified that usage is approaching the threshold |
| 90% of threshold | Admins are notified that usage is near the threshold |
| 100% of threshold | Admins are notified that usage has reached the threshold |

Threshold notifications only notify; they never block a request. They’re an early-warning signal, not a cap. To actually stop consumption, set a quota. See Notifications for how and where these alerts are delivered.
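
One way to picture the three levels is a check that runs whenever usage grows and reports which levels were just crossed. This sketch assumes each level notifies once per day, at the moment usage crosses it; the page does not state how repeats are handled, and `newly_crossed` is a hypothetical name.

```python
NOTIFY_LEVELS = (75, 90, 100)  # percent of the daily threshold

def newly_crossed(previous_used: int, current_used: int, threshold: int) -> list[int]:
    """Return the percentage levels crossed as usage moved from previous to current."""
    # Integer math avoids float rounding: level% of threshold is reached
    # when used * 100 >= level * threshold.
    return [
        level
        for level in NOTIFY_LEVELS
        if previous_used * 100 < level * threshold <= current_used * 100
    ]
```

With a threshold of 1,000 tokens, a jump from 700 to 920 tokens used crosses both the 75% and 90% levels in one step, so both notifications would be due.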

The “Auto” model picker

The per-turn model picker in both the workflow Builder and MagOneAI Hub now defaults to Auto (default assigned model).
  • Auto sends no model override, so each agent runs on its own assigned model — exactly as the workflow was designed.
  • Choosing a specific model is an explicit, per-turn opt-in. Selecting Auto again returns to the default.
This pairs naturally with per-model quotas: if an expensive model is capped for the period, you can pick a cheaper model for that turn and keep working, then fall back to Auto when you’re done.
When a blocked-request message names a specific model, the model picker is your fastest way to continue — switch to an uncapped model for the turn, or set it back to Auto.
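
The difference between Auto and an explicit choice comes down to whether a model override is sent at all. The sketch below shows that idea with an invented payload shape; the `message` and `model` field names and the `build_turn_request` function are assumptions, not MagOneAI's request format.

```python
from typing import Optional

AUTO = None  # Auto means "send no model override"

def build_turn_request(message: str, model_override: Optional[str] = AUTO) -> dict:
    """Build a hypothetical per-turn payload, omitting the model when Auto is selected."""
    payload = {"message": message}
    if model_override is not None:
        # Explicit per-turn opt-in: this turn runs on the chosen model.
        payload["model"] = model_override
    return payload
```

Leaving the field out, rather than sending a placeholder value, is what lets each agent fall back to its own assigned model.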

Good patterns

  • Default broad, restrict narrow. Set a comfortable org default, then add tighter project, use-case, or per-user limits only where you need them. The cascade does the rest.
  • Cap the expensive model, not everything. A model-specific limit on a premium model contains your biggest costs while leaving everyday work unaffected.
  • Use threshold notifications as a tripwire. Pair a daily threshold with your quotas so admins hear about heavy days at 75% — well before anyone hits a hard block.
  • Prefer top-ups for short-term needs. When a team needs more room for a busy period, grant a top-up with an expiry rather than permanently raising the base limit.

Next steps

Admin Portal

Where admins manage organizations, users, and governance

Models overview

How models and LLM configurations work across the platform

Notifications

Where threshold alerts and other platform notifications surface

Triggers & execution

Where token usage and cost are recorded per run