Usage & quotas - MagOneAI

How usage governance works

MagOneAI meters AI consumption in tokens — the units a language model reads and writes — not in dollars. Every quota you set is a cap on how many tokens can be consumed within a time window. This keeps governance predictable across providers and models, and it works the same whether a model is expensive or cheap. You use two complementary controls:

Quotas (enforcing)

Hard token limits that block an LLM call when it would push usage over the cap. Use these to contain spend and prevent runaway consumption.

Threshold notifications (notify-only)

Alerts that fire as usage approaches, and then reaches, a daily token threshold. These never block a request; they only tell admins it is time to look.

Setting and changing quotas, granting top-ups, and reviewing limit-increase requests are admin actions. Members can see their own usage and can ask for more when they’re blocked, but they can’t change limits themselves.

Quota periods

Every quota runs on a fixed window in UTC:

Period	What it caps
Daily	Tokens used since the start of the current UTC day
Weekly	Tokens used since the start of the current UTC week
Monthly (default)	Tokens used since the start of the current UTC month

Windows reset automatically at the period boundary — there’s no manual reset to perform and nothing to remember to clear. When a new period begins, used-token counters return to zero and the full allowance is available again.
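As a rough illustration of how fixed UTC windows behave, the sketch below computes the start of the current window and the next reset time for each period. The function name `period_window` is hypothetical, and the Monday week start is an assumption (the text above does not say which day a UTC week begins on).

```python
from datetime import datetime, timedelta, timezone


def period_window(now: datetime, period: str) -> tuple[datetime, datetime]:
    """Return (window_start, next_reset) in UTC for the window containing `now`.

    `now` is expected to be a timezone-aware datetime.
    """
    now = now.astimezone(timezone.utc)
    day_start = now.replace(hour=0, minute=0, second=0, microsecond=0)

    if period == "daily":
        return day_start, day_start + timedelta(days=1)

    if period == "weekly":
        # Assumption: weeks start on Monday (ISO 8601); the docs do not specify.
        start = day_start - timedelta(days=day_start.weekday())
        return start, start + timedelta(days=7)

    if period == "monthly":
        start = day_start.replace(day=1)
        # First day of the following month, rolling over the year in December.
        if start.month == 12:
            next_reset = start.replace(year=start.year + 1, month=1)
        else:
            next_reset = start.replace(month=start.month + 1)
        return start, next_reset

    raise ValueError(f"unknown period: {period!r}")
```

Usage counters would be summed from `window_start` onward, and the "resets at" value shown to users would correspond to `next_reset`.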

What a quota can apply to

A limit can be attached to any of these scopes, so you can govern broadly or precisely:

Organization

A ceiling for everything an organization consumes.

Project

A cap for a single project inside an organization.

Use case

A cap for an individual use case.

Per user (within an org)

A cap on what each member of an organization can consume.

Limits cascade

Quotas resolve from the broadest setting down to the most specific:

A platform default applies if nothing more specific is set.
An organization default can override the platform default.
A specific scope limit (project, use case, or per-user) can override the defaults for that scope.

This means you can set one sensible default and then carve out exceptions only where you need them.
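The cascade can be pictured as a simple "most specific setting that exists" lookup. This is a minimal sketch, not MagOneAI's implementation: `resolve_limit` and its parameters are hypothetical names, and `None` is assumed to mean "nothing configured at this level".

```python
from typing import Optional


def resolve_limit(
    platform_default: int,
    org_default: Optional[int] = None,
    scope_limit: Optional[int] = None,
) -> int:
    """Pick the token limit for one scope, preferring the most specific setting."""
    # Check from most specific to least specific; the first configured value wins.
    for candidate in (scope_limit, org_default):
        if candidate is not None:
            return candidate
    return platform_default
```

For example, with a platform default of 1,000,000 tokens and an org default of 500,000, a project with no limit of its own would resolve to 500,000, while a project with an explicit 50,000 limit would resolve to 50,000.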

Cap a single model

Any limit can optionally apply to just one model (one LLM configuration) instead of all usage. This lets you put a tight cap on an expensive model while leaving cheaper models more headroom — without restricting overall usage.

Finite or unlimited

Each limit is either:

A finite token budget for the period, or
Unlimited — usage is still fully tracked and visible, but it is never blocked.

Use unlimited when you want visibility without enforcement — for example, to watch a new team’s consumption before deciding where to set a real cap.

How enforcement decides

When several limits apply to the same request (say an org cap, a project cap, and a per-user cap), MagOneAI applies a "most restrictive wins" rule. The request must satisfy every applicable limit: it is allowed only if each one still has room.
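One way to picture this rule is a check that walks every applicable limit and stops at the first one without room. The sketch below is illustrative only: the `Limit` class, its fields, and the idea of passing an estimated `requested_tokens` are assumptions, not the platform's actual data model.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Limit:
    scope: str                   # "organization", "project", "use_case", or "user"
    budget: Optional[int]        # None models an unlimited limit (tracked, never blocks)
    used: int                    # tokens already consumed in the current period
    model: Optional[str] = None  # None means the limit covers all models


def first_violation(
    limits: Iterable[Limit], model: str, requested_tokens: int
) -> Optional[Limit]:
    """Return the first limit the request would exceed, or None if it is allowed."""
    for limit in limits:
        # A model-specific limit only applies to requests for that model.
        if limit.model is not None and limit.model != model:
            continue
        # Unlimited limits never block.
        if limit.budget is None:
            continue
        if limit.used + requested_tokens > limit.budget:
            return limit
    return None
```

Because every applicable limit must pass, a generous organization cap cannot rescue a request that exceeds a tighter per-user cap.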

When a request is blocked

If a request would exceed a limit, the LLM call is blocked before it runs, and the member sees a clear message that names:

Which scope was hit (organization, project, use case, or their own per-user limit),
The limit and the period it applies to, and
The model, if the limit was model-specific — so the member knows they can switch to a different model and continue.

A blocked request means no tokens were spent on it — the call simply doesn’t run. The member can wait for the period to reset, switch to an uncapped model (if only one model was capped), or request a higher limit.
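To make the three pieces of information concrete, here is a small sketch that assembles a blocked-request message from the scope, the limit and period, and an optional model. The function name and the wording of the message are hypothetical; the real message text is not given in this page.

```python
from typing import Optional


def blocked_message(
    scope: str, limit_tokens: int, period: str, model: Optional[str] = None
) -> str:
    """Build an illustrative message naming the scope, limit, period, and model."""
    message = (
        f"Request blocked: the {period} {scope} limit of "
        f"{limit_tokens:,} tokens would be exceeded."
    )
    if model is not None:
        # Only model-specific limits mention a model, so the member knows
        # that switching models is an option.
        message += f" This limit applies only to '{model}'; another model may still be available."
    return message
```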

Fail-open

If the usage-tracking system is briefly unavailable, MagOneAI lets requests through rather than blocking your teams. Governance favors keeping you working over hard-failing during a transient outage; enforcement resumes automatically once tracking is back.
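Fail-open behavior can be sketched as a wrapper that treats a tracking outage as "allow". Everything named here (`TrackingUnavailable`, `is_allowed`, the `check_quota` callable) is hypothetical and only illustrates the pattern described above.

```python
from typing import Any, Callable


class TrackingUnavailable(Exception):
    """Raised (in this sketch) when the usage-tracking backend cannot be reached."""


def is_allowed(check_quota: Callable[[Any], bool], request: Any) -> bool:
    """Run the quota check, letting the request through if tracking is down."""
    try:
        return check_quota(request)
    except TrackingUnavailable:
        # Fail open: a transient tracking outage should not block users.
        return True
```

Only the outage case is treated as "allow"; a check that runs normally and returns `False` still blocks the request.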

Top-ups

Sometimes a team needs more room within a period without you changing the base limit. An admin can grant a top-up — extra token credits added on top of the base allowance.

Top-ups stack on the base limit for the period, raising the effective cap.
A top-up can carry an optional expiry date, after which it no longer counts toward the allowance.

Top-ups are the mechanism behind approved limit-increase requests (below), and you can also grant them proactively.
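The arithmetic behind stacking is straightforward: the effective cap is the base limit plus every top-up that has not expired. The sketch below assumes hypothetical names (`TopUp`, `effective_cap`) and ignores how top-ups interact with period boundaries, which this page does not describe.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterable, Optional


@dataclass(frozen=True)
class TopUp:
    tokens: int
    expires_at: Optional[datetime] = None  # None means no expiry


def effective_cap(base_limit: int, top_ups: Iterable[TopUp], now: datetime) -> int:
    """Base limit plus all top-ups that are still active at `now`."""
    active = sum(
        t.tokens for t in top_ups if t.expires_at is None or now < t.expires_at
    )
    return base_limit + active
```

For instance, a base limit of 100,000 tokens with a 20,000-token top-up that has no expiry and a 50,000-token top-up that has already expired yields an effective cap of 120,000.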

Self-serve limit-increase requests

When a member is blocked, they don’t have to file a ticket and wait — they can request a higher limit directly on the scope that blocked them. An admin then reviews and decides.

Member hits a limit and requests more

After being blocked, the member submits a request for a higher limit on the scope that stopped them. The request enters a pending state.

Admin reviews

An admin sees the pending request along with the scope and period in question.

Approve or reject

Approving grants a top-up, which immediately raises the effective allowance. Rejecting declines the request with no change.

A request moves through these states:

State	Meaning
Pending	Submitted and awaiting an admin decision
Approved	Granted — a top-up was added to the allowance
Rejected	Declined by an admin; no change
Cancelled	Withdrawn before it was decided
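The table above reads naturally as a small state machine. The sketch below assumes that Pending is the only state with outgoing transitions and that Approved, Rejected, and Cancelled are final; the page implies this but does not state it outright. The names are hypothetical.

```python
from enum import Enum


class RequestState(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"
    CANCELLED = "cancelled"


# Assumption: only a pending request can change state; the other states are final.
ALLOWED_TRANSITIONS = {
    RequestState.PENDING: {
        RequestState.APPROVED,
        RequestState.REJECTED,
        RequestState.CANCELLED,
    },
}


def transition(current: RequestState, target: RequestState) -> RequestState:
    """Return the new state, or raise if the move is not permitted."""
    if target not in ALLOWED_TRANSITIONS.get(current, set()):
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target
```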

Seeing usage

Everyone gets visibility appropriate to their role.

Members

See your own usage — tokens used, your limit, how much remains, and when it resets.

Admins

See usage for an organization, a project, or a use case — each showing tokens used, the limit, remaining, and the reset time.

Because every view shows the reset time, it is easy to tell whether a team is genuinely under-provisioned or simply close to the end of a busy period.

Threshold notifications

Separate from enforcing quotas, an organization can set a daily token threshold that triggers notifications to platform admins as usage climbs:

Usage level	What happens
75% of threshold	Admins are notified that usage is approaching the threshold
90% of threshold	Admins are notified that usage is near the threshold
100% of threshold	Admins are notified that usage has reached the threshold

Threshold notifications only notify — they never block a request. They’re an early-warning signal, not a cap. To actually stop consumption, set a quota. See Notifications for how and where these alerts are delivered.
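As an illustration of the 75/90/100 levels, the sketch below works out which levels were newly crossed when daily usage moves from one value to another. It assumes each level notifies once as usage passes it, which this page does not confirm, and `crossed_levels` is a hypothetical name.

```python
def crossed_levels(
    previous_used: int,
    current_used: int,
    threshold: int,
    levels: tuple[int, ...] = (75, 90, 100),
) -> list[int]:
    """Return the percentage levels crossed between two usage readings."""
    # Integer arithmetic avoids floating-point edge cases at exact boundaries:
    # level L is crossed when previous < threshold * L / 100 <= current.
    return [
        level
        for level in levels
        if previous_used * 100 < threshold * level <= current_used * 100
    ]
```

With a daily threshold of 1,000 tokens, usage jumping from 700 to 920 crosses both the 75% and 90% levels in one step, so both would be reported. Nothing in this function blocks a request.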

The “Auto” model picker

The per-turn model picker in both the workflow Builder and MagOneAI Hub now defaults to Auto (default assigned model).

Auto sends no model override, so each agent runs on its own assigned model — exactly as the workflow was designed.
Choosing a specific model is an explicit, per-turn opt-in. Selecting Auto again returns to the default.

This pairs naturally with per-model quotas: if an expensive model is capped for the period, you can pick a cheaper model for that turn and keep working, then fall back to Auto when you’re done.

When a blocked-request message names a specific model, the model picker is your fastest way to continue — switch to an uncapped model for the turn, or set it back to Auto.
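The difference between Auto and an explicit choice comes down to whether a model override is sent at all. The sketch below shows that idea with a hypothetical request payload; the field names `message` and `model_override` are assumptions, not MagOneAI's API.

```python
from typing import Optional


def build_turn_request(message: str, picked_model: Optional[str] = None) -> dict:
    """Build a per-turn payload; `picked_model=None` stands in for Auto."""
    payload = {"message": message}
    if picked_model is not None:
        # An explicit pick applies to this turn only.
        payload["model_override"] = picked_model
    # With Auto, no override key is sent, so each agent uses its assigned model.
    return payload
```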

Good patterns

Default broad, restrict narrow. Set a comfortable org default, then add tighter project, use-case, or per-user limits only where you need them. The cascade does the rest.
Cap the expensive model, not everything. A model-specific limit on a premium model contains your biggest costs while leaving everyday work unaffected.
Use threshold notifications as a tripwire. Pair a daily threshold with your quotas so admins hear about heavy days at 75% of the threshold, ideally well before anyone hits a hard block.
Prefer top-ups for short-term needs. When a team needs more room for a busy period, grant a top-up with an expiry rather than permanently raising the base limit.

Next steps

Admin Portal

Where admins manage organizations, users, and governance

Models overview

How models and LLM configurations work across the platform

Notifications

Where threshold alerts and other platform notifications surface

Triggers & execution

Where token usage and cost are recorded per run