Most companies are burning through AI budgets without knowing it — and Box CEO Aaron Levie has some sharp insights that could change how your business handles this.
Article At A Glance
- AI tokens are the currency of every AI interaction — understanding how they work directly impacts your bottom line.
- Token budgeting is now a core business competency, not just a developer concern, as AI becomes embedded in enterprise workflows.
- Box, under Aaron Levie’s leadership, has positioned itself as a key voice in how enterprises should think about AI cost management and governance.
- Most companies overspend on AI because they treat tokens like unlimited resources — a costly and avoidable mistake.
- There are specific, proven strategies that leading companies use to optimize token usage without sacrificing output quality — keep reading to find out what they are.
If your company is using AI tools — whether that’s ChatGPT, Claude, Gemini, or any enterprise AI platform — you are spending tokens with every single interaction. Tokens are the fundamental unit of measurement that large language models (LLMs) use to process and generate text. One token is roughly equal to four characters or about three-quarters of a word in English. A single back-and-forth conversation with an AI model can burn hundreds to thousands of tokens without you even realizing it.
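The four-characters-per-token rule of thumb is enough for back-of-the-envelope budgeting. Here is a minimal Python sketch of that heuristic; it is a rough estimate only, since real tokenizers (such as OpenAI’s tiktoken library) count tokens exactly per model:

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate using the ~4 characters per token rule of thumb."""
    return max(1, round(len(text) / 4))

prompt = "Summarize the attached quarterly report in three bullet points."
print(estimate_tokens(prompt))  # 63 characters -> about 16 tokens
```

Remember that you pay this estimate twice, once for the input and again for whatever the model generates in response.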
Box CEO Aaron Levie has been one of the more outspoken enterprise leaders on this topic, pushing businesses to think of AI not just as a capability, but as a managed resource with real financial implications. His commentary cuts through the hype and gets to the operational reality that CFOs and CTOs are now wrestling with.
What Are AI Tokens and Why Do They Matter for Your Budget?
At their core, tokens are how AI models read and write. When you send a prompt to an AI model, it breaks your input into tokens, processes them, and generates output tokens in response. You pay for both the input and the output. This is where businesses get caught off guard — they budget for the tool subscription but don’t account for the per-token costs that accumulate at scale.
Here’s a quick breakdown of how token pricing typically works across major models:
| AI Model | Input Cost (per 1M tokens) | Output Cost (per 1M tokens) |
|---|---|---|
| GPT-4o | $5.00 | $15.00 |
| Claude 3.5 Sonnet | $3.00 | $15.00 |
| Gemini 1.5 Pro | $3.50 | $10.50 |
| GPT-3.5 Turbo | $0.50 | $1.50 |
These numbers look small in isolation. But multiply them across thousands of employees running dozens of queries per day, and you’re looking at a serious line item. A company with 500 employees each running just 20 AI queries per day — at an average of 1,000 tokens per query — burns through 10 million tokens daily, which adds up to more than 200 million tokens in a single month.
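That arithmetic is easy to verify. The sketch below uses the GPT-4o rates from the table and assumes, purely for illustration, 22 working days per month and an even split between input and output tokens:

```python
EMPLOYEES = 500
QUERIES_PER_DAY = 20
TOKENS_PER_QUERY = 1_000       # combined input + output
WORKDAYS_PER_MONTH = 22        # assumption for the example

monthly_tokens = EMPLOYEES * QUERIES_PER_DAY * TOKENS_PER_QUERY * WORKDAYS_PER_MONTH

# GPT-4o rates from the table above, assuming a 50/50 input/output split
input_cost_per_m, output_cost_per_m = 5.00, 15.00
monthly_cost = (monthly_tokens / 2 / 1_000_000) * input_cost_per_m \
             + (monthly_tokens / 2 / 1_000_000) * output_cost_per_m

print(f"{monthly_tokens:,} tokens -> ${monthly_cost:,.2f}/month")
# 220,000,000 tokens -> $2,200.00/month
```

Swap in your own headcount, query volume, and model mix to see where your organization lands.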
Aaron Levie’s Take: AI Is a Resource, Not a Feature
Levie has consistently framed AI adoption as something that requires the same discipline as any other enterprise resource. In various public discussions and interviews, he has emphasized that companies treating AI like a plug-and-play feature will struggle with cost overruns and governance gaps. The companies that win, in his view, are the ones that build intentional frameworks around how AI is consumed inside their organizations.
This perspective is significant coming from the CEO of Box, a company that manages enormous volumes of enterprise content and has been integrating AI deeply into its platform through Box AI. Levie’s position gives him a front-row seat to how large organizations actually use — and misuse — AI at scale.
“The companies that will get the most value from AI are those that treat it with the same operational rigor as any other critical business system.” — Aaron Levie, CEO of Box
What makes this insight actionable is that it reframes the conversation. Instead of asking “how do we get access to AI?” the smarter question becomes “how do we govern, measure, and optimize our AI consumption?” That shift in thinking is exactly what separates companies that scale AI profitably from those that rack up costs with little return.
The Real Cost of Poor Token Management
Most businesses don’t discover they have a token problem until the invoice arrives. By then, the damage is done. Poor token management shows up in three distinct ways: bloated prompts that carry unnecessary context, redundant API calls that repeat work already done, and model mismatches where companies use GPT-4o for tasks that GPT-3.5 Turbo could handle at one-tenth the cost.
The bloated prompt problem is more common than most teams admit. Developers and non-technical users alike tend to over-explain when writing prompts, padding requests with context the model doesn’t need. Every extra sentence is extra tokens. At scale, that padding compounds into thousands of dollars in unnecessary spend.
Proven Strategies for AI Token Budgeting
The good news is that token optimization is a learnable, implementable discipline. Here are the most effective strategies that enterprise teams are using right now:
- Right-size your model selection. Not every task needs your most powerful — and most expensive — model. Use GPT-4o or Claude 3.5 Sonnet for complex reasoning tasks, and route simpler classification or summarization jobs to lighter models like GPT-3.5 Turbo or Claude Haiku. This single change can cut AI costs by 40% to 70% depending on your use case mix.
- Implement prompt compression techniques. Strip system prompts down to only what the model needs. Remove conversational filler, redundant instructions, and duplicate context. Tools like LLMLingua, developed by Microsoft Research, are specifically designed to compress prompts without degrading output quality.
- Use caching for repeated queries. If your application asks the same or similar questions repeatedly, caching responses means you only pay for the tokens once. OpenAI’s prompt caching feature, for example, offers up to 50% cost reduction on cached input tokens.
- Set hard token limits per request. Most AI APIs allow you to define a `max_tokens` parameter. Setting firm ceilings on output length prevents runaway generation costs, especially in automated pipelines where no human is reviewing each call.
- Audit your context windows. Long context windows are powerful but expensive. Feeding an entire 50-page document into a model when only three paragraphs are relevant is a classic token budget killer. Implement retrieval-augmented generation (RAG) to pull only the most relevant content before sending it to the model.
How Box Approaches AI Governance at Scale
Box AI offers a practical example of what thoughtful AI governance looks like inside an enterprise platform. Rather than giving employees unrestricted access to AI capabilities, Box has built usage controls, audit trails, and permission layers directly into how its AI features operate. This isn’t accidental — it reflects exactly the operational discipline that Levie talks about publicly.
The key insight from the Box model is that governance and usability are not opposites. You can give employees powerful AI tools while still maintaining visibility into how those tools are being used and what they’re costing. The two priorities reinforce each other when the architecture is designed correctly from the start.
Token budgeting isn’t about restricting AI use — it’s about making AI use sustainable so companies can scale it without financial surprises.
For businesses looking to implement similar structures, the starting point is instrumentation. You cannot manage what you cannot measure. That means integrating token usage tracking into your existing observability stack, setting departmental budgets, and creating feedback loops so teams understand the cost impact of their AI usage patterns.
Building a Token Budget Framework for Your Organization
A practical token budget framework doesn’t need to be complicated. Think of it in three layers:
- Strategic layer: Leadership defines acceptable AI spend as a percentage of operational budget, tied to expected productivity gains or revenue impact.
- Operational layer: Engineering and IT teams implement model routing rules, prompt libraries, caching infrastructure, and usage dashboards using tools like LangSmith, Helicone, or AWS Bedrock’s built-in cost tracking.
- User layer: Individual employees and teams receive guidelines on prompt best practices, with guardrails built into the tools themselves to prevent accidental overspending.
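As a toy illustration of how the strategic and user layers meet, the guardrail below checks a request against a departmental token budget. All names and numbers are invented for the example; a real implementation would read live usage from your tracking stack:

```python
# Hypothetical monthly budgets set at the strategic layer (tokens)
MONTHLY_TOKEN_BUDGETS = {
    "engineering": 50_000_000,
    "support":     20_000_000,
    "marketing":   10_000_000,
}

# Hypothetical month-to-date usage, normally pulled from a usage dashboard
usage_this_month = {"engineering": 48_500_000, "support": 21_000_000}

def check_budget(department: str, requested_tokens: int) -> bool:
    """User-layer guardrail: allow the request only if it fits the budget."""
    used = usage_this_month.get(department, 0)
    budget = MONTHLY_TOKEN_BUDGETS.get(department, 0)
    return used + requested_tokens <= budget

print(check_budget("engineering", 1_000_000))  # fits: 49.5M of 50M
print(check_budget("support", 1_000))          # blocked: already over budget
```

In practice this check would sit inside the same API wrapper that handles routing and caching, so overspend is caught before the call is made rather than on the invoice.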
When these three layers work together, token budgeting stops being a reactive problem and becomes a proactive competitive advantage. Companies that master this early will be able to deploy AI more broadly, more confidently, and at a fraction of the cost their less disciplined competitors are paying.
What the Numbers Actually Tell You
Token usage data is one of the most underutilized sources of business intelligence available to companies right now. Most organizations that track it at all are only looking at total spend. That’s like managing a fleet of vehicles by looking only at the total fuel bill — useful, but nowhere near the full picture.
The metrics that actually drive better decisions are more granular:
- Cost per task type — What does it cost your organization to summarize a document versus generate a contract versus answer a support query?
- Token efficiency ratio — How many useful output tokens are you getting per input token spent? A low ratio signals bloated prompting.
- Model utilization breakdown — What percentage of your queries are hitting premium models versus economy models?
- Peak usage patterns — When are token costs spiking, and is there a workflow reason driving it?
- Department-level attribution — Which teams are generating the most AI spend, and is that spend tied to measurable output?
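A minimal sketch of computing several of these metrics from usage logs. The records and costs below are invented for illustration; a real pipeline would pull this data from your observability stack or the provider’s usage API:

```python
from collections import defaultdict

# Illustrative usage records (not real data)
records = [
    {"team": "support", "task": "answer_query", "model": "gpt-3.5-turbo",
     "input_tokens": 1_200, "output_tokens": 300, "cost": 0.0011},
    {"team": "legal", "task": "draft_contract", "model": "gpt-4o",
     "input_tokens": 4_000, "output_tokens": 2_000, "cost": 0.05},
    {"team": "support", "task": "answer_query", "model": "gpt-3.5-turbo",
     "input_tokens": 900, "output_tokens": 250, "cost": 0.0008},
]

cost_per_task = defaultdict(float)   # cost per task type
tokens_in = tokens_out = 0
premium_calls = 0
for r in records:
    cost_per_task[r["task"]] += r["cost"]
    tokens_in += r["input_tokens"]
    tokens_out += r["output_tokens"]
    premium_calls += r["model"] == "gpt-4o"

efficiency = tokens_out / tokens_in           # token efficiency ratio
premium_share = premium_calls / len(records)  # model utilization breakdown

print(f"cost per task: {dict(cost_per_task)}")
print(f"token efficiency ratio: {efficiency:.2f}")
print(f"premium model share: {premium_share:.0%}")
```

Even this small sample surfaces actionable signals: the single contract draft costs more than dozens of support answers, which is exactly the kind of per-task visibility that total-spend reporting hides.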
Once you have visibility into these numbers, optimization becomes straightforward. You stop guessing and start making decisions based on actual consumption patterns. This is the operational maturity that separates companies that scale AI profitably from those that are constantly surprised by their bills.
The Competitive Advantage of Getting This Right Early
AI costs are not static. As models improve and competition among providers intensifies, per-token costs have been trending downward. GPT-4-class capabilities that cost $30 per million output tokens in 2023 dropped significantly through 2024, and that trend is expected to continue. Companies that build token budgeting competencies now are positioning themselves to capture those savings automatically as the market evolves.
There’s also a compounding talent advantage. Teams that understand how to write efficient prompts, architect lean AI pipelines, and govern usage intelligently become more valuable over time. This isn’t a one-time optimization project — it’s a durable organizational capability that gets stronger with practice.
The companies that treat AI cost management as a core competency today will be the ones with the widest competitive moats tomorrow — because they’ll be able to deploy more AI, faster, at lower cost than their rivals.
Putting It All Together
Token budgeting is not a technical niche topic for developers to worry about in the background. It is a board-level business conversation about how companies allocate resources toward one of the most transformative technologies in modern business history. Aaron Levie’s framing is the right one: treat AI like a managed enterprise resource, with the governance, measurement, and optimization discipline that any critical system demands.
The path forward is clear. Start by instrumenting your current AI usage so you actually know what you’re spending and where. Then apply model routing, prompt optimization, caching, and hard output limits to bring costs under control. Build the three-layer framework — strategic, operational, and user-level — so that governance scales with adoption rather than lagging behind it.
None of this requires slowing down AI adoption. In fact, it enables faster adoption because leadership and finance teams can say yes to broader AI deployment when they have confidence that spend is visible, controllable, and tied to business value. That confidence is what unlocks the next level of AI investment inside any organization.
The businesses winning with AI right now are not necessarily the ones spending the most — they’re the ones spending the smartest, and there has never been a better time to build that discipline inside your organization.
If you’re ready to take AI adoption seriously and want expert guidance on making it work across your enterprise, exploring what industry leaders like Box are building in the AI governance space is a strong place to start.
