Why a small AI agency invests in its own tooling

The case for building internal tooling instead of relying on whatever's free this quarter, and what changes when a small team treats workflows as a first-class asset.

The fourth time we did the same workflow on a customer project and got it slightly wrong, we stopped and built a tool. That decision quietly changed how the agency operates.

Most small agencies don’t make that decision. Every customer project gets started fresh. Every workflow gets re-figured-out. Every “how do we deploy to this platform” or “what’s our process for this kind of audit” gets re-derived by whoever’s on the project. Six months later, three people on the team have three different “right ways” to do the same thing, and the team is reinventing work they collectively did six times last year.

This post is about why we changed course, what changed when we did, and what to look for if you’re a small team thinking about doing the same.

The cost of not having tooling

In the first phase of a small agency, internal tooling feels like a distraction. You’re trying to land customers, ship projects, and not run out of money. Investing time in “how we do things” feels like procrastinating on the actual work.

Then you start hitting the second-time effect. You did the workflow correctly the first time on Customer A. By Customer C, two team members are doing it slightly differently because the original details have faded. By Customer F, “our way of doing X” has drifted into three different ways, each with its own subtle bugs. The team is now spending real time arguing about which way is correct, over a question they already settled internally. They’ve just lost the settlement.

The cost shows up in three places:

Onboarding. New team members can’t figure out “how we do things here” because there’s no canonical reference. They learn by shadowing, which means they pick up whichever variation they happened to see first. The longer this runs, the more variations exist.

Quality drift. Workflows get worse over time, not better, when nobody owns them. Each iteration introduces small mistakes that compound into systemic ones.

Lost lessons. When someone figures out a clever way to handle a tricky case on a project, that lesson stays in their head. The next time the same case comes up (six months later, on a different project, with a different person), the lesson has to be re-figured-out.

The agencies that stay small but get sharper over time treat these costs seriously. The agencies that stay small but stay scattered eat them quietly.

What “investing in tooling” actually means

It’s not what most people think. It’s not building a custom in-house dashboard. It’s not a bespoke project-management system. It’s not a CRM you’ve decided to write yourself.

It’s documenting your team’s repeated workflows in a form that runs. The exact medium depends on the team. Some teams use scripts, some use playbooks, some use plugins for whatever AI development environment they live in. The shape that matters is consistent: a workflow that’s been done three or more times gets packaged so the fourth time isn’t from scratch.

The threshold matters. Packaging a workflow takes real effort. A workflow that’s only ever going to be done once isn’t worth packaging. A workflow you’ll do five times definitely is. The threshold of three is rough but works in practice.

The other shape that matters: the packaged workflow has to actually run. A Notion document explaining “this is how we do X” is dramatically less useful than a thing you invoke and it does X. The Notion document gets stale; nobody reads it. The runnable artifact gets used; staying up to date is part of using it.
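To make the contrast concrete, here is a minimal sketch of what “a workflow that runs” can look like. Everything here is a hypothetical placeholder (the step names, the `DEPLOY_AUDIT` workflow, the context dict); the point is only the shape: steps live in code you invoke, not in a document you hope people read.

```python
# Minimal sketch of a runnable workflow. All names (check_env, run_audit,
# DEPLOY_AUDIT, ...) are hypothetical placeholders, not a real toolkit.

def check_env(ctx):
    # A real step would verify credentials, versions, access, etc.
    ctx["env_checked"] = True
    return ctx

def run_audit(ctx):
    # A real step would collect findings; here we just record an empty list.
    ctx["findings"] = []
    return ctx

def write_report(ctx):
    ctx["report"] = f"audit complete, {len(ctx['findings'])} findings"
    return ctx

# The packaged workflow is just an ordered list of callable steps.
DEPLOY_AUDIT = [check_env, run_audit, write_report]

def run_workflow(steps, ctx=None):
    """Invoke the workflow: the fourth time isn't from scratch."""
    ctx = ctx or {}
    for step in steps:
        ctx = step(ctx)
    return ctx

if __name__ == "__main__":
    print(run_workflow(DEPLOY_AUDIT)["report"])
```

The design choice that matters is that improving the workflow means editing the steps everyone actually invokes, so the artifact can’t drift out of date the way a document does.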

What we wouldn’t recommend

Two things to avoid, both of which we did wrong on the way to the current setup:

Don’t build for the team you wish you had. A toolkit designed for the ten-person agency you’d like to be in two years is wasted on the four-person agency you actually are. Build for the team that’s running today. The future team’s needs will be different from what you expect anyway.

Don’t optimize the meta-tooling. It’s tempting to spend time on the tooling-around-the-tooling: validation frameworks, marketplace infrastructure, version-pinning conventions. Almost all of that is wasted. The packaged workflows are what matter; the scaffolding around them is overhead. The simpler the meta-layer, the better.

What it looks like done well

A few signs your team’s tooling investment is working:

  • A new hire can ship a customer project in their first month using your team’s tools, without shadowing for half of it.
  • The same workflow done by two different team members produces the same output, not subtly different ones.
  • When someone figures out a better way to do a workflow, the improvement makes it back into the toolkit instead of being lost.
  • The team can name the toolkit’s contents off the top of their head. (If they can’t, the toolkit is too big or too obscure to be useful.)
  • Adding a new workflow to the toolkit feels routine, not heroic.

If those things are true, you’ve moved from “we ship every project from scratch” to “we have a way of working.” The cost difference between those two modes is much larger than it looks from inside the chaotic version.

When to start

We’d say: start when the team can name three workflows that have happened on three or more projects. That’s the signal that there’s enough recurrence to package. Earlier than that and you’re optimizing prematurely. Later than that and you’ve already paid the cost a few times over.

The investment is small. The return compounds. And the second-order benefit (a team that takes its own workflows seriously becomes a team that takes everything else seriously) is worth more than the first-order benefit.


If you’re a small agency thinking about how to invest in tooling and want to compare notes, we’d love to talk.