Building is the easy part now. Anyone with a Claude key, a Cursor window, and a free Saturday can ship a working prototype of almost anything. That used to be the bottleneck. It is not anymore.
The new bottleneck is taste. Specifically: knowing which of the things you can build are actually worth keeping.
That decision is one of the most important the lab makes, and it is one I genuinely enjoy making, because it is the moment a project stops being a science fair entry and starts becoming something real (or honestly gets put down so the next bet can take its slot).
Here is the rubric Yonasol uses to decide whether an AI product earns a next round, a real product slot, or a clean ending.
1. Does the magic survive the second use?
Most AI demos are charming on first contact. The model says something clever, the workflow saves a step, the founder shows it to a friend, everyone nods. That is the demo high.
The real test is the second time someone uses it, ideally a few days later. Does the magic survive? Or does it feel like a parlor trick now that you know how it works?
Products worth keeping pass the second-use test. The magic gets quieter, but the utility goes up. By the tenth use, the user is not impressed anymore, just dependent. That is the goal.
If the magic dies on second use, the bet is probably a feature in someone else's product, not a product on its own.
2. Is the value created on a real schedule?
Some AI tools save a person five minutes once a quarter. Some save them five minutes every Monday morning. The math on those two is wildly different.
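To make that math concrete, here is a minimal sketch. The helper name is hypothetical, and the only inputs are the two cadences from the paragraph above (four quarters and 52 Mondays in a year).

```python
def minutes_saved_per_year(minutes_per_use: float, uses_per_year: int) -> float:
    """Total minutes saved in a year at a given cadence (hypothetical helper)."""
    return minutes_per_use * uses_per_year


quarterly = minutes_saved_per_year(5, 4)   # five minutes once a quarter
weekly = minutes_saved_per_year(5, 52)     # five minutes every Monday morning

print(quarterly, weekly, weekly / quarterly)
```

Same five minutes per use, but the weekly tool returns 260 minutes a year against 20, a 13x gap that comes entirely from cadence.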
A product worth keeping creates value on a real schedule: daily, weekly, or transactional (every time event X happens). That cadence is what turns a tool into a habit, and habits are what turn into revenue.
Cadence questions worth asking:
- How often does the underlying problem actually occur?
- Is the user reminded by reality, or do they have to remember on their own?
- Does the value compound over time, or reset each session?
- Is the schedule predictable enough to build a workflow around?
If the cadence is "occasionally, when I remember," the bet is probably not a product. It is a Notion template.
3. Is the cost of being wrong tolerable?
AI products fail in two ways: they fail to be useful, and they fail in ways the user has to clean up. The second one matters more than people admit.
A product worth keeping makes the cost of a wrong answer tolerable. The user can scan the output, catch the mistake, and move on. The tool earns trust because being wrong is cheap and being right is leveraged.
A product not worth keeping makes the cost of a wrong answer expensive. A wrong reply sent to a real customer. A wrong number in a real spreadsheet. A wrong calendar invite to a real client. When wrong is expensive, the user starts double-checking everything, which removes the leverage that justified the tool in the first place.
This is not about model quality. It is about workflow design. The right shape lets a fallible model still be useful. The wrong shape turns even a great model into a liability.
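One rough way to see why workflow shape matters more than model quality is to hold the error rate fixed and vary only what a mistake costs. The sketch below is an illustration under stated assumptions, not a Yonasol formula: the function, its parameters, and every number in it are made up for the example.

```python
def net_minutes_saved(saved: float, p_wrong: float, cleanup: float, check: float = 0.0) -> float:
    """Expected minutes saved per use after paying for mistakes and checking.

    saved   : minutes saved when the output is right (assumed)
    p_wrong : probability the output is wrong (assumed, same model throughout)
    cleanup : minutes to fix one wrong output (assumed)
    check   : minutes spent reviewing every output before using it (assumed)
    """
    return saved - p_wrong * cleanup - check


# Same model, same 10% error rate, three workflow shapes.
cheap_mistakes = net_minutes_saved(saved=10, p_wrong=0.1, cleanup=2)
expensive_mistakes = net_minutes_saved(saved=10, p_wrong=0.1, cleanup=120)
double_checked = net_minutes_saved(saved=10, p_wrong=0.1, cleanup=2, check=8)

print(cheap_mistakes, expensive_mistakes, double_checked)
```

With these illustrative numbers, the cheap-mistake workflow keeps almost all of its ten minutes, the expensive-mistake workflow goes negative, and the double-check-everything workflow keeps under two minutes. The model never changed; only the cost of being wrong did.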
4. Does the user keep coming back without a nudge?
The cleanest signal in the world is unprompted return.
Did the user come back the next day, the next week, the next time the problem showed up, without an email reminder, a push notification, or a discount? If yes, the product is doing real work. If no, the product is being kept alive by marketing, which is a much more expensive substrate to build on.
Yonasol's rule on this is simple: if a product needs constant nudging to stay in someone's workflow, the workflow is not real yet. That is fine in week one. It is a problem in month three.
Unprompted return is also one of the most satisfying things to watch happen. Watching one user come back on their own is worth a hundred signups who never log in.
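Unprompted return can be counted if sessions are logged with a flag for whether a nudge preceded them. The sketch below assumes a hypothetical `Session` record and assumes the nudge attribution already exists; neither is a description of Yonasol's actual tracking.

```python
from dataclasses import dataclass


@dataclass
class Session:
    user_id: str
    day: int       # days since the user's first session (assumed field)
    nudged: bool   # True if an email, push, or discount preceded it (assumed field)


def unprompted_return_rate(sessions: list[Session]) -> float:
    """Share of users with at least one later session that no nudge preceded."""
    users = {s.user_id for s in sessions}
    if not users:
        return 0.0
    returned = {s.user_id for s in sessions if s.day > 0 and not s.nudged}
    return len(returned) / len(users)


sessions = [
    Session("a", 0, False), Session("a", 7, False),  # came back on their own
    Session("b", 0, False), Session("b", 3, True),   # came back only after a nudge
    Session("c", 0, False),                          # never came back
]
rate = unprompted_return_rate(sessions)
print(f"{rate:.0%} of users returned unprompted")
```

Only user "a" counts here. User "b" did return, but the return was bought with a nudge, which is exactly the case the rule above is meant to separate out.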
5. Does it fit a workflow, or does it ask for one?
Two flavors of AI product:
- Fits an existing workflow. The user already does the thing. The tool makes the thing faster, cheaper, or sharper. Adoption friction is low because the slot already exists.
- Asks for a new workflow. The tool is great in theory but requires the user to build a new habit around it. Adoption friction is high because the slot does not exist yet, and human beings are not great at building new slots in their week.
Most AI products worth keeping fit an existing workflow. The ones that ask for a new workflow have to be unusually compelling to survive, and most are not. This is not a hard rule, but it is a heavy thumb on the scale.
If the bet requires the user to "just spend ten minutes a day" doing something they have never done before, the bet is probably going to fail, and the lab should know that before the third round.
6. Can it be explained in one honest sentence?
Pricing pages and landing copy tend to lie. The honest sentence does not.
A product worth keeping can be described in one sentence that a real operator would nod at:
- "It replies to your inbound leads in under five minutes so you stop losing them."
- "It remembers the context you keep re-pasting into every AI tool."
- "It turns a week of meals into a grocery list and a plan in under three minutes."
If the honest sentence requires three clauses, a buzzword, or a forward-looking promise, the product is not crisp enough yet. Sometimes that means the product is wrong. Sometimes it means the product is right and the framing is wrong. Either way, the work is not done until the sentence lands.
This is also the moment most bets reveal whether they are real. Trying to write the honest sentence is one of the most useful things the lab does, because the sentence is unforgiving in a way the prototype is not.
7. Does it leave the lab better than it found it?
The last question is the meta one.
Even bets that fail can be worth running, as long as they leave the lab better than they found it. A failed bet that produces a reusable prompt library, a sharper scoring rubric, a better landing page template, or a decision memo that improves the next bet is still a good run.
A failed bet that leaves nothing behind is the only kind that hurts. Those are the ones the system is designed to avoid, mostly through the documentation step and the discipline of writing teardowns whether the bet won or lost.
The keep/kill question is not just about the product. It is also about whether the lab grew because of it.
How the rubric actually gets used
Not every question gets a clean yes or no. The rubric is not a checklist. It is a conversation the lab has with itself when a bet hits its first verdict point.
Typical patterns:
- Fast yeses on most questions, fast no on one. That is usually a fixable bet. The "no" tells you exactly what to fix in the next round.
- A handful of soft yeses and one big yes. That is often a feature, not a product. The big yes is real, but the product around it has not earned its slot yet. Reshape and try again.
- Mostly soft yeses, no clear big yes. That is usually a kill. The bet is interesting but not necessary. Document it cleanly and move on.
- Strong yeses across the board. That is a graduate. The bet earns a real product slot, real positioning work, and real selling effort.
Most bets fall into the first three patterns. The fourth is rare, and that is fine. The lab is not in a hurry to declare graduates. It is in a hurry to keep the loop honest.
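For readers who think in code, the four patterns can be written down as a rough sketch. This is not how the lab decides, since the rubric is a conversation rather than a checklist. The answer labels ("yes", "soft", "no"), the question keys, and the "more than half" thresholds are all assumptions made for the example.

```python
from collections import Counter


def suggest_verdict(answers: dict[str, str]) -> str:
    """Map rubric answers ("yes", "soft", "no") to a starting point for discussion."""
    counts = Counter(answers.values())
    total = len(answers)
    yes, soft, no = counts["yes"], counts["soft"], counts["no"]
    if total == 0 or yes + soft + no != total:
        raise ValueError('every answer must be "yes", "soft", or "no"')
    if yes == total:
        return "graduate"                      # strong yeses across the board
    if no == 1 and yes > total / 2:
        return "fix the no and rerun"          # fast yeses on most, fast no on one
    if yes == 1 and no == 0:
        return "reshape: likely a feature"     # soft yeses plus one big yes
    if yes == 0 and soft > total / 2:
        return "kill and document"             # mostly soft yeses, no big yes
    return "no clear pattern: talk it through"


answers = {
    "second_use": "yes", "cadence": "yes", "cost_of_wrong": "no",
    "unprompted_return": "yes", "workflow_fit": "yes",
    "honest_sentence": "yes", "lab_better": "yes",
}
verdict = suggest_verdict(answers)
print(verdict)
```

The fall-through case is deliberate: anything that does not match a pattern goes back to the conversation instead of being forced into a verdict.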
What this is not
It is not a substitute for taste. The rubric is a scaffold for the conversation, not a replacement for it. Sometimes a bet fails the rubric on paper but the operator's gut is screaming about something the rubric is missing. That is a signal worth listening to.
It is not a moat. Anyone could write this rubric. The advantage is in actually running it, every time, even when the bet is your favorite.
It is not permanent. The rubric will change as the lab learns. The current version is the best one we have today, and in a year it will look worse, because we will have replaced it with a better one.
The short version
A Yonasol product worth keeping survives the second use, creates value on a real schedule, makes wrong answers cheap, earns unprompted return, fits a workflow the user already has, can be described in one honest sentence, and leaves the lab sharper than it found it.
Bets that hit most of those graduate. Bets that hit a few get reshaped. Bets that hit almost none get killed cleanly, with a teardown to show for it. And the whole rubric exists so the lab spends its time on products that deserve to exist, not on products that are merely possible to build.
That is what keep/kill looks like here. It is also one of the most enjoyable parts of running the loop.
Lab Notes are field notes from inside Yonasol. They cover how products get built, how the operating system evolves, and what gets learned along the way.