Thinking Out Loud

AI didn’t make me more productive. It made me feel productive.

AI agents promise to save time. But between babysitting outputs and building workflows, you may be losing more than you gain.

Jan Tegze
Mar 10, 2026

I want to tell you about a Sunday in February, when I spent four hours building an automation for a task that only takes twelve minutes to do manually.

I wasn’t stupid about it. I had a reason: if I could automate the twelve-minute task, I’d have twelve extra minutes every time it came up. Multiply that across a week, a month, a quarter, and suddenly you’re recovering real time. The math checked out, and the motivation was legitimate. By 6 pm, I had a workflow diagram, a webhook that kept timing out, and the original task still sitting in my queue because I’d been too busy building the system to actually do the work.

My wife asked what I’d gotten done. I showed her the diagram.

She nodded politely and said nothing. That doesn’t surprise me. I get really excited about the diagrams and things I create, like EditText.App, but I’m often the only one among my family and close friends who feels that way.

While I was looking for ways to improve my AI agent’s sourcing abilities, I found a study published in Harvard Business Review suggesting that, rather than making work easier, AI may be causing what researchers term “brain fry.”

That got me thinking about how I was spending my time with the latest tech stack, you know, all those AI agents like OpenClaw and others. Instead of letting the machines do the work, I was spending a lot of time babysitting them, making sure they didn’t hallucinate, misunderstand instructions, or just make a bigger mess than necessary.

Over the past few months, I’ve tested at least 15 AI agents. Different tools, different use cases, some designed for research and summarization, some for content generation, some that were supposed to handle the kind of low-level decision-making that eats up thirty minutes a day in email triage.

I went in as a believer. I still am, in some long-run sense. But somewhere around agent number eight I started to notice that the time I was “saving” kept getting absorbed by something I couldn’t quite name.

The workflows multiplied. The actual client work didn’t change. My output, measured by things with deadlines attached to them, stayed roughly flat while my Notion workspace got increasingly elaborate. I kept optimizing the system that was supposed to free me up to do the work, instead of doing the work.

I’m not sure when I noticed the gap. Probably around the time I realized I’d spent an entire Sunday morning configuring an agent to help me write website content faster, and had produced no actual writing by noon.


The screenshot is already the product

There’s a psychological mechanism called effort substitution, and it’s not a fringe idea. Research published in the Journal of Consumer Psychology in 2011 by Moty Amar and colleagues found that people who planned to accomplish a goal reported meaningfully lower motivation to actually pursue it afterward. The planning itself discharged some of the psychological pressure. You got partial credit from your own brain without doing the thing.

AI tools didn’t create this problem. But they gave it a much nicer interface.

Before, performing productivity looked like: color-coded calendars, elaborate to-do apps, journaling about your goals in a Moleskine. Those things could at least pretend to be preparatory.

The AI version looks like: a multi-agent pipeline that summarizes your inbox, drafts your LinkedIn posts, and schedules follow-ups based on email sentiment analysis. It’s harder to dismiss as mere performance because it involves real technology that does real things. The tool is actually running. Code is actually executing. You’re not just drawing boxes in a notebook; you’re watching logs scroll by in a terminal.

Which is exactly what makes it more dangerous as a substitution mechanism, not less.

LinkedIn rewards the aesthetic of process. I’ve seen workflow screenshots that took maybe two hours to build get three times the engagement of posts describing actual results that took three weeks to produce.

The platform isn’t optimizing for you sharing what you accomplished. It’s optimizing for the thing that makes other people feel like they’re behind. A complicated-looking agent diagram does that. A business outcome usually doesn’t photograph well.

There’s a whole conversation about why LinkedIn specifically amplifies this that I’m not going to get into, but I’ll say this: it’s not just LinkedIn. Reddit productivity threads, Hacker News threads about someone’s personal OS, the YouTube genre of “my complete 2026 productivity system” - they all reward the same thing. They all reward the showing of the work, not the work itself. AI just made the showing more impressive-looking than anything we had before.

And here’s something I’ve been sitting with and can’t fully resolve. I wonder how much of this predates AI entirely. The productivity genre has been selling systems over outcomes for decades. Getting Things Done came out in 2001 and spawned an entire sub-economy of people who spent more time refining their GTD implementation than they spent doing the tasks the system was supposed to capture.

Maybe the AI agent trend is just the latest iteration of that. Or maybe the speed and sophistication of current tools have crossed some threshold where the substitution effect is qualitatively different. I genuinely don’t know. I notice the problem more now than I did five years ago, but I can’t tell whether that’s because the problem is bigger or because I’m paying more attention.


What effort substitution actually does to a brain mid-task

The Jean Baudrillard reference in a lot of online productivity discourse is a bit overstated, but it’s pointing at something real. A simulacrum, in Baudrillard’s framework, is a representation that has replaced the thing it was supposed to represent. The map becomes the territory. In this context: the workflow system becomes the work.

Your brain doesn’t experience this as deception. That’s important. You feel productive because you are doing something. The agent is configured. The prompt is engineered. The integration is live. These are real actions that required real effort.

The prefrontal cortex doesn’t have a clean way to distinguish between “effort that produces an outcome” and “effort that produces the appearance of an outcome,” especially when the tool in question generates convincing output.

A 2014 study from the University of Michigan by Ethan Kross found that self-distancing, the act of reflecting on yourself in the third person, reduced emotional intensity and helped people analyze difficult decisions more clearly.

I think about this whenever I watch myself evaluate my own workflows. I can’t fully self-assess here. I don’t have a good read on whether my agent stack is saving me time in any net sense, because the act of building and maintaining it feels like legitimate work even when it isn’t producing legitimate results.

The specific failure mode I see most: I’ll set up an agent to summarize emails I should be reading directly. Not because I get too many emails to read, but because reading emails I might need to respond to creates a kind of low-level anxiety that the summarization layer delays. The agent isn’t saving me time. It’s managing my discomfort. There’s a difference, and I kept pretending there wasn’t.

A friend of mine, Tomáš, who runs operations for a mid-sized logistics firm in Brno, told me something similar last spring. We were at a dinner that had nothing to do with any of this, talking about something else entirely, and it came up sideways. He’d spent six weeks building a dashboard that aggregated all his key metrics in one place.

Beautifully designed thing, real-time data, color-coded thresholds. Then he told me he mostly didn’t look at it. He was getting updates from his team verbally instead. He’d built the dashboard to feel in control, and once he felt in control he didn’t need the dashboard. I don’t know if this proves anything. His situation is unusual because he had the resources to build it properly, which most people don’t. But the dynamic stuck with me.


Babysitting the thing that was supposed to free you

This is what actually happens when you run agents at the level most people are describing online: They break. Not dramatically, not in ways that are easy to diagnose. They drift. A workflow that ran correctly for three weeks takes a random path on a Wednesday morning and produces output that’s 80% right, which is somehow worse than output that’s obviously wrong because 80% right takes longer to catch and fix.

I’ve had a research agent start hallucinating citations that looked plausible enough that I almost posted them. I caught it because I happened to read closely that day. And let's face it, we don't always read things closely, especially on days we're tired.

One of the agents I tested for about six weeks was built for lead research. The idea was simple: give it a list of company names, it pulls relevant context from a few sources, formats a brief for each one. First two weeks, worked great.

Then it started pulling information from the wrong companies, confusing similarly-named entities, and returning confidently formatted briefs that were factually backwards. I didn’t notice for four days. In those four days, I had two calls where I walked in with the wrong prep. The agent cost me more credibility than the time it saved.

I want to be fair here; that might have been a prompt engineering problem. Maybe better guardrails would have caught the entity confusion earlier. But that’s sort of my point. The tool requires ongoing investment just to maintain basic reliability, and the investment isn’t visible on the dashboard where you’re tracking your “hours saved.”
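
To make “guardrails” concrete: even a dumb check that a brief actually mentions the company it claims to be about would have flagged the problem days earlier. A minimal sketch of what I mean, with made-up names and a made-up brief format rather than the actual tool I was testing:

```python
# A crude entity guardrail: refuse any research brief that never mentions
# the target company's name or domain. It won't catch subtle confusions,
# but it catches the worst case: a brief about the wrong company entirely.
# All names here are hypothetical.

def passes_entity_check(brief: str, company: str, domain: str) -> bool:
    text = brief.lower()
    return company.lower() in text or domain.lower() in text

briefs = {
    ("Acme Logistics", "acmelogistics.com"):
        "Acme Logistics is a freight forwarder based in Rotterdam...",
    ("Apex Labs", "apexlabs.io"):
        "Apex Laboratories Inc. manufactures diagnostic equipment...",  # wrong Apex
}

for (company, domain), brief in briefs.items():
    status = "ok" if passes_entity_check(brief, company, domain) else "NEEDS REVIEW"
    print(f"{company}: {status}")
```

It’s not sophisticated, and it wouldn’t have fixed the agent. It just turns a four-day silent failure into a same-day flag.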

So you watch. You review outputs. You check that the tool did what you told it to do. The watching and reviewing take time, and it’s worse than doing the task yourself would have been, because now you’re doing two things: the oversight work and the mental load of trusting a system you’re not sure you can trust.

I want to push back on something I see constantly online. The people claiming they have hundreds of agents running autonomously and generating significant revenue without meaningful supervision are mostly either lying about the supervision part or working in extremely narrow, well-defined domains where the failure modes are small and bounded.

Autonomous content generation at scale without human review produces confidently wrong material. Autonomous outreach at scale without human oversight produces messages that annoy people on your behalf without your knowledge. The “agentic future” some people are describing isn’t here. The models aren’t reliable enough for the level of autonomy being claimed.

A 2025 report from Stanford's Human-Centered AI group found that AI coding agents complete multi-hour task benchmarks with roughly 50% reliability. More striking: a randomized trial included in the same report found that experienced open-source developers took about 19% longer to complete work when they had access to frontier AI tools than when those tools were taken away. Not 19% faster, longer.

I should say: I’m not sure the coding context maps cleanly to the kind of business workflow automation most people are actually running. Coding tasks are more measurable than most agent use cases, which makes them easier to benchmark but also a bit cleaner than real-world conditions.

The 50% reliability figure has stayed in my head anyway, because it matches what I’ve experienced, and the developer slowdown finding is harder to dismiss.

The overhead compounds in ways that are hard to see in the moment. You spend forty-five minutes configuring the agent. Another thirty debugging when it fails the first time. Twenty minutes a week on monitoring. An hour here and there on prompt adjustments as the underlying model updates change behavior slightly.

Annualized, you may have spent more time on the infrastructure than you’d have spent doing the task the old way. The time savings are real in theory and often negative in practice, and the negative in practice part is invisible because it shows up as “agent maintenance” rather than “time I could have spent working.”
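
Written out instead of done in your head, the math looks something like this. Every number is illustrative, so plug in your own; the point is which terms the mental version drops:

```python
# Back-of-the-envelope agent ROI. All numbers are illustrative.
# The terms mental math tends to drop are the recurring ones.

setup_hours = 0.75                  # 45 minutes configuring the agent
first_debug_hours = 0.5             # 30 minutes debugging the first failure
weekly_monitoring_hours = 20 / 60   # 20 minutes a week reviewing outputs
monthly_tuning_hours = 1.0          # prompt adjustments as model behavior drifts

task_minutes = 12                   # the manual task the agent replaces
runs_per_week = 3                   # how often the task actually comes up
weeks = 52

overhead = (setup_hours + first_debug_hours
            + weekly_monitoring_hours * weeks
            + monthly_tuning_hours * 12)
saved = task_minutes / 60 * runs_per_week * weeks

print(f"hours spent on the agent:  {overhead:.1f}")          # ~30.6
print(f"manual hours avoided:      {saved:.1f}")              # ~31.2
print(f"net hours saved in a year: {saved - overhead:.1f}")   # ~0.6
```

With these numbers, a year of automation nets you about half an hour. Nudge the monitoring up or the task frequency down and it goes negative.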

I’m not saying don’t use them. I’m saying the math you’re doing in your head when you adopt one is probably wrong.


What the twelve-minute task was actually about

The question I find myself sitting with isn’t “which agents are worth it.” That’s solvable with enough testing. The question is harder: if I stripped out every tool that makes me feel productive and only kept the ones that produce verifiable outcomes, what would be left?

I’ve been afraid to actually run that audit. That’s data.

There’s no crisp takeaway here about which workflows to cut or how to structure your agent oversight process. Plenty of people will sell you that. What I keep coming back to is the older, less satisfying question that the AI productivity conversation keeps stepping around: what are you actually trying to avoid doing, and is the thing you’re building making it easier or harder to avoid doing it?

For me it was the twelve-minute task on that Sunday in February. Not because twelve minutes was too long, but because doing it meant looking at a set of numbers I didn’t want to look at. The automation wasn’t about efficiency. It was a very elaborate way of not opening a spreadsheet.

There’s a version of this that’s genuinely hard to answer. Some tasks really do deserve automation. Some friction is real friction, not psychological avoidance dressed up as friction. I don’t have a reliable method for telling the difference in the moment.

What I notice is that when I’m excited about building a workflow, that excitement is sometimes about the problem being solved and sometimes about not having to confront what happens after it’s solved. The second kind of excitement has a slightly different texture, a little more restless, a little quicker to check how the tool looks rather than whether it’s working. I haven’t figured out what to do with that observation beyond noticing it.

The models will get more reliable. The tooling will catch up. Some of the oversight burden will come down as the agents get better at knowing when to ask for help versus when to proceed. I’m genuinely optimistic about the three-to-five-year picture.

But right now, most of what’s being called “autonomous” is just latent human supervision wearing a different costume, and most of what’s being called “productivity gains” is the pleasant feeling of configuring something that might, eventually, save you time.

The twelve-minute task is still in my queue, by the way. I did it last Saturday. Took eleven minutes. The numbers were fine.

Do not believe the hype; it is full of bugs.



The agent audit I ran after wasting six months on automation

The exercise I’m about to describe took me about three hours the first time and produced results I didn’t show anyone for two weeks because they were, let’s say… embarrassing.
