{
  "slug": "armia-agentnykh-tsykliv",
  "url": "https://neurodrift.org/en/blog/armia-agentnykh-tsykliv/",
  "title": "After the Prompt: The Birth of an Army of Agentic Loops",
  "description": "The prompt is no longer the main weapon for working with AI. The loop becomes the main one: context, agent, tools, artifact, verification, memory, repetition. The same architecture as in Madyar's drones — only in code.",
  "author": "Дністер",
  "language": "en-US",
  "published": "2026-05-16T07:00:00.000Z",
  "updated": null,
  "tags": [
    "AI",
    "agents",
    "Claude Code"
  ],
  "translationOf": "https://neurodrift.org/blog/armia-agentnykh-tsykliv/",
  "sourceUrl": null,
  "body": "> Just six months ago, the main hero of working with AI was the user searching for the right wording. In May 2026 that's no longer true. The winner isn't the one who writes a prettier prompt, but the one who closes the loop faster: context, agent, tools, artifact, verification, memory, repetition. This is a new operational logic — and a very familiar one.\n\n## The chat no longer looks like a chat\n\nNot long ago, interacting with AI looked simple.\n\nYou opened ChatGPT or Claude. You typed a request. You waited. You copied the text. You fixed it. You typed again. You waited again. You copied again.\n\nIt was the **era of the prompt**.\n\nIn it, the central hero was the user trying to find the right wording. *\"Write like a senior developer.\"* *\"Act as a McKinsey consultant.\"* *\"Ask me 10 clarifying questions.\"* *\"Don't hallucinate.\"* *\"Be precise.\"* *\"Write without fluff.\"* *\"Think step by step.\"*\n\nIt really worked. For a while.\n\nBut in May 2026 it became clear that this logic is already becoming obsolete. Not because prompt engineering disappeared — it remained as one of the layers. But because it stopped being the center of the system.\n\nNow the main question is no longer *\"how do I phrase the request?\"*\n\nIt's a different one:\n\n<aside class=\"pullquote\"><p><strong>How do you build an environment in which AI reliably does the work, doesn't lose context, doesn't break the product, and doesn't force a human to repeat the same thing over and over?</strong></p></aside>\nThis is no longer a chat. It's an **operating system**.\n\nOn 5 May, OpenAI rolled out *memory sources* in ChatGPT — the ability to see exactly which saved memories, past chats, custom instructions, files, or connected Gmail influenced an answer. Memory stops being a mystical black box and becomes a visible working layer. <sup>[[1]](#src-openai-memory)</sup>\n\nNine days later, OpenAI rolled out **Codex in the mobile ChatGPT**. 
And here the \"mobile\" part isn't what matters. What matters is something else: Codex can now be run like a live remote employee. From your phone you can see active threads, project context, screenshots, terminal output, diffs, test results, approvals, switch models, or kick off a new task. Over **4 million people** already use Codex weekly. <sup>[[2]](#src-openai-codex-mobile)</sup>\n\nThis is a radical change in UX.\n\nAI no longer sits in a chat waiting for one perfect request. It **works in an environment**. It sees files. It runs tests. It writes diffs. It waits for approval. It has hooks. It connects over Remote SSH. It leaves traces.\n\nThe prompt becomes just a launch button.\n\nThe work happens in a **loop**.\n\n## The end of the magic prompt\n\nIn the old era, prompt engineering was like shamanism.\n\nPeople collected \"perfect prompts.\" Collections of commands. Roles. Secret formulas. Markdown templates. \"Ultimate Claude Code prompt.\" \"Best ChatGPT prompt for developers.\" \"One prompt to build your SaaS.\"\n\nIt was natural. When a tool is new, people try to control it with language.\n\nBut over time it became clear: **one giant prompt is a bad way to manage complex work**.\n\nA big prompt bloats easily. It contradicts itself. It mixes rules, context, goals, exceptions, style, technical constraints, history, security, and answer format. It becomes not an instruction, but a trash can.\n\nAnd worst of all: it doesn't scale.\n\nOne prompt can help write a text, fix a function, or explain an error. But it holds a long process poorly:\n\n- research a topic;\n- gather sources;\n- create a structure;\n- write code;\n- deploy to a server;\n- verify;\n- get feedback;\n- make edits;\n- update memory;\n- ship the next version.\n\nIn such a process, the prompt is just **one layer**. 
Alongside it are context, memory, tools, hooks, permissions, runtime, evals, logs, sandboxes, human approvals.\n\nDataHub put it bluntly in April: prompt engineering optimizes how you **phrase** the instruction, while context engineering manages the entire **information environment** in which the model works. In their study, **82% of IT and data leaders** said prompt engineering alone is no longer enough for AI at scale, and **95% consider context engineering important** for scaling agents. <sup>[[3]](#src-datahub-context)</sup>\n\n<aside class=\"pullquote\"><p><strong>Prompt engineering asks: \"How do I say this?\" Context engineering asks: \"What does the model need to know right now?\"</strong></p></aside>\nThe difference is enormous.\n\nIn the first case, a person polishes the wording. In the second, they build a system for delivering the right context at the right moment.\n\nIt's like the difference between a beautiful order and proper logistics. A general can write a perfect order. But if the map is old, comms are broken, the unit doesn't know the terrain, the fuel didn't arrive, and HQ can't see the front — the order is worthless.\n\nIt's the same with AI. The model can be very smart. The prompt can be beautiful. But if the context is bad, the memory is noisy, the tools are disordered, and approval logic is undefined — the system will break.\n\n## The new formula of power: not the model, but the loop\n\nThe old AI logic thought in platforms.\n\nGPT. Claude. Gemini. Llama. Grok. DeepSeek. 
([Which exact frontier model to pick for which task](/blog/cg4kpsh9r1-yaku-model-sh-vibrati-dlya-vashogo-zavda/) is a separate article, because the landscape changes fast.)\n\nThe new logic thinks in loops.\n\n```text\nintent\n  → context\n  → agent\n  → tools\n  → artifact\n  → verification\n  → feedback\n  → memory\n  → next iteration\n```\n\nIn the old logic you asked: *\"Which model is best?\"*\n\nIn the new logic you have to ask:\n\n- how does the agent get context?\n- which tools can it call?\n- where is it allowed to write?\n- when should it stop?\n- what will it log?\n- who approves risky actions?\n- how does the result turn into the next loop?\n- what of this gets saved to memory?\n- which errors become eval tests?\n\nThis is no longer the magic of the answer. This is the **engineering of repetition**.\n\nLangChain, in its *State of Agent Engineering* report, writes that **57.3%** of respondents already have agents in production, and another **30.4%** are actively developing agents with plans to deploy. The biggest production blocker is quality, named by **32% of respondents**. Observability has already been adopted by **89%** of organizations, while only **52.4%** have evals. <sup>[[4]](#src-langchain-state)</sup>\n\nThese numbers show one thing: agents are no longer a demo. They have become working infrastructure. And now the main headache isn't *\"how to make the model write something,\"* but **how to make its work reliable**.\n\n<aside class=\"pullquote\"><p><strong>AI no longer needs to impress. AI needs to be controlled.</strong></p></aside>\n## Codex in your pocket: the agent as a remote employee\n\nOpenAI named the May release simply: \"Work with Codex from anywhere.\"\n\nFormally, it's mobile access to Codex in the ChatGPT app. But culturally it's something else entirely — it's the first mass-market image of an AI coding agent as **a process that doesn't end with an answer in a chat**.\n\nYou launch a task on a laptop, Mac mini, or devbox. 
The agent works in your environment. It sees the project context. It runs commands. It produces terminal output. It creates a diff. It takes screenshots. It runs tests. It asks for permission.\n\nMeanwhile you can be in a taxi, on a walk, at the gym, or between calls, and still give it a decision from your phone.\n\nThis is not *\"writing code on your phone.\"* It's **managing an execution loop from your phone**.\n\nOpenAI puts it plainly: a small check-in can keep the work from stalling, avoid unnecessary rework, or help Codex move with the right context. Hence the set of actions: review outputs, approve commands, change models, start something new. <sup>[[2]](#src-openai-codex-mobile)</sup>\n\nThis is a very important moment for everyone who works with AI.\n\nA person no longer sits in front of the model as an operator of a text field. The person becomes a **dispatcher of long tasks**.\n\n```text\nhuman:\n  sets the intent\n  gives constraints\n  makes decisions\n  verifies the result\n\nagent:\n  reads the code\n  calls tools\n  tries options\n  creates an artifact\n  returns evidence\n```\n\nThis is a new rhythm. Not *\"one request, one answer.\"* But *\"one task, many micro-interventions.\"*\n\nIn this sense, the smartphone becomes not a device for consuming AI, but a **remote control for the agent**.\n\n## Claude Code and the problem of long loops\n\nAnthropic, too, is moving not just toward a smarter model, but toward a **longer execution loop**.\n\nOn 6 May, Anthropic doubled the five-hour Claude Code rate limits for Pro, Max, Team, and seat-based Enterprise plans, removed the peak-hours limit reduction for Claude Code on Pro and Max, and substantially raised the API limits for Opus models. The same release announced a partnership with SpaceX that grants access to over **300 MW of new capacity and over 220,000 NVIDIA GPUs** over the course of a month. <sup>[[5]](#src-anthropic-claude-code-limits)</sup>\n\nThese numbers sound like infrastructure news. 
But in reality it's news about UX.\n\nWhy do AI coding tools hit limits so quickly?\n\nBecause agentic development burns not \"messages.\" It burns **loops**.\n\nThe agent reads the repository. It searches for files. It tries a patch. It runs tests. It gets an error. It reads the log. It rewrites. It runs again. It makes a diff. It gives a summary. It waits for the human. It continues.\n\nThis is a long session. It can last minutes or hours.\n\nLangChain describes it like this: long-running agents need durable execution, memory, multi-tenancy, human-in-the-loop, and observability. An agent can work for minutes or hours, wait for human approval, survive a deploy or crash, and **not lose progress**. <sup>[[6]](#src-langchain-deep)</sup>\n\nSo the real bottleneck isn't only intelligence.\n\nThe real bottleneck is **the duration and reliability of the loop**.\n\nIf an agent can't work for a long time, it stays autocomplete.\n\nIf it can work for a long time, save state, ask for permission, recover, and leave an audit trail — it becomes a **worker in the system**.\n\n## Control plane: where the new war is really being fought\n\nOn 15 May, VentureBeat very precisely named the next front: not the model war, but the **agent control plane**.\n\nThe idea is simple: companies are no longer just choosing which model answers better. They're choosing **where the operational machine of AI will live**: in the Microsoft stack, the OpenAI API layer, the Anthropic managed runtime, an open framework, or a hybrid mix.\n\nPer VB Pulse, in February 2026 Microsoft Copilot Studio and Azure AI Studio had **38.6% primary-platform adoption** among enterprise agent orchestration respondents, OpenAI Assistants and Responses API — **25.7%**, Anthropic tool use and workflows — **5.7%** (the sample is small, so VB explicitly cautions against over-reading it). <sup>[[7]](#src-vb-controlplane)</sup>\n\nBut even with that caution, the signal is strong.\n\nA model can be swapped. 
A control plane is harder to swap. That's where these live:\n\n- permissions;\n- memory;\n- tools;\n- approvals;\n- logs;\n- auditability;\n- sandboxing;\n- integrations;\n- cost controls;\n- security policies;\n- workflow state.\n\nIn the old AI logic, vendor lock-in was at the model level. In the new one, it's at the **runtime** level. That's much deeper.\n\nIf your team keeps workflows, permissions, memory, hooks, and agent tasks in one environment, you're no longer just \"using a model.\" You're building an **operational fabric** around it.\n\nNot *\"which LLM is the smartest?\"* But *\"where does my work live?\"*\n\n## RAG is no longer enough\n\nAnother shift: classic RAG stops being the universal answer.\n\nA few years ago it was fashionable to say: *\"We'll connect documents to a vector database, and the agent will know everything.\"*\n\nBut agentic workflows quickly exposed the weakness of this approach.\n\nWhen an agent works in a long loop, it needs more than a **search** over documents. It needs a **compiled structure of knowledge**: what the source of truth is, how entities are related, what the permissions are, which data is stale, what format is needed for the next tool call, what can be thrown out of the context window.\n\nOn 4 May, VentureBeat described this as a transition from a RAG pipeline to a compilation-stage knowledge layer. In the Pinecone Nexus example, one financial analysis task that previously consumed **2.8M tokens** was completed with **4,000 tokens**, a claimed reduction of **98%** (this is Pinecone's internal benchmark, not yet customer-validated; taken literally, those two token counts imply a drop of more than 99%). <sup>[[8]](#src-vb-knowledge)</sup>\n\nEven if you treat the number cautiously, the direction is obvious.\n\nThe future isn't about throwing a bigger context window at the model. 
The future is about giving it a **smaller, cleaner, more structured context**.\n\n```text\nbad:\n  all the documents\n  the entire history\n  all the instructions\n  all the noise\n\nbetter:\n  relevant facts\n  current state\n  clear constraints\n  the needed tools\n  short memory\n  evidence links\n```\n\nA large context without discipline isn't power. It's trash with a big limit.\n\n<aside class=\"pullquote\"><p><strong>The context window isn't an archive. It's the agent's working memory.</strong></p></aside>\nAnd if noise gets into that memory, the agent degrades **even before** it runs out of tokens.\n\n## GitHub showed how not to breed agents\n\nOne of the best practical examples of the week is the GitHub accessibility agent.\n\nOn 15 May, GitHub described an experiment with a general-purpose accessibility agent. Its job is to answer accessibility questions in the Copilot CLI and VS Code integration, and also to catch and automatically fix simple, objective accessibility issues **before production**. The agent has already reviewed **3,535 pull requests** and has a **68% resolution rate**. <sup>[[9]](#src-github-a11y)</sup>\n\nBut that's not the most interesting part. The most interesting part is the **architecture**.\n\nGitHub initially had a monolithic agent, but it quickly hit its limits. The team moved to a sub-agent architecture. Many guides advise building **a whole zoo of agents**, but GitHub found this works worse. They kept **only two**:\n\n1. a passive reviewer / researcher;\n2. an active implementer.\n\nThey are sandboxed and don't pass content directly to one another. 
Instead, each creates **structured, templatized output** that the parent orchestrating agent consumes, validates, and routes.\n\n```text\norchestrator\n  → reviewer\n  → structured findings\n  → orchestrator validates\n  → implementer\n  → changes or guidance\n  → re-audit\n```\n\nHere you can see the new culture of AI workflows.\n\nAgents **don't need to \"communicate like humans.\"** It sounds cute, but it quickly creates chaos. They need to pass structured artifacts.\n\nGitHub states plainly that without a template schema, agents would start communicating arbitrarily, which creates higher token usage, hallucinations, unnecessary code changes, and a **nearly impossible audit**. <sup>[[9]](#src-github-a11y)</sup>\n\n<aside class=\"pullquote\"><p><strong>Don't let agents chatter. Give them schemas for handing off work.</strong></p></aside>\nEven more important: GitHub introduced **complexity-based behavior**. If the code is too complex, the agent is not allowed to generate changes. It switches to guidance-only mode or escalates to a human. There are also high-risk patterns where the agent is **forbidden to write code**: drag and drop, toasts, rich text editors, tree views, data grids.\n\nThis is mature agent design. AI shouldn't always act. Sometimes the best thing an agent can do is **stop**.\n\n## The limit of automation: 36% won't yield\n\nIn the same post, GitHub gives another strong number.\n\nOf the **55 WCAG level A and AA Success Criteria**, only **35** can be detected by deterministic automated code checkers. That means the remaining 20, roughly **36% of the criteria**, require manual evaluation. <sup>[[9]](#src-github-a11y)</sup>\n\nThis isn't just an accessibility fact. It's a model of reality for **any AI workflow**.\n\nIn every complex field there's a part that can be checked automatically. 
And there's a part where **human judgment** is needed.\n\n```text\nautomatable:\n  syntax\n  tests\n  obvious errors\n  format\n  repeatable patterns\n  part of compliance\n\nneeds judgment:\n  UX\n  reputational risk\n  ethical ambiguity\n  client context\n  strategic trade-off\n  semantic quality\n```\n\nThe problem with many AI systems is that they behave as if 100% of reality can be turned into a tool call. That's not true.\n\nA strong AI architecture **doesn't deny** human judgment. It places it at the **right point in the loop**.\n\n## Human-in-the-loop isn't QA at the end\n\nThe old idea of human-in-the-loop looked like this: AI does something, the human checks it, approves or edits. This is a weak model.\n\nA stronger model: the human doesn't just check the output. The human **shapes the trajectory**.\n\nLangChain describes the agent improvement loop as a process in which a team quickly creates a first version of an agent, runs it in a production-like environment, gathers data, analyzes outputs and eval scores, and human feedback influences context engineering and the next iterations. <sup>[[10]](#src-langchain-improvement)</sup>\n\nSo the human isn't an editor after the model. The human is the **trainer of the loop**.\n\nThey see where the agent gets confused. Which sources are missing. Where the context needs to be compressed. Where to add an example. Where to forbid an action. Where an escalation gate is needed. Where an error needs to be turned into a test.\n\n```text\nrun\n  → failure\n  → human judgment\n  → eval case\n  → context patch\n  → workflow patch\n  → next run\n```\n\nThis is the key to reducing wasted iterations. Not asking the model to *\"be better.\"* But taking **every failure** and turning it into a new element of the system.\n\n## Hooks: rules move from the prompt into the runtime\n\nIn the Codex release, OpenAI emphasized: **Hooks are now generally available** on all plans. 
They can be used for secret scanning, validators, conversation logging, memory creation, or repo-specific behavior customization. <sup>[[2]](#src-openai-codex-mobile)</sup>\n\nClaude Code is moving in this same logic. The Claude Code documentation from April–May shows a whole wave of runtime primitives: **Routines, `/usage`, `/ultrareview`, effort levels, hooks, monitor tools, permission logic, sandbox rules**. Routines run templated cloud agents on a schedule, a GitHub event, or an API call. `/usage` shows exactly what's consuming the limits. `/ultrareview` runs parallel multi-agent analysis and an adversarial critique pass for code review. <sup>[[13]](#src-claude-runtime)</sup>\n\nThis means rules are increasingly moving **from the prompt into the runtime**.\n\nYou don't need to write in the prompt:\n\n*\"Please don't delete important files, don't push secrets, don't change production configs, don't run dangerous Bash commands, don't generate code in high-risk zones.\"*\n\nThis needs to be coded into **hooks, policies, deny-lists, validators, and approval gates**.\n\n<aside class=\"pullquote\"><p><strong>The prompt requests. The runtime enforces.</strong></p></aside>\nThis is a fundamental difference.\n\nOn 13 May, GitHub also rolled out the **Agent tasks REST API** for the Copilot cloud agent in public preview. Copilot Business and Enterprise users can programmatically launch cloud agent tasks. The agent works in its own development environment, can make and validate code changes, and then open a pull request. GitHub gives scenarios: fan out refactors across repositories, one-click repo setup from an internal developer portal, weekly release preparation with release notes. <sup>[[12]](#src-github-restapi)</sup>\n\nThis is another step from chat to **infrastructure**. 
When an agent is launched via a REST API, it becomes not an assistant in a window, but a **part of the pipeline**.\n\n## Sakana Conductor: a small manager stronger than a big genius\n\nThe most intellectual signal of recent weeks is Sakana AI's work on *Conductor*.\n\nThe idea is almost elegant: don't train yet another model that decides everything itself. Teach **a small model to manage other models**.\n\nSakana describes a **7B Conductor model**, trained with reinforcement learning, that orchestrates a pool of frontier models — GPT-5, Gemini, Claude, and open-source models. It doesn't write code directly. It decides: whom to call, which subtask to assign, what context to show, how to assemble the workflow. For simple factual questions it might call one model. For complex coding problems — it creates a planner-executor-verifier pipeline. <sup>[[14]](#src-sakana-conductor)</sup>\n\nThe results are strong: in the paper, Conductor shows **83.93 on LiveCodeBench**, **93.3 on AIME25**, **87.5 on GPQA-Diamond**, and an average of **77.27**, exceeding the individual worker models in this setup. <sup>[[15]](#src-sakana-benchmark)</sup>\n\nThis is a very important metaphor for the entire AI era.\n\nThe future may belong not to the biggest model. But to the **best coordinator**.\n\n```text\nbig model:\n  solves the task itself\n\nconductor:\n  breaks down the task\n  picks the agents\n  limits the context\n  triggers verification\n  assembles the final result\n```\n\nThis is like a team. The strongest leader isn't necessarily the best designer, programmer, analyst, and editor themselves. Their strength is **knowing whom to bring in when**, what to assign to whom, what information to give, when to stop, and how to assemble the result.\n\nAI is starting to learn not only to answer. 
AI is starting to learn **management**.\n\n## But not every workflow needs an agent\n\nHere it's important not to fall into the opposite foolishness.\n\nIf prompt engineering was overrated, now there's a risk of overrating agents.\n\nOn 14 May, Martin Fowler published James Pritchard's view: many \"agent use cases\" are really just **workflows** — known sequences of steps where one or two steps involve an LLM. If the workflow is known, autonomy is often not needed. A function call is. <sup>[[16]](#src-fowler-workflows)</sup>\n\nThis is painful, but correct. Not everything needs to be turned into an agent.\n\nIf a process is stable — code the process. If the steps are known — make a pipeline. If you need to extract data, classify, reformat, validate a template — that's often **a function with an LLM call inside**.\n\nAn agent is needed where there is: uncertainty, search, branching, tool use, long context, intermediate decisions, a need for human approval, a variable trajectory.\n\n```text\nknown path → workflow\nunknown path → agent\nhigh risk → human gate\nrepeatable pattern → automation\n```\n\nA simple matrix, but it saves you from **over-agenting**.\n\n## The economics of agents: subscriptions are no longer bottomless\n\nAnother unpleasant but important signal is **billing**.\n\nOn 14 May, Zed explained that, effective 15 June, Anthropic splits Claude subscription billing into two pools: first-party Claude tools and third-party agent / SDK usage. For third-party agent usage through ACP, `claude -p`, and other tools, an **Agent SDK credit** is introduced: $20 for Pro, $100 for Max 5x, $200 for Max 20x. Once the credit is exhausted — usage at API rates or requests stop. <sup>[[17]](#src-zed-billing)</sup>\n\nThis isn't just pricing drama. It's **the end of the all-you-can-eat illusion** for heavy agent workflows.\n\nWhen a person writes 30 messages in a chat, that's one economics. 
When an agent launches dozens of tool calls, reads a repository, runs subagent analysis, holds long context, and repeats tests, **the economics are entirely different**.\n\nAgentic work costs a lot, because it's not *\"an answer.\"* It's a **compute loop**.\n\n- Bad context = more tokens.\n- Bad prompt = more retries.\n- Bad tools = more erroneous actions.\n- Bad evals = more human review.\n- Bad runtime = more interruptions.\n- Bad memory = every run from scratch.\n\nAll of this costs money. Not metaphorically. Literally.\n\n## Voice-to-artifact: the next natural form of work\n\nThe most interesting thing is that this logic is already starting to look very **natural** in real work.\n\nA person speaks into a microphone. Claude Code or Codex gets the task. It creates an HTML file, a landing page, a script, a database migration, a Telegram bot, a research document. It uploads the artifact to the server. The person looks at the result. They dictate edits by voice. The AI changes it. Deploy again. Feedback again. Everything is documented in `.md` files, project memory, agent instructions, and a changelog.\n\nThis is already a normal working mode for people who live in fast iteration.\n\n```text\nvoice\n  → agent\n  → artifact\n  → deploy\n  → inspect\n  → correction\n  → memory\n  → next version\n```\n\nThinking stops being separated from production.\n\nPreviously, between an idea and an artifact there was a lot of friction: sit down, formulate, write a spec, hand it to a developer, wait, receive it, explain the edits, wait again.\n\nNow the voice interface compresses this loop. **The idea moves into the product almost directly.**\n\nBut that's exactly why structure becomes critical. If you don't formalize this loop, it quickly turns into chaos: different sessions, different agents, lost context, duplicates, poorly recorded decisions, *\"why did we do this?\"*, *\"where's the latest version?\"*, *\"which prompt worked?\"*\n\nSo the new stack must have **memory**. Not romantic. 
Technical:\n\n```text\n/project.md      what it is, the goal, users, domain, deploy\n/decisions.md    key decisions, why, what not to do\n/workflows.md    how to launch, deploy, verify, roll back\n/agents.md       roles, constraints, tools, escalation rules\n/evals.md        typical errors, acceptance criteria, regressions\n```\n\nThis isn't bureaucracy. It's a way **not to lose speed**.\n\n## Why wasted iterations are the main enemy\n\nThis whole topic comes down to one thing: **reduce the number of wasted iterations**.\n\nNot just *\"get a better answer.\"* But get **fewer loops to the right result**.\n\nA bad AI workflow looks like this:\n\n```text\nprompt\n  → not it\n  → explanation\n  → not it\n  → clarification\n  → not it\n  → irritation\n  → manual edit\n```\n\nA good AI workflow looks like this:\n\n```text\nspec\n  → context\n  → agent run\n  → artifact\n  → validation\n  → focused correction\n  → memory update\n  → reusable template\n```\n\nThe difference isn't in the \"smartness of the model.\" The difference is that the second loop **learns**. After each error it becomes better. The first one just burns nerves.\n\nThat's exactly why the **artifact-first** approach is so strong. Don't ask the AI to *\"explain what you did.\"* Ask it to create an artifact that can be verified: a diff, a test result, a deployed page, JSON, a checklist, a PR, a changelog, a screenshot, a log, a report.\n\n<aside class=\"pullquote\"><p><strong>An answer persuades. An artifact is verified.</strong></p></aside>\n## The worst anti-patterns of 2026\n\nIn the new AI reality, work is most often broken not by models, but by bad patterns.\n\n**1. A giant prompt instead of a system.** When all the rules, style, context, history, and constraints live in one canvas of text — the system becomes fragile. *Better:* a short core prompt, context separately, tools separately, policies in hooks, managed memory, explicit evals.\n\n**2. 
An agent without boundaries.** If an agent can do everything, sooner or later it will do something unnecessary. *Better:* read-only by default, write only with scope, dangerous actions require approval, high-risk zones blocked.\n\n**3. Free text between agents.** Without a schema you get hallucinations, token bloat, and audit hell. *Better:* structured handoff, template schema, explicit fields, parent orchestrator validates.\n\n**4. No memory of decisions.** Every new session starts from scratch. The human explains the same thing again. *Better:* `decisions.md`, `project.md`, known constraints, what not to do.\n\n**5. No eval loop.** Errors are fixed manually but don't become tests. *Better:* failure → captured → classified → added to eval → prevents regression.\n\n## A new profession: architect of agentic loops\n\nFrom this a new role is born.\n\nNot a prompt engineer in the old sense. But an **agent workflow architect**.\n\nA person who can:\n\n- break processes into stages;\n- determine where a model is needed and where ordinary code is;\n- design the context flow;\n- configure memory;\n- spell out agent roles;\n- create structured handoffs;\n- build approval gates;\n- introduce evals;\n- control costs;\n- make workflows portable between vendors.\n\nThis isn't one profession on LinkedIn. It's a **skill** that will permeate the work of a founder, a CTO, a product manager, an operations lead, an analyst, an editor, a developer.\n\nIn 2024 it was valuable *\"to be able to prompt.\"* In 2026 it's valuable **to be able to build loops**.\n\n## Bottom line\n\n<mark style=\"background:#ffe600;color:#0a0a0a;padding:0.05em 0.15em;font-weight:600;\">The era of the magic prompt is ending not because prompts became unnecessary. 
It's ending because **the work became longer than one prompt**.</mark>\n\nAI now writes code, runs tests, edits files, works in a devbox, waits for approval, remembers decisions, reads mail, connects tools, creates PRs, launches via API, gets hooks, and falls under governance.\n\nThis is no longer a \"text generator.\" It's a **new execution machine**.\n\nAnd in this machine, what matters most isn't who writes the prettiest prompt.\n\nWhat matters is who closes the loop faster:\n\n```text\nsee\n  → formulate\n  → launch\n  → verify\n  → fix\n  → remember\n  → repeat\n```\n\n[Just as in Madyar's war](/blog/madyar-drone-war-ukraine-future-of-war/) the winner isn't a single platform but the speed of the sensor loop — in modern AI work the winner isn't a single model but the speed of the **agentic loop**.\n\nThe prompt was a command.\n\nThe loop becomes an **army**.\n\n<aside class=\"sources\">\n\n### Sources\n\n1. <span id=\"src-openai-memory\"></span>OpenAI — Memory Sources release for ChatGPT, 5 May 2026. <https://openai.com/index/>\n2. <span id=\"src-openai-codex-mobile\"></span>OpenAI — \"Work with Codex from anywhere\" mobile launch + Hooks GA, 14 May 2026. <https://openai.com/index/>\n3. <span id=\"src-datahub-context\"></span>DataHub — Context Engineering survey (82% IT leaders, 95% on importance), April 2026. <https://datahub.com/>\n4. <span id=\"src-langchain-state\"></span>LangChain — State of Agent Engineering report, 2026 edition. <https://blog.langchain.com/>\n5. <span id=\"src-anthropic-claude-code-limits\"></span>Anthropic — Claude Code rate-limit doubling + SpaceX 300 MW partnership, 6 May 2026. <https://www.anthropic.com/news/>\n6. <span id=\"src-langchain-deep\"></span>LangChain — Durable execution for production deep agents. <https://blog.langchain.com/>\n7. <span id=\"src-vb-controlplane\"></span>VentureBeat — Agent Control Plane analysis (Microsoft 38.6%, OpenAI 25.7%, Anthropic 5.7%), 15 May 2026. <https://venturebeat.com/ai/>\n8. 
<span id=\"src-vb-knowledge\"></span>VentureBeat — Pinecone Nexus compilation-stage knowledge layer benchmark (2.8M → 4K tokens), 4 May 2026. <https://venturebeat.com/ai/>\n9. <span id=\"src-github-a11y\"></span>GitHub Engineering — Accessibility Agent architecture, sub-agent pattern, 36% manual-only threshold, 15 May 2026. <https://github.blog/engineering/>\n10. <span id=\"src-langchain-improvement\"></span>LangChain — Agent improvement loop & context engineering patterns. <https://blog.langchain.com/>\n11. <span id=\"src-github-jetbrains\"></span>GitHub — Copilot CLI in JetBrains IDEs, Ask Question tool, `.agent.md` support, 13 May 2026. <https://github.blog/>\n12. <span id=\"src-github-restapi\"></span>GitHub — Agent tasks REST API public preview, 13 May 2026. <https://github.blog/>\n13. <span id=\"src-claude-runtime\"></span>Anthropic Claude Code documentation — Routines, `/usage`, `/ultrareview`, hooks, sandbox rules (April–May 2026). <https://docs.claude.com/>\n14. <span id=\"src-sakana-conductor\"></span>Sakana AI — Conductor paper: 7B model orchestrating frontier models via RL. <https://sakana.ai/>\n15. <span id=\"src-sakana-benchmark\"></span>Sakana AI — Conductor benchmark results (LiveCodeBench 83.93, AIME25 93.3, GPQA-D 87.5). <https://sakana.ai/>\n16. <span id=\"src-fowler-workflows\"></span>Martin Fowler & James Pritchard — \"Workflows vs agents\" distinction, 14 May 2026. <https://martinfowler.com/>\n17. <span id=\"src-zed-billing\"></span>Zed — Anthropic Agent SDK credit split ($20/$100/$200), effective 15 June 2026, 14 May 2026. <https://zed.dev/blog/>\n\n</aside>"
}