Welcome to the Sturdy Blog
News and Resources
The latest from Sturdy — product news, insights, and resources.
.png)
The Context Engine
Executive Summary
The Context Engine
The model is not the problem. In every enterprise AI deployment that has hit a production wall in 2026, the failure lives one layer down: in how data is prepared, permissioned, and delivered before the model ever begins reasoning. Model choice has become the wrong question. With Anthropic's Claude surpassing OpenAI in U.S. enterprise adoption (34.4% vs. 32.3%, Ramp AI Index, April 2026), the market has already moved on. The competition has shifted from the Reasoning Engine to the Context Engine.
While nearly every enterprise has deployed frontier models, most are paying a Hallucination Tax they cannot see on their P&L. For an organization with 1,000 knowledge workers, the 4.3 hours per employee per week spent manually verifying AI outputs (Forrester, 2025) equates to approximately $16.8 million in annual salary drain, calculated at a conservative $75 per fully-loaded hour. Multiply that across a global enterprise, and it maps to the $67.4 billion in documented AI hallucination losses recorded in 2024 alone (AllAboutAI, 2025). This is not a failure of the model. It is a failure of architecture.
This paper argues that the next phase of enterprise AI requires a Deterministic Intelligence Layer: infrastructure that normalizes, indexes, and permissions customer data before it reaches the model. Teams replacing token-heavy RAG workflows with deterministic, pre-indexed context are seeing substantial reductions in cost per task while dramatically improving retrieval precision and AI reliability. More importantly, they are crossing the Threshold of Action: the point where AI becomes trustworthy enough to move from surfacing insights to executing workflows.

Section 1
The New Benchmark: Claude's Enterprise Breakout Moment
The AI market just had its crossover moment. As of April 2026, more U.S. businesses pay for Anthropic's Claude than for any other AI model. 34.4% vs. 32.3% for OpenAI, according to the Ramp AI Index, which tracks actual spending across more than 50,000 companies. This isn't a survey about intent. It's purchasing data.
By March 2026, Anthropic was capturing 73% of first-time business AI buyers (Axios, March 2026). A year earlier, one in 25 businesses on Ramp's platform paid for Anthropic. Today, it's nearly one in three.
Enterprise buyers don't switch defaults on a whim. They switch when something is demonstrably working better for the work they actually need done.
The Model Is Not the Problem
Here is the harder truth underneath that adoption story. Despite the crossover, most enterprise AI deployments are not delivering.

Widespread adoption. Widespread underdelivery. Both things are true simultaneously.
The instinct in most organizations is to treat this as a model problem: switch providers, upgrade to the latest version, hire a prompt engineer. None of it moves the needle in any sustained way, because the model is not where the failure lives. Claude is a reasoning engine. A sophisticated one. But a reasoning engine can only reason over what it's given. And in most enterprise deployments, what's given is a mess. Fragments.
The Performance Ceiling
Every technical leader deploying Claude at scale hits the same wall. The demo works. The pilot looks promising. Then it moves toward production, and something breaks. Not catastrophically, but consistently. The AI misattributes an item to the wrong account. It summarizes a customer's history using stale data. It generates an output that sounds authoritative and requires 20 minutes of human verification before it can be trusted.
"Feed a world-class reasoning engine confident, well-structured garbage, and you get the same in return."
This is not a failure of reasoning capability. It is a failure of context architecture. The data required to generate reliable outputs, account history, communications, support activity, call transcripts, and operational metadata typically exists across fragmented systems with inconsistent normalization, disconnected permissions, and no canonical entity resolution layer tying it together.
Context Is the New Infrastructure
The companies pulling ahead in 2026 are not winning because they chose a better model. They are winning because they solved the harder problem underneath it: delivering clean, resolved, permission-aware context before the model ever begins reasoning.
- IT, Data, and Platform Engineering provide the Engine (Claude): a recurring operating expense. World-class reasoning, rented.
- RevOps, Data, and AI Teams provide the Map (the Deterministic Data Layer): a long-term asset. Customer intelligence, owned.
Claude is the current catalyst. The model market will keep moving. New releases, new providers, new pricing. What doesn't move is the underlying problem: fragmented, unresolved, improperly permissioned data. Deterministic context is the durable architecture. The organizations building it now will carry that advantage into every subsequent model generation.
Most organizations already have the engine. What they lack is the map.
Section 2
The Hallucination Tax: Why Fragmented Data Kills AI Performance
If the model isn't the problem, why are so many production-grade AI initiatives hitting a performance ceiling? The answer is the Hallucination Tax.
In 2024, hallucinations cost enterprises an estimated $67.4 billion in global losses (AllAboutAI, 2025). By early 2026, the cost has shifted from outright fabrications to "silent hallucinations": outputs that look structurally perfect but are factually untethered from the current state of the business.
For an organization with 1,000 knowledge workers, the 4.3 hours lost per person per week equates to roughly 223,600 hours of wasted annual productivity, approximately $16.8 million in annual salary drain at a conservative, fully loaded rate. It never appears on the P&L as an AI cost. It shows up as underperformance, missed forecasts, and slower deal cycles.

This forces employees to act as "Human Middleware": the bridge between fragmented systems and the AI that was supposed to make them irrelevant. This tax is the direct result of four specific architectural failure modes.
Failure Mode 1: Retrieval Precision (The Token Tax)
Standard RAG is probabilistic. It retrieves semantically similar fragments, not operational truth. When a sales leader asks, "Why did we lose this seven-figure deal?", the system may surface an old QBR deck instead of the pricing objections in email, the procurement concerns buried in Slack, the legal escalation in Jira, and the product gaps discussed in call transcripts that actually determined the outcome.
Because retrieval is imprecise, teams over-index by stuffing the context window with every possible document to ensure the right one is in there. The result: thousands of reasoning tokens spent filtering noise. A world-class reasoning engine doing the work of a search index.
Failure Mode 2: "Lost in the Middle" (Attention Drift)
Research by Liu et al. (TACL, 2024) demonstrated that accuracy on multi-document reasoning tasks drops by more than 30 percentage points when relevant information is buried in the middle of a long context window. This matters enormously in enterprise environments, where critical signals are scattered across support escalations, pricing discussions, call transcripts, Slack threads, and CRM updates. Simply increasing context size does not solve the problem. In many cases, it amplifies it by forcing the model to attend to more noise.
Failure Mode 3: The Identity Crisis (Entity Disambiguation)
In a fragmented environment, identity is a variable, not a constant. "Jane Doe" in a Zoom transcript needs to resolve to the same Jane Doe in Salesforce, Gmail, Zendesk, Slack, and the CRM activity timeline. Without deterministic entity resolution, the model is forced to infer whether those interactions belong to the same person, account, or buying committee.
Without deterministic entity resolution, the model is forced to reconstruct identity probabilistically. A support escalation tied to one stakeholder, a pricing objection raised in a sales call, and an executive concern discussed over email may be incorrectly assembled into the wrong account narrative entirely.
Failure Mode 4: The Permission Ghost (Unauthorized Surface)
This is the silent killer of enterprise AI programs. Most RAG pipelines lack Source-System Parity. If the AI retrieves a snippet from a private executive email because it was "semantically relevant" to an intern's query, the system has failed regardless of whether anyone noticed.
Incidents like EchoLeak show exactly why retrieval-layer permission enforcement matters. In late 2025, researchers demonstrated a zero-click vulnerability in Microsoft 365 Copilot that could exfiltrate sensitive data from Copilot context without user interaction. No prompt injection required. The retrieval layer was the attack surface.
For most organizations, the permission layer isn't just a technical problem. It is an organizational liability that Legal and Security will eventually force you to solve on a deadline, under pressure, after something has already gone wrong.
The Production Wall
These four failure modes create the Production Wall. A curated demo can appear remarkably accurate. But production environments are not curated. They are noisy, fragmented, and constantly changing, with critical signals distributed across emails, calls, support threads, Slack conversations, and operational systems evolving in real time.
"You cannot solve these four problems by tuning the prompt. You have to solve them by fixing the context."

Section 3
The Deterministic Intelligence Layer
To climb over the Production Wall, enterprise architecture must evolve. The solution is not a larger context window or a more complex prompt. It is a fundamental shift in how data is prepared for the model. Enter the Deterministic Intelligence Layer: infrastructure that sits between your raw data silos and Claude, acting as the architectural antidote to the four failure modes in Section 2.
The Four Pillars
1. Precision Indexing (Ending the Token Tax)
Instead of relying on similarity search alone, the context layer resolves entities, removes duplication, and prioritizes high-signal interactions before retrieval. The model receives structured operational context rather than raw fragments competing for attention.
In Sturdy-observed deployments, replacing raw context with pre-indexed, distilled payloads has reduced token consumption by 80 to 90% on comparable workflows. Results vary by source data density and baseline architecture. You stop paying for Claude to be a search filter.
2. Signal Distillation (Solving "Lost in the Middle")
Semantic Pruning strips HTML headers, Slack noise, legal footers, and the RE: FWD: RE: reply chains that bury every actual decision in 40 lines of quoted text, distilling threads into thematic buckets: Bug Reports, Feature Requests, Sentiment Shifts. The most critical insights land at the beginning of the context window, bypassing the 30-point accuracy drop documented in long-context research.
3. Deterministic Entity Resolution (Fixing the Identity Crisis)
A Global Entity Map resolves disparate naming conventions into a single, immutable Customer ID. Claude is no longer guessing whether two conversations belong to the same account. It is being told they do.
4. Parity-Enforced Permissions (Exorcising the Permission Ghost)
The retrieval layer enforces source-system permissions before context assembly, so unauthorized records are excluded from the payload sent to the model. This is not a prompt-level instruction that can be overridden or confused. It is an architectural enforcement point that sits entirely upstream of the model.
Security becomes a structural property of the architecture, not a probabilistic instruction to the model. Incidents like EchoLeak show why this distinction matters: when permission logic lives inside the prompt, the retrieval layer remains an attack surface. When it lives at the data layer, it doesn't.
Reference Implementation: Sturdy + Claude via MCP
While the merits of this architecture are clear, building it internally results in years of maintenance debt (see Section 5). Sturdy leverages the Model Context Protocol to serve as the Context Engine for Claude, normalizing, indexing, and permission-stamping your customer intelligence layer across Salesforce, Gmail, Slack, and Zendesk before Claude ever queries it.
Claude provides the Reasoning Layer. Sturdy provides the Memory and Context Layer. Together, they move an enterprise from AI that reads your business to AI that acts on it.

Section 4
What It Unlocks: From Reading to Acting
In 2026, summarization is a commodity. The competitive advantage lies in moving from AI that reads your business to AI that acts on it. This transition requires a fundamental shift in how leadership views the AI stack and who owns what.
- IT, Data, and Platform Engineering provide the Engine (Claude): recurring operating expense. World-class reasoning, rented.
- RevOps, Data, and AI Teams provide the Map (the Deterministic Data Layer): a long-term asset. Customer intelligence, owned, not rented.
When the engine has a perfect map, the Acceleration Gap closes.
RevOps: The Revenue Architect
For the RevOps leader, a deterministic layer turns fragmented operational data into active revenue signals. Instead of building static dashboards that explain why a quarter was missed, RevOps can monitor the commercial signals that actually move deals: pricing hesitation in email, procurement delays, legal friction, competitive mentions, executive disengagement, stalled next steps, and tone changes across active opportunities.
A deterministic context layer resolves those signals to the right person, account, opportunity, and timeline before AI ever reasons over them. That is what turns scattered communication into reliable revenue action.
RevOps stops being a report generator. It becomes the operating system for revenue execution: designing the logic that turns verified commercial signals into coordinated GTM action.
Sales: Instant Account Intelligence
The average sales rep spends roughly 20% of their week on pre-call research. With a deterministic layer, the account briefing is no longer a probabilistic summary. It is a verified snapshot: "The customer's last three support tickets were resolved, but they haven't yet implemented the API update discussed in the March QBR."
Product: The Automated Feedback Loop
Product managers are often the most data-rich but insight-poor employees in the company. A deterministic layer moves PMs from reading feedback to querying insights. Claude analyzes 60 days of feedback across Slack and Zendesk and, with a single prompt, generates a high-fidelity Jira ticket including exact customer quotes, impacted account IDs, and revenue at risk.
Customer Success: Proactive Triage
In CS, latency is the enemy. A deterministic layer allows Claude to perform live triage. When a customer sends a frustrated email, the AI checks contract terms and recent product usage logs before the CSM has finished reading the subject line. It presents a Context-Aware Response ready to send, grounded in verified account data.
"The model you license today is rent. The customer intelligence layer you build is equity. One gets replaced. The other compounds."
Every account signal normalized, every entity resolved, every permission enforced. That accumulates. The organizations building this layer now are building institutional memory that makes every model they run on top of it better.

Section 5
The Build vs. Buy Reality
The instinct for most sophisticated IT and data teams is to build. It is a legitimate impulse. The stack looks deceptively simple: a few API connectors, a vector database, and some chunking logic. In the demo phase, an internal build often feels like the most cost-effective path.

The Four Hidden Engineering Hurdles
1. The Normalization Treadmill
Building a connector to Salesforce is straightforward. Maintaining the logic layer that resolves entity names across Salesforce, Slack, and Zendesk as those systems' schemas evolve is a full-time engineering job. This is Semantic Drift: hundreds of developer hours consumed by maintenance rather than innovation.
2. The Permission Mapping Paradox
Mapping row-level permissions from source systems into an AI context window is one of the most complex security challenges in modern software. Most internal builds rely on prompt-level security, which fails under the weight of incidents like EchoLeak. This isn't a technical trade-off. It is an organizational liability waiting to be forced into crisis.
3. The Latency Wall
A custom RAG pipeline often takes 5 to 10 seconds to fetch and clean data. In Sturdy-observed deployments, pre-indexed deterministic retrieval consistently operates under 1 second on production data volumes, but reaching that benchmark requires specialized search infrastructure expertise that is rarely the core competency of a generalist data team building from scratch.
4. The Token Optimization Tax
Without signal distillation, internal builds routinely pass 3x to 5x more tokens than necessary. Teams save on build costs only to spend twice as much on model API costs.
Where Does Your Engineering Dollar Go?
The strategic question isn't "Can we build this?" It's "Should we own the maintenance of this?"

Competitive advantage does not live in the plumbing. No customer chooses a vendor because their AI has a better Python script for cleaning Slack data.
By offloading the Normalization Treadmill to Sturdy, organizations are promoting their engineering teams from Data Cleaners to AI Product Owners, moving their best people away from the maintenance treadmill and toward the high-value work of building AI that drives revenue.
Buy the plumbing. Build the logic. The teams doing this are shipping revenue-generating AI workflows, while their competitors are still debugging entity-resolution scripts.
Section 6
What to Do Now: The 2026 Roadmap
The Acceleration Gap is not a permanent state. It is a choice of architecture. The move is not to wait for a smarter model. The move is to fix the context. Here are four moves for leadership to take in the next 90 days.
Move 1: Audit Your Retrieval Precision, Not Your Prompts
Most teams spend the majority of their time prompt-tuning errors caused by bad data retrieval. The action: Run a Ground Truth test. Take ten complex customer queries and manually check the data fragments Claude is being fed. If more than 20% of that data is noisy, stale, or misattributed, no prompt engineering will save the deployment. You have a plumbing problem, not a reasoning problem.
Move 2: Isolate a Multi-Source Workflow
The highest ROI for a deterministic layer is found where data is most fragmented. The action: Pick a high-value, closed-loop use case where data lives in at least three systems. For example: the path from customer feedback in Slack and Zendesk to an engineering action in Jira. Solve the context problem here, and you've built a blueprint for the rest of the organization.
Move 3: Enforce Permissions at the Data Layer
Stop treating security as a probabilistic instruction. The action: Move permission enforcement out of the system prompt and into the retrieval infrastructure. Ensure the retrieval layer enforces source-system permissions before context assembly, so unauthorized records never reach the model. The Permission Ghost is exorcised structurally, not instructionally, and the organizational liability is removed before Legal ever has to get involved.
Move 4: Define Where AI Earns the Right to Act
The distance between AI that summarizes and AI that executes is a trust gap, not a technology gap. The action: Build human-in-the-loop approval gates for high-stakes actions. Drafting a renewal contract. Creating a Jira ticket. Sending a support response. Use your deterministic layer to provide the required Confidence Equity. The threshold to target is a sub-5% error rate on AI-generated drafts. That is the point at which approval gates can be safely reduced, and workflows become self-sustaining.
Traditional probabilistic RAG architectures struggle to reach this threshold consistently at enterprise scale. Because probabilistic retrieval introduces entity errors, stale data, and permission noise, error rates on complex multi-source tasks typically stabilize in the 15 to 30% range regardless of prompt quality, even with hybrid retrieval and reranking layers added on top.
A deterministic layer that resolves entities before inference, distills the signal before retrieval, and enforces permissions before the model ever sees the data is the only architecture that makes sub-5% structurally achievable, rather than an occasional lucky outcome.
In Sturdy-observed deployments, teams that reach this threshold have consistently moved to reduced-oversight approval workflows within a quarter. Results depend on workflow complexity and baseline data quality. Reaching the sub-5% Trust Threshold is the definitive signal that an organization has graduated from "AI Experiments" to a Context Engine architecture capable of autonomous action. That is the architectural line between AI that assists and AI that acts.

Conclusion
The Architectural Advantage
Frontier models will continue to improve and commoditize. The durable advantage is no longer the model itself. It is the architecture surrounding it.
The long-term value does not live in another standalone AI interface. Interfaces change too quickly. The durable layer is the operational context infrastructure beneath them.
Organizations that solve deterministic context assembly, entity resolution, permission-aware retrieval, and operational state assembly gain a compounding advantage independent of whichever model, interface, or orchestration layer dominates next year.
Organizations that solve context architecture today are building infrastructure that compounds across model generations. As interfaces evolve and models improve, the operational context layer beneath them becomes increasingly valuable.
"The era of the Context Engine is here. Is your architecture ready for it?"
Executive Summary
The Context Engine
The model is not the problem. In every enterprise AI deployment that has hit a production wall in 2026, the failure lives one layer down: in how data is prepared, permissioned, and delivered before the model ever begins reasoning. Model choice has become the wrong question. With Anthropic's Claude surpassing OpenAI in U.S. enterprise adoption (34.4% vs. 32.3%, Ramp AI Index, April 2026), the market has already moved on. The competition has shifted from the Reasoning Engine to the Context Engine.
While nearly every enterprise has deployed frontier models, most are paying a Hallucination Tax they cannot see on their P&L. For an organization with 1,000 knowledge workers, the 4.3 hours per employee per week spent manually verifying AI outputs (Forrester, 2025) equates to approximately $16.8 million in annual salary drain, calculated at a conservative $75 per fully-loaded hour. Multiply that across a global enterprise, and it maps to the $67.4 billion in documented AI hallucination losses recorded in 2024 alone (AllAboutAI, 2025). This is not a failure of the model. It is a failure of architecture.
This paper argues that the next phase of enterprise AI requires a Deterministic Intelligence Layer: infrastructure that normalizes, indexes, and permissions customer data before it reaches the model. Teams replacing token-heavy RAG workflows with deterministic, pre-indexed context are seeing substantial reductions in cost per task while dramatically improving retrieval precision and AI reliability. More importantly, they are crossing the Threshold of Action: the point where AI becomes trustworthy enough to move from surfacing insights to executing workflows.

Section 1
The New Benchmark: Claude's Enterprise Breakout Moment
The AI market just had its crossover moment. As of April 2026, more U.S. businesses pay for Anthropic's Claude than for any other AI model. 34.4% vs. 32.3% for OpenAI, according to the Ramp AI Index, which tracks actual spending across more than 50,000 companies. This isn't a survey about intent. It's purchasing data.
By March 2026, Anthropic was capturing 73% of first-time business AI buyers (Axios, March 2026). A year earlier, one in 25 businesses on Ramp's platform paid for Anthropic. Today, it's nearly one in three.
Enterprise buyers don't switch defaults on a whim. They switch when something is demonstrably working better for the work they actually need done.
The Model Is Not the Problem
Here is the harder truth underneath that adoption story. Despite the crossover, most enterprise AI deployments are not delivering.

Widespread adoption. Widespread underdelivery. Both things are true simultaneously.
The instinct in most organizations is to treat this as a model problem: switch providers, upgrade to the latest version, hire a prompt engineer. None of it moves the needle in any sustained way, because the model is not where the failure lives. Claude is a reasoning engine. A sophisticated one. But a reasoning engine can only reason over what it's given. And in most enterprise deployments, what's given is a mess. Fragments.
The Performance Ceiling
Every technical leader deploying Claude at scale hits the same wall. The demo works. The pilot looks promising. Then it moves toward production, and something breaks. Not catastrophically, but consistently. The AI misattributes an item to the wrong account. It summarizes a customer's history using stale data. It generates an output that sounds authoritative and requires 20 minutes of human verification before it can be trusted.
"Feed a world-class reasoning engine confident, well-structured garbage, and you get the same in return."
This is not a failure of reasoning capability. It is a failure of context architecture. The data required to generate reliable outputs, account history, communications, support activity, call transcripts, and operational metadata typically exists across fragmented systems with inconsistent normalization, disconnected permissions, and no canonical entity resolution layer tying it together.
Context Is the New Infrastructure
The companies pulling ahead in 2026 are not winning because they chose a better model. They are winning because they solved the harder problem underneath it: delivering clean, resolved, permission-aware context before the model ever begins reasoning.
- IT, Data, and Platform Engineering provide the Engine (Claude): a recurring operating expense. World-class reasoning, rented.
- RevOps, Data, and AI Teams provide the Map (the Deterministic Data Layer): a long-term asset. Customer intelligence, owned.
Claude is the current catalyst. The model market will keep moving. New releases, new providers, new pricing. What doesn't move is the underlying problem: fragmented, unresolved, improperly permissioned data. Deterministic context is the durable architecture. The organizations building it now will carry that advantage into every subsequent model generation.
Most organizations already have the engine. What they lack is the map.
Section 2
The Hallucination Tax: Why Fragmented Data Kills AI Performance
If the model isn't the problem, why are so many production-grade AI initiatives hitting a performance ceiling? The answer is the Hallucination Tax.
In 2024, hallucinations cost enterprises an estimated $67.4 billion in global losses (AllAboutAI, 2025). By early 2026, the cost has shifted from outright fabrications to "silent hallucinations": outputs that look structurally perfect but are factually untethered from the current state of the business.
For an organization with 1,000 knowledge workers, the 4.3 hours lost per person per week equates to roughly 223,600 hours of wasted annual productivity, approximately $16.8 million in annual salary drain at a conservative, fully loaded rate. It never appears on the P&L as an AI cost. It shows up as underperformance, missed forecasts, and slower deal cycles.

This forces employees to act as "Human Middleware": the bridge between fragmented systems and the AI that was supposed to make them irrelevant. This tax is the direct result of four specific architectural failure modes.
Failure Mode 1: Retrieval Precision (The Token Tax)
Standard RAG is probabilistic. It retrieves semantically similar fragments, not operational truth. When a sales leader asks, "Why did we lose this seven-figure deal?", the system may surface an old QBR deck instead of the pricing objections in email, the procurement concerns buried in Slack, the legal escalation in Jira, and the product gaps discussed in call transcripts that actually determined the outcome.
Because retrieval is imprecise, teams over-index by stuffing the context window with every possible document to ensure the right one is in there. The result: thousands of reasoning tokens spent filtering noise. A world-class reasoning engine doing the work of a search index.
Failure Mode 2: "Lost in the Middle" (Attention Drift)
Research by Liu et al. (TACL, 2024) demonstrated that accuracy on multi-document reasoning tasks drops by more than 30 percentage points when relevant information is buried in the middle of a long context window. This matters enormously in enterprise environments, where critical signals are scattered across support escalations, pricing discussions, call transcripts, Slack threads, and CRM updates. Simply increasing context size does not solve the problem. In many cases, it amplifies it by forcing the model to attend to more noise.
Failure Mode 3: The Identity Crisis (Entity Disambiguation)
In a fragmented environment, identity is a variable, not a constant. "Jane Doe" in a Zoom transcript needs to resolve to the same Jane Doe in Salesforce, Gmail, Zendesk, Slack, and the CRM activity timeline. Without deterministic entity resolution, the model is forced to infer whether those interactions belong to the same person, account, or buying committee.
Without deterministic entity resolution, the model is forced to reconstruct identity probabilistically. A support escalation tied to one stakeholder, a pricing objection raised in a sales call, and an executive concern discussed over email may be incorrectly assembled into the wrong account narrative entirely.
Failure Mode 4: The Permission Ghost (Unauthorized Surface)
This is the silent killer of enterprise AI programs. Most RAG pipelines lack Source-System Parity. If the AI retrieves a snippet from a private executive email because it was "semantically relevant" to an intern's query, the system has failed regardless of whether anyone noticed.
Incidents like EchoLeak show exactly why retrieval-layer permission enforcement matters. In late 2025, researchers demonstrated a zero-click vulnerability in Microsoft 365 Copilot that could exfiltrate sensitive data from Copilot context without user interaction. No prompt injection required. The retrieval layer was the attack surface.
For most organizations, the permission layer isn't just a technical problem. It is an organizational liability that Legal and Security will eventually force you to solve on a deadline, under pressure, after something has already gone wrong.
The Production Wall
These four failure modes create the Production Wall. A curated demo can appear remarkably accurate. But production environments are not curated. They are noisy, fragmented, and constantly changing, with critical signals distributed across emails, calls, support threads, Slack conversations, and operational systems evolving in real time.
"You cannot solve these four problems by tuning the prompt. You have to solve them by fixing the context."

Section 3
The Deterministic Intelligence Layer
To climb over the Production Wall, enterprise architecture must evolve. The solution is not a larger context window or a more complex prompt. It is a fundamental shift in how data is prepared for the model. Enter the Deterministic Intelligence Layer: infrastructure that sits between your raw data silos and Claude, acting as the architectural antidote to the four failure modes in Section 2.
The Four Pillars
1. Precision Indexing (Ending the Token Tax)
Instead of relying on similarity search alone, the context layer resolves entities, removes duplication, and prioritizes high-signal interactions before retrieval. The model receives structured operational context rather than raw fragments competing for attention.
In Sturdy-observed deployments, replacing raw context with pre-indexed, distilled payloads has reduced token consumption by 80 to 90% on comparable workflows. Results vary by source data density and baseline architecture. You stop paying for Claude to be a search filter.
2. Signal Distillation (Solving "Lost in the Middle")
Semantic Pruning strips HTML headers, Slack noise, legal footers, and the RE: FWD: RE: reply chains that bury every actual decision in 40 lines of quoted text, distilling threads into thematic buckets: Bug Reports, Feature Requests, Sentiment Shifts. The most critical insights land at the beginning of the context window, bypassing the 30-point accuracy drop documented in long-context research.
3. Deterministic Entity Resolution (Fixing the Identity Crisis)
A Global Entity Map resolves disparate naming conventions into a single, immutable Customer ID. Claude is no longer guessing whether two conversations belong to the same account. It is being told they do.
4. Parity-Enforced Permissions (Exorcising the Permission Ghost)
The retrieval layer enforces source-system permissions before context assembly, so unauthorized records are excluded from the payload sent to the model. This is not a prompt-level instruction that can be overridden or confused. It is an architectural enforcement point that sits entirely upstream of the model.
Security becomes a structural property of the architecture, not a probabilistic instruction to the model. Incidents like EchoLeak show why this distinction matters: when permission logic lives inside the prompt, the retrieval layer remains an attack surface. When it lives at the data layer, it doesn't.
Reference Implementation: Sturdy + Claude via MCP
While the merits of this architecture are clear, building it internally results in years of maintenance debt (see Section 5). Sturdy leverages the Model Context Protocol to serve as the Context Engine for Claude, normalizing, indexing, and permission-stamping your customer intelligence layer across Salesforce, Gmail, Slack, and Zendesk before Claude ever queries it.
Claude provides the Reasoning Layer. Sturdy provides the Memory and Context Layer. Together, they move an enterprise from AI that reads your business to AI that acts on it.

Section 4
What It Unlocks: From Reading to Acting
In 2026, summarization is a commodity. The competitive advantage lies in moving from AI that reads your business to AI that acts on it. This transition requires a fundamental shift in how leadership views the AI stack and who owns what.
- IT, Data, and Platform Engineering provide the Engine (Claude): recurring operating expense. World-class reasoning, rented.
- RevOps, Data, and AI Teams provide the Map (the Deterministic Data Layer): a long-term asset. Customer intelligence, owned, not rented.
When the engine has a perfect map, the Acceleration Gap closes.
RevOps: The Revenue Architect
For the RevOps leader, a deterministic layer turns fragmented operational data into active revenue signals. Instead of building static dashboards that explain why a quarter was missed, RevOps can monitor the commercial signals that actually move deals: pricing hesitation in email, procurement delays, legal friction, competitive mentions, executive disengagement, stalled next steps, and tone changes across active opportunities.
A deterministic context layer resolves those signals to the right person, account, opportunity, and timeline before AI ever reasons over them. That is what turns scattered communication into reliable revenue action.
RevOps stops being a report generator. It becomes the operating system for revenue execution: designing the logic that turns verified commercial signals into coordinated GTM action.
Sales: Instant Account Intelligence
The average sales rep spends roughly 20% of their week on pre-call research. With a deterministic layer, the account briefing is no longer a probabilistic summary. It is a verified snapshot: "The customer's last three support tickets were resolved, but they haven't yet implemented the API update discussed in the March QBR."
Product: The Automated Feedback Loop
Product managers are often the most data-rich but insight-poor employees in the company. A deterministic layer moves PMs from reading feedback to querying insights. Claude analyzes 60 days of feedback across Slack and Zendesk and, with a single prompt, generates a high-fidelity Jira ticket including exact customer quotes, impacted account IDs, and revenue at risk.
Customer Success: Proactive Triage
In CS, latency is the enemy. A deterministic layer allows Claude to perform live triage. When a customer sends a frustrated email, the AI checks contract terms and recent product usage logs before the CSM has finished reading the subject line. It presents a Context-Aware Response ready to send, grounded in verified account data.
"The model you license today is rent. The customer intelligence layer you build is equity. One gets replaced. The other compounds."
Every account signal normalized, every entity resolved, every permission enforced. That accumulates. The organizations building this layer now are building institutional memory that makes every model they run on top of it better.

Section 5
The Build vs. Buy Reality
The instinct for most sophisticated IT and data teams is to build. It is a legitimate impulse. The stack looks deceptively simple: a few API connectors, a vector database, and some chunking logic. In the demo phase, an internal build often feels like the most cost-effective path.

The Four Hidden Engineering Hurdles
1. The Normalization Treadmill
Building a connector to Salesforce is straightforward. Maintaining the logic layer that resolves entity names across Salesforce, Slack, and Zendesk as those systems' schemas evolve is a full-time engineering job. This is Semantic Drift: hundreds of developer hours consumed by maintenance rather than innovation.
2. The Permission Mapping Paradox
Mapping row-level permissions from source systems into an AI context window is one of the most complex security challenges in modern software. Most internal builds rely on prompt-level security, which fails under the weight of incidents like EchoLeak. This isn't a technical trade-off. It is an organizational liability waiting to be forced into crisis.
3. The Latency Wall
A custom RAG pipeline often takes 5 to 10 seconds to fetch and clean data. In Sturdy-observed deployments, pre-indexed deterministic retrieval consistently operates under 1 second on production data volumes, but reaching that benchmark requires specialized search infrastructure expertise that is rarely the core competency of a generalist data team building from scratch.
4. The Token Optimization Tax
Without signal distillation, internal builds routinely pass 3x to 5x more tokens than necessary. Teams save on build costs only to spend twice as much on model API costs.
Where Does Your Engineering Dollar Go?
The strategic question isn't "Can we build this?" It's "Should we own the maintenance of this?"

Competitive advantage does not live in the plumbing. No customer chooses a vendor because their AI has a better Python script for cleaning Slack data.
By offloading the Normalization Treadmill to Sturdy, organizations are promoting their engineering teams from Data Cleaners to AI Product Owners, moving their best people away from the maintenance treadmill and toward the high-value work of building AI that drives revenue.
Buy the plumbing. Build the logic. The teams doing this are shipping revenue-generating AI workflows, while their competitors are still debugging entity-resolution scripts.
Section 6
What to Do Now: The 2026 Roadmap
The Acceleration Gap is not a permanent state. It is a choice of architecture. The move is not to wait for a smarter model. The move is to fix the context. Here are four moves for leadership to take in the next 90 days.
Move 1: Audit Your Retrieval Precision, Not Your Prompts
Most teams spend the majority of their time prompt-tuning errors caused by bad data retrieval. The action: Run a Ground Truth test. Take ten complex customer queries and manually check the data fragments Claude is being fed. If more than 20% of that data is noisy, stale, or misattributed, no prompt engineering will save the deployment. You have a plumbing problem, not a reasoning problem.
Move 2: Isolate a Multi-Source Workflow
The highest ROI for a deterministic layer is found where data is most fragmented. The action: Pick a high-value, closed-loop use case where data lives in at least three systems. For example: the path from customer feedback in Slack and Zendesk to an engineering action in Jira. Solve the context problem here, and you've built a blueprint for the rest of the organization.
Move 3: Enforce Permissions at the Data Layer
Stop treating security as a probabilistic instruction. The action: Move permission enforcement out of the system prompt and into the retrieval infrastructure. Ensure the retrieval layer enforces source-system permissions before context assembly, so unauthorized records never reach the model. The Permission Ghost is exorcised structurally, not instructionally, and the organizational liability is removed before Legal ever has to get involved.
Move 4: Define Where AI Earns the Right to Act
The distance between AI that summarizes and AI that executes is a trust gap, not a technology gap. The action: Build human-in-the-loop approval gates for high-stakes actions. Drafting a renewal contract. Creating a Jira ticket. Sending a support response. Use your deterministic layer to provide the required Confidence Equity. The threshold to target is a sub-5% error rate on AI-generated drafts. That is the point at which approval gates can be safely reduced, and workflows become self-sustaining.
Traditional probabilistic RAG architectures struggle to reach this threshold consistently at enterprise scale. Because probabilistic retrieval introduces entity errors, stale data, and permission noise, error rates on complex multi-source tasks typically stabilize in the 15 to 30% range regardless of prompt quality, even with hybrid retrieval and reranking layers added on top.
A deterministic layer that resolves entities before inference, distills the signal before retrieval, and enforces permissions before the model ever sees the data is the only architecture that makes sub-5% structurally achievable, rather than an occasional lucky outcome.
In Sturdy-observed deployments, teams that reach this threshold have consistently moved to reduced-oversight approval workflows within a quarter. Results depend on workflow complexity and baseline data quality. Reaching the sub-5% Trust Threshold is the definitive signal that an organization has graduated from "AI Experiments" to a Context Engine architecture capable of autonomous action. That is the architectural line between AI that assists and AI that acts.

Conclusion
The Architectural Advantage
Frontier models will continue to improve and commoditize. The durable advantage is no longer the model itself. It is the architecture surrounding it.
The long-term value does not live in another standalone AI interface. Interfaces change too quickly. The durable layer is the operational context infrastructure beneath them.
Organizations that solve deterministic context assembly, entity resolution, permission-aware retrieval, and operational state assembly gain a compounding advantage independent of whichever model, interface, or orchestration layer dominates next year.
Organizations that solve context architecture today are building infrastructure that compounds across model generations. As interfaces evolve and models improve, the operational context layer beneath them becomes increasingly valuable.
"The era of the Context Engine is here. Is your architecture ready for it?"
Our articles

Navigating AI Ethics
The question is no longer about whether you will use AI; it’s when. And no matter where you are on your journey, navigating the ethical implications of AI use is crucial. Ethical AI is not just a buzzword but a set of principles designed to ensure fairness, transparency, and accountability in how businesses use artificial intelligence. In the case of Sturdy, we’ve made ethical AI a core commitment. These principles guide our every move, ensuring AI benefits businesses without crossing the line into unmitigated risk.
What Is Ethical AI?
Ethical AI refers to developing and deploying AI systems that prioritize fairness, transparency, and respect for privacy. For businesses, this means using AI to make smarter decisions while ensuring that the data and technologies used do not cause harm or reinforce biases. The importance of this cannot be overstated—AI has the potential to either empower or exploit, and ethical guidelines ensure we remain on the right side of that divide.
Sturdy’s Commitment to Ethical AI
Sturdy's approach to AI revolves around several inviolable principles:
- Business-Only Data Use: Sturdy’s AI systems focus solely on improving how businesses make decisions. They don't delve into personal data or manipulate information for other purposes. The data processed by Sturdy comes from business sources like support tickets, corporate emails, or recorded calls—never from personal channels.
- No Ulterior Motives for Data: The data collected by Sturdy is knowingly provided by our customers, and the company doesn't use this data for any purpose beyond what's agreed upon. This ensures transparency and trust between the platform and its users.
- Privacy and Protection: One of the most critical aspects of Sturdy’s approach is its commitment to not allowing any entity—whether a business or government—to use its technology in ways that violate privacy. If a client were found to be doing so, Sturdy would terminate the relationship.
- No Deception: Our product is engineered to prevent deception. It never manipulates or deceives users, ensuring that the insights drawn from AI are used to enhance business practices rather than exploit loopholes.
Human Oversight and the Role of AI
At the core of Sturdy’s AI principles is the belief that AI should not replace human decision-making but augment it. Our Natural Language Classifiers (NLCs) are built to detect risks and opportunities based on the probability that a conversation indicates a particular issue. For example, when a customer complains about a "buggy" product, Sturdy’s AI might tag it as a "Bug" and label the customer as "Unhappy." However, humans remain in control—analyzing the situation and deciding the best action.
Final Thoughts
Sturdy's approach to AI exemplifies how businesses can responsibly use technology to drive growth and improve operations while safeguarding ethics. They demonstrate that AI doesn’t need to infringe on privacy or replace human decision-making. Instead, AI should be a tool that empowers teams, ensures transparency, and upholds ethical standards. Navigating the ethics of AI is not just a challenge—it’s an ongoing commitment, and Sturdy is setting a new standard for how it should be done.

You have been paywalled
(The image attached to this post is not entirely accurate but read on, and I’ll explain)
I’ve been spending a lot of time on Sturdy’s brand message lately. Part of this process entails interviewing folks from various walks of life about the current state of their businesses, their teams, and the companies they invest in.
The recurring theme: Sales Leaders aren’t having a good time right now. But you knew that already. I want to talk about what you don’t know.
After one of my interviews, I received a text with a quote by the former CEO of Swedish Airlines, Jan Carlzon.
An individual without information can’t take responsibility. An individual with information can’t help but take responsibility.
There are many different “things”’ that impact revenue: bad service, confusing products, poor response times, overselling, bug reports, price, whacky renewal processes, etc. You already knew this.
You know a lot about economic conditions, because that information is widely, and publicly available. You probably know a fair amount about “Sales Things” because your team is talking about “percent to goal” in almost every meeting, and there are a lot of discussions about what’s working and what isn’t. And, you likely review almost every deal in your pipeline.
What would happen if the opportunities in your pipeline were randomly placed in your ticket systems, CRMs, and a smattering of email inboxes? Knowing what was working with sales would get more difficult, if not impossible.
Today, the issues that affect service, product, marketing, etc., are randomly smattered across every customer-facing system in your business. The only way you “know” they happen is if someone else decides they are important enough to log or forward.
How do you get the information you need to make an impact?
Where is the information your product team needs to know?
Where is the information that your pricing team needs to know?
Where is the information that your renewals team needs to know?
If your Product Marketing Manager wants to know how their new pricing plan is working, what would inform that? A pretty good source—I’d argue the best source—of that information is sitting in emails, tickets, and call transcripts. But, if you are a Product Marketing Manager, you don’t have access to tickets, call transcripts, or customer emails.
You’ve been paywalled.
If you want to know what features to fix, there’s a data point in your Support Chatbot. When your Renewals Manager needs information on an account , they need to scroll through tickets and ask a few people, “What’s going on with this account?”
As a result, every business has smart people who rely on other people to log things, categorize things, and forward things. This is why our teams have logins to systems they seldom use - so they can find a “thing” they might need.
The irony is that the information you need to know to do your job effectively is harder to source than the information about things you can’t control.
You probably know the inflation rate. If you don’t, you can discover it in one search.
Your VP of CS probably doesn’t know “What’s the most common source of customer frustration in the last 90 days?” Why? Because that information is splashed across your business in a host of silos that VP can’t access. Imagine trying to do that job, without that answer.
Imagine if that VP could answer that question in one search, using what customers are actually saying to every person in your business.
This paywalling has made our businesses fragile and slow. The hints of the B2B slowdown were arriving at our doorsteps in emails and tickets for months. “We’re cutting costs”. “Procurement wants a discount.” Why didn’t we see this coming? Because we weren’t looking for it, and couldn’t find it.
Time to get faster, and sturdier. You have smart people who can take responsibility. Bust the paywalls and give them information they need to react and act.
Do that hard things,
Steve

How about Ethical Software?
There has been, and should be, a lot of talk about Ethical AI. Over the last several weeks, I have been revising Sturdy’s Ethical AI policy. I am trying to convey that we don’t do shady stuff and won’t let our customers do it, either.
(If you are interested in Ethical AI, we have a webinar coming up at the end of the month; the registration link is in the comments)
Writing the policy, I realized we need to talk about ethics writ large, not just as it relates to AI.
Consider the case of Allstate and Arity, as reported in a June 9 NYT story, “Is Your Driving Being Secretly Scored?” Allstate apparently owns Arity. Arity builds phone apps for things like finding gas stations. Their apps also track how you drive, although they bury that minor detail in their “consent” pages (that no one reads). They then share this data with Allstate.
Not a lot of gray area here. This is unethical.
My co-founder, Joel Passen, coined this mantra at our first startup 20’ish years ago:
“Build what you’d want to use, sell it how you’d want to be sold, and service it how you’d want to be serviced.”
I don’t think anyone downloading a Gas Station finder app wants their driving to be sent to Allstate. I would not. And I would not build it.
So, instead of an “Ethical AI” policy, I’ve decided we need an “Ethical Software Policy”. It will encompass our use of AI, our platform, and how we expect our software to be used.
Here’s a bit of a summary so far…
Sturdy’s Ethical Software Policy (WIP):
- Our product is only be used to improve how businesses make decisions so they can be better vendors to their customers;
- We will not support use cases that do not directly relate to our problem set. The use cases for our product will be obvious;
- We do not have ulterior motives for our customer’s data or their users;
- We will not let any entity, business, government, or person use our product in a way that violates a person’s privacy;
- We will not, nor will we allow our product to score or rank human beings;
- Our product will be engineered to prevent deception and must never be used to deceive people;
- Finally, If we feel that one of our customers is using our product in a way that violates our principles, we will terminate their service.
The problem is that many “Ethical Policies” are only as good as the paper they are written on. They are a checkbox on an RFP. None of us want to live in this world. Maybe it's time to try and live in a better one.
At some point, somewhere along the corporate food chain, executives need to say, “No.”
It is hard to say “no” to revenue. Do the hard things.
Let me know your thoughts.
Steve

Where good (business) ideas die
Years back I had an idea that every time a customer expressed some sort of "love" we would reach out and ask them to be a reference. The way this was supposed to work was that the Support/CS person would forward any happy customer to the marketing team as a "Reference Lead.” Then, marketing would reach out to the customer. Nothing groundbreaking here. If your business doesn't already do this, go ahead and give it a shot. Happy customers close deals for you.
And at the end of the first month, nothing. Why?
Do none of our customers like us?
Did our Support Team drop the ball?
Did the Marketing team drop the ball?
Did the customer refuse?
If you manage groups of people, you can certainly think of other examples.
Like, "Whenever there is a new customer contact, make sure you log it to Salesforce, dang it!"
Or, "Whenever there is a bug report, log it to JIRA."
The reference harvesting failure has stuck with me. It was so simple, yet it failed spectacularly.
I have three takeaways from this that guide me today:
First, in our world of "Knowledge Work" almost every new policy/idea requires a new manual task. Add it to Excel. Track it in CRM. I would say we've built an entire ecosystem centered on digital logging, but it is more like a multiverse. Every silo has its own physics with its own rules and workflows.
Second, every ‘silo-bounce’ increases the failure rate. "Take this thing from Support and log it for the Product Manager so they can recommend it to Engineering." Boing. Boing. Crash. Intersections are more dangerous than freeways.
Finally, whenever you implement a policy, it will fail unless you lean in and check on it regularly, and you probably won't. No coach, no team.
The future will be a much better place for your co-workers and customers.
Artificial Intelligence, after you do the hard things like building integrations, cleaning data, de-duping, creating a UI and then a data-API, will improve your business, your customers, and your life.
There will be no more manual logging. There is no need to ask someone to forward an event.
Your coworkers won't have the soul-sucking task of "logging it if it is important." Your customers won’t email managers, "No one has gotten back to me."
Until that time...
Tomorrow, your team will be assigned a new task to log something for someone else's team. Some people will forget. The other team will be required to read that information. Some people won't do it.
In three months, your CEO will be annoyed. "What ever happened with that one thing I asked for?"
This is one of the reasons my team and I started building Sturdy in 2019. There are too many people logging minutiae so that someone might find the time to read it. There are too many customers that fall through the cracks that could easily be saved. There are too many good ideas that die because of failed execution and lack of accountability.
It doesn't have to be this way.
.png)
Why We Don't Have Nice Things
I have always been fascinated by how product roadmaps are maintained. So much so that I feel it necessary to pen a bombastic screed on the topic.
(As an aside, when you talk to VC’s, they’ll ask, “What’s your {2-5} year roadmap?” I want to say, “Whatever needs to get built,” but I think better of it. Life Pro Tip: use words like, “disintermediate.”
I find there is little utility in years-long product roadmaps. Unless you ignore your users/customers. If you have a team conducting market research to determine what to build and then put it in a 2-year plan, then you’re ignoring your users. If you have a team advocating for your users and having hard conversations with engineering and sales, you are not ignoring your users.
This is why Gmail, 20 years later, still has the attachments at the bottom of the email instead of at the top, where they belong: the revenue team is filling the roadmap with better ways to sell your data. I digress.)
The three drivers of a company’s product roadmap are:
Things users want;
Things your sellers want;
Things your product team/engineers want.
They don’t overlap as often as you might think.
Your users want usability (and probably a ton of user-permissions stuff). They bought your product missing certain features, and they are OK with that. They primarily want your existing stuff to get better, easier to use, and easier to get data from.
Your sellers want new features. They usually want the best feature that your competitors already have.
Your product team is more complicated. Most teams want insane reliability, security, and speed. Teams run by CTO’s aspiring to wear black turtlenecks build their own UI framework from scratch so that the one thing the new thing does will be 1% better at something.
Where do they overlap?
- Your Revenue Teams and Users overlap around UI and reporting. If it looks pretty and has cool reports, it will sell software (1).
- Users and Engineering overlap in the desire for performance and reliability (2).
- Development and Revenue overlap at shiny things (3). When you hear “Minimally Viable Product,” you’ve found it. When you hear “App Store”, or “I took some screenshots,” you’ve found it.
- If you are wondering what happens when they all intersect, I don’t know. I can’t remember all three teams agreeing on a feature.
Your existing customers don’t care about shiny things. But you need to grow revenue, and the CTO is on board, so guess what gets built?
(I would like to say that building shiny things isn’t wholly a bad idea. You need to go for it every now and then. Sometimes, really cool stuff gets built. But, in my experience, that shiny MVP is going to the back of the update line the day it's shipped, and it will suck, forever. Related to this is why your “Admin” area is terrible. Don’t lie, you know it is.)
I have sat in so many board meetings where the CTO presents a roadmap, and the COO/Customer Leader freaks out. I was in an amazing one over a decade ago when the CTO’s priority was “voice enabling the product.”
Everyone blew a gasket.
If your customer falls in the woods, and no one is listening, do they make a sound?
If a user reports a bug or asks for a feature, if someone remembers to do it, it will be manually logged in a drop-down menu in some silo. It’s also probably logged by someone who has no incentive other than to close the ticket as quickly as possible. In other words, if it gets logged, it will be stored somewhere that’s hard to get to, and no one will read it.
If a user is confused, or says something sucks, someone wraps the user in a warm blanket of apologies and moves on. In the worst case scenario, the user will get something like, “that’s actually how we intended it to work!”
(Once, in a design review, a UI team told me they hid a feature because they didn’t want the users to actually use it. It allowed people to opt in to having a paper check instead of a direct deposit. “How many support tickets did this cause last month?” No one knew.)
It takes hard work to know what the customer wants, or hates. It also requires honesty, and a bit of self-flagellation.
I ran into a CxO who wanted AI to “automatically write knowledge base articles.” I hear this as, “Our product is so confusing that we can’t manage the number of questions about how to use it.”
Get honest: fix the product. No one, ever, renewed because of an awesome knowledge base. Good products don’t need AI knowledge bases. They also don’t need churn prediction or quarterly business reviews, but that’s for another time.
To break this cycle, you must be rigorous about logging every feature request, bug, and UI issue. You’ll need to understand why customers are saying, “how do I do this?” and “that’s confusing.”
(Another data point: track when your people apologize. “What are we apologizing for?”)
How will you gather this brutal truth? You need to put someone in charge of collecting data from your 5-50 systems, organizing it by account, and attaching a cost-benefit analysis to each issue. Then put it in a spreadsheet and review it every week with the Revenue, Ops, Customer and Engineering teams. Soon everyone will develop a healthy anxiety about the quality of your product. Saying “no” to shiny things will get easier.
Do this and your customers will like you again.
End rant.
Do the hard things,
Steve

Your customers don’t care about your retention rates
I spoke to an entrepreneur this week, and he said, “This company cut CS by 50% just to see what would happen.”
The same person said, “90% of the companies I talk to are canceling their CSP.”
After a recent merger of two large CSPs, one of their executives posted his resignation on LinkedIn, the TL;DR was that CS has a lot of promise but executive leadership refuses to give it the budget it needs.
CS is approaching a crisis. The root of the problem is retention, and the belief means that only one group ‘owns’ the number.
Why? No matter how much tech or flesh you throw at a retention problem, CS isn’t going to improve it in any meaningful way…alone.
If your Marketing team targets customers who won’t get value from your product and they buy it, what happens?
If your product is confusing, or buggy, or just sucks, what happens?
If your Sales team sells deals with false promises, what happens?
If your onboarding process stinks, what happens?
If your Accounting team pisses people off, what happens?
The answers to the above are obvious. What is not obvious? Which of these problems is afflicting your business right now, as you read this, because each of those issues is in a different system, silo and team.
You aren't paying attention.
No one owns retention. The obsession with retention has led us to ignore what really matters: what makes customers happy, and what does not.
Today, we have the opportunity to automatically discover almost every issue that detracts from customer satisfaction, route it to the right person, and track its resolution. The Marketing VP targets customers who need the product, the Product Team has a customer-led roadmap, the Billing Team realizes that the auto-renewal process does more harm than good, and the CRO learns which sellers are over and under-selling.
When was the last time you heard someone say, “We leave no stone unturned in our quest to resolve every customer issue rapidly and intelligently?”
I have spoken to several executives who say, “I just wouldn’t know what to do with this type of data.” I make a note to never buy their products. They don’t care about customers.
Call me crazy. I want to live in a world where every product or service I buy is awesome. So does everyone else. Focus on being awesome, and you won’t need to worry about retention.
Let’s try to make it a reality together.

You're in the pros
My neighbor asked me to speak with his son (who is not connected here on LI). The son is a mid-market account manager (post-sales) at a large SI (pure services). His remits are expansion/upsell, renewal assistance, and retention/escalation. His book has 30 customers, and its approximate value is just shy of $1mm annually.
He's stuck.
He's stuck at his company. They pay well. His role isn't challenging him anymore. He doesn't want to do pure sales or pure CS work. He is smart. He is motivated to create a career path. Right now, he can't see the forest from the trees.
After 20 minutes, he asked me what he should start, continue, and stop doing. Great question in this context.
Here was my advice. If you know me well, you know it took many more words than LinkedIn will accept in a single post. 😉
🏅 Start thinking of yourself as a professional athlete.
Professional athletes spend +90% of their time preparing for competition. Prepare like a pro for both internal and external meetings. Study your customers and learn everything you can about them. This will prepare you for your account reviews with your leadership. This will help you blow out your KPIs. This will build the foundation of success. Preparation is hard. It's tedious. You will be working harder than ever. Keep doing it. You will not see results for at least 6 mo. Keep going.
💡 Continue asking for help.
Tapping into the expertise and experiences of others is a dying art. New people offer new perspectives. Getting advice will help you learn how other pros have built their careers. As an early/mid-career person, building relationships and networks will serve you well now and in the future. You're defined by the company you keep. Expand your community. It will, eventually, unlock opportunities.
🛑 Stop going through the motions.
Lacking purpose, passion, and interest is a career-advancement death sentence. Most importantly, it leads to dissatisfaction, stagnation, and lack of fulfillment in every aspect of your life. Stop just trying to make your numbers. Kill your number. Stop relying on what got you here. Dig deeper to force yourself to grow. Every day can be the first day of school. You have the power to reinvent yourself every day.
You are in the pros now. Be a pro.

The six attributes that we consistently interview for
There were 453 jobs posted on Indeed in the US for customer success managers in the past 14 days.
On average, companies interview five candidates before making a hiring decision for a mid-level customer success position. That’s a lot of interviews—and time. With productivity being top of mind for customer leaders, new hires, assuming a good fit, will eventually increase capacity, but the process is a body blow to short-term productivity.
Then there is the risk of a bad hire - the real kidney punch. I won’t go into that in this post.
All this hiring is encouraging, and it also got me thinking about how leaders can directly impact the hiring process without all kinds of process changes and wrangling of resources.
Interviews. Ask better questions. Get better information. Make better hiring decisions.
I’ve hired dozens of post-sales people over the years, and here are six attributes that I consistently interview for.
Technical Preparedness: We sold a solution and are now delivering one. Our people must have the chops/cognition to understand complex platforms, workflows, and ecosystems. Additionally, we have to ensure from the get-go that our associates know how to prepare for a solution-oriented meeting with a customer—substance over fluff.
Attention to Detail: Our teammates must be organized, willing to follow processes, and steadfast in capturing data.
Coachability: Ideal candidates will be open and even excited about learning quickly. We look for people who take direction well. We don’t have a long window for ramp. Humility is key.
Sticktoitiveness: Being on the frontline is arduous. Our associates must be able to manage the emotional peaks and valleys.
Work Ethic: Drive is a key value here. We need people who want to work hard while they’re at work consistently and who take pride in the quality of their output.
Resourcefulness: Our teammates need to be hyper-resourceful, diggers of information, and, most of all, intellectually curious so that they can identify root causes.
Note: I haven’t hired a person in the last 20 years without them taking an assessment designed by Gary Kustis There’s nothing like getting another, unbiased data point with which to make a decision. I'm happy to share how and when I use assessments - just message me.
Also, if you're interested in interviewing like I am, check out what my friends Intertru Inc are doing. Unique and effective.
Otherwise, if you want a copy of our full behavioral interview guide for CS, you can grab it here!

The Scary Six: Response Lag
I was speaking to the COO of one of our customers a few weeks back, and he said that Sturdy’s “Response Lag” signal was his “Laptop Smasher.” This signal is defined as a “customer is asking for a status update on an unresolved issue.” If your goal is to make sure your customers feel heard, then it is a bad one.
While not AI-based, the attached regex will help you find some of these messages on your own. If your support or BI system allows you to filter on inbound messages it will provide cleaner results. (There’s quite a bit of contextual difference between a customer asking for an update and one of your people asking a customer for an update).
In most cases, this signal is pretty rare. Typically, it occurs about once per every 1,000 conversations (again, only detecting messages “coming in” from a customer).This signal is important to track for two reasons. The first is that it is almost never self-reported. It is rare for a customer-facing person to say, “Yeah, the customer is upset because I never got back to them.” I am almost certain that you have CS/Support teammates who have a much higher incidence of this signal than your best performers. It also means that you have a grumpy customer that you don’t know about.
The second reason is that Response Lags provide really good data for resolving hidden process or product gaps. If a customer is asking for an update on an issue, it is likely that several other customers are asking about the same thing. Every business is different, but Response Lags will likely indicate that there is a product, process, or person responsible for the plurality of them.
At Sturdy, we use machine learning to track, record, and alert you of Response Lags. We’re also working on some cool stuff that will track the response time of any open issue from any conversation (without requiring a customer to hit the “is this resolved?” button).
How cool would it be to have a dashboard of every “waiting for a response,” email, chat or phone call? We’re working on it.
Give the regex a try, and feel free to DM me with any questions. I hope you don’t smash a laptop. Of course, please regale us in the comments of any learnings you’d like to share on the subject.
Do that hard things,
Steve
.png)
The Scary Six: Executive Change
At the end of last year, I shared a regular expression (regex) that identifies "contract requests." That's a scary signal for people who like to keep customers.
Today, I want to discuss the scariest of the Scary Six, "Executive Change."
At my last company, Newton, this signal had the highest correlation to churn and initially resulted in a loss about 50% of the time (for many of Sturdy's customers, this is also true).
So what is it? Let's say you sell accounting services, and this happens:
"Hi, I am the new CFO, and I would like a quick rundown of your capabilities."
The response is often,
"So nice to meet you! LMK when you have 30 minutes for a quick call!"
(By the way, usage will be high during this time, and their Health Score will be green.)
On to the regex…
The first two are specific to HR services/tech, so replace "hr" with "e-commerce," "accounting," "logistics," or whatever business you're in.
Here's what they do:
1. The first detects when someone says, "Hey, we have a new VP of HR coming on board soon."
2. The second, "I will be taking over the Admin role for this account."
3. The third, "Hey, I wanted to let you know that I will be leaving at the end of January."
Remember that they return a fair number of false positives (FPs). FPs are not included in the churn rate calculation.
The frequency of "Executive Change" varies depending on the industry and segment. In the SMB cohort, it occurs in about .1% to .2% of customer conversations. In huge enterprises, around .04%.
Interestingly, this signal is much more common in the HR space, firing at .3% per conversation.
There is also a lot of variation in the severity. Still, the correlation to cancellation is the 2nd highest of any signal we currently detect at Sturdy ("I want to cancel" being the highest, obviously). For SMB customers, the churn rate for this signal, if untreated, will approach 70%. It will be lower for enterprise customers.
Another critical point is that this is a leading indicator. It often occurs long before the cancellation event.
Why is this signal such a strong indicator? At the beginning of the post, we showed a sample trigger-sequence that ended something like, "Let's do a quick demo!"
What's wrong here? I think it is because one or all of the following is happening:
1. The value of your service can't be communicated in a "quick demo."
2. The new contact has undoubtedly used and trusts a competing solution.
3. The person conducting the demo has not been trained to sell your product, overcome objections, and destroy your competition's product.
This is a perfect recipe for failure. Here's a scenario...
Acme Corp sells HR Software on M2M and yearly contracts; it receives:
10k emails and tickets per month (items).
10k items equals to about 2k conversations (1 convo = ~ 4.4 items)
.3% detection per conversation = 6 Exec Changes
Two false-positive (30% FP rate)
50% churn x 4 = 2 losses
If untreated, Acme loses two customers to this signal per month.
The good news is that, in my experience, treatment will save about one of these customers each month. How?
1. Train everyone who touches customers, billing, CS, and marketing to identify the signal.
2. Immediately send the signal to your sales AND marketing teams.
Someone should attempt to discover the product the new contact used at their former company.
3. A salesperson must schedule a demo as soon as possible. (At Newton, our KPI was to conduct the demo within ten days). The seller should come armed with useful information, like usage data candidates hired (e.g.), and be prepared to sell against the new contact's previous solution.
4. In parallel, the marketing team checks LinkedIn to see if the previous contact has landed a new job. If not, someone should reach out and see if they need help in their job search (after all, you sell to companies that hire these people). If the person has landed somewhere, send them a note, a gift basket, or whatever you think is appropriate.
5. Send the previous contact to the Sales team as an SQL.
(Shameless plug: Sturdy has AI-language models that find 1, automatically route 2, and can tell you if 3 and 4 happened.)
The result of this process is a successful "double-dip". You may save a customer and gain a lead for your sales team. Ironically, if your competition is not tracking the Executive Change signal, your chance of closing that deal is very high.

Your customers are already telling you what's going to happen.
Connect what customers say to why your numbers move. Contextual revenueintelligence, ready for any LLM — or running natively in Ask Sturdy from day one.

