
The Next Phase of AI: From the AMD Conference to Building Local Agents

2026-05-19T00:00:00+00:00

After attending the AMD conference this morning, my biggest takeaway wasn’t about how powerful the hardware is. It was that AMD has bet on something that will be a crucial deployment model for AI’s future.

Over the past year or two, conversations about AI have defaulted to cloud-based large models, API calls, token consumption, and model capability rankings. The implicit assumption is that as long as models keep getting stronger, enterprises just need to plug into a good API and AI transformation will happen naturally. But from my experience building enterprise AI products, reality is far more complex.

What enterprises truly care about isn’t “can I access the most powerful model?” but more fundamental questions: Can my data stay under my control? Can my knowledge base stay local? Can AI truly understand and execute my business processes? Can we control costs? Will the system keep working when the network is unstable?

This is exactly why I believe AMD has chosen the right direction. They aren’t just talking about more compute power. They’re describing a new form of AI infrastructure: local devices, local models, enterprise private knowledge bases, multi-agent orchestration, with cloud capabilities accessed only when necessary. The conference repeatedly emphasized that after the AI demand explosion, the key is no longer who burns the most compute, but who uses it more intelligently. AMD broke down AI development into a combination of laptop, local workstation, and cloud paths — rather than pushing everything to the cloud.

Cloud API Isn’t the Only Path for AI

Many people still think of AI as purely an API model: an enterprise raises a request, the system sends content to a cloud-based large model, the model returns a result, and the business system continues from there.

But this model has several inherent limitations:

First, data security. Especially in China’s enterprise environment, many companies are unwilling to send their core data, customer information, process flows, and internal knowledge bases to external platforms. Even if technical anonymization is possible, it’s hard to accept psychologically and from a compliance standpoint. What many enterprises really want is: AI is usable, but core data stays local.

Second, cost. Today’s cloud-based large models carry significant “capability redundancy”: many business processes don’t need the most powerful model at every step. For tasks like internal knowledge base Q&A, document screening, order information sorting, process status determination, and simple data extraction and classification, what matters is stability, low cost, controllability, and support for high-frequency invocation, not calling the most expensive, most powerful model every time.

Third, network. Many people overlook this when building AI applications, but in real enterprise scenarios, network stability is a major variable. If a business process is highly dependent on remote APIs, then network latency, access failures, and service fluctuations all become part of the business system’s risk profile. For high-frequency, low-latency, privacy-sensitive scenarios, local models are actually more practical.

I increasingly believe the mainstream form of AI won’t be “all cloud” but a hybrid architecture combining cloud-based large models, local models, edge devices, and enterprise private systems. The cloud handles the most complex, general, reasoning-heavy tasks; local models handle high-frequency, low-cost, privacy-sensitive tasks tightly coupled with business processes. The conference also noted that the future is device-cloud collaboration, where local models take on more work in low-cost and privacy scenarios, only calling more advanced cloud models when tasks are complex enough.
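To make the hybrid split concrete, here is a minimal Python sketch of one possible routing rule. Everything in it is an assumption for illustration: the `Task` fields, the complexity score, the 0.7 threshold, and the two model callables are hypothetical, not anything AMD or the conference presented. It keeps privacy-sensitive and simple tasks local, sends only complex tasks to the cloud, and falls back to the local model when the network fails.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Task:
    prompt: str
    privacy_sensitive: bool = False  # if True, the task must stay on local hardware
    complexity: float = 0.0          # hypothetical score in [0, 1], estimated upstream


# Hypothetical cut-off above which a task counts as "complex enough" for the cloud.
CLOUD_COMPLEXITY_THRESHOLD = 0.7


def route(
    task: Task,
    local_model: Callable[[str], str],
    cloud_model: Callable[[str], str],
    threshold: float = CLOUD_COMPLEXITY_THRESHOLD,
) -> Tuple[str, str]:
    """Return (tier, answer) for a task under a local-first policy."""
    # Private or simple tasks never leave the device.
    if task.privacy_sensitive or task.complexity < threshold:
        return "local", local_model(task.prompt)
    try:
        return "cloud", cloud_model(task.prompt)
    except ConnectionError:
        # Network trouble should degrade quality, not stop the business process.
        return "local-fallback", local_model(task.prompt)


# Stub models standing in for real inference backends.
def fake_local(prompt: str) -> str:
    return f"[local] {prompt}"


def fake_cloud(prompt: str) -> str:
    return f"[cloud] {prompt}"


def flaky_cloud(prompt: str) -> str:
    raise ConnectionError("network unavailable")
```

With these stubs, `route(Task("draft a market analysis", complexity=0.9), fake_local, flaky_cloud)` returns the `"local-fallback"` tier instead of raising, which is the behavior the network argument above calls for.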

AMD’s Focus on Local AI in China Is a Very Practical Decision

AMD’s decision to emphasize local AI in China makes perfect sense.

Chinese enterprises have strong demand for AI, but that doesn’t mean they’ll unconditionally embrace cloud APIs. On the contrary, once enterprises enter real business deployment, they become very concerned about data asset ownership, knowledge base localization, system controllability, and long-term costs.

This aligns closely with my own experience building enterprise AI products. It’s not that enterprises don’t want to use AI — it’s that they struggle to accept a completely black-box, fully externally dependent, continuously token-consuming system that keeps sending core data outward. This becomes even more pronounced when AI starts entering internal business processes rather than just acting as a chat assistant.

Because once AI truly participates in business operations, it’s no longer just answering questions — it needs to understand the company’s customers, products, processes, rules, historical data, and internal experience. At that point, what’s most valuable to an enterprise isn’t “which model they called,” but whether they’ve structured their data and processes into assets that AI can invoke.

Kai-Fu Lee’s comments at the conference about enterprise AI transformation pointed to the same thing: a company’s true competitiveness must ultimately be held in its own hands. Only when AI can participate in real work within a local environment — visible, controllable, and manageable — will enterprises entrust core business to it.

I deeply agree with this. The key to enterprise AI transformation isn’t purchasing an AI tool or launching a chat interface. It’s about gradually transforming the knowledge scattered across documents, spreadsheets, meetings, personnel experience, and business systems into digital assets that agents can understand, invoke, execute, and provide feedback on.

Not Every Task Needs the Most Powerful Model

In the past, discussions about AI defaulted to “the stronger the model, the better.” But entering real enterprise scenarios reveals this assumption doesn’t always hold.

Enterprises have many tasks that are fundamentally structured execution rather than open-ended creation. Examples include recognizing which table a form should be imported into, determining whether a customer requirement is complete, checking if an order is missing fields, organizing action items from meeting minutes, or advancing a process step based on existing workflows. These tasks certainly need AI, but not necessarily the most powerful large model.

Often, what enterprises really need is a local model that’s stable enough, cheap enough, and close enough to business data. This model doesn’t need to know everything about the world, but it needs to deeply understand the company’s knowledge base, process rules, and business context. It doesn’t have to deliver stunning answers every time, but it must be able to embed into business flows consistently, reliably, and at low cost.

This is the true value of local AI. It’s not about directly competing with cloud-based large models. It’s about taking on the work that cloud models aren’t suited for. The more rational future architecture would have local models handling most high-frequency, repetitive, privacy-sensitive, and process-oriented tasks, with cloud models intervening only when complex reasoning, cross-domain judgment, or high-quality generation is needed.

This way, the AI system is no longer an “every request goes to the cloud” application, but more like a layered intelligence system. The bottom layer has local models and agents responsible for understanding internal enterprise data and processes; the top layer has more powerful cloud models handling a minority of complex problems. What enterprises truly need to build isn’t point-solution model capability, but a long-term evolvable agent orchestration system.

Enterprise AI Transformation Must Start with Data Structuring

From an enterprise perspective, the future competitive differentiator may not be “whether you’ve adopted AI,” but “whether you’ve structured your business into assets AI can use.”

Enterprises need to gradually structure their knowledge bases, customer feedback, product information, business processes, historical projects, expert experience, and internal decision-making logic. Otherwise, even with the most powerful large model, it can only answer questions based on external general knowledge — it can’t truly understand why a particular company operates the way it does.

This is the biggest obstacle in many enterprise AI projects. Not that the model isn’t powerful enough, but that the enterprise’s own data isn’t ready, processes haven’t been clearly articulated, and business rules still exist only in people’s experience. AI can generate content, but it can’t magically understand an organization’s tacit knowledge.

So when enterprises embrace multi-agent systems, they shouldn’t start with “I want to build many agents.” They should start with “what knowledge and processes do I have worth structuring?” An agent without data, tools, process interfaces, and a feedback loop is just a more sophisticated chatbot. Only when enterprises structure their digital assets can agents truly participate in business execution.

I believe enterprise AI transformation will increasingly resemble organizational engineering. Not placing AI alongside existing processes, but rethinking which processes can be decomposed, which judgments can be assisted by models, which tools can be exposed to agents, which results need human review, and which feedback can feed back into system updates.

Throughout this process, CEO or top management push is critical. AI transformation isn’t a small tool upgrade for the IT department. It touches data, processes, organizational collaboration, and management methods. Without top-down impetus, it’s easy to remain in the trial phase, ending up with a few scattered AI plugins rather than a system that truly changes how the enterprise operates.

Engineers No Longer Just Deliver Code

Another point that resonated with me at the conference was the changing role of engineers.

In the past, an engineer’s core deliverable was code. Product managers raised requirements, engineers implemented them, QA verified, and user feedback fed into the next iteration cycle. But in the AI era, this division of labor is blurring. Since AI can participate in code generation, an engineer’s value can’t remain limited to “writing code.” It must move upstream to problem definition, system design, business understanding, and product feedback loops.

Future engineers will need to use engineering thinking to participate in earlier-stage work. That means asking why a market trend matters, what users truly need, which requirements can be systematically expressed, which processes can be decomposed into agent-executable tasks, where human review must be preserved, and which feedback should enter the next iteration.

In other words, engineers no longer just deliver code. They deliver a product system that can run, be verified, and be iterated upon. This system includes not only frontend pages and backend APIs, but also data structures, AI tool invocation, agent orchestration, testing and verification, user feedback, and continuous optimization.

This aligns with what I’ve been thinking about recently. AI-assisted programming doesn’t mean engineers just let agents auto-generate code. What truly matters is that engineers systematize their workflows — organizing requirements analysis, solution design, code implementation, testing verification, and feedback iteration into a stable loop. AI can improve each phase’s efficiency, but the prerequisite is that people must first clarify the process.

Individuals Should Also Structure Their Skills

If enterprises need to structure their data and processes into agents, the same applies to individuals.

I increasingly believe that everyone will need to structure their knowledge, experience, judgment methods, and workflows into a Skill. This Skill isn’t just a prompt or an automation script. It’s a reusable working methodology.

How an engineer analyzes requirements, breaks down tasks, makes technical choices, does code reviews, designs tests, and handles user feedback — all of these can gradually be abstracted. In the past, these capabilities only existed in people’s minds — hard for others to replicate, hard for oneself to scale. But with the help of AI agents, this experience can be encapsulated into an executable, callable, iterable system.

This leads to a fascinating future vision: work relationships may no longer be about hiring someone’s time, but about hiring the agent system someone has trained and maintained over the long term.

The person still matters. Direction-setting, critical decisions, quality reviews, and value judgments still need human responsibility. But a large amount of specific execution work can be handed to the local agent they’ve built. A person might no longer spend eight hours on repetitive work, but instead spend one or two hours checking results, adjusting direction, updating their Skill, and continuing to optimize their agent based on new project experience.

From this perspective, how individual capability is expressed will change. In the past, we used resumes to prove what we’ve done and interviews to prove what we know. In the future, perhaps a person’s truly valuable asset is whether they have a long-refined workflow and whether they can turn their experience into a running agent.

The Future of Compute May Be More Decentralized

If we look further ahead, AI compute may also become more decentralized.

Today’s AI infrastructure is largely concentrated in the cloud, with large model companies and cloud providers supplying compute, models, and APIs. But if local devices become increasingly powerful, local models increasingly capable, and agent orchestration increasingly mature, then not all intelligence needs to be concentrated on a few cloud platforms.

Everyone can have their own local agent. Every company can have its own enterprise agent network. An organization doesn’t have to send all tasks to a single centralized model. Instead, it can combine many small, specialized agents with specific experience. What employers may need isn’t infinite cloud compute, but the ability to orchestrate agents from different people, different departments, and different business systems.

This isn’t to say cloud large models aren’t important. On the contrary, they remain very important — handling more complex, general, and higher-level reasoning tasks. But local agents will take on more and more specific, daily, personalized, process-oriented work. The future AI world may not be one super brain doing everything for everyone, but countless local intelligences forming a more distributed work network.

This is why I think AMD’s path has value. They’re not just talking about hardware performance. They’re providing infrastructure for this distributed, localized, hybrid AI paradigm. If the future truly sees a large number of local models, local agents, enterprise private knowledge bases, and on-device inference, then hardware, software stacks, and development ecosystems must reorganize around this change.

Conclusion

So, after attending this AMD conference, I’m more certain than ever: the next phase of AI isn’t just about models getting bigger or API calls getting cheaper. It’s about individuals and enterprises beginning to structure their knowledge, processes, and data into running agents.

For enterprises, what truly matters isn’t which AI tool they bought, but whether they’ve turned their business data, industry knowledge, and process rules into digital assets, and built controllable, orchestratable, feedback-driven multi-agent systems on that foundation.

For individuals, what truly matters isn’t whether they know how to use a particular AI tool, but whether they can structure their experience, judgment, and working methods into Skills, and further into their own local agents.

In this sense, AMD’s local AI story isn’t just about hardware. What it really points to is a structural shift: AI moving from cloud capability to local capability, from general models to enterprise processes, from one-time invocation to long-term structuring. Future competition may not be about the ability to call AI, but about the ability to own, train, and continuously update your own agents.

从 AMD 大会看 AI 的下一阶段：把知识沉淀成本地 Agent

2026-05-19T00:00:00+00:00

上午去 AMD 大会凑热闹之后，我最大的感受不是硬件有多强，而是 AMD 这次押中的，其实是 AI 未来非常重要的一种部署方式。

过去这一两年，大家谈 AI，默认想到的是云端大模型、API 调用、token 消耗和模型能力排行榜。好像只要模型越来越强，企业只需要接入一个足够好的 API，AI 转型就能自然发生。但从我自己做企业 AI 产品的经验来看，现实远没有这么简单。

企业真正关心的，往往不是”我能不能调用到最强的大模型”，而是另外几个更朴素的问题：我的数据能不能留在自己手里？我的知识库能不能沉淀在本地？我的业务流程能不能被 AI 真正理解和执行？成本能不能控制？网络不稳定时系统还能不能继续工作？

这也是为什么我觉得 AMD 这次的方向选得非常对。它不是在讲更强的算力，而是在讲一种新的 AI 基础设施形态：本地设备、本地模型、企业私有知识库、多 Agent 编排，以及必要时再调用云端能力。大会里反复提到，AI 需求爆发之后，关键不再是谁烧掉最多算力，而是谁能更聪明地使用算力；AMD 也把 AI 开发拆成从 laptop、本地 workstation 到 cloud 的组合路径，而不是把所有任务都推到云端。

AI 不会只有云端 API 一条路

现在很多人对 AI 的理解，仍然停留在 API 模式上。企业提出一个需求，系统把内容发给云端大模型，大模型返回结果，然后业务系统再继续往下走。

但这个模式有几个天然限制：

第一是数据安全。尤其是在中国的企业环境里，很多企业并不愿意把自己的核心数据、客户资料、工艺流程、内部知识库全部发到外部平台。哪怕技术上可以脱敏，心理上和合规上也很难完全接受。很多企业真正想要的是：AI 可以用，但核心数据最好还是留在本地。

第二是成本。现在云端大模型的算力其实存在很大的”能力冗余”。很多企业业务流程并不需要每一步都调用最强模型。比如内部知识库问答、文档初筛、订单信息整理、流程状态判断、简单的数据抽取和分类，这些任务更看重稳定、低成本、可控和高频调用，而不是每一次都要使用最贵、最强的模型。

第三是网络。很多人做 AI 应用时容易忽略这一点，但在真实企业场景里，网络稳定性本身就是一个很大的变量。如果一个业务流程高度依赖远程 API，那么网络延迟、访问失败、服务波动都会变成业务系统的一部分风险。对于一些高频、低延迟、强隐私的场景，本地模型反而更符合实际需求。

所以我越来越觉得，未来 AI 的主流形态不会是”全部云端化”，而是云端大模型、本地模型、边缘设备和企业私有系统之间的混合架构。云端负责最复杂、最通用、最需要强推理的任务；本地模型负责高频、低成本、隐私敏感、和业务流程结合紧密的任务。大会中也提到，未来会是设备与 cloud 协作，本地模型在低成本和隐私场景中承担更多工作，只有任务足够复杂时才调用更高级的云端模型。

AMD 选择在中国讲 local AI，是一个很现实的判断

我觉得 AMD 选择在中国强调 local AI，是非常正确的。

中国企业对 AI 的需求很强，但这种需求并不等于所有企业都会无条件接受云端 API。恰恰相反，很多企业一旦进入真实业务落地，就会非常在意数据资产的归属、知识库的本地化、系统的可控性，以及长期成本。

这和我自己做企业 AI 产品的感受非常一致。企业并不是不想用 AI，而是它们很难接受一种完全黑盒、完全外部依赖、持续消耗 token、核心数据还要不断外传的系统。尤其当 AI 开始进入企业内部流程，而不只是做一个聊天助手时，这个问题会变得更加明显。

因为 AI 一旦真正参与业务，它就不再只是回答问题，而是要理解企业的客户、产品、流程、规则、历史数据和内部经验。这个时候，企业最有价值的东西其实不是”调用了哪个模型”，而是自己有没有把数据和流程沉淀成可被 AI 调用的资产。

大会里李开复提到的企业 AI 转型，其实也指向这一点：企业真正的竞争力最终必须掌握在自己手里，只有当 AI 能在本地环境中参与真实工作，并且保持可见、可控、可管理，企业才会放心把核心业务交给 AI。

这句话我非常认同。因为企业 AI 转型的关键，不是采购一个 AI 工具，也不是上线一个聊天框，而是要把企业内部原本分散在文档、表格、会议、人员经验和业务系统里的知识，逐渐变成可以被 Agent 理解、调用、执行和反馈的数字化资产。

不是所有业务都需要最强的大模型

过去大家讨论 AI，经常会默认模型越强越好。但真正进入企业场景之后，会发现这个判断并不总是成立。

企业里有很多任务，本质上不是开放式创造，而是结构化执行。比如识别一份表格应该导入到哪个数据表，判断一个客户需求是否完整，检查一份订单是否缺少字段，整理一段会议纪要里的待办事项，或者根据已有流程推进下一步。这些任务当然需要 AI，但未必需要最强的大模型。

很多时候，企业更需要的是一个足够稳定、足够便宜、足够靠近业务数据的本地模型。这个模型不一定要知道全世界的知识，但它需要非常了解这家公司的知识库、流程规则和业务语境。它不一定每次都给出惊艳的答案，但它要能持续、可靠、低成本地嵌入到业务流里。

这也是 local AI 的真正价值。它不是为了和云端大模型正面对抗，而是承担云端大模型不适合承担的那部分工作。未来更合理的架构，应该是本地模型处理大部分高频、重复、隐私敏感和流程化任务，云端大模型只在需要复杂推理、跨领域判断或者高质量生成时介入。

这样一来，AI 系统就不再是一个”每次都向云端发请求”的应用，而更像是一个分层的智能系统。底层有本地模型和本地 Agent，负责理解企业内部数据和流程；上层有更强的云端模型，负责处理少数复杂问题。企业真正要建设的，不是单点模型能力，而是一套可以长期演进的 Agent 编排体系。

企业 AI 转型必须从数据沉淀开始

如果从企业角度看，未来的竞争重点可能不是”有没有接入 AI”，而是”有没有把自己的业务沉淀成 AI 可以使用的资产”。

企业需要把自己的知识库、客户反馈、产品信息、业务流程、历史项目、专家经验、内部决策逻辑逐步结构化。否则即使用了最强的大模型，它也只能基于外部通用知识回答问题，很难真正理解这家企业为什么这样运转。

这也是很多企业 AI 项目落地时最大的阻力。不是模型不够强，而是企业自己的数据还没有准备好，流程还没有被清晰表达，业务规则还停留在人的经验里。AI 可以生成内容，但它很难凭空理解一个组织的隐性知识。

所以企业拥抱 Multi-Agent，不应该从”我要做很多 Agent”开始，而应该从”我有哪些知识和流程值得沉淀”开始。一个 Agent 如果没有数据、没有工具、没有流程接口、没有反馈闭环，它就只是一个更复杂的聊天机器人。只有当企业把自己的数字化资产沉淀下来，Agent 才能真正参与业务执行。

我觉得未来企业的 AI 改造，会越来越像一次组织工程化改造。不是把 AI 放在现有流程旁边，而是重新思考哪些流程可以被拆解，哪些判断可以被模型辅助，哪些工具可以暴露给 Agent，哪些结果需要人来 review，哪些反馈可以反过来继续更新系统。

在这个过程中，CEO 或企业高层的推动非常重要。因为 AI 转型不是某个技术部门的小工具升级，而是会触碰到数据、流程、组织协作和管理方式。如果没有自上而下的推动，很容易停留在试用阶段，最后变成几个零散的 AI 插件，而不是一套真正改变企业运行方式的系统。

工程师交付的不再只是代码

大会里还有一个点我很有感触，就是工程师角色的变化。

过去工程师的核心交付物往往是代码。产品经理提出需求，工程师负责实现，测试负责验证，用户反馈再进入下一轮迭代。但在 AI 时代，这个分工会越来越模糊。因为 AI 可以参与代码生成，工程师的价值就不能只停留在”写代码”本身，而要更多前移到问题定义、系统设计、业务理解和产品闭环里。

将来的工程师，需要用工程化思维参与更早期的工作。比如市场热点为什么重要，用户真实需求是什么，哪些需求可以被系统化表达，哪些流程可以拆成 Agent 能执行的任务，哪些地方必须保留人工审核，哪些反馈需要进入下一轮迭代。

也就是说，工程师不再只是交付代码，而是要交付一个可以运行、可以验证、可以迭代的产品系统。这个系统不仅包括前端页面和后端接口，也包括数据结构、AI 工具调用、Agent 编排、测试验证、用户反馈和持续优化。

这和我自己最近一直在思考的方向也很一致。AI 编程并不意味着工程师只需要让 Agent 自动写代码。真正重要的是，工程师要把自己的工作流工程化，把需求分析、方案设计、代码实现、测试验证、反馈迭代这些环节组织成一个稳定的闭环。AI 可以提高每个环节的效率，但前提是人要先把流程想清楚。

个人也应该沉淀自己的 Skill

如果企业需要把自己的数据和流程沉淀成 Agent，那么个人其实也一样。

我现在越来越觉得，每个人未来都需要把自己的知识、经验、判断方式和工作流程沉淀成一种 Skill。这个 Skill 不只是一个提示词，也不只是一个自动化脚本，而是一套可以复用的工作方法。

比如一个工程师如何分析需求，如何拆分任务，如何判断技术选型，如何做 code review，如何设计测试，如何处理用户反馈，这些其实都可以逐步被抽象出来。过去这些能力只存在于人的脑子里，别人很难复制，自己也很难规模化。但在 AI Agent 的帮助下，这些经验有机会被封装成一个可执行、可调用、可迭代的系统。

这会带来一个很有意思的未来想象：未来的工作关系，可能不只是雇佣一个人的时间，而是雇佣这个人长期训练和维护出来的一套 Agent 能力。

一个人仍然很重要。因为方向判断、关键决策、质量 review、价值取舍，仍然需要人来负责。但大量具体执行工作，可以交给这个人沉淀出来的本地 Agent。这样一个人每天可能不再需要花八小时处理重复性工作，而是花一两个小时检查结果、调整方向、更新自己的 Skill，并根据新的项目经验继续优化 Agent。

从这个角度看，个人能力的表达方式会发生变化。过去我们用简历证明自己做过什么，用面试证明自己会什么。未来也许一个人真正有价值的资产，是他有没有一套经过长期打磨的工作流，能不能把自己的经验变成可以运行的 Agent。

未来的算力可能会更加去中心化

如果继续往远处想，AI 算力也可能会变得更加去中心化。

今天的 AI 基础设施很大程度上集中在云端，由大型模型公司和云厂商提供算力、模型和 API。但如果本地设备越来越强，本地模型越来越可用，Agent 编排越来越成熟，那么未来未必所有智能都要集中在少数云端平台上。

每个人可以有自己的本地 Agent，每家公司可以有自己的企业 Agent 网络。一个组织不一定要把所有任务都交给一个巨大的中心化模型，而是可以把很多小的、专业的、带有具体经验的 Agent 组合起来。雇主需要的，可能不是无限大的云端算力，而是把不同人的 Agent、不同部门的 Agent、不同业务系统里的 Agent 编排到一起。

这并不是说云端大模型不重要。相反，云端大模型仍然会非常重要，它会承担更复杂、更通用、更高阶的推理任务。但本地 Agent 会承担越来越多具体、日常、个性化、流程化的工作。未来的 AI 世界，可能不是一个超级大脑替所有人做事，而是无数个本地智能体共同组成一个更加分布式的工作网络。

这也是我觉得 AMD 这条路径有价值的地方。它不是只在讲硬件性能，而是在为这种分布式、本地化、混合式的 AI 形态提供基础设施。如果未来真的会出现大量本地模型、本地 Agent、企业私有知识库和设备端推理，那么硬件、软件栈和开发生态都必须围绕这种变化重新组织。

结语

所以，参加完这次 AMD 大会之后，我更确定了一件事：AI 的下一阶段，不只是模型继续变大，也不只是 API 调用继续变便宜，而是个人和企业都要开始把自己的知识、流程和数据沉淀成可以运行的 Agent。

对企业来说，真正重要的不是买了哪个 AI 工具，而是有没有把自己的业务数据、行业知识和流程规则变成数字资产，并在此基础上建立可控、可编排、可反馈的 Multi-Agent 系统。

对个人来说，真正重要的也不是会不会使用某一个 AI 工具，而是能不能把自己的经验、判断和工作方法沉淀成 Skill，进一步形成属于自己的本地 Agent。

从这个意义上说，AMD 这次讲的 local AI，不只是一个硬件故事。它背后真正指向的，是 AI 从云端能力走向本地能力，从通用模型走向企业流程，从单次调用走向长期沉淀的一次结构性变化。未来的竞争，可能不只是调用 AI 的能力，而是拥有、训练和持续更新自己 Agent 的能力。

From Tools to Capabilities: Why SaaS Is Disappearing

2026-04-26T00:00:00+00:00

I recently read Amazon CEO Andy Jassy’s shareholder letter, where he mentioned that Amazon’s 2026 capital expenditure is projected at around $200 billion, with a significant portion going to AI infrastructure. OpenAI alone has signed long-term contracts exceeding $100 billion and has begun cutting SaaS budgets.

Many interpret this as a “bold bet on the future of AI.” But from another perspective, it’s more like a monopoly game — using capital scale to push the barrier to entry for AI infrastructure to a level that the vast majority of small and medium enterprises can never reach. This means the demand for AI infrastructure isn’t something that will emerge in the future — it has already been locked in. Once computing power, models, and platforms are locked up by a handful of companies through capital scale, the barrier to entry at this layer becomes so high that most companies simply cannot participate.

The problem is that this change will propagate directly up the stack, from the infrastructure layer to the application layer. For the SaaS industry, this isn’t distant news. It’s an existential threat right at the doorstep.

The SaaS Business Model Is Collapsing

Many describe the SaaS predicament as “value anchors drifting” — customers are no longer satisfied with “renting software to use themselves”; they want experiences reshaped by AI. This characterization is too gentle. The reality is that the middle layer that SaaS relies on for survival is collapsing.

For the past two decades, SaaS business models were built on this logic: enterprises need someone to help move useful software to the cloud and continuously maintain, update, and integrate it. SaaS companies are this “middleman,” earning “connection value.”

But now, this layer of connection is disappearing because the upstream logic has changed. Hyperscalers like AWS and Azure are no longer just providing infrastructure — they’re starting to offer AI capabilities, agent capabilities, and even directly handle business logic. Meanwhile, downstream user logic is also changing: they want to simply state their needs and have them resolved, without logging in, configuring, or learning entire workflows. Users are bypassing SaaS entirely, solving problems directly with ChatGPT or various agents, and can even have agents operate their computers to complete tasks that previously required SaaS.

Document organization, spreadsheet generation, statistical analysis — these typical SaaS scenarios are being rapidly absorbed. SaaS companies caught in the middle suddenly find themselves losing their raison d’être in a squeeze from both sides.

This is also why many SaaS companies’ first reaction — adding an AI chatbox to their existing product — is almost destined to fail. Because a chatbox doesn’t change the essence of SaaS. It’s still a tool that requires people to log in, click, configure, and learn. What customers really want is “things get done without logging into anything.” When behavioral patterns change, optimizing the existing product form becomes meaningless. I’ve previously analyzed in detail how to approach agent-oriented software development.

“Going Back to a Blank Slate” Is a Giant’s Privilege

Jassy repeatedly emphasized in his letter that true leaders must dare to “go back to the beginning and rethink from a blank slate.” He cited examples like the Bedrock team rewriting their core engine in 76 days and Alexa+ completely restructuring its brain. These words sound inspiring, but for most SaaS companies, “going back to a blank slate” is essentially suicide.

How could a SaaS company with stable revenue, hundreds of customer contracts, a fixed team structure, and cash flow pressure possibly reinvent itself overnight? With those commitments in place, it can’t truly “tear everything down and start over.”

Three Pragmatic Paths Forward

Since breaking the giants’ monopoly is nearly impossible, what can small and mid-sized SaaS companies do?

Become a “Component Supplier” for the Agent Ecosystem

Rather than trying to build an all-powerful AI assistant, break yourself down into atomic capabilities that agents can call. Let other agents access your data, your rules, and your industry knowledge through APIs or CLIs, rather than having humans log into your web interface. The CLI is essentially the “business interface” of the AI era. When invocation becomes mainstream, the product itself is no longer the unit of delivery — capability is.
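As a rough illustration of what an “atomic capability” behind a CLI could look like, here is a small Python sketch. The tool name `orderkit`, the `check-order` subcommand, and the required-field rule are all hypothetical. The point is only the shape: one narrow business rule, machine-readable JSON on stdout, and an exit code an agent can branch on.

```python
import argparse
import json

# Hypothetical business rule: the fields every order must carry.
REQUIRED_FIELDS = ("customer_id", "sku", "quantity")


def check_order(order: dict) -> dict:
    """Report which required fields are absent or empty in an order."""
    missing = [name for name in REQUIRED_FIELDS if order.get(name) in (None, "")]
    return {"ok": not missing, "missing_fields": missing}


def main(argv=None) -> int:
    """CLI entry point; returns 0 when the order is complete, 1 otherwise."""
    parser = argparse.ArgumentParser(
        prog="orderkit", description="Order rules exposed as callable capabilities."
    )
    commands = parser.add_subparsers(dest="command", required=True)
    check = commands.add_parser("check-order", help="validate required order fields")
    check.add_argument("--json", dest="payload", required=True,
                       help="the order as a JSON object")
    args = parser.parse_args(argv)

    result = check_order(json.loads(args.payload))
    print(json.dumps(result))  # structured output for the calling agent
    return 0 if result["ok"] else 1
```

Calling `main(["check-order", "--json", '{"sku": "A-1"}'])` prints a JSON object listing `customer_id` and `quantity` as missing and returns exit code 1. An agent can consume that without ever touching a web interface.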

Hide in Niches Too Deep for Giants to Bother With

General scenarios — CRM, HR, project management, finance — will almost certainly be swept away by large platforms using AI. But vertical domains requiring deep industry know-how, complex compliance, relationship-based delivery, and localized service are areas where giants typically won’t or can’t go deep.

Future opportunities for small SaaS may not lie in “better project management tools” but in “tools that better understand filing processes,” “tools that better understand niche tax or labor laws,” or “tools that better understand specific manufacturing quality inspection standards.” This isn’t a sexy path, but it might be the only realistic one. After all, no matter how powerful AI becomes, it still needs to be fed data — and truly valuable data often hides in these extremely narrow, deeply specialized scenarios.

Become the “Requirement Translation + Engineering Execution” Middle Layer

Users often don’t know what they truly need. What they express are vague, outcome-oriented requirements rather than directly executable instructions. There still needs to be a layer of “translation” in between — converting natural language and business objectives into structured processes, data, and constraints.

At the same time, many problems won’t disappear with AI’s emergence — they’ll become even more complex: how to unify data formats, how to connect processes, how to control permissions, how to ensure security and compliance. These are all highly engineering-intensive problems. In other words, even if agents can directly complete tasks, the systems behind those tasks still need to be built and maintained.

This set of capabilities is where software companies might truly find their place in the future.
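One way to picture the output of that translation layer is a structured task specification. The Python sketch below is only an assumption about what such a structure might contain (a goal, declared inputs, ordered steps, and constraints); the class and field names are invented for illustration. The small check at the end shows the kind of engineering validation that still has to exist even when an agent executes the steps.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Step:
    name: str
    needs: List[str]                     # data fields this step reads
    requires_human_review: bool = False  # keep a person in the loop here


@dataclass
class TaskSpec:
    goal: str                            # the user's outcome-oriented request
    inputs: List[str]                    # data the system is allowed to use
    steps: List[Step] = field(default_factory=list)
    constraints: List[str] = field(default_factory=list)

    def undeclared_inputs(self) -> List[str]:
        """Fields that some step reads but the spec never declared as input."""
        declared = set(self.inputs)
        return sorted({n for s in self.steps for n in s.needs if n not in declared})


# Hypothetical translation of "remind customers whose invoices are overdue".
spec = TaskSpec(
    goal="Remind customers whose invoices are overdue",
    inputs=["invoice_id", "due_date", "amount"],
    steps=[
        Step("find_overdue", needs=["invoice_id", "due_date"]),
        Step("draft_reminder", needs=["amount", "customer_email"]),
        Step("send_reminder", needs=["customer_email"], requires_human_review=True),
    ],
    constraints=["never contact a customer more than once per week"],
)
```

Here `spec.undeclared_inputs()` returns `["customer_email"]`, flagging that the process depends on data nobody has granted access to yet, which is a permissions and data-format question rather than a model question.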

Who Gets to Stay at the Table?

So the question isn’t really whether SaaS will be replaced by AI, but whether you’re still understanding SaaS the old way. If today you still see yourself as a “software company” and still believe that the product and technology themselves are moats, the problem has already emerged. Because technology is becoming part of the infrastructure — encapsulated and invoked, not purchased.

Software won’t disappear, but its form is changing. Past software was a tool for people to use; future software is a capability to be called.

从工具到能力：SaaS 的位置正在消失

2026-04-26T00:00:00+00:00

前段时间读到亚马逊 CEO Andy Jassy 今年的股东信，信里提到 Amazon 2026 年的资本开支预计约 2000 亿美元，其中很大一块砸向 AI 基础设施。OpenAI 一家就签了超过 1000 亿美元的长期合同，并开始削减 SaaS 方向的预算。

很多人把这个解读为”豪赌 AI 未来”。但换个角度看，它更像是一场垄断游戏——用资本规模把 AI 基础设施的参与门槛拉到绝大多数中小企业永远无法企及的高度。意味着 AI 基础设施的需求，不是在未来才会出现，而是已经被提前锁定了。当算力、模型和平台被少数几家公司用资本规模锁住之后，这一层的参与门槛，就已经高到绝大多数公司无法进入了。

问题是，这种变化，会直接往上层传导。这对 SaaS 行业来说，不是远在天边的新闻，而是近在眼前的生存威胁。

SaaS的商业模式在坍塌

很多人对 SaaS 困境的描述是”价值锚点在漂移”——客户不再满足于”租一套软件自己用”，他们要的是被 AI 重塑过的体验。这种说法太温和了。实际情况是，SaaS 赖以生存的中间层正在坍塌。

过去二十年，SaaS 的商业模式建立在这样一个逻辑上：企业需要有人帮他们把好用的软件搬到云端，并且持续维护、更新、集成。SaaS 公司就是这个”中间人”，赚取的是”连接价值”。

但现在，这一层连接正在消失，因为上层的逻辑发生了变化。AWS、Azure 这些 hyperscaler，不再只是提供基础设施，而是开始直接提供 AI 能力、agent 能力，甚至可以直接承接业务逻辑。而下层的用户逻辑也在变化，他们想要的是直接说出需求，事情就被解决，不需要登录、配置、学习整个流程。用户开始绕过 SaaS，直接用 ChatGPT 或各种 agent 解决问题，甚至可以直接让 agent 去操作自己的电脑，完成原本需要 SaaS 的工作。

文档整理、表格生成、统计分析，这些曾经是 SaaS 的典型场景，现在正在被快速吞掉。夹在中间的 SaaS 公司，突然发现自己在一场两头挤压中失去了存在的必要性。

这也是为什么很多 SaaS 公司的第一反应——在现有产品里加一个 AI 聊天框——几乎注定失败。因为聊天框没有改变 SaaS 的本质。它仍然是一个需要人登录、点击、配置、学习的工具。而客户真正想要的，是”不用登录任何东西，事情就被解决了”。当行为模式发生变化的时候，原有产品形态的优化，是没有意义的。我在之前的文章中也详细分析过应该怎么做面向agent设计的软件开发。

“回到白纸”是一种巨头特权

Jassy 在信里反复强调，真正的领导者要敢于“回到起点，从一张白纸开始重想”。他举了 Bedrock 团队 76 天重写核心引擎、Alexa+ 完全重构大脑的例子。这些话听起来振奋人心，但对大多数 SaaS 公司来说，“回到白纸”基本等于自杀。

一家已经有稳定收入、有成百上千客户合同、有固定团队分工的 SaaS 公司，怎么可能一夜之间把自己推翻重来？你有客户、有合同、有团队、有现金流压力。你不可能真的”推倒重来”。

三条务实的出路

既然巨头垄断难以打破，那中小型的SaaS公司还能做什么？

做 agent 生态的”零件供应商”

与其试图做一个全能的 AI 助手，不如把自己拆成 agent 可以调用的原子化能力。让其他 agent 通过 API 或 CLI 调用你的数据、你的规则、你的行业知识，而不是让人类登录你的网页界面。CLI 本质上是 AI 时代的”商业接口”。当调用成为主流，产品本身就不再是交付单位，能力才是。

躲进巨头不愿深耕的非常垂的领域

通用场景——CRM、HR、项目管理、财务——几乎肯定会被大平台用 AI 横扫。但那些需要深度行业 know-how、复杂合规、人情交付、本地化服务的垂直领域，巨头通常不愿或不能深耕。

未来的小 SaaS 机会可能不在“更好的项目管理工具”，而在“更懂申报流程的工具”、“更懂某个小众税法或劳工法的工具”、“更懂特定制造业质检标准的工具”。这不是性感的路线，但可能是唯一现实的路线。毕竟，AI 再强，也需要喂数据；而真正有价值的数据，往往藏在这些极窄、极深的场景里。

做”需求转译 + 工程化承接”的中间层

用户很多时候，并不知道自己真正需要什么。他们说出来的是模糊的需求，是结果导向的表达，而不是可以直接执行的指令。这中间仍然需要一层”转译”——把自然语言、业务目标，转成结构化的流程、数据和约束。

同时，很多问题也不会因为 AI 的出现而消失，反而会变得更复杂：数据格式如何统一，流程如何衔接，权限如何控制，安全与合规如何保证。这些都是高度工程化的问题。也就是说，即使 agent 可以直接完成任务，任务背后的系统仍然需要被构建和维护。

这部分能力，才是未来软件公司真正可能留下来的位置。

谁还能留在牌桌上？

所以问题其实不在于SaaS 会不会被 AI 取代，而在于是不是还在用过去的方式理解 SaaS。如果今天还把自己当成一个”做软件的公司”，还相信产品和技术本身是护城河，那问题就已经出现了。因为技术正在变成基础设施的一部分，被封装、被调用，而不是被购买。

软件不会消失，但软件的形态正在改变。过去的软件，是给人用的工具；未来的软件，是被调用的能力。

Testing, Testing, and More Testing: A Four-Tier Strategy for Agent Workflows

2026-04-22T00:00:00+00:00

💡 In the Agent era, errors are produced and spread faster than ever. Traditional testing processes can’t keep up with development speed anymore. This article proposes a four-tier testing strategy—Unit, API, Smoke, and E2E—with detailed timing for each tier within Agent workflows, helping developers balance efficiency and quality.

Agents have made code generation incredibly fast, but they have also amplified something else: errors are produced faster and spread faster too. Previously I could make three logic changes in a day; now an Agent does it in minutes. A bug used to mean a single function gone wrong, but now an Agent might produce any of the following:

• The function itself is fine, but the interface contract changed
• It runs locally, but breaks after merging to a branch
• Individual features work, but break other chains when released

So testing can no longer be defined as a post-development process—it needs to be part of the Agent development orchestration. This article shares how I approach this now.

Four Types of Testing

My primary definitions are: unit tests, API tests, smoke tests, and E2E tests. These different test levels need to run at different times, otherwise they’ll slow down development. Unit tests catch low-level errors; API tests catch collaboration errors; smoke tests catch integration incidents; E2E tests catch user experience failures.

Test Layer	The Real Question It Answers	What Happens If Missing
Unit Test	Did this local change break the most basic logic?	Bugs slip through at the cheapest stage
API Test	Can modules still collaborate per contract?	Individual features work, but integration fails
Smoke Test	After merging, are core paths still alive?	Looks mergeable, but explodes on release
E2E Test	Does the full user journey still work?	Engineering works, but user paths break

When Each Test Should Run

Test Type	Best Timing	Primary Goal	Worst Misuse
Unit Test	Run immediately after Agent completes a local function/module change	Catch local logic errors fast	Using it to verify entire business flows
API Test	Before submitting feature branch	Verify module contracts, I/O, dependencies	Over-relying on mocks, testing fake APIs
Smoke Test	Before PR merge or before merging to release	Confirm core paths survive	Cramming too many scenarios
E2E Test	Release candidate, before critical launch	Verify real user journeys	Making it the daily dev inner loop

Earlier tests should be faster, narrower, and cheaper; later tests should be fewer, heavier, and closer to production.

When Should Unit Tests Run?

Unit tests aren’t for pre-release—they’re part of the Agent development inner loop.

For these types of local, deterministic, cheap-feedback tasks, unit tests should run immediately:

• Pure function changes
• Rule logic changes
• State transition changes
• Schema mapping changes
• Data cleaning logic changes
• Tool parameter assembly
• Import/export field handling

In my Agent workflow, unit tests are attached to two points: they run automatically after each local implementation, and the related tests run again together before the Agent finishes the current round of modifications. These tests need to be fast, so it works best to have the Agent add them automatically, trigger them right after local logic changes, commit after each small feature point, and use a pre-commit hook to run the related test files.

Change Type	When to Run Unit Tests
New pure function	Run immediately after writing
Modify existing rule	Run related cases immediately
Fix bug	Add regression test first, then code changes, then run
Refactor implementation (no behavior change)	Run immediately after change
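
As a rough sketch of how a pre-commit hook could pick the “related test files,” here is one way to map changed files to tests. The layout (src/<package>/<name>.py covered by tests/<package>/test_<name>.py) and the function name are my assumptions for illustration, not something the workflow requires.

```python
from pathlib import PurePosixPath


def related_test_files(changed_files: list[str]) -> list[str]:
    """Map changed files to the unit-test files a pre-commit hook should run.

    Assumed (hypothetical) layout:
    src/<pkg>/<name>.py is covered by tests/<pkg>/test_<name>.py.
    """
    tests: list[str] = []
    for raw in changed_files:
        path = PurePosixPath(raw)
        if path.suffix != ".py":
            continue  # non-Python changes have no unit tests in this sketch
        if path.parts[0] == "tests":
            candidate = str(path)  # a changed test file should run itself
        elif path.parts[0] == "src":
            # Mirror the package path under tests/ and prefix the file name.
            candidate = str(
                PurePosixPath("tests", *path.parts[1:-1], f"test_{path.name}")
            )
        else:
            continue
        if candidate not in tests:
            tests.append(candidate)  # keep order, drop duplicates
    return tests
```

A hook would pass the staged file list into this function and hand the result to the test runner, so the inner loop only pays for the tests that the change can actually affect.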

When Should API Tests Run?

When limited context makes the Agent’s memory inaccurate, or when multiple agents modify code at the same time, these issues occur frequently:

• A parameter name changed, but the caller still uses the old field
• The return structure changed, but the downstream parser wasn’t updated
• The tool call order looks fine, but the state semantics changed completely
• The DB write succeeded, but the API response contract changed
• One module added a default value, and another module wrongly judges success as a result

So before submitting a feature branch or opening a PR, a complete API test is needed.

My API tests cover:

• Frontend-backend API contracts
• Tool call input/output contracts
• DB read/write boundaries
• Webhook/event payloads
• File import/export structures
• Prompt output schemas
• Agent step state transfer formats

API tests are usually slower than unit tests, but they don’t have to wait for CI. So I prioritize this: after the Agent finishes self-testing the feature, let it complete all the API test code and verify it. Trigger a run via a pre-push hook, and ideally run again before creating the PR.
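
To make “contract” concrete, here is a minimal sketch of the kind of check an API test can assert on. The contract fields (imported_count, skipped_count, errors) are hypothetical names for an import endpoint; a real project would more likely use a schema library, which I’m leaving out to keep the example self-contained.

```python
from typing import Any

# Hypothetical contract for an import endpoint: field name -> expected type.
IMPORT_RESPONSE_CONTRACT: dict[str, type] = {
    "imported_count": int,
    "skipped_count": int,
    "errors": list,
}


def contract_violations(payload: dict[str, Any], contract: dict[str, type]) -> list[str]:
    """Return a human-readable list of ways the payload breaks the contract."""
    problems: list[str] = []
    for field, expected in contract.items():
        if field not in payload:
            problems.append(f"missing field: {field}")
        elif not isinstance(payload[field], expected):
            actual = type(payload[field]).__name__
            problems.append(
                f"wrong type for {field}: expected {expected.__name__}, got {actual}"
            )
    # Fields the contract does not know about usually mean a silent rename.
    for field in payload:
        if field not in contract:
            problems.append(f"unexpected field: {field}")
    return problems
```

A renamed parameter shows up as a “missing field” plus an “unexpected field” pair, which is exactly the “caller still uses old field” failure from the list above.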

When Should Smoke Tests Run?

Smoke tests verify whether the core flow breaks once changes enter the integrated state. Since I control worktree testing myself, smoke tests run at only two points: before a PR merges the worktree, and before merging into the test release version.

Smoke tests basically only check:

• Application can start
• Core pages can open and return data
• Key APIs return normally
• 1~2 main paths can run through
• Key dependencies (DB/cache/queue/model service) not broken

One note: don’t write it as a mini E2E. This layer doesn’t need comprehensiveness—it needs key paths alive. I define it in CI to run on every PR/merge, and on every release.
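
A smoke suite in this spirit can be very small. The sketch below is only an illustration: each check is a named callable that returns True when the path is alive, and the check names in the usage note are placeholders, not my real suite.

```python
from typing import Callable

Check = Callable[[], bool]


def run_smoke(checks: list[tuple[str, Check]]) -> list[str]:
    """Run each named check and return the names of the ones that failed."""
    failed: list[str] = []
    for name, check in checks:
        try:
            ok = check()
        except Exception:
            ok = False  # a check that crashes counts as a dead path
        if not ok:
            failed.append(name)
    return failed
```

In CI this would be called with a handful of entries such as ("app starts", ...) or ("db reachable", ...), and the job fails if the returned list is non-empty. Keeping the list short is the point: once it grows past the key paths, it has turned into a mini E2E.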

When Should E2E Tests Run?

E2E tests are the most expensive, but they are also an important guard. However, they shouldn’t be the main force of the Agent’s daily development inner loop. The best timing is usually after merging to the release version, and the suite should record the high-risk main paths that frequently get broken, especially in core feature modules that change often.

E2E should verify that the user’s paths hold, that is, that business flows still run smoothly from start to end. My current approach is to add a nightly scheduled task in CI that runs a more complete E2E suite, and to add local runs only for high-risk changes.

My Current Flow: From Feature Issue to Release, Which Tests Should Run?

Based on my development rhythm, here’s how I designed it:

Stage	Goal	Required Tests
Create feature issue	Define acceptance criteria	Write test strategy first, don’t run yet
Agent develops local logic	Catch low-level errors fast	Unit tests
Feature complete	Verify module collaboration	API/contract tests
Open PR / prepare merge	Prevent pollution to main	Smoke tests + necessary regression
Merge to release candidate	Verify integration stability	Release smoke test
Before production	Verify key user paths	Key E2E
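
The table above can also be written down as data, which makes it easy for a hook or an Agent skill to enforce. This is a sketch under my own naming: the stage keys and test labels are identifiers I made up to mirror the table rows.

```python
# Stage names and test labels mirror the table above; the identifiers are assumptions.
REQUIRED_TESTS: dict[str, list[str]] = {
    "create_feature_issue": [],  # write the test strategy, run nothing yet
    "agent_develops_local_logic": ["unit"],
    "feature_complete": ["api"],
    "open_pr": ["smoke", "regression"],
    "merge_to_release_candidate": ["release_smoke"],
    "before_production": ["e2e"],
}


def missing_tests(stage: str, passed: set[str]) -> list[str]:
    """List the tests this stage requires that have not passed yet."""
    return [test for test in REQUIRED_TESTS[stage] if test not in passed]


def may_advance(stage: str, passed: set[str]) -> bool:
    """A stage is cleared only when none of its required tests are missing."""
    return not missing_tests(stage, passed)
```

For example, a PR where only the smoke run has passed would still report "regression" as missing, so the gate stays closed.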

Test Section That Should Be in an Issue Template

## Feature
Support importing new vendor CSV format

## Risks
- Field mapping changes
- Null value handling
- List page display after import
- Old format compatibility

## Required tests
- [ ] Unit tests: field mapping / null handling / schema validation
- [ ] API tests: import API response / DB write result
- [ ] Smoke tests: list page viewable after import
- [ ] E2E: only on release candidate - "upload CSV → import success → list visible" main path

## Merge gate
- Unit tests pass
- API tests pass
- smoke passes

This isn’t the complete template—it’s an addition to my previous issue template. Check previous articles for the full template.

All test files need to be written by the Agent during development. This article focuses on the timing for each test type; all of this logic can be defined via hooks or Agent skills. I feel I’m already the slowest part of the development process, so I write more tests to reduce my own workload.

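Testing, Testing, and Still More Testing

2026-04-22T00:00:00+00:00

💡 In the Agent era, errors are produced and spread faster. Traditional testing processes can no longer keep up with development speed. This article proposes four types of testing (unit, API, smoke, and E2E) and explains the best time for each to step into an Agent workflow, helping developers find a balance between efficiency and quality.

Agents now make producing code very quick, but they have also multiplied something else: errors are generated faster and spread faster too. I used to change three pieces of logic a day on my own; now an Agent gets that done in minutes. A bug used to look like one badly written function; now what the agent may produce is:

• The function itself is correct, but the interface contract changed
• It runs locally, but fails once merged into the branch
• A single feature works, but kills other chains as soon as it enters a release

So testing can no longer be defined as a process that comes after development; it should be part of Agent development orchestration. In this article I want to share how I do it now.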

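Using the Four Types of Testing

The tests I mainly define now are unit tests, API tests, smoke tests, and E2E tests. These different levels need to run and verify at different times, otherwise they drag down development efficiency. Unit tests stop low-level errors, API tests stop collaboration errors, smoke tests stop integration incidents, and E2E tests stop user experience incidents.

Test Layer	The Question It Really Answers	What Happens If It Is Missing
Unit Test	Did this local change break the most basic logic?	The bug isn’t stopped when it is cheapest to stop
API Test	Can modules still collaborate as agreed?	Each feature is correct on its own, but they break when put together
Smoke Test	After this change is merged, is the core flow still alive?	It looks mergeable, but blows up on release
E2E Test	When a user runs through from start to end, does the experience still hold?	The engineering chain is fine, but the user path is broken

When Each of the Four Types Steps In

Test Type	Best Time to Step In	Primary Goal	Worst Misuse
Unit Test	Run immediately each time the Agent finishes a local function/module change	Stop local logic errors quickly	Using it to verify an entire business chain
API Test	When a feature block is done, before submitting the feature branch	Verify module contracts, inputs/outputs, and dependency collaboration	Relying entirely on mocks, so it tests fake interfaces
Smoke Test	When a PR is ready to merge, or immediately after merging to the release branch	Confirm the core paths are still alive	Cramming in too many scenarios until it becomes a mini E2E
E2E Test	Release candidate, before a critical launch	Verify real users’ main paths	Treating it as the main force of the daily dev inner loop

The earlier a test runs, the faster, narrower, and cheaper it should be; the later it runs, the fewer, heavier, and closer to the real environment it should be.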

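When Should Unit Tests Run?

Unit tests are not something to run before a release; they should be part of the Agent development inner loop.

Whenever the Agent is doing local, deterministic work with cheap feedback like the following, unit tests should run as immediately as possible:

• Changing pure functions
• Changing rule checks
• Changing state transitions
• Changing schema mappings
• Changing data cleaning logic
• Changing tool parameter assembly
• Changing import / export field handling

In my Agent workflow, unit tests hang on two points: they run automatically each time the Agent completes a local implementation, and the related tests run again together before the Agent ends the current round of changes. This layer has to be fast, so it is best to let the Agent add unit tests automatically, trigger them automatically after local logic changes, commit after every small feature change, and add a pre-commit hook that runs the related test files.

Change Type	When to Run Unit Tests
New pure function	Run right after writing it
Modify an existing rule check	Run the related cases right after the change
Fix a bug	Add a regression test first, then let the Agent change the code, then run
Refactor the implementation without changing behavior	Run right after the change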

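When Should API Tests Run?

When limited context makes memory inaccurate, or when multiple agents are modifying things at the same time, the following problems often appear:

• A parameter name changed, but the caller still uses the old field
• The return structure changed, but the downstream parser didn’t follow
• The tool call order looks fine, but the state semantics are already completely different
• The database write succeeded, but the API response contract changed
• One module added a default value, and another module misjudges success because of it

So once the local implementation of a feature is complete, and before submitting the feature branch or opening a PR, a complete round of API tests is needed.

My main API tests include:

• Frontend-backend API contracts
• Tool call input/output contracts
• Database read/write boundaries
• Webhook/event payloads
• File import/export structures
• Prompt output schemas
• State transfer formats between agent steps

API tests are usually a bit slower than unit tests, but they don’t have to wait until CI. So I prioritize this: after the Agent has self-tested the finished feature, let it fill in all the API test code and verify it. Use a hook to trigger a run before pre-push, and ideally run another round before creating the PR.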

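When Should Smoke Tests Run?

Smoke tests verify whether the core flow has broken after the change enters the integrated state. Because I control worktree testing myself, I run smoke tests in only two flows: before a PR merges the worktree, and before merging into the test release version.

Smoke tests basically cover only the following:

• The application can start
• Core pages can open and return data
• Key APIs return normally
• 1~2 main paths can run through
• Key dependencies (DB / cache / queue / model service) are not broken

One thing to watch: don’t write it as a shrunken E2E collection. This layer should not chase completeness, only that the key paths are alive. I define it in CI to run on every PR or merge, and every time a release is made.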

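When Should E2E Tests Run?

E2E tests are the most expensive, but they are also an important guard. However, they should not become the main force of the Agent’s daily development inner loop. The best time is usually after merging to the release version, and the suite should record the high-risk main paths that often get changed incorrectly, especially since core feature modules may be modified frequently.

What E2E should verify most is that the user’s usage path holds, ensuring the business path is still smooth and valid from start to end. My current approach is to add a nightly scheduled task in CI that runs a more complete E2E suite. Only high-risk changes get an extra local run.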

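My Current Flow: From Feature Issue to Release, Which Tests Should Pass?

Based on my own development rhythm, this is how it is designed now:

Stage	Goal	Tests That Should Pass
Create feature issue	Define the acceptance points clearly	Write the test strategy first; no need to run yet
Agent develops local logic	Stop low-level errors quickly	Unit tests
Feature complete	Verify module collaboration	API tests / contract tests
Open PR / prepare to merge	Keep changes from polluting main	Smoke tests + necessary regression tests
Merge to release candidate	Verify stability in the integrated state	Release smoke test
Before going live	Verify key user paths	Key E2E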

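The Test Section an Issue Template Should Have

## Feature
Support importing the new vendor CSV format

## Risks
- Field mapping changes
- Null value handling
- List page display after import
- Old format compatibility

## Required tests
- [ ] Unit tests: field mapping / null handling / schema validation
- [ ] API tests: import API response / DB write result
- [ ] Smoke tests: list page can be viewed normally after import
- [ ] E2E: run the "upload CSV → import success → list visible" main path only on release candidates

## Merge gate
- Unit tests pass
- API tests pass
- smoke passes

This is not the whole template; it is the part added to my earlier issue template. See my previous articles for the earlier template.

All test files need to be written by the agent during development. This article focuses on when to use each kind of test, and all of this logic can be defined through hooks or agent skills. I feel I am already the least efficient link in the whole development process, so I try to write more tests to reduce my own workload.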

Beyond Notebook Navigator: Why Tool Design Matters More Than Ever in the AI Era

2026-04-19T00:00:00+00:00

Some say the people now tinkering with AI were once tinkering with note-taking apps. Thinking about it, that describes me perfectly.

I’m now a heavy Obsidian user. What I love most about it is, first, the extremely rich plugin ecosystem and, second, the ability to embed JavaScript directly into notes, turning them into a programmable system. This kind of freedom is incredibly appealing to programmers.

Last year, a plugin suddenly became very popular: Notebook Navigator. After using it, I uninstalled all my previous plugins—including management ones, layout ones, even calendar and homepage, which I had been using all along.

This gave me an even stronger realization: what AI amplifies isn’t efficiency itself, but the “convergence ability” between tools.

In the era of AI-accelerated development, those scattered small problems can be quickly identified, merged, polished, and ultimately turned into a smoother, more complete tool experience with almost no boundaries.

This article isn’t a manual. It’s about something more fundamental: when development barriers are dramatically lowered by AI, what should tool design really focus on?

Notebook Navigator: A “Pain Point Convergence” Experiment

What Does It Actually Solve?

Notebook Navigator does something seemingly simple—it replaces Obsidian’s default file browser. It provides a two-column structure: a navigation tree on the left, file list and preview on the right, while supporting various filtering methods like tags, folders, and search, with almost all operations doable via keyboard.

These capabilities themselves aren’t particularly new—you could find them across different plugins. What makes Notebook Navigator truly special isn’t adding new features, but changing something more fundamental: it converges Obsidian’s originally fragmented information organization methods into a single interaction entry.

In Obsidian, you can use folders, or tags, or rely on search or bi-directional links. These methods aren’t unified—they exist in parallel. You have to make choices to some extent, even switching between different panels. What Notebook Navigator does is make that choice disappear.

The same note can exist in both folder structure and tag system simultaneously, without needing to switch views to understand it. This experience is essentially closer to a multi-dimensional indexed database than a traditional file system.

It replaced many of my existing plugins, like calendar and homepage. Also, I really love the flashy Rainbow-colored folders, and the ability to easily change folder icons. (Yes, I’m a folder organization enthusiast.)

Customizable Space: Making Tools Fit Your Hand

What impressed me most about Notebook Navigator is its plasticity.

It allows you to adjust almost every key display and interaction method. You can change preview density, adjust how structure expands, define filtering rules, or rewrite your own shortcut logic. You can even create completely different view sets for different scenarios.

Behind this is a clear design philosophy: tools shouldn’t define how users use them; they should allow users to define the tool’s shape.

Good tools shouldn’t be like standard-sized shoes, but like moldable clay that can be reshaped repeatedly.

What Tool Design Should Focus on in the AI Era

What makes Notebook Navigator so impressive isn’t the plugin itself, but the trend it represents: those things that seemed like just “small problems” before—clicking one extra time, switching one extra panel, remembering one extra shortcut—they always existed. It was just that development costs were too high to justify solving them.

But things have changed. When development costs are compressed by AI, these problems are no longer ignored. They start being constantly combined and merged, ultimately forming a new experience paradigm. The core of competition has also shifted.

Writing a feature isn’t hard anymore. Making a plugin isn’t hard either. What’s truly difficult is how you organize these features together, and whether users feel any boundaries when using them.

Looking one layer deeper, user needs haven’t actually changed. What people want is finding content faster, completing operations more smoothly, organizing information more naturally. What’s really changed is user tolerance.

Before, as long as a feature existed, some clunky design was acceptable. But now, if a tool makes people switch contexts frequently or think hard, they’ll leave quickly—or even build their own alternative.

AI is making “settling” increasingly unacceptable.

Going forward, the differences between tools will keep shrinking, while the differences in how each person uses them will keep growing.

Author: Xiaozhao

PhD in Software Engineering · 13 years Full-Stack Development

Focus on AI workflows, software system design, and agent collaboration.

Building a professional AI perfumer startup while continuously documenting technical practices and thoughts.

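From Notebook Navigator: On the Importance of Design in the AI Era

2026-04-19T00:00:00+00:00

Some say the people tinkering with AI now are the same ones who used to tinker with note-taking. Thinking about it, that seems to be me.

I’m already a heavy Obsidian user. What I like most about it is, first, the extremely rich plugin ecosystem and, second, that JS can be embedded directly into notes, turning them into a programmable system. For programmers, this degree of freedom is very friendly.

Last year a plugin suddenly became very popular: Notebook Navigator. After using it, I uninstalled a pile of my previous plugins, including management ones, layout ones, and even calendar and homepage, which I had been using all along.

This gave me a stronger feeling: what AI amplifies is not efficiency itself, but the “convergence ability between tools.” In the era of AI-accelerated development, those originally scattered small problems can be quickly identified, merged, and polished, finally becoming a smoother, more complete tool experience with almost no boundaries.

This article is not a manual. It is about something more fundamental: once AI has dramatically lowered the barrier to development, what should tool design really focus on?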

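Notebook Navigator: A “Pain Point Fusion” Experiment

What It Actually Solves

On the surface, Notebook Navigator does something very simple: it replaces Obsidian’s default file browser. It provides a two-column structure with a navigation tree on the left and the file list and preview on the right, supports multiple filtering methods such as tags, folders, and search, and lets almost every operation be done from the keyboard. These capabilities are not new in themselves; you could say they have all been seen in different plugins. What makes it truly special is not a new feature, but that it changes something more fundamental: it converges Obsidian’s originally fragmented ways of organizing information into a single interaction entry.

In Obsidian, you can use folders, or tags, or rely on search or bi-directional links. These methods are not unified; they exist side by side. To some extent you have to choose, and even switch back and forth between different panels. What Notebook Navigator does is make that choice disappear.

The same note can exist in the folder structure and in the tag system at the same time, and you no longer need to switch views to understand it. This experience is essentially closer to a multi-dimensionally indexed database than to a file system in the traditional sense.

It replaced many of my existing plugins, such as calendar and homepage. I also personally love the flashy Rainbow-colored folders and how easy it is to change folder icons (yes, I’m a folder-management person).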

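Customizable Space: Letting the Tool Grow into the Shape of Your Hand

Another thing about Notebook Navigator that left a deep impression on me is its plasticity. It lets you adjust almost every key display and interaction method. You can change the preview density, adjust how the structure expands, define filtering rules, and rewrite your own shortcut logic. You can even build completely different sets of views for different scenarios.

Behind this is a very clear design judgment: a tool should not define how users use it; it should let users define the tool’s shape. A good tool should not be like a pair of standard-sized shoes, but like a lump of clay that can be reshaped again and again.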

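What Tool Design Should Focus on in the AI Era

Notebook Navigator impressed me not because of the plugin itself, but because it represents a trend: things that used to look like mere “small problems,” such as one extra click, one extra panel switch, or one more set of shortcuts to remember, have always existed. They just weren’t worth solving because development costs were high.

But now things have changed. Once AI pushes development costs down, these problems are no longer ignored. They start to be combined and fused again and again, eventually forming a new kind of experience. At that point the core of competition changes too. Writing a feature is no longer hard, and neither is making a plugin. What is truly hard is how you organize these features together, and whether users feel any boundaries while using them.

Looking one layer further down, user needs have never changed. What people want is simply to find content faster, complete operations more smoothly, and organize information more naturally. What has really changed is users’ tolerance. Before, as long as a feature existed, some awkward design was acceptable. But now, if a tool makes people switch and think too often, users will leave quickly, or even build a replacement themselves. The existence of AI makes “making do” less and less acceptable.

In the future, the differences between tools will shrink, while each person’s way of using them will be amplified.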

Understanding Communication Model Choices Through OpenClaw’s Integration Approaches

2026-04-15T00:00:00+00:00

When connecting OpenClaw to different devices and channels, I started systematically organizing these communication approaches. Initially, I just wanted to understand how each channel connects, but later I discovered that different platforms choosing different integration methods is actually about solving completely different premises and scenarios.

Three Basic Bot Communication Patterns

Before diving deeper, we need to understand a few fundamental communication concepts.

Webhook: The Starting Point of Push Model

The platform actively “pushes” messages to your server. It’s like ordering takeout—the delivery person comes to your door and rings the bell (push), rather than you checking downstairs every 5 minutes (polling).

Technical characteristics:

High real-time: Messages delivered immediately upon arrival
Requires public IP or domain: Platform must be able to find your server
Low server pressure: Only processes when messages arrive
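
To make the push model concrete, here is a minimal receiver sketch using only Python’s standard library. The handler name, the port, and the JSON body shape are assumptions for illustration; each real platform defines its own payload and usually a signature check that is omitted here.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Optional, Tuple


def handle_webhook_body(body: bytes) -> Tuple[int, Optional[dict]]:
    """Turn a raw POST body into (HTTP status, parsed event)."""
    try:
        event = json.loads(body)
    except (json.JSONDecodeError, UnicodeDecodeError):
        return 400, None  # not valid JSON: reject the delivery
    if not isinstance(event, dict):
        return 400, None  # this sketch only accepts JSON objects
    return 200, event


class WebhookHandler(BaseHTTPRequestHandler):
    received: list = []  # events pushed by the platform (shared, sketch only)

    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        status, event = handle_webhook_body(self.rfile.read(length))
        if event is not None:
            self.received.append(event)
        self.send_response(status)
        self.end_headers()


def make_server(port: int = 8080) -> HTTPServer:
    # Listening on 0.0.0.0 is what lets the platform reach the service.
    return HTTPServer(("0.0.0.0", port), WebhookHandler)
```

Calling make_server().serve_forever() would start listening. Note that nothing in this code ever asks the platform for data; it only reacts when a request arrives, which is why the server pressure stays low and why a reachable address is mandatory.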

Long Polling: The Misunderstood “Pseudo-Real-Time”

Your server initiates a request to the platform, and the platform holds the connection open without responding until a message arrives. Once the response comes back, the client immediately initiates the next request.

More accurately, it’s not “continuous polling” but a “serial long-connection request” model.

It’s like calling your friend and asking “Are you here yet?” and they say “Wait, I’ll tell you when I arrive”—you stay on the call.

Technical characteristics:

Good real-time: Near real-time, but with slight latency
No public IP required: Suitable for local deployment
Medium server pressure: Requires maintaining long-duration requests
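
The loop can be sketched without any network at all. FakePlatform below is an in-memory stand-in I made up for a platform’s long-polling endpoint; a blocking queue read plays the role of the held HTTP request, and the timeout plays the role of the platform eventually answering with nothing.

```python
import queue
from typing import Callable


class FakePlatform:
    """In-memory stand-in for a long-polling endpoint (hypothetical)."""

    def __init__(self) -> None:
        self._pending: "queue.Queue[str]" = queue.Queue()

    def send(self, message: str) -> None:
        self._pending.put(message)

    def get_updates(self, timeout: float) -> list[str]:
        """Block until a message arrives or the timeout expires."""
        try:
            first = self._pending.get(timeout=timeout)
        except queue.Empty:
            return []  # timed out: empty response, the client simply asks again
        batch = [first]
        while True:  # drain anything else that is already waiting
            try:
                batch.append(self._pending.get_nowait())
            except queue.Empty:
                return batch


def poll(platform: FakePlatform, handle: Callable[[str], None],
         rounds: int, timeout: float = 0.05) -> None:
    """Issue one held request after another; only the client ever connects."""
    for _ in range(rounds):
        for message in platform.get_updates(timeout):
            handle(message)
```

Every call in poll goes outward from the client, which is why this model works from a laptop or an internal network. The cost is visible too: a round that times out did work and returned nothing.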

WebSocket: The True Two-Way Channel

A persistent two-way channel is established, where both parties can send messages at any time. Like two people on a voice call, whoever speaks, the other hears immediately.

Technical characteristics:

Bidirectional real-time: Both client and server can initiate
Suitable for high-frequency interactions: Chat, games, real-time data
Complex implementation: Requires handling connection state, heartbeat, reconnection, etc.
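
Part of that implementation complexity is deciding when a connection is dead and how long to wait before reconnecting. The sketch below shows one common approach, capped exponential backoff plus a heartbeat staleness check. The specific numbers (1 second base, 30 second cap, two missed intervals) are assumptions I picked for illustration, not values from any platform’s documentation.

```python
def reconnect_delays(attempts: int, base: float = 1.0, cap: float = 30.0) -> list[float]:
    """Seconds to wait before each reconnect attempt: doubles, then stays at the cap."""
    return [min(cap, base * 2 ** n) for n in range(attempts)]


def connection_is_stale(last_ack_at: float, now: float, heartbeat_interval: float) -> bool:
    """Treat the connection as dead if no heartbeat ack arrived for two intervals."""
    return now - last_ack_at > 2 * heartbeat_interval
```

A client loop would call connection_is_stale on a timer, close the socket when it returns True, and then sleep through reconnect_delays one entry at a time until a new connection succeeds.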

SSE: The Streaming Solution in HTTP World

SSE (Server-Sent Events) is an HTTP-based communication method where the server can continuously push data to the client after the connection is established. It’s like opening a continuously open “information broadcast channel”—whenever the server has new content, it sends it in, you just receive. Unlike traditional requests, once established, the connection stays open, with data continuously sent as a “stream.”

Technical characteristics:

Unidirectional communication (server → client)
Automatic reconnection (native browser support)
HTTP-based, strong compatibility
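
On the wire, an SSE stream is plain text: each event is a few "field: value" lines followed by a blank line. The sketch below formats and parses that framing, handling only the data and event fields (id and retry are left out to keep it short). The function names are my own.

```python
from typing import Optional


def format_sse(data: str, event: Optional[str] = None) -> str:
    """Encode one event; multi-line data becomes one 'data:' line per line."""
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    for part in data.split("\n"):
        lines.append(f"data: {part}")
    return "\n".join(lines) + "\n\n"  # the blank line ends the event


def parse_sse(stream: str) -> list[dict]:
    """Decode a complete stream into a list of {'event', 'data'} dicts."""
    events = []
    for block in stream.split("\n\n"):
        if not block.strip():
            continue
        name = "message"  # default event type when no 'event:' line is sent
        data_lines = []
        for line in block.split("\n"):
            if line.startswith(":"):
                continue  # comment line, often used as a keep-alive
            field, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]  # a single leading space is not part of the value
            if field == "data":
                data_lines.append(value)
            elif field == "event":
                name = value
        events.append({"event": name, "data": "\n".join(data_lines)})
    return events
```

This is also why SSE suits token-by-token AI output: the server keeps writing small framed chunks into one open response, and the client needs nothing beyond an HTTP connection to read them.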

OpenClaw’s Communication Channels

OpenClaw supports multiple communication platforms, but the differences between these platforms aren’t just about different integration methods—they also differ in how they abstract “events” and “interactions.” Here are two concepts:

Communication method: How the message is delivered
Event model: What this message represents

Communication methods are like “courier services,” while events are more like “the letter’s content.” Even through webhook integration, some platforms only tell you “received a text segment,” while others explicitly tell you—the user sent a message, clicked a button, or a process status changed.

Official Bot API: Complete Communication and Event Models

These platforms not only provide stable integration methods but also define clear event models, making them the most suitable infrastructure for building Agents.

Platform	Communication	Event Model	Interaction	Features
Telegram	Webhook / Long Polling	Update (message-driven)	⭐⭐⭐	Simple and direct
Slack	Webhook / Socket Mode	Event (complete event system)	⭐⭐⭐⭐	Strong interaction
Discord	WebSocket	Gateway Event	⭐⭐⭐⭐	Strong real-time
Feishu	Webhook	Event (message / card / approval)	⭐⭐⭐⭐	Strong business integration
Google Chat	Webhook	Message Event	⭐⭐⭐	Lightweight
Microsoft Teams	HTTPS	Activity Event	⭐⭐⭐⭐	Enterprise integration

The benefit of this category is that the platforms have already defined “events” for you.

Unofficial Integration: Communication Exists, But Stable Event Abstraction Lacking

These platforms typically don’t have complete Bot APIs and require protocol simulation or wrapping to achieve integration.

Platform	Communication	Data You Receive	Interaction
WhatsApp	Long connection (simulating Web client)	Raw message content	⭐⭐⭐
Signal	Local tool forwarding	Raw message content	⭐⭐
iMessage	System notification forwarding	System messages	⭐⭐
WeChat (Personal)	Client simulation or automation	Scraped message content	⭐⭐⭐

The core problem of this category: communication can be achieved, but the event model is “inferred”—users must determine themselves whether it’s a regular message or command, input or state change.
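
Here is a minimal sketch of what “inferring” the event looks like in practice. The rule used (text starting with "/" is a command) is an assumption for illustration; a real integration has to decide its own conventions, which is exactly the burden the platform is not carrying for you.

```python
from dataclasses import dataclass


@dataclass
class InferredEvent:
    kind: str  # "command" or "message"
    name: str  # command name, empty for plain messages
    text: str  # remaining text


def infer_event(raw_text: str) -> InferredEvent:
    """Guess what a raw text message means when the platform gives no event type."""
    stripped = raw_text.strip()
    if stripped.startswith("/"):
        name, _, rest = stripped[1:].partition(" ")
        if name:  # "/status now" -> command "status" with argument "now"
            return InferredEvent("command", name, rest.strip())
    return InferredEvent("message", "", stripped)
```

With an official Bot API, the equivalent of kind and name arrives already filled in by the platform. Here the integration owns that logic, along with every edge case it gets wrong.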

Open Protocols: Neither Communication Nor Events Bound to Platform

There’s another category of solutions that don’t depend on specific products, but on open protocols:

Platform	Communication	Event Model	Features
Matrix	HTTP (long-polling sync)	Standardized event structure	Self-hosted
IRC	TCP Socket	Simple message model	Minimalist
Mattermost	HTTP / WebSocket	Slack-like event model	Open source

The characteristic of this category is that both communication capability and event definition don’t depend on a specific platform, but are determined by the protocol itself.

Design Trade-offs of Different Communication Models

These are all “doing communication,” but different platforms choose very different models. The differences are often not about technical capability, but about the core scenarios each platform aims to solve.

Take Telegram as an example—its design focus isn’t on “how strong real-time is,” but on making it easier for developers to integrate with the platform.

Webhook is technically a cleaner solution. When the platform has an event, it directly pushes the data over—no additional requests, no wasted traffic. But it implies a prerequisite: your service must be “externally accessible.” In other words, Telegram must be able to actively connect to you. This usually means you need a public address, a stably running service, and the ability to handle external requests.

But in reality, many developers don’t work in such environments. Many bots run directly on local machines, or on internal network machines, or are just temporarily running scripts. In this case, the platform simply cannot “find you.”

Long Polling solves exactly this problem. It turns the platform’s active push into the client continuously waiting for results. The request is initiated by you, the connection is always outward—this bypasses the “must be publicly accessible” restriction. From an implementation perspective, it just keeps the HTTP request held, waits for data to return, then immediately makes the next request.

This is a very typical engineering trade-off: it sacrifices some efficiency (there will be repeated requests and connection overhead), but gains strong adaptability to the running environment. Telegram offers both Webhook and Long Polling essentially so that a bot can integrate conveniently from whatever environment it runs in.

With this in mind, Discord’s choice follows from its platform characteristics: it assumes clients stay online for long periods and continuously participate in interactions. So it directly uses WebSocket. In this model, communication is no longer “request and response,” but a continuously existing event stream. Messages, state changes, and user behaviors are all pushed in real time through the same connection. This design suits high-frequency interaction and multi-user synchronization scenarios such as chat rooms, communities, or collaboration tools. In contrast, Long Polling would not only be inefficient but would also struggle to express this “continuously online” state.

And SSE represents another trade-off—this is the integration approach for Rokid glasses. The AI output you see on glasses is essentially not “interaction” (glasses have very limited interaction modes), but “content continuously generating.” In this scenario, users don’t need a two-way channel or complex state synchronization—they just need a stable “output pipeline.” SSE retains HTTP’s simplicity while allowing data to be continuously sent as a stream. It’s essentially extending the “response” into a continuous data stream.

Model	Typical Platform	Core Problem Solved	Prerequisite	Cost
Webhook	Telegram (production)	How to efficiently receive events	Service publicly accessible	Complex deployment
Long Polling	Telegram (development)	How to integrate in any environment	Client can initiate requests	Lower resource efficiency
WebSocket	Discord	How to maintain real-time interaction	Client long-online	Complex implementation
SSE	Browser / Rokid / AI streaming	How to continuously output data	One-way output sufficient	No bidirectional support
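
The prerequisites in the table can be read as a small decision procedure. The function below is only a sketch of that reading: the parameter names and the order of the checks are my own simplification, and real choices also depend on what the platform offers.

```python
def choose_model(publicly_reachable: bool,
                 needs_bidirectional: bool,
                 output_only_stream: bool) -> str:
    """Pick a communication model from the prerequisites in the table above."""
    if needs_bidirectional:
        return "websocket"  # both sides must be able to speak at any time
    if output_only_stream:
        return "sse"  # one-way, continuous output over plain HTTP
    if publicly_reachable:
        return "webhook"  # the platform can find us, so let it push
    return "long_polling"  # only outbound requests are possible
```

A bot on a laptop behind NAT lands on long polling, the same bot deployed behind a public domain lands on webhook, and a display-only device showing generated text lands on SSE.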

In the AI era, there will be more tools and frameworks, and the surface-level choices will become more complex. As technical people, on one hand, we need a clear enough understanding of the underlying principles—not to remember every technical detail, but to see what problem it solves, what its prerequisite is; on the other hand, more importantly, we need to get used to analyzing the logic behind these technical choices.

Often, the technology itself isn’t complex—what’s complex is whether we’ve seen through the scenarios it corresponds to.

And this might be a capability every engineer should deliberately train now.