Author: OpenClaw Business Architect / Editor-in-Chief (Internal Codename: COO)
Keywords: Digital Team, Skill vs Script, Multi-Agent, LLM Gateway, SRE Observation, Docs Consistency, Cron Automation
[Introduction]: A Conceptual Breakthrough—What You Need Isn’t a Smarter Model, It’s a Digital Team Link to heading
When OpenClaw was born, it was more like a bare-metal machine: giving you a computing core, a set of API slots, and saying, “go wild.” If you just treated it as a chatbot, it would probably be forgotten in a corner within a week, with you thinking, “It’s nothing special, no different from using the web version of ChatGPT.” But when I decided to break the mold and treat it as the operating system for a “personal digital team,” the story took a completely different turn.
The work I do is neither about writing a few clever but impractical Prompt toys nor about drawing some vaporware concept diagrams for a PowerPoint presentation. It’s more like “building an AI company for myself,” and in this company:
- I am the CEO: I need to set strategies, establish rules, and select talent. Deciding what this company (this system) won’t do is more important than deciding what it will do.
- OpenClaw is our operating system: It’s like the company’s office building and IT infrastructure, responsible for scheduling computing power, loading skills, and allocating resources.
- Multiple Agents are digital employees in different roles: They each have their own KPIs. Some are responsible for front-desk reception, some for drawing blueprints, and others for doing the heavy lifting in the workshop.
- Skill is the standardized toolbox, and Script is the personal SOP: This is the core distinction between an amateur player and a professional force.
- Reset / Docs Consistency is about enterprise-level governance and memory alignment: It prevents employees from growing jaded after too long with the company; their minds must be reset every morning.
- The LLM Gateway is the finance department’s “compute budget management”: Meticulously managing every cent of the API bill.
The Ceiling of Chat Mode Link to heading
No matter how smart a traditional chatbot is, even with a context window of a million tokens, it’s just a “question-answer pair.” It’s like a brilliant blind person locked in a dark room. It doesn’t know which files on your computer actually exist, which of your NAS drives holds client files, or if you have any open slots on your calendar today. More importantly, it won’t proactively run a Cron job, write logs, or send alerts at 6 AM every morning.
During my early days of deeply using monolithic conversational models, I fell into countless painful traps:
- The disaster of the “mega-prompt”: Trying to give it a super-prompt of several thousand words to make it “simultaneously handle scheduling, email summaries, and daily report writing.” The result was an overloaded context, causing the model to become incoherent: it would either forget previous instructions or start hallucinating non-existent tasks.
- The fragility of stateless scripts: I wrote several extremely useful Python scripts for it to call, but once the webpage was refreshed or the session was reset, it suffered complete amnesia, not knowing which directory the script was in or what parameters to pass.
- The breakdown in human-machine handover: Vaguely saying “write a daily report for me” in the chat history, while the underlying Bash shell acted like an idiot, not knowing where the Markdown template was or the path to the SQLite database.
The Essential Difference Between Skill vs. Script Link to heading
After starting to treat OpenClaw as a team’s operating system, the first step is to clarify the boundaries of its capabilities. This is like a company deciding whether to buy off-the-shelf SaaS software or develop its own internal system.
- Skill (Standardized Capability): Like a general-purpose app from the App Store. Ready to use upon installation with fixed parameters. For example, `weather` (check weather), `goplaces` (check maps), and `outlook` (send email). If something can be handled by a Skill, I never write code for it. The goal is out-of-the-box usability.
- Script (Personal SOP): Like a custom-tailored personal secretary. When a Skill can’t meet your extremely specific and personalized workflows (e.g., “I only want to see emails with ‘Urgent’ and ‘Budget’ in the subject line, and you need to translate them into a Chinese summary, and finally save it to my SQLite database at a specific path”), we use a Script to chain multiple Skills together, solidifying the process as local code stored under `/home/mk/clawd/scripts/`. The goal of a Script is absolute control and customization.
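To make the Skill/Script boundary concrete, here is a minimal sketch of what such a personal-SOP Script could look like. The keyword rule and table layout below are illustrative assumptions; in the real workflow the summaries would come from a Skill call, not a placeholder string.

```python
import sqlite3

# Illustrative keyword rule; the real SOP would live under
# /home/mk/clawd/scripts/ and chain Skills for fetching and summarizing.
KEYWORDS = ("Urgent", "Budget")

def filter_urgent(emails):
    """Keep only emails whose subject contains every required keyword."""
    return [m for m in emails if all(k in m["subject"] for k in KEYWORDS)]

def save_summaries(db_path, rows):
    """Solidify (subject, summary) pairs into the local SQLite database."""
    con = sqlite3.connect(db_path)
    con.execute("CREATE TABLE IF NOT EXISTS briefs (subject TEXT, summary TEXT)")
    con.executemany("INSERT INTO briefs VALUES (?, ?)", rows)
    con.commit()
    count = con.execute("SELECT COUNT(*) FROM briefs").fetchone()[0]
    con.close()
    return count
```

The point is not the fifteen lines of code; it is that the filtering rule and the storage path are frozen in a file, so the workflow survives any session reset.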
[Practical Scenarios]: Implementation Details for Work and Life Link to heading
Let’s move beyond theory and look at how a real digital team gets involved in daily work and life. You need to understand that true automation isn’t about you typing lengthy instructions into a chatbox; it’s about the system silently getting things done for you behind the scenes.
Scenario One: Workflow—The “AI Personal Editor-in-Chief” for Breaking Through Information Silos and Private Publishing Link to heading
[Dependencies]
- General Skill: `himalaya` (process emails).
- Custom Scripts: `twitter_bird.py`, `influencer_insights.py` (scrape tweets for free), `web_preview_publish.py` (micro CMS deployment), `gen_daily_report.py`.
- Infrastructure: local SQLite, Cron jobs, a public lightweight server.
[Logic in Plain Language]
The previous “AI Daily” merely summarized a few work emails, its perspective confined to a narrow information silo. Now, the Agent has evolved into your cool-headed and knowledgeable “personal editor-in-chief.”
Step 1: The editor-in-chief not only uses `himalaya` to sort through Outlook emails but also dispatches two cyber agents: `twitter_bird` and `influencer_insights`. They bypass expensive official APIs, using hacking techniques to scrape the latest tweets and in-depth articles from top global tech bloggers in a headless browser running in the background.
Step 2: The editor-in-chief aggregates all emails, tweets, and RSS feeds, feeding them to a Flash model for cross-referencing, noise reduction, and deduplication. The result is a beautifully formatted, high-information-density Markdown brief.
Step 3 (Advanced Play): The generated brief is no longer just text to be admired in a chat window. The Agent immediately calls `web_preview_publish.py` to instantly convert this brief into a static HTML file and automatically deploy it to your private domain. Your Agent transforms directly into a micro CMS! You can not only read it yourself but also coolly drop the link into a WeChat group, saying, “Here’s the industry morning brief compiled by my digital secretary today.”
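For the Step 3 publishing hop, a stripped-down sketch of what a `web_preview_publish.py`-style converter might do is shown below. This is an assumption-laden illustration, not the author's script: it handles only `#` headings and `-` bullets (emitted as bare `<li>` for brevity), and a real version would also upload the file to the public server.

```python
import html

def publish_brief(markdown_text, out_path=None):
    """Render a Markdown brief as a minimal static HTML page.

    Only '#' headings and '-' bullets are handled; everything else
    becomes a paragraph. If out_path is given, the page is written to
    disk ready for the web server to pick up.
    """
    body = []
    for raw in markdown_text.splitlines():
        line = html.escape(raw)
        if raw.startswith("# "):
            body.append(f"<h1>{line[2:]}</h1>")
        elif raw.startswith("- "):
            body.append(f"<li>{line[2:]}</li>")
        elif raw.strip():
            body.append(f"<p>{line}</p>")
    page = "<html><body>\n" + "\n".join(body) + "\n</body></html>"
    if out_path is not None:
        with open(out_path, "w", encoding="utf-8") as fh:
            fh.write(page)
    return page
```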
Scenario Two: Workflow - Create a Calendar Event with a Single Voice Command Link to heading
[Dependencies]
- General Skill: `gemini` (call a multimodal large model), `outlook` (call the Microsoft Graph API to create a calendar event).
- Custom Script: `voice_calendar.sh`.
[Logic in Plain Language]
You’re on your noisy commute and suddenly remember you need to meet Mr. Zhang for coffee at 3 PM tomorrow. Previously, you’d have to stop, unlock your phone, open the calendar, tap the plus button, enter a title, and select the time. Now, you just say to your phone: “Chat about financing with Mr. Zhang at Starbucks tomorrow at 3 PM.”
Step 1: This voice recording is sent to the `voice_calendar.sh` script on the server.
Step 2: The script doesn’t waste time on speech-to-text (the old way). It directly sends the MP3 audio file, as is, to the Gemini 2.5 Flash model, which has “native ears.” It then yells at the model: “Listen to the recording and spit out the time, location, and attendees in standard JSON format!”
Step 3: After receiving the JSON, the script passes it to the `outlook` skill, which, like a meticulous secretary, instantly blocks off the time on your Microsoft calendar. The whole process takes 10 seconds, without ever lighting up your screen.
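Between Step 2 and Step 3 there is one fragile joint: the model's JSON reply, which may arrive wrapped in code fences or chatty text. A defensive parsing layer like the sketch below (the field names are my assumption, not the actual `voice_calendar.sh` contract) keeps a messy reply from crashing the hand-off to the calendar skill.

```python
import json
import re

REQUIRED_FIELDS = ("title", "start_time", "location")  # assumed schema

def parse_event_json(model_reply):
    """Pull the first JSON object out of a possibly chatty, fence-wrapped
    model reply and verify the fields the calendar skill will need."""
    match = re.search(r"\{.*\}", model_reply, re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in model reply")
    event = json.loads(match.group(0))
    missing = [k for k in REQUIRED_FIELDS if k not in event]
    if missing:
        raise ValueError(f"model reply missing fields: {missing}")
    return event
```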
Scenario Three: Life & Learning - Automatically Clip WeChat Articles to Obsidian Link to heading
[Dependencies]
- General Skill: none. Relies entirely on custom scripts.
- Custom Script: `clip_wechat.py`.
- Infrastructure: Telegram Bot receiver, WebDAV service, local Obsidian Vault.
[Logic in Plain Language]
WeChat Favorites is a black hole; articles that go in are never found again. You want to save articles to Obsidian, but WeChat Official Account articles are full of terrible formatting and garbled characters.
Step 1: You send a link to your bot in Telegram: `/clip https://mp.weixin.qq.com/s/xxx`.
Step 2: Upon receiving the command, the `clip_wechat.py` script dives into the webpage, swiftly cutting away all the messy ads, recommended readings, and QR codes, extracting only the pure body text.
Step 3: It converts the body text into clean, exceptionally well-formatted Markdown.
Step 4: It silently slips the file into the Obsidian folder on your NAS via a WebDAV tunnel. Your knowledge base is completely private, with no snooping from the cloud.
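The core of Steps 2 and 3 can be sketched with nothing but the standard library. This is a hypothetical reimplementation of the idea, not the author's `clip_wechat.py`: it drops script/style/iframe noise and emits flat Markdown, while a real version would also handle images, lazy-loaded content, and WeChat-specific ad containers.

```python
from html.parser import HTMLParser

NOISE_TAGS = {"script", "style", "iframe"}  # hypothetical noise list

class BodyTextExtractor(HTMLParser):
    """Pull headline and paragraph text out of an article page,
    skipping noise tags, and emit simple Markdown lines."""
    def __init__(self):
        super().__init__()
        self.out, self._skip, self._tag = [], 0, None
    def handle_starttag(self, tag, attrs):
        if tag in NOISE_TAGS:
            self._skip += 1
        self._tag = tag
    def handle_endtag(self, tag):
        if tag in NOISE_TAGS and self._skip:
            self._skip -= 1
    def handle_data(self, data):
        text = data.strip()
        if self._skip or not text:
            return
        if self._tag == "h1":
            self.out.append(f"# {text}")
        elif self._tag in ("h2", "h3"):
            self.out.append(f"## {text}")
        else:
            self.out.append(text)

def to_markdown(html_doc):
    parser = BodyTextExtractor()
    parser.feed(html_doc)
    return "\n\n".join(parser.out)
```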
Scenario Four: Investing - A Completely Private Quantitative Dashboard Link to heading
[Dependencies]
- General Skill: `tvscreener` (fetch TradingView market data and indicators).
- Custom Script: `stock_dashboard.py`.
- Infrastructure: local SQLite (`stocks.db`), Docker-deployed Metabase (listening on the internal network only).
[Logic in Plain Language]
You don’t want Tonghuashun or Xueqiu to know which bankrupt-reorganization stocks you’re watching, and you intensely dislike flashy stock-recommendation ads. Every day at 16:15 when the stock market closes, while others are scrolling Douyin, your Cron scheduled task wakes up `stock_dashboard.py`. The script pulls out your “kill list” (watchlist stocks) and sends `tvscreener`, a seasoned detective, to investigate how these stocks fared in the market today: MACD golden cross/death cross, RSI overbought/oversold, net capital inflow. After the intelligence is brought back, it’s securely locked into a local SQLite safe. In the evening, when you open the Metabase dashboard, you see a private quantitative dashboard presented with the most uncompromising charts. No noise, only data.
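As one example of the indicator math the dashboard stores, here is a minimal RSI implementation plus the SQLite write. The simple averaging (rather than Wilder's smoothing) is an assumption of this sketch, and `tvscreener` presumably returns these values precomputed anyway.

```python
import sqlite3

def rsi(closes, period=14):
    """Relative Strength Index over the last `period` price changes,
    using a plain average of gains/losses (not Wilder's smoothing)."""
    if len(closes) < period + 1:
        raise ValueError(f"need at least {period + 1} closes")
    window = closes[-(period + 1):]
    deltas = [b - a for a, b in zip(window, window[1:])]
    gains = sum(d for d in deltas if d > 0)
    losses = sum(-d for d in deltas if d < 0)
    if losses == 0:
        return 100.0
    return 100.0 - 100.0 / (1.0 + gains / losses)

def store_indicator(db_path, symbol, value):
    """Lock the day's reading into the local SQLite 'safe'."""
    con = sqlite3.connect(db_path)
    con.execute("CREATE TABLE IF NOT EXISTS indicators (symbol TEXT, rsi REAL)")
    con.execute("INSERT INTO indicators VALUES (?, ?)", (symbol, value))
    con.commit()
    count = con.execute("SELECT COUNT(*) FROM indicators").fetchone()[0]
    con.close()
    return count
```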
Scenario Five: Life Stream – One-Sentence Voice Access to Local Life (Amap Aggregation) Link to heading
[Pain Point]
When walking or driving, looking down to check maps, weather, or search for nearby places is extremely unsafe. Traditional voice assistants mechanically reply “Searching for you…” and then foolishly pop up a webpage for you to read yourself.
[Dependencies]
- General Skill: `gemini` (multimodal semantic understanding).
- Custom Scripts: a combination of `route_eta_amap.py`, `poi_search.py`, and `amap_weather.py`.
[Logic in Plain Language and Experience]
You’re driving, and casually say into your headphones: “I have a meeting at Guomao at 2 PM; check the traffic, the weather, and where there’s a Luckin Coffee nearby.” After receiving the voice command, the Agent won’t get stuck single-threaded like ordinary assistants. In the background, it acts like a comprehensive secretarial team, instantly launching three Amap API scripts concurrently:
1. `route_eta_amap.py` calculates the congested ETA and travel time from your current location to Guomao.
2. `amap_weather.py` fetches the precise weather for the Guomao business district at 2 PM (whether it will rain).
3. `poi_search.py` scans for coffee shops within 500 meters of Guomao.
The three concurrent requests return massive JSON data within seconds. The Agent then exercises its strong summarization capabilities, dissecting this cold API data and aggregating it into an extremely concise conclusion of no more than 50 words, fed back to your headphones: “Expected to arrive at Guomao in 40 minutes; clear to cloudy in the afternoon; there’s a Luckin Coffee downstairs in Jianwai SOHO Building 3; you can order directly.” This is true underlying multi-platform aggregation, integrated with one phrase.
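The “three concurrent requests” step maps naturally onto a thread pool. In the sketch below the three Amap scripts are replaced by hypothetical stub functions; only the fan-out/fan-in shape is the point, and a real version would shell out to the scripts or call the Amap API.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for route_eta_amap.py / amap_weather.py / poi_search.py.
def route_eta(dest):
    return {"eta_min": 40}

def weather(dest):
    return {"forecast": "clear to cloudy"}

def poi_near(dest):
    return {"coffee": "Luckin Coffee, Jianwai SOHO Building 3"}

def gather_briefing(dest):
    """Fan the three lookups out concurrently, then fuse them into one
    headphone-sized sentence."""
    with ThreadPoolExecutor(max_workers=3) as pool:
        futures = [pool.submit(fn, dest) for fn in (route_eta, weather, poi_near)]
        eta, wx, poi = [f.result() for f in futures]
    return (f"ETA {eta['eta_min']} min; {wx['forecast']}; "
            f"nearest coffee: {poi['coffee']}")
```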
[Core Architecture]: How to Manage a Digital Team? Link to heading
1. Multi-Agent Four-Level Pipeline Link to heading
Don’t treat AI as an omnipotent god; it’s an employee with limited energy. If you ask an Agent to simultaneously plan, write code, and check for errors, its ‘brain capacity’ (Context) will quickly collapse. We adopt a four-level pipeline, which is not just about division of labor, but also the art of ‘passing the buck’ and ‘prevention before it happens’.
Here’s a concrete end-to-end example: “I want a complex crawler workflow that automatically scrapes competitor posts from Xiaohongshu.”
- Dispatcher (Main Controller) Greets Clients
You tell the Dispatcher your requirements. The Dispatcher is like a front-desk lobby manager; upon hearing “scrape Xiaohongshu,” he finds it too complex and definitely won’t do it himself. He quickly writes a “Task Document” and hands it to the architects behind him.
- COO (Business Architect) Defines Logic
The COO receives the task document and considers the business logic: “There’s anti-scraping, so we can’t just brute force it; we need to identify core fields; what time should we scrape each day?” The COO outputs the “Business Blueprint and Data Structure Design.”
- Scoder (Technical Architect) Builds Framework
The Scoder takes the blueprint and makes technical selections: “Use Playwright headless browser, combined with XPath. I’ll write the directory structure and entry points; the actual crawler code will be written by the junior engineers.” The Scoder outputs the “Technical Architecture Contract.”
- Coder (Execution Engineer) Does the Hard Work
The most miserable Coder takes the contract and writes code. When encountering anti-scraping errors, he inspects the logs and modifies the code himself. Adding delays, changing User-Agents, simulating scrolling… Iterating through trial and error in the dark, until `Exit Code 0` appears in the terminal and he finally looks up and says: “It’s done.”
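The four-level hand-off can be sketched as a chain of enrichment steps on a single task document. Everything below (the field names, the blueprint contents) is invented for illustration, not the actual OpenClaw message format.

```python
def dispatcher(request):
    """Front desk: wrap the raw request into a task document and pass it on."""
    return {"task": request, "stage": "dispatched"}

def coo(doc):
    """Business architect: attach the business blueprint and data design."""
    doc["blueprint"] = {"fields": ["title", "likes"], "schedule": "daily 03:00"}
    return doc

def scoder(doc):
    """Technical architect: pin down stack and entry points as a contract."""
    doc["contract"] = {"stack": "Playwright + XPath", "entry": "crawl.py"}
    return doc

def coder(doc):
    """Execution engineer: iterate until the job exits cleanly."""
    doc["exit_code"] = 0  # after however many retries it takes
    return doc

def run_pipeline(request):
    doc = request
    for stage in (dispatcher, coo, scoder, coder):
        doc = stage(doc)
    return doc
```

Each stage only enriches the document and passes the buck downstream, which is exactly why no single Agent's context collapses.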
2. LLM Gateway Intelligent Routing: Compute Power Allocation Philosophy Link to heading
Hardcoding all API requests directly into the code is extremely foolish. We need a gateway (LLM Gateway) to act as a financial director.
Compute Power Allocation Philosophy:
- Roles like the COO and Scoder handle macroscopic planning and architectural design; their brains must be sharp. Give them the maximum budget and Pro-level models.
- Roles like the Dispatcher need extremely fast responses for chat; they must be quick. Use Flash-level models.
- Email-summary and weather-push scripts that silently run 500 times every night are purely manual labor. Use Lite-level models to compress costs to the extreme.
Seamless Crisis Resolution: The Battle Against 429 Rate Limiting: At 8 PM, your system is concurrently processing summaries for 100 PDFs when suddenly, the Google API returns 429 Too Many Requests. Without a gateway, your application code would crash instantly. With an LLM Gateway, it seamlessly reroutes these requests to a backup channel within 100 milliseconds. The application code is completely unaware of what happened and still receives the results smoothly. This is enterprise-grade high availability.
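Both ideas, tier routing by role and silent 429 failover, fit in a few lines. The route table, the channel names, and the use of `RuntimeError` to stand in for an HTTP 429 are all assumptions of this sketch, not the gateway's real API.

```python
ROUTES = {"COO": "pro", "Scoder": "pro", "Dispatcher": "flash"}

def route(role):
    """Pick a model tier by role; unnamed batch jobs default to lite."""
    return ROUTES.get(role, "lite")

def call_with_fallback(prompt, channels, transport):
    """Try each channel in order; on a rate-limit error, fail over silently
    so the calling code never sees the 429."""
    last_err = None
    for channel in channels:
        try:
            return transport(channel, prompt)
        except RuntimeError as err:  # stands in for HTTP 429 Too Many Requests
            last_err = err
    raise last_err

# Demo transport whose primary channel is rate-limited.
attempts = []
def flaky_transport(channel, prompt):
    attempts.append(channel)
    if channel == "primary":
        raise RuntimeError("429 Too Many Requests")
    return f"[{channel}] summary of: {prompt}"
```

The caller just sees a result; which channel served it is the gateway's private business.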
[Defense and Governance]: Core Mechanisms from an Enterprise SRE Perspective Link to heading
If your system only performs well under ideal conditions, it will never be more than a toy. The hallmark of an enterprise-grade system is its resilience in the face of adversity, model hallucinations, and configuration drift. In this section, we elevate our perspective to SRE (Site Reliability Engineering) to examine how to prevent AI performance degradation and system failures.
1. Robust Contract-Based Test-Driven Development (TDD for Agents) Link to heading
The longer an LLM’s context window, the more prone it is to hallucination. If you’ve been chatting with it for three days, its memory is filled with the garbage data and deprecated test paths you’ve fed it. Often, the system has already crashed, but the Agent, trying to be helpful, will confidently claim, “everything is working fine.”
Stop just paying lip service to “Continuous Integration.” In my system, this mechanism is called Docs Consistency (Anti-Regression Code Auditing).
- Implementation: In the system’s `selftest/` directory, there are 15 concrete automated unit test scripts (covering everything from the Amap API and database I/O to email connectivity).
- The Unforgiving Referee: Every night, the Agent, like a relentless test engineer, automatically runs `selftest_all.sh`. All health check results are compiled into an immutable `health.json`.
- No More Lies: The next morning, you ask the Agent to fetch emails, but the overnight health report shows the IMAP interface has failed. The Agent will block the action and warn you: “Boss, this module’s self-check failed this morning (Error). I must refuse to perform a blind operation. Please fix the credentials first.” This is the hardcore implementation of TDD in the age of Agents. We’d rather refuse the task than mess things up.
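The “refuse the task” gate can be modeled as a pure check against the parsed `health.json`. The report shape below (a map of skill name to a `status` field) is my assumption of what such a file might contain, not its documented format.

```python
def guard(skill, health_report):
    """Gate a skill call on the overnight self-test verdict.

    `health_report` is the parsed health.json; the {skill: {"status": ...}}
    shape is an assumption of this sketch."""
    status = health_report.get(skill, {}).get("status", "unknown")
    if status == "ok":
        return True, f"{skill}: healthy, proceeding"
    return (False,
            f"Boss, {skill} failed its self-check this morning ({status}). "
            "Refusing to perform a blind operation; please fix it first.")
```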
2. Extreme Cost Attribution and an Observability Center (Observability BI) Link to heading
Don’t let your LLM calls be a black box. Many developers receive API bills for hundreds of dollars each month with no clue where the costs were incurred. As an architect, you need extremely granular cost control.
- Global Interception: At the system’s core lies the OpenClaw Core Observability Center. It doesn’t just record logs; it acts like a probe, intercepting all LLM calls.
- Precise Accounting: By correlating request logs with token billing sheets, you can calculate costs with precision. For instance, that automated “midday report” generated at noon—did it use the Pro or Lite model? It consumed 14,500 input tokens and 230 output tokens, for an exact cost of $0.034.
- The Kill Switch: When you discover a Coder Agent has fallen into an infinite loop while trying to fix a bug, burning $5 in an hour, you can instantly kill its process through the observability dashboard. This is a core sign of graduating from a toy to an industrial-grade system.
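Cost attribution itself is simple arithmetic once every call is logged with its tier and token counts. The per-million-token prices below are placeholders, not real billing rates.

```python
# Placeholder (input, output) prices per million tokens; real billing
# sheets vary by provider and model.
PRICE_PER_M = {"pro": (3.50, 10.50), "flash": (0.30, 1.20), "lite": (0.05, 0.20)}

def cost_usd(tier, tokens_in, tokens_out):
    """Exact dollar cost of one call under the placeholder price table."""
    p_in, p_out = PRICE_PER_M[tier]
    return tokens_in / 1e6 * p_in + tokens_out / 1e6 * p_out

def attribute(ledger):
    """Roll raw call logs up into per-task spend for the BI dashboard."""
    totals = {}
    for call in ledger:
        cost = cost_usd(call["tier"], call["in"], call["out"])
        totals[call["task"]] = totals.get(call["task"], 0.0) + cost
    return totals
```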
3. Security Baseline: Moving Beyond .env with Strict Token Isolation Link to heading
One of the most common rookie mistakes when working with open-source projects is copying .env files loaded with API keys everywhere, sometimes even accidentally committing them to GitHub. For an Agent that manages your entire digital life, this is catastrophic.
- Eliminating `.env`: In my enterprise-grade architecture, I have completely abandoned traditional `.env` files. All secrets are managed through OpenClaw’s `secrets.json` reference mechanism. Application scripts no longer contain plaintext passwords, only secure reference placeholders.
- Gateway Token Rotation: We’ve enabled a dynamic Gateway Token rotation system, making our internal network’s authentication ironclad. Even if a subordinate Agent malfunctions and tries to leak its credentials, its token will expire and become useless within hours. This is the real security baseline for an enterprise.
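A secrets-reference resolver of this kind can be sketched in a dozen lines. The `{{secret:NAME}}` placeholder syntax is invented for illustration; OpenClaw's actual `secrets.json` mechanism may use a different format.

```python
import re

# Invented placeholder syntax for illustration only.
SECRET_REF = re.compile(r"\{\{secret:([A-Za-z0-9_]+)\}\}")

def resolve_secrets(config_text, secrets):
    """Replace {{secret:NAME}} placeholders with values from the vault,
    failing loudly on any undefined reference."""
    def substitute(match):
        name = match.group(1)
        if name not in secrets:
            raise KeyError(f"undefined secret reference: {name}")
        return secrets[name]
    return SECRET_REF.sub(substitute, config_text)
```

The script on disk only ever contains the placeholder, so committing it to GitHub leaks nothing.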
4. Physical Reset Mechanism (Supplementary) Link to heading
Every time /reset is typed, the Agent undergoes memory erasure. It won’t sift through hundreds of thousands of words of chat history to recall who it is. Instead, it devoutly performs a “morning reading ritual”. It must forcibly read the physical files on disk in order: SOUL.md, USER.md, AGENTS.md. Through this set of physical file readings, the Agent instantly achieves the most solid “alignment” with the real world. Physical files do not lie; this is more reliable than any ethereal long context.
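The morning reading ritual is easy to pin down in code: a fixed boot order, read from disk, failing loudly if any alignment file is missing. This is a sketch assuming the three files live in one working directory.

```python
import pathlib

BOOT_ORDER = ("SOUL.md", "USER.md", "AGENTS.md")

def morning_alignment(workdir):
    """Rebuild the Agent's identity strictly from physical files, in order,
    refusing to start if any alignment file is absent."""
    sections = []
    for name in BOOT_ORDER:
        path = pathlib.Path(workdir) / name
        if not path.exists():
            raise FileNotFoundError(f"alignment file missing: {name}")
        sections.append(f"--- {name} ---\n{path.read_text(encoding='utf-8')}")
    return "\n".join(sections)
```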
[Appendix & Action]: Roadmap and Out-of-the-Box Checklist Link to heading
Practical Roadmap for Beginners Link to heading
- Set up the environment: Find an idle machine to install Node.js, then `npm install -g openclaw`. Don’t mess around on your main machine.
- Establish rules: Set up `SOUL.md` and `USER.md`. These are the soul of your system.
- Install Skills: `openclaw skill install weather`, and run your first tool.
- Write Scripts: Turn high-frequency demands into extremely simple bash scripts.
- Schedule with Cron: Put the scripts into Linux Cron and let the digital employees start their shifts; this is the beginning of automation.
- Build Gateway & SRE: When tasks multiply, configure routing tables to control costs, and write your first `selftest`.
🕒 Grand Tour: My Cron Schedule Checklist Link to heading
True “full automation” means your system is still running while you sleep. Below is the core skeleton of Cron tasks currently running in my system:
| Time | Task Description | Underlying Execution Flow & Operational Significance |
|---|---|---|
| Morning 06:00 | AI Daily Industry Briefing (Private Deployment) | Operational Significance: twitter_bird crawls top bloggers -> noise reduction -> web_preview_publish generates public links. Breaks information cocoons. |
| Morning 07:30 | Yesterday’s Email Incremental Summary | Operational Significance: Cures inbox anxiety. Himalaya pull -> Lite model summary -> writes to SQLite -> push. |
| Noon 12:00 | Midday Briefing Push | Operational Significance: Consolidates fragmented information. Gathers morning news -> Observability center cost attribution (accurate to three decimal places in USD). |
| Evening 18:00 | Evening Briefing Push | Operational Significance: Decluttering before logging off. Gathers all-day task wrap-ups, unread important emails -> pushes to Discord. |
| After Market Close 18:15 | Private Stock Dashboard Update | Operational Significance: Emotionally isolated quantitative review. Calls tvscreener -> DB -> local Metabase. Just look at the data, don’t listen to stories. |
| Night 21:00 | Proxy Log Deep Audit | Operational Significance: Security baseline guardian. Extracts network logs -> Pro model analyzes abnormal IP sniffing and traffic spikes -> sends security daily report. |
| Night 02:00 | Docs Consistency Health Check | Operational Significance: Underlying SRE inspection. Automatically runs `selftest_all.sh`, generates 15 red-yellow-green test reports, prevents degradation. |
This is not just a Cron checklist; it is a tireless, highly autonomous AI company. Welcome to the era of digital teams.