Migrate from scripts and point automations
Most operations run on an accretion of cron jobs, zaps, and one-off scripts. Each works until it does not, and none can say afterward what it did or why. This guide maps those automations onto Fibric's parts, walks one real cron job through the rewrite, and lays out an incremental cutover that keeps the old path running as a fallback until the new one has earned trust.
You need a deployed workspace, the CLI, and a connector bound for the system your script touches; the quickstart covers all three. Nothing in this guide requires deleting anything on day one. The whole point of the ladder below is that the old automation keeps running while the new operator proves itself.
What a script is missing
A script or a zap is a trigger, some logic, and a side effect, fused into one unit. That fusion is what makes it fragile: the retry loop is hand-rolled, the mutex is a lockfile that a crashed run leaves behind, the audit trail is whatever echo statements survived the last edit, and nothing stands between the logic and the side effect. When the logic is wrong, the side effect happens anyway. The 657-message flood that shaped the Fibric kernel was exactly this failure class: an ungoverned loop with direct access to a side effect, repeating until someone noticed.
Fibric pulls those fused parts back apart. The trigger becomes an event envelope. The logic becomes an operator that proposes rather than acts. The side effect becomes a capability that only a deterministic executor can invoke, under a fail-closed trust policy. The retry and mutex problems stop being your code at all: they are the executor's idempotency_key and single-flight entity_key primitives. And every action, taken or refused, leaves a receipt.
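The separation can be sketched in a few lines. Everything below (`Proposal`, `ToyExecutor`, the allow-list shape) is a hypothetical stand-in, not the Fibric SDK. It only illustrates the idea that the logic returns data, and a separate component decides whether an effect happens and writes a receipt either way.

```typescript
// Hypothetical model, not the Fibric SDK: logic proposes, an executor disposes.
type Disposition = "ALLOW" | "BLOCK";

interface Proposal {
  capability: string; // intent name, e.g. "orders.hold"
  entityId: string;
}

interface Receipt {
  capability: string;
  entityId: string;
  disposition: Disposition;
}

class ToyExecutor {
  readonly receipts: Receipt[] = [];
  readonly effects: string[] = []; // stands in for real side effects
  private readonly allowed: ReadonlySet<string>;

  constructor(allowed: Iterable<string>) {
    this.allowed = new Set(allowed);
  }

  submit(p: Proposal): Disposition {
    // Fail-closed: a capability the policy does not list is refused.
    const disposition: Disposition = this.allowed.has(p.capability) ? "ALLOW" : "BLOCK";
    if (disposition === "ALLOW") {
      this.effects.push(`${p.capability}(${p.entityId})`);
    }
    // A receipt is written whether the action ran or was refused.
    this.receipts.push({ capability: p.capability, entityId: p.entityId, disposition });
    return disposition;
  }
}

// The "logic" half: pure, returns proposals, touches nothing.
function proposeHolds(staleIds: string[]): Proposal[] {
  return staleIds.map((entityId) => ({ capability: "orders.hold", entityId }));
}
```

In this toy, a proposal for a capability outside the allow-list changes nothing in `effects` but still appends a receipt, which is the property the fused script lacks.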
Mapping automation parts to Fibric parts
Every migration starts with the same inventory. For each automation, name its trigger, its logic, its side effects, and its safety improvisations, then read each one across this table.
| In your script or zap | In Fibric | What changes |
|---|---|---|
| Cron schedule (`0 2 * * *`) | A scheduled envelope with `source: "cron"`. `"cron"` is a documented envelope source; a tick is an event like any other. | The schedule stops being host state on one box. Ticks are visible in the event stream and replayable like any other envelope. |
| Webhook trigger (a zap's "when this happens") | An `event_type` trigger glob on the operator, matching envelopes as connectors ingest them. | The trigger is decoupled from the sender. Any connector emitting that event type fires the operator; duplicate deliveries are absorbed at ingest. |
| Script body (the logic) | The operator's run: sense through capabilities, reason, and propose a plan. It never executes side effects itself. | Logic and effect are separated by the executor. A wrong conclusion produces a refused proposal and a receipt, not damage. |
| Side effect (API call, DB write) | A capability under trust policy, resolved to a connector by configuration. | The effect is named by intent (`orders.hold`), not by vendor, and nothing runs that policy does not explicitly permit. |
| Retry loop (`for i in 1 2 3; do curl … done`) | The action's `idempotency_key`. Retry by proposing again with the same key; a repeat disposes as `DEDUP`. | Retries become safe by construction. There is no window where a retry double-applies. See Single-flight & idempotency. |
| Mutex or lockfile (`flock /tmp/job.lock`) | The action's single-flight `entity_key`. The executor serializes work per entity as a kernel primitive. | No stale locks, no crashed-run cleanup. Two runs cannot both act on the same order, ever. |
| `echo` and log files | Receipts: proposal, policy evaluation, disposition, key, timestamps, for every action including refusals. | The audit trail is immutable and queryable, and it exists whether or not the author remembered to log. |
Resist the batch rewrite. Each automation carries a different blast radius and a different tolerance for the ALERT stage below. Start with one whose side effect is bounded and reversible (order holds, ticket notes, status syncs), and let its receipts teach you the process before you touch anything that moves money.
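One way to keep the inventory honest is to write it down as data and sort it. The record shape and the ranking rule below are assumptions made for illustration, not a Fibric schema: reversible automations first, irreversible ones next, anything that moves money last.

```typescript
// Hypothetical inventory record; field names are illustrative only.
interface AutomationInventory {
  name: string;
  trigger: string;          // e.g. "cron 0 2 * * *" or "webhook order.created"
  sideEffects: string[];    // named by intent, e.g. "orders.hold"
  improvisations: string[]; // lockfiles, retry loops, log files
  reversible: boolean;      // can the side effect be undone?
  movesMoney: boolean;
}

// Lower rank migrates earlier. The three-level rule is an assumption.
function blastRank(a: AutomationInventory): number {
  if (a.movesMoney) return 2;
  return a.reversible ? 0 : 1;
}

// Returns a new array; the caller's inventory is left untouched.
function migrationOrder(items: AutomationInventory[]): AutomationInventory[] {
  return [...items].sort((a, b) => blastRank(a) - blastRank(b));
}
```

The `improvisations` field is worth filling in carefully: each lockfile or retry loop listed there is a line that should have no equivalent in the rewritten operator.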
A worked migration: the nightly stale-order hold
The original script
A representative specimen: a cron job that runs at 02:00, finds open orders that have not shipped in five days, puts them on hold, and pings a channel. It has a lockfile, a hand-rolled retry, and no memory of what it did last week.
```bash
#!/usr/bin/env bash
# crontab: 0 2 * * * /opt/jobs/hold-stale-orders.sh
set -e
exec 9>/var/lock/hold-stale.lock; flock -n 9 || exit 0   # the mutex
STALE=$(curl -s "$ORDERS_API/orders?status=open&age_gt=5d" | jq -r '.[].id')
for id in $STALE; do
  for i in 1 2 3; do                                      # the retry loop
    curl -sf -X POST "$ORDERS_API/orders/$id/hold" && break
    sleep 5
  done
  echo "held $id" >> /var/log/hold-stale.log              # the audit trail
done
curl -s -X POST "$CHAT_WEBHOOK" -d "{\"text\": \"held: $STALE\"}"
```
Read it against the mapping table: a cron trigger, a lockfile mutex, a retry loop that can double-apply if the first attempt succeeded but the response was lost, a log file as the only record, and a webhook call that fires even when $STALE is empty. Each line has a destination in the rewrite.
The operator
The logic becomes a proposal. The operator senses through orders.read, decides which orders are stale, and proposes holds. It does not call the order API, and it does not notify anyone; it proposes that those things happen.
```typescript
import { defineOperator } from "@fibric/sdk";

export default defineOperator({
  name: "stale-order-hold",
  capabilities: ["orders.read", "orders.hold", "notify.send"],
  goal: "Hold open orders that have not shipped in five days, and report what was held.",
  trigger: { source: "cron", schedule: "0 2 * * *" }, // the crontab line, as an envelope trigger
  async run(ctx) {
    const stale = await ctx.orders.read({ status: "open", ageGreaterThan: "5d" });
    // the script body becomes a proposal; execution belongs to the executor
    return ctx.propose(stale.map(o => ({
      capability: "orders.hold",
      args: { id: o.id },
    })));
  },
});
```
The lockfile and the retry loop have no equivalent lines. Each proposed hold carries an entity_key (the order) and an idempotency_key (stale-order-hold:SO-11290:hold); the executor serializes per entity and collapses repeats to DEDUP. If the 02:00 run crashes midway, the 02:00 run tomorrow re-proposes everything and only the unapplied holds execute.
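The crash-and-rerun behavior can be modeled with a toy executor. The key format follows the example above; `KeyedExecutor` and `runBatch` are hypothetical names, and a real executor would persist its keys rather than hold them in memory.

```typescript
// Hypothetical sketch of dedup by idempotency key; not Fibric's executor.
function idempotencyKey(operator: string, entityId: string, verb: string): string {
  return `${operator}:${entityId}:${verb}`;
}

type Outcome = "APPLIED" | "DEDUP";

class KeyedExecutor {
  private readonly seen = new Set<string>();
  readonly applied: string[] = []; // entities the side effect reached

  hold(entityId: string): Outcome {
    const key = idempotencyKey("stale-order-hold", entityId, "hold");
    if (this.seen.has(key)) return "DEDUP"; // repeat: no second side effect
    this.seen.add(key);
    this.applied.push(entityId);
    return "APPLIED";
  }
}

// Runs a batch; `crashAfter` simulates a run that dies partway through.
function runBatch(
  exec: KeyedExecutor,
  ids: string[],
  crashAfter: number = ids.length,
): Outcome[] {
  return ids.slice(0, crashAfter).map((id) => exec.hold(id));
}
```

If a first run over three orders crashes after two and the next run re-proposes all three, the second run disposes as `DEDUP`, `DEDUP`, `APPLIED`, and each order is held exactly once.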
The policy, propose-only first
The initial policy grants the operator nothing unattended. Every proposal is disposed as ALERT: it lands in the human approval queue instead of executing silently, and every disposition leaves a receipt you can compare against what the legacy script did that same night.
```yaml
# fail-closed: anything not listed is refused
version: 1
allow:
  - orders.hold
  - notify.send
decision: ALERT                   # every action pauses for human confirmation
limits:
  orders.hold:
    max_per_run: 50               # cap the blast radius of any one run
    single_flight: by_order_id
require_receipt: true
```
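To make the policy's behavior concrete, here is a minimal evaluator that mirrors three of its rules: unlisted capabilities are refused, listed ones receive the policy's decision, and actions past the per-run cap are refused. The `Policy` type and `evaluate` function are hypothetical (with `maxPerRun` standing in for `max_per_run`), and treating over-cap actions as `BLOCK` is an assumption drawn from how this guide describes actions outside the envelope.

```typescript
// Hypothetical evaluator; not Fibric's policy engine.
type Decision = "ALLOW" | "ALERT" | "BLOCK";

interface Policy {
  allow: string[];
  decision: Decision; // applied to every listed capability
  limits: Record<string, { maxPerRun: number }>;
}

interface PlannedAction {
  capability: string;
  entityId: string;
}

function evaluate(policy: Policy, plan: PlannedAction[]): Decision[] {
  const counts = new Map<string, number>();
  return plan.map((action): Decision => {
    // Fail-closed: anything not listed is refused.
    if (!policy.allow.includes(action.capability)) return "BLOCK";
    const used = (counts.get(action.capability) ?? 0) + 1;
    counts.set(action.capability, used);
    // No limit entry means no cap in this sketch.
    const cap = policy.limits[action.capability]?.maxPerRun ?? Infinity;
    return used > cap ? "BLOCK" : policy.decision;
  });
}
```

With `decision: "ALERT"`, nothing in this model ever returns `ALLOW`, which is the property the first policy is meant to have.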
The first shadow run
Before even the ALERT stage, run the operator in shadow: a --dry-run prints the exact ExecutionPlan it would submit, including every idempotency key, and executes nothing. Compare its list against what the legacy script held that night.
```bash
fibric operators deploy ./stale-order-hold.ts --policy ./policy.yaml
fibric operators run stale-order-hold --dry-run
```
Parity on one night proves little; parity across a few weeks of nights, including the odd ones (empty result sets, upstream outages, month-end volume), is what earns the next rung.
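The nightly comparison is a set difference over entity IDs. The sketch below assumes you have the proposed order IDs from the dry-run output (how you extract them is up to you) and that night's lines from the script's log, which the script writes as `held <id>`. The helper names are hypothetical.

```typescript
// Hypothetical parity helpers for comparing a shadow run with the legacy log.
interface ParityReport {
  matched: string[];
  onlyOperator: string[]; // proposed by the operator, not logged by the script
  onlyLegacy: string[];   // logged by the script, not proposed by the operator
}

// Extracts order IDs from lines of the form "held <id>"; other lines are ignored.
function parseLegacyLog(log: string): string[] {
  const ids: string[] = [];
  for (const line of log.split("\n")) {
    const id = /^held (\S+)$/.exec(line.trim())?.[1];
    if (id !== undefined) ids.push(id);
  }
  return ids;
}

function diffEntities(proposed: Iterable<string>, legacy: Iterable<string>): ParityReport {
  const p = new Set(proposed);
  const l = new Set(legacy);
  return {
    matched: Array.from(p).filter((id) => l.has(id)).sort(),
    onlyOperator: Array.from(p).filter((id) => !l.has(id)).sort(),
    onlyLegacy: Array.from(l).filter((id) => !p.has(id)).sort(),
  };
}
```

Treat the legacy log as a claim rather than a fact: the script above writes `held $id` after the retry loop whether or not any attempt succeeded, so an `onlyLegacy` entry is worth checking against the order system itself.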
The incremental cutover ladder
Cutover is a ladder, not a switch. Each rung widens what the operator may do unattended, and each promotion is justified by receipts from the rung below. The legacy automation keeps acting through rung 1, is disabled but left installed from rung 2 onward, and is removed only at the final rung.
Rung 1: shadow
The operator runs propose-only: --dry-run by hand, or deployed with a policy whose only effect is producing an ALERT queue nobody approves from yet. The legacy path remains the only thing taking action. Your job on this rung is comparison: does the operator propose the same set of actions the script performs, night after night? Divergence in either direction is a finding: sometimes the operator is wrong, and sometimes it has caught a bug the script has had for years.
Rung 2: ALERT-gated
The new path starts acting, with a human confirming each action from the approval queue, and the legacy cron job is disabled for this automation. Actions are slow on this rung by design; what you are buying is a supervised history. The ALERT receipts (what was proposed, what the human decided, how often the human said no) are the evidence that decides the next promotion. See Trust tiers for how the escalation queue works and the guardrails guide for promotion criteria.
Rung 3: ALLOW within limits
Change the policy decision to ALLOW, keeping the caps: max_per_run, maxValue where money is involved, single-flight per entity. The operator now runs unattended inside a bounded envelope, and anything outside the envelope still disposes as BLOCK. Monitor the veto rate and the dedup rate as described in Monitor operators in production; a stable veto rate on this rung is the signature of a healthy migration.
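Both rates are simple ratios over a window of receipts. This guide does not define them, so the sketch below makes an explicit assumption: veto rate is the share of receipts disposed as `BLOCK`, and dedup rate is the share disposed as `DEDUP`. The monitoring guide's definitions take precedence if they differ, and the receipt shape here is hypothetical.

```typescript
// Hypothetical receipt shape and rate definitions (assumptions, see above).
type ReceiptDisposition = "ALLOW" | "ALERT" | "BLOCK" | "DEDUP";

interface ReceiptLike {
  disposition: ReceiptDisposition;
}

interface Rates {
  total: number;
  vetoRate: number;  // BLOCK receipts / all receipts
  dedupRate: number; // DEDUP receipts / all receipts
}

function dispositionRates(receipts: ReceiptLike[]): Rates {
  const total = receipts.length;
  if (total === 0) return { total: 0, vetoRate: 0, dedupRate: 0 }; // avoid 0/0
  const count = (d: ReceiptDisposition): number =>
    receipts.filter((r) => r.disposition === d).length;
  return {
    total,
    vetoRate: count("BLOCK") / total,
    dedupRate: count("DEDUP") / total,
  };
}
```

Computing the rates per night and comparing them across nights is what shows stability; a single number over the whole period hides the drift you are looking for.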
Rung 4: old path off
Only when the operator has run unattended through enough real variation, including at least one upstream incident, does the legacy automation get decommissioned, per the checklist below. Until then it stays installed and pausable, because the honest failure mode of any migration is discovering the new path's gap three weeks in and needing the old one back in minutes.
Inside Fibric, two proposals collapse to one side effect only when they share an idempotency_key. Your legacy script is outside Fibric: it carries no keys, so nothing deduplicates between it and the operator. If both paths are live and act on the same entities, the target system takes the action twice. Keep the overlap safe in one of two ways: hold the new path at propose-only while the legacy cron is live (rung 1, with the cron disabled before rung 2 approvals begin), or scope the two paths to disjoint entities, for example the operator takes orders in one region while the script keeps the rest. Do not run both paths live against the same entities and hope the hold API happens to be idempotent.
One transition deserves care: moving from rung 1 to rung 2 is the moment responsibility changes hands. Disable the legacy cron entry before the first approval is granted, in the same change window, so there is never an interval where both paths are acting on the same entities.
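If you take the disjoint-scope route instead, the split should be a single rule that both paths apply, plus a check that they never acted on the same entity. The region field, the function names, and the choice of region as the split key are all illustrative assumptions.

```typescript
// Hypothetical scoping rule: the operator owns some regions, the script keeps the rest.
type Owner = "operator" | "legacy";

interface OrderRef {
  id: string;
  region: string;
}

function ownerOf(order: OrderRef, operatorRegions: ReadonlySet<string>): Owner {
  return operatorRegions.has(order.region) ? "operator" : "legacy";
}

// Splits orders into the two scopes using the single ownership rule.
function partition(
  orders: OrderRef[],
  operatorRegions: ReadonlySet<string>,
): { operator: string[]; legacy: string[] } {
  const operator: string[] = [];
  const legacy: string[] = [];
  for (const o of orders) {
    (ownerOf(o, operatorRegions) === "operator" ? operator : legacy).push(o.id);
  }
  return { operator, legacy };
}

// IDs that both paths acted on; anything returned here is a double-apply risk.
function overlap(a: string[], b: string[]): string[] {
  const inB = new Set(b);
  return a.filter((id) => inB.has(id));
}
```

The rule only protects you if the legacy script applies the inverse filter in its own query. Running `overlap` over the operator's receipts and the script's log each night is a cheap way to confirm that it does.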
Decommissioning the old path
Decommissioning is its own step, done deliberately after rung 3 has held. Work through the list per automation:
- Confirm the evidence. The operator has run unattended for an agreed period (weeks, not days) with a stable veto rate, no unexplained `BLOCK` receipts, and at least one upstream incident weathered correctly.
- Remove the trigger, keep the body. Delete the crontab entry or disable the zap first, and keep the script itself in version control. The trigger is the dangerous part; the code is the documentation of prior behavior.
- Revoke the script's credentials. The API keys and webhook URLs the script used should stop working. A decommissioned automation with live credentials is a resurrection waiting to happen, usually on the wrong day.
- Redirect its consumers. Anything that read the script's log file or depended on its side-effect timing now reads receipts, via `fibric receipts export` or the Receipts API.
- Record the mapping. Note which operator replaced which automation and when. Six months later, this is the answer to "what happened to the thing that used to hold stale orders."
- Retire the host last. Only after every automation on a box is migrated does the box itself go. Shared cron hosts accumulate undocumented passengers; check `crontab -l` for every user before assuming it is empty.
Troubleshooting
| Symptom | Likely cause | What to do |
|---|---|---|
| Shadow run proposes more actions than the script performed | The script had an undocumented filter (often a hard-coded exclusion), or its query and the connector's `orders.read` paginate differently. | Diff the entity lists, not the counts. Read the script for special cases; they become explicit filters in the operator or explicit rules in policy. |
| Shadow run proposes fewer actions than the script performed | The operator's sense query is narrower than the script's, or the connector's capability exposes a subset of the raw API. | Compare the raw upstream result against orders.read output with fibric connectors test. Widen the capability binding or query before touching policy. |
| Every proposal disposes as `BLOCK` | Fail-closed default: the policy does not list the capability, or a `maxValue` or predicate constraint fails on every action. | Read one receipt with `fibric receipts show <id> --json`; the policy field names the rule or the fail-closed default. Fix the policy to say what you meant. |
| Unexpected `DEDUP` receipts on the first real run | The dry-run period or a manual `--once` already applied some actions, and the keys are doing their job. | Nothing. This is the designed behavior: repeats collapse. Verify against the target system that each entity was acted on exactly once. |
| The target system shows an action applied twice during overlap | Both paths were live against the same entities. The legacy script carries no idempotency keys, so Fibric cannot deduplicate its work. | Disable one path now, then re-scope per the warning above. The receipts tell you exactly which actions the operator applied; the remainder came from the script. |
| Operator stopped proposing after cutover | The trigger source went quiet, or the operator was paused. Cron envelopes stop for different reasons than webhook envelopes. | Check fibric operators list for state and fibric connectors list for probe health, then follow the stall playbook in the monitoring guide. |
Keep going
- Monitor operators in production: the signals that tell you a migrated operator is healthy.
- Single-flight & idempotency: the primitives that replaced your lockfile and retry loop.
- Trust tiers: ALLOW, ALERT, and BLOCK in full, including the approval queue.
- The event envelope: the shape every trigger, cron tick included, travels in.
- Build an order-risk operator: a complete operator of the kind most script migrations end up as.
- Work in the sandbox: rehearse a migration end to end before touching the live system.