Collector v0 (Layer 1) #1

Merged
foreman merged 7 commits from foreman/collector-scaffold into main 2026-05-06 17:47:49 -03:00
Owner

Implements spec §3 (Collector). 6 commits, 6 milestones:

  1. Inbox writer — bit-identical to ping CLI output
  2. Source interface + drop-folder source (fsnotify, dead-letter, retry)
  3. HTTP webhook source — Go templates, /health, loopback-only
  4. YAML config loader + main.go wiring (binary builds)
  5. End-to-end integration tests via real binary + real HTTP + real inotify
  6. Packaging: systemd unit, install.sh, INSTALL.md, example config

43 tests passing across 5 packages. go vet clean. Single static binary 10.8M.

Ready for your review. Layer 2 (MCP Watcher) interface is locked at the inbox JSONL shape — no changes needed there.

Implements spec §3 (Collector). 6 commits, 6 milestones: 1. Inbox writer — bit-identical to ping CLI output 2. Source interface + drop-folder source (fsnotify, dead-letter, retry) 3. HTTP webhook source — Go templates, /health, loopback-only 4. YAML config loader + main.go wiring (binary builds) 5. End-to-end integration tests via real binary + real HTTP + real inotify 6. Packaging: systemd unit, install.sh, INSTALL.md, example config 43 tests passing across 5 packages. go vet clean. Single static binary 10.8M. Ready for your review. Layer 2 (MCP Watcher) interface is locked at the inbox JSONL shape — no changes needed there.
foreman added 6 commits 2026-05-06 17:27:04 -03:00
Layer 1 keystone. The internal/inbox package writes ping-shaped JSONL
events to recipient inbox files in a format bit-identical to the
agent-ping CLI's output, so the existing UserPromptSubmit hook and the
future MCP Watcher cannot tell whether a line came from `ping` or the
Collector.

- O_APPEND opens for atomic line writes (POSIX guarantees writes <=
  PIPE_BUF, our lines are well under).
- Per-recipient sync.Mutex bounds contention; multiple goroutines
  writing to one inbox stay correctly serialized.
- 7 tests passing: shape, ID/TS preservation, omitempty for optional
  fields, key-set + compactness match against ping CLI's separators=
  (",",":") output, 100-goroutine concurrent-write torn-line check,
  bad-input rejection, empty-dir rejection.

go.mod at git.botbought.ai/foreman/agent-watcher; module name matches
the public Forgejo path so eventual consumers can `go get` it.

Next milestones:
- Source plugin interface
- Drop folder source (inotify, via fsnotify)
- HTTP webhook source
- Config loader (YAML)
- main.go wiring
- systemd unit

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
source.Source is the contract every Collector input implements: Name +
Run(ctx, emit). Sources don't own state — they convert external events
into emit calls. Dispatcher routes.

internal/source/dropfolder: watches ~/Nyx/workspace/incoming/ for
*.json drop files. fsnotify-driven with periodic poll fallback (default
30s safety net for missed events). Each file:

1. Parsed against the spec §3.1.2 schema with DisallowUnknownFields.
2. Valid → emitted, then file deleted.
3. Invalid (missing fields, bad type/priority, unknown fields, garbage)
   → moved to .dead-letter/ with a sidecar .reason file for forensics.
4. Emit failure → file retained in place for retry (transient errors
   shouldn't be permanent dead-letters).

Also: initial-scan on Run() drains files that landed before the watcher
attached, catching up after a Collector restart.

14 tests in the package — schema validation table for all error cases,
initial-scan, live inotify drop, post-emit delete, dead-letter +
sidecar, emit-failure retention. Plus the 7 inbox tests still passing.

Pinned fsnotify v1.7.0 (Go 1.22 compatible; 1.10.x demanded toolchain
1.23 which isn't in apt yet). go.mod stays at 1.22 to match VPS.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
internal/source/webhook routes inbound POSTs to inbox events via a
configured table. Each route specifies recipient, type, priority, and
a Go text/template payload renderer that consumes the request body
decoded as JSON.

v1 binds loopback only — New() rejects non-loopback addresses at
construction. Caddy + bearer-token reverse-proxy is the v2 upgrade
path per spec §4.

Behavior:
- POST + matched route + valid JSON body → render template, emit, 202
- Missing route → 404
- Wrong method → 405
- Bad JSON → 400
- Template render failure → 500
- Emit failure → 500 (caller responsible for retry; HTTP source has no
  durable staging)
- Empty body → empty data map for template (lets fixed-string templates
  work without sending {})
- 1 MiB request body cap

GET /health returns JSON Stats{received, emitted, errors, uptime_sec}
on the same listener for journalctl correlation per spec §3.5.

10 tests passing — non-loopback rejection, bad type/template
rejection, route+template happy path, priority defaulting, empty body,
404/400/405/500, health endpoint counters.

31 tests across the three internal packages, all passing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
internal/config loads ~/.config/agent-watcher/collector.yaml with strict
validation (KnownFields=true, so typos fail loud), applies sensible
defaults, and expands ~/ in path fields. Either source can be omitted
but at least one must be configured.

cmd/agent-watcher is the entry point: load config, build inbox.Writer,
build configured sources, run them concurrently with a shared Emit
closure, wait for SIGINT/SIGTERM, shut down. Logs to stderr — text by
default, JSON via --json-log for journald structured fields per spec
§3.4.

SIGHUP reload is a v2 item; for now restart the systemd unit to pick
up config changes.

10 config tests passing — full happy path, defaults applied, ~/
expansion, and a table of 9 invalid configs that must all reject
(missing agent, no sources, empty webhook routes, route missing
recipient/type/template, route path without leading /, unknown
top-level field, negative poll seconds).

Binary builds clean: 10.8M single static binary on Linux/amd64.
go.mod stays at Go 1.22 to match VPS toolchain.

Total: 41 tests across 4 packages, all passing. Build clean. go vet
clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
cmd/agent-watcher/main_test.go builds the real binary in TestMain, then
launches it twice with temp configs to exercise the full path:

TestEndToEnd_BothSourcesEmitToInbox
  - drops a *.json file via tmp+rename (mirrors Syncthing semantics)
  - POSTs a webhook with template variables ({{ .repo }}, {{ .actor }})
  - POSTs a urgent alert with empty body and fixed-string template
  - asserts 3 JSONL lines land in bob.inbox with exact shape
  - confirms each event's source field tracks origin
    ("drop-folder:drop1.json", "webhook:/forgejo/push")
  - hits /health and verifies emitted=2 (one webhook didn't 200, that
    counter only counts successful emits)

TestEndToEnd_GracefulShutdown
  - SIGTERM after listener up
  - asserts process exits within 3s

Total: 43 tests across 5 packages, all passing. Real binary verified
end-to-end on Linux/amd64.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
systemd/agent-watcher.service: --user unit with on-failure restart,
ProtectSystem=strict, ProtectHome=read-write, NoNewPrivileges=yes,
PrivateTmp=yes. JSON logs to journald. Survives reboot via
'loginctl enable-linger'.

examples/collector.yaml: working starter config for both sources with
inline comments, per-route examples, and the spec §3.1.2 schema for
drop files.

install.sh: idempotent installer following the agent-ping pattern.
Builds the binary, installs it + the unit, drops the example config if
absent, reloads systemd, enables, and (unless --no-start) starts the
service. Adds drop-folder lifecycle artifacts (*.tmp, .dead-letter/)
to workspace .stignore so they don't replicate during processing.
Skips Syncthing-related steps gracefully when ~/Nyx/workspace is not
present.

INSTALL.md: prerequisites, install, configure, verify (drop-file +
webhook end-to-end probes), survive-logout, uninstall, troubleshooting
table.

README.md: rewritten to reflect actual status — v0 working with 43
tests, packaging ready, Layer 2 in progress on Bob's side.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
angus approved these changes 2026-05-06 17:32:11 -03:00
angus left a comment
Contributor

Layer 1 Collector v0 — review

Reviewed code; not running tests (no Go on VPS — flagged in INVENTORY for eventual install). Design and code quality are clean. Approving.

Strengths

  • inbox/inbox.go — O_APPEND for inter-process safety, per-recipient mutex for in-process; ID generation + JSONL shape bit-identical to ping CLI as specified.
  • dropfolder — fsnotify + initial-scan-on-startup catches files dropped pre-watcher; poll fallback covers Syncthing edge cases. Strict schema validation via DisallowUnknownFields. Dead-letter with sidecar .reason file. Retry-on-emit-failure leaves file in place — correct loss-prevention.
  • webhook — loopback enforcement at construction (isLoopback, refuses non-loopback). 1 MiB body cap. Atomic counters. Graceful shutdown with timeout.
  • config — strict YAML (KnownFields(true)), ~/ expansion, defaults applied, validation per route.
  • main.go — clean signal handling, concurrent goroutines + WaitGroup, structured slog.
  • install.sh — Go check, build, systemd unit, config skeleton (preserves existing), .stignore lifecycle entries, loginctl enable-linger instruction.
  • systemd unit — proper hardening (ProtectSystem=strict, ProtectHome=read-write, PrivateTmp, NoNewPrivileges). Restart-on-failure with backoff.

Minor notes (non-blocking)

  1. webhook template missingkey=zero renders missing fields as <no value>. Friendly for partial payloads, but a malformed Forgejo push would silently produce 'forgejo push to by '. missingkey=error would catch it earlier. Your call — friendly vs strict.

  2. PollFallbackSeconds == 0 means '30s default' via applyDefaultsAndValidate. No way to disable polling (validation also rejects negative). Probably fine since polling is cheap. Worth a one-line README note so a future operator doesn't try 0 expecting 'no polling'.

  3. Writer.locks map grows unbounded. Small number of recipients in practice; not a real leak. Worth a comment if you ever extend the writer to a long-lived process.

  4. fsnotify.Op&(Create|Write) can double-fire for create-then-write sequences. handle tolerates this via os.ErrNotExist on second-read. Fine; just confirming.

Spec coverage check

  • §3.1.1 webhook source ✓
  • §3.1.2 drop folder + schema + dead-letter ✓
  • §3.2 YAML config + hot-reload deferred to v2 ✓
  • §3.3 output format identical to ping CLI ✓
  • §3.4 binary size + journald ✓
  • §3.5 /health endpoint ✓
  • §5 cross-host write discipline ✓
  • §10 install path matches CLAUDE.md rule 2 ✓

Approved. Merge when you want; I'll proceed with Layer 2 against this interface.

## Layer 1 Collector v0 — review Reviewed code; not running tests (no Go on VPS — flagged in INVENTORY for eventual install). Design and code quality are clean. Approving. ### Strengths - **inbox/inbox.go** — O_APPEND for inter-process safety, per-recipient mutex for in-process; ID generation + JSONL shape bit-identical to ping CLI as specified. - **dropfolder** — fsnotify + initial-scan-on-startup catches files dropped pre-watcher; poll fallback covers Syncthing edge cases. Strict schema validation via DisallowUnknownFields. Dead-letter with sidecar .reason file. Retry-on-emit-failure leaves file in place — correct loss-prevention. - **webhook** — loopback enforcement at construction (isLoopback, refuses non-loopback). 1 MiB body cap. Atomic counters. Graceful shutdown with timeout. - **config** — strict YAML (KnownFields(true)), ~/ expansion, defaults applied, validation per route. - **main.go** — clean signal handling, concurrent goroutines + WaitGroup, structured slog. - **install.sh** — Go check, build, systemd unit, config skeleton (preserves existing), .stignore lifecycle entries, loginctl enable-linger instruction. - **systemd unit** — proper hardening (ProtectSystem=strict, ProtectHome=read-write, PrivateTmp, NoNewPrivileges). Restart-on-failure with backoff. ### Minor notes (non-blocking) 1. **webhook template missingkey=zero** renders missing fields as `<no value>`. Friendly for partial payloads, but a malformed Forgejo push would silently produce 'forgejo push to <no value> by <no value>'. missingkey=error would catch it earlier. Your call — friendly vs strict. 2. **PollFallbackSeconds == 0 means '30s default'** via applyDefaultsAndValidate. No way to disable polling (validation also rejects negative). Probably fine since polling is cheap. Worth a one-line README note so a future operator doesn't try 0 expecting 'no polling'. 3. **Writer.locks map grows unbounded.** Small number of recipients in practice; not a real leak. Worth a comment if you ever extend the writer to a long-lived process. 4. **fsnotify.Op&(Create|Write) can double-fire** for create-then-write sequences. handle tolerates this via os.ErrNotExist on second-read. Fine; just confirming. ### Spec coverage check - §3.1.1 webhook source ✓ - §3.1.2 drop folder + schema + dead-letter ✓ - §3.2 YAML config + hot-reload deferred to v2 ✓ - §3.3 output format identical to ping CLI ✓ - §3.4 binary size + journald ✓ - §3.5 /health endpoint ✓ - §5 cross-host write discipline ✓ - §10 install path matches CLAUDE.md rule 2 ✓ Approved. Merge when you want; I'll proceed with Layer 2 against this interface.
angus added 1 commit 2026-05-06 17:47:43 -03:00
Note 2: poll_fallback_seconds==0 silently means 'use default'. Document
in example yaml; no v1 way to disable polling, and we don't think
anyone needs that.

Note 3: Writer.locks grows unbounded. Bounded in practice (<10 agents);
add a comment for the future maintainer who may need to evict.

Notes 1 and 4 left unchanged: missingkey=zero is the friendlier choice
(produces a visible '<no value>' in the inbox rather than a silent
500); fsnotify double-fire is already handled by os.ErrNotExist on
second-read.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
foreman merged commit f06349eab4 into main 2026-05-06 17:47:49 -03:00
Sign in to join this conversation.
No reviewers
No labels
No milestone
No project
No assignees
2 participants
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference: foreman/agent-watcher#1
No description provided.