623 lines
30 KiB
Markdown
623 lines
30 KiB
Markdown
|
|
# Agent Harness: GUI-to-CLI for Open Source Software
|
||
|
|
|
||
|
|
## Purpose
|
||
|
|
|
||
|
|
This harness provides a standard operating procedure (SOP) and toolkit for coding
|
||
|
|
agents (Claude Code, Codex, etc.) to build powerful, stateful CLI interfaces for
|
||
|
|
open-source GUI applications. The goal: let AI agents operate software that was
|
||
|
|
designed for humans, without needing a display or mouse.
|
||
|
|
|
||
|
|
## General SOP: Turning Any GUI App into an Agent-Usable CLI
|
||
|
|
|
||
|
|
### Phase 1: Codebase Analysis
|
||
|
|
|
||
|
|
1. **Identify the backend engine** — Most GUI apps separate presentation from logic.
|
||
|
|
Find the core library/framework (e.g., MLT for Shotcut, ImageMagick for GIMP).
|
||
|
|
2. **Map GUI actions to API calls** — Every button click, drag, and menu item
|
||
|
|
corresponds to a function call. Catalog these mappings.
|
||
|
|
3. **Identify the data model** — What file formats does it use? How is project state
|
||
|
|
represented? (XML, JSON, binary, database?)
|
||
|
|
4. **Find existing CLI tools** — Many backends ship their own CLI (`melt`, `ffmpeg`,
|
||
|
|
`convert`). These are building blocks.
|
||
|
|
5. **Catalog the command/undo system** — If the app has undo/redo, it likely uses a
|
||
|
|
command pattern. These commands are your CLI operations.
|
||
|
|
|
||
|
|
### Phase 2: CLI Architecture Design
|
||
|
|
|
||
|
|
1. **Choose the interaction model**:
|
||
|
|
- **Stateful REPL** for interactive sessions (agents that maintain context)
|
||
|
|
- **Subcommand CLI** for one-shot operations (scripting, pipelines)
|
||
|
|
- **Both** (recommended) — a CLI that works in both modes
|
||
|
|
|
||
|
|
2. **Define command groups** matching the app's logical domains:
|
||
|
|
- Project management (new, open, save, close)
|
||
|
|
- Core operations (the app's primary purpose)
|
||
|
|
- Import/Export (file I/O, format conversion)
|
||
|
|
- Configuration (settings, preferences, profiles)
|
||
|
|
- Session/State management (undo, redo, history, status)
|
||
|
|
|
||
|
|
3. **Design the state model**:
|
||
|
|
- What must persist between commands? (open project, cursor position, selection)
|
||
|
|
- Where is state stored? (in-memory for REPL, file-based for CLI)
|
||
|
|
- How does state serialize? (JSON session files)
|
||
|
|
|
||
|
|
4. **Plan the output format**:
|
||
|
|
- Human-readable (tables, colors) for interactive use
|
||
|
|
- Machine-readable (JSON) for agent consumption
|
||
|
|
- Both, controlled by `--json` flag
|
||
|
|
|
||
|
|
### Phase 3: Implementation
|
||
|
|
|
||
|
|
1. **Start with the data layer** — XML/JSON manipulation of project files
|
||
|
|
2. **Add probe/info commands** — Let agents inspect before they modify
|
||
|
|
3. **Add mutation commands** — One command per logical operation
|
||
|
|
4. **Add the backend integration** — A `utils/<software>_backend.py` module that
|
||
|
|
wraps the real software's CLI. This module handles:
|
||
|
|
- Finding the software executable (`shutil.which()`)
|
||
|
|
- Invoking it with proper arguments (`subprocess.run()`)
|
||
|
|
- Error handling with clear install instructions if not found
|
||
|
|
- Example (LibreOffice):
|
||
|
|
```python
|
||
|
|
# utils/lo_backend.py
|
||
|
|
def convert_odf_to(odf_path, output_format, output_path=None, overwrite=False):
|
||
|
|
lo = find_libreoffice() # raises RuntimeError with install instructions
|
||
|
|
subprocess.run([lo, "--headless", "--convert-to", output_format, ...])
|
||
|
|
return {"output": final_path, "format": output_format, "method": "libreoffice-headless"}
|
||
|
|
```
|
||
|
|
5. **Add rendering/export** — The export pipeline calls the backend module.
|
||
|
|
Generate valid intermediate files, then invoke the real software for conversion.
|
||
|
|
6. **Add session management** — State persistence, undo/redo
|
||
|
|
7. **Add the REPL with unified skin** — Interactive mode wrapping the subcommands.
|
||
|
|
- Copy `repl_skin.py` from the plugin (`cli-anything-plugin/repl_skin.py`) into
|
||
|
|
`utils/repl_skin.py` in your CLI package
|
||
|
|
- Import and use `ReplSkin` for the REPL interface:
|
||
|
|
```python
|
||
|
|
from cli_anything.<software>.utils.repl_skin import ReplSkin
|
||
|
|
|
||
|
|
skin = ReplSkin("<software>", version="1.0.0")
|
||
|
|
skin.print_banner() # Branded startup box
|
||
|
|
pt_session = skin.create_prompt_session() # prompt_toolkit with history + styling
|
||
|
|
line = skin.get_input(pt_session, project_name="my_project", modified=True)
|
||
|
|
skin.help(commands_dict) # Formatted help listing
|
||
|
|
skin.success("Saved") # ✓ green message
|
||
|
|
skin.error("Not found") # ✗ red message
|
||
|
|
skin.warning("Unsaved") # ⚠ yellow message
|
||
|
|
skin.info("Processing...") # ● blue message
|
||
|
|
skin.status("Key", "value") # Key-value status line
|
||
|
|
skin.table(headers, rows) # Formatted table
|
||
|
|
skin.progress(3, 10, "...") # Progress bar
|
||
|
|
skin.print_goodbye() # Styled exit message
|
||
|
|
```
|
||
|
|
- Make REPL the default behavior: use `invoke_without_command=True` on the main
|
||
|
|
Click group, and invoke the `repl` command when no subcommand is given:
|
||
|
|
```python
|
||
|
|
@click.group(invoke_without_command=True)
|
||
|
|
@click.pass_context
|
||
|
|
def cli(ctx, ...):
|
||
|
|
...
|
||
|
|
if ctx.invoked_subcommand is None:
|
||
|
|
ctx.invoke(repl, project_path=None)
|
||
|
|
```
|
||
|
|
- This ensures `cli-anything-<software>` with no arguments enters the REPL
|
||
|
|
|
||
|
|
### Phase 4: Test Planning (TEST.md - Part 1)
|
||
|
|
|
||
|
|
**BEFORE writing any test code**, create a `TEST.md` file in the
|
||
|
|
`agent-harness/cli_anything/<software>/tests/` directory. This file serves as your test plan and
|
||
|
|
MUST contain:
|
||
|
|
|
||
|
|
1. **Test Inventory Plan** — List planned test files and estimated test counts:
|
||
|
|
- `test_core.py`: XX unit tests planned
|
||
|
|
- `test_full_e2e.py`: XX E2E tests planned
|
||
|
|
|
||
|
|
2. **Unit Test Plan** — For each core module, describe what will be tested:
|
||
|
|
- Module name (e.g., `project.py`)
|
||
|
|
- Functions to test
|
||
|
|
- Edge cases to cover (invalid inputs, boundary conditions, error handling)
|
||
|
|
- Expected test count
|
||
|
|
|
||
|
|
3. **E2E Test Plan** — Describe the real-world scenarios to test:
|
||
|
|
- What workflows will be simulated?
|
||
|
|
- What real files will be generated/processed?
|
||
|
|
- What output properties will be verified?
|
||
|
|
- What format validations will be performed?
|
||
|
|
|
||
|
|
4. **Realistic Workflow Scenarios** — Detail each multi-step workflow:
|
||
|
|
- **Workflow name**: Brief title
|
||
|
|
- **Simulates**: What real-world task (e.g., "photo editing pipeline",
|
||
|
|
"podcast production", "product render setup")
|
||
|
|
- **Operations chained**: Step-by-step operations
|
||
|
|
- **Verified**: What output properties will be checked
|
||
|
|
|
||
|
|
This planning document ensures comprehensive test coverage before writing code.
|
||
|
|
|
||
|
|
### Phase 5: Test Implementation
|
||
|
|
|
||
|
|
Now write the actual test code based on the TEST.md plan:
|
||
|
|
|
||
|
|
1. **Unit tests** (`test_core.py`) — Every core function tested in isolation with
|
||
|
|
synthetic data. No external dependencies.
|
||
|
|
2. **E2E tests — intermediate files** (`test_full_e2e.py`) — Verify the project files
|
||
|
|
your CLI generates are structurally correct (valid XML, correct ZIP structure, etc.)
|
||
|
|
3. **E2E tests — true backend** (`test_full_e2e.py`) — **MUST invoke the real software.**
|
||
|
|
Create a project, export via the actual software backend, and verify the output:
|
||
|
|
- File exists and size > 0
|
||
|
|
- Correct format (PDF magic bytes `%PDF-`, DOCX/XLSX/PPTX is valid ZIP/OOXML, etc.)
|
||
|
|
- Content verification where possible (CSV contains expected data, etc.)
|
||
|
|
- **Print artifact paths** so users can manually inspect: `print(f"\n PDF: {path} ({size:,} bytes)")`
|
||
|
|
- **No graceful degradation** — if the software isn't installed, tests fail, not skip
|
||
|
|
4. **Output verification** — **Don't trust that export works just because it exits
|
||
|
|
successfully.** Verify outputs programmatically:
|
||
|
|
- Magic bytes / file format validation
|
||
|
|
- ZIP structure for OOXML formats (DOCX, XLSX, PPTX)
|
||
|
|
- Pixel-level analysis for video/images (probe frames, compare brightness)
|
||
|
|
- Audio analysis (RMS levels, spectral comparison)
|
||
|
|
- Duration/format checks against expected values
|
||
|
|
5. **CLI subprocess tests** — Test the installed CLI command as a real user/agent would.
|
||
|
|
The subprocess tests MUST also produce real final output (not just ODF intermediate).
|
||
|
|
Use the `_resolve_cli` helper to run the installed `cli-anything-<software>` command:
|
||
|
|
```python
|
||
|
|
def _resolve_cli(name):
|
||
|
|
"""Resolve installed CLI command; falls back to python -m for dev.
|
||
|
|
|
||
|
|
Set env CLI_ANYTHING_FORCE_INSTALLED=1 to require the installed command.
|
||
|
|
"""
|
||
|
|
import shutil
|
||
|
|
force = os.environ.get("CLI_ANYTHING_FORCE_INSTALLED", "").strip() == "1"
|
||
|
|
path = shutil.which(name)
|
||
|
|
if path:
|
||
|
|
print(f"[_resolve_cli] Using installed command: {path}")
|
||
|
|
return [path]
|
||
|
|
if force:
|
||
|
|
raise RuntimeError(f"{name} not found in PATH. Install with: pip install -e .")
|
||
|
|
module = name.replace("cli-anything-", "cli_anything.") + "." + name.split("-")[-1] + "_cli"
|
||
|
|
print(f"[_resolve_cli] Falling back to: {sys.executable} -m {module}")
|
||
|
|
return [sys.executable, "-m", module]
|
||
|
|
|
||
|
|
|
||
|
|
class TestCLISubprocess:
|
||
|
|
CLI_BASE = _resolve_cli("cli-anything-<software>")
|
||
|
|
|
||
|
|
def _run(self, args, check=True):
|
||
|
|
return subprocess.run(
|
||
|
|
self.CLI_BASE + args,
|
||
|
|
capture_output=True, text=True,
|
||
|
|
check=check,
|
||
|
|
)
|
||
|
|
|
||
|
|
def test_help(self):
|
||
|
|
result = self._run(["--help"])
|
||
|
|
assert result.returncode == 0
|
||
|
|
|
||
|
|
def test_project_new_json(self, tmp_dir):
|
||
|
|
out = os.path.join(tmp_dir, "test.json")
|
||
|
|
result = self._run(["--json", "project", "new", "-o", out])
|
||
|
|
assert result.returncode == 0
|
||
|
|
data = json.loads(result.stdout)
|
||
|
|
# ... verify structure
|
||
|
|
```
|
||
|
|
|
||
|
|
**Key rules for subprocess tests:**
|
||
|
|
- Always use `_resolve_cli("cli-anything-<software>")` — never hardcode
|
||
|
|
`sys.executable` or module paths directly
|
||
|
|
- Do NOT set `cwd` — installed commands must work from any directory
|
||
|
|
- Use `CLI_ANYTHING_FORCE_INSTALLED=1` in CI/release testing to ensure the
|
||
|
|
installed command (not a fallback) is being tested
|
||
|
|
- Test `--help`, `--json`, project creation, key commands, and full workflows
|
||
|
|
|
||
|
|
6. **Round-trip test** — Create project via CLI, open in GUI, verify correctness
|
||
|
|
7. **Agent test** — Have an AI agent complete a real task using only the CLI
|
||
|
|
|
||
|
|
### Phase 6: Test Documentation (TEST.md - Part 2)
|
||
|
|
|
||
|
|
After running all tests successfully, **append** to the existing TEST.md:
|
||
|
|
|
||
|
|
1. **Test Results** — Paste the full `pytest -v --tb=no` output showing all tests
|
||
|
|
passing with their names and status
|
||
|
|
2. **Summary Statistics** — Total tests, pass rate, execution time
|
||
|
|
3. **Coverage Notes** — Any gaps or areas not covered by tests
|
||
|
|
|
||
|
|
The TEST.md now serves as both the test plan (written before implementation) and
|
||
|
|
the test results documentation (appended after execution), providing a complete
|
||
|
|
record of the testing process.
|
||
|
|
|
||
|
|
## Critical Lessons Learned
|
||
|
|
|
||
|
|
### Use the Real Software — Don't Reimplement It
|
||
|
|
|
||
|
|
**This is the #1 rule.** The CLI MUST call the actual software for rendering and
|
||
|
|
export — not reimplement the software's functionality in Python.
|
||
|
|
|
||
|
|
**The anti-pattern:** Building a Pillow-based image compositor to replace GIMP,
|
||
|
|
or generating bpy scripts without ever calling Blender. This produces a toy that
|
||
|
|
can't handle real workloads and diverges from the actual software's behavior.
|
||
|
|
|
||
|
|
**The correct approach:**
|
||
|
|
1. **Use the software's CLI/scripting interface** as the backend:
|
||
|
|
- LibreOffice: `libreoffice --headless --convert-to pdf/docx/xlsx/pptx`
|
||
|
|
- Blender: `blender --background --python script.py`
|
||
|
|
- GIMP: `gimp -i -b '(script-fu-console-eval ...)'`
|
||
|
|
- Inkscape: `inkscape --actions="..." --export-filename=...`
|
||
|
|
- Shotcut/Kdenlive: `melt project.mlt -consumer avformat:output.mp4`
|
||
|
|
- Audacity: `sox` for effects processing
|
||
|
|
- OBS: `obs-websocket` protocol
|
||
|
|
|
||
|
|
2. **The software is a required dependency**, not optional. Add it to installation
|
||
|
|
instructions. The CLI is useless without the actual software.
|
||
|
|
|
||
|
|
3. **Generate valid project/intermediate files** (ODF, MLT XML, .blend, SVG, etc.)
|
||
|
|
then hand them to the real software for rendering. Your CLI is a structured
|
||
|
|
command-line interface to the software, not a replacement for it.
|
||
|
|
|
||
|
|
**Example — LibreOffice CLI export pipeline:**
|
||
|
|
```python
|
||
|
|
# 1. Build the document as a valid ODF file (our XML builder)
|
||
|
|
odf_path = write_odf(tmp_path, doc_type, project)
|
||
|
|
|
||
|
|
# 2. Convert via the REAL LibreOffice (not a reimplementation)
|
||
|
|
subprocess.run([
|
||
|
|
"libreoffice", "--headless",
|
||
|
|
"--convert-to", "pdf",
|
||
|
|
"--outdir", output_dir,
|
||
|
|
odf_path,
|
||
|
|
])
|
||
|
|
# Result: a real PDF rendered by LibreOffice's full engine
|
||
|
|
```
|
||
|
|
|
||
|
|
### The Rendering Gap
|
||
|
|
|
||
|
|
**This is the #2 pitfall.** Most GUI apps apply effects at render time via their
|
||
|
|
engine. When you build a CLI that manipulates project files directly, you must also
|
||
|
|
handle rendering — and naive approaches will silently drop effects.
|
||
|
|
|
||
|
|
**The problem:** Your CLI adds filters/effects to the project file format. But when
|
||
|
|
rendering, if you use a simple tool (e.g., ffmpeg concat demuxer), it reads raw
|
||
|
|
media files and **ignores** all project-level effects. The output looks identical to
|
||
|
|
the input. Users can't tell anything happened.
|
||
|
|
|
||
|
|
**The solution — a filter translation layer:**
|
||
|
|
1. **Best case:** Use the app's native renderer (`melt` for MLT projects). It reads
|
||
|
|
the project file and applies everything.
|
||
|
|
2. **Fallback:** Build a translation layer that converts project-format effects into
|
||
|
|
the rendering tool's native syntax (e.g., MLT filters → ffmpeg `-filter_complex`).
|
||
|
|
3. **Last resort:** Generate a render script the user can run manually.
|
||
|
|
|
||
|
|
**Priority order for rendering:** native engine → translated filtergraph → script.
|
||
|
|
|
||
|
|
### Filter Translation Pitfalls
|
||
|
|
|
||
|
|
When translating effects between formats (e.g., MLT → ffmpeg), watch for:
|
||
|
|
|
||
|
|
- **Duplicate filter types:** Some tools (ffmpeg) don't allow the same filter twice
|
||
|
|
in a chain. If your project has both `brightness` and `saturation` filters, and
|
||
|
|
both map to ffmpeg's `eq=`, you must **merge** them into a single `eq=brightness=X:saturation=Y`.
|
||
|
|
- **Ordering constraints:** ffmpeg's `concat` filter requires **interleaved** stream
|
||
|
|
ordering: `[v0][a0][v1][a1][v2][a2]`, NOT grouped `[v0][v1][v2][a0][a1][a2]`.
|
||
|
|
The error message ("media type mismatch") is cryptic if you don't know this.
|
||
|
|
- **Parameter space differences:** Effect parameters often use different scales.
|
||
|
|
MLT brightness `1.15` = +15%, but ffmpeg `eq=brightness=0.06` on a -1..1 scale.
|
||
|
|
Document every mapping explicitly.
|
||
|
|
- **Unmappable effects:** Some effects have no equivalent in the render tool. Handle
|
||
|
|
gracefully (warn, skip) rather than crash.
|
||
|
|
|
||
|
|
### Timecode Precision
|
||
|
|
|
||
|
|
Non-integer frame rates (29.97fps = 30000/1001) cause cumulative rounding errors:
|
||
|
|
|
||
|
|
- **Use `round()`, not `int()`** for float-to-frame conversion. `int(9000 * 29.97)`
|
||
|
|
truncates and loses frames; `round()` gets the right answer.
|
||
|
|
- **Use integer arithmetic for timecode display.** Convert frames → total milliseconds
|
||
|
|
via `round(frames * fps_den * 1000 / fps_num)`, then decompose with integer
|
||
|
|
division. Avoid intermediate floats that drift over long durations.
|
||
|
|
- **Accept ±1 frame tolerance** in roundtrip tests at non-integer FPS. Exact equality
|
||
|
|
is mathematically impossible.
|
||
|
|
|
||
|
|
### Output Verification Methodology
|
||
|
|
|
||
|
|
Never assume an export is correct just because it ran without errors. Verify:
|
||
|
|
|
||
|
|
```python
|
||
|
|
# Video: probe specific frames with ffmpeg
|
||
|
|
# Frame 0 for fade-in (should be near-black)
|
||
|
|
# Middle frames for color effects (compare brightness/saturation vs source)
|
||
|
|
# Last frame for fade-out (should be near-black)
|
||
|
|
|
||
|
|
# When comparing pixel values between different resolutions,
|
||
|
|
# exclude letterboxing/pillarboxing (black padding bars).
|
||
|
|
# A vertical video in a horizontal frame will have ~40% black pixels.
|
||
|
|
|
||
|
|
# Audio: check RMS levels at start/end for fades
|
||
|
|
# Compare spectral characteristics against source
|
||
|
|
```
|
||
|
|
|
||
|
|
### Testing Strategy
|
||
|
|
|
||
|
|
Four test layers with complementary purposes:
|
||
|
|
|
||
|
|
1. **Unit tests** (`test_core.py`): Synthetic data, no external dependencies. Tests
|
||
|
|
every function in isolation. Fast, deterministic, good for CI.
|
||
|
|
2. **E2E tests — native** (`test_full_e2e.py`): Tests the project file generation
|
||
|
|
pipeline (ODF structure, XML content, format validation). Verifies the
|
||
|
|
intermediate files your CLI produces are correct.
|
||
|
|
3. **E2E tests — true backend** (`test_full_e2e.py`): Invokes the **real software**
|
||
|
|
(LibreOffice, Blender, melt, etc.) to produce final output files (PDF, DOCX,
|
||
|
|
rendered images, videos). Verifies the output files:
|
||
|
|
- Exist and have size > 0
|
||
|
|
- Have correct format (magic bytes, ZIP structure, etc.)
|
||
|
|
- Contain expected content where verifiable
|
||
|
|
- **Print artifact paths** so users can manually inspect results
|
||
|
|
4. **CLI subprocess tests** (in `test_full_e2e.py`): Invokes the installed
|
||
|
|
`cli-anything-<software>` command via `subprocess.run` to run the full workflow
|
||
|
|
end-to-end: create project → add content → export via real software → verify output.
|
||
|
|
|
||
|
|
**No graceful degradation.** The real software MUST be installed. Tests must NOT
|
||
|
|
skip or fake results when the software is missing — the CLI is useless without it.
|
||
|
|
The software is a hard dependency, not optional.
|
||
|
|
|
||
|
|
**Example — true E2E test for LibreOffice:**
|
||
|
|
```python
|
||
|
|
class TestWriterToPDF:
|
||
|
|
def test_rich_writer_to_pdf(self, tmp_dir):
|
||
|
|
proj = create_document(doc_type="writer", name="Report")
|
||
|
|
add_heading(proj, text="Quarterly Report", level=1)
|
||
|
|
add_table(proj, rows=3, cols=3, data=[...])
|
||
|
|
|
||
|
|
pdf_path = os.path.join(tmp_dir, "report.pdf")
|
||
|
|
result = export(proj, pdf_path, preset="pdf", overwrite=True)
|
||
|
|
|
||
|
|
# Verify the REAL output file
|
||
|
|
assert os.path.exists(result["output"])
|
||
|
|
assert result["file_size"] > 1000 # Not suspiciously small
|
||
|
|
with open(result["output"], "rb") as f:
|
||
|
|
assert f.read(5) == b"%PDF-" # Validate format magic bytes
|
||
|
|
print(f"\n PDF: {result['output']} ({result['file_size']:,} bytes)")
|
||
|
|
|
||
|
|
|
||
|
|
class TestCLISubprocessE2E:
|
||
|
|
CLI_BASE = _resolve_cli("cli-anything-libreoffice")
|
||
|
|
|
||
|
|
def test_full_writer_pdf_workflow(self, tmp_dir):
|
||
|
|
proj_path = os.path.join(tmp_dir, "test.json")
|
||
|
|
pdf_path = os.path.join(tmp_dir, "output.pdf")
|
||
|
|
self._run(["document", "new", "-o", proj_path, "--type", "writer"])
|
||
|
|
self._run(["--project", proj_path, "writer", "add-heading", "-t", "Title"])
|
||
|
|
self._run(["--project", proj_path, "export", "render", pdf_path, "-p", "pdf", "--overwrite"])
|
||
|
|
assert os.path.exists(pdf_path)
|
||
|
|
with open(pdf_path, "rb") as f:
|
||
|
|
assert f.read(5) == b"%PDF-"
|
||
|
|
```
|
||
|
|
|
||
|
|
Run tests in force-installed mode to guarantee the real command is used:
|
||
|
|
```bash
|
||
|
|
CLI_ANYTHING_FORCE_INSTALLED=1 python3 -m pytest cli_anything/<software>/tests/ -v -s
|
||
|
|
```
|
||
|
|
The `-s` flag shows the `[_resolve_cli]` print output confirming which backend
|
||
|
|
is being used and **prints artifact paths** for manual inspection.
|
||
|
|
|
||
|
|
Real-world workflow test scenarios should include:
|
||
|
|
- Multi-segment editing (YouTube-style cut/trim)
|
||
|
|
- Montage assembly (many short clips)
|
||
|
|
- Picture-in-picture compositing
|
||
|
|
- Color grading pipelines
|
||
|
|
- Audio mixing (podcast-style)
|
||
|
|
- Heavy undo/redo stress testing
|
||
|
|
- Save/load round-trips of complex projects
|
||
|
|
- Iterative refinement (add, modify, remove, re-add)
|
||
|
|
|
||
|
|
## Key Principles
|
||
|
|
|
||
|
|
- **Use the real software** — The CLI MUST invoke the actual application for rendering
|
||
|
|
and export. Generate valid intermediate files (ODF, MLT XML, .blend, SVG), then hand
|
||
|
|
them to the real software. Never reimplement the rendering engine in Python.
|
||
|
|
- **The software is a hard dependency** — Not optional, not gracefully degraded. If
|
||
|
|
LibreOffice isn't installed, `cli-anything-libreoffice` must error clearly, not
|
||
|
|
silently produce inferior output with a fallback library.
|
||
|
|
- **Manipulate the native format directly** — Parse and modify the app's native project
|
||
|
|
files (MLT XML, ODF, SVG, etc.) as the data layer.
|
||
|
|
- **Leverage existing CLI tools** — Use `libreoffice --headless`, `blender --background`,
|
||
|
|
`melt`, `ffmpeg`, `inkscape --actions`, `sox` as subprocesses for rendering.
|
||
|
|
- **Verify rendering produces correct output** — See "The Rendering Gap" above.
|
||
|
|
- **E2E tests must produce real artifacts** — PDF, DOCX, rendered images, videos.
|
||
|
|
Print output paths so users can inspect. Never test only the intermediate format.
|
||
|
|
- **Fail loudly and clearly** — Agents need unambiguous error messages to self-correct.
|
||
|
|
- **Be idempotent where possible** — Running the same command twice should be safe.
|
||
|
|
- **Provide introspection** — `info`, `list`, `status` commands are critical for agents
|
||
|
|
to understand current state before acting.
|
||
|
|
- **JSON output mode** — Every command should support `--json` for machine parsing.
|
||
|
|
|
||
|
|
## Rules
|
||
|
|
|
||
|
|
- **The real software MUST be a hard dependency.** The CLI must invoke the actual
|
||
|
|
software (LibreOffice, Blender, GIMP, etc.) for rendering and export. Do NOT
|
||
|
|
reimplement rendering in Python. Do NOT gracefully degrade to a fallback library.
|
||
|
|
If the software is not installed, the CLI must error with clear install instructions.
|
||
|
|
- **Every `cli_anything/<software>/` directory MUST contain a `README.md`** that explains how to
|
||
|
|
install the software dependency, install the CLI, run tests, and shows basic usage.
|
||
|
|
- **E2E tests MUST invoke the real software** and produce real output files (PDF, DOCX,
|
||
|
|
rendered images, videos). Tests must verify output exists, has correct format, and
|
||
|
|
print artifact paths so users can inspect results. Never test only intermediate files.
|
||
|
|
- **Every export/render function MUST be verified** with programmatic output analysis
|
||
|
|
before being marked as working. "It ran without errors" is not sufficient.
|
||
|
|
- **Every filter/effect in the registry MUST have a corresponding render mapping**
|
||
|
|
or be explicitly documented as "project-only (not rendered)".
|
||
|
|
- **Test suites MUST include real-file E2E tests**, not just unit tests with synthetic
|
||
|
|
data. Format assumptions break constantly with real media.
|
||
|
|
- **E2E tests MUST include subprocess tests** that invoke the installed
|
||
|
|
`cli-anything-<software>` command via `_resolve_cli()`. Tests must work against
|
||
|
|
the actual installed package, not just source imports.
|
||
|
|
- **Every `cli_anything/<software>/tests/` directory MUST contain a `TEST.md`** documenting what the tests
|
||
|
|
cover, what realistic workflows are tested, and the full test results output.
|
||
|
|
- **Every CLI MUST use the unified REPL skin** (`repl_skin.py`) for the interactive mode.
|
||
|
|
Copy `cli-anything-plugin/repl_skin.py` to `utils/repl_skin.py` and use `ReplSkin`
|
||
|
|
for the banner, prompt, help, messages, and goodbye. REPL MUST be the default behavior
|
||
|
|
when the CLI is invoked without a subcommand (`invoke_without_command=True`).
|
||
|
|
|
||
|
|
## Directory Structure
|
||
|
|
|
||
|
|
```
|
||
|
|
<software>/
|
||
|
|
└── agent-harness/
|
||
|
|
├── <SOFTWARE>.md # Project-specific analysis and SOP
|
||
|
|
├── setup.py # PyPI package configuration (Phase 7)
|
||
|
|
├── cli_anything/ # Namespace package (NO __init__.py here)
|
||
|
|
│ └── <software>/ # Sub-package for this CLI
|
||
|
|
│ ├── __init__.py
|
||
|
|
│ ├── __main__.py # python3 -m cli_anything.<software>
|
||
|
|
│ ├── README.md # HOW TO RUN — required
|
||
|
|
│ ├── <software>_cli.py # Main CLI entry point (Click + REPL)
|
||
|
|
│ ├── core/ # Core modules (one per domain)
|
||
|
|
│ │ ├── __init__.py
|
||
|
|
│ │ ├── project.py # Project create/open/save/info
|
||
|
|
│ │ ├── ... # Domain-specific modules
|
||
|
|
│ │ ├── export.py # Render pipeline + filter translation
|
||
|
|
│ │ └── session.py # Stateful session, undo/redo
|
||
|
|
│ ├── utils/ # Shared utilities
|
||
|
|
│ │ ├── __init__.py
|
||
|
|
│ │ ├── <software>_backend.py # Backend: invokes the real software
|
||
|
|
│ │ └── repl_skin.py # Unified REPL skin (copy from plugin)
|
||
|
|
│ └── tests/ # Test suites
|
||
|
|
│ ├── TEST.md # Test documentation and results — required
|
||
|
|
│ ├── test_core.py # Unit tests (synthetic data)
|
||
|
|
│ └── test_full_e2e.py # E2E tests (real files)
|
||
|
|
└── examples/ # Example scripts and workflows
|
||
|
|
```
|
||
|
|
|
||
|
|
**Critical:** The `cli_anything/` directory must NOT contain an `__init__.py`.
|
||
|
|
This is what makes it a PEP 420 namespace package — multiple separately-installed
|
||
|
|
PyPI packages can each contribute a sub-package under `cli_anything/` without
|
||
|
|
conflicting. For example, `cli-anything-gimp` adds `cli_anything/gimp/` and
|
||
|
|
`cli-anything-blender` adds `cli_anything/blender/`, and both coexist in the
|
||
|
|
same Python environment.
|
||
|
|
|
||
|
|
Note: This HARNESS.md is part of the cli-anything-plugin. Individual software directories reference this file — do NOT duplicate it.
|
||
|
|
|
||
|
|
## Applying This to Other Software
|
||
|
|
|
||
|
|
This same SOP applies to any GUI application:
|
||
|
|
|
||
|
|
| Software | Backend CLI | Native Format | System Package | How the CLI Uses It |
|
||
|
|
|----------|-------------|---------------|----------------|-------------------|
|
||
|
|
| LibreOffice | `libreoffice --headless` | .odt/.ods/.odp (ODF ZIP) | `apt install libreoffice` | Generate ODF → convert to PDF/DOCX/XLSX/PPTX |
|
||
|
|
| Blender | `blender --background --python` | .blend-cli.json | `apt install blender` | Generate bpy script → Blender renders to PNG/MP4 |
|
||
|
|
| GIMP | `gimp -i -b '(script-fu ...)'` | .xcf | `apt install gimp` | Script-Fu commands → GIMP processes & exports |
|
||
|
|
| Inkscape | `inkscape --actions="..."` | .svg (XML) | `apt install inkscape` | Manipulate SVG → Inkscape exports to PNG/PDF |
|
||
|
|
| Shotcut/Kdenlive | `melt` or `ffmpeg` | .mlt (XML) | `apt install melt ffmpeg` | Build MLT XML → melt/ffmpeg renders video |
|
||
|
|
| Audacity | `sox` | .aup3 | `apt install sox` | Generate sox commands → sox processes audio |
|
||
|
|
| OBS Studio | `obs-websocket` | scene.json | `apt install obs-studio` | WebSocket API → OBS captures/records |
|
||
|
|
|
||
|
|
**The software is a required dependency, not optional.** The CLI generates valid
|
||
|
|
intermediate files (ODF, MLT XML, bpy scripts, SVG) and hands them to the real
|
||
|
|
software for rendering. This is what makes the CLI actually useful — it's a
|
||
|
|
command-line interface TO the software, not a replacement for it.
|
||
|
|
|
||
|
|
The pattern is always the same: **build the data → call the real software → verify
|
||
|
|
the output**.
|
||
|
|
|
||
|
|
### Phase 7: PyPI Publishing and Installation
|
||
|
|
|
||
|
|
After building and testing the CLI, make it installable and discoverable.
|
||
|
|
|
||
|
|
All cli-anything CLIs use **PEP 420 namespace packages** under the shared
|
||
|
|
`cli_anything` namespace. This allows multiple CLI packages to be installed
|
||
|
|
side-by-side in the same Python environment without conflicts.
|
||
|
|
|
||
|
|
1. **Structure the package** as a namespace package:
|
||
|
|
```
|
||
|
|
agent-harness/
|
||
|
|
├── setup.py
|
||
|
|
└── cli_anything/ # NO __init__.py here (namespace package)
|
||
|
|
└── <software>/ # e.g., gimp, blender, audacity
|
||
|
|
├── __init__.py # HAS __init__.py (regular sub-package)
|
||
|
|
├── <software>_cli.py
|
||
|
|
├── core/
|
||
|
|
├── utils/
|
||
|
|
└── tests/
|
||
|
|
```
|
||
|
|
|
||
|
|
The key rule: `cli_anything/` has **no** `__init__.py`. Each sub-package
|
||
|
|
(`gimp/`, `blender/`, etc.) **does** have `__init__.py`. This is what
|
||
|
|
enables multiple packages to contribute to the same namespace.
|
||
|
|
|
||
|
|
2. **Create setup.py** in the `agent-harness/` directory:
|
||
|
|
```python
|
||
|
|
from setuptools import setup, find_namespace_packages
|
||
|
|
|
||
|
|
setup(
|
||
|
|
name="cli-anything-<software>",
|
||
|
|
version="1.0.0",
|
||
|
|
packages=find_namespace_packages(include=["cli_anything.*"]),
|
||
|
|
install_requires=[
|
||
|
|
"click>=8.0.0",
|
||
|
|
"prompt-toolkit>=3.0.0",
|
||
|
|
# Add Python library dependencies here
|
||
|
|
],
|
||
|
|
entry_points={
|
||
|
|
"console_scripts": [
|
||
|
|
"cli-anything-<software>=cli_anything.<software>.<software>_cli:main",
|
||
|
|
],
|
||
|
|
},
|
||
|
|
python_requires=">=3.10",
|
||
|
|
)
|
||
|
|
```
|
||
|
|
|
||
|
|
**Important details:**
|
||
|
|
- Use `find_namespace_packages`, NOT `find_packages`
|
||
|
|
- Use `include=["cli_anything.*"]` to scope discovery
|
||
|
|
- Entry point format: `cli_anything.<software>.<software>_cli:main`
|
||
|
|
- The **system package** (LibreOffice, Blender, etc.) is a **hard dependency**
|
||
|
|
that cannot be expressed in `install_requires`. Document it in README.md and
|
||
|
|
have the backend module raise a clear error with install instructions:
|
||
|
|
```python
|
||
|
|
# In utils/<software>_backend.py
|
||
|
|
def find_<software>():
|
||
|
|
path = shutil.which("<software>")
|
||
|
|
if path:
|
||
|
|
return path
|
||
|
|
raise RuntimeError(
|
||
|
|
"<Software> is not installed. Install it with:\n"
|
||
|
|
" apt install <software> # Debian/Ubuntu\n"
|
||
|
|
" brew install <software> # macOS"
|
||
|
|
)
|
||
|
|
```
|
||
|
|
|
||
|
|
3. **All imports** use the `cli_anything.<software>` prefix:
|
||
|
|
```python
|
||
|
|
from cli_anything.gimp.core.project import create_project
|
||
|
|
from cli_anything.gimp.core.session import Session
|
||
|
|
from cli_anything.blender.core.scene import create_scene
|
||
|
|
```
|
||
|
|
|
||
|
|
4. **Test local installation**:
|
||
|
|
```bash
|
||
|
|
cd /root/cli-anything/<software>/agent-harness
|
||
|
|
pip install -e .
|
||
|
|
```
|
||
|
|
|
||
|
|
5. **Verify PATH installation**:
|
||
|
|
```bash
|
||
|
|
which cli-anything-<software>
|
||
|
|
cli-anything-<software> --help
|
||
|
|
```
|
||
|
|
|
||
|
|
6. **Run tests against the installed command**:
|
||
|
|
```bash
|
||
|
|
cd /root/cli-anything/<software>/agent-harness
|
||
|
|
CLI_ANYTHING_FORCE_INSTALLED=1 python3 -m pytest cli_anything/<software>/tests/ -v -s
|
||
|
|
```
|
||
|
|
The output must show `[_resolve_cli] Using installed command: /path/to/cli-anything-<software>`
|
||
|
|
confirming subprocess tests ran against the real installed binary, not a module fallback.
|
||
|
|
|
||
|
|
7. **Verify namespace works across packages** (when multiple CLIs installed):
|
||
|
|
```python
|
||
|
|
import cli_anything.gimp
|
||
|
|
import cli_anything.blender
|
||
|
|
# Both resolve to their respective source directories
|
||
|
|
```
|
||
|
|
|
||
|
|
**Why namespace packages:**
|
||
|
|
- Multiple CLIs coexist in the same Python environment without conflicts
|
||
|
|
- Clean, organized imports under a single `cli_anything` namespace
|
||
|
|
- Each CLI is independently installable/uninstallable via pip
|
||
|
|
- Agents can discover all installed CLIs via `cli_anything.*`
|
||
|
|
- Standard Python packaging — no hacks or workarounds
|