Project Open to Alpha Team

Agent Commerce PRD v1 — write the spec for one external-facing MCP capability with eval criteria scoped correctly

Write the v1 PRD for one external-facing MCP capability: the surface other AI agents will call to query Product.ai's verified commerce knowledge. The PRD covers the JTBD, the API surface, success criteria, eval methodology (trajectory vs outcome metrics, scoped correctly to the use class), the failure modes the design accepts, and the explicit non-goals. The deliverable is a PRD an engineer could implement against without going back to the PM with questions.
Project Overview

Discipline: founding-product-manager · product-manager-agent-commerce · AI Systems — AI Engineer
Duration: 2 weeks
Compensation: Your stated freelance rate
Surface: Agent commerce · Product.ai · Truth Graph
Kernels: agent-commerce · productai · truth-graph
Outcomes: dev-integrate · agent-infra · truth-graph-depth
Tier: Consequential
Alpha Team: Open to alpha members who want to take this on
Tooling: Claude Code or Co-work

Why we want this done

Agent commerce is the multi-decade thesis. Product.ai's "embedded in the AI fabric" framing depends on AI agents calling our verified commerce intelligence from their reasoning loops at scale. The agent-commerce kernel is unambiguous: protocol commerce beats browser automation by 10-100x on economics and on reliability (99% vs 60-70%). Today, Product.ai has internal MCP infrastructure but no external-facing MCP capability backed by a production-grade PRD that an engineer can implement against. The first PM hire will need to write this kind of spec for SimplyCodes; an Agent-Commerce PM hire (next wave) will need to write it for Alloy/MCP. This project tests whether the candidate can think in agent-commerce physics, scope eval methodology to the actual use class (trajectory metrics for multi-turn agentic use, outcome metrics for single-shot lookup), and produce an artifact engineering can implement.

Scope

  1. Read agent-commerce kernel and truth-graph kernel end-to-end — physics is non-optional
  2. Pick one capability — verdict-by-product-id, evidence-trace, merchant-truth-claim, deal-validity, or another candidate (the candidate proposes; we pressure-test)
  3. JTBD — name the user (the calling agent's goal) and the agent-commerce primitive Product.ai exposes
  4. API surface — request shape, response shape, schema, confidence + evidence semantics, error semantics
  5. Success criteria — what shipping it means; what "good" looks like for the calling agent
  6. Eval methodology — trajectory metrics (for multi-turn use) AND outcome metrics (for single-shot use); explicit scoping argument for which applies where
  7. Failure modes the design accepts — edge cases the v1 does not handle and why
  8. Non-goals — what this capability is explicitly NOT
  9. One-page architectural decision record — what was rejected, why
  10. Two-page PRD an engineer can implement against
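
To make scope item 4 concrete, here is a minimal sketch of what "request shape, response shape, confidence + evidence semantics, error semantics" could look like for the verdict-by-product-id candidate. Everything in it is an assumption made for illustration: the type names, the field names, the two error codes, and the toy in-memory store are hypothetical and are not Product.ai's actual schema or any existing MCP server's interface.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class ErrorCode(Enum):
    # Hypothetical error vocabulary; a real PRD would define its own.
    UNKNOWN_PRODUCT = "unknown_product"
    INSUFFICIENT_EVIDENCE = "insufficient_evidence"


@dataclass(frozen=True)
class Evidence:
    source: str       # where the claim was observed (illustrative field)
    observed_at: str  # ISO 8601 timestamp
    claim: str


@dataclass(frozen=True)
class VerdictRequest:
    product_id: str


@dataclass(frozen=True)
class VerdictResponse:
    product_id: str
    verdict: Optional[str] = None
    confidence: Optional[float] = None
    evidence: Tuple[Evidence, ...] = ()
    error: Optional[ErrorCode] = None

    def __post_init__(self) -> None:
        # Exactly one of (verdict, error) is set, so a caller never has to guess.
        if (self.verdict is None) == (self.error is None):
            raise ValueError("set exactly one of verdict or error")
        if self.verdict is not None:
            if self.confidence is None or not 0.0 <= self.confidence <= 1.0:
                raise ValueError("a verdict needs a confidence in [0, 1]")
            if not self.evidence:
                raise ValueError("a verdict needs at least one evidence item")


# Toy in-memory store with made-up records, standing in for the real source.
_STORE = {
    "sku-123": (
        "claim_verified",
        0.92,
        (Evidence("merchant-page", "2026-01-15T00:00:00Z", "price matches listing"),),
    ),
    "sku-456": ("claim_verified", 0.40, ()),  # a verdict with no evidence behind it
}


def get_verdict(request: VerdictRequest) -> VerdictResponse:
    record = _STORE.get(request.product_id)
    if record is None:
        return VerdictResponse(request.product_id, error=ErrorCode.UNKNOWN_PRODUCT)
    verdict, confidence, evidence = record
    if not evidence:
        # Refuse to return an unevidenced verdict instead of a low-confidence guess.
        return VerdictResponse(request.product_id, error=ErrorCode.INSUFFICIENT_EVIDENCE)
    return VerdictResponse(request.product_id, verdict, confidence, evidence)
```

The design choice worth arguing in the PRD is the invariant in `__post_init__`: a response is either a verdict with confidence and evidence, or a typed error, never a partial mix. Whether that is the right contract for a given capability is for the candidate to defend.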

What success looks like

  • The PRD reads like a Stripe API doc — concrete, opinionated, no marketing language
  • An engineer could pick it up cold and start implementing without re-asking the PM for clarification
  • Trajectory vs outcome metrics scoping is defensible — Probe 4 (Trajectory-Metric Scoping) from the PM Phase 3 briefing applied correctly: not over-applied to single-shot, not under-applied to multi-turn
  • Failure modes are named honestly (not "no known failure modes")
  • Non-goals are explicit (the v1 does X and not Y; Y is on the v2 list)
  • The candidate consulted the agent-commerce Cascade Architecture and made it visible in the design

References

references.md
PM Phase 3 briefing axiom D4 (Trajectory-Metric Scoping), VERDICT 7 (Eval as Engineering Hygiene, NOT PM Deliverable)
AI Engineering Phase 3 briefing axioms C3 (MCP Bifurcation), F4 (Phase-Sequenced Adoption)
Agent-commerce kernel — A-1 Protocol Economics, Cascade Architecture, code-gating primitive
Truth-graph kernel — verdict primitives, evidence chain, confidence semantics
Anthropic MCP specification (April 2026)
Existing Product.ai MCP servers as architectural reference
Galileo's agentic-system compound-error-decay data (60% → 25% over 8 runs) cited in PM briefing axiom D4

Constraints

  • Claude Code or Co-work as primary substrate
  • The PRD is two pages; the ADR is one page; exceeding either limit is a fail
  • Trajectory vs outcome metrics scoping must be argued explicitly; applying trajectory metrics universally is over-engineering theater per axiom D4
  • The candidate may not specify implementation details that lock engineering into a specific framework — engineering owns that
  • IP separation: capability is application-layer; methodology paths are out of scope
  • Schema parity with internal MCP servers where the same primitive exists — no fragmentation across internal/external

Apply

01. Read the Codex (10 min)

The operating principles we work by. If they resonate, the rest of this will land.

02. 12-minute video screen

Hireflix, async. Questions are calibrated to this project specifically.

03. Chemistry call (30-60 min)

Direct call with the CEO. Strategic alignment and mutual fit. No problem-solving exercise.

04. Project begins within 2-3 weeks

1099 contractor agreement, NDA, paid at your stated rate. Day 1 in Santa Monica.

Alpha Team members can take this project without the screen-and-call sequence. Reach out via the Alpha Team channel.