Project Open to Alpha Team

Layer 4 Human-Decision Compliance Instrumentation — measure adoption-decay and override patterns on one Product.ai surface

Build Layer 4 instrumentation on one Product.ai surface — the surface where the agent's recommendation reaches a human and the human acts on it, overrides it, or ignores it. Track override patterns, adoption-decay curves, cognitive switching cost signatures by user cohort. Output is a working instrumentation pipeline producing real Layer 4 metrics on real users. This is the layer every commercial LLMOps vendor (Galileo, Arize Phoenix, Patronus, LangSmith, Maxim AI, DeepEval) does NOT cover.

Project Overview

Discipline

AI Systems — Data Scientist / ML Engineer · AI Systems — AI Engineer · product-manager-agent-commerce

Duration

3 weeks

Compensation

Your stated freelance rate

Surface

Product.ai · Truth Graph · Agent commerce

Kernels

productai · truth-graph · agent-commerce

Outcomes

chat-expert · dev-integrate · truth-graph-depth

Tier

Consequential

Alpha Team

Open to alpha members who want to take this on

Tooling

Claude Code or Co-work

Why we want this done

The Data Science Phase 3 briefing identifies system-level evaluation as Product.ai's highest-leverage second-act differentiation, conditional on customer pull. Four-layer eval stack: Layer 1 (model) and Layer 2 (agent trace) are commodity vendor coverage. Layer 3 (workflow execution) is partially covered by PAE / AgentEval / AgentCompass. Layer 4 (human-decision compliance) is uncovered by every published vendor and academic methodology. The Stripe canonical failure (model worked offline, agents adopted existing workflows and ignored ML prompts) is the canonical case — a recommendation can be 100% correct and 0% acted on. Hyperscalers (AWS Bedrock AgentCore GA March 31 2026, Anthropic, OpenAI, Google Cloud, Azure agent runtimes through 2026-2027) close the window on generalist Layer 1-3 within 12-18 months. Vertical specialization on Layer 4 is the durable moat ground for Product.ai. The candidate ships the v1 instrumentation on one surface — proves the methodology can deliver real signal — and produces the substrate that becomes the second-act differentiation.

Scope

Pick one Product.ai surface where Layer 4 dynamics are observable (Alloy AI agent, Product.ai chat, the upcoming external MCP server, SimplyCodes recommendation surface)
Define Layer 4 measurement primitives: override rate, adoption-decay curve, cognitive switching cost signature, override-pattern by cohort, escalation rate, "ignored" rate
Instrument the surface — capture each primitive in production with structured event capture
Build the dashboard surface (Cortex CEO Dashboard module or AIOS dashboard module) showing the four-to-six primitives, scannable, drill-in tray per signal
Run the instrumentation through at least one production cycle with real users
Write the methodology doc — what each primitive measures, what failure modes it catches that Layer 1-3 evals do not, where the methodology is fragile or gap-flagged
The doc is what becomes Product.ai's vertical-specialization moat material — write it for the audience of an enterprise commerce customer evaluating system-level eval

What success looks like

All four-to-six primitives capture real data on the chosen surface during the trial window
One real surprise emerges — a user behavior the team had not seen in Layer 1-3 metrics
The dashboard surface follows aios/dashboard/CLAUDE.md Hard Rules
The methodology doc is concrete enough that an enterprise customer reading it can decide whether they want it deployed in their org
The candidate explicitly distinguishes Layer 4 from Layer 1-3 in writing — "ARC + Alloy answers 'is it true?'; Layer 4 answers 'is it acted on?'; conflation is brand erosion" (per axiom A17)

References

references.md

Data Science Phase 3 briefing axiom A15 (Four-Layer Eval Stack), A16 (Stripe Canonical Failure), A17 (ARC + Alloy ≠ System-Level Eval), A18 (Hyperscaler Window Closes 12-18 Months), VERDICT 6 (System-Level Evaluation as Differentiation)
Stripe LLM-powered support response failure (canonical case study)
AWS Bedrock AgentCore documentation (March 31 2026 GA)
Decades of human-machine compliance research (healthcare CDSS, aviation, military) for theoretical grounding
Existing Product.ai surfaces and their Layer 1-3 instrumentation
aios/dashboard/CLAUDE.md for design system; Cortex Dashboard architecture for Layer 4 visualization

Constraints

Claude Code or Co-work as primary substrate
Privacy-respecting instrumentation — explicit user consent and PII handling documented
Self-hostable eval substrate if any Layer 1-3 substrate is touched
Atomic HTML+JS commit on dashboard work per CLAUDE.md §5.2 rule 3
Methodology doc is written for an external commerce-vertical customer audience (the doc is the moat)
IP separation: surface is application-layer; methodology paths are out of scope; ARC + Alloy is a distinct capability (do not conflate)
3-week duration cap; if scope creeps, the candidate negotiates the trade-off explicitly

Apply

Read the Codex (10 min)

The operating principles we work by. If they resonate, the rest of this will land. Open the Codex →

12-minute video screen

Hireflix, async. Questions are calibrated to this project specifically.

Chemistry call (30-60 min)

Direct call with the CEO. Strategic alignment and mutual fit. No problem-solving exercise.

Project begins within 2-3 weeks

1099 contractor agreement, NDA, paid at your stated rate. Day 1 in Santa Monica.

Alpha Team members can take this project without the screen-and-call sequence. Reach out via the Alpha Team channel.

Apply for this trial Browse other projects

Agent View · application/ld+json

{
  "@context": "https://schema.org",
  "@graph": [
    {
      "@id": "https://product.ai/#organization"
    },
    {
      "@type": "SoftwareApplication",
      "@id": "https://product.ai/#application",
      "name": "Product.ai",
      "slogan": "Intelligent guidance at conversation speed",
      "description": "Product.ai is your AI shopping expert - verified product intelligence grounded in the Truth Graph, accessible to agents and humans alike.",
      "applicationCategory": "ShoppingApplication",
      "operatingSystem": "Web",
      "author": {
        "@id": "https://product.ai/#organization"
      },
      "offers": {
        "@type": "Offer",
        "price": "0",
        "priceCurrency": "USD"
      }
    },
    {
      "@type": "WebPage",
      "@id": "https://product.ai/join/projects/PRJ-29-layer-4-human-decision-compliance/#webpage",
      "name": "Layer 4 Human-Decision Compliance Instrumentation — measure adoption-decay and override patterns on one Product.ai surface — Product.ai",
      "description": "Build Layer 4 instrumentation on one Product.ai surface — the surface where the agent's recommendation reaches a human and the human acts on it, overrides it, or ignores it. Track override patterns, adoption-decay curves, cognitive switching cost signatures by user cohort. Out...",
      "url": "https://product.ai/join/projects/PRJ-29-layer-4-human-decision-compliance/",
      "isPartOf": {
        "@id": "https://product.ai/#website"
      },
      "about": {
        "@id": "https://product.ai/#organization"
      },
      "publisher": {
        "@id": "https://product.ai/#organization"
      }
    },
    {
      "@type": "JobPosting",
      "title": "Layer 4 Human-Decision Compliance Instrumentation — measure adoption-decay and override patterns on one Product.ai surface",
      "description": "Build Layer 4 instrumentation on one Product.ai surface — the surface where the agent's recommendation reaches a human and the human acts on it, overrides it, or ignores it. Track override patterns, adoption-decay curves, cognitive switching cost signatures by user cohort. Out...",
      "hiringOrganization": {
        "@id": "https://product.ai/#organization"
      },
      "employmentType": "CONTRACTOR",
      "datePosted": "2026-04-28",
      "jobLocation": {
        "@type": "Place",
        "address": {
          "@type": "PostalAddress",
          "addressLocality": "Los Angeles",
          "addressRegion": "CA",
          "addressCountry": "US"
        }
      },
      "baseSalary": {
        "@type": "MonetaryAmount",
        "currency": "USD",
        "value": {
          "@type": "QuantitativeValue",
          "unitText": "WEEK"
        }
      },
      "workHours": "3 weeks"
    },
    {
      "@type": "BreadcrumbList",
      "itemListElement": [
        {
          "@type": "ListItem",
          "position": 1,
          "name": "Product.ai",
          "item": "https://product.ai"
        },
        {
          "@type": "ListItem",
          "position": 2,
          "name": "Join",
          "item": "https://product.ai/join/"
        },
        {
          "@type": "ListItem",
          "position": 3,
          "name": "Projects",
          "item": "https://product.ai/join/projects/"
        },
        {
          "@type": "ListItem",
          "position": 4,
          "name": "Layer 4 Human-Decision Compliance Instrumentation — measure adoption-decay and override patterns on one Product.ai surface",
          "item": "https://product.ai/join/projects/PRJ-29-layer-4-human-decision-compliance/"
        }
      ]
    }
  ]
}