Product Design Jobs & Trial Projects at Product.ai

Open challenges

Frontier product design challenges we are working on right now.

Real, paid 1-3 week engagements with the Product.ai team. Each one is a problem we are working on at the frontier of product design — and the kind of work we hire against. Pick one that pulls at you and apply.

Verification Confidence Indicator — calibrated trust UI across chat, MCP, and extension

2 weeks · Product.ai + Consumer experience · Consequential · Open to Alpha Team

A small, generalizable UI component that shows how confident Product.ai is in its own answer, with the evidence trace one click away.

Experience Paradigm v0.1 — define what verified-truth commerce feels like as a consumer experience

3 weeks · Product.ai + Consumer experience · Consequential · Open to Alpha Team

A founding designer's first attempt at the question "what does verified-truth commerce feel like for a consumer?" Output is a working v0.1 of the Product.ai experience — chat + verdict surface — that a non-technical shopper can use end-t...

Multi-Surface Trust Pattern Library — how trust signals travel across web, chat, MCP, and extension

2 weeks · Product.ai + Brand · Applied · Open to Alpha Team · Draft

Audit the four Product.ai surfaces (web, chat, MCP, extension) for how each currently signals trust.

SimplyCodes Verification Layer — surface verified-truth physics inside the consumer commerce engine

2 weeks · SimplyCodes + Product.ai · Applied · Open to Alpha Team · Draft

Take the verification primitives Product.ai uses (verdict states, evidence trace, confidence calibration) and adapt them for SimplyCodes.

Hello-World Audit + Top-3 Ships — read the surfaces, ship your three highest-leverage calls

1 week · Product.ai + SimplyCodes · Foundational · Open to Alpha Team

A first-week orientation project that doubles as a high-signal diagnostic.

Agent Confidence-Signaling Component — how Alloy speaks confidence to other AI agents and to the human in the loop

2 weeks · Agent commerce + Truth Graph · Consequential · Open to Alpha Team · Draft

Design and ship a confidence-signaling component for Alloy — Product.ai's local-first agentic workbench.

Browse all open challenges across disciplines

Discipline physics

Where product design is heading at frontier firms.

Last updated 2026-05-02

In April 2026, product design at frontier-AI firms (Anthropic, OpenAI, Vercel, Linear, Cursor, Anysphere, Replit, Paradigm) is structured around one shared physics: AI generation cost collapsed for mockups, drafts, variants, and code scaffolds, while verification cost did not. The locus of design value moved upstream — from generation to verification. Senior practitioners are not rejecting AI. They are explicitly repositioning value above the assembly tier — calibrated taste, design-system architecture, accessibility correctness, behavioral verification instrumentation, and the framing of the right research question. Generation is commoditizing. The work above generation is concentrating.

The Design Engineer hybrid is real but rare

The "Design Engineer" title is structurally distinct only at a small cohort of firms. Vercel is the canonical instance — multi-role commitment, dedicated director, salary parity with software engineering, and a published methodology that explicitly skips the designer-to-frontend handoff. Single-hire instances exist at Anthropic, Linear, Cursor, Replit, Resend, Browser Company, and Perplexity. The rest of the market uses the title without the structural commitment, which is why operators inside those firms feel the role as workload tax rather than leverage.

The methodological signature is consistent across every firm where the role functions: the designer-to-frontend handoff is skipped. The designer prototypes in code or sketches in Figma and iterates with the same person who ships to production. AI tooling (Cursor, Claude Code, Vercel v0) accelerates the convergence. It does not cause it. Production code-merge rights are universal — without them, the title reduces to "designer who codes prototypes."

Where 10x designers concentrate now

Frontier teams operate in two stable equilibria: hyper-lean (Linear at ~120 headcount with single-digit designers) and hyper-scaling (Anthropic, OpenAI). The middle is unstable. The same role title produces multi-year tenures at firms with mature design systems, code-first tooling, and selection-population fit, and produces burned-out exits at firms missing those preconditions.

The "third population" is the load-bearing variable. Practitioners who find context-switching between design and code energizing — intrinsic taste motivation, public output bridging both modalities, founder-track temperament — function as leverage multipliers. Practitioners told to "also code" or "also design" experience the same scope expansion as workload tax. Same role label, opposite outcomes. The variance is explained by selection fit and structural configuration, not by the title.

What strong portfolios look like in 2026

The strongest portfolio filters in 2026 are deployed: a live URL to a shipped product the candidate primarily built beats a polished case study every time. AI tools collapsed shipping cost to ~hours, so non-shipping reads as uninterested or as avoiding falsifiability. GitHub commits that map to portfolio work — design tokens, component PRs, motion logic, accessibility fixes — distinguish the designer who shipped from the designer who handed off.

The next-strongest filters are behavioral. A senior designer forms a specific opinion about your product after using it for an hour — two defensible critiques minimum, not a critique disguised as praise. They name AI tools in their workflow with before-and-after artifact comparison (specific prompts, what changed, where the tool got in the way). And they describe what they cut. A senior designer who can name what they removed beats a senior designer who can only name what they added.

What the industry got wrong

Eval theater is the default failure mode at AI-product firms. Comprehensive eval frameworks systematically discover problems weeks after users do. Anthropic's April 23, 2026 Claude Code postmortem is canonical: three quality-degrading changes passed every internal eval, code review, end-to-end test, and dogfooding cycle. Users surfaced the degradation through /feedback. The recruiting filter inverts: designers who can articulate which evals produced false confidence are higher-signal than designers who built Braintrust dashboards.

Stated-trust survey scores systematically overstate actual reliance on AI. 74% of consumers rate trust 4-5/5; 93% verify before acting. Behavioral verification rates (citation-click, override, escalation) are the load-bearing trust signal, not survey scores. Senior designers who instrument behavioral verification beat senior designers who measure stated trust. The Trust Paradox is the same epistemic principle that makes shipped artifacts beat polished portfolios in hiring: behavior beats stated intent.
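One way to picture what instrumenting behavioral verification could look like is the minimal sketch below. It assumes a flat event log in which every AI answer emits an "answer_shown" event and each verification behavior emits its own event. The event names, the Event type, and the behavioral_rates helper are hypothetical illustrations, not Product.ai's actual schema or tooling.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical event taxonomy; a real product would define its own.
VERIFICATION_EVENTS = ("citation_click", "override", "escalation")


@dataclass(frozen=True)
class Event:
    answer_id: str  # which AI answer the event belongs to
    kind: str       # "answer_shown" or one of VERIFICATION_EVENTS


def behavioral_rates(events):
    """For each verification behavior, return the share of shown answers
    that triggered that behavior at least once."""
    shown = {e.answer_id for e in events if e.kind == "answer_shown"}
    if not shown:
        return {kind: 0.0 for kind in VERIFICATION_EVENTS}
    # Count each (answer, behavior) pair once so repeat clicks do not inflate a rate.
    pairs = {
        (e.answer_id, e.kind)
        for e in events
        if e.kind in VERIFICATION_EVENTS and e.answer_id in shown
    }
    counts = Counter(kind for _, kind in pairs)
    return {kind: counts[kind] / len(shown) for kind in VERIFICATION_EVENTS}


# Illustrative data only: four answers, with a repeated click on the first.
events = [
    Event("a1", "answer_shown"), Event("a1", "citation_click"), Event("a1", "citation_click"),
    Event("a2", "answer_shown"), Event("a2", "override"),
    Event("a3", "answer_shown"),
    Event("a4", "answer_shown"), Event("a4", "citation_click"), Event("a4", "escalation"),
]
rates = behavioral_rates(events)
```

Counting per answer rather than per click is a deliberate choice in this sketch: the question is how often people verify at all, not how many times they click while doing so.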

The senior counter-narrative

Five surfaces produce a coherent counter-narrative against AI velocity in design. Karri Saarinen ("Output Isn't Design," April 2026), Andy Budd ("peak designer," March 2026), Joel Lewenstein on synchronous co-creation at Config 2025, Smashing Magazine on homogenization, and Nielsen Norman Group on AI fatigue all converge on the same observation: senior practitioners are the calibrated taste-makers who detect what AI gets wrong before the market corrects for it.

The Figma 2025 State of the Designer report contains the load-bearing data point: 78% of designers feel more efficient with AI, but fewer than 50% feel "better at their job." Efficiency is real. Quality is degrading. Senior designers are the canaries detecting the second signal. A candidate who agrees with specific voices in the senior counter-narrative, rather than with the counter-narrative in general, is a higher-signal hire than one who recites the dominant narrative. The frontier is in the dissent, not the consensus.

Open roles

Full-time product design roles open right now.

Most of our best people came through projects, not interviews. If a project pulls at you and the trial goes well, the role conversation follows.

Senior Product Designer

Hiring · $225k - $300k · HQ - Los Angeles

Browse all open roles

Other disciplines

Working at the edge of an adjacent discipline?

AI Systems Engineering Product

Apply

If a product design challenge above is the kind of work you want to be doing this month, send a screen.

Twelve-minute Hireflix video, async. Then a 30-60 minute chemistry call. Then a paid 1-3 week project alongside the team. We will know within a week whether to move forward.

Browse open challenges · Read the Codex

Where is product design heading at frontier firms in 2026?