AI generation cost has collapsed for mockups, drafts, and code scaffolds. Verification cost has not. The locus of design value is moving upstream — and the operators who can hold that ground are the ones we are looking for. Below: open projects you can take on with us, the physics of what is changing, and roles we are hiring against.
Real, paid 1-3 week engagements with the Product.ai team. Each one is a problem we are working on at the frontier of product design — and the kind of work we hire against. Pick one that pulls at you and apply.
In April 2026, product design at frontier-AI firms (Anthropic, OpenAI, Vercel, Linear, Cursor (Anysphere), Replit, Paradigm) is structured around one shared physics: AI generation cost collapsed for mockups, drafts, variants, and code scaffolds, while verification cost did not. The locus of design value moved upstream, from generation to verification. Senior practitioners are not rejecting AI. They are explicitly repositioning value above the assembly tier: calibrated taste, design-system architecture, accessibility correctness, behavioral verification instrumentation, and the framing of the right research question. Generation is commoditizing. The work above generation is concentrating.
The "Design Engineer" title is structurally distinct only at a small cohort of firms. Vercel is the canonical instance — multi-role commitment, dedicated director, salary parity with software engineering, and a published methodology that explicitly skips the designer-to-frontend handoff. Single-hire instances exist at Anthropic, Linear, Cursor, Replit, Resend, Browser Company, and Perplexity. The rest of the market uses the title without the structural commitment, which is why operators inside those firms feel the role as workload tax rather than leverage.
The methodological signature is consistent across every firm where the role functions: the designer-to-frontend handoff is skipped. The designer prototypes in code or sketches in Figma and iterates with the same person who ships to production. AI tooling (Cursor, Claude Code, Vercel v0) accelerates the convergence. It does not cause it. Production code-merge rights are universal across those firms; without them, the title reduces to "designer who codes prototypes."
Frontier teams operate in two stable equilibria: hyper-lean (Linear at ~120 headcount with single-digit designers) and hyper-scaling (Anthropic, OpenAI). The middle is unstable. The same role title produces multi-year tenures at firms with mature design systems, code-first tooling, and selection-population fit, and produces burned-out exits at firms missing those preconditions.
The "third population" is the load-bearing variable. Practitioners who find context-switching between design and code energizing — intrinsic taste motivation, public output bridging both modalities, founder-track temperament — function as leverage multipliers. Practitioners told to "also code" or "also design" experience the same scope expansion as workload tax. Same role label, opposite outcomes. The variance is explained by selection fit and structural configuration, not by the title.
The strongest portfolio filter in 2026 is deployed work: a live URL to a shipped product the candidate primarily built beats a polished case study every time. AI tools collapsed shipping cost to a matter of hours, so not shipping reads as uninterested or as avoiding falsifiability. GitHub commits that map to portfolio work (design tokens, component PRs, motion logic, accessibility fixes) distinguish the designer who shipped from the designer who handed off.
The next-strongest filters are behavioral. A senior designer forms a specific opinion about your product after using it for an hour — two defensible critiques minimum, not a critique disguised as praise. They name AI tools in their workflow with before-and-after artifact comparison (specific prompts, what changed, where the tool got in the way). And they describe what they cut. A senior designer who can name what they removed beats a senior designer who can only name what they added.
Eval theater is the default failure mode at AI-product firms. Comprehensive eval frameworks systematically discover problems weeks after users do. Anthropic's April 23, 2026 Claude Code postmortem is canonical: three quality-degrading changes passed every internal eval, code review, end-to-end test, and dogfooding cycle. Users surfaced the degradation through /feedback. The recruiting filter inverts: designers who can articulate which evals produced false confidence are higher-signal than designers who built Braintrust dashboards.
Stated-trust survey scores systematically overstate actual reliance on AI. 74% of consumers rate trust 4-5/5; 93% verify before acting. Behavioral verification rates (citation-click, override, escalation), not survey scores, are the load-bearing trust signal. Senior designers who instrument behavior beat senior designers who measure stated trust. The Trust Paradox is the same epistemic principle that makes shipped artifacts beat polished portfolios in hiring: behavior beats stated intent.
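As a rough illustration of what instrumenting behavior can look like, here is a minimal Python sketch that turns a raw event log into per-behavior verification rates. The `Event` shape, the event names (`answer_shown`, `citation_click`, `override`, `escalation`), and the choice of "share of shown answers" as the denominator are all assumptions made for this example, not a description of any team's actual pipeline.

```python
from dataclasses import dataclass

# Hypothetical event names; a real product would define its own taxonomy.
BEHAVIORS = ("citation_click", "override", "escalation")


@dataclass(frozen=True)
class Event:
    answer_id: str  # hypothetical ID of one AI answer shown to a user
    kind: str       # "answer_shown" or one of BEHAVIORS


def verification_rates(events):
    """For each behavior, return the share of shown answers that
    triggered that behavior at least once."""
    shown = {e.answer_id for e in events if e.kind == "answer_shown"}
    if not shown:
        # No exposure means no meaningful rate; report zeros.
        return {behavior: 0.0 for behavior in BEHAVIORS}
    rates = {}
    for behavior in BEHAVIORS:
        # Count each answer once, even if the user repeated the action.
        acted = {
            e.answer_id
            for e in events
            if e.kind == behavior and e.answer_id in shown
        }
        rates[behavior] = len(acted) / len(shown)
    return rates
```

Counting each answer once keeps a single heavy clicker from inflating the rate. Tracking these numbers next to the survey score is one way to make the gap between stated and behavioral trust visible.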
Five surfaces produce a coherent counter-narrative against AI velocity in design. Karri Saarinen ("Output Isn't Design," April 2026), Andy Budd ("peak designer," March 2026), Joel Lewenstein on synchronous co-creation at Config 2025, Smashing Magazine on homogenization, and Nielsen Norman Group on AI fatigue all converge on the same observation: senior practitioners are the calibrated taste-makers who detect what AI gets wrong before the market corrects for it.
The Figma 2025 State of the Designer report contains the load-bearing data point: 78% of designers feel more efficient with AI, but fewer than 50% feel "better at their job." Efficiency is real. Quality is degrading. Senior designers are the canaries detecting the second signal. A candidate who agrees with the senior counter-narrative voices specifically (not generically) is a higher-signal hire than one who recites the dominant narrative. The frontier is in the dissent, not the consensus.
Most of our best people came through projects, not interviews. If a project pulls at you and the trial goes well, the role conversation follows.
Twelve-minute Hireflix video, async. Then a 30-60 minute chemistry call. Then a paid 1-3 week project alongside the team. We will know within a week whether to move forward.