These are my notes on the role and responsibilities of an AI product manager. The job of a product manager and designer is evolving fast. Today’s AI PMs must deeply understand how models work, predict their trajectory, and shape product direction around rapidly changing capabilities.
-
Model Sense is the new Product Sense. Run 5-10 personal evals on the latest model version with your actual data. Log where it hallucinates and where it surprises you. Maintain a living “capability map” doc. Spend some time reading model release notes and updating your mental model of what just got unlocked.
-
Build and maintain eval suites (prompt templates, test cases, confidence thresholds).
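An eval suite can start very small. Below is a minimal sketch, assuming a `model_call` stand-in for your real model client; `EvalCase`, `score`, and `run_suite` are illustrative names, and the substring-matching scorer is a placeholder for whatever grading your product needs.

```python
from dataclasses import dataclass

@dataclass
class EvalCase:
    prompt: str                        # filled-in prompt template
    must_contain: list[str]            # strings a good answer should include
    confidence_threshold: float = 0.8  # minimum score to count as a pass

def score(output: str, case: EvalCase) -> float:
    # Fraction of required strings present in the model output (case-insensitive).
    hits = sum(s.lower() in output.lower() for s in case.must_contain)
    return hits / len(case.must_contain)

def run_suite(model_call, cases: list[EvalCase]) -> dict:
    # model_call: any callable prompt -> str; swap in your real client.
    results = []
    for case in cases:
        s = score(model_call(case.prompt), case)
        results.append({"prompt": case.prompt, "score": s,
                        "passed": s >= case.confidence_threshold})
    passed = sum(r["passed"] for r in results)
    return {"pass_rate": passed / len(results), "results": results}
```

Re-run the same suite against every new model version and diff the pass rates; that diff is the raw material for the capability map.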
-
Write a monthly “6-month capability bet” memo: “We’re betting that multi-step reasoning will be reliable by Q3, so we’re scaffolding with chains now but not fine-tuning.” Review it in every roadmap grooming.
-
Host a weekly “Model Behavior Translation” sync where you demo flaky model outputs to eng, design and GTM, and facilitate the tradeoff call: “Do we ship with 85% accuracy or delay for a fix?”
-
Spend the first hour of your day probing the frontier: test new prompting techniques, scan arXiv/twitter for capability signals, update your “betting ledger” with new evidence. Run a weekly “Frontier Mapping” session with 2–3 engineers where you collectively red-team what the model almost can do.
-
Write one brutally clear “Build Intent” doc per week: one page, no jargon, that answers “what capability are we unlocking, for whom, and how will we know it works?” Prune 20% of scope from engineering tickets before sprint planning by asking: “does this need to be perfect, or just better than status quo?”
-
Reframe one feature request into a capability question. For example: instead of “Add export CSV,” ask “What if AI could anticipate which data the user needs next and pre-generate it?” Run monthly capability-mapping workshops where you, design, and eng list all possible AI behaviors your data could enable, then prioritize by feasibility × user need.
-
Design prompt patterns and fallback flows. Write system prompts that encode brand voice; design “graceful degradation” interactions (e.g., “I’m not sure, here are three options, or I can ask a human”). Test boundary cases: what happens if the user asks something out of scope, ambiguous, or adversarial?
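The graceful-degradation pattern can be sketched as a simple confidence router. This is an assumption-laden illustration: the `respond` function and the 0.85 / 0.5 cutoffs are hypothetical, and real products would tune them against eval data.

```python
def respond(answer: str, confidence: float, alternatives: list[str]) -> dict:
    # Route by model confidence: answer directly, offer options, or escalate.
    if confidence >= 0.85:  # illustrative threshold; tune per product
        return {"mode": "answer", "text": answer}
    if confidence >= 0.5:
        return {"mode": "options",
                "text": "I’m not sure, here are some options:",
                "options": alternatives[:3],
                "fallback": "or I can ask a human"}
    # Low confidence: degrade all the way to human-in-the-loop.
    return {"mode": "human", "text": "Routing this to a human teammate."}
```

The point is that the middle band, not the happy path, is where most of the interaction design work lives.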
-
Check your model performance dashboard first thing; if latency p95 spiked or accuracy dropped, escalate. “If user feedback rate on summaries falls below 4/5 for 3 days, auto-pause and review.” Plan quarterly “model refresh” sprints like you plan feature releases.
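The auto-pause rule above is easy to make executable. A minimal sketch, assuming daily average ratings on a 1–5 scale; `should_auto_pause` and its defaults are illustrative, not a prescribed implementation.

```python
def should_auto_pause(daily_ratings: list[float],
                      threshold: float = 4.0,
                      window: int = 3) -> bool:
    # Pause and review if the average user rating stayed below `threshold`
    # for the last `window` consecutive days.
    if len(daily_ratings) < window:
        return False  # not enough data to trigger
    return all(r < threshold for r in daily_ratings[-window:])
```

Wiring a check like this into the dashboard turns the policy from a doc into an alert.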
-
Build and stare at your “workflow completion” dashboard: measure finished workflows, not clicks. Run weekly trust calibration sessions: review 10 random AI decisions with users; ask “would you have done the same?” If trust dips, design interventions (explanations, undo flows, human-in-the-loop escalations).
-
Prototype in Cursor yourself. Ship a clickable demo to eng. Co-design with engineers in real time, sit with them if possible, tweak prompts together, and let them own some of the UX decisions within agreed guardrails. Run a weekly “build party” where PM, design, and eng each bring a prototype, then merge the best ideas.
-
Run ethics review before any launch: “What could go wrong for our most vulnerable user?” Build trust through transparency: write changelog posts that honestly explain what the AI can and can’t do.
-
Master AI tools, maintain a personal prompt library, automate your own busywork with GPTs, and share templates with your team. Your leverage comes from automating the PM grunt work (competitive research, first-draft specs) so you can focus on high-judgment calls: “Should we ship this now, or wait for GPT-6?”