These are my notes on the role and responsibilities of an AI product manager. The job of a product manager and designer is evolving fast. Today’s AI PMs must deeply understand how models work, predict their trajectory, and shape product direction around rapidly changing capabilities.
-
Model Sense is the new Product Sense. Run 5-10 personal evals on the latest model version with your actual data. Log where it hallucinates and where it surprises you. Maintain a living “capability map” doc. Spend some time reading model release notes and updating your mental model of what just got unlocked.
-
Build and maintain eval suites (prompt templates, test cases, confidence thresholds).
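An eval suite can start very small. Below is a minimal sketch, assuming a `model_call` stand-in for your real model client; `EvalCase`, `score`, and `run_suite` are illustrative names, and the substring-matching scorer is a placeholder for whatever grading your product needs.

```python
from dataclasses import dataclass

@dataclass
class EvalCase:
    prompt: str                        # filled-in prompt template
    must_contain: list[str]            # strings a good answer should include
    confidence_threshold: float = 0.8  # minimum score to count as a pass

def score(output: str, case: EvalCase) -> float:
    # Fraction of required strings present in the model output (case-insensitive).
    hits = sum(s.lower() in output.lower() for s in case.must_contain)
    return hits / len(case.must_contain)

def run_suite(model_call, cases: list[EvalCase]) -> dict:
    # model_call: any callable prompt -> str; swap in your real client.
    results = []
    for case in cases:
        s = score(model_call(case.prompt), case)
        results.append({"prompt": case.prompt, "score": s,
                        "passed": s >= case.confidence_threshold})
    passed = sum(r["passed"] for r in results)
    return {"pass_rate": passed / len(results), "results": results}
```

Re-run the same suite against every new model version and diff the pass rates; that diff is the raw material for the capability map.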
-
Write a monthly “6-month capability bet” memo: “We’re betting that multi-step reasoning will be reliable by Q3, so we’re scaffolding with chains now but not fine-tuning.” Review it in every roadmap grooming.
-
Host a weekly “Model Behavior Translation” sync where you demo flaky model outputs to eng, design and GTM, and facilitate the tradeoff call: “Do we ship with 85% accuracy or delay for a fix?”
-
Spend the first hour of your day probing the frontier: test new prompting techniques, scan arXiv/twitter for capability signals, update your “betting ledger” with new evidence. Run a weekly “Frontier Mapping” session with 2–3 engineers where you collectively red-team what the model almost can do.
-
Write one brutally clear “Build Intent” doc per week: one page, no jargon, that answers “what capability are we unlocking, for whom, and how will we know it works?” Prune 20% of scope from engineering tickets before sprint planning by asking: “does this need to be perfect, or just better than status quo?”
-
Reframe one feature request into a capability question. For example: instead of “Add export CSV,” ask “What if AI could anticipate which data the user needs next and pre-generate it?” Run monthly capability-mapping workshops where you, design, and eng list all possible AI behaviors your data could enable, then prioritize by feasibility × user need.
-
Design prompt patterns and fallback flows. Write system prompts that encode brand voice; design “graceful degradation” interactions (e.g., “I’m not sure, here are three options, or I can ask a human”). Test boundary cases: what happens if the user asks something out of scope, ambiguous, or adversarial?
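The graceful-degradation pattern can be sketched as a simple confidence router. This is an assumption-laden illustration: the `respond` function and the 0.85 / 0.5 cutoffs are hypothetical, and real products would tune them against eval data.

```python
def respond(answer: str, confidence: float, alternatives: list[str]) -> dict:
    # Route by model confidence: answer directly, offer options, or escalate.
    if confidence >= 0.85:  # illustrative threshold; tune per product
        return {"mode": "answer", "text": answer}
    if confidence >= 0.5:
        return {"mode": "options",
                "text": "I’m not sure, here are some options:",
                "options": alternatives[:3],
                "fallback": "or I can ask a human"}
    # Low confidence: degrade all the way to human-in-the-loop.
    return {"mode": "human", "text": "Routing this to a human teammate."}
```

The point is that the middle band, not the happy path, is where most of the interaction design work lives.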
-
Check your model performance dashboard first thing; if latency p95 spiked or accuracy dropped, escalate. “If user feedback rate on summaries falls below 4/5 for 3 days, auto-pause and review.” Plan quarterly “model refresh” sprints like you plan feature releases.
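The auto-pause rule above is easy to make executable. A minimal sketch, assuming daily average ratings on a 1–5 scale; `should_auto_pause` and its defaults are illustrative, not a prescribed implementation.

```python
def should_auto_pause(daily_ratings: list[float],
                      threshold: float = 4.0,
                      window: int = 3) -> bool:
    # Pause and review if the average user rating stayed below `threshold`
    # for the last `window` consecutive days.
    if len(daily_ratings) < window:
        return False  # not enough data to trigger
    return all(r < threshold for r in daily_ratings[-window:])
```

Wiring a check like this into the dashboard turns the policy from a doc into an alert.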
-
Build and stare at your “workflow completion” dashboard: measure finished workflows, not clicks. Run weekly trust calibration sessions: review 10 random AI decisions with users; ask “would you have done the same?” If trust dips, design interventions (explanations, undo flows, human-in-the-loop escalations).
-
Prototype in Cursor yourself. Ship a clickable demo to eng. Co-design with engineers in real time, sit with them if possible, tweak prompts together, and let them own some of the UX decisions within agreed guardrails. Run a weekly “build party” where PM, design, and eng each bring a prototype, then merge the best ideas.
-
Run ethics review before any launch: “What could go wrong for our most vulnerable user?” Build trust through transparency: write changelog posts that honestly explain what the AI can and can’t do.
-
Master AI tools, maintain a personal prompt library, automate your own busywork with GPTs, and share templates with your team. Your leverage comes from automating the PM grunt work (competitive research, first-draft specs) so you can focus on high-judgment calls: “Should we ship this now, or wait for GPT-6?”