Rebuilding the Quiz editor used by thousands of Shopify brands

Status: In progress · Shipping iteratively · Data being collected
Role: Product Manager at Octane AI · Nov 2025 – present
Team: Working closely with the Director of Engineering, design, and customer-facing teams

TL;DR

The Quiz editor at Octane AI is a mature product thousands of Shopify brands rely on to turn shopper conversations into personalized commerce. It had been doing its job for years — and was also showing real signs of strain. [NEEDS YOUR INPUT: one specific sentence — what was the strongest signal that a rewrite was warranted? An NPS drop? A specific pattern across tickets? A churn cohort? Pick the single most concrete observation.]
I joined as PM in November 2025 and [NEEDS YOUR INPUT: timeframe — "within the first weeks", "after my first round of customer interviews"] proposed the rewrite.
We're shipping iteratively, not in a single relaunch. [NEEDS YOUR INPUT: one sentence summarizing what's already in production vs. still ahead.]
Data is being collected as we ship. The "What we're measuring" table below stays live — published mid-flight, not after the fact.

Context

Octane AI is an AI-powered commerce platform built for Shopify. The Quiz editor is one of its core surfaces: it's how merchants build the quizzes that turn first-time shoppers into segmented customer profiles, which then feed personalized product recommendations, email flows, and conversational follow-ups.

Thousands of Shopify brands use it daily. By the time I joined as PM in November 2025, the editor was a much-loved product — and also one carrying [NEEDS YOUR INPUT: your own framing — "years of UX debt"? "patterns that no longer matched how merchants actually built quizzes in 2025"? "feature accretion without a unifying model"?].

The problem

[NEEDS YOUR INPUT: 2-4 paragraphs in your voice describing what was actually wrong. The scaffolding below shows the shape — replace the bracketed pieces with the real specifics from your tickets, NPS comments, and merchant conversations.]

User feedback consistently pointed at [PAIN POINT 1 — pick the loudest one from real tickets. Examples of the shape: "creating a quiz took longer than it should", "merchants couldn't preview the shopper experience without leaving the editor", "the surface didn't reflect how merchants actually thought about their funnel"].

[PAIN POINT 2 — the second most common pattern].

[PAIN POINT 3, if there's a third — otherwise cut].

The business impact was concrete: [NEEDS YOUR INPUT: was it churn in a specific merchant segment? Support volume? Time-to-launch for new accounts? Conversion drop somewhere in the funnel?].

[QUOTE: pull a real user quote from a ticket, NPS comment, or interview if you have permission. Even an anonymized one with attribution like "— Shopify brand, 2025 NPS survey" is more powerful than the prose around it.]

The approach

The pattern was visible from the outside before I joined — internal hypotheses already existed about what was wrong. I didn't want to take those at face value, so I started with discovery: [NEEDS YOUR INPUT: what did you actually do in your first weeks? Examples: "ran 1:1 calls with N merchants across pricing tiers", "reviewed the last 6 months of support tickets tagged 'editor'", "watched 20 Loom recordings of merchants using the editor live", "shadowed onboarding sessions"].

That work surfaced [NEEDS YOUR INPUT: 1 sentence — what did discovery confirm? What did it change? Often the most interesting answer is "it confirmed X but reframed Y" — that nuance is what recruiters notice.].

Three principles shaped how I scoped the rewrite:

[Principle 1 — replace with the real one. Examples: "Don't rebuild what already works", "Preserve every merchant's existing quizzes through the migration", "Ship the new editor alongside the old one until adoption proves the new model"] — [1-2 sentences on why this principle, and what tradeoff it forced].
[Principle 2] — [1-2 sentences].
[Principle 3] — [1-2 sentences].

Selling the rewrite internally meant [NEEDS YOUR INPUT: 1-2 sentences. Who needed to be convinced? Was there resistance? What moved the conversation — discovery data, a prototype, a tradeoff diagram? This is the section that signals "this PM can navigate an org," which is half of what recruiters are reading for.].

What shipped so far

[NEEDS YOUR INPUT: List of releases. Rough is fine; I can format. Shape:

Month YYYY · Release name / capability · 1 line on what it changes for merchants
Month YYYY · Release name / capability · 1 line on what it changes
Month YYYY · Release name / capability · 1 line on what it changes

If you'd rather group by theme (e.g., "Editor UX", "Preview", "Migration") than by date, say the word and I'll restructure. Both formats work: chronological is more honest about iteration, thematic is easier to skim.]

What we're measuring

We're collecting baseline and post-launch data on the metrics that map to the original problem. Some baselines are already in hand, others are still being gathered, and most "current" numbers will only be meaningful once the next release cycle lands. I'm publishing this table mid-flight rather than waiting for a clean retrospective — partly because honest in-progress measurement is more useful than after-the-fact storytelling, and partly because the rewrite isn't done.

Metric	Baseline	Target	Current	Status
NPS (editor-specific)	`[DATA]`	`[DATA]`	`[DATA]`	`[STATUS]`
Time to create a quiz	`[DATA]`	`[DATA]`	`[DATA]`	`[STATUS]`
Feature adoption (% merchants using new capabilities)	`[DATA]`	`[DATA]`	`[DATA]`	`[STATUS]`
Support ticket volume (editor-related)	`[DATA]`	`[DATA]`	`[DATA]`	`[STATUS]`
Churn attributable to editor friction	`[DATA]`	`[DATA]`	`[DATA]`	`[STATUS]`

[NEEDS YOUR INPUT: are there other metrics on the dashboard worth showing? Candidates: quiz completion rate on the shopper side, time-to-first-published-quiz for new merchants, % of new quizzes built in the new editor vs the legacy one. Cut anything you're not actually tracking — empty placeholders hurt credibility more than fewer columns do.]

A note on data honesty: numbers will land in this table as they're collected. If a metric moves the wrong way I'll publish that too — a portfolio that only shows wins is a portfolio nobody trusts.

What I've learned so far

[NEEDS YOUR INPUT: 3-5 honest learnings from the middle of the work. The strongest patterns I'd push you toward:

A specific moment where you changed direction (e.g., "We started by rebuilding the canvas; merchant interviews in February made it clear the bigger pain was preview, not creation. We re-sequenced the roadmap.")
A piece of feedback that reframed scope (e.g., "Three calls in, the same word, 'fragile', kept coming up. That changed how we thought about migration.")
A tradeoff you weighed and would defend (e.g., "We chose to ship behind a flag rather than gate the rollout on every feature shipping first. That bought us learning speed at the cost of two months of duplicate code paths.")
Something specific about the AI/PM/commerce intersection that surprised you

Three sharp, honest learnings beat five generic ones. "Discovery matters" doesn't count. "Watching a merchant try to undo a quiz change taught me more about state management than any spec I'd written" does.]

What's next

[NEEDS YOUR INPUT: 1 short paragraph. Not the big-picture vision (you asked me to scope this case study to the editor rewrite specifically), but the next 1-2 releases on the roadmap and what they're aimed at. Keep it concrete.]

Working on AI in commerce, or thinking about how to scope a product rewrite without breaking what already works? I'd genuinely like to talk.

→ rafaehlers@gmail.com · LinkedIn · GitHub gists