Prioritizing Product Ideas Using ICE Confidence Scoring
This skill teaches you to evaluate and rank competing product ideas by scoring each on Impact, Confidence, and Ease, producing a prioritized list that tells your team which hypotheses to test first.
List each product idea and score it from 1 to 10 on three dimensions: Impact (how much it moves a target metric), Confidence (how much evidence supports your assumptions), and Ease (how quickly and cheaply you can test or ship it). Multiply the three scores together, then rank ideas by their composite ICE score. Prioritize high-scoring ideas first, treating the ranking as a starting point for discussion rather than a rigid queue.
Outcome: You produce a rank-ordered list of product ideas with transparent, defensible scores that the team can use to decide which hypotheses to test next, replacing gut-feel debates with structured reasoning.
Prerequisites
- A populated idea bank with at least 5-10 candidate product ideas
- Defined product goals with measurable key results (so you can assess Impact against something concrete)
- Basic familiarity with the GIST Planning Framework layers (Goals, Ideas, Step-projects, Tasks)
- Access to any available data: analytics dashboards, user research summaries, or competitive analysis
Overview
ICE scoring is a lightweight prioritization technique that converts a messy list of product ideas into a rank-ordered backlog. Each idea gets a score from 1 to 10 on three dimensions: Impact, Confidence, and Ease. The three scores are multiplied together to produce a composite number, and ideas are sorted from highest to lowest. The output is a prioritized list your team can act on immediately when deciding which experiments to run or which features to build next. This skill sits squarely in the Ideas layer of the GIST Planning Framework, bridging the gap between a broad idea bank and the focused step-projects that actually test those ideas.
The specific problem ICE scoring solves is decision paralysis. Product teams routinely accumulate dozens or hundreds of ideas from customer feedback, stakeholder requests, competitive research, and internal brainstorms. Without a consistent evaluation method, teams default to whoever argues loudest or whichever idea the most senior person likes. ICE scoring replaces that dynamic with a shared vocabulary and a repeatable process. It forces every participant to make their assumptions explicit, especially through the Confidence dimension, which directly penalizes ideas built on wishful thinking. The artifact you produce is a scored spreadsheet or table where every idea has a visible rationale, making it easy to revisit priorities when new evidence arrives.
ICE scoring is also one of the most frequently referenced frameworks in product manager interview questions about prioritization. Interviewers use it to test whether candidates can structure ambiguous trade-offs, articulate assumptions, and defend a ranking under pushback. Mastering ICE scoring does not just help you ship better products. It gives you a portable mental model for explaining your decision-making process clearly, which matters in interviews, stakeholder reviews, and cross-functional planning sessions alike.
The technique is deliberately simple. You can run your first ICE scoring session in under an hour with nothing more than a spreadsheet and your team's current knowledge. The simplicity is a feature: it keeps the cost of prioritization low so you can re-score frequently as you learn. More elaborate frameworks like RICE or weighted scoring matrices exist, but ICE's speed makes it the right default for teams operating in high-uncertainty environments where priorities shift quarterly or faster.
How It Works
ICE scoring works by decomposing the question "should we work on this idea?" into three independent judgments, then combining them multiplicatively so that weakness on any single dimension drags the total score down.
Impact measures how much an idea would move a target metric if it succeeds. The metric should come directly from your product goals. If your goal is "increase trial-to-paid conversion from 8% to 12%," then a high-Impact idea is one that, if it works, would meaningfully close that gap. A score of 10 means the idea could single-handedly achieve or nearly achieve the goal. A score of 1 means the idea would produce a barely detectable change even in the best case. The key mental model is to evaluate Impact assuming the idea works exactly as hoped. You are not discounting for risk here. That is what Confidence is for.
Confidence measures how much evidence you have that your Impact and Ease estimates are correct. This is the dimension most teams underweight, and it is the dimension that makes ICE scoring genuinely useful rather than just another opinion-averaging exercise. A Confidence score of 10 means you have strong, direct evidence: you ran a prototype test, you have analogous data from a comparable feature launch, or multiple independent user research studies point to the same conclusion. A score of 1 means you are guessing based on intuition alone with no supporting data. The reason Confidence is scored separately rather than baked into Impact is that it forces the team to distinguish between "this would be huge if true" and "we actually know this would be huge." Ideas with high Impact but low Confidence are not bad ideas. They are ideas that need a cheap validation experiment before you commit significant resources.
Ease measures how quickly and cheaply you can deliver a testable version of the idea. Note that Ease is about the smallest meaningful test, not the full production-quality implementation. If you can validate an idea with a two-week prototype, Ease reflects the cost of that prototype, not the cost of scaling it to every user. A score of 10 means you could run a test within days using existing infrastructure and a small team. A score of 1 means the idea requires months of engineering work, new infrastructure, regulatory approvals, or dependencies on external partners before you can learn anything.
The multiplication formula (I × C × E) is important because it creates a natural penalty for any dimension that scores very low. An idea with Impact 10, Confidence 2, and Ease 9 scores 180, while an idea with Impact 7, Confidence 7, and Ease 7 scores 343. The second idea is less exciting on paper but far more likely to produce real value. This multiplicative property steers teams away from moonshots with no evidence and toward solid bets with strong supporting data, which aligns with the GIST Planning Framework's emphasis on iterative learning over big-bang launches.
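The arithmetic in the paragraph above can be sketched in a few lines of Python. This is only an illustration: the function name `ice_score` and the range check are assumptions made for the example, not part of any standard ICE tooling.

```python
def ice_score(impact: int, confidence: int, ease: int) -> int:
    """Composite ICE score: the product of three 1-10 ratings."""
    ratings = (("impact", impact), ("confidence", confidence), ("ease", ease))
    for name, value in ratings:
        if not 1 <= value <= 10:
            raise ValueError(f"{name} must be between 1 and 10, got {value}")
    return impact * confidence * ease


# The two ideas compared in the text.
moonshot = ice_score(impact=10, confidence=2, ease=9)
solid_bet = ice_score(impact=7, confidence=7, ease=7)
print(moonshot, solid_bet)  # 180 343
```

Because the scores are multiplied rather than added, the single low Confidence rating cuts the first idea's total to roughly half of the second's, even though it leads on Impact and Ease.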
One common misconception is that the ICE score is a decision. It is not. It is a conversation starter. The ranked list surfaces which ideas deserve discussion and which can be safely deprioritized. Two ideas with similar scores may differ in strategic alignment, team morale impact, or dependencies that the three dimensions do not capture. The score gets you to that conversation faster by eliminating the obvious low-priority items and highlighting the genuine contenders.
Step-by-Step
Step 1: Assemble Your Idea List and Scoring Team
Pull your current idea bank into a single list. If you have been using the idea banking skill, export the active ideas that have not yet been prioritized or that need re-scoring due to new information. Each idea should have a one-sentence description and a clear hypothesis ("We believe [action] will cause [outcome] for [audience]"). Recruit 3-5 people to participate in scoring.
Include at least one person from engineering (for realistic Ease estimates), one from product or design (for Impact framing), and one from a customer-facing role like sales or support (for grounding in real user needs). Avoid groups larger than 7, which slow the process without improving accuracy. Share the idea list and the product goals 24 hours before the session so participants arrive with informed opinions rather than cold reactions.
Tip: Remove ideas that are clearly out of scope before the session. If an idea requires a technology your team cannot access or contradicts a confirmed strategic constraint, pre-filter it. Scoring obviously infeasible ideas wastes the group's attention budget.
Step 2: Align on Your Impact Metric
Before scoring anything, confirm which metric or key result Impact will be measured against. Pull this directly from your product goals. If your goal has multiple key results, choose the primary one for this scoring session, or run separate scoring rounds per key result if the idea list maps to different goals. Write the chosen metric at the top of the scoring sheet so every participant references the same target. This step prevents the most common failure mode in ICE scoring: different people evaluating Impact against different unstated criteria. Without alignment, one person scores an idea high because it improves NPS while another scores the same idea low because it does not affect revenue, and the resulting average is meaningless.
Tip: If your team cannot agree on which metric to use, that is a signal you need to revisit your goal-setting process before prioritizing ideas. See the [defining measurable product goals](/skills/defining-measurable-product-goals) skill.
Step 3: Score Impact Independently (1-10)
Have each participant score every idea's Impact independently before any group discussion. Use a simple spreadsheet with ideas in rows and participant names in columns. The instruction is: "Assuming this idea works exactly as described, how much would it move our target metric? 10 = achieves or nearly achieves the goal on its own. 5 = makes a meaningful but partial contribution." Independent scoring matters because group discussion creates anchoring bias. The first person to speak sets the range, and everyone else adjusts from that anchor rather than forming their own judgment. After all scores are submitted, calculate the average and note the spread (difference between highest and lowest score).
Any idea with a spread of 4 or more deserves a brief discussion to surface the different assumptions driving the disagreement.
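The average-and-spread calculation described above could look like the following minimal sketch. The participant labels and scores are hypothetical, and `summarize` is an illustrative helper name; only the threshold of 4 comes from the text.

```python
from statistics import mean
from typing import NamedTuple

SPREAD_THRESHOLD = 4  # from the text: a spread of 4 or more deserves discussion


class DimensionSummary(NamedTuple):
    average: float
    spread: int
    needs_discussion: bool


def summarize(scores_by_participant: dict[str, int]) -> DimensionSummary:
    """Average and spread (highest minus lowest) for one idea on one dimension."""
    values = list(scores_by_participant.values())
    spread = max(values) - min(values)
    return DimensionSummary(mean(values), spread, spread >= SPREAD_THRESHOLD)


# Hypothetical Impact scores for a single idea from three participants.
impact_scores = {"engineer": 3, "designer": 7, "support": 8}
print(summarize(impact_scores))
```

Here the spread is 5, so the idea would be flagged for a short discussion before the average is trusted.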
Tip: A useful calibration question for Impact: "If we launched this and it worked perfectly, would we write a blog post about the result?" If the answer is no, the idea probably scores below a 5.
Step 4: Score Confidence Independently (1-10)
Now score Confidence. Provide a simple rubric to prevent Confidence from becoming another gut-feel dimension. A score of 8-10 means direct evidence: you ran a test, you have data from an analogous feature, or multiple user research studies converge. A score of 5-7 means indirect evidence: customer interviews suggest demand, competitors have validated the concept, or internal metrics show a related pattern.
A score of 1-4 means intuition only: you believe it would work but have no supporting data beyond personal experience or anecdote. Each participant writes a one-sentence justification for their Confidence score ("Confidence 3: no user research on this, based on my hunch from support tickets"). These justifications are more valuable than the numbers themselves because they surface what the team knows versus what it assumes. Again, score independently first, then discuss items with high spread.
Tip: Teams almost always over-score Confidence. Before the session, remind participants that a Confidence score of 8+ should feel uncomfortable to give. If most of your ideas score 8+ on Confidence, the team is not being honest about uncertainty, and the entire ranking will be distorted.
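One way to make the rubric and the over-scoring check concrete is the sketch below. The tier boundaries come from the rubric above; reading "most of your ideas" as "more than half" is an assumption, as are the function names.

```python
def evidence_tier(confidence: int) -> str:
    """Map a 1-10 Confidence score to the rubric tier described in the text."""
    if not 1 <= confidence <= 10:
        raise ValueError(f"confidence must be between 1 and 10, got {confidence}")
    if confidence >= 8:
        return "direct evidence"
    if confidence >= 5:
        return "indirect evidence"
    return "intuition only"


def looks_inflated(confidence_scores: list[int]) -> bool:
    """True when more than half of the ideas score 8+ (assumed reading of 'most')."""
    high = sum(1 for score in confidence_scores if score >= 8)
    return high > len(confidence_scores) / 2


# Hypothetical Confidence column for five ideas.
column = [8, 9, 8, 5, 3]
print([evidence_tier(score) for score in column])
print(looks_inflated(column))
```

If `looks_inflated` returns true for a session, that is a prompt to revisit the justifications rather than a verdict that the scores are wrong.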
Step 5: Score Ease Independently (1-10)
Score Ease with this instruction: "How quickly and cheaply could we run a meaningful test of this idea? 10 = a few days with existing tools and a single person. 5 = a few weeks with a small team and some new work." Emphasize that Ease refers to the smallest experiment that would generate useful signal, not the full production implementation.
An idea might require six months to build at scale but could be validated in two weeks with a Wizard-of-Oz prototype or a concierge MVP. Ease should reflect the cost of that validation step. Engineering participants are especially important here because they can identify hidden technical dependencies ("this requires migrating to a new auth system first") that non-technical team members would miss. As with the other dimensions, score independently, then discuss high-spread items.
Tip: If an idea scores low on Ease, ask: "Is there a cheaper version of this test?" Often a manual process, a fake door test, or a landing page experiment can validate demand at a fraction of the cost. If a cheaper test exists, re-score Ease against that cheaper test, not the full build.
Step 6: Calculate Composite Scores and Initial Ranking
For each idea, calculate the composite ICE score by multiplying the averaged Impact, Confidence, and Ease scores. Sort the list from highest to lowest composite score. At this point you have a raw ranking. Before treating it as final, scan for two patterns.
First, look for ideas where the composite score is high but Confidence is below 4. These ideas are speculative bets that the math makes look attractive only because Impact and Ease are high. Flag them for validation experiments rather than direct investment. Second, look for ideas with moderate composite scores but Confidence above 8.
These are safe, evidence-backed bets that might deserve a higher effective priority than their raw score suggests because they carry much less execution risk. Record the full scoring breakdown (not just the composite) in your prioritized list so the reasoning is transparent to anyone who reviews it later.
Tip: Add a column for "Next action" next to the composite score. For high-ICE ideas, the next action is "design step-project." For high-Impact, low-Confidence ideas, the next action is "run validation experiment." For low-ICE ideas, the next action is "park in idea bank for re-evaluation next quarter."
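The ranking and "Next action" column from this step can be sketched as follows. The ideas and their scores are hypothetical, and the numeric cut-offs `HIGH_ICE` and `HIGH_IMPACT` are assumptions chosen for the example, since the text does not define "high" numerically; only the Confidence-below-4 flag comes from the step itself.

```python
from dataclasses import dataclass


@dataclass
class ScoredIdea:
    name: str
    impact: float      # averaged across participants
    confidence: float  # averaged across participants
    ease: float        # averaged across participants

    @property
    def ice(self) -> float:
        return self.impact * self.confidence * self.ease


HIGH_ICE = 200      # assumed cut-off, not from the text
HIGH_IMPACT = 7     # assumed cut-off, not from the text
LOW_CONFIDENCE = 4  # from the text: Confidence below 4 marks a speculative bet


def next_action(idea: ScoredIdea) -> str:
    """Pick the 'Next action' label; the speculative-bet check runs first."""
    if idea.impact >= HIGH_IMPACT and idea.confidence < LOW_CONFIDENCE:
        return "run validation experiment"
    if idea.ice >= HIGH_ICE:
        return "design step-project"
    return "park in idea bank for re-evaluation next quarter"


def rank(ideas: list[ScoredIdea]) -> list[ScoredIdea]:
    """Sort ideas from highest to lowest composite ICE score."""
    return sorted(ideas, key=lambda idea: idea.ice, reverse=True)


# Hypothetical averaged scores, for illustration only.
ideas = [
    ScoredIdea("Idea X", impact=9, confidence=3, ease=9),
    ScoredIdea("Idea Y", impact=6, confidence=7, ease=8),
    ScoredIdea("Idea Z", impact=4, confidence=5, ease=4),
]
for idea in rank(ideas):
    print(f"{idea.name}: ICE={idea.ice:.0f} "
          f"(I={idea.impact}, C={idea.confidence}, E={idea.ease}) -> {next_action(idea)}")
```

Checking the low-Confidence condition before the composite cut-off is a deliberate choice in this sketch: it keeps a speculative idea with an attractive raw score from being routed straight to a step-project.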
Step 7: Discuss and Adjust the Top 5
Take the top 5 ranked ideas and discuss them as a group for 3-5 minutes each. The goal is not to re-score but to pressure-test the ranking against factors ICE does not capture. Ask: Does this idea align with our current strategic bets, or does it pull the team in a new direction? Does it create dependencies that would block other high-priority work?
Does it have timing constraints (a seasonal window, a partner launch, a regulatory deadline) that affect when we should execute? Does it require skills or resources we do not currently have? If the discussion reveals that a top-5 idea has a critical constraint that the ICE score missed, move it down or add a qualifying note. If an idea ranked just outside the top 5 has a strategic tailwind the score missed, move it up.
Document the rationale for any manual adjustments so the team can distinguish data-driven ranking from judgment-based overrides.
Tip: Time-box the discussion strictly. Without a time limit, teams will debate the top idea for 30 minutes and rush through the rest. Use a visible timer and appoint someone to enforce it.
Step 8: Convert Top Ideas into Step-Projects
For the top 2-4 ideas coming out of the scoring session, immediately draft a step-project outline. A step-project is a small, time-boxed experiment designed to test the idea's core hypothesis. Define the hypothesis, the success metric, the timeline (typically 1-4 weeks), and the resources needed. This conversion step is critical because ICE scoring only has value if it leads to action.
A prioritized list that sits in a spreadsheet without triggering experiments is organizational theater. By ending the scoring session with concrete step-project drafts, you create momentum and accountability. See the designing step-projects skill for detailed guidance on structuring these experiments.
Tip: Assign a single owner to each step-project before leaving the room. Shared ownership means no ownership. The owner does not need to do all the work, but they are responsible for making sure the experiment runs and results are reported.
Step 9: Schedule the Re-Scoring Cadence
ICE scores are not permanent. They represent the team's best judgment at a specific point in time, and that judgment should update as you learn. Before closing the session, schedule the next re-scoring session. For most teams, quarterly re-scoring works well and aligns with the Ideas layer cadence in the GIST Planning Framework.
Between sessions, update Confidence scores whenever new evidence arrives: a completed step-project, new user research, a competitor launch, or a significant shift in the target metric. Add a column to your scoring sheet for "Last scored" dates so you can see at a glance which ideas are based on fresh assessments and which are stale. Ideas that have not been re-scored in over 90 days should be flagged for review at the next session.
Tip: Keep a running log of evidence that arrives between sessions. When a step-project completes, note which ideas' Confidence scores should increase or decrease. This evidence log makes re-scoring faster because participants do not have to reconstruct what they learned since the last session.
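The "Last scored" check from this step is simple date arithmetic. A minimal sketch, assuming the dates are kept per idea; the idea names, the dates, and the helper name `is_stale` are all hypothetical.

```python
from datetime import date, timedelta

STALE_AFTER = timedelta(days=90)  # from the text: flag ideas not re-scored in over 90 days


def is_stale(last_scored: date, today: date) -> bool:
    """True when more than 90 days have passed since the idea was last scored."""
    return today - last_scored > STALE_AFTER


# Hypothetical "Last scored" column.
last_scored = {
    "Idea A": date(2024, 1, 10),
    "Idea B": date(2024, 4, 2),
}
today = date(2024, 5, 1)
stale = [name for name, scored_on in last_scored.items() if is_stale(scored_on, today)]
print(stale)  # ['Idea A']
```

Passing `today` in explicitly, rather than calling `date.today()` inside the function, keeps the check easy to test and to run against a past session date.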
Examples
Example: Early-Stage B2B SaaS with 5 Ideas and a 3-Person Team
A 3-person startup building a project management tool for freelancers has accumulated 5 ideas from customer interviews. Their primary goal is to increase free-trial-to-paid conversion from 4% to 8% within 6 months. The team has limited engineering bandwidth (one full-stack developer) and no dedicated data analyst. They have 12 customer interview transcripts and basic analytics from Mixpanel.
The team runs a 45-minute ICE session. They align on "30-day trial-to-paid conversion rate" as the Impact metric. Idea A (add a client invoicing feature) scores Impact 8, Confidence 4, Ease 3 (ICE = 96). The Impact is high because 7 of 12 interviewees mentioned invoicing, but Confidence is only 4 because none of those interviewees were asked whether invoicing would make them pay, and Ease is low because invoicing requires payment integration.
Idea B (improve the onboarding checklist) scores Impact 6, Confidence 7, Ease 9 (ICE = 378). Confidence is higher because the team has Mixpanel data showing 60% of users who complete onboarding convert, versus 8% who do not, and Ease is high because it is a UI change to existing screens. Idea B ranks first despite lower Impact because the evidence is strong and the test is cheap. The team designs a two-week step-project to A/B test a redesigned onboarding flow, with a clear success criterion of 15% improvement in checklist completion.
Example: Mid-Stage B2C Mobile App Scoring 15 Ideas Across Engineering and Growth
A fitness app with 200,000 monthly active users and a 12-person team needs to prioritize 15 ideas spanning new features, performance improvements, and growth experiments. The primary goal is to increase 7-day retention from 35% to 45%. They have a robust analytics stack, a user research team that runs monthly studies, and historical data from 6 previous A/B tests on onboarding and engagement features.
The product manager assembles a scoring group of 5: herself, a senior engineer, a data analyst, a designer, and a growth marketer. She shares the idea list and goal metric 24 hours in advance with links to relevant analytics dashboards and past experiment results. During the 90-minute session, the data analyst brings Confidence evidence that significantly reshapes the ranking. Idea #3 (gamified workout streaks) had been assumed high-Impact, but the analyst shows that a previous experiment with daily reminders, a conceptually similar engagement mechanic, produced only a 2% retention lift.
The team scores its Confidence at 5 instead of the 8 the PM had initially estimated, dropping it from the top 3 to position 7. Idea #9 (personalized rest-day recommendations) surfaces a high-spread situation: the engineer scores Ease at 3 (citing ML model complexity) while the growth marketer scores it at 8 (thinking of a rules-based heuristic). Discussion reveals that a simple rules-based version ("if you worked out 3 days in a row, suggest rest") could be tested in one week without ML. Ease is re-scored at 7.
The final top 3 are all ideas with Confidence scores of 6 or higher and Ease scores of 7 or higher, meaning the team can run three parallel step-projects in the next sprint cycle.
Example: Enterprise Platform Team Re-Scoring After a Failed Experiment
An enterprise analytics platform team scored 8 ideas last quarter. Their top-ranked idea was a self-serve dashboard builder (ICE = 504). They ran a 4-week step-project offering a prototype to 20 beta users. Only 3 users engaged meaningfully, and none reported it as a must-have feature. The team needs to re-score and reprioritize for the next quarter.
The PM updates the scoring sheet before the re-scoring session. The self-serve dashboard builder's Confidence drops from 7 to 2 based on the beta results, collapsing its ICE score from 504 (I:9, C:7, E:8) to 144 (I:9, C:2, E:8). The justification reads: "Beta test showed low engagement. 3 of 20 users engaged. Zero reported must-have." Meanwhile, Idea #4 (automated anomaly alerts) had its Confidence increase from 4 to 7. During the beta period, the support team logged 23 tickets from churned customers citing "I did not realize the data had changed until it was too late" as a key frustration. This is indirect but strong evidence. Its ICE score rises from 192 (I:8, C:4, E:6) to 336 (I:8, C:7, E:6). The re-scored ranking is materially different from last quarter's. The team designs a step-project for anomaly alerts: a 3-week build of email-based alerts for the top 5 anomaly types, with success measured by alert open rates and a post-experiment churn cohort comparison.
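The re-scoring in this example changes only the Confidence value for each idea. The sketch below reproduces that arithmetic with the scores given above; the `Idea` class is an illustrative structure, not something the team in the example is said to use.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Idea:
    name: str
    impact: int
    confidence: int
    ease: int

    @property
    def ice(self) -> int:
        return self.impact * self.confidence * self.ease


# Last quarter's scores, taken from the example.
dashboard = Idea("Self-serve dashboard builder", impact=9, confidence=7, ease=8)
alerts = Idea("Automated anomaly alerts", impact=8, confidence=4, ease=6)

# New evidence changes Confidence only; Impact and Ease stay as scored.
dashboard_after = replace(dashboard, confidence=2)
alerts_after = replace(alerts, confidence=7)

print(dashboard.ice, "->", dashboard_after.ice)  # 504 -> 144
print(alerts.ice, "->", alerts_after.ice)        # 192 -> 336
```

Keeping the earlier scores alongside the new ones, as the frozen dataclass does here, preserves the before-and-after record that makes the change in ranking easy to explain.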
Example: Answering a Product Manager Interview Question About ICE Scoring
A product manager candidate is asked in an interview: "You have three feature ideas and limited engineering resources for the next quarter. How would you decide what to build?" The candidate needs to demonstrate structured thinking, honest treatment of uncertainty, and a connection to business outcomes. This is one of the most common product manager interview questions about prioritization.
The candidate structures their answer using ICE. They say: "I would start by confirming the team's primary metric for the quarter so I have a consistent measure of Impact. Then I would score each idea on Impact, Confidence, and Ease on a 1-10 scale. Impact is how much each idea moves that metric if it works. Confidence is how much evidence I have, not how excited I am." They give a concrete example: "Imagine the goal is reducing support ticket volume by 30%. Idea A is an AI chatbot (Impact 8, Confidence 3, Ease 2, ICE = 48). Idea B is better error messages on the top 10 error pages (Impact 5, Confidence 8, Ease 9, ICE = 360). Idea C is a knowledge base redesign (Impact 6, Confidence 5, Ease 5, ICE = 150). I would start with Idea B because it has the strongest evidence and the fastest feedback loop." The candidate then notes: "ICE gives me a starting ranking, but I would bring it to the team for discussion." This answer demonstrates the scoring mechanics, honest treatment of Confidence, and awareness that the score is a tool, not a decision.
Best Practices
Score each dimension independently in writing before any group discussion. Anchoring bias is the single largest threat to useful ICE scores. When the first person says "I think Impact is an 8," everyone else unconsciously adjusts toward that number. Silent, independent scoring surfaces genuine disagreement and produces more accurate averages.
If you skip this step, your ICE ranking will reflect the opinion of whoever speaks first, not the collective judgment of the team.
Always score Confidence against an explicit evidence rubric, not gut feel. Define what each range means before scoring begins: 8-10 requires direct test data or strong analogous evidence, 5-7 requires indirect evidence like customer interviews or competitor validation, and 1-4 means intuition only. Without this rubric, Confidence becomes a proxy for enthusiasm, and enthusiastic teams will rate everything 7+ regardless of actual evidence.
Use Ease to score the smallest meaningful experiment, not the full production implementation. Teams frequently score Ease against the cost of building the complete feature at scale, which makes most ideas look expensive and compresses the Ease range into 2-5. This defeats the purpose of the dimension. A concierge test, a fake-door experiment, or a manual workaround almost always exists and is the right unit of analysis for Ease.
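A quick way to spot a compressed Ease column is to compare its highest and lowest scores. In this sketch the default tolerance of 3 mirrors the 2-5 band mentioned above, but treating that as the cut-off is an assumption, and both score columns are hypothetical.

```python
def is_compressed(scores: list[float], max_range: float = 3) -> bool:
    """True when the highest and lowest score differ by max_range or less."""
    return max(scores) - min(scores) <= max_range


# Hypothetical Ease columns for the same six ideas.
scored_against_full_build = [3, 4, 5, 4, 3, 5]
scored_against_smallest_test = [3, 8, 9, 5, 2, 7]

print(is_compressed(scored_against_full_build))     # True
print(is_compressed(scored_against_smallest_test))  # False
```

A compressed column is a signal to ask what the smallest meaningful experiment would be for each idea, not proof that the scores are wrong.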
Re-score after every significant learning event, not just on a fixed schedule. When a step-project completes and delivers results, the Confidence scores for related ideas should change immediately. An idea that scored Confidence 3 before a test might jump to 8 after a successful prototype, or drop to 1 after a failed experiment. Waiting for the next quarterly session to update scores means your ranking is stale when you need it most.
Document the one-sentence justification for each score, not just the number. The numbers enable sorting. The justifications enable learning. When you re-score in three months, the justifications tell you what evidence existed at the time and what has changed.
They also make your scores defensible in stakeholder conversations and in product manager interviews, where interviewers often probe the reasoning behind a score rather than the score itself.
Treat the ranked list as a conversation starter, not a decision. ICE scoring eliminates the bottom of the list (ideas not worth discussing) and surfaces the top (ideas worth serious consideration). It does not account for strategic alignment, team morale, technical debt implications, or timing constraints. The top-5 discussion in Step 7 exists precisely to layer in these qualitative factors.
Teams that treat ICE scores as final rankings skip the hardest and most valuable part of prioritization.
Keep the scoring session to 90 minutes or less. Prioritization fatigue is real. After 90 minutes, participants start defaulting to 5s and 7s on everything, which compresses the scoring range and makes the ranking useless. If you have more than 20 ideas to score, split them across two sessions or pre-filter the obvious low-priority items before the meeting.
Common Mistakes
Scoring Impact based on effort invested rather than outcome produced
Correction
Impact measures the expected change in your target metric if the idea succeeds, not how hard the team will work on it. Teams frequently conflate large projects with high impact because it feels like more work should produce more value. A two-month project that moves retention by 0.5% has lower Impact than a two-day experiment that moves retention by 3%. To catch this mistake, ask: "If a magic wand implemented this overnight with zero effort, how much would the metric move?" If the answer is "not much," the idea has low Impact regardless of its implementation cost.
Inflating Confidence because the idea "makes sense" or the team is excited about it
Correction
Confidence should reflect the quality and quantity of evidence, not the team's conviction. An idea can be logically sound and still have Confidence of 2 because nobody has tested the assumptions with real users. When you suspect a score rests on conviction alone, ask the scorer to name a specific piece of evidence (a data point, a user quote, a test result) that supports it. If they cannot, the score should drop to 3-4.
This mistake matters because inflated Confidence makes speculative ideas look like safe bets, leading the team to skip validation experiments they actually need.
Scoring Ease against the full production build instead of the minimum viable experiment
Correction
Ease is about how quickly you can learn, not how quickly you can ship the final product. Teams with engineering backgrounds tend to mentally spec the complete feature, estimate the build time, and translate that into an Ease score. This makes nearly everything score between 2 and 5, eliminating Ease as a differentiating factor. To catch this, look for an Ease column where the range is compressed (all scores between 3 and 6). Then re-score Ease against the smallest experiment that would generate useful signal. A landing page test, a manual concierge process, or a simple A/B test on copy can validate demand at a fraction of the cost of building the feature.
Averaging scores across the group without discussing high-spread items
Correction
When one person scores Impact as 9 and another scores it as 3, the average of 6 is meaningless. The spread signals that the two people are operating on different assumptions. Maybe one person knows about a customer segment the other has not considered, or one person has data the other lacks. High-spread items (a gap of 4+ between the highest and lowest score) are the most valuable moments in an ICE session because they surface hidden information.
Skip the discussion and you get a false consensus that masks real disagreement. Budget 2-3 minutes per high-spread item to identify the root of the disagreement, share the missing information, and either converge or agree on a range.
Running the ICE session once and treating the output as a permanent roadmap
Correction
ICE scores have a shelf life. They reflect what the team knows at a specific moment, and every completed experiment, new user research insight, or market shift changes the underlying assumptions. Teams that score ideas once and then execute the list in order for six months are not using ICE scoring. They are using a waterfall roadmap with extra steps.
The fix is to build re-scoring into your regular cadence. In the GIST Planning Framework, the Ideas layer operates on a quarterly cadence. Re-score at least that often, and update individual Confidence scores whenever a step-project delivers results. If your top-ranked idea from three months ago has not been validated by any experiment, its Confidence score should decrease, not stay the same.
Using ICE scoring to compare ideas across completely different goals
Correction
Impact is relative to a specific metric. An idea that scores Impact 9 against a retention goal and another idea that scores Impact 9 against a revenue goal are not comparable because they are measuring different things. When teams mix goals in a single ICE session, the resulting ranking conflates different strategic priorities into a single list, which leads to incoherent prioritization. Run separate scoring sessions for each goal, or at minimum, group ideas by goal and rank within each group.
Then use a strategic discussion (not the ICE math) to decide how to allocate resources across goals.
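Grouping by goal before ranking can be sketched as below. The goals, idea names, and ICE values are hypothetical, and `rank_within_goals` is an illustrative helper rather than an established tool.

```python
from collections import defaultdict
from typing import NamedTuple


class GoalIdea(NamedTuple):
    goal: str
    name: str
    ice: int


def rank_within_goals(ideas: list[GoalIdea]) -> dict[str, list[GoalIdea]]:
    """Group ideas by goal, then sort each group by ICE score, highest first."""
    grouped: dict[str, list[GoalIdea]] = defaultdict(list)
    for idea in ideas:
        grouped[idea.goal].append(idea)
    return {
        goal: sorted(group, key=lambda idea: idea.ice, reverse=True)
        for goal, group in grouped.items()
    }


# Hypothetical ideas and scores, for illustration only.
ideas = [
    GoalIdea("retention", "Idea A", 210),
    GoalIdea("revenue", "Idea B", 336),
    GoalIdea("retention", "Idea C", 378),
    GoalIdea("revenue", "Idea D", 96),
]
for goal, group in rank_within_goals(ideas).items():
    print(goal, [(idea.name, idea.ice) for idea in group])
```

The function deliberately produces one list per goal and never merges them, which leaves the cross-goal allocation to the strategic discussion rather than to the ICE math.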
Other Skills in This Method
Designing Step-Projects to Validate Product Ideas
How to break ideas into small, time-boxed experiments (step-projects) of no more than 10 weeks that test assumptions and build evidence iteratively.
Defining Measurable Product Goals in GIST
How to set strategic, outcome-based goals using metrics and timeframes that align the entire GIST hierarchy and replace vague roadmap themes.
Breaking Step-Projects into Actionable Daily Tasks
How to decompose validated step-projects into granular, developer-ready tasks using agile tools like Kanban boards or sprint backlogs.
Presenting GIST Plans in Stakeholder and Interview Settings
How to communicate the GIST planning hierarchy to executives, cross-functional teams, and in product manager interviews to demonstrate strategic thinking.
Replacing Traditional Product Roadmaps with GIST Planning
How to transition a product team from feature-based roadmaps to the GIST framework while maintaining stakeholder alignment and executive buy-in.
Managing Different Planning Cadences Across GIST Layers
How to operate goals on quarterly/annual cycles, ideas continuously, step-projects in short sprints, and tasks daily to maintain agile responsiveness.
Building and Managing an Idea Bank for Product Development
How to continuously collect, document, and organize hypothetical solution ideas that map to strategic goals using an always-open idea bank.
Frequently Asked Questions
How do I handle ICE scoring when different team members have wildly different scores for the same idea?
High spread (a gap of 4+ between the highest and lowest score on any dimension) is a feature, not a bug. It means team members are operating on different assumptions or have access to different information. Pause and discuss the specific item for 2-3 minutes. Ask each outlier scorer to share the evidence or reasoning behind their number. Often, one person has data the others lack, such as a support ticket pattern or a customer interview quote. Share that information, then let each person revise their score independently. If the spread remains after discussion, use the average but note the disagreement in your scoring log so you know this idea needs more evidence before committing resources.
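A spreadsheet formula can do this check, but a minimal Python sketch makes the rule explicit. The function name and the sample scores are hypothetical; the threshold of 4 comes from the guideline above.

```python
from statistics import mean

SPREAD_THRESHOLD = 4  # gap between highest and lowest score that triggers discussion


def summarize_scores(scores_by_dimension):
    """Return average, spread, and a discuss flag for each dimension of one idea."""
    summary = {}
    for dimension, scores in scores_by_dimension.items():
        spread = max(scores) - min(scores)
        summary[dimension] = {
            "average": mean(scores),
            "spread": spread,
            "discuss": spread >= SPREAD_THRESHOLD,
        }
    return summary


# Five hypothetical scorers rating one idea.
scores = {
    "impact": [7, 8, 7, 8, 7],
    "confidence": [2, 6, 3, 7, 2],
    "ease": [5, 5, 6, 5, 4],
}
for dimension, stats in summarize_scores(scores).items():
    print(dimension, stats)
```

In this sample only Confidence is flagged (spread of 5), so the team would discuss that dimension and leave Impact and Ease at their averages.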
Should I use ICE scoring before or after generating step-projects for my ideas?
Score ideas with ICE first, then design step-projects only for the top-ranked ideas. The purpose of ICE scoring is to narrow a broad idea list down to the 2-4 ideas worth investing in. Designing step-projects for every idea in the bank wastes effort on ideas that will never be tested. After ICE scoring, use the [designing step-projects](/skills/designing-step-projects-as-experiments) skill to create focused experiments for your top picks. However, if scoring reveals that a top idea has very low Confidence, the appropriate step-project is a validation experiment, not a feature build.
How long should an ICE scoring session take for a team of 5 people with 15 ideas?
Plan for 75-90 minutes. The segments below add up to 65-85 minutes, so the rest is slack for setup and transitions. Allow 5 minutes for aligning on the Impact metric and scoring rubric. Budget 30-40 minutes for independent scoring across all three dimensions (participants score silently in a shared spreadsheet). Reserve 20-25 minutes for discussing high-spread items, which typically account for about a third of the ideas. Use the final 10-15 minutes to review the ranked list, discuss the top 5, and assign next actions. If you go over 90 minutes, fatigue will compress scores toward the middle and reduce the ranking's usefulness.
Why does my ICE ranking keep changing every quarter even though the ideas are the same?
This is expected and healthy. ICE scores reflect the team's current evidence and assumptions, both of which change as you run experiments, talk to customers, and observe the market. An idea that scored Confidence 3 last quarter might score 7 this quarter because a step-project validated a key assumption. Conversely, an idea that scored Impact 9 might drop after the team hits its target metric and shifts to a new goal. Stable rankings would actually be a warning sign, because they would mean the team is not learning. If you want to track how scores evolve, keep a version history of your scoring sheet with dates so you can see the trajectory of each idea over time.
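A dated version history can be as simple as a list of snapshots. The sketch below assumes each snapshot maps an idea name to its (Impact, Confidence, Ease) scores; the dates, names, and numbers are invented for illustration.

```python
from datetime import date

# Hypothetical dated snapshots of one scoring sheet: idea name -> (I, C, E).
history = [
    (date(2024, 1, 15), {"Usage streaks": (9, 3, 4), "Win-back email": (6, 5, 8)}),
    (date(2024, 4, 15), {"Usage streaks": (9, 7, 4), "Win-back email": (5, 5, 8)}),
]


def trajectory(history, idea_name):
    """Return (date, ICE score) pairs for one idea, oldest first."""
    points = []
    for snapshot_date, sheet in sorted(history, key=lambda entry: entry[0]):
        if idea_name in sheet:
            impact, confidence, ease = sheet[idea_name]
            points.append((snapshot_date, impact * confidence * ease))
    return points


print(trajectory(history, "Usage streaks"))
```

Here "Usage streaks" climbs from 108 to 252 because its Confidence rose from 3 to 7, which is the kind of movement a validated step-project should produce.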
How is ICE scoring different from RICE scoring, and when should I use each?
RICE adds a Reach dimension (how many users or accounts the idea affects) and divides by Effort instead of multiplying by Ease, producing a score with a different scale and interpretation. RICE works well when you have reliable quantitative data on reach, such as the number of users in a specific segment or the percentage of customers affected by a problem. ICE is faster and works better in high-uncertainty environments where reach estimates would be guesses anyway. For most teams using the GIST Planning Framework, ICE is the better default because the Ideas layer emphasizes speed and iteration over precision. If you are a larger team with a mature analytics stack and well-defined user segments, consider switching to RICE for ideas past the initial validation stage.
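The two formulas can be compared side by side in a short sketch. The RICE inputs below follow the commonly described convention (Reach as a count per time period, Impact as a small multiplier, Confidence as a fraction, Effort in person-months); treat those units and the sample numbers as assumptions rather than a fixed standard.

```python
def ice_score(impact, confidence, ease):
    """ICE: multiply three 1-10 scores."""
    return impact * confidence * ease


def rice_score(reach, impact, confidence, effort):
    """RICE: (Reach x Impact x Confidence) / Effort."""
    if effort <= 0:
        raise ValueError("effort must be positive")
    return reach * impact * confidence / effort


print(ice_score(impact=7, confidence=6, ease=7))  # 294
print(rice_score(reach=2000, impact=2, confidence=0.8, effort=4))  # 800.0
```

Because the inputs use different scales, an ICE score and a RICE score for the same idea are not comparable; pick one method per scoring session.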
Can I use ICE scoring to prioritize tasks or bugs, or is it only for product ideas?
ICE scoring is designed for ideas, which are hypothetical solutions with uncertain outcomes. Tasks and bugs are typically not hypothetical. You know the bug exists and you know fixing it will improve the user experience. For tasks, a simple effort-versus-urgency matrix or a MoSCoW prioritization is usually more appropriate. For bugs, severity and frequency are better dimensions than Impact, Confidence, and Ease. Where ICE could apply to bugs is when you are deciding between investing in a systemic fix (rebuild the error handling system) versus a tactical fix (patch this one bug). The systemic fix is an idea with uncertain impact, and ICE can help evaluate whether the larger investment is justified.
How do I present ICE scoring results to stakeholders who were not in the session?
Lead with the ranked list and the goal it was scored against. Show the full breakdown (I, C, E, and composite) rather than just the final number, because stakeholders will want to understand why certain ideas ranked higher. Call out any manual adjustments you made after the scoring discussion and explain the strategic reasoning. If a stakeholder's pet idea ranked low, point to the specific dimension that dragged it down and suggest what evidence would need to change for the score to improve. For tips on presenting this in stakeholder reviews or interviews, see the [presenting GIST plans](/skills/presenting-gist-plans-to-stakeholders) skill.