Scrum Estimation: Estimating Work with Story Points and Planning Poker

This skill teaches you how to use relative estimation techniques—story points and planning poker—to size work items, forecast sprint capacity, and improve your Scrum team's predictability over time.

Scrum estimation with story points uses relative sizing to compare user stories against a reference baseline rather than estimating in hours. Teams play planning poker by privately selecting a Fibonacci-scale card (1, 2, 3, 5, 8, 13, 21) representing perceived effort, complexity, and uncertainty. Differences are discussed until consensus emerges. Over multiple sprints, velocity data transforms these estimates into reliable capacity forecasts.

Outcome: Your team will consistently size work using a shared estimation language, enabling accurate sprint planning and reliable delivery forecasts.

Synthesized from public framework references and reviewed for accuracy.

Product | Intermediate | 45-90 minutes

Prerequisites

  • Basic understanding of Scrum framework and sprints
  • Familiarity with user stories and acceptance criteria
  • A refined product backlog with well-defined backlog items

Overview

Scrum estimation is one of the most debated yet essential practices in agile software development. Rather than guessing how many hours a task will take (a notoriously inaccurate approach), story points let teams express the relative effort, complexity, and uncertainty of work items on a shared, team-specific scale. When combined with planning poker, a structured consensus-building game, teams surface hidden assumptions, share knowledge, and arrive at estimates the entire group owns.

The real power of scrum estimation isn't in any single estimate's accuracy. It's in the pattern that emerges over sprints. As your team tracks velocity—the total story points completed per sprint—you gain an empirical baseline that turns abstract estimates into concrete delivery forecasts. Product owners can answer stakeholder questions like "When will this feature ship?" with data rather than hope.

This skill is foundational to the broader Scrum framework. It feeds directly into planning and executing sprints, and it depends on a well-maintained backlog, which comes from regularly grooming and refining the product backlog. Master it, and the Scrum events that rely on sizing, sprint planning above all, become more effective.

How It Works

Story points are a unit of relative measure. Instead of asking "How many hours will this take?", you ask "How big is this compared to something we've already done?" A team establishes a reference story—a small, well-understood piece of work rated at, say, 3 points—and sizes everything else relative to it.

The Fibonacci sequence (1, 2, 3, 5, 8, 13, 21) is the most popular scale because the increasing gaps between numbers force teams to acknowledge growing uncertainty. You can confidently distinguish a 2-point story from a 3-point story, but at higher magnitudes, the difference between 19 and 21 units of effort is meaningless—so the scale doesn't offer that false precision.
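
The widening gaps are easy to see in a few lines of Python. This is a minimal sketch: `scale_gaps` and `snap_to_scale` are hypothetical helper names, and rounding ties upward is an assumption chosen to err on the cautious side, not a rule from any framework.

```python
# Story point scale quoted in the text above.
SCALE = [1, 2, 3, 5, 8, 13, 21]


def scale_gaps(scale=SCALE):
    """Return the distance between each pair of neighbouring scale values."""
    return [b - a for a, b in zip(scale, scale[1:])]


def snap_to_scale(raw, scale=SCALE):
    """Map a raw gut-feel size onto the nearest scale value.

    Assumption (not from the text): ties round up to the larger value.
    """
    return min(scale, key=lambda v: (abs(v - raw), -v))


print(scale_gaps())       # [1, 1, 2, 3, 5, 8]
print(snap_to_scale(19))  # 21
print(snap_to_scale(4))   # 5
```

A gut feel of 19 lands on 21 because the scale has nothing between 13 and 21, which is exactly the false precision it is designed to withhold.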

Planning poker is the mechanism that counters anchoring bias and groupthink during estimation. Each team member holds a deck of Fibonacci-numbered cards. After a story is presented and discussed, everyone simultaneously reveals their card. If estimates diverge significantly (e.g., one person plays a 3 while another plays a 13), the outliers explain their reasoning. This surfaces risks, misunderstandings, and hidden complexity that the team would otherwise miss. After discussion, the team re-votes until estimates converge.

Over time, the team's average velocity (story points completed per sprint, averaged over recent sprints) becomes a powerful planning tool. If your average velocity is 34 points and the remaining backlog totals 170 points, you can forecast roughly five sprints to completion (170 / 34 = 5), give or take the sprint-to-sprint variance in velocity.
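
The forecast arithmetic can be written as a small function. This is a sketch: `sprints_to_complete` is a hypothetical name, and rounding up to whole sprints is an assumption, since a partly used sprint still occupies the calendar.

```python
import math


def sprints_to_complete(remaining_points, average_velocity):
    """Estimate how many whole sprints the remaining backlog needs."""
    if average_velocity <= 0:
        raise ValueError("average_velocity must be positive")
    # Round up: a partially filled final sprint is still a sprint.
    return math.ceil(remaining_points / average_velocity)


print(sprints_to_complete(170, 34))  # 5
```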

Step-by-Step

  1. Step 1: Select and calibrate a reference story

    Before your first planning poker session, the team needs a shared baseline. Pull a recently completed user story that the whole team worked on and understands well. It should be small but not trivial—something that involved a bit of development, testing, and review. Assign it a baseline value, typically 3 or 5 points.

    Walk the team through why this story earns its point value: the development effort involved, the complexity of the logic, any uncertainty about requirements, and the testing burden. This calibration conversation is crucial because it aligns everyone's mental model of what a "3" or "5" actually means on your team.

    Tip: Choose a reference story from the same domain your team typically works in. A backend API story makes a poor baseline if 80% of your backlog is frontend work.

  2. Step 2: Prepare the backlog items for estimation

    Ensure each story coming into the estimation session has a clear title, description, and acceptance criteria. Stories that are vague or too large (epics masquerading as stories) will produce wildly divergent estimates and waste time.

    Work with the product owner to pre-filter the backlog. Only bring stories that are likely to be pulled into the next 1-2 sprints. This keeps sessions focused. If a story triggers too many questions during estimation, flag it for further backlog refinement rather than burning the room's energy trying to estimate something undefined.

    Tip: A good rule of thumb: if the product owner can't answer two clarifying questions about a story on the spot, it's not ready for estimation.

  3. Step 3: Run the planning poker session

    Gather the development team (everyone who will do the work). The product owner reads the story aloud and answers clarifying questions. Set a time box of 2-3 minutes for discussion before the first vote.

    Once discussion ends, everyone simultaneously reveals their card. If all estimates are within one step of each other on the Fibonacci scale (e.g., all 5s and 8s), take the majority value, or the higher number if the vote is evenly split, and move on. Speed matters: you'll refine your instincts over time.

    If estimates diverge by more than one step, ask the highest and lowest estimators to explain their reasoning. Often the high estimator has spotted a risk others missed, or the low estimator has a simpler implementation approach the team hadn't considered. After this discussion, vote again. Most stories converge within two rounds.

    Tip: Use a physical or digital timer. Without one, discussions on a single story can easily consume 15 minutes. Aim for 2-5 minutes per story including voting.
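
The decision made after each reveal can be sketched in Python. The function name `resolve_round` and the team member names are hypothetical. Taking the majority card and breaking ties toward the higher value is one reading of the rule above, not the only valid one.

```python
from collections import Counter

SCALE = [1, 2, 3, 5, 8, 13, 21]


def resolve_round(votes):
    """Decide what happens after one simultaneous reveal.

    votes maps a (hypothetical) team member name to the card they played.
    Returns ("agreed", points) when all cards sit within one step of each
    other, otherwise ("discuss", [lowest_voter, highest_voter]).
    """
    positions = {name: SCALE.index(card) for name, card in votes.items()}
    low = min(positions, key=positions.get)
    high = max(positions, key=positions.get)
    if positions[high] - positions[low] <= 1:
        # Within one step: take the majority card, or the higher one on a tie.
        counts = Counter(votes.values())
        best = max(counts, key=lambda card: (counts[card], card))
        return ("agreed", best)
    # More than one step apart: the two extremes explain their reasoning.
    return ("discuss", [low, high])


round_one = {"Ana": 5, "Ben": 8, "Chloe": 8, "Dev": 13, "Eli": 8}
round_two = {"Ana": 8, "Ben": 8, "Chloe": 8, "Dev": 13, "Eli": 8}
print(resolve_round(round_one))  # ('discuss', ['Ana', 'Dev'])
print(resolve_round(round_two))  # ('agreed', 8)
```

The function measures spread in scale steps, not raw point differences, so a 5 and an 8 count as neighbours even though they differ by three points.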

  4. Step 4: Handle outliers and edge cases

    Sometimes a story consistently gets extreme estimates—some team members play a 2 while others play a 21. This is a signal, not a problem. It typically means the story is poorly defined, team members have vastly different assumptions about scope, or there's hidden technical debt.

    When this happens, don't force consensus. Instead, document the disagreement, move the story back to refinement, and ask the product owner to clarify scope or break it into smaller stories. Artificially averaging divergent estimates destroys the very signal that makes planning poker valuable.

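    Tip: If someone plays the '?' card, treat it as a hard stop: that person doesn't have enough information to estimate, and the story needs more definition. In decks that include an infinity card, it usually signals a story too large to estimate, which likewise sends the story back for refinement or splitting.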
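
One way to make the "don't force consensus" rule explicit is a small triage function. This is a sketch under assumptions: `triage` is a hypothetical name, and the two-round cutoff is an illustrative default, not a limit prescribed by the text.

```python
SCALE = [1, 2, 3, 5, 8, 13, 21]
UNSURE = "?"  # the "not enough information" card


def triage(rounds, max_rounds=2):
    """Decide whether to keep voting or send a story back to refinement.

    rounds is a list of voting rounds, each a list of cards played.
    max_rounds is an assumed cutoff, not a rule from the text.
    """
    latest = rounds[-1]
    if UNSURE in latest:
        return "refine"  # hard stop: someone cannot estimate yet
    steps = [SCALE.index(card) for card in latest]
    spread = max(steps) - min(steps)
    if spread <= 1:
        return "converged"
    if len(rounds) >= max_rounds:
        return "refine"  # still divergent: clarify scope instead of averaging
    return "vote again"


print(triage([[2, 21, 5]]))              # vote again
print(triage([[2, 21, 5], [2, 21, 8]]))  # refine
print(triage([[5, "?", 8]]))             # refine
```

The function never averages the cards. A persistent 2-versus-21 split is returned as "refine" so that the disagreement survives as a signal.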

  5. Step 5: Record estimates and calculate initial velocity

    Log every estimate in your project management tool—whether that's Jira, Linear, or a simple spreadsheet. At the end of each sprint, record how many story points the team actually completed (not started, completed).

    For new teams, it takes 3-5 sprints to establish a reliable velocity baseline. During this calibration period, resist the urge to over-commit. Use the lowest sprint's completed points as your capacity estimate for the next sprint until you have enough data for a rolling average.

    Tip: Track velocity as a 3-sprint rolling average rather than a single-sprint snapshot. This smooths out anomalies from holidays, sick days, or unusually complex work.
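
The two capacity rules (lowest sprint during calibration, rolling average afterwards) can be combined in one helper. This is a sketch: `capacity_for_next_sprint` is a hypothetical name, and switching to the average as soon as three sprints exist is an assumption. A team may prefer to stay conservative for longer.

```python
def capacity_for_next_sprint(completed_points, window=3):
    """Suggest a capacity figure from completed points per sprint.

    completed_points lists finished (not merely started) points, oldest first.
    With fewer than `window` sprints of history, fall back to the lowest
    sprint so far; afterwards use the rolling average of the last `window`.
    """
    if not completed_points:
        raise ValueError("need at least one completed sprint")
    if len(completed_points) < window:
        return min(completed_points)
    recent = completed_points[-window:]
    return sum(recent) / len(recent)


print(capacity_for_next_sprint([38, 42]))                # 38
print(round(capacity_for_next_sprint([38, 42, 44]), 1))  # 41.3
```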

  6. Step 6: Use velocity for sprint planning and forecasting

    Once you have a stable velocity, sprint planning becomes much more predictable. During sprint planning, pull stories from the top of the backlog until you hit your velocity ceiling. If your rolling average is 34 points, plan for 30-34 points of work.

    For longer-range forecasting, divide the total remaining backlog points by your average velocity to estimate the number of sprints needed. Present this as a range (optimistic velocity vs. pessimistic velocity) rather than a single number to communicate uncertainty honestly to stakeholders.

    Tip: Never inflate velocity by counting incomplete stories. This creates a false signal that compounds over time and erodes trust in your forecasts.
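
Filling a sprint from the top of the backlog is a simple loop. In this sketch, `plan_sprint` and the story titles are hypothetical. Stopping at the first story that does not fit is one possible policy, and some teams instead skip ahead to a smaller story.

```python
def plan_sprint(backlog, velocity):
    """Pull stories from the top of the backlog until velocity is reached.

    backlog is an ordered list of (title, points) pairs, highest priority
    first. Stops at the first story that would exceed the ceiling.
    """
    planned, total = [], 0
    for title, points in backlog:
        if total + points > velocity:
            break
        planned.append(title)
        total += points
    return planned, total


backlog = [
    ("Guest checkout", 8),
    ("Saved carts", 5),
    ("Order history", 13),
    ("Promo codes", 5),
    ("Gift wrap option", 3),
    ("Address book", 8),
]
planned, total = plan_sprint(backlog, velocity=34)
print(total)    # 34
print(planned)  # first five stories; "Address book" waits for a later sprint
```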

  7. Step 7: Recalibrate periodically

    Every 6-8 sprints, or when team composition changes significantly, revisit your reference story and recalibrate. Teams naturally improve over time—what was once a 5-point story may now feel like a 3 because the team has built expertise in that area.

    Recalibration doesn't mean retroactively changing old estimates. It means acknowledging that your scale may have drifted and consciously resetting. Some teams do this as part of a retrospective, reviewing a handful of recently completed stories and asking: "Does our estimate for this still feel right compared to our baseline?"

Examples

Example: E-commerce team's first planning poker session

A newly formed Scrum team at an e-commerce company is preparing for their third sprint. They've never formally estimated before and have been pulling stories into sprints based on gut feel, often over-committing. The product owner has 12 refined stories ready for estimation.

The team picks a previously completed story—'Add product to wishlist'—as their reference, rating it a 3. It involved a new API endpoint, a simple database write, frontend button integration, and standard test coverage.

The first story up for estimation is 'Implement guest checkout flow.' After the product owner explains the acceptance criteria, the team votes: 5, 8, 8, 13, 8. The developer who voted 13 explains they're concerned about payment gateway edge cases with guest users. The one who voted 5 hadn't considered the email verification step. After a 2-minute discussion, the re-vote comes in at 8, 8, 8, 8, 13—they record it as an 8.

They power through 12 stories in 45 minutes, totaling 64 points. Since they don't have velocity data yet, they commit to 40 points for the sprint (a conservative approach). They complete 38 points. Over the next two sprints they complete 42 and 44. Their rolling average velocity settles around 41 points, and sprint planning becomes dramatically more predictable.

Example: Using velocity to forecast a product launch

A mobile app team has been running Scrum for 6 months with a stable velocity of 26 points per 2-week sprint (range: 22-30). The VP of Product asks: 'When can we launch the v2.0 feature set?' The remaining backlog for v2.0 totals 145 story points.

The Scrum Master calculates three scenarios: optimistic (30 points/sprint = 5 sprints = 10 weeks), average (26 points/sprint = ~6 sprints = 12 weeks), and pessimistic (22 points/sprint = ~7 sprints = 14 weeks). They present this as a range: 'We expect to complete v2.0 in 10-14 weeks, with 12 weeks being our most likely scenario.'

The VP appreciates the transparency and uses the pessimistic scenario for the external launch date while planning internal readiness around the average. This data-driven approach—rooted entirely in the team's scrum estimation practice—replaces the old method of asking each developer for hour estimates and adding a 20% buffer.
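
The three scenarios can be reproduced with a short script. This is a sketch: `forecast_range` is a hypothetical name, and rounding up to whole sprints is an assumption that matches the figures in the example.

```python
import math


def forecast_range(remaining_points, velocities, sprint_weeks=2):
    """Return {scenario: (sprints, weeks)} for each named velocity."""
    result = {}
    for scenario, velocity in velocities.items():
        sprints = math.ceil(remaining_points / velocity)
        result[scenario] = (sprints, sprints * sprint_weeks)
    return result


scenarios = forecast_range(
    145, {"optimistic": 30, "average": 26, "pessimistic": 22}
)
for name, (sprints, weeks) in scenarios.items():
    print(f"{name}: {sprints} sprints, {weeks} weeks")
# optimistic: 5 sprints, 10 weeks
# average: 6 sprints, 12 weeks
# pessimistic: 7 sprints, 14 weeks
```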

Best Practices

  • Estimate as a whole team—never let one person (especially a lead or manager) dictate story point values. The collective intelligence of the group surfaces risks that individuals miss.

  • Keep estimation sessions under 60 minutes. After that, decision fatigue sets in and estimate quality degrades. If you have more stories, schedule a second session.

  • Compare stories to each other, not to clock time. Ask "Is this bigger or smaller than our reference story?" rather than "How many days will this take?"

  • Use the '13' and '21' point values as warning flags. Stories this large should almost always be broken down into smaller pieces before entering a sprint.

  • Revisit and discuss estimation accuracy during retrospectives. Track which stories were significantly over- or under-estimated and discuss what signals the team missed.

  • Let the people doing the work do the estimating. Product owners and Scrum Masters facilitate but should not hold estimation cards.

Common Mistakes

Converting story points to hours or using them as a productivity metric

Correction

Story points measure relative effort, complexity, and uncertainty, not time. Using them to compare developer productivity or convert to billable hours destroys psychological safety and incentivizes gaming the system. If management needs time-based estimates, derive them from the team's velocity data, not from individual point assignments.

Allowing the first person to speak to anchor the entire team's estimate

Correction

This is the exact problem planning poker solves—but only if you enforce simultaneous reveal. If team members show cards one at a time or verbally announce estimates, anchoring bias takes over. Use a digital tool or strict simultaneous card flip every time.

Estimating stories that are vague or lack acceptance criteria

Correction

Garbage in, garbage out. If a story doesn't have clear acceptance criteria, send it back to backlog refinement instead of guessing. The team's time is better spent estimating well-defined work than debating ambiguous requirements.

Treating velocity as a target to increase sprint over sprint

Correction

Velocity is a diagnostic metric, not a performance target. Pressuring teams to increase velocity leads to point inflation—stories get rated higher to make the number look good, but actual output doesn't change. Focus on consistency, not growth.

Spending 10+ minutes debating whether a story is a 5 or an 8

Correction

At adjacent Fibonacci values, the difference is noise. If two rounds of voting don't resolve it, take the higher number and move on. The precision you're chasing doesn't exist—your time is better spent estimating the next story.

Frequently Asked Questions

What is the best story point scale for scrum estimation?

The Fibonacci sequence (1, 2, 3, 5, 8, 13, 21) is the most widely used and recommended scale. The increasing gaps between numbers reflect the growing uncertainty in larger work items. Many planning poker decks use a modified version that rounds the larger values (20, 40, 100 in place of 21 and beyond). Some teams use T-shirt sizes (S, M, L, XL) for early-stage estimation, then convert to Fibonacci for sprint planning.

How many sprints does it take to establish a reliable velocity?

Most teams need 3-5 sprints to establish a stable velocity baseline. During this period, use the lowest completed sprint total as your capacity estimate for the next sprint to avoid over-commitment. After 5 sprints, a 3-sprint rolling average provides reliable forecasting data.

Should the Scrum Master or Product Owner participate in planning poker?

No. Only team members who will actually do the work should estimate. The Scrum Master facilitates the session and the Product Owner answers clarifying questions about requirements, but neither should hold estimation cards. Their involvement can create anchoring bias or implicit pressure.

What do I do when one developer always estimates much higher than the rest of the team?

This is a valuable signal, not a problem. That developer may have deeper knowledge of technical debt, edge cases, or testing complexity. Use planning poker's discussion round to surface their reasoning. If they're consistently right (stories take longer than the majority estimated), the team's calibration needs adjustment.

Can story points be used across different Scrum teams?

Story points should not be compared across teams because each team calibrates to their own reference baseline and velocity. A '5' on one team is not equivalent to a '5' on another. For cross-team planning, use each team's individual velocity to forecast timelines rather than comparing raw point totals.

How do I estimate bugs and technical debt with story points?

Estimate bugs and tech debt the same way as feature work—using relative complexity compared to your reference story. Some teams reserve a fixed percentage of sprint capacity (e.g., 20%) for unplanned bugs and don't estimate those individually. Planned tech debt stories should go through normal planning poker like any other backlog item.
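
If your team reserves a fixed share of capacity for unplanned bugs, the points left for planned stories follow directly. This is a sketch: `plannable_points` is a hypothetical name, the 20% default mirrors the figure above, and the velocity of 34 is illustrative.

```python
def plannable_points(velocity, reserve_fraction=0.20):
    """Points left for planned stories after reserving a bug buffer."""
    if not 0 <= reserve_fraction < 1:
        raise ValueError("reserve_fraction must be in [0, 1)")
    return velocity * (1 - reserve_fraction)


print(round(plannable_points(34), 1))  # 27.2
```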