Orchestrating gstack's 8 Power Tools in Complex Workflows
This skill teaches you how to sequence and combine gstack's 8 power tools to manage end-to-end development workflows like feature buildout, codebase migration, and large refactors without losing context between phases.
Identify your workflow type (feature buildout, migration, refactor), then sequence gstack's power tools in order: use /decide for scoping, /design for architecture, /code for implementation, /review for quality, and /ship for deployment. Each power tool chains multiple specialist skills automatically, so your job is selecting the right tool at each phase.
Outcome: You can take any complex development task, decompose it into the correct sequence of power tools, and execute the full workflow from decision through deployment with clear handoffs, preserved context, and structured quality gates at every phase.
Prerequisites
- gstack skill pack installed and configured (see installing-and-configuring-gstack-skill-pack)
- Familiarity with gstack's 23 specialist skills and slash commands
- Understanding of multi-agent perspectives (CEO, Engineer, QA roles)
- Working knowledge of Claude Code or a compatible AI coding agent
- Basic comfort with iterative development workflows (plan, build, test, ship)
Overview
gstack's 8 power tools are higher-order commands that combine multiple specialist skills into coordinated sequences. Where individual specialist skills handle focused tasks like writing a test or drafting a commit message, power tools orchestrate entire phases of work. Think of specialist skills as individual instruments and power tools as sheet music that tells you which instruments to play, in what order, and how they harmonize. This skill, documented in the gstack framework docs, teaches you how to read that sheet music and conduct the full orchestra across a complex workflow.
The core challenge power tools solve is context loss between phases. When you manually switch from planning to coding to reviewing, critical decisions get dropped, architectural constraints get forgotten, and quality standards drift. Power tools encode the handoff protocol between phases so that outputs from /decide feed directly into /design, which feeds into /code, and so on. Each power tool knows what artifacts it needs from the previous phase and what artifacts it must produce for the next one. Your role shifts from doing the work to directing the sequence and making judgment calls at each gate.
Mastering this orchestration skill changes how you approach any non-trivial development task. Instead of diving into code and hoping the architecture holds, you work through a structured sequence that surfaces risks early, captures decisions in durable artifacts, and applies multi-agent perspectives (CEO for prioritization, Engineer for feasibility, QA for edge cases) at the moments where each perspective adds the most value. The concrete artifact you produce is a completed workflow trace: a sequence of power tool invocations with their inputs, outputs, and gate decisions that documents the full journey from problem statement to shipped feature. This trace becomes both your project record and your template for similar future work.
Power tools are not rigid pipelines. They are composable building blocks. A feature buildout might use all eight in sequence. A hotfix might use only /code and /ship. A codebase migration might loop through /design and /code repeatedly for each module. The skill here is recognizing which tools to invoke, in what order, and when to skip, repeat, or run tools in parallel.
How It Works
Power tools work by encapsulating common multi-step development patterns into single invocable commands. Each power tool internally triggers a defined sequence of specialist skills, passes context between them, and surfaces decision points where you need to provide human judgment. The mental model is a pipeline with gates: each power tool runs its internal sequence, produces an output artifact, and pauses at a gate where you review the output before the next power tool picks it up.
The reason this pipeline structure works better than ad-hoc sequencing is that it enforces completeness. When you manually decide what to do next, you tend to skip uncomfortable steps. You skip the /decide phase because you already know what you want to build. You skip /review because the code looks fine. Power tools make skipping conscious and visible rather than accidental. If you choose to skip /decide, you are explicitly acknowledging that you are proceeding without structured decision-making, and the downstream tools will note the absence of that input artifact.
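One way to picture "conscious and visible" skipping is a plan that refuses to drop a tool silently. The sketch below is illustrative only and is not part of gstack: `LIFECYCLE`, `PlanEntry`, and `plan_workflow` are hypothetical names, the tool list is taken from the commands named in this guide, and several of the skip reasons are invented for the example.

```python
from dataclasses import dataclass

# Tool names as used in this guide. The retrospective is listed by name
# because the guide does not give it a slash command.
LIFECYCLE = ["/decide", "/design", "/code", "/review",
             "/test", "/docs", "/ship", "retrospective"]


@dataclass
class PlanEntry:
    tool: str
    skipped: bool
    reason: str = ""


def plan_workflow(selected, skip_reasons):
    """Return one entry per lifecycle tool; every skip must carry a reason."""
    plan = []
    for tool in LIFECYCLE:
        if tool in selected:
            plan.append(PlanEntry(tool, skipped=False))
            continue
        reason = skip_reasons.get(tool, "").strip()
        if not reason:
            raise ValueError(f"{tool} is skipped without a stated reason")
        plan.append(PlanEntry(tool, skipped=True, reason=reason))
    return plan


hotfix = plan_workflow(
    selected={"/code", "/review", "/ship"},
    skip_reasons={
        "/decide": "severity of the bug already decides it",
        "/design": "fix is localized to a known function",
        "/test": "covered by the existing regression suite",   # hypothetical
        "/docs": "no documented behavior changes",              # hypothetical
        "retrospective": "folded into the incident review",     # hypothetical
    },
)
for entry in hotfix:
    print(entry.tool, ("SKIPPED: " + entry.reason) if entry.skipped else "run")
```

Raising an error on an unexplained skip is a design choice for the sketch: it turns an accidental omission into a decision someone has to write down.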
The 8 power tools map to a generalized development lifecycle, though the gstack framework deliberately avoids locking you into one methodology. The tools roughly cover: scoping and decision-making (/decide), architectural design (/design), implementation (/code), review and quality assurance (/review), testing (/test), documentation (/docs), deployment (/ship), and retrospective. Not every workflow uses all eight. The power of the system is in composition: you pick the tools that match your workflow's needs and skip the rest.
Each power tool also activates different multi-agent perspectives at appropriate moments. During /decide, the CEO perspective dominates because the core question is whether to build this at all and what the business impact is. During /code, the Engineer perspective takes over because the question is how to implement the decision efficiently. During /review, QA perspective leads because the question is what can go wrong. These perspective shifts are not cosmetic. They change the prompt structure, the evaluation criteria, and the types of questions the AI agent asks you. Understanding which perspective is active and why helps you provide better inputs and make sharper judgment calls at each gate.
One assumption can break this model: power tools assume that your project can be decomposed into sequential phases with clear handoffs. Highly exploratory work where you do not yet know what you are building may not fit neatly into the /decide then /design then /code pipeline. In those cases, you might loop through /decide and /design multiple times before ever invoking /code, or you might use specialist skills directly for rapid prototyping and only bring in power tools once the shape of the solution becomes clear. The skill is knowing when the structured pipeline serves you and when it constrains you.
Step-by-Step
Step 1: Classify your workflow type
Before invoking any power tool, identify which category your work falls into. The three main workflow types are feature buildout (new capability from scratch), codebase migration (moving existing code to a new pattern, framework, or architecture), and refactor/improvement (restructuring existing code without changing external behavior).
Each type has a different default power tool sequence. Feature buildout typically uses the full sequence from /decide through /ship. Migration emphasizes /design and /code in a repeating loop with /review gates. Refactors often skip /decide (the decision is already made) and focus on /design, /code, and /review. Write down which type you are working on and list the power tools you expect to use. This prevents mid-workflow confusion about what comes next.
Tip: If your work does not fit cleanly into one category, that is a signal you may need to split it into sub-workflows. A feature buildout that also requires a migration should be two separate orchestrated sequences, not one tangled one.
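The classification in this step can be written down as a small lookup, which also gives you the list of expected invocations up front. This is a sketch under assumptions, not gstack configuration: `Sequence`, `DEFAULTS`, and `expected_invocations` are hypothetical names, and the closing /ship for migrations and refactors is an assumption added so each plan has an end point.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sequence:
    once_before: tuple   # tools run once at the start
    per_unit: tuple      # tools repeated for each component or module
    once_after: tuple    # tools run once at the end


# Defaults paraphrased from this step; adjust them to the task at hand.
DEFAULTS = {
    "feature buildout": Sequence(("/decide", "/design"),
                                 ("/code", "/review"),
                                 ("/docs", "/test", "/ship")),
    "migration": Sequence(("/decide", "/design"),
                          ("/design", "/code", "/review"),
                          ("/ship",)),          # assumed tail
    "refactor": Sequence((),
                         ("/design", "/code", "/review"),
                         ("/ship",)),           # assumed tail
}


def expected_invocations(workflow_type, units):
    """Expand a default sequence into the ordered list of planned invocations."""
    seq = DEFAULTS[workflow_type]
    plan = list(seq.once_before)
    for unit in units:
        plan += [f"{tool} [{unit}]" for tool in seq.per_unit]
    plan += seq.once_after
    return plan


for step in expected_invocations("refactor", ["auth module"]):
    print(step)
```

For a feature buildout with three components, this expansion yields 11 planned invocations before any loop-backs are counted.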
Step 2: Define the entry artifact for /decide
Every power tool needs an input artifact. For /decide, the entry artifact is a problem statement: what needs to change, why it matters, and what constraints exist. Write this as a plain-language brief of 3-5 sentences. Include the business context (who wants this and why), the technical context (what exists today), and the success criteria (how you will know the work is done).
This brief becomes the seed document that flows through every subsequent power tool. If you skip /decide (for example, in a hotfix), write the brief anyway and pass it directly to whatever tool you start with. The brief prevents scope drift because every downstream tool can reference it to check whether the work still aligns with the original intent.
Tip: Write the success criteria as observable outcomes, not vague goals. 'Users can export reports as CSV from the dashboard' is actionable. 'Improve the export experience' is not.
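If it helps to keep briefs consistent, the three parts of the problem statement can be captured in a small structure with a self-check. The following is a hypothetical sketch: `ProblemStatement` and `VAGUE_OPENERS` are illustrative names, the context strings are invented, and the vagueness test is a deliberately crude heuristic rather than anything gstack performs.

```python
from dataclasses import dataclass

# Crude, hypothetical heuristic: criteria that open with these verbs
# tend to be goals rather than observable outcomes.
VAGUE_OPENERS = ("improve", "enhance", "optimize")


@dataclass(frozen=True)
class ProblemStatement:
    business_context: str    # who wants this and why
    technical_context: str   # what exists today
    success_criteria: tuple  # observable outcomes

    def issues(self):
        """List what is missing or too vague to act on."""
        found = []
        if not self.business_context.strip():
            found.append("missing business context")
        if not self.technical_context.strip():
            found.append("missing technical context")
        if not self.success_criteria:
            found.append("no success criteria")
        for criterion in self.success_criteria:
            if criterion.strip().lower().startswith(VAGUE_OPENERS):
                found.append(f"vague criterion: {criterion!r}")
        return found


brief = ProblemStatement(
    business_context="Customers want to track their team's product usage.",
    technical_context="Usage events are stored but not surfaced anywhere.",
    success_criteria=("Improve the export experience",),
)
print(brief.issues())
```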
Step 3: Run /decide and capture the decision record
Invoke /decide with your problem statement. This power tool activates the CEO perspective to evaluate whether the work should happen, what the priority is relative to other work, and what the rough scope should be. It internally chains specialist skills for impact assessment, effort estimation, and risk identification. Your job during this phase is to answer the questions the agent surfaces honestly, especially about effort and risk.
Do not downplay complexity to get a green light. The output artifact is a decision record: a structured document stating the decision (proceed, defer, or reject), the rationale, the agreed scope boundaries, and any constraints or dependencies identified. Save this artifact. Every subsequent power tool will reference it.
Tip: If /decide recommends deferring the work, take that seriously. The CEO perspective exists specifically to catch work that feels urgent but is not actually high-impact. Override it only if you have information the agent does not.
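The decision record's fields map naturally onto a small immutable structure, which also makes it easy for later phases to check their output against the agreed scope. This is a minimal sketch, assuming a hand-rolled representation: `Decision`, `DecisionRecord`, and `scope_violations` are hypothetical names, not gstack's artifact format, and the scope items are invented for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    PROCEED = "proceed"
    DEFER = "defer"
    REJECT = "reject"


@dataclass(frozen=True)   # frozen: revise by creating a new record, not by editing
class DecisionRecord:
    decision: Decision
    rationale: str
    in_scope: frozenset
    constraints: tuple = ()


def scope_violations(record, designed_items):
    """Items a later phase includes that the decision record never approved."""
    return sorted(set(designed_items) - record.in_scope)


record = DecisionRecord(
    decision=Decision.PROCEED,
    rationale="Charts carry most of the value; export can follow.",
    in_scope=frozenset({"daily active users chart", "feature usage chart"}),
    constraints=("2-week timeline",),
)
print(scope_violations(record, ["daily active users chart", "CSV export"]))
# prints ['CSV export']
```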
Step 4: Run /design and produce the architecture artifact
Pass the decision record to /design. This tool activates the Engineer perspective to translate the approved scope into a technical plan. It chains specialist skills for architecture patterns, interface design, data modeling, and dependency mapping. The output is an architecture artifact: a document describing the components involved, the interfaces between them, the data flow, and the implementation sequence (which piece to build first).
During this phase, push back on designs that feel over-engineered for the scope defined in /decide. A common failure mode is gold-plating at the design phase, adding complexity that was not in the approved scope. Cross-reference the design output against the decision record's scope boundaries before proceeding.
Tip: For migrations, you will likely run /design once for the overall migration strategy and then re-run it for each individual module or subsystem being migrated. Keep the overall strategy artifact as the reference and treat per-module designs as children of it.
Step 5: Run /code in scoped increments
Pass the architecture artifact to /code. This is where implementation happens. The critical discipline here is scoping each /code invocation to a single component or concern from the architecture artifact, not trying to build everything in one pass. If your architecture has four components, invoke /code four times, once per component, passing the relevant slice of the architecture artifact each time.
After each /code invocation, you get an implementation artifact: the actual code changes plus a summary of what was built and what assumptions were made. Review each implementation artifact before invoking /code for the next component. This incremental approach prevents the compound error problem where a bad assumption in component one propagates silently through components two, three, and four.
Tip: If a /code invocation surfaces a question that the architecture artifact does not answer, stop and re-invoke /design for that specific question before continuing. Do not let /code make architectural decisions implicitly.
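Passing "the relevant slice" of the architecture artifact can be made concrete with a helper that emits one request per component. The sketch below assumes the architecture artifact has been transcribed into a plain dictionary. The keys `build_order`, `components`, and `interfaces`, the function `code_requests`, and the component descriptions are all hypothetical.

```python
def code_requests(architecture):
    """Build one /code request per component, in the design's build order."""
    requests = []
    for name in architecture["build_order"]:
        requests.append({
            "tool": "/code",
            "component": name,
            "spec": architecture["components"][name],
            # Pass only the interfaces this component takes part in.
            "interfaces": [
                iface for iface in architecture["interfaces"]
                if name in iface["between"]
            ],
        })
    return requests


architecture = {
    "build_order": ["data service", "API endpoint", "dashboard component"],
    "components": {
        "data service": "aggregates usage events per day",
        "API endpoint": "serves aggregated metrics",
        "dashboard component": "renders the charts",
    },
    "interfaces": [
        {"between": ("data service", "API endpoint"), "contract": "daily usage rows"},
        {"between": ("API endpoint", "dashboard component"), "contract": "metrics payload"},
    ],
}

for request in code_requests(architecture):
    print(request["component"], len(request["interfaces"]))
```

Each request carries only the interfaces its component touches, so a review of that increment has a bounded surface to check.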
Step 6: Run /review with QA perspective at each gate
After each /code invocation (or after a logical batch of related invocations), run /review. This tool activates the QA perspective and chains specialist skills for code quality, security, edge case analysis, and consistency checking. Pass it both the implementation artifact from /code and the architecture artifact from /design so it can verify that the implementation matches the design intent. The output is a review artifact: a list of findings categorized as blockers (must fix before proceeding), warnings (should fix soon), and notes (consider for future improvement).
Address all blockers before the next /code invocation. Track warnings in a running list and schedule time to address them before /ship. Do not let warnings accumulate silently across multiple cycles.
Tip: Run /review even on code that you think is straightforward. The QA perspective catches edge cases and consistency issues that the Engineer perspective systematically underweights because it is optimizing for getting things working, not for what happens when things break.
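The gate rule in this step (blockers stop progress, warnings are tracked, notes are optional) is simple enough to state as code. This is an illustrative sketch only: `Severity`, `Finding`, and `review_gate` are hypothetical names, and the review artifact is assumed to have been transcribed into a list by hand.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    BLOCKER = "blocker"   # must fix before proceeding
    WARNING = "warning"   # should fix soon
    NOTE = "note"         # consider for future improvement


@dataclass(frozen=True)
class Finding:
    severity: Severity
    summary: str


def review_gate(findings, open_warnings):
    """Record new warnings, then return True only if no blocker is present."""
    open_warnings.extend(f for f in findings if f.severity is Severity.WARNING)
    return not any(f.severity is Severity.BLOCKER for f in findings)


open_warnings = []   # running list carried across review cycles
findings = [
    Finding(Severity.BLOCKER, "missing index on the analytics table"),
    Finding(Severity.WARNING, "timezone handling inconsistency"),
]
print(review_gate(findings, open_warnings))   # prints False
print(len(open_warnings))                     # prints 1
```

Keeping `open_warnings` outside the function is what makes the running list visible across cycles, so warnings cannot accumulate silently.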
Step 7: Consolidate with /docs and /test before shipping
Once all components are implemented and reviewed, invoke /docs to generate or update documentation and /test to generate or validate test coverage. Pass the full set of implementation artifacts and the architecture artifact. /docs produces documentation artifacts (READMEs, API docs, inline comments) that match the actual implementation, not the original design. /test produces test artifacts (test files, coverage reports, identified gaps).
Review both artifacts for accuracy. Documentation that describes the design intent rather than the actual implementation is worse than no documentation, because it actively misleads future developers. Similarly, tests that pass but do not exercise the real edge cases identified during /review provide false confidence.
Tip: If /test identifies coverage gaps that require additional implementation, loop back to /code for those specific gaps rather than trying to patch tests to cover the gaps superficially.
Step 8: Run /ship and close with a retrospective
Invoke /ship with the full bundle: decision record, architecture artifact, implementation artifacts, review findings, documentation, and test results. /ship chains deployment specialist skills to produce a deployment artifact: the actual deployment steps, rollback procedures, and verification checks. After deployment, close the workflow by reviewing the full trace of power tool invocations. Note which tools you skipped and why, which tools you had to re-invoke, and where the most time was spent.
This retrospective is not a formal ceremony. It is 5-10 minutes of writing what worked and what you would change next time. Save it alongside the workflow trace so that future you (or future team members) can learn from this execution.
Tip: Keep the workflow trace (the sequence of all artifacts produced by each power tool) as a single linked document or folder. It becomes your template for similar workflows and your audit trail if something goes wrong post-deployment.
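A linked trace can be as simple as a list in which each entry points at the entry whose artifact fed it. The following sketch is one possible shape, not a gstack format: `TraceEntry` and `lineage` are hypothetical names, and artifacts are represented by titles only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TraceEntry:
    tool: str
    artifact: str                # title or path of the artifact produced
    predecessor: Optional[int]   # index of the entry whose artifact fed this one


def lineage(trace, index):
    """Walk predecessor links back to the start, oldest entry first."""
    path = []
    current = index
    while current is not None:
        entry = trace[current]
        path.append(f"{entry.tool}: {entry.artifact}")
        current = entry.predecessor
    return path[::-1]


trace = [
    TraceEntry("brief", "problem statement", None),
    TraceEntry("/decide", "decision record", 0),
    TraceEntry("/design", "architecture artifact", 1),
    TraceEntry("/code", "implementation artifact: data service", 2),
    TraceEntry("/review", "review artifact: data service", 3),
]
print(" -> ".join(lineage(trace, 4)))
```

Because every entry names its predecessor, a loop-back shows up as a second entry for the same tool instead of an overwritten artifact.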
Examples
Example: Feature buildout for a new user dashboard (small team, B2B SaaS)
A 3-person startup needs to add a usage analytics dashboard to their B2B SaaS product. The team has one full-time developer using Claude Code with gstack. Timeline is 2 weeks. The dashboard must show usage metrics, user activity, and export capabilities.
The developer starts by classifying this as a feature buildout, which uses the full power tool sequence. They write a problem statement: 'Add a usage analytics dashboard so customers can track their team's product usage. Must show daily active users, feature usage frequency, and CSV export.' They invoke /decide, which activates CEO perspective and identifies that CSV export is lower priority than the visualization components.
The decision record scopes the first release to charts only, with export as a fast-follow. /design produces an architecture with three components: a data aggregation service, a REST API endpoint, and a React dashboard component. The developer runs /code three times, once per component, starting with the data service. After each /code pass, they run /review, which catches a missing index on the analytics table (blocker) and a timezone handling inconsistency (warning).
They fix the blocker immediately and log the warning. After all three components are built and reviewed, /docs generates API documentation and /test produces integration tests covering the main query paths. /ship produces deployment steps including a database migration. The full workflow trace shows 12 power tool invocations over 8 working days, with one loop-back from /review to /code for the missing index fix.
Example: Codebase migration from REST to GraphQL (mid-size team, B2C app)
A team of 8 developers is migrating a B2C mobile app's backend from REST to GraphQL. The API has 40 endpoints serving 3 mobile clients. Migration needs to happen incrementally over 6 weeks without breaking existing clients. Each developer uses gstack individually for their assigned modules.
The tech lead runs /decide once at the project level, producing a decision record that scopes the migration to the 15 highest-traffic endpoints first, with remaining endpoints deferred. They run /design once for the overall migration strategy, producing an architecture artifact that defines the GraphQL schema, the adapter pattern for maintaining REST compatibility during transition, and the module-by-module migration order. Each developer then takes their assigned module (2-3 endpoints each) and runs /design for their specific module to map REST endpoints to GraphQL resolvers. They run /code per resolver, not per module, keeping each invocation small and focused.
/review runs after each resolver, checking both the GraphQL implementation and the REST adapter for backward compatibility. The QA perspective catches that two developers have defined conflicting type names in their resolvers, which would not surface until schema stitching. This triggers a loop-back to /design to establish a naming convention. After each module passes /review, /test generates integration tests that verify both the GraphQL and REST paths return equivalent responses.
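The naming conflict in this example is the kind of issue a small cross-module check can surface before schema stitching. The sketch below is a simplified stand-in written in plain Python: it does not parse GraphQL, `conflicting_types` is a hypothetical helper, and the module and type names are invented for illustration.

```python
from collections import defaultdict


def conflicting_types(modules):
    """Find type names that two or more modules define with different fields.

    modules maps a module name to {type name: iterable of field names}.
    Returns {type name: sorted list of modules that define it}.
    """
    definitions = defaultdict(dict)   # type name -> {module: field set}
    for module, types in modules.items():
        for type_name, fields in types.items():
            definitions[type_name][module] = frozenset(fields)
    return {
        name: sorted(by_module)
        for name, by_module in definitions.items()
        if len(set(by_module.values())) > 1
    }


modules = {
    "orders": {"User": ("id", "email")},
    "profiles": {"User": ("id", "displayName"), "Avatar": ("url",)},
}
print(conflicting_types(modules))   # prints {'User': ['orders', 'profiles']}
```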
The workflow trace for each developer shows 6-8 /code and /review cycles per module. The tech lead's trace shows the overarching /decide, /design, and final /ship invocations.
Example: Emergency hotfix for a payment processing bug (solo developer, fintech)
A solo developer discovers that a rounding error in the payment processing service is causing $0.01 discrepancies in 3% of transactions. This needs to ship within 4 hours. The developer uses gstack with Claude Code.
The developer classifies this as a hotfix and selects only /code, /review, and /ship from the power tool lineup. They skip /decide (the decision is already made by the severity of the bug) and /design (the fix is localized to a known function). They still write a brief problem statement describing the $0.01 discrepancies in 3% of transactions. They invoke /code with the problem statement and the specific file, producing a fix that replaces floating-point arithmetic with decimal arithmetic.
/review activates QA perspective and flags that the fix needs to handle currency conversion edge cases where the rounding behavior was intentionally different (a near-miss that would have caused a new bug). The developer loops back to /code with this constraint, producing a revised fix that preserves intentional rounding in currency conversion while fixing the error in domestic transactions. /review passes the revised fix. /ship then produces a deployment plan with an auto-rollback trigger tied to a percentage threshold monitored during the first hour after release.
The whole workflow takes a matter of hours, with the workflow trace documenting exactly what was changed and why.
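The fix in this example replaces floating-point arithmetic with decimal arithmetic. A minimal sketch of why that matters, using Python's standard `decimal` module: binary floats cannot represent most decimal fractions exactly, so sums and rounding can land a cent away from the expected value. The helper name `round_to_cents` and the choice of `ROUND_HALF_UP` are assumptions for illustration, since the source does not say which rounding rule the payment service uses.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def round_to_cents(amount):
    """Round a decimal string to cents with an explicit rounding rule."""
    return Decimal(amount).quantize(CENT, rounding=ROUND_HALF_UP)


# Binary floats accumulate representation error.
print(0.1 + 0.2 == 0.3)                                    # prints False
print(Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))   # prints True

# 2.675 is stored as slightly less than 2.675, so float rounding goes down.
print(round(2.675, 2))            # prints 2.67
print(round_to_cents("2.675"))    # prints 2.68
```

Amounts are passed as strings on purpose: constructing a `Decimal` from a float would carry the float's representation error into the decimal value.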
Example: Large refactor of authentication system (large team, enterprise)
An enterprise team of 20 developers needs to refactor their authentication system from a monolithic auth module to a microservice with OAuth2 support. The existing system handles 50,000 daily active users. The refactor has a 3-month timeline with a hard requirement of zero downtime during transition.
The project lead runs /decide with a comprehensive problem statement covering the business case (compliance requirements, partner integration needs), technical context (current module's tight coupling to 12 other services), and constraints (zero downtime, backward compatibility for 6 months). The decision record scopes Phase 1 to extracting the auth module into a standalone service with the existing auth protocol, deferring OAuth2 to Phase 2. /design produces an architecture artifact with a strangler fig pattern: new auth service runs alongside the old module, with a routing layer that gradually shifts traffic. The architecture identifies 5 extraction stages.
Each stage gets its own /design, /code, /review, /test cycle. The project lead assigns stages to sub-teams of 3-4 developers. Each sub-team runs their own power tool sequences for their stage, with the project lead running /review at stage boundaries to verify cross-stage consistency. After all 5 stages complete, /docs generates migration guides for the 12 dependent services, and /test produces a load testing plan that simulates the traffic shift.
/ship produces a 3-phase deployment plan with canary releases at 1%, 10%, and 100% of traffic, with automated rollback triggers at each phase. The total project trace spans 3 months and contains over 80 power tool invocations across the team, organized hierarchically under the original decision record.
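The staged rollout with rollback triggers can be sketched as a loop over traffic percentages. This is a simplified model, not a deployment script: `run_rollout` is a hypothetical function, and the error-rate readings and the 1% error threshold are invented for the example, since the source does not state what the rollback triggers measure.

```python
STAGES = (1, 10, 100)   # percent of traffic, as in the deployment plan above


def run_rollout(observed_error_rate, max_error_rate):
    """Step through the canary stages; stop at the first unhealthy one.

    observed_error_rate: callable taking a traffic percent, returning a rate.
    Returns (last healthy percent, outcome).
    """
    last_healthy = 0
    for percent in STAGES:
        if observed_error_rate(percent) > max_error_rate:
            return last_healthy, "rolled back"
        last_healthy = percent
    return last_healthy, "complete"


# Hypothetical readings: the new service degrades once it takes 10% of traffic.
readings = {1: 0.001, 10: 0.04, 100: 0.04}
print(run_rollout(readings.get, max_error_rate=0.01))   # prints (1, 'rolled back')
```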
Best Practices
Always start with a written problem statement, even for work that feels obvious. The problem statement is the contract that every downstream power tool references. Without it, scope creeps incrementally at each phase because there is no baseline to measure against. When you skip this step, you typically discover the scope has doubled by the time you reach /review.
Treat power tool outputs as immutable artifacts at each gate. Do not go back and silently edit the decision record after /design reveals complexity. Instead, re-invoke /decide with the new information and let it produce a revised decision record. This preserves the audit trail and forces you to consciously re-evaluate scope rather than quietly absorbing more work.
Scope each /code invocation to a single architectural component or concern. When you try to implement the entire design in one /code pass, the AI agent loses focus partway through, makes inconsistent decisions across components, and produces code that is harder to review. Smaller, focused invocations produce better code and cleaner review artifacts.
Run /review after every /code invocation, not just at the end. Batching all review to the end creates a pile of findings that interact with each other, making it hard to address blockers without cascading changes. Incremental review catches issues when they are cheap to fix and before downstream code depends on them.
Preserve the workflow trace as a single navigable document. Link each artifact to its predecessor and successor so anyone (including your future self) can walk the chain from problem statement to deployed feature. This trace is both your project documentation and your template for future similar work.
When a power tool's output surprises you, that is a signal to pause, not to override. If /design produces an architecture that seems overly complex, ask why before simplifying it. The complexity might reflect real constraints you forgot to mention in the problem statement. Override only after you understand the reasoning.
Use the multi-agent perspective shifts intentionally. When the CEO perspective is active during /decide, resist the urge to think like an engineer about implementation. When the QA perspective is active during /review, resist the urge to think like a CEO about business priority. Each perspective exists to catch what the others miss, and that only works if you let them operate without interference.
For repeating workflows (weekly releases, recurring migrations), save your workflow trace as a named template. Next time, start with the template and modify only what changed. This compounds your efficiency over time and reduces the cognitive load of figuring out the sequence from scratch each time.
Common Mistakes
Skipping /decide and jumping straight to /code because the task feels clear
Correction
This happens because engineers naturally want to build, and /decide feels like overhead when you already know what to implement. The problem is that /decide does more than approve the work. It scopes the work, identifies risks, and creates the baseline that every downstream tool references. Without it, /design has no scope boundaries to respect, /code has no constraints to honor, and /review has no success criteria to check against.
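You catch this mistake when /review keeps flagging scope questions that should have been answered before coding started. Run /decide for any non-trivial task, even as a quick 10-minute pass. When you do skip it, as in a hotfix, still write the problem statement so downstream tools have a baseline.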
Running all 8 power tools in strict sequence for every task regardless of size
Correction
This mistake comes from treating the 8-tool sequence as a mandatory checklist rather than a composable toolkit. A two-line bug fix does not need /decide, /design, /docs, and a retrospective. It needs /code and /ship. Over-orchestrating small tasks wastes time and trains you to resent the framework.
The diagnostic signal is when the orchestration overhead exceeds the implementation time. Use the workflow classification from Step 1 to select only the tools that add value for the task's complexity level.
Passing insufficient context between power tools, forcing each tool to re-derive decisions
Correction
This happens when you invoke the next power tool with just a brief verbal summary instead of passing the actual artifact from the previous tool. The AI agent then makes assumptions that may contradict decisions already captured in the artifact you did not share. You notice this when /code produces an implementation that contradicts the architecture from /design, or when /review flags issues that were already accepted as constraints in /decide. Always pass the full artifact, not a summary.
The artifacts are the connective tissue of the workflow.
Treating /review findings as suggestions rather than gates
Correction
When /review flags blockers, there is a strong temptation to reclassify them as warnings so you can keep moving forward. This happens especially under time pressure or when the blocker requires revisiting a design decision. The result is that you ship code with known critical issues and pay for it later in production incidents or technical debt. The signal is a pattern of 'we will fix it later' notes accumulating across multiple review cycles.
Enforce the discipline: blockers stop forward progress until resolved, full stop.
Running /code once for the entire architecture instead of incrementally per component
Correction
This typically happens when the architecture looks simple or when you are in a hurry. A single massive /code invocation produces a large, entangled changeset that is hard to review, hard to debug, and hard to roll back. The AI agent also tends to lose coherence in long generation sessions, producing inconsistent patterns across different parts of the codebase. You detect this when /review returns an unusably long list of findings that span multiple unrelated components.
Split your /code invocations to match the component boundaries from your architecture artifact.
Ignoring perspective shifts and answering every tool's questions from a single mindset
Correction
When /decide asks about business impact and you answer with technical details, or when /review asks about edge cases and you answer with business justifications for why they do not matter, you are undermining the multi-agent perspective system. Each perspective exists to surface a specific category of insight. The signal is when your review artifacts keep missing the same types of issues (usually edge cases or business alignment gaps). Consciously shift your thinking to match the active perspective.
When QA asks about failure modes, think about failure modes, not about why the feature is important.
Other Skills in This Method
Customizing and Extending gstack with Your Own Skills
How to fork, modify, or author new specialist skills and power tools within the gstack open-source framework to fit your team's specific conventions and tech stack.
Comparing gstack to Other AI Coding Agent Frameworks
How to evaluate gstack's opinionated multi-agent approach against alternatives like Cursor rules, Aider conventions, or custom system prompts to choose the right AI coding workflow.
Using Multi-Agent Perspectives (CEO, Engineer, QA) in Development
How to leverage gstack's multi-role system — CEO, engineer, and QA perspectives — to structure decision-making, implementation, and quality assurance across a development workflow.
Installing and Configuring the gstack Skill Pack
How to install gstack from GitHub, set up slash commands, and configure it for use with Claude Code or other AI coding agents.
Structuring AI Coding Sessions from Decision-Making to Execution
How to follow gstack's opinionated phased workflow — moving from problem framing and architecture decisions through implementation and verification — for disciplined AI-assisted development.
Navigating gstack's 23 Specialist Skills via Slash Commands
How to discover, invoke, and chain gstack's 23 specialist slash commands to handle discrete tasks like planning, scaffolding, refactoring, and debugging.
Frequently Asked Questions
How do I decide which power tools to skip for smaller tasks?
Match the power tools to the risk and complexity of the task. For a bug fix with a known root cause, you likely need only /code, /review, and /ship. For a new feature touching multiple services, use the full sequence. The heuristic is: if skipping a tool means you are making implicit assumptions that could be wrong, do not skip it. If the tool's output would be trivially obvious (a /decide on a critical production bug), skip it but still write the artifact it would have produced (the problem statement) so downstream tools have context.
How long should a full power tool workflow take compared to working without gstack?
For a first attempt, expect the orchestration overhead to add 20-30% to your total time. After 3-4 complete workflows, the overhead drops to near zero because you internalize the sequence and the artifacts flow naturally. The time savings come from avoided rework: catching architectural mismatches at /design instead of at /review, and catching scope creep at /decide instead of at /ship. Teams that track this consistently report net time savings of 15-25% on medium-to-large tasks after the learning period.
Should I orchestrate power tools before or after structuring my AI coding session phases?
Structure your session phases first, then map power tools to those phases. The [session structuring skill](/skills/structuring-ai-coding-sessions-with-gstack-phases) defines the decision-to-execution flow at a high level. Power tools are the concrete commands you invoke within each phase. Think of session phases as your agenda and power tools as the specific work items on that agenda. If you try to orchestrate power tools without a session structure, you tend to invoke them reactively rather than strategically.
Can I run multiple power tools in parallel for independent components?
Yes, if the components are genuinely independent at the architecture level. After /design produces an architecture artifact that identifies independent components, you can run /code and /review in parallel for each component. The key requirement is that parallel branches must share the same architecture artifact and decision record so they do not diverge. Merge the branches back together before /ship so that integration issues surface during the final /review pass, not in production.
How do I handle a power tool producing output that contradicts a previous tool's artifact?
Treat contradictions as information, not errors. If /code produces an implementation that contradicts the /design architecture, it usually means the design missed a constraint that only became visible during implementation. Do not force the code to match the design. Instead, loop back to /design with the new information, produce a revised architecture artifact, and then re-invoke /code with the updated design. The workflow trace should show this loop explicitly so future readers understand why the design changed.
Why does my workflow keep requiring loop-backs between /design and /code?
Frequent loop-backs usually indicate that the /design input was missing critical technical context. The problem statement passed to /decide and /design may have been too high-level, or the architecture artifact may have been based on assumptions about the existing codebase that turned out to be wrong. Fix this by investing more time in the /design phase: have the AI agent explore the existing codebase before producing the architecture artifact. If loop-backs persist, it may signal that the work is exploratory rather than planned, in which case you should switch to using specialist skills directly for prototyping and reserve power tools for the implementation phase once the exploration is complete.
How do I use power tools effectively when working on a team where not everyone uses gstack?
Focus on the artifacts, not the tools. Power tools produce structured artifacts (decision records, architecture documents, review findings) that are useful regardless of whether the recipient uses gstack. Share the artifacts through your normal channels (PRs, docs, Slack). Team members who do not use gstack can read and contribute to the artifacts manually. Over time, the consistency and quality of the artifacts often convince non-users to adopt the framework for their own workflows.