Structuring AI Coding Sessions with gstack Framework Examples

This skill teaches you how to follow gstack's opinionated phased workflow, moving from problem framing and architecture decisions through implementation and verification, so that AI-assisted development sessions produce reliable, well-reasoned code instead of fast but fragile output.

Start every AI coding session by framing the problem and exploring architecture options before writing code. Use gstack's phased workflow to move sequentially through problem definition, multi-perspective design review, implementation planning, code generation, and verification. Each phase produces a concrete artifact that feeds the next, preventing the common failure mode of jumping straight into code generation without adequate context or constraints.

Outcome: You gain a repeatable, phase-by-phase structure for every AI coding session that prevents premature implementation, surfaces architectural trade-offs early, and produces code that survives review and deployment rather than needing immediate rework.

Jun 1, 2026

Synthesized from public framework references and reviewed for accuracy.

Development | Intermediate | 45-90 minutes per coding session

Prerequisites

Basic familiarity with AI coding agents (Claude Code, Cursor, or similar)
gstack skill pack installed and configured (see installing-and-configuring-gstack-skill-pack)
Understanding of slash commands for invoking gstack skills
Working knowledge of software development fundamentals (version control, testing, code review)

Overview

Most developers who adopt AI coding agents fall into a predictable trap: they describe a feature, let the agent generate hundreds of lines of code, and then spend hours debugging output that was built on incorrect assumptions. The problem is not the agent's capability. The problem is the absence of a structured workflow that forces critical thinking before code generation begins. Structuring AI coding sessions with the gstack Framework addresses this by encoding an opinionated, phased progression into your development process. Instead of treating the AI agent as an autocomplete that you prompt and then hope for the best, gstack's phases treat it as a collaborator that must be guided through problem definition, architecture review, implementation planning, and verification in sequence.

The phased workflow produces a specific artifact at each stage. Problem framing yields a problem statement and success criteria. Architecture review yields a design document with trade-offs evaluated from multiple perspectives (CEO, engineer, QA). Implementation planning yields a task breakdown with dependencies and risk flags. Code generation produces working code against that plan. Verification confirms the code meets the original success criteria. Each artifact constrains and informs the next phase, creating a chain of reasoning that is auditable and debuggable when something goes wrong. This is the discipline that separates developers who use AI agents effectively from those who generate technical debt faster than they can manage it.

This skill is the central orchestration pattern in the gstack Framework. While sibling skills like using multi-agent perspectives and navigating slash commands teach specific capabilities within the framework, this skill teaches you how to sequence those capabilities into a coherent session. Think of it as the conductor's score that tells you when each instrument enters. By the end, you will have a repeatable session template that you can adapt to features of varying complexity, from a small bug fix completed in 20 minutes to a multi-day architectural migration.

The concrete output of mastering this skill is a session log: a structured record of each phase's artifact, the decisions made, the alternatives considered, and the verification results. This log serves double duty as both a development record and a prompt engineering reference, because you can reuse successful session structures as templates for future work of similar shape.

How It Works

The phased workflow works because it breaks the AI coding session into stages that mirror how experienced engineers naturally think, but makes each stage explicit and non-skippable. Without structure, AI agents are happy to generate code the moment you describe a feature. The code will compile. It might even pass basic tests. But it will be built on whatever assumptions the agent defaulted to, and those assumptions are invisible until something breaks in production. The gstack phases make the assumptions visible by requiring you to articulate them before code generation begins.

The underlying mental model is a decision funnel. At the top of the funnel, you have maximum uncertainty and maximum optionality. You do not know what the right architecture is, you have not decided on trade-offs, and the problem itself may not be well-defined. Each phase narrows the funnel by forcing a specific type of decision. Problem framing narrows the space of what you are building. Architecture review narrows the space of how you are building it. Implementation planning narrows the sequence and decomposition. Code generation operates within the constraints established by the prior phases. Verification confirms that the output satisfies the constraints from the top of the funnel.

This funnel structure has a critical property: it makes rework cheap. If you discover during architecture review that your problem statement is wrong, you rewrite a paragraph, not a thousand lines of code. If you discover during implementation planning that your architecture has a fatal dependency, you revise a design document, not a half-built feature branch. The cost of changing direction rises steeply as you move down the funnel, so gstack front-loads the thinking where changes are cheapest.

The phases also create natural checkpoints for multi-agent perspectives. During problem framing, a CEO perspective asks whether this is even the right problem to solve. During architecture review, an engineer perspective evaluates technical feasibility while a QA perspective identifies testability gaps. During verification, a QA perspective runs the acceptance criteria. These perspectives are not cosmetic labels. They are structured prompts that force the AI agent to evaluate the same artifact from genuinely different angles, surfacing conflicts that a single-perspective review would miss. See using multi-agent perspectives for the mechanics of how these perspective shifts work.

One important nuance: the phases are sequential but not rigid. For a trivial bug fix, you might compress problem framing and architecture review into a single prompt. For a complex migration, you might iterate within the architecture review phase three or four times before proceeding. The discipline is in never skipping a phase entirely, not in spending equal time on each. The size of the phase scales with the complexity and risk of the work.
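The rule that phases may be compressed but never skipped can be sketched as a small gate. This is an illustration only, not part of gstack: the `Session` class and the phase identifiers are hypothetical names chosen to mirror the five phases described above.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Phase names mirror the workflow described in the text (hypothetical identifiers).
PHASES = [
    "problem_framing",
    "architecture_review",
    "implementation_planning",
    "code_generation",
    "verification",
]


@dataclass
class Session:
    """Tracks which phase artifacts exist for one coding session."""
    artifacts: Dict[str, str] = field(default_factory=dict)

    def record(self, phase: str, artifact: str) -> None:
        """Store an artifact, refusing to skip past a phase with no artifact."""
        index = PHASES.index(phase)  # raises ValueError for unknown phases
        missing = [p for p in PHASES[:index] if p not in self.artifacts]
        if missing:
            raise ValueError(f"cannot record {phase!r}; missing: {missing}")
        self.artifacts[phase] = artifact

    def next_phase(self) -> Optional[str]:
        """Return the first phase without an artifact, or None when done."""
        for phase in PHASES:
            if phase not in self.artifacts:
                return phase
        return None
```

A compressed phase still records something, even a single sentence, so the gate measures whether a phase happened, not how long it took.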

Step-by-Step

Step 1: Frame the Problem with Explicit Success Criteria
Before opening your AI agent, write a problem statement that answers three questions: what is broken or missing, who is affected, and how will you know when it is fixed. This is not a feature description or a user story. It is a diagnosis. Include the observable symptom ("users see a 500 error when submitting the payment form"), the suspected root cause if you have one ("the Stripe webhook handler does not retry on timeout"), and the measurable success criteria ("payment submissions succeed within 3 seconds with zero 500 errors over a 24-hour window").

Feed this problem statement to your AI agent as the opening context for the session. Explicitly instruct the agent not to generate code yet, only to confirm understanding and ask clarifying questions. Review the agent's clarifying questions carefully, because they reveal assumptions you may not have stated. Revise the problem statement based on any gaps the questions expose.
Tip: Write the success criteria as if you are writing acceptance tests in plain language. If you cannot describe a test that would pass when the work is done, your problem statement is too vague. "Improve payment reliability" is not testable. "Payment form submissions return 200 status within 3 seconds, with fewer than 0.1% error rate over 24 hours" is testable.
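One way to make such criteria concrete is to express them as a check over observed request data. The sketch below is a minimal illustration, not a gstack feature: `RequestSample` and `meets_criteria` are hypothetical names, and it assumes a submission counts as failed when it returns a non-200 status or takes longer than the latency threshold.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RequestSample:
    """One observed payment form submission (hypothetical record shape)."""
    status: int
    seconds: float


def meets_criteria(samples, max_seconds=3.0, max_error_rate=0.001):
    """Return True when the failure rate is strictly below max_error_rate.

    A sample fails if it is not a 200 response or exceeds max_seconds.
    An empty sample list returns False: no evidence is not a pass.
    """
    if not samples:
        return False
    failures = sum(
        1 for s in samples if s.status != 200 or s.seconds > max_seconds
    )
    return failures / len(samples) < max_error_rate
```

If a criterion cannot be written as a function like this, that is usually a sign the criterion is still too vague to test.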
Step 2: Explore Architecture Options Before Committing
With the problem statement established, prompt the AI agent to generate at least three distinct approaches to solving it. Do not ask for "the best" approach, because that collapses the decision space prematurely. Ask for three approaches with different trade-off profiles: one optimizing for speed of implementation, one for long-term maintainability, and one for minimal risk. For each approach, require the agent to specify which existing components are affected, what new components are introduced, what the failure modes are, and what the testing strategy looks like.

Document all three options in a brief design comparison, even if one is obviously better. The act of comparing forces you to articulate why you are choosing one path over another, and that reasoning becomes invaluable when you need to explain the decision later or revisit it.
Tip: If the AI agent keeps converging on a single approach despite being asked for three, it likely means your problem statement is so constrained that only one architecture makes sense. That is fine for small tasks. For anything touching multiple systems or requiring more than a day of work, push for genuine alternatives. Reframe the prompt: "Assume approach A is not available. What would you do instead?"
Step 3: Apply Multi-Agent Perspectives to the Chosen Architecture
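Review the chosen architecture from at least three perspectives: CEO, engineer, and QA. You can do this by prompting the AI agent to adopt each role in sequence, or by using gstack's built-in perspective commands. For each perspective, capture the concerns raised and the mitigations proposed. The CEO perspective might question whether this is the right problem to solve at all.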

The engineering perspective might identify a performance bottleneck in the chosen data model. The QA perspective might reveal that the architecture makes a critical path untestable without an integration environment. Record these concerns and your responses to them in the session log before proceeding.
Tip: The QA perspective is the most frequently skipped and the most valuable. Engineers naturally think about the happy path. QA thinks about what happens when the network drops mid-transaction, when the input is malformed, when the external API returns unexpected data. Force this perspective even when the feature seems simple.
Step 4: Decompose the Implementation into Ordered Tasks
Convert the chosen architecture into a sequenced task list where each task produces a verifiable output. A good task breakdown has three properties: each task is small enough to complete in a single AI agent interaction (roughly 15-30 minutes of work), each task has a clear "done" criterion, and the tasks have explicit dependencies so you know which must complete before others can start. For example, instead of "implement the payment retry system," break it into: (1) add retry configuration to the webhook handler, (2) implement exponential backoff logic with tests, (3) add dead letter queue for permanently failed webhooks, (4) update monitoring dashboards to track retry rates. Prompt the AI agent to review the task breakdown for missing steps, particularly around error handling, database migrations, configuration changes, and deployment concerns.
Tip: A reliable heuristic: if a task description contains the word "and," it is probably two tasks. "Add retry logic and update the monitoring dashboard" is two tasks with different risk profiles and different verification needs. Split them.
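The dependency ordering and the "and" heuristic from this step can be sketched with Python's standard `graphlib` module. The task IDs below are hypothetical labels for the payment retry breakdown, and the dependencies between them are assumed for illustration.

```python
from graphlib import TopologicalSorter

# Hypothetical task IDs mapped to the tasks they depend on.
tasks = {
    "retry_config": set(),
    "backoff_logic": {"retry_config"},
    "dead_letter_queue": {"backoff_logic"},
    "monitoring_dashboard": {"backoff_logic", "dead_letter_queue"},
}


def execution_order(dependencies):
    """Return task IDs in an order that respects every dependency.

    Raises graphlib.CycleError if the breakdown has a circular dependency.
    """
    return list(TopologicalSorter(dependencies).static_order())


def needs_split(description):
    """Flag task descriptions containing the standalone word 'and'."""
    return " and " in f" {description.lower()} "


order = execution_order(tasks)
```

A cycle error at this stage is useful information: it means two tasks each claim to need the other finished first, which is a planning bug that is cheap to fix before any code exists.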
Step 5: Execute Tasks Sequentially with Context Boundaries
Begin code generation by working through your task list in order. For each task, provide the AI agent with three pieces of context: the original problem statement (so it remembers the why), the relevant section of the architecture document (so it follows the how), and the specific task with its done criterion (so it knows when to stop). This explicit context boundary prevents a common failure mode where the agent accumulates conflicting context over a long session and starts generating code that contradicts earlier decisions. After each task, review the generated code against the task's done criterion before moving to the next task.

If the code does not meet the criterion, iterate on that task; do not proceed. Resist the temptation to let the agent "fix it later" in a subsequent task, because cascading assumptions built on broken code are the primary source of AI-generated technical debt.
Tip: If your AI agent's context window is filling up during a long session, start a new session for the next task rather than continuing. Provide the new session with the problem statement, the architecture document, and the completed tasks list. A fresh context window with explicit documentation outperforms a cluttered one with implicit history every time.
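The context boundary described in this step can be sketched as a function that assembles each task prompt from the same fixed pieces. This is an assumed helper, not a gstack command: `build_task_prompt` and the section labels are hypothetical.

```python
def build_task_prompt(problem_statement, architecture_excerpt, task, done_criterion):
    """Assemble one task prompt from explicit context, rejecting empty pieces."""
    sections = [
        ("Problem statement (why)", problem_statement),
        ("Architecture excerpt (how)", architecture_excerpt),
        ("Task", task),
        ("Done criterion (when to stop)", done_criterion),
    ]
    for title, body in sections:
        if not body.strip():
            raise ValueError(f"missing context: {title}")
    return "\n\n".join(f"{title}:\n{body.strip()}" for title, body in sections)
```

Because the prompt is rebuilt from documents instead of conversation history, the same function works whether the task runs in the current session or a fresh one.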
Step 6: Verify Each Task Against Success Criteria Before Proceeding
After each task is complete, run verification before moving to the next. Verification has two layers: automated verification (do the tests pass? does the linter clear?) and a review of the generated code against the task's done criterion.

Prompt the AI agent to act as a QA reviewer for its own output. Provide it with the task's done criterion and ask it to evaluate whether the code meets it, identify any edge cases not covered by the tests, and flag any assumptions it made that were not specified in the architecture document. If the agent identifies gaps, address them within the current task. This phase-gate pattern prevents the accumulation of "almost done" tasks that collectively produce a broken system.
Tip: Keep a running tally of assumptions the agent made during code generation. When the agent says "I assumed the database connection pool is configured to handle 50 concurrent connections" or "I assumed this API returns ISO 8601 dates," write those down. At the end of the session, validate every assumption. Unvalidated assumptions are the number one cause of "it works on my machine" failures.
Step 7: Run End-to-End Verification Against Original Problem Statement
Once all tasks are complete, return to the problem statement and success criteria from step 1. Run a final verification that evaluates the complete solution, not individual tasks, against the original criteria. This is where you catch integration issues that task-level verification misses: the retry logic works in isolation, but the dead letter queue does not receive messages because the queue configuration was not deployed. Prompt the AI agent with the full problem statement, the success criteria, and the complete set of changes, then ask it to evaluate whether the solution as a whole meets the criteria.

If you defined the success criteria as testable statements in step 1, you can write or generate integration tests that directly assert them. Document the verification results in the session log.
Tip: The most common gap in end-to-end verification is environmental differences. The code works in your development environment but fails in staging because of a missing environment variable, a different database version, or a network policy that blocks an external API call. Include deployment configuration in your verification checklist.
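A deployment configuration check can be as small as comparing the settings the code expects against what the target environment provides. The sketch below is illustrative: `missing_settings` is a hypothetical helper, and the variable names in the usage note are assumptions, not names taken from any real project.

```python
import os


def missing_settings(required, environ=None):
    """Return the required variable names that are unset or empty, sorted."""
    env = os.environ if environ is None else environ
    return sorted(name for name in required if not env.get(name))
```

Calling `missing_settings(["DATABASE_URL", "STRIPE_WEBHOOK_SECRET"])` in staging before the end-to-end run would surface a missing variable as a named item in the checklist instead of as a runtime failure.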
Step 8: Document Session Decisions and Create Reusable Templates
Close the session by capturing a brief log that records: the original problem statement, the architecture options considered and the one chosen with rationale, the task breakdown, any assumptions made during implementation, the verification results, and any deviations from the original plan. This log takes 5-10 minutes to write and pays for itself immediately, because it provides the context needed to onboard another developer, debug a regression, or restart the work if the session was interrupted. Over time, you will accumulate session logs for different types of work (bug fixes, new features, migrations, refactors), and these logs become templates for future sessions. A session log for a successful API migration becomes the starting template for the next API migration, saving 30-40 minutes of setup time.
Tip: Store session logs alongside the code they produced, not in a separate documentation system. A markdown file in the repository's docs folder, committed with the feature branch, is more likely to be found and read than a Confluence page. If your team uses pull request descriptions, paste the session summary there.
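The session log fields listed in this step map naturally onto a small record that renders to a markdown file. The following is a sketch under assumptions: gstack does not prescribe this format, and `SessionLog` and its section titles are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


def _bullets(items: List[str]) -> str:
    """Render items as a plain list, or a placeholder when empty."""
    return "\n".join(f"- {item}" for item in items) or "- (none)"


@dataclass
class SessionLog:
    problem_statement: str
    chosen_option: str
    rationale: str
    options_considered: List[str] = field(default_factory=list)
    tasks: List[str] = field(default_factory=list)
    assumptions: List[str] = field(default_factory=list)
    verification_results: List[str] = field(default_factory=list)
    deviations: List[str] = field(default_factory=list)

    def to_markdown(self) -> str:
        """Render the log as markdown suitable for committing with the code."""
        sections = [
            ("Problem statement", self.problem_statement),
            ("Options considered", _bullets(self.options_considered)),
            ("Chosen option", f"{self.chosen_option}\n\nRationale: {self.rationale}"),
            ("Task breakdown", _bullets(self.tasks)),
            ("Assumptions", _bullets(self.assumptions)),
            ("Verification results", _bullets(self.verification_results)),
            ("Deviations from plan", _bullets(self.deviations)),
        ]
        body = "\n\n".join(f"## {title}\n\n{text}" for title, text in sections)
        return f"# Session log\n\n{body}\n"
```

Empty sections render as "(none)" on purpose: an explicit empty entry shows the question was considered, while a missing heading does not.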

Examples

Example: Solo Developer Fixing a Payment Processing Bug

A solo developer at a small SaaS startup needs to fix a bug where Stripe webhook events are occasionally dropped, causing customers to be charged without their subscription status updating. The codebase is a Node.js API with about 15,000 lines of code. The developer has 3 hours to fix and deploy the solution.

The developer starts by writing a problem statement: "Stripe webhook events for subscription updates are dropped approximately 2% of the time, causing customer records to show 'active' when the subscription has been canceled or updated. Success criteria: webhook processing succeeds for 99.9% of events over a 7-day window, with failed events retried automatically and alerting when retries are exhausted." They feed this to Claude Code with an explicit instruction not to generate code yet. The agent asks three clarifying questions about the current webhook handler, the database transaction model, and whether idempotency keys are used. These questions reveal that the current handler has no idempotency protection, which the developer had not considered as part of the problem.

For architecture review, the agent proposes three approaches: (1) add retry logic within the existing handler, (2) introduce a message queue between webhook receipt and processing, or (3) use Stripe's built-in retry mechanism with idempotent event processing. The developer selects option 3 as the lowest-risk approach, since it leverages Stripe's infrastructure rather than building custom retry logic. A QA perspective review flags that the idempotency check needs a database index on the event ID column, and that the current test suite has no webhook integration tests.

The task breakdown produces four tasks: add an events table with unique constraint on Stripe event ID, modify the webhook handler to check for duplicate events before processing, add integration tests using Stripe's test webhook events, and update the monitoring dashboard. Each task is completed in a separate prompt with fresh context. End-to-end verification confirms that replaying the same webhook event twice results in only one database update, and the monitoring dashboard shows the new retry metrics. Total session time: 2 hours 15 minutes, including 25 minutes on framing and architecture.

Example: Team Lead Planning a Database Migration at a Growth-Stage Company

A team lead at a 50-person company needs to migrate a core PostgreSQL table from a monolithic schema to a partitioned schema to handle growing query volumes. The table has 200 million rows, the application serves 10,000 requests per minute during peak hours, and the migration must complete with zero downtime. The team has two backend engineers and one week.

The team lead frames the problem with quantified constraints: "Migrate the events table (200M rows, 45GB) from a single table to range-partitioned by month. Success criteria: all queries continue to return correct results during and after migration, p99 query latency stays below 200ms, zero application errors during migration, and migration completes within 5 business days." The AI agent's clarifying questions surface that the application uses several raw SQL queries that reference the table directly and will need modification for the partitioned schema.

Architecture review generates three migration strategies: (1) pg_partman with online partitioning, (2) logical replication to a new partitioned table with a cutover, and (3) a custom dual-write migration with incremental backfill. The engineer perspective evaluates each for operational complexity. The QA perspective identifies that strategy 1 requires PostgreSQL 14+ (the production database is on 13), eliminating it immediately. This catch, made during a 5-minute perspective review, saves what would have been a day of wasted implementation work. Strategy 2 is selected.

The implementation plan produces 12 tasks across the week: create the partitioned table structure, set up logical replication, backfill historical data in batches, modify application queries, run shadow traffic comparison, perform cutover, verify data integrity, and clean up. Each task has a done criterion and an estimated duration. The team lead assigns tasks to the two engineers, with each engineer completing their tasks using structured AI sessions following the same phased workflow. The session logs from each engineer's work become the migration runbook, which proves critical when a similar migration is needed for two other tables the following quarter.
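As an illustration of the "create the partitioned table structure" task, monthly range partitions can be generated from a small helper. This sketch assumes the parent table is already declared with range partitioning on a date or timestamp column; the `events_YYYY_MM` naming convention and both function names are hypothetical.

```python
from datetime import date


def month_bounds(year, month):
    """Return (first day of the month, first day of the next month)."""
    start = date(year, month, 1)
    end = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
    return start, end


def partition_ddl(parent, year, month):
    """Build the PostgreSQL statement for one monthly partition.

    The upper bound is exclusive, so adjacent months do not overlap.
    """
    start, end = month_bounds(year, month)
    name = f"{parent}_{year:04d}_{month:02d}"
    return (
        f"CREATE TABLE {name} PARTITION OF {parent} "
        f"FOR VALUES FROM ('{start.isoformat()}') TO ('{end.isoformat()}');"
    )
```

Generating the statements instead of hand-writing them keeps the December-to-January boundary from becoming an off-by-one mistake in the runbook.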

Example: Junior Developer Adding a Feature to an Open-Source Project

A junior developer with 8 months of experience wants to add dark mode support to a React component library used by 500+ projects. They are contributing to an open-source project they did not write, so they need to understand existing patterns before modifying code. The component library has 40 components, a Storybook setup, and uses CSS-in-JS with styled-components.

The developer starts with a problem statement that acknowledges their knowledge gaps: "Add a dark mode theme to the component library so that consuming applications can toggle between light and dark themes. Success criteria: all 40 components render correctly in both themes, the API for toggling themes follows the existing ThemeProvider pattern, no visual regressions in existing light theme, and the solution is documented in Storybook." They prompt the AI agent to first analyze the existing codebase's theming approach before proposing solutions. The agent identifies that the library uses a ThemeProvider with a single light theme object, and that 12 of the 40 components have hardcoded color values that bypass the theme.

Architecture review produces three approaches: (1) extend the existing theme object with a mode property and dark color tokens, (2) create a separate dark theme object and switch between them at the provider level, and (3) use CSS custom properties for colors with theme-level overrides. The developer applies a CEO perspective, which asks whether users of the library have requested dark mode (yes, it is the most-upvoted issue). The engineer perspective recommends approach 2 because it aligns with the library's existing pattern and requires the least refactoring of component internals. The QA perspective flags the 12 components with hardcoded colors as the primary risk.

Task decomposition produces 8 tasks: audit all 40 components for hardcoded colors (the agent generates a script to find them), create the dark theme token object, refactor the 12 hardcoded components to use theme tokens, build a theme toggle component, add Storybook dark mode decorator, write visual regression tests, update documentation, and submit the PR. The developer completes each task in a separate session, using the prior session's log as context. The final PR includes a session summary that explains every design decision, which the maintainers praise as the most thorough contribution documentation they have received.
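The audit script mentioned in the first task might look something like the following. It is a rough sketch, not the script from the scenario: it only detects hex color literals, `find_hardcoded_colors` is a hypothetical name, and the sample component is invented for illustration.

```python
import re

# Matches #rgb, #rgba, #rrggbb, and #rrggbbaa hex color literals.
HEX_COLOR = re.compile(r"#(?:[0-9a-fA-F]{8}|[0-9a-fA-F]{6}|[0-9a-fA-F]{3,4})\b")


def find_hardcoded_colors(source):
    """Return (line_number, color) pairs for hex colors in the source text."""
    hits = []
    for number, line in enumerate(source.splitlines(), start=1):
        for match in HEX_COLOR.finditer(line):
            hits.append((number, match.group(0)))
    return hits


# Invented styled-components snippet with one themed and two hardcoded values.
sample = """const Button = styled.button`
  color: ${(props) => props.theme.colors.text};
  background: #ffffff;
  border: 1px solid #ccc;
`;"""

hits = find_hardcoded_colors(sample)
```

A real audit would also need to cover `rgb()`, `hsl()`, and named colors, and would produce false positives on things like URL fragments, so the output is a review list and not a verdict.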

Example: B2B Platform Team Implementing Multi-Tenant Data Isolation

A platform team at a B2B SaaS company serving 200 enterprise customers needs to implement row-level security for a new reporting module. Compliance requirements mandate that no customer can ever access another customer's data, even in error. The existing codebase uses a shared database with tenant_id columns but no database-level enforcement. The team has a tech lead, two senior engineers, and three weeks.

The tech lead frames the problem with compliance language baked into the success criteria: "Implement database-level tenant data isolation for the reporting module such that a query executed in the context of tenant A can never return rows belonging to tenant B, regardless of application-level bugs. Success criteria: all reporting queries are filtered at the database level via PostgreSQL row-level security policies, a test suite demonstrates isolation by attempting cross-tenant access and verifying denial, the implementation passes the compliance team's penetration testing checklist, and query performance degrades by no more than 10% compared to the current unprotected queries."

Architecture review generates approaches spanning row-level security policies, separate schemas per tenant, and application-middleware enforcement. The CEO perspective validates that database-level enforcement is non-negotiable for the enterprise sales pipeline (three prospects have asked for SOC 2 evidence of data isolation). The engineer perspective notes that separate schemas would require rewriting the ORM layer, while RLS policies can be applied incrementally. The QA perspective designs a specific attack scenario: a modified API request that substitutes a different tenant_id in the query parameters, which the RLS policy must block regardless of what the application layer does.

The 3-week implementation plan breaks into three phases: week 1 establishes RLS policies on the reporting tables with integration tests, week 2 modifies the application's database connection to set the tenant context at session level, and week 3 runs performance benchmarks and compliance testing. Each engineer's daily AI sessions follow the phased workflow, with session logs shared in the team's engineering channel. When the compliance team runs penetration testing, the session logs serve as evidence of the threat modeling performed during architecture review, satisfying a SOC 2 control that would otherwise require separate documentation.

Best Practices

Never let the AI agent generate production code during the problem framing or architecture review phases. The purpose of these phases is to constrain the solution space, not to produce output. If the agent starts writing code unprompted during these phases, redirect it explicitly. Allowing premature code generation anchors you to a specific implementation before you have evaluated alternatives, making you reluctant to discard work even when a better approach is obvious.
Set explicit time boundaries for each phase proportional to the complexity of the work. For a small bug fix (under 2 hours total), spend 5 minutes on problem framing, 5 on architecture, and the rest on implementation and verification. For a multi-day feature, spend 30-60 minutes on problem framing and architecture before writing any code. Without time boundaries, teams either rush through framing (and pay for it later) or get stuck in analysis paralysis during architecture review. The ratio of framing-to-coding should be roughly 20% framing and 80% implementation for well-understood problems, and 40% framing and 60% implementation for novel problems.
Maintain a separate, persistent document for the architecture decision rather than relying on the AI agent's conversation history. Conversation history is ephemeral, gets truncated by context window limits, and cannot be shared with other team members. A markdown file with the problem statement, architecture options, and chosen approach becomes the single source of truth that every subsequent prompt references. This also prevents the common failure where a long conversation drifts and the agent forgets constraints established earlier.
Use the QA perspective as a phase gate, not a final check. Teams that defer QA to the end consistently discover fundamental issues that require reworking multiple completed tasks. Catching testability gaps at the architecture stage costs minutes. Catching them after implementation costs hours.
When the AI agent proposes a solution you do not fully understand, pause and ask it to explain the reasoning before accepting the code. This is not about distrust. It is about maintaining your ability to debug, modify, and extend the code after the session ends. If you cannot explain why the agent chose a particular pattern, you cannot evaluate whether it is correct for your context. A useful prompt: "Explain why you chose this approach over the alternatives."
Review gstack framework examples from previous sessions before starting a new one of similar shape. If you previously completed a database migration session, review that session log before starting a new migration. The prior session's architecture options, task breakdown, and verification checklist are a proven template. Adapting a proven template is faster and more reliable than starting from scratch, and it surfaces lessons learned from the prior session (such as "the migration required a backfill script we initially forgot").
Keep each AI agent interaction focused on a single phase or task. Prompts that combine multiple phases ("frame the problem and then generate the code") collapse the workflow and eliminate the review checkpoints between phases. The 30-second overhead of sending a separate prompt for each phase is negligible compared to the debugging cost of code generated without adequate problem framing.
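The framing-to-implementation ratios suggested in the time-boundary practice above (roughly 20/80 for well-understood problems, 40/60 for novel ones) reduce to simple arithmetic. The helper below is a hypothetical sketch that treats "framing" as problem framing plus architecture review combined, which is an assumption about how the ratio is meant to be read.

```python
def framing_budget(total_minutes, novel=False):
    """Split a session's minutes into framing and implementation shares.

    Uses 40% framing for novel problems and 20% otherwise, rounding the
    framing share down to a whole minute.
    """
    framing_percent = 40 if novel else 20
    framing = total_minutes * framing_percent // 100
    return {"framing": framing, "implementation": total_minutes - framing}
```

For a 90-minute session this gives 18 minutes of framing for a familiar problem and 36 for a novel one, which is a starting boundary to commit to, not a rule to enforce to the minute.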

Common Mistakes

Jumping straight to code generation without framing the problem or reviewing architecture

Correction

This is the most common and most expensive mistake. It happens because developers are optimizing for speed of output rather than speed of correct output. The symptom is a long AI session that produces a large diff, followed by a long debugging session where you discover the code solved the wrong problem or made architectural choices that conflict with your existing system. The fix is mechanical: before every session, write at least three sentences describing the problem, the success criteria, and the constraints.

Even this minimal framing can cut rework substantially, because it forces the agent to operate within stated boundaries rather than inferred ones.

Treating all tasks as the same size and skipping decomposition

Correction

This manifests as a single prompt like "build the user authentication system" that produces 500 lines of code spanning multiple files, modules, and concerns. When something in that output is wrong, you cannot isolate which part failed or why, because the entire implementation was generated as a monolith. The underlying cause is usually impatience with the decomposition step. The fix is to enforce a maximum scope for each AI interaction: one task, one concern, one verifiable output.

If you cannot describe what "done" looks like for a task in a single sentence, the task is too large. Split it until each piece has a clear, testable completion criterion.

Skipping multi-agent perspectives because they feel redundant for simple tasks

Correction

Developers skip perspectives because they believe they already know the answer and the perspectives will just confirm it. This is confirmation bias in action. The symptom is code that works correctly but creates operational problems: it is not observable (no logging or metrics), not maintainable (complex coupling that makes future changes expensive), or not deployable (requires manual steps that were not documented). Simple tasks accumulate into complex systems, and each shortcut in perspective review compounds.

The fix is to apply at minimum a QA perspective to every task, even trivial ones. A brief QA pass is effectively free compared to the debugging cost when something does go wrong.

Allowing architecture review to become an infinite loop of options analysis

Correction

This is the opposite failure mode from jumping to code. It shows up as a session that spends 90 minutes evaluating increasingly exotic architecture options without ever writing a line of code. It happens when the developer is uncertain and uses option generation as a form of procrastination. The signal to watch for: if you have evaluated more than three options for a single component, or if you are on your second round of evaluation after already selecting an approach, you are stuck.

The fix is to set a time boundary before starting architecture review and commit to choosing when the time expires. For most decisions, the difference between the second-best and best architecture is smaller than the cost of the time spent finding it. Pick, commit, and proceed. You can always revisit if verification reveals a problem.
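The stop signals described above (more than three options for one component, a second evaluation round after selecting, or an expired time boundary) can be written down as an explicit check. This is a sketch under stated assumptions: `ReviewState` and `should_stop_and_commit` are hypothetical names, and the time budget is whatever you set before the review starts.

```python
from dataclasses import dataclass

MAX_OPTIONS_PER_COMPONENT = 3  # threshold taken from the text above
MAX_EVALUATION_ROUNDS = 1      # a second round after selecting means you are stuck


@dataclass
class ReviewState:
    """Hypothetical record of an architecture review in progress."""
    options_evaluated: int
    evaluation_rounds: int
    minutes_spent: float
    time_budget_minutes: float  # chosen by you before the review starts


def should_stop_and_commit(state: ReviewState) -> bool:
    """True when any stop signal has fired: pick, commit, and proceed."""
    return (
        state.options_evaluated > MAX_OPTIONS_PER_COMPONENT
        or state.evaluation_rounds > MAX_EVALUATION_ROUNDS
        or state.minutes_spent >= state.time_budget_minutes
    )


healthy = ReviewState(options_evaluated=2, evaluation_rounds=1,
                      minutes_spent=10, time_budget_minutes=20)
stuck = ReviewState(options_evaluated=4, evaluation_rounds=1,
                    minutes_spent=10, time_budget_minutes=20)
overtime = ReviewState(options_evaluated=2, evaluation_rounds=1,
                       minutes_spent=25, time_budget_minutes=20)

print(should_stop_and_commit(healthy), should_stop_and_commit(stuck),
      should_stop_and_commit(overtime))
```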

Not resetting context between tasks in long sessions

Correction

AI agents accumulate context over a conversation, and that context can become contradictory. The symptom is subtle: code generated in task 7 contradicts a decision made in task 2, because the agent's attention has drifted to more recent context. You will not catch this during the session because the code looks reasonable in isolation. The fix is to treat each task as a semi-independent interaction.

Either start a new conversation for each task (providing the problem statement and architecture document as fresh context) or explicitly re-state the relevant constraints at the beginning of each task prompt.
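Re-stating constraints is easier to do consistently when the task prompt is assembled from the session artifacts rather than typed from memory. The following is a minimal sketch, assuming the problem statement and architecture decision are available as plain strings; `build_task_prompt` and the section titles are illustrative choices, not a gstack format.

```python
def build_task_prompt(problem_statement: str, architecture_decision: str,
                      task: str, done_criterion: str) -> str:
    """Assemble a self-contained prompt that re-states the session constraints."""
    sections = [
        ("Problem statement", problem_statement),
        ("Architecture decision", architecture_decision),
        ("Current task", task),
        ("Done when", done_criterion),
    ]
    return "\n\n".join(f"## {title}\n{body.strip()}" for title, body in sections)


# Illustrative values only; substitute the artifacts from your own session.
prompt = build_task_prompt(
    problem_statement="Login requests intermittently fail for returning users.",
    architecture_decision="Keep the existing session store; add retry with backoff.",
    task="Add retry logic to the session lookup function.",
    done_criterion="The lookup retries up to three times and the new unit test passes.",
)
print(prompt)
```

Because every prompt starts from the same two artifacts, a decision made in an early task is restated verbatim in later ones instead of depending on what the agent still attends to.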

Documenting nothing and relying on the AI conversation log as the session record

Correction

Conversation logs are not documentation. They are interleaved with false starts, corrections, tangential explorations, and the agent's thinking-out-loud text. Trying to reconstruct decisions from a raw conversation log takes longer than making the decisions did originally. The symptom is someone asking why a decision was made and nobody being able to answer without reading 200 messages of conversation.

The fix is to spend 5-10 minutes at the end of each session writing a brief summary: what you decided, why, what alternatives you considered, and what you would do differently. This investment pays for itself the first time someone (including future you) needs to understand or modify the code.
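A fixed template keeps the summary short and makes sure the four items named above are always present. The sketch below is one possible shape, not a gstack-defined format; the `SessionSummary` class, its field names, and the example values are assumptions for illustration.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class SessionSummary:
    """Hypothetical end-of-session record covering the four items above."""
    decided: str
    why: str
    alternatives: list[str] = field(default_factory=list)
    do_differently: str = ""

    def to_text(self, session_date: date) -> str:
        """Render the summary as plain text suitable for a file in the repo."""
        alt_lines = "\n".join(f"- {a}" for a in self.alternatives) or "- none recorded"
        return (
            f"Session summary ({session_date.isoformat()})\n\n"
            f"What we decided:\n{self.decided}\n\n"
            f"Why:\n{self.why}\n\n"
            f"Alternatives considered:\n{alt_lines}\n\n"
            f"What to do differently:\n{self.do_differently or 'nothing noted'}\n"
        )


# Illustrative values only.
summary = SessionSummary(
    decided="Add retry with backoff to the session lookup.",
    why="Smallest change that addresses the intermittent failures.",
    alternatives=["Replace the session store", "Increase the timeout"],
    do_differently="Write the reproduction test before exploring options.",
)
text = summary.to_text(date(2026, 6, 1))
print(text)
```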

Other Skills in This Method

Customizing and Extending gstack with Your Own Skills

How to fork, modify, or author new specialist skills and power tools within the gstack open-source framework to fit your team's specific conventions and tech stack.

Orchestrating gstack's 8 Power Tools in Complex Workflows

How to use gstack's 8 power tools — higher-order commands that combine specialist skills — to manage end-to-end development workflows like feature buildout or codebase migration.

Comparing gstack to Other AI Coding Agent Frameworks

How to evaluate gstack's opinionated multi-agent approach against alternatives like Cursor rules, Aider conventions, or custom system prompts to choose the right AI coding workflow.

Using Multi-Agent Perspectives (CEO, Engineer, QA) in Development

How to leverage gstack's multi-role system — CEO, engineer, and QA perspectives — to structure decision-making, implementation, and quality assurance across a development workflow.

Installing and Configuring the gstack Skill Pack

How to install gstack from GitHub, set up slash commands, and configure it for use with Claude Code or other AI coding agents.

Navigating gstack's 23 Specialist Skills via Slash Commands

How to discover, invoke, and chain gstack's 23 specialist slash commands to handle discrete tasks like planning, scaffolding, refactoring, and debugging.

Frequently Asked Questions

How long should the problem framing phase take for a typical coding session?

For a well-understood bug fix or small feature, 5-10 minutes is sufficient. Write three to five sentences covering the symptom, the suspected cause, and the testable success criteria. For novel work touching multiple systems or requiring architectural decisions, spend 15-30 minutes. The test is whether you could hand your problem statement to another developer and they would understand what needs to be true when the work is done. If the answer is no, your framing is not complete yet. The framing phase should never exceed 20% of your total estimated session time.

Should I structure AI coding sessions differently for bug fixes versus new features?

Yes, but the phases remain the same. For bug fixes, problem framing emphasizes diagnosis: what is the observable symptom, what is the expected behavior, and how can you reproduce it. Architecture review is usually short because the fix is constrained to the existing system. For new features, problem framing emphasizes scope: what is included, what is explicitly excluded, and what are the acceptance criteria. Architecture review takes longer because you are making design choices that did not exist before. The phased structure adapts by changing the time allocation per phase, not by skipping phases.

How do I apply gstack's phased workflow when I am using a different AI agent than Claude Code?

The phased workflow is agent-agnostic. The principles of problem framing before code generation, architecture review with multi-perspective evaluation, task decomposition with verifiable done criteria, and end-to-end verification work with any AI coding agent. What changes is the invocation mechanism. With Claude Code, you use gstack's slash commands to invoke specific skills. With other agents (Cursor, GitHub Copilot, Aider), you implement the same phases by structuring your prompts to match each phase's purpose. The session log format and artifact chain are identical regardless of which agent you use.

How do I handle a session where the AI agent's architecture suggestion contradicts my team's existing patterns?

This is exactly what the architecture review phase is designed to catch. When the agent proposes an approach that conflicts with your team's conventions, do not simply override it or accept it. Instead, ask the agent to explain the trade-offs of its approach versus your team's established pattern. Sometimes the agent identifies a genuine improvement that your team should consider adopting. More often, the agent is optimizing for the isolated problem without awareness of your broader system context. In either case, document the conflict and the decision in your session log, because it becomes a useful reference for future architectural discussions.

Why does my session keep producing code that needs significant rework despite following the phases?

The most common cause is insufficient specificity in the problem statement or architecture document. If your success criteria say "the feature should work well" instead of "the API returns a 200 response with a JSON body containing the user's subscription status within 500ms," the agent fills the ambiguity with its own assumptions, and those assumptions may not match your requirements. The second most common cause is skipping the context reset between tasks in a long session, which causes the agent to accumulate contradictory context. Review your problem statements for testable specificity and check whether you are providing fresh context boundaries for each task.
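A specific criterion such as the one quoted above can be written as an executable check, which is a quick test of whether it is truly unambiguous. This is a sketch only: `FakeResponse` stands in for a real HTTP response, and the `subscription_status` key is an assumed field name, since the text does not specify the JSON schema.

```python
from dataclasses import dataclass


@dataclass
class FakeResponse:
    """Stand-in for an HTTP response; a real test would call your API client."""
    status_code: int
    json_body: dict
    elapsed_ms: float


def meets_success_criterion(resp: FakeResponse, max_ms: float = 500.0) -> bool:
    """Checks the example criterion: 200, subscription status present, within 500 ms."""
    return (
        resp.status_code == 200
        and "subscription_status" in resp.json_body  # assumed key name
        and resp.elapsed_ms <= max_ms
    )


ok = FakeResponse(200, {"subscription_status": "active"}, elapsed_ms=120.0)
too_slow = FakeResponse(200, {"subscription_status": "active"}, elapsed_ms=900.0)
missing_field = FakeResponse(200, {"user": "u1"}, elapsed_ms=120.0)

print(meets_success_criterion(ok),
      meets_success_criterion(too_slow),
      meets_success_criterion(missing_field))
```

"The feature should work well" cannot be expressed this way at all, which is a useful signal that the criterion needs rewriting before any code is generated.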

Can I use gstack's phased workflow for non-coding work like infrastructure configuration or data pipeline design?

Yes. The phased workflow applies to any technical work where decisions have cascading consequences. For infrastructure work, problem framing defines the operational requirements (availability targets, cost constraints, compliance needs). Architecture review evaluates IaC approaches (Terraform modules, CloudFormation stacks, Pulumi programs). Task decomposition breaks the work into independently deployable and verifiable changes. The only adaptation needed is in the verification phase, where you verify infrastructure state (resource exists, configuration matches, connectivity works) rather than running unit tests. Several gstack framework examples in the community involve Terraform and Kubernetes configurations using this exact approach.

Should I complete the entire phased workflow in a single sitting or can I split it across multiple sessions?

You can and should split across sessions for complex work. The key requirement is that each phase produces a written artifact that can bootstrap the next session. Your problem statement is a markdown file. Your architecture decision is a markdown file. Your task breakdown is a markdown file or a list of issues. As long as these artifacts exist outside the AI conversation, you can pick up at any phase in a new session by providing the relevant artifacts as context. This is one reason the documentation step matters so much: it is not just for your team, it is the handoff mechanism between your past self and your future self.
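Because the artifacts live as files outside the conversation, picking up where you left off can be reduced to checking which files exist. The sketch below assumes one directory per piece of work and uses hypothetical file names; gstack does not prescribe these names as far as the text here states.

```python
from pathlib import Path
import tempfile

# Hypothetical file names in phase order; use whatever your team prefers.
ARTIFACTS = [
    ("problem framing", "problem-statement.md"),
    ("architecture review", "architecture-decision.md"),
    ("task decomposition", "task-breakdown.md"),
]


def next_phase(session_dir: Path) -> str:
    """Return the first phase whose artifact is missing or empty."""
    for phase, filename in ARTIFACTS:
        path = session_dir / filename
        if not path.is_file() or not path.read_text(encoding="utf-8").strip():
            return phase
    return "implementation"


def load_context(session_dir: Path) -> str:
    """Concatenate the existing artifacts to provide as context in a new session."""
    chunks = []
    for _, filename in ARTIFACTS:
        path = session_dir / filename
        if path.is_file():
            chunks.append(path.read_text(encoding="utf-8").strip())
    return "\n\n".join(chunks)


# Demo in a temporary directory: only the problem statement exists so far.
with tempfile.TemporaryDirectory() as tmp:
    session_dir = Path(tmp)
    (session_dir / "problem-statement.md").write_text(
        "Symptom, suspected cause, success criteria.", encoding="utf-8")
    resume_at = next_phase(session_dir)
    context = load_context(session_dir)

print(resume_at)
```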
