Architecture Decisions
Leveraging AI for research and RFC generation while keeping humans in the decision-making loop
Real Case: Notification System Design at Client A
Step 1: Research with AI
You: "Compare approaches for notification system:
Requirements:
- Multiple channels (email, SMS, in-app)
- 10k+ daily notifications
- User preferences
- Retry logic
- Priority levels
Options:
1. Event-driven with message queue
2. Polling + cron jobs
3. Webhooks + callbacks
Analyze: scalability, complexity, cost, reliability"
AI provides a detailed comparison.
Step 2: Generate RFC with AI
You: "Based on option 1 (event-driven), draft RFC:
Include:
- Architecture diagram (mermaid)
- Component breakdown
- Data flow
- Scalability plan
- Failure modes
- Migration strategy
Follow format from @docs/rfcs/template.md"
Step 3: Human Review and Decision
- Team reviews RFC
- Discusses trade-offs AI can't know (budget, existing infrastructure, team skills)
- Makes decision
- AI helps with implementation plan
Lesson: AI is excellent for research and documentation. Humans make the final decisions based on context AI doesn't have.
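To make option 1 from the research prompt more concrete, here is a minimal in-process sketch of the event-driven shape: notifications go onto a priority queue, a worker drains it, and failed sends are retried before being parked. The names (`Notification`, `NotificationQueue`, `dead_letters`), the retry limit of 3, and the in-memory heap standing in for a real message broker are all assumptions for illustration, not the design Client A adopted.

```python
import heapq
import itertools
from dataclasses import dataclass, field
from typing import Callable

MAX_ATTEMPTS = 3  # assumed retry limit, not taken from the case study


@dataclass(order=True)
class Notification:
    priority: int                          # lower number = more urgent
    seq: int                               # tie-breaker: FIFO within a priority
    channel: str = field(compare=False)    # "email", "sms", or "in_app"
    user_id: str = field(compare=False)
    body: str = field(compare=False)
    attempts: int = field(default=0, compare=False)


class NotificationQueue:
    """In-memory stand-in for a message queue (hypothetical, single process)."""

    def __init__(self) -> None:
        self._heap: list[Notification] = []
        self._counter = itertools.count()
        self.dead_letters: list[Notification] = []

    def publish(self, channel: str, user_id: str, body: str, priority: int = 5) -> None:
        note = Notification(priority, next(self._counter), channel, user_id, body)
        heapq.heappush(self._heap, note)

    def drain(self, send: Callable[[Notification], bool]) -> int:
        """Deliver everything queued; return the number of successful sends."""
        delivered = 0
        while self._heap:
            note = heapq.heappop(self._heap)
            note.attempts += 1
            if send(note):
                delivered += 1
            elif note.attempts < MAX_ATTEMPTS:
                # Retry: re-queue behind other items of the same priority.
                note.seq = next(self._counter)
                heapq.heappush(self._heap, note)
            else:
                # Out of retries: park it for a human or a later job to inspect.
                self.dead_letters.append(note)
        return delivered
```

A real system would add per-user channel preferences, backoff between retries, and a durable broker. Those are the trade-offs the team review in Step 3 has to weigh.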
Leveraging Extended Thinking for Complex Problems
Many current AI models can reason through a problem at length before responding. This is valuable for complex architectural problems that benefit from thorough analysis.
When to Use Deep Reasoning
Problem types that benefit:
- Architecture decisions - Trade-off analysis across multiple dimensions
- Security audits - Following attack vectors through the system
- Complex debugging - Multi-layer issues with subtle interactions
- Performance optimization - Systemic bottlenecks requiring holistic view
- Migration planning - Dependencies, risks, rollback strategies
Common pattern: these problems share the following traits:
- Multiple valid solutions with non-obvious trade-offs
- Long-term consequences
- Cross-cutting concerns
- Need to consider many constraints simultaneously
How to Prompt for Thorough Analysis
Instead of: "What's the best architecture for this?"
Try:
"Analyze authentication architecture options for SaaS with these constraints:
Requirements:
- Multi-tenant (500+ organizations)
- SSO support (SAML, OAuth)
- Role-based access control
- API keys for programmatic access
- Session management
Constraints:
- Team of 4 developers
- 6-month timeline
- Budget: $50k for auth infrastructure
- Must be SOC2 compliant
- Current stack: Node.js, PostgreSQL
Compare:
1. Build custom (JWT + sessions)
2. Auth0 / Okta integration
3. Open-source (Keycloak, Ory)
For each, analyze:
- Development time
- Ongoing maintenance
- Cost at 1k, 10k, 100k users
- Compliance implications
- Team expertise needed
- Vendor lock-in risk
Think through second-order effects before recommending."
Key elements:
- Specific constraints (time, budget, team)
- Multiple options to compare
- Clear evaluation criteria
- Explicit request for thorough analysis
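One way to make these key elements hard to skip is to assemble the prompt from structured fields. The sketch below is a hypothetical helper (`AnalysisPrompt` and its field names are not from any library) that refuses to render a prompt when constraints, options, or criteria are missing.

```python
from dataclasses import dataclass


@dataclass
class AnalysisPrompt:
    """Collects the key elements before any prompt text is produced."""
    topic: str
    constraints: list[str]   # time, budget, team, compliance
    options: list[str]       # alternatives to compare
    criteria: list[str]      # dimensions to evaluate each option on

    def render(self) -> str:
        # Refuse to build a vague prompt: each key element must be present.
        if not self.constraints:
            raise ValueError("add at least one specific constraint")
        if len(self.options) < 2:
            raise ValueError("list at least two options to compare")
        if not self.criteria:
            raise ValueError("add at least one evaluation criterion")

        constraints = "\n".join(f"- {c}" for c in self.constraints)
        options = "\n".join(f"{i}. {o}" for i, o in enumerate(self.options, start=1))
        criteria = "\n".join(f"- {c}" for c in self.criteria)
        return (
            f"Analyze {self.topic} with these constraints:\n"
            f"Constraints:\n{constraints}\n"
            f"Compare:\n{options}\n"
            f"For each, analyze:\n{criteria}\n"
            "Think through second-order effects before recommending."
        )


prompt = AnalysisPrompt(
    topic="authentication architecture options for SaaS",
    constraints=["Team of 4 developers", "6-month timeline", "Must be SOC2 compliant"],
    options=["Build custom (JWT + sessions)", "Auth0 / Okta integration", "Open-source (Keycloak, Ory)"],
    criteria=["Development time", "Ongoing maintenance", "Vendor lock-in risk"],
)
print(prompt.render())
```

The validation is deliberately blunt. It cannot judge whether the constraints are the right ones, only that you wrote some down.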
When Extended Thinking Is Overkill
Don't overthink:
- Simple CRUD endpoints - Follow established patterns, don't philosophize
- Bug fixes with clear root cause - Just fix it
- Boilerplate code - Standard implementation, no decisions needed
- Well-established patterns - If the team has done it 10 times, just do it again
- Time-sensitive hotfixes - Analysis paralysis is worse than an imperfect solution
Rule of thumb:
If the answer is in your existing codebase or standard practice, don't ask AI to reinvent it.
Pattern: Think First, Then Implement
For complex problems, use a two-phase approach:
Phase 1: Analysis (thinking mode)
You: "Before we implement, analyze:
- What could go wrong with approach X?
- What are the failure modes?
- What constraints did I miss?
- What assumptions need validation?"
AI: [Thorough analysis, identifies issues]
You: [Reviews, discusses with team, decides on approach]
Phase 2: Implementation (execution mode)
You: "Based on our discussion, implement approach X
with these specific decisions:
- [Decision 1]
- [Decision 2]
- [Decision 3]
Follow patterns from @existing-code"
AI: [Implements according to plan]
Why separate phases:
- Analysis doesn't get rushed by "just ship it" pressure
- Team can review reasoning before committing resources
- Implementation is clearer with decisions already made
- Easier to course-correct early
Real Example: Database Scaling Decision
Rushed approach (don't do this):
You: "Our database is slow, should we add read replicas?"
AI: "Yes, here's how to set up read replicas..."
[Implements complex infrastructure]
[Doesn't solve the actual problem - was missing indexes]
Thoughtful approach:
You: "Database performance degrading. Before jumping to solutions, analyze:
Current state:
- 50k users
- 2M database rows
- Average query time: 800ms (was 200ms)
- 95th percentile: 3s
- Growth: 20% month-over-month
Symptoms:
- Slow on complex reports
- Fast on simple queries
- Worse during business hours (9am-5pm EST)
Constraints:
- Budget: $5k one-time, $500/month ongoing
- Team has PostgreSQL experience only
- Can't take downtime > 5 minutes
- Must ship improvement in 2 weeks
Analyze root causes before suggesting solutions.
What data would help narrow this down?"
AI: "Need more data before recommending solutions:
1. Run EXPLAIN ANALYZE on slow queries
2. Check if indexes exist on joined columns
3. Review query patterns (N+1 queries?)
4. Check connection pool utilization
5. Monitor disk I/O during slow periods
Most likely culprits given symptoms:
- Missing indexes (explains report slowness)
- N+1 queries (multiply load at peak traffic, which would explain the time-of-day pattern)
Recommend profiling before adding infrastructure."
You: [Runs suggested diagnostics]
You: "Here's EXPLAIN output [paste]"
AI: "Root cause: Missing composite index on orders(user_id, created_at)
All reports scan full table.
Solution: CREATE INDEX - no infrastructure needed
Cost: $0, 5 minutes
Expected improvement: 800ms → 50ms"
Result: Solved with a 5-minute fix instead of weeks of infrastructure work.
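The diagnose-then-index step can be reproduced on a laptop. The sketch below is an assumption-laden stand-in: it uses SQLite's `EXPLAIN QUERY PLAN` instead of PostgreSQL's `EXPLAIN ANALYZE`, synthetic rows, a made-up report query, and a hypothetical index name (`idx_orders_user_created`). It only shows how the plan changes once the composite index on `orders(user_id, created_at)` exists; it makes no claim about the timings quoted in the dialogue.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, user_id INTEGER, created_at TEXT, total REAL)"
)
# Synthetic data: 5000 orders spread over 100 users and 28 days.
conn.executemany(
    "INSERT INTO orders (user_id, created_at, total) VALUES (?, ?, ?)",
    [(i % 100, f"2024-01-{(i % 28) + 1:02d}", i * 1.5) for i in range(5000)],
)

# Hypothetical report query filtering on both indexed columns.
REPORT_QUERY = "SELECT total FROM orders WHERE user_id = ? AND created_at >= ?"
PARAMS = (42, "2024-01-15")


def query_plan(connection: sqlite3.Connection, sql: str, params: tuple) -> str:
    """Return the planner's description of how it would run the query."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[3] for row in rows)  # column 3 holds the detail text


before = query_plan(conn, REPORT_QUERY, PARAMS)   # expected to be a full table scan
conn.execute("CREATE INDEX idx_orders_user_created ON orders (user_id, created_at)")
after = query_plan(conn, REPORT_QUERY, PARAMS)    # expected to search via the index

print("before:", before)
print("after: ", after)
```

In PostgreSQL the equivalent check is to run `EXPLAIN ANALYZE` on the slow report query before and after creating the index, which is what step 1 of the AI's diagnostic list asks for.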
Balancing Thoroughness with Speed
How much analysis is enough?
| Decision Reversibility | Stakes | Analysis Time |
|---|---|---|
| Easy to reverse | Low | 5-15 minutes |
| Moderate effort | Medium | 30-60 minutes |
| Hard to reverse | High | 2-4 hours |
| Irreversible | Critical | Days (with team) |
Examples:
Easy to reverse (15 min):
- Which npm package for date parsing?
- REST vs GraphQL for new endpoint?
Moderate effort (1 hour):
- State management approach (Redux, Zustand, Context)?
- Background job system (BullMQ, Celery)?
Hard to reverse (half day):
- Database choice (Postgres, MySQL, Mongo)?
- Monolith vs microservices?
Critical (multi-day):
- Cloud provider (AWS, GCP, Azure)?
- Programming language for new service?
Golden rule: Analysis time should match reversal cost.
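The table can be encoded as a small lookup, which is handy in a decision-log template or a team checklist. This is a sketch only: the `Level` scale and `analysis_time` function are hypothetical names, and taking the stricter of the two ratings when reversibility and stakes disagree is an assumption the table itself does not state.

```python
from enum import IntEnum


class Level(IntEnum):
    """Shared scale for both table columns: reversal cost and stakes."""
    LOW = 0        # easy to reverse / low stakes
    MEDIUM = 1     # moderate effort / medium stakes
    HIGH = 2       # hard to reverse / high stakes
    CRITICAL = 3   # irreversible / critical stakes


# Analysis budgets copied from the table above.
ANALYSIS_TIME = {
    Level.LOW: "5-15 minutes",
    Level.MEDIUM: "30-60 minutes",
    Level.HIGH: "2-4 hours",
    Level.CRITICAL: "Days (with team)",
}


def analysis_time(reversal_cost: Level, stakes: Level) -> str:
    # Assumption: when the two ratings disagree, budget for the stricter one.
    return ANALYSIS_TIME[max(reversal_cost, stakes)]


# Picking a date-parsing package: cheap to swap out, low stakes.
print(analysis_time(Level.LOW, Level.LOW))
# Choosing a cloud provider: effectively irreversible, critical stakes.
print(analysis_time(Level.CRITICAL, Level.CRITICAL))
```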
Remember: Deep thinking is a tool, not a requirement. Use it for genuinely complex decisions. For everything else, ship fast and iterate.