Lesson 4: Software Design and Architecture

Software Design and Architecture: Building Systems That Last

Learning Objectives

By the end of this lesson, you will be able to:

  1. Distinguish between software architecture and software design, and explain how they work together to create robust systems
  2. Apply fundamental architectural principles (Separation of Concerns, Modularity, Abstraction, Loose Coupling) to decompose complex systems
  3. Identify and implement common architectural patterns (Layered Architecture, MVC, Microservices, Event-Driven) based on system requirements
  4. Apply SOLID principles at the architectural level to design systems that are maintainable, extensible, and testable
  5. Analyze quality attributes (Scalability, Performance, Availability, Maintainability, Security) and make informed architectural trade-offs
  6. Design a complete system architecture for a real-world application, documenting your decisions and trade-offs

Introduction

Imagine you’re tasked with building a social media platform that needs to support 100 million active users. Where do you start? How do you ensure the system remains fast, reliable, and maintainable as it grows? These questions lie at the heart of software architecture—the high-level structure that determines whether your application succeeds or collapses under its own complexity.

Software architecture is the blueprint for your system [1]. Just as a building architect creates detailed plans showing how rooms, floors, and structural elements work together, a software architect designs how components, services, and data flows interact to deliver business value. Poor architecture leads to systems that are slow, buggy, impossible to maintain, and eventually abandoned. Good architecture creates systems that adapt to changing requirements, scale gracefully, and empower teams to work efficiently [2].

This lesson bridges the gap between coding individual features and designing complete systems. You’ll learn architectural principles that guide your design decisions, common patterns that solve recurring problems, and trade-offs that experienced architects navigate daily. By understanding architecture, you’ll write better code because you’ll see how your work fits into the larger system [3].


Core Content

Architecture vs. Design: Understanding the Difference

Many developers use “architecture” and “design” interchangeably, but they operate at different levels of abstraction and serve distinct purposes [1].

Software Architecture defines the high-level structure of the entire system. It determines:

  • How major components (frontend, backend, database, cache) are organized
  • How these components communicate (REST APIs, message queues, gRPC)
  • Where data lives and how it flows through the system
  • How the system handles cross-cutting concerns like security, logging, and error handling

Think of architecture as city planning. It decides where residential zones, business districts, and industrial areas sit, and how roads and utilities connect them. Architecture answers questions like “Should we use microservices or a monolith?” and “How do we ensure the system remains available during peak traffic?”

Software Design focuses on the internal structure of individual components. It determines:

  • How classes and modules within a component are organized
  • What design patterns (Factory, Strategy, Observer) are used
  • How data structures are implemented
  • How algorithms solve specific problems

Design is like architectural plans for a single building within that city. It details how rooms connect, where electrical wiring runs, and how plumbing flows. Design answers questions like “How should we structure our authentication module?” and “What pattern should we use for database access?”

The Relationship

Architecture provides constraints and guidelines that design must follow [2]. For example, if the architecture mandates microservices, the design of each service must ensure it’s independently deployable and doesn’t tightly couple to other services. If architecture chooses REST APIs for communication, design must implement proper HTTP methods, status codes, and error handling.

Good architecture makes good design easier. Poor architecture forces designers into contortions to make things work. That’s why understanding architecture is crucial—it sets the foundation for everything that follows [3].


Fundamental Architectural Principles

Certain principles appear repeatedly in successful software architectures. These aren’t rigid rules but guiding lights that help you make better decisions [1].

Separation of Concerns

Separation of Concerns (SoC) means dividing your system so that each component addresses a distinct problem [2]. This principle prevents components from becoming tangled messes that do everything.

Real-world example: Consider an e-commerce application. Instead of one massive component handling product search, payment processing, order management, and shipping, SoC suggests separate components:

  • Product Catalog Service: Manages product data and search
  • Payment Service: Handles credit card processing and fraud detection
  • Order Service: Tracks order status and history
  • Shipping Service: Calculates shipping costs and tracks deliveries

Each service focuses on one concern, making them easier to understand, test, and modify independently [3].

Code-level example:

# POOR: Everything mixed together
class EcommerceHandler:
    def process_order(self, user_id, product_id, card_number):
        # Search database for product
        product = db.query("SELECT * FROM products WHERE id = ?", product_id)

        # Process payment
        charge_response = stripe.charge(card_number, product.price)

        # Send email
        send_email(user_id, "Your order is confirmed")

        # Update inventory
        db.execute("UPDATE products SET stock = stock - 1 WHERE id = ?", product_id)

# GOOD: Separated concerns
class ProductService:
    def get_product(self, product_id):
        return db.query("SELECT * FROM products WHERE id = ?", product_id)

class PaymentService:
    def charge(self, card_number, amount):
        return stripe.charge(card_number, amount)

class NotificationService:
    def send_order_confirmation(self, user_id):
        send_email(user_id, "Your order is confirmed")

class InventoryService:
    def decrement_stock(self, product_id):
        db.execute("UPDATE products SET stock = stock - 1 WHERE id = ?", product_id)

When payment processing changes (e.g., switching from Stripe to PayPal), you only modify PaymentService. When notification logic changes (e.g., adding SMS), you only touch NotificationService. Each concern is isolated [1].
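The separated version leaves one question open: who coordinates the four services? The sketch below is one possible answer, assuming a hypothetical OrderWorkflow class and in-memory stand-ins for the database, payment gateway, and email sender. None of these names come from a real library.

```python
class ProductService:
    def __init__(self, products):
        self._products = products  # in-memory stand-in for the products table

    def get_product(self, product_id):
        return self._products[product_id]


class PaymentService:
    def __init__(self):
        self.charges = []  # records charges instead of calling a real gateway

    def charge(self, card_number, amount):
        self.charges.append((card_number, amount))


class InventoryService:
    def __init__(self, stock):
        self.stock = stock

    def decrement_stock(self, product_id):
        self.stock[product_id] -= 1


class NotificationService:
    def __init__(self):
        self.sent = []  # records messages instead of sending email

    def send_order_confirmation(self, user_id):
        self.sent.append((user_id, "Your order is confirmed"))


class OrderWorkflow:
    """Hypothetical coordinator: it sequences the services and nothing more."""

    def __init__(self, products, payments, inventory, notifications):
        self.products = products
        self.payments = payments
        self.inventory = inventory
        self.notifications = notifications

    def process_order(self, user_id, product_id, card_number):
        product = self.products.get_product(product_id)
        self.payments.charge(card_number, product["price"])
        self.inventory.decrement_stock(product_id)
        self.notifications.send_order_confirmation(user_id)


workflow = OrderWorkflow(
    ProductService({1: {"price": 20}}),
    PaymentService(),
    InventoryService({1: 3}),
    NotificationService(),
)
workflow.process_order("u1", 1, "4242")
```

Because each service is passed in, a test can replace any one of them without touching the others. Error handling (for example, refunding a charge when stock runs out) is deliberately left out of this sketch.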

Modularity

Modularity structures systems as collections of cohesive, loosely coupled modules [2]. Each module encapsulates related functionality and exposes well-defined interfaces.

Benefits:

  • Reusability: Well-designed modules can be used in multiple projects
  • Maintainability: Changes to one module don’t cascade to others
  • Parallel development: Teams can work on different modules simultaneously
  • Testing: Modules can be tested in isolation

Example: A logging module should handle all logging concerns—formatting messages, writing to files, sending to remote servers—and expose a simple interface like log(level, message). Whether the application logs to files, databases, or cloud services, the rest of the codebase uses the same interface [3].
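A minimal sketch of such a logging module follows. The add_sink function and the list of levels are assumptions made for illustration; only the log(level, message) interface comes from the text above.

```python
from typing import Callable, List

LEVELS = ("DEBUG", "INFO", "WARNING", "ERROR")

# Destinations are plain callables: a file writer, an HTTP sender, list.append, ...
_sinks: List[Callable[[str], None]] = []


def add_sink(sink: Callable[[str], None]) -> None:
    """Register a destination. Callers of log() never need to know about it."""
    _sinks.append(sink)


def log(level: str, message: str) -> None:
    """The single function the rest of the codebase calls."""
    if level not in LEVELS:
        raise ValueError(f"unknown level: {level}")
    line = f"[{level}] {message}"
    for sink in _sinks:
        sink(line)


captured: List[str] = []
add_sink(captured.append)  # an in-memory sink, handy in tests
log("INFO", "order 42 created")
```

Adding a remote destination later means registering another sink; no call site of log() changes.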

Abstraction

Abstraction hides complex implementation details behind simple interfaces [1]. This reduces cognitive load by letting developers focus on what a component does rather than how it does it.

Layers of abstraction in a web application:

User Interface (High abstraction)
    ↓ Uses
Application Services
    ↓ Uses
Business Logic
    ↓ Uses
Data Access Layer
    ↓ Uses
Database (Low abstraction)

Each layer provides services to the layer above while hiding how those services are implemented. The UI layer doesn’t know whether data comes from PostgreSQL, MongoDB, or a REST API—it just calls getUserProfile(userId) [2].

Loose Coupling

Loose coupling minimizes dependencies between components [3]. Tightly coupled systems are fragile—changing one component requires changing many others. Loosely coupled systems are resilient—components can evolve independently.

Example:

# TIGHT COUPLING: Service directly depends on specific database
class UserService:
    def __init__(self):
        self.db = PostgreSQLDatabase()  # Hard dependency

    def get_user(self, user_id):
        return self.db.query("SELECT * FROM users WHERE id = ?", user_id)  # Parameterized to avoid SQL injection

# LOOSE COUPLING: Service depends on abstraction
class UserService:
    def __init__(self, database: Database):  # Depends on interface
        self.db = database

    def get_user(self, user_id):
        return self.db.get_user(user_id)  # Interface method

# Now we can swap database implementations without changing UserService
user_service = UserService(PostgreSQLDatabase())
# or
user_service = UserService(MongoDBDatabase())

Loose coupling uses dependency injection and interfaces to decouple components from specific implementations [1].
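The example above refers to a Database interface without defining it. One way to express it in Python is typing.Protocol, sketched here with a hypothetical in-memory implementation standing in for PostgreSQLDatabase or MongoDBDatabase.

```python
from typing import Dict, Optional, Protocol


class Database(Protocol):
    """Anything with a matching get_user method satisfies this interface."""

    def get_user(self, user_id: int) -> Optional[dict]: ...


class InMemoryDatabase:
    """Hypothetical implementation used for tests and local development."""

    def __init__(self, users: Dict[int, dict]):
        self._users = users

    def get_user(self, user_id: int) -> Optional[dict]:
        return self._users.get(user_id)


class UserService:
    def __init__(self, database: Database):  # injected, never constructed here
        self.db = database

    def get_user(self, user_id: int) -> Optional[dict]:
        return self.db.get_user(user_id)


service = UserService(InMemoryDatabase({1: {"name": "Ada"}}))
```

InMemoryDatabase does not inherit from Database; a static type checker accepts it because the method signatures match.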


Major Architectural Patterns

Architectural patterns are proven solutions to recurring structural problems [1]. Understanding these patterns helps you choose appropriate structures for your system’s requirements.

Layered Architecture (N-Tier)

Layered architecture organizes the system into horizontal layers, where each layer provides services to the layer above and uses services from the layer below [2].

Classic 3-tier architecture:

┌─────────────────────────────────────┐
│  Presentation Layer                 │  (UI, API endpoints)
│  - Handles user interaction         │
│  - Displays data                    │
│  - Routes requests                  │
└──────────────┬──────────────────────┘
               ↓
┌─────────────────────────────────────┐
│  Business Logic Layer               │  (Services, domain logic)
│  - Processes business rules         │
│  - Validates data                   │
│  - Orchestrates workflows           │
└──────────────┬──────────────────────┘
               ↓
┌─────────────────────────────────────┐
│  Data Access Layer                  │  (Repositories, ORMs)
│  - Reads/writes to database         │
│  - Manages connections              │
│  - Executes queries                 │
└──────────────┬──────────────────────┘
               ↓
┌─────────────────────────────────────┐
│  Database                           │
└─────────────────────────────────────┘

Strengths:

  • Clear separation of concerns
  • Easy to understand and maintain
  • Testable (can test business logic without database)

Weaknesses:

  • Can become tightly coupled if layers leak abstractions
  • All requests flow through all layers even when unnecessary [3]
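As a rough illustration of the three layers, the sketch below uses hypothetical names (InMemoryUserRepository, RegistrationService, register_endpoint) and plain tuples in place of real HTTP responses. Each layer calls only the one directly beneath it.

```python
class InMemoryUserRepository:
    """Data access layer: the only code that knows how users are stored."""

    def __init__(self):
        self._rows = {}

    def exists(self, email):
        return email in self._rows

    def insert(self, email, name):
        self._rows[email] = name


class RegistrationService:
    """Business logic layer: rules only, no HTTP and no SQL."""

    def __init__(self, repository):
        self._repo = repository

    def register(self, email, name):
        if "@" not in email:
            raise ValueError("invalid email")
        if self._repo.exists(email):
            raise ValueError("email already registered")
        self._repo.insert(email, name)


def register_endpoint(service, form):
    """Presentation layer: translates input and outcomes into a response."""
    try:
        service.register(form["email"], form["name"])
    except ValueError as error:
        return 400, str(error)
    return 201, "created"


service = RegistrationService(InMemoryUserRepository())
first = register_endpoint(service, {"email": "a@example.com", "name": "A"})
duplicate = register_endpoint(service, {"email": "a@example.com", "name": "A"})
```

The business rules run here without any database, which is the testability benefit listed under Strengths.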

Model-View-Controller (MVC)

MVC separates an application into three interconnected components [1]:

Model: Represents data and business logic. Manages state, validates data, executes business rules. Independent of UI.

View: Presents data to users. Generates UI based on model state. Multiple views can display the same model differently.

Controller: Handles user input. Updates model based on user actions. Selects appropriate view for response.

Flow:

User → Controller → Model
         ↓           ↓
         ↓           ↓
        View ←───────┘

Example in Flask (Python web framework):

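from flask import Flask, render_template, request

app = Flask(__name__)  # `db` below is an assumed database helper, not part of Flask

# Model - Business logic and data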
class User:
    def __init__(self, username, email):
        self.username = username
        self.email = email

    def save(self):
        db.insert('users', username=self.username, email=self.email)

    @staticmethod
    def find_by_username(username):
        data = db.query("SELECT * FROM users WHERE username = ?", username)
        return User(data['username'], data['email']) if data else None

# Controller - Handles requests
@app.route('/register', methods=['POST'])
def register():
    username = request.form['username']
    email = request.form['email']

    # Create and save user
    user = User(username, email)
    user.save()

    # Render view
    return render_template('success.html', user=user)

# View - Template (success.html)
# <h1>Welcome, {{ user.username }}!</h1>
# <p>Confirmation sent to {{ user.email }}</p>

Strengths:

  • Clear separation of UI from business logic
  • Multiple views can use same model
  • Easier to test business logic [2]

Weaknesses:

  • Can become complex with intricate UI interactions
  • Controller can grow bloated (“fat controller” problem) [3]

Microservices Architecture

Microservices decompose applications into small, independently deployable services [1]. Each service:

  • Focuses on a single business capability
  • Runs in its own process
  • Communicates via lightweight protocols (HTTP, message queues)
  • Can use different technologies (polyglot architecture)
  • Is developed and deployed independently

Example e-commerce system:

┌──────────────┐    ┌──────────────┐    ┌──────────────┐
│   User       │    │   Product    │    │   Order      │
│   Service    │    │   Service    │    │   Service    │
│              │    │              │    │              │
│  - Register  │    │  - Search    │    │  - Create    │
│  - Login     │    │  - Details   │    │  - Track     │
│  - Profile   │    │  - Reviews   │    │  - History   │
└──────┬───────┘    └──────┬───────┘    └──────┬───────┘
       │                   │                   │
       │     REST API      │       REST API    │
       └───────────────────┴───────────────────┘
                           │
                  ┌────────┴─────────┐
                  │   API Gateway    │
                  │  (Single Entry)  │
                  └──────────────────┘
                           │
                        Client

Strengths:

  • Independent scaling (scale only services under load)
  • Technology flexibility (use best tool for each service)
  • Fault isolation (one service failure doesn’t crash system)
  • Team autonomy (teams own entire services) [2]

Weaknesses:

  • Distributed system complexity (network failures, latency)
  • Data consistency challenges
  • Requires sophisticated operations (monitoring, deployment)
  • Testing across services is harder [3]

Event-Driven Architecture

Event-driven systems use events to trigger and communicate between services [1]. When something happens (user registers, order ships), the system publishes an event. Interested services subscribe to events and react accordingly.

Example:

Order Created Event
       ↓
   Event Bus (e.g., Kafka, RabbitMQ)
       ↓
       ├→ Inventory Service (reduces stock)
       ├→ Payment Service (charges card)
       ├→ Shipping Service (prepares shipment)
       └→ Notification Service (sends confirmation email)

Code example:

# Publisher
def create_order(order_data):
    order = Order.create(order_data)
    event_bus.publish('order.created', {
        'order_id': order.id,
        'user_id': order.user_id,
        'items': order.items
    })
    return order

# Subscribers
@event_bus.subscribe('order.created')
def reduce_inventory(event):
    for item in event['items']:
        inventory.decrement(item['product_id'], item['quantity'])

@event_bus.subscribe('order.created')
def send_confirmation(event):
    user = User.find(event['user_id'])
    email.send(user.email, 'Order Confirmed', template='order_confirmation')

Strengths:

  • Loose coupling (services don’t directly call each other)
  • Scalability (easy to add new subscribers)
  • Resilience (failed service doesn’t block others) [2]

Weaknesses:

  • Harder to trace execution flow
  • Eventual consistency (data may be temporarily inconsistent)
  • Debugging distributed events is complex [3]
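The event_bus object in the code example is left undefined. A minimal in-process sketch of it follows, under the assumption that events are plain dictionaries and delivery is synchronous. A production system would use a broker such as Kafka or RabbitMQ instead; the EventBus class here is hypothetical.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List


class EventBus:
    def __init__(self):
        self._handlers: DefaultDict[str, List[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, topic: str):
        """Decorator that registers a handler for a topic."""
        def decorator(handler):
            self._handlers[topic].append(handler)
            return handler
        return decorator

    def publish(self, topic: str, event: dict) -> List[Exception]:
        """Deliver to every subscriber; collect errors instead of stopping."""
        errors = []
        for handler in self._handlers[topic]:
            try:
                handler(event)
            except Exception as error:  # a failing subscriber must not block the rest
                errors.append(error)
        return errors


event_bus = EventBus()
stock = {"sku-1": 5}
emails = []


@event_bus.subscribe("order.created")
def reduce_inventory(event):
    for item in event["items"]:
        stock[item["product_id"]] -= item["quantity"]


@event_bus.subscribe("order.created")
def send_confirmation(event):
    emails.append(event["user_id"])


errors = event_bus.publish("order.created", {
    "order_id": 1,
    "user_id": "u1",
    "items": [{"product_id": "sku-1", "quantity": 2}],
})
```

The publisher never names its subscribers, so adding a fifth reaction to order.created means registering one more handler.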

SOLID Principles at the Architectural Level

SOLID principles, originally defined for object-oriented programming, apply equally to system architecture [1].

Single Responsibility Principle (SRP)

Each architectural component should have one reason to change [2].

Poor: A monolithic “Backend Service” that handles authentication, product catalog, payments, and shipping. Changing payment processing requires redeploying the entire backend.

Better: Separate services for Auth, Products, Payments, Shipping. Each service changes independently.

Open/Closed Principle (OCP)

Architectures should be open for extension but closed for modification [3].

Example: Use plugin architectures. A notification system initially supports email. To add SMS:

  • Poor: Modify core notification service code, risking bugs in existing email functionality
  • Better: Define a NotificationProvider interface. Email and SMS are plugins implementing this interface. Core service remains unchanged.

# Core notification service (closed for modification)
class NotificationService:
    def __init__(self):
        self.providers = {}  # Registry of providers

    def register_provider(self, name, provider):
        self.providers[name] = provider

    def send(self, provider_name, recipient, message):
        self.providers[provider_name].send(recipient, message)

# Extension (open for extension)
class EmailProvider:
    def send(self, recipient, message):
        # Send email logic
        pass

class SMSProvider:
    def send(self, recipient, message):
        # Send SMS logic
        pass

# Usage
notification_service = NotificationService()
notification_service.register_provider('email', EmailProvider())
notification_service.register_provider('sms', SMSProvider())

Liskov Substitution Principle (LSP)

Components implementing an interface should be interchangeable without affecting system behavior [1].

Example: All payment providers (Stripe, PayPal, Square) implement the same PaymentProcessor interface. The order service can use any provider without changing its code.
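A sketch of what interchangeable means in practice is shown below. The processor classes are fakes written for illustration and do not call the real Stripe or PayPal APIs. The point is that both honor the same contract: return a transaction id, and reject non-positive amounts in the same way.

```python
from abc import ABC, abstractmethod


class PaymentProcessor(ABC):
    @abstractmethod
    def charge(self, amount_cents: int) -> str:
        """Return a transaction id; raise ValueError if amount_cents <= 0."""


class FakeStripeProcessor(PaymentProcessor):
    def charge(self, amount_cents: int) -> str:
        if amount_cents <= 0:
            raise ValueError("amount must be positive")
        return f"stripe-{amount_cents}"


class FakePayPalProcessor(PaymentProcessor):
    def charge(self, amount_cents: int) -> str:
        if amount_cents <= 0:
            raise ValueError("amount must be positive")
        return f"paypal-{amount_cents}"


def place_order(processor: PaymentProcessor, amount_cents: int) -> dict:
    # Written once against the interface; no provider-specific branches.
    return {"status": "paid", "transaction": processor.charge(amount_cents)}


results = [
    place_order(processor, 1999)
    for processor in (FakeStripeProcessor(), FakePayPalProcessor())
]
```

A provider that silently accepted negative amounts, or returned None instead of an id, would implement the interface syntactically and still violate LSP.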

Interface Segregation Principle (ISP)

Don’t force components to depend on interfaces they don’t use [2].

Poor: A single IDatabase interface with methods for SQL, NoSQL, caching, and graph queries. Services using only SQL still depend on all methods.

Better: Separate interfaces: ISQLDatabase, INoSQLDatabase, ICache, IGraphDatabase. Services depend only on what they need.

Dependency Inversion Principle (DIP)

High-level modules should depend on abstractions, not concrete implementations [3].

Example:

# Poor: OrderService depends directly on PostgreSQLDatabase
class OrderService:
    def __init__(self):
        self.db = PostgreSQLDatabase()  # Concrete dependency

# Good: OrderService depends on abstraction
class OrderService:
    def __init__(self, database: IDatabase):
        self.db = database  # Abstract dependency

# Now you can inject any database implementation
order_service = OrderService(PostgreSQLDatabase())
# or
order_service = OrderService(MongoDBDatabase())

Quality Attributes and Trade-offs

Architecture is fundamentally about trade-offs [1]. You can’t maximize all quality attributes simultaneously—improving one often degrades another. Understanding these trade-offs helps you make informed decisions.

Scalability

Scalability is the system’s ability to handle increased load by adding resources [2].

Vertical Scaling (Scale Up): Add more CPU, RAM, disk to existing servers.

  • Pros: Simple, no application changes
  • Cons: Limited by hardware capacity, single point of failure

Horizontal Scaling (Scale Out): Add more servers.

  • Pros: Virtually unlimited capacity, high availability
  • Cons: Requires stateless application design, increased complexity

Architectural decisions for scalability:

  • Use stateless services (session state in external store like Redis)
  • Implement load balancing across multiple instances
  • Use database replication (read replicas for queries, write master for updates)
  • Cache frequently accessed data (CDN for static content, in-memory cache for hot data)

Performance

Performance measures how quickly the system responds to requests [3].

Techniques:

  • Caching: Store computed results to avoid repeated calculations
  • Asynchronous processing: Queue long-running tasks instead of blocking users
  • Database optimization: Indexes, query optimization, connection pooling
  • Content Delivery Networks: Serve static assets from geographically distributed servers

Trade-off: Aggressive caching improves performance but risks serving stale data. You must balance freshness vs. speed [1].
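The freshness-versus-speed trade-off can be made concrete with a time-to-live (TTL) cache. This is a minimal sketch with a hypothetical TTLCache class; the clock is injectable so that the example does not need to sleep.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get_or_compute(self, key: str, compute: Callable[[], Any]) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # fast path: may be up to ttl_seconds out of date
        value = compute()    # slow path: fresh value
        self._entries[key] = (now, value)
        return value


now = [0.0]  # fake clock, advanced by hand
cache = TTLCache(ttl_seconds=30, clock=lambda: now[0])
price = {"value": 100}

first = cache.get_or_compute("price", lambda: price["value"])  # computed
price["value"] = 120                                           # source changes
stale = cache.get_or_compute("price", lambda: price["value"])  # still the old value
now[0] = 31.0                                                  # TTL has expired
fresh = cache.get_or_compute("price", lambda: price["value"])  # recomputed
```

A longer TTL means fewer slow-path calls and a longer window in which readers can see the old price.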

Availability

Availability is the percentage of time the system is operational [2].

Five 9s (99.999%): Downtime = 5.26 minutes per year

Three 9s (99.9%): Downtime = 8.76 hours per year
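These downtime figures follow directly from the availability percentage. A short calculation, assuming a 365-day year:

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Allowed downtime per year for a given availability target."""
    return (1 - availability_percent / 100) * MINUTES_PER_YEAR


five_nines = downtime_minutes_per_year(99.999)       # about 5.26 minutes
three_nines = downtime_minutes_per_year(99.9) / 60   # about 8.76 hours
```

Each additional nine cuts the downtime budget by a factor of ten, which is why the cost of availability rises so steeply.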

Architectural patterns for availability:

  • Redundancy: Multiple instances of critical components
  • Failover: Automatic switching to backup when primary fails
  • Health checks: Continuous monitoring with automatic recovery
  • Graceful degradation: Continue operating with reduced functionality during failures

Trade-off: Higher availability requires more infrastructure, increasing cost and complexity [3].

Maintainability

Maintainability measures how easily developers can understand, modify, and extend the system [1].

Architectural decisions for maintainability:

  • Clear separation of concerns (microservices, layered architecture)
  • Well-defined interfaces between components
  • Comprehensive documentation
  • Automated testing
  • Consistent coding standards

Trade-off: Microservices increase maintainability of individual services but increase operational complexity [2].

Security

Security protects data and prevents unauthorized access [3].

Architectural decisions:

  • Defense in depth: Multiple security layers (firewall, authentication, authorization, encryption)
  • Principle of least privilege: Grant minimum necessary permissions
  • Secure by default: Systems should be secure without configuration
  • Audit logging: Track all security-relevant events

Trade-off: Stronger security (e.g., multi-factor authentication) adds friction to user experience [1].


Practical Example: Designing a Social Media Platform Architecture

Let’s design a complete architecture for a social media platform supporting 100 million users. We’ll apply all the principles and patterns we’ve learned.

Requirements

Functional:

  • Users can post text, images, videos
  • Users can follow other users
  • Feed shows posts from followed users
  • Users can like and comment on posts

Non-functional:

  • 100 million users, 10 million daily active
  • Average response time < 200ms
  • 99.9% availability
  • Handle traffic spikes (viral posts)

Architectural Decisions

1. Microservices Architecture

Decision: Use microservices to enable independent scaling and team ownership.

Services:

  • User Service: Authentication, profiles, settings
  • Post Service: Create, edit, delete posts
  • Feed Service: Generate personalized feeds
  • Media Service: Upload, store, serve images/videos
  • Engagement Service: Likes, comments, shares
  • Follower Service: Follow relationships, follower lists

Rationale: Each service handles a distinct concern and can scale independently. Feed generation is computationally expensive and benefits from dedicated resources. Media storage has different infrastructure needs than text data [1].

2. Data Storage Strategy

Decision: Polyglot persistence—use the best database for each service.

User Service: PostgreSQL (relational data, ACID transactions for user accounts)

Post Service: PostgreSQL (structured post data, queries by user and time)

Feed Service: Redis (in-memory cache for generated feeds, fast read access)

Media Service: Amazon S3 (object storage for images/videos, CDN integration)

Engagement Service: Cassandra (high write volume for likes/comments, eventual consistency acceptable)

Follower Service: Graph Database (Neo4j for relationship queries like “friends of friends”)

Rationale: Different data has different access patterns and consistency requirements. Choosing specialized databases optimizes each service [2].

3. Caching Strategy

Decision: Multi-layer caching to minimize database load.

User Request
    ↓
CDN Cache (static media)
    ↓ (miss)
API Gateway
    ↓
Redis Cache (feeds, user profiles, popular posts)
    ↓ (miss)
Database

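Rationale: With a high cache hit rate, most requests never reach the database. The 80-90% reduction in database load cited here depends on the workload and should be confirmed by measurement. The CDN serves media from edge locations, reducing latency [3].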

4. Feed Generation Architecture

Challenge: Generating feeds for millions of users is computationally expensive.

Decision: Hybrid push-pull approach.

Pull (Fan-out on Read):

  • When user opens feed, query recent posts from followed users
  • Works for users with many followers (celebrities)
  • Slower but doesn’t require pre-computation

Push (Fan-out on Write):

  • When user creates post, push to feeds of all followers
  • Pre-computed feeds ready instantly
  • Works for average users with hundreds of followers

def create_post(user_id, content):
    # Save post
    post = PostService.create(user_id, content)

    # Get follower count
    follower_count = FollowerService.get_follower_count(user_id)

    if follower_count < 10000:
        # Push to follower feeds (fan-out on write)
        followers = FollowerService.get_followers(user_id)
        for follower_id in followers:
            FeedService.add_post_to_feed(follower_id, post.id)
    else:
        # Mark for pull (fan-out on read)
        FeedService.mark_as_pull_user(user_id)

Rationale: Balances precomputation cost with read performance. Most users get precomputed feeds instantly, while celebrity posts are fetched on demand when a follower opens the feed [1].
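The code above covers only the write side. On the read side, the precomputed feed has to be merged with recent posts from any followed pull users. The sketch below assumes posts are (timestamp, post_id) tuples and that both inputs have already been fetched; the build_feed function and its parameters are hypothetical.

```python
import heapq
from typing import Dict, List, Tuple

Post = Tuple[int, str]  # (timestamp, post_id)


def build_feed(
    pushed: List[Post],
    followed_pull_users: List[str],
    recent_posts_by_user: Dict[str, List[Post]],
    limit: int = 20,
) -> List[str]:
    """Merge the precomputed feed with on-demand posts, newest first."""
    candidates = list(pushed)
    for user_id in followed_pull_users:
        candidates.extend(recent_posts_by_user.get(user_id, []))
    # Tuples compare by timestamp first, so nlargest returns the newest posts.
    newest = heapq.nlargest(limit, candidates)
    return [post_id for _, post_id in newest]


feed = build_feed(
    pushed=[(10, "p1"), (30, "p3")],
    followed_pull_users=["celeb"],
    recent_posts_by_user={"celeb": [(20, "c1"), (40, "c2")]},
    limit=3,
)
```

The read cost grows with the number of pull users a person follows, not with the number of followers those users have.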

5. Scalability and Resilience

Decisions:

  • Load Balancers: Distribute traffic across multiple instances of each service
  • Auto-scaling: Add/remove instances based on CPU and memory metrics
  • Circuit Breakers: Prevent cascading failures (if Media Service is down, show feeds without images rather than failing completely)
  • Rate Limiting: Prevent abuse and ensure fair resource allocation

Rationale: Horizontal scaling allows virtually unlimited growth. Circuit breakers maintain availability even during partial failures [2].
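Rate limiting is commonly implemented with a token bucket. The sketch below is a single-process version with a hypothetical TokenBucket class. A real deployment would typically keep the counters in a shared store so that all gateway instances see the same state.

```python
import time
from typing import Callable


class TokenBucket:
    def __init__(
        self,
        capacity: int,
        refill_per_second: float,
        clock: Callable[[], float] = time.monotonic,
    ):
        self._capacity = capacity
        self._rate = refill_per_second
        self._clock = clock
        self._tokens = float(capacity)  # start full, so short bursts are allowed
        self._last = clock()

    def allow(self) -> bool:
        """Spend one token if available; refill based on elapsed time."""
        now = self._clock()
        elapsed = now - self._last
        self._tokens = min(self._capacity, self._tokens + elapsed * self._rate)
        self._last = now
        if self._tokens >= 1:
            self._tokens -= 1
            return True
        return False


now = [0.0]  # fake clock, advanced by hand
bucket = TokenBucket(capacity=2, refill_per_second=1, clock=lambda: now[0])
burst = [bucket.allow() for _ in range(3)]  # third request exceeds the burst size
now[0] = 1.0
after_wait = bucket.allow()                 # one token refilled after one second
```

The capacity sets how large a burst is tolerated, and the refill rate sets the sustained request rate.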

6. Security Architecture

Decisions:

  • API Gateway: Single entry point; enforces authentication and rate limiting, and terminates TLS
  • JWTs (JSON Web Tokens): Stateless authentication, no server-side session storage
  • Service Mesh (Istio): Encrypted service-to-service communication
  • Data Encryption: At rest (database encryption) and in transit (TLS)

Rationale: Defense in depth—multiple security layers protect against various attack vectors [3].

Architecture Diagram

                   ┌─────────────────┐
                   │   Mobile App    │
                   │   Web Browser   │
                   └────────┬────────┘
                            │ HTTPS
                            ↓
                   ┌─────────────────┐
                   │   CDN (Media)   │
                   └────────┬────────┘
                            │
                   ┌─────────────────┐
                   │  API Gateway    │
                   │ - Auth          │
                   │ - Rate Limiting │
                   └────────┬────────┘
                            │
         ┌──────────────────┼──────────────────┐
         │                  │                  │
    ┌────▼────┐       ┌─────▼──────┐     ┌────▼────┐
    │  User   │       │   Post     │     │  Feed   │
    │ Service │       │  Service   │     │ Service │
    └────┬────┘       └─────┬──────┘     └────┬────┘
         │                  │                  │
    ┌────▼────┐       ┌─────▼──────┐     ┌────▼────┐
    │   SQL   │       │    SQL     │     │  Redis  │
    └─────────┘       └────────────┘     └─────────┘

    ┌─────────────┐   ┌──────────────┐   ┌──────────────┐
    │ Engagement  │   │   Follower   │   │    Media     │
    │   Service   │   │   Service    │   │   Service    │
    └──────┬──────┘   └──────┬───────┘   └──────┬───────┘
           │                 │                   │
    ┌──────▼──────┐   ┌──────▼───────┐   ┌──────▼───────┐
    │  Cassandra  │   │  Neo4j Graph │   │  Amazon S3   │
    └─────────────┘   └──────────────┘   └──────────────┘

Common Pitfalls and Best Practices

Pitfall 1: Premature Optimization

Mistake: Designing for Netflix-scale when you have 100 users.

Example: Implementing microservices, Kubernetes, service mesh, and event sourcing for a startup’s MVP.

Why it’s bad: Adds massive complexity before you understand actual needs. Most startups die from lack of users, not scalability problems.

Best practice: Start with a monolith. Optimize architecture when you have real data on bottlenecks. Instagram and Twitter both started as monoliths [1].

Pitfall 2: The Distributed Monolith

Mistake: Creating microservices that are tightly coupled and must deploy together.

Example: User Service can’t function without Order Service. Every deployment requires deploying all services.

Why it’s bad: You get all the complexity of microservices with none of the benefits (independent deployment, scaling, fault isolation).

Best practice: Ensure services have clear boundaries and can operate independently. Use eventual consistency and compensating transactions instead of distributed transactions [2].

Pitfall 3: Ignoring Conway’s Law

Conway’s Law: Organizations that design systems tend to produce designs that mirror their own communication structures [3].

Mistake: Choosing microservices when you have a small team that works closely together.

Why it’s bad: Microservices require clear team ownership and independent deployment. A 3-person team can’t effectively manage 20 microservices.

Best practice: Align architecture with team structure. Small teams should use monoliths or a few well-defined services. Microservices work best with multiple autonomous teams [1].

Pitfall 4: Not Documenting Architectural Decisions

Mistake: Making important architectural choices without recording the reasoning.

Example: Choosing DynamoDB over PostgreSQL without documenting why (cost, scalability needs, consistency requirements).

Why it’s bad: Future developers don’t understand why decisions were made and may inadvertently break assumptions.

Best practice: Use Architecture Decision Records (ADRs):

# ADR-001: Use DynamoDB for Session Storage

## Context
We need to store user session data for a globally distributed application
serving 10M daily active users with sub-50ms read latency requirements.

## Decision
Use Amazon DynamoDB for session storage instead of PostgreSQL.

## Rationale
- Global Tables provide multi-region replication (< 1 second propagation)
- Auto-scaling handles traffic spikes without manual intervention
- Single-digit millisecond latency at any scale
- Serverless model reduces operational overhead
- Pay-per-request pricing aligns with our variable traffic patterns

## Consequences
- Must design around eventual consistency (sessions may be slightly stale)
- DynamoDB query patterns require careful key design (no ad-hoc queries)
- Vendor lock-in to AWS ecosystem
- Team needs to learn DynamoDB data modeling patterns

Best Practices for System Design

1. Start Simple, Evolve Gradually

Begin with the simplest architecture that meets current requirements. Add complexity only when you have evidence it’s needed [1].

Evolution path:

  1. Month 1-3: Monolith, single database, manual deployment
  2. Month 3-6: Add caching layer, implement CI/CD
  3. Year 1: Extract high-traffic components into services
  4. Year 2+: Microservices, multi-region deployment, advanced observability

2. Design for Failure

Assume everything will fail—networks, databases, servers, third-party APIs [2].

Strategies:

  • Implement timeouts on all network calls
  • Use circuit breakers to prevent cascading failures
  • Design retry logic with exponential backoff
  • Implement graceful degradation (serve cached data when database is down)
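
import requests
from circuitbreaker import circuit, CircuitBreakerError

# Open the circuit after 5 consecutive failures; allow a trial call after 60 seconds.
@circuit(failure_threshold=5, recovery_timeout=60)
def call_external_api():
    response = requests.get('https://api.example.com/data', timeout=5)
    response.raise_for_status()  # Count HTTP 4xx/5xx responses as failures too
    return response.json()

try:
    data = call_external_api()
except (CircuitBreakerError, requests.RequestException):
    # Either the circuit is open, or this call failed while it was still closed:
    # serve cached data or return a graceful error.
    # `cache` is assumed to be an application cache client (e.g. Redis) defined elsewhere.
    data = cache.get('api_data_fallback')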
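
The retry strategy from the list above can be sketched with the standard library alone. This is a minimal illustration, not a production implementation: the function name `retry_with_backoff`, its parameters, and the default delays are all assumptions chosen for the example, and the injectable `sleep` argument exists only to make the behavior easy to test.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    operation: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `operation`, retrying failures with exponentially growing delays."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: let the caller degrade gracefully.
            # The delay ceiling doubles each attempt (0.5s, 1s, 2s, ...), capped at max_delay.
            ceiling = min(max_delay, base_delay * (2 ** attempt))
            # Random jitter spreads retries out so clients do not retry in lockstep.
            sleep(random.uniform(0, ceiling))
    raise AssertionError("unreachable")
```

In practice you would retry only errors that are likely to be transient (timeouts, connection resets) and only operations that are safe to repeat. Retrying a non-idempotent write can duplicate its side effects.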

3. Build Observability Into Architecture

You can’t improve what you can’t measure [3]. Design logging, metrics, and tracing from day one.

Three pillars of observability:

  • Logging: Record events (errors, warnings, important state changes)
  • Metrics: Quantitative measurements (request rate, error rate, latency)
  • Tracing: Track requests across services (see complete execution path)
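
As a rough sketch of how the three pillars can meet in code, the helper below wraps one unit of work and emits all three signals. Everything here is an assumption for illustration: the `observed` helper, the in-memory dictionaries standing in for a metrics backend, and the plain `trace_id` string standing in for a real distributed tracing system.

```python
import json
import logging
import time
import uuid
from collections import defaultdict
from contextlib import contextmanager

logger = logging.getLogger("app")

# Metrics: in-memory stand-ins for a real metrics backend.
request_count = defaultdict(int)
error_count = defaultdict(int)
latency_ms = defaultdict(list)


@contextmanager
def observed(operation, trace_id=None):
    """Log, measure, and tag one unit of work with a trace ID."""
    # Tracing: reuse the caller's ID so one request can be followed across services.
    trace_id = trace_id or uuid.uuid4().hex
    start = time.perf_counter()
    status = "ok"
    try:
        yield trace_id
    except Exception:
        status = "error"
        error_count[operation] += 1
        raise
    finally:
        elapsed = (time.perf_counter() - start) * 1000
        request_count[operation] += 1
        latency_ms[operation].append(elapsed)
        # Logging: one structured event per operation, searchable by trace_id.
        logger.info(json.dumps({
            "operation": operation,
            "trace_id": trace_id,
            "status": status,
            "latency_ms": round(elapsed, 2),
        }))
```

A handler would wrap its work in `with observed("get_feed", trace_id=incoming_id):` and pass the yielded ID along on any downstream calls, so logs from different services can be joined on the same value.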

4. Use API Versioning

APIs evolve. Design for backward compatibility from the start [1].

Versioning strategies:

  • URL versioning: /api/v1/users, /api/v2/users
  • Header versioning: Accept: application/vnd.myapi.v2+json
  • Content negotiation: Different versions based on client capabilities
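
To make the first two strategies concrete, here is a small sketch of how a server might work out which version a request is asking for. The function name `resolve_api_version`, the supported version set, the default version, and the rule that the URL takes precedence over the header are all assumptions made for this example. The path and media type formats come from the list above.

```python
import re

SUPPORTED_VERSIONS = {1, 2}   # Hypothetical: versions this server still serves
DEFAULT_VERSION = 1           # Hypothetical: used when the client names no version

_URL_VERSION = re.compile(r"^/api/v(\d+)/")
_HEADER_VERSION = re.compile(r"application/vnd\.myapi\.v(\d+)\+json")


def resolve_api_version(path: str, accept_header: str = "") -> int:
    """Return the requested API version, preferring the URL over the Accept header."""
    match = _URL_VERSION.match(path) or _HEADER_VERSION.search(accept_header)
    version = int(match.group(1)) if match else DEFAULT_VERSION
    if version not in SUPPORTED_VERSIONS:
        raise ValueError(f"Unsupported API version: v{version}")
    return version
```

Falling back to the oldest supported version when the client names none keeps existing clients working. Rejecting unknown versions explicitly avoids silently serving the wrong contract.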

5. Document Your Architecture

Use multiple levels of documentation [2]:

C4 Model:

  1. Context Diagram: System in context (users, external systems)
  2. Container Diagram: High-level technology choices (web app, API, database)
  3. Component Diagram: Components within containers
  4. Code Diagram: Class-level details (only for complex components)

Key Takeaways

  1. Architecture defines system structure; design defines component internals. Architecture makes high-level decisions about components, communication, and data flow. Design implements details within those constraints.

  2. Four foundational principles guide good architecture: Separation of Concerns isolates distinct problems, Modularity creates cohesive units, Abstraction hides complexity, and Loose Coupling minimizes dependencies.

  3. Architectural patterns solve recurring problems: Layered Architecture separates concerns horizontally, MVC separates UI from logic, Microservices enable independent deployment, and Event-Driven systems decouple components through messages.

  4. SOLID principles apply at architectural scale: Single Responsibility keeps services focused, Open/Closed enables extension without modification, Liskov ensures interchangeability, Interface Segregation prevents bloated contracts, and Dependency Inversion decouples through abstractions.

  5. All architecture is about trade-offs: You can’t maximize scalability, performance, availability, maintainability, and security simultaneously. Understand business priorities and make informed compromises.

  6. Start simple and evolve gradually: Premature optimization creates complexity without benefit. Begin with monoliths, add caching and CDNs when needed, extract microservices when team structure supports it.

  7. Design for failure from day one: Use timeouts, circuit breakers, retry logic, and graceful degradation. Assume networks partition, servers crash, and databases become unavailable.

  8. Document architectural decisions: Use Architecture Decision Records to capture context, decisions, rationale, and consequences. Future developers will thank you.


Practice Quiz

Question 1

What is the primary difference between software architecture and software design?

A) Architecture uses diagrams; design uses code
B) Architecture defines high-level system structure; design defines component internals
C) Architecture is for managers; design is for developers
D) Architecture focuses on databases; design focuses on algorithms

Answer: B

Architecture operates at the system level, defining how major components interact, communicate, and are deployed. Design operates within components, defining classes, patterns, and algorithms. Architecture provides constraints that design must follow. For example, architecture decides “we’ll use microservices,” while design decides “the auth service will use the Strategy pattern for different authentication methods.”


Question 2

You’re building an e-commerce platform. Which architectural principle is violated if your Order Service contains code for payment processing, inventory management, sending emails, and calculating shipping costs?

A) Abstraction
B) Separation of Concerns
C) Dependency Inversion
D) Modularity

Answer: B

Separation of Concerns is violated when a component addresses multiple distinct problems. Each concern (payments, inventory, notifications, shipping) should be isolated so changes to one don’t affect others. A change to email notification logic shouldn’t risk breaking payment processing. The correct approach is separate services or modules for each concern.


Question 3

Your social media app is experiencing slow performance during peak hours. Database queries for user feeds take 5-10 seconds. What architectural pattern would most effectively address this?

A) Switch from microservices to a monolith
B) Implement caching layer (Redis) for frequently accessed feeds
C) Add more database indexes
D) Increase server CPU and RAM

Answer: B

Caching addresses the root cause—repeatedly computing expensive feed queries. Storing generated feeds in Redis (in-memory cache) provides sub-millisecond access times. Option C (indexes) helps but doesn’t eliminate the computation cost. Option D (vertical scaling) is limited and expensive. Option A (monolith) doesn’t address the performance problem.

Caching is the architectural pattern for handling read-heavy workloads where data doesn’t need to be perfectly fresh.
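
A cache-aside read path with a time-to-live (TTL) can be sketched as follows. This is an illustration under stated assumptions: the `FeedCache` class, `load_feed` callback, and 60-second default TTL are hypothetical, and a plain dictionary stands in for Redis so the example runs without external services.

```python
import time


class FeedCache:
    """Cache-aside with a TTL; a dict stands in for Redis."""

    def __init__(self, load_feed, ttl_seconds=60.0, clock=time.monotonic):
        self._load_feed = load_feed  # Expensive query, e.g. building the feed from the database
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # user_id -> (expires_at, feed)

    def get_feed(self, user_id):
        entry = self._entries.get(user_id)
        if entry is not None and entry[0] > self._clock():
            return entry[1]  # Cache hit: skip the expensive query.
        feed = self._load_feed(user_id)  # Cache miss or expired entry.
        self._entries[user_id] = (self._clock() + self._ttl, feed)
        return feed

    def invalidate(self, user_id):
        """Drop a cached feed, e.g. after the user follows someone new."""
        self._entries.pop(user_id, None)
```

The TTL is the trade-off knob: a longer TTL removes more database load but lets feeds go staler, which is acceptable only because this workload does not need perfectly fresh data.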


Question 4

When should you choose microservices architecture over a monolith?

A) Always—microservices are objectively better
B) When you have a large team (10+ engineers) and clear service boundaries
C) When you have 100 users and need to move fast
D) When you want to learn new technologies

Answer: B

Microservices make sense when you have multiple teams that can own independent services, clear bounded contexts that separate cleanly, and the operational maturity to manage distributed systems. Small teams should use monoliths to move faster and avoid distributed system complexity.

Conway’s Law states that a system’s design tends to mirror the communication structure of the organization that builds it. If you have one small team, a monolith matches your organizational structure. If you have 50 engineers in 6 teams, microservices let each team develop and deploy its services autonomously.


References

[1] Martin, Robert C. “Clean Architecture: A Craftsman’s Guide to Software Structure and Design.” Prentice Hall, 2017. Quote: “The goal of software architecture is to minimize the human resources required to build and maintain the required system.”

[2] Richards, Mark, and Neal Ford. “Fundamentals of Software Architecture.” O’Reilly Media, 2020. Quote: “Architecture is about the important stuff… whatever that is. Architecture represents the significant design decisions that shape a system, where significant is measured by cost of change.”

[3] Newman, Sam. “Building Microservices: Designing Fine-Grained Systems.” 2nd Edition, O’Reilly Media, 2021. Quote: “Microservices are not a silver bullet. They introduce their own complexities and challenges. The key is understanding when the benefits outweigh the costs.”

[4] Evans, Eric. “Domain-Driven Design: Tackling Complexity in the Heart of Software.” Addison-Wesley, 2003. Quote: “The heart of software is its ability to solve domain-related problems for its user. All other features, vital though they may be, support this basic purpose.”

[5] Bass, Len, Paul Clements, and Rick Kazman. “Software Architecture in Practice.” 4th Edition, Addison-Wesley, 2021. Quote: “Architecture is a set of principal design decisions about a system. It is the blueprint for a system’s construction and evolution.”

[6] Fowler, Martin. “Patterns of Enterprise Application Architecture.” Addison-Wesley, 2002. Quote: “A good architecture is important, otherwise it becomes slower and more expensive to add new capabilities in the future.”