Distributed Systems - what should be considered

Design in Distributed World

Composition

Typical Architecture

  • Load balancer with multiple backend replicas
  • Server with multiple backends
  • Server tree

Distributed State

The CAP Principle

  • Consistency
  • Availability
  • Partition Tolerance

Design for Operations

Operational Requirements

  • Configuration
  • Startup and shutdown
  • Queue draining
  • Software upgrades
  • Backups and restores
  • Redundancy
  • Replicated databases
  • Hot swaps
  • Toggles for individual features
  • Graceful degradation
  • Access controls and rate limits
  • Data import controls
  • Monitoring
  • Auditing
  • Debug instrumentation
  • Exception collection
  • Documentation for Operations
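
As an illustration of the "Access controls and rate limits" item above, here is a minimal token-bucket sketch. The class name `TokenBucket`, its parameters, and the injectable `clock` are assumptions made for this example; they do not come from the referenced books.

```python
import time


class TokenBucket:
    """Allow roughly `rate` requests per second, with bursts up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate            # tokens added per second
        self.capacity = capacity    # maximum tokens the bucket can hold
        self.clock = clock          # injectable so tests can control time
        self.tokens = capacity      # start with a full bucket
        self.last = clock()

    def allow(self):
        """Return True and consume one token if the request may proceed."""
        now = self.clock()
        # Refill in proportion to elapsed time, capped at capacity.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

A service would typically keep one bucket per client or API key and reject (or queue) requests when `allow()` returns False, which also gives operations a single knob to turn during overload.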

Platform Selection

Platform Description

A platform may be described along three axes

  • Level of service abstraction: IaaS, PaaS, SaaS
  • Type of machine: Physical, virtual, or process container
  • Level of resource sharing: Shared or private

Selection Strategies

Common Strategies

  • Default to Virtual
  • Make a Cost-Based Decision
  • Leverage Provider Expertise
  • Get Started Quickly
  • Implement Ephemeral Computing
  • Use the Cloud for Overflow Capacity
  • Leverage Superior Infrastructure
  • Develop an In-House Service Provider
  • Contract for an On-Premises, Externally Run Service
  • Implement a Bare Metal Cloud

Application Architectures

General architecture categories

  • Single-Machine Web Server
  • Two-Tier Web Service
  • Three-Tier Web Service
  • Four-Tier Web Service

Load Balancer Types

  • DNS Round Robin
  • Layer 3 and 4 Load Balancers
  • Layer 7 Load Balancer

Load Balancing Methods

  • Round Robin (RR)
  • Weighted RR
  • Least Loaded (LL)
  • Least Loaded with Slow Start
  • Utilization Limit
  • Latency
  • Cascade
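
Three of the methods above can be sketched in a few lines. This is a simplified illustration, not how any particular load balancer implements them; the function names and the backend names in the usage note are hypothetical.

```python
import itertools


def round_robin(backends):
    """Round Robin: yield backends in a fixed rotation."""
    return itertools.cycle(backends)


def weighted_round_robin(weights):
    """Weighted RR: each backend appears `weight` times per cycle.

    This naive version sends consecutive requests to the same backend;
    real balancers usually interleave them more smoothly.
    """
    expanded = [name for name, weight in weights.items() for _ in range(weight)]
    return itertools.cycle(expanded)


def least_loaded(active_requests):
    """Least Loaded: pick the backend with the fewest in-flight requests."""
    return min(active_requests, key=active_requests.get)
```

For example, `least_loaded({"a": 5, "b": 2, "c": 7})` picks `"b"`, while `weighted_round_robin({"a": 2, "b": 1})` gives `"a"` two of every three requests. Least Loaded needs live per-backend state, which is why it is usually paired with slow start: a freshly added backend reports zero load and would otherwise be flooded.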

Also consider

  • Reverse Proxy Service
  • Cloud-Scale Service
  • Message Bus Architectures
  • Service-Oriented Architecture

Design for Scaling

General Strategy

  1. Identify Bottlenecks
  2. Reengineer Components
  3. Measure Results
  4. Be Proactive

The AKF Scaling Cube

Developed by Abbott, Keeven, and Fisher.

  • x: Horizontal Duplication (also known as horizontal scaling or scaling out)
  • y: Functional or Service Splits
  • z: Lookup-Oriented Split
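
The y and z axes can be illustrated with a small routing sketch. The service names, URL prefixes, and the choice of a hash-modulo lookup are all assumptions for this example; a real z-axis split might instead use a directory lookup or split by geography.

```python
import hashlib

# y-axis: each function of the site gets its own pool (hypothetical names).
SERVICE_BY_PREFIX = {
    "/search": "search",
    "/checkout": "checkout",
}
DEFAULT_SERVICE = "web"


def service_for(path):
    """y-axis: choose a service by what the request does."""
    for prefix, service in SERVICE_BY_PREFIX.items():
        if path.startswith(prefix):
            return service
    return DEFAULT_SERVICE


def shard_for(customer_id, num_shards):
    """z-axis: choose a shard by who the request is for."""
    digest = hashlib.sha256(customer_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def route(path, customer_id, num_shards=4):
    """Combine both splits; x-axis replicas would sit behind each result."""
    return service_for(path), shard_for(customer_id, num_shards)
```

Note that plain modulo hashing remaps most keys when `num_shards` changes, so systems that expect to reshard often use a lookup table or consistent hashing instead.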

Other techniques to consider

  • Caching
  • Data Sharding
  • Threading
  • Queueing
  • CDN

Caching

  • Cache Effectiveness
  • Cache Placement
  • Cache Persistence
  • Cache Replacement Algorithms
  • Cache Entry Invalidation
  • Cache Size
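
Several of these topics (replacement algorithm, invalidation, size, and effectiveness) show up together in even a tiny cache. The sketch below uses least-recently-used (LRU) replacement, which is only one of several possible algorithms; the class and method names are hypothetical.

```python
from collections import OrderedDict


class LRUCache:
    """A fixed-size cache that evicts the least recently used entry."""

    def __init__(self, max_entries):
        self.max_entries = max_entries   # cache size
        self.entries = OrderedDict()     # oldest entry first
        self.hits = 0
        self.misses = 0

    def get(self, key):
        """Return the cached value, or None on a miss."""
        if key in self.entries:
            self.entries.move_to_end(key)    # mark as most recently used
            self.hits += 1
            return self.entries[key]
        self.misses += 1
        return None

    def put(self, key, value):
        """Store a value, evicting the oldest entry if the cache is full."""
        self.entries[key] = value
        self.entries.move_to_end(key)
        if len(self.entries) > self.max_entries:
            self.entries.popitem(last=False)  # drop least recently used

    def invalidate(self, key):
        """Remove an entry whose source data has changed."""
        self.entries.pop(key, None)

    def hit_ratio(self):
        """Cache effectiveness: the fraction of lookups served from cache."""
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

Tracking `hit_ratio()` over time is one simple way to judge whether a larger cache, or a different placement, is worth its cost.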

Design for Resiliency

Key considerations for resiliency

  • Everything Malfunctions Eventually
  • Resiliency through Spare Capacity
  • Failure Domains
  • Software Failures
  • Physical Failures
  • Overload Failures
  • Human Error

Operations in Distributed World

Distributed Systems Operations

  • Defining SRE
  • Change versus Stability
  • Operations at Scale

Service Life Cycle

  • Service Launch
  • Emergency Tasks
  • Nonemergency Tasks
  • Upgrades
  • Decommissioning
  • Project Work

Organizing Strategy

Daily work categories

  • Emergency Issues
  • Normal Requests
  • Project Work

Team Member Day Types

  1. Project-Focused Days
  2. Oncall Days
  3. Ticket Duty Days

DevOps

Three Ways of DevOps

  1. Workflow
    • Ensure each step is done in a repeatable way
    • Never pass defects to the next step
    • Ensure no local optimizations degrade global performance
    • Increase the flow of work
  2. Improve Feedback
    • Understand and respond to all customers, internal and external
    • Shorten feedback loops
    • Amplify all feedback
    • Embed knowledge where it is needed
  3. Continual Experimentation and Learning
    • Rituals are created that reward risk taking
    • Management allocates time for projects that improve the system
    • Faults are introduced into the system to increase resilience
    • You try “crazy” or audacious things

Common Technical DevOps Practices

  • Same Development and Operations Toolchain
  • Consistent Software Development Life Cycle (SDLC)
  • Managed Configuration and Automation
  • Infrastructure as Code
  • Automated Provisioning and Deployment
  • Artifact-Scripted Database Changes
  • Automated Build and Release
  • Release Vehicle Packaging
  • Abstracted Administration

Service Delivery

Build-Phase Steps: Develop -> Commit -> Build -> Package -> Register

What a delivery platform should provide

  • Confidence
  • Reduced Risk
  • Shorter Interval from Keyboard to Production
  • Less Wait Time
  • Less Rework
  • Improved Execution
  • A Culture of Continuous Improvement
  • Improved Job Satisfaction

Deployment-Phase Steps: Promote -> Install -> Configure

Upgrading

There are several ways to upgrade a running service.

  • Taking the Service Down for Upgrading
  • Rolling Upgrades
  • Canary
  • Phased Roll-outs
  • Proportional Shedding
  • Blue-Green Deployment
  • Toggling Features

Take feature toggles as an example.

Reasons to use flag flips

  • Rapid Development
  • Gradual Introduction of New Features
  • Finely Timed Release Dates
  • Dynamic Roll Backs
  • Bug Isolation
  • A-B Testing
  • One Percent Testing
  • Differentiated Services
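
Gradual introduction, one percent testing, and dynamic rollback can all be driven by a single percentage setting. The following is a minimal sketch under the assumption that users are identified by a string ID; the function names and the hashing scheme are illustrative, not taken from any specific flag library.

```python
import hashlib


def bucket(flag, user_id):
    """Map a (flag, user) pair to a stable bucket in the range 0-99."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def is_enabled(flag, user_id, rollout_percent):
    """Return True if this user falls inside the current rollout percentage.

    Raising rollout_percent only adds users; setting it to 0 acts as an
    immediate rollback without deploying new code.
    """
    return bucket(flag, user_id) < rollout_percent
```

Because the bucket depends on both the flag and the user, a given user sees a consistent experience for one feature, while different features are rolled out to different subsets of users.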

Continuous Deployment

Factors to consider when deciding whether to pause continuous delivery

  • Build Health
  • Test Comprehensiveness
  • Test Reproducibility
  • Production Health
  • Schedule Permission
  • Oncall Schedule
  • Manual Stop
  • Push Conflicts
  • Intentional Delays
  • Resource Contention
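
A few of these factors can be expressed as an automated gate that runs before each push. This sketch is a loose interpretation: the `PipelineState` fields are hypothetical and cover only five of the factors listed above.

```python
from dataclasses import dataclass


@dataclass
class PipelineState:
    """Inputs a deployment gate might check (hypothetical field names)."""
    build_healthy: bool        # Build Health
    production_healthy: bool   # Production Health
    schedule_permits: bool     # Schedule Permission
    oncall_available: bool     # Oncall Schedule
    manual_stop: bool          # Manual Stop


def blocking_reasons(state):
    """Return the list of reasons to pause; an empty list means go."""
    reasons = []
    if not state.build_healthy:
        reasons.append("build health")
    if not state.production_healthy:
        reasons.append("production health")
    if not state.schedule_permits:
        reasons.append("schedule permission")
    if not state.oncall_available:
        reasons.append("oncall schedule")
    if state.manual_stop:
        reasons.append("manual stop")
    return reasons


def may_deploy(state):
    """Deploy only when nothing blocks the push."""
    return not blocking_reasons(state)
```

Returning the reasons rather than a bare boolean makes it easier to show on a dashboard why deployments are currently paused.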

Terms to Know

  • Server
  • Service
  • Machine
  • QPS
  • Traffic
  • Performant: A neologism from merging “performance” and “conformant”.
  • IaaS
  • SaaS
  • PaaS
  • Oversubscribed
  • Undersubscribed
  • Static Content
  • Dynamic Content
  • Database-Driven Dynamic Content
  • Control Panel
  • Main Database
  • Trend Server
  • Link Redirect Servers
  • Content Delivery Networks
  • Outage
  • Failure
  • Malfunction
  • MTBF
  • Innovate
  • Oncall
  • Soft launch
  • SRE
  • Stakeholders
  • Artifacts
  • Service Delivery Flow
  • Cycle Time
  • Deployment
  • Release Candidate
  • Release
  • Domain-Specific Language
  • Toil

Reference

  • The Practice of Cloud System Administration: DevOps and SRE Practices for Web Services, Volume 2 (Thomas A. Limoncelli, Strata R. Chalup, Christina J. Hogan)
  • https://www.atlassian.com/microservices/microservices-architecture/distributed-architecture
  • https://en.wikipedia.org/wiki/Distributed_computing
  • https://aws.amazon.com/builders-library/challenges-with-distributed-systems/
  • Understanding Distributed Systems (Roberto Vitillo)
  • Foundations of Scalable Systems (Ian Gorton)
  • Distributed Systems: Concepts and Design, 5th ed. (Pearson, 2011)
  • Designing Distributed Systems: Patterns and Paradigms for Scalable, Reliable Services (Brendan Burns)
  • The Art of Scalability: Scalable Web Architecture, Processes, and Organizations for the Modern Enterprise by Abbott and Fisher (2009)
  • Scalability Rules: 50 Principles for Scaling Web Sites, also by Abbott and Fisher (2011)

Disclaimer
  1. Licensed under CC BY-NC 4.0
  2. For copyright issues, contact me#imzye.me (replace # with @)
  3. Not all commands and scripts have been tested in a production environment; use at your own risk
  4. No privacy information is collected here