SRE at a glance
Introduction
Site Reliability Engineering (SRE) is a relatively new discipline that applies software engineering practices to operations. SRE teams are responsible for the reliability, performance, and efficiency of complex systems. Becoming an SRE requires solid computer science fundamentals and a deep knowledge of distributed systems, networking, and cloud infrastructure. This post starts with the learning materials available at the School of SRE, then lists further books, courses, and tutorials by topic.
Online Courses
The School of SRE offers several self-paced online courses covering a wide range of SRE topics, with a mix of lectures, hands-on labs, and quizzes to reinforce what you learn. Available courses include “Introduction to SRE”, “Distributed Systems”, “Cloud Infrastructure”, “Networking”, and “Monitoring and Alerting”. They suit learners who prefer a structured experience and want to earn certificates upon completion.
https://linkedin.github.io/school-of-sre/
https://github.com/bregman-arie/sre-checklist
https://github.com/upgundecha/howtheysre
https://github.com/bregman-arie/devops-exercises
https://sre.google/books/
https://docs.microsoft.com/en-us/azure/site-reliability-engineering/resources/books
https://www.oreilly.com/library/view/seeking-sre/9781491978856/
https://opensource.com/article/18/10/sre-startup
https://stackpulse.com/blog/site-reliability-engineering-sre-what-why-and-5-best-practices/
https://www.usenix.org/blog/what-is-sre-how-does-it-relate-to-devops-lisa18
https://www.bmc.com/blogs/sre-vs-devops/
https://cloud.google.com/blog/products/management-tools/sre-error-budgets-and-maintenance-windows
https://www.atlassian.com/incident-management/kpis/error-budget
https://devopsinstitute.com/choosing-the-right-service-level-indicators/
https://www.observability.splunk.com/en_us/infrastructure-monitoring/guide-to-sre-and-the-four-golden-signals-of-monitoring.html
https://www.enov8.com/blog/site-reliability-engineering-sre-top-10-best-practice/
https://www.blameless.com/blog/5-best-practices-nailing-postmortems
https://learnxinyminutes.com/
Book & Course
- (Book) Site Reliability Engineering -
https://landing.google.com/sre/book/index.html
- (Book) The Site Reliability Workbook -
https://landing.google.com/sre/workbook/toc/
- (Book) Building Secure and Reliable Systems -
https://landing.google.com/sre/resources/foundationsandprinciples/srs-book/
- (Course) Intro to DevOps -
https://www.udacity.com/course/intro-to-devops--ud611
- (Course) Google Cloud Platform for Systems Operations -
https://www.coursera.org/specializations/gcp-sysops
- (Course) Measuring and Managing Reliability -
https://www.coursera.org/learn/site-reliability-engineering-slos
Operating Systems
- (Course) Introduction to Operating Systems -
https://www.udacity.com/course/introduction-to-operating-systems--ud923
- (Course) Advanced Operating Systems -
https://www.udacity.com/course/advanced-operating-systems--ud189
Automation
- (Tutorial) Ansible -
https://www.digitalocean.com/community/tutorials/configuration-management-101-writing-ansible-playbooks
- (Course) Terraform -
https://www.udemy.com/course/learn-devops-infrastructure-automation-with-terraform/
Distributed Systems
- (Tutorial) Introduction to Distributed Systems Design -
http://www.hpcs.cs.tsukuba.ac.jp/~tatebe/lecture/h23/dsys/dsd-tutorial.html
Networking
- (Book) Understanding Linux Network Internals -
http://shop.oreilly.com/product/9780596002558.do
Programming Languages
Python
- (Book) Learn Python 3 The Hard Way -
https://learnpythonthehardway.org/python3/
- (Course) Developing Scalable Apps in Python -
https://www.udacity.com/course/developing-scalable-apps-in-python--ud858
Go
- (Book) The Go Programming Language -
https://www.amazon.com/Programming-Language-Addison-Wesley-Professional-Computing/dp/0134190440
- (Webinar) Go Language for Ops and Site Reliability Engineering -
https://www.youtube.com/watch?v=Q_H4hrUez80
- (Hands On)
https://gopherlabs.kubedaily.com/
Production Web App
- (Tutorial) Building for Production: Web Applications -
https://www.digitalocean.com/community/tutorial_series/building-for-production-web-applications
- (Book) Production-Ready Microservices -
https://www.amazon.com/gp/product/1491965975/
Monitoring and Logging
- (Course) Monitoring and Alerting with Prometheus -
https://www.udemy.com/course/monitoring-and-alerting-with-prometheus/
- (Book) Prometheus: Up & Running -
https://www.amazon.com/Prometheus-Infrastructure-Application-Performance-Monitoring/dp/1492034142
Continuous Integration | Continuous Delivery
- (Course) Learn DevOps: Continuously Deliver Better Software -
https://www.udemy.com/course/learn-devops-continuously-deliver-better-software/
Containers
- (Course) Docker for DevOps -
https://www.udemy.com/course/docker-tutorial-for-devops-run-docker-containers/
Web Servers
Nginx
- (Course) Nginx Fundamentals -
https://www.udemy.com/course/nginx-fundamentals/
Cluster Management
Kubernetes
- (Tutorial) Kubernetes Bootcamp -
https://kubernetes.io/docs/tutorials/kubernetes-basics/
- (Course) Scalable Microservices with Kubernetes -
https://www.udacity.com/course/scalable-microservices-with-kubernetes--ud615
- (Tutorial) Kubernetes Tutorial for Beginners -
https://spacelift.io/blog/kubernetes-tutorial
Cloud
Amazon AWS
- (Tutorial) Amazon AWS -
https://aws.amazon.com/getting-started/tutorials/
Post-Mortem
- Post-Mortem Template -
https://sre.google/sre-book/example-postmortem/
Websites
https://highscalability.com
https://sreweekly.com
https://sre.news
DevOps | SRE Roadmap
- DevOps Roadmap -
https://roadmap.sh/devops
SRE Interview
https://github.com/michaelkkehoe/sre-interview
Disclaimer
- Welcome to the SRE and DevOps knowledge base!
- Licensed under CC BY-NC 4.0
- Made with Material for MkDocs; writing improved with generative AI tools
- For copyright issues, contact me#imzye.com (replace # with @)