BPB Online LLP
Hands-on Site Reliability Engineering
US$ 19.95

A comprehensive guide with basic to advanced SRE practices and hands-on examples.

Key Features
● Demonstrates how to practice site reliability engineering, along with its fundamental concepts.
● Illustrates real-world examples and successful techniques to put SRE into production.
● Introduces you to DevOps, advanced techniques of SRE, and popular tools in use.

Description
Hands-on Site Reliability Engineering (SRE) is a guide to learning and practicing the activities that keep enterprise systems running smoothly, from designing and deploying enterprise software to operating it efficiently and reliably at scale.

The book explores the fundamentals of SRE and the related terms, concepts, and techniques used by SRE teams and experts. It discusses the essential elements of an IT system, including microservices, application architectures, types of software deployment, and concepts like load balancing. It explains techniques for delivering timely software releases using containerization and CI/CD pipelines. It also covers how to track and monitor application performance using Grafana, Prometheus, and Kibana, and how to extend monitoring by building full-stack observability into the system.

The book also covers chaos engineering, types of system failures, designing for high availability, DevSecOps, and AIOps.

What you will learn
● Learn the best techniques and practices for building and running reliable software.
● Explore observability and popular methods for effective monitoring of applications.
● Work with SLIs, SLOs, Error Budgets, and Error Budget Policies to manage failures.
● Learn to practice continuous software delivery using blue/green and canary deployments.

Who this book is for
This book caters to experienced IT professionals, application developers, software engineers, and all those who are looking to develop SRE capabilities at the individual or team level.

Table of Contents
1. Understand the World of IT
2. Introduction to DevOps
3. Introduction to SRE
4. Identify and Eliminate Toil
5. Release Management
6. Incident Management
7. IT Monitoring
8. Observability
9. Key SRE KPIs: SLAs, SLOs, SLIs, and Error Budgets
10. Chaos Engineering
11. DevSecOps and AIOps
12. Culture of Site Reliability Engineering

Language
English
ISBN
9789391030322
Cover Page
Title Page
Copyright Page
Foreword
Dedication Page
About the Authors
About the Reviewer
Acknowledgement
Preface
Errata
Table of Contents
1. Understanding the World of IT
Structure
Objective
What is the role of IT in an organization?
Hardware availability
Core software services
Compliance and security
Application development and hosting
Enterprise Architecture (EA)
Software delivery
Understanding the IT organization structure
Role of infrastructure teams
Data centers
Virtualization
Containerization
On-premise infrastructure
Cloud infrastructure
Development and deployment platforms
Role of application teams
Cross-functional development teams
DevOps teams
Production support/operations teams
IT security
Change management team
The TCP/IP protocol suite
Domain Name System
Conclusion
Multiple choice questions
Answers
2. Introduction to DevOps
Structure
Objective
Introduction to DevOps
DevOps principles and practices
DevOps principles
DevOps practices
Benefits of DevOps
Overview of DevOps tools
Git
Ansible
Jenkins
Conclusion
Multiple choice questions
Answers
3. Introduction to SRE
Structure
Objective
DevOps and SRE
Rise of internet companies
SRE overview
SRE terms
SRE team responsibilities
Skill set of SREs
Conclusion
Multiple choice questions
Answers
4. Identify and Eliminate Toil
Structure
Objective
Understanding toil
Importance of eliminating toil
Process optimization with automation
Examples of toil with approaches to automate
Purging and archiving of files
Purging of database tables
Installation/Patching
Monitoring
Checking log files
Identity and Access Management
Vulnerability scans
Infrastructure provisioning/decommissioning
Incident management
Conclusion
Multiple choice questions
Answers
5. Release Management
Structure
Objective
Understanding release management
Release planning
Build package
Test for quality and security
Deployment
Release automation with CI/CD
Using IaC for release management
Blue-green deployments
Canary deployments
Conclusion
Multiple choice questions
Answers
6. Incident Management
Structure
Objective
Understanding incident management
Incident
Incident lifecycle
Blameless postmortems
Incident example
Incident detection/notification
Incident triage
Incident communication
Incident resolution
Incident retrospective/postmortem
Incident knowledge base
Role of development teams
Conclusion
Multiple choice questions
Answers
7. IT Monitoring
Structure
Objective
End-to-end monitoring strategy
Infrastructure monitoring
Server monitoring
Network monitoring
Storage monitoring
Application monitoring
Probes
Checking logs
Capturing processing time
MQ monitoring
Database monitoring
End user monitoring
DNS monitoring
Monitoring tools
Agents
Transport
Collectors
Data transformation
Storage
Alerting
Dashboarding
Prometheus
Metricbeat
Grafana
ElastAlert
Conclusion
Multiple choice questions
Answers
8. Observability
Structure
Objective
Goals of observability
Service reliability
Operational efficiency
Security and compliance
Three pillars of observability
Standardized libraries/APIs/SDKs
Standardized trace context
Tracers
Cardinality attributes
Open source libraries and tools
Filebeat
Logstash
Fluentd
OpenTelemetry
Conclusion
Multiple choice questions
Answers
9. Key SRE KPIs: SLAs, SLOs, SLIs, and Error Budgets
Structure
Objective
Key metrics for SRE
Service level indicator (SLI)
Service level objective (SLO)
Service level agreement (SLA)
Error budgets
Error budget policy
Conclusion
Multiple choice questions
Answers
10. Chaos Engineering
Structure
Objective
Introducing chaos engineering
Application/service unavailability
Network delays
Network failures
Resource unavailability
Configuration errors
Database failures
Chaos engineering process
Define steady state
Build a hypothesis
Minimize blast radius
Inject the failure condition
Verify hypothesis
Reverse failure condition
Fix any issues
Automate to run continuously
Chaos GameDays
Injecting failures
Killing a process
Network failures
HTTP failures
Injecting multiple failures
Techniques for building resiliency
Single points of failure
Rate limiting/throttling
Circuit breaker
Handle retry storms
Conclusion
Multiple choice questions
Answers
11. DevSecOps and AIOps
Structure
Objective
Understanding DevSecOps
Code scanning for security
Secure releases using Infrastructure as Code
Introduction to AIOps
Use cases with AIOps
Intelligent alerting
Noise reduction
Automated root cause analysis
Automated remediation
ChatOps
ChatOps example with Rasa, Flask, and Telegram
Conclusion
Multiple choice questions
Answers
12. Culture of Site Reliability Engineering
Structure
Objective
Breaking silos in the organization
Embracing risk
Continuous improvement
Intelligent automation
Shift-left mindset
Conclusion
Multiple choice questions
Answers
Index