Red Hat Performance Tuning: Linux in Physical, Virtual and Cloud
RH442
Course Objectives and Structure
Orientation to the Classroom Lab Environment
Chapter 1: Introducing Performance Tuning
Goal: Describe performance tuning concepts and goals.
Objectives:
- Define performance tuning goals, expectations, and terminology.
- Discuss recommended performance tuning methods of analysis.
- Define the standard units of measure and conversion arithmetic.
Defining Performance Tuning Concepts
What is Performance Tuning?
Performance Tuning Concepts
Quiz: Defining Performance Tuning Concepts
Discussing Performance Tuning Methodology
Analyzing System Performance
The USE Method Workflow
Applying Changes to a System
Quiz: Discussing Performance Tuning Methodology
Describing Computer Units of Measure
Guided Exercise: Converting Computer Units of Measure
Summary
- Performance tuning concepts.
- The trade-offs in performance tuning when you face security vulnerabilities such as MDS (Microarchitectural Data Sampling).
- The importance of the USE Method in performance analysis.
- The different measurement systems used in the IT world.
- How to convert performance metrics from one unit of measure to another (a worked example follows).
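For instance, the gap between binary (KiB, MiB, GiB) and decimal (KB, MB, GB) prefixes is the usual source of conversion errors; a quick sketch using shell arithmetic:

    echo $(( 2 * 1024 ** 3 ))    # 2 GiB in bytes: 2 * 2^30 = 2147483648
    echo $(( 2 * 1000 ** 3 ))    # 2 GB in bytes:  2 * 10^9 = 2000000000
    # The binary unit is about 7% larger at this scale.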
Chapter 2: Selecting Performance Monitoring Tools
Goal: Evaluate the large selection of performance monitoring tools that are included with Red Hat Enterprise Linux.
Objectives:
- Identify the common system monitoring tools and explain the purpose of each tool.
- Describe the sysstat architecture and explain typical sysstat utility command use.
- Describe the Performance Co-Pilot utility and its standardized data structures.
Identifying System Monitoring Tools
Linux Performance Observability Tools by Brendan Gregg (CC BY-SA 4.0)
CPU usage history in System Monitor
Memory map of a process in System Monitor
Guided Exercise: Identifying System Monitoring Tools
Describing the Sysstat Package Utilities
Accessing the Sysstat Package Utilities
Monitoring Disk I/O Usage
Monitoring Virtual Memory Usage
Generating System Activity Reports
Linux Performance Observability: sar by Brendan Gregg (CC BY-SA 4.0)
Guided Exercise: Viewing the Sysstat Package Utilities
Collecting Performance Data with Performance Co-Pilot
Performance Co-Pilot Overview
Gathering and Displaying Performance Data
Plotting Performance Metric Data Using a Graphic Utility
The pmchart view without any charts configured
The pmchart Available Metrics item selection
Graph displaying the number of active processes (nprocs)
Replaying data from the log
Guided Exercise: Collecting Performance Data with Performance Co-Pilot
Lab: Selecting Performance Monitoring Tools
Summary
- One of the goals of system monitoring is to determine whether the current execution of system components meets the specified technical requirements.
- The sysstat package utilities derive raw data from kernel counters, and provide performance monitoring of CPU usage, disk I/O, process usage, memory usage, and more (typical invocations are sketched after this summary).
- Performance Co-Pilot (PCP) packages provide the framework for monitoring and management of real-time data, as well as logging and retrieval of historical data. System monitoring tools are available in both command-line and graphical user interfaces.
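As an illustration (assuming the sysstat and pcp packages are installed), typical invocations look like this:

    iostat -dx 2 5                 # extended disk statistics, 5 samples at 2-second intervals
    vmstat 2 5                     # virtual memory, run queue, and CPU summary
    sar -u 2 5                     # live CPU utilization samples
    pminfo -d kernel.all.nprocs    # describe a Performance Co-Pilot metric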
Chapter 3: Viewing Hardware Resources
Goal: View and interpret hardware resource listings.
Objectives:
Displaying Physical Resources
Linux Static Performance Tools by Brendan Gregg (CC BY-SA 4.0)
Reviewing Kernel Messages
Retrieving CPU Information
Retrieving SMBIOS/DMI Information
Retrieving Peripheral Information
Collecting System Component Information
Guided Exercise: Displaying Physical Resources
Displaying Resources in Virtual and Cloud Instances
Viewing Hardware Resources From a Host
Observing Guest and Host Events
Guided Exercise: Displaying Resources in Virtual and Cloud Instances
Quiz: Viewing Hardware Resources
Summary
- Tuning a system for performance optimization starts with a hardware profile.
- Identifying hardware on a system, and understanding the theoretical capabilities, requires familiarity with commands that discover hardware resource details. Commands that profile hardware include: dmesg, dmidecode, lspci, lscpu, lsusb, lshw, and lstopo (a first-pass survey is sketched after this summary).
- Profiling system performance is essential to right-sizing virtual system instances for optimal performance and operation costs. Commands that profile performance include: kvm_stat, to analyze host and guest behavior, and perf kvm, to analyze performance counters on both host and guest systems.
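For example, a first-pass hardware survey might look like this (root privileges are assumed for dmidecode):

    lscpu                  # CPU model, socket/core/thread topology, cache sizes
    dmidecode -t memory    # DIMM population and speeds from SMBIOS/DMI
    lspci -v               # PCI devices with driver and IRQ details
    lsusb                  # attached USB devices
    lstopo                 # hardware topology view from the hwloc package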
Chapter 4: Configuring Kernel Tunables and Tuned Profiles
Goal: Configure the operating system to tune for different workload requirements.
Objectives:
- Configure settings for both the static kernel and for dynamic loadable kernel modules and drivers.
- Select an appropriate tuned profile to use, based on a system's most common workloads.
- Create customized tuned profiles for unique workloads.
Configuring Kernel Tunables
Introducing the Proc File System
Introducing Kernel Tunables
Modifying Kernel Tunables
Linux Performance Tuning Tools by Brendan Gregg (CC BY-SA 4.0)
Introducing the Sysfs File System
Configuring Module Parameters
Guided Exercise: Configuring Kernel Tunables
Selecting a Tuned Profile
Tuning System Performance
Installing and Enabling Tuned
Selecting a Tuning Profile
Managing Profiles from the Command Line
Managing Profiles with Web Console
Web Console privileged login
Active performance profile
Select a preferred performance profile
Verify active performance profile
Guided Exercise: Selecting a Tuned Profile
Customizing Tuned Profiles
Creating Custom Tuned Profiles
Inheritance in tuned profile
Guided Exercise: Customizing Tuned Profiles
Lab: Configuring Kernel Tunables and Tuned Profiles
Summary
- Kernel tunables customize the behavior of Red Hat Enterprise Linux at boot, or on a running system. The sysctl command is used to list, read, and set kernel tunables (see the sketch after this summary).
- Installing the tuned package also presets a default profile, chosen as the best available match for the system.
- Tuned also monitors the use of system components, and tunes system settings dynamically, based on that monitoring information.
- A custom tuned profile can be based on other profiles, but include only certain aspects of the parent profile.
- The monitoring and tuning plug-ins, which are part of a tuned profile, enable monitoring and optimizing different devices on the system.
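A minimal sketch of both mechanisms; the profile name and tunable values are illustrative, not recommendations:

    sysctl vm.swappiness                        # read one tunable
    sysctl -w vm.swappiness=10                  # set it on the running system (not persistent)
    echo 'vm.swappiness = 10' > /etc/sysctl.d/90-example.conf    # persist across reboots
    tuned-adm active                            # show the active tuned profile
    tuned-adm profile throughput-performance    # switch profiles

A custom profile is a directory under /etc/tuned containing a tuned.conf file; the include= line in its [main] section implements the profile inheritance described above:

    # /etc/tuned/example-profile/tuned.conf (hypothetical profile)
    [main]
    include=throughput-performance

    [sysctl]
    vm.swappiness=10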
Chapter 5: Managing Resource Limits with Control Groups
Goal: Manage resource contention and set limits for resource use on services, applications, and users using cgroup configuration.
Objectives:
- Configure resource settings for individual and groups of services, applications, and users to control system resource sharing.
- Customize service, application, and user resource limits using cgroup plug-in parameters.
Limiting System Resources with ulimit
Describing Control Groups
Example of systemd slices
Guided Exercise: Managing Resource Limits
Customizing Control Groups
Managing systemd cgroup Settings
cgroups v1 Hierarchy
cgroups v2 Hierarchy
Guided Exercise: Customizing Control Groups
Lab: Managing Resource Limits with Control Groups
Summary
- How to control system resources using POSIX limits with the help of systemd.
- How to activate cgroups at runtime from the command line.
- Why using cgroups integrated with systemd is the recommended way to use control groups (sketched after this summary).
- How to use customized cgroup slices for granular management of system resources.
- The significant changes in the technical preview of cgroups v2.
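For instance, resource limits can be applied and observed from the command line (httpd.service is a placeholder unit; MemoryMax applies on cgroups v2, and MemoryLimit is its cgroups v1 counterpart):

    systemctl set-property httpd.service CPUQuota=40%      # cap CPU time for the unit
    systemctl set-property httpd.service MemoryMax=512M    # memory ceiling (cgroups v2)
    systemd-cgls                                           # display the cgroup hierarchy
    systemd-cgtop                                          # live per-cgroup resource usage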
Chapter 6: Analyzing Performance Using System Tracing Tools
Goal: Diagnose system and application behaviors using a variety of resource-specific tracing tools.
Objectives:
- Profile system events and observe performance counters with the perf command.
- Trace system and library calls for a process.
- Gather specific diagnostic data using the SystemTap infrastructure.
- Debug applications using tools instrumented by the in-kernel eBPF virtual machine.
Profiling with Performance Counters
Introduction to Performance Counters
Describing Frequently Used PERF Tools
Guided Exercise: Profiling with Performance Counters
Tracing System and Library Calls
System Calls and Library Calls
Guided Exercise: Tracing System and Library Calls
Gathering Diagnostic Data Using SystemTap
Introduction to SystemTap
Installing a SystemTap Host System
Installing SystemTap Using stap-prep
Installing a SystemTap Target Runtime
Performance Analysis With the SystemTap Examples
SystemTap Examples Installed Locally
Guided Exercise: Gathering Diagnostic Data Using SystemTap
Tracing System Events with eBPF Tools
Linux bcc/BPF Training Tools by Brendan Gregg (CC BY-SA 4.0)
Introduction to BPF Compiler Collection
Guided Exercise: Tracing System Events with eBPF Tools
Lab: Analyzing Performance Using Tracing Tools
Summary
- The perf command allows administrators to observe performance counters that track hardware and software events, including:
- Number of instructions executed.
- Cache misses.
- Context switches.
- The perf CLI can collect events in real time, or record events for later reporting (sketched after this summary).
- The SystemTap framework allows easy probing and instrumentation of almost any component within the kernel. SystemTap scripts specify where to attach probes and what data to collect when the probe executes.
- The Extended Berkeley Packet Filter (eBPF) is a performance analysis tool that uses front-end applications, such as the BPF Compiler Collection (BCC), to perform dynamic and static tracing of profiling events.
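A brief sketch of these tools in action; sleep 5 stands in for a real workload, and the bcc-tools path shown is the usual install location:

    perf stat -e instructions,cache-misses,context-switches -- sleep 5   # count events live
    perf record -g -- sleep 5        # sample with call graphs for later reporting
    perf report                      # browse the recorded samples
    /usr/share/bcc/tools/execsnoop   # eBPF-based trace of new process execution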
Chapter 7: Tuning CPU Utilization
Goal: Manage CPU resource sharing and scheduling to control utilization.
Objectives:
-
Configure CPU scheduling policies.
-
Configure CPU and interrupt affinity to control application isolation and CPU resource commitments.
-
Describe how the CPU caches are used by applications.
Configuring CPU Scheduling Policies and Tunables
Describing Process Priority
Process priorities
Defining Scheduling Policies
Real-time runtime scheduling
Introducing Deadline Scheduler
Deadline scheduler parameters
Setting Scheduling Options for Processes
Viewing Process Scheduler Statistics
Guided Exercise: Configuring CPU Scheduling Policies and Tunables
Configuring CPU Affinity and Isolation
CPU Affinity Can Increase Cache and Memory Effectiveness
Guided Exercise: Configuring CPU Affinity and Isolation
Hardware topology of Intel Core2 cache diagram
Hardware topology of Intel i7 cache diagram
Cache miss and cache line fill
Cache miss, cache line fill, cache snoop, and cache hit
Cache write through
Cache write back
Profiling CPU Cache Usage
Guided Exercise: Profiling Cache Use
Lab: Tuning CPU Utilization
Summary
- CFS is the scheduler for non-real-time processes; its policy is named SCHED_NORMAL. It is meant for interactive applications, to provide a better desktop experience.
- The SCHED_DEADLINE scheduler guarantees the scheduling of real-time tasks, even on heavily loaded systems.
- The irqbalance service adjusts CPU affinities to increase cache hits for interrupts. The CPUAffinity parameter limits the CPUs available to a service (examples are sketched after this summary).
- Runtime performance of a particular program largely depends on the efficiency of cache usage, since caches are faster than system memory.
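As a sketch (PID 1234 and the CPU list are placeholders):

    chrt -p 1234                     # show the scheduling policy and priority of a process
    chrt -f -p 50 1234               # switch it to SCHED_FIFO with real-time priority 50
    taskset -cp 0,2 1234             # pin the process to CPUs 0 and 2
    grep Cpus_allowed_list /proc/1234/status    # verify the resulting affinity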
Chapter 8: Tuning Memory Utilization
Goal: Manage settings for efficient memory utilization for different types of workloads.
Objectives:
-
Describe memory architecture, including virtual memory, caches, huge pages and process space.
-
Describe system paging behavior and configure paging for server workload requirements.
-
Describe the non-uniform memory architecture and the utilities for viewing and configuring NUMA topology.
-
Describe memory overcommitment in typical application design.
Configuring Memory Architecture
Explaining Virtual and Physical Memory
Virtual to physical memory translation
Monitoring Process Memory Usage
Introducing Page Tables and the TLB
Actual page table hierarchy on x86-64
TLB flowchart
Guided Exercise: Configuring Memory Architecture
Configuring Memory Paging and Reclamation
Explaining the System Memory and Page Cache
Reclaiming Anonymous Pages
Memory pressure and working set
Reclaiming Page Cache and Tuning Swappiness
Tuning Dirty Pages Cleaning
Handling Out-of-memory Events
Guided Exercise: Configuring Memory Paging and Reclamation
Configuring NUMA Topology
Introducing the Non-Uniform Memory Access architecture
UMA architecture
8-node NUMA topology
Managing Non-Uniform Memory Access
4-node NUMA system (showing only two nodes)
Managing Non-Uniform Memory for Virtual Machines
Guided Exercise: Configuring NUMA Topology
Managing Memory Overcommit
Introducing Memory Overcommitment
Virtual memory overcommit
Tuning Memory Overcommitment
Guided Exercise: Managing Memory Overcommit
Lab: Tuning Memory Utilization
Summary
- Memory is organized and allocated in pages, which are normally 4 KiB in size.
- Memory limits can be enforced on systemd units through systemd configuration files.
- The vm.nr_hugepages sysctl parameter defines the number of huge pages to allocate (see the sketch after this summary).
- The vm.dirty_background_ratio and vm.dirty_ratio sysctl tunables control when per-BDI flush threads start writing data to disk.
- The /proc/PID/oom_score_adj tunable adjusts the oom_score for a specific process.
- The numastat tool displays process memory statistics on a per-NUMA-node basis. numactl controls the process affinity for processor and memory.
- The vm.overcommit_memory sysctl parameter defines the system overcommit mode.
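A hedged sketch of these tunables and tools; PID 1234, the page count, and ./app are placeholders, not recommended values:

    sysctl -w vm.nr_hugepages=128                     # reserve 128 huge pages (2 MiB each on x86-64)
    sysctl vm.dirty_background_ratio vm.dirty_ratio   # inspect the dirty-page thresholds
    echo -500 > /proc/1234/oom_score_adj              # make one process less attractive to the OOM killer
    numastat -p 1234                                  # per-NUMA-node memory usage of a process
    numactl --cpunodebind=0 --membind=0 ./app         # run a program bound to NUMA node 0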
Chapter 9: Tuning Storage Device I/O
Goal: Manage settings for efficient disk utilization in various use cases.
Objectives:
-
Describe transaction-oriented and sequential I/O patterns and disk scheduling algorithms.
-
Describe basic RAID architecture to understand the correlation between application writes and RAID sizing alignment.
-
Describe multiple tool choices for analyzing disk I/O behavior.
Evaluating I/O Patterns and Scheduling Algorithms
Transitioning to Solid State Drives
Selecting an I/O Scheduler
Multiple queue I/O scheduling
Tuning Storage with Tuned Profiles
Simulating workloads with the fio tool
Guided Exercise: Evaluating I/O Patterns and Scheduling Algorithms
Reviewing RAID Fundamentals
Configuring RAID on Logical Volumes
Guided Exercise: Reviewing RAID Fundamentals
Selecting I/O Analysis Tools
Diagnosing Storage I/O Scenarios
Analyzing Storage Performance with Performance Co-pilot
Guided Exercise: Selecting I/O Analysis Tools
Lab: Tuning Storage Device I/O
Summary
- Red Hat Enterprise Linux 8 includes, by default, a new collection of multi-queue I/O schedulers for disks, replacing the older single-queue I/O schedulers (see the sketch after this summary).
- The Tuned profiles support the tuning of storage systems through plugins like disk or sysfs.
- RAID is an array of disks, usually inexpensive disks, configured together as a single logical storage unit to support increased performance, redundancy, and fault tolerance.
- Red Hat Enterprise Linux supports the creation and management of software RAID with the mdadm utility.
- Several utilities are available to diagnose I/O: iostat, iotop, btrace, blkparse, btt, and blkiomon.
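For illustration (sda and the vdb/vdc/vdd members are placeholder devices):

    cat /sys/block/sda/queue/scheduler                  # available schedulers; active one in brackets
    echo mq-deadline > /sys/block/sda/queue/scheduler   # switch the scheduler at runtime
    mdadm --create /dev/md0 --level=5 --raid-devices=3 /dev/vdb /dev/vdc /dev/vdd
    iostat -dx 2                                        # watch per-device utilization and latency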
Chapter 10: Tuning File System Utilization
Goal: Manage application efficiency for file system utilization.
Objectives:
Managing File System Attributes
File System Formatting Options
File System Mount Options
Benchmarking File System Performance
Guided Exercise: Managing File System Attributes
Managing File System Journaling
Guided Exercise: Managing File System Journaling
Lab: Tuning File System Utilization
Summary
- The different use cases of the XFS and ext4 file systems.
- How to tune the XFS and ext4 file systems for specific workloads, with formatting and mount options.
- How to configure the XFS and ext4 file systems with external journaling (formatting, mount, and journaling options are sketched after this summary).
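A sketch, assuming placeholder devices and a hypothetical RAID geometry:

    mkfs.xfs -d su=64k,sw=4 /dev/vdb1          # align XFS to a 64 KiB stripe unit, 4 stripes wide
    mount -o noatime,logbsize=256k /dev/vdb1 /data
    mke2fs -O journal_dev /dev/vdc1            # create an external journal device
    mkfs.ext4 -J device=/dev/vdc1 /dev/vdb2    # format ext4 to use that external journal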
Chapter 11: Tuning Network Utilization
Goal: Manage application efficiency for network utilization.
Objectives:
Configuring for Network Latency and Throughput
Packet Transmission and Reception
Kernel Tunables for Networking
Calculating Bandwidth Delay Product
Buffer bloat
Guided Exercise: Configuring for Network Latency and Throughput
Configuring Network Driver Parameters
Network Performance Tuning
Configuring Interface Teams
Persistent Team Configuration
Troubleshooting Team Interfaces
Guided Exercise: Configuring the Network Link Layer
Lab: Tuning Network Utilization
Summary
- Network buffer sizes can be manipulated through sysctl tunables such as net.core.rmem_max, net.core.wmem_max, net.ipv4.tcp_rmem, and net.ipv4.tcp_wmem (see the sketch after this summary).
- Protocol overhead can account for a significant percentage of a data packet's content. Overhead can be reduced by increasing the MTU to make use of jumbo frames.
- The ethtool utility can be used to display and modify network card settings.
- The qperf utility can be used to measure network performance between two systems.
- Network teaming configuration is done with NetworkManager.
- Some teaming runners improve throughput.
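For example, TCP throughput is capped when socket buffers are smaller than the bandwidth-delay product (BDP). A sketch, assuming a hypothetical 10 Gbit/s link with a 2 ms round-trip time (eth0 is a placeholder interface):

    echo $(( 10 * 1000 ** 3 / 8 * 2 / 1000 ))          # BDP = 10 Gbit/s * 2 ms = 2500000 bytes
    sysctl -w net.core.rmem_max=2500000                # allow receive buffers up to the BDP
    sysctl -w net.ipv4.tcp_rmem='4096 87380 2500000'   # min, default, max TCP receive buffer
    ethtool eth0                                       # link speed and negotiated settings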
Chapter 12: Performing Tuning in Virtualization Environments
Goal: Distinguish the requirements for tuning in virtualized environments.
Objectives:
Tuning Hosts in Virtualization Environments
Tuning Virtualization Hosts
Describing Kernel Samepage Merging
Tuning Disk and Block I/O
Setting Limits on KVM Guests
Guided Exercise: Tuning Hosts in Virtualization Environments
Monitoring Performance Metrics in a Virtualized Environment
Retrieving Performance Metrics
Retrieving Metrics Using Prometheus
Prometheus architecture
Using Grafana for Visualization
Grafana web UI
Adding a Prometheus data source to Grafana
Guided Exercise: Monitoring Performance Metrics in a Virtualized Environment
Lab: Performing Tuning in Virtualization Environments
Summary
- The performance impact of cache misses on a NUMA-based system is significant. Therefore, virtual CPU pinning and NUMA tuning must be configured together.
- Kernel Samepage Merging (KSM) scans for identical memory pages and consolidates them into Copy-On-Write (COW) shared pages, which helps to reduce physical memory consumption.
- Prometheus determines what to scrape using a static configuration defined in the static_configs section of the prometheus.yml file (a minimal fragment is sketched below).
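For illustration, a scrape job in prometheus.yml might look like the following; the job name and target are placeholders for any endpoint that serves Prometheus-format metrics:

    # prometheus.yml (fragment, hypothetical target)
    scrape_configs:
      - job_name: 'pcp'
        static_configs:
          - targets: ['host1.example.com:44323']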
Chapter 13: Comprehensive Review
Lab: Tuning for a Computation-heavy Application
Lab: Tuning for a Storage Intensive Application
Lab: Tuning for a Large Memory Implementation