Red Hat Performance Tuning: Linux in Physical, Virtual and Cloud

RH442

Welcome

Course Objectives and Structure

Schedule

Orientation to the Classroom Lab Environment

Internationalization

Chapter 1: Introducing Performance Tuning

Goal: Describe performance tuning concepts and goals.


Objectives:

  • Define performance tuning goals, expectations, and terminology.

  • Discuss recommended performance tuning methods of analysis.

  • Define the standard units of measure and conversion arithmetic.

Defining Performance Tuning Concepts

What is Performance Tuning?

Performance Tuning Concepts

Quiz: Defining Performance Tuning Concepts

Discussing Performance Tuning Methodology

Analyzing System Performance

The USE Method Workflow

Applying Changes to a System

Quiz: Discussing Performance Tuning Methodology

Describing Computer Units of Measure

Decimal and Binary Units

Unit Conversions

Guided Exercise: Converting Computer Units of Measure

Summary

  • Performance tuning concepts.
  • The trade-offs in performance tuning when you face security vulnerabilities like MDS.
  • The importance of the USE Method in performance analysis.
  • The different measurement systems used in the IT world.
  • How to convert performance metrics from one unit of measure to another.
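
For example, converting between decimal and binary units is plain arithmetic: a disk marketed as 500 GB holds 500 × 10⁹ bytes, which is about 465 GiB (500 × 10⁹ ÷ 2³⁰). A minimal shell sketch of that conversion:

    # Convert 500 decimal gigabytes (10^9 bytes) to binary gibibytes (2^30 bytes).
    bytes=$((500 * 10**9))
    echo "scale=2; $bytes / 2^30" | bc    # prints 465.66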

Chapter 2: Selecting Performance Monitoring Tools

Goal: Evaluate the large selection of performance monitoring tools that are included with Red Hat Enterprise Linux.


Objectives:

  • Identify the common system monitoring tools and explain the purpose of each tool.

  • Describe the sysstat architecture and explain typical sysstat utility command use.

  • Describe the Performance Co-Pilot utility and its standardized data structures.

Identifying System Monitoring Tools

System Monitoring

Linux Performance Observability Tools by Brendan Gregg (CC BY-SA 4.0)

Monitoring Tools

CPU usage history in System Monitor

Memory map of a process in System Monitor

Guided Exercise: Identifying System Monitoring Tools

Describing the Sysstat Package Utilities

Accessing the Sysstat Package Utilities

Monitoring CPU Usage

Monitoring Disk I/O Usage

Monitoring Process Usage

Monitoring Virtual Memory Usage

Generating System Activity Reports

Linux Performance Observability: sar by Brendan Gregg (CC BY-SA 4.0)

Using Systemd Timers

Guided Exercise: Viewing the Sysstat Package Utilities

Collecting Performance Data with Performance Co-Pilot

Performance Co-Pilot Overview

Gathering and Displaying Performance Data

Plotting Performance Metric Data Using a Graphic Utility

The pmchart view without any charts configured

The pmchart Available Metrics item selection

Graph displaying the number of active processes (nprocs)

Replaying data from the log

Guided Exercise: Collecting Performance Data with Performance Co-Pilot

Lab: Selecting Performance Monitoring Tools

Summary

  • One of the goals of system monitoring is to determine whether the current execution of system components meets the specified technical requirements.
  • The sysstat package utilities derive raw data from kernel counters, and provide performance monitoring of CPU usage, disk I/O, process usage, memory usage, and more.
  • Performance Co-Pilot (PCP) packages provide the framework for monitoring and management of real-time data, as well as logging and retrieval of historical data. System monitoring tools are available in both command-line and graphical user interfaces.
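
As a brief illustration of the command-line side (a sketch; the sampling intervals and metric name are examples), both sysstat and PCP can be queried interactively:

    # sysstat: sample CPU utilization five times at two-second intervals.
    sar -u 2 5

    # sysstat: read back recorded activity from a daily data file.
    sar -u -f /var/log/sa/sa$(date -d yesterday +%d)

    # PCP: print a live metric value (requires the pmcd service to be running).
    pminfo -f kernel.all.load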

Chapter 3: Viewing Hardware Resources

Goal: View and interpret hardware resource listings.


Objectives:

  • Describe the utilities that list system hardware resources.

  • Describe the techniques used to list resources in virtual and cloud instances.

Displaying Physical Resources

Hardware Resources

Linux Static Performance Tools by Brendan Gregg (CC BY-SA 4.0)

Reviewing Kernel Messages

Retrieving CPU information

Retrieving SMBIOS/DMI information

Retrieving Peripheral Information

Collecting System Component Information

Guided Exercise: Displaying Physical Resources

Displaying Resources in Virtual and Cloud Instances

Viewing Hardware Resources From a Host

Observing Guest and Host Events

Guided Exercise: Displaying Resources in Virtual and Cloud Instances

Quiz: Viewing Hardware Resources

Summary

  • Tuning a system for performance optimization starts with a hardware profile.
  • Identifying hardware on a system, and understanding the theoretical capabilities, requires familiarity with commands that discover hardware resource details. Commands that profile hardware include:
    • dmesg, dmidecode
    • lspci, lscpu, lsusb, lshw, and lstopo
  • Profiling system performance is essential to right-sizing virtual system instances for optimal performance and operation costs. Commands that profile performance include:
    • kvm_stat to analyze host and guest behavior.
    • perf kvm to analyze performance counters on both host and guest systems.
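
A few representative invocations of these commands (a sketch; output and device names vary by platform, and <PID> is a placeholder):

    dmesg | grep -i memory           # kernel messages about detected memory
    lscpu                            # CPU topology, caches, and flags
    dmidecode -t memory              # SMBIOS/DMI details for installed DIMMs
    lspci -v                         # PCI devices, with verbose detail
    kvm_stat                         # live KVM event counters, run on the host
    perf kvm stat record -p <PID>    # record guest event counters for a process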

Chapter 4: Configuring Kernel Tunables and Tuned Profiles

Goal: Configure the operating system to tune for different workload requirements.


Objectives:

  • Configure settings for both the static kernel and for dynamic loadable kernel modules and drivers.

  • Select an appropriate tuned profile to use, based on a system's most common workloads.

  • Create customized tuned profiles for unique workloads.

Configuring Kernel Tunables

Introducing the Proc File System

Introducing Kernel Tunables

Modifying Kernel Tunables

Linux Performance Tuning Tools by Brendan Gregg (CC BY-SA 4.0)

Introducing the Sysfs File System

Configuring Module Parameters

Guided Exercise: Configuring Kernel Tunables

Selecting a Tuned Profile

Tuning System Performance

Installing and Enabling Tuned

Selecting a Tuning Profile

Managing Profiles from the Command Line

Managing Profiles with Web Console

Web Console privileged login

Active performance profile

Select a preferred performance profile

Verify active performance profile

Defining Tuned Profiles

Guided Exercise: Selecting a Tuned Profile

Customizing Tuned Profiles

Creating Custom Tuned Profiles

Inheritance in tuned profiles

Guided Exercise: Customizing Tuned Profiles

Lab: Configuring Kernel Tunables and Tuned Profiles

Summary

  • Kernel tunables customize the behavior of Red Hat Enterprise Linux at boot, or on a running system. The sysctl command is used to list, read, and set kernel tunables.
  • Installing the tuned package also presets the active profile, selected as the best available match for the system.
  • Tuned also monitors the use of system components, and tunes system settings dynamically, based on that monitoring information.
  • A custom tuned profile can be based on other profiles, but include only certain aspects of the parent profile.
  • The monitoring and tuning plug-ins, which are part of a tuned profile, enable monitoring and optimizing different devices on the system.
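
A sketch of the typical commands, plus a minimal custom profile (the profile name myprofile is a hypothetical example):

    sysctl vm.swappiness                # read a kernel tunable
    sysctl -w vm.swappiness=10          # set it on the running system
    tuned-adm active                    # show the active tuned profile
    tuned-adm profile throughput-performance

    # /etc/tuned/myprofile/tuned.conf -- a child profile inheriting from a parent:
    #   [main]
    #   include=throughput-performance
    #   [sysctl]
    #   vm.swappiness=10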

Chapter 5: Managing Resource Limits with Control Groups

Goal: Manage resource contention and set limits for resource use on services, applications, and users using cgroup configuration.


Objectives:

  • Configure resource settings for individual and groups of services, applications, and users to control system resource sharing.

  • Customize service, application, and user resource limits using cgroup plug-in parameters.

Managing Resource Limits

Limiting System Resources with ulimit

Describing Control Groups

Example of systemd slices

Guided Exercise: Managing Resource Limits

Customizing Control Groups

Managing systemd cgroup Settings

Previewing cgroups v2

cgroups v1 Hierarchy

cgroups v2 Hierarchy

Guided Exercise: Customizing Control Groups

Lab: Managing Resource Limits with Control Groups

Summary

  • How to control system resources using POSIX limits with the help of systemd.
  • How to activate cgroups at runtime from the command line.
  • Why using cgroups integrated with systemd is the recommended way to use control groups.
  • How to use customized cgroups slices for granular management of system resources.
  • The significant changes in the technical preview of cgroups v2.
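
For instance, resource limits can be attached to a unit with systemctl set-property (a sketch; httpd.service is an illustrative unit name):

    # Cap the service at 50% of one CPU and 1 GiB of memory; the setting
    # persists as a systemd drop-in file.
    systemctl set-property httpd.service CPUQuota=50% MemoryMax=1G

    # Inspect the cgroup hierarchy and live per-cgroup resource usage.
    systemd-cgls
    systemd-cgtop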

Chapter 6: Analyzing Performance Using System Tracing Tools

Goal: Diagnose system and application behaviors using a variety of resource-specific tracing tools.


Objectives:

  • Profile system events and observe performance counters with the perf command.

  • Trace system and library calls for a process.

  • Gather specific diagnostic data using the SystemTap infrastructure.

  • Debug applications using tools instrumented by the in-kernel eBPF virtual machine.

Profiling with Performance Counters

Introduction to Performance Counters

Installing perf

Describing Frequently Used perf Tools

Guided Exercise: Profiling with Performance Counters

Tracing System and Library Calls

System Calls and Library Calls

Tracing System Calls

Guided Exercise: Tracing System and Library Calls

Gathering Diagnostic Data Using SystemTap

Introduction to SystemTap

Installing a SystemTap Host System

Installing SystemTap Using stap-prep

Using SystemTap

Installing a SystemTap Target Runtime

Performance Analysis With the SystemTap Examples

SystemTap Examples Installed Locally

Guided Exercise: Gathering Diagnostic Data Using SystemTap

Tracing System Events with eBPF Tools

Introduction to eBPF

Linux bcc/BPF Training Tools by Brendan Gregg (CC BY-SA 4.0)

Introduction to BPF Compiler Collection

Using BCC Tools

Guided Exercise: Tracing System Events with eBPF Tools

Lab: Analyzing Performance Using Tracing Tools

Summary

  • The perf command allows administrators to observe performance counters that track hardware and software events, including:
    • Number of instructions executed.
    • Cache misses.
    • Context switches.
  • The perf command-line interface can collect events in real time, or record events for later reporting.
  • The SystemTap framework allows easy probing and instrumentation of almost any component within the kernel. SystemTap scripts specify where to attach probes and what data to collect when the probe executes.
  • The Extended Berkeley Packet Filter (eBPF) is an in-kernel virtual machine used by front-end applications, such as the BPF Compiler Collection (BCC), to perform dynamic and static tracing for performance analysis.
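
A few representative one-liners from these tool families (a sketch; the BCC tool path follows the bcc-tools package layout):

    # perf: count selected events for a 5-second window, then profile and report.
    perf stat -e instructions,cache-misses,context-switches sleep 5
    perf record -g -- sleep 5 && perf report

    # SystemTap: the canonical smoke-test probe (requires stap-prep setup).
    stap -e 'probe begin { println("SystemTap is working") ; exit() }'

    # BCC: trace every new process executed on the system.
    /usr/share/bcc/tools/execsnoop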

Chapter 7: Tuning CPU Utilization

Goal: Manage CPU resource sharing and scheduling to control utilization.


Objectives:

  • Configure CPU scheduling policies.

  • Configure CPU and interrupt affinity to control application isolation and CPU resource commitments.

  • Describe how the CPU caches are used by applications.

Configuring CPU Scheduling Policies and Tunables

Scheduling Processes

Describing Process Priority

Process priorities

Defining Scheduling Policies

Real-time runtime scheduling

Introducing Deadline Scheduler

Deadline scheduler parameters

Setting Scheduling Options for Processes

Viewing Process Scheduler Statistics

Guided Exercise: Configuring CPU Scheduling Policies and Tunables

Configuring CPU Affinity and Isolation

Pinning Processes

CPU Affinity Can Increase Cache and Memory Effectiveness

Balancing Interrupts

CPU Partitioning

Guided Exercise: Configuring CPU Affinity and Isolation

Profiling Cache Use

Describing CPU Caches

Hardware topology of Intel Core2 cache diagram

Hardware topology of Intel i7 cache diagram

Cache architecture

Cache miss and cache line fill

Cache miss, cache line fill, cache snoop, and cache hit

Cache write through

Cache write back

Profiling CPU Cache Usage

Guided Exercise: Profiling Cache Use

Lab: Tuning CPU Utilization

Summary

  • CFS, named SCHED_NORMAL, is the scheduling class for non-real-time processes. It is intended for interactive applications, providing a better desktop experience.
  • The SCHED_DEADLINE scheduler guarantees the scheduling of real-time tasks, even on heavily loaded systems.
  • The irqbalance service adjusts CPU affinities to increase cache-hits for interrupts. The CPUAffinity parameter limits the CPUs available to a service.
  • Runtime performance of a particular program largely depends on the efficiency of cache usage, since caches are faster than system memory.
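
A sketch of the corresponding commands (PID 1234 and ./rt_app are placeholders):

    chrt -f 50 ./rt_app       # start a process under SCHED_FIFO at priority 50
    chrt -p 1234              # show the scheduling policy and priority of a PID
    taskset -pc 2,3 1234      # pin the PID to CPUs 2 and 3
    grep Cpus_allowed_list /proc/1234/status    # verify the affinity mask

    # Restrict a service to specific CPUs with a systemd drop-in:
    #   [Service]
    #   CPUAffinity=0 1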

Chapter 8: Tuning Memory Utilization

Goal: Manage settings for efficient memory utilization for different types of workloads.


Objectives:

  • Describe memory architecture, including virtual memory, caches, huge pages and process space.

  • Describe system paging behavior and configure paging for server workload requirements.

  • Describe the non-uniform memory architecture and the utilities for viewing and configuring NUMA topology.

  • Describe memory overcommitment in typical application design.

Configuring Memory Architecture

Explaining Virtual and Physical Memory

Virtual to physical memory translation

Monitoring Process Memory Usage

Monitoring Page Faults

Introducing Page Tables and the TLB

Actual page table hierarchy on x86-64

TLB flowchart

Managing Huge Pages

Guided Exercise: Configuring Memory Architecture

Configuring Memory Paging and Reclamation

Explaining the System Memory and Page Cache

Reclaiming Anonymous Pages

Memory pressure and working set

Reclaiming Page Cache and Tuning Swappiness

Tuning Dirty Pages Cleaning

Handling Out-of-memory Events

Guided Exercise: Configuring Memory Paging and Reclamation

Configuring NUMA Topology

Introducing the Non-Uniform Memory Access Architecture

UMA architecture

8-node NUMA topology

Managing Non-Uniform Memory Access

4-node NUMA system (showing only two nodes)

Managing Non-Uniform Memory for Virtual Machines

Guided Exercise: Configuring NUMA Topology

Managing Memory Overcommit

Introducing Memory Overcommitment

Virtual memory overcommit

Tuning Memory Overcommitment

Guided Exercise: Managing Memory Overcommit

Lab: Tuning Memory Utilization

Summary

  • Memory is organized and allocated in pages, which are normally 4 KiB in size.
  • Memory limits can be enforced on systemd units through systemd configuration files.
  • The vm.nr_hugepages sysctl parameter defines the number of huge pages to allocate.
  • The vm.dirty_background_ratio and vm.dirty_ratio sysctl tunables control when per-BDI flush threads start writing data to disk.
  • The /proc/PID/oom_score_adj tunable adjusts the oom_score for a specific process.
  • The numastat tool displays process memory statistics on a per-NUMA node basis. numactl controls the process affinity for processor and memory.
  • The vm.overcommit_memory sysctl parameter defines the system overcommit mode.
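
A sketch tying these tunables to commands (PID 1234 and ./app are placeholders):

    sysctl -w vm.nr_hugepages=128    # reserve 128 huge pages (2 MiB each on x86-64)
    grep Huge /proc/meminfo          # confirm the huge page pool
    echo -1000 > /proc/1234/oom_score_adj        # exempt a process from the OOM killer
    numastat -p 1234                 # per-NUMA-node memory usage for one process
    numactl --cpunodebind=0 --membind=0 ./app    # run a command bound to NUMA node 0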

Chapter 9: Tuning Storage Device I/O

Goal: Manage settings for efficient disk utilization in various use cases.


Objectives:

  • Describe transaction-oriented and sequential I/O patterns and disk scheduling algorithms.

  • Describe basic RAID architecture to understand the correlation between application writes and RAID sizing alignment.

  • Describe multiple tool choices for analyzing disk I/O behavior.

Evaluating I/O Patterns and Scheduling Algorithms

Transitioning to Solid State Drives

Selecting an I/O Scheduler

Multiple queue I/O scheduling

Tuning Storage with Tuned Profiles

Simulating workloads with the fio tool

Guided Exercise: Evaluating I/O Patterns and Scheduling Algorithms

Reviewing RAID Fundamentals

Reviewing RAID

Creating RAID arrays

Configuring RAID on Logical Volumes

Guided Exercise: Reviewing RAID Fundamentals

Selecting I/O Analysis Tools

Diagnosing Storage I/O Scenarios

The iostat Utility

The iotop Utility

The blktrace Toolset

Analyzing Storage Performance with Performance Co-Pilot

Guided Exercise: Selecting I/O Analysis Tools

Lab: Tuning Storage Device I/O

Summary

  • Red Hat Enterprise Linux 8 includes, by default, a new collection of multi-queue I/O schedulers for disks, which replace the old single-queue I/O schedulers.
  • Tuned profiles support tuning storage systems through plug-ins such as disk and sysfs.
  • RAID is an array of disks, usually inexpensive disks, configured together as a single logical storage unit to provide increased performance, redundancy, and fault tolerance.
  • Red Hat Enterprise Linux supports the creation and management of software RAID with the mdadm utility.
  • Several utilities are available to diagnose I/O: iostat, iotop, btrace, blkparse, btt, and blkiomon.
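
A sketch of common invocations (device names such as /dev/vda and /dev/vdb are placeholders):

    cat /sys/block/vda/queue/scheduler             # available schedulers; the active one is in brackets
    echo kyber > /sys/block/vda/queue/scheduler    # switch the scheduler at runtime

    # Assemble a software RAID 5 array from three member disks.
    mdadm --create /dev/md0 --level=5 --raid-devices=3 /dev/vdb /dev/vdc /dev/vdd

    iostat -dx 2    # extended per-device I/O statistics, refreshed every 2 seconds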

Chapter 10: Tuning File System Utilization

Goal: Manage application efficiency for file system utilization.


Objectives:

  • Describe the file system choices and tunable attributes.

  • Configure file system journaling and discuss appropriate use cases.

Managing File System Attributes

Local File Systems

File System Formatting Options

File System Mount Options

Benchmarking File System Performance

Guided Exercise: Managing File System Attributes

Managing File System Journaling

Journal Placement

External Journal

Guided Exercise: Managing File System Journaling

Lab: Tuning File System Utilization

Summary

  • The different use cases of the XFS and ext4 file systems.
  • How to tune the XFS and ext4 file systems for specific workloads, with formatting and mount options.
  • How to configure the XFS and ext4 file systems with external journaling.
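
A sketch of such options (device names are placeholders, and the values shown are examples rather than universal recommendations):

    # XFS: format with a larger internal log, then mount without access-time updates.
    mkfs.xfs -l size=64m /dev/vdb1
    mount -o noatime /dev/vdb1 /data

    # ext4: place the journal on a separate, ideally faster, device.
    mke2fs -O journal_dev /dev/vdc1
    mkfs.ext4 -J device=/dev/vdc1 /dev/vdb2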

Chapter 11: Tuning Network Utilization

Goal: Manage application efficiency for network utilization.


Objectives:

  • Describe latency and throughput definitions and discuss the contrasting requirements for different network workloads.

  • Configure network device parameters using ethtool.

Configuring for Network Latency and Throughput

Packet Transmission and Reception

Network Kernel Tunables

Kernel Tunables for Networking

Calculating Bandwidth Delay Product

Buffer bloat

Enabling Jumbo Frames

Guided Exercise: Configuring for Network Latency and Throughput

Configuring Network Driver Parameters

Network Performance Tuning

Network Teaming

Configuring Interface Teams

Persistent Team Configuration

Troubleshooting Team Interfaces

Guided Exercise: Configuring the Network Link Layer

Lab: Tuning Network Utilization

Summary

  • Network buffer sizes can be manipulated through sysctl tunables such as net.core.rmem_max, net.core.wmem_max, net.ipv4.tcp_rmem, and net.ipv4.tcp_wmem.
  • Protocol overhead can account for a significant percentage of a data packet's content. Overhead can be reduced by increasing the MTU to make use of jumbo frames.
  • The ethtool utility can be used to display and modify network card settings.
  • The qperf utility can be used to measure network performance between two systems.
  • Network teaming configuration is done with NetworkManager.
  • Some teaming runners improve throughput.
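
A sketch of the corresponding commands (eth0 and server_host are placeholders; jumbo frames also require switch support):

    sysctl -w net.core.rmem_max=16777216    # raise the maximum socket receive buffer to 16 MiB
    sysctl net.ipv4.tcp_rmem                # min/default/max TCP receive buffer sizes

    ip link set eth0 mtu 9000               # enable jumbo frames on an interface
    ethtool -g eth0                         # show NIC ring buffer sizes
    ethtool -G eth0 rx 4096                 # grow the receive ring, within driver limits

    qperf server_host tcp_bw tcp_lat        # bandwidth and latency against a host running qperf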

Chapter 12: Performing Tuning in Virtualization Environments

Goal: Distinguish the requirements for tuning in virtualized environments.


Objectives:

  • Explain the differences between tuning hosts in a virtualized environment and a cloud environment.

  • Gather performance metrics from virtual machines using open source tools.

Tuning Hosts in Virtualization Environments

Tuning Virtualization Hosts

Tuning Virtual CPU

Limiting Host Memory

Describing Kernel Samepage Merging

Tuning Disk and Block I/O

Setting Limits on KVM Guests

Tuning Virtual Networks

Guided Exercise: Tuning Hosts in Virtualization Environments

Monitoring Performance Metrics in a Virtualized Environment

Retrieving Performance Metrics

Retrieving Metrics Using Prometheus

Prometheus architecture

Using Grafana for Visualization

Grafana web UI

Adding a Prometheus data source to Grafana

Guided Exercise: Monitoring Performance Metrics in a Virtualized Environment

Lab: Performing Tuning in Virtualization Environments

Summary

  • The performance impact of cache misses on a NUMA-based system is significant. Therefore, virtual CPU pinning and NUMA tuning must be configured together.
  • Kernel Samepage Merging (KSM) scans memory for identical pages and consolidates them into copy-on-write (COW) shared pages, which helps to reduce physical memory consumption.
  • Prometheus determines what to scrape using a static configuration defined in the static_configs section of the prometheus.yml file.
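
A sketch of both points (the scrape target is a placeholder):

    # Inspect and enable Kernel Samepage Merging on a KVM host.
    cat /sys/kernel/mm/ksm/run              # 0 = stopped, 1 = running
    echo 1 > /sys/kernel/mm/ksm/run
    cat /sys/kernel/mm/ksm/pages_sharing    # pages currently deduplicated

    # A minimal prometheus.yml static scrape configuration:
    #   scrape_configs:
    #     - job_name: 'kvmhost'
    #       static_configs:
    #         - targets: ['vmhost.example.com:9100']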

Chapter 13: Comprehensive Review

Comprehensive Review

Reviewing

Lab: Tuning for a Computation-heavy Application

Lab: Tuning for a Storage Intensive Application

Lab: Tuning for a Large Memory Implementation

RH442-RHEL8.0-en-1-20190828