Application 2025-06-14

Understanding IO and CPU Characteristics in Ruby and Rails

Understand Ruby GVL, Puma concurrency, IO-bound vs CPU-bound workloads, thread/process tuning, and GC impact on Rails performance.


Overview

This post organizes Ruby's concurrency model, the role of the GVL, Puma's thread and process design, IO-bound vs CPU-bound workloads, measurement techniques for identifying bottlenecks, and the background behind changes to the Rails/Puma defaults, then derives appropriate tuning strategies.

Ruby's Concurrency Model and Global VM Lock (GVL)

The Purpose of GVL

Ruby (MRI/CRuby) has a Global VM Lock (GVL) that restricts the execution of Ruby code to one thread at a time within the same process. The GVL exists to maintain the consistency of memory management, object management, and garbage collection (GC) within the Ruby VM, which is implemented in C. For example, heap operations during object allocation and deallocation, object traversal in mark-and-sweep GC, method cache updates, and internal table operations are not thread-safe, and the GVL prevents simultaneous execution to avoid crashes and data corruption.

Many C extensions (native extensions) are also designed to operate under the GVL, and removing the GVL requires the extension to ensure thread safety, making the cost of maintaining VM-wide consistency very high. While application-level thread safety must be ensured by developers using Mutex, the GVL functions as a global lock to ensure VM internal consistency.
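That division of responsibility can be sketched concretely: the GVL keeps the VM's internals consistent, but a read-modify-write on shared application state can still interleave across threads, so a Mutex is the developer's job. A minimal illustration:

```ruby
# The GVL serializes VM internals, not application logic: without the Mutex,
# `counter += 1` (read, add, write) can interleave across threads and lose updates.
counter = 0
lock = Mutex.new

threads = 8.times.map do
  Thread.new do
    10_000.times { lock.synchronize { counter += 1 } }
  end
end
threads.each(&:join)

puts counter  # guaranteed to be 80000 only because of the Mutex
```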

Impact of GVL on Performance

Under the GVL, CPU-bound Ruby code cannot run in parallel across multiple threads in the same process; only one thread executes Ruby code at a time. When a thread blocks on IO, however, such as during database access or an external API call, the GVL is released and other threads can continue, so thread concurrency is effective for workloads with a significant IO-bound component. Be aware, though, that what looks like IO wait may actually include CPU starvation: GVL contention, thread-switching delays, and GC pauses all get folded into the measured wait.
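The contrast can be demonstrated with a small benchmark sketch (timings vary by machine; sleep stands in for an IO wait that releases the GVL):

```ruby
require "benchmark"

def cpu_work
  1_000_000.times { Math.sqrt(rand) }   # pure Ruby CPU work, holds the GVL
end

def io_work
  sleep(0.2)                            # stands in for a DB/API wait; releases the GVL
end

cpu_serial   = Benchmark.realtime { 4.times { cpu_work } }
cpu_threaded = Benchmark.realtime { Array.new(4) { Thread.new { cpu_work } }.each(&:join) }
io_serial    = Benchmark.realtime { 4.times { io_work } }
io_threaded  = Benchmark.realtime { Array.new(4) { Thread.new { io_work } }.each(&:join) }

puts format("CPU: serial %.2fs vs threaded %.2fs (no speedup under the GVL)", cpu_serial, cpu_threaded)
puts format("IO:  serial %.2fs vs threaded %.2fs (waits overlap)", io_serial, io_threaded)
```

On CRuby the four sleeps overlap almost perfectly, while the four CPU loops take about as long threaded as serial.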

Comparison with Other Implementations

TruffleRuby and JRuby have no GVL, but they rely on the memory and thread management of their underlying VM or JVM. Removing the GVL from MRI would be a vast and difficult undertaking, and for most web workloads Rails users are well served by an operational model of multiple processes combined with moderate thread concurrency.

Concurrency with Puma and Trends in Default Settings

Puma's Architecture

Puma is widely used as the de facto standard server for Rails. A master process forks multiple worker processes, and each worker handles requests with a thread pool. Threads cover the situations where the GVL is released during IO waits, while process parallelism provides true parallel execution for the CPU-bound parts.

Below is an excerpt from the config/puma.rb generated for a new Rails app. By default, the number of threads is set by the environment variable RAILS_MAX_THREADS, and the number of workers is controlled by WEB_CONCURRENCY.

# config/puma.rb
threads_count = ENV.fetch("RAILS_MAX_THREADS") { 3 }.to_i
threads threads_count, threads_count

workers ENV.fetch("WEB_CONCURRENCY") { 2 }

preload_app!

on_worker_boot do
  ActiveRecord::Base.establish_connection if defined?(ActiveRecord)
end

The benefits and drawbacks of the fork model (process parallelism) and the thread model (thread concurrency) can be summarized as follows:

Process Parallelism (fork)
  Benefits:
  - Avoids GVL restrictions; true parallel performance for CPU-bound processing
  - Independent memory spaces isolate the impact of a crash
  Drawbacks:
  - Memory usage tends to increase
  - Process startup cost is incurred

Thread Concurrency
  Benefits:
  - Small memory overhead, lightweight
  - Other threads keep running during IO waits
  Drawbacks:
  - CPU-bound parallel performance is limited by the GVL
  - Waits can arise from thread contention or GVL contention

Based on the above, it is important to adjust the combination of the number of processes (workers) and the number of threads (threads) according to the workload characteristics of the app and infrastructure resources.
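As a hedged starting point rather than a universal rule, the worker count is often anchored to the CPU core count and the thread count to the Rails default, then adjusted from measurements:

```ruby
require "etc"

cores   = Etc.nprocessors
workers = cores            # one worker per core as a CPU-bound baseline
threads = 3                # current Rails default; raise only if measurements show idle IO wait

puts "workers=#{workers} threads=#{threads} " \
     "(max in-flight requests per host: #{workers * threads})"
```

Memory is the other constraint: each forked worker carries its own heap, so on small containers the core count may exceed what RAM allows.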

Background of Default Thread Count Change

As discussed in GitHub issue #50450, the default Puma thread count generated for new Rails apps was lowered from the previous 5 to 3. In the issue, DHH proposed, based on operational experience, that "1 thread per worker contributes to low latency", and many developers shared benchmark results from their own apps along with analyses based on Amdahl's Law. The main points of discussion were the trade-off between latency and throughput, the optimal thread count for different IO/CPU mixes, and keeping a safety margin under resource constraints such as Heroku dynos or container environments. The outcome was agreement that around 3 threads is a balanced, reasonable value for many apps, and the default was lowered from 5 to 3 in Rails 7.2. Existing apps that explicitly set RAILS_MAX_THREADS or WEB_CONCURRENCY are unaffected; for new projects, the recommendation is to start with 3 threads and adjust based on monitoring and benchmark results.
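One way to read the Amdahl's Law argument numerically: if a fraction p of a request's time runs outside the GVL (IO waits), the ideal speedup from n threads in one worker is 1 / ((1 - p) + p / n). The diminishing returns between 3 and 5 threads are easy to see in a sketch (p = 0.5 is a hypothetical value):

```ruby
# Amdahl's Law: with GVL-free (parallelizable) fraction p, ideal speedup on n threads.
def amdahl_speedup(p, n)
  1.0 / ((1 - p) + p / n.to_f)
end

# Hypothetical request where half the time is GVL-free IO (p = 0.5)
[1, 3, 5, 10].each do |n|
  puts format("n=%2d threads -> ideal speedup %.2fx", n, amdahl_speedup(0.5, n))
end
```

With p = 0.5, three threads already give 1.5x and five give only 1.67x, while each extra thread adds GVL contention and latency variance.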

Misinterpretation of IO-bound vs CPU-bound and Measurement Methods

Pitfalls of Apparent IO Waits

The time recorded as "Query took: XX ms" in Rails logs or APM traces can include thread-scheduling waits, GVL waits, and GC execution time on top of the actual DB response time. Misreading this as "DB wait is dominant" can lead to raising the thread count too far, worsening GVL contention and potentially degrading performance.

Measuring GC Time

Since Ruby 3.1, GC.total_time is provided as a cumulative counter in nanoseconds, so the time spent in GC can be obtained by taking the difference before and after a given block. Since Rails 7.2, GC time is also included in request logs via ActiveSupport::Notifications, making the impact of GC load visible.
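A minimal sketch of the difference-based measurement (GC.total_time is cumulative, in nanoseconds; requires Ruby 3.1+):

```ruby
before = GC.total_time
200_000.times { Object.new }   # allocation-heavy work likely to trigger GC
GC.start                       # force a collection so the counter visibly advances
gc_ns = GC.total_time - before

puts format("GC time in this block: %.2f ms", gc_ns / 1_000_000.0)
```

The same before/after pattern can be wrapped around a request or job in middleware to attribute GC time to individual units of work.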

Visualizing GVL Wait Time

Using the GVL Instrumentation API introduced in Ruby 3.2 and gems built on it (e.g., gvltools), GVL wait time can be measured separately from genuine IO time. This makes it possible to see concretely how GVL waits grow under high background CPU load, reducing misinterpretation.
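Where gvltools is not available, a rough stdlib approximation is to compare wall time against the thread's own CPU time: the gap is time spent off-CPU, which lumps together genuine IO, GVL waits, and scheduler waits (it does not separate them the way the GVL Instrumentation API can). A sketch:

```ruby
# Measure wall time vs this thread's on-CPU time around a block.
# off_cpu = wall - cpu is the combined IO + GVL + scheduler wait.
def wall_vs_cpu
  w0 = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  c0 = Process.clock_gettime(Process::CLOCK_THREAD_CPUTIME_ID)
  yield
  wall = Process.clock_gettime(Process::CLOCK_MONOTONIC) - w0
  cpu  = Process.clock_gettime(Process::CLOCK_THREAD_CPUTIME_ID) - c0
  { wall: wall, cpu: cpu, off_cpu: wall - cpu }
end

stats = wall_vs_cpu { sleep(0.1) }   # sleep stands in for an IO wait
puts format("wall %.3fs, on-CPU %.3fs, off-CPU %.3fs",
            *stats.values_at(:wall, :cpu, :off_cpu))
```

If off-CPU time stays high even when the downstream service is fast, GVL or scheduler waits are the likely culprit.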

Key Points of OS Scheduler Waits

OS-level scheduler wait time can also be folded into IO measurements, and attributing it accurately to individual IO operations is difficult. Monitoring the runqueue wait of the container or host as a whole, for example via Linux's /proc/<pid>/schedstat, provides a guideline for judging whether the process and thread counts are appropriate.
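A small sketch of reading that file (the three fields are time on CPU in nanoseconds, time waiting on the runqueue in nanoseconds, and the number of timeslices; Linux only):

```ruby
# Parse a /proc/<pid>/schedstat line: "<on-cpu ns> <runqueue-wait ns> <timeslices>"
def parse_schedstat(line)
  on_cpu, run_wait, slices = line.split.map(&:to_i)
  { on_cpu_ns: on_cpu, runqueue_wait_ns: run_wait, timeslices: slices }
end

path = "/proc/self/schedstat"
if File.exist?(path)                 # present on Linux only
  stats = parse_schedstat(File.read(path))
  puts format("runqueue wait so far: %.2f ms", stats[:runqueue_wait_ns] / 1_000_000.0)
end
```

A steadily growing runqueue wait suggests more runnable threads than the host's cores can serve, i.e. too many workers × threads for the box.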

Importance of Profiling

By measuring the various metrics mentioned above, you can understand the IO/CPU ratio and the reality of GVL waits in your application, and determine the number of threads or processes from an Amdahl's Law perspective. Rather than just following defaults, it is important to profile and optimize based on your workload characteristics (frequency of external API calls, DB access patterns, rendering load, etc.).

Background Jobs and Concurrency Settings

In job processing with Sidekiq and similar systems, where IO-intensive work (external API calls, file operations, email sending, etc.) dominates, a higher concurrency setting (e.g., concurrency: 10-25) is often adopted. The same caveats apply as for the web tier, however: CPU-heavy jobs still serialize on the GVL, so raising concurrency past the actual IO-wait ratio mostly adds contention. Measuring GC time and GVL waits in the job workers, just as with Puma, makes it possible to understand the impact of the GVL in background jobs and derive appropriate concurrency settings.
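One concrete, well-known caution (the values below are hypothetical): each Sidekiq thread needs its own Active Record connection, so the pool in config/database.yml must be at least the concurrency setting, or jobs will block waiting for a connection. Recent Sidekiq versions read RAILS_MAX_THREADS for their default concurrency, which keeps the two aligned:

```yaml
# config/database.yml (excerpt) — hypothetical sizing; the pool must cover
# Sidekiq's thread count as well as Puma's threads in the web process
production:
  pool: <%= ENV.fetch("RAILS_MAX_THREADS") { 10 } %>
```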

Efforts to Improve Ruby Execution Performance

Benefits of JIT (YJIT, etc.)

There are many reports of latency improvements from introducing YJIT: even in apps dominated by IO waits, improvements of roughly 15-30% are commonly seen, which shows that the cost of executing Ruby code itself cannot be ignored.
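A quick runtime check, as a sketch (YJIT ships with CRuby 3.1+ and is enabled with the --yjit flag; the RUBY_YJIT_ENABLE=1 environment variable also works in recent versions, but treat that as version-dependent):

```ruby
# Report whether this process is running with YJIT compiled in and enabled.
if defined?(RubyVM::YJIT) && RubyVM::YJIT.enabled?
  puts "YJIT enabled"
else
  puts "YJIT not enabled (start Ruby with --yjit to compare latency)"
end
```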

Consideration of Removing GVL

While there is ongoing discussion about removing the GVL, eliminating it from MRI entirely would involve extensive and risky changes to C extensions and the VM internals. Judging from GVL-free implementations such as TruffleRuby and JRuby, and from Python's GIL-removal efforts, most web workloads can be handled well with multiple processes and moderate thread concurrency under the GVL.

Operational and Tuning Guidelines

In short, the practical guidance that follows from the above:

- Start from the current defaults (3 threads per worker) and adjust based on measurement rather than guesswork.
- Scale worker processes with available CPU cores and memory; scale threads with the measured IO-wait ratio.
- Track GC time, GVL waits, and runqueue waits alongside response times to distinguish genuine IO waits from CPU starvation.

Conclusion

Optimizing Ruby/Rails performance requires a multifaceted understanding of GVL, threads, processes, IO/CPU-bound characteristics, GC, OS scheduler waits, etc. By continuously measuring and understanding the reality and performing appropriate tuning, you can build a system that flexibly meets latency and throughput requirements.

Tags: Ruby Ruby on Rails