In July's Tech Talk, we're talking about performance issues for Ruby on Rails apps with Freelance Ruby on Rails Developer Nate Berkopec.
What are the top 3 most common causes of performance issues for Ruby on Rails apps on Heroku?
1. Not understanding when to scale dynos. Most Rails applications on Heroku use far more dynos than their request load really needs. Heroku’s dyno sliders make scaling simple, but they also make it easy to scale even when you don’t need to. Many Rails developers assume that scaling dynos will make their application faster: if the app is slow, scale up the dynos. Most of the time, though, it won’t work! Scaling dynos only speeds up response times if requests are spending time waiting in the request queue. You can see this in New Relic: the green area at the base of the web response time graph is the amount of time requests spend in the queue. This is typically about 10ms. If it grows beyond that, scaling dynos (adding concurrency, in effect) will alleviate the issue.
In the image above, this app generally hasn’t experienced any spikes in time spent in the request queue. This is a low-volume application (<100 requests/minute), so there’s a lot of noise in the graph, but generally request queuing takes just 10ms. Scaling this application’s dynos would be inappropriate. Don’t scale your application based on response times alone. Your application may be slowing down due to increased time in the request queue, or it may not. If your request queue is empty and you’re scaling dynos, you’re just wasting money.
The same applies to worker dynos. Scale them based on the depth of your queue. If there aren’t any jobs waiting to be processed, scaling your worker dynos is pointless.
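As an illustrative sketch (the method and its numbers are hypothetical, not a Heroku API), a queue-depth-based scaling decision might look like this; with Sidekiq, for example, the depth would come from `Sidekiq::Queue.new.size`:

```ruby
# Hypothetical sketch: choose worker dyno count from queue depth, not guesswork.
def desired_worker_dynos(queue_depth, jobs_per_dyno_per_minute, current_dynos)
  # An empty queue means scaling up is pointless - keep what we have.
  return current_dynos if queue_depth.zero?

  # Enough dynos to drain the backlog in roughly a minute, never fewer than one.
  [(queue_depth.to_f / jobs_per_dyno_per_minute).ceil, 1].max
end

puts desired_worker_dynos(0, 60, 2)   # 2 - no jobs waiting, no change
puts desired_worker_dynos(300, 60, 2) # 5 - 300 queued jobs call for 5 dynos
```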
2. Memory bloat. Heroku dynos are small. The base 1X dyno has just 512MB of memory, the 2X 1024MB. While Heroku (correctly) recommends using a worker-based multi-process webserver like Puma or Unicorn, far too many Rails developers don’t know how much memory even a single worker uses to run their application. That makes it impossible to tune how many server workers should run on each dyno.
It’s simple math - the maximum number of worker processes (Unicorn workers, Puma workers) you can run per dyno is governed by the following formula:
(Dyno RAM in MB - memory used by the master process) / memory per worker process
Heroku recommends setting the number of worker processes per dyno with an environment variable called WEB_CONCURRENCY, and suggests that most applications will probably set WEB_CONCURRENCY to 3 or 4. That just hasn’t been my experience - most Rails applications would be more comfortable at WEB_CONCURRENCY=2 or even WEB_CONCURRENCY=1 on 1X dynos. For example, a typical Rails application uses about 250MB of RAM once it’s warmed up. That’s a big number (I’ll go into ways to measure it and make it smaller later), but it seems to be the usual size. Now, a 1X dyno has only 512MB of RAM available, and the master process of a typical Puma server uses about 90MB of RAM itself. So with WEB_CONCURRENCY set to 1, we’re already using 340MB of RAM! Scaling WEB_CONCURRENCY to 2 would use 590MB, sending us sailing past the dyno’s memory limit and into ultra-slow swap memory.
So the problem here is twofold - most Rails applications use way too much memory per process, and most developers don’t set WEB_CONCURRENCY correctly based on their application’s RAM usage.
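Plugging the typical numbers from above into that formula, as a quick sanity check:

```ruby
# Back-of-the-envelope worker math: 512MB 1X dyno, ~90MB Puma master,
# ~250MB per warmed-up Rails worker (the figures quoted in the text).
dyno_ram_mb   = 512
master_mb     = 90
per_worker_mb = 250

max_workers = (dyno_ram_mb - master_mb) / per_worker_mb
puts max_workers # 1 - so WEB_CONCURRENCY=2 (90 + 2*250 = 590MB) would hit swap
```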
3. Inaccurate performance metrics. Do you use New Relic? Great! Do you deploy with the 12-factor methodology, as encouraged by Heroku? Even better! But a 12-factor application (and this is true of your application if you use the `rails_12factor` gem as recommended by Heroku) serves its own assets, rather than uploading them somewhere else like Amazon S3 and serving them from there. And if you’re serving your own assets, New Relic (and the default Heroku metrics page on heroku.com) measures those asset requests and folds them into your average server response times. You *must* exclude the assets directory from New Relic’s tracking to get accurate average server response metrics - you can do this in its YAML configuration file.
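For example, with the New Relic Ruby agent this can be done with a URL-ignore rule in `newrelic.yml` (verify the exact key against your agent version’s documentation):

```yaml
# newrelic.yml - keep asset requests out of transaction metrics
common: &default_settings
  rules:
    ignore_url_regexes: ["^/assets"]
```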
What are some common symptoms of those problems?
- Using >25MB of swap memory/PSS. Most *nix systems use something called swap space when they run out of RAM - essentially, the operating system uses the filesystem as RAM. However, the filesystem is a lot slower than RAM - 10-50x slower, in fact. If we run out of memory on Heroku, we start using swap instead of regular memory, which can slow your app to a crawl. As noted above, set WEB_CONCURRENCY correctly. If you’re using swap memory on Heroku, you’re Doing It Wrong and need to figure out what’s causing it.
- Low server response times in New Relic but still a slow site. This can be a symptom of two different issues:
- Serving assets from your production server but not excluding the assets directory from New Relic. This will artificially depress your server response times.
- Huge numbers of per-transaction database calls in web transactions. A simple way to figure out whether you’ve got an N+1 query is to check how often a SQL query runs per web transaction in New Relic. If a Transaction Trace shows something like User#find with a count of 30, you know you’ve got an N+1 query. Ideally, there should be only one SQL query *per model* used on the page. Any more than a dozen SQL queries per page and you’ve likely got a serious N+1 issue.
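An illustrative pure-Ruby sketch (no real database - the queries are just strings) of the counting a transaction trace does:

```ruby
# Simulate the SQL a page of 30 posts would issue, and count it.
Post = Struct.new(:id, :user_id)
POSTS = (1..30).map { |i| Post.new(i, i) }

# N+1 pattern: one query for the posts, then one User lookup per post.
def naive_queries(posts)
  log = ["SELECT * FROM posts"]
  posts.each { |p| log << "SELECT * FROM users WHERE id = #{p.user_id}" }
  log
end

# Eager-loaded pattern (Post.includes(:user) in Rails): one query per model.
def eager_queries(posts)
  ["SELECT * FROM posts",
   "SELECT * FROM users WHERE id IN (#{posts.map(&:user_id).join(', ')})"]
end

puts naive_queries(POSTS).size # 31 - a User#find count of 30 betrays the N+1
puts eager_queries(POSTS).size # 2
```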
What can be done to address those problems? How long does this typically take?
- Use a worker-killer gem. If you’ve got a memory leak you can’t track down, you need to employ a solution that will restart your workers when they start to use swap memory. There are a lot of ways to do this. Several gems, like puma-worker-killer, will do it for you. Some third party services, like hirefire.io, can also provide this.
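As a sketch, a `config/puma.rb` setup with the puma_worker_killer gem might look like this (option names and values per the gem’s README - check your version, and tune the numbers to your dyno):

```ruby
# config/puma.rb - restart the fattest worker before the dyno hits swap
before_fork do
  require "puma_worker_killer"

  PumaWorkerKiller.config do |config|
    config.ram           = 512  # MB - the 1X dyno's limit
    config.frequency     = 10   # seconds between memory checks
    config.percent_usage = 0.98 # act when total usage reaches 98% of ram
  end
  PumaWorkerKiller.start
end
```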
- Don’t use the default cache store - prefer Redis. By default, Rails uses the filesystem for your cache store. That’s super slow on Heroku. Instead, use a networked cache store like Memcached or Redis. I prefer Redis - it’s under more active development and performs better in benchmarks than Memcached.
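The exact store name depends on your Rails version and gems (older apps use the redis-rails gem’s `:redis_store`; `:redis_cache_store` is built into Rails 5.2+), but the change is a one-liner along these lines:

```ruby
# config/environments/production.rb - cache over the network, not the dyno's disk
config.cache_store = :redis_cache_store, { url: ENV["REDIS_URL"] }
```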
- Pay attention to performance in development. Far too many Rails developers use overly simplistic data in development, usually generated by rake db:seed. Where security concerns permit, use a copy of the production database in development. Production databases are nearly always larger and more complicated than anything in our database seeds, which makes it easier to identify N+1 queries and slow SQL. Queries that return 1,000,000 rows in production should return 1,000,000 rows in development. Use gems like rack-mini-profiler to constantly monitor the speed of your controller actions.
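Adding rack-mini-profiler is one line in the Gemfile; it then overlays per-action timings on every page in development:

```ruby
# Gemfile - shows a speed badge on each page so slow actions are caught early
gem "rack-mini-profiler"
```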
Is there anything developers should be mindful of when creating new applications on Heroku?
Implement all the little things that make your app faster right from the start:
- Set up a Redis-based cache store
- Set up CloudFront or another CDN
- Remove assets from New Relic or your preferred performance monitoring solution (and if you’re not using one, use one!)
- Tune WEB_CONCURRENCY to match your dyno size and per-process memory usage.
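Tying the last item together, a minimal `config/puma.rb` along the lines Heroku documents (the fallback values here are placeholders - tune them to your measured per-process memory usage):

```ruby
# config/puma.rb - worker count comes from the environment, not a hardcoded guess
workers Integer(ENV.fetch("WEB_CONCURRENCY", 2))

threads_count = Integer(ENV.fetch("RAILS_MAX_THREADS", 5))
threads threads_count, threads_count

preload_app!
```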
Do you have recommendations for how to keep these performance issues from recurring? Monitoring, dashboards, performance tests?
Use a performance monitoring solution. I use New Relic, but only because it’s the easiest to use on Heroku and I haven’t used its main competitor in the Rails app space, Skylight. Pay attention to New Relic’s Apdex scores in particular, because they take into account the inherent variance of site response times over time. In addition, pay particular attention to time spent in the request queue, for the reasons mentioned above - it’s your most important scaling metric.
Decide on a maximum acceptable server response time and treat anything above it as a bug. One of the reasons Rails developers don’t cache enough is that they don’t know how “slow” a slow average response time is. Decide on one for your application. Most Rails applications should average less than 250ms. Less than 100ms is a great goal for a performance-focused site or one with a high request volume, like a social media site. Any action that averages more than your maximum acceptable time should be treated as a bug.