In July's Tech Talk, we're talking about performance issues for Ruby on Rails apps with Freelance Ruby on Rails Developer Nate Berkopec.
What are the top 3 most common causes of performance issues for Ruby on Rails apps on Heroku?
1. Not understanding when to scale dynos. Most Rails applications on Heroku are using far more dynos than they really need for their request load. Heroku’s dyno sliders make scaling simple, but they also make it easy to scale even when you don’t need to. Many Rails developers think that scaling dynos will make their application faster: if the app is slow, scale up the dynos. Most of the time though, it won’t work! Scaling dynos only speeds up response times if requests are spending time waiting in the request queue. You can see this in New Relic. The green area at the base of the web response time graph is the amount of time requests spend in the queue. This is typically about 10ms - if it becomes more than that, scaling dynos (adding concurrency, in effect) will alleviate the issue.
In the image above, this app generally hasn’t experienced any spikes in time spent in the request queue. This is a low-volume application (<100 requests/minute), so there’s a lot of noise in the graph, but generally request queuing takes just 10ms. Scaling this application’s dynos would be inappropriate. Don’t scale your application based on response times alone. Your application may be slowing down due to increased time in the request queue, or it may not. If your request queue is empty and you’re scaling dynos, you’re just wasting money.
The same applies to worker dynos. Scale them based on the depth of your queue. If there aren’t any jobs waiting to be processed, scaling your worker dynos is pointless.
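The scaling rule above can be sketched as a pair of checks. This is only an illustration: the method names and the 10ms baseline default are assumptions taken from the figures quoted in this answer, not a Heroku or New Relic API.

```ruby
# Hypothetical helpers; names and defaults are assumptions for illustration.

# True when requests spend longer in the request queue than the usual
# baseline (about 10ms, per the text above).
def scale_web_dynos?(queue_time_ms, baseline_ms: 10)
  queue_time_ms > baseline_ms
end

# True when jobs are waiting. With an empty queue, extra workers sit idle.
def scale_worker_dynos?(queue_depth)
  queue_depth > 0
end

puts scale_web_dynos?(10)    # false
puts scale_web_dynos?(80)    # true
puts scale_worker_dynos?(0)  # false
```

Note that neither check looks at overall response time, which is the point: only queue time and queue depth justify adding dynos.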
2. Memory bloat. Heroku dynos are small. The base 1X dyno carries just 512MB of memory, the 2X dyno 1024MB. While Heroku (correctly) recommends using a worker-based multi-process webserver like Puma or Unicorn, far too many Rails developers don’t know how much memory just 1 worker uses to run their application. This makes it impossible to tune how many server workers are running on each dyno.
It’s simple math - the maximum number of processes (unicorn workers, puma workers) you can run per dyno is governed by the following formula:
(Dyno RAM size in MB - memory used by the master process in MB) / Memory per worker process in MB, rounded down
Heroku recommends setting the number of worker processes per dyno based on an environment variable called WEB_CONCURRENCY. However, they also suggest that most applications will probably have WEB_CONCURRENCY set to 3 or 4. This just hasn’t been my experience - most Rails applications would be comfortable at WEB_CONCURRENCY=2 or even WEB_CONCURRENCY=1 for 1X dynos. For example, a typical Rails application will use about 250MB of RAM per worker once it’s warmed up. This is a big number (I’ll go into ways to measure it and make it smaller later), but this seems to be the usual size. Now a 1X dyno only has 512MB of RAM available, and the master process of a typical Puma server will use about 90MB of RAM itself. So with WEB_CONCURRENCY set to 1, we’re already using 340MB of RAM! Scaling WEB_CONCURRENCY to 2 will use 590MB, sending us sailing by the memory limit of the dyno and causing us to use ultra-slow swap memory.
So the problem here is twofold - most Rails applications use way too much memory per process, and most developers don’t set WEB_CONCURRENCY correctly based on their application’s RAM usage.
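As a rough sketch, the worker math above can be written as a small Ruby method. The method name is hypothetical, and the inputs are the example figures from this answer (90MB master process, 250MB per worker); measure your own application before relying on them.

```ruby
# Maximum number of worker processes that fit in a dyno's RAM.
# All sizes are in MB. Hypothetical helper, for illustration only.
def max_workers(dyno_ram_mb, master_mb, per_worker_mb)
  available = dyno_ram_mb - master_mb
  # Round down: a partial worker does not fit. Never return a negative count.
  [(available / per_worker_mb.to_f).floor, 0].max
end

puts max_workers(512, 90, 250)   # 1X dyno => 1
puts max_workers(1024, 90, 250)  # 2X dyno => 3
```

With these numbers a 1X dyno fits one worker (90 + 250 = 340MB), while a second worker would need 590MB and push the dyno into swap.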
3. Inaccurate performance metrics. Do you use New Relic? Great! Do you deploy with the 12-factor methodology, as encouraged by Heroku? Even better! But a 12-factor application (and this is true of your application if you use the `rails-12factor` gem as recommended by Heroku) serves its own assets, rather than uploading them somewhere else like Amazon S3 and serving the assets from there. And if you’re serving your own assets, New Relic (and the default Heroku metrics page on heroku.com) is measuring those asset requests and adding them into your average server response times. You *must* exclude the assets directory from New Relic’s tracking to get accurate average server response metrics - you can do this in its provided YAML configuration file.
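To see why asset requests distort the average, here is a small worked example. The request sample is made up purely for illustration, and the `/assets` prefix assumes the default Rails asset path.

```ruby
# Hypothetical sample of requests: [path, response time in ms].
requests = [
  ["/posts",            220],
  ["/posts/1",          180],
  ["/assets/app.css",     4],
  ["/assets/app.js",      6],
  ["/assets/logo.png",    5],
  ["/assets/icons.svg",   5]
]

# Mean response time in ms for a list of [path, ms] pairs.
def average_ms(requests)
  requests.sum { |_path, ms| ms } / requests.size.to_f
end

# Drop asset requests, mimicking what excluding /assets from tracking does.
app_only = requests.reject { |path, _ms| path.start_with?("/assets") }

puts average_ms(requests)  # 70.0
puts average_ms(app_only)  # 200.0
```

In this sample the fast asset requests pull a 200ms application average down to 70ms, hiding how slow the real pages are.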
Don’t use the default cache store. Prefer Redis. By default, Rails uses the filesystem for your cache store. That’s super slow on Heroku. Instead, use a networked cache store like Memcache or Redis. I prefer Redis - it’s under more active development and performs better on benchmarks than Memcache.
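A minimal configuration sketch, assuming Rails 5.2 or later (which ships a built-in `:redis_cache_store`), the `redis` gem in your Gemfile, and a `REDIS_URL` environment variable provided by whichever Redis add-on you attach. Older Rails versions need a third-party Redis store gem instead.

```ruby
# config/environments/production.rb
Rails.application.configure do
  # Use Redis for Rails.cache instead of the default cache store.
  # REDIS_URL is an assumption: check the variable your add-on actually sets.
  config.cache_store = :redis_cache_store, { url: ENV["REDIS_URL"] }
end
```

This fragment only runs inside a Rails application, so treat it as a starting point rather than a drop-in file.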
Implement all the little things that make your app faster right from the start.
Set up a Redis-based cache store
Set up Cloudfront or another CDN
Use a performance monitoring solution. I use New Relic, but only because it’s the easiest to use on Heroku and I haven’t used its main competitor in the Rails app space, Skylight. Pay attention to New Relic’s Apdex scores in particular, because they take into account the inherent variance of site response time over time. In addition, pay particular attention to time spent in the request queue for the reasons mentioned above - it’s your most important scaling metric.
Decide on a maximum acceptable server response time and treat anything more than that as a bug. One of the reasons Rails developers don’t cache enough is because they don’t know how “slow” a slow average response time is. Decide on one for your application. Most Rails applications should be averaging less than 250ms. Less than 100ms is a great goal for a performance-focused site or a site that requires extra fast response times or has a high number of requests, like a social media site. Any action that averages more than your maximum acceptable time should be treated as a bug.
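For reference, here is a sketch of how an Apdex score is computed and how a response time budget can be checked. It uses the standard Apdex definition (satisfied at or below a threshold T, tolerating up to 4T, frustrated beyond that). The action names and timings are hypothetical, and the 250ms budget is the figure suggested above.

```ruby
# Apdex = (satisfied + tolerating / 2) / total samples.
def apdex(response_times_ms, t_ms)
  satisfied  = response_times_ms.count { |ms| ms <= t_ms }
  tolerating = response_times_ms.count { |ms| ms > t_ms && ms <= 4 * t_ms }
  (satisfied + tolerating / 2.0) / response_times_ms.size
end

MAX_ACCEPTABLE_MS = 250 # budget suggested in the text; pick your own

# Hypothetical per-action average response times in ms.
averages = {
  "posts#index"    => 180,
  "posts#show"     => 95,
  "reports#export" => 900
}

# Actions over budget are the ones to treat as bugs.
slow_actions = averages.select { |_action, ms| ms > MAX_ACCEPTABLE_MS }.keys

puts apdex([100, 200, 300, 600, 2500], 250)  # 0.6
p slow_actions                               # ["reports#export"]
```

Because Apdex counts how many requests fall in each band rather than averaging them, a handful of very slow requests cannot hide behind many fast ones.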