Knowing how to use cloud-based managed services is a useful skill for engineers. However, there are many cloud providers, each has many services, and it is not always clear how they can be used together.
In this four-part series, we'll look into various services offered by Amazon Web Services (AWS) and eventually integrate them into an application. Streaming services like AWS Kinesis, queue services like AWS SQS, serverless functions like AWS Lambda, and data warehousing like AWS Redshift are integral parts of processing data and moving it from point A to point B. These services are essential when setting up, say, a scalable and fault-tolerant ETL pipeline or an application that can scale to millions of users.
Part 1: Lambda
Part 2: SQS
Part 3: Kinesis
Part 4: A sample application
Part 1: AWS Lambda
Serverless can be understood as the use of third-party stateless compute containers to handle server-side logic. These containers are event-triggered and may last for only one invocation (i.e. they are ephemeral).
But the phrase 'serverless' doesn’t mean servers are no longer involved. It just means that developers no longer have to worry about managing them. The application still runs on servers, but the servers and the application runtime are both managed by the vendor (like AWS or Google Cloud).
Serverless compute services like Lambda are often referred to as FaaS (function as a service). These functions are not continuously active in a server process, as they would be in a traditional system -- they sit idle until an event triggers them.
When a specified event occurs (like a message received by a message queue like AWS SQS or by a streaming service like AWS Kinesis), the vendor platform initiates the function and then calls it with the triggering event.
This 'serverless' design shifts the focus from servers and applications to the individual operations or functions that express our application’s logic.
AWS Lambda is AWS’s implementation of serverless compute. Again, serverless lets you run code without provisioning or managing servers yourself. Just upload your code to AWS Lambda and AWS takes care of everything required to run and scale your code with high availability.
In the words of Missy Elliott... is it worth it? Can you work it? Often serverless functions are more cost-effective than their full-server counterparts.
With AWS Lambda, you pay only for the compute time you consume - there is no charge when your code is not running. You are charged based on the number of requests for your functions, memory you allocate to your function, and the duration -- the time it takes for your code to execute.
Some back-of-the-envelope math for Lambda costs:
Requests: $0.20 per 1M requests
Duration: $0.00001667 for every GB-second -- and remember, one gigabyte is equivalent to 1024 MB, so...
If you allocated 512MB of memory to your function, executed it 3 million times in one month, and it ran for 1 second each time, your charges would be calculated as follows:
- The monthly compute price is $0.00001667 per GB-second and the free tier provides 400,000 GB-seconds.
- Total compute (seconds) = 3M requests * (1s per request) = 3,000,000 seconds
- Total compute (GB-seconds) = (3,000,000 seconds * 512MB) * (1GB / 1024 MB) = 1,500,000 GB-seconds
If you’re still in the free-tier year:
- Total compute – free-tier compute = monthly billable compute (GB-seconds)
- 1,500,000 GB-seconds – 400,000 free tier GB-seconds = 1,100,000 GB-seconds
- Monthly compute charges = 1,100,000 * $0.00001667 = $18.34
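The arithmetic above can be sketched in a few lines of Python. The prices are the published ones from the list above; the request charge, which the worked example leaves out, assumes the free tier's 1 million free requests per month:

```python
# Back-of-the-envelope AWS Lambda cost estimate for the example above.

PRICE_PER_GB_SECOND = 0.00001667    # compute price
PRICE_PER_MILLION_REQUESTS = 0.20   # request price
FREE_GB_SECONDS = 400_000           # monthly free-tier compute
FREE_REQUESTS = 1_000_000           # monthly free-tier requests

def monthly_cost(requests, seconds_per_request, memory_mb):
    """Return (compute_charge, request_charge) in USD for one month."""
    gb_seconds = requests * seconds_per_request * (memory_mb / 1024)
    billable_gb_seconds = max(gb_seconds - FREE_GB_SECONDS, 0)
    compute = billable_gb_seconds * PRICE_PER_GB_SECOND

    billable_requests = max(requests - FREE_REQUESTS, 0)
    request_charge = billable_requests / 1_000_000 * PRICE_PER_MILLION_REQUESTS
    return round(compute, 2), round(request_charge, 2)

# 3M invocations/month, 1s each, 512MB of memory:
compute, reqs = monthly_cost(3_000_000, 1, 512)
print(compute, reqs)  # compute ≈ $18.34, requests $0.40
```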
Why should we use AWS Lambda?
- No server management. Just upload your code and AWS Lambda takes care of everything required to run and scale your code with high availability.
- It can scale. Before the emergence of serverless, operations teams had to allocate server resources based on forecasted traffic and demand. With services like AWS Lambda, computing resources scale up and down automatically based on real-time demand. Scaling is automated and you don’t have to manually buy more EC2 instances (AWS servers) or use Elastic Beanstalk.
- You can still run web applications! Lambdas are exposed to web traffic through AWS’s API Gateway which functions as a URL router to your Lambdas.
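As a sketch, a Lambda behind API Gateway is just a handler function receiving the routed HTTP request as an event. The greeting logic here is made up for illustration; the response shape (`statusCode`, `headers`, `body`) is what API Gateway's proxy integration expects back:

```python
import json

def handler(event, context):
    """A minimal Lambda handler behind an API Gateway proxy integration.

    `event` carries the HTTP request; the returned dict becomes the
    HTTP response.
    """
    # Query-string parameters arrive as a dict (or None if absent).
    name = (event.get("queryStringParameters") or {}).get("name", "world")
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }

# Local smoke test -- no AWS needed to exercise the logic:
print(handler({"queryStringParameters": {"name": "Lambda"}}, None)["body"])
```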
AWS Lambda vs EC2 setup:
AWS Lambda setup: Whether you need a single environment or several, there is not much work to do. You are not required to spin up or provision containers or make them available for your applications, and scaling is fully automated. Just deploy your code to AWS via a zip file.
EC2 setup:
- Logging into the EC2 server via SSH.
- Manually installing Apache and doing a git clone.
- Installing and configuring all the required software in a manner that is automated and reproducible.
- And more! (advanced provisioning, adjusting the number of servers to fit demand, etc.)
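The "deploy via a zip file" step can be done from the AWS CLI. This is a sketch only: the function name, file names, and IAM role ARN below are placeholders, and the commands require configured AWS credentials to actually run:

```shell
# Package the handler code (lambda_function.py is a placeholder name).
zip function.zip lambda_function.py

# First deployment: create the function from the zip.
aws lambda create-function \
  --function-name my-function \
  --runtime python3.9 \
  --handler lambda_function.handler \
  --role arn:aws:iam::123456789012:role/my-lambda-role \
  --zip-file fileb://function.zip

# Subsequent code changes: just push a new zip.
aws lambda update-function-code \
  --function-name my-function \
  --zip-file fileb://function.zip
```

Compare this to the EC2 checklist above: there is no SSH session, no web-server install, and no provisioning script to maintain.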
When should we use AWS Lambda?
These are conditions for which choosing Lambda might be appropriate:
- Stateless code. Lambda is perfectly suited to executing code with no state that needs to be persisted. Once the function executes, the function and its state changes are gone. This isn’t a restriction, however! You can still use storage services like RDS and DynamoDB to store state if needed.
- Short execution time. There is no point in maintaining an EC2 server when your tasks are occasional and can be executed within seconds. For example, transforming an image/video when it is uploaded to S3.
- Infrequent traffic. In a typical setup, your servers sit idle while you still pay for them. With Lambdas, you only run (and pay) when traffic hits.
- Real-time processing. AWS Lambda with AWS Kinesis works best for real-time batch processing. For example, writing batch data passed from Kinesis to DynamoDB, analysing logs, etc. Below is an example of a Lambda with a Kinesis trigger.
- Scheduled cron jobs. You can use an AWS Lambda function with scheduled events to run at a fixed scheduled time.
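The Lambda-with-a-Kinesis-trigger case mentioned above might look like the sketch below. Kinesis delivers a batch of records base64-encoded under `Records[*].kinesis.data`; what you do with each decoded record (write to DynamoDB, analyse a log line, etc.) is left as a comment:

```python
import base64
import json

def handler(event, context):
    """Process a batch of records delivered by a Kinesis trigger."""
    decoded = []
    for record in event["Records"]:
        # Kinesis record payloads arrive base64-encoded.
        payload = base64.b64decode(record["kinesis"]["data"])
        decoded.append(json.loads(payload))
        # Here you might write the item to DynamoDB, analyse a log line, etc.
    return decoded

# Local smoke test with a fabricated Kinesis-style event:
fake_event = {
    "Records": [
        {"kinesis": {"data": base64.b64encode(b'{"user": 1}').decode()}}
    ]
}
print(handler(fake_event, None))  # [{'user': 1}]
```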
Limitations of Lambda:
- The maximum execution duration per request is set to 300 seconds. In general, serverless architecture is not efficient for running long-lived applications. In such cases, containers or EC2 instances are more appropriate.
- Another interesting limit is the Lambda function deployment package size, which is set to 50MB (compressed), and the non-persistent scratch area (disk space) available to the function – 512MB in a /tmp folder. So if your Lambda has a lot of third-party dependencies and needs a lot of disk space, Lambda is not an option.
- Another significant issue to consider is AWS Lambda “cold starts”. After becoming idle, it takes some time for the Lambda function to handle a first request because Lambda has to start a new instance of the function rather than reuse an existing instance. One workaround is to send a request periodically to make sure that there is always an active instance and avoid the cold start. However, this increases cost.
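The keep-warm workaround can be as simple as a scheduled CloudWatch Events rule pinging the function periodically, with the handler short-circuiting on those pings. Scheduled events do carry `"source": "aws.events"`; the "real work" branch here is just a stub for illustration:

```python
def handler(event, context):
    """Short-circuit scheduled 'keep-warm' pings before doing real work."""
    # Scheduled CloudWatch Events rules send events with source 'aws.events';
    # returning early keeps the ping cheap.
    if isinstance(event, dict) and event.get("source") == "aws.events":
        return {"warmed": True}
    # ... real work would go here ...
    return {"handled": event}

print(handler({"source": "aws.events"}, None))  # {'warmed': True}
```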
- CloudWatch is the only built-in source for logs and troubleshooting, and it can be cumbersome to use.
Interested in learning more?
Stay tuned for part 2 or subscribe for all future blogs!