I have already written quite a lot about the serverless approach, the AWS Lambda service in particular, and how I use it for my own personal purposes. In this post, we will walk once again through how AWS Lambda is set up and how it runs. We will talk about strategies to mitigate the impact of DDoS attacks (the days of DoS are long gone) and to build fail-safe serverless applications. There is very little information on this topic, even though it is important and comes up all the time when AWS security is discussed.
P.S. Unfortunately, this post has not been sponsored.
Briefly about AWS Lambda. AWS Lambda is an AWS compute service that lets us run simple functions in the cloud under the FaaS (function as a service) model. AWS Lambda performs all administration for us, including server and operating system maintenance, resource allocation, automatic scaling, monitoring, and logging. All you have to do is provide code in one of the languages that AWS Lambda supports.
Advantages of using AWS Lambda:
- Cost-effective. You only pay when the service is running (but some services cost a lot).
- No ops. You do not need to manage anything yourself. AWS takes care of the operating system, deployment, scaling, and so on.
- Speed. The function itself starts and runs very fast (but there is overhead in spinning up the runtime).
- Scalability. Functions can run in parallel up to a concurrency limit: 1,000 per account per region by default, with region-dependent burst limits of up to 3,000 at the time of writing. If you need more, you can ask AWS to increase the limit.
The way AWS Lambda works is quite simple and clear. The first time a function is called, an instance of the function is created along with a runtime, which passes events and responses between AWS Lambda and the function code. The function handler is then started to handle the event. The source of the event (called a trigger) can be a wide variety of things inside AWS. The most popular triggers for the web, I would say, are AWS CloudFront, AWS API Gateway, and AWS SQS. When the function has processed an event, it returns a response and remains active, waiting for the next events. As more events arrive, the internal AWS Lambda schedulers route them to warm (already running) instances if there are any and create new instances as needed. When the number of requests decreases, AWS Lambda stops unused instances to free up resources for other functions.
The number of function instances that serve requests at a given time is called concurrency. This is essentially horizontal scaling inside AWS Lambda. When requests arrive faster than your function can scale, or when your function is at its maximum concurrency level, the additional requests fail with the 429 status (too many requests).
In terms of security, each AWS Lambda function runs in its own isolated environment, with its own resources and file system. Lambda stores the code in Amazon S3 and encrypts it at rest.
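To get a feel for when the 429 responses start, it helps to estimate concurrency as request rate multiplied by average duration. The snippet below is only a back-of-the-envelope sketch: the function names and the numbers are made up for illustration, and it ignores warm-up and burst behaviour.

```python
import math


def required_concurrency(requests_per_second: float, avg_duration_seconds: float) -> int:
    """Estimate how many instances are busy at once (rate x duration)."""
    return math.ceil(requests_per_second * avg_duration_seconds)


def throttled_share(requests_per_second: float, avg_duration_seconds: float, limit: int) -> float:
    """Rough fraction of requests that would be rejected with 429 at a given limit."""
    needed = requests_per_second * avg_duration_seconds
    if needed <= limit:
        return 0.0
    return 1.0 - limit / needed


# Hypothetical numbers: 200 requests per second, 0.5 seconds each.
print(required_concurrency(200, 0.5))
# Hypothetical flood: 4000 requests per second against a limit of 1000.
print(throttled_share(4000, 0.5, 1000))
```

With a steady 200 requests per second at half a second each, about 100 instances are busy at any moment, which is well inside the default limit. A flood is what pushes the estimate over it.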
When you deploy an endpoint that is open to the world, you open it not only for use but also for abuse.
Imagine someone who wants to disrupt the bus system. Thousands of people get on the bus at the beginning of the route and ride aimlessly through the city from end to end without leaving at the stops. The transport keeps going, but in fact, the traffic is paralyzed. People are standing at intermediate stops and are sadly watching the crowded buses without being able to push through. People are unable to get home, and the bus company is suffering losses due to low passenger traffic.
This maps easily onto web applications. An attacker tries to overload some component of the system and push it to a critical point, after which the system fails. That component can be a communication channel, a queue of requests, or simply the request handler itself.
Probably the most popular DDoS attack, because it is cheap and difficult to block, is DNS (Domain Name System) DDoS. In a DNS DDoS, the attacker tries to overwhelm the capacity of the target's DNS name servers in order to prevent name resolution. Blocking DNS makes the target application or website unavailable to users even if the rest of the infrastructure is running normally. Going back to the bus system analogy, imagine that the buses are running on schedule, they are empty and everything seems fine, but the bus doors just don't open, so people can't get in.
In AWS it is a bit more complicated because, as has already been said, both management and scaling take place on the AWS side, and therefore so does much of the control.
AWS Lambda at scale
AWS provides services and mechanisms to avoid common abuse methods but often, as with typical DDoS attacks, it doesn't know what traffic is and isn't abusive.
AWS itself is incredibly huge and has many regions, availability zones, and edge locations, which allows it to filter out bad traffic to some extent and absorb the rest. But it is not enough just to use a Lambda function thinking that AWS will do everything for you automatically. You can of course, but in the end, you may get a broken service, a huge bill, or both at once. AWS imposes limits on the number of concurrent handlers, so you have to think about where the traffic is coming from and how DNS resolves. If you use any services outside AWS, it sometimes makes sense to migrate them all inside AWS for more complete control. Services such as AWS Route 53 and AWS CloudFront allow you to take advantage of the internal AWS infrastructure, which itself is built quite interestingly.
I tried to illustrate an example of incoming traffic to a Lambda service located in one region. As you can see, there are many steps before a request gets to the Lambda itself. It all starts with DNS resolution of a record on one of the name servers in AWS Route 53 (the service responsible for name resolution and related things). These name servers are intentionally hosted at the edge locations so that clients from different networks and locations have a quick path to the service, and more than one path in case of localized outages. AWS CloudFront also lives at the edge locations (and you can run code there with the Lambda@Edge option). It is a content delivery network for static and dynamic content, which lets users avoid going directly to the function when that is not necessary. So Route 53 can direct different users to the closest AWS CloudFront edge (which may be in the same edge location), and that edge may already have the necessary content. Sounds great, right?
An example of how it works step by step:
- When the user sends a request, DNS resolves at the user's closest edge location where AWS Route 53 is located.
- Route 53 answers with the address of the nearest edge location where AWS CloudFront is located, and the request goes there.
- There are then two possibilities: the requested files are either in the cache or not. If they are in the cache, CloudFront returns them.
- If they are not, CloudFront compares the request with the specifications in your distribution and forwards it to the origin, which triggers the AWS Lambda function with the user request.
- The origin sends the requested files to the CloudFront edge location.
- When the first byte of the requested files arrives, CloudFront starts sending the files to the user.
- It also saves the files to the internal CloudFront cache (for the specified TTL) so they can be served quickly in the future if the same or another user requests them.
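The cache-hit and cache-miss branches of this flow can be sketched as a toy TTL cache in front of an origin function. This is only an illustration of the idea, not how CloudFront is implemented. `EdgeCache`, `origin`, and the injectable `clock` are hypothetical names introduced for the example.

```python
import time
from typing import Callable, Dict, Tuple


class EdgeCache:
    """Toy TTL cache standing in for a single edge location."""

    def __init__(self, origin: Callable[[str], str], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._origin = origin      # stands in for the Lambda behind the CDN
        self._ttl = ttl_seconds
        self._clock = clock        # injectable so the example can be tested
        self._store: Dict[str, Tuple[float, str]] = {}

    def get(self, path: str) -> str:
        now = self._clock()
        cached = self._store.get(path)
        if cached is not None and cached[0] > now:
            return cached[1]       # cache hit: the origin is not invoked
        body = self._origin(path)  # cache miss or expired entry: call the origin
        self._store[path] = (now + self._ttl, body)
        return body
```

The point for DDoS mitigation is the hit branch: repeated requests for the same cacheable path never reach the function, so they cost no invocations.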
You can of course use API Gateway in edge-optimized mode in front of your API for about the same purposes (caching and triggering the function). However, this is likely to be more expensive: API Gateway caching is billed per hour based on the cache size, while CloudFront charges per request and for data transfer. But of course, everything depends on the specific architecture, purpose, and workload.
API Gateway is essentially a proxy server that the user accesses, and this proxy server calls the Lambda. Typically, API Gateway can also do SSL certificate processing, load balancing, authorization and authentication, caching, and request content compression. But it is not as effective under abuse as CloudFront.
How to mitigate the impact
As always, this requires a multi-level approach and everything depends very much on the specific architecture, the workflow, and of course the budget.
Check your code
Let's start with the dumbest and traditionally most effective method. Make sure your code does not "hang" on unexpected input. You should carefully check all edge cases and think about inputs that may cause function timeouts, such as oversized payloads or strings crafted for ReDoS (regular expression denial of service). An attacker may take advantage of any such weakness.
Another incredibly brilliant idea if you don't want to get a huge bill at some point: set up billing alerts. It is easy and fast to set up (it is better to do it through AWS Budgets than through AWS SNS and Amazon CloudWatch), and it is very useful, because you will be informed when there is a problem.
I would also advise setting limits on spending. Of course, everything depends on the specific business task of the service, and maybe it is better to overpay but keep a working endpoint.
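As a minimal sketch of what such checks can look like in a Python handler: reject oversized bodies before parsing, and validate fields with a pattern that has no nested quantifiers, so it cannot backtrack catastrophically. The size limit, the `username` field, and the response shapes are assumptions made for the example, not part of any AWS API.

```python
import json
import re

MAX_BODY_BYTES = 4096  # hypothetical limit, tune for your payloads
# One character class with a bounded repeat: no nested quantifiers to backtrack on.
USERNAME_RE = re.compile(r"[A-Za-z0-9_.-]{1,64}")


def handler(event, context):
    body = event.get("body") or ""
    # Check the size first so huge inputs are never parsed.
    if len(body.encode("utf-8")) > MAX_BODY_BYTES:
        return {"statusCode": 413, "body": "payload too large"}
    try:
        payload = json.loads(body)
    except (ValueError, RecursionError):
        # Malformed or absurdly nested JSON.
        return {"statusCode": 400, "body": "invalid JSON"}
    username = payload.get("username") if isinstance(payload, dict) else None
    if not isinstance(username, str) or not USERNAME_RE.fullmatch(username):
        return {"statusCode": 400, "body": "invalid username"}
    return {"statusCode": 200, "body": json.dumps({"hello": username})}
```

Failing fast matters twice here: the attacker gets no slow path to exploit, and each rejected request is billed for only a few milliseconds.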
Use throttling (reserved concurrency)
We have already found out that AWS Lambda runs multiple instances concurrently to scale functions. But if you have several Lambda functions running at the same time and one of them is under abuse, the resources of the other functions may be exhausted because of it. AWS Lambda has a default limit on the number of concurrent executions per account per region. If your functions exceed this limit, additional user requests will be throttled by AWS with the 429 status, as described earlier.
But the concurrency level can also be set on a per-function basis. Besides AWS Lambda, API Gateway supports throttling as well. The defaults are reasonable, but you can alter them however you like. For example, you can allow 5 calls per second if it makes sense for your application, after which API Gateway will block additional requests. In the case of DDoS, the brunt of the attack will then fall on AWS, not on your Lambda. Lambda bills for each request plus the execution time multiplied by the allocated memory, so the number of invocations is the parameter you want to control.
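A per-function cap can be set from the console, from infrastructure code, or through the API. Here is a small sketch that uses the `put_function_concurrency` call of the boto3 Lambda client. The wrapper function, the function name, and the value 50 are hypothetical, and the client is passed in so the snippet does not need AWS credentials to be imported.

```python
def cap_function_concurrency(lambda_client, function_name: str, max_instances: int) -> dict:
    """Reserve (and thereby cap) the concurrency of a single function."""
    if max_instances < 0:
        raise ValueError("max_instances must be non-negative")
    return lambda_client.put_function_concurrency(
        FunctionName=function_name,
        ReservedConcurrentExecutions=max_instances,
    )


# Usage, assuming boto3 is installed and credentials are configured:
#   import boto3
#   cap_function_concurrency(boto3.client("lambda"), "my-public-endpoint", 50)
```

Keep in mind that reserved concurrency works in both directions: the function can never exceed the cap, and that capacity is also taken out of the shared pool available to your other functions.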
Use batch processing
Consider using SQS as a broker for your Lambda function. By sending events to a queue instead of directly to the Lambda, you get the ability to process multiple events at once, also known as batch processing. This reduces the total number of function calls.
As we have seen before, using a content delivery network (CDN) such as Amazon CloudFront is a common strategy to decrease web page load time, reduce bandwidth costs, reduce the load on web servers, and mitigate the impact of DDoS attacks. In addition, CloudFront is a platform for deploying AWS WAF. AWS WAF is a web application firewall that helps protect your application from DDoS attacks by giving you control over what traffic to allow or block through custom security rules, grouped into web access control lists (web ACLs).
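A batch handler for an SQS trigger might look roughly like the sketch below. The `process` function and the `order_id` field are hypothetical business logic. Returning `batchItemFailures` only has an effect when partial batch responses (`ReportBatchItemFailures`) are enabled on the event source mapping, which is assumed here.

```python
import json


def process(message: dict) -> None:
    """Hypothetical business logic; raises on bad input."""
    if "order_id" not in message:
        raise ValueError("missing order_id")


def handler(event, context):
    failures = []
    # One invocation receives up to a whole batch of SQS messages.
    for record in event.get("Records", []):
        try:
            process(json.loads(record["body"]))
        except Exception:
            # Report only this message as failed so the rest of the batch is not retried.
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}
```

With a batch size of 10, a burst of 1,000 messages turns into roughly 100 invocations instead of 1,000, and the queue absorbs the spike while the function drains it at its own pace.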
HTTP and HTTPS requests sent to CloudFront can be monitored, and access to application resources can be controlled at the edge locations using AWS WAF. Based on the conditions you specify in the AWS WAF, such as the IP addresses from which the requests originate or the values of query strings, traffic can be allowed, blocked, or allowed and counted for further investigation.
At the time of writing, the minimum threshold that you can set for a rate-based WAF rule is 2000 requests per 5-minute period. If you want to apply more aggressive IP-based rules, you will, unfortunately, have to write your own solution.
By using AWS CloudFront, you also get AWS Shield Standard protection out of the box. AWS Shield detects traffic spikes before they even reach your gateway or ELB.
Use API keys on API Gateway
You can use API keys if it is appropriate for your application. Users will have to pass the key inside the HTTP header. Without a valid key, an attacker cannot access the API Gateway. In this case, AWS will bear the main burden of the attack.
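On the client side, the key normally travels in the `x-api-key` header. A minimal sketch with the standard library follows. The URL and key values are placeholders, and the request is only built here, not sent.

```python
import urllib.request


def build_request(url: str, api_key: str) -> urllib.request.Request:
    """Build a request that carries the API Gateway key in the x-api-key header."""
    return urllib.request.Request(url, headers={"x-api-key": api_key})


# Placeholder endpoint and key, for illustration only.
request = build_request("https://example.com/prod/items", "my-secret-key")
# To actually send it: urllib.request.urlopen(request)
```

Requests without a valid key are rejected by API Gateway before the Lambda is invoked, so they do not count toward your function's invocations.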
Use API Gateway Usage Plans
A usage plan lets you set parameters that restrict the use of your API. It is essentially similar to throttling, applied not at the application level but at the API and stage level.
Throttling is done using the "Token Bucket" model. The bucket can hold as many tokens as the Burst value and is refilled with new tokens at a given rate. Each API request removes one token from the bucket. The token bucket lets you have APIs that support a constant flow of requests while still absorbing occasional bursts.
The quota limits the number of requests for a certain interval, such as a day, week, or month. Once the quota is used up, requests are no longer accepted until the end of that period.
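To make the model concrete, here is a small token bucket simulation. It is a sketch of the general algorithm, not of API Gateway's internal implementation. The class name and the explicit `now` argument (which keeps the example deterministic) are choices made for illustration.

```python
class TokenBucket:
    """Token bucket: `burst` is the capacity, `rate` is tokens added per second."""

    def __init__(self, rate: float, burst: int, now: float = 0.0) -> None:
        self.rate = rate
        self.burst = burst
        self.tokens = float(burst)  # start with a full bucket
        self.updated = now

    def allow(self, now: float) -> bool:
        # Refill for the time that has passed, never above the capacity.
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(float(self.burst), self.tokens + elapsed * self.rate)
        self.updated = max(self.updated, now)
        if self.tokens >= 1.0:
            self.tokens -= 1.0      # each request consumes one token
            return True
        return False                # bucket is empty: the request is throttled


bucket = TokenBucket(rate=5, burst=10)
burst_results = [bucket.allow(now=0.0) for _ in range(12)]
print(burst_results.count(True))   # requests accepted from the initial burst
```

In this run, a burst of 12 simultaneous requests gets 10 accepted and 2 throttled. One second later, 5 more tokens are available, which is the steady rate.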
In summary, AWS Lambda is designed for high availability and is backed by the huge AWS infrastructure, which runs at scale.
There is no complete or perfect solution for DDoS. All that can be done is to take preventive measures and respond quickly and effectively when an attack takes place.