Web application traffic is unpredictable: sometimes an application serves a huge number of requests, and sometimes it sits idle. Hosting applications on cloud virtual machines forces us to pay for that idle time too. To solve this problem we have to deal with load balancing, DNS, and automatic scaling. Managing all of this is difficult and doesn't really make sense for side projects.
Serverless technologies are several years old and their popularity is increasing every year. For highly loaded systems they offer a simple way to scale almost without limit, and for simple side projects they are a great opportunity for free hosting. This is what this long read is about.
I've never been a blogger - in fact, the word "blogger" itself was a kind of swear word for me. Now I have a blog that I periodically write to.
To start with, and to explain the title of the post, I'll tell you a little bit about why the hell a blog is needed.
Blog as a personal brand
A personal brand is recognized in certain circles. Recognizability in itself is more of a side effect and a disadvantage of media. But along with it comes the trust of the audience, a resource that can then be converted into anything: the spread and promotion of your ideas, networking, monetization, etc.
Blog as a showcase of expertise
A blog is the best showcase of expertise. Even if you're not the best writer, even if you're not the biggest expert, everybody has to start somewhere. With a blog you can scale your expertise even further: not only write code but also teach other programmers to write code. A blog makes you express your ideas clearly and improves your language skills.
Increasing your expertise
Over time, I have found that the best way to learn something is to try to teach others. The blog makes you learn more and more about a topic in which you arrogantly thought you were an expert. Expertise has an expiration date. A blog helps you keep it more or less fresh.
Blog is exciting
A blog is a mouthpiece through which you can communicate with a large audience on topics you are interested in. It is a way to help others and make your life more meaningful. It's an opportunity to communicate ideas that you think are important.
Serverless is a way to deploy code without spinning up a server. We take our predefined function in Python (yes, we will be talking about Python exclusively here), send it to the cloud, and it runs there in a sandboxed runtime environment that the cloud itself provides. How the function is triggered, how containers are reused, and so on depends on the vendor and may vary.
Even though the approach is called serverless, it's not really a lack of servers; it's just that we don't have a centralized server. Instead, there is a bunch of decentralized services published in cloud storage that are deployed automatically when the right event appears.
Some people compare this to a microservice architecture, but these are somewhat different concepts, though close in nature. Generally, a microservice is larger in terms of both functionality and resources: ideally, each microservice should have its own database, message broker and everything that is required for its independent operation. A function is a relatively small piece of code that performs only one action in response to an event. Depending on how the developers have divided the application, a microservice may be equivalent to a function (meaning that it only performs one action) or may consist of several functions.
We don't write any Flask code, any Django code, nothing like that. The runtime is provided by the cloud platform, which makes decisions for us; for example, AWS can reuse previous environments to speed up the response.
We don't pay rent for a server; we pay for the time our code is actually executing, for those milliseconds during which our request is processed. It's pay-as-you-go: you are charged according to the execution time of a particular function. Instead of running a 24/7 server, you can deploy lambda functions, and the server will work only during the life cycle of the request. This is great for prototyping as you pay for the number of specific requests, and if you use AWS Lambda you have 1 million absolutely free (!) requests per month.
We don't store state in functions, we can't use global variables, we can't store anything on the hard disk (technically it may be possible but it doesn't make much sense); we can only use external things: an external database, external cache, external storage, etc. Thanks to this, when the workload and the number of requests grow, there is no problem with horizontal scaling. And all this allows us to stop paying attention to the infrastructure and focus on actually writing code and the business logic of the application.
Our code is secondary to the infrastructure. Any application is basically a way to take user data, do something with it, and put it somewhere in one way or another. And in the case of serverless, we take the infrastructure that cloud vendors offer us and connect those services that already have some kind of logic. We write a function that takes what came to the web server and puts it somewhere in the database, and we build a lot of these little bridges. Our application is now a grid of small functions. Thus, it turns out that the code is not the basis of an application but a glue that binds different infrastructure components. When building such applications, development usually starts with the infrastructure.
There are disadvantages, of course.
Obviously, here we become even more dependent on the vendors that provide us with their infrastructure: the so-called vendor lock-in. Functions developed for AWS will be very difficult to port to, for example, Google Cloud. And not because of the functions themselves, Python is Python even in Google, but rather because you rarely use serverless functions in isolation. Besides them, you will use a database, message queues, logging systems and so on, which are completely different from vendor to vendor. However, if you want to, even this functionality can be made independent of the cloud provider.
Using a third-party service also gives you less control over the system, because you cannot fully see or understand how the whole system works. Basically, it limits what you're able to use in your application.
Another disadvantage is the price of excellent scalability: until your function is called, it is not running. When it does need to run, startup can take up to a few seconds (a cold start), which can be critical for your business or application.
What can be implemented in such an infrastructure?
Not every application is suitable for this. This thing won't become a standard, it's not the next step in technology, it's not a holy grail, but it still has a very, very good niche that it fills.
You can't run long tasks because the vendor limits how long a function may run (Amazon allows up to 15 minutes). So we can't just launch some kind of scraper and wait for it to parse sites for an hour.
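One common way to live with the time limit is to process work in chunks and stop before the budget runs out. The sketch below assumes the work arrives as a list under a hypothetical `urls` key and uses the Lambda context method `get_remaining_time_in_millis()`; the `safety_margin_ms` parameter and the placeholder "work" are illustrative assumptions.

```python
def handler(event, context, safety_margin_ms=5_000):
    """Process URLs until the time budget is nearly spent, then report the rest."""
    pending = list(event.get("urls", []))  # "urls" is a hypothetical event key
    done = []
    while pending and context.get_remaining_time_in_millis() > safety_margin_ms:
        url = pending.pop(0)
        done.append(url.upper())  # placeholder for the real work
    # The caller (or a queue) can re-invoke the function with the leftover items.
    return {"done": done, "remaining": pending}
```

Whatever is left in `remaining` has to be handed to another invocation, which is exactly the kind of extra plumbing a long-running scraper would need here.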
We can't deploy an application with a lot of dependencies because, again, since we have no control over the operating system where all this stuff runs, we may have problems with libraries that rely on it.
We cannot rely on global variables or any data stored in memory, so our application must be stateless.
One of the market leaders in FaaS services today is AWS with its AWS Lambda service. It supports a large number of programming languages (including Ruby, Python, Go, NodeJS, C#, and Java) and a huge number of services that allow solving quite complex multi-level problems. AWS Lambda automatically scales as needed, depending on the requests the application receives, so we pay only for what we consume. If our code does not run, no fee is charged. There is also a free tier of resources that can be used at no charge, with limitations of course.
This is the provider I chose for my blog. Mainly because I know it relatively well and understand how to work with it.
For me, the downside of this platform is pricing: it's like a random flip of a coin. Even though they give you a detailed description of what was charged and a pricing calculator, I am often surprised by the bill. Various internal data transfers, S3 storage usage, lambda calls, data transfer between different availability zones even if you have everything in one zone. Amazon charges a fee for all of this, a small one, but you should not forget about it.
The easiest way to run a lambda is to deploy it in the Amazon admin interface. Amazon has an admin UI for everything; they even have their own built-in code editor. In the simplest case, deploying the application consists of two steps.
Implement the function. The function must receive an event that contains information about the request and a context that contains some information about the runtime. There are no abstractions here; Amazon does not impose yet another meaningless DSL, and native Python can be used.
Once the function is created, you can create an endpoint for it in the same Amazon admin. Go to API Gateway, where you can create an endpoint with a couple of clicks and bind it to the desired function. All requests that arrive at this endpoint will trigger the function and be served by it. As you add more resources, the headache of connecting them obviously grows.
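A minimal sketch of such a function, assuming the endpoint uses an API Gateway proxy integration (in which case the event carries `queryStringParameters` and the return value is a dict with `statusCode`, `headers`, and a string `body`). The `name` parameter and the greeting are purely illustrative.

```python
import json


def handler(event, context):
    """Entry point that Lambda calls with the request event and runtime context."""
    # With a proxy integration, query parameters arrive under
    # "queryStringParameters", which may be None when there are none.
    params = event.get("queryStringParameters") or {}
    name = params.get("name", "world")

    # The proxy integration expects a status code, headers, and a string body.
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }
```

Because the handler is plain Python, it can be called locally with a hand-built event dict before it ever reaches AWS.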
Is it convenient?
Not really. Even for simple things, it seems like there's a lot to do and keep in mind.
We need the framework!
There are several frameworks that automate this process and let you avoid touching the Amazon admin and fiddling with zip archives.
One of the oldest is probably the Serverless Framework.
It is very simple: we need to provide a yml file with the Amazon configuration and a Python file with the function that will run. In the Python file, we write a function with the same event and context signature, which returns the response together with its status. We must also describe the triggers that call this function.
Next, we run a single command and the framework magic takes place: everything is packaged, everything is uploaded, and the deploy is done. As a result, we get an endpoint which we can call to see how the function works.
What the Serverless Framework does:
- Manages deployments and creates required resources (API Gateway endpoint, IAM)
- Extensible via plugins
- Manages packaging requirements
It is written in Node.js :( which means that we need to install it through npm, with all the usual Node.js problems and a huge node_modules folder.
Zappa is the original Python serverless library/framework. Originally it was designed as django-zappa, a library that allows deploying Django web applications on AWS Lambda. Since then it has grown considerably and now supports any WSGI-compatible Python web framework. It has also expanded its feature set and is largely a batteries-included tool for creating, deploying and managing serverless web applications written in Python.
Zappa is designed in a similar way to the Serverless Framework. There is a configuration file called zappa_settings.json that describes the configuration for the deploy. There is a Python file in which you write the entire Flask application. And then Zappa allows us to take this Flask application and submit it to the AWS cloud without the slightest change. With one command you can send your Flask application to the cloud and it will start working there. It is possible to migrate current projects without any modifications (well, almost).
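To give a feel for the two pieces, here is a rough sketch: a bare WSGI callable standing in for the Flask app (a Flask `app` object exposes the same WSGI interface), and a Python dict mirroring what a zappa_settings.json for a `dev` stage might contain. The project name, bucket name, region, and runtime are placeholders, not my real settings.

```python
import json


def app(environ, start_response):
    """A bare WSGI callable; a Flask `app` object exposes the same interface."""
    body = b"Hello from Lambda"
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]


# Sketch of what zappa_settings.json might contain for a "dev" stage.
# All values below are placeholders.
ZAPPA_SETTINGS = {
    "dev": {
        "app_function": "app.app",   # module path to the WSGI callable
        "aws_region": "us-east-1",
        "project_name": "blog",
        "runtime": "python3.9",
        "s3_bucket": "my-zappa-deployments",
    }
}

if __name__ == "__main__":
    # Print the dict as JSON, ready to be saved as zappa_settings.json.
    print(json.dumps(ZAPPA_SETTINGS, indent=4))
```

With a settings file like this in place, `zappa deploy dev` performs the first deployment and `zappa update dev` pushes later changes.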
That is what I did.
My blog has an ancient history. Originally I had it hosted at DigitalOcean, which at some point just fucked it up after regular maintenance. Despite the backups, I did not restore it and left it for a while. Then I went back to the project and moved it to AWS EC2 in the same form without any changes. The whole project started as a Docker Swarm deployment of an application container with Flask and MongoDB. After receiving the first invoice, I thought about the cost of things in this world. So I decided to move my very simple application to a serverless platform.
And I moved to AWS Lambda with the Zappa framework.
The first question that had to be answered was...
Where to store a state?
The simplest option is to use DynamoDB in the same AWS - it's a kind of managed NoSQL database with configurable performance. You will be charged by the number of hits to this database, and it has a rather peculiar syntax.
That's what I did originally. To always have a plan B and not go all-in, I implemented something like the DAO pattern to encapsulate the database access logic. The invoice got nicer, but in my case using a distributed database to store a couple of articles sounds like a very stupid idea. And it still cost me a lot of money.
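The idea of the DAO layer can be sketched roughly like this. The class and method names are hypothetical, not the actual code of my blog, and the in-memory backend only stands in for the DynamoDB, MongoDB, or S3 implementations.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Optional


class ArticleDAO(ABC):
    """Storage-agnostic interface the rest of the app talks to (hypothetical)."""

    @abstractmethod
    def get(self, slug: str) -> Optional[dict]: ...

    @abstractmethod
    def save(self, slug: str, article: dict) -> None: ...

    @abstractmethod
    def list_slugs(self) -> List[str]: ...


class InMemoryArticleDAO(ArticleDAO):
    """Stand-in backend; a real one would implement the same three methods."""

    def __init__(self) -> None:
        self._items: Dict[str, dict] = {}

    def get(self, slug):
        return self._items.get(slug)

    def save(self, slug, article):
        self._items[slug] = dict(article)

    def list_slugs(self):
        return sorted(self._items)


def render_title(dao: ArticleDAO, slug: str) -> str:
    """Application code depends only on the interface, not on the backend."""
    article = dao.get(slug)
    return article["title"] if article else "Not found"
```

Swapping the storage then means writing one new class and changing the place where the DAO is constructed; the views stay untouched.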
Eventually, I came to the hack of storing state on Amazon S3. I didn't invent it, I looked it up. There are different ways to store this state:
A. The easiest way is to use SQLite and store the SQLite database file on S3
It works like this:
- An incoming request to the AWS balancer triggers the function. At this point your database is a file in the S3 bucket.
- Lambda downloads the database from the S3 bucket, keeps it in RAM, and runs its queries against it.
- Lambda puts the database back in the bucket and then returns the response.
Here's a fairly popular package for this: sqlalchemy-s3sqlite.
It works in milliseconds. It really does.
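The download, query, upload cycle can be sketched without any AWS dependency. In the sketch below `InMemoryBucket` is a hypothetical stand-in for an S3 bucket (a real version would call an S3 client instead), and the `posts` table is made up for illustration; this is not how sqlalchemy-s3sqlite is implemented internally.

```python
import os
import sqlite3
import tempfile


class InMemoryBucket:
    """Hypothetical stand-in for an S3 bucket: maps object keys to bytes."""

    def __init__(self):
        self.objects = {}

    def download(self, key, path):
        with open(path, "wb") as f:
            f.write(self.objects.get(key, b""))

    def upload(self, path, key):
        with open(path, "rb") as f:
            self.objects[key] = f.read()


def handle_request(bucket, key, title):
    """Download the database file, run queries against it, upload it back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "blog.db")
        bucket.download(key, path)          # 1. fetch the SQLite file

        conn = sqlite3.connect(path)        # 2. query it locally
        try:
            conn.execute("CREATE TABLE IF NOT EXISTS posts (title TEXT)")
            conn.execute("INSERT INTO posts (title) VALUES (?)", (title,))
            conn.commit()
            count = conn.execute("SELECT COUNT(*) FROM posts").fetchone()[0]
        finally:
            conn.close()

        bucket.upload(path, key)            # 3. write the file back
    return count
```

Note that two requests writing at the same time would overwrite each other's copy of the file, so this only suits workloads with rare writes, like a personal blog.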
B. Another option is to use a so-called s3fs implementation. This way we get a filesystem interface for our S3 bucket. An example of a library that you may use is s3fs.
C. A custom stupid solution. You guessed right: this is my choice. I still don't think it was a good idea, but I'm too lazy to do anything about it.
That's what my architecture looks like in the end.
And the code structure. You'll probably have your own, so I won't describe mine in detail. It's just a simple Flask application with a jinja2 template engine and a self-written admin panel. There you can see the dao module that has endured all the bullying for some time.
blog
├── admin.py
├── app.py
├── consts.py
├── server.py
├── static
│   ├── admin.bundle.js
│   ├── client.bundle.js
│   ├── css
│   ├── img
│   └── js
├── templates
├── utils
│   ├── dao
│   │   ├── dynamo.py
│   │   ├── mongo.py
│   │   └── s3.py
│   ├── __init__.py
│   └── utils.py
└── zappa_settings.json
As a comment engine, I use utteranc.es, a lightweight widget built on GitHub Issues. As long as GitHub is free this thing will be free as well.
Perhaps this point is of most interest to many.
So, you get 1 million AWS Lambda requests per month for free, and the next million will cost you 20 cents, plus a small charge for execution time. You also get 1 million API Gateway requests for free. That is, at low traffic you may not pay at all or only pay for the traffic (AWS charges for that as well). For the entire life of my blog on AWS Lambda, I paid from 0 to 1 dollar per month, depending on the phase of the moon. My situation could certainly be optimized in terms of the number of requests and the infrastructure I use, but your situation will likely be different.
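For a rough feel of the numbers, here is a back-of-the-envelope estimator for the Lambda part of the bill. All prices and free-tier sizes in it are assumptions (20 cents per million requests, about $0.0000166667 per GB-second, 1 million free requests and 400,000 free GB-seconds per month); check the current AWS pricing page before trusting the result. It ignores API Gateway, S3, and data transfer.

```python
def estimate_lambda_cost(requests, avg_ms, memory_mb,
                         price_per_million=0.20,
                         price_per_gb_second=0.0000166667,
                         free_requests=1_000_000,
                         free_gb_seconds=400_000):
    """Rough monthly Lambda bill in USD; every default here is an assumption."""
    # Request charge: only requests above the free tier are billed.
    billable_requests = max(0, requests - free_requests)
    request_cost = billable_requests / 1_000_000 * price_per_million

    # Compute charge: duration in seconds times memory in GB, minus the free tier.
    gb_seconds = requests * (avg_ms / 1000) * (memory_mb / 1024)
    billable_gb_seconds = max(0.0, gb_seconds - free_gb_seconds)
    compute_cost = billable_gb_seconds * price_per_gb_second

    return round(request_cost + compute_cost, 2)
```

Under these assumptions, a blog serving 100,000 requests a month at 200 ms and 512 MB each stays entirely inside the free tier, so `estimate_lambda_cost(100_000, 200, 512)` returns `0.0`.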