2023/02/28
AWS Lambda's managed Python runtime is limited to version 3.9. Using that version is challenging for many reasons, but primarily because package maintainers do not necessarily cater to users on older versions. And there is more: when you work with Python, you must realize that you usually have C, C++, Fortran, Rust, or libc dependencies, not to mention CPU architectures, because the most useful Python libraries are written in those compiled languages. So when you try to deploy your code to a less common platform, it can burst into flames in the worst kind of ways.
I have spent more hours trying to get some Python lib to work on a platform than I have spent learning Rust.
The last nail in the coffin of trying to use Python 3.9 on AWS was a library whose Rust extension was compiled against a newer version of libc than the one AWS provides. That triggered the creation of a new solution for deploying Python to AWS Lambda, one where we no longer need to care about these issues.
Using a Docker image as the source for your Lambda function simplifies deployment: Docker packages your application and its dependencies into a single container, and you can ship the entire application stack with a single command. Docker also provides a consistent environment regardless of the underlying infrastructure. The whole stack, including the operating system, libraries, and dependencies, is identical across all environments, so your application behaves the same way wherever it is deployed, which reduces the risk of errors and unexpected behavior.
This Dockerfile constitutes our base image, from which we build our application-specific image. It provides a stable environment for the app. To keep the final image small, we use a multi-stage build. The FROM lines specify the Python version we require for our runtime. In the first stage we install the AWS Lambda Runtime Interface Client, which handles communication between the Lambda environment and our code. In the second stage we only need to copy the virtualenv containing the runtime interface client into the final image.
ARG FUNCTION_DIR="/var/task"
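# Stage 1: build image with the toolchain needed to compile the AWS Lambda Runtime Interface Client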
FROM python:3.11.2-slim-bullseye as build-image
ARG FUNCTION_DIR
RUN mkdir -p ${FUNCTION_DIR} && mkdir -p /venv
RUN useradd -m -u 5000 lambda || :
RUN chown lambda ${FUNCTION_DIR} && chown lambda /venv
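# Build tools required to compile the runtime interface client's native extension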
RUN apt-get update && \
    apt-get install -y --no-install-recommends \
    g++ \
    make \
    cmake \
    unzip \
    libcurl4-openssl-dev && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*
USER lambda
RUN python -m venv /venv
ENV PATH="/venv/bin:$PATH"
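# Sanity check: pip should now resolve to /venv/bin/pip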
RUN which pip
RUN pip install pip --upgrade
RUN pip install awslambdaric
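# Stage 2: runtime image; copy only the prepared virtualenv, leaving the build tools behind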
FROM python:3.11.2-slim-bullseye
RUN mkdir -p /venv
RUN useradd -m -u 5000 lambda || :
RUN chown lambda /venv
USER lambda
# --chown ensures the lambda user owns the venv, so the app image can pip install into it
COPY --from=build-image --chown=lambda /venv /venv
ENV PATH="/venv/bin:$PATH"
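# Sanity check: list the packages baked into the base image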
RUN pip list
ENTRYPOINT [ "python", "-m", "awslambdaric" ]
As you can see, we run our app as a dedicated user instead of root. Running your application as a non-root user is recommended; in the context of AWS Lambda, security is probably not that big of a deal, but following the principle of least privilege (PoLP) is a good idea.
We use Ninja for building and uploading images; you can find more about this approach here: Misusing Ninja. Once built, the base image is uploaded to AWS ECR so it can be pulled when the application-specific image is built. This is our build.ninja file:
aws_account_id = xxxxxxxxxx
python_version = 3.11.2
repo = my-repo
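# None of the build statements below ever create their output files, so ninja
# considers them out of date and reruns them on every invocation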
rule login-to-ecr
  command = aws ecr get-login-password --region eu-west-1 | docker login --username AWS --password-stdin ${aws_account_id}.dkr.ecr.eu-west-1.amazonaws.com
  description = Logging into ECR
rule build-aws-lambda-base
  command = docker build . -t ${repo}:lambda-py-${python_version} --file Dockerfile.aws_lambda_python_${python_version}
  description = Building ${repo}:lambda-py-${python_version}
rule tag-aws-lambda-base
  command = docker tag ${repo}:lambda-py-${python_version} ${aws_account_id}.dkr.ecr.eu-west-1.amazonaws.com/${repo}:lambda-py-${python_version}
  description = Tagging ${repo}:lambda-py-${python_version}
rule push-aws-lambda-base
  command = docker push ${aws_account_id}.dkr.ecr.eu-west-1.amazonaws.com/${repo}:lambda-py-${python_version}
  description = Push ${repo}:lambda-py-${python_version}
build login-to-ecr: login-to-ecr
build build-aws-lambda-base: build-aws-lambda-base || login-to-ecr
build tag-aws-lambda-base: tag-aws-lambda-base || build-aws-lambda-base
build push-aws-lambda-base: push-aws-lambda-base || tag-aws-lambda-base
default login-to-ecr build-aws-lambda-base tag-aws-lambda-base push-aws-lambda-base
Let’s run it:
ninja -f build.ninja
We have the base image in ECR. This gives us control over what goes into the application image later. For immutable infrastructure it is paramount to use a base image that is not a moving target, meaning you do not run apt-get update or a similar command every time you build an application-specific container.
This is the Dockerfile for the application that runs inside the Lambda function as a container. The base image, which we have just created, is referenced on the first line.
FROM <aws_account_id>.dkr.ecr.eu-west-1.amazonaws.com/<repo>:lambda-py-3.11.2
ARG FUNCTION_DIR="/var/task"
ENV PATH="/venv/bin:$PATH"
USER lambda
WORKDIR ${FUNCTION_DIR}
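# Copy the application code and packaging metadata into the task directory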
COPY app .
COPY pyproject.toml .
COPY README.md .
RUN pip install .
RUN pip list
CMD ["app.handler"]
When we install the Python packages in this file, we lock the versions to specific ones. Again, we want immutable infrastructure: running the build on different computers should result in the same image.
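As an illustration, a minimal pyproject.toml could look like the following. This is a sketch, not our actual project file: the pinned dependency is only an example, though the version matches the one used in the build.ninja below.

[project]
name = "app"
version = "0.6.1"
requires-python = ">=3.11"
# Pin exact versions so every build produces the same image
dependencies = [
    "requests==2.28.2",
]

[tool.setuptools]
# app.py at the project root is our single module
py-modules = ["app"]

[build-system]
requires = ["setuptools>=65"]
build-backend = "setuptools.build_meta"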
Our directory looks like the following:
|__app/
| |__app.py
|
|__pyproject.toml
|__README.md
|__Dockerfile
|__build.ninja
If you have a more complicated application, there will be many more files in the app folder.
Our entry point is the function called handler inside app.py. CMD specifies where the Lambda handler lives; it is passed as an argument to the ENTRYPOINT defined in the base image.
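For completeness, a minimal app.py could look like this; the body of the function is of course hypothetical:

import json

def handler(event, context):
    # event holds the invocation payload, context the Lambda runtime metadata
    return {
        "statusCode": 200,
        "body": json.dumps({"message": "hello from the container"}),
    }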
This image is built similarly to the base image: with Ninja. This is our build.ninja file for the app-specific image:
version = 0.6.1
aws_account_id = xxxxxxxx
repo = my-repo
rule login-to-ecr
  command = aws ecr get-login-password --region eu-west-1 | docker login --username AWS --password-stdin ${aws_account_id}.dkr.ecr.eu-west-1.amazonaws.com
  description = Logging in to ECR
rule build-image
  command = docker build . -t ${repo}:backend-api-${version} --file Dockerfile
  description = Building ${repo}:backend-api-${version}
rule tag-image
  command = docker tag ${repo}:backend-api-${version} ${aws_account_id}.dkr.ecr.eu-west-1.amazonaws.com/${repo}:backend-api-${version}
  description = Tagging ${repo}:backend-api-${version}
rule upload-image
  command = docker push ${aws_account_id}.dkr.ecr.eu-west-1.amazonaws.com/${repo}:backend-api-${version}
  description = Push ${repo}:backend-api-${version}
build login-to-ecr: login-to-ecr
build build-image: build-image || login-to-ecr
build tag-image: tag-image || build-image
build upload-image: upload-image || tag-image
default login-to-ecr build-image tag-image upload-image
Let’s run it:
ninja -f build.ninja
Now there is an image in ECR that hosts our application. You can easily move back and forth between base image versions, try out new libraries, or update your application. Because we deploy the same image to dev and prod, you can gain confidence in the version you are working on as it passes through the stages from local to dev and finally to prod.
I have left out testing, formatting, and linting from the build process because it would be too much to display the complete build file with those steps. We use pytest, Ruff, and Black for most of these tasks, and we also implemented integration tests in pytest for more thorough testing.
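To give an idea of the kind of test that runs before the image is built, here is a minimal unit test for the hypothetical handler sketched earlier:

import json

import app

def test_handler_returns_200():
    # No real event is needed for this smoke-level check
    response = app.handler({}, None)
    assert response["statusCode"] == 200
    assert json.loads(response["body"])["message"] == "hello from the container"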
Our final task is to deploy to AWS Lambda, which we do with Terraform. The package_type must be Image and the image URI needs to be provided. We take advantage of ARM64 with this setup, which is a bit cheaper than running on x86_64.
resource "aws_lambda_function" "docker-lambda-function" {
function_name = "my-lambda-function"
description = "This is my lambda function"
image_uri = "<account_id>.dkr.ecr.eu-west-1.amazonaws.com/<repo>:backend-api-<lambda-function-version>"
package_type = "Image"
role = <lambda_role_arn>
memory_size = 2048
timeout = 10
architectures = ["arm64"]
}
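Once Terraform has applied, a quick way to smoke-test the deployed function is to invoke it with boto3. A minimal sketch, where the function name and region come from the resource above and the printed payload assumes the hypothetical handler from earlier:

import json

import boto3

# Assumes AWS credentials are already configured in the environment
client = boto3.client("lambda", region_name="eu-west-1")

response = client.invoke(
    FunctionName="my-lambda-function",
    Payload=json.dumps({}).encode("utf-8"),
)
# The Payload in the response is a stream containing the handler's return value
print(json.loads(response["Payload"].read()))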
After deploying to AWS Lambda we also create a CloudFront distribution and an API Gateway v2 API that fronts the Lambda. The new Lambda function URLs make it possible to skip API Gateway; maybe later we can have a look at the tradeoffs of that setup.
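For reference, a minimal sketch of what that alternative could look like in Terraform; the resource name is hypothetical, and evaluating this against API Gateway is exactly the tradeoff mentioned above:

resource "aws_lambda_function_url" "docker-lambda-function-url" {
  function_name      = aws_lambda_function.docker-lambda-function.function_name
  # "NONE" would make the URL public; "AWS_IAM" requires signed requests
  authorization_type = "AWS_IAM"
}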
## Conclusion
Using a Docker image as the packaging for an AWS Lambda function provides a consistent, immutable environment, ensuring that your application runs the same way regardless of where it is deployed. This reduces errors and unexpected behavior and makes it easier to package and distribute your application and its dependencies. As a bonus, Python 3.11 is in many cases much faster than previous versions.