
Oxidizing Lambda Functions

This project showcases the benefits of using the Rust programming language in AWS Lambda functions, both with Lambdas written purely in Rust and with other approaches where Lambdas written in other languages and running on other runtimes can still benefit from Rust.

For the benchmarks, I've used the AWS Lambda Power Tuning tool, which automates measuring the execution time of each Lambda function across different memory allocation setups. It also provides an online tool, AWS Lambda Power Tuning UI, to generate visualizations of the results obtained and even compare the results of two Lambda functions against each other.

Architecture

architecture diagram

How to Deploy

Instructions for deployment of this stack can be found at docs/HOW_TO_DEPLOY.md

Lambda Strategies

Lambda supports multiple languages through the use of runtimes. A runtime provides a language-specific environment that relays invocation events, context information, and responses between Lambda and the function.

For a list of supported runtimes see: AWS Lambda Runtimes

For Rust, the lambdas are built using the Cargo Lambda tool.

Python with Pandas

Pandas is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool, built on top of the Python programming language.

It's written in Python, and we will be using it to process the CSV file and calculate the averages.
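
As a rough sketch of the aggregation this Lambda performs (column names assumed from the sample file shown later; the real function also reads from S3 and writes to DynamoDB, which is omitted here), a minimal Pandas version might look like:

```python
import io

import pandas as pd

# Sample rows mirroring the experiment's CSV schema (column names assumed).
csv_file = io.StringIO(
    "Hospital,Diagnosis,Treatment,Recovery Time\n"
    "Hospital 1,Diagnosis 1,Treatment 1,12\n"
    "Hospital 1,Diagnosis 1,Treatment 1,14\n"
    "Hospital 2,Diagnosis 1,Treatment 2,15\n"
)

df = pd.read_csv(csv_file)
# Average recovery time per diagnosis per hospital.
averages = df.groupby(["Diagnosis", "Hospital"])["Recovery Time"].mean()
```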

Python with Polars

Polars is an open-source library for data manipulation, known for being one of the fastest data processing solutions on a single machine. It features a well-structured, typed API that is both expressive and easy to use.

It's of interest here because it's written in Rust but can be used from Node and Python like any other dependency, taking advantage of Rust's interoperability.

Nodejs

Nodejs is a free, open-source, cross-platform JavaScript runtime environment that lets developers create servers, web apps, command line tools and scripts.

The latest Nodejs runtime currently supported by AWS is nodejs20.x.

LLRT (Low Latency Runtime)

LLRT is a lightweight JavaScript runtime developed by AWS to address the growing demand for fast and efficient serverless applications.

It's built in Rust, utilizing QuickJS as its JavaScript engine, ensuring efficient memory usage and swift startup.

Rust

Rust is a systems programming language focused on safety, speed, and concurrency, with no runtime or garbage collector.

Experiments

At the moment there's only one experiment, which involves parsing a CSV file. The idea is to add more experiments in the future covering use cases closer to real industry workloads, so that the results are more valuable to different teams, rather than basing assumptions on calculating Fibonacci series or other tasks that no one really runs in real Lambda functions.

Experiment 1: Parsing a CSV File

This experiment involves parsing a CSV file from a medical insurance company that records the recovery time in days per diagnosis and per hospital, along with the treatment given to the patient in each case.

The file content looks like the following:

| Hospital | Diagnosis | Treatment | Recovery Time |
| --- | --- | --- | --- |
| Hospital 1 | Diagnosis 1 | Treatment 1 | 12 |
| Hospital 1 | Diagnosis 1 | Treatment 1 | 14 |
| Hospital 2 | Diagnosis 1 | Treatment 2 | 15 |
| Hospital 1 | Diagnosis 2 | Treatment 3 | 10 |
| Hospital 2 | Diagnosis 2 | Treatment 4 | 8 |
| ... | ... | ... | ... |

The goal of the Lambda functions is to perform an ETL process over logs stored as CSV files in an S3 bucket and to store the average recovery time per diagnosis and per hospital in a DynamoDB table. Employees of the company later use that table to decide which hospitals to recommend to their clients for a faster recovery and a lower bill, which also means lower costs for the company itself.

This involves different types of tasks:

  1. I/O: Reading files from S3
  2. CPU: Parsing data
  3. CPU: Doing calculations of averages and other values
  4. I/O: Storing results in DynamoDB
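
The two CPU-bound steps can be sketched in plain Python (standard library only; the S3 read and DynamoDB write are omitted, and the column names are taken from the sample file above):

```python
import csv
import io
from collections import Counter, defaultdict

def aggregate(csv_text):
    """Parse the CSV and compute, per (diagnosis, hospital) pair,
    the average recovery time and the most frequent treatment."""
    times = defaultdict(list)
    treatments = defaultdict(Counter)
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = (row["Diagnosis"], row["Hospital"])
        times[key].append(int(row["Recovery Time"]))
        treatments[key][row["Treatment"]] += 1
    return {
        key: {
            "AverageRecoveryTime": sum(values) / len(values),
            "MostUsedTreatment": treatments[key].most_common(1)[0][0],
        }
        for key, values in times.items()
    }

sample = (
    "Hospital,Diagnosis,Treatment,Recovery Time\n"
    "Hospital 1,Diagnosis 1,Treatment 1,12\n"
    "Hospital 1,Diagnosis 1,Treatment 1,14\n"
    "Hospital 2,Diagnosis 2,Treatment 4,8\n"
)
results = aggregate(sample)
```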

We're running the experiment with two different file sizes: one with ten thousand records, considered small, and another with one million records, considered big. This is based on a real use case where a medical insurance company was generating between one and two thousand records per day.

I'm storing the data in the DynamoDB table in the following manner in order to keep values per experiment run; in the real use case, however, the company only needs to run the process once per file, so the data is stored slightly differently.

The data in the experiment is stored in DynamoDB per each run as follows:

| Column Name | Column Type | Description |
| --- | --- | --- |
| PK | Partition Key (String) | The AWS Request ID received in the Lambda context |
| SK | Sort Key (String) | Composed as `#diagnosis#{Name of Diagnosis}#hospital#{Name of Hospital}` |
| AverageRecoveryTime | Data (Number) | The average of all recovery times for a given diagnosis in a hospital |
| Diagnosis | Data (String) | Name of the diagnosis |
| Hospital | Data (String) | Name of the hospital |
| MostUsedTreatment | Data (String) | The most frequent treatment for a given diagnosis in a hospital |
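
A hypothetical helper that shapes one result into an item matching this schema (illustrative only; the real Lambda would pass the item to the DynamoDB SDK's put operation, which is omitted here):

```python
def build_item(request_id, diagnosis, hospital, average, treatment):
    """Build a DynamoDB item following the table schema above.
    The sort key encodes both the diagnosis and the hospital."""
    return {
        "PK": request_id,
        "SK": f"#diagnosis#{diagnosis}#hospital#{hospital}",
        "AverageRecoveryTime": average,
        "Diagnosis": diagnosis,
        "Hospital": hospital,
        "MostUsedTreatment": treatment,
    }

item = build_item("req-123", "Diagnosis 1", "Hospital 1", 13.0, "Treatment 1")
```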

For background on how Lambda cold starts work, see: https://www.apexon.com/blog/optimizing-aws-lambda-handling-cold-starts-for-serverless-heavy-applications/

Results

Cold Starts

cold starts

The finding is that including Rust definitely improves the cold start of the Lambda functions in every case where it's used.

Python with Pandas:

One Million Rows

Python Pandas One Million Rows

Link

Ten Thousand Rows

Python Pandas Ten Thousand Rows

Link

Python with Polars:

One Million Rows

Python Polars One Million Rows

Link

Ten Thousand Rows

Python Polars Ten Thousand Rows

Link

Comparisons

Nodejs:

One Million Rows

Nodejs One Million Rows

Link

Ten Thousand Rows

Nodejs Ten Thousand Rows

Link

Node LLRT:

It may surprise readers that, despite having a lower cold start, the LLRT lambdas perform worse than the native Nodejs ones. This is somewhat expected, though, due to two limitations I faced while trying to read the CSV file from S3 as a stream of data, as I do in the Nodejs lambdas.

One: I couldn't write a real streaming solution to read the file from S3, since LLRT, in its current experimental state, doesn't yet support returning streams from SDK responses.

llrt doesn't support streams

You can see this explanation at the end of the section Using AWS SDK (v3) with LLRT in its documentation.

So I ended up using ranged requests to read specific byte ranges of the file, but that loses the benefits of a real streamed response.

And two: LLRT doesn't have Just-In-Time (JIT) compilation, which might have helped in this case, because the operations performed on the data are very repetitive and could probably be optimized during execution by a JIT.

Therefore, for this specific use case, LLRT doesn't seem to be the best option. However, I invite readers to run their own tests, because there are other benchmarks out there where people have gotten different results for their use cases. See the Other Benchmarks section below.
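
The ranged-read workaround can be illustrated in plain Python over a local file-like object (a sketch of the chunking idea only, not LLRT code): fixed-size byte ranges are fetched one at a time, the way HTTP Range requests would be, and a partial trailing line is carried over to the next chunk.

```python
import io

def read_in_ranges(fileobj, size, chunk_size):
    """Read a file in fixed byte ranges, yielding complete lines.
    A partial line at the end of a chunk is prepended to the next one."""
    leftover = b""
    for start in range(0, size, chunk_size):
        fileobj.seek(start)
        chunk = leftover + fileobj.read(min(chunk_size, size - start))
        lines = chunk.split(b"\n")
        leftover = lines.pop()  # possibly incomplete last line
        for line in lines:
            yield line.decode()
    if leftover:
        yield leftover.decode()

data = (
    b"Hospital 1,Diagnosis 1,12\n"
    b"Hospital 2,Diagnosis 1,15\n"
    b"Hospital 1,Diagnosis 2,10\n"
)
rows = list(read_in_ranges(io.BytesIO(data), len(data), 10))
```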

One Million Rows

LLRT One Million Rows

Link

Ten Thousand Rows

LLRT Ten Thousand Rows

Link

Comparisons

Rust

Even though I didn't optimize the Rust lambda functions as much as I did (or at least tried to) for the other ones, they got amazing results in comparison. I rewrote the Python and JS lambda functions at least three times each trying to improve them, while I found the Rust one good enough on the first attempt.

One Million Rows

Rust One Million Rows

Link

Ten Thousand Rows

Rust Ten Thousand Rows

Link

Comparisons

Here, I'm comparing Rust against the best of each other runtime.

Conclusions

The Rust programming language definitely gives amazing results in AWS Lambda functions in terms of memory consumption, execution time, and cold starts compared to languages like Python and Nodejs.

When circumstances don't allow writing Lambda functions in pure Rust, whether due to a lack of Rust knowledge, a dependency that is only available in a specific language, or any other reason, there are still ways to benefit from Rust by using runtimes and/or tools written in it that integrate with other languages.

And you will not only be saving money, but also contributing to reducing the energy consumption of servers, which helps with sustainability. Read about it in this article from AWS: Sustainability with Rust

Other Benchmarks
