Run the Function on AWS Lambda Using the Container Image Method
We will build an elastic application that can automatically scale out and in, on demand and cost-effectively, by using the PaaS cloud. Specifically, we will build this application using AWS Lambda and other supporting services from AWS. AWS Lambda is the first and currently the most widely used function-based serverless computing service.
Overview
Autoscaling Application for Face-Recognition using Python on AWS Lambda.
Refer to the cloud architecture diagram for insight into how to implement this workload:
- Users upload videos to your input bucket stored in S3.
- When a new video becomes available in the input bucket, it triggers the Lambda function to process the video.
- The Lambda function extracts frames from the video, recognizes the face, and looks up the recognized student in DynamoDB.
- Finally, the Lambda function stores the student’s academic information as a file in the output bucket in S3.
Requirements
Before you start, you should have the following:
- General knowledge of Python
- General knowledge of AWS
- General knowledge of containers
- IAM roles configured
- AWS credentials created
- Infrastructure set up to specification of architecture diagram
Infrastructure Setup
As you can see from the architecture diagram, we need to use S3 and DynamoDB.
The following operations are performed using command-line tools. If the AWS CLI is not yet installed, you can install it with the following command:
pip3 install awscli --upgrade --user
Create S3 Bucket
# input bucket
aws s3 mb s3://cse546-project3-input-group1 --region ap-northeast-2
# output bucket
aws s3 mb s3://cse546-project3-output-group1 --region ap-northeast-2
Create DynamoDB Table
Create a table named `StudentData`:
aws dynamodb create-table \
--table-name StudentData \
--attribute-definitions AttributeName=name,AttributeType=S \
--key-schema AttributeName=name,KeyType=HASH \
--provisioned-throughput ReadCapacityUnits=1,WriteCapacityUnits=1
Put Items into the Table
Load the data from the `student_data.json` file in the GitHub repo:
aws dynamodb batch-write-item --request-items file://student_data.json
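If you prefer Python over the CLI, here is a minimal boto3 sketch that writes the same kind of records (the sample student is hypothetical; the attribute names follow what the handler reads later):

import boto3

dynamodb = boto3.resource('dynamodb', region_name='ap-northeast-2')
table = dynamodb.Table('StudentData')

# Hypothetical record; the real data lives in student_data.json in the repo
students = [
    {'name': 'Alice Example', 'major': 'Computer Science', 'year': 'Senior'},
]

# batch_writer buffers and flushes the put requests for us
with table.batch_writer() as batch:
    for student in students:
        batch.put_item(Item=student)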
Create Repository
Create a repository named `cse546-group1-project3`; the container image will be pushed to this repo later.
aws ecr create-repository \
--repository-name cse546-group1-project3
Lambda Function
Before we actually start writing the Lambda handler, let’s create some helper methods we will need, e.g. to download the video from S3 and upload the results back:
- Function to read the `encoding` file (a sketch of how this file can be generated follows this list):
import pickle

def open_encoding(filename):
    # Load the pickled known-face encodings and names from disk
    with open(filename, 'rb') as file:
        data = pickle.load(file)
    return data
- Function to download an object from the input bucket:
import boto3
from botocore.exceptions import ClientError

s3 = boto3.client('s3')

def download_object(bucket_name, item, dest):
    # Download the object to a local path; report a missing key instead of crashing
    try:
        s3.download_file(bucket_name, item, dest)
        print('Ok')
    except ClientError as e:
        if e.response['Error']['Code'] == '404':
            print('The video does not exist: s3://{}/{}'.format(bucket_name, item))
            return False
        raise
    return True
- Function to upload the result to the S3 output bucket:
def upload_object(objects, bucket_name, item):
try:
s3.upload_file(objects, bucket_name, item)
print('Upload Successful')
except FileNotFoundError:
print('The file was not found')
return False
except ClientError:
print('ClientError...')
return False
return True
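For reference, the `encoding` file is a pickled dict with parallel `name` and `encoding` lists, which is how the handler indexes it later. Here is a hypothetical sketch of how such a file could be produced from labeled photos (the file names and labels are made up):

import pickle
import face_recognition

# Hypothetical labeled photos of known students
labels = ['alice', 'bob']
encodings = []
for label in labels:
    image = face_recognition.load_image_file(label + '.jpeg')
    # Take the first (and assumed only) face found in each photo
    encodings.append(face_recognition.face_encodings(image)[0])

with open('encoding', 'wb') as f:
    pickle.dump({'name': labels, 'encoding': encodings}, f)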
Python’s built-in `os` module can execute system commands for us, and we’ll use it to execute the ffmpeg command:
import os

path = "/tmp/"
# Extract one frame per second into /tmp/image-001.jpeg, image-002.jpeg, ...
os.system('ffmpeg -i ' + str(video_file_path) + ' -r 1 ' + str(path) + 'image-%3d.jpeg' + ' -loglevel 8')
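If you prefer to avoid hand-built shell strings, here is a minimal equivalent using `subprocess` instead of `os.system` (a sketch reusing `video_file_path` and `path` from above; not what the original handler uses):

import subprocess

# Same frame extraction, with arguments passed as a list instead of a shell string
subprocess.run(
    ['ffmpeg', '-i', video_file_path, '-r', '1',
     path + 'image-%3d.jpeg', '-loglevel', '8'],
    check=True)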
Listen for the bucket event and download the video the user uploaded:
import urllib.parse

# The S3 event carries the bucket name and the URL-encoded object key
bucket = event['Records'][0]['s3']['bucket']['name']
key = urllib.parse.unquote_plus(
    event['Records'][0]['s3']['object']['key'],
    encoding='utf-8')
path = '/tmp/'
video_file_path = str(path) + key
download_object(bucket, key, video_file_path)
Recognize the face in the first extracted frame:
face_image = face_recognition.load_image_file(
str(path) + 'image-001.jpeg')
face_encoding = face_recognition.face_encodings(face_image)[0]
For each known face, determine whether the current face matches it, and store the matching name in the result:
# total_face_encoding holds the parallel 'name' and 'encoding' lists loaded via open_encoding
for i, known_encoding in enumerate(total_face_encoding['encoding']):
    match = face_recognition.compare_faces([known_encoding], face_encoding)
    if match[0]:
        result = total_face_encoding['name'][i]
        break
Query for the matching record in DynamoDB:
dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('StudentData')

response = table.get_item(
    Key={
        'name': result
    }
)
item = response['Item']
Create a CSV writer, write the student record to a CSV file, and upload it to the output bucket:
csv_name = key.split('.')[0]
with open(path + csv_name, mode='w') as f:
writer = csv.writer(f)
writer.writerow([item['name'], item['major'], item['year']])
upload_object(path + csv_name, output_bucket, csv_name)
Here’s the complete Handler Function: handler.py
Build Container Image
Our function will be deployed using the container image, so we also need to package the functions into the container image.
Note in particular that all directories other than `/tmp` are read-only while the function runs.
Create Dockerfile
Before performing face recognition, we need to extract each frame from the video uploaded by the user. ffmpeg is a powerful tool for exactly this, so we use it to extract the frames.
We are using a Debian-based Python base image, so we can install ffmpeg in the container image with apt:
FROM python:3.8-slim-buster

RUN apt-get update \
    && apt-get install -y ffmpeg \
    && apt-get clean
...
Add the `encoding` file, which stores the encodings and names of the known faces, to the container image:
...
ADD encoding ${FUNCTION_DIR}
...
In this project, instead of using the base container image provided by AWS, we used a custom one, so we also had to create the entrypoint script `entry.sh` and package it inside the image:
#!/bin/sh
# Outside Lambda there is no Runtime API, so wrap the runtime client
# in the Runtime Interface Emulator for local testing
if [ -z "${AWS_LAMBDA_RUNTIME_API}" ]; then
    exec /usr/bin/aws-lambda-rie /usr/local/bin/python -m awslambdaric $1
else
    exec /usr/local/bin/python -m awslambdaric $1
fi
...
ADD entry.sh /
RUN chmod 755 /entry.sh
...
Install the dependency packages for the Python runtime:
...
COPY requirements.txt ${FUNCTION_DIR}
RUN python${RUNTIME_VERSION} -m pip install -r requirements.txt --target ${FUNCTION_DIR}
...
The most important thing, of course, is to package the function code itself into the container image:
COPY handler.py ${FUNCTION_DIR}
Finally, set the CMD to our handler (could also be done as a parameter override outside of the Dockerfile):
...
ENTRYPOINT [ "/entry.sh" ]
CMD [ "handler.face_recognition_handler" ]
The finished Dockerfile can be viewed in my GitHub repo.
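Because `entry.sh` bakes in the Runtime Interface Emulator, the image can be exercised locally before deploying. Here is a minimal sketch, assuming the container was started with `docker run -p 9000:8080 <image>`; the URL is the emulator's standard invocation endpoint:

import json
import urllib.request

# The Runtime Interface Emulator listens on this fixed endpoint
url = 'http://localhost:9000/2015-03-31/functions/function/invocations'

event = {}  # substitute the S3 test event shown in the Testing section
req = urllib.request.Request(url, data=json.dumps(event).encode(), method='POST')
with urllib.request.urlopen(req) as resp:
    print(resp.read().decode())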
Build with GitHub Actions
For simplicity, convenience, and network independence, we use GitHub Actions (a service offered by GitHub with functionality similar to other CI tools like Jenkins, Travis CI, or CircleCI) to build the container image.
Workflow
In order to use GitHub Actions, we must define a workflow. A workflow is basically an automated procedure that’s made up of one or more jobs. It can be triggered in three different ways:
- By an event that happens on the Github repository.
- By setting a repetitive schedule.
- Or manually clicking on the run workflow button on the repository UI.
To create a workflow, we just need to add a `.yml` file to the `.github/workflows` folder in our repository. For example, this is a simple workflow file `build.yml`:
---
name: Build and Push Image to Repositories
on:
push:
branches: [ main ]
schedule:
- cron: '*/15 * * * *'
jobs:
build:
runs-on: ubuntu-latest
The name of this workflow is `Build and Push Image to Repositories`. We can define how it will be triggered using the `on` keyword.
In this flow, one event triggers the workflow whenever a change is pushed to the main branch, and a scheduled trigger runs the workflow every 15 minutes.
Then we define the list of jobs to run in the `jobs` section of the workflow YAML file.
Runner
In order to run the jobs, we must specify a runner for each of them. A runner is simply a server that listens for available jobs, and it runs only one job at a time.
We can use Github hosted runner directly, or specify our own self-hosted runner.
The runners run the jobs, then report their progress, logs, and results back to GitHub, so we can easily check them on the repository UI.
We use the `runs-on` keyword to specify the runner we want to use:
jobs:
build:
runs-on: ubuntu-latest
In this workflow, we’re using GitHub’s hosted runner with the latest Ubuntu version.
Job
A job is a set of steps that will be executed on the same runner.
Normally all jobs in the workflow run in parallel, except when you have some jobs that depend on each other, then they will be run serially.
The jobs are listed inside the workflow under the `jobs` keyword:
jobs:
build:
runs-on: ubuntu-latest
steps:
- name: Check out code
uses: actions/checkout@v2
- name: Get hash
run: echo "::set-output name=sha_short::$(git rev-parse --short HEAD)"
id: sha
- name: Set up QEMU
uses: docker/setup-qemu-action@v1
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v1
After the container image is built, we will push it to both AWS ECR and Docker Hub. Before you can push, you need to log in to each container repository.
Add the credentials as GitHub secrets: go to your repository’s Settings, open the Secrets tab, and add:
- `DOCKER_USERNAME`: Docker Hub username
- `DOCKER_PASSWORD`: Docker Hub password
- `AWS_ACCESS_KEY_ID`: AWS access key ID
- `AWS_SECRET_ACCESS_KEY`: AWS secret access key
- name: Login to DockerHub
uses: docker/login-action@v1
with:
username: ${{ secrets.DOCKER_USERNAME }}
password: ${{ secrets.DOCKER_PASSWORD }}
- name: Configure AWS credentials
uses: aws-actions/configure-aws-credentials@v1
with:
aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
aws-region: ap-northeast-2
- name: Login to Amazon ECR
id: login-ecr
uses: aws-actions/amazon-ecr-login@v1
Build and Push image:
- name: Build, tag, and push image
env:
ECR_REGISTRY: ${{ steps.login-ecr.outputs.registry }}
ECR_REPOSITORY: cse546-group1-project3
IMAGE_TAG: ${{ steps.sha.outputs.sha_short }}
run: |
docker build -t $ECR_REGISTRY/$ECR_REPOSITORY:$IMAGE_TAG .
docker push $ECR_REGISTRY/$ECR_REPOSITORY:$IMAGE_TAG
docker tag $ECR_REGISTRY/$ECR_REPOSITORY:$IMAGE_TAG ishenle/asu-cse546-aws:project3-$IMAGE_TAG
docker push ishenle/asu-cse546-aws:project3-$IMAGE_TAG
Here’s the complete workflow file build.yml
Deploy
We now create the Lambda function from the container image using the Lambda console.
Create the function
To create a Lambda function with the console:
- Open the Functions page on the Lambda console.
- Choose Create function.
- Select Container image.
- Enter the function name in the text input field.
- Click Browse images and select the container image.
- Click Create function.
Refer to Create a Lambda function with the console
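If you would rather script the deployment than click through the console, here is a hedged boto3 sketch (the function name, image URI, and role ARN are placeholders to substitute):

import boto3

lam = boto3.client('lambda', region_name='ap-northeast-2')

# Placeholder names; substitute your account ID, image tag, and IAM role
lam.create_function(
    FunctionName='face-recognition',
    PackageType='Image',
    Code={'ImageUri': '<account-id>.dkr.ecr.ap-northeast-2.amazonaws.com/cse546-group1-project3:<tag>'},
    Role='arn:aws:iam::<account-id>:role/<lambda-execution-role>',
    Timeout=300,
    MemorySize=1024,
)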
Add S3 Trigger
Open the configuration of the function we created and choose Add trigger, as shown below:
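If you prefer to wire up the trigger in code instead of the console, here is a hedged boto3 sketch (the function name and ARN are placeholders; S3 must be granted invoke permission before the notification is attached):

import boto3

lambda_arn = 'arn:aws:lambda:ap-northeast-2:<account-id>:function:face-recognition'  # placeholder

# Allow S3 to invoke the function (StatementId is arbitrary but must be unique)
boto3.client('lambda').add_permission(
    FunctionName='face-recognition',
    StatementId='s3-invoke',
    Action='lambda:InvokeFunction',
    Principal='s3.amazonaws.com',
    SourceArn='arn:aws:s3:::cse546-project3-input-group1',
)

# Fire the function on every object created in the input bucket
boto3.client('s3').put_bucket_notification_configuration(
    Bucket='cse546-project3-input-group1',
    NotificationConfiguration={
        'LambdaFunctionConfigurations': [{
            'LambdaFunctionArn': lambda_arn,
            'Events': ['s3:ObjectCreated:*'],
        }]
    },
)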
Testing
There are various ways to test the Lambda function; in this case we use the Lambda console.
To invoke the function on the Lambda console:
- Open the Functions page on the Lambda console.
- Choose the function to test, and choose Test.
- Under Test event, select New event.
- For Name, enter a name for the test. In the text entry box, enter the JSON test event shown below.
- Choose Save changes.
- Choose Test.
The test data:
{
"Records": [
{
"eventVersion": "2.0",
"eventSource": "aws:s3",
"awsRegion": "ap-northeast-2",
"eventTime": "1970-01-01T00:00:00.000Z",
"eventName": "ObjectCreated:Put",
"userIdentity": {
"principalId": "EXAMPLE"
},
"requestParameters": {
"sourceIPAddress": "127.0.0.1"
},
"responseElements": {
"x-amz-request-id": "EXAMPLE123456789",
"x-amz-id-2": "EXAMPLE123/5678abcdefghijklambdaisawesome/mnopqrstuvwxyzABCDEFGH"
},
"s3": {
"s3SchemaVersion": "1.0",
"configurationId": "testConfigRule",
"bucket": {
"name": "cse546-project3-input-group1",
"ownerIdentity": {
"principalId": "EXAMPLE"
},
"arn": "arn:aws:s3:::cse546-project3-input-group1"
},
"object": {
"key": "test_0.mp4",
"size": 322560,
"eTag": "0123456789abcdef0123456789abcdef",
"sequencer": "0A1B2C3D4E5F678901"
}
}
}
]
}
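You can also send the same event from code. Here is a minimal boto3 sketch, assuming the JSON above is saved as `test_event.json` and the function is named `face-recognition` (both placeholders):

import boto3

lam = boto3.client('lambda', region_name='ap-northeast-2')

with open('test_event.json') as f:  # the JSON event shown above
    payload = f.read()

# Synchronous invocation; the response payload is a streaming body
resp = lam.invoke(FunctionName='face-recognition', Payload=payload)
print(resp['StatusCode'], resp['Payload'].read().decode())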
Monitoring
You can refer to Instrumenting Python code in AWS Lambda, or view the CloudWatch dashboard.
Monitoring methods provided by AWS: Monitoring and troubleshooting Lambda applications
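For a quick programmatic check, here is a hedged sketch that pulls the function's invocation count from CloudWatch (the function name is a placeholder):

import boto3
from datetime import datetime, timedelta, timezone

cw = boto3.client('cloudwatch', region_name='ap-northeast-2')

# Sum of invocations over the past hour, in 5-minute buckets
stats = cw.get_metric_statistics(
    Namespace='AWS/Lambda',
    MetricName='Invocations',
    Dimensions=[{'Name': 'FunctionName', 'Value': 'face-recognition'}],  # placeholder name
    StartTime=datetime.now(timezone.utc) - timedelta(hours=1),
    EndTime=datetime.now(timezone.utc),
    Period=300,
    Statistics=['Sum'],
)
print(stats['Datapoints'])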
Summary
Our cloud app implements a smart classroom assistant for educators. The assistant takes videos from the user’s classroom, performs face recognition on the collected videos, looks up the recognized students in the database, and returns the relevant academic information of each student back to the user.