We will build an elastic application that can automatically scale out and in, on demand and cost-effectively, by using the PaaS cloud. Specifically, we will build this application with AWS Lambda and other supporting services from AWS. AWS Lambda is the first and currently the most widely used function-based serverless computing service.

Overview

Autoscaling Application for Face-Recognition using Python on AWS Lambda.

Project Architecture

Refer to the cloud architecture diagram for insight into how to implement this workload:

  • Users upload videos to your input bucket stored in S3.
  • When a new video becomes available in the input bucket, it triggers the Lambda function to process the video.
  • The Lambda function extracts frames from the video, recognizes the face, and looks up the matching student record in DynamoDB.
  • Finally, the Lambda function stores the student’s academic information as a file in the output bucket in S3.

Requirements

Before you start, you should have the following:

  • General knowledge of Python
  • General knowledge of AWS
  • General knowledge of containers
  • IAM roles configured
  • AWS credentials created
  • Infrastructure set up to specification of architecture diagram

Infrastructure Setup

As you can see from the architecture diagram, we need to use S3 and DynamoDB.

The following operations are performed using command-line tools. If the AWS CLI is not yet installed, you can install it using the following command:

pip3 install awscli --upgrade --user

Create S3 Bucket

# input bucket
aws s3 mb s3://cse546-project3-input-group1 --region ap-northeast-2
# output bucket
aws s3 mb s3://cse546-project3-output-group1 --region ap-northeast-2
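
The same can be done from Python with boto3 if you prefer scripting the setup; a minimal sketch, assuming the same bucket names and region:

import boto3

s3 = boto3.client('s3', region_name='ap-northeast-2')

for name in ['cse546-project3-input-group1', 'cse546-project3-output-group1']:
    # Outside us-east-1, S3 requires an explicit LocationConstraint
    s3.create_bucket(
        Bucket=name,
        CreateBucketConfiguration={'LocationConstraint': 'ap-northeast-2'})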

Create DynamoDB Table

Create a table named StudentData:

aws dynamodb create-table \
    --table-name StudentData \
    --attribute-definitions AttributeName=name,AttributeType=S \
    --key-schema AttributeName=name,KeyType=HASH \
    --provisioned-throughput ReadCapacityUnits=1,WriteCapacityUnits=1
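
Table creation is asynchronous, so before loading data you may want to block until the table becomes active; a small boto3 sketch:

import boto3

dynamodb = boto3.client('dynamodb')

# Wait until the table finishes creating before writing items to it
dynamodb.get_waiter('table_exists').wait(TableName='StudentData')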

Put Items into Table

The data follows the schema of the student_data.json file in the GitHub repo:

aws dynamodb batch-write-item --request-items file://student_data.json
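
The real file lives in the GitHub repo; for reference, a BatchWriteItem request for this table looks roughly like the following (the student shown is hypothetical; the major and year attributes match the fields written to the CSV later):

{
  "StudentData": [
    {
      "PutRequest": {
        "Item": {
          "name":  { "S": "Alice Example" },
          "major": { "S": "Computer Science" },
          "year":  { "S": "Senior" }
        }
      }
    }
  ]
}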

Create Repository

The repository is named cse546-group1-project3; the container image will later be pushed to this repo.

aws ecr create-repository \
    --repository-name cse546-group1-project3

Lambda Function

Before we actually start writing the Lambda handler, let’s create some of the helper methods we need, e.g. for downloading the video from S3 and uploading the results:

  • Function to read the ’encoding’ file:
import pickle
import boto3
from botocore.exceptions import ClientError

s3 = boto3.client('s3')

def open_encoding(filename):
    # Load the pickled encodings and names of the known faces
    with open(filename, 'rb') as file:
        return pickle.load(file)
  • Function to download an object from the input bucket:
def download_object(bucket_name, item, dest):
    try:
        s3.download_file(bucket_name, item, dest)
        print('Ok')
    except ClientError as e:
        if e.response['Error']['Code'] == '404':
            print('The video does not exist: s3://{}/{}'.format(bucket_name, item))
            return False
        raise
    return True
  • Function to upload a result to the S3 output bucket:
def upload_object(objects, bucket_name, item):
    try:
        s3.upload_file(objects, bucket_name, item)
        print('Upload Successful')
    except FileNotFoundError:
        print('The file was not found')
        return False
    except ClientError:
        print('ClientError...')
        return False
    return True

Python’s built-in os module can execute system commands for us; we’ll use it to run the ffmpeg command:

import os

path = "/tmp/"

# Extract one frame per second; -loglevel 8 keeps ffmpeg quiet
os.system('ffmpeg -i ' + str(video_file_path) + ' -r 1 ' + str(path) + 'image-%3d.jpeg' + ' -loglevel 8')
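
If you prefer to avoid building a shell string by hand, an equivalent call with subprocess (a substitute for the os.system call above, not what the original handler uses):

import subprocess

# Same frame extraction as above, with the arguments passed as a list
subprocess.run(
    ['ffmpeg', '-i', video_file_path,
     '-r', '1', path + 'image-%3d.jpeg',
     '-loglevel', '8'],
    check=True)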

Listen for the bucket event and download the video the user uploaded:

import urllib.parse

# The S3 event carries the bucket name and the (URL-encoded) object key
bucket = event['Records'][0]['s3']['bucket']['name']
key = urllib.parse.unquote_plus(
    event['Records'][0]['s3']['object']['key'],
    encoding='utf-8')

path = '/tmp/'
video_file_path = path + key

download_object(bucket, key, video_file_path)

Recognize the face in the first extracted frame:

face_image = face_recognition.load_image_file(
    str(path) + 'image-001.jpeg')
# Assumes at least one face is visible in the frame
face_encoding = face_recognition.face_encodings(face_image)[0]

For each known face, determine whether the current face matches it, and store the matching name in result:

for i, known_encoding in enumerate(total_face_encoding['encoding']):
    match = face_recognition.compare_faces(
        [known_encoding], face_encoding)
    if match[0]:
        result = total_face_encoding['name'][i]
        break

Query DynamoDB for the matching record:

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('StudentData')

response = table.get_item(
    Key={
        'name': result
    }
)
item = response['Item']

Create a CSV writer, write the student’s record to a CSV file, and upload it to the output bucket:

import csv

csv_name = key.split('.')[0]

with open(path + csv_name, mode='w') as f:
    writer = csv.writer(f)
    writer.writerow([item['name'], item['major'], item['year']])

upload_object(path + csv_name, output_bucket, csv_name)

Here’s the complete Handler Function: handler.py
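
As a quick orientation, here is a condensed sketch of how the snippets above fit together inside the handler (the authoritative version is handler.py in the repo):

def face_recognition_handler(event, context):
    # 1. Locate and download the uploaded video from the S3 event
    bucket = event['Records'][0]['s3']['bucket']['name']
    key = urllib.parse.unquote_plus(
        event['Records'][0]['s3']['object']['key'], encoding='utf-8')
    video_file_path = '/tmp/' + key
    download_object(bucket, key, video_file_path)

    # 2. Extract frames and encode the face in the first frame
    os.system('ffmpeg -i ' + video_file_path + ' -r 1 /tmp/image-%3d.jpeg -loglevel 8')
    face_image = face_recognition.load_image_file('/tmp/image-001.jpeg')
    face_encoding = face_recognition.face_encodings(face_image)[0]

    # 3. Match against the known encodings, look up the student in
    #    DynamoDB, write the CSV, and upload it to the output bucket
    ...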

Build Container Image

Our function will be deployed as a container image, so we need to package the handler and its dependencies into one.

Note in particular that all directories other than /tmp are read-only during the function run.

Create Dockerfile

Before performing face recognition, we need to extract each frame from the video uploaded by the user; ffmpeg is the standard tool for this job, so we use it to extract the frames.

We are using a Debian Linux based base image (e.g. python:3.8-slim), so we can install ffmpeg on the container image using the apt-get command:

FROM python:3.8-slim
RUN apt-get update \
    && apt-get install -y ffmpeg \
    && apt-get clean
...

Add the encoding file, which stores the encodings and names of the known faces, to the container image:

...
ADD encoding ${FUNCTION_DIR}
...

In this project, instead of using the base container image provided by AWS, we use a custom one, so we also need to create the entrypoint script entry.sh and package it inside the image:

#!/bin/sh
# When there is no Lambda runtime API (i.e. running locally), wrap the
# runtime interface client with the Runtime Interface Emulator (RIE)
if [ -z "${AWS_LAMBDA_RUNTIME_API}" ]; then
    exec /usr/bin/aws-lambda-rie /usr/local/bin/python -m awslambdaric $1
else
    exec /usr/local/bin/python -m awslambdaric $1
fi
...
ADD entry.sh /
RUN chmod 777 /entry.sh
...

Install the dependent packages for Python runtime:

...
COPY requirements.txt ${FUNCTION_DIR}
RUN python${RUNTIME_VERSION} -m pip install -r requirements.txt --target ${FUNCTION_DIR}
...

The most important step, of course, is to package the function itself into the container image:

COPY handler.py ${FUNCTION_DIR}

Finally, set the ENTRYPOINT to the entry script and the CMD to our handler (the CMD could also be supplied as a parameter override outside of the Dockerfile):

...
ENTRYPOINT [ "/entry.sh" ]
CMD [ "handler.face_recognition_handler" ]

The finished Dockerfile can be viewed in my GitHub repo.
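
Once the image is built and running locally (docker build -t <image> . and docker run -p 9000:8080 <image>), the emulator lets you invoke the function without deploying it. A minimal sketch in Python, assuming the requests package is installed:

import json
import requests

# The Runtime Interface Emulator exposes the Lambda invoke API locally
url = 'http://localhost:9000/2015-03-31/functions/function/invocations'
resp = requests.post(url, data=json.dumps({'Records': []}))
print(resp.status_code, resp.text)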

Build with GitHub Actions

For simplicity, convenience, and network independence, we use GitHub Actions (a service offered by GitHub with similar functionality to CI tools like Jenkins, Travis, or CircleCI) to build the container image.

Workflow

In order to use GitHub Actions, we must define a workflow. A workflow is basically an automated procedure made up of one or more jobs. It can be triggered in three different ways:

  • By an event that happens in the GitHub repository.
  • By setting a repetitive schedule.
  • Or by manually clicking the Run workflow button on the repository UI.

To create a workflow, we just need to add a .yml file to the .github/workflows folder in our repository. For example, here is a simple workflow file build.yml:

---
name: Build and Push Image to Repositories

on:
  push:
    branches: [ main ]
  schedule:
    - cron: '*/15 * * * *'

jobs:

  build:
    runs-on: ubuntu-latest

The name of this workflow is Build and Push Image to Repositories. We can define how it will be triggered using the on keyword.

In this flow, an event trigger runs the workflow whenever a change is pushed to the main branch, and a scheduled trigger runs it every 15 minutes.

Then we define the list of jobs to run in the jobs section of the workflow yaml file.

Runner

In order to run the jobs, we must specify a runner for each of them. A runner is simply a server that listens for available jobs and runs one job at a time.

We can use a GitHub-hosted runner directly, or specify our own self-hosted runner.

The runners run the jobs, then report their progress, logs, and results back to GitHub, so we can easily check them on the UI of the repository.

We use the runs-on keyword to specify the runner we want to use:

jobs:

  build:
    runs-on: ubuntu-latest

In this workflow, we’re using GitHub’s hosted runner for the latest Ubuntu version.

Job

A job is a set of steps that will be executed on the same runner.

Normally all jobs in the workflow run in parallel, except when some jobs depend on each other; those run serially, as in the sketch below.
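
For example, a hypothetical deploy job can be serialized after build with the needs keyword:

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - run: echo "build"
  deploy:
    needs: build            # deploy waits for build to succeed
    runs-on: ubuntu-latest
    steps:
      - run: echo "deploy"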

The jobs are listed inside the workflow under the jobs keyword:

jobs:

  build:
    runs-on: ubuntu-latest

    steps:
      - name: Check out code
        uses: actions/checkout@v2

      - name: Get hash
        run: echo "::set-output name=sha_short::$(git rev-parse --short HEAD)"
        id: sha

      - name: Set up QEMU
        uses: docker/setup-qemu-action@v1

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v1

After the container image is built, we will push it to both AWS ECR and Docker Hub. Before you can push, you need to log in to each container registry.

Add the credentials as GitHub secrets; you can do this by going to the Settings page of your repository and then the Secrets tab, where you add them:

Add_Secrets

  • DOCKER_USERNAME: Docker Hub username
  • DOCKER_PASSWORD: Docker Hub password
  • AWS_ACCESS_KEY_ID: AWS access key ID
  • AWS_SECRET_ACCESS_KEY: AWS secret access key

With the secrets in place, add the login steps to the workflow:

      - name: Login to DockerHub
        uses: docker/login-action@v1
        with:
          username: ${{ secrets.DOCKER_USERNAME }}
          password: ${{ secrets.DOCKER_PASSWORD }}

      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v1
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: ap-northeast-2

      - name: Login to Amazon ECR
        id: login-ecr
        uses: aws-actions/amazon-ecr-login@v1

Build and Push image:

      - name: Build, tag, and push image
        env:
          ECR_REGISTRY: ${{ steps.login-ecr.outputs.registry }}
          ECR_REPOSITORY: cse546-group1-project3
          IMAGE_TAG: ${{ steps.sha.outputs.sha_short }}
        run: |
          docker build -t $ECR_REGISTRY/$ECR_REPOSITORY:$IMAGE_TAG .
          docker push $ECR_REGISTRY/$ECR_REPOSITORY:$IMAGE_TAG
          docker tag $ECR_REGISTRY/$ECR_REPOSITORY:$IMAGE_TAG ishenle/asu-cse546-aws:project3-$IMAGE_TAG
          docker push ishenle/asu-cse546-aws:project3-$IMAGE_TAG          

Here’s the complete workflow file build.yml

Deploy

The Lambda console provides a code editor for non-compiled languages that lets you modify and test code quickly.

Create the function

To create a Lambda function with the console:

  • Open the Functions page on the Lambda console.

  • Choose Create function.

  • Select Container image.

  • For Function name, enter a name for your function.

  • Click Browse images and select the container image pushed earlier.

  • Click Create function

Refer to Create a Lambda function with the console

Add S3 trigger

Open the configuration page of the function we created and choose Add trigger, as shown below:

add_trigger
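
The console also adds the permission that lets S3 invoke the function. If you script this step instead, a rough boto3 equivalent looks like this (the function ARN below is a placeholder):

import boto3

FUNCTION_ARN = 'arn:aws:lambda:ap-northeast-2:123456789012:function:face-recognition'  # placeholder

# Allow S3 to invoke the function (the console does this automatically)
boto3.client('lambda').add_permission(
    FunctionName=FUNCTION_ARN,
    StatementId='s3-invoke',
    Action='lambda:InvokeFunction',
    Principal='s3.amazonaws.com',
    SourceArn='arn:aws:s3:::cse546-project3-input-group1')

# Fire the function whenever an object is created in the input bucket
boto3.client('s3').put_bucket_notification_configuration(
    Bucket='cse546-project3-input-group1',
    NotificationConfiguration={
        'LambdaFunctionConfigurations': [{
            'LambdaFunctionArn': FUNCTION_ARN,
            'Events': ['s3:ObjectCreated:*'],
        }]})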

Testing

There are various ways to test the Lambda function; in this case we use the Lambda console.

To invoke the function from the Lambda console:

  • Open the Functions page on the Lambda console.
  • Choose the function to test, and choose Test.
  • Under Test event, select New event.
  • For Name, enter a name for the test. In the text entry box, enter the JSON test event (see the test data below).
  • Choose Save changes.
  • Choose Test.

The test data:

{
  "Records": [
    {
      "eventVersion": "2.0",
      "eventSource": "aws:s3",
      "awsRegion": "ap-northeast-2",
      "eventTime": "1970-01-01T00:00:00.000Z",
      "eventName": "ObjectCreated:Put",
      "userIdentity": {
        "principalId": "EXAMPLE"
      },
      "requestParameters": {
        "sourceIPAddress": "127.0.0.1"
      },
      "responseElements": {
        "x-amz-request-id": "EXAMPLE123456789",
        "x-amz-id-2": "EXAMPLE123/5678abcdefghijklambdaisawesome/mnopqrstuvwxyzABCDEFGH"
      },
      "s3": {
        "s3SchemaVersion": "1.0",
        "configurationId": "testConfigRule",
        "bucket": {
          "name": "cse546-project3-input-group1",
          "ownerIdentity": {
            "principalId": "EXAMPLE"
          },
          "arn": "arn:aws:s3:::cse546-project3-input-group1"
        },
        "object": {
          "key": "test_0.mp4",
          "size": 322560,
          "eTag": "0123456789abcdef0123456789abcdef",
          "sequencer": "0A1B2C3D4E5F678901"
        }
      }
    }
  ]
}
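
You can also invoke the function programmatically with boto3 (the function name is a placeholder; test_event.json is the test data above saved to a file):

import json
import boto3

client = boto3.client('lambda')

with open('test_event.json', 'rb') as f:
    resp = client.invoke(FunctionName='face-recognition', Payload=f.read())

print(json.loads(resp['Payload'].read()))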

Monitoring

You can refer to Instrumenting Python code in AWS Lambda, or view the CloudWatch dashboard.

Monitoring methods provided by AWS: Monitoring and troubleshooting Lambda applications
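
For a quick look at the function’s output, you can also pull its logs from CloudWatch with boto3 (the log group name assumes a function named face-recognition):

import boto3

logs = boto3.client('logs')

# Lambda writes to /aws/lambda/<function-name> by default
for event in logs.filter_log_events(
        logGroupName='/aws/lambda/face-recognition')['events']:
    print(event['message'], end='')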

Summary

Our cloud app implements a smart classroom assistant for educators. The assistant takes videos from the user’s classroom, performs face recognition on the collected videos, looks up the recognized students in the database, and returns the relevant academic information of each student to the user.

Reference