MediaPipe with Custom tflite Model

Getting started with MediaPipe and using it with your own tflite model

Swati Modi
Building Fynd

--

Before we get started, here’s a little something about me.
I am an ML engineer who has been working with Fynd for close to a year now, and I have worked on a range of things, from data engineering to deploying models on Kubernetes and on the edge.

From what I have experienced with ML projects, making the model work takes around 40-60% of your time and effort. Later, when you want to deploy the model on the edge or scale it, that is a new problem altogether, so the rest of the effort goes into optimizing your model deployment for faster inference.

I faced a similar situation while working on GlamAR, a virtual try-on app. That's where MediaPipe helped me make the try-on experience real-time and get fast inference results from the ML model.

In this blog post, I will share how I deployed a custom machine learning model on Android with MediaPipe.

We will be taking a look at the following:

  • What is MediaPipe?
  • MediaPipe Concepts/Terminology
  • Repository walk-through (files, calculators, build system)
  • Techniques used in MediaPipe for great optimization
  • Using custom tflite model with MediaPipe
  • References

What is MediaPipe?

MediaPipe is a framework for building pipelines to perform inference over arbitrary sensory data like images, audio streams and video streams.

With MediaPipe, a perception pipeline can be built as a graph of modular components, including model inference, media processing algorithms and data transformations.

Google has been using MediaPipe internally in its products since 2012, and open-sourced it at CVPR in June 2019.

Why MediaPipe?

Cross-platform support:
As an example development flow, a graph can first be developed and tested on desktop, followed by deployment and final performance evaluation on mobile devices.

Rapid prototyping:
Because the pipeline is built from modular nodes, individual components can be swapped out without touching the rest of the graph. For instance, a heavy NN-based object detector may be swapped out with a light template-matching detector, and the rest of the graph can stay unchanged.

Let's understand the terminology used by this framework.

MediaPipe Concepts

  • Packet: Basic data flow unit
  • Streams: Timestamped sequence of packets (E.g. video stream from a camera)
  • Side packets: A single packet without a timestamp. Side packets can be used for providing static, one-time inputs such as an ML model or a config file.
  • Node: A node takes input streams or input packets, processes them through data transformations, media processing, or model inference, and emits output streams or output packets (see the sketch after this list).

Each node is backed by a calculator file, which contains the functional code of the node.

Figure 1. Example of a Node — RGB color value to HEX value converter. Here R, G, and B are the input packets to the node and HEX_VALUE is the output packet and RgbToHex is the operation performed by the node.
  • Graph: This holds the network of nodes, each representing a computation/operation, connected to each other through their inputs and outputs.
Figure 2. MediaPipe Graph
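
To make these concepts concrete, here is what the RgbToHex node from Figure 1 could look like in MediaPipe's .pbtxt graph format. This is a hypothetical sketch: RgbToHexCalculator and the stream names are illustrative and not part of MediaPipe.

# A hypothetical node declaration: three input streams in, one output stream out
node {
  calculator: "RgbToHexCalculator"
  input_stream: "R:r_value"
  input_stream: "G:g_value"
  input_stream: "B:b_value"
  output_stream: "HEX_VALUE:hex_value"
}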

Now that we understand the basic MediaPipe terminology, let’s have a look at their components and repository.

Along with the framework, they have also provided a variety of example projects built with MediaPipe, such as Object Detection, Face Detection (based on object detection), Hair Segmentation (object segmentation), and Hand Tracking (object detection + landmark detection). All the examples run at real-time inference speeds on various hardware platforms.

You can clone or download the repository from https://github.com/google/mediapipe.

Repository Structure

MediaPipe is built with Bazel. Let's walk through the repository now.

This would be helpful when we are modifying the code inside the repository for our pipeline.

MediaPipe repo structure

When you clone the repository, you will find the above folders.

  • mediapipe/MediaPipe.tulsiproj: files for creating an Xcode project on iOS using Tulsi. You would not need to modify this.
  • mediapipe/calculators: A collection of calculators provided by MediaPipe. These files contain the code for data transformations, processing, and inference over image, video, and audio data. All calculators are written in C++, and currently C++ is the only language you can write calculators in.
    The folder also contains the respective .proto files used for passing node options to the nodes.
  • mediapipe/docs: All the readme docs for different examples and installation.
  • mediapipe/examples: It has all the examples and their respective project files for the Android, iOS, Coral, and desktop builds.
  • mediapipe/framework: It contains the files used internally by MediaPipe for creating and verifying the workflow of input streams, node creation, Graph creation, and verification, etc. Would not need to modify this.
  • mediapipe/gpu: It contains the calculators that use GPU for acceleration in processing.
  • mediapipe/graphs: This has the MediaPipe graphs and BUILD file for each project.
    The graphs are .pbtxt files in which you write flow of your pipeline with the help of input-streams, nodes and output-streams.
    The BUILD file in these respective folders contains the dependencies (the calculators, model, etc), and bazel operations useful while building the binary graph and creating cc_library and android_library.
  • mediapipe/java/com/google/mediapipe: It contains the Java side of the framework for operations like frame processing, OpenGL processing, etc., which transfer the input packets from Android to the calculators. You would not need to modify this.
  • mediapipe/models: It contains the tflite models for the given examples.
  • mediapipe/objc: It contains the Objective-C++ files needed to build the project for iOS.
  • mediapipe/util: It contains some basic utility calculators for frame flow manager, frame selection, asset manager, etc.
  • third_party: This folder contains the BUILD files for third_party dependencies of mediapipe like TensorFlow, OpenCV, etc.
    As mentioned earlier, MediaPipe is based on the Bazel build system. In Bazel, to create a package, you have to create a BUILD file in the folder.
  • WORKSPACE:
    A workspace is a directory on your filesystem that contains the source files for the software you want to build. Every workspace directory has a WORKSPACE file.
    The WORKSPACE file builds the external dependencies from the third_party folder we saw earlier.

NOTE: The WORKSPACE file always resides in the root folder of the project, i.e. “mediapipe” in this case, which is why all the file paths used during imports start from the “mediapipe” folder.

  • BUILD: A package is defined as a directory containing a file named BUILD. It is necessary to have this file when trying to build a Bazel project.
  • The other files you see are for the installation of MediaPipe. The MediaPipe documentation covers them well.

Techniques used in MediaPipe for optimization

  • Utilizing GPU for computational acceleration
  • Parallel computing of frames
  • Limiting the number of in-flight image frames between the input node and the inference node to 1, using FlowLimiterCalculator (see the sketch after this list). This prevents the nodes in between from excessively queuing up incoming images and data, which would increase latency and memory usage, both unwanted in real-time mobile applications.
  • Using the results of one model for multiple things
    So in the hand tracking example, there are two models: palm_detection and hand_landmark_detection model.
    For initial frames, they run the palm_detection model followed by the hand_landmark_detection model.
    For the later frames, the palm region is derived directly from the landmarks predicted for the previous frame, and the palm detection model is only re-run when the landmarks are predicted with low confidence.
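
As a concrete sketch, this is roughly what a FlowLimiterCalculator node looks like in a .pbtxt graph. The stream names here are illustrative; the FINISHED back edge tells the limiter when a frame has left the pipeline, so only then is the next frame admitted:

node {
  calculator: "FlowLimiterCalculator"
  input_stream: "input_video"
  input_stream: "FINISHED:output_mask"
  input_stream_info: {
    tag_index: "FINISHED"
    back_edge: true
  }
  output_stream: "throttled_input_video"
}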

They have talked about this more in this YouTube video.

Using your tflite model with MediaPipe

Now let's start with the coding part: how to use our own tflite model with MediaPipe.

Before we start integrating the model into MediaPipe, let's first have a look at the tflite model we will be using.

Portrait Segmentation tflite Model

The model we will be using here was trained by me for this blog. It is an encoder-decoder network with MobileNetV1 as the encoder backbone and a custom decoder.

It takes a 224x224x3 input image and generates a 224x224x2 mask output, with output channel index 1 containing the background mask.
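
If you want to verify these shapes yourself, a quick check with the TensorFlow Lite Python interpreter looks like this (assuming TensorFlow is installed and the model file is in the current directory):

import tensorflow as tf

# Load the tflite model and inspect its input/output tensor shapes
interpreter = tf.lite.Interpreter(model_path="portrait_segmentation.tflite")
interpreter.allocate_tensors()

print(interpreter.get_input_details()[0]["shape"])   # expected: [1 224 224 3]
print(interpreter.get_output_details()[0]["shape"])  # expected: [1 224 224 2]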

You can have a look at the architecture of the model here.

To make the model work with the MediaPipe GPU pipeline, we have to make sure that the ops used in the model are supported by, and have an implementation for, the TensorFlow Lite GPU delegate.

What will we build — Portrait Segmentation Android App

Portrait Segmentation with MediaPipe

1. Clone the MediaPipe repository and download the tflite model

$ git clone https://github.com/google/mediapipe.git
$ wget https://raw.githubusercontent.com/SwatiModi/portrait-segmentation-mediapipe/master/mediapipe/models/portrait_segmentation.tflite

# Copy the tflite model to the mediapipe/models/ directory
$ cp portrait_segmentation.tflite mediapipe/mediapipe/models/

2. Install the required dependencies

Install the MediaPipe dependencies according to your OS, following the installation docs. For building the Android app we will be using MediaPipe with Bazel, so complete the Bazel and Android SDK/NDK setup described there as well.

3. Create the graph file

$ cd mediapipe/mediapipe/graphs
$ ls
edge_detection
face_detection
face_mesh
hair_segmentation
hand_tracking
media_sequence
object_detection
object_detection_3d
template_matching
tracking
youtube8m

Make a folder here for your project:

$ mkdir portrait_segmentation
$ cd portrait_segmentation

Create the graph file named portrait_segmentation.pbtxt in the “mediapipe/graphs/portrait_segmentation” folder:

$ touch portrait_segmentation.pbtxt

Since our model is a segmentation model, we will be using a similar pipeline to that of the hair_segmentation example given in MediaPipe.

So just copy the content of hair_segmentation_mobile_gpu.pbtxt to the empty portrait_segmentation.pbtxt file you just made.

As mentioned earlier, the pbtxt file contains the graph/flow of the pipeline.

I would highly recommend going through the pbtxt files given for the examples. They are really easy to understand, as they discuss details about every node in the graph, what it does, and why.

So now, we will change the graph according to the model's input/output details.

The portrait segmentation tflite model used for this tutorial takes an input of size 224 x 224 x 3 RGB image and outputs a 224 x 224 x 2 mask.

Start by modifying the ImageTransformationCalculator node; it takes the camera frame as input and resizes it to the model's required input size.

So change the following node options:

output_width: 224
output_height: 224
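
In context, the modified node in the .pbtxt should look roughly like this (a sketch; the stream names follow the hair segmentation graph we copied from, so adjust them if you renamed anything):

node: {
  calculator: "ImageTransformationCalculator"
  input_stream: "IMAGE_GPU:throttled_input_video"
  output_stream: "IMAGE_GPU:transformed_input_video"
  node_options: {
    [type.googleapis.com/mediapipe.ImageTransformationCalculatorOptions] {
      # Resize camera frames to the model's 224x224 input size
      output_width: 224
      output_height: 224
    }
  }
}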

Modify the TfLiteConverterCalculator for the number of input channels:

max_num_channels: 3

Modify the TfLiteInferenceCalculator for the model path:

model_path: "mediapipe/models/portrait_segmentation.tflite"

Modify the TfLiteTensorsToSegmentationCalculator for the output tensor dimensions:

tensor_width: 224
tensor_height: 224
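
Putting it together, the segmentation node should end up looking roughly like this. This is a sketch based on the hair segmentation graph; the mask stream names are illustrative, and the option values other than the tensor dimensions are the ones copied over from that graph:

node {
  calculator: "TfLiteTensorsToSegmentationCalculator"
  input_stream: "TENSORS_GPU:segmentation_tensor"
  input_stream: "PREV_MASK_GPU:previous_mask"
  output_stream: "MASK_GPU:portrait_mask"
  node_options: {
    [type.googleapis.com/mediapipe.TfLiteTensorsToSegmentationCalculatorOptions] {
      # Dimensions of our model's 224x224x2 output tensor
      tensor_width: 224
      tensor_height: 224
      tensor_channels: 2
      combine_with_previous_ratio: 0.9
      output_layer_index: 1
    }
  }
}

Note that output_layer_index: 1 lines up with the model described earlier, whose channel index 1 holds the background mask.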

With this, we are done defining the graph for our portrait segmentation pipeline.

4. BUILD file for the graph.

Create a file named “BUILD” (without any extension) in the “mediapipe/graphs/portrait_segmentation” folder.

Copy the contents of the hair_segmentation BUILD file into the empty BUILD file.

This file declares the calculator dependencies used in the graph pipeline. It also converts the text graph (.pbtxt) into a binary graph (.binarypb).

Here we will just change the graph name in the mediapipe_binary_graph() rule:

graph = "portrait_segmentation.pbtxt",
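After the change, the rule should look roughly like this (a sketch following the hair segmentation BUILD file; the other target names are the ones copied from it):

load(
    "//mediapipe/framework/tool:mediapipe_graph.bzl",
    "mediapipe_binary_graph",
)

mediapipe_binary_graph(
    name = "mobile_gpu_binary_graph",
    graph = "portrait_segmentation.pbtxt",
    output_name = "mobile_gpu.binarypb",
    deps = [":mobile_calculators"],
)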

5. Create an Android project

Go to mediapipe/examples/android/src/java/com/google/mediapipe/apps/

Create a copy of “hairsegmentationgpu” folder and name it “portraitsegmentationgpu”

Here we will just update the files to point at our graph and model instead of the hair segmentation ones, starting with the BUILD file.
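
The key change in the copied BUILD file is the assets of the android_binary target, which bundle the binary graph and the tflite model into the APK. A sketch of that change is below (the remaining attributes stay as copied; also rename the target and Java package references from hairsegmentationgpu to portraitsegmentationgpu throughout):

    # Bundle our binary graph and tflite model into the APK
    assets = [
        "//mediapipe/graphs/portrait_segmentation:mobile_gpu_binary_graph",
        "//mediapipe/models:portrait_segmentation.tflite",
    ],
    assets_dir = "",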

In case you have missed any part, you can find the code for portrait segmentation app here

6. Build using bazel and install the APK

# BUILD
bazel build -c opt --config=android_arm64 mediapipe/examples/android/src/java/com/google/mediapipe/apps/portraitsegmentationgpu
# On successfully building the APK, it prints:
INFO: Elapsed time: 1499.898s, Critical Path: 753.09s
INFO: 2002 processes: 1849 linux-sandbox, 1 local, 152 worker.
INFO: Build completed successfully, 2140 total actions

This would take around 20–25 minutes when building for the first time, because Bazel downloads all the external dependencies for the build. Subsequent builds use the cached dependencies, so they are much faster.

# INSTALL
adb install bazel-bin/mediapipe/examples/android/src/java/com/google/mediapipe/apps/portraitsegmentationgpu/portraitsegmentationgpu.apk

Now, you are ready to run the APK and test it.

I tested this app on my M30s phone, and it ran at 30+ FPS.

With that, we have successfully used our own tflite model with MediaPipe. In a similar way, you can also integrate your own detection and landmark-tracking models by adapting the corresponding existing examples.

In the next part of this MediaPipe blog series, I will discuss writing a custom calculator, which is helpful for building new pipelines for your models using the existing calculators and components of MediaPipe.
Stay tuned for it.

Feel free to ask any questions in the comments or you can reach out to me personally.

You can find more about me on swatimodi.com

Part 2 is out!
Creating Zoom like virtual background App — Custom Calculators in MediaPipe

References

  1. Web-based visualizer for graph pipeline: https://viz.mediapipe.dev/
  2. MediaPipe Overview Talk: https://www.youtube.com/watch?v=qXs0QZ6VWS8
  3. Documentation: https://mediapipe.readthedocs.io/en/latest/index.html
  4. GitHub repository: https://github.com/google/mediapipe
  5. Grab the code from here: https://github.com/SwatiModi/portrait-segmentation-mediapipe.git
