Compose a Smart CameraX on Android

Peng Jiang · Published in ProAndroidDev · 5 min read · May 11, 2021


Photo by Marcos Paulo Prado on Unsplash

This is an extension of my last Compose CameraX on Android article. If you want to know how to set up a preview with Jetpack Compose and CameraX, you can check my previous article here. This post will describe how to use the image capture and analysis features with a machine learning (ML) library to build a smart CameraX. Users can trigger the image capture action with a hand gesture and see the captured image displayed.

Intro

In the previous article, I used Jetpack Compose and CameraX to implement a camera preview in the app and added an image analyzer with the Palette library to track the colour pattern of incoming camera frames. One of the key CameraX features we didn’t cover is image capture.

In this article, I will add this feature using hand gesture recognition. There are several ML libraries you can use on Android for hand gesture recognition, such as TensorFlow Lite, Huawei ML Kit or PyTorch Mobile. I will use the Huawei ML Kit in this demo since its hand gesture library already has the gesture model integrated, but the code and idea can be easily adapted to other ML libraries.

Preparing

Here is a quick review of the changes I made since the last article.

  • First, I added a gesture analyzer that uses the gesture recognition API to detect the thumbs-up hand gesture.
  • To use the image capture use case, I created a ViewModel to handle the image capture setup and the gesture recognition callback.
  • To display the captured image, an image view composable is created, and MediaScannerConnection is notified to add the newly captured image to the media content provider.

Setting up the Huawei ML library is very simple, as the training model is already included in the library; you just need to add the Maven repository address and the ML gesture library dependencies. You may find more details here and the dependencies setup here.
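For reference, the Gradle setup might look like the sketch below. The repository URL is the one from Huawei’s documentation, while the artifact names and versions shown are from memory and illustrative; check the links above for the current values.

// build.gradle.kts sketch; versions are illustrative, check Huawei's docs
repositories {
    google()
    mavenCentral()
    maven(url = "https://developer.huawei.com/repo/")
}

dependencies {
    // Gesture detection SDK and its bundled model package
    implementation("com.huawei.hms:ml-computer-vision-gesture:2.2.0.300")
    implementation("com.huawei.hms:ml-computer-vision-gesture-model:2.2.0.300")
}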

If you want to review the changes, you can check out the repo here. You can also check out the code from this repo if you want to follow along.

Implementing the gesture detection

I will start with gesture detection on the camera frames.

Using the ML analyzer is really straightforward: create the instance and call the analysis method to analyze the MLFrame. There are several ways to create an MLFrame from images, such as from a bitmap, byte buffer or Android Image. In this demo, the Android Image is used to construct the MLFrame, as ImageProxy already wraps it. But you need to suppress the UnsafeExperimentalUsageError lint check, as this method is still experimental, and it may return null if the ImageProxy can’t wrap an Android Image.

The analyzer has two listeners you can subscribe to: a success listener that returns the list of detected gesture categories, which may be empty if no gesture is detected, and a failure listener that reports errors in the analysis process. You can find the code gist below.
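The original gist isn’t reproduced here, so below is a minimal sketch of such an analyzer, assuming the Huawei ML Kit gesture API (MLGestureAnalyzerFactory, MLFrame.fromMediaImage) and a caller-supplied onGestureDetected callback:

import android.annotation.SuppressLint
import android.util.Log
import androidx.camera.core.ImageAnalysis
import androidx.camera.core.ImageProxy
import com.huawei.hms.mlsdk.common.MLFrame
import com.huawei.hms.mlsdk.gesture.MLGesture
import com.huawei.hms.mlsdk.gesture.MLGestureAnalyzerFactory
import java.util.concurrent.atomic.AtomicBoolean

class GestureAnalyzer(
    private val onGestureDetected: (List<MLGesture>) -> Unit
) : ImageAnalysis.Analyzer {

    private val analyzer = MLGestureAnalyzerFactory.getInstance().gestureAnalyzer

    // Simple flag so we skip new frames while one is still being analyzed
    private val isAnalyzing = AtomicBoolean(false)

    @SuppressLint("UnsafeExperimentalUsageError")
    override fun analyze(imageProxy: ImageProxy) {
        val mediaImage = imageProxy.image
        if (mediaImage == null || !isAnalyzing.compareAndSet(false, true)) {
            imageProxy.close()
            return
        }
        // Map the rotation in degrees (0/90/180/270) to the quadrant MLFrame expects
        val frame = MLFrame.fromMediaImage(mediaImage, imageProxy.imageInfo.rotationDegrees / 90)
        analyzer.asyncAnalyseFrame(frame)
            .addOnSuccessListener { gestures ->
                if (gestures.isNotEmpty()) onGestureDetected(gestures)
            }
            .addOnFailureListener { e -> Log.e("GestureAnalyzer", "Gesture analysis failed", e) }
            .addOnCompleteListener {
                isAnalyzing.set(false)
                imageProxy.close()  // always close, success or failure, or the stream stalls
            }
    }
}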

As the image analysis may be slower than the rate at which the analyze method is called, a simple flag is used to avoid running multiple analyses at the same time. After the analysis, whether it succeeds or fails, you should close the ImageProxy reference to avoid an exception. Once a hand gesture is detected, the onGestureDetected callback will be invoked to check whether it is the thumbs-up gesture and start the image capture action.

Implementing the image capture

ImageCapture is a use case for saving high-quality images, similar to Preview and ImageAnalysis. You can create it very easily with its builder function. With the basic settings, pictures are taken with flash options and continuous auto-focus. There are two parameters you need to be aware of.

  • CaptureMode: you can choose CAPTURE_MODE_MINIMIZE_LATENCY to minimize latency or CAPTURE_MODE_MAXIMIZE_QUALITY to optimize quality. Here, we use latency-first so the image is captured immediately once the gesture is detected.
  • TargetRotation: this needs to match the display or preview rotation config. You can easily get the value from view.display.rotation. You can find more details regarding rotation in CameraX here.
val imageCapture = ImageCapture.Builder()
    .setCaptureMode(ImageCapture.CAPTURE_MODE_MINIMIZE_LATENCY)
    .setTargetRotation(rotation)
    .build()

Once you’ve configured the camera, you need to bind it to the lifecycle along with the other use cases before you can call the takePicture method.
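A minimal sketch of the binding step, assuming the preview and analysis use cases from the previous article and a ProcessCameraProvider obtained elsewhere:

// Rebind all use cases from a clean state (use case names are assumed)
fun bindUseCases(
    cameraProvider: ProcessCameraProvider,
    lifecycleOwner: LifecycleOwner,
    preview: Preview,
    imageAnalysis: ImageAnalysis,
    imageCapture: ImageCapture
) {
    cameraProvider.unbindAll()
    cameraProvider.bindToLifecycle(
        lifecycleOwner,
        CameraSelector.DEFAULT_BACK_CAMERA,
        preview,
        imageAnalysis,
        imageCapture
    )
}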

You can save the captured image either to an in-memory buffer or to a local file, depending on which method overload you use. We will use the local file option, so we can also view the captured image in the picture gallery. To save the image file locally, you need to provide the folder location, file name and format to configure the OutputFileOptions. Right now, JPEG is fully supported; if you want to save to another format, you can check the YUV to RGB converter here. The OutputFileResults will return a savedUri when the image is saved, which we can use later for display and to notify the media provider.
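Here is a sketch of the capture call, assuming an outputDirectory supplied by the caller and a hypothetical onImageCaptured callback to publish the result:

// Take a picture and save it as a JPEG file; names here are illustrative
val photoFile = File(outputDirectory, "${System.currentTimeMillis()}.jpg")
val outputOptions = ImageCapture.OutputFileOptions.Builder(photoFile).build()

imageCapture.takePicture(
    outputOptions,
    ContextCompat.getMainExecutor(context),
    object : ImageCapture.OnImageSavedCallback {
        override fun onImageSaved(output: ImageCapture.OutputFileResults) {
            // savedUri can be null for some destinations, so fall back to the file Uri
            onImageCaptured(output.savedUri ?: Uri.fromFile(photoFile))
        }

        override fun onError(exception: ImageCaptureException) {
            Log.e("ImageCapture", "Image capture failed", exception)
        }
    }
)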

Display the captured image

Running the app in this state, you’ll see that the image capture action can be triggered with the gesture. This is great, but wouldn’t it be nice if we automatically displayed the captured image for a few moments once it is stored locally? Let’s wire everything together with a state/event update loop that goes like this: the analyzer detects the thumbs-up gesture, the ViewModel triggers the image capture, the saved image Uri is exposed as state, and the composable observes that state to display the image.
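A minimal sketch of that loop in the ViewModel; MLGesture.GOOD is my assumption for the thumbs-up category constant (check Huawei’s docs), and the class and property names are illustrative:

class SmartCameraViewModel : ViewModel() {

    // Uri of the last captured image; null means there is nothing to display
    var capturedUri by mutableStateOf<Uri?>(null)
        private set

    fun onGestureDetected(gestures: List<MLGesture>) {
        // MLGesture.GOOD is assumed to be the thumbs-up category
        if (gestures.any { it.category == MLGesture.GOOD }) {
            takePicture()
        }
    }

    private fun takePicture() {
        // Call ImageCapture.takePicture as shown earlier and set capturedUri
        // from OutputFileResults.savedUri in onImageSaved.
    }
}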

The composable Image doesn’t currently have a built-in way to display a local image from a file Uri. But you can use Accompanist, a utils library for Jetpack Compose that integrates popular image loading libraries. The Glide library is used here. Once the image file Uri is available, the Glide painter will start to load the image into the Image composable, and MediaScannerConnection will be notified.
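As a sketch, assuming the accompanist-glide artifact of that era and its rememberGlidePainter API, the display step could look like this:

@Composable
fun CapturedImage(uri: Uri) {
    val context = LocalContext.current
    LaunchedEffect(uri) {
        // Tell the media provider about the new file so it appears in the gallery
        MediaScannerConnection.scanFile(context, arrayOf(uri.path), arrayOf("image/jpeg"), null)
    }
    Image(
        painter = rememberGlidePainter(uri),
        contentDescription = "Captured image",
        modifier = Modifier.fillMaxSize()
    )
}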

And that’s it! We have now implemented a simple hand gesture camera using Jetpack Compose, CameraX and the Huawei ML Kit.

Summary

In this article, I added the image capture function using Jetpack Compose and CameraX. With gesture recognition, the app triggers the image capture and displays the captured image when a thumbs-up gesture is detected. This concept can be easily adapted to video recording; you just need to change the image capture to video capture (startRecording and stopRecording, still highly experimental) in CameraX. The hand gesture recognition can also be extended to body position, object recognition or even voice commands (you may find my other article on playing the 2048 game with voice here). Feel free to fork the repo and create your own smart CameraX. I’d love to hear about what you have built!
