Cloud Face Recognition for Mobile Applications: Design and Implementation
Create a Face Recognition App on Android
In the previous article, I gave an overview of some promising cloud services that can be used to build a face recognition application on a mobile device. In this part we're going to take a closer look at the design and implementation of a face recognition prototype app.
If you are interested in other parts of the series, check these out:
Part 1: Overview of some popular face recognition services
Part 2: Implementation example in Android
Part 3: Results and performance of chosen face recognition providers

Requirements
Before comparing face recognition services, we need to understand that a face recognition system implies not only the algorithm, but also the context in which the application is used. It is no secret that face recognition algorithms are still imperfect. They are prone to errors from various causes, including image quality, lighting and occlusions. Therefore, let's first have a look at the requirements that a mobile face recognition application can encounter.
Image quality
The accuracy of the underlying system depends heavily on image quality. Face recognition models build templates by extracting machine-interpretable features from images during the enrollment process and then compare these templates with probe images. Both enrollment and probe images should meet certain requirements for the recognition system to show good results. For example, the Microsoft Face API recommends a full frontal head-and-shoulders view with a minimum size of 200x200 pixels. What's more, the greater the number of enrollment images, the better.
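As an illustration, a simple client-side resolution check can reject unusable images before they are sent to a service. This is a minimal sketch; the 200x200 pixel minimum follows the Microsoft recommendation above, while the function name and constant are my own:

```kotlin
import android.graphics.BitmapFactory

// Minimum side length recommended by the Microsoft Face API.
const val MIN_SIDE_PX = 200

// Hypothetical pre-check: rejects images below the recommended resolution
// before they are uploaded for enrollment or identification.
fun meetsMinimumResolution(imagePath: String): Boolean {
    // Decode only the image bounds, not the full bitmap.
    val options = BitmapFactory.Options().apply { inJustDecodeBounds = true }
    BitmapFactory.decodeFile(imagePath, options)
    return options.outWidth >= MIN_SIDE_PX && options.outHeight >= MIN_SIDE_PX
}
```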
Lighting and background
Pay attention to lighting. Faces should be neither too dark nor too bright and should accurately and uniformly represent facial features. A neutral background without any attributes that can mislead the algorithm (such as portraits or photos) works best. It is also recommended to place the cameras at face level and to clean them regularly.
Fallback mechanisms
Occlusions, such as glasses, scarves or hair, can also affect the software's performance. You should design the system so that it leads the user towards the best experience. That means two things: clear guidance and a fallback method. Sometimes people change their position while the photo is taken or move their hands across the face, for example to fix their hair. That's why it is necessary to provide clear instructions on how to behave while the images are being captured. However, dramatic changes in appearance, such as shaving off a beard or a new hairstyle, can also heavily mislead the software. In this case you have to come up with a fallback method of recognition which does not rely on face images, for example signature or ID verification.
Verification vs Identification
You also have to keep in mind the difficulties that an identification system faces compared to a verification system. A verification system uses a 1:1 comparison. For example, at an airport, after passport control, face verification can be used at boarding to eliminate unnecessary checks. Identification, on the other hand, is a 1:n comparison, which requires analysing the probe image against all stored templates and hence has a higher error rate. For this reason it is recommended to use verification when possible, or to implement identification with additional checks.
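The difference is easy to see in code. Here is a hypothetical sketch (the function names and the 0.9 threshold are mine, not from any particular provider):

```kotlin
// Provider-specific comparison call; stubbed here for illustration.
suspend fun compareFaces(probeImgUri: String, personId: String): Double =
    TODO("call the face recognition service")

// Verification, 1:1 — one comparison against a single claimed identity.
suspend fun verify(probeImgUri: String, claimedPersonId: String): Boolean =
    compareFaces(probeImgUri, claimedPersonId) >= 0.9

// Identification, 1:n — the probe is compared against every enrolled
// template, so every extra template is another chance for a false match.
suspend fun identify(probeImgUri: String, personIds: List<String>): String? =
    personIds
        .map { id -> id to compareFaces(probeImgUri, id) }
        .filter { (_, confidence) -> confidence >= 0.9 }
        .maxByOrNull { (_, confidence) -> confidence }
        ?.first
```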
Information security
As a developer you also have to address personal information security. The privacy policy should be clearly formulated, and the user has to explicitly agree to the use of the system or be presented with an opt-out option.
Design
The use case at our company is visitor identification. For this purpose I designed a simple app to compare the recognition results of the five chosen cloud services. For simplicity, all the image data is saved locally on the device; no remote databases (except the ones integrated within the face recognition services) are used. The application consists of four activities. The MainActivity prompts visitors to answer whether they have already visited the building or whether it is their first time, after which there are four possible story lines (sketched in code after the list):
1) If the visitor has never been to the building before, he is taken to the RegistrationActivity, where he enters his data, accepts the privacy policy and signs the registration form. After all the required fields have been filled out, the device camera takes a picture of the visitor, registers his face with the face recognition service, writes the visitor's data to the local database, logs the visit and lets the visitor in.
2) If the visitor has already been to the building, the device camera takes a picture right away from the MainActivity and sends it to the chosen face recognition service for verification. In case the face recognition service recognises the visitor with high confidence (for example, 90%), the application navigates to GreetingActivity, logs the visit, adds the new face picture to the local and cloud databases and lets the visitor in.
3) In case the face recognition service cannot identify the visitor with high enough confidence, it suggests a list of candidates that look like the visitor. If the visitor finds himself in the list, the application navigates to GreetingActivity, logs the visit, adds the new photo to the databases and lets the visitor in. At this point, additional proof of identity, such as an ID card, an iris scan or a signature, can be required to make sure that the visitor does not pretend to be someone he is not. However, in the first prototype this functionality was omitted.
4) Finally, if the visitor is not recognised and cannot be found in the list of candidates either, he is taken to the registration screen, where he has to undergo the registration process.
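A rough sketch of how the routing between these four story lines could look; all names and the 0.9 threshold are illustrative, not the exact code of the prototype:

```kotlin
// Hypothetical candidate as returned by a face recognition service.
data class Candidate(val personId: String, val confidence: Double)

const val CONFIDENCE_THRESHOLD = 0.9

fun startRegistration() { /* navigate to RegistrationActivity */ }
fun greetVisitor(candidate: Candidate) { /* navigate to GreetingActivity, log the visit */ }
fun showCandidateList(candidates: List<Candidate>) { /* let the visitor pick themselves */ }

// Routes a visitor through the four story lines described above.
fun routeVisitor(isFirstVisit: Boolean, candidates: List<Candidate>) {
    when {
        isFirstVisit -> startRegistration()                       // story line 1
        (candidates.firstOrNull()?.confidence ?: 0.0) >= CONFIDENCE_THRESHOLD ->
            greetVisitor(candidates.first())                      // story line 2
        candidates.isNotEmpty() -> showCandidateList(candidates)  // story line 3
        else -> startRegistration()                               // story line 4
    }
}
```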

The interaction between the database and the application activities is realised with the help of DAOs and one of the most convenient and powerful Kotlin tools: coroutines. The need to call several face recognition services simultaneously and coordinate the responses with the local database can be easily met with the lifecycle coroutine scope and suspend functions, which allow making non-blocking asynchronous calls and waiting for their results.
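For illustration, this is roughly how parallel calls to several providers can look with lifecycleScope and async/awaitAll; the method name is a placeholder, and FaceRecognition is the common interface described in the Implementation section below:

```kotlin
import androidx.appcompat.app.AppCompatActivity
import androidx.lifecycle.lifecycleScope
import kotlinx.coroutines.async
import kotlinx.coroutines.awaitAll
import kotlinx.coroutines.launch

class MainActivity : AppCompatActivity() {
    // Active providers, e.g. obtained from the ServiceFactory.
    private lateinit var services: List<FaceRecognition>

    fun identifyWithAllServices(personGroupId: String, imgUri: String) {
        lifecycleScope.launch {
            // Start one non-blocking call per active provider in parallel.
            val results = services.map { service ->
                async { service.identifyVisitor(personGroupId, imgUri) }
            }.awaitAll() // suspends until every response has arrived

            // All candidate lists are available here and can be coordinated
            // with the local database without blocking the UI thread.
        }
    }
}
```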
Implementation

The application allows switching between the five recognition services, using only one of them, several, or all together. For the services that provide an Android SDK, such as Microsoft Azure and Amazon Rekognition, the standard initialization via a getter function was used. The other services, whose APIs were implemented using Retrofit, could be initialized with the lazy delegate provided by the Kotlin standard library.
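As a sketch, a Retrofit-backed service initialized with a lazy delegate might look like this; the base URL and the FaceApiService interface are placeholders, not an actual provider's API:

```kotlin
import retrofit2.Retrofit
import retrofit2.converter.gson.GsonConverterFactory

// Placeholder Retrofit interface; the real endpoints are declared with
// Retrofit annotations in the provider-specific implementation.
interface FaceApiService

// The client is built only on first access, thanks to the lazy delegate.
val faceApi: FaceApiService by lazy {
    Retrofit.Builder()
        .baseUrl("https://face-provider.example.com/v1/") // placeholder URL
        .addConverterFactory(GsonConverterFactory.create())
        .build()
        .create(FaceApiService::class.java)
}
```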
The FaceApp class stores a map named "values" that holds a provider: String to isActive: Boolean entry for each service provider. This map is used by the ServiceFactory class when called from an activity: the ServiceFactory provides the calling activity with a list of the active face recognition services. This is useful when debugging single service providers or when only some of them need to be used within the app.
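A sketch of what this could look like; the provider names in the map and the factory method are illustrative, and FaceRecognition is the interface described below:

```kotlin
// Maps each provider name to a flag that enables or disables it.
class FaceApp {
    val values: MutableMap<String, Boolean> = mutableMapOf(
        "Microsoft" to true,
        "Amazon" to true,
        "Luxand" to false // disabled, e.g. while debugging the others
    )
}

object ServiceFactory {
    // Returns instances only for the providers flagged as active.
    fun activeServices(app: FaceApp): List<FaceRecognition> =
        app.values.filterValues { it }.keys.map(::createService)

    private fun createService(provider: String): FaceRecognition =
        TODO("instantiate the provider-specific FaceRecognition implementation")
}
```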
Each face recognition service is encapsulated within a class which implements a FaceRecognition interface. This interface has a number of functions that define the common behaviour of every face recognition service, whereas its internal implementation differs slightly depending on the provider.
The local database stores the visitors' data, whereas the cloud services only store the face metadata used for facial recognition. The methods deletePersonGroup(personGroupId: String) and addPersonGroup(personGroupId: String) are used to manipulate face sets, or in the case of Microsoft Azure, person groups, in the cloud. The method addNewVisitorToDatabase(personGroupId: String, imgUri: String, visitor: Visitor) is mostly used to enrol a new face with the cloud service, but in the case of the Microsoft service it actually enrols a new person. Unlike the other three services, Microsoft Azure and Luxand have a person entity, which allows us not only to store faces, but to attribute every stored face to a particular person. The method addNewImage(personGroupId: String, imgUri: String, visitor: Visitor) is used to add a new face to an existing person in the database.
Finally, after some faces have been enrolled with the cloud services, the face search is performed with identifyVisitor(personGroupId: String, imgUri: String): List<Any>, which returns a list of candidates recognised by a service, including the confidence with which the service recognised every face. Another helper function is train(), which only has to be called for Microsoft Azure after any change such as adding a new face to a person or adding a new person to the group.
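Putting the methods above together, the FaceRecognition interface can be reconstructed roughly like this (the exact signatures in the prototype may differ, and Visitor stands for the local database entity):

```kotlin
// Local database entity holding the visitor's data (fields omitted).
class Visitor

interface FaceRecognition {
    // Create and delete a face set (a person group on Microsoft Azure).
    suspend fun addPersonGroup(personGroupId: String)
    suspend fun deletePersonGroup(personGroupId: String)

    // Enrols a new face; on Microsoft Azure this also creates the person.
    suspend fun addNewVisitorToDatabase(personGroupId: String, imgUri: String, visitor: Visitor)

    // Adds a further face image to an already enrolled person.
    suspend fun addNewImage(personGroupId: String, imgUri: String, visitor: Visitor)

    // Returns the candidates recognised by the service with confidences.
    suspend fun identifyVisitor(personGroupId: String, imgUri: String): List<Any>

    // Required only by Microsoft Azure after changes to the person group.
    suspend fun train()
}
```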
Now that we've been through design and implementation, it's time to use the face recognition app. Have a look at a short demo video!
You can also read more about the results that I got from implementing the 5 chosen face recognition services side by side in the next article of this series.