Google’s ML Kit: Text Recognition with a Sample Receipt-Reading App

Google has published a few AI kits, one of which is text recognition, and I decided to try it out in a sample Android app that reads receipts. The goal is to recognise three fields: total, VAT and type. Full code can be found here.

Step 1: Groundwork

First of all, we need to register the app with Firebase (link) and save the google-services.json file in the project, then add the following code to Gradle.

// project level
allprojects {
    repositories {
        maven { url "https://maven.google.com" }
    }
}
// module level
dependencies {
    implementation 'com.google.firebase:firebase-core:16.0.4'
    implementation 'com.google.firebase:firebase-ml-vision:17.0.1'
}
// add the following line at the end of the module-level Gradle file
apply plugin: 'com.google.gms.google-services'

There are two types of text recognition API: on-device and cloud-based. In this project I will use the on-device API. To have the ML model downloaded automatically after installation, add the following to the manifest.

<application>
    <meta-data
        android:name="com.google.android.gms.vision.DEPENDENCIES"
        android:value="text" />
</application>

Step 2: Application class

Add the following code to the Application class to initialise Firebase (API_KEY and APP_ID are in google-services.json).

val firebaseOptions = FirebaseOptions.Builder()
    .setApiKey(API_KEY_FIREBASE)
    .setApplicationId(APP_ID)
    .build()
FirebaseApp.initializeApp(this, firebaseOptions, APP_NAME)

Step 3: Creating Detector Object

class ReceiptsViewModel @Inject constructor() {
    val textDeviceDetector: FirebaseVisionTextRecognizer
    lateinit var imageURI: Uri

    init {
        textDeviceDetector = FirebaseVision.getInstance().getOnDeviceTextRecognizer()
    }
}

Step 4: Detecting the receipts image

First of all, I have created a Receipts data class:

data class Receipts(var total: String = "", var vat: String = "", var type: String = "")

Here is the code to read text from the image:

// firebaseImage is the FirebaseVisionImage built from the selected picture (its construction is not shown here)
fun textRecognitionAction() {
    receiptsViewModel.textDeviceDetector.processImage(firebaseImage)
        .addOnSuccessListener { result ->
            // Join the recognised blocks into one newline-separated string
            val text = result.textBlocks.joinToString(separator = "\n") { it.text }
            val receipts = receiptsViewModel.getReceipts(text)
            editTotal.setText(receipts.total, TextView.BufferType.EDITABLE)
            editLocation.setText(receipts.type, TextView.BufferType.EDITABLE)
            editVAT.setText(receipts.vat, TextView.BufferType.EDITABLE)
        }
}

So far I am able to read text from the image, but the challenge is to recognise the total, VAT and type (whether it is food, office supplies, etc.). For the total and VAT, it is possible to filter the decimal numbers out of the recognised text and take the largest one as the total. Type is harder, as I could not find a way to capture it with hand-written rules. One approach could be to train a TensorFlow model, which I am planning to try later on.

Here is my logic for getting the total and VAT. The code works but looks very Java-ish, so if you know a better Kotlin way, please let me know.

fun String.findFloat(): ArrayList<Float> {
    // Collect every decimal number found in the recognised text
    val originalResult = ArrayList<Float>()
    if (this.isEmpty()) return originalResult
    val matchedResults = Regex(pattern = "[+-]?([0-9]*[.])?[0-9]+").findAll(this)
    for (txt in matchedResults) {
        if (txt.value.isFloatAndWhole()) originalResult.add(txt.value.toFloat())
    }
    return originalResult
}

// True only for unsigned values that contain a decimal point, e.g. "12.50"
private fun String.isFloatAndWhole() = this.matches("\\d*\\.\\d*".toRegex())

fun getReceipts(text: String): Receipts {
    val originalResult = text.findFloat()
    if (originalResult.isEmpty()) return Receipts()
    else {
        val receipts = Receipts()
        // The largest amount is taken as the total
        val totalF = Collections.max(originalResult)
        // findSecondLargestFloat is a helper that is not shown in this post
        val secondLargestF = findSecondLargestFloat(originalResult)
        receipts.total = totalF.toString()
        // VAT is the gap between the total and the second-largest amount
        receipts.vat = if (secondLargestF == 0.0f) "0" else "%.2f".format(totalF - secondLargestF)
        return receipts
    }
}
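
One possible, more Kotlin-flavoured take on the same logic is sketched below. It is not a definitive answer: the names findFloats, NUMBER and UNSIGNED_DECIMAL are made up for illustration, and because findSecondLargestFloat is not shown in this post, the sketch assumes "second largest" means the largest amount strictly below the total (so a total printed twice is not mistaken for the subtotal).

```kotlin
import java.util.Locale

data class Receipts(var total: String = "", var vat: String = "", var type: String = "")

// Same two patterns as the original code: find any number, then keep unsigned decimals only.
private val NUMBER = Regex("[+-]?([0-9]*[.])?[0-9]+")
private val UNSIGNED_DECIMAL = Regex("\\d*\\.\\d*")

// Hypothetical replacement for findFloat(): a sequence pipeline instead of a manual loop.
fun String.findFloats(): List<Float> =
    NUMBER.findAll(this)
        .map { it.value }
        .filter { UNSIGNED_DECIMAL.matches(it) }
        .mapNotNull { it.toFloatOrNull() }
        .toList()

fun getReceipts(text: String): Receipts {
    val amounts = text.findFloats().sortedDescending()
    // No decimal amounts at all: return an empty receipt, as the original does.
    val total = amounts.firstOrNull() ?: return Receipts()
    // Assumption: the subtotal is the largest amount strictly below the total.
    val subtotal = amounts.firstOrNull { it < total }
    val vat = if (subtotal == null) "0" else String.format(Locale.US, "%.2f", total - subtotal)
    return Receipts(total = total.toString(), vat = vat)
}
```

Fixing the locale in String.format keeps the decimal separator a dot on every device, which may or may not be what you want when showing the value to the user.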

Here are a few of my sample runs. For now the type defaults to the first line of the input, which isn’t always correct, as you can see, but the total and VAT look okay to me.
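
The first-line default for the type could look roughly like the sketch below. The helper name defaultType is hypothetical, and skipping blank lines and trimming whitespace are assumptions on top of what is described above.

```kotlin
// Hypothetical helper: use the first non-blank line of the recognised text as the type.
fun defaultType(text: String): String =
    text.lineSequence()
        .map { it.trim() }
        .firstOrNull { it.isNotEmpty() }
        .orEmpty()
```

On most receipts this picks up the shop name rather than a category, which is why the result is only a placeholder.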