The ImageSC app was created in a three-day code marathon in early October 2018. Not much has changed in the app since then because both co-creators have busy professional schedules, but we aspire to make it really good over time. There's a lot to be done to improve the app's usability and accuracy. We decided to write this overview of the technology behind the app for both regular and technical audiences, to gather more ideas. Please feel free to leave comments or drop us an email with your feedback.
The ImageSC app was designed around the following requirements for a student user in a low-income country with poor internet connectivity:
- She should be able to simply take a photo of the written text using the camera
- She should be able to perform a spell check on the image of her written work without being connected to the internet
- She should be able to see the results overlaid on the image for a quick visual check
Requirement 1 is simple enough for any Android app. Requirements 2 and 3 are where things get interesting.
How do we convert an image to text
For offline image-to-text, we explored the following two optical character recognition (OCR) options:
- Tesseract OCR
  - Has very poor accuracy on handwritten text
- Google's on-device lightweight MLKit OCR model
  - Improved results on our test set compared to Tesseract for handwritten text
For online image-to-text, we ended up choosing Google's Cloud Vision API, which gave significantly better OCR results than the offline model on the test set of images that we were working with. If the user's device is connected to the internet, we first try Google's Cloud Vision API; otherwise, we fall back to MLKit OCR.
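The online-first fallback logic can be sketched as follows. The class and method names here are illustrative, not our actual API; the two `Supplier` arguments stand in for the Cloud Vision and MLKit engines:

```java
import java.util.function.Supplier;

// Sketch of the online-first OCR selection: try the cloud engine when
// connected, otherwise (or on a mid-request failure) use the on-device one.
public class OcrSelector {
    public static String recognize(boolean isOnline,
                                   Supplier<String> cloudVisionOcr,
                                   Supplier<String> mlKitOcr) {
        if (isOnline) {
            try {
                return cloudVisionOcr.get(); // better accuracy when connected
            } catch (RuntimeException networkError) {
                // Connection dropped mid-request: fall through to on-device OCR
            }
        }
        return mlKitOcr.get(); // offline, on-device fallback
    }
}
```

Catching the failure and falling through means a flaky connection degrades gracefully to the offline result instead of showing the user an error.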
Here’s a sample of how words and paragraphs get demarcated by the OCR. Red boxes are the paragraphs and yellow boxes are the words
How do we perform spell check on the detected text
We developed a custom on-device (not cloud-based) spell checker that goes through several processing steps, discussed below. The need for a custom spell checker came from the requirement to support offline processing, coupled with the goal of keeping as much processing as possible on the device rather than in the cloud to save on hosting costs. An on-cloud spell checker would quickly lead to high cloud deployment costs and make it prohibitive for us to host the app for free for students.
These spell check steps run on every word identified in the document, in isolation (for now):
- Find the confidence of the OCR algorithm on the detected word, and use only high-confidence OCR results to avoid false positives
- Homoglyph errors from the OCR are not infrequent, especially for handwritten text (e.g., "l" read as "1", or "O" as "0"). We run each detected word through a custom homoglyph detector to reduce the number of false positives reported by our spell checker
- We use a dictionary of the most common English spelling errors, drawn from these data sources. If the word matches a known spelling error, we add it to the spell check results
- Motivated by the symmetric delete spelling correction algorithm (SymSpell), we wrote an Android-compatible version of the SymSpell approach. It tries to find the closest match from a given dictionary of English words as fast as possible
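The last step above can be sketched as a minimal symmetric-delete lookup. This is an illustration of the technique, not our actual implementation: we precompute single-character deletes of every dictionary word, so a query can be matched against the index without generating expensive insert/substitute candidates. This toy version only covers edit distance 1; the real SymSpell generalizes to larger distances.

```java
import java.util.*;

// Minimal symmetric-delete (SymSpell-style) lookup, edit distance 1 only.
public class SymDelete {
    private final Map<String, String> deletesToWord = new HashMap<>();
    private final Set<String> dictionary = new HashSet<>();

    public SymDelete(Collection<String> words) {
        for (String w : words) {
            dictionary.add(w);
            // Index every single-character delete of the dictionary word
            for (String d : deletes(w)) deletesToWord.putIfAbsent(d, w);
        }
    }

    private static List<String> deletes(String w) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < w.length(); i++)
            out.add(w.substring(0, i) + w.substring(i + 1));
        return out;
    }

    // Returns the word itself if correct, a close correction, or null.
    public String lookup(String word) {
        if (dictionary.contains(word)) return word;
        // Query is a dictionary word with one character missing
        String missingChar = deletesToWord.get(word);
        if (missingChar != null) return missingChar;
        for (String d : deletes(word)) {
            if (dictionary.contains(d)) return d;  // query has one extra character
            String shared = deletesToWord.get(d);  // shared delete: substitution
            if (shared != null) return shared;
        }
        return null; // no match within this sketch's reach
    }
}
```

The key speed trick is that both dictionary and query only ever generate *deletes*, so the candidate space stays small enough for fast on-device lookups.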
How do we display the spell check results in the most sensible way
We found that layering red boxes on top of misspelled words was the fastest visual way for a user to see their spelling errors. To accomplish this, we created a custom zoomable layering library that draws the red boxes and tracks user taps to zoom in and out of the image and inspect the results. Our custom zoomable library is inspired by this open source repo.
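The core of such an overlay layer is mapping a box from image coordinates to view coordinates under the current zoom and pan. A minimal sketch of that transform, with illustrative names (not taken from our library):

```java
// Maps an OCR bounding box from image space to view space.
// scale = current zoom factor; (offsetX, offsetY) = current pan in view pixels.
public class BoxMapper {
    public static int[] imageToView(int left, int top, int right, int bottom,
                                    float scale, float offsetX, float offsetY) {
        return new int[] {
            Math.round(left * scale + offsetX),
            Math.round(top * scale + offsetY),
            Math.round(right * scale + offsetX),
            Math.round(bottom * scale + offsetY),
        };
    }
}
```

Tap handling is the same transform run in reverse: subtract the pan, divide by the zoom, and test which word box contains the resulting image-space point.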
Problems With Current Implementation
Accuracy of OCR:
Google's vision model is best in class, but there is still a lot of room for improvement on handwritten text. Now that we are accumulating a sizable dataset of our own, we plan to use it to train our own OCR models to run alongside Google's model.
There's also a strong constraint on the angle of image capture, which makes usability more restrictive. Most OCR models require a clear, crisp image of the text taken at a perpendicular angle to give the best possible results. Google has done tremendously well to relax these constraints, but we found that user expectations are very high when it comes to flexibility. We have seen users take images at all sorts of angles hoping for some results. One way to force users to capture at a perpendicular angle would be to use a CamScanner-style document-alignment guide during capture, until we can train models that work at odd angles as well.
Accuracy of spell check results:
We haven't completed a qualitative study of the spell check results because it requires a big enough set of pre-annotated results, which takes time and extra help. But a cursory look at the results on some user images shows that there's much room for improvement. Some of the spell check accuracy issues are actually tied to the accuracy of the OCR. Homoglyph detection and the confidence check help us reduce false reporting, but more work needs to be done in this area. English spelling localization (e.g., "colour" vs. "color") is another source of spell check errors, although not a significant one in number. We need to use different dictionaries for different regions to reduce those types of errors.
On some user devices, we have observed the app slowing down for documents with more than 1,000 words. This is because of the serial, on-device (not on-cloud) processing.
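One on-device mitigation would be to run the per-word check in parallel across cores instead of serially. A minimal sketch, with a plain dictionary lookup standing in for our full per-word pipeline (this is not yet what the app does):

```java
import java.util.*;
import java.util.stream.*;

// Parallel per-word spell check: each word is checked independently,
// so the work splits cleanly across threads via a parallel stream.
public class ParallelCheck {
    public static List<String> misspelled(List<String> words, Set<String> dict) {
        return words.parallelStream()
                    .filter(w -> !dict.contains(w.toLowerCase(Locale.ROOT)))
                    .collect(Collectors.toList());
    }
}
```

Because each word is processed in isolation (as described above), the check is embarrassingly parallel, and the collected result still preserves document order.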
Future Improvements In Pipeline
- Based on feedback from 1,000+ users and 10,000+ images since October 2018, we have concluded that the on-device approach, though cost-friendly, is not good enough to provide the best possible spell check results. We need hybrid approaches to spell check for the best results. These are too costly to implement on the device and need to be moved to the cloud. We plan to add on-cloud spell check for devices that are connected to the internet
- Instead of treating each word in isolation, using the context of the surrounding paragraph may give better spell check results. These days, such problems are solved using RNN models rather than traditional NLP approaches. Since it's safe to assume that such a model will be too heavy for on-device inference, we plan to add this approach to the on-cloud spell processing
- Several of our users have been spell checking documents written in languages other than English. Spanish and Hindi top the list. We plan to add spell check for these languages in a future version
- There are several inefficiencies in the on-device spell check processing which have led to many app crashes in the field. We are working towards improving those in general
- The APK size is pretty big, about 20 MB. We plan to shrink it to less than 2 MB by moving several functionalities to the cloud