How to find the location of text inside an image?
Back when I was very new to automation and to Appium, I ran into a very interesting problem: “How do I click on the ‘Help’ link in an Android app?”
The problem sounds very easy, but when a link is embedded in a TextView in Android development, the link text has no identifier of its own; it is just part of the TextView's text.
The UiAutomatorViewer treats it as normal text. Nowadays, with Espresso, it's relatively easy to find the location of the text and then perform a touch operation, and your task is done.
However, there is no way to do that when using the UiAutomator mode in Appium. So our solution is:
- Take a screenshot of the screen where the desired text is present.
- Read all the texts in the screenshot.
- Find the coordinates of the desired text.
- Perform a touch operation (and hope for profit :) ).
Now that we have a plan to solve our problem, let's follow it.
1. Take the screenshot
Take the screenshot using whichever programming language your tests are written in; for me, it was Python.
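A minimal sketch of this step, assuming `driver` is an already-connected Appium session from the Appium Python client. The helper name `take_screenshot` and the default file name are hypothetical, chosen only for illustration; `save_screenshot` is the standard WebDriver method.

```python
def take_screenshot(driver, path="screen.png"):
    """Save the current screen as a PNG file and return the path.

    Assumption: `driver` is an Appium (Selenium-style) WebDriver session.
    `save_screenshot` returns True when the file was written.
    """
    if not driver.save_screenshot(path):
        raise RuntimeError(f"could not save screenshot to {path}")
    return path
```

Calling `take_screenshot(driver)` on the screen that shows the desired text leaves a `screen.png` on disk for the next step.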
2. Read all the text in the screenshot image
Here things get more interesting. What we are doing is outside the scope of Appium, so we have to think outside the box. Think OCR: Tesseract is one of the best open-source OCR engines available.
So let's read the text from the image with Tesseract. But wait, there's a problem. You'll see that when given a coloured image, Tesseract can't read all of the text. To fix that, we'll convert the image to grayscale. That gives Tesseract cleaner image data, which it can process with a much better success rate.
Here's what the full implementation looks like:
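As a rough sketch of this step (not the original listing), assuming Pillow and `pytesseract` are installed and the Tesseract binary is on the PATH: the image is converted to grayscale, then `pytesseract.image_to_data` returns each recognised word with its bounding box. The helper names `read_words` and `words_from_data` are hypothetical.

```python
def read_words(image_path):
    """OCR a screenshot and return one record per recognised word."""
    # Imported lazily so the parsing helper below works without the OCR stack.
    from PIL import Image
    import pytesseract

    gray = Image.open(image_path).convert("L")  # "L" = 8-bit grayscale
    data = pytesseract.image_to_data(gray, output_type=pytesseract.Output.DICT)
    return words_from_data(data)


def words_from_data(data):
    """Turn Tesseract's column-oriented dict into a list of word boxes."""
    words = []
    for i, text in enumerate(data["text"]):
        text = text.strip()
        if not text:  # skip the empty entries emitted for blocks and lines
            continue
        words.append({
            "text": text,
            "left": data["left"][i],
            "top": data["top"][i],
            "width": data["width"][i],
            "height": data["height"][i],
        })
    return words
```

Each record holds the word and its box in screenshot pixels, which is all the next step needs.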
3. Find the coordinates of the desired text.
This is where you get the location coordinates of the desired text; once you have them, you can perform the action and you are done. For debugging purposes, however, you can draw a rectangle around the text.
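One way to sketch this, assuming each OCR result is a dict with `text`, `left`, `top`, `width` and `height` keys in screenshot pixels (a hypothetical record shape, not something Tesseract dictates). `find_text_center` returns the centre of the first matching word, and `draw_debug_box` uses Pillow to outline it; both names are illustrative.

```python
def find_text_center(words, target):
    """Return the (x, y) centre of the first word equal to `target`, or None."""
    for w in words:
        if w["text"].lower() == target.lower():  # case-insensitive match
            return (w["left"] + w["width"] // 2, w["top"] + w["height"] // 2)
    return None


def draw_debug_box(image_path, word, out_path="debug.png"):
    """Draw a red rectangle around `word` and save a copy for inspection."""
    from PIL import Image, ImageDraw  # lazy import: only needed for debugging

    image = Image.open(image_path).convert("RGB")
    box = [word["left"], word["top"],
           word["left"] + word["width"], word["top"] + word["height"]]
    ImageDraw.Draw(image).rectangle(box, outline="red", width=3)
    image.save(out_path)
    return out_path
```

This matches single words only. A multi-word link would need the boxes of adjacent words merged first.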
4. Perform a touch operation
You now have the coordinates corresponding to the text you were looking for, so you can perform a touch operation, e.g.
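A tap might look like the sketch below. It assumes an Appium Python client version that provides `driver.tap(positions, duration)`, and that screenshot pixels map one-to-one onto touch coordinates, which is typical on Android but worth verifying on your device. The helper name `tap_at` and its default duration are hypothetical.

```python
def tap_at(driver, x, y, duration_ms=100):
    """Tap a single point on the screen.

    Assumptions: `driver` is an Appium session whose client exposes
    `tap(positions, duration)`, and (x, y) are already in the same
    coordinate space the device uses for touches.
    """
    driver.tap([(int(x), int(y))], duration_ms)
```

If the screenshot and the touch coordinate space differ in size, scale the point before tapping.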
If you are still stuck on a project where using Espresso mode in Appium is not possible, or you use some other automation tool that doesn't support clicking on text, I hope this helps. You may also be able to apply or adapt this solution to some other problem.