My current project supports a number of software applications for which we build and maintain automated tests. While most of these applications are web-based (for which we use Selenium WebDriver), we also support a few desktop/rich client applications. This has exposed the need for a front-end automated test tool that doesn’t rely on DOM-based attributes.

Our wishlist for this new tool included 3 things:

  1. Platform / Language Agnosticism
    Specifically, we need (1) the ability to run tests on multiple operating systems, in order to expand and future-proof our infrastructure as much as possible, and (2) support for multiple programming languages, including Java. Most of our projects are Java-based, so it makes sense to leverage that expertise when building automated tests. This will also help us achieve our goal of getting more developers on the team writing automated tests.
  2. Non-Recorder Tools
    Recorder-based test tools might work well in certain contexts, but my experience has been that they tend to create maintenance challenges, because they limit the use of programming constructs such as conditional statements, loops, and OOP concepts such as abstraction. In addition, recorder-based tools build an over-reliance on push-and-click testers who don’t understand what the test is trying to achieve, how the code works, or how best to change the tests when the underlying code changes. Identifying a non-recorder tool is therefore a must.
  3. Open source support
    Most of our tool stack is open source and does not require extensive licensing/procurement hurdles that would delay the process of actually building automated tests. So identifying a tool with open source support is a no-brainer and is consistent with all the other tools we use on this project.

Automated Test Tool Types

In general, automated test tools fall into three categories based on how they interact with the application under test (AUT); in other words, how the test tool locates components and drives the application.

  1. Locator Based Tools
    These are test tools that interact with the application through provided hooks. For example, Selenium uses locators derived from the DOM that the browser generates (id, XPath, CSS selectors, etc.).
  2. Instrumented Tools
    With instrumented tools, the AUT is modified slightly so that it can interact with the automation: it is recompiled with test code. An example of this is Robotium, which is used in mobile testing.
  3. Image Recognition Based / Visually Aware Tools
    These tools use image comparisons to locate and interact with the AUT. While some earlier tools relied on screen coordinates to find their elements, SikuliX takes a more intelligent approach by searching for images. The benefit is that your scripts are more durable, since the tool will try to find the image even under imperfect conditions. As the term implies, being visually aware suggests that the software has the ability to “see” and recognize objects in a manner similar to how a person would. This means that, at least in theory, the tool is able to make decisions based on the existence and state of GUI objects, making it more robust.

With that in mind, let’s take a look at SikuliX itself.

What is SikuliX?

SikuliX is an automation tool that uses image recognition to identify and manipulate GUI objects. It gives you the ability to programmatically control any desktop application as long as you can see the GUI elements, which means you can automate just about any task you perform on your machine. SikuliX is written in Java, but it supports a number of different programming and scripting languages (Python, JavaScript, Ruby, Java, Jython, Scala, Clojure).

It uses a combination of images and programming languages to act on GUI elements on the screen, first comparing and finding them, then performing a set of instructions written in a supported language. Specifically, SikuliX is powered by OpenCV, which is an open source computer vision library.

For more information about OpenCV, please click here

There are a number of different ways to use SikuliX, including a built-in IDE which comes with some pre-built actions already written, so very little programming experience and effort are required to get started.

However, it can also be incorporated into a programming project in any Java-aware language: you simply add the jar to your project in your IDE and access the SikuliX API from your own code. SikuliX can also be run from the command line and is compatible with Windows, Mac, and Linux.

How do I automate with SikuliX?

The basic principle behind SikuliX is that it allows you to take a screen capture of a GUI element (for instance, an icon or button) and then record an action using a programming language. SikuliX ties these two together, and upon playback it first uses an image recognition algorithm to search the specified desktop area for that GUI element, then performs the action.
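
To make the capture-plus-action pairing concrete, here is a minimal sketch of what such a script might look like. It assumes that `region` is a SikuliX Screen or Region (or any object exposing `wait`, `click` and `type`), and the image names and the "Save" workflow are hypothetical placeholders for screenshots you would capture yourself.

```python
def save_document(region, filename):
    """Drive a hypothetical 'Save' dialog through a SikuliX-style region.

    Assumption: `region` behaves like a SikuliX Screen/Region. The .png
    names are placeholders, not files that ship with SikuliX.
    """
    region.wait("save_button.png", 10)      # wait up to 10 s for the button image
    region.click("save_button.png")         # click the center of the matched image
    region.wait("filename_field.png", 10)   # wait for the dialog's text field
    region.click("filename_field.png")      # give the field keyboard focus
    region.type(filename + "\n")            # type the name; the newline is meant as Enter
    return filename
```

Inside the SikuliX IDE this could be called as something like `save_document(Screen(), "report.txt")`. Passing the region in as a parameter is just a design choice that keeps the workflow easy to exercise with a stand-in object.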

What this means, and how SikuliX differs from other record-and-playback tools, is that it will make an intelligent effort to find and match the captured image in order to perform the recorded action, even if there are slight variations in the image or its surroundings.
A few examples of things that could change are:

  • Image position on the screen
  • Image size
  • Monitor resolution and size
  • Multiple displays

Within reason, of course: the height and width of the image in pixels must still match.

As a result, your tests are less likely to break every time there are slight changes to your execution environment, which translates to lower maintenance costs. It also opens up a lot of possibilities in terms of where we run our tests: we now have the option of incorporating a CI pipeline, since we can potentially run our tests on a remote machine other than the one where they were developed.

SikuliX Script Architecture

Below is a visual representation of a Sikuli script, along with its underlying processes. A Sikuli script (.sikuli) is a directory that contains a Python source file (.py) representing the automation workflow or the test cases and all the image files (.png) used by the source file. The script combines both of these files and the included Jython code will perform all the imports and calls to the required libraries.  
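
As a rough illustration of that layout, the sketch below lists the source and image files found in a script directory. The function name `describe_bundle` is hypothetical and the check is deliberately simple: it only looks at file extensions directly inside the directory and does not reflect how SikuliX itself loads a script.

```python
import os


def describe_bundle(bundle_dir):
    """Return (scripts, images) found directly inside a .sikuli directory."""
    if not bundle_dir.rstrip("/\\").endswith(".sikuli"):
        raise ValueError("expected a directory ending in .sikuli")
    names = sorted(os.listdir(bundle_dir))
    scripts = [n for n in names if n.endswith(".py")]           # the workflow source
    images = [n for n in names if n.lower().endswith(".png")]   # captured targets
    return scripts, images
```

A check like this can be handy in a build step, for example to flag a script directory that has no images or more than one source file.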

Here’s a video demonstration of how to capture a simple SikuliX script.

Sounds great! But how does it REALLY work?

We’ve talked a lot about image recognition so far, but you might be curious as to how SikuliX actually finds an image when you execute a script. It performs the search in 3 steps:

  1. Take an image of the search area: SikuliX takes a screenshot of the desktop (or a specified region) using Java’s Robot class (java.awt.Robot) and stores it in memory.
  2. Compare images: this is done by comparing the pixels from our target image with the pixels in our search area. Specifically, it uses a technique called “sliding” to move the target image across the search area, first left to right, then top to bottom. At each position, identified by where the target’s top left corner sits, it compares the target’s pixels with the pixels of the search area it overlaps and generates a similarity score, which is a quantitative measure of how similar the two are. The score ranges from 0 to 1, with 1 being an exact match and 0 being no similarity at all, and the scores are stored in a matrix.
  3. Analyze results: SikuliX then analyzes the numbers in this matrix to find the position with the highest score, as this represents the area with the highest probability of the image being present. By default, SikuliX considers a score of 0.7 or higher to be a match.

In other words, we’re comparing the target (input) image with every same-sized patch of the base image (desktop), each identified by its top left corner, and generating a score for each comparison. Once the entire area has been compared, we analyze the results and determine whether there is a match.
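
The sliding comparison can be sketched in a few lines of plain Python. This is a simplified illustration, not SikuliX’s or OpenCV’s actual implementation: images are assumed to be lists of rows of grayscale values (0 to 255), and the score used here (one minus the mean absolute pixel difference) is a crude stand-in for the matching methods OpenCV provides. The names `similarity` and `best_match` are hypothetical.

```python
def similarity(search, target, top, left):
    """Score in [0, 1]; 1.0 means every overlapping pixel is identical."""
    total = 0
    for r, row in enumerate(target):
        for c, value in enumerate(row):
            total += abs(search[top + r][left + c] - value)
    pixels = len(target) * len(target[0])
    return 1.0 - total / (255.0 * pixels)


def best_match(search, target, threshold=0.7):
    """Slide target over search; return (score, top, left) or None."""
    rows = len(search) - len(target) + 1
    cols = len(search[0]) - len(target[0]) + 1
    if rows <= 0 or cols <= 0:
        return None  # target is larger than the search area
    # Score matrix: one entry per position of the target's top left corner.
    scores = [[similarity(search, target, r, c) for c in range(cols)]
              for r in range(rows)]
    # Pick the highest-scoring position, then apply the match threshold.
    score, top, left = max((scores[r][c], r, c)
                           for r in range(rows) for c in range(cols))
    return (score, top, left) if score >= threshold else None
```

For a 4x4 search area containing a bright 2x2 square whose top left corner is at row 1, column 2, searching for that square returns a score of 1.0 at that position. Note that this toy score would rate a dark target highly against any dark background, which is one reason real matchers use more robust measures.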

If the threshold score (0.7) is not reached, SikuliX discards the screenshot, takes another one, and restarts the search until either a match is found or it times out (the default implicit timeout is 3 seconds).
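
That retry behavior amounts to a polling loop with a deadline. The sketch below shows the general shape under stated assumptions: `capture` and `find` are hypothetical callables supplied by the caller (one returns a fresh screenshot, the other returns a match or None), the polling interval is made up, and returning None on timeout is a simplification rather than a model of how SikuliX reports a failed search.

```python
import time


def wait_for_match(capture, find, timeout=3.0, interval=0.1,
                   clock=time.time, sleep=time.sleep):
    """Re-capture and re-search until find() succeeds or the timeout elapses.

    capture: hypothetical callable returning a fresh screenshot
    find:    hypothetical callable returning a match, or None if not found
    """
    deadline = clock() + timeout
    while True:
        screenshot = capture()      # the previous screenshot is simply dropped
        match = find(screenshot)
        if match is not None:
            return match
        if clock() >= deadline:
            return None             # timed out without a match
        sleep(interval)             # brief pause before the next attempt
```

The `clock` and `sleep` parameters exist only so the loop can be exercised without real waiting; in normal use the defaults are fine.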

For more information about the algorithms used in the searches, click here.

What’s it like to use SikuliX?

You might be wondering at this point what it’s like to actually use SikuliX. Our experience so far is that it has been a reliable tool that helps us solve an important problem and continues to evolve in its utility. One caveat to using an image-based tool is the issue of managing all the image files because, as you might expect, testing an application with SikuliX requires a LOT of images. So I’d recommend establishing a solid process/protocol for creating, naming and updating your image files before starting your automation project.
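
One way to enforce such a protocol is a small check that runs before tests do. The convention below (`<screen>__<element>.png`, with an optional `__<state>` suffix, all lowercase) is purely a hypothetical example, not something SikuliX requires; the point is only that whatever convention you choose can be verified automatically.

```python
import re

# Hypothetical convention: <screen>__<element>[__<state>].png, lowercase,
# with single underscores allowed inside each part.
WORD = r"[a-z0-9]+(?:_[a-z0-9]+)*"
IMAGE_NAME = re.compile(r"^{w}__{w}(?:__{w})?\.png$".format(w=WORD))


def badly_named(filenames):
    """Return the filenames that do not follow the convention above."""
    return [name for name in filenames if not IMAGE_NAME.match(name)]
```

Under this convention, names like `login__submit_button.png` and `login__submit_button__disabled.png` pass, while `Screenshot 1.png` or a bare `ok.png` would be reported.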

Setup and first test

Now that you know how it works, you’ll probably want to take it for a spin! SikuliX is extremely easy to get started with: all you need to do is download the executable and double-click it to begin the installation process. For more installation instructions, please click here.

Once your setup is complete, you’re all ready to start automating your desktop applications!

Cover image by Anaroza, CC BY 2.0
