The DOM, Your Ally for Testing Web Applications

Article initially published in MacroTesting magazine (November 2009).
Since the invention of the World Wide Web by Tim Berners-Lee in 1989, the browser has evolved quite dramatically. It is now capable of running Web applications using a technology that was not initially meant for that purpose: HTML was created to enable researchers to collaborate and share information.
Web applications have become complex and use a mixture of HTML/XML, JavaScript and CSS. Web developers need to spend a fair amount of time learning these technologies and the libraries and techniques built on top of them (jQuery, Ajax, REST, to name a few) that enable them to develop even more powerful applications.
To the automation tester, Web applications may look like normal applications with buttons, checkboxes, edit fields and all the basic GUI components. However, applications that aim for richer interaction with the end user bring in more complex GUI components such as trees, grids, menus and windows. This rich set of features, often referred to as Web 2.0, is not “standardized”: everyone comes up with their own way of implementing those features, while still relying on the basic building blocks that are HTML, JavaScript and CSS.
These complex components are usually the ones that worry the tester, as the automation tool is often unable to handle them correctly. The tool will usually fail to understand the essence of the component and will instead record low-level interactions, such as a click here and a check there. This is not surprising, since these components are assembled from the basic building blocks.
So, how do we overcome this challenge? Before answering this question, let’s have a look at the architecture of a Web browser. Figure 1 shows the components of a typical browser architecture.

The elements are:
- HTTP Downloader: it is in charge of requesting the various documents (HTML pages, images, CSS, JavaScript and any other documents).
- HTML Parser: it processes the received HTML documents and generates the DOM, a tree representation of the HTML document.
- JavaScript interpreter: it interprets the JavaScript code either downloaded separately (.js files) or inline (<script> tag).
- DOM: a tree representation of the displayed page.
- Rendering Engine: the engine in charge of reading the DOM tree and generating a graphic representation of it.
Typical Flow
A typical flow consists of the following steps:
- A user types a URL, which triggers the download of the main HTML page and all the related resources.
- The HTML parser processes the HTML code and generates the static part of the DOM tree.
- The downloaded JavaScript documents and the scripts enclosed in script tags are parsed and executed. Their code can modify the DOM tree.
- The DOM represents the Web page to be displayed and each node is a piece of graphical information with its own data (type, location, style).
- The rendering engine processes the DOM and generates the graphical representation that the user sees in the browser.
- The user interactions (mouse and keyboard) generate events that are sent to the JavaScript interpreter, whose handlers change the DOM (the dynamic behavior of the application). The changes to the DOM are in turn reflected on the screen, and the cycle repeats.
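The flow above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how a real browser is implemented: the Node class, the DomBuilder parser, the page_script function (standing in for the page's JavaScript) and the text-based render function are all hypothetical names invented for this sketch, and the sample markup avoids tags without a closing tag to keep the parser simple.

```python
from html.parser import HTMLParser


class Node:
    """A very small stand-in for a DOM node (hypothetical, not a browser API)."""

    def __init__(self, tag, attrs=None, text=""):
        self.tag = tag
        self.attrs = dict(attrs or {})
        self.text = text
        self.children = []
        self.listeners = {}  # event name -> handler function

    def find(self, node_id):
        """Depth-first search for the node whose id attribute matches."""
        if self.attrs.get("id") == node_id:
            return self
        for child in self.children:
            found = child.find(node_id)
            if found is not None:
                return found
        return None


class DomBuilder(HTMLParser):
    """HTML parser step: turn HTML text into the static part of the tree."""

    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.stack = [self.root]  # currently open elements

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs)
        self.stack[-1].children.append(node)
        self.stack.append(node)

    def handle_endtag(self, tag):
        if len(self.stack) > 1:
            self.stack.pop()

    def handle_data(self, data):
        if data.strip():
            self.stack[-1].text += data.strip()


def page_script(dom):
    """Script step: register a click handler that changes the DOM."""
    status = dom.find("status")

    def on_click():
        status.text = "expanded"

    dom.find("more").listeners["click"] = on_click


def render(node, depth=0):
    """Rendering step: here the 'screen' is just a list of indented lines."""
    lines = []
    if node.tag != "#document":
        lines.append("  " * depth + f"<{node.tag}> {node.text}".rstrip())
        depth += 1
    for child in node.children:
        lines.extend(render(child, depth))
    return lines


html = '<div><p id="status">collapsed</p><button id="more">More</button></div>'
builder = DomBuilder()
builder.feed(html)                      # parse: static DOM
dom = builder.root
page_script(dom)                        # script attaches dynamic behavior
before = render(dom)                    # first rendering
dom.find("more").listeners["click"]()   # user event changes the DOM
after = render(dom)                     # the screen follows the DOM
print("\n".join(before))
print("\n".join(after))
```

The point of the sketch is that the rendering function never sees the HTML text or the script: it only reads the tree, which is why a change made by the click handler shows up in the second rendering.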
As you can see, the DOM plays a central role in the browser architecture. It is the cornerstone of a browser. The differences you see between various browsers (Chrome, Firefox, Internet Explorer, Opera, Safari, et al.) are partly due to the way this DOM representation is generated.
Any testing tool that seriously supports Web applications must provide a means of accessing the DOM, which gives the tester more freedom. This functionality can be used to:
- Access some DOM objects that carry information not visible on the screen.
- Allow more complex interactions using the structure of the data in the DOM.
- Interact with the DOM to perform actions that the testing tool itself does not offer.
- Carry out any manipulation that JavaScript can do.
For instance, Borland SilkTest and HP QuickTest Professional, the two major functional testing tools available on the market, allow this type of interaction. QuickTest Professional has an Object property that gives access to the corresponding DOM object, whereas SilkTest has an ExecMethod() method that provides this access. See both examples below:
Borland SilkTest
Print(Browser.ExecMethod("all.tags(""INPUT"")(0).outerHTML"))
HP QuickTest Professional
Print Browser("Browser").Page("Page").Object.all.tags("INPUT")(0).outerHTML
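For readers without access to either tool, the same kind of DOM query can be sketched with Python's standard xml.dom.minidom module. This is an illustration only: the PAGE fragment is a hypothetical page written as well-formed XHTML so that minidom can parse it, and minidom's toxml() is used as a rough counterpart of outerHTML, not an exact equivalent.

```python
from xml.dom import minidom

# Hypothetical page fragment (assumption: well-formed XHTML, not real-world HTML).
PAGE = """<html><body>
<form id="login">
  <input type="hidden" name="session" value="abc123"/>
  <input type="text" name="user" value=""/>
  <input type="checkbox" name="remember" checked="checked"/>
</form>
</body></html>"""

doc = minidom.parseString(PAGE)
inputs = doc.getElementsByTagName("input")

# Rough counterpart of all.tags("INPUT")(0).outerHTML:
# serialise the first input element back to markup.
first_input_markup = inputs[0].toxml()

# Information that is carried by the DOM but never drawn on the screen.
hidden = {
    node.getAttribute("name"): node.getAttribute("value")
    for node in inputs
    if node.getAttribute("type") == "hidden"
}

# Use the structure of the tree: which form owns the "user" field?
user_field = next(n for n in inputs if n.getAttribute("name") == "user")
owning_form = user_field.parentNode.getAttribute("id")

print(first_input_markup)
print(hidden)
print(owning_form)
```

The three queries correspond to the first uses listed earlier: reading markup for an object, reading data that is not visible, and navigating by structure instead of by screen position.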
To go one step further, if your testing tool supports extensibility for the Web, you can amend the behaviour of the tool so that it automatically recognises your complex GUI objects. This usually requires a strong knowledge of Web technologies (JavaScript, DOM and HTML) and of the language used to create these extensions.
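What such an extension has to do can be sketched independently of any tool. The example below assumes a hypothetical tree widget built from nested ul, li and span elements, and maps each item path to its DOM node so that a test could address "Fruits/Apple" instead of an anonymous click. The markup and the helper names (child_elements, label_of, tree_paths) are inventions for this sketch; a real extension would use the tool's own extension language and the widget's actual markup.

```python
from xml.dom import minidom

# Hypothetical markup for a tree widget assembled from plain lists.
WIDGET = """<ul class="tree">
  <li><span>Fruits</span>
    <ul>
      <li><span>Apple</span></li>
      <li><span>Pear</span></li>
    </ul>
  </li>
  <li><span>Vegetables</span></li>
</ul>"""


def child_elements(node, tag):
    """Direct child elements of `node` that have the given tag name."""
    return [
        child for child in node.childNodes
        if child.nodeType == child.ELEMENT_NODE and child.tagName == tag
    ]


def label_of(li):
    """Text of the <span> that labels a tree item."""
    span = child_elements(li, "span")[0]
    return span.firstChild.data.strip()


def tree_paths(ul, prefix=""):
    """Map each item path, such as 'Fruits/Apple', to its <li> DOM node."""
    paths = {}
    for li in child_elements(ul, "li"):
        path = prefix + label_of(li)
        paths[path] = li
        # Recurse into nested lists, which hold the item's children.
        for sub_list in child_elements(li, "ul"):
            paths.update(tree_paths(sub_list, path + "/"))
    return paths


root = minidom.parseString(WIDGET).documentElement
paths = tree_paths(root)
print(sorted(paths))
```

Once the low-level nodes are grouped into a named component like this, a recorded script can refer to the item by its path, which is the kind of recognition the extension is meant to provide.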
Thanks to the DOM, you no longer have any limitations on your testing capability!