Sneak Peek into HTML Parsing

3 min read · Mar 7, 2023

Let’s discuss how a browser parses an HTML document.

Before we start discussing HTML parsing, we should know what parsing means in general. Parsing usually means breaking the document into tokens (lexical analysis), then checking that the sequence of tokens follows the grammar of the language (syntax analysis), and building a parse tree or syntax tree that later stages can use. Syntax errors are reported during syntax analysis.
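To make the two phases concrete, here is a minimal sketch in Python. It uses a tiny arithmetic language rather than HTML, and the names tokenize and parse are only illustrative. The tokenizer does the lexical analysis, and the recursive-descent parser does the syntax analysis and returns the parse tree as nested tuples.

```python
import re


def tokenize(text):
    """Lexical analysis: split the source into number and symbol tokens."""
    return re.findall(r"\d+|\S", text)


def parse(tokens):
    """Syntax analysis: check the token order and build a parse tree."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        token = peek()
        pos += 1
        return token

    def factor():
        # factor := NUMBER | "(" expr ")"
        token = advance()
        if token is not None and token.isdigit():
            return int(token)
        if token == "(":
            node = expr()
            if advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError(f"unexpected token: {token!r}")

    def term():
        # term := factor (("*" | "/") factor)*
        node = factor()
        while peek() in ("*", "/"):
            op = advance()
            node = (op, node, factor())
        return node

    def expr():
        # expr := term (("+" | "-") term)*
        node = term()
        while peek() in ("+", "-"):
            op = advance()
            node = (op, node, term())
        return node

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token: {peek()!r}")
    return tree


print(tokenize("2 + 3 * 4"))         # ['2', '+', '3', '*', '4']
print(parse(tokenize("2 + 3 * 4")))  # ('+', 2, ('*', 3, 4))
```

Feeding it an input such as "2 + * 3" tokenizes without complaint, but the parser raises a SyntaxError, which is the behavior described above: the error shows up during syntax analysis, not during tokenization.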

Now let’s discuss HTML document parsing in the browser. A browser usually has seven main components that it uses for different purposes: the user interface, browser engine, rendering engine, networking, JS interpreter, UI backend, and data storage.

The rendering engine is the component that parses the HTML document and then creates a render tree.

The render tree contains rectangles with visual attributes like color and dimensions. The rectangles are displayed on the screen in the right order.

After the render tree is created, it goes through a layout process, which gives each node the exact coordinates where it should appear on the screen. The next stage is painting: the render tree is traversed and each node is painted using the UI backend layer.
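The layout and painting stages can be illustrated with a toy model in Python. This is only a sketch under heavy assumptions: the Box class, its fields, and the rule that children are stacked vertically at full width are all hypothetical simplifications, not how any real rendering engine works.

```python
from dataclasses import dataclass, field


@dataclass
class Box:
    """A toy render-tree node: a rectangle with visual attributes."""
    name: str
    color: str
    content_height: int = 0  # height of the node's own content (hypothetical)
    children: list = field(default_factory=list)
    # Filled in by layout():
    x: int = 0
    y: int = 0
    width: int = 0
    height: int = 0


def layout(box, x, y, width):
    """Give each node exact coordinates; children are stacked vertically."""
    box.x, box.y, box.width = x, y, width
    cursor = y + box.content_height
    for child in box.children:
        cursor = layout(child, x, cursor, width)
    box.height = cursor - y
    return cursor  # y coordinate where the next sibling starts


def paint(box, commands=None):
    """Traverse the tree and record one draw command per node."""
    if commands is None:
        commands = []
    commands.append(
        f"{box.name}: {box.color} rect at ({box.x}, {box.y}) "
        f"size {box.width}x{box.height}"
    )
    for child in box.children:
        paint(child, commands)
    return commands


heading = Box("h1", "black", content_height=40)
paragraph = Box("p", "gray", content_height=20)
body = Box("body", "white", children=[heading, paragraph])

layout(body, x=0, y=0, width=800)
for command in paint(body):
    print(command)
```

In this sketch the parent is painted before its children, so the children end up drawn on top of it. A real engine would send these commands to the UI backend instead of printing them.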

Fig: Rendering engine basic flow

Different browsers use different rendering engines: Firefox uses Gecko, and Safari uses WebKit. Chrome and Opera (from version 15) use Blink, a fork of WebKit.

HTML Parsing

The parsing of an HTML document is not as straightforward as it seems, for the following reasons:

In HTML, we never get a syntax error, because the language is forgiving by nature.
Browsers have error tolerance to support well-known cases of invalid HTML.
The parsing process is reentrant. In other languages, the source doesn’t change during parsing, but in HTML, dynamic code (such as script elements manipulating HTML code) can add extra tokens, so the parsing process actually modifies the input.
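The reentrancy point can be simulated with a short Python sketch. Everything here is an assumption made for illustration: parse_with_scripts and run_script are hypothetical names, the tokenizer is a crude regular expression, and the "script" is just a callback that returns markup. The idea it shows is that running a script can splice new markup into the input that has not been read yet.

```python
import re

# A tag, a run of text, or a stray "<" treated as text.
TOKEN_RE = re.compile(r"<[^>]+>|[^<]+|<")


def parse_with_scripts(html, run_script):
    """Tokenize html; when a script element ends, call run_script on its
    source and splice the returned markup into the unread input."""
    tokens = []
    script_source = ""
    in_script = False
    pos = 0
    while pos < len(html):
        match = TOKEN_RE.match(html, pos)
        token = match.group(0)
        pos = match.end()
        if token == "<script>":
            script_source = ""
            in_script = True
        elif token == "</script>":
            in_script = False
            written = run_script(script_source)
            # The input itself changes while it is being parsed.
            html = html[:pos] + written + html[pos:]
        elif in_script:
            script_source += token
        else:
            tokens.append(token)
    return tokens


def fake_script(source):
    # Hypothetical stand-in for a script that writes markup into the page.
    return "<b>added by script</b>"


page = "<p>before</p><script>write-item</script><p>after</p>"
print(parse_with_scripts(page, fake_script))
```

The returned token list contains the b element between the two paragraphs, even though it never appeared in the original string. That is the sense in which the parsing process modifies its own input.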

In HTML parsing, we first break the document into tokens to identify the start tags, the content of each element, and the end tags. After which, using the tokens we…
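The tokenization step can be tried out with Python's standard-library html.parser module. This is a sketch of the idea and not what a browser does internally: HTMLParser only reports tokens and does not build or repair a tree. The TokenCollector class and tokenize_html function are names invented for this example.

```python
from html.parser import HTMLParser


class TokenCollector(HTMLParser):
    """Record start tags, text content, and end tags as they are found."""

    def __init__(self):
        super().__init__()
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        self.tokens.append(("start", tag))

    def handle_endtag(self, tag):
        self.tokens.append(("end", tag))

    def handle_data(self, data):
        if data.strip():  # skip whitespace-only text
            self.tokens.append(("text", data))


def tokenize_html(markup):
    collector = TokenCollector()
    collector.feed(markup)
    collector.close()
    return collector.tokens


# The <b> element is never closed, yet no error is raised.
for token in tokenize_html("<p>Hello <b>world</p>"):
    print(token)
```

Note that the invalid input still yields a clean stream of tokens, which matches the forgiving behavior described earlier. Deciding what to do about the unclosed b element is left to whatever consumes the tokens.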

Written by Himanshu Singh