A Sneak Peek into HTML Parsing

Himanshu Singh
Mar 7, 2023

Let’s discuss how a browser parses an HTML document.

Before we discuss HTML parsing, we should know what parsing means in general. Parsing usually means breaking the document into tokens (lexical analysis), checking that the sequence of tokens follows the grammar of the language (syntax analysis), and building a parse tree or syntax tree that can be used for further processing. Syntax errors are reported during syntax analysis.
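To make the three steps concrete, here is a minimal sketch for a toy language of whole numbers joined by + and -. The grammar and the names tokenize and parse are made up for illustration and have nothing to do with how a browser is implemented.

```python
import re

def tokenize(text):
    """Lexical analysis: split the input into (kind, value) tokens."""
    tokens = []
    for lexeme in re.findall(r"[0-9]+|\S", text):
        if "0" <= lexeme[0] <= "9":
            tokens.append(("NUM", int(lexeme)))
        elif lexeme in "+-":
            tokens.append(("OP", lexeme))
        else:
            raise SyntaxError(f"unexpected character {lexeme!r}")
    return tokens

def parse(tokens):
    """Syntax analysis for the toy grammar: expr := NUM (OP NUM)*.

    Returns the parse tree as nested (operator, left, right) tuples.
    """
    if not tokens or tokens[0][0] != "NUM":
        raise SyntaxError("expression must start with a number")
    tree = tokens[0][1]
    pos = 1
    while pos < len(tokens):
        if tokens[pos][0] != "OP":
            raise SyntaxError("expected an operator")
        if pos + 1 >= len(tokens) or tokens[pos + 1][0] != "NUM":
            raise SyntaxError("operator must be followed by a number")
        tree = (tokens[pos][1], tree, tokens[pos + 1][1])
        pos += 2
    return tree

tree = parse(tokenize("1 + 2 - 3"))
print(tree)
```

An input such as "1 + + 2" passes lexical analysis, because every token is valid on its own, but fails in parse with a SyntaxError because the token sequence breaks the grammar.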

Now let’s discuss HTML document parsing in the browser. A browser usually has seven main components that it uses for different purposes: the user interface, the browser engine, the rendering engine, networking, the JS interpreter, the UI backend, and data storage.

The rendering engine is the component that parses the HTML document and then creates a render tree.

The render tree contains rectangles with visual attributes like color and dimensions. The rectangles are in the right order to be displayed on the screen.

After the render tree is created, it goes through a layout process, which gives each node the exact coordinates where it should appear on the screen. The next stage is painting: the render tree is traversed and each node is painted using the UI backend layer.

Fig: Rendering engine basic flow

Different browsers use different rendering engines: Firefox uses Gecko, and Safari uses WebKit. Chrome and Opera (from version 15) use Blink, a fork of WebKit.

HTML Parsing

Parsing an HTML document is not as straightforward as it seems, for three reasons:

  1. HTML never produces a syntax error, because the language is forgiving by design.
  2. Browsers have error tolerance to support well-known cases of invalid HTML.
  3. The parsing process is reentrant. In other languages the source doesn’t change during parsing, but in HTML, dynamic code (such as script elements containing document.write() calls) can add extra tokens, so the parsing process actually modifies the input.
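The third point can be illustrated with a small simulation. This is only a sketch: tokens are plain strings, and the scripts dictionary is a hypothetical stand-in for script elements that write extra markup into the input while the parser is running.

```python
from collections import deque

def run_parser(tokens, scripts):
    """Consume tokens in order; a script token may push new tokens into the input.

    `scripts` maps a script token to the list of tokens it writes (hypothetical).
    """
    pending = deque(tokens)
    consumed = []
    while pending:
        token = pending.popleft()
        consumed.append(token)
        if token in scripts:
            # The written tokens are inserted at the current position,
            # ahead of everything the parser has not read yet.
            pending.extendleft(reversed(scripts[token]))
    return consumed

source = ["<p>", "<script:greet>", "</p>"]
scripts = {"<script:greet>": ["Hello", "<b>", "world", "</b>"]}
print(run_parser(source, scripts))
```

The parser ends up consuming tokens that were never in the original source, which is why the input cannot be treated as fixed.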

In HTML parsing, we first break the document into tokens to identify the start tags, the content of each element, and the end tags. Then we use the tokens to build a parse tree (the DOM tree).
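Here is a rough sketch of those two steps using HTMLParser from Python’s standard library as the tokenizer. The Node and TreeBuilder classes are made up for this example, and the builder assumes well-formed input with no void elements such as br; a real browser parser handles far more than this.

```python
from html.parser import HTMLParser

class Node:
    def __init__(self, tag, parent=None, text=""):
        self.tag = tag
        self.parent = parent
        self.text = text
        self.children = []

class TreeBuilder(HTMLParser):
    """Records each token and builds a simple tree from well-formed HTML."""

    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.current = self.root
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        self.tokens.append(("start", tag))
        node = Node(tag, self.current)
        self.current.children.append(node)
        self.current = node  # following tokens become children of this node

    def handle_endtag(self, tag):
        self.tokens.append(("end", tag))
        # Assumes the end tag matches the current node (well-formed input).
        if self.current.parent is not None:
            self.current = self.current.parent

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.tokens.append(("text", text))
            self.current.children.append(Node("#text", self.current, text))

def outline(node, depth=0):
    """Return the tree as a list of indented lines."""
    label = node.tag + (f" {node.text!r}" if node.text else "")
    lines = ["  " * depth + label]
    for child in node.children:
        lines.extend(outline(child, depth + 1))
    return lines

builder = TreeBuilder()
builder.feed("<html><body><p>Hello</p></body></html>")
builder.close()
print(builder.tokens)
print("\n".join(outline(builder.root)))
```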

While building the DOM tree, the parser fixes invalid content. Typical cases are:

  1. The element being added is explicitly forbidden inside some outer tag. In this case, the parser closes all tags up to the one that forbids the element, and adds it afterward.
  2. A block element is being added inside an inline element. The parser closes all inline elements up to the next higher block element.
  3. If this doesn’t help, the parser closes elements until it is allowed to add the element, or it ignores the tag.
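The first two cases can be sketched as operations on a stack of currently open elements. The INLINE, BLOCK, and FORBIDS tables below are tiny, hypothetical rule sets invented for this example; real browsers follow far more detailed rules, so treat this as an illustration of the idea rather than actual parser behavior.

```python
# Hypothetical, heavily simplified rule tables (assumptions for this sketch).
INLINE = {"span", "b", "i", "em", "a"}
BLOCK = {"div", "p", "ul", "li"}
FORBIDS = {"p": {"p", "div", "ul"}, "li": {"li"}}  # outer tag -> tags it may not contain

def open_element(stack, tag):
    """Push `tag` onto the stack of open elements, closing others first if needed.

    Returns the list of tags that were implicitly closed, innermost first.
    """
    closed = []
    # Case 1: some open element forbids `tag`. Close everything up to and
    # including the nearest element that forbids it.
    for index in range(len(stack) - 1, -1, -1):
        if tag in FORBIDS.get(stack[index], ()):
            while len(stack) > index:
                closed.append(stack.pop())
            break
    # Case 2: a block element inside inline elements. Close the inline
    # elements until a block element (or the top of the document) is reached.
    if tag in BLOCK:
        while stack and stack[-1] in INLINE:
            closed.append(stack.pop())
    stack.append(tag)
    return closed

stack = ["div", "p"]
print(open_element(stack, "p"), stack)    # a second <p> while one is still open

stack = ["div", "span", "b"]
print(open_element(stack, "div"), stack)  # a <div> inside inline elements
```

In the first call the open p is closed before the new one is added, and in the second call b and span are closed so that the new div becomes a child of the outer div.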

When the parsing is finished

The browser marks the document as interactive and starts executing scripts that are in “deferred” mode, that is, scripts that should run after the document is parsed. The document state is then set to “complete” and a “load” event is fired.

What happens with the render tree

As mentioned, the render tree contains visual attributes, so it holds not only the DOM elements but also each element’s style information.

Just as HTML has the DOM tree, stylesheets have the CSSOM tree. Unlike HTML, however, CSS has a formally defined grammar and vocabulary, so a stylesheet can be parsed with a conventional parser.

The render tree does not have a one-to-one relation with the DOM tree, because non-visual DOM elements are not inserted into it. An example is the “head” element. Elements whose display value is “none” do not appear in the tree either, whereas elements with “hidden” visibility do.
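The filtering described here can be sketched as a recursive walk over the DOM. DomNode, NON_VISUAL, and build_render_tree are hypothetical names for this example, and the style dictionary stands in for the computed style a real engine would get by combining the DOM with the CSSOM.

```python
from dataclasses import dataclass, field

@dataclass
class DomNode:
    tag: str
    style: dict = field(default_factory=dict)
    children: list = field(default_factory=list)

# Assumed set of non-visual elements for this sketch.
NON_VISUAL = {"head", "script", "meta", "title"}

def build_render_tree(node):
    """Return (tag, children) for nodes that are rendered, or None otherwise."""
    if node.tag in NON_VISUAL or node.style.get("display") == "none":
        return None  # the node and its whole subtree are left out
    rendered = [build_render_tree(child) for child in node.children]
    return (node.tag, [r for r in rendered if r is not None])

dom = DomNode("html", children=[
    DomNode("head", children=[DomNode("title")]),
    DomNode("body", children=[
        DomNode("p"),
        DomNode("div", {"display": "none"}, [DomNode("span")]),
        DomNode("em", {"visibility": "hidden"}),
    ]),
])
print(build_render_tree(dom))
```

The head element and the div with display set to “none” are dropped together with their children, while the em with “hidden” visibility stays in the tree because it still takes up space in the layout.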

Thanks for reading. Connect with me on Twitter.
