JavaScript - Traversing the HTML DOM recursively

A JavaScript tutorial covering basic HTML DOM methods and how to use them to traverse the HTML element tree.

Categories: programming, tutorials
Posted by Swapnil Sarwe on Nov 25, 2011

This post gives an overview of HTML DOM objects and how their methods and properties can be used to navigate, or traverse, the document.

Introduction:

How do we traverse the complete HTML BODY, visiting each HTML element recursively and forming a tree? You need to be aware of a few very basic JavaScript DOM methods. W3Schools is a wonderful resource for learning these basics. Go through all the functions there, but what we need for this exercise is getElementById() and getElementsByTagName().

Example HTML:

<html>
  <head>
    <title>
      Javascript - HTML Tree Parser
    </title>
  </head>
  <body>
    <div id="header">
      <h1>Javascript - HTML Tree Parser</h1>
    </div>
    <ul id="nav">
      <li><a href="one.html">one</a></li>
      <li><a href="two.html">two</a></li>
    </ul>
    <div id="footer">
    </div>
  </body>
</html>

Step I:
We are going to write a JavaScript function; let's call it htmlTree.

<script type="text/javascript">
   function htmlTree(){}
</script>

Step II:
First we need to get the body element. We can do that with getElementsByTagName('body'). Because the method name is plural, it always returns an array-like collection of elements, even when only one element matches. Since body is present only once, we refer to element 0 of that collection.

<script type="text/javascript">
  function htmlTree(){     
    var body = document.getElementsByTagName('body')[0];   
  }
</script>
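
As a side note, browsers also expose the body element directly as document.body. The sketch below is one hedged way to wrap both lookups. The names getBody and fakeDocument are hypothetical and used only for illustration; fakeDocument is a plain-object stand-in so the snippet can also run outside a browser.

```javascript
// getBody is a hypothetical helper, not part of the DOM API.
// It accepts any document-like object so it can be tried outside a browser.
function getBody(doc) {
  // document.body is the standard shortcut for the <body> element.
  if (doc.body) {
    return doc.body;
  }
  // Fall back to the lookup used in this post. The result is array-like,
  // so we still index into it.
  var matches = doc.getElementsByTagName('body');
  return matches.length > 0 ? matches[0] : null;
}

// fakeDocument is a stand-in (an assumption for this sketch), not a real document.
var fakeDocument = {
  getElementsByTagName: function (name) {
    return name === 'body' ? [{ tagName: 'BODY' }] : [];
  }
};

var found = getBody(fakeDocument);
console.log(found.tagName); // prints "BODY"
```

Inside a page you would call getBody(document) instead of passing the stand-in.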

Step III:
Now we check whether the body tag has any children using the hasChildNodes() method. If it does, we refer to its first child through the "firstChild" property. We can get the name of a tag through the "tagName" property. After reading the first child, we move on to the next child through the "nextSibling" property.

<script type="text/javascript">
  function htmlTree(){     
    var body = document.getElementsByTagName('body')[0];     
    if (body.hasChildNodes()) {       
      var child = body.firstChild;       
      alert(child.tagName); // we expect "DIV" for the example HTML above (but see the whitespace caveat in Step V)
      var next = child.nextSibling;
      alert(next.tagName); // we expect "UL" for the example HTML above
    }   
  }
</script>

But this won't meet our purpose. We don't know the length of the tree, and we don't know how deep each branch goes, so we will make the function recursive and repeat these steps for every branch until it reaches the leaves.

Step IV:
We will print the name of the current tag and check whether it has children. If it does, we recursively call the function on the first child, which prints its name and checks for children in turn. This continues until it reaches a node with no child.

<script type="text/javascript">
  function htmlTree(obj){     
    var obj = obj || document.getElementsByTagName('body')[0];     
    alert(obj.tagName);     
    if (obj.hasChildNodes()) {       
      var child = obj.firstChild;       
      htmlTree(child);     
    }   
  }
</script>
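
To see what this version visits without clicking through alerts, the rough sketch below runs the same first-child-only recursion over plain objects and records each tagName in an array. The helpers el and text and the function traceFirstChildren are hypothetical stand-ins written for this example. They are not DOM APIs, and they copy only the few properties the function reads.

```javascript
// Hypothetical stand-in for an element node (nodeType 1).
function el(tagName, children) {
  var node = {
    nodeType: 1,
    tagName: tagName,
    firstChild: null,
    nextSibling: null,
    hasChildNodes: function () { return node.firstChild !== null; }
  };
  children = children || [];
  for (var i = 0; i < children.length; i++) {
    if (i === 0) {
      node.firstChild = children[i];
    } else {
      children[i - 1].nextSibling = children[i];
    }
  }
  return node;
}

// Hypothetical stand-in for a text node (nodeType 3). It has no tagName.
function text(value) {
  return {
    nodeType: 3,
    nodeValue: value,
    firstChild: null,
    nextSibling: null,
    hasChildNodes: function () { return false; }
  };
}

// Same shape as the Step IV function, but it records names instead of alerting.
function traceFirstChildren(node, visited) {
  visited = visited || [];
  visited.push(node.tagName);
  if (node.hasChildNodes()) {
    traceFirstChildren(node.firstChild, visited);
  }
  return visited;
}

// The whitespace after <body> becomes a text node ahead of the first <div>.
var body = el('BODY', [text('\n    '), el('DIV', [el('H1', [text('title')])])]);
var visited = traceFirstChildren(body);
console.log(visited); // [ 'BODY', undefined ]
```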

When you execute this, the first alert is BODY, but the very next alert is undefined where it should have been DIV. This is because the white space between the BODY tag and the DIV tag is treated as a text node, and text nodes have no tagName. We need to skip such nodes.

Step V:
Go through nodeType on W3Schools. Since we are looking only for HTML elements, we check for nodeType === 1 (an element node). If it is 1, we recursively call the function on that child. Either way, we then move on to the next sibling.

<script type="text/javascript">
  function htmlTree(obj){     
    var obj = obj || document.getElementsByTagName('body')[0];     
    alert(obj.tagName);     
    if (obj.hasChildNodes()) {       
      var child = obj.firstChild;       
      while(child){         
        if (child.nodeType === 1) {           
          htmlTree(child);         
        }
        child = child.nextSibling;
      }
    }
  }
</script>
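
The nodeType filter can be tried outside a browser with a small sketch like the one below. It assumes plain-object stand-ins for DOM nodes: el, text, and collectTagNames are hypothetical names introduced for this example. The tree mirrors the example HTML, including the whitespace text nodes between tags.

```javascript
// Hypothetical stand-in for an element node (nodeType 1).
function el(tagName, children) {
  var node = { nodeType: 1, tagName: tagName, firstChild: null, nextSibling: null };
  children = children || [];
  for (var i = 0; i < children.length; i++) {
    if (i === 0) {
      node.firstChild = children[i];
    } else {
      children[i - 1].nextSibling = children[i];
    }
  }
  return node;
}

// Hypothetical stand-in for a text node (nodeType 3).
function text(value) {
  return { nodeType: 3, nodeValue: value, firstChild: null, nextSibling: null };
}

// Depth-first walk that records element names and skips every other node type.
function collectTagNames(node, names) {
  names = names || [];
  names.push(node.tagName);
  var child = node.firstChild;
  while (child) {
    if (child.nodeType === 1) { // 1 is the value of Node.ELEMENT_NODE
      collectTagNames(child, names);
    }
    child = child.nextSibling;
  }
  return names;
}

var body = el('BODY', [
  text('\n'),
  el('DIV', [text('\n'), el('H1', [text('Javascript - HTML Tree Parser')]), text('\n')]),
  text('\n'),
  el('UL', [
    text('\n'),
    el('LI', [el('A', [text('one')])]),
    text('\n'),
    el('LI', [el('A', [text('two')])]),
    text('\n')
  ]),
  text('\n'),
  el('DIV', [text('\n')]),
  text('\n')
]);

var names = collectTagNames(body);
console.log(names.join(' ')); // prints "BODY DIV H1 UL LI A LI A DIV"
```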

Step VI:
These alerts are quite annoying, so let's make some subtle changes so that the function returns the complete tree as a string. I personally like nested "ul li ul li" lists to represent a tree. Here is the modified function:

<script type="text/javascript">
  function htmlTree(obj){
    var obj = obj || document.getElementsByTagName('body')[0];
    var str = "<ul><li>" + obj.tagName;
    if (obj.hasChildNodes()) {
      var child = obj.firstChild;
      while (child) {
        if (child.nodeType === 1) {
          str += htmlTree(child);
        }
        child = child.nextSibling;
      }
    }
    str += "</li></ul>";
    return str;
  }
  document.write(htmlTree());
</script>
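
The function above wraps every element in its own ul. One possible variation, sketched below, puts all the children of an element into a single ul instead. This is an alternative layout, not the function from this post. The names el, text, listItem, and htmlTreeGrouped are hypothetical, and the nodes are plain-object stand-ins so the output can be checked outside a browser.

```javascript
// Hypothetical stand-in for an element node (nodeType 1).
function el(tagName, children) {
  var node = { nodeType: 1, tagName: tagName, firstChild: null, nextSibling: null };
  children = children || [];
  for (var i = 0; i < children.length; i++) {
    if (i === 0) {
      node.firstChild = children[i];
    } else {
      children[i - 1].nextSibling = children[i];
    }
  }
  return node;
}

// Hypothetical stand-in for a text node (nodeType 3).
function text(value) {
  return { nodeType: 3, nodeValue: value, firstChild: null, nextSibling: null };
}

// Builds one <li> for the node, with a single nested <ul> holding all
// of its element children. Elements without element children get no <ul>.
function listItem(node) {
  var inner = '';
  var child = node.firstChild;
  while (child) {
    if (child.nodeType === 1) {
      inner += listItem(child);
    }
    child = child.nextSibling;
  }
  return '<li>' + node.tagName + (inner ? '<ul>' + inner + '</ul>' : '') + '</li>';
}

// Wraps the root item in the outermost <ul>.
function htmlTreeGrouped(root) {
  return '<ul>' + listItem(root) + '</ul>';
}

var body = el('BODY', [
  text('\n'),
  el('DIV', [el('H1', [text('title')])]),
  text('\n'),
  el('UL', [el('LI', [text('one')])]),
  text('\n')
]);

var markup = htmlTreeGrouped(body);
console.log(markup);
```

In a page, the resulting string could be passed to document.write() in the same way as the original function's return value.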

Step VII:
Since at the very beginning we said we were interested only in HTML elements, we will leave out any script tags embedded in the HTML. We can achieve that by adding one more check to the condition that tests the nodeType.

// Change
if (child.nodeType === 1)
// to
if (child.nodeType === 1 && child.nodeName !== 'SCRIPT')
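
If more tags need to be left out later, the check could be moved into a small lookup, roughly as sketched below. SKIPPED_TAGS, isWanted, el, text, and collectTagNames are hypothetical names for this example, and including STYLE in the lookup is an illustrative assumption rather than something this post requires. The nodes are plain-object stand-ins so the snippet runs outside a browser.

```javascript
// Tags to leave out of the tree. SCRIPT comes from this post;
// STYLE is only an illustrative assumption.
var SKIPPED_TAGS = { SCRIPT: true, STYLE: true };

// True for element nodes whose name is not in the skip list.
function isWanted(node) {
  return node.nodeType === 1 &&
    !Object.prototype.hasOwnProperty.call(SKIPPED_TAGS, node.nodeName);
}

// Hypothetical stand-in for an element node (nodeType 1).
function el(tagName, children) {
  var node = {
    nodeType: 1,
    tagName: tagName,
    nodeName: tagName,
    firstChild: null,
    nextSibling: null
  };
  children = children || [];
  for (var i = 0; i < children.length; i++) {
    if (i === 0) {
      node.firstChild = children[i];
    } else {
      children[i - 1].nextSibling = children[i];
    }
  }
  return node;
}

// Hypothetical stand-in for a text node (nodeType 3).
function text(value) {
  return { nodeType: 3, nodeName: '#text', nodeValue: value, firstChild: null, nextSibling: null };
}

// Depth-first walk that records only the wanted elements.
function collectTagNames(node, names) {
  names = names || [];
  names.push(node.tagName);
  var child = node.firstChild;
  while (child) {
    if (isWanted(child)) {
      collectTagNames(child, names);
    }
    child = child.nextSibling;
  }
  return names;
}

var body = el('BODY', [
  el('DIV', []),
  text('\n'),
  el('SCRIPT', [text('htmlTree();')]),
  el('UL', [el('LI', [])])
]);

var names = collectTagNames(body);
console.log(names.join(' ')); // prints "BODY DIV UL LI"
```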

Demo:
Javascript - HTML Tree Parser