JavaScript

Custom SplitText Renders a Literal `&` On Screen? You Are Reading `innerHTML`

July 12, 2026 · 4 min read

I wrote my own SplitText implementation for a client site because I needed per-word GSAP text animation without pulling in a paid plugin. It worked smoothly until QA sent me a single screenshot: a heading whose source read `& tržišnu percepciju` was rendering on screen as `&amp; tržišnu percepciju`. The `&` character had turned into a raw HTML entity, visible to visitors. And there was a second bug riding along: a heading that used `<br>` to force a second line collapsed onto a single self-wrapping line instead.

My first instinct pointed the wrong way. I assumed this was an encoding problem from the CMS, so I inspected the raw data. Clean. In the database and in the server response, the character was a plain `&`, a single normal ampersand. It only became `&amp;` after my SplitText render ran. So the data was not corrupt. My own code was manufacturing that entity out of an ampersand that had been perfectly fine.

Why this happens

I reopened my split routine and the culprit jumped out. To break the heading into words, I read the element's markup through the `innerHTML` getter, then ran a regex over that string to strip tags and pull the text out. It looks reasonable. The problem is that the `innerHTML` getter does not just copy back what you typed: it serializes the DOM back into valid HTML, escaping special characters as it goes. So an `&` sitting in a text node comes back to me as the string `&amp;`. My regex only saw characters; it had no idea that `&amp;` was a single entity. So it copied those five characters verbatim into the output. Nothing downstream decoded them again, so the browser displayed them literally.

The `<br>` was a victim of the same approach. Because I was reading `innerHTML` as a string, `<br>` was just another tag for my regex to strip, not a line break. So the intentional line structure of the heading disappeared.
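To make the failure concrete, here is a minimal sketch of the string-based approach. It is a reconstruction for illustration, not the production code: `naiveWords` and its regex are hypothetical, and the input is hard-coded to the string the `innerHTML` getter would return for a heading whose DOM holds a plain `&` and a `<br>` (combining the two buggy headings into one is also an assumption).

```javascript
// What the innerHTML getter returns for a heading whose text contains a
// plain "&" and a <br>: the ampersand comes back escaped as "&amp;".
const serialized = "&amp; tržišnu<br>percepciju";

// Hypothetical string-based splitter: replace every tag with a space,
// then split on whitespace. It has no notion of entities or line breaks.
function naiveWords(html) {
  return html
    .replace(/<[^>]*>/g, " ")
    .split(/\s+/)
    .filter(Boolean);
}

const words = naiveWords(serialized);
console.log(words); // ["&amp;", "tržišnu", "percepciju"]
```

The entity survives as five literal characters, and the result is one flat list with no trace of where the line break used to be.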

And there was a subtler third trap. A separate bug cleared the element before I got to walk it, so by the time the "words" path ran, the element was already empty and every segment vanished. Three problems stacked into one, but they shared a root: I was treating the DOM as an HTML string when I should have been walking it as a tree of nodes.

The fix

The fix was to stop reading `innerHTML` altogether and walk the live `childNodes` straight off the DOM, before clearing the element. A text node gives me `node.textContent`, which is already decoded by the browser, so `&` stays `&`. For an element, I check `tagName`: if it is `BR`, I close the current segment and start a new line; otherwise, I recurse into it.

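class CustomSplitText {
  // Assumed constructor: the original snippet only shows that this.element exists.
  constructor(element) {
    this.element = element;
  }

  split() {
    const segments = []; // one entry per line of the heading
    let buffer = ""; // text collected since the last <br>

    // Depth-first walk over the live DOM. Text nodes are already decoded.
    const walk = (node) => {
      for (const child of node.childNodes) {
        if (child.nodeType === Node.TEXT_NODE) {
          buffer += child.textContent;
        } else if (child.nodeType === Node.ELEMENT_NODE) {
          if (child.tagName === "BR") {
            // A <br> closes the current line and starts a new one.
            segments.push(buffer);
            buffer = "";
          } else {
            walk(child);
          }
        }
      }
    };

    walk(this.element);
    segments.push(buffer); // flush the last line

    // renderWords is defined elsewhere in the class (not shown here).
    this.renderWords(segments);
  }
}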

Notice the order: I walk `this.element.childNodes` first, gather every segment, and only then re-render the contents. Because `textContent` is already decoded, no `&amp;` ever surfaces again. Because I handle `BR` as a real segment boundary rather than a string token, the multi-line structure comes back intact.
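A quick way to sanity-check that traversal outside a browser is to feed it plain objects that mimic the four properties it reads. This is only a sketch: `collectSegments`, `text`, and `el` are hypothetical helper names, and the fake nodes are stand-ins, not real DOM nodes.

```javascript
// Numeric values of Node.ELEMENT_NODE and Node.TEXT_NODE, spelled out so
// the sketch also runs where the global Node is missing (e.g. Node.js).
const ELEMENT_NODE = 1;
const TEXT_NODE = 3;

// Same traversal as split(), as a standalone function that returns the segments.
function collectSegments(root) {
  const segments = [];
  let buffer = "";
  const walk = (node) => {
    for (const child of node.childNodes) {
      if (child.nodeType === TEXT_NODE) {
        buffer += child.textContent;
      } else if (child.nodeType === ELEMENT_NODE) {
        if (child.tagName === "BR") {
          segments.push(buffer);
          buffer = "";
        } else {
          walk(child);
        }
      }
    }
  };
  walk(root);
  segments.push(buffer);
  return segments;
}

// Hypothetical stand-ins for DOM nodes: only the properties the walker reads.
const text = (value) => ({ nodeType: TEXT_NODE, textContent: value, childNodes: [] });
const el = (tagName, ...childNodes) => ({ nodeType: ELEMENT_NODE, tagName, childNodes });

// Models <h2>& <em>tržišnu</em><br>percepciju</h2>
const heading = el("H2", text("& "), el("EM", text("tržišnu")), el("BR"), text("percepciju"));

const segments = collectSegments(heading);
console.log(segments); // ["& tržišnu", "percepciju"]
```

The ampersand stays a single character, the nested `<em>` is flattened into its line, and the `<br>` produces a second segment.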

For the third bug, the key is the order of operations. The call that empties the element must stay inside the character-processing branch, not at the top of `split()`. If I empty the element early, the word walk runs over an already-emptied element and every segment is lost. By keeping the clear in the chars branch, and only after the read, the word walk always runs over an intact DOM.

splitChars() {
  const chars = this.collectChars();
  this.element.innerHTML = ""; // clear ONLY here, after reading
  this.renderChars(chars);
}
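The ordering rule can be shown with a toy model. Everything here is a hypothetical stand-in: `makeFakeElement` models only `childNodes`, `readWords` plays the role of the walk, and `clearElement` plays the role of `element.innerHTML = ""`.

```javascript
// Hypothetical stand-in for an element: only childNodes is modelled.
function makeFakeElement() {
  return { childNodes: [{ textContent: "tržišnu percepciju" }] };
}

// Stand-in for the read step (the walk over childNodes).
function readWords(element) {
  return element.childNodes
    .map((node) => node.textContent)
    .join("")
    .split(/\s+/)
    .filter(Boolean);
}

// Stand-in for emptying the element.
function clearElement(element) {
  element.childNodes = [];
}

// Wrong order: clear first, then read. Nothing is left to walk.
const clearedFirst = makeFakeElement();
clearElement(clearedFirst);
const lostWords = readWords(clearedFirst);

// Right order: read first, then clear.
const readFirst = makeFakeElement();
const keptWords = readWords(readFirst);
clearElement(readFirst);

console.log(lostWords); // []
console.log(keptWords); // ["tržišnu", "percepciju"]
```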

The takeaway

The `innerHTML` getter is a serializer, not a mirror. It returns re-encoded HTML, so the moment you run a string operation on it, entities like `&amp;`, `&lt;`, and `&gt;` can leak into your output as literal text. When what you actually need is the real text a user sees, read `textContent` off the node, or better yet, walk `childNodes` as a tree: text nodes give you decoded text, and elements like `<br>` can be handled semantically instead of as string tokens. And if you need to empty an element as part of a transform, read it fully first and clear it afterward, never the other way around. Since I stopped parsing HTML with regex and started walking the DOM, the ampersand went back to being an ampersand.