The perfect solution for HTML keyword highlighting

Author：Eve Cole Update Time：2025-01-30 15:48:02

I recently needed to implement this feature in a project: highlighting keywords in web pages.

I thought it would be a simple operation that could be done with a replace on innerHTML, but I ran into many problems. This article records those problems and the final solution, in the hope that it helps others who run into the same thing. If you are only interested in the result, skip ahead to the final solution.

Common practice: regular expression replacement

Idea: to highlight keywords, wrap each keyword in a tag and then style that tag. Use innerHTML or outerHTML for this, not innerText or outerText, because the text properties would insert the tags as literal text.

 const regex = new RegExp(keyword, 'g')
 element.innerHTML = element.innerHTML.replace(regex, '<b class="a">' + keyword + '</b>')
 element.classList.add('highlight')

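The hidden dangers of doing this are as follows:

If the keyword contains regular expression special characters such as ( ) /, the pattern is invalid or matches the wrong text.

If the keyword matches a tag name or an attribute value, those are replaced as well. For example, searching for div or test in the following markup corrupts the tags themselves:

 <div id="parent"> <div class="test">test</div> </div>

The keyword's parent element gets its background color through a class, which pollutes the original DOM to some extent and may affect the positioning of the element. (As a plug-in, we want to change the original DOM as little as possible.)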
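To make the risks of plain string replacement concrete, here is a minimal sketch that runs the same kind of replacement on an HTML string instead of a live element, so it needs no browser. The function name naiveHighlight and the sample markup are assumptions made for illustration.

```javascript
// Hypothetical helper: the same replace as above, applied to a plain HTML string.
function naiveHighlight(html, keyword) {
  const regex = new RegExp(keyword, 'g')
  return html.replace(regex, '<b class="a">' + keyword + '</b>')
}

const html = '<div id="parent"><div class="test">test</div></div>'

// The keyword also occurs inside class="test", so the attribute is rewritten too.
const broken = naiveHighlight(html, 'test')
console.log(broken)

// A keyword that is not a valid pattern makes the RegExp constructor throw.
let threw = false
try {
  naiveHighlight(html, '(')
} catch (e) {
  threw = e instanceof SyntaxError
}
console.log(threw)
```

Both effects come from treating the markup and the keyword as plain strings: the replace cannot tell visible text from attributes, and the keyword is interpreted as a pattern rather than as literal text.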

Regex optimization 1: only process text located within tags

 // Escape the regular expression special characters contained in the keyword, for example / and .
 var formatKeyword = text.replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&')
 // Extract the text located between tags, to avoid touching class, id, etc.
 var finder = new RegExp('>.*?' + formatKeyword + '.*?<')
 // Replace the keyword only inside the extracted text
 element.innerHTML = element.innerHTML.replace(finder, function (matched) {
   return matched.replace(text, '<b>' + text + '</b>')
 })

This solves most problems, but one remains: as soon as a tag attribute contains a character such as < or >, the matching rule breaks and the extracted content is wrong. HTML5 data attributes can hold arbitrary content, so these special characters are unavoidable.

 <div dataset="p>d">Replace</div>
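A small sketch of how such an attribute defeats the extraction of text between > and <, again run on a plain string so that it needs no browser. The helper names escapeRegExp and highlightBetweenTags are made up for this illustration, and the keyword d is chosen only because it appears in the attribute value but not in the visible text.

```javascript
// Hypothetical helper: escape regular expression special characters in a keyword.
function escapeRegExp(text) {
  return text.replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&')
}

// Hypothetical helper: replace the keyword only in what looks like text between > and <.
function highlightBetweenTags(html, keyword) {
  const finder = new RegExp('>.*?' + escapeRegExp(keyword) + '.*?<')
  return html.replace(finder, function (matched) {
    return matched.replace(keyword, '<b>' + keyword + '</b>')
  })
}

// Well-behaved markup: only the visible text is wrapped.
const ok = highlightBetweenTags('<div id="keyword">keyword</div>', 'keyword')
console.log(ok)

// The first > sits inside the attribute value, so the pattern starts matching there.
const corrupted = highlightBetweenTags('<div dataset="p>d">Replace</div>', 'd')
console.log(corrupted)
```

In the second call the visible text does not contain the keyword at all, yet the attribute value is rewritten, because a regular expression cannot tell a > inside quotes from the > that closes a tag.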

Regex optimization 2: temporarily remove tags that may be affected

 <div id="keyword">keyword</div>
 => Replace each tag with a placeholder variable: [replaced1]keyword[replaced2]
 // The id="keyword" inside the tag is no longer processed
 => [replaced1]<b>keyword</b>[replaced2]
 => Replace the placeholders with the original tags: <div id="keyword"><b>keyword</b></div>

This idea and its source code come from elsewhere, but the problem is:
If the placeholder [replaced1] itself contains the keyword (for example, when the keyword is replaced), the replacement produces a wrong result.

Most importantly, this method cannot extract the tags correctly when an attribute value contains < or >.
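The placeholder idea and its first weakness can be sketched as follows. This is not the borrowed source code, only a minimal reconstruction under assumptions: the function name highlightWithPlaceholders, the zero-based placeholder numbering, and the simple tag pattern are all invented for illustration.

```javascript
// Hypothetical reconstruction of the placeholder approach on a plain HTML string.
function highlightWithPlaceholders(html, keyword) {
  const tags = []
  // Step 1: swap every tag for a placeholder. This simple tag pattern is
  // exactly the step that fails when an attribute value contains < or >.
  const stripped = html.replace(/<[^>]*>/g, function (tag) {
    tags.push(tag)
    return '[replaced' + (tags.length - 1) + ']'
  })
  // Step 2: wrap the keyword; the tags are out of the way.
  const marked = stripped.split(keyword).join('<b>' + keyword + '</b>')
  // Step 3: put the original tags back.
  return marked.replace(/\[replaced(\d+)\]/g, function (_, i) {
    return tags[Number(i)]
  })
}

// Works when the keyword does not collide with the placeholder text.
const restored = highlightWithPlaceholders('<div id="keyword">keyword</div>', 'keyword')
console.log(restored)

// Fails when it does: the placeholders are rewritten in step 2 and never restored.
const lost = highlightWithPlaceholders('<p>replaced</p>', 'replaced')
console.log(lost)
```

A more obscure placeholder only makes the collision less likely; it does not remove it, and the tag extraction in step 1 stays fragile either way.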

In short, after many attempts, regular expressions could not handle all of these cases reliably. So I changed approach and processed DOM nodes instead of strings. Walking element.childNodes avoids the interference from tag names and attributes altogether.

[Perfect solution] Processing through DOM nodes

 <div id="parent"> keyword 1 <span id="child"> keyword 2 </span> </div>

Get all child nodes through parent.childNodes. For an element child such as the span, replacing its content with innerText.replace(keyword, result) gives the desired highlighting effect: <span id="child"><b>keyword</b> 2</span>. (This is done recursively: the replacement is performed only when a node has no further child elements.)

However, keyword 1 is a text node: only its text content can be modified, HTML cannot be added to it, and it cannot be styled independently. A text node also cannot be converted into an element node, which is the most frustrating part.

Finally, here is the focus of this article. This feature was the first time I worked seriously with text nodes. That is how I discovered the Text interface, and I used its splitText method to cut text nodes apart and replace the pieces to achieve highlighting.

The source code is below. For restoring the original text after highlighting, see the full source code.

 // Escape regular expression special characters in the keyword
 const reg = new RegExp(keyword.replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&'))
 const highlight = function (node, reg) {
   if (node.nodeType == 3) { // Only process text nodes
     const match = node.data.match(reg)
     if (match) {
       const highlightEl = document.createElement('b')
       highlightEl.dataset.highlight = 'y'
       // Cut into three Text nodes: before the keyword, the keyword, after the keyword
       const wordNode = node.splitText(match.index)
       wordNode.splitText(match[0].length)
       const wordNew = document.createTextNode(wordNode.data)
       highlightEl.appendChild(wordNew) // The highlight node is built
       wordNode.parentNode.replaceChild(highlightEl, wordNode) // Replace the text node
     }
   } else if (node.nodeType == 1 && node.dataset.highlight != 'y') {
     // childNodes is live: the nodes created by a split are visited in turn.
     // The new <b> is skipped by the dataset check above, and the text after it
     // is searched for further matches, so no extra i++ is needed here
     // (an unconditional extra i++ would skip siblings whenever nothing matched).
     for (var i = 0; i < node.childNodes.length; i++) {
       highlight(node.childNodes[i], reg)
     }
   }
 }
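The code for restoring the highlights is not shown in this article. The following is one possible sketch, not the author's implementation: it assumes the highlight elements are the b elements marked with dataset.highlight = 'y' as above, and the function name removeHighlight is invented for illustration.

```javascript
// Hypothetical helper: undo the highlighting below a root element.
function removeHighlight(root) {
  // dataset.highlight = 'y' appears in the markup as data-highlight="y".
  const marks = Array.from(root.querySelectorAll('b[data-highlight="y"]'))
  for (const mark of marks) {
    const parent = mark.parentNode
    // Put a plain text node back in place of the <b> element.
    const text = mark.ownerDocument.createTextNode(mark.textContent)
    parent.replaceChild(text, mark)
    // Merge the text pieces created by splitText back into one text node.
    parent.normalize()
  }
  return marks.length
}
```

In a browser, this would be called with the same element that was passed to highlight, for example removeHighlight(document.body) after highlight(document.body, reg). Calling normalize on the parent is a design choice: it keeps repeated highlight and restore cycles from leaving the text fragmented into ever smaller text nodes.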

Summary

The above is the solution for highlighting keywords in HTML described in this article. I hope it will be helpful to you.