Write A Simple Syntax Highlighter

Feb. 5, 2021, 16:20:50

As those who have taken a cognitive science class may know, people perform visual search in two ways: serial and parallel processing.

Parallel visual processing enables fast detection and processing of information while serial processing slows our thoughts.

As an example, finding a red dot among a collection of black dots is very easy compared to finding the same red dot among a collection of red rounded-corner squares.

The black-red contrast in the first image immediately catches your eye and draws it to the red dot, so the dot is faster to find. In the second image, however, there are no color cues, so you most likely have to search through the shapes one by one until you find the dot.

We observe the same difference in processing speed when reading and writing code. Highlighted code is often much easier to read and understand than monochromatic code. As a simple example,

With syntax highlighting, this piece of code is more readable and livelier.

How I Implemented My Syntax Highlighter

According to StackOverflow posts like this, there are generally two ways to implement a syntax highlighter.

The first approach is to implement a full-fledged parser for the programming language that I want to highlight. As you can imagine, this is very hard. The second, simpler approach is to write a program that identifies language keywords and styles them with <span> tags, so I took that approach instead.

Although this post also suggests identifying keywords with regex pattern matching, I took a more intuitive approach: I chose to style the code in a streaming fashion.
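For comparison, the regex-based keyword matching mentioned above could look roughly like the following sketch. It is written in Python for brevity and is not the code used on this blog; the keyword list, the kw class name, and the function name are all assumptions made for illustration.

```python
import html
import re

# Hypothetical keyword list; a real highlighter would use the
# full keyword set of the language being highlighted.
KEYWORDS = ["if", "else", "for", "while", "return"]

# \b word boundaries keep "for" from matching inside "format".
KEYWORD_RE = re.compile(r"\b(" + "|".join(KEYWORDS) + r")\b")


def highlight_keywords(code: str) -> str:
    """Escape the code for HTML, then wrap each keyword in a <span>."""
    escaped = html.escape(code, quote=False)
    # "kw" is an assumed CSS class name.
    return KEYWORD_RE.sub(r'<span class="kw">\1</span>', escaped)


print(highlight_keywords("for (i = 0; i < n; i++) return format;"))
```

One limitation of this sketch is that it knows nothing about context, so a keyword inside a comment or a string literal would be wrapped as well.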

Every time a delimiter is encountered (e.g. ".", ";", "/"), the program checks the character sequence immediately following the delimiter to see if that sequence, together with the delimiter, makes up a valid part of the code that needs to be highlighted.

For example, whenever the highlighter encounters a /, it checks the character immediately after. If that character is also a /, the program recognizes a comment that spans to the next end-of-line character \n and needs to be styled. So the program wraps everything between the double slash and the next newline character in a <span> with a specific color.
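The comment rule above can be sketched as a small character-by-character scanner. This is a minimal Python illustration under stated assumptions, not the highlighter's actual code: the function name and the comment class name are made up, and the sketch handles only // comments.

```python
import html


def highlight_line_comments(code: str) -> str:
    """Wrap every // comment in a <span>, scanning one character at a time."""
    out = []
    i = 0
    n = len(code)
    while i < n:
        # A "/" followed directly by another "/" starts a comment.
        if code[i] == "/" and i + 1 < n and code[i + 1] == "/":
            end = code.find("\n", i)
            if end == -1:
                end = n  # the comment runs to the end of the input
            # "comment" is an assumed CSS class name.
            out.append('<span class="comment">')
            out.append(html.escape(code[i:end], quote=False))
            out.append("</span>")
            i = end  # the newline itself stays outside the span
        else:
            out.append(html.escape(code[i], quote=False))
            i += 1
    return "".join(out)


print(highlight_line_comments("a / b; // note\nc"))
```

A single / (as in a / b) falls through to the else branch and is copied unchanged. The sketch does not track string literals, so a // inside a string such as a URL would be treated as a comment.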

In the case of a function call (.println()) or a pointer dereference (->next_ptr), the sequence to check is longer than one character (it runs until the next delimiter). To be more precise, function calls must match \.[a-zA-Z0-9_]+\( and pointer dereferences must match ->[a-zA-Z0-9_]+, as I don't want to highlight code segments like System.out. and (e) -> e++;.
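To see what the two formats accept and reject, here is a short Python check that applies them directly as regular expressions. The highlighter itself scans character by character rather than calling a regex engine, so this only illustrates the matching rules; the function names are assumptions.

```python
import re

# The two formats quoted above, used verbatim as regular expressions.
CALL_RE = re.compile(r"\.[a-zA-Z0-9_]+\(")
DEREF_RE = re.compile(r"->[a-zA-Z0-9_]+")


def find_calls(code: str) -> list[str]:
    """Return every substring that looks like a method call."""
    return [m.group() for m in CALL_RE.finditer(code)]


def find_derefs(code: str) -> list[str]:
    """Return every substring that looks like a pointer dereference."""
    return [m.group() for m in DEREF_RE.finditer(code)]


print(find_calls("System.out.println(x);"))
print(find_derefs("node->next_ptr = p;"))
print(find_derefs("(e) -> e++;"))
```

In the first call, .out is skipped because it is followed by a dot rather than an opening parenthesis, so only .println( is reported. In the last call, the arrow is followed by a space, so nothing matches.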

Overall, this highlighter works as I expected. It's certainly not intelligent or fancy in any way, but it works well for my blog. The code is not too long, but it is a little messy with all the i++ and i-- statements that move the cursor left or right.

Demo

Feel free to try it out or modify the code if you like. Code can be found here.


Useful Links/References:

How to implement a syntax highlighter - StackOverflow
Clipboard API - MDN