Regular Expression Back References

Looking to trim your six-digit hex codes down to three digits? You can only shorten a six-digit hex code when each of its three pairs (the first two, middle two, and last two digits) is a doubled digit (e.g. #003300, #CCCC55, #FF88AA, #999999, etc.). If all six digits are the same, you can find those codes by using regular expressions and searching for #([a-fA-F0-9])\1{5}. When they aren’t all the same, you can find the three doubled pairs by searching for #([a-fA-F0-9])\1([a-fA-F0-9])\2([a-fA-F0-9])\3. The latter pattern finds every applicable hex code, regardless of whether all six digits match.
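If you’d like to try both searches outside of an editor, here is a minimal sketch using Python’s built-in re module. The sample CSS string and the variable names are made up for this illustration; only the two patterns come from the text above.

```python
import re

# The two patterns from the text, written as raw strings so the
# backslashes reach the regex engine untouched.
ALL_SAME = re.compile(r"#([a-fA-F0-9])\1{5}")
THREE_PAIRS = re.compile(r"#([a-fA-F0-9])\1([a-fA-F0-9])\2([a-fA-F0-9])\3")

# Hypothetical sample text, invented for this example.
css = "a { color: #999999; background: #CCCC55; border-color: #12AB34; outline-color: #FF88AA; }"

# finditer + group(0) returns the whole matched hex code rather than
# just the captured digits.
all_same_hits = [m.group(0) for m in ALL_SAME.finditer(css)]
three_pair_hits = [m.group(0) for m in THREE_PAIRS.finditer(css)]

print(all_same_hits)    # ['#999999']
print(three_pair_hits)  # ['#999999', '#CCCC55', '#FF88AA']
```

Note that #12AB34 is skipped by both patterns, since none of its pairs is a doubled digit.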

Let’s break those down so you know how to use them in other applications.

# matches #.   Simple enough.

[a-fA-F0-9] matches any single occurrence of the following characters: a, b, c, d, e, f, A, B, C, D, E, F, 0, 1, 2, 3, 4, 5, 6, 7, 8, or 9.

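( ) saves whatever the enclosed pattern matched as a matched item (often called a capture group).

\1 is a back reference to a matched item.  It matches the exact text that the item captured, not the pattern inside the parentheses.  The number identifies which matched item is meant, counting sets of parentheses from left to right. \2 would equal the second matched item, \3 would equal the third matched item, etc.

{5} repeats the element immediately before it 5 times. So in our case, \1{5} is the same as writing \1\1\1\1\1.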
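To check that \1{5} and \1\1\1\1\1 really behave the same way, here is a small Python sketch. The sample strings are invented for this illustration. The last one is there to show that, at least in Python’s re module without the ignore-case flag, a back reference has to repeat the captured character exactly, so a mixed-case code does not match.

```python
import re

short_form = re.compile(r"#([a-fA-F0-9])\1{5}")
long_form = re.compile(r"#([a-fA-F0-9])\1\1\1\1\1")

# Hypothetical test strings, chosen to cover matching and non-matching cases.
samples = ["#000000", "#FFFFFF", "#FFFFF0", "#aaaaaa", "#aAaAaA"]

# fullmatch requires the entire string to fit the pattern.
short_results = [bool(short_form.fullmatch(s)) for s in samples]
long_results = [bool(long_form.fullmatch(s)) for s in samples]

print(short_results)  # [True, True, False, True, False]
print(short_results == long_results)  # True
```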

Example:
Let’s assume you have the following hex code: #3366FF

The first occurrence of [a-fA-F0-9] would match 3. ([a-fA-F0-9]) would return 3 as a matched item.
\1 would also match 3 since that is the first matched item. So ([a-fA-F0-9])\1 would match 33.  Then we just repeat the process for the second and third set of two digits.
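The same walk-through can be reproduced in Python as a rough sketch, so you can see what each matched item holds for #3366FF. The variable names are my own; the pattern is the three-pair pattern from above.

```python
import re

pattern = re.compile(r"#([a-fA-F0-9])\1([a-fA-F0-9])\2([a-fA-F0-9])\3")

m = pattern.fullmatch("#3366FF")

whole_match = m.group(0)   # the entire hex code
matched_items = m.groups() # one entry per set of parentheses

print(whole_match)    # #3366FF
print(matched_items)  # ('3', '6', 'F')
```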

Bonus Tip:
If you want to use search and replace so you don’t have to manually edit all occurrences of your six-digit hex codes, you can use those same matched items in your replacement string.   Because the search pattern includes the leading #, the replacement string needs it too.   You may need to check the documentation of your favorite editor to find out how matched items are referenced in the replacement string.  In EditPlus for Windows, you would use #\1\2\3, but in TextMate for Mac, you would use #$1$2$3.
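For a scripted version of the search and replace, here is a minimal Python sketch. Python’s re.sub references matched items with a backslash and a number in the replacement string. The sample CSS is invented for this illustration, and the replacement starts with a literal # because the search pattern consumes the # as part of the match.

```python
import re

pattern = re.compile(r"#([a-fA-F0-9])\1([a-fA-F0-9])\2([a-fA-F0-9])\3")

# Hypothetical input, made up for this example.
css = "color: #3366FF; background: #CCCC55; border-color: #12AB34;"

# Keep one digit from each doubled pair and put the # back in front.
shortened = pattern.sub(r"#\1\2\3", css)

print(shortened)  # color: #36F; background: #CC5; border-color: #12AB34;
```

Codes that can’t be shortened, such as #12AB34, are left untouched because the pattern never matches them.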

This is a bit of a confusing topic, so please leave any questions in the comments and I’ll be happy to answer them and update this article to clear up any ambiguity.

2 thoughts on “Regular Expression Back References”

  1. No. Actually, I found this article as a draft today and wondered why I hadn’t published it. I did a quick scan and it looked complete, so I pushed it out. Your point is exactly why I hadn’t published it originally. I will update it for the other “web safe” colors, but in the meantime, instead of searching for #([a-fA-F0-9])\1{5}, you would search for #([a-fA-F0-9])\1([a-fA-F0-9])\2([a-fA-F0-9])\3 and then replace with #$1$2$3 (at least in TextMate).
