Francisco J. Gutierrez

PHP and Regular Expressions

So recently I was given this bit of code and told to figure it out:

preg_match("/^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/",$e) 

Let’s start at the beginning. First there’s preg_match, a PHP function that performs a regular expression match. OK, so what’s a regular expression? This is where it starts to get more complicated (not really, once you get it).

Regular expressions are not so regular. I have no idea why that name was chosen, but no use crying over spilt milk. A regular expression is a pattern, written in its own compact notation, that describes which strings of characters should match. If that sounds vague, it should become clearer as I deconstruct the bit of code from the top of the post.

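So in the code at the top, preg_match tests whether the string stored in the $e variable matches a regular expression. The regular expression is the part between the two forward slashes, which are delimiters that mark where the pattern starts and ends: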

"/ .... /"

So our regular expression looks like this:

^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$

(FYI, regular expressions are also referred to as regexp, which is what I will call them from now on)
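
Here is a small sketch of how that call behaves when you run it. The helper name looks_like_dotted_quad is made up for this example and is not part of the code I was given; the pattern is the same one, just written in single quotes so PHP leaves the backslashes alone.

```php
<?php
// Hypothetical helper (not from the original code): wraps the preg_match call.
function looks_like_dotted_quad(string $e): bool
{
    // Single quotes keep the backslashes exactly as typed.
    $pattern = '/^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/';

    // preg_match returns 1 on a match, 0 on no match, and false on error.
    return preg_match($pattern, $e) === 1;
}

var_dump(looks_like_dotted_quad('192.168.0.1'));    // bool(true)
var_dump(looks_like_dotted_quad('192.168.0'));      // bool(false)
var_dump(looks_like_dotted_quad('not an address')); // bool(false)
```

Comparing against 1 with === keeps the "no match" case (0) separate from the "something went wrong" case (false).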

Next, let's look at the first and last characters in that regexp:

^ .... $


^ at the beginning means the string has to start with whatever regexp follows it: ^win matches "windows" or "winner" but not "rowing"
$ at the end means the string has to end with whatever regexp precedes it: dog$ matches "hounddog" but not "dogma"
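
To see the two anchors in action, here is a quick sketch using the same example words. The variable names are just for illustration. The last two lines show one detail that is easy to miss: in PHP, $ also matches just before a newline that ends the string, while \z only matches at the very end.

```php
<?php
// ^ pins the pattern to the start of the string.
$startsWindows = preg_match('/^win/', 'windows'); // 1
$startsRowing  = preg_match('/^win/', 'rowing');  // 0

// $ pins the pattern to the end of the string.
$endsHounddog = preg_match('/dog$/', 'hounddog'); // 1
$endsDogma    = preg_match('/dog$/', 'dogma');    // 0

// Without an anchor the pattern may match anywhere inside the string.
$insideRowing = preg_match('/win/', 'rowing');    // 1

// $ tolerates one trailing newline; \z does not.
$dollarNewline = preg_match('/dog$/', "hounddog\n");  // 1
$strictNewline = preg_match('/dog\z/', "hounddog\n"); // 0

var_dump($startsWindows, $startsRowing, $endsHounddog, $endsDogma);
var_dump($insideRowing, $dollarNewline, $strictNewline);
```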

Next we have this:

 (\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})


Notice the repeating pattern of: (\d{1,3}) separated by \.

The parentheses are used to group or bind expressions together. They also capture the text that each group matched, so it can be read back afterwards.
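
As a rough sketch of what the grouping gives you: preg_match takes an optional third argument that gets filled with the text each pair of parentheses matched. The sample address below is just an example value I picked.

```php
<?php
$pattern = '/^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/';

// The third argument receives the captured text.
// $matches[0] is the whole match; $matches[1] to $matches[4] are the groups.
if (preg_match($pattern, '192.168.0.1', $matches) === 1) {
    print_r($matches);
}
```

For this input, $matches holds "192.168.0.1" followed by "192", "168", "0" and "1", all as strings.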

The \d matches any single character in the range 0 - 9, in other words one digit. This was the hardest part to figure out because \d is shorthand; the longhand you'll usually see is [0-9].
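
A minimal sketch to convince yourself that the two spellings behave the same, assuming no pattern modifiers are used. The variable names and test inputs are made up for the example.

```php
<?php
// Two spellings of the same idea, with no pattern modifiers applied.
$shorthand = '/^\d$/';
$longhand  = '/^[0-9]$/';

foreach (['7', 'x', '42'] as $input) {
    printf(
        "%-2s  \\d: %d  [0-9]: %d\n",
        $input,
        preg_match($shorthand, $input),
        preg_match($longhand, $input)
    );
}
// '7' matches both; 'x' and '42' match neither, because each pattern
// allows exactly one digit between ^ and $.
```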

The curly brackets, and what's inside them, say how many times the preceding element may repeat. {n,m} means "match the previous element (a character, a shorthand like \d, or a group in parentheses) if it occurs at least n times, but no more than m times"
So \d{1,3} means we're looking for a run of at least one digit, but no more than 3 digits.
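
Here is a short sketch of the {n,m} quantifier on its own. The inputs are arbitrary examples, and the second pattern is one I added to show that the quantifier applies to a whole group when it follows parentheses.

```php
<?php
// Between one and three digits, and nothing else.
$digits = '/^\d{1,3}$/';

foreach (['', '7', '42', '255', '1234'] as $input) {
    printf("%-6s => %d\n", var_export($input, true), preg_match($digits, $input));
}
// ''     => 0   (zero digits is too few)
// '7'    => 1
// '42'   => 1
// '255'  => 1
// '1234' => 0   (four digits is too many)

// After a group, the quantifier repeats the whole group.
$group = '/^(ab){2}$/';
var_dump(preg_match($group, 'abab')); // int(1)
var_dump(preg_match($group, 'abb'));  // int(0)
```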

And finally the \.
The backslash "escapes" the period, which means we're looking for an actual period. Without the backslash, the period is part of the regexp syntax and matches any single character (except a newline), so 1x2 would pass just as well as 1.2.
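
A small sketch of the difference the backslash makes. The variable names and the 1.2 / 1x2 inputs are just examples.

```php
<?php
// Escaped: \. matches only a literal period.
$escapedDot   = preg_match('/^1\.2$/', '1.2'); // 1
$escapedX     = preg_match('/^1\.2$/', '1x2'); // 0

// Unescaped: . matches any single character except a newline.
$unescapedDot = preg_match('/^1.2$/', '1.2');  // 1
$unescapedX   = preg_match('/^1.2$/', '1x2');  // 1

var_dump($escapedDot, $escapedX, $unescapedDot, $unescapedX);
```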

So in the end we're using preg_match to check if the string within $e looks like this: ###.###.###.### (where each ### is one to three digits)

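And what does that look like to you? That's right, an IP address (an IPv4 one, to be precise). The pattern only checks the shape, so 999.999.999.999 passes even though no part of a real address can be greater than 255. It's a pretty cool function. You know when you are typing in a password for some website and it tells you that you have a "strong" or "weak" password as you type it? That kind of check can be done by comparing what you type against regexp criteria.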
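
Since the regexp accepts any one to three digits per part, something like 999.999.999.999 gets through. One possible way to tighten it, sketched below, is to reuse the captured groups and check that each part is at most 255. The function name is_ipv4_in_range is my own invention, and this is only one approach; PHP's built-in filter_var with FILTER_VALIDATE_IP is shown at the end for comparison.

```php
<?php
// Hypothetical helper: shape check with the same pattern, then a range check.
function is_ipv4_in_range(string $e): bool
{
    $pattern = '/^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/';

    if (preg_match($pattern, $e, $matches) !== 1) {
        return false;
    }

    // Skip $matches[0] (the whole match) and test each captured part.
    foreach (array_slice($matches, 1) as $part) {
        if ((int) $part > 255) {
            return false;
        }
    }

    return true;
}

var_dump(is_ipv4_in_range('192.168.0.1'));     // bool(true)
var_dump(is_ipv4_in_range('999.999.999.999')); // bool(false)

// Built-in validator: returns the string on success, false on failure.
var_dump(filter_var('192.168.0.1', FILTER_VALIDATE_IP, FILTER_FLAG_IPV4) !== false); // bool(true)
```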

Speaking of which, I think I'm going to try my hand at building a simple password checking tool. Will post it if it works.

helpful resource: zytrax