Monday, September 20, 2010

UK Postcode regular expression

A friend of mine asked me for a regular expression for UK Postcodes. I got so many confusing results on the net that I decided to write one myself.

First I had to find the rules for UK Postcodes. A quick search turned up a UK government page, http://interim.cabinetoffice.gov.uk/govtalk/schemasstandards/e-gif/datastandards/address/postcode.aspx (as of July 2011; as of October 2009 the page was at http://www.cabinetoffice.gov.uk/govtalk/schemasstandards/e-gif/datastandards/address/postcode.aspx), which lists the rules as follows:

Permitted Format    Example Postcode
AN NAA              M1 1AA
ANN NAA             M60 1NW
AAN NAA             CR2 6XH
AANN NAA            DN55 1PT
ANA NAA             W1A 1HQ
AANA NAA            EC1A 1BB
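In the table, A stands for a letter and N for a digit. One way to see what the table means is to reduce a postcode to its shape by replacing every letter with A and every digit with N, then look the shape up in the list of permitted formats. The sketch below does exactly that; the class and method names are hypothetical, Set.of needs Java 9 or later, and it checks only the shape, not the per-position letter rules listed next.

```java
import java.util.Set;

public class PostcodeFormat {

    // The six permitted formats from the table (A = letter, N = digit).
    static final Set<String> PERMITTED = Set.of(
        "AN NAA", "ANN NAA", "AAN NAA", "AANN NAA", "ANA NAA", "AANA NAA");

    // Replace each letter A-Z with 'A' and each digit with 'N';
    // any other character (such as the space) is copied unchanged.
    static String shapeOf(String postcode) {
        StringBuilder sb = new StringBuilder();
        for (char c : postcode.toCharArray()) {
            if (c >= 'A' && c <= 'Z') {
                sb.append('A');
            } else if (c >= '0' && c <= '9') {
                sb.append('N');
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // True if the postcode has one of the permitted shapes.
    static boolean hasPermittedShape(String postcode) {
        return PERMITTED.contains(shapeOf(postcode));
    }

    public static void main(String[] args) {
        for (String code : new String[] {"M1 1AA", "W1A 1HQ", "EC1A 1BB", "1M 1AA"}) {
            System.out.println(code + " -> " + shapeOf(code)
                + " permitted: " + hasPermittedShape(code));
        }
    }
}
```

Note that the special case GIR 0AA, described below, comes out as AAA NAA, which is not in the table; that is why it needs its own branch in any validator.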

The site also lists these rules:

* The letters Q, V and X are not used in the first position.

* The letters I, J and Z are not used in the second position.

* The only letters to appear in the third position are A, B, C, D, E, F, G, H, J, K, S, T, U and W.

* The only letters to appear in the fourth position are A, B, E, H, M, N, P, R, V, W, X and Y.

* The second half of the Postcode is always in the format numeric, alpha, alpha, and the letters C, I, K, M, O and V are never used in it.

* GIR 0AA is a Postcode that was issued historically and does not conform to the current rules on valid Postcode formats. It is, however, still in use.
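Each position rule above can be written as a regular-expression character class; for instance, "not Q, V or X" becomes [A-PR-UWYZ]. Ranges like that are easy to get wrong, so here is a small sketch (the class and method names are hypothetical) that lists which letters a character class accepts and compares the result with the rule it is meant to encode.

```java
import java.util.regex.Pattern;

public class RuleCheck {

    // Letters A-Z, in order, that the given character class accepts.
    static String lettersAccepted(String charClass) {
        Pattern p = Pattern.compile(charClass);
        StringBuilder sb = new StringBuilder();
        for (char c = 'A'; c <= 'Z'; c++) {
            if (p.matcher(String.valueOf(c)).matches()) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // Letters A-Z, in order, with the given letters left out.
    static String allLettersExcept(String excluded) {
        StringBuilder sb = new StringBuilder();
        for (char c = 'A'; c <= 'Z'; c++) {
            if (excluded.indexOf(c) < 0) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Each line compares one character class with the rule it encodes.
        System.out.println("1st position: "
            + lettersAccepted("[A-PR-UWYZ]").equals(allLettersExcept("QVX")));
        System.out.println("2nd position: "
            + lettersAccepted("[A-HK-Y]").equals(allLettersExcept("IJZ")));
        System.out.println("3rd position: "
            + lettersAccepted("[A-HJKSTUW]").equals("ABCDEFGHJKSTUW"));
        System.out.println("4th position: "
            + lettersAccepted("[ABEHMNPRVWXY]").equals("ABEHMNPRVWXY"));
        System.out.println("inward code:  "
            + lettersAccepted("[ABD-HJLNP-UW-Z]").equals(allLettersExcept("CIKMOV")));
    }
}
```

For these five classes every line prints true, so each class admits exactly the letters its rule allows.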

I was able to come up with this basic regular expression for UK Postcode validation:

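^(([A-PR-UWYZ](([0-9](([0-9]|[A-HJKSTUW])?)?)|([A-HK-Y][0-9]([0-9]|[ABEHMNPRVWXY])?)) [0-9][ABD-HJLNP-UW-Z]{2})|GIR 0AA)$

(The outermost parentheses make ^ and $ apply to both alternatives. Without them, ^ would anchor only the normal postcode branch and $ only GIR 0AA, so a partial-match search could accept strings with extra characters around a postcode.)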

This validates exactly according to the rules above. It checks only that a postcode is well-formed, not that it is actually in use.
Note, however, that it is certainly not optimized. I couldn't find an online regular expression optimizer, and I'd have to become a regex expert to do anything about it myself. Maybe I can take on an automatic regex optimizer as a mini-project, but for now the unoptimized version will have to do...
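As a starting point for that optimization, one hand simplification (not produced by any optimizer tool and not benchmarked) is to merge the nested optional groups into single character classes: [0-9](([0-9]|[A-HJKSTUW])?)? accepts the same strings as [0-9][0-9A-HJKSTUW]?, and likewise for the two-letter branch. Non-capturing groups (?:...) also avoid building capture groups that nobody reads. The sketch below (class and field names are hypothetical) compiles both versions and compares them on the example postcodes plus a few invalid ones.

```java
import java.util.regex.Pattern;

public class CompactPostcodeRegex {

    // The regular expression from this post, with the alternation grouped
    // so that ^ and $ apply to both branches.
    static final Pattern ORIGINAL = Pattern.compile(
        "^(([A-PR-UWYZ](([0-9](([0-9]|[A-HJKSTUW])?)?)|([A-HK-Y][0-9]([0-9]|[ABEHMNPRVWXY])?))"
        + " [0-9][ABD-HJLNP-UW-Z]{2})|GIR 0AA)$");

    // Hand-simplified variant: merged character classes, non-capturing groups.
    static final Pattern COMPACT = Pattern.compile(
        "^(?:[A-PR-UWYZ](?:[0-9][0-9A-HJKSTUW]?|[A-HK-Y][0-9][0-9ABEHMNPRVWXY]?)"
        + " [0-9][ABD-HJLNP-UW-Z]{2}|GIR 0AA)$");

    public static void main(String[] args) {
        String[] samples = {
            "M1 1AA", "M60 1NW", "CR2 6XH", "DN55 1PT", "W1A 1HQ", "EC1A 1BB", "GIR 0AA",
            "QA1 1AA", "M1 1AC", "EC1 1BBB", "M11A 1AA"
        };
        for (String s : samples) {
            boolean original = ORIGINAL.matcher(s).matches();
            boolean compact = COMPACT.matcher(s).matches();
            System.out.println(s + ": original=" + original + " compact=" + compact);
        }
    }
}
```

The two columns agree on every sample; the first seven are accepted and the last four rejected.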

Here is the regular expression in action inside Java code:

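import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UKPostcode {

    // Compiled once and reused. The outer group makes ^ and $ apply to both
    // the normal postcode format and the special case GIR 0AA.
    private static final Pattern POSTCODE = Pattern.compile(
        "^(([A-PR-UWYZ](([0-9](([0-9]|[A-HJKSTUW])?)?)|([A-HK-Y][0-9]([0-9]|[ABEHMNPRVWXY])?)) [0-9][ABD-HJLNP-UW-Z]{2})|GIR 0AA)$");

    public static void validate(String code) {
        // Upper-case the input first. Locale.ROOT keeps this independent of the
        // default locale (a Turkish locale would turn "i" into a dotted capital I).
        Matcher matcher = POSTCODE.matcher(code.toUpperCase(Locale.ROOT));

        // matches() succeeds only if the whole string matches the pattern.
        if (matcher.matches()) {
            System.out.println("This is a valid UK Postcode.");
        } else {
            System.out.println("This is not a valid UK Postcode.");
        }
    }
}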

In a real-life scenario, input can arrive in any case; the validate method above handles this by converting the received code to upper case before matching. Better yet would be to handle upper and lower case inside the regular expression itself! Plus, of course, the optimization...
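For the "case handling inside the regular expression" idea, Java offers the Pattern.CASE_INSENSITIVE flag (or, equivalently, starting the expression with the inline flag (?i)). With it, each character class also accepts the lower-case forms of its letters, so no toUpperCase() call is needed. A minimal sketch reusing the same expression; the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

public class CaseInsensitivePostcode {

    // Same expression as above (alternation grouped so ^ and $ cover both
    // branches), compiled so that letters match in either case.
    static final Pattern POSTCODE = Pattern.compile(
        "^(([A-PR-UWYZ](([0-9](([0-9]|[A-HJKSTUW])?)?)|([A-HK-Y][0-9]([0-9]|[ABEHMNPRVWXY])?))"
        + " [0-9][ABD-HJLNP-UW-Z]{2})|GIR 0AA)$",
        Pattern.CASE_INSENSITIVE);

    // True if the whole string is a well-formed UK postcode, in any case.
    static boolean isValid(String code) {
        return POSTCODE.matcher(code).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("ec1a 1bb")); // lower case is accepted
        System.out.println(isValid("Gir 0aA"));  // so is the GIR 0AA special case
        System.out.println(isValid("q1 1aa"));   // Q is still rejected in first position
    }
}
```

This prints true, true, false: the flag relaxes only letter case, while the per-position letter rules still apply.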
