Monday, September 20, 2010

UK Postcode regular expression

A friend of mine asked for a regular expression for UK Postcodes. I got so many confusing results on the net that I decided to write one myself.

First I had to find the rules for UK Postcodes. A quick search got me a UK government site, http://interim.cabinetoffice.gov.uk/govtalk/schemasstandards/e-gif/datastandards/address/postcode.aspx (as of July 2011; the old address, as of October 2009, was http://www.cabinetoffice.gov.uk/govtalk/schemasstandards/e-gif/datastandards/address/postcode.aspx), which shows the rules as follows:

Permitted Format    Example Postcode

AN NAA              M1 1AA
ANN NAA             M60 1NW
AAN NAA             CR2 6XH
AANN NAA            DN55 1PT
ANA NAA             W1A 1HQ
AANA NAA            EC1A 1BB

Also

* The letters Q, V and X are not used in the first position.

* The letters I, J and Z are not used in the second position.

* The only letters to appear in the third position are A, B, C, D, E, F, G, H, J, K, S, T, U and W.

* The only letters to appear in the fourth position are A, B, E, H, M, N, P, R, V, W, X and Y.

* The second half of the Postcode is always consistent numeric, alpha, alpha format and the letters C, I, K, M, O and V are never used.

* GIR 0AA is a Postcode that was issued historically and does not conform to current rules on valid Postcode formats. It is, however, still in use.

I was able to come up with this basic regular expression that does UK Postcode validation (the whole alternation is wrapped in one group so that ^ and $ also apply to the GIR 0AA branch):

^(([A-PR-UWYZ](([0-9](([0-9]|[A-HJKSTUW])?)?)|([A-HK-Y][0-9]([0-9]|[ABEHMNPRVWXY])?)) [0-9][ABD-HJLNP-UW-Z]{2})|GIR 0AA)$

This will validate 100% as per the assumed rules above.
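One way to check that claim is to run the example postcodes from the format table through the expression, together with a couple of strings that break a rule. The sketch below is only an illustration: the class name PostcodeSamples and the method isValid are made up for this example, and the expression is written with the whole alternation in one group so that both anchors cover the GIR 0AA branch.

```java
import java.util.regex.Pattern;

// Hypothetical helper class for illustration; not part of the original post.
final class PostcodeSamples {

    // The post's expression, with the alternation grouped so ^ and $ apply to every branch.
    static final Pattern UK_POSTCODE = Pattern.compile(
        "^(([A-PR-UWYZ](([0-9](([0-9]|[A-HJKSTUW])?)?)|([A-HK-Y][0-9]([0-9]|[ABEHMNPRVWXY])?))"
        + " [0-9][ABD-HJLNP-UW-Z]{2})|GIR 0AA)$");

    // Returns true when the whole string has a permitted format.
    static boolean isValid(String code) {
        return UK_POSTCODE.matcher(code).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            // Examples from the format table, plus the historical special case.
            "M1 1AA", "M60 1NW", "CR2 6XH", "DN55 1PT", "W1A 1HQ", "EC1A 1BB", "GIR 0AA",
            "Q1 1AA",   // Q is not used in the first position
            "M1 1AC"    // C is never used in the second half
        };
        for (String sample : samples) {
            System.out.println(sample + " -> " + isValid(sample));
        }
    }
}
```

Note that this only checks the format rules listed above. It says nothing about whether a postcode has actually been issued.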
Note however that it is certainly not optimized… I couldn't find any online regular expression optimizer, and I'd have to become a regex expert to do anything about it myself. Maybe I can take on an automatic regex optimizer as a mini-project. But for now, the unoptimized version will have to do...
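As a rough idea of what a hand-tidied version could look like, the sketch below merges each "digit or letter" alternation into a single character class, drops the doubled "?", and uses non-capturing groups. This is a manual simplification under the same rules, not the output of any optimizer, and the names CompactPostcode, ORIGINAL and COMPACT are invented for the example. The two patterns are compared on a handful of inputs rather than proven equivalent.

```java
import java.util.regex.Pattern;

// Hypothetical class for illustration; a hand-simplified variant of the post's expression.
final class CompactPostcode {

    // The post's expression, with the alternation grouped so both anchors cover every branch.
    static final Pattern ORIGINAL = Pattern.compile(
        "^(([A-PR-UWYZ](([0-9](([0-9]|[A-HJKSTUW])?)?)|([A-HK-Y][0-9]([0-9]|[ABEHMNPRVWXY])?))"
        + " [0-9][ABD-HJLNP-UW-Z]{2})|GIR 0AA)$");

    // Same rules with merged character classes and non-capturing groups.
    static final Pattern COMPACT = Pattern.compile(
        "^(?:[A-PR-UWYZ](?:[0-9][0-9A-HJKSTUW]?|[A-HK-Y][0-9][0-9ABEHMNPRVWXY]?)"
        + " [0-9][ABD-HJLNP-UW-Z]{2}|GIR 0AA)$");

    public static void main(String[] args) {
        String[] samples = {"M1 1AA", "EC1A 1BB", "GIR 0AA", "QA1 1AA", "M1A 1AC", "A 1AA"};
        for (String sample : samples) {
            boolean original = ORIGINAL.matcher(sample).matches();
            boolean compact = COMPACT.matcher(sample).matches();
            System.out.println(sample + " -> original=" + original + ", compact=" + compact);
        }
    }
}
```

Compiling the pattern once into a static constant, as done here, also avoids recompiling it on every call, which is probably the cheapest practical speed-up in Java.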

Here is the regular expression in action inside Java code:

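import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Prints whether the given string is a valid UK Postcode.
public static void validate(String code) {
    // The alternation is grouped so that ^ and $ also apply to the GIR 0AA branch.
    String regexp = "^(([A-PR-UWYZ](([0-9](([0-9]|[A-HJKSTUW])?)?)|([A-HK-Y][0-9]([0-9]|[ABEHMNPRVWXY])?)) [0-9][ABD-HJLNP-UW-Z]{2})|GIR 0AA)$";
    Pattern pattern = Pattern.compile(regexp);
    // Upper-case the input so that lower-case postcodes are accepted as well.
    Matcher matcher = pattern.matcher(code.toUpperCase());
    // matches() succeeds only if the entire input matches the pattern.
    if (matcher.matches()) {
        System.out.println("This is a valid UK Postcode.");
    } else {
        System.out.println("This is not a valid UK Postcode.");
    }
}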

The method above converts the received code into upper case before matching, so lower-case postcodes are accepted. Better yet, the regular expression itself could handle both upper and lower case! Plus of course the optimization…
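For the case handling, one option in Java is to compile the pattern with the standard Pattern.CASE_INSENSITIVE flag instead of upper-casing the input. A minimal sketch follows; the class name CaseInsensitivePostcode, the method isValid, and the null check are assumptions made for this example, not something the original code does.

```java
import java.util.regex.Pattern;

// Hypothetical class for illustration; shows case handling inside the pattern itself.
final class CaseInsensitivePostcode {

    // CASE_INSENSITIVE makes the letter classes accept lower-case ASCII letters too.
    static final Pattern UK_POSTCODE = Pattern.compile(
        "^(([A-PR-UWYZ](([0-9](([0-9]|[A-HJKSTUW])?)?)|([A-HK-Y][0-9]([0-9]|[ABEHMNPRVWXY])?))"
        + " [0-9][ABD-HJLNP-UW-Z]{2})|GIR 0AA)$",
        Pattern.CASE_INSENSITIVE);

    // Returns true when the whole string has a permitted format, in any letter case.
    static boolean isValid(String code) {
        // Treating null as invalid is an assumption of this sketch.
        return code != null && UK_POSTCODE.matcher(code).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"EC1A 1BB", "ec1a 1bb", "Gir 0aa", "q1 1aa"};
        for (String sample : samples) {
            System.out.println(sample + " -> " + isValid(sample));
        }
    }
}
```

Returning a boolean rather than printing keeps the check reusable, for example in form validation.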

8 comments:

  1. hi,

    very good article, very close but not quite right the postcode N1P 1AA etc doesn't validate.

  2. Thanks for posting Owen.

    But about N1P 1AA, it doesn't seem to be a valid code as the letter "P" is not in the permitted list of letters for the third position (as per rule no.3)?

  3. http://www.britishpostcodes.info/district/N1P

  5. Hi Sachin,
    I tried on regex on a invalid post code and it still matches : HA8 3PW
