Rewriting for SEO-Friendly URLs: .htaccess or PHP?
December 30th, 2009 by Anton Oliinyk

Modern database-driven web sites implement SEO-friendly URLs that emulate static directories and files. Switching to such “clean” URLs helps search engines index the site, makes URLs more user-friendly and hides the server-side language. For example, this clean URL might refer to a page in a product catalog:


http://somesite.com/products/network/router.html

In fact, there is no /products/network folder on the server and no router.html file at all. The page is generated by a server script that queries the database for the “network” product category and the “router” product. But who calls the script, and where does it get the query parameter values?

This technique is usually referred to as “URL rewriting”. It lets the web server recognize what information was requested by parsing the URL string. Apache and PHP offer several ways to implement URL rewriting. So which one is best?

Configuring mod_rewrite via .htaccess file

This is perhaps the most common way to implement rewriting, especially when upgrading legacy web sites.
Suppose we already have a products.php script that takes category and product parameters from the $_GET array. We just need to convert the request URI, invisibly to the user:

/products/network/router.html => /products.php?category=network&product=router

Apache comes with a URL rewriting engine, mod_rewrite. It lets you specify rules based on regular expressions for URL parsing, transformation and even redirects. To use mod_rewrite, you just need to create or modify an .htaccess file:

RewriteEngine On
RewriteRule ^products/(\w+)/(\w+)\.html$ products.php?category=$1&product=$2 [L]
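To see how the rule’s captures become query parameters, here is a small PHP sketch that applies the same pattern with preg_match() and builds the target the way mod_rewrite substitutes $1 and $2. It only illustrates the matching (Apache does this before any PHP runs), and the function name rewriteProductUrl() is made up for this example.

```php
<?php
// Hypothetical helper mirroring the products RewriteRule above.
// In .htaccess context Apache matches the path without its leading slash,
// so the slash is stripped before matching. The pattern is anchored with $
// so that trailing junk after ".html" does not match.
function rewriteProductUrl(string $uriPath): ?string
{
    $path = ltrim($uriPath, '/');
    if (!preg_match('/^products\/(\w+)\/(\w+)\.html$/', $path, $m)) {
        return null; // no match: this rule leaves the URL unchanged
    }
    // $m[1] and $m[2] play the role of $1 and $2 in the substitution.
    return 'products.php?' . http_build_query([
        'category' => $m[1],
        'product'  => $m[2],
    ]);
}

echo rewriteProductUrl('/products/network/router.html'), "\n";
// prints products.php?category=network&product=router
```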

Now the script can keep using the $_GET array to get the category and product name as if it had been called with a dynamic URL, and no changes to the script code are required.

Oops! What happened to my CSS, JavaScript, images and relative links?! Don’t worry: I explained the problem and its solutions in my post SEO-Friendly URLs and Relative Links.
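In short, the browser resolves relative links against the URL it displays, not against the script that actually ran. Below is a simplified sketch of that resolution (it ignores “..”, query strings and fragments, and assumes the page path starts with “/”); resolveRelative() is a hypothetical helper, not a PHP built-in.

```php
<?php
// Resolve a simple relative link against the page path the browser sees.
// Deliberately simplified: no "..", "./", query strings or fragments.
function resolveRelative(string $pagePath, string $link): string
{
    if ($link !== '' && $link[0] === '/') {
        return $link; // root-relative links are used as-is
    }
    // Keep everything up to and including the last "/", then append the link.
    $dir = substr($pagePath, 0, strrpos($pagePath, '/') + 1);
    return $dir . $link;
}

echo resolveRelative('/products.php', 'css/style.css'), "\n";
echo resolveRelative('/products/network/router.html', 'css/style.css'), "\n";
```

The same relative stylesheet link that worked for /products.php now points into the non-existent /products/network/ folder, which is why root-relative links such as /css/style.css are unaffected.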

While mod_rewrite is a very easy solution, it can cause problems as the rewrite rules grow more complex:

  • It’s very hard to debug .htaccess code.
  • Extended regular expression syntax may be incompatible with old Apache versions. I noticed some problems with GoDaddy shared hosting, which actually uses Apache v1.3.
  • You may also want to automatically correct user typos in URLs by matching them against database contents, which is almost impossible with mod_rewrite alone. (But you can still try mod_speling.)
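For comparison, the PHP side of such typo correction can be quite short. The sketch below picks the known slug closest to the requested one using PHP’s built-in levenshtein(); the $knownSlugs array stands in for a database query, and the maximum distance of 2 is an arbitrary assumption.

```php
<?php
// Return the known slug closest to $requested, or null if none is close enough.
// $knownSlugs would normally come from the database (assumption for this sketch).
function suggestSlug(string $requested, array $knownSlugs, int $maxDistance = 2): ?string
{
    $best = null;
    $bestDistance = $maxDistance + 1;
    foreach ($knownSlugs as $slug) {
        $d = levenshtein(strtolower($requested), strtolower($slug));
        if ($d < $bestDistance) {
            $best = $slug;
            $bestDistance = $d;
        }
    }
    return $best;
}

$slugs = ['router', 'switch', 'cable'];
var_dump(suggestSlug('ruoter', $slugs));  // two substitutions away from "router"
var_dump(suggestSlug('printer', $slugs)); // too far from every slug
```

A script would typically redirect to the corrected URL rather than serve the page under the misspelled one.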

All those problems can be solved by moving the URI parsing logic into PHP code, which allows more complex rewrite rules and debugging with native PHP tools.

Parsing REQUEST_URI by PHP code

Apache web server also allows you to use URLs like this one:


http://somesite.com/products.php/network/router.html

Apache will run the products.php script and leave the remainder of the path to it. The script can get that remainder by parsing $_SERVER['REQUEST_URI']:

//Remove request parameters (the query string):
list($path) = explode('?', $_SERVER['REQUEST_URI'], 2);
//Remove script path ("/products.php") and the slash after it:
$path = (string) substr($path, strlen($_SERVER['SCRIPT_NAME']) + 1);
//Explode path to directories and remove empty items:
$pathInfo = array();
foreach (explode('/', $path) as $dir) {
    //Compare with '' rather than empty(), which would also drop a "0" segment
    if ($dir !== '') {
        //rawurldecode() keeps "+" literal, as it should be in a path
        $pathInfo[] = rawurldecode($dir);
    }
}
if (count($pathInfo) > 0) {
    //Remove the file extension (only the last ".xxx") from the last element:
    $last = count($pathInfo) - 1;
    $pathInfo[$last] = preg_replace('/\.[^.]*$/', '', $pathInfo[$last]);
}

Now the $pathInfo array contains the segments of the remaining path (“network” and “router” in this example). You can use them as database query parameters.
But what if they are invalid? In that case, the PHP script itself has to raise a “file not found” error. For example:

if (count($pathInfo) < 2) {
    header('HTTP/1.0 404 Not Found');
    exit;
}

Alternatively, you can redirect to an error page.
I also recommend using structured exception handling to catch “path not found” exceptions that may be raised deep in your code.
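A minimal sketch of that idea: a hypothetical PathNotFoundException is thrown wherever a lookup fails, and a single try/catch at the top of the script turns it into a 404 response. The class name, the findProduct() helper and its hard-coded catalog are placeholders for your own code.

```php
<?php
// Hypothetical exception type for "this virtual path does not exist".
class PathNotFoundException extends Exception {}

// Placeholder for a real database lookup deep inside the application.
function findProduct(string $category, string $product): array
{
    $catalog = ['network' => ['router' => 'D-Link DIR-300']];
    if (!isset($catalog[$category][$product])) {
        throw new PathNotFoundException("/$category/$product");
    }
    return ['category' => $category, 'label' => $catalog[$category][$product]];
}

// Top-level handler: returns the HTTP status and the body to send.
function handle(array $pathInfo): array
{
    try {
        if (count($pathInfo) < 2) {
            throw new PathNotFoundException('/' . implode('/', $pathInfo));
        }
        $item = findProduct($pathInfo[0], $pathInfo[1]);
        return [200, 'Product: ' . $item['label']];
    } catch (PathNotFoundException $e) {
        return [404, 'Not found: ' . $e->getMessage()];
    }
}

list($status, $body) = handle(['network', 'router']);
http_response_code($status); // PHP 5.4+; older versions can use header() as above
echo $body, "\n";
```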

Note that PHP allows both reading and writing the $_GET array, so you can use this to keep legacy code working without changes:

$_GET['category']   = $pathInfo[0];
$_GET['product']    = $pathInfo[1];

But how do you get rid of that “.php” in the URL? You can rename the “products.php” file to “products” (without an extension) and modify the .htaccess file to tell Apache that “products” is actually a PHP script:

<FilesMatch "^products$">
    ForceType application/x-httpd-php
</FilesMatch>

Hmm... I don’t really like this solution, but this way URLs like http://somesite.com/products/network/router.html will work fine.

At some point you may want to get rid of that “products” directory to make URLs shorter, like http://somesite.com/network/router.html. You may also want to have other directories like news, blog, etc. on the same site.

Can we parse all virtual URLs in the same PHP script? Sure, we can do that!

Combining powers of mod_rewrite and PHP

The best way to implement SEO-friendly URLs is to combine the power of mod_rewrite and PHP. This gives you full control over URL rewriting with the full power of the PHP language.

You just need very simple code in the .htaccess file:

RewriteEngine On
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule . index.php [L,QSA]
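The two RewriteCond lines mean “only if the requested path is neither an existing directory (!-d) nor an existing file (!-f)”; only then does the rule hand the request to index.php. The same decision written in PHP looks roughly like this. isVirtualPath() is a hypothetical helper for illustration only, since Apache makes this check before PHP is involved.

```php
<?php
// Mirror of the .htaccess logic: a request goes to index.php only when it
// does not map to an existing file (!-f) or directory (!-d).
function isVirtualPath(string $documentRoot, string $requestUri): bool
{
    // Drop the query string; it still reaches index.php either way.
    $path = rawurldecode((string) parse_url($requestUri, PHP_URL_PATH));
    $filename = rtrim($documentRoot, '/') . $path;
    return !is_file($filename) && !is_dir($filename);
}

// Demo with a throwaway document root containing one real file.
$root = sys_get_temp_dir() . '/rewrite_demo_' . uniqid();
mkdir($root);
touch($root . '/style.css');
var_dump(isVirtualPath($root, '/style.css'));                        // real file: served as-is
var_dump(isVirtualPath($root, '/products/network/router.html?x=1')); // virtual: goes to index.php
```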

This code has proven compatible with older Apache versions, such as the one on GoDaddy shared hosting.

Now every request for a virtual URI is processed by index.php. You can get the requested URI from $_SERVER['REQUEST_URI'] and parse it with code almost identical to the code above:

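//Remove request parameters (the query string):
list($path) = explode('?', $_SERVER['REQUEST_URI'], 2);
//Remove script directory and the slash after it. rtrim() turns the "/" that
//dirname() returns for a site-root index.php into "", so only one char is cut:
$path = (string) substr($path, strlen(rtrim(dirname($_SERVER['SCRIPT_NAME']), '/\\')) + 1);
//Explode path to directories and remove empty items:
$pathInfo = array();
foreach (explode('/', $path) as $dir) {
    //Compare with '' rather than empty(), which would also drop a "0" segment
    if ($dir !== '') {
        $pathInfo[] = rawurldecode($dir);
    }
}
if (count($pathInfo) > 0) {
    //Remove the file extension (only the last ".xxx") from the last element:
    $last = count($pathInfo) - 1;
    $pathInfo[$last] = preg_replace('/\.[^.]*$/', '', $pathInfo[$last]);
}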
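Because this snippet and the one in the previous section differ only in the prefix they strip (the script name there, the script’s directory here), they could share a single function. The parsePathInfo() below is a hypothetical refactoring that takes the prefix as a parameter, which also makes it testable without a web server; a prefix of “/” (a front controller in the site root) is treated as empty.

```php
<?php
// Split a request URI into decoded path segments that follow $prefix,
// dropping the query string, empty segments and the last file extension.
function parsePathInfo(string $requestUri, string $prefix): array
{
    list($path) = explode('?', $requestUri, 2);
    $prefix = rtrim($prefix, '/\\'); // "/" (or "\" on Windows) becomes ""
    // Strip the prefix only on a segment boundary ("/blog" must not eat "/blogger").
    if ($prefix !== '' && ($path === $prefix || strpos($path, $prefix . '/') === 0)) {
        $path = (string) substr($path, strlen($prefix));
    }
    $pathInfo = array();
    foreach (explode('/', $path) as $dir) {
        if ($dir !== '') {
            $pathInfo[] = rawurldecode($dir);
        }
    }
    $last = count($pathInfo) - 1;
    if ($last >= 0) {
        $pathInfo[$last] = preg_replace('/\.[^.]*$/', '', $pathInfo[$last]);
    }
    return $pathInfo;
}

// Front controller in the site root: the prefix is dirname('/index.php').
print_r(parsePathInfo('/products/network/router.html?ref=home', dirname('/index.php')));
```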

This way you can implement quite intelligent logic to provide smart, short and flexible document naming schemes on your web site. For example, you can make all of these (and many similar) URLs refer (or, better, redirect) to the same page:

  • http://somesite.com/products/network/router.html
  • http://somesite.com/products-network/router/
  • http://somesite.com/networks/router/
  • http://somesite.com/router

At the same time, the code can recognize that http://somesite.com/20091010/router/ refers to a news article just because there is a corresponding record in the news table.
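As a rough illustration of such logic, the sketch below routes a parsed $pathInfo array: an eight-digit first segment with a matching news record is treated as a news article, and anything else is treated as a possible product alias and mapped to one canonical URL that the script could redirect to. The in-memory $news and $products arrays and the route() function are hypothetical stand-ins for real database queries.

```php
<?php
// Hypothetical data standing in for database tables.
$news = ['20091010' => ['router' => 'New router released']];
$products = ['router' => 'network']; // product slug => category slug

// Decide what a parsed path refers to. Returns [type, canonical URL] or null.
function route(array $pathInfo, array $news, array $products): ?array
{
    if (count($pathInfo) === 2 && preg_match('/^\d{8}$/', $pathInfo[0])
        && isset($news[$pathInfo[0]][$pathInfo[1]])) {
        return ['news', "/{$pathInfo[0]}/{$pathInfo[1]}/"];
    }
    // Treat the last segment as a product slug, whatever comes before it.
    $slug = end($pathInfo);
    if ($slug !== false && isset($products[$slug])) {
        return ['product', "/products/{$products[$slug]}/$slug.html"];
    }
    return null; // nothing matched: respond with 404
}

$match = route(['networks', 'router'], $news, $products);
echo $match[1], "\n";
// In the real script, if the requested path differs from $match[1]:
// header('Location: ' . $match[1], true, 301); exit;
```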

You can also combine the URL parsing logic with content negotiation based on the client’s user agent.

Since you do all virtual URL parsing in PHP, you can use native PHP debuggers and logging for it. You don’t need to worry about physical files like images, CSS files or static pages, as the .htaccess code above leaves them alone. But you still need to handle “file not found” errors, as explained in the previous section.

Conclusion

There are multiple ways to implement URL rewriting with Apache and PHP, and you need to choose one depending on your project requirements. Having all URL parsing in a single PHP script is the solution I recommend most, as it allows you to implement the most complex, extensible and easy-to-debug URL rewriting logic.


33 Responses  
  • FAQPAL writes:
    December 30th, 2009 7:32 am

    We use .htaccess for all our friendly URL needs. Good post.

  • Amber Weinberg writes:
    December 30th, 2009 8:22 pm

    Hmm, this is an interesting way of doing it. I normally use .htaccess (a bit differently than how you showed above), and I’ve noticed that if you have two URLs with similar keywords, it gets confused and takes you to one page instead of two. For example:

    http://www.site.com/amber-is-cool

    and

    http://www.site.com/amber-is-mean

    would take you to the same page. This can normally be changed, unless a client specifically wants pages with similar names.

    • Anton Oliinyk writes:
      December 31st, 2009 1:59 am

      Amber, can you describe the way you use .htaccess? I’m still researching this subject.
      I think there is nothing wrong with having multiple URL aliases for the same page. But it’s better to make them redirect to the original URL so as not to confuse users or search engines.
      I’m going to write another post on how such a redirect could be implemented with PHP.

  • WebDesignExpert.Me writes:
    December 31st, 2009 5:49 am

    Great article! This can certainly be helpful for users wanting search-engine-friendly URLs on Linux or Unix hosting!

  • Anton Oliinyk writes:
    December 31st, 2009 6:16 pm

    Update: I updated the URI parsing code, as there were some minor problems with it, and added a PHP example for the last method.

  • JFrankParnell writes:
    December 31st, 2009 6:46 pm

    Here is a similar method that uses PHP to do most of the work:
    http://forum.modrewrite.com/viewtopic.php?t=2521

  • Sanakan writes:
    January 2nd, 2010 5:56 am

    Great article, thanks!
    Zend Framework follows this mod_rewrite trick.

    @Anton Oliinyk, http://net.tutsplus.com/tutorials/other/a-deeper-look-at-mod_rewrite-for-apache/ goes deep into it, and every example is very useful.

    (I hate China’s GFW @_@!)

  • Webdesign Expert writes:
    January 14th, 2010 3:05 pm

    It’s quite an interesting article. I’m just curious: how long have you been interested in this subject? I’ve seen many blogs, but yours is really informative.

    • Anton Oliinyk writes:
      January 14th, 2010 5:03 pm

      Not very long, actually. I searched the web for a comprehensive guide but found only partial explanations, so I decided to write a note about it.
      BTW, modern CMSes and frameworks like WordPress or Zend Framework use similar mod_rewrite+PHP solutions, but nobody has written about that.

  • Roch writes:
    June 9th, 2010 2:07 am

    Do you know of any way with .htaccess to stop someone from using your domain to point to their own website on the same server? E.g., they use YOURDOMAIN.com to promote their PHISHING WEBSITE.COM by sending users to this simple URL: YOURDOMAIN.COM/~phishing/file.html

    Any help would be greatly appreciated. Thanks

    • Anton Oliinyk writes:
      June 10th, 2010 11:33 pm

      Looks like a side effect of the mod_userdir Apache module.
      I think it’s not possible to stop this with rewriting, as the request never actually reaches your virtual host.
      I suggest asking your hosting provider to disable mod_userdir, at least for your virtual host, or moving to another host with mod_userdir disabled.

  • Katie @ women magazine writes:
    May 2nd, 2011 3:27 pm

    When I do this, my site goes into an infinite loop looking for subdirectories and never opens the page. What’s wrong?

    • Anton Oliinyk writes:
      May 2nd, 2011 3:34 pm

      Katie, what exactly do you do?
      You can email me sample code and I’ll take a look.

  • Speedt_ouch writes:
    November 17th, 2011 4:25 pm

    Hi,
    Thanks for sharing this.
    I have some questions I hope you can clarify.

    I normally use mod_rewrite with the unique numeric ID of the information I need, in order to query the MySQL database.
    With your example there is no numeric ID.
    Now I’m confused :)
    That is where my questions start.

    Imagine I have products:
    products.php?category=network&product=1
    products.php?category=network&product=2

    How can I make
    /products/network/router.html open products.php?category=network&product=1
    and
    /products/network/cable.html open products.php?category=network&product=2

    Thanks in advance

    • Anton Oliinyk writes:
      November 17th, 2011 5:32 pm

      Hi!
      You’ll have to query the database using the string IDs you use in URLs. Say, you can add a ‘slug’ field to the category and product tables and look up records by that field.

      • Speedt_ouch writes:
        November 17th, 2011 6:09 pm

        Hi,
        Thanks for the reply.
        Could you kindly provide an example, please?

        I would like to keep the “fake URL” without any numeric values.

        • Anton Oliinyk writes:
          November 17th, 2011 8:42 pm

          Say, category table is:
          id, slug, label
          1, appliances, Appliances
          2, network, Network
          3, video, Video

          And product table is:
          id, category_id, slug, label, description
          1, 1, blender, Blender, This is cool blender
          2, 1, water-filter, Water Filter, This is cool water filter
          3, 2, router, D-Link DIR-300, This is not so cool network router

          Now you get a request for /products/network/router.html
          which is rewritten to products.php?category=network&product=router

          Now let’s query product data:

          $rs = $db->query("
          SELECT `product`.* FROM `product`
          JOIN `category` ON `product`.`category_id` = `category`.`id`
          WHERE `product`.`slug` = '{$db->real_escape_string($_GET['product'])}' AND `category`.`slug` = '{$db->real_escape_string($_GET['category'])}'");

          The result set will contain a single record: the product with ID 3.

          Hope that helps.

          • Speedt_ouch writes:
            November 20th, 2011 1:24 am

            Hi.
            So I figure the slug field would be the same thing as the numeric ID field, only it contains words.
            I also need to make sure there are no duplicate slug names.
            I’ll give it a try.

            Thanks for the example.

          • Anton Oliinyk writes:
            November 20th, 2011 5:21 pm

            Yes, you’re right. Some way is definitely needed to match human-friendly URL directory names to database records, and that is what “slug” fields do.
            The only small difference in this particular example is that a product slug has to be unique only within its category, as we use both the category and product slugs to find a product. So there could be products with the same slug in other categories.
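For reference, here is the same lookup written with PDO prepared statements instead of string escaping, run against an in-memory SQLite copy of the tables above. It assumes the pdo_sqlite extension is available; findProductBySlugs() is a hypothetical name. The UNIQUE (category_id, slug) constraint encodes the rule that a product slug only has to be unique within its category.

```php
<?php
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Schema and sample rows from the comment above.
$db->exec('CREATE TABLE category (id INTEGER PRIMARY KEY, slug TEXT NOT NULL UNIQUE, label TEXT)');
$db->exec('CREATE TABLE product (
    id INTEGER PRIMARY KEY, category_id INTEGER NOT NULL, slug TEXT NOT NULL,
    label TEXT, description TEXT,
    UNIQUE (category_id, slug)
)');
$db->exec("INSERT INTO category VALUES
    (1, 'appliances', 'Appliances'), (2, 'network', 'Network'), (3, 'video', 'Video')");
$db->exec("INSERT INTO product VALUES
    (1, 1, 'blender', 'Blender', 'This is cool blender'),
    (2, 1, 'water-filter', 'Water Filter', 'This is cool water filter'),
    (3, 2, 'router', 'D-Link DIR-300', 'This is not so cool network router')");

// Look up a product by category slug and product slug with bound parameters.
function findProductBySlugs(PDO $db, string $category, string $product)
{
    $stmt = $db->prepare('SELECT product.* FROM product
        JOIN category ON product.category_id = category.id
        WHERE product.slug = :product AND category.slug = :category');
    $stmt->execute([':product' => $product, ':category' => $category]);
    return $stmt->fetch(PDO::FETCH_ASSOC); // false when nothing matches
}

$row = findProductBySlugs($db, 'network', 'router');
echo $row['id'], ' ', $row['label'], "\n"; // 3 D-Link DIR-300
```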

  • wolfdogg writes:
    February 20th, 2012 10:19 pm

    Am I missing something here? I’m having better luck with this:

    if(!empty($_SERVER['QUERY_STRING'])){
    list($root,$path) = explode('?', $_SERVER['REQUEST_URI']);

    //rest of code here

    Have they changed the functionality of list()? When I run it as suggested, the script path is already stripped, and when substr() strips it even further, I end up with “ndex.php”.

    See the var dumps for $path and $pathInfo below:

    string(10) "/index.php"
    array(1) { [0]=> string(8) "ndex.php" }

    • Anton Oliinyk writes:
      February 20th, 2012 11:43 pm

      Definitely, nothing changed with list().

      I believe you have to go back to my original code as it is:
      list($path) = explode('?', $_SERVER['REQUEST_URI']);
      As you changed it, the path now actually goes into your $root variable.

  • wolfdogg writes:
    February 20th, 2012 10:34 pm

    I think the reason I was having a hard time was that we’re talking about two different types of URL queries.
    I need a solution for this type of URL:

    wolfdogg.org/?section=iditarod&subject=idit_weather

    I don’t use the word index.php, but I want the code to be compatible with both /? and /index.php?

    And I need to figure out the best way to use those vars. Currently, ‘section’ is the first subdirectory and ‘subject’ is the page, in this case wolfdogg.org/iditarod/idit_weather.php

    Any suggestions on how to adapt the code?

    Currently using:

    RewriteEngine On
    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteRule . index.php [L,QSA]

    list($root,$path) = explode('?', $_SERVER['REQUEST_URI']);
    //Explode path to directories and remove empty items:
    $pathInfo = array();

    foreach (explode('&', $path) as $dir) {
        if (!empty($dir)) {
            $pathInfo[] = urldecode($dir);
        }
    }

    Looks like a good start.
    The var dump looks like this:

    array(2) { [0]=> string(16) "section=iditarod" [1]=> string(20) "subject=idit_weather" }

    What I’m hoping for is to have the URL rewritten to this automatically:

    wolfdogg.org/iditarod/idit_weather

    without changing all my links in the source code.

    • Anton Oliinyk writes:
      February 20th, 2012 11:29 pm

      What I’m hoping for is to have the URL rewritten to this automatically:
      wolfdogg.org/iditarod/idit_weather
      without changing all my links in the source code.

      Rewriting the incoming request URL and generating link URLs for rendering on web pages are two different things (though related, of course).
      I believe there is no automatic way if your URLs are hardcoded. You might filter the HTML output to replace URLs on the fly, but it’s too tricky and will drain system resources.
      I suggest encapsulating URL generation into a function, or better a class, so that you can edit it from a single place in future.

      Learning MVC best practices might also help. I especially like how Zend Framework handles it, as it uses the same set of classes for parsing the incoming request URL and for generating link URLs on pages.
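A minimal sketch of that advice, using wolfdogg’s section/subject example: one function builds clean link URLs, and the front controller can permanently redirect old query-string requests to them, so existing links keep working while the templates are updated. The names urlFor() and legacyRedirectTarget() are made up for this illustration.

```php
<?php
// Single place that knows how clean URLs are built.
function urlFor(string $section, string $subject = ''): string
{
    $url = '/' . rawurlencode($section) . '/';
    if ($subject !== '') {
        $url .= rawurlencode($subject);
    }
    return $url;
}

// For an old-style request like /?section=iditarod&subject=idit_weather,
// return the clean URL to redirect to, or null if it is not a legacy request.
function legacyRedirectTarget(array $query): ?string
{
    if (!isset($query['section']) || !is_string($query['section'])) {
        return null;
    }
    $subject = isset($query['subject']) && is_string($query['subject']) ? $query['subject'] : '';
    return urlFor($query['section'], $subject);
}

$target = legacyRedirectTarget(['section' => 'iditarod', 'subject' => 'idit_weather']);
echo $target, "\n"; // /iditarod/idit_weather
// In index.php: $target = legacyRedirectTarget($_GET);
// if ($target !== null) { header('Location: ' . $target, true, 301); exit; }
```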

  • alexey majidian writes:
    July 27th, 2012 4:12 pm

    Hello, my good friends!

    I used this code in my .htaccess file:

    RewriteEngine On
    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteRule . index.php [L,QSA]

    but this error appeared (500 Internal Server Error):

    "Internal Server Error

    The server encountered an internal error or misconfiguration and was unable to complete your request.

    Please contact the server administrator, webmaster@gmail.com and inform them of the time the error occurred, and anything you might have done that may have caused the error.

    More information about this error may be available in the server error log.
    Apache/2.2.4 (Win32) PHP/5.2.1 Server at localhost Port 80"

    Please explain why.

    Could you send me the full code without errors?

    Thanks a lot!

    • Anton Oliinyk writes:
      July 27th, 2012 6:28 pm

      Alexey, normally it should work fine. Something is wrong with your Apache setup; for example, mod_rewrite may be disabled, or something like that.
      You have to find the error description in the Apache error log, or ask hosting support if it’s a shared hosting server.

  • telugu cinema news writes:
    August 19th, 2012 2:01 pm

    I want a WordPress .htaccess design.

    • Anton Oliinyk writes:
      August 19th, 2012 3:56 pm

      Can you explain more?
      What do you want to implement and what’s your problem?

  • Fred Veenstra writes:
    October 30th, 2012 4:39 pm

    At last… a useful explanation that takes you by the hand and also deals with the relative-URL issue. Many thanks.

    Greetings from the Netherlands.

    • Anton Oliinyk writes:
      October 30th, 2012 4:58 pm

      You’re welcome, Fred :)
      Hup Holland hup!

  • thuc101 writes:
    January 27th, 2013 7:32 pm

    Good article!!!

  • mario writes:
    April 19th, 2013 2:08 pm

    Great article.

    But what about multilingual sites?
    Let’s say the first part is the language code, unless it’s the default language. So I already have two different cases:

    abc.com/de/aaa/bbb/ccc

    1) Check if the first element is a language code. If so, ignore it, use the second element as the first one and proceed…

    2) No language code is set (because the default language is used): continue as in your example.

    Or is it better to have the language code as the last parameter, like:

    abc.com/aaa/bbb/ccc/ddd?lang=de

    What is best for SEO?

    • Anton Oliinyk writes:
      April 19th, 2013 6:23 pm

      I think using virtual folders for languages is better for SEO as search engines may not recognize URL parameters.
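A sketch of mario’s first option, an optional language folder at the start of the path: if the first segment is one of the supported language codes, it is removed and remembered; otherwise the default language applies and parsing continues as in the article. The supported-language list, the “en” default and the function name splitLanguage() are assumptions for illustration.

```php
<?php
// Strip an optional leading language code from the parsed path segments.
// Returns [language, remaining segments].
function splitLanguage(array $pathInfo, array $supported, string $default): array
{
    if (count($pathInfo) > 0 && in_array($pathInfo[0], $supported, true)) {
        return [$pathInfo[0], array_slice($pathInfo, 1)];
    }
    return [$default, $pathInfo];
}

list($lang, $rest) = splitLanguage(['de', 'aaa', 'bbb', 'ccc'], ['en', 'de', 'fr'], 'en');
echo $lang, ': ', implode('/', $rest), "\n"; // de: aaa/bbb/ccc
```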

  • chairul anwar writes:
    April 19th, 2014 2:20 am

    Can you help me?
    I have an .htaccess like this:
    RewriteRule ^index.html$ /index.php [QSA]
    RewriteRule ^pdf/.* /a-single.php [QSA]
    RewriteRule ^ebook/.* /a-single-e.php [QSA]

    Output:
    /pdf/post-title-id.pdf
    /ebook/post-title-id.pdf
    and I want to change those to:
    post-title-id.pdf
    post-title-id.pdf

    Help me, please.

