Rewriting for SEO-Friendly URLs: .htaccess or PHP?

Modern database-driven web sites often use SEO-friendly URLs that emulate static directories and files. Switching to such “clean” URLs helps search engines index the site, makes URLs friendlier to users and hides the server-side language. For example, this clean URL may refer to a page in a product directory:

http://somesite.com/products/network/router.html

In fact, there is no /products/network folder on the server and no router.html file at all. The page is generated by a server-side script that queries the database for the “network” product category and the “router” product. But what calls the script, and where does it get the query parameter values?

This technique is usually referred to as “URL rewriting”. It lets the web server work out what information was requested by parsing the URL string. Apache and PHP offer several ways to implement URL rewriting. So which one is the best?

Configuring mod_rewrite via .htaccess file

This is perhaps the most common way to implement rewriting, especially when upgrading legacy web sites.
Suppose we already have a products.php script that takes category and product parameters from the $_GET array. We just need to convert the request URI invisibly to the user:

/products/network/router.html => /products.php?category=network&product=router

Apache already has a built-in URL rewriting engine: mod_rewrite. It lets you specify rules based on regular expressions for URL parsing, transformation and even redirection. You just need to create or modify the .htaccess file to use mod_rewrite:

RewriteEngine On
RewriteRule ^products/(\w+)/(\w+)\.html$ products.php?category=$1&product=$2 [L]

Now the script can keep using the $_GET array to get the category and product name as if it had been called with a dynamic URL, so no changes to the script code are required.

Oops! What happened to my CSS, JS, images and relative links? Don’t worry, I explained the problem and solutions in my post SEO-Friendly URLs and Relative Links.

While mod_rewrite is a very easy solution, it may cause problems as the rewrite rules grow more complex:

  • It’s very hard to debug .htaccess code.
  • Extended regular expression syntax may be incompatible with old Apache versions. I noticed some problems with GoDaddy shared hosting, which actually uses Apache v1.3.
  • You may also want to automatically correct user typos in URLs against database contents, which is almost impossible with mod_rewrite alone. (But you can still try mod_speling.)
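
One way to soften the debugging problem is to try a rule's pattern in plain PHP before putting it into .htaccess. The sketch below is only an approximation: it assumes PHP 7.1+ and an Apache 2.x server where mod_rewrite uses PCRE, and the function name simulateProductRewrite is made up for this example. It mirrors the products rule from above and does not talk to Apache at all.

```php
<?php
// Hypothetical helper (not part of Apache or PHP): mirrors the article's
// RewriteRule pattern so it can be tried against sample paths in plain PHP.
function simulateProductRewrite(string $path): ?string
{
    // In .htaccess context the pattern is matched without the leading slash.
    $path = ltrim($path, '/');
    if (preg_match('#^products/(\w+)/(\w+)\.html$#', $path, $m)) {
        return 'products.php?category=' . $m[1] . '&product=' . $m[2];
    }
    return null; // the rule would not fire for this path
}

var_dump(simulateProductRewrite('/products/network/router.html'));
var_dump(simulateProductRewrite('/products/network/water-filter.html'));
```

The second call returns null because \w does not match a hyphen, so slugs like water-filter would need a wider character class such as [\w-]+ in the rule.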

All those problems can be resolved by moving the URI parsing logic into PHP code, which allows more complex rewrite rules and debugging with native PHP tools.

Parsing REQUEST_URI by PHP code

Apache web server also allows you to use URLs like this one:

http://somesite.com/products.php/network/router.html

Apache will call the products.php script and will not try to resolve the remainder of the path as a file. The script can get it by parsing $_SERVER['REQUEST_URI']:

//Remove request parameters:
list($path) = explode('?', $_SERVER['REQUEST_URI']);
//Remove script path:
$path = substr($path, strlen($_SERVER['SCRIPT_NAME'])+1);
//Explode path to directories and remove empty items:
$pathInfo = array();
foreach (explode('/', $path) as $dir) {
    if ($dir !== '') { //empty() would also drop a literal "0" segment
        $pathInfo[] = urldecode($dir);
    }
}
if (count($pathInfo) > 0) {
    //Remove file extension from the last element:
    $last = $pathInfo[count($pathInfo)-1];
    list($last) = explode('.', $last);
    $pathInfo[count($pathInfo)-1] = $last;
}

Now the $pathInfo variable contains the elements of the remainder path. You can use them as database query parameters.
But what if they are invalid? In that case you need to raise a “file not found” error from the PHP script. For example:

if (count($pathInfo) < 2) {
    header('HTTP/1.0 404 Not Found');
    exit;
}

Alternatively, you can redirect to an error page.
I also recommend using structured exception handling to catch “path not found” exceptions that could be raised deep in your code.
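
As a rough illustration of that idea, the sketch below throws a dedicated exception from a lookup function and turns it into a 404 status in one place. The class name PathNotFoundException, the functions findProduct and handleRequest, and the sample data are all assumptions made for this example; in a real site the lookup would be a database query.

```php
<?php
// Hypothetical exception type; the name is an assumption for this sketch.
class PathNotFoundException extends Exception {}

// Hypothetical lookup standing in for a database query deep in the code.
function findProduct(array $pathInfo): array
{
    $known = ['network' => ['router' => 'D-Link DIR-300']]; // sample data only
    if (count($pathInfo) < 2 || !isset($known[$pathInfo[0]][$pathInfo[1]])) {
        throw new PathNotFoundException('No product for: ' . implode('/', $pathInfo));
    }
    return ['category' => $pathInfo[0], 'label' => $known[$pathInfo[0]][$pathInfo[1]]];
}

// Returns the HTTP status the front script should send for this path.
function handleRequest(array $pathInfo): int
{
    try {
        findProduct($pathInfo);
        return 200;
    } catch (PathNotFoundException $e) {
        return 404; // one place to decide how "not found" is reported
    }
}

echo handleRequest(['network', 'router']), "\n"; // 200
echo handleRequest(['network', 'switch']), "\n"; // 404
```

The calling script can then send the 404 header (or a redirect) based on the returned status, so the code that detects a bad path does not need to know how the error is presented.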

Notice that PHP allows read/write access to the $_GET array, and you can use this to make legacy code work without changes:

$_GET['category']   = $pathInfo[0];
$_GET['product']    = $pathInfo[1];

But how do you get rid of that “.php” in the URL? You can rename the “products.php” file to “products” (without an extension) and modify the .htaccess file to tell Apache that “products” is actually a PHP script:

<FilesMatch "^products$">
    ForceType application/x-httpd-php
</FilesMatch>

Hmm, I don’t like this solution much, but this way URLs like http://somesite.com/products/network/router.html will work.

At some point you may want to get rid of that “products” directory to make URLs shorter, like http://somesite.com/network/router.html. You may also want to have other directories like news, blog, etc. on the same site.

Can we parse all virtual URLs in the same PHP script? Sure, we can do that!

Combining powers of mod_rewrite and PHP

The best way to implement SEO-friendly URLs is to combine mod_rewrite and PHP. This way you gain full control over URL rewriting with the full power of the PHP language.

You just need very simple code in the .htaccess file:

RewriteEngine On
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule . index.php [L,QSA]

The code has proven compatible with older Apache versions such as the one on GoDaddy shared hosting.

Now every request for a virtual URI is processed in index.php. You can get the requested URI from $_SERVER['REQUEST_URI'] and parse it with code almost the same as above:

//Remove request parameters:
list($path) = explode('?', $_SERVER['REQUEST_URI']);
//Remove the script's base directory (empty when index.php is in the document root):
$path = substr($path, strlen(rtrim(dirname($_SERVER['SCRIPT_NAME']), '/\\')));
//Explode path to directories and remove empty items:
$pathInfo = array();
foreach (explode('/', $path) as $dir) {
    if ($dir !== '') { //empty() would also drop a literal "0" segment
        $pathInfo[] = urldecode($dir);
    }
}
if (count($pathInfo) > 0) {
    //Remove file extension from the last element:
    $last = $pathInfo[count($pathInfo)-1];
    list($last) = explode('.', $last);
    $pathInfo[count($pathInfo)-1] = $last;
}
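
Since both parsing snippets share the same steps, they can be wrapped in one function that is easier to test on its own. This is only a sketch assuming PHP 7.0+; the function name parsePathInfo and its $basePath parameter are invented here. $basePath is the URL prefix to strip, for example '/products.php' for the earlier variant or an empty string when index.php sits in the document root.

```php
<?php
// Hypothetical wrapper around the article's parsing steps.
function parsePathInfo(string $requestUri, string $basePath = ''): array
{
    // Drop the query string.
    $path = explode('?', $requestUri, 2)[0];
    // Strip the base prefix only when the path really starts with it.
    if ($basePath !== '' && ($path === $basePath || strpos($path, $basePath . '/') === 0)) {
        $path = substr($path, strlen($basePath));
    }
    // Split into segments, skipping empty ones but keeping a literal "0".
    $segments = [];
    foreach (explode('/', (string) $path) as $segment) {
        if ($segment !== '') {
            $segments[] = urldecode($segment);
        }
    }
    // Remove the file extension from the last segment, as the article does.
    if ($segments) {
        $last = count($segments) - 1;
        $segments[$last] = explode('.', $segments[$last])[0];
    }
    return $segments;
}

print_r(parsePathInfo('/products.php/network/router.html?ref=1', '/products.php'));
print_r(parsePathInfo('/network/router.html'));
```

In index.php it could be called as parsePathInfo($_SERVER['REQUEST_URI'], rtrim(dirname($_SERVER['SCRIPT_NAME']), '/\\')), which keeps the server-specific part out of the function.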

This way you can implement quite sophisticated logic to provide smart, short and flexible document naming schemes on your web site. For example, you can make all these (and many similar) URLs refer (or better, redirect) to the same page:

  • http://somesite.com/products/network/router.html
  • http://somesite.com/products-network/router/
  • http://somesite.com/networks/router/
  • http://somesite.com/router

At the same time, the code can recognize that http://somesite.com/20091010/router/ refers to a news article just because there is a corresponding record in the news table.
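
A rough sketch of such logic is shown below. It assumes PHP 7.1+, uses plain arrays in place of the product and news tables, and assumes product slugs are unique across the whole site; the function name resolveRoute and the sample records are invented for illustration.

```php
<?php
// Sample data standing in for database tables; values are illustrative only.
$products = ['router' => ['category' => 'network']];
$news     = ['20091010/router' => 'Router announcement'];

// Hypothetical resolver: returns the content type and the canonical URL,
// or null when nothing matches (the caller should then send a 404).
function resolveRoute(array $pathInfo, array $products, array $news): ?array
{
    if (!$pathInfo) {
        return null;
    }
    // A news record wins when the full path matches one.
    $key = implode('/', $pathInfo);
    if (isset($news[$key])) {
        return ['type' => 'news', 'canonical' => '/' . $key . '/'];
    }
    // Otherwise treat the last segment as a product slug.
    $slug = $pathInfo[count($pathInfo) - 1];
    if (isset($products[$slug])) {
        $category = $products[$slug]['category'];
        return ['type' => 'product', 'canonical' => "/products/$category/$slug.html"];
    }
    return null;
}

$route = resolveRoute(['networks', 'router'], $products, $news);
echo $route['canonical'], "\n"; // /products/network/router.html
```

When the requested path differs from the canonical one, the script can answer with a permanent redirect to the canonical URL instead of serving the same page under several addresses.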

You can also combine URL parsing logic with content negotiation logic recognizing client’s user-agent.

Since you do all virtual URL parsing in PHP, you can use native PHP debuggers and logging for it. You don’t need to worry about physical files like images, CSS or static pages, as the .htaccess code above leaves them alone. But you still need to handle file-not-found errors as explained in the previous section.

Conclusion

There are multiple ways to implement URL rewriting with Apache and PHP, and you need to choose one depending on your project requirements. Having all URL parsing in a single PHP script is the solution I recommend most, as it lets you implement the most complex, extensible and easy-to-debug URL rewriting logic.

35 thoughts on “Rewriting for SEO-Friendly URLs: .htaccess or PHP?”

    • Amber, can you describe the way you use htaccess? I’m still researching this subject.
      I think there is nothing bad in having multiple URL aliases for the same page. But it’s better to make them redirect to the original URL so as not to confuse users or search engines.
      I’m going to write another post on how such a redirect could be implemented with PHP.

    • Not so much time, actually. I searched the web for some comprehensive guide but found only partial explanations. So, I decided to write some note about that.
      BTW, modern CMSs and frameworks like WordPress or Zend Framework use similar mod_rewrite+PHP solutions but nobody wrote about that.

  1. Do you know of anyway with htaccess to disable someone from using your domain to point to their own website on the same server? Ex: they use YOURDOMAIN.com to promote their PHISHING WEBSITE.COM by using this simple URL to send users : YOURDOMAIN.COM/~phishing/file.html

    Any help would be greatly appreciated. Thanks

    • Looks like a side-effect of mod_userdir Apache module.
      I think it’s not possible to stop this with rewriting, as the request never actually reaches your virtual host.
      I suggest asking your hosting provider to disable mod_userdir at least for your virtual host, or moving to another host with mod_userdir disabled.

  2. Hi,
    Thanks for sharing this.
    I have some questions I hope you can clarify.

    I normally use mod_rewrite with the unique numeric ID of the information I need, in order to query the MySQL database.
    With your example there is no numeric ID.
    Now I’m confused :)
    There is where my questions start.

    Imagine I have a product
    products.php?category=network&product=1
    products.php?category=network&product=2

    How can I make
    /products/network/router.html open products.php?category=network&product=1
    and
    /products/network/cable.html open products.php?category=network&product=2

    Thanks in advance

      • Hi,
        Thanks for the reply.
        Could you kindly provide an example please?

        I would like to keep the “fake URL” with out any numeric values.

        • Say, category table is:
          id, slug, label
          1, appliances, Appliances
          2, network, Network
          3, video, Video

          And product table is:
          id, category_id, slug, label, description
          1, 1, blender, Blender, This is cool blender
          2, 1, water-filter, Water Filter, This is cool water filter
          3, 2, router, D-Link DIR-300, This is not so cool network router

          Now you get request for /products/network/router.html
          which is rewritten to products.php?category=network&product=router

          Now let’s query product data:

          $rs = $db->query("
          SELECT `product`.* FROM `product`
          JOIN `category` ON `product`.`category_id` = `category`.`id`
          WHERE `product`.`slug` = '{$db->real_escape_string($_GET['product'])}' AND `category`.`slug` = '{$db->real_escape_string($_GET['category'])}'");

          The result set will contain single record for the product with ID 3.

          Hope that helps.

          • Hi.
            So I figure the slug field would be the same thing as the numeric ID field, only that it contains words.
            Making sure there are no duplicate slug names also.
            I’ll give it a try.

            Thanks for the example.

          • Yes, you’re right. Definitely some way is needed to establish a match between human-friendly URL directory names and database records. So “slug” fields do.
            The only small difference in this particular sample case is that a product slug has to be unique within its category only, as we use both category and product slugs to find a product. So there could be products with the same slug in other categories.

  3. am i missing something here, because im having better luck with this

    if(!empty($_SERVER['QUERY_STRING'])){
    list($root,$path) = explode('?', $_SERVER['REQUEST_URI']);

    //rest of code here

    have they changed the functionality of list()? because when i run it as suggested the script path is already stripped, and when the substr strips it even further, i end up with “ndex.php”

    see var dumps for $path and $pathInfo below

    string(10) “/index.php”
    array(1) { [0]=> string(8) “ndex.php” }

    • Definitely, nothing changed with list().

      I believe, you have to go back to my original code as it is:
      list($path) = explode('?', $_SERVER['REQUEST_URI']);
      With your change, the path now actually goes to your $root variable.

  4. i think the reason i was having hard time was because were talking about 2 different types of url querys,
    i need a solution for this type of url

    wolfdogg.org/?section=iditarod&subject=idit_weather

    i dont use the work index.php, but i want the code to be cross compatible for both /? and /index.php?

    and i need to figure out a system whats the best way to utilize those vars. currently, ‘section’ is the first subdirectory and ‘subject’ is the page , in this case, wolfdogg.org/iditarod/idit_weather.php

    any suggestions on how to modify the code to adapt?

    curently using

    RewriteEngine On
    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteRule . index.php [L,QSA]

    list($root,$path) = explode('?', $_SERVER['REQUEST_URI']);
    //Explode path to directories and remove empty items:
    $pathInfo = array();

    foreach (explode('&', $path) as $dir) {
    if (!empty($dir)) {
    $pathInfo[] = urldecode($dir);
    }
    }

    looks like a good start
    var dump looks like this

    array(2) { [0]=> string(16) “section=iditarod” [1]=> string(20) “subject=idit_weather” }

    what im hoping to have is the url rewritten to this automatically

    wolfdogg.org/iditarod/idit_weather

    without changing all my links in the source code

    • what im hoping to have is the url rewritten to this automatically
      wolfdogg.org/iditarod/idit_weather
      without changing all my links in the source code

      Rewriting incoming request URL and generating link URLs for rendering on web pages are two different things (yet related of course).
      I believe there is no automatic way if you have URLs hardcoded. You might filter the HTML output to replace URLs on the fly, but that is tricky and will drain system resources.
      I suggest encapsulating URL generation into a function or better a class so that you can edit it from single place in future.

      Learning MVC best practices also might help. I especially like how Zend Framework handles it as it uses the same class set for parsing incoming request URL as well as for generating URLs of links on pages.

  5. Hello my good friends !

    I used these codes in file .htaccess

    RewriteEngine On
    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteRule . index.php [L,QSA]

    but this error appeared (500 internal server error)

    ” Internal Server Error

    The server encountered an internal error or misconfiguration and was unable to complete your request.

    Please contact the server administrator, webmaster@gmail.com and inform them of the time the error occurred, and anything you might have done that may have caused the error.

    More information about this error may be available in the server error log.
    Apache/2.2.4 (Win32) PHP/5.2.1 Server at localhost Port 80 ”

    explain me why

    would you may send me full code with out errors?

    thanks a lot !

    • Alexey, normally it should work fine. Something is wrong with your Apache setup. For example, it may have mod_rewrite disabled or something like that.
      You need to find the error description in the Apache error log, or ask hosting support if it’s a shared hosting server.

  6. great article.

    But how about multi lang stuff?
    Let say first part is language code, if not default language. So I already have 2 different cases.

    abc.com/de/aaa/bbb/ccc

    1)check if first element is lang code. If so, ignore lang code and use second element as first element and proceed…

    2)no lang code set (because default language is used) continue like in your example.

    Or better language code as last parameter like:

    abc.com/aaa/bbb/ccc/ddd?lang=de

    what is best for SEO?

  7. can you help me.
    i have .htaccess like
    RewriteRule ^index.html$ /index.php [QSA]
    RewriteRule ^pdf/.* /a-single.php [QSA]
    RewriteRule ^ebook/.* /a-single-e.php [QSA]

    output:
    /pdf/post-title-id.pdf
    /ebook/post-title-id.pdf
    and i want to change those to
    post-title-id.pdf
    post-title-id.pdf

    help me please

  8. Error if you click on the url of this type with special character site.com/>

    Forbidden

    You don’t have permission to access /> on this server.

    Apache/2.2.23 (Win32) PHP/5.3.18 Server at site.com Port 80

    • Hi, Yuri!

      Perhaps, something like mod_security is blocking it as suspicious URL. I suggest you check server’s error log.
      Not relevant to the post’s subject I believe unless I’m missing something.
