Modern database-driven web sites implement SEO-friendly URLs that emulate static directories and files. Switching to such “clean” URLs helps search engines index the site, makes URLs more user-friendly and hides the server-side language. For example, this clean URL may refer to a page in some product directory:
http://somesite.com/products/network/router.html
In fact, there is no /products/network folder on the server and no router.html file at all. The page is generated by a server script that queries the database for the “network” product category and the “router” product. But who calls the script, and where does it get the query parameter values?
This technique is usually referred to as “URL rewriting”. It allows the web server to recognize what information was requested by parsing the URL string. Apache and PHP offer multiple options to implement URL rewriting. So which one is the best?
Configuring mod_rewrite via .htaccess file
This is perhaps the most widely used way to implement rewriting, especially when upgrading legacy web sites.
Suppose we already have a products.php script that takes the category and product parameters from the $_GET array. We just need to convert the request URI invisibly to the user:
/products/network/router.html => /products.php?category=network&product=router
Apache already has a built-in URL rewriting engine: mod_rewrite. It lets you specify rules based on regular expressions for URL parsing, transformation and even redirection. You just need to create or modify the .htaccess file to use mod_rewrite:
RewriteEngine On
RewriteRule ^products/(\w+)/(\w+)\.html$ products.php?category=$1&product=$2 [L]
Now the script can continue using the $_GET array to get the category and product name as if it had been called with a dynamic URL, and no modification of the script code is required.
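If you want to experiment with the pattern outside Apache, the mapping can be imitated in a few lines of PHP. This is only a sketch: the function name rewriteProductUrl is hypothetical, the pattern is anchored at the end with `$` as an assumption, and real mod_rewrite does far more than this.

```php
<?php
// Hypothetical helper that mirrors the rewrite rule so it can be tried without Apache.
// In .htaccess context the leading slash is already stripped from the path.
function rewriteProductUrl(string $relativePath): ?string
{
    if (preg_match('#^products/(\w+)/(\w+)\.html$#', $relativePath, $m)) {
        // $m[1] and $m[2] play the role of $1 and $2 in the RewriteRule.
        return 'products.php?category=' . $m[1] . '&product=' . $m[2];
    }
    return null; // no match: the URL would be left untouched
}
```

Calling rewriteProductUrl('products/network/router.html') should give the same target as the rule, while a path such as images/logo.png is not matched at all.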
Oops! What happened to my CSS, JS, images and relative links?! Don’t worry, I explained the problem and solutions in my post SEO-Friendly URLs and Relative Links.
While using mod_rewrite is a very easy solution, it may introduce some problems as the rewrite rules grow more complex:
- It’s very hard to debug .htaccess code.
- Extended regular expression syntax may be incompatible with old Apache versions. I noticed some problems with GoDaddy shared hosting, which actually uses Apache v1.3.
- You may also want to automatically correct user typos in URLs against database contents, which is almost impossible with mod_rewrite alone. (But you can still try mod_speling.)
All these problems can be resolved by moving the URI parsing logic to PHP code, which allows more complex rewrite rules and debugging with native PHP tools.
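As an illustration of what becomes possible in PHP, here is a minimal sketch of typo correction. The function closestSlug and the $knownSlugs array are hypothetical; the array stands in for values you would load from the database, and the distance limit of 2 is an arbitrary assumption.

```php
<?php
// Hypothetical helper: pick the known name closest to what the user typed.
// $knownSlugs stands in for values loaded from the database.
function closestSlug(string $typed, array $knownSlugs, int $maxDistance = 2): ?string
{
    $best = null;
    $bestDistance = $maxDistance + 1;
    foreach ($knownSlugs as $slug) {
        // levenshtein() counts the single-character edits between two strings.
        $distance = levenshtein(strtolower($typed), strtolower($slug));
        if ($distance < $bestDistance) {
            $best = $slug;
            $bestDistance = $distance;
        }
    }
    return $best; // null when nothing is close enough
}
```

With the names router, switch and modem, the misspelled “ruoter” resolves to “router”, and you can then redirect the user to the corrected URL.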
Parsing REQUEST_URI by PHP code
Apache web server also allows you to use URLs like this one:
http://somesite.com/products.php/network/router.html
Apache will call the products.php script and ignore the remainder of the path. The script can get it by parsing $_SERVER['REQUEST_URI']:
//Remove request parameters:
list($path) = explode('?', $_SERVER['REQUEST_URI']);
//Remove script path:
$path = substr($path, strlen($_SERVER['SCRIPT_NAME'])+1);
//Explode path to directories and remove empty items:
$pathInfo = array();
foreach (explode('/', $path) as $dir) {
    //Compare with '' because empty() would also drop a segment named "0":
    if ($dir !== '') {
        $pathInfo[] = urldecode($dir);
    }
}
if (count($pathInfo) > 0) {
    //Remove file extension from the last element:
    $last = $pathInfo[count($pathInfo)-1];
    list($last) = explode('.', $last);
    $pathInfo[count($pathInfo)-1] = $last;
}
Now the $pathInfo variable contains the elements of the remaining path. You can use them as database query parameters.
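If you prefer something you can test without a web server, the same steps can be wrapped in a function. This is a sketch only: the name splitRequestPath and its $prefix parameter are assumptions, where $prefix is the part of the URL that belongs to the script itself (for example "/products.php").

```php
<?php
// Hypothetical wrapper around the parsing steps; not part of the original script.
function splitRequestPath(string $requestUri, string $prefix): array
{
    // Drop the query string, then the script prefix.
    $path = explode('?', $requestUri, 2)[0];
    if ($prefix !== '' && strpos($path, $prefix) === 0) {
        $path = (string) substr($path, strlen($prefix));
    }
    // Split into segments, skipping empty ones left by extra slashes.
    $segments = array();
    foreach (explode('/', $path) as $dir) {
        if ($dir !== '') {
            $segments[] = urldecode($dir);
        }
    }
    // Strip the extension from the last segment.
    if (count($segments) > 0) {
        $lastIndex = count($segments) - 1;
        $segments[$lastIndex] = explode('.', $segments[$lastIndex], 2)[0];
    }
    return $segments;
}
```

For instance, splitRequestPath('/products.php/network/router.html?ref=1', '/products.php') yields the two elements “network” and “router”.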
But what if the path elements are invalid? In that case you need to raise a “file not found” error from the PHP script. For example:
if (count($pathInfo) < 2) {
    header('HTTP/1.0 404 Not Found');
    exit;
}
Alternatively, you can fire a redirect to some error page.
I also recommend using structured exception handling to catch “path not found” exceptions that could be raised deep in your code.
Note that PHP allows read/write access to the $_GET array, and you can use this to make legacy code work without changes:
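One possible shape for such error handling is sketched below. Everything here is hypothetical: the PageNotFoundException class, the findProduct and renderProduct functions, and the $catalog array that stands in for a database query.

```php
<?php
// Hypothetical exception type for "no such page" conditions.
class PageNotFoundException extends RuntimeException {}

// Hypothetical lookup; $catalog stands in for a database query.
function findProduct(array $catalog, string $category, string $product): array
{
    if (!isset($catalog[$category][$product])) {
        throw new PageNotFoundException("No product '$product' in '$category'");
    }
    return $catalog[$category][$product];
}

// Top-level handler: turns the exception into a 404 response.
function renderProduct(array $catalog, array $pathInfo): string
{
    try {
        $item = findProduct($catalog, $pathInfo[0] ?? '', $pathInfo[1] ?? '');
        return 'Product: ' . $item['title'];
    } catch (PageNotFoundException $e) {
        http_response_code(404);
        return 'Page not found';
    }
}
```

The point of the design is that code deep inside the lookup only throws, and a single place near the top decides how a missing page is reported.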
$_GET['category'] = $pathInfo[0];
$_GET['product'] = $pathInfo[1];
But how do you get rid of that “.php” in the URL? You can do this by renaming the “products.php” file to “products” (without an extension) and modifying the .htaccess file to tell Apache that “products” is actually a PHP script:
<FilesMatch "^products$">
ForceType application/x-httpd-php
</FilesMatch>
Hmm... I don’t like this solution much, but this way URLs like http://somesite.com/products/network/router.html will work OK.
At some point you may want to get rid of that “products” directory to make URLs shorter, like http://somesite.com/network/router.html. You may also want to have other directories like news, blog, etc. on the same site.
Can we parse all virtual URLs in the same PHP script? Sure, we can do that!
Combining powers of mod_rewrite and PHP
The best way to implement SEO-friendly URLs is to combine the power of mod_rewrite and PHP. This way you gain full control over URL rewriting with the full power of the PHP language.
You just need some very simple code in the .htaccess file:
RewriteEngine On
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule . index.php [L,QSA]
This code has proven compatible with older Apache versions like the one on GoDaddy shared hosting.
Now every request for a virtual URI is processed in the index.php file. You can get the requested URI from $_SERVER['REQUEST_URI'] and parse it with almost the same code as above:
//Remove request parameters:
list($path) = explode('?', $_SERVER['REQUEST_URI']);
//Remove the script's base directory (empty when index.php is in the document root):
$base = rtrim(dirname($_SERVER['SCRIPT_NAME']), '/\\');
$path = substr($path, strlen($base)+1);
//Explode path to directories and remove empty items:
$pathInfo = array();
foreach (explode('/', $path) as $dir) {
    if ($dir !== '') {
        $pathInfo[] = urldecode($dir);
    }
}
if (count($pathInfo) > 0) {
    //Remove file extension from the last element:
    $last = $pathInfo[count($pathInfo)-1];
    list($last) = explode('.', $last);
    $pathInfo[count($pathInfo)-1] = $last;
}
This way you can implement quite sophisticated logic to provide smart, short and flexible document naming schemes on your web site. For example, you can make all these (and many similar) URLs refer (or better, redirect) to the same page:
- http://somesite.com/products/network/router.html
- http://somesite.com/products-network/router/
- http://somesite.com/networks/router/
- http://somesite.com/router
At the same time, the code can recognize that http://somesite.com/20091010/router/ refers to a news article just because there is a corresponding record in the news table.
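A rough sketch of such a resolver is shown below. The function resolveSegments and both arrays are hypothetical: $products maps a product name to its canonical URL and $news is keyed by date, each standing in for a database table.

```php
<?php
// Hypothetical resolver: $products and $news stand in for database tables.
// Returns array(type, target), or null when nothing matches.
function resolveSegments(array $segments, array $products, array $news): ?array
{
    if (count($segments) === 0) {
        return null;
    }
    $last = $segments[count($segments) - 1];
    // A leading YYYYMMDD segment with a matching news record wins.
    if (count($segments) === 2 && preg_match('/^\d{8}$/', $segments[0])
        && isset($news[$segments[0]][$last])) {
        return array('news', $segments[0] . '/' . $last);
    }
    // Otherwise the last segment alone identifies the product.
    if (isset($products[$last])) {
        return array('product', $products[$last]);
    }
    return null;
}
```

When the result is a product whose canonical URL differs from the one requested, the caller can send a redirect to the canonical URL; a null result leads to the “file not found” handling.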
You can also combine the URL parsing logic with content negotiation logic that recognizes the client’s user agent.
Since you do all virtual URL parsing in PHP, you can use native PHP debuggers and logging for it. You don’t need to care about physical files like images, CSS or static pages, as the .htaccess code above leaves them alone. But you still need to handle “file not found” errors as explained in the previous section.
Conclusion
There are multiple ways to implement URL rewriting with Apache and PHP, and you need to choose one depending on your project requirements. Having all URL parsing in the same PHP script is the solution I recommend most, as it lets you implement the most complex, extensible and easy-to-debug URL rewriting logic.