Rewriting for SEO-Friendly URLs: .htaccess or PHP?

December 30, 2009 (updated October 7, 2010) by Anton Oliinyk

Modern database-driven web sites implement SEO-friendly URLs that emulate static directories and files. Switching to such “clean” URLs helps search engines index the site, makes URLs more user-friendly and hides the server-side language. For example, this clean URL may refer to a page in some product directory:

http://somesite.com/products/network/router.html

In fact, there is no /products/network folder on the server and no router.html file at all. The page is generated by a server script using a database query for the “network” product category and the “router” product. But who calls the script, and where does it get the query parameter values?

This technique is usually referred to as “URL rewriting”. It lets the web server recognize what information was requested by parsing the URL string. Apache and PHP offer several ways to implement URL rewriting. So which one is the best?

Configuring mod_rewrite via .htaccess file

This is perhaps the most widely used way to implement rewriting, especially when upgrading legacy web sites.
Suppose we already have a products.php script that takes the category and product parameters from the $_GET array. We just need to convert the request URI invisibly to the user:

/products/network/router.html => /products.php?category=network&product=router

Apache already has a built-in URL rewriting engine: mod_rewrite. It lets you specify rules based on regular expressions for URL parsing, transformation and even redirection. You just need to create or modify the .htaccess file to use mod_rewrite:

RewriteEngine On
RewriteRule ^products/(\w+)/(\w+)\.html$ products.php?category=$1&product=$2 [L]

Well, now the script can continue using the $_GET array to get the category and product name as if it were called with a dynamic URL, and no modification to the script code is required.
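To see which paths a rule of this shape accepts, it can help to imitate it in PHP. The snippet below is only a sketch: rewriteProductUrl() is a hypothetical helper, not part of mod_rewrite, and it assumes a pattern anchored at both ends and PHP 7.1 or newer.

```php
<?php
// Hypothetical imitation of the rewrite rule, for experimenting with the pattern.
// Takes a path without the leading slash, as mod_rewrite sees it in .htaccess.
function rewriteProductUrl(string $path): ?string
{
    if (preg_match('#^products/(\w+)/(\w+)\.html$#', $path, $m)) {
        return "products.php?category={$m[1]}&product={$m[2]}";
    }
    return null; // the rule would not fire for this path
}

var_dump(rewriteProductUrl('products/network/router.html'));
var_dump(rewriteProductUrl('products/network/water-filter.html'));
```

Note that \w does not match a hyphen, so a name like water-filter falls through. If your names contain hyphens, a character class such as [\w-]+ is one option.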

Oops! What happened with my CSSs, JSs, images and relative links??! Don’t worry, I explained the problem and solutions in my post SEO-Friendly URLs and Relative Links.

While using mod_rewrite is a very easy solution, it may introduce some problems as the rewrite rules grow more complex:

It’s very hard to debug .htaccess code.
Extended regular expression syntax may be incompatible with old Apache versions. I noticed some problems with GoDaddy shared hosting, which actually uses Apache v1.3.
You may also want to automatically correct user typos in URLs against the database contents, which is almost impossible with mod_rewrite alone. (But you can still try mod_speling.)

All those problems can be resolved by moving the URI parsing logic into PHP code, which allows more complex rewrite rules and debugging with native PHP tools.
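As an illustration of the typo correction mentioned above, here is a minimal sketch using PHP’s built-in levenshtein() function. The closestSlug() helper and the $slugs list are hypothetical; in a real site the list would come from your database. It assumes PHP 7.1 or newer.

```php
<?php
// Hypothetical helper: pick the known name closest to what the user typed.
// Returns null when nothing is within $maxDistance single-character edits.
function closestSlug(string $requested, array $knownSlugs, int $maxDistance = 2): ?string
{
    $best = null;
    $bestDistance = $maxDistance + 1;
    foreach ($knownSlugs as $slug) {
        $distance = levenshtein(strtolower($requested), strtolower($slug));
        if ($distance < $bestDistance) {
            $best = $slug;
            $bestDistance = $distance;
        }
    }
    return $best;
}

// Stand-in for names loaded from the product table.
$slugs = ['router', 'switch', 'cable'];
var_dump(closestSlug('routre', $slugs));
```

When a close match is found, it is probably better to redirect to the corrected URL than to serve the page under the mistyped one.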

Parsing REQUEST_URI by PHP code

Apache web server also allows you to use URLs like this one:

http://somesite.com/products.php/network/router.html

Apache will call the products.php script and ignore the remainder of the path. The script can get it by parsing $_SERVER['REQUEST_URI']:

//Remove request parameters:
list($path) = explode('?', $_SERVER['REQUEST_URI']);
//Remove script path:
$path = substr($path, strlen($_SERVER['SCRIPT_NAME'])+1);
//Explode path to directories and remove empty items:
$pathInfo = array();
foreach (explode('/', $path) as $dir) {
    if ($dir !== '') { //empty() would also drop a "0" segment
        $pathInfo[] = urldecode($dir);
    }
}
if (count($pathInfo) > 0) {
    //Remove file extension from the last element:
    $last = $pathInfo[count($pathInfo)-1];
    list($last) = explode('.', $last);
    $pathInfo[count($pathInfo)-1] = $last;
}

Now the $pathInfo variable contains the elements of the remainder path. You can use them as database query parameters.
But what if they are invalid? In that case you need to raise a “file not found” error from the PHP script. For example:

if (count($pathInfo) < 2) {
    header('HTTP/1.0 404 Not Found');
    exit;
}

Alternatively, you can redirect to some error page.
I also recommend using structured exception handling to catch “path not found” exceptions that could be raised deep in your code.
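A minimal sketch of that exception-based approach might look like this. PathNotFoundException, findProduct() and handleRequest() are hypothetical names, the in-memory product list stands in for a database query, and the code assumes PHP 7 or newer.

```php
<?php
// Hypothetical exception type for "this virtual path matches nothing".
class PathNotFoundException extends Exception {}

// Hypothetical lookup; a real one would query the database.
function findProduct(array $pathInfo): array
{
    $products = ['network/router' => ['label' => 'Router']];
    $key = implode('/', array_slice($pathInfo, 0, 2));
    if (count($pathInfo) < 2 || !isset($products[$key])) {
        throw new PathNotFoundException("No product for path: $key");
    }
    return $products[$key];
}

// Returns [status code, body] so one place decides how to send the response.
function handleRequest(array $pathInfo): array
{
    try {
        $product = findProduct($pathInfo);
        return [200, $product['label']];
    } catch (PathNotFoundException $e) {
        return [404, 'Page not found'];
    }
}

list($status, $body) = handleRequest(['network', 'router']);
http_response_code($status);
echo $body;
```

The point of the design is that code deep in the call stack only throws, and the 404 response is produced in a single place.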

Notice that PHP allows read/write access to the $_GET array, and you can use this to make legacy code work without a change:

$_GET['category']   = $pathInfo[0];
$_GET['product']    = $pathInfo[1];

But how do you get rid of that “.php” in the URL? You can do this by renaming the “products.php” file to “products” (without an extension) and modifying the .htaccess file to tell Apache that “products” is actually a PHP script:

<FilesMatch "^products$">
    ForceType application/x-httpd-php
</FilesMatch>

Hmm... I don’t like this solution, but this way URLs like http://somesite.com/products/network/router.html will work OK.

At some point you may want to get rid of that “products” directory to make URLs shorter, like http://somesite.com/network/router.html. You may also want to have other directories like news, blog, etc. on the same site.

Can we parse all virtual URLs in the same PHP script? Sure, we can do that!

Combining powers of mod_rewrite and PHP

The best way to implement SEO-friendly URLs is to combine the power of mod_rewrite and PHP. This way you gain full control over URL rewriting with the full power of the PHP language.

You just need a few lines in the .htaccess file:

RewriteEngine On
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule . index.php [L,QSA]

This code has proven compatible with older Apache versions such as the one on GoDaddy shared hosting.

Now every request for a virtual URI will be processed by the index.php file. You can get the requested URI from $_SERVER['REQUEST_URI'] and parse it with code that is almost the same as above:

//Remove request parameters:
list($path) = explode('?', $_SERVER['REQUEST_URI']);
//Remove script directory (rtrim keeps this correct when index.php is in the document root):
$path = substr($path, strlen(rtrim(dirname($_SERVER['SCRIPT_NAME']), '/\\')));
//Explode path to directories and remove empty items:
$pathInfo = array();
foreach (explode('/', $path) as $dir) {
    if ($dir !== '') { //empty() would also drop a "0" segment
        $pathInfo[] = urldecode($dir);
    }
}
if (count($pathInfo) > 0) {
    //Remove file extension from the last element:
    $last = $pathInfo[count($pathInfo)-1];
    list($last) = explode('.', $last);
    $pathInfo[count($pathInfo)-1] = $last;
}
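For easier testing, the same steps can be wrapped in a function. This is only a sketch: parseVirtualPath() is a hypothetical name and the code assumes PHP 7 or newer. It trims the trailing slash from the script directory so that it behaves the same whether index.php sits in the document root or in a subdirectory.

```php
<?php
// Hypothetical helper: turn a request URI into path segments relative to the
// directory that holds the front controller script.
function parseVirtualPath(string $requestUri, string $scriptName): array
{
    // Drop the query string.
    list($path) = explode('?', $requestUri);
    // Directory of the script without a trailing slash ('' for the document root).
    $base = rtrim(dirname($scriptName), '/\\');
    if ($base !== '' && strpos($path, $base . '/') === 0) {
        $path = substr($path, strlen($base));
    }
    // Split into non-empty, URL-decoded segments.
    $segments = [];
    foreach (explode('/', $path) as $dir) {
        if ($dir !== '') {
            $segments[] = urldecode($dir);
        }
    }
    // Strip the file extension from the last segment, if there is one.
    if ($segments) {
        $last = count($segments) - 1;
        $segments[$last] = explode('.', $segments[$last])[0];
    }
    return $segments;
}

print_r(parseVirtualPath('/products/network/router.html?ref=home', '/index.php'));
```

In index.php you would call it as parseVirtualPath($_SERVER['REQUEST_URI'], $_SERVER['SCRIPT_NAME']).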

This way you can implement quite intelligent logic to provide smart, short and flexible document naming schemes on your web site. For example, you can make all these (and many similar) URLs refer (or better, redirect) to the same page:

http://somesite.com/products/network/router.html
http://somesite.com/products-network/router/
http://somesite.com/networks/router/
http://somesite.com/router

At the same time, the code can recognize that http://somesite.com/20091010/router/ refers to a news article just because there is a corresponding record in the news table.
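One possible way to sketch the alias idea: keep a table that maps every accepted spelling of a path to a single canonical URL, and redirect whenever the request is not already canonical. The canonicalPath() and route() functions and the hard-coded alias table are hypothetical; a real site would look the aliases up in the database. The code assumes PHP 7.1 or newer.

```php
<?php
// Hypothetical alias table: every known spelling maps to one canonical URL.
function canonicalPath(array $pathInfo): ?string
{
    $aliases = [
        'products/network/router' => '/products/network/router.html',
        'products-network/router' => '/products/network/router.html',
        'networks/router'         => '/products/network/router.html',
        'router'                  => '/products/network/router.html',
    ];
    $key = strtolower(implode('/', $pathInfo));
    return $aliases[$key] ?? null;
}

// Decide what to do with a request: serve it, redirect to the canonical URL, or 404.
function route(string $requestPath, array $pathInfo): array
{
    $canonical = canonicalPath($pathInfo);
    if ($canonical === null) {
        return ['status' => 404];
    }
    if ($requestPath !== $canonical) {
        return ['status' => 301, 'location' => $canonical];
    }
    return ['status' => 200];
}

print_r(route('/router', ['router']));
```

Redirecting with a permanent (301) status, rather than serving the same content under several URLs, keeps users and search engines on one address per page.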

You can also combine URL parsing logic with content negotiation logic that recognizes the client’s user agent.

Since you do all virtual URL parsing in PHP, you can use native PHP debuggers and logging for it. You don’t need to care about physical files like images, CSS files or static pages, as the .htaccess code above leaves them alone. But you still need to handle “file not found” errors as explained in the previous section.

Conclusion

There are multiple ways to implement URL rewriting with Apache and PHP, and you need to choose one depending on your project requirements. Having all URL parsing in a single PHP script is the solution I recommend most, as it lets you implement the most complex, extensible and easy-to-debug URL rewriting logic.

41 thoughts on “Rewriting for SEO-Friendly URLs: .htaccess or PHP?”

FAQPAL

December 30, 2009 at 7:32 am

We use .htaccess for all our friendly URL needs. Good post.
Reply
Amber Weinberg

December 30, 2009 at 8:22 pm

Hmm, this is an interesting way of doing it. I normally use htaccess (a bit differently than how you showed above) and I notice that if you have two URLs with similar keywords, it gets confused and takes you to one page instead of two. For example:

http://www.site.com/amber-is-cool

and

http://www.site.com/amber-is-mean

would take you to the same page. This can normally be changed, unless a client specifically wants a page to have a similar name.
Reply
- Anton Oliinyk
  
  December 31, 2009 at 1:59 am
  
  Amber, can you describe the way you use htaccess? I’m still researching this subject.
  I think there is nothing bad in having multiple URL aliases for the same page. But it’s better to make them redirect to the original URL so as not to confuse users or search engines.
  I’m going to write another post on how such a redirect could be implemented with PHP.
  Reply
WebDesignExpert.Me

December 31, 2009 at 5:49 am

Great article! This can certainly be helpful for users wanting search engine friendly URL’s on Linux or Unix hosting!
Reply
Anton Oliinyk

December 31, 2009 at 6:16 pm

Update: I updated URI parsing code as there were some minor problems with it and added PHP example for the last method.
Reply
JFrankParnell

December 31, 2009 at 6:46 pm

here is a similar method for using php to do most of the work:
http://forum.modrewrite.com/viewtopic.php?t=2521
Reply
Sanakan

January 2, 2010 at 5:56 am

Great article, thanks !
Zend Framework follows this mod_rewrite trick.

@Anton Oliinyk, http://net.tutsplus.com/tutorials/other/a-deeper-look-at-mod_rewrite-for-apache/ goes deep inside it, and every example is very useful.

(I hate china GFW ,@_@!)
Reply
Webdesign Expert

January 14, 2010 at 3:05 pm

It’s quite an interesting article. I’m just curious how long you have been interested in this subject? I’ve seen many blogs, but yours is really informative.
Reply
- Anton Oliinyk
  
  January 14, 2010 at 5:03 pm
  
  Not so much time, actually. I searched the web for some comprehensive guide but found only partial explanations. So, I decided to write some note about that.
  BTW, modern CMS’es and frameworks like WordPress or Zend Framework use similar mod_rewrite+PHP solutions but nobody wrote about that.
  Reply
Roch

June 9, 2010 at 2:07 am

Do you know of anyway with htaccess to disable someone from using your domain to point to their own website on the same server? Ex: they use YOURDOMAIN.com to promote their PHISHING WEBSITE.COM by using this simple URL to send users : YOURDOMAIN.COM/~phishing/file.html

Any help would be greatly appreciated. Thanks
Reply
- Anton Oliinyk
  
  June 10, 2010 at 11:33 pm
  
  Looks like a side effect of the mod_userdir Apache module.
  I think it’s not possible to stop it with rewriting, as the request never actually reaches your virtual host.
  I suggest asking your hosting provider to disable mod_userdir, at least for your virtual host, or moving to another host with mod_userdir disabled.
  Reply
Katie @ women magazine

May 2, 2011 at 3:27 pm

When i do this my site page goes into infinite loop looking for sub-directories and never opens the page. What’s wrong?
Reply
- Anton Oliinyk
  
  May 2, 2011 at 3:34 pm
  
  Katie, what exactly do you do?
  You can email me sample code and I’ll take a look.
  Reply
Speedt_ouch

November 17, 2011 at 4:25 pm

Hi,
Thanks for sharing this.
I have some questions I hope you can clarify.

I normally use mod_rewrite with the unique numeric ID of the information I need, in order to query the MySQL database.
With your example there is no numeric ID.
Now I’m confused 🙂
There is where my questions start.

Imagine I have a product
products.php?category=network&product=1
products.php?category=network&product=2

How can I make
/products/network/router.html open products.php?category=network&product=1
and
/products/network/cable.html open products.php?category=network&product=2

Thanks in advance
Reply
- Anton Oliinyk
  
  November 17, 2011 at 5:32 pm
  
  Hi!
  You’ll have to query database using string IDs you use in URLs. Say, you can add a field ‘slug’ to category and product tables and look up records by that field.
  Reply
  - Speedt_ouch
    
    November 17, 2011 at 6:09 pm
    
    Hi,
    Thanks for the reply.
    Could you kindly provide an example please?
    
    I would like to keep the “fake URL” with out any numeric values.
    Reply
    - Anton Oliinyk
      
      November 17, 2011 at 8:42 pm
      
      Say, category table is:
      id, slug, label
      1, appliances, Appliances
      2, network, Network
      3, video, Video
      …
      
      And product table is:
      id, category_id, slug, label, description
      1, 1, blender, Blender, This is cool blender
      2, 1, water-filter, Water Filter, This is cool water filter
      3, 2, router, D-Link DIR-300, This is not so cool network router
      …
      
      Now you get request for /products/network/router.html
      which is rewritten to products.php?category=network&product=router
      
      Now let’s query product data:
      $rs = $db->query("
          SELECT `product`.*
          FROM `product`
          JOIN `category` ON `product`.`category_id` = `category`.`id`
          WHERE `product`.`slug` = '{$db->real_escape_string($_GET['product'])}'
            AND `category`.`slug` = '{$db->real_escape_string($_GET['category'])}'
      ");
      
      The result set will contain single record for the product with ID 3.
      
      Hope that helps.
      Reply
      - Speedt_ouch
        
        November 20, 2011 at 1:24 am
        
        Hi.
        So I figure the slug field would be the same thing as the numeric ID field, only it contains words.
        Making sure there are no duplicate slug names also.
        I’ll give it a try.
        
        Thanks for the example.
      - Anton Oliinyk
        
        November 20, 2011 at 5:21 pm
        
        Yes, you’re right. Some way is definitely needed to establish a match between human-friendly URL directory names and database records, and “slug” fields do that.
        The only small difference in this particular sample is that a product slug has to be unique only within its category, as we use both the category and product slugs to find a product. So there could be products with the same slug in other categories.
wolfdogg

February 20, 2012 at 10:19 pm

am i missing something here, because im having better luck with this

if(!empty($_SERVER['QUERY_STRING'])){
list($root,$path) = explode('?', $_SERVER['REQUEST_URI']);

//rest of code here

have they changed the functionality of list()? because when i run it as suggested the script path is already stripped, and when the substr strips it even further, i end up with “ndex.php”

see var dumps for $path and $pathInfo below

string(10) "/index.php"
array(1) { [0]=> string(8) "ndex.php" }
Reply
- Anton Oliinyk
  
  February 20, 2012 at 11:43 pm
  
  Definitely, nothing changed with list().
  
  I believe you have to go back to my original code as it is:
  list($path) = explode('?', $_SERVER['REQUEST_URI']);
  With your change, the path now actually goes into your $root variable.
  Reply
wolfdogg

February 20, 2012 at 10:34 pm

i think the reason i was having hard time was because were talking about 2 different types of url querys,
i need a solution for this type of url

wolfdogg.org/?section=iditarod&subject=idit_weather

i dont use the word index.php, but i want the code to be cross compatible for both /? and /index.php?

and i need to figure out a system whats the best way to utilize those vars. currently, ‘section’ is the first subdirectory and ‘subject’ is the page , in this case, wolfdogg.org/iditarod/idit_weather.php

any suggestions on how to modify the code to adapt?

curently using

RewriteEngine On
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule . index.php [L,QSA]

list($root,$path) = explode('?', $_SERVER['REQUEST_URI']);
//Explode path to directories and remove empty items:
$pathInfo = array();

foreach (explode('&', $path) as $dir) {
if (!empty($dir)) {
$pathInfo[] = urldecode($dir);
}
}

looks like a good start
var dump looks like this

array(2) { [0]=> string(16) "section=iditarod" [1]=> string(20) "subject=idit_weather" }

what im hoping to have is the url rewritten to this automatically

wolfdogg.org/iditarod/idit_weather

without changing all my links in the source code
Reply
- Anton Oliinyk
  
  February 20, 2012 at 11:29 pm
  
  “what im hoping to have is the url rewritten to this automatically
  wolfdogg.org/iditarod/idit_weather
  without changing all my links in the source code”
  
  Rewriting the incoming request URL and generating link URLs for rendering on web pages are two different things (yet related, of course).
  I believe there is no automatic way if you have the URLs hardcoded. You might filter the HTML output to replace URLs on the fly, but that is tricky and will drain system resources.
  I suggest encapsulating URL generation in a function, or better a class, so that you can edit it in a single place in the future.
  
  Learning MVC best practices also might help. I especially like how Zend Framework handles it as it uses the same class set for parsing incoming request URL as well as for generating URLs of links on pages.
  Reply
alexey majidian

July 27, 2012 at 4:12 pm

Hello my good friends !

I used these codes in file .htaccess

RewriteEngine On
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule . index.php [L,QSA]

but this error appeared (500 internal server error)

” Internal Server Error

The server encountered an internal error or misconfiguration and was unable to complete your request.

Please contact the server administrator, webmaster@gmail.com and inform them of the time the error occurred, and anything you might have done that may have caused the error.

More information about this error may be available in the server error log.
Apache/2.2.4 (Win32) PHP/5.2.1 Server at localhost Port 80 ”

Can you explain why?

Could you send me the full code without errors?

thanks a lot !
Reply
- Anton Oliinyk
  
  July 27, 2012 at 6:28 pm
  
  Alexey, normally it should work fine. Something is wrong with your Apache setup. For example, it may have mod_rewrite disabled or something like that.
  You have to find the error description in the Apache error log, or ask hosting support if it’s a shared hosting server.
  Reply
telugu cinema news

August 19, 2012 at 2:01 pm

i want a wordpress .htaccess design
Reply
- Anton Oliinyk
  
  August 19, 2012 at 3:56 pm
  
  Can you explain more?
  What do you want to implement and what’s your problem?
  Reply
Fred Veenstra

October 30, 2012 at 4:39 pm

At last…a useful explanation that takes you by the hand and that also deals with the relative-url issue. Many thanks.

Greetings from the Netherlands.
Reply
- Anton Oliinyk
  
  October 30, 2012 at 4:58 pm
  
  You’re welcome, Fred)
  Hup Holland hup!
  Reply
thuc101

January 27, 2013 at 7:32 pm

good article!!!
Reply
mario

April 19, 2013 at 2:08 pm

great article.

But how about multi lang stuff?
Let say first part is language code, if not default language. So I already have 2 different cases.

abc.com/de/aaa/bbb/ccc

1)check if first element is lang code. If so, ignore lang code and use second element as first element and proceed…

2)no lang code set (because default language is used) continue like in your example.

Or better language code as last parameter like:

abc.com/aaa/bbb/ccc/ddd?lang=de

what is best for SEO?
Reply
- Anton Oliinyk
  
  April 19, 2013 at 6:23 pm
  
  I think using virtual folders for languages is better for SEO as search engines may not recognize URL parameters.
  Reply
chairul anwar

April 19, 2014 at 2:20 am

can you help me.
i have .htaccess like
RewriteRule ^index.html$ /index.php [QSA]
RewriteRule ^pdf/.* /a-single.php [QSA]
RewriteRule ^ebook/.* /a-single-e.php [QSA]

output:
/pdf/post-title-id.pdf
/ebook/post-title-id.pdf
and i want to change those to
post-title-id.pdf
post-title-id.pdf

help me please
Reply
Yury

September 16, 2014 at 4:24 am

Error if you click on the url of this type with special character site.com/>

Forbidden

You don’t have permission to access /> on this server.

Apache/2.2.23 (Win32) PHP/5.3.18 Server at site.com Port 80
Reply
- Anton Oliinyk
  
  September 16, 2014 at 8:15 pm
  
  Hi, Yuri!
  
  Perhaps something like mod_security is blocking it as a suspicious URL. I suggest you check the server’s error log.
  I believe it’s not relevant to the post’s subject, unless I’m missing something.
  Reply
Gautam Nagraj

October 2, 2016 at 9:13 pm

Hello anton
Thanks for this tutorial, It helps me a lot

But still I am getting confused.. With this

I want urls like this

site.com/mobiles/samsung/galaxy-note/

How can I achieve this

Here in the above url

mobile is main category
samsung is sub category
galaxy note is product

and I want page to be open by category and by subcategory like this

site.com/mobiles
site.com/mobiles/samsung
site.com/mobiles/samsung/galaxy-note

Sir please help me out….

Thanks 🙂
Reply
- Anton Oliinyk
  
  October 3, 2016 at 3:51 pm
  
  Please read the post carefully, it explains it all. Once you have configured .htaccess, you can parse the $_SERVER['REQUEST_URI'] value in your PHP code. Break it by the '/' separator with explode() into, say, $pathInfo. Then $pathInfo[0] will be your category, $pathInfo[1] your subcategory and $pathInfo[2] your product.
  
  Though in the modern PHP world I would recommend using a CMS or framework. The time of custom coding such basic things has definitely passed.
  Reply
Vickey Rana

October 12, 2016 at 3:14 pm

hello sir,

i like the way of your explanation but who are totally newbies like me in htaccess can’t understand it easily so I think you should give a complete example with category and products in zip file to download …..
BTW awesome tutorial 🙂
Reply
- Anton Oliinyk
  
  October 12, 2016 at 3:49 pm
  
  The post is quite old, dated back to 2009, and many things changed dramatically since it was written. Nowadays I would recommend rather using a PHP framework or CMS.
  Anyway, extracting a category and product or whatever you coded into your path structure is just a matter of parsing path string. Sorry, really don’t want to go into that as it’s quite basic matters not related to URL rewriting actually.
  Reply
vikas bahal

October 17, 2016 at 7:31 am

hello sir,
hope you are doing well
you explained it very clearly and i learned from it, now i am able to make seo friendly urls in my project ,

but i need your assistance, please help me out

the above commentator @Gautam Nagraj asked you, that exactly i want but in different manner..

i also have category, subcategory, product and product detail page

so i want urls like

php pages are :
——————————
somesite.com/category.php?category_slug=electronics
somesite.com/subcategory.php?subcategory_slug=laptops
somesite.com/product.php?product_slug=lenovo-notebook

i want urls like this
————————————-
somesite.com/electronics
somesite.com/laptops
somesite.com/lenovo-notebook

my htaccess rules are:
——————————————
RewriteRule ^([a-zA-Z0-9-]+)$ category.php?category_slug=$1 [NC,L]
RewriteRule ^([a-zA-Z0-9-]+)$ subcategory.php?subcategory_slug=$1 [NC,L]
RewriteRule ^([a-zA-Z0-9-]+)$ products.php?product_slug=$1 [NC,L]

and i don’t want to show products like this

somesite.com/electronics/laptops/lenovo-notebook

i just want single url for category, subcategory and product

i tried it but it’s working for one file not for all

so please guide me………
Reply
- Anton Oliinyk
  
  October 17, 2016 at 5:14 pm
  
  Hi, Vikas!
  
  In your case all URLs look the same to htaccess because you left no way to distinguish one from another. So only the first htaccess rule will ever match, and the latter two will be ignored as they have identical patterns. Apache has no idea whether “notebook” is a category, subcategory or product, so you need a way to let it know, or you need to move that decision to the PHP level.
  
  I suggest you either add virtual folders like “category/”, “subcategory/” and “product/” so you can match the corresponding URLs by the parent folder name in your htaccess rules. For example, the “category/electronics” path will match the ^category/(.*) pattern and will not match the ^subcategory/(.*) one.
  
  Or you can use a generic htaccess as listed in the post to route all requests to index.php. There you can match the folder list against the available categories, subcategories and products until you find a match. But this will consume more server resources and will require you to keep folder names unique. For example, a category “electronics” will take precedence over a subcategory of the same name, rendering the latter inaccessible.
  
  Hope that helps.
  Reply
