PDFmyURL: export web pages with JavaScript elements to PDF on shared hosting

Customers and users sometimes ask for the ability to save reports from web applications as PDF files. Writing PDF directly can be very tricky, but since those reports already exist as web pages, you can add printable views of them and then convert the HTML/CSS of those pages to PDF, say with the dompdf library.
This way you hit two targets at once: printable reports and PDF export. Dompdf is still in beta, but at least it works on shared hosting, as far as I know.

But what if you need to export web pages with JavaScript canvas elements, like Flot charts, that are plotted on the browser side only? All the available solutions require installing executables, which is usually problematic on shared hosting.

The solution is PDFmyURL.com, which you can easily use as a web service: send a GET request with the URL of the page you want to export and get a PDF file as the response. The service is based on wkhtmltopdf. As you can see, it's good enough to convert pages with Flot charts:

http://pdfmyurl.com/?url=http://people.iola.dk/olau/flot/examples/basic.html

You can call the service from JavaScript on your pages or from server-side scripts via any HTTP client library, like cURL. All wkhtmltopdf options are supported and can be passed as GET request parameters.
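For example, a server-side call might look like the sketch below. The helper name is mine, and the exact way option names map to GET parameters is an assumption to verify against the service docs:

```php
<?php
// Hypothetical helper: build the PDFmyURL request URL for a page,
// passing wkhtmltopdf options through as extra GET parameters
// (the pass-through naming is an assumption -- check the service docs).
function pdfmyurl_request($pageUrl, array $options = array())
{
    $params = array_merge(array('url' => $pageUrl), $options);
    return 'http://pdfmyurl.com/?' . http_build_query($params);
}

// Fetch the PDF with cURL and save it to a file.
if (function_exists('curl_init')) {
    $ch = curl_init(pdfmyurl_request(
        'http://people.iola.dk/olau/flot/examples/basic.html'
    ));
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return body, don't print it
    $pdf = curl_exec($ch);
    curl_close($ch);
    if ($pdf !== false) {
        file_put_contents('report.pdf', $pdf);
    }
}
```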

Their free service inserts a very small watermark on each page, which is usually not a problem. But of course you can upgrade to the paid service to remove it.

Beware: Estimated result count in Google AJAX Search API is incorrect

If you need to get the Google search result count for some query, using estimatedResultCount from the Google AJAX Search API sounds like a winner. However, the number reported as estimatedResultCount doesn't match the number displayed on regular search pages and can differ from it many times over. This has been a known issue since 2008, and it seems it will never be fixed.

So if you want a more accurate result count, extract it from a regular Google search page. Alternatively, you can use the Yahoo or Bing APIs, as they report the same numbers as displayed on their regular search pages.

UPDATE: It turns out that regular Google search pages display an even rougher estimated result count than the Google AJAX Search API. For example, for the query "pumka.net" (with quotes) Google reports "About 1,380 results" while actually finding only 284 pages. Bing reports 55 results while finding 50-52, and Yahoo reports 210 results while finding 55. So using the Bing or Yahoo API for analyzing result counts is likely more accurate.

Flot: Amazing JavaScript/jQuery/AJAX charting library

When I developed Windows desktop applications I was tied to the ProEssentials library, which was powerful yet expensive and had a weird and very limiting API. I don't want to say anything bad about the ProEssentials developers but, OMG, their API contains hundreds of properties and functions you have to dig through just to find the few you really need. Not only do many of them have strange names, but they also do very strange things.
How do you like the "ForceVerticalPoints" property? How in this world could points be vertical or horizontal? Well, it's actually about point labels… Enough about that horror.

Thanks to the Flot library, charting for the Web is extremely easy yet powerful and extensible. Flot is a jQuery plug-in, which means you can use all the power of jQuery for setting it up, passing data to it, and reacting to its events. Moreover, chart plotting happens on the browser side, which means your server isn't slowed down by producing chart pictures. It also means you can add interactive features to your charts, like point tooltips, zooming, live AJAX updates, or display options applied without a page reload.

Read more

Beware: stream_copy_to_stream and Zend_Http_Client_Adapter_Socket may hang on old PHP 5.2.x

Recently we found that our Zend Framework based application was running into an infinite loop and being terminated by the execution timeout on some hostings. The problem turned out to be in the Zend_Http_Client_Adapter_Socket class, which uses stream_copy_to_stream if you configure Zend_Http_Client to write response data to a stream.

The problem was already reported on the ZF issue tracker but wasn't fixed: http://framework.zend.com/issues/browse/ZF-9265.
It seems that the cause is a bug in stream_copy_to_stream that was fixed at some point during PHP 5.2.x development.

But as we need to run our code on virtually any hosting, we decided to work around this problem by replacing stream_copy_to_stream with an fread/fwrite loop in the Zend_Http_Client_Adapter_Socket code.
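The replacement itself is a straightforward chunked read/write loop. A minimal sketch of the idea (not the exact patch we applied to Zend_Http_Client_Adapter_Socket):

```php
<?php
// Copy everything from $src to $dest in fixed-size chunks,
// avoiding the buggy stream_copy_to_stream() on old PHP 5.2.x.
// Returns the number of bytes written.
function copy_stream($src, $dest, $chunkSize = 8192)
{
    $copied = 0;
    while (!feof($src)) {
        $chunk = fread($src, $chunkSize);
        if ($chunk === false || $chunk === '') {
            break; // read error or nothing more to read
        }
        $copied += fwrite($dest, $chunk);
    }
    return $copied;
}
```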
Note that Zend_Http_Client_Adapter_Curl is not affected by this problem, as it uses internal code for writing to streams. Thus, switching to Zend_Http_Client_Adapter_Curl sounds like the easiest solution. We added automatic switching to it: if the 'curl_init' function exists, we use Zend_Http_Client_Adapter_Curl as the adapter for Zend_Http_Client.
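The automatic switch is just a capability check before configuring the client; a sketch, assuming a standard Zend_Http_Client setup:

```php
<?php
// Prefer the cURL adapter when the cURL extension is available;
// fall back to the (potentially affected) socket adapter otherwise.
$adapter = function_exists('curl_init')
    ? 'Zend_Http_Client_Adapter_Curl'
    : 'Zend_Http_Client_Adapter_Socket';

// Pass the chosen adapter when configuring the client, e.g.:
// $client = new Zend_Http_Client($url, array('adapter' => $adapter));
```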

How to set InnoDB as a default storage engine for MySQL tables

I use the InnoDB storage engine because of its support for transactions and referential integrity rules. However, MySQL still creates new tables as MyISAM by default. It was so annoying to always specify the storage engine when creating new tables, and to double-check that I hadn't forgotten it, until I found out how to make InnoDB the default.
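In short, it boils down to one server option in my.cnf (my.ini on Windows); the option name below is for MySQL 5.x, so verify it against your server version:

```ini
[mysqld]
# Create new tables as InnoDB unless ENGINE is given explicitly
default-storage-engine = InnoDB
```

Restart MySQL after changing the file for the setting to take effect.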

Read more

Why MySQL timestamp is 24 seconds different from PHP

You may find the timestamp value returned by the MySQL UNIX_TIMESTAMP() function to be 24 seconds greater than the one returned by PHP functions and methods like strtotime(), mktime(), DateTime::getTimestamp(), and Zend_Date::getTimestamp().

Read more

PHP regular expression functions fail on GoDaddy shared hosting

While testing a crawler script on GoDaddy shared hosting I noticed that the script was quitting without any notice at random points. Both web and CLI execution modes were affected. The script had previously been tested on a XAMPP server, where it worked fine.

Later, I identified that the script always quits after calling one of the regular expression (PCRE) functions like preg_replace, preg_match, and preg_match_all. The script called them hundreds of times, and one of the calls became fatal.

UPDATE: It actually appears to be some kind of general problem with long string operations. However, switching to the multi-byte string regular expression functions helped in most scenarios.
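For example, a whitespace cleanup can be done with the mb_ereg family instead of preg_* — a sketch; note that mb_ereg patterns take no delimiters and no modifiers:

```php
<?php
// Use the multi-byte regex functions instead of preg_*:
// set the regex encoding once, then call mb_ereg_replace with a
// plain pattern (no /.../ delimiters, no /u modifier).
mb_regex_encoding('UTF-8');

$dirty = "too   many    spaces";
$clean = mb_ereg_replace('\s\s+', ' ', $dirty);
echo $clean; // "too many spaces"
```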

Read more

Rewriting for SEO-Friendly URLs: .htaccess or PHP?

Modern database-driven web sites implement SEO-friendly URLs emulating static directories and files. Switching to such "clean" URLs enables good indexing by search engines, makes URLs more user-friendly, and hides the server-side language. For example, this clean URL may refer to a page in some product directory:

http://somesite.com/products/network/router.html

In fact, there is no /products/network folder on the server and no router.html file at all. The page is generated by a server-side script using a database query for the "network" product category and the "router" product. But who calls the script, and where does it get the query parameter values?

This technique is usually referred to as "URL rewriting". It allows the web server to recognize what information was requested by parsing the URL string. Apache and PHP offer multiple ways to implement URL rewriting. So which one is the best?
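To make the example above concrete, the classic Apache approach is a mod_rewrite rule in .htaccess that maps the clean URL onto the real script. The script name and parameter names below are illustrative, chosen to match the example URL:

```apache
# .htaccess: rewrite /products/network/router.html
# to the real script products.php?c=network&p=router
RewriteEngine On
RewriteRule ^products/([^/]+)/([^/.]+)\.html$ products.php?c=$1&p=$2 [L,QSA]
```

The QSA flag appends any original query string to the rewritten one, and L stops further rule processing.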

Read more

PHP regular expressions and UTF-8

Perl-compatible regular expression functions in PHP can work properly with Unicode strings. Just add the /u modifier to turn on UTF-8 support in preg_replace, preg_match, preg_match_all, preg_split, and other PCRE (preg) functions. This way you can parse strings with national characters. For example:

$clean = preg_replace('/\s\s+/u', ' ', $dirty);

If used without the /u modifier, this code can damage UTF-8 encoded strings by replacing national-character bytes improperly interpreted as whitespace characters. This and many other problems are caused by interpreting every byte as an ASCII character, which is not valid for UTF-8.
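The byte-vs-character difference is easy to see with a single multi-byte character (this sketch assumes the script file itself is saved as UTF-8):

```php
<?php
// "é" is two bytes in UTF-8 (0xC3 0xA9). Without /u the dot matches
// a single byte, so the "exactly one character" pattern fails to match.
var_dump(preg_match('/^.$/',  'é')); // int(0) -- two bytes, no match
var_dump(preg_match('/^.$/u', 'é')); // int(1) -- one UTF-8 character
```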

The modifier is available from PHP 4.1.0 on Unix and from PHP 4.2.3 on win32. UTF-8 validity of the pattern has been checked since PHP 4.3.5.
I found this tip, as well as lots of other useful info, on regular-expressions.info. It's not easy to find in the PHP documentation, but it's actually hidden here.

SEO-friendly URLs and relative links

The Web community is going crazy about SEO-friendly URLs like http://somesite.com/products/network/router/. Well, it looks much better than a script URL such as http://somesite.com/products.php?c=network&p=router, which may actually serve the page behind the scenes. There are a lot of good articles on how to implement SEO-friendly URLs, for example this one or my own post. But they don't warn the reader about one common problem: once you have updated your site to handle virtual paths, you will probably get a bad surprise:

CSS, image and internal page links are totally broken!

Why? Because those links are usually relative to the page location. The browser has no idea about virtual folders and tries to get files from locations relative to the page URL. For example, if there is a usual CSS link in the page header:

<link rel="stylesheet" href="style.css" type="text/css" media="screen" />

Then the browser will try to download the non-existent file http://somesite.com/products/network/router/style.css and fail silently. No CSS styles will be applied.

It's incredible how many words have been written about SEO-friendly URLs with almost nothing said about this relative link problem.
So, what do you have to do? Don't worry, there are multiple solutions available, and I'll try to explain them all.

Read more