I Just Export from PHP to CSV and Import It to MS Excel, What Could Go Wrong?

For many years I assumed generating a CSV file is the best option if you asked to implement an “export to Excel” in your PHP web application. Writing CSV files with PHP is plain simple and does not involve using complicated or proprietary libraries, the format could be also read by many applications other than Excel and after all the output is a plain-text file, which you can review with any text editor if something goes wrong. Basically, all you need is fputcsv PHP function, that takes a provided data array and writes it as well-formatted CSV row to the output file.

That all worked fine for me, until one day a client told me that CSV files we produce does not really import to Microsoft Excel. It turns out that Excel has two ways to import CSV file:

  • Text Import Wizard, which is invoked by clicking From Text on the Data tab of modern Excel versions. In this case you’re prompted to specify file encoding, delimiters and other settings.
  • Regular file import, which is performed if you double click a file in Explorer or used File -> Open menu. In this case Excel tries to auto-recognize file format details.

Our problem was that the regular file import didn’t work for files we generated with PHP. Text import wizard worked with correct settings but users didn’t want to go into file format details and after all I found it also doesn’t work right for some files.

After a lot of research, trials and errors, I found that was caused by few limitations MS Excel applies to CSV format, which are not supported by original PHP fputcsv function:

  • Excel expects all rows ended by CR and LF characters, while fputcsv on Linux etc places LF character only.
  • Excel expects multiline values to use CR character as a line break.
  • Excel has a limit of 32767 characters per single cell.

First two points I learned from this answer at Stack Overflow, which for some reason is not marked as correct one.

Still, the issues above is not a reason to switch from CSV to a native MS Excel format, writable via some library or PHP extension. With a little work, you can get your CSV files compatible with MS Excel:

  • Replace all LF characters or CR LF combinations with CR characters in the array items you’re passing to fputcsv function.
  • Ensure all array items you’re passing to fputcsv function do not exceed 32767 characters or trim them in any way possible.
  • Replace LF characters fputcsv put at line ends with CRLF.

Unfortunately, fputcsv function still does not support any way of altering line break characters so you only can rewrite them as following:

fputcsv($fp, $line); 
fseek($fp, -1, SEEK_CUR); 
fwrite($fp, "\r\n"); 

Or, you can use a function from this comment. However, it does not incorporate first two tips, so you may need to alter it or find another alternative function. There are some available.

After fixing the issues your CSV files will open in MS Excel w/o any problem.

Changing array from within the foreach loop

A very basic PHP syntax thing usually ignored by developers is that you can directly update current element from within the foreach loop.

Just precede value variable with & character and it will be assigned by reference:

$arr = array(1, 2, 3, 4);
foreach ($arr as &$value) {
    $value = $value * 2;
}

Read more

Easy way to automatically update commercial WordPress plug-ins

Update notification for a commercial plug-in

WordPress CMS has a nice auto update feature that displays update notifications for outdated plug-ins and themes in admin panel. Once you got a notification you can go to Dashboard->Updates and install all updates with few clicks while WordPress will automatically download and extract archives, remove old versions and reactivate plug-ins for you.

The only problem is  that it works for plug-ins published on wordpress.org which are open-source so it’s not an option for commercial plug-ins.

Luckily there is a very good solution to this problem: Upgrademe plug-in.

Read more

Beware: stream_copy_to_stream and Zend_Http_Client_Adapter_Socket may hang on old PHP 5.2.x

Recently we found that our Zend Framework based application was running into infinite loop and terminated by execution timeout on some hostings. The problem was found in Zend_Http_Client_Adapter_Socket class which uses stream_copy_to_stream if you configure Zend_Http_Client for writing data to stream.

The problem already was reported on ZF issue tracker but wasn’t fixed: http://framework.zend.com/issues/browse/ZF-9265.
It seems that the cause is a bug in stream_copy_to_stream that was fixed at some point during PHP 5.2.x development.

But as we need to run our code on virtually any hosting we decided to work around this problem by replacing stream_copy_to_stream with fread and fwrite in Zend_Http_Client_Adapter_Socket code.
Notice that Zend_Http_Client_Adapter_Curl is not affected by this problem as it uses internal code for writing to streams. Thus, switching to Zend_Http_Client_Adapter_Curl sounds as the easiest solution. We added automatic switching to it by checking if ‘curl_init’ function exists and if so we use Zend_Http_Client_Adapter_Curl as the adapter for Zend_Http_Client.

Why MySQL timestamp is 24 seconds different from PHP

You may find the timestamp value returned by MySQL UNIX_TIMESTAMP() function 24 seconds grater than that returned by PHP functions and methods like strtotime(), mktime(), DateTime::getTimestamp(), Zend_Date::getTimestamp().

Read more

PHP regular expression functions fail on GoDaddy shared hosting

While testing some crawler script on GoDaddy shared hosting I noticed that the script is quitting w/o any notice at random points. Both web and CLI execution modes where affected. The script was previously tested on XAMPP server where it  worked fine.

Lately, I identified that script always quits after calling one of regular expression functions (PRCE) like preg_replace, preg_match and preg_match_all. The script called them hundreds of times and one of the calls became fatal.

UPDATE: Actually it appears to be some kind of general problem with long string operations. But switching to multi-byte string regular expression functions helped in most scenarios.

Read more

Rewriting for SEO-Friendly URLs: .htaccess or PHP?

Modern database driven web sites implement SEO-friendly URLs emulating static directories and files. Switching to such “clean” URLs enables good indexing by search engines, makes URLs more user-friendly and hides the server-side language. For example, this clean URL may refer to the page in some product directory:

http://somesite.com/products/network/router.html

In fact, there is no /products/network folder on the server and no router.html file at all. The page is generated by server script using database query for “network” product category and “router” product. But who calls the script and where it gets the query parameter values?

This technique is usually referred as “URL rewriting”. It allows web server to recognize what information was requested by parsing the URL string. Apache and PHP allow multiple options to implement URL rewriting. So which one is the best?

Read more

PHP regular expressions and UTF-8

Perl-compatible regular expression functions in PHP can properly work with Unicode strings. Just add /u modifier to turn on UTF-8 support in preg_replace, preg_match, preg_match_all, preg_split and other PCRE (preg) functions. This way you can parse strings with national characters. For example:

$clean = preg_replace('/\s\s+/u', ' ', $dirty);

If used without /u modifier this code damages UTF-8 encoded strings by replacing national character bytes improperly interpreted as whitespace characters. This and many other problems are caused by improper interpretation of every byte as ASCII character which is not always true for UTF-8.

The modifier is available from PHP 4.1.0 or greater on Unix and from PHP 4.2.3 on win32. UTF-8 validity of the pattern is checked since PHP 4.3.5.
I found this tip as well as many other useful info on regular-expressions.info. It’s not easy to find it in the PHP documentation but it’s actually hidden here.

SEO-friendly URLs and relative links

The Web community is going crazy about SEO-friendly URLs like http://somesite.com/products/network/router/. Well, it looks much better than a script URL http://somesite.com/products.php?c=network&p=router which may actually serve the page behind the scenes. There are a lot of good articles on how to implement SEO-friendly URLs, for example this one or my own post. But they do not warn the reader about one usual problem: once you have updated your site to handle virtual paths you will probably get a bad surprise:

CSS, image and internal page links are totally broken!

Why? Because those links are usually relative to the page location. The browser has no idea about virtual folders and tries to get files from locations relative to the page URL context. For example, if there is a usual CSS link in the page header:

<link rel="stylesheet" href="style.css" type="text/css" media="screen" />

Then the browser will try to download non-existing file http://somesite.com/products/network/router/style.css and fail silently. No CSS style will be applied.

It’s incredible how many words were spoken about SEO-friendly URLs with almost no word about this relative link problem.
So, what you have to do? Don’t worry, there are multiple solutions available and I’ll try to explain them all.

Read more

Reading, writing and converting RSA keys in PEM, DER, PUBLICKEYBLOB and PRIVATEKEYBLOB formats

This post finishes my epic about the implementation of RSA encryption. See the part I and part II of my post about RSA encryption for C++/Delphi (CryptoAPI) and PHP (OpenSSL) applications.

The main problem we faced was incompatibility of key formats. CryptoAPI uses PRIVATEKEYBLOB and PUBLICKEYBLOB formats to export and import RSA keys while OpenSSL extension for PHP uses PEM format. In order to use both libraries in communicating applications we needed some tool to convert keys from one format to another. The only tool we found for this was OpenSSL 1.0.x beta. Notice that earlier versions of OpenSSL do not support CryptoAPI BLOBs.

Update: It was found later that CryptoAPI has native functions for key conversion. See “Update” section at the bottom of the post.

Below is a command line syntax example for conversion of private key from PEM to PRIVATEKEYBLOB format:

openssl rsa -inform PEM -in private.pem -outform MS\ PRIVATEKEYBLOB -out private.blob

And this example converts PUBLICKEYBLOB to PEM format:

openssl rsa -pubin -inform MS\ PUBLICKEYBLOB -in public.blob -outform PEM -out public.pem

Notice that backslash (\) in format names. You need to type it as it actually escapes the space character.

However, we found some drawbacks in usage of OpenSSL 1.0.x beta:

  • There was no Windows build of it available at the time of the post but we wanted to convert keys on Windows.
  • We also wanted to convert keys directly in our code w/o any need for external application.

As far as PRIVATEKEYBLOBPUBLICKEYBLOB and PEM format structures are known, we decided to develop code that will read and write them using low-level functions. It actually took 1-2 days for me to develop that code so I don’t think it’s a really hard task.

Later we faced another problem: PHP versions prior to 5.2 don’t support openssl_pkey_get_details function. Once again, handling key formats directly helped us to resolve the issue by providing a replacement for the function.

So, let me explain how you can implement reading/writing PEM, DER, PRIVATEKEYBLOB and PUBLICKEYBLOB formats with some code examples in PHP for PEM and DER formats and in C++/VCL for CryptoAPI BLOBs. As the task was a part of a commercial project I cannot post a complete working example here. But I will do my best helping you to assemble such code on your own. You can also request our service at Pumka.net.

Read more