Monthly Archives: October 2010

Clean your PHP4 Legacies using sed

If you have to deal with very old PHP4 legacy code containing every syntax crime you may know from the early years, how would you handle it? Give it to your junior people to fix it manually? I like to have at least some handy helpers for the first rough corrections. I found sed to be a very powerful helper here.

Code you might encounter - associative array elements without quotes.

I spent quite some time to find useful regular expressions to help me. This is how I did this:

A way to test your regular-expression is to echo a sample and apply your regex to test the results:

echo '$_REQUEST[action] reise_l_topic_ids[] $dat[ticket_order]' | sed "s/$([a-zA-Z0-9_]+)[([a-zA-Z0-9_]+)]/$1['2']/g"

Once it works you can apply your regex to one file:

sed -i "s/$([a-zA-Z0-9_]+)[([a-zA-Z0-9_]+)]/$1['2']/g" my_old_file.php

Or apply your regex to all *.php-files recursively below the current directory to a whole project:

find . -name "*.php" -exec sed -i "s/$([a-zA-Z0-9_]+)[([a-zA-Z0-9_]+)]/$1['2']/g" '{}' ;

By the way: The usage of sed works fine on your linux command line, but not on OSX. The syntax is slightly different here (sed -i “” -e “s/blah/blubb/” file). This is of course only a start to automate otherwise painfull and boring corrections down to just a few seconds. It will not save you from special manual work and break syntax at some points. But it weeds out 90% and leaves you with the other 10% acutal manual work.

You could imagine many more sed regexes e.g. to replace short tags <?=$my_var?> to a proper <?php echo $my_var; ?> etc.

echo '<?=$out?> sakdhs sakdhas k <?php echo $xyz; ?> ddd' | sed "s/<?=/<?php echo /g";

I will collect more regexes as I need and find them. If you have ideas please add them in the comment section.