Category Archives: Problems

Extract All Email Addresses from Outlook

I was looking for a solution to extract a list of all emails I have ever used or emailed through outlook. It might be helpful for you too. This what I ended up doing:

a) Export email and contacts into an olm file (Mac). I was using Outlook 2011. This is a binary compressed format. I selected Email and Contacts.

b) Use StuffItExpander to extract the olm to a readable XML structure. Just install StuffItExpander and drag the olm file on to it. You end up with a parsable directory structure including all files in xml format.

c) I could not find a solution to recursively parse the mess, so I decided to merge all xml files form all subdirectories into one big file:

$ find /path/to/directory/ -name *.xml -print0 | xargs -0 -I file cat file > merged.file

d) Extract all email addresses from the merged file into a file:

$ perl -wne'while(/[\w\.\-]+@[\w\.\-]+\w+/g){print "$&\n"}' merged.file | sort -u > output.txt

e) You will be surprised how many lines this file will have. Check the output.txt containing all unique extracted email addresses from outlook. The list needs to be cleaned. There will be a lot of invalid or temporary emails you need to go through manually.

Have fun with your list.

Migrate arbitrary Content to WordPress – e.g. from typo3 tt_content

If you see yourself confronted with the following task, here is a quick receipe: Setup a wordpress instance for us and import all our content from system x. In this case we focus on the import part. ‘System x’ was a somewhat customized typo3 setup. I got myself an ordinatry editors login to identify the articles which had to be migrated. After some fulltext searching and access to phpMyAdmin I identified the area, where I hat to extract the content from. It boiled down to the table tt_content. At this point you can forget about typo3. All articles could be extracted with this SQL query:

SELECT uid, pid, header, bodytext FROM tt_content WHERE header<>'' AND CType='editorialcontent';

Export the content using phpMyAdmin’s export to PHP array functionality. You end up with a file like shown in the screenshot. If you have HTML-content and doubt, if this could be valid php, save the file under export.php and do ‘$ php -l export.php’ – it says ‘No syntax errors.’ – Good.

OK, next you need to get familiar with wp_insert_post(). You need to loop your exported content over it. And that is it. This is how I did it:

  • Create a directory in your wordpress setup ‘wp-import’.
  • Copy/move your export.php array into this directory.
  • And execute the following self-explanatory file located also in your wp-import directory: import.php

That’s it.

MSSQL CAST() and GROUP BY

Just in case somebody has a similar problem… Coming mainly from MySql I had to deal with some MsSql/SQL Server specialties. Consider the following working SQL for MsSql:

SELECT AK.AK_TEXT, BU.BUDGET_BEREICH ,TY.AKTIONSTYP_TEXT ,AK.AK_STARTDATUM ,AK.AK_ENDEDATUM
FROM AKTIONEN AS AK WITH (nolock)
LEFT JOIN BUDGET_TRAEGER AS BU
ON AK.AK_BUDGET_NUMMER=BU.BUDGET_NUMMER
LEFT JOIN AKTIONSTYP AS TY
ON AK.AK_TYP=TY.AKTIONSTYP_TYP
LEFT JOIN AKTIONEN_FILIALE AS AF
ON AK.AK_LFDNR=AF.AF_LFDNR
WHERE (
  ( AK.AK_ENDEDATUM<=CAST('2010-06-30' AS smalldatetime)
   AND AK.AK_ENDEDATUM>=CAST('2010-01-01' AS smalldatetime)
  )
  OR ( AK.AK_STARTDATUM<=CAST('2010-06-30' AS smalldatetime)
    AND AK.AK_STARTDATUM>=CAST('2010-01-01' AS smalldatetime)
  )
)
GROUP BY AK.AK_TEXT, BU.BUDGET_BEREICH ,TY.AKTIONSTYP_TEXT ,AK.AK_STARTDATUM ,AK.AK_ENDEDATUM;

Interesting parts explained:

(a) The addition ‘WITH (nolock)’ prevents my application from locking the table during select queries.

(b) The fields from which I liked to select a bunch of records touching a daterange between AK_STARTDATUM and AK_ENDEDATUM were of the fieldtype ‘smalldatetime’. This is some strage SQL Server format. It needs to be converted to with the CAST() function in order to return the expected resultsets. It understands several input formats. You can test the correct conversion with a simplified SQL statement like this: SELECT CAST(’2010-03-29′ AS smalldatetime);

(c) The statement will not work and return an error if you do not include ALL your selected fields also in the GROUP BY clause.

Backup MySql-Database using only FTP

We have just had the case of having ftp access to a site on a shared hosting LAMP webspace but needed also the database and mainly the database scheme to setup our own development-system for the app. This is actually very straight forward using the following snippet uploaded and executed on the webserver:

<?php
exec( 'mysqldump -h dbXXX.1und1.de -u dboYYY -pZZZ dbYYY > dump.sql', $ret );
var_dump($ret);

This is what you do:

  • Find the configuration with the db-credentials in the source of the application and replace
    • dbXXX.1und1.de with the hostname,
    • dboYYY with the username,
    • ZZZ with the password (notice there is NO blank between the option -p and the password!),
    • dbYYY with the database you would like to backup.
  • Save the snippet as file with extension .php and copy it via ftp to the webspace.
  • Find the URL to execute the script.
  • If it worked, there should be a new file dump.sql in the same directory, download it, it contains the db-schema and all data.
  • Delete both files to wipe your traces.

If it does not work try something like “exec(‘ls -la’);”. If you see the contents of the directory you can use the exec-function and the problem is something else.

Happy hacking!

Upgrade PHP5 with an alternative sources.list on Debian etch

I was having trouble with a server running Debian 4.0 (etch). Using the standard sources in the /etc/apt/sources.list the supported PHP5 version was 5.2.0-8+etch13 which contained a very annoying bug for my application.

A daily running script – let’s call it the Importer – regularly exited randomly with a “Fatal error: Out of memory (allocated 12320768) (tried to allocate 2851436 bytes) in …” and I had to restart it manually nearly every morning. I had…

  • …checked my application for memory wasting operations and loops and fixed them,
  • …used ini_set(‘memory_limit’, ’64M’); at runtime, and
  • …finally increased memory_limit = 64M in my php.ini.

But all this did not change the bahaviour of the Importer!

So I took a look at the PHP5 Changelog to find potentially fixed bugs in newer releases. Bug #39438 described exactly my problem. So a simple upgrade would help me. But it did not work with ‘apt-get upgrade’ or ‘apt-get install php5=5.2.8′ since the highest version in the apt source I used was the one that I already had: 5.2.0-8+etch13, issued in November 2006… (pretty ancient)

Finally it was this page that had the information we needed: an alternative apt source

deb http://packages.dotdeb.org etch all
deb-src http://packages.dotdeb.org etch all

After getting an impression whether dotdeb was a trustworthy source, we first tried it on our dev-system with ‘apt-get update; apt-get upgrade;’. At this point I was once more glad to have written so many UnitTests. They all passed and everything looked good.

Thanks Kim for your help!

Save Serialized Objects to MySql

I have just discovered an issue if you store serialized objects into MySql.

At first I used fieldtype TEXT. If in this case somebody edited another field of such a record with phpMyAdmin and pushed the save button, the stored object in that record got currupted and the object could not be deserialized and instanciated anymore.

We now use the fieldtype BLOB instead, so phpMyAdmin does not give you the chance to edit the fieldcontent. And it works.

CMS Redirects to 404 Pages for Dead Links

I have an individual CMS running for a customer, who can edit his events, news etc. in a typical admin area. Each of the items had an expiry date. On any call to a deep URL to one of the expired items (e.g. from Google search results) it would make a redirect to the custom 404-page like this:

header("Location: http://www.mycusweb.de/errors/404.php");
exit;

The 404-Page itself would then respond with its own 404 header and display a beautiful error page.

As I found out, this would NOT have the effect that I expected it to have, namely that the next Google bot crawling these URLs would kill the URL from the index since it ended up on a proper 404 page via the redirect. The headers in sequence with the above redirect look like this:

GET /pages/events.php?id=123 HTTP/1.1
HTTP/1.x 302 Found
GET /errors/404.php
HTTP/1.1 HTTP/1.x 404 Not Found

But wait, there is 302. To verify this for your own redirects, you can use the Firefox add-on Live HTTP Headers.

The problem here is the use of a default redirect, which uses a status code ’302 Found’. This equals the old ’307 Temporary Redirect’ and has the meaning for the bot, something hast just temporarily moved. In order to ‘tell’ the bot that this page should no longer be indexed and deleted, you must use a ’301 Moved Permanently’ and you do it like this:

header("HTTP/1.1 301 Moved Permanently");
header("Location: http://www.mycusweb.de/errors/404.php");
exit;

If you’re interested in the HTTP status codes, you can find the RFCs here.

After all of your redirects that should ‘tell’ bots to remove URLs from the search index use the 301 variant, you can then use the Google Remove Tool to speed things up and call the bot.

Import Data into InnoDB Tables disregarding Foreign Key Constraints

I am currently migrating MySQL databases. One DB uses InnoDB tables with many foreign-key constraints. In this case you can not just dump your CREATE-TABLEs and INSERTs into an empty database. This would lead to many relation violations and incomplete imports, since records are not imported in ‘the right order’ by the dump.

To prevent errors you can temporarily switch off this feature. Like this:

SET FOREIGN_KEY_CHECKS = 0;

<do your import without errors>

SET FOREIGN_KEY_CHECKS = 1;

Major desasters… celebrate them!

Today I had some more major hardware failures…

Peter Shaw
After it was clear that I would be fixing things all day long, I took a deep breath and fired up Peter Dunston’s Website to relax and listen to good music while I was cleaning up the mess.

What happened:

  • The switch connecting my dev-servers with my workstations made nothing more but a funny senseless light show. OK, went to Conrad, replaced it and restarted the network. Easy.
  • Next, one harddrive in my backup RAID array (Terastation, the old one with the silver case you see on the picture) crashed. This was the first time I actually heard a harddrive crashing. It took about one minute. Nice – if you know it is a) RAID5 and b) you have another backup anyway. After unwinding ~50 screws to replace the defective harddrive I accidentally ripped off the cable connecting the front panel and the mainboard. Oops. Who needs flashing lights? I placed a stickynote saying the thing is running (see picture). 6 hours later the RAID was ready again.
  • The hardest part was the sudden death of my most important dev-server ‘africa’, where I had most of my current projects on. First it did not get an IP address via DHCP. As I was checking things – bang – the screen froze with lots of funny characters flying around – the matrix! After that it didn’t even show a bios screen again.

Terastation Tilt!

Thanks Pete for the background sounds!

Burned PC again…

Happy new year!

Some of you may remember this image and the story around it:

Server tilt! http://blog.t14g.de/index.php/archives/63

It happened again after a night of building an index (Solr/Lucene). This time the destruction of the filesystem looked like this:

Tilt again!

I have learnt my lesson and from now on I keep a harddisk-image of the installed system with all its components under my pillow.

[2008-01-25] Sidenote: I bought myself a proper server with RAID5 and it did the required job without any failures after 5 days. Note: Get yourself proper equipment right from the start and save lots of time, nerves and in the end money – even though the big machine was not cheap at all and it makes noise like an aeroplane.