Extract All Email Addresses from Outlook

I was looking for a solution to extract a list of all emails I have ever used or emailed through outlook. It might be helpful for you too. This what I ended up doing:

a) Export email and contacts into an olm file (Mac). I was using Outlook 2011. This is a binary compressed format. I selected Email and Contacts.

b) Use StuffItExpander to extract the olm to a readable XML structure. Just install StuffItExpander and drag the olm file on to it. You end up with a parsable directory structure including all files in xml format.

c) I could not find a solution to recursively parse the mess, so I decided to merge all xml files form all subdirectories into one big file:

$ find /path/to/directory/ -name *.xml -print0 | xargs -0 -I file cat file > merged.file

d) Extract all email addresses from the merged file into a file:

$ perl -wne'while(/[\w\.\-]+@[\w\.\-]+\w+/g){print "$&\n"}' merged.file | sort -u > output.txt

e) You will be surprised how many lines this file will have. Check the output.txt containing all unique extracted email addresses from outlook. The list needs to be cleaned. There will be a lot of invalid or temporary emails you need to go through manually.

Have fun with your list.

Eternal Image Cache Based on Demandware eShop via Simple API for External Image Usage

Here is another pragmatic web-technology based solution I would like to share with you. We had a given Demandware eShop with 35k variants at the time (in 2014). Where a variant represents a ‘product’ in a unique color and ‘articles’ being the children of the variants as real orderable items – a variant in a certain size. A big eShopping site.


Call of an image URL in the browser.

Call of an image URL in the browser.

Our goal was to have any image of articles in the eShop under a very simple URL that can be generated only by knowing a ‘number’ (SKU or EAN if available) leading to an existing article like this: http://imagex.xxxxx.de/<sku>. The response would be a html image header and the streamed content of the image itself or if not available a placeholder image.

Fortunately Demandware offers a comprehensive, well documented REST based API (we used  the Open Commerce API 13.6) to interact with the eShop returning results in pure JSON.

The initial use case was to add pictures of the articles to help logistics people picking the purchased items from the warehouse. They use mobile devices with a screen and item lists on it. Another use case was to add article images on various internal reports based on the business intelligence system informing management of what is currently being sold.

Article images used in a report.

Article images used in a report.

The workflow of the application works like this:

  1. The image is stored in an ‘eternal cache’ in a directory path that is generated from the ‘number’ – e.g. 4054781891957 converts to this path data/400/405/478/189/195/7/4054781891957.jpg (400 being the width of the cached image version).
    This way we did not have to use a database. We use just use is_file($path) to find out if we have a valid cache hit.
  2. If a cached file does not yet exist (first call on a ‘number’), depending on the type of the given ‘number’ we check a bunch things via the API in the background. This is like a regular search on the eShop, fetch the URL of the article’s main image, download it and save it to the local cache converted to 400px, 200px and 100px width.

Drawback: OCAPI only offers information about articles that are ‘online’ at the time of the API call.

Optional parameters:
Example-URL: http://imagex.xxxxx.de/30281349?debug&nocache&w=400

  • debug: Outputs the differnets actions along the way to the image.
  • nocache: Forces re-fetch from Demandware OCAPI.
  • w: Default width is 200px. Possible Values 400, 200, 100

The application is hosted in our own infrastructure for the customer. The application is based on Symfony2.

wsdl with remote methods

Update: The application now has also a SOAP/WSDL API to a) integrate images into other enterprise applications via base64-stream and b) force cache refresh of an image (happens on recycled article numbers).

Useful Linux Commands 01/2014

a) Bulk rename on the commandline. I needed this one to re-import bulk files for a BI database. All already processed files get a prefix ‘proc_’ in order to know, which files have already been imported into the BI database. Use http://tips.webdesign10.com/how-to-bulk-rename-files-in-linux-in-the-terminal

$ rename -n ‘s/^proc_order_list_(201312d{2}-d{6}-[A-Z]{3}-[A-Z]{2}).csv$/order_list_$1.csv/’ *.csv

proc_order_list_20131211-025914-AMZ-EU.csv renamed as order_list_20131211-025914-AMZ-EU.csv
proc_order_list_20131211-031130-ENG-DE.csv renamed as order_list_20131211-031130-ENG-DE.csv

$ rename ‘s/^proc_order_list_(201312d{2}-d{6}-[A-Z]{3}-[A-Z]{2}).csv$/order_list_$1.csv/’ *.csv

There is a second level of ‘argument list too long’. If you touch it, you need a bash-script like this:


find in/ -type f |
  while read old
  	new=`echo $old | sed "s/proc_//g"`
   	if [ ! -f $new ]; then
	  echo $old '->' $new
	  mv $old $new 

Or more selectively using a filename pattern:



find in/ -type f |
  while read old
        new=`echo $old | sed "s/proc_//g"`
        if [ ! -f $new ]; then
          if [ [ $old =~ $valid ] ]; then
            echo $old '->' $new
            mv $old $new
            #echo 'not matched' valid

b) Output results from SQL query to file, the quick way – in case you have been using phpMyAdmin for this ;):
$ mysql -u -p -D -e “SELECT …” > quick_result.txt

c) Find directories with count of subdirectories or files (had to use this in order to find cluttered directories that caused problems with a server having software RAID and rsync backups):

$ find . -type d | cut -d/ -f 2 | uniq -c | sort -g

d) Prevent cp/mv/rm – cannot execute [Argument list too long] using find to copy a long list of files when you have long filelists in directories from which you would like to copy/move/remove:
$ cd /var/www/project/incoming/staging;
$ find ../production/data/sources/orderstatus/in/ -name ‘*.xml’ -exec cp {} data/sources/orderstatus/in/ ;

e) Copy all files from multiple backup sub-directories (structure like this 1122/3344/112233445566.xml) into ONE directory:
$ find ./dumpdirs/11* -name “*.xml” -type f -exec cp {} ./flatToFolder ;

f) Count all files in subdirectories with the pattern proc.*.xml:
$ find in/ -name “proc_*.xml” | wc -l

g) Filelist too long using tar and wildcards, use a filelist:
$ find in/ -name ‘*.xml’ -print > tarfile.list.txt
$ tar -cjvf evelopmentOrderstati-20140306.tar.bz2 in/*.xml
$ rm tarfile.list.txt

h) Filelist too long using grep:
$ grep -r “4384940″ *
-bash: /bin/grep: Argument list too long
Too many files in your directory

$ ls -1 | wc -l

$ find . -type f | xargs grep “4384940″

Another way to avoid this problem is to substitute the “*” with a “.”:
$ grep -r “4384940″ .

OCI Interface Integration into your eShop via Punch-Out from SAP/SRM

If it comes to integration of eShops like Magento, OXID or even your cusom made eShops into a corporate buying infrastructure using SAP, you need to deal with the SRM module (Supplier Relationship Management) using the OCI interface.

On the shop side we had a Gambio eShop in this case, which is a commercial fork of XT-Commerce. At first we checked if there where any affordable ready-to-use modules available. We found some for around 1500-2000EUR with additional (unclear) service cost. To be honest, what we found looked a bit outdated and after some contacts with the responsible sales people not even very trust worthy. After skipping over the documentation we decided to integrate OCI on our own – the custom hacked way. After all, it is just simple HTTP.

If you check the documentation and your first thought is, you could easily do it using cURL at the checkout, be warned – OCI does NOT work using a cURL call from a remote server! We did it this way and failed the first try as soon as we discovered how OCI mechanics really worked (see below). The documentation is a bit ambiguous in that respect.

The main problem is, that only your customers customer (!) with whom you have to co-operate closely and directly can help you through testing. People there are not that technical, even though their job title says so. They own the SRM testsystem (you don’t) and you always have to ask for a punch-out-click in order to find new insights integrating OCI into your eShop. It took us many weeks instead of a couple of minutes due to vacation and corporate politics and overhead. Finally we integrated an all-encompassing logging for each minor step and response to reduce the communication to ‘please click again’ – and bamm, 3 days later we had a bunch of logging information powering the next development step.

Here is a SVN diff showing all files we added or modified in the gambio release with a description what we modified:

Changed files in original Gambio distribution:
M  checkout_shipping.php (added optical info, we're in OCI-mode)
M  includes/configure.php (added a switch to work on live and development server)
A  includes/modules/oci/oci.php (Main functionality wrapped in a class)
A  includes/modules/oci/ociParameters.php (Configuration with additional parameters for SRM)
A  includes/modules/oci/logger.php (Own simple logger to see what happens during test calls)
A  oci_out.php (exit sending real hidden html form onLoad via JavaScript, redirecting to SRM)
M  admin/includes/configure.php (added a switch to work on live and development server)
A  oci.php (entry point for calling Punch-Out incl authentication)
M  checkout_confirmation.php (added optical info, we're in OCI-mode)
M  checkout_process.php (redirect to exit oci_out.php instead to successpage after checkout)
M  templates/EyeCandy/module/checkout_shipping.html (added optical info, we're in OCI-mode)
M  templates/EyeCandy/module/checkout_confirmation.html (added optical info, we're in OCI-mode)
M  templates/EyeCandy/module/checkout_success.html (added optical info, we're in OCI-mode)

OCI mechanics work like this:

At first the purchase department selects a supplier in their SRM. As soon as the order agent starts shopping, SRM performs a ‘Punch-Out’, which opens a browser window with a URL like this one:


This call goes to your /oci.php. It needs to do the following:

  • It checks if there is a customer/user in your eShop that can be authenticated with the userid, password and Z_EMAIL parameters. If so /oci.php logs the user in to the eShop. Of course you have created the customer maually before. Send the SRM owner the values of the parameters for their punch-out and test if auto-login works via punch-out.
  • /oci.php must remember the value of the parameter HOOK_URL in the session. It is only valid for this purchase and the contents of the shopping cart must be posted back to this URL – it is your value for your form’s action attribute. Remember: It can NOT be done via cURL.

The user is now navigating your eShop and fills his shopping cart and proceeds to the checkout and clicks the final checkout button. The purchase is saved in the shopping application by your eShop system. Instead of redirection to the shop-success page a hidden form is created (see OCI documentation for fieldnames and fixed field values necessary) with the shopping information for the SRM and it is sumbitted immedately by javascript. You need to intercept the redirection to the success page and redirect to your /oci_out.php instead providing this form. The action url has been handed over via the HOOK_URL parameter on punch-out.

The hidden form looks like this:

<!– OCI autosubmit –>
<script type=”text/javascript” language=”javascript”>

<form id=”srm_form” action=”https://testsystem.srm.customer.com/sap/bc/nwbc/srm/?sap-nwbc-action_url=%2fsap%2fbc%2fwebdynpro%2fsapsrm%2fwda_l_fpm_gaf%3bsap-ext-sid%3dKN2O27IP8fdX0000mAWtxW–KN2O2NIP8fdX0000mAWtxW–%3fsap-ep-tstamp%3d20130419104715%26sap-wd-tstamp%3d20130419104715%26sap-ep-version%3d7&NavigationTarget=portal_content/nwbc_back&NavMode=3&UsePost=True&SAPSRM_RESUME_ID=SAPSRM_OCI&sap-nwbc-has_post_params=X&sap-client=010&sap-language=DE&sap-nwbc-node=resume_appl” method=”POST”><code> <input name=”NEW_ITEM-LINE[1]” type=”hidden” value=”00001″ />
<input name=”NEW_ITEM-DESCRIPTION[1]” type=”hidden” value=”Herren Fleece-Jacke Langley Rot” />
<input name=”NEW_ITEM-LONGTEXT_1:132[]” type=”hidden” value=”Herren Fleece-Jacke Langley Rot” />
<input name=”NEW_ITEM-QUANTITY[1]” type=”hidden” value=”1″ />
<input name=”NEW_ITEM-UNIT[1]” type=”hidden” value=”C62″ />
<input name=”NEW_ITEM-PRICE[1]” type=”hidden” value=”41.87″ />
<input name=”NEW_ITEM-CURRENCY[1]” type=”hidden” value=”EUR” />
<input name=”NEW_ITEM-PRICEUNIT[1]” type=”hidden” value=”1″ />
<input name=”NEW_ITEM-LEADTIME[1]” type=”hidden” value=”21″ />
<input name=”NEW_ITEM-VENDOR[1]” type=”hidden” value=”16222″ />
<input name=”NEW_ITEM-VENDORMAT[1]” type=”hidden” value=”287{2}5{1}23{5}17″ />
<input name=”NEW_ITEM-MATGROUP[1]” type=”hidden” value=”41010000″ />
<input name=”NEW_ITEM-LINE[2]” type=”hidden” value=”00002″ />
<input name=”NEW_ITEM-DESCRIPTION[2]” type=”hidden” value=”Pique Polo Safran” />
<input name=”NEW_ITEM-LONGTEXT_2:132[]” type=”hidden” value=”Pique Polo Safran” />
<input name=”NEW_ITEM-QUANTITY[2]” type=”hidden” value=”1″ />
<input name=”NEW_ITEM-UNIT[2]” type=”hidden” value=”C62″ />
<input name=”NEW_ITEM-PRICE[2]” type=”hidden” value=”16.95″ />
<input name=”NEW_ITEM-CURRENCY[2]” type=”hidden” value=”EUR” />
<input name=”NEW_ITEM-PRICEUNIT[2]” type=”hidden” value=”1″ />
<input name=”NEW_ITEM-LEADTIME[2]” type=”hidden” value=”21″ />
<input name=”NEW_ITEM-VENDOR[2]” type=”hidden” value=”16222″ />
<input name=”NEW_ITEM-VENDORMAT[2]” type=”hidden” value=”112{2}4{1}1{5}16″ />
<input name=”NEW_ITEM-MATGROUP[2]” type=”hidden” value=”41010000″ />
<input id=”srm_form_submit” style=”display: none;” type=”submit” value=”Go” />

After this form has been submitted, SRM generates another page also including a hidden form, additional internal information and lots of JavaScript logic also with auto-submit. Please note: a) The array counter must start at 1 not at 0. And b) note the syntax of the longtext field.

The page is generated and ‘executed’ on the server side of the SRM system! This is the reason why it could not be done via cURL: The action to the HOOK_URL opens a page on the SRM server submitting another (!) form which finally writes the purchase to SRM – not your initial POST. It also closes the browser window for the order agent. If SRM can not close the browser window something went wrong.

Migrate Email via IMAP using imapsync

This time I had to migrate not just a domain, but a whole bunch of existing imap-email accounts with it.

To do that, just create all email accounts on the new machine and have a list with all credentials ready. For the hard work I used a very good Perl tool – imapsync.

Very cool with imapsync is, that it actually syncs from the old server. So you do not have to deal with double email or lost mail. See also imapsync description here.

Receipe for Debian Linux:

Download imapsync from https://fedorahosted.org/released/imapsync/:

$ wget https://fedorahosted.org/released/imapsync/imapsync-1.525.tgz

Unzip and cdir into the unzipped directory.

Check your perl installation with:

$ perl -c imapsync

If you see a “Can’t locate Mail/IMAPClient.pm in @INC…” you need to install the IMAPClient:

# cpan Mail::IMAPClient

If you see “imapsync syntax OK”, you already have the required modules.

Migrate one emailbox:

$ ./imapsync --host1 imap.1und1.de --host2 imap.man.ticore.it --port1 993 --port2 993 --ssl1 --ssl2 --user1 testuser@migration-domain.xx --user2 new_username_on_target_system --password1 xxx --password2 yyy

If you get a “Can’t locate IO/Socket/SSL.pm in @INC…” error, this might fix the problem:

# apt-get install libio-socket-ssl-perl

Retry migration:

$ ./imapsync --host1 imap.1und1.de --host2 imap.man.ticore.it --port1 993 --port2 993 --ssl1 --ssl2 --user1 testuser@migration-domain.xx --user2 new_username_on_target_system --password1 xxx --password2 yyy

If output says “Info: host imap.man.ticore.it says it has NO CAPABILITY for AUTHENTICATE LOGIN” just add ‘–authmech2 PLAIN’

Retry migration:

$ ./imapsync --host1 imap.1und1.de --host2 imap.man.ticore.it --port1 993 --port2 993 --ssl1 --ssl2 --user1 testuser@migration-domain.xx --user2 new_username_on_target_system --password1 xxx --password2 yyy --authmech2 PLAIN

Now you have test-migrated one email account. If you have more, which is likely, takle a look at imapsync-1.525/examples. You find 2 important files:

  • file.txt – contains a list of accounts with the self-explanatory line-format user001_1;password001_1;user001_2;password001_2
  • sync_loop_unix.sh – mass-migrates the accounts from the list. Pleas note: You must set the two imap-hostnames manually in this file! Check the directions in the file. Use via
    $ ./sync_loop_unix.sh

Have fun syncing your IMAP stuff around!

Migrate arbitrary Content to WordPress – e.g. from typo3 tt_content

If you see yourself confronted with the following task, here is a quick receipe: Setup a wordpress instance for us and import all our content from system x. In this case we focus on the import part. ‘System x’ was a somewhat customized typo3 setup. I got myself an ordinatry editors login to identify the articles which had to be migrated. After some fulltext searching and access to phpMyAdmin I identified the area, where I hat to extract the content from. It boiled down to the table tt_content. At this point you can forget about typo3. All articles could be extracted with this SQL query:

SELECT uid, pid, header, bodytext FROM tt_content WHERE header<>'' AND CType='editorialcontent';

Export the content using phpMyAdmin’s export to PHP array functionality. You end up with a file like shown in the screenshot. If you have HTML-content and doubt, if this could be valid php, save the file under export.php and do ‘$ php -l export.php’ – it says ‘No syntax errors.’ – Good.

OK, next you need to get familiar with wp_insert_post(). You need to loop your exported content over it. And that is it. This is how I did it:

  • Create a directory in your wordpress setup ‘wp-import’.
  • Copy/move your export.php array into this directory.
  • And execute the following self-explanatory file located also in your wp-import directory: import.php

That’s it.

Use GROUP BY to concatenate strings in MySQL?

I just had the case with mySQL where I needed to aggregate via GROUP BY but at the same time concatenate the content in the body-field.

The test result (see image) came using:

SELECT m.template_id, m.site, m.path, m.name, tf.body FROM menu AS m
LEFT JOIN template_fields AS tf
ON tf.template_id=m.template_id
WHERE m.type='template' AND m.inactive=0

The solution to get the concatenated content:

SELECT m.template_id, m.site, m.path, m.name, GROUP_CONCAT(tf.body SEPARATOR ' ') AS body FROM menu AS m
LEFT JOIN template_fields AS tf
ON tf.template_id=m.template_id
WHERE m.type='template' AND m.inactive=0 GROUP BY m.template_id

Fraudulent emails trying to get IPhone without paying

It seems, that being upset is a good motivator wrinting blogposts… Sorry for being lazy for such a long time.

Currently I am selling lots of ‘old stuff’ that has not being used for 2 years, among other things also my iPhone 3GS advertising it to local boards such as www.quoka.de. In short it goes like this:

  • Viallain askes if the IPhone is still available. I answer ‘yes, come bring cash and pick it up’.
  • Villain: Can you send it, I pay for the shipping. Me: OK that’s x bucks extra. Fine.
  • Villain: Please send me your interantional bank-account number, which I did, because it is as safe as giving out your local bank account number and any debit actions can be objected to for 6 weeks at your bank.
  • And now it gets weird: I receive a fake bank email (see below) saying, the money is in transit. By the way it had a BIC from another account – copy&paste error. Villain is also lazy and probably a bit disorganized and stupid too. I can get the money as soon as I send the package to an address in Italy (see map and Streetviev of the house below – probably the villain’s house, not sure but likely) and provide tracing number of the package. At this point it would be fun to return a fake one or a ticking package or some of the other 100 jokes jumping to my mind at this moment ;), but I have currently no time to do that.
  • If you disagree with this procedure, villain gets angry. Maybe this works in some % of the cases.
  • End of communication.

Here are some screens of emails. Have fun:

First contact email

Not from the bank

Note from the bank 2

Big view

This is the address in Streetview.

Email addreses I got spammed so far:

  • talk2peace2016@yahoo.com
  • gina_sigrid@yahoo.de
  • bostinsley@yahoo.de
  • bostinslley@yahoo.de
  • surely more to come…

Update: There is a webinterface, see http://abuse.yahoo.com or forward phishing-email to phishing@cc.yahoo-inc.com.

Attachment: mail-content-for-search-engines

Next steps: Informed local commissario di San Martino, Yahoo abuse dept. and Nordea Bank – have fun getting busted, loser!

Update: Received a cheque for my pinball table (Dirty Harry, abvertised for 1500EURs). Rest was meant to send container with ‘some other things’ to Urkraine, I should pay for the shipping with the cheque. This is it. Find the faults:

Cheque sent via mail

Awesomely entertaining shit!

Some more useful linux commands

Check machine for open ports, running services, guess OS:
$ sudo nmap -O man.ticore.it

Check applictaion running:
$ nc 8983
And the type GET (+enter) to get a response.

Unzip multiple gzipped SQL-files and stream them directly into your mysql database:
$ gzip -cd db_123.sql.gz  db_123_[234567].sql.gz | mysql -u<db_user> -p<db_pass> <db_name>

List each dir-size in human readable format to find the big files:
$ du -hsc my-packed-webspace.com/

List each file greater 10MB recursively:
$ find . -size +10000k -exec du -h {} ;

Set a yourself a userfriendly commandline editor before you do things like $ corontab -e:
$ export EDITOR=/usr/bin/nano

When copying or (un)zipping large files, instead of re-emitting commands to see progress, use watch and relax:
$ watch -d -n 5 ls -lh

List files without user:group and size:
$ find lsis/sources/bionity/2013/03/28/ -type f -print

Stuff I need to look-up every time

You lost your leading zeros of your varchars importing/exporting your data. You have this example data in your table storagebins:

area = L, row = 1, rack = 2, shelf = 3
and it is supposed to look like this
area = L, row = 001, rack = 02, shelf = 03

Use this SQL to correct this:

UPDATE storagebins SET row=LPAD(row, 3, '0');
UPDATE storagebins SET rack=LPAD(rack, 2, '0');
UPDATE storagebins SET shelf=LPAD(shelf, 2, '0');