Category Archives: Linux

Extract All Email Addresses from Outlook

I was looking for a way to extract a list of all email addresses I have ever used or written to through Outlook. It might be helpful for you too. This is what I ended up doing:

a) Export email and contacts into an olm file (Mac). I was using Outlook 2011. This is a binary compressed format. I selected Email and Contacts.

b) Use StuffItExpander to extract the olm to a readable XML structure. Just install StuffItExpander and drag the olm file on to it. You end up with a parsable directory structure including all files in xml format.

c) I could not find a solution to recursively parse the mess, so I decided to merge all xml files from all subdirectories into one big file:

$ find /path/to/directory/ -name '*.xml' -print0 | xargs -0 cat > merged.file

d) Extract all email addresses from the merged file into a file:

$ perl -wne'while(/[\w\.\-]+@[\w\.\-]+\w+/g){print "$&\n"}' merged.file | sort -u > output.txt

e) You will be surprised how many lines this file will have. Check the output.txt containing all unique extracted email addresses from outlook. The list needs to be cleaned. There will be a lot of invalid or temporary emails you need to go through manually.
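
Steps c) and d) can also be combined into one pipeline that skips the merged file. The following is a minimal sketch, assuming a grep that supports -o and -h (GNU and BSD grep do). The function name extract_emails and the simplified address pattern are my own choices for illustration, not part of the recipe above.

```shell
#!/bin/bash
# Hypothetical helper: print every unique email-like string found in the
# XML files below a directory. The pattern is deliberately loose.
extract_emails() {
  local dir=$1
  find "$dir" -type f -name '*.xml' -print0 |
    xargs -0 grep -Eoh '[A-Za-z0-9._-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}' |
    sort -u
}
```

Called as extract_emails /path/to/directory > output.txt, it should give a list comparable to the one from step d), which still needs the manual cleanup.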

Have fun with your list.

Useful Linux Commands 01/2014

a) Bulk rename on the commandline. I needed this one to re-import bulk files for a BI database. All already processed files get a prefix ‘proc_’, so I know which files have already been imported into the BI database. See http://tips.webdesign10.com/how-to-bulk-rename-files-in-linux-in-the-terminal

TEST YOUR EXPRESSION:
$ rename -n 's/^proc_order_list_(201312\d{2}-\d{6}-[A-Z]{3}-[A-Z]{2})\.csv$/order_list_$1.csv/' *.csv

proc_order_list_20131211-025914-AMZ-EU.csv renamed as order_list_20131211-025914-AMZ-EU.csv
proc_order_list_20131211-031130-ENG-DE.csv renamed as order_list_20131211-031130-ENG-DE.csv

DO THE ACTUAL RENAMING:
$ rename 's/^proc_order_list_(201312\d{2}-\d{6}-[A-Z]{3}-[A-Z]{2})\.csv$/order_list_$1.csv/' *.csv

There is a second level of ‘argument list too long’: with enough files, even the *.csv wildcard no longer fits on the command line. If you hit it, you need a bash script like this:

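#!/bin/bash

# Strip "proc_" from the path of every file below in/
find in/ -type f |
  while IFS= read -r old
  do
    new=$(echo "$old" | sed "s/proc_//g")
    # skip files that need no renaming and never overwrite an existing file
    if [ ! -f "$new" ]; then
      echo "$old" '->' "$new"
      mv "$old" "$new"
    fi
  done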

Or more selectively using a filename pattern:

#!/bin/bash

# Only paths matching this regular expression get renamed
valid='proc_201403.*'

find in/ -type f |
  while IFS= read -r old
  do
    new=$(echo "$old" | sed "s/proc_//g")
    if [ ! -f "$new" ]; then
      if [[ $old =~ $valid ]]; then
        echo "$old" '->' "$new"
        mv "$old" "$new"
      #else
        #echo 'not matched' "$valid"
      fi
    fi
  done
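
The same idea can be written with a built-in dry run, so you can test before you rename, like with rename -n. This is only a sketch under my own assumptions: the function name strip_proc_prefix and its dry-run argument are hypothetical, and unlike the scripts above it removes proc_ only from the start of the file name, not from anywhere in the path.

```shell
#!/bin/bash
# Hypothetical helper: strip a leading "proc_" from file names below DIR.
# Pass "dry-run" as second argument to only print what would happen.
strip_proc_prefix() {
  local dir=$1 mode=${2:-run} old base new
  find "$dir" -type f -name 'proc_*' -print0 |
    while IFS= read -r -d '' old; do
      base=${old##*/}                    # file name without directory
      new="${old%/*}/${base#proc_}"      # same directory, prefix removed
      [ -e "$new" ] && continue          # never overwrite an existing file
      echo "$old -> $new"
      [ "$mode" = "dry-run" ] || mv -- "$old" "$new"
    done
}
```

Because find hands over the names NUL-separated, file names with spaces survive. Run strip_proc_prefix in/ dry-run first, then strip_proc_prefix in/ for the real thing.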

b) Output results from SQL query to file, the quick way – in case you have been using phpMyAdmin for this ;):
$ mysql -u <db_user> -p -D <db_name> -e "SELECT …" > quick_result.txt

c) Find directories with count of subdirectories or files (had to use this in order to find cluttered directories that caused problems with a server having software RAID and rsync backups):

$ find . -type d | cut -d/ -f 2 | uniq -c | sort -g
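
If you want the number of direct entries per directory instead, a variant like the following may help. It is a sketch, not the command I used on the server: the function name count_entries is made up, and it assumes a find that knows -mindepth and -maxdepth (GNU and BSD find do).

```shell
#!/bin/bash
# Hypothetical helper: print "<count> <directory>" for every directory
# below START, the most cluttered directory last.
count_entries() {
  local start=${1:-.} dir count
  find "$start" -type d -print0 |
    while IFS= read -r -d '' dir; do
      # count direct children only (files and subdirectories)
      count=$(find "$dir" -mindepth 1 -maxdepth 1 | wc -l)
      echo "$((count)) $dir"
    done | sort -n
}
```

It starts one extra find per directory, so it is slower than the one-liner above, but the last lines of the output point directly at the cluttered directories.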

d) Avoid ‘cp/mv/rm: cannot execute [Argument list too long]’ by letting find copy, move or remove the files when a directory holds more files than a shell wildcard can handle:
$ cd /var/www/project/incoming/staging;
$ find ../production/data/sources/orderstatus/in/ -name '*.xml' -exec cp {} data/sources/orderstatus/in/ \;

e) Copy all files from multiple backup sub-directories (structure like this 1122/3344/112233445566.xml) into ONE directory:
$ find ./dumpdirs/11* -name "*.xml" -type f -exec cp {} ./flatToFolder \;
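
With -exec … \; find starts one cp per file, which gets slow for hundreds of thousands of files. A batched variant could look like this. It is a sketch: the function name flatten_copy is hypothetical, and it assumes the target directory lies outside the source tree.

```shell
#!/bin/bash
# Hypothetical helper: copy all *.xml files below SRC into the single
# directory DEST, handing many files to each cp call.
flatten_copy() {
  local src=$1 dest=$2
  mkdir -p "$dest"
  # "{} +" collects file names into batches; inside the small sh script
  # $0 is the target directory and "$@" is the current batch of files
  find "$src" -type f -name '*.xml' -exec sh -c 'cp -- "$@" "$0"' "$dest" {} +
}
```

For example: flatten_copy ./dumpdirs ./flatToFolder. Files with the same name in different subdirectories overwrite each other in the flat directory, exactly as with the command above.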

f) Count all files in subdirectories with the pattern proc_*.xml:
$ find in/ -name "proc_*.xml" | wc -l

g) Filelist too long using tar and wildcards, use a filelist:
$ find in/ -name ‘*.xml’ -print > tarfile.list.txt
$ tar -cjvf evelopmentOrderstati-20140306.tar.bz2 -T tarfile.list.txt
$ rm tarfile.list.txt
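
The temporary list file can be avoided by letting tar read the file names from find directly. A minimal sketch, assuming GNU tar or bsdtar (both accept --null and -T -). The function name archive_xml is hypothetical, and the sketch uses gzip (-z) instead of bzip2 (-j).

```shell
#!/bin/bash
# Hypothetical helper: archive all *.xml files below DIR into ARCHIVE
# without a temporary list file.
archive_xml() {
  local dir=$1 archive=$2
  # NUL-separated names keep file names with spaces or newlines intact;
  # --null has to come before -T so tar reads the list that way
  find "$dir" -type f -name '*.xml' -print0 |
    tar --null -T - -czf "$archive"
}
```

For example: archive_xml in/ orderstati.tar.gz, where the archive name is just a placeholder.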

h) Filelist too long using grep:
Problem:
$ grep -r "4384940" *
-bash: /bin/grep: Argument list too long
Too many files in your directory

Check:
$ ls -1 | wc -l
256930

Solution:
$ find . -type f -print0 | xargs -0 grep "4384940"

Another way to avoid this problem is to substitute the "*" with a ".":
$ grep -r "4384940" .

Migrate Email via IMAP using imapsync

This time I had to migrate not just a domain, but a whole bunch of existing imap-email accounts with it.

To do that, just create all email accounts on the new machine and have a list with all credentials ready. For the hard work I used a very good Perl tool – imapsync.

The nice thing about imapsync is that it actually syncs from the old server, so you do not have to deal with duplicate or lost mail. See also the imapsync description here.

Recipe for Debian Linux:

Download imapsync from https://fedorahosted.org/released/imapsync/:

$ wget https://fedorahosted.org/released/imapsync/imapsync-1.525.tgz

Unpack the archive and cd into the unpacked directory.

Check your perl installation with:

$ perl -c imapsync

If you see a “Can’t locate Mail/IMAPClient.pm in @INC…” you need to install the IMAPClient:

# cpan Mail::IMAPClient

If you see “imapsync syntax OK”, you already have the required modules.

Migrate one emailbox:

$ ./imapsync --host1 imap.1und1.de --host2 imap.man.ticore.it --port1 993 --port2 993 --ssl1 --ssl2 --user1 testuser@migration-domain.xx --user2 new_username_on_target_system --password1 xxx --password2 yyy

If you get a “Can’t locate IO/Socket/SSL.pm in @INC…” error, this might fix the problem:

# apt-get install libio-socket-ssl-perl

Retry migration:

$ ./imapsync --host1 imap.1und1.de --host2 imap.man.ticore.it --port1 993 --port2 993 --ssl1 --ssl2 --user1 testuser@migration-domain.xx --user2 new_username_on_target_system --password1 xxx --password2 yyy

If the output says “Info: host imap.man.ticore.it says it has NO CAPABILITY for AUTHENTICATE LOGIN”, just add '--authmech2 PLAIN'.

Retry migration:

$ ./imapsync --host1 imap.1und1.de --host2 imap.man.ticore.it --port1 993 --port2 993 --ssl1 --ssl2 --user1 testuser@migration-domain.xx --user2 new_username_on_target_system --password1 xxx --password2 yyy --authmech2 PLAIN

Now you have test-migrated one email account. If you have more, which is likely, take a look at imapsync-1.525/examples. You will find 2 important files:

  • file.txt – contains a list of accounts with the self-explanatory line-format user001_1;password001_1;user001_2;password001_2
  • sync_loop_unix.sh – mass-migrates the accounts from the list. Please note: You must set the two IMAP hostnames manually in this file! Check the directions in the file. Use via
    $ ./sync_loop_unix.sh
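
To show what such a loop does, here is a small sketch of my own. It is not the shipped sync_loop_unix.sh. It assumes the line format of file.txt described above, uses only the imapsync options from the commands above, and the names sync_accounts and DRY_RUN are made up for this example. By default it only prints what it would do.

```shell
#!/bin/bash
# Hypothetical wrapper around imapsync, not the shipped sync_loop_unix.sh.
HOST1="imap.1und1.de"       # old server (from the examples above)
HOST2="imap.man.ticore.it"  # new server (from the examples above)

sync_accounts() {
  local list=$1 u1 p1 u2 p2
  # one account per line: user1;password1;user2;password2
  while IFS=';' read -r u1 p1 u2 p2; do
    [ -n "$u1" ] || continue            # skip empty lines
    if [ "${DRY_RUN:-1}" = 1 ]; then
      echo "would sync $u1 -> $u2"
    else
      # stdin comes from /dev/null so imapsync cannot eat the list
      ./imapsync --host1 "$HOST1" --host2 "$HOST2" \
        --port1 993 --port2 993 --ssl1 --ssl2 \
        --user1 "$u1" --password1 "$p1" \
        --user2 "$u2" --password2 "$p2" < /dev/null
    fi
  done < "$list"
}
```

Check the list with sync_accounts file.txt, then run DRY_RUN=0 sync_accounts file.txt from the imapsync directory. Passwords containing a semicolon will not work with this simple line format.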

Have fun syncing your IMAP stuff around!

Some more useful linux commands

Check machine for open ports, running services, guess OS:
$ sudo nmap -O man.ticore.it

Check whether an application is running:
$ nc 123.123.123.123 8983
Then type GET (+enter) to get a response.

Unzip multiple gzipped SQL-files and stream them directly into your mysql database:
$ gzip -cd db_123.sql.gz  db_123_[234567].sql.gz | mysql -u<db_user> -p<db_pass> <db_name>

List each dir-size in human readable format to find the big files:
$ du -hsc my-packed-webspace.com/*

List each file larger than 10 MB recursively:
$ find . -size +10000k -exec du -h {} \;
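
Sorted by size, the list is easier to read. A sketch with a configurable threshold. The function name big_files and its default of 10240 KiB (10 MiB) are my own choices.

```shell
#!/bin/bash
# Hypothetical helper: list files larger than MIN_KB KiB below DIR with
# their disk usage in KiB, the largest file last.
big_files() {
  local dir=${1:-.} min_kb=${2:-10240}
  find "$dir" -type f -size +"${min_kb}k" -exec du -k {} + | sort -n
}
```

For example: big_files . 10240. du -k prints plain kilobyte numbers, so a numeric sort is enough and no sort -h is needed.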

Set yourself a user-friendly command-line editor before you do things like $ crontab -e:
$ export EDITOR=/usr/bin/nano

When copying or (un)zipping large files, instead of re-running commands to see progress, use watch and relax:
$ watch -d -n 5 ls -lh

List files without user:group and size:
$ find lsis/sources/bionity/2013/03/28/ -type f -print

Emulate sendmail on your Dev Machine

In order to prevent email delivery during development and log all email messages that would have been delivered, you can actually do a simple trick: Replace the file /usr/sbin/sendmail (on Ubuntu, use ‘locate sendmail’ to find it if it lies elsewhere) with this little shell-script, or rather make a _bak of the original and save the following instead of the sendmail binary:

#!/bin/bash

LOGDIR="/tmp"
PREFIX="sendmail"
NOW=$(date +%Y-%m-%dT%H.%M.%S)
CNT=1
PRIVATELOG="$LOGDIR/$PREFIX-$NOW.$CNT.log"
COMBINEDLOG="$LOGDIR/$PREFIX-combined.log"

# If privatelogs are being used...
if [ -n "$PRIVATELOG" ]; then
  # ...make sure the filename is unique and create the file
  while [ -f "$PRIVATELOG" ]; do
    CNT=$((CNT + 1))
    PRIVATELOG="$LOGDIR/$PREFIX-$NOW.$CNT.log"
  done

  echo "$0 $*" > "$PRIVATELOG"
else
  # ...otherwise swap filenames
  PRIVATELOG=$COMBINEDLOG
  COMBINEDLOG=''
fi

echo "[$NOW]" >> "$PRIVATELOG"
# Copy the message from stdin to the log, keeping backslashes and whitespace
while IFS= read -r BUF
do
  printf '%s\n' "$BUF" >> "$PRIVATELOG"
done

# Append privatelog to combinedlog when both logs are used
if [ -n "$COMBINEDLOG" ]; then
  echo "[$NOW]" >> "$COMBINEDLOG"
  cat "$PRIVATELOG" >> "$COMBINEDLOG"
fi

exit 0

When your application now sends mail, these things happen:

  • No email is actually sent.
  • The message gets appended to the file /tmp/sendmail-combined.log, on which you could set a ‘tail -f’ in order to see which emails would have been sent and what content they would have.
  • One new file (e.g. /tmp/sendmail-2011-02-08T08.02.48.1.log) gets written for every email sent. I personally only use the combined file.

Inspired by http://stackoverflow.com/questions/3710864/simulating-sendmail-with-dummy-script

Clean your PHP4 Legacies using sed

If you have to deal with very old PHP4 legacy code containing every syntax crime you may know from the early years, how would you handle it? Give it to your junior people to fix it manually? I like to have at least some handy helpers for the first rough corrections. I found sed to be a very powerful helper here.

Code you might encounter - associative array elements without quotes.

I spent quite some time to find useful regular expressions to help me. This is how I did this:

A way to test your regular-expression is to echo a sample and apply your regex to test the results:

echo '$_REQUEST[action] reise_l_topic_ids[] $dat[ticket_order]' | sed -E "s/\\\$([a-zA-Z0-9_]+)\[([a-zA-Z0-9_]+)\]/\$\1['\2']/g"

Once it works you can apply your regex to one file:

sed -i -E "s/\\\$([a-zA-Z0-9_]+)\[([a-zA-Z0-9_]+)\]/\$\1['\2']/g" my_old_file.php

Or apply your regex to all *.php-files recursively below the current directory to a whole project:

find . -name "*.php" -exec sed -i -E "s/\\\$([a-zA-Z0-9_]+)\[([a-zA-Z0-9_]+)\]/\$\1['\2']/g" '{}' \;

By the way: This usage of sed works fine on the Linux command line, but not on OSX, where the syntax is slightly different (sed -i "" -e "s/blah/blubb/" file). This is of course only a start to cut otherwise painful and boring corrections down to just a few seconds. It will not save you from all manual work, and it may break syntax at some points. But it weeds out 90% and leaves you with the other 10% of actual manual work.

You could imagine many more sed regexes e.g. to replace short tags <?=$my_var?> to a proper <?php echo $my_var; ?> etc.

echo '<?=$out?> sakdhs sakdhas k <?php echo $xyz; ?> ddd' | sed "s/<?=/<?php echo /g";
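
The one-liner above only replaces the opening tag. A sketch that also adds the semicolon and rewrites the closing tag could look like this. The function name fix_short_tags is hypothetical, and the regex assumes that the expression sits on one line and contains no question mark.

```shell
#!/bin/bash
# Hypothetical filter: rewrite <?=expr?> to <?php echo expr; ?> on stdin.
# Assumes the expression contains no "?" and does not span lines.
fix_short_tags() {
  sed -E 's/<\?=[[:space:]]*([^?]+)\?>/<?php echo \1; ?>/g'
}
```

Test it with echo first, as before: echo '<?=$out?> ddd' | fix_short_tags. Tags that are already in the long form are left alone. Ternary expressions inside short tags are not matched and stay for manual work.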

I will collect more regexes as I need and find them. If you have ideas please add them in the comment section.

Stuff I need to lookup every time

Set ignore to all files of a directory with subversion:
$ cd cache
$ svn propset svn:ignore '*' .
$ svn ci . -m 'Ignore set on cache dir.'

Show changed files between two revisions, overview:
$ svn diff -r 300:HEAD --summarize

Show changed files between two revisions, for each revision:
$ svn log -v -r 300:304

See overall latest 20 commit-messages:
$ svn log -l 20

Branching and merging:
See: http://blog.evanweaver.com/2007/08/15/svn-branching-best-practices-in-practice/

Only grep in php source files, not jpgs, movies etc.:
$ grep -i 'whatever' `find . -name '*.php' -print`

Add all new files in a large filestructure to subversion, like after an update of vendors in Symfony2:
svn st | grep '^?' | awk '{print $2}' | xargs svn add

Remove all deleted files from a large filestructure from subversion, like after a vendors update in Symfony2:
svn st | grep '^!' | awk '{print $2}' | xargs svn delete --force
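
The awk '{print $2}' step cuts paths at the first space. If your tree contains file names with spaces, a filter like the following may help. It is a sketch: the function name unversioned_paths is hypothetical, and it assumes that svn st prints the status flag, then whitespace, then the path.

```shell
#!/bin/bash
# Hypothetical filter: read "svn st" output on stdin and print the paths
# of unversioned entries (lines starting with "?"), one per line.
unversioned_paths() {
  sed -E -n 's/^\?[[:space:]]+//p'
}
```

Usage could be: svn st | unversioned_paths | tr '\n' '\0' | xargs -0 svn add. The tr step turns the lines into NUL-separated arguments so xargs does not split on spaces.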

Setup external libraries in subversion:
svn mkdir ZFVersions;
svn add ZFVersions;
svn ci ZFVersions -m 'Added dir for all versions.';
cd ZFVersions;
svn mkdir 1.11;
svn add 1.11;
svn ci 1.11 -m 'Added version subdir.'
cd 1.11;
svn propset svn:externals 'Zend http://framework.zend.com/svn/framework/standard/tags/release-1.11.0/library/Zend' .; # This will checkout in another dir Zend into your dir 1.11. You need this since autoloading is using paths like this require_once(Zend/Feed/Rss.class.php)!!
svn commit -m 'Set external.';
svn up .; # Loads external lib.

Correct date problems in a mysql database – 2020 instead of 2011 in YYYY-MM-DD dates:
UPDATE accounts SET member_startdate = CONCAT('2010', '-', MONTH(member_startdate), '-', DAYOFMONTH(member_startdate)) WHERE YEAR(member_startdate) > 2011;

What is it you guys always look up?

Useful Linux Commands 12/2009

Recursively remove all .svn directories from a working copy:

find . -name .svn -exec rm -rf {} \;

Recursively remove all ._xyz-files (OSX meta file info) from your WebDav-Drive, set via hourly cron:

find /var/data/ -name "._*" -exec rm {} \;

Do not forget to set your path ;).

Check for syntax-errors (lint) in all php-files of current directory and only echo error messages if errors have been detected:

find . -name "*.php" -exec php -l {} \; | grep -v 'No syntax errors'

Consistent Development Environments using VirtualMachines

As a development team we always run into situations where we have trouble setting up a proper development environment for each of the team members to get going or add new staff on the go. It annoyed me every time since it causes a lot of unnecessary communication and friction.

I often heard of virtualization but never actually played seriously with it. The idea is:

If we could have a virtual machine for every project that contains an equivalent environment like the production system, everybody working on it…

  • … could rely on their development environment by just starting the VM, without having to set up anything half-baked themselves.
  • … could use their favourite OS, IDE and tools, with which they are most comfortable and thus happy and productive.
  • … could work on their own checked-out working copy using version control.
  • … could immediately see what they built by refreshing the local browser or starting unit tests on the VM via ssh to check their dev increments.

We used http://www.virtualbox.org. A good starting point to get to know VirtualBox better and learn how to start your first virtual machine: https://help.ubuntu.com/community/VirtualBox

Our target was to be able to start up the development VM as guest system on any developer's development machine being the host system, open a browser on the host (!) and call for example http://develop/ to see the webroot of the VM. Additionally we set up samba and ssh on the VM in order to have the webserver's webroot on the VM available via the filesystem. In order to do that you need to…

  • …start your VM with networking set to ‘Host interface’ instead of the default NAT. This is explained in detail on this page (sorry, in German) http://www.nwlab.net/tutorials/virtualbox/virtual-networking.html – for me it was tricky to get the guest machine available on the host and have internet access at the same time.
  • …edit the hosts file (on OSX ‘sudo nano /private/etc/hosts’ and reboot) on the development machines and add something like the following line: ‘192.168.56.101 develop’. To find out the IP enter ‘sudo ifconfig’ (OSX/Linux) on your host system after you have started the VM. You will see additional adapters set by VirtualBox and the IP address.
  • …configure /etc/samba/smb.conf on the VM, restart samba and connect (e.g. smb://develop/webroot). We check out a working copy of the application under development directly onto the webroot and create a new PHP project there in our IDE. Update and commit directly from there.

If you search the web for ready-made VMs you find mostly VMware images. You cannot run them directly in VirtualBox and need to convert them.

Under Linux you can convert the VMware image (.vmdk) into a VirtualBox image (.vdi) like this:

sudo apt-get install virtualbox-ose virtualbox-ose-guest-utils;
sudo apt-get install qemu;

qemu-img convert xxx.vmdk xxx.bin;
VBoxManage convertdd xxx.bin xxx.vdi

Install the required packages with apt-get install once. VBoxManage is part of VirtualBox. The last two lines do the conversion.


We ended up creating a fresh install from a Debian 5 netinstall iso. The iso file can be mounted as CDROM on the creation of the new VM with VirtualBox. Recipes for setting up the appropriate LAMP environment with apt-get install can be found on the web. You only have to do it once. Save the state of your VM afterwards.


There are ways to generate a virtual machine from a physical server. Use Google to find recipes. I used http://www.partimage.org on a Debian Etch system with the live CDROM from http://www.sysresccd.org. This requires that you are able to unmount your filesystem, or rather boot into the live CD on the production/staging machine, in order to generate the partimage.

You mount an external drive over the network or a USB hard drive. The partition (e.g. /dev/sda1) you would like to back up must be unmounted. From the live CD you can see your partitions, including attached USB drives, with the ‘fdisk -l’ command. Just mount the target (e.g. /mnt/usbdrive) and start partimage from the commandline. Dialogues guide you through the image creation.


In case you wonder what is meant by the ‘Host key’ to enter or leave a running VM with your mouse… it is the right (!) Ctrl key (labelled Strg on German keyboards).


I just installed Ubuntu Server on a VM from my MacBook. To have a usable keyboard once you have logged on to the new VM, you must do the following in order to have a keymap including the pipe symbol, braces etc.:

  • sudo apt-get install console-data; #to install the keymaps
  • sudo loadkeys mac-macbook-de; #to set the keymap for German MacBook

Once you have done that, you can use your right Command key as the ‘Alt-Gr’ key like on a PC keyboard. The pipe symbol can then be typed with ‘Alt-Gr + >’, braces and brackets with ‘Alt-Gr + 6,7,8,9’.


This is how you copy a virtual machine using VirtualBox tools:

$ VBoxManage clonevdi /Users/marco/MyMachine.vdi /Users/marco/MyMachine_copy.vdi
$ VBoxManage internalcommands setvdiuuid /Users/marco/MyMachine_copy.vdi

Useful Linux Commands 04/2009

I had a list of files from a large file structure as a result from a maintenance script run with lines like this:

/home/web/.../sources/.../2008/12/25/4f1feabbd76f79ecab150bdee3f6ae4d.xml
/home/web/.../sources/.../2008/12/25/e506e433a2d87f0275c7641da59bbf7f.xml
/home/web/.../sources/.../2008/12/28/901c4f081645b986e9b1377d3f586b8e.xml
/home/web/.../sources/.../2008/12/28/6bec4d4bbcf8f596c40694210d220a3b.xml
/home/web/.../sources/.../2008/12/24/477c535d6111605c8f6020a959f32fde.xml
/home/web/.../sources/.../2008/12/24/9f253a96fc26d8f6d9e61b8f1bdb3453.xml

Each line represented a document path to a file which was supposed to be removed from the filesystem. You can do that with the following simple oneliner:

for LINE in $( cat ../log/my_empty_files.txt ) ; do rm $LINE ; done

You can try it with ‘echo’ instead of ‘rm’ first to see if it would work:

for LINE in $( cat ../log/my_empty_files.txt ) ; do echo " # $LINE" ; done
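
The for loop splits the list at every whitespace, so it only works as long as no path contains a space. A variant that reads the list line by line and has the dry run built in might look like this. It is a sketch, and the function name remove_listed and its dry-run argument are made up.

```shell
#!/bin/bash
# Hypothetical helper: delete every file named in LIST (one path per line).
# Pass "dry-run" as second argument to only print what would be removed.
remove_listed() {
  local list=$1 mode=${2:-run} line
  while IFS= read -r line; do
    [ -n "$line" ] || continue          # skip empty lines
    if [ "$mode" = "dry-run" ]; then
      echo " # $line"
    else
      rm -f -- "$line"
    fi
  done < "$list"
}
```

First check with remove_listed ../log/my_empty_files.txt dry-run, then run it again without the second argument.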