Tag Archives: Linux

Extract All Email Addresses from Outlook

I was looking for a way to extract a list of all email addresses I have ever used or written to through Outlook. It might be helpful for you too. This is what I ended up doing:

a) Export email and contacts into an olm file (Mac). I was using Outlook 2011. This is a binary compressed format. I selected Email and Contacts.

b) Use StuffItExpander to extract the olm to a readable XML structure. Just install StuffItExpander and drag the olm file on to it. You end up with a parsable directory structure including all files in xml format.

c) I could not find a solution to recursively parse the mess, so I decided to merge all xml files from all subdirectories into one big file:

$ find /path/to/directory/ -name '*.xml' -print0 | xargs -0 cat > merged.file

d) Extract all email addresses from the merged file into a file:

$ perl -wne'while(/[\w\.\-]+@[\w\.\-]+\w+/g){print "$&\n"}' merged.file | sort -u > output.txt

e) Check output.txt, which now contains all unique email addresses extracted from Outlook. You will be surprised how many lines this file has. The list still needs to be cleaned: there will be a lot of invalid or temporary addresses you need to go through manually.
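
Steps c) and d) can also be combined so that no merged file is needed. The following is only a minimal sketch under assumptions: it expects a grep that supports -o, -h and -E (GNU and BSD grep do), and the function name extract_emails is made up for illustration. The character classes mirror the Perl expression above.

```shell
# Sketch: collect unique email addresses from all *.xml files below a directory.
# extract_emails is a hypothetical helper name, not part of any tool.
extract_emails() {
    dir=$1
    # -o prints only the matching part, -h suppresses the file names
    find "$dir" -type f -name '*.xml' \
        -exec grep -ohE '[[:alnum:]_.-]+@[[:alnum:]_.-]+[[:alnum:]_]' {} + |
        sort -u
}

# Usage: extract_emails /path/to/directory > output.txt
```

The result needs the same manual cleanup as the list produced by the two-step approach.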

Have fun with your list.

Useful Linux Commands 01/2014

a) Bulk rename on the command line. I needed this one to re-import bulk files for a BI database. All processed files get the prefix ‘proc_’ so that it is clear which files have already been imported into the BI database. See http://tips.webdesign10.com/how-to-bulk-rename-files-in-linux-in-the-terminal

TEST YOUR EXPRESSION:
$ rename -n 's/^proc_order_list_(201312\d{2}-\d{6}-[A-Z]{3}-[A-Z]{2})\.csv$/order_list_$1.csv/' *.csv

proc_order_list_20131211-025914-AMZ-EU.csv renamed as order_list_20131211-025914-AMZ-EU.csv
proc_order_list_20131211-031130-ENG-DE.csv renamed as order_list_20131211-031130-ENG-DE.csv

DO THE ACTUAL RENAMING:
$ rename 's/^proc_order_list_(201312\d{2}-\d{6}-[A-Z]{3}-[A-Z]{2})\.csv$/order_list_$1.csv/' *.csv

There is a second level of ‘argument list too long’: with enough files, the *.csv wildcard itself exceeds the limit. If you hit it, you need a bash script like this:

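#!/bin/bash

# Strip 'proc_' from every file name below in/, never overwriting an existing file.
find in/ -type f |
  while IFS= read -r old
  do
    new=$(echo "$old" | sed "s/proc_//g")
    if [ ! -f "$new" ]; then
      echo "$old" '->' "$new"
      mv "$old" "$new"
    fi
  done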

Or more selectively using a filename pattern:

#!/bin/bash

# Only files whose path matches this regular expression are renamed.
valid='proc_201403.*'

find in/ -type f |
  while IFS= read -r old
  do
        new=$(echo "$old" | sed "s/proc_//g")
        if [ ! -f "$new" ]; then
          if [[ $old =~ $valid ]]; then
            echo "$old" '->' "$new"
            mv "$old" "$new"
          #else
            #echo 'not matched' "$valid"
          fi
        fi
  done

b) Output results from SQL query to file, the quick way – in case you have been using phpMyAdmin for this ;):
$ mysql -u USER -p -D DATABASE -e "SELECT …" > quick_result.txt

c) Find directories with count of subdirectories or files (had to use this in order to find cluttered directories that caused problems with a server having software RAID and rsync backups):

$ find . -type d | cut -d/ -f 2 | uniq -c | sort -g
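
The command above counts subdirectories per top-level directory. A related sketch counts the files directly inside each directory, which is another way to spot cluttered directories. The function name count_files_per_dir is made up for illustration, and file names containing newlines are not handled.

```shell
# Sketch: list every directory below $1 with the number of files directly in it,
# smallest count first. count_files_per_dir is a hypothetical helper name.
# The sed step strips the file name and keeps only the directory part.
count_files_per_dir() {
    find "$1" -type f |
        sed 's|/[^/]*$||' |
        sort |
        uniq -c |
        sort -n
}

# Usage: count_files_per_dir . | tail
```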

d) Avoid ‘cannot execute [Argument list too long]’ from cp/mv/rm: when a directory holds more files than a wildcard can pass to the command, let find hand them over one by one:
$ cd /var/www/project/incoming/staging;
$ find ../production/data/sources/orderstatus/in/ -name '*.xml' -exec cp {} data/sources/orderstatus/in/ \;

e) Copy all files from multiple backup sub-directories (structure like this 1122/3344/112233445566.xml) into ONE directory:
$ find ./dumpdirs/11* -name "*.xml" -type f -exec cp {} ./flatToFolder \;
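
With -exec … \; find starts one cp process per file. For very large directories a batched variant can be noticeably faster. The sketch below assumes a find that supports the `{} +` form of -exec; copy_flat is a made-up name. Files with the same name in different subdirectories overwrite each other in the target directory, just as with the commands above.

```shell
# Sketch: copy all files matching a name pattern from a tree into ONE directory,
# passing many files to each cp call. copy_flat is a hypothetical helper name.
copy_flat() {
    src=$1 pattern=$2 dest=$3
    # The inline sh script receives the target as $1 and a batch of files after it.
    find "$src" -type f -name "$pattern" \
        -exec sh -c 'dest=$1; shift; cp "$@" "$dest"' sh "$dest" {} +
}

# Usage: copy_flat ./dumpdirs '*.xml' ./flatToFolder
```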

f) Count all files in subdirectories with the pattern proc_*.xml:
$ find in/ -name "proc_*.xml" | wc -l

g) Filelist too long using tar and wildcards, use a filelist:
$ find in/ -name '*.xml' -print > tarfile.list.txt
$ tar -cjvf evelopmentOrderstati-20140306.tar.bz2 -T tarfile.list.txt
$ rm tarfile.list.txt

h) Filelist too long using grep:
Problem:
$ grep -r "4384940" *
-bash: /bin/grep: Argument list too long
Too many files in your directory

Check:
$ ls -1 | wc -l
256930

Solution:
$ find . -type f | xargs grep "4384940"

Another way to avoid this problem is to substitute the “*” with a “.”:
$ grep -r "4384940" .
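
File names containing spaces break the plain find | xargs pipe. A hedged variant, assuming find and xargs support -print0 and -0 (the GNU and BSD versions do); the function name grep_tree is made up for illustration:

```shell
# Sketch: list all files below a directory that contain a fixed string.
# grep_tree is a hypothetical helper name.
grep_tree() {
    needle=$1 dir=$2
    # NUL-separated names survive spaces and quotes; -F treats the needle literally.
    find "$dir" -type f -print0 | xargs -0 grep -lF -e "$needle"
}

# Usage: grep_tree 4384940 .
```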

Migrate Email via IMAP using imapsync

This time I had to migrate not just a domain, but a whole bunch of existing imap-email accounts with it.

To do that, just create all email accounts on the new machine and have a list with all credentials ready. For the hard work I used a very good Perl tool – imapsync.

A very nice feature of imapsync is that it actually syncs from the old server, so you do not have to deal with duplicate or lost mail. See also imapsync description here.

Recipe for Debian Linux:

Download imapsync from https://fedorahosted.org/released/imapsync/:

$ wget https://fedorahosted.org/released/imapsync/imapsync-1.525.tgz

Unpack the archive and cd into the extracted directory.

Check your perl installation with:

$ perl -c imapsync

If you see a “Can’t locate Mail/IMAPClient.pm in @INC…” you need to install the IMAPClient:

# cpan Mail::IMAPClient

If you see “imapsync syntax OK”, you already have the required modules.

Migrate one emailbox:

$ ./imapsync --host1 imap.1und1.de --host2 imap.man.ticore.it --port1 993 --port2 993 --ssl1 --ssl2 --user1 testuser@migration-domain.xx --user2 new_username_on_target_system --password1 xxx --password2 yyy

If you get a “Can’t locate IO/Socket/SSL.pm in @INC…” error, this might fix the problem:

# apt-get install libio-socket-ssl-perl

Retry migration:

$ ./imapsync --host1 imap.1und1.de --host2 imap.man.ticore.it --port1 993 --port2 993 --ssl1 --ssl2 --user1 testuser@migration-domain.xx --user2 new_username_on_target_system --password1 xxx --password2 yyy

If the output says “Info: host imap.man.ticore.it says it has NO CAPABILITY for AUTHENTICATE LOGIN”, just add ‘--authmech2 PLAIN’:

Retry migration:

$ ./imapsync --host1 imap.1und1.de --host2 imap.man.ticore.it --port1 993 --port2 993 --ssl1 --ssl2 --user1 testuser@migration-domain.xx --user2 new_username_on_target_system --password1 xxx --password2 yyy --authmech2 PLAIN

Now you have test-migrated one email account. If you have more, which is likely, take a look at imapsync-1.525/examples. You will find two important files:

  • file.txt – contains a list of accounts with the self-explanatory line-format user001_1;password001_1;user001_2;password001_2
  • sync_loop_unix.sh – mass-migrates the accounts from the list. Please note: You must set the two IMAP hostnames manually in this file! Check the directions in the file. Use via
    $ ./sync_loop_unix.sh
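
The idea behind such a loop can be sketched as follows. This is not the sync_loop_unix.sh that ships with imapsync, only a minimal illustration under assumptions: it reads the semicolon-separated format of file.txt, reuses the host names and options from the commands above, and the names sync_accounts and DRY_RUN are invented here. With DRY_RUN=1 it only prints the commands (including the passwords) instead of running them.

```shell
# Sketch: run imapsync once per line of a "user1;password1;user2;password2" file.
# sync_accounts and DRY_RUN are hypothetical names, not part of imapsync.
sync_accounts() {
    list=$1
    run=""
    if [ "${DRY_RUN:-0}" = 1 ]; then run=echo; fi
    while IFS=';' read -r user1 pass1 user2 pass2; do
        # skip empty lines and comment lines
        case $user1 in ''|'#'*) continue ;; esac
        # </dev/null keeps the command from swallowing the rest of the list
        $run ./imapsync --host1 imap.1und1.de --host2 imap.man.ticore.it \
            --port1 993 --port2 993 --ssl1 --ssl2 \
            --user1 "$user1" --password1 "$pass1" \
            --user2 "$user2" --password2 "$pass2" </dev/null
    done < "$list"
}

# Usage: DRY_RUN=1 sync_accounts file.txt
```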

Have fun syncing your IMAP stuff around!

Emulate sendmail on your Dev Machine

In order to prevent email delivery during development and log all email messages that would have been delivered, you can use a simple trick: replace the file /usr/sbin/sendmail (on Ubuntu; use ‘locate sendmail’ to find it if it lives elsewhere) with this little shell script. In other words, make a _bak copy of the original and save the following in place of the sendmail binary:

#!/bin/bash

LOGDIR="/tmp"
PREFIX="sendmail"
NOW=$(date +%Y-%m-%dT%H.%M.%S)
CNT=1
PRIVATELOG="$LOGDIR/$PREFIX-$NOW.$CNT.log"
COMBINEDLOG="$LOGDIR/$PREFIX-combined.log"

# If privatelogs are being used...
if [ -n "$PRIVATELOG" ]; then
    # ...make sure the filename is unique and create the file
    while [ -f "$PRIVATELOG" ]; do
        CNT=$((CNT + 1))
        PRIVATELOG="$LOGDIR/$PREFIX-$NOW.$CNT.log"
    done

    echo "$0 $*" > "$PRIVATELOG"
else
    # ...otherwise swap filenames
    PRIVATELOG=$COMBINEDLOG
    COMBINEDLOG=''
fi

echo "[$NOW]" >> "$PRIVATELOG"
# Copy the message from stdin line by line, keeping whitespace and backslashes intact
while IFS= read -r BUF
do
    printf '%s\n' "$BUF" >> "$PRIVATELOG"
done

# Append privatelog to combinedlog when both logs are used
if [ -n "$COMBINEDLOG" ]; then
    echo "[$NOW]" >> "$COMBINEDLOG"
    cat "$PRIVATELOG" >> "$COMBINEDLOG"
fi

exit 0

When your application now sends mail, these things happen:

  • No email is actually sent.
  • The message gets appended to the file /tmp/sendmail-combined.log, on which you could set a ‘tail -f’ in order to see which emails would have been sent and what content they would have.
  • One new file (e.g. /tmp/sendmail-2011-02-08T08.02.48.1.log) gets written for every email sent. I personally only use the combined file.

Inspired by http://stackoverflow.com/questions/3710864/simulating-sendmail-with-dummy-script

Clean your PHP4 Legacies using sed

If you have to deal with very old PHP4 legacy code containing every syntax crime you may know from the early years, how would you handle it? Give it to your junior people to fix it manually? I like to have at least some handy helpers for the first rough corrections. I found sed to be a very powerful helper here.

Code you might encounter - associative array elements without quotes.

I spent quite some time to find useful regular expressions to help me. This is how I did this:

A way to test your regular-expression is to echo a sample and apply your regex to test the results:

echo '$_REQUEST[action] reise_l_topic_ids[] $dat[ticket_order]' | sed "s/\$\([a-zA-Z0-9_]\+\)\[\([a-zA-Z0-9_]\+\)\]/\$\1['\2']/g"

Once it works you can apply your regex to one file:

sed -i "s/\$\([a-zA-Z0-9_]\+\)\[\([a-zA-Z0-9_]\+\)\]/\$\1['\2']/g" my_old_file.php

Or apply your regex to all *.php-files recursively below the current directory to a whole project:

find . -name "*.php" -exec sed -i "s/\$\([a-zA-Z0-9_]\+\)\[\([a-zA-Z0-9_]\+\)\]/\$\1['\2']/g" '{}' \;
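
Since sed -i rewrites files without asking, it can help to look at the changes first. A minimal sketch, assuming diff is available; preview_sed is a made-up helper name and not part of sed:

```shell
# Sketch: show what a sed expression would change in a file, without touching it.
# preview_sed is a hypothetical helper name.
preview_sed() {
    script=$1 file=$2
    # diff exits non-zero when there are differences, which is expected here
    sed "$script" "$file" | diff -u "$file" - || true
}

# Usage: preview_sed "s/blah/blubb/" my_old_file.php
```

Lines starting with - show the old code, lines starting with + show what sed would write.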

By the way: this usage of sed works fine on the Linux command line, but not on OSX, where the syntax is slightly different (sed -i "" -e "s/blah/blubb/" file). This is of course only a start at cutting otherwise painful and boring corrections down to just a few seconds. It will not save you from special manual work, and it will break syntax at some points. But it weeds out 90% and leaves you with the other 10% of actual manual work.

You could imagine many more sed regexes e.g. to replace short tags <?=$my_var?> to a proper <?php echo $my_var; ?> etc.

echo '<?=$out?> sakdhs sakdhas k <?php echo $xyz; ?> ddd' | sed "s/<?=/<?php echo /g";
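
The one-liner above only replaces the opening tag and does not add the semicolon and space before the closing ?>. A slightly more complete sketch follows, assuming a sed that supports -E (extended regular expressions) and short tags whose expression contains no ? character; fix_short_tags is a made-up name. Expressions that already end in a semicolon would get a second one, so review the result.

```shell
# Sketch: rewrite <?=EXPR?> into <?php echo EXPR; ?> on stdin.
# fix_short_tags is a hypothetical helper name.
fix_short_tags() {
    # \1 is the expression without surrounding whitespace
    sed -E 's/<\?=[[:space:]]*([^?]*[^?[:space:]])[[:space:]]*\?>/<?php echo \1; ?>/g'
}

# Usage: fix_short_tags < my_old_file.php > my_new_file.php
```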

I will collect more regexes as I need and find them. If you have ideas please add them in the comment section.

Stuff I need to lookup every time

Set ignore to all files of a directory with subversion:
$ cd cache
$ svn propset svn:ignore '*' .
$ svn ci . -m 'Ignore set on cache dir.'

Show changed files between two revisions, overview:
$ svn diff -r 300:HEAD --summarize

Show changed files between two revisions, for each revision:
$ svn log -v -r 300:304

See overall latest 20 commit-messages:
$ svn log -l 20

Branching and merging:
See: http://blog.evanweaver.com/2007/08/15/svn-branching-best-practices-in-practice/

Only grep in php source files, not jpgs, movies etc.:
$ grep -i 'whatever' `find . -name '*.php' -print`

Add all new files in a large filestructure to subversion, like after an update of vendors in Symfony2:
svn st | grep '^?' | awk '{print $2}' | xargs svn add

Remove all deleted files from a large filestructure from subversion, like after a vendors update in Symfony2
svn st | grep '^!' | awk '{print $2}' | xargs svn delete --force
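
The awk '{print $2}' step cuts paths at the first space. A sketch that keeps the whole path, assuming the usual svn status layout in which the path starts at column 9 (check the output of your Subversion version first); svn_paths is a made-up name:

```shell
# Sketch: read "svn st" output on stdin and print the paths whose first status
# column equals the given flag ('?' = unversioned, '!' = missing).
# svn_paths is a hypothetical helper name.
svn_paths() {
    awk -v flag="$1" 'substr($0, 1, 1) == flag { print substr($0, 9) }'
}

# Usage (one svn call per path):
#   svn st | svn_paths '?' | while IFS= read -r p; do svn add "$p"; done
```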

Setup external libraries in subversion:
svn mkdir ZFVersions;
svn add ZFVersions;
svn ci ZFVersions -m 'Added dir for all versions.';
cd ZFVersions;
svn mkdir 1.11;
svn add 1.11;
svn ci 1.11 -m 'Added version subdir.'
cd 1.11;
svn propset svn:externals 'Zend http://framework.zend.com/svn/framework/standard/tags/release-1.11.0/library/Zend' .; # This checks out the external library into a subdirectory Zend inside your dir 1.11. You need this since autoloading uses paths like require_once(Zend/Feed/Rss.class.php)!
svn commit -m 'Set external.';
svn up .; # Loads external lib.

Correct date problems in a mysql database – 2020 instead of 2011 in YYYY-MM-DD dates:
UPDATE accounts SET member_startdate = CONCAT('2010', '-', MONTH(member_startdate), '-', DAYOFMONTH(member_startdate)) WHERE YEAR(member_startdate) > 2011;

What is it you guys always look up?

Useful Linux Commands 12/2009

Recursively remove all .svn directories from a working copy:

find . -name .svn -exec rm -rf {} \;

Recursively remove all ._xyz-files (OSX meta file info) from your WebDav-Drive, set via hourly cron:

find /var/data/ -name "._*" -exec rm {} \;

Do not forget to set your path ;).

Check for syntax-errors (lint) in all php-files of current directory and only echo error messages if errors have been detected:

find . -name "*.php" -exec php -l {} \; | grep -v 'No syntax errors'

Useful Linux Commands 04/2009

I had a list of files from a large file structure as a result from a maintenance script run with lines like this:

/home/web/.../sources/.../2008/12/25/4f1feabbd76f79ecab150bdee3f6ae4d.xml
/home/web/.../sources/.../2008/12/25/e506e433a2d87f0275c7641da59bbf7f.xml
/home/web/.../sources/.../2008/12/28/901c4f081645b986e9b1377d3f586b8e.xml
/home/web/.../sources/.../2008/12/28/6bec4d4bbcf8f596c40694210d220a3b.xml
/home/web/.../sources/.../2008/12/24/477c535d6111605c8f6020a959f32fde.xml
/home/web/.../sources/.../2008/12/24/9f253a96fc26d8f6d9e61b8f1bdb3453.xml

Each line represented a document path to a file which was supposed to be removed from the filesystem. You can do that with the following simple oneliner:

for LINE in $( cat ../log/my_empty_files.txt ) ; do rm $LINE ; done

You can try it with ‘echo’ instead of ‘rm’ first to see if it would work:

for LINE in $( cat ../log/my_empty_files.txt ) ; do echo " # $LINE" ; done
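
The for loop splits on every whitespace character, so it only works for paths without spaces. A sketch of a more defensive variant that reads the list line by line; remove_listed and its optional second argument are invented for illustration:

```shell
# Sketch: delete every file named in a list (one path per line).
# remove_listed is a hypothetical helper; pass "dry" as second argument to only print.
remove_listed() {
    list=$1 mode=${2:-}
    while IFS= read -r line; do
        [ -n "$line" ] || continue          # skip empty lines
        if [ "$mode" = dry ]; then
            echo " # $line"
        else
            rm -- "$line"
        fi
    done < "$list"
}

# Usage: remove_listed ../log/my_empty_files.txt dry
```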

Bulk Image Resize using Conditional Width

I am currently working on a project in which we have lots of images from an old CMS waiting to be migrated into a new layout. Of course there are restrictions so it should not happen that certain image types exceed a certain max. width.

OK, we have many many images… So I took a closer look at ImageMagick (also take a look at the usage examples). And I have to say: Awesome!

You can install ImageMagick on Ubuntu or Debian with a simple
# apt-get install imagemagick

In combination with a bit of conditional scripting I came up with the following solution:

Console doing bulk resize.

I wanted to have a shell script that, given a directory containing all our images, checks the width of each image and resizes it if it exceeded a certain width. Simple, but powerful.

Usage:

$ ./resize_image_dir.sh ../../brand_logos

And you are done with thousands of images in a minute. Do not forget to make a backup if designers change the desired width later…
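
The downloadable script is not reproduced here. The following is only a sketch of the idea under stated assumptions: it uses ImageMagick's identify to read the width and mogrify to resize in place, handles only *.jpg, *.png and *.gif files directly inside the directory, and the function name resize_image_dir and the default of 200 pixels are made up.

```shell
# Sketch: shrink every image in a directory that is wider than a maximum width.
# resize_image_dir and the 200px default are hypothetical; requires ImageMagick.
resize_image_dir() {
    dir=$1 max_width=${2:-200}
    for img in "$dir"/*.jpg "$dir"/*.png "$dir"/*.gif; do
        [ -f "$img" ] || continue                      # pattern matched nothing
        # [0] selects the first frame, so animated GIFs report a single width
        width=$(identify -format '%w' "${img}[0]") || continue
        if [ "$width" -gt "$max_width" ]; then
            echo "resizing $img ($width -> $max_width px wide)"
            mogrify -resize "${max_width}x" "$img"     # overwrites the file, height follows
        fi
    done
}

# Usage: resize_image_dir ../../brand_logos 200
```

Because mogrify overwrites the originals, the backup advice above applies here as well.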

You can download the shell scripts with example images ready to test:
!resize_conditional_images_bulk2

Upgrade PHP5 with an alternative sources.list on Debian etch

I was having trouble with a server running Debian 4.0 (etch). Using the standard sources in the /etc/apt/sources.list the supported PHP5 version was 5.2.0-8+etch13 which contained a very annoying bug for my application.

A daily running script – let’s call it the Importer – regularly exited randomly with a “Fatal error: Out of memory (allocated 12320768) (tried to allocate 2851436 bytes) in …” and I had to restart it manually nearly every morning. I had…

  • …checked my application for memory wasting operations and loops and fixed them,
  • …used ini_set('memory_limit', '64M'); at runtime, and
  • …finally increased memory_limit = 64M in my php.ini.

But all this did not change the behaviour of the Importer!

So I took a look at the PHP5 Changelog to find potentially fixed bugs in newer releases. Bug #39438 described exactly my problem. So a simple upgrade would help me. But it did not work with ‘apt-get upgrade’ or ‘apt-get install php5=5.2.8’ since the highest version in the apt source I used was the one that I already had: 5.2.0-8+etch13, issued in November 2006… (pretty ancient)

Finally it was this page that had the information we needed: an alternative apt source

deb http://packages.dotdeb.org etch all
deb-src http://packages.dotdeb.org etch all

After getting an impression whether dotdeb was a trustworthy source, we first tried it on our dev-system with ‘apt-get update; apt-get upgrade;’. At this point I was once more glad to have written so many UnitTests. They all passed and everything looked good.

Thanks Kim for your help!