Tag Archives: commandline

Extract All Email Addresses from Outlook

I was looking for a way to extract a list of all email addresses I have ever used or written to through Outlook. It might be helpful for you too. This is what I ended up doing:

a) Export email and contacts into an .olm file (Mac). I was using Outlook 2011. The .olm file is a compressed binary format; I selected Email and Contacts for the export.

b) Use StuffIt Expander to extract the .olm into a readable XML structure. Just install StuffIt Expander and drag the .olm file onto it. You end up with a parsable directory structure containing all data as XML files.

c) I could not find a solution to recursively parse the mess, so I decided to merge all XML files from all subdirectories into one big file:

$ find /path/to/directory/ -name '*.xml' -print0 | xargs -0 -I file cat file > merged.file
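
A shorter variant lets find do the concatenation itself (a sketch, assuming GNU findutils where -exec … + is supported):

$ find /path/to/directory/ -name '*.xml' -exec cat {} + > merged.file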

d) Extract all email addresses from the merged file into a file:

$ perl -wne'while(/[\w\.\-]+@[\w\.\-]+\w+/g){print "$&\n"}' merged.file | sort -u > output.txt
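
If you prefer grep over Perl, GNU grep with -E and -o can do roughly the same extraction (a sketch, not byte-for-byte identical to the Perl regex):

$ grep -Eo '[[:alnum:]._-]+@[[:alnum:]._-]+' merged.file | sort -u > output.txt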

e) You will be surprised how many lines this file has. Check output.txt, which now contains all unique email addresses extracted from Outlook. The list still needs cleaning: there will be a lot of invalid or temporary addresses you have to go through manually.

Have fun with your list.

Useful Linux Commands 01/2014

a) Bulk rename on the command line. I needed this one to re-import files in bulk for a BI database. All processed files get the prefix ‘proc_’ so it is clear which files have already been imported into the BI database. See http://tips.webdesign10.com/how-to-bulk-rename-files-in-linux-in-the-terminal

TEST YOUR EXPRESSION:
$ rename -n 's/^proc_order_list_(201312\d{2}-\d{6}-[A-Z]{3}-[A-Z]{2})\.csv$/order_list_$1.csv/' *.csv

proc_order_list_20131211-025914-AMZ-EU.csv renamed as order_list_20131211-025914-AMZ-EU.csv
proc_order_list_20131211-031130-ENG-DE.csv renamed as order_list_20131211-031130-ENG-DE.csv

DO THE ACTUAL RENAMING:
$ rename 's/^proc_order_list_(201312\d{2}-\d{6}-[A-Z]{3}-[A-Z]{2})\.csv$/order_list_$1.csv/' *.csv

There is a second level of ‘argument list too long’: if you hit it, the one-liner no longer works and you need a bash script like this:

#!/bin/bash

find in/ -type f |
  while IFS= read -r old
  do
    new=$(echo "$old" | sed "s/proc_//g")
    if [ ! -f "$new" ]; then
      echo "$old" '->' "$new"
      mv "$old" "$new"
    fi
  done

Or more selectively using a filename pattern:

#!/bin/bash

valid='proc_201403.*'

find in/ -type f |
  while IFS= read -r old
  do
    new=$(echo "$old" | sed "s/proc_//g")
    if [ ! -f "$new" ]; then
      if [[ $old =~ $valid ]]; then
        echo "$old" '->' "$new"
        mv "$old" "$new"
      #else
      #  echo 'not matched' "$valid"
      fi
    fi
  done

b) Output results from an SQL query to a file, the quick way – in case you have been using phpMyAdmin for this ;):
$ mysql -u <user> -p -D <database> -e "SELECT …" > quick_result.txt
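
For example (user, database, and column names here are made up), adding -B produces tab-separated batch output that is easy to post-process:

$ mysql -u myuser -p -D mydb -B -e "SELECT id, status FROM orders" > quick_result.txt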

c) Find directories with a count of subdirectories or files (I had to use this to find cluttered directories that caused problems on a server with software RAID and rsync backups):

$ find . -type d | cut -d/ -f 2 | uniq -c | sort -g
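
A related sketch that counts files instead of directories per top-level folder, sorting first so uniq -c counts correctly regardless of input order:

$ find . -type f | cut -d/ -f2 | sort | uniq -c | sort -g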

d) Prevent ‘cp/mv/rm: cannot execute [Argument list too long]’ by using find when you want to copy/move/remove a long list of files from a directory:
$ cd /var/www/project/incoming/staging;
$ find ../production/data/sources/orderstatus/in/ -name '*.xml' -exec cp {} data/sources/orderstatus/in/ \;

e) Copy all files from multiple backup sub-directories (structure like this 1122/3344/112233445566.xml) into ONE directory:
$ find ./dumpdirs/11* -name "*.xml" -type f -exec cp {} ./flatToFolder \;

f) Count all files in subdirectories with the pattern proc_*.xml:
$ find in/ -name "proc_*.xml" | wc -l

g) File list too long using tar and wildcards – write a file list instead and pass it to tar with -T:
$ find in/ -name '*.xml' -print > tarfile.list.txt
$ tar -cjvf evelopmentOrderstati-20140306.tar.bz2 -T tarfile.list.txt
$ rm tarfile.list.txt

h) File list too long using grep:
Problem:
$ grep -r "4384940" *
-bash: /bin/grep: Argument list too long
There are too many files in the directory.

Check:
$ ls -1 | wc -l
256930

Solution:
$ find . -type f | xargs grep "4384940"

Another way to avoid this problem is to substitute the "*" with a ".":
$ grep -r "4384940" .
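
If filenames may contain spaces, the null-delimited variant is safer (assuming GNU find and xargs):

$ find . -type f -print0 | xargs -0 grep "4384940"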

Stuff I need to look up every time

Set ignore on all files of a directory with Subversion:
$ cd cache
$ svn propset svn:ignore '*' .
$ svn ci . -m 'Ignore set on cache dir.'

Show changed files between two revisions (overview):
$ svn diff -r 300:HEAD --summarize

Show changed files between two revisions, for each revision:
$ svn log -v -r 300:304

Show the overall latest 20 commit messages:
$ svn log -l 20

Branching and merging:
See: http://blog.evanweaver.com/2007/08/15/svn-branching-best-practices-in-practice/

Only grep in PHP source files, not JPEGs, movies, etc.:
$ grep -i 'whatever' `find . -name '*.php' -print`
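
A reasonably recent GNU grep can also do the file filtering itself, which avoids the ‘argument list too long’ risk of the backtick expansion:

$ grep -ir --include='*.php' 'whatever' .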

Add all new files in a large file structure to Subversion, like after an update of vendors in Symfony2:
svn st | grep '^?' | awk '{print $2}' | xargs svn add

Remove all deleted files from a large file structure from Subversion, like after a vendors update in Symfony2:
svn st | grep '^!' | awk '{print $2}' | xargs svn delete --force

Set up external libraries in Subversion:
svn mkdir ZFVersions;
svn add ZFVersions;
svn ci ZFVersions -m 'Added dir for all versions.';
cd ZFVersions;
svn mkdir 1.11;
svn add 1.11;
svn ci 1.11 -m 'Added version subdir.'
cd 1.11;
svn propset svn:externals 'Zend http://framework.zend.com/svn/framework/standard/tags/release-1.11.0/library/Zend' .; # This checks out Zend into a subdirectory of your 1.11 dir. You need this since autoloading uses paths like require_once(Zend/Feed/Rss.class.php)!!
svn commit -m 'Set external.';
svn up .; # Loads external lib.

Correct date problems in a MySQL database – 2020 instead of 2011 in YYYY-MM-DD dates:
UPDATE accounts SET member_startdate = CONCAT('2010', '-', MONTH(member_startdate), '-', DAYOFMONTH(member_startdate)) WHERE YEAR(member_startdate) > 2011;
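
Before running such an UPDATE it is worth previewing the affected rows (same table and column as above):

SELECT member_startdate FROM accounts WHERE YEAR(member_startdate) > 2011;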

What is it you guys always look up?

Sync Your Stuff to S3

This is a recipe for how I save stuff to S3 from my Mac:

1.) Signup with S3: http://aws.amazon.com/s3/ (check pricing!). This will give you access to the AWS Management Console.

2.) Create a Bucket: This can be done via the AWS Management Console. If you are not familiar with the concept of ‘buckets’, check out the S3 documentation. Simply put, a bucket is a virtual storage device that has a fixed geographical location.

3.) Go to ‘Security Credentials’ in your account settings in the AWS Management Console and create an access key.

4.) Download JetS3t. You then have the following directory on your Mac:
/Applications/jets3t-0.7.3

Open the Terminal and change into the bin directory:
$ cd bin;

Create a file named synchronize.properties there:
$ nano synchronize.properties;
and save the following content using your keys from step 3:

accesskey=XXXXXXXXXXXXXXXXXXXX
secretkey=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

5.) To sync the contents in the path /Users/marco/MySyncStuff with the bucket myBucketName use this command:
$ ./synchronize.sh UP myBucketName /Users/marco/MySyncStuff/ --properties synchronize.properties;

Of course you can create as many buckets as you like, and script and schedule your data syncs from here as you wish. Use the command
$ ./synchronize.sh --help
to see what else synchronize.sh has on offer.
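
To run the sync on a schedule you can call synchronize.sh from cron; a sketch (paths and times are examples, adjust to your setup):

0 2 * * * cd /Applications/jets3t-0.7.3/bin && ./synchronize.sh UP myBucketName /Users/marco/MySyncStuff/ --properties synchronize.properties >> /tmp/s3sync.log 2>&1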

6.) Browsing buckets: JetS3t has its own S3 browser. To start and use it do the following:

$ cp cockpitlite.sh cockpitlite.command;
$ ./cockpitlite.command &

You should see the Java coffee cup in your Dock. Use your keys to log in and browse your buckets.

You can also use the free S3 Browser for Mac.

Useful Linux Commands 12/2009

Recursively remove all .svn directories from a working copy:

find . -name .svn -exec rm -rf {} \;
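
A variant with -prune (assuming GNU find) stops find from descending into the directories it is about to remove, which avoids harmless but noisy warnings:

find . -type d -name .svn -prune -exec rm -rf {} \;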

Recursively remove all ._xyz files (OS X metadata) from your WebDAV drive, set via hourly cron:

find /var/data/ -name "._*" -exec rm {} \;

Do not forget to set your path ;).

Check for syntax errors (lint) in all PHP files below the current directory and only echo messages when errors have been detected:

find . -name "*.php" -exec php -l {} ; | grep -v 'No syntax errors'
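
The same lint run via xargs; note the -n1, since php -l only checks the first file it is given (a sketch, assuming GNU findutils):

find . -name "*.php" -print0 | xargs -0 -n1 php -l | grep -v 'No syntax errors'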

Quick PHP Hack to Tidy up Trashed HTML

I just hacked together the following quick-and-dirty PHP script to use the tidy extension from the command line. Maybe somebody else needs something like this somewhere. Check the comments for details:

<?php

//
// (a) Save this as tidy.php
// (b) Call it from commandline like this: $ php tidy.php trashed.html > tidy.html
//     to tidy the file trashed.html to a new file tidy.html.
//

// Installation and configuration see http://php.net/tidy
$config = array('indent'=> true, 'output-xhtml' => true, 'wrap'=> 200);

$tidy = new tidy;
$tidy->parseString(file_get_contents($_SERVER['argv'][1]), $config, 'utf8');
$tidy->cleanRepair();

echo $tidy;
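
To tidy a whole directory of HTML files, a simple shell loop around the script does the job (filenames here are examples):

$ for f in *.html; do php tidy.php "$f" > "tidy_$f"; done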

Useful Linux Commands 04/2009

I had a list of files from a large file structure, the result of a maintenance script run, with lines like this:

/home/web/.../sources/.../2008/12/25/4f1feabbd76f79ecab150bdee3f6ae4d.xml
/home/web/.../sources/.../2008/12/25/e506e433a2d87f0275c7641da59bbf7f.xml
/home/web/.../sources/.../2008/12/28/901c4f081645b986e9b1377d3f586b8e.xml
/home/web/.../sources/.../2008/12/28/6bec4d4bbcf8f596c40694210d220a3b.xml
/home/web/.../sources/.../2008/12/24/477c535d6111605c8f6020a959f32fde.xml
/home/web/.../sources/.../2008/12/24/9f253a96fc26d8f6d9e61b8f1bdb3453.xml

Each line represented the path to a file that was supposed to be removed from the filesystem. You can do that with the following simple one-liner:

for LINE in $( cat ../log/my_empty_files.txt ) ; do rm $LINE ; done

You can try it with ‘echo’ instead of ‘rm’ first to see if it would work:

for LINE in $( cat ../log/my_empty_files.txt ) ; do echo " # $LINE" ; done
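
Note that the for-loop splits on whitespace. If paths may contain spaces, a while-read loop is the safer sketch:

while IFS= read -r LINE; do rm "$LINE"; done < ../log/my_empty_files.txt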

Bulk Image Resize using Conditional Width

I am currently working on a project in which we have lots of images from an old CMS waiting to be migrated into a new layout. Of course there are layout restrictions: certain image types must not exceed a certain maximum width.

OK, we have many, many images… So I took a closer look at ImageMagick (also take a look at the usage examples). And I have to say: Awesome!

You can install ImageMagick on Ubuntu or Debian with a simple
# apt-get install imagemagick

In combination with a bit of conditional scripting I came up with the following solution:

Console doing bulk resize.

I wanted to have a shell script that, given a directory containing all our images, checks the width of each image and resizes it if it exceeds a certain width. Simple, but powerful.
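
The finished script is in the download below; a minimal sketch of the idea, assuming ImageMagick's identify and mogrify are installed (MAX_WIDTH is an example value):

#!/bin/bash
# resize_image_dir.sh - shrink all images in a directory that exceed MAX_WIDTH.
MAX_WIDTH=300                      # example value, adjust to your layout
DIR="${1:?Usage: $0 <image-directory>}"

for img in "$DIR"/*; do
    [ -f "$img" ] || continue
    width=$(identify -format '%w' "$img" 2>/dev/null) || continue
    if [ "$width" -gt "$MAX_WIDTH" ]; then
        echo "Resizing $img (${width}px wide)"
        mogrify -resize "${MAX_WIDTH}x" "$img"   # keeps the aspect ratio
    fi
done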

Usage:

$ ./resize_image_dir.sh ../../brand_logos

And you are done with thousands of images in a minute. Do not forget to make a backup first, in case the designers change the desired width later…

You can download the shell scripts with example images, ready to test: resize_conditional_images_bulk2

Useful Linux Commands 10/2008

(1) Find files in the generated documentation containing <span class="field">webservice:</span> and write a file containing a clickable list of links to those pages:
~/sites/html/phpdoc_all_global$ find -type f -print0 | xargs -0 grep -li '<span class="field">webservice:</span>' | while read in; do echo "<a href=\"$in\">$in</a>"; done > clickable_list_page.html

(2) A for-loop on the command line. The * equals all files and dirs in the current directory (try "echo *"). Of course you can replace the 'echo $dir' with whatever command you like, using $dir as the loop variable:
for dir in *; do echo $dir; done

(Thanks Alexander)

(3) A simple piped grep, which displays the files (-l) containing the string "<FormularObjectGenerator". Since I was searching recursively (-r) in an SVN working copy, I did not want to see all the duplicate file paths containing ".svn", so I filtered them out by excluding (-v) this pattern from the output.

grep -lr "<FormularObjectGenerator" ./ | grep -v ".svn"
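
With a recent GNU grep the second grep can be dropped by excluding the directory directly (a sketch; --exclude-dir is GNU-specific):

grep -lr --exclude-dir=.svn "<FormularObjectGenerator" ./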