Tag Archives: Unix

Chained matching

Sometimes you want to find references to “foo bar” or things like that, but regex won’t cut it. Maybe “foo” is on a different line from “bar”. It can get a little hairy when you want to chain that together even further.

grep -ilR 'foo' * | xargs grep -il 'bar' | etc...
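
The same chain can be wrapped in a function so it takes any number of patterns. This is a minimal sketch, assuming GNU grep and xargs (for -Z, -0 and -r); the name grep_all is made up for illustration. File names are passed NUL-delimited between stages so paths with spaces survive.

```shell
# grep_all DIR PATTERN... : list files under DIR that contain every PATTERN
# (case-insensitive). Hypothetical helper; assumes GNU grep and xargs.
grep_all() {
    local dir=$1 first=$2 pat list next
    shift 2
    list=$(mktemp) && next=$(mktemp) || return 1
    # First pass: recursive search, NUL-terminated file names.
    # grep exits non-zero when nothing matches; that is not an error here.
    grep -ilRZ -e "$first" "$dir" > "$list" || true
    # Each later pass only looks inside the files that survived so far.
    for pat in "$@"; do
        xargs -0 -r grep -ilZ -e "$pat" < "$list" > "$next" || true
        mv "$next" "$list"
    done
    # Print one name per line for reading.
    tr '\0' '\n' < "$list"
    rm -f "$list" "$next"
}
```

Something like grep_all . foo bar baz would then list only the files that mention all three.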

Partitioning searching from processing

There are two ways to grep recursively through all files of a certain type.

  • grep -R --include='*.txt' 'search string' *
  • find -type f -name '*.txt' -print0 | \
    xargs -0 grep 'search string'

While grep’s include directive is certainly more concise, I’d argue that learning the general case is more useful in the long run. What if you want to exclude a certain path? What if you want to change file permissions instead of grepping? Combining find with xargs tends to be far more flexible than the alternatives.

Examples

  • find . -path '*/.svn' -prune -o -type f -print0 | \
    xargs -0 grep 'search string'
  • find -type f -print0 | xargs -0 du | sort -g | tail
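
The permissions case mentioned above looks much the same: only the command after xargs changes. A minimal sketch, assuming GNU find and xargs; the function name make_readonly and the choice of mode are placeholders, not anything from a real project.

```shell
# make_readonly DIR GLOB : strip write permission from every regular file
# under DIR whose name matches GLOB. Hypothetical helper for illustration.
make_readonly() {
    # -print0 / -0 keep file names with spaces intact;
    # -r (GNU xargs) skips chmod entirely when nothing matches.
    find "$1" -type f -name "$2" -print0 | xargs -0 -r chmod a-w
}
```

For example, make_readonly . '*.txt' would leave every other file type untouched.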

Symbolically merging one directory into another

I found this technique necessary when working with vBulletin. I want to keep the core directory clean for easy upgrades (which often involves replacing the directory with the new release) and I don’t want to have to tediously copy files in each time. Solution: keep the files separate and recreate symbolic links on the fly.

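# First remove any pre-existing links (only inside dir1)
find dir1 -type l -print0 | xargs -0 rm -f
# Then symbolically link dir2 into dir1
(cd dir2 && find . -mindepth 1) | while IFS= read -r file; do
	file=${file#./}	# strip the leading "./" that find prints
	if [ ! -e "dir1/$file" ]; then
		# One "../" per path component climbs from the link's
		# directory back up to the parent of dir1 and dir2
		depth=$(echo "$file" | awk -F '/' '{print NF}')
		prefix=$(perl -e 'print "../" x $ARGV[0]' "$depth")
		ln -s "${prefix}dir2/$file" "dir1/$file"
	fi
done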

Fingerprint of a path

I have a code base and something isn’t working in the way it should. I want to find out if the problem is in some corrupt data or if someone’s edited some files here and there. Unfortunately, the path isn’t under any kind of version control, so what do I do?

Well, I can download a fresh copy of the code base and compare, right? Here’s how you can do that.

(cd ./path && find . -type f -print0 | sort -z | xargs -0 md5sum) > path.md5s;

Do the same to the fresh installation and diff the outputs. That should give you a reasonable feel for what’s changed, yeah? I don’t know if this is the best approach, so if you know a better way, please tell me.
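
Putting the two steps together might look like the sketch below. It assumes GNU find, sort and md5sum, and the function names fingerprint and changed_files are invented for this example. Checksums are taken from inside each tree and sorted by path, so the two listings line up and only real differences show in the diff.

```shell
# fingerprint DIR : print "checksum  ./relative/path" lines in a stable order.
fingerprint() {
    ( cd "$1" && find . -type f -print0 | sort -z | xargs -0 -r md5sum )
}

# changed_files OLD NEW : show the fingerprint lines that differ between
# two trees. Returns diff's exit status (0 = identical, 1 = differences).
changed_files() {
    local a b status=0
    a=$(mktemp) && b=$(mktemp) || return 1
    fingerprint "$1" > "$a"
    fingerprint "$2" > "$b"
    diff "$a" "$b" || status=$?
    rm -f "$a" "$b"
    return "$status"
}
```

Running changed_files ./path ./fresh-copy (both directory names are placeholders) prints one line per file that was modified, added or removed.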

Update: It seems that diff -urN path1 path2 goes one better and shows exactly what changed inside each file. It can’t prune subdirectories as flexibly as find can, but in most cases it’s the better tool.

Unix tips

When I first started this blog, I had the intention of using it, at least in part, to publish little command-line tips I found useful. I suppose with commandlinefu.com up in full swing, that’s no longer necessary.

I’m honestly surprised that site only came about within the last six months. Applying social aggregation to tips within a field seems obvious now. That’s the thing with good ideas, yeah?

Export a database subset

I ran into an interesting bit of code today. Sometimes a database gets too big and you only want the structure. In that situation, there’s the -d parameter. In other cases, you want not only the structure, but some of the data too. By using the --where parameter creatively, we can limit the number of rows mysqldump returns.

mysqldump --where="true LIMIT X" schema > output.sql

Unfortunately, the same limit applies to every table, so there’s no per-table control over the number of rows returned. Still, “get me the database and at most 1000 rows from each table” can be helpful in certain situations.

The top family

I’ve recently discovered a series of useful commands. Much like top, these commands provide real-time reports on various system statuses. Know them. Love them.

Like top, but for files

I was restoring some 20GB of MySQL data today, but I’d forgotten to -v the command, leaving me to guess how far along I was in the process. To compensate for this, I threw together a quick script that acts a little like top, but for files. Observe.

watch --differences -n 5 'df; ls -FlAt;'

It basically says “keep printing out disk usage and recently modified files every 5 seconds until I say otherwise”. Run that inside of /var/lib/mysql/foo and you’re set.

Execute a command on multiple servers

It’s both amazing and informative to watch those with more skill than you, especially when you can look through logs to see what they did. The latest nugget of awesome I learned reboots a series of apache instances on separate servers.

for i in $(seq 1 10); do ssh "instance$i" rcapache2 restart; done

It basically says “connect to instance1, instance2 and so on and restart the apache server on each”. This approach could easily be modified for more complex tasks. That it’s done on one line is what catches my attention. Prior to seeing this done, I would have SSHed into each server by hand.
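
As one possible way to adapt the loop for other tasks, here is a sketch that takes the range and the command as arguments. The function name run_on_instances and the SSH_CMD variable are assumptions made for this example; SSH_CMD exists only so the loop can be dry-run with echo before touching real servers.

```shell
# run_on_instances FIRST LAST COMMAND... :
# run COMMAND on instanceFIRST through instanceLAST, one host at a time.
# SSH_CMD is a hypothetical override (defaults to ssh) used for dry runs.
run_on_instances() {
    local first=$1 last=$2 i
    shift 2
    for i in $(seq "$first" "$last"); do
        # Report a failed host on stderr but keep going with the rest.
        ${SSH_CMD:-ssh} "instance$i" "$@" || echo "instance$i failed" >&2
    done
}
```

SSH_CMD=echo run_on_instances 1 10 rcapache2 restart prints the ten commands without connecting anywhere; dropping the SSH_CMD=echo prefix runs them for real.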

Finding large files

In OSX, I can find what’s taking up my disk space using Disk Inventory X or GrandPerspective. That’s all well and good, but today I needed to get similar results over SSH. Observe.

du -k --max-depth=X | sort -r -g | head -n Y

It basically says “from the current directory, list everything at most X deep by kilobytes, sort it in descending order and only show me the top Y results”. With values like X=4 and Y=10, you’re bound to find the behemoths.
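
With the placeholders turned into arguments, the one-liner could be kept around as a small function. A minimal sketch, assuming GNU du (for --max-depth); the name biggest is made up here.

```shell
# biggest DIR DEPTH COUNT : the COUNT largest entries under DIR,
# looking at most DEPTH levels deep, sizes in kilobytes.
biggest() {
    du -k --max-depth="$2" "$1" | sort -rn | head -n "$3"
}
```

biggest . 4 10 corresponds to the X=4, Y=10 values suggested above.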

Update

Here’s another way of getting the same results, but with the option for further refinement.

find -size +1M -type f -print0 | xargs -0 ls -Ssh1 | head
find -size +1M -exec du -sk {} \; | sort -nr | less