No, I am not dead yet. I just did not have anything to really talk about. Also, I have been up to my eyeballs in work so no surprise there. Anyway, I want to talk about some more geeky things so those of you not so inclined can skip the rest of this post.
I was writing a bit of filter code in Perl (no, not a source filter; those are evil). I wanted to find out how fast it was because it used many fairly complex regular expressions, several with .*? under the /s modifier. I expected those to be slow and the filter to be the bottleneck in the system.
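To make that concrete, here is a made-up example of the kind of pattern I mean (the actual filter's regexes and data are not shown): .*? matches as little as possible, and the /s modifier lets . match newlines as well.

```perl
use strict;
use warnings;

# Hypothetical input; the real filter's data looked nothing like this.
my $text = "<note>first\nline</note> junk <note>second</note>";

my @notes;
# .*? grabs the shortest possible match; /s makes . span newlines,
# so the first capture includes the embedded line break.
while ($text =~ m{<note>(.*?)</note>}sg) {
    push @notes, $1;
}
```

Without /s, the first match would fail, since . would stop at the newline inside the first tag pair.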
I wrapped the code in the Benchmark module so I could get some reliable data. The first run took 48 seconds for 100,000 iterations (this was a particularly complex example), but I noticed that the CPU time was really low. That told me something else was going on: since I was pulling the data from a file on every iteration, I suspected that I/O and hard disk speed were the real problem.
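The setup looked roughly like this. This is a hedged sketch, not my actual code: filter() is a stand-in for the real regex-heavy filter, and the temp file stands in for the real input file. The point is the shape of the loop, which re-reads the file on every iteration.

```perl
use strict;
use warnings;
use Benchmark qw(timethis);
use File::Temp qw(tempfile);

# Stand-in for the real filter, which was much more complex.
sub filter {
    my ($text) = @_;
    $text =~ s{<note>(.*?)</note>}{[$1]}sg;
    return $text;
}

# Stand-in for the real input file.
my ($fh, $path) = tempfile();
print {$fh} "<note>hello</note> noise <note>world</note>";
close $fh;

# Re-open and re-read the file on every single iteration -- the
# mistake my first benchmark made.
timethis(10_000, sub {
    open my $in, '<', $path or die "can't open $path: $!";
    local $/;                  # slurp mode: read the whole file at once
    my $data = <$in>;
    close $in;
    filter($data);
});
```

Benchmark prints both wallclock and CPU time; a large gap between the two is the hint that the loop is mostly waiting on the disk rather than computing.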
In production, the data will come in over the network and be stored in a variable before the filter runs across it. To emulate that, I placed the input the filter would see in production into a variable and ran the filter on it 1,000,000 times. It did the transformation that many times in less than a second, even with the nastiness of the .*? regular expressions.
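The in-memory version of the same sketch (again with a stand-in filter and made-up data, not my real code) only changes where the input lives:

```perl
use strict;
use warnings;
use Benchmark qw(timethis);

# Same stand-in for the real regex-heavy filter.
sub filter {
    my ($text) = @_;
    $text =~ s{<note>(.*?)</note>}{[$1]}sg;
    return $text;
}

# The input is pre-loaded into a variable, as it will be in production
# once the data arrives over the network. No file handles in the loop.
my $data = "<note>hello</note> noise <note>world</note>";

timethis(1_000_000, sub { filter($data) });
```

With the open/read/close gone from the hot loop, wallclock and CPU time line up, and the regexes themselves turn out to be cheap.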
The moral of the story is that I/O is expensive. Do as little of it as you can manage, since it eats up time even with sophisticated disk buffering. This was a contrived example, since I ran it on my laptop while playing music and doing other things in the background that could affect performance. Still, main memory is always faster than going to disk (as you can see on your own system: start too many programs, the hard disk starts thrashing, and everything slows down).
posted by Chris #10:24 AM | 0 comments |