Python, like most interpreted languages, is well known for being slower than natively compiled languages such as C. One area where Python can run several orders of magnitude slower than C is binary file and network stream processing, but much of the gap comes down to how the code is written. Here are some tips for getting the most out of Python for binary file and network stream processing.
Ludicrous speed, go!
For the examples below I will make up a binary format and create a randomly filled file to keep things generic and simple. I will modify elements inside of the file, and you can use the code below to follow along.
Create a Format and Test File
Let's assume I have a file containing many similar records of data, something that could be represented by a simple C structure. Each structure is 16 bytes long and made up of four 4-byte big-endian unsigned integers representing four times. The times represent the date and time of scans for four computer systems in an organization.
#include <stdint.h>

struct ScanRecord
{
    uint32_t alpha;   /* fixed 4-byte fields: unsigned long may be 8 bytes */
    uint32_t beta;
    uint32_t charlie;
    uint32_t delta;
};
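The record layout above maps directly onto a struct format string. The following is a minimal sketch, assuming the big-endian format `>4L` used by the scripts in this article; the name `RECORD` and the sample timestamps are made up for illustration.

```python
import struct

# ">" selects big-endian with standard sizes, so each "L" is exactly
# 4 bytes regardless of the platform's native unsigned long.
RECORD = struct.Struct(">4L")

# Pack four sample timestamps (made-up values) into one record.
packed = RECORD.pack(1211053920, 1211053921, 1211053922, 1211053923)

# Unpack the record back into its four integer fields.
alpha, beta, charlie, delta = RECORD.unpack(packed)

print(RECORD.size)  # size of one record in bytes
print(charlie)      # the third field
```

Checking `RECORD.size` is a cheap way to confirm that a format string really matches the on-disk record length before processing a whole file.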
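To create a test file I will use the dd command. It will contain 1,000,000 random records, each 16 bytes long, for a total size of 16,000,000 bytes (about 15 MiB). I read from /dev/urandom because it never blocks waiting for entropy, so every 16-byte block is written in full.
dd if=/dev/urandom of=test_data bs=16 count=1000000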
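If dd is not available, an equivalent file can be produced from Python. This is a rough sketch rather than part of the original workflow; the function name `create_test_file` and its `chunk_records` parameter are hypothetical.

```python
import os

RECORD_SIZE = 16  # bytes per record, matching the format above


def create_test_file(path, records, chunk_records=4096):
    """Write `records` random 16-byte records to `path`."""
    remaining = records
    with open(path, "wb") as outfile:
        while remaining > 0:
            # Generate and write many records per call instead of one.
            count = min(remaining, chunk_records)
            outfile.write(os.urandom(count * RECORD_SIZE))
            remaining -= count
```

Calling `create_test_file("test_data", 1000000)` should produce a file of the same size as the dd command above.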
Modifying the Test File
Suppose I have been given the task of fixing a time skew in the system - it turns out the clocks used to record the scan time for charlie were set 24 seconds too slow! I need to add 24 seconds to each entry representing the time charlie was scanned. The approach that comes immediately to mind is to read each record, modify it, and save it, perhaps like the following:
#!/usr/bin/env python
import struct

infile = open("test_data", "rb")
outfile = open("test_data2", "wb")

while True:
    try:
        # Read a single record, one 4-byte field at a time
        a = struct.unpack(">L", infile.read(4))[0]
        b = struct.unpack(">L", infile.read(4))[0]
        c = struct.unpack(">L", infile.read(4))[0]
        d = struct.unpack(">L", infile.read(4))[0]
    except struct.error:
        # A short read at the end of the file cannot be unpacked
        break

    # Modify the third time by 24 seconds
    c += 24

    # Save the modified values
    outfile.write(struct.pack(">4L", a, b, c, d))

infile.close()
outfile.close()
This is a quick and simple script that does what I need. The example is a little unrealistic, but it helps illustrate how the way data processing code is written can make or break your utility. I have saved the file as very_slow.py. Let's see how it performs.
time ./very_slow.py
real 0m10.603s
user 0m10.061s
sys 0m0.166s
Ten seconds for a million records seems fairly fast, but one simple optimization should be immediately visible: why not just read and convert 16 bytes at a time? This can be done with any type of record, including records that mix several data types, thanks to the flexibility of the struct module. I modified the script as follows:
#!/usr/bin/env python
import struct

infile = open("test_data", "rb")
outfile = open("test_data2", "wb")

while True:
    try:
        # Read and convert a whole 16-byte record in one call
        a, b, c, d = struct.unpack(">4L", infile.read(16))
    except struct.error:
        # A short read at the end of the file cannot be unpacked
        break

    # Modify the third time by 24 seconds
    c += 24

    # Save the modified values
    outfile.write(struct.pack(">4L", a, b, c, d))

infile.close()
outfile.close()
This version reads and converts an entire record at a time. I've saved it as slow.py. Let's see how it performs.
time ./slow.py
real 0m5.551s
user 0m5.169s
sys 0m0.171s
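The same one-call-per-record idea carries over to records that mix data types. Here is a small sketch, assuming a hypothetical 16-byte record made of a 2-byte ID, a 1-byte flag, one pad byte, a 4-byte timestamp, and an 8-byte double. This layout is invented for illustration and is not the test file format.

```python
import struct

# Hypothetical mixed record, big-endian with no implicit alignment:
# H = unsigned 16-bit, B = unsigned 8-bit, x = pad byte,
# L = unsigned 32-bit, d = 64-bit float.
MIXED = struct.Struct(">HBxLd")

# Build one sample record from made-up values.
raw = MIXED.pack(7, 1, 1211053920, 98.5)

# A single unpack call converts every field of the record.
system_id, flag, scanned_at, score = MIXED.unpack(raw)
```

Pad bytes (`x`) consume space in the record but do not produce a value, which is why four names are enough for five format codes.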
Five and a half seconds to modify a million records isn't bad; that's already about twice as fast as the original. To see how this could be improved, I tried various combinations of reading, processing, and writing the data. What I found is that every function or method call carries a substantial invocation penalty, especially calls into the struct module, so you can significantly speed things up by reading more than one record at a time. This costs a little extra memory, but it is well worth the speed increase, especially when the memory penalty is just a few KiB. With this approach it is easy to read a chunk of data, process the entire chunk in one go, and write the entire chunk back to a file, all with only a few Python calls.
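The per-call overhead described above can be measured in isolation with timeit. The sketch below assumes Python 3 and uses dummy in-memory data rather than the test file; the function names `per_record` and `per_chunk` are made up, and the timings will vary from machine to machine.

```python
import struct
import timeit

RECORDS = 100
data = bytes(range(16)) * RECORDS  # 100 dummy 16-byte records


def per_record():
    # One struct.unpack call for each 16-byte record
    out = []
    for offset in range(0, len(data), 16):
        out.extend(struct.unpack(">4L", data[offset:offset + 16]))
    return out


def per_chunk():
    # A single struct.unpack call for the whole chunk
    return list(struct.unpack(">%dL" % (len(data) // 4), data))


# Both approaches must decode exactly the same integers.
assert per_record() == per_chunk()

print("per record:", timeit.timeit(per_record, number=1000))
print("per chunk: ", timeit.timeit(per_chunk, number=1000))
```

Because both functions decode identical data, any difference in the printed times comes from the number of Python-level calls and loop iterations.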
Below is a third script I have created that reads, processes, and writes a hundred records at a time rather than just one. Telling struct to process a string of records rather than just one greatly speeds up the execution time.
#!/usr/bin/env python
import struct

infile = open("test_data", "rb")
outfile = open("test_data2", "wb")

# Read 100 records at a time
BUFFERSIZE = 16 * 100

while True:
    # Read one data chunk (BUFFERSIZE bytes)
    data = infile.read(BUFFERSIZE)
    if not data:
        break

    # Get a list of integers from the data chunk
    count = len(data) // 4
    values = list(struct.unpack(">%dL" % count, data))

    # Each record holds four integers, so the third integer of every
    # record sits at indexes 2, 6, 10, ... Add 24 seconds to each.
    for x in range(2, len(values), 4):
        values[x] += 24

    # Write the modified data chunk
    outfile.write(struct.pack(">%dL" % count, *values))

infile.close()
outfile.close()
I've saved this script as fast.py. Let's see how it performs.
time ./fast.py
real 0m1.837s
user 0m1.510s
sys 0m0.156s
That only took 1.8 seconds, which is three times faster than the second script and almost six times faster than the original, just by processing the data in chunks and thus making fewer calls and loop iterations. Much of CPython, including the struct module, is written in C, so it makes sense to let the fast C code do as much of the work as possible without your intervention, including converting between C and Python values in the struct module!
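One caveat when adapting the chunked approach to network streams is that a read may end in the middle of a record. The following sketch, assuming Python 3, shows one way to carry the incomplete tail over to the next read. The function `fix_stream` and its parameters are hypothetical, and wrapping the sum at 32 bits is my own assumption about how an overflowing timestamp should be handled.

```python
import io
import struct

RECORD_SIZE = 16


def fix_stream(read, write, chunk_size=16 * 100):
    """Add 24 to the third integer of every record obtained from `read`."""
    pending = b""
    while True:
        chunk = read(chunk_size)
        if not chunk:
            break
        pending += chunk
        # Process only whole records; keep the tail for the next read.
        usable = len(pending) - (len(pending) % RECORD_SIZE)
        if usable == 0:
            continue
        count = usable // 4
        values = list(struct.unpack(">%dL" % count, pending[:usable]))
        for i in range(2, count, 4):
            # Assumption: wrap at 32 bits so the value still packs as ">L".
            values[i] = (values[i] + 24) & 0xFFFFFFFF
        write(struct.pack(">%dL" % count, *values))
        pending = pending[usable:]
    return pending  # bytes of an incomplete final record, if any


# Simulate a stream that delivers only 7 bytes per read.
source = io.BytesIO(struct.pack(">8L", 1, 2, 3, 4, 5, 6, 0xFFFFFFFF, 8))
sink = io.BytesIO()
leftover = fix_stream(lambda size: source.read(7), sink.write)
result = struct.unpack(">8L", sink.getvalue())
```

For a real socket, `read` could be the socket's `recv` method and `write` the peer's `sendall`, though error handling and timeouts are left out of this sketch.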
Real World
Using the approach above I was able to rewrite the qt-faststart utility from ffmpeg, which modifies MP4 files so that they can be streamed easily via HTTP by making sure that specific, important file information is at the front of the file. It runs just as fast as the C original, but takes fewer lines of code and can be used as a Python module, which is very useful if your main project is written in Python.
Conclusion
Hopefully this approach can be adapted to your projects and give them a significant speedup while processing binary file or network data.





