Let's take java class file a example:
Aclass
file consists of a singleClassFile
structure:
ClassFile {
u4 magic;
u2 minor_version;
u2 major_version;
u2 constant_pool_count;
cp_info constant_pool[constant_pool_count-1];
......}
Traditional C++ Code:
class ClassFile{
u4 magic;
u2 minor_version;
u2 major_version;
u2 constant_pool_count;
ConstantPool* constant_pool;
void ReadClassFile(FILE *f)
{
magic = read_u4(f);
minor_version = read_u2(f);
major_version = read_u2(f);
constant_pool_count = read_u2(f);
constant_pool = new ConstantPool[constant_pool_count];
for(int i=0; i<constant_pool; i++)
Problems:
To avoid the duplication, we can use a function to represent the binary structure( ClassFile), instead of the above c++ class.
We can define the following Python function:
def ClassFile():
magic = u4()
minor_verison = u2()
major_version = u4()
constant_pool_count = u2()
constant_pool = [cp_info() for i in range(constant_pool_count-1)]
# other fields not implemented yet
return locals()
The u4 is a function, it read 4 bytes from the binary file and return the integer value.
cp_info is a function which represent the constant pool.
Pros:
- Specify the format in a programming language that is well know.
- The code is very close to the binary file format specification ( same high level)
- The code is better than the specification in the senser it's executable.
Cons:
- Parsing but not editing. This method is usefull when you want to parsing ( read ) a binary file, but if you want to edit (write) a binary file, this method will not work.
- for general parsing and editing of binary file, google is your friend. see http://www.gigamonkeys.com/book/practical-parsing-binary-files.html
- In practice, it's very common you only need parse a binary file. so why make things so complex by function that you don't need?