Binary - Parse files into perl structures
open(my $fh, '<:raw', '/path/to/file') or die "Cannot open /path/to/file: $!";
my $filedata;
($filedata, $fh) = Binary::eat_desc([ $fh ], [
version => 's>',
color => \&eat_rgbcolor,
must_be_1 => sub { Binary::eat_required(shift, 'l>', 1, 'Not 1, ohnoes!') },
len_then_str => sub { Binary::eat_counted_string(shift, 'l>') },
dualvar_n => sub { Binary::eat_enum(shift, 'l>', [qw/absolute relative/] ) },
]);
This module provides a set of functions to extract data from binary files. These can be combined sequentially to extract data as a set of fields and return a Perl data structure.
Each function reads data from a filehandle using a given data description, and returns it.
All functions (except "eat_at") expect the filehandle's position marker to be at the appropriate place in the file to read data matching the given rule.
Binary is built around perlfunc:unpack: ultimately, every call is reduced to a call to unpack on a string of data read from the file. The "eat_desc" function can also be called to run a given unpack template directly.
You may add your own local functions to call using Binary, or use functions from other Binary-derived modules.
(Almost) all Binary functions take the same first argument, referred to in the descriptions of individual functions as $conventional. This is an arrayref, which keeps the calling convention concise while allowing later versions to add more conventional parameters without breaking existing code. Its elements are as follows:
The first element of $conventional is a filehandle. The filehandle's position marker is expected to be set to the correct byte in the file to read the required data from. (Except for "eat_at".)
The "filehandle" can also be a scalar containing a string. Reading the next value from a scalar always reads from the beginning of the string, using substr.
After the function returns, the passed in filehandle will have its position marker set to the point after the read value.
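As a rough illustration of the scalar case, the hypothetical helper below (not part of Binary) takes bytes from the front of a string with substr and hands back the unread remainder, which stands in for the advanced position marker. Returning the remainder is an assumption about how the position is tracked for strings.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Binary: take $n bytes from the front of
# a string and return them together with the unread remainder.
sub take_from_string {
    my ($buf, $n) = @_;
    die "Wanted $n bytes, only " . length($buf) . " left\n"
        if length($buf) < $n;
    my $value = substr($buf, 0, $n);   # always read from the beginning
    my $rest  = substr($buf, $n);      # stands in for the advanced position
    return ($value, $rest);
}

my $data = "\x00\x04fred";
my ($len_bytes, $rest) = take_from_string($data, 2);
my $len = unpack('n', $len_bytes);                       # 4
my ($str, $remaining) = take_from_string($rest, $len);   # 'fred', ''
```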
The second element of $conventional is the context. As a user, you can generally omit it. The exact meaning of the context varies from function to function; unless the documentation for a function below says otherwise, the function passes its context on to lower-level calls.
A string representation of the current position in the file format tree.
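All functions return the requested value. As the synopsis shows, the value is followed in the returned list by the filehandle, so results can be assigned as ($value, $fh).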
This is the primary function to call to define a structure to be read from a file.
The $desc argument describes the format of the data to be extracted. It can be one of several types of description:
Binary::eat_desc($conventional, [
version => 's>',
color => \&eat_rgbcolor,
must_be_1 => sub { Binary::eat_required(shift, 'l>', 1, 'Not 1, ohnoes!') },
len_then_str => sub { Binary::eat_counted_string(shift, 'l>') },
dualvar_n => sub { Binary::eat_enum(shift, 'l>', [qw/absolute relative/] ) },
]);
To extract a set of fields from the file, pass an arrayref of key/value pairs. An arrayref is used instead of a hashref because ordering is important. Each "key" is the name of a field, and each "value" describes how the data for that field should be read. eat_desc is called on each "value" in turn, and the results are returned as a hashref using the "keys".
For ease of dumping, each "key" will be present in the returned hashref both as itself and as an element named "$n.$name", where $n starts at zero and is incremented for every element. Additionally, there will be an element _context, which holds the context passed in $conventional.
The arrayref passed in is guaranteed to remain in its original state after the function returns.
Thus, the return of the above call might be:
{
version => 4,
color => [255, 255, 255],
must_be_1 => 1,
len_then_str => 'example',
dualvar_n => dualvar(1, 'relative')
}
The context passed to the "value" is the same hashref that will eventually be returned from the outer eat_desc.
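The ordered-pairs behaviour can be sketched with core Perl alone. The hypothetical read_fields below is not Binary's code: it accepts only fixed-size unpack templates or coderefs, and it passes coderefs the filehandle and the partially built hashref rather than a $conventional arrayref.

```perl
use strict;
use warnings;

# Hypothetical sketch: read ordered name => description pairs into a hashref.
sub read_fields {
    my ($fh, $pairs) = @_;
    my %result;
    my $n = 0;
    for (my $i = 0; $i < @$pairs; $i += 2) {
        my ($name, $desc) = @$pairs[$i, $i + 1];   # $pairs itself is not modified
        my $value;
        if (ref $desc eq 'CODE') {
            # The coderef receives the hashref built so far as its context.
            $value = $desc->($fh, \%result);
        } else {
            my $size = length pack($desc, 0);      # implicit size of the template
            read($fh, my $buf, $size) == $size
                or die "Short read for '$name'\n";
            $value = unpack($desc, $buf);
        }
        $result{$name}      = $value;   # plain key
        $result{"$n.$name"} = $value;   # ordered key for dumping
        $n++;
    }
    return \%result;
}

open(my $fh, '<:raw', \"\x00\x04\x00\x00\x00\x02") or die "open: $!";
my $fields = read_fields($fh, [
    version => 's>',
    scaled  => sub {
        my ($in, $ctx) = @_;
        read($in, my $buf, 4) == 4 or die "Short read\n";
        return unpack('l>', $buf) * $ctx->{version};   # uses an earlier field
    },
]);
# $fields->{version} == 4, $fields->{'1.scaled'} == 8
```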
Binary::eat_desc($conventional, \&eat_rgbcolor);
Binary::eat_desc($conventional,
[
version => 'C',
only_in_new => sub {
my ($conventional) = @_;
my ($fh, $context) = @$conventional;
if ($context->{version} > 2) {
return Binary::eat_desc($conventional, 'C');
} else {
return (1, $fh);
}
}
]
);
The coderef is called, passing in a $conventional. The results are returned to the calling code.
Binary::eat_desc($conventional, 'l>');
Each pack template has an implicit length; see perlfunc:pack. The number of bytes implied by the chosen template is read from the filehandle, passed to perlfunc:unpack, and the result is returned.
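A standalone illustration of the implicit-length rule: packing a dummy zero with a fixed-width template yields a string whose length is the template's size, which can then be read and unpacked. The helper names template_size and read_template are hypothetical, and the trick only suits fixed-width templates.

```perl
use strict;
use warnings;

# Byte size implied by a fixed-width pack template, found by packing a
# dummy zero: 'l>' gives 4, 'n' gives 2, 'C' gives 1.
sub template_size {
    my ($template) = @_;
    return length pack($template, 0);
}

# Read exactly the number of bytes the template needs, then unpack them.
sub read_template {
    my ($fh, $template) = @_;
    my $size = template_size($template);
    my $got  = read($fh, my $buf, $size);
    die "Expected $size bytes for '$template', got " . ($got // 0) . "\n"
        unless defined $got && $got == $size;
    return unpack($template, $buf);
}

open(my $fh, '<:raw', \"\xFF\xFF\xFF\xFE\x00\x07") or die "open: $!";
my $signed   = read_template($fh, 'l>');   # -2
my $unsigned = read_template($fh, 'n');    # 7
```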
Binary::eat_desc($conventional, 'l>2');
Returns an arrayref of results, produced by calling eat_desc the number of times requested. The allowed pack templates are the same as "Pack template".
For the example given, the result would contain the next two l> values from the filehandle.
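One way to handle a counted numeric template such as l>2 is to split off the trailing count and unpack the base template that many times. read_repeated is a hypothetical name, and this sketch deliberately ignores the a and Z templates, whose number means a byte length rather than a repeat count.

```perl
use strict;
use warnings;

# Hypothetical: handle a fixed-width numeric template with a trailing
# repeat count, e.g. 'l>2', returning an arrayref of the values.
sub read_repeated {
    my ($fh, $desc) = @_;
    my ($template, $count) = $desc =~ /^(\D+?)(\d+)$/
        or die "Not a counted template: '$desc'\n";
    my $size = length pack($template, 0);
    my @values;
    for (1 .. $count) {
        read($fh, my $buf, $size) == $size or die "Short read\n";
        push @values, unpack($template, $buf);
    }
    return \@values;
}

open(my $fh, '<:raw', \"\x00\x00\x00\x01\x00\x00\x00\x02") or die "open: $!";
my $pair = read_repeated($fh, 'l>2');   # [1, 2]
```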
Binary::eat_desc($conventional, 'a6');
Reads the given number of bytes from the file and unpacks them using the specified pack template (see perlfunc:unpack). Only a and Z are allowed.
The literal description Z* reads a null-terminated string from the filehandle, including the null. The result is returned after being decoded from ASCII.
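A minimal sketch of the Z* behaviour, assuming a byte-at-a-time read: the hypothetical read_null_terminated consumes bytes up to and including the null, lets unpack('Z*') drop the terminator, and decodes the rest with the core Encode module.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical: read up to and including the terminating null, then let
# unpack('Z*') drop the null and decode the bytes as ASCII.
sub read_null_terminated {
    my ($fh) = @_;
    my $bytes = '';
    while (1) {
        my $got = read($fh, my $ch, 1);
        die "No terminating null before end of data\n" unless $got;
        $bytes .= $ch;
        last if $ch eq "\x00";
    }
    return decode('ascii', unpack('Z*', $bytes));
}

open(my $fh, '<:raw', \"hi\x00rest") or die "open: $!";
my $s = read_null_terminated($fh);   # 'hi'
read($fh, my $rest, 16);             # 'rest': the null was consumed
```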
Only supported with scalar filehandles.
Returns the entire rest of the filehandle.
Binary::eat_unpack($conventional, 'l>', 8);
This function is called by "eat_desc" to retrieve data from the filehandle and actually perlfunc:unpack it.
Arguments:
Any template recognised by perlfunc:pack.
The number of bytes to read from the filehandle.
Consider using "eat_desc" instead of this function: it has sensible pre-defined lengths for each pack template when extracting numeric data.
Binary::eat_required($conventional, 'a4', 'fred', "Can't find fred");
This function works similarly to "eat_desc", which it uses to fetch the content from the file using the $desc argument. The difference is that it throws an exception if the content fetched does not match the value passed in $should_be (according to perl's smart match semantics, see "perlsyn:Smart Matching in Detail").
For how to use the $desc parameter, see "eat_desc".
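The check can be sketched without smart match, which has been experimental since Perl 5.18. The hypothetical read_required below swaps in a plain string comparison (eq), so it is not a drop-in equivalent of the smart-match semantics described above.

```perl
use strict;
use warnings;

# Hypothetical: read a fixed-size field and die unless it equals the
# expected value. Uses eq (string comparison) instead of smart match.
sub read_required {
    my ($fh, $template, $should_be, $message) = @_;
    my $size = length pack($template, 0);
    read($fh, my $buf, $size) == $size or die "Short read\n";
    my $value = unpack($template, $buf);
    die "$message (got '$value')\n" unless $value eq $should_be;
    return $value;
}

open(my $fh, '<:raw', \"fredXXXX") or die "open: $!";
my $magic = read_required($fh, 'a4', 'fred', "Can't find fred");   # 'fred'
my $ok    = eval { read_required($fh, 'a4', 'fred', "Can't find fred"); 1 };
my $error = $@;   # the next four bytes are 'XXXX', so this fails
```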
Binary::eat_encoded_str($conventional, 4, 'utf-8', 0);
Reads $byte_len bytes from the filehandle and decodes them using the given $encoding.
If $chop_zeroes is true, the result will be returned with all trailing nulls removed.
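A sketch of the same steps using the core Encode module. read_encoded is a hypothetical name; stripping the nulls after decoding (rather than before) is a choice made here so that multi-byte encodings are not cut mid-character.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical: read a fixed number of bytes, decode them, and optionally
# remove trailing NUL characters from the decoded text.
sub read_encoded {
    my ($fh, $byte_len, $encoding, $chop_zeroes) = @_;
    read($fh, my $buf, $byte_len) == $byte_len or die "Short read\n";
    my $text = decode($encoding, $buf);
    $text =~ s/\x00+\z// if $chop_zeroes;
    return $text;
}

open(my $fh, '<:raw', \"ab\x00\x00") or die "open: $!";
my $text = read_encoded($fh, 4, 'utf-8', 1);   # 'ab'
```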
Binary::eat_utf16be_null($conventional);
Reads UTF-16BE (big-endian) characters from the filehandle in 2-byte chunks and decodes them. Reading stops at the first null word (two zero bytes), and the string decoded up to that point is returned.
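The word-by-word loop can be sketched as follows. read_utf16be_null is a hypothetical name; this version consumes the null word and excludes it from the decoded result, which is an assumption about the terminator handling.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical: collect 2-byte words until a null word, then decode the
# collected bytes (without the terminator) as UTF-16BE.
sub read_utf16be_null {
    my ($fh) = @_;
    my $bytes = '';
    while (1) {
        my $got = read($fh, my $word, 2);
        die "Ran out of data before a null word\n"
            unless defined $got && $got == 2;
        last if $word eq "\x00\x00";
        $bytes .= $word;
    }
    return decode('UTF-16BE', $bytes);
}

open(my $fh, '<:raw', \"\x00h\x00i\x00\x00\x00x") or die "open: $!";
my $s = read_utf16be_null($fh);   # 'hi'
```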
Binary::eat_pad_until($conventional, 1, 4);
Reads bytes until the filehandle position reaches the next multiple of $mul, plus $ofs (offset).
This is useful when a file consists of sections of a fixed length ($mul), each padded at the end with arbitrary bytes. Use $ofs if you wish to skip some of the beginning of the next section.
$ofs defaults to 0 if not supplied.
Returns the bytes read, if any.
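The alignment arithmetic can be sketched with tell and read. read_padding is a hypothetical name, and the exact target is an assumption: here it is the smallest multiple of $mul that is not below the current position, plus $ofs, so an already aligned handle with $ofs = 0 reads nothing.

```perl
use strict;
use warnings;

# Hypothetical: read bytes until the position equals the smallest multiple
# of $mul that is >= the current position, plus $ofs (assumed reading).
sub read_padding {
    my ($fh, $mul, $ofs) = @_;
    $ofs //= 0;
    my $pos    = tell($fh);
    my $target = int(($pos + $mul - 1) / $mul) * $mul + $ofs;
    my $need   = $target - $pos;
    return '' if $need <= 0;
    read($fh, my $pad, $need) == $need or die "Short read while padding\n";
    return $pad;
}

open(my $fh, '<:raw', \"ABCDEFGHIJ") or die "open: $!";
read($fh, my $head, 5) == 5 or die "Short read\n";   # position is now 5
my $pad  = read_padding($fh, 4);   # 'FGH', position is now 8
my $none = read_padding($fh, 4);   # '', already aligned
```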
Binary::eat_until_eof($conventional, 'l>');
Read $desc from the filehandle repeatedly until the end-of-file marker is reached. See "eat_desc" for an explanation of the $desc argument.
Outputs a warning if the file unexpectedly runs out of bytes in the middle of a $desc.
Returns an arrayref containing all the results of the repeated "eat_desc" calls.
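A minimal sketch of the read-until-EOF loop for a fixed-size template. read_until_eof is a hypothetical name; like the description above, it warns when trailing bytes are too few to make up a whole value.

```perl
use strict;
use warnings;

# Hypothetical: unpack a fixed-size template repeatedly until end of file,
# warning if the last chunk is incomplete.
sub read_until_eof {
    my ($fh, $template) = @_;
    my $size = length pack($template, 0);
    my @values;
    until (eof($fh)) {
        my $got = read($fh, my $buf, $size) // 0;
        if ($got < $size) {
            warn "Trailing $got byte(s) do not form a complete '$template'\n";
            last;
        }
        push @values, unpack($template, $buf);
    }
    return \@values;
}

my @warnings;
local $SIG{__WARN__} = sub { push @warnings, @_ };   # collect warnings
open(my $fh, '<:raw', \"\x00\x00\x00\x01\x00\x00\x00\x02\xFF") or die "open: $!";
my $ints = read_until_eof($fh, 'l>');   # [1, 2], plus one warning
```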
This is the only function which does not read from the current filehandle position. The $desc description is read from the $pos position in the filehandle.
The original filehandle position is restored after the value has been read.
For a description of $desc, see "eat_desc".
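Reading at an absolute offset and then restoring the position can be sketched with tell and seek. read_at and its argument order are hypothetical, since the actual signature of eat_at is not shown here.

```perl
use strict;
use warnings;
use Fcntl qw(SEEK_SET);

# Hypothetical: read one fixed-size template at absolute byte offset $pos,
# then restore the original position, even if the read came up short.
sub read_at {
    my ($fh, $pos, $template) = @_;
    my $saved = tell($fh);
    seek($fh, $pos, SEEK_SET) or die "seek to $pos failed: $!";
    my $size = length pack($template, 0);
    my $got  = read($fh, my $buf, $size);
    seek($fh, $saved, SEEK_SET) or die "seek back failed: $!";
    die "Short read at offset $pos\n" unless defined $got && $got == $size;
    return unpack($template, $buf);
}

open(my $fh, '<:raw', \"\x00\x01\x00\x02\x00\x03") or die "open: $!";
read($fh, my $head, 2) == 2 or die "Short read\n";
my $first = unpack('n', $head);     # 1; position is now 2
my $third = read_at($fh, 4, 'n');   # 3
my $pos   = tell($fh);              # still 2
```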
Binary::eat_zero_len([ $fh ], 'l>');
Works exactly like "eat_desc" but the filehandle marker is returned to the same position it was at the start of the read.
This is handy for testing the next value in the file without officially reading it.
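Peeking can be expressed as a small wrapper that runs any reader and then seeks back. without_consuming is a hypothetical name; note that if the reader dies, this sketch does not restore the position.

```perl
use strict;
use warnings;
use Fcntl qw(SEEK_SET);

# Hypothetical wrapper: run any reader on $fh, then put the position back,
# so the next real read sees the same bytes again.
sub without_consuming {
    my ($fh, $reader) = @_;
    my $saved  = tell($fh);
    my @result = $reader->($fh);
    seek($fh, $saved, SEEK_SET) or die "seek back failed: $!";
    return wantarray ? @result : $result[0];
}

open(my $fh, '<:raw', \"\x07payload") or die "open: $!";
my $tag = without_consuming($fh, sub {
    my ($in) = @_;
    read($in, my $buf, 1) == 1 or die "Short read\n";
    return unpack('C', $buf);
});
# $tag == 7 and tell($fh) == 0, so the tag byte can still be read normally
```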
Binary::eat_counted_string([ $fh ], 'n');
Reads a string from the filehandle using $count_desc as a template to define the field containing the length of the string.
It first reads the string length from the filehandle, using $count_desc as a template for "eat_desc". The result of that read is then used to fetch that number of bytes from the filehandle, which are returned as described under "Pack template with count".
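The two-step read can be sketched with core Perl: unpack the length field, then read that many bytes. read_counted_string is a hypothetical name, and the length template is assumed to be a fixed-size numeric one such as 'n' or 'C'.

```perl
use strict;
use warnings;

# Hypothetical: read a length with a fixed-size numeric template, then read
# that many bytes as the string.
sub read_counted_string {
    my ($fh, $count_template) = @_;
    my $count_size = length pack($count_template, 0);
    read($fh, my $count_buf, $count_size) == $count_size
        or die "Short read for the length field\n";
    my $len = unpack($count_template, $count_buf);
    my $got = read($fh, my $str, $len);
    die "Expected $len string bytes\n" unless defined $got && $got == $len;
    return $str;
}

open(my $fh, '<:raw', \"\x00\x04fredX") or die "open: $!";
my $name = read_counted_string($fh, 'n');   # 'fred'
```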
Binary::eat_bitmask([ $fh ], .. );
Binary::eat_enum([ $fh ],
sub { Binary::eat_counted_string(shift, 'n') },
{ r => 'red', b => 'blue' }
);
Read the next $desc item from the filehandle. Looks up the resulting value in $values, using it as an index if $values is an arrayref, and as a key if $values is a hashref.
Returns the resulting value as a "dualvar" in Scalar::Util with both the raw value retrieved and the result of the lookup.
See "eat_desc" for the definition of $desc.
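The lookup-and-dualvar step can be sketched with Scalar::Util's dualvar. read_enum is a hypothetical name; it accepts only fixed-size templates, and dying on a missing entry is a choice made for this sketch, not documented behaviour.

```perl
use strict;
use warnings;
use Scalar::Util qw(dualvar);

# Hypothetical: read one fixed-size template, look the raw value up in an
# arrayref (by index) or hashref (by key), and return both as a dualvar.
sub read_enum {
    my ($fh, $template, $values) = @_;
    my $size = length pack($template, 0);
    read($fh, my $buf, $size) == $size or die "Short read\n";
    my $raw  = unpack($template, $buf);
    my $name = ref $values eq 'HASH' ? $values->{$raw} : $values->[$raw];
    die "No enum entry for '$raw'\n" unless defined $name;   # sketch's choice
    return dualvar($raw, $name);
}

open(my $fh, '<:raw', \"\x01\x04") or die "open: $!";
my @colors = ('orange', 'red', 'green', 'blue', 'yellow');
my $c1 = read_enum($fh, 'C', \@colors);   # numeric 1, string 'red'
my $c2 = read_enum($fh, 'C', \@colors);   # numeric 4, string 'yellow'
```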
Binary::eat_counted([ $fh ], 4, 'a');
Reads $desc from the filehandle, via "eat_desc", $count times. The results are returned as an arrayref with $count items in it.
open(my $fh, '<', \'fred');
Binary::eat_desc([$fh], 'a4');
Result: 'fred'
open(my $fh, '<', \'fred');
Binary::eat_counted([$fh], 4, 'a');
Result: ['f', 'r', 'e', 'd']
open(my $fh, '<', \"\4fred");
Binary::eat_counted_string([$fh], 'C');
Result: 'fred'
open(my $fh, '<', \"\1\4\2");
Binary::eat_counted([$fh],
3,
sub { Binary::eat_enum(shift,
'C',
['orange', 'red', 'green', 'blue', 'yellow']
) },
);
Result: [ dualvar(1,'red'), dualvar(4,'yellow'), dualvar(2,'green') ]