NAME

Binary - Parse files into Perl structures

SYNOPSIS

  open (my $fh, "<:raw", '/path/to/file') or die "$!";
  
  my ($filedata, $fh) = Binary::eat_desc([ $fh ], [
    version      => 's>',
    color        => \&eat_rgbcolor,
    must_be_1    => sub { Binary::eat_required(shift, 'l>', 1, 'Not 1, ohnoes!') },
    len_then_str => sub { Binary::eat_counted_string(shift, 'l>') },
    dualvar_n    => sub { Binary::eat_enum(shift, 'l>', [qw/absolute relative/] ) },
  ]);    

DESCRIPTION

This module provides a set of functions for extracting data from binary files. The functions can be combined sequentially to extract data as a set of fields and return a Perl data structure.

Each function reads data from a filehandle using a given data description, and returns it.

All functions (except "eat_at") expect the filehandle's position marker to be at the appropriate place in the file to read data matching the given rule.

Binary is built around Perl's "unpack" function (see perlfunc). Ultimately, every call devolves into calling unpack on a string of data read from a file. The "eat_desc" function can be called to run a given unpack template directly.
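As a rough illustration of the read-then-unpack pattern described above, the following sketch uses only core Perl. The helper read_and_unpack is hypothetical and is not part of Binary; it shows a fixed-length read followed by unpack on an in-memory filehandle, and makes no claim about how the module is actually implemented.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Binary): read exactly $len bytes from
# $fh and run them through unpack with the given template.
sub read_and_unpack {
    my ($fh, $template, $len) = @_;
    my $got = read($fh, my $buf, $len);
    die "read failed: $!" unless defined $got;
    die "short read: wanted $len bytes, got $got" unless $got == $len;
    return unpack($template, $buf);
}

# Sample data: a 2-byte and a 4-byte big-endian signed integer.
my $data = pack('s> l>', 3, 1);
open(my $fh, '<', \$data) or die "Cannot open in-memory file: $!";

my $version = read_and_unpack($fh, 's>', 2);
my $flag    = read_and_unpack($fh, 'l>', 4);
```

Each call leaves the filehandle positioned just past the bytes it consumed, which is what allows the fields to be read one after another.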

You may add your own local functions to call using Binary, or use functions from other Binary-derived modules.

Common arguments

(Almost) all Binary functions take the same first argument, referred to in the descriptions of individual functions as $conventional. This is an arrayref, which keeps the calling convention concise while allowing later versions to add more conventional parameters without breaking existing code. The elements of the arrayref are as follows:

Return values

All functions return the requested value.

Functions

eat_desc

This is the primary function to call to define a structure to be read from a file.

The $desc argument describes the format of the data to be extracted. It can be one of several different types of description:

eat_unpack

  Binary::eat_unpack($conventional, 'l>', 8);

This function is called by "eat_desc" to retrieve data from the filehandle and actually "unpack" it (see perlfunc).

Arguments:

Consider using "eat_desc" instead of calling this function directly: it has sensible predefined lengths for each pack template used to extract numeric data.

eat_required

  Binary::eat_required($conventional, 'a4', 'fred', "Can't find fred");

This function works similarly to "eat_desc", which it uses to fetch the content from the file according to the $desc argument. The difference is that it throws an exception if the content fetched does not match the value passed in $should_be (according to Perl's smart match semantics; see "Smart Matching in Detail" in perlsyn).

For how to use the $desc parameter, see "eat_desc".
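The check-and-throw behaviour can be sketched in plain Perl as follows. The helper read_required is hypothetical, takes an explicit byte length, and compares with plain string equality (eq) rather than smart match, so it illustrates the idea only and is not the module's own code.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Binary): read $len bytes, unpack them,
# and die with $message unless the value equals $should_be.
# Assumption: string equality stands in for smart match here.
sub read_required {
    my ($fh, $template, $len, $should_be, $message) = @_;
    my $got = read($fh, my $buf, $len);
    die "read failed or short read\n" unless defined $got && $got == $len;
    my $value = unpack($template, $buf);
    die "$message\n" unless $value eq $should_be;
    return $value;
}

my $data = 'fredXXXX';
open(my $fh, '<', \$data) or die "Cannot open in-memory file: $!";

# First four bytes match, so the value is returned.
my $magic = read_required($fh, 'a4', 4, 'fred', "Can't find fred");

# Next four bytes do not match, so the call dies; eval catches it.
my $ok    = eval { read_required($fh, 'a4', 4, 'fred', "Can't find fred"); 1 };
my $error = $@;
```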

eat_encoded_str

  Binary::eat_encoded_str($conventional, 4, 'utf-8', 0);

Retrieves $byte_len bytes from the filehandle and decodes them using the given $encoding.

If $chop_zeroes is true, the result will be returned with all trailing nulls removed.
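A minimal sketch of the same steps using the core Encode module is shown below. The helper read_encoded is hypothetical, and stripping the nulls after decoding (rather than before) is an assumption, not something this document specifies.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper (not part of Binary): read $byte_len bytes, decode
# them, and optionally strip trailing NUL characters.
# Assumption: nulls are removed from the decoded string.
sub read_encoded {
    my ($fh, $byte_len, $encoding, $chop_zeroes) = @_;
    my $got = read($fh, my $bytes, $byte_len);
    die "short read\n" unless defined $got && $got == $byte_len;
    my $str = decode($encoding, $bytes);
    $str =~ s/\0+\z// if $chop_zeroes;
    return $str;
}

# "cafe" with an acute accent, encoded as UTF-8 and padded with two nulls:
# seven bytes in the file, four characters once decoded and trimmed.
my $data = "caf\xC3\xA9\0\0";
open(my $fh, '<', \$data) or die "Cannot open in-memory file: $!";

my $name = read_encoded($fh, 7, 'UTF-8', 1);
```

Note that the length argument counts bytes in the file, not characters in the decoded result.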

eat_utf16be_null

  Binary::eat_utf16be_null($conventional);

Reads UTF-16BE (big-endian) characters in 2-byte chunks from the filehandle and decodes them. Reading stops when a null word is encountered, and the entire string found up to that point is returned.
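The loop described above might look roughly like this in core Perl. The helper read_utf16be_until_null is hypothetical; it consumes the terminating null word but leaves it out of the returned string, which is an assumption about the intended behaviour.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper (not part of Binary): collect 2-byte words until a
# null word, then decode the collected bytes as UTF-16BE.
sub read_utf16be_until_null {
    my ($fh) = @_;
    my $bytes = '';
    while (1) {
        my $got = read($fh, my $word, 2);
        die "read failed: $!" unless defined $got;
        last if $got < 2;           # ran out of data before a terminator
        last if $word eq "\0\0";    # null word ends the string
        $bytes .= $word;
    }
    return decode('UTF-16BE', $bytes);
}

# "Hi" in UTF-16BE, a null word, then unrelated trailing bytes.
my $data = encode('UTF-16BE', 'Hi') . "\0\0" . 'tail';
open(my $fh, '<', \$data) or die "Cannot open in-memory file: $!";

my $text = read_utf16be_until_null($fh);
my $pos  = tell($fh);   # just past the null word
```

Decoding once at the end, instead of word by word, keeps surrogate pairs intact.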

eat_pad_until

  Binary::eat_pad_until($conventional, 1, 4);

Keeps reading bytes until the filehandle position reaches the next multiple of $mul, plus $ofs (offset).

This is useful when a file consists of sections of a fixed length ($mul) that are padded at the end with arbitrary bytes. Use $ofs if you wish to skip some of the beginning of the next section.

$ofs defaults to 0 if not supplied.

Returns the bytes read, if any.
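The alignment arithmetic can be sketched as below. The helper skip_padding and its argument order are hypothetical, and treating an already aligned position as needing no skip is an assumption drawn from the "if any" wording above.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Binary): read and return the bytes
# between the current position and the next position that equals a
# multiple of $mul plus $ofs.
sub skip_padding {
    my ($fh, $mul, $ofs) = @_;
    $ofs //= 0;
    # Perl's % yields a non-negative result for a positive right operand,
    # so this is the distance forward to the next aligned position.
    my $skip = ($ofs - tell($fh)) % $mul;
    return '' unless $skip;
    my $got = read($fh, my $padding, $skip);
    die "read failed: $!" unless defined $got;
    return $padding;
}

# A 5-byte record padded with nulls to an 8-byte boundary, then more data.
my $data = "abcde\0\0\0next";
open(my $fh, '<', \$data) or die "Cannot open in-memory file: $!";

read($fh, my $record, 5);
my $padding = skip_padding($fh, 4);   # position 5 -> 8
my $pos     = tell($fh);
read($fh, my $following, 4);
```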

eat_until_eof

  Binary::eat_until_eof($conventional, 'l>');

Read $desc from the filehandle repeatedly until the end-of-file marker is reached. See "eat_desc" for an explanation of the $desc argument.

Outputs a warning if the file unexpectedly runs out of bytes in the middle of a $desc.

Returns an arrayref containing all the results of the repeated "eat_desc" calls.
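A rough core-Perl equivalent of this read-until-EOF loop is sketched here. The helper read_all is hypothetical and takes an explicit byte length per value, which the real function presumably derives from $desc.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Binary): unpack fixed-size values until
# the file is exhausted, warning if it ends partway through a value.
sub read_all {
    my ($fh, $template, $len) = @_;
    my @values;
    while (my $got = read($fh, my $buf, $len)) {
        if ($got < $len) {
            warn "file ended in the middle of a value ($got of $len bytes)\n";
            last;
        }
        push @values, unpack($template, $buf);
    }
    return \@values;
}

# Three 4-byte big-endian signed integers.
my $data = pack('l>3', 10, -20, 30);
open(my $fh, '<', \$data) or die "Cannot open in-memory file: $!";

my $values = read_all($fh, 'l>', 4);
```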

eat_at

This is the only function which does not read from the current filehandle position. The $desc description is read from the $pos position in the filehandle.

The original filehandle position is restored after the value has been read.

For a description of $desc, see "eat_desc".
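The save, seek, read, restore sequence can be illustrated with core Perl as follows. The helper read_at and its argument order are hypothetical and do not reflect the actual signature of "eat_at".

```perl
use strict;
use warnings;
use Fcntl qw(SEEK_SET);

# Hypothetical helper (not part of Binary): read a value at an absolute
# offset, then put the filehandle back where it was.
sub read_at {
    my ($fh, $pos, $template, $len) = @_;
    my $saved = tell($fh);
    seek($fh, $pos, SEEK_SET) or die "seek failed: $!";
    my $got = read($fh, my $buf, $len);
    seek($fh, $saved, SEEK_SET) or die "seek failed: $!";
    die "short read at offset $pos\n" unless defined $got && $got == $len;
    return unpack($template, $buf);
}

# A 4-byte header, a 2-byte big-endian integer, and a 4-byte trailer.
my $data = 'HEAD' . pack('n', 513) . 'TAIL';
open(my $fh, '<', \$data) or die "Cannot open in-memory file: $!";

read($fh, my $magic, 4);
my $trailer = read_at($fh, 6, 'a4', 4);   # jump ahead to the trailer
my $pos     = tell($fh);                  # back at offset 4
read($fh, my $raw, 2);
my $number  = unpack('n', $raw);          # sequential reading continues
```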

eat_zero_len

  Binary::eat_zero_len([ $fh ], 'l>');

Works exactly like "eat_desc", except that the filehandle position is restored to where it was before the read.

This is handy for testing the next value in the file without officially reading it.
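Peeking of this kind is sketched below in core Perl. The helper peek_value is hypothetical; it shows a value being inspected and then read again from the same position.

```perl
use strict;
use warnings;
use Fcntl qw(SEEK_SET);

# Hypothetical helper (not part of Binary): unpack the next $len bytes
# without advancing the filehandle.
sub peek_value {
    my ($fh, $template, $len) = @_;
    my $start = tell($fh);
    my $got = read($fh, my $buf, $len);
    seek($fh, $start, SEEK_SET) or die "seek failed: $!";
    die "short read\n" unless defined $got && $got == $len;
    return unpack($template, $buf);
}

my $data = pack('l>', 42);
open(my $fh, '<', \$data) or die "Cannot open in-memory file: $!";

my $peeked     = peek_value($fh, 'l>', 4);
my $after_peek = tell($fh);                  # still at the start
my $again      = peek_value($fh, 'l>', 4);   # same value, same position
```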

eat_counted_string

  Binary::eat_counted_string([ $fh ], 'n');

Reads a string from the filehandle using $count_desc as a template to define the field containing the length of the string.

It first reads the string length from the filehandle, using $count_desc as a template for "eat_desc". The result of that read is then used to fetch that number of bytes from the filehandle and return them, using "Pack template with count".
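The two-step read (length first, then body) can be sketched in core Perl like this. The helper read_counted_string is hypothetical and takes the byte width of the length field explicitly. The sample data is built with pack's own length/string form ('n/a*'), which writes a 16-bit big-endian length followed by the string.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Binary): read a length field, then read
# that many bytes as the string body.
sub read_counted_string {
    my ($fh, $count_template, $count_len) = @_;
    my $got = read($fh, my $count_bytes, $count_len);
    die "short read on length field\n"
        unless defined $got && $got == $count_len;
    my $length = unpack($count_template, $count_bytes);
    return '' if $length == 0;
    $got = read($fh, my $string, $length);
    die "short read on string body\n" unless defined $got && $got == $length;
    return $string;
}

# Two length-prefixed strings back to back.
my $data = pack('n/a*', 'fred') . pack('n/a*', 'wilma');
open(my $fh, '<', \$data) or die "Cannot open in-memory file: $!";

my $first  = read_counted_string($fh, 'n', 2);
my $second = read_counted_string($fh, 'n', 2);
```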

eat_bitmask

  Binary::eat_bitmask([ $fh ], .. );

eat_enum

  Binary::eat_enum([ $fh ], 
    sub { Binary::eat_counted_string(shift, 'n') },
    [ 'red', 'blue' ]
  );

Read the next $desc item from the filehandle. Looks up the resulting value in $values, using it as an index if $values is an arrayref, and as a key if $values is a hashref.

Returns the result as a dualvar (see "dualvar" in Scalar::Util) holding both the raw value retrieved and the result of the lookup.

See "eat_desc" for the definition of $desc.
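For readers unfamiliar with dualvars, the lookup step can be sketched as follows. The helper lookup_enum is hypothetical and covers only the lookup, not the read; dualvar itself is the real Scalar::Util function. Dying on an unknown value is an assumption made for this sketch.

```perl
use strict;
use warnings;
use Scalar::Util qw(dualvar);

# Hypothetical helper (not part of Binary): map a raw value to a label via
# an arrayref (index) or hashref (key) and return both as a dualvar.
sub lookup_enum {
    my ($raw, $values) = @_;
    my $label = ref($values) eq 'HASH' ? $values->{$raw} : $values->[$raw];
    die "no label for value $raw\n" unless defined $label;
    return dualvar($raw, $label);
}

my $mode = lookup_enum(1, [qw/absolute relative/]);
my $as_number = $mode + 0;   # numeric context gives the raw value
my $as_string = "$mode";     # string context gives the label

my $lucky = lookup_enum(7, { 7 => 'lucky' });
```

The caller can therefore compare the result numerically against the raw file value or print it as a readable name.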

eat_counted

  Binary::eat_counted([ $fh ], 4, 'a');

Retrieves the $desc from the filehandle, via "eat_desc", $count times. The result is returned as an array with $count items in it.

EXAMPLES

Four bytes of arbitrary data

  open(my $fh, '<', \'fred');
  Binary::eat_desc([$fh], 'a4');

Result: 'fred'

One byte of arbitrary data, four times

  open(my $fh, '<', \'fred');
  Binary::eat_counted([$fh], 4, 'a');

Result: ['f', 'r', 'e', 'd']

Read the string size then the string

  open(my $fh, '<', \"\4fred");
  Binary::eat_counted_string([$fh], 'C');

Result: 'fred'

Transform numeric values into meaningful values

  open(my $fh, '<', \"\1\4\2");

  Binary::eat_counted([$fh], 
   3, 
   sub { Binary::eat_enum([$fh], 
         'C', 
         ['orange', 'red', 'green', 'blue', 'yellow']
       ) },
  );

Result: [ dualvar(1,'red'), dualvar(4,'yellow'), dualvar(2,'green') ]

BUGS

LICENSE

AUTHOR
