One of the more important challenges that programmers face is bridging the gap between programmer and business person. Many programmers are just fine poring over text based and numerical data using tools like cat, grep, and less, while our cohorts in the business wing of life prefer viewing data as spreadsheets. More often then not it is our duty to provide the information we produce in the most user friendly way possible.
Enter Perl and its Spreadsheet family of modules. In particular I will be covering the basics of Spreadsheet::WriteExcel and Spreadsheet::ParseExcel. Both modules are credited to John McNamara and handle all 97 through 2003 formats of Excel. We'll all need to patiently wait for the community to catch up with Microsofts new xlsx (2007) file format. While I will cover the basics of these modules your greatest and best source of documentation is always CPAN
Before we begin we'll need to make sure we have the proper modules installed. I'll also be using a module called define. If you're using Activestate's ActivePerl you can do this graphically through their package manager. For the rest of us it's just as easy to install these modules through CPAN:
$ sudo perl -MCPAN -e"install Spreadsheet::WriteExcel" $ sudo perl -MCPAN -e"install Spreadsheet::ParseExcel" $ sudo perl -MCPAN -e"install define"
Writing to Excel Files
For the purpose of demonstration let's use a sample file containing the following tab delimited information. The columns; in order, are salesman name, projected sales, and actual sales. We'll use this file to populate our first spreadsheet. Save it as mydata.tsv
Tom Hanks 500.00 467.89 Sue Doe 347.00 383.76 Scooby Doo 500.00 670.04 Pat Swayze 400.00 380.05
We begin just as we would any Perl script only with the inclusion of our spreadsheet module:
#!/usr/bin/perl -w use strict; use Spreadsheet::WriteExcel;
Create a spreadsheet "workbook", supplying it a filename to write to. Use the resulting object to add a "page", the object that you will be actually writing to.
my $workbook = Spreadsheet::WriteExcel->new("write-tutorial.xls") or die $!;
my $page = $workbook->addworksheet("Sales Data");Now we define some formats that will help describe how cells should look in Excel. These formats support a large range of options that have been documented in the CPAN Documentation. For our example we will be giving sales above their projected figures the color green and those who came up short the color red.
# General format for money
my $money_format = $workbook->add_format();
$money_format->set_num_format('$0.00');
$money_format->set_align('right');
# Under projected sales
my $under_format = $workbook->add_format();
$under_format->copy($money_format);
$under_format->set_color('red');
# Over projected sales
my $over_format = $workbook->add_format();
$over_format->copy($money_format);
$over_format->set_color('green');With our formats and workbook set up we are almost ready to write to an excel file. Before we begin we'll define a few variables to keep a grand total and hide any magic numbers.
my $i=0;
my $projected_total=0;
my $actual_total=0;
use define {
NAME => 0,
PROJECTED => 1,
ACTUAL => 2
};Now we are ready to do some writing. We'll open up the tab separated values and parse them as we go. We'll write all the data to corresponding rows and columns while accumulating the totals. Then we will write our summary at the bottom. The write subroutine is surprisingly easy to use. Its arguments are (row, column, data, [format]). Please remember to check data for validity in a production environment.
open(DATA, "mydata.tsv") or die $!;
foreach() {
chomp;
if(! $_) { next; }
my @row= split("\t");
# Write their name
$page->write($i, NAME, $row[NAME]);
# Write their projected sales using money format
$page->write($i, PROJECTED, $row[PROJECTED], $money_format);
# Write their actual sales with a format based on performance
$page->write($i, ACTUAL, $row[ACTUAL],
$row[ACTUAL] >= $row[PROJECTED] ? $over_format : $under_format);
# Keep a running tally
$projected_total+= $row[PROJECTED];
$actual_total += $row[ACTUAL];
$i++;
}
close(DATA);
# Print a summary at the bottom.
$page->write($i, NAME, "Total:");
$page->write($i, PROJECTED, $projected_total, $money_format);
$page->write($i, ACTUAL, $actual_total,
$actual_total >= $projected_total ? $over_format : $under_format);That's it, we're done! All that's left is to close the file.
$workbook->close();
If everything went according to plan you should now have a formatted Excel file that provides visual cues for your business cohorts. You can download all of these files used above.
write-tutorial.pl
Reading Excel Files
Sometimes our challenge as programmers isn't just providing presentation for our users, but using the presentation that our users provide. Much in the same manner that we can write Excel files we can also read them. As mentioned before the community has still not caught up to Microsoft's new xlsx file format so for now we'll need to ask our users to save as in 2003 format or do it ourselves. For this example we will read in the file we made previously (write-tutorial.xls) and print its contents to standard out. From there, as you can imagine, the data can be used in any multitude of ways including being piped out as input to other programs.
We'll begin our Perl script much like you'd expect with the inclusion of the
Spreadsheet::ParseExcel module:
#!/usr/bin/perl -w use strict; use Spreadsheet::ParseExcel;
Before we delve into reading from our spreadsheet we need to prepare ourselves for a small "gotcha" when working with this module. The ParseExcel module likes to die when it's told to read data that does not exist. To counter this we will write a small utility subroutine that makes sure the value is defined before trying to read it. If it's not defined it returns an empty string. Take note of how data is accessed from the worksheet.
# Utility function to safely look inside of a Cell
# Accepts worksheet, row, column
# Returns value in cell or empty string if none
sub cell {
my ($Wks, $r, $c)= @_;
if (defined $Wks->{Cells}[$r][$c]) {
my $val= $Wks->{Cells}[$r][$c]->Value;
chomp $val;
return $val;
}
return "";
}Now we'll define some variables. Specifically we'll create an instance of an Excel parser, tell it what file to look at, and tell it which page in that file to parse. Then we'll again use the define module to hide any magic numbers.
my $parser = new Spreadsheet::ParseExcel;
my $workbook= $parser->Parse("write-tutorial.xls");
my $page = $workbook->{Worksheet}[0];
use define {
NAME => 0,
PROJECTED => 1,
ACTUAL => 2
};Everything is almost in place, but printing to standard out is no excuse for unpresentable data. We'll set up some Perl formats to beautify our output:
my ($name, $proj, $act); format STDOUT_TOP = Name Projected Actual . format STDOUT = @<<<<<<<<>>>>>>>>>>>>> @>>>>>>>>>>>>>> $name, $proj, $act .
All that's left now is to write it out using the Perl format and our utility function:
foreach(my $i=0; $i {MaxRow}; $i++) {
$name= cell($page, $i, NAME);
$proj= cell($page, $i, PROJECTED);
$act = cell($page, $i, ACTUAL);
write;
}If all worked properly your output should look like this:
Name Projected Actual Tom Hanks $500.00 $467.89 Sue Doe $347.00 $383.76 Scooby Doo $500.00 $670.04 Pat Swayze $400.00 $380.05 Total: $1747.00 $1901.74
This concludes my primer on reading and writing Excel files in Perl. I've only touched on the very basics of what these modules are capable of doing. If you decide to use these tools I implore you to read through the documentation that has been linked to earlier in this article. The source code for this example is also available.
- Originally Written in 2008, reposted with permission by Mehigan, Bellone, & Associates -
