Math Combinations, RDBMS and Perl
8 Mar
Recently, while developing a web charting application I ran into a problem involving combinations and permutations. The application mimics an existing paper charting method with it’s own meta language to describe certain visual biological markers. One subset of the meta language defines eight shorthand notation character codes:
- B = Brown (or Black) Bleeding
- C = Cloudy (white)
- C/K = Cloudy/Clear
- G = Gummy (gluey)
- K = Clear
- L = Lubricative
- P = Pasty (creamy)
- Y = Yellow (even pale yellow)
Each code can be selected once with any other combination of codes. Some examples of possible code string combinations with dash separator(s):
- B – C
- C/K – K – P – Y
- K – P – Y – C/K
- L – L – Y Note, not valid due to duplicate L code
Eventually, these code strings are stored in a RDBMS. The question is how to store them efficiently? We could store them as a string datatype (e.g., VARCHAR), but this would be inefficient because on average a code string would take up more bytes then a number datatype (e.g., INT) used to represent the code string. This implies a look up table of code strings with related numbers and a foreign key on the main storage table(s) referentially constrained to the look up table.

Entity Relationship Diagram
Further, consider the third string code example above. Semantically it’s the same as example two, but is ordered differently (AKA permutation). To properly normalize the data we should alphabetically arrange the permutations so that order does not matter which will vastly reduce the number of possible string code combinations.
Now that we know how the data should be modeled we must populate the string code look up table. The first obvious question is how many combinations are possible when order doesn’t matter? Use the formula below where n is the number of things to choose from, and you choose r of them (No repetition, order doesn’t matter). ! means factorial.

For example, for the eight possible codes if we select four then there are 70 possible code combinations where order doesn’t matter. However, we need to know the number of non repeating combinations possible for our set of eight codes given we can select between one to eight codes. I’ll leave that simple exercise as reader homework, but excluding the possibility of selecting no codes the answer is 255.
Now the crux of the matter. Insert the 255 non repeating combinations, alphabetically arranged, with a unique number for each into the RDBMS lookup table. Assuming you managed the math part and have made it this far you can now go and gently beat your head against a wall for a few minutes – then return, please. Perl to the rescue! With it’s ease of extensions via CPAN modules and high level programming abstractions it is the ideal glue language to tackle what looks to be a formidable problem.
Our first order is to google CPAN and see what, if any, perl modules are available to deal with math combinations. Ah, the Math::Combinatorics module should fit the bill nicely. Second, we need to install the Math::Combinatorics module locally. While we are at we also need to install DBI and DBD::mysql modules to enable perl access to our MySQL RDBMS so we can insert the combinations into our look up table. Most default perl installs come with a handy tool called cpan which gives command line access to the CPAN online module repository to ease searching and installing locally.
joecrotty@macpro:~ $ sudo cpan Terminal does not support AddHistory. cpan shell -- CPAN exploration and modules installation (v1.9402) Enter 'h' for help. cpan[1]> install Math::Combinatorics ... cpan[2]> q Terminal does not support GetHistory. Lockfile removed.
With the modules in place our finished perl script is below. Note, the actual database inserts are commented out on lines 29 and 30. String_codes.code_id is an AUTO_INCREMENT column.
#!/opt/local/bin/perl
use strict;
use warnings;
use Math::Combinatorics;
use DBI;
$| = 1; # disable buffering
my @codes = ('B', 'C', 'C/K', 'G',
'K', 'L', 'P', 'Y');
# Connect to the database.
my $dbh = DBI->connect("DBI:mysql:database=DB;host=HOST",
"USER", "PASSWD",
{'RaiseError' => 1});
my $sql = 'INSERT INTO string_codes (codes)
VALUES (?)';
my $sth = $dbh->prepare($sql);
for ( 1 .. 8 ) {
my $combinat = Math::Combinatorics->new(count => $_,
data => [@codes]);
while(my @combo = $combinat->next_combination){
@combo = sort @combo;
my $combi = join('-', @combo);
print $combi, "\n";
#$sth->bind_param(1, $combi);
#$sth->execute;
}
}






No comments yet