Category Archives: Programming

Computer Programming

2010 U.S. Census Data for Postgresql

Census data published by the U.S. Census Bureau is not readily transformed for Relational Database Management System (RDMS) insertion. Here we provide an archive file to create a United States 2010 Census Short Form 1 (SF1) relational database for my favorite open source RDMS PostgreSQL. Check back here for 2020 Census updates, coming soon.

Database Features

Census Bureau instructions pertain to Microsoft Access 2007 database creation but then only to separate databases for each state. Here we create data tables that combine all 50 states (plus District of Columbia and Puerto Rico). Tables are inked with a PRIMARY KEY defined on stusab (state) and logrecno. An example table definition appears below.

create table sf1_00001 (
fileid varchar(510),
stusab varchar(510),
chariter varchar(510),
cifsn varchar(510),
logrecno integer,
p0010001 integer,
PRIMARY KEY (logrecno, stusab)
);

Restoring the DB

Download link (11GB): Census_2010_sf1_pg This 2010 Census PostgreSQL database archive was created with the following Pg commands:

pg_dump -Fc -f /archive/Census_2010_sf1_pg -d sf1_2010r &

The pg_dump custom format (-Fc) allows flexibility to restore one or more objects. The database user is not saved so you can easily restore your own user. The following Pg commands will restore the database.

createdb sf1_2010r
pg_restore -Fc -d sf1_2010r -v < Census_2010_sf1_pg

The size of the final database is 230GB, so be prepared.

Testing

Try this query at the psql prompt:

select sf1_00003.stusab as “State”,
sf1_00003.p0030001 as “Population”,
geo_header_sf1.geocomp,
geo_header_sf1.logrecno
from geo_header_sf1,sf1_00003
where geo_header_sf1.sumlev=’040′ and
geo_header_sf1.geocomp = ’00’ and
geo_header_sf1.logrecno = sf1_00003.logrecno and
geo_header_sf1.stusab = sf1_00003.stusab
order by “Population” desc;

If everything is working, you should see all 50 states (+DC, PR)

State | Population | geocomp | logrecno
——-+————+———+———-
CA | 37253956 | 00 | 1
TX | 25145561 | 00 | 1
NY | 19378102 | 00 | 1
FL | 18801310 | 00 | 1
IL | 12830632 | 00 | 1
PA | 12702379 | 00 | 1
OH | 11536504 | 00 | 1
MI | 9883640 | 00 | 1
GA | 9687653 | 00 | 1
NC | 9535483 | 00 | 1
NJ | 8791894 | 00 | 1
VA | 8001024 | 00 | 1
WA | 6724540 | 00 | 1
MA | 6547629 | 00 | 1
IN | 6483802 | 00 | 1
AZ | 6392017 | 00 | 1
TN | 6346105 | 00 | 1
MO | 5988927 | 00 | 1
MD | 5773552 | 00 | 1
WI | 5686986 | 00 | 1
MN | 5303925 | 00 | 1
CO | 5029196 | 00 | 1
AL | 4779736 | 00 | 1
SC | 4625364 | 00 | 1
LA | 4533372 | 00 | 1
KY | 4339367 | 00 | 1
OR | 3831074 | 00 | 1
OK | 3751351 | 00 | 1
PR | 3725789 | 00 | 1
CT | 3574097 | 00 | 1
IA | 3046355 | 00 | 1
MS | 2967297 | 00 | 1
AR | 2915918 | 00 | 1
KS | 2853118 | 00 | 1
UT | 2763885 | 00 | 1
NV | 2700551 | 00 | 1
NM | 2059179 | 00 | 1
WV | 1852994 | 00 | 1
NE | 1826341 | 00 | 1
ID | 1567582 | 00 | 1
HI | 1360301 | 00 | 1
ME | 1328361 | 00 | 1
NH | 1316470 | 00 | 1
RI | 1052567 | 00 | 1
MT | 989415 | 00 | 1
DE | 897934 | 00 | 1
SD | 814180 | 00 | 1
AK | 710231 | 00 | 1
ND | 672591 | 00 | 1
VT | 625741 | 00 | 1
DC | 601723 | 00 | 1
WY | 563626 | 00 | 1
(52 rows)

References

  1. Census Bureau data set
  2. 2010 Census Summary File 1 technical documentation.

WordPress Shortcode Dynamic Tables

I searched high and low for a WordPress plugin to display dynamic table data from my databases on my WP sites. I store a lot of data, with frequent updates, using the Postgresql relational database management system (RDMS). This post explains using WordPress shortcodes to query a database and display table data so your posts are always current. Tables can be displayed inline anywhere in posts and pages where you can insert a shortcode. This method was developed on a site running the Gutenberg block editor (WP v5.2.1 ) so don’t be shy, give it a try.

Custom shortcode functions could be developed as a simple plugin but I chose to add this PHP shortcode function in the functions.php file of my child theme. The HTML output defines a CSS class i.e. <table class=”pg_table_query”> for easy styling to compliment your theme.

The pg_table_query shortcode takes 2 arguments; (1) sql query and (2) text title displayed as a table caption.  An example Gutenberg shortcode entry is shown below.

example shortcode block entry

Loading the page/post produces the table shown below. CSS styling, also in the child theme, centers the table caption and highlights header data.

table generated from database using shortcode

The shortcode function pg_table_query_func coded in PHP below is written in the functions.php file of my child theme.

function pg_table_query_func($atts) {
$atts = shortcode_atts (
array(
‘sql’ => ‘select * from quote_day_r;’,
‘title’ => ‘table title’
), $atts, ‘pg_table_query_func’ );
$host = “host=db_host”;
$port = “port=5432”;
$dbname = “dbname=my_db”;
$credentials = “user=db_user password=my_pwd”;
$output=”;

$db=pg_connect( “$host $port $dbname $credentials”) or die (“no database connection”);
$ret = pg_query($db, $atts[‘sql’]) or die (“query failure”);
$output .= ‘<table class=”pg_table”>’.”\n”;
$output .= ‘<caption>’.$atts[‘title’].'</caption>’.”\n”.'<tr>’;
for($i=0; $i<pg_num_fields($ret); $i++) {
$fieldname=pg_fieldname($ret,$i);
$output .= ‘<th><b>’.$fieldname.'</b></th>’;
}
$output .= ‘</tr>’.”\n”.'<tr>’;
for($row=0; $row<pg_num_rows($ret); $row++) {
$output .= ‘<tr>’;
$result = pg_fetch_row($ret,$row);
for($col=0; $col<pg_num_fields($ret); $col++) {
$output .= ‘<td>’.$result[$col].'</td>’;
}
$output .= ‘</tr>’.”\n”;
}
$output .= ‘</table>’.”\n”;

return $output;
}
add_shortcode (‘pg_table_query’,’pg_table_query_func’);

The related CSS style code:

table.pg_table {
color: #070707;
border: 1px solid #070707;
vertical-align: middle;
border-spacing: 0px;
line-height: 1.2;
width: 100%;
}

.pg_table td, tr, th {
border: 1px solid black;
text-align: center;
font-size: 12px;
}

.pg_table caption {
display: table-caption;
text-align: center;
font-weight: bold;
font-size: 18px;
padding: 5px 0 5px 0;
}

Sort and Average Rank in Fantasy Sports

As a dedicated fantasy hockey sports fan, I try to get an edge on the competition by analyzing player performance statistics for my initial draft and player trades during the season.  This often involves sorting or ranking players in scoring categories (e.g. goals, assists …).  It turns out that there are several ways of sorting and some may be more applicable than others for judging performance statistics.  For example, suppose we’re ranking goals in a league where 10 players have each scored 5 goals. A simple sort may order these players in ranks from 21 – 30. Even though 10 players score the same performance statistic (5 goals) their ranks can differ by 10 places. Not only that but typically a sort operation is non-deterministic i.e. the same 10 players can be ranked in any order, all between 21 – 30. One way to resolve ties (draws) is to assign an average rank (i.e. 25.5)  for all 10 players. Rank averaging maintains “sum-of-ranks” which is important for a defined sample size and fairer when comparing  to other statistics, in my opinion. A rank averaging method is commonly used in Yahoo Fantasy Hockey and other fantasy sports.

I looked far and wide on the Internet for a computer algorithm to automate average ranks.  Unable to find a suitable routine, I made my own. Using the Perl programming language I show how to sort a performance category and calculate the average ranks. No doubt a more elegant solution exists but at least mine works.

The annotated Perl script below should explain the average rank problem. It takes an unsorted hash of player scoring data and outputs a hash with an average rank for each player. The script’s print output shows the (complicated) averaging steps in action. I’ve tested this on sets of ~900 players and it works perfectly. Some lines of code below are truncated, be careful with copy-and-paste.

#!/usr/bin/perl -w
# define sub sr() to sort and calculate (average) ranks a’la Yahoo Fantasy Hockey.
# sorting higher score is better(top rank=0) to lower score.
use strict;
use warnings;
use List::Util qw(sum);

# define hash of real numbers.
my %scrHash = (); # player scores in some category (goals, assists…
my %rnkHash = (); # output for player ranks in this category.

#     Unsorted hash                        # Possible sort                     rank avg_rank
$scrHash{name0}{goals} = 2; # $scrHash{name7}{goals} = 8 0 1.5
$scrHash{name1}{goals} = 3; # $scrHash{name9}{goals} = 8 1 1.5
$scrHash{name2}{goals} = 4; # $scrHash{name8}{goals} = 8 2 1.5
$scrHash{name3}{goals} = 5; # $scrHash{name6}{goals} = 8 3 1.5
$scrHash{name4}{goals} = 5; # $scrHash{nameA}{goals} = 6 4 4
$scrHash{name5}{goals} = 5; # $scrHash{name4}{goals} = 5 5 6
$scrHash{name6}{goals} = 8; # $scrHash{name6}{goals} = 5 6 6
$scrHash{name7}{goals} = 8; # $scrHash{name5}{goals} = 5 7 6
$scrHash{name8}{goals} = 8; # $scrHash{name2}{goals} = 4 8 8
$scrHash{name9}{goals} = 8; # $scrHash{name1}{goals} = 3 9 9
$scrHash{nameA}{goals} = 6; # $scrHash{name0}{goals} = 2 10 10
$scrHash{nameB}{goals} = 1; # $scrHash{nameC}{goals} = 1 11 12
$scrHash{nameC}{goals} = 1; # $scrHash{nameB}{goals} = 1 12 12
$scrHash{nameD}{goals} = 1; # $scrHash{nameD}{goals} = 1 13 12
$scrHash{nameE}{goals} = 0; # $scrHash{nameE}{goals} = 0 14 14
#                                                                                                  sum of ranks 105 105

sr(“goals”);

print “Program End.\n”;
exit(0);

sub sr { # sort rank
my $cat=$_[0]; print “sr: category: $cat\n”;
my @valArray = (); # sorted score values
my @avgArray = (); # variable length array with running average.
my @rnkArray = (); # vartiable length array orders average ranks.
my $cntr=0;
my @sc = reverse sort { $scrHash{$a}{$cat} <=> $scrHash{$b}{$cat} }keys \%scrHash;
foreach my $id (@sc) {
print “sr: $cntr, $id, $scrHash{$id}{$cat}\n”;
$valArray[$cntr++] = $scrHash{$id}{$cat};
}

# Define 2 flags $en: equals-next, $el: equals-last
my $en = 0;
my $el = 0;

# Go through the sorted value array and average the rankings.
# These rankings are zero-based. push $i+1 for 1-based or similar.
for (my $i=0; $i<@valArray; $i++) {
if ($valArray[$i] == $valArray[$i-1]) { $el = 1; } else { $el = 0; }
if (($i <@valArray-1) && ($valArray[$i] == $valArray[$i+1])) { $en = 1; } else { $en = 0; }
if ($en == 0 && $el == 0) {@avgArray = (); push(@avgArray, $i); }
if ($en == 0 && $el == 1) {push(@avgArray, $i); }
if ($en == 1 && $el == 0) {@avgArray = (); push(@avgArray, $i); }
if ($en == 1 && $el == 1) {push(@avgArray, $i); }
my $avg = @avgArray ? sum(@avgArray)/@avgArray : 0;

print “i: $i val: $valArray[$i] el: $el en: $en avg: $avg \tavgArray: (@avgArray)\n”;

if ( $en == 0 ) {
foreach my $j (@avgArray) { push(@rnkArray, $avg); }
print “rnkArray: (@rnkArray)\n”;
}
}

# re-sort %scrHash to insert id and avg rank into new %rnkHash.
$cntr=0;
@sc = reverse sort { $scrHash{$a}{$cat} <=> $scrHash{$b}{$cat} } keys \%scrHash;
foreach my $id (@sc) {
$rnkHash{$id}{$cat}= $rnkArray[$cntr];
print “$cntr, id: $id, val: $scrHash{$id}{$cat}, avg_rank: $rnkHash{$id}{$cat}\n”;
$cntr++;
}
} # end sub

It is my contention that a team composed of the top ranked players in each position would win any league. But in a head-to-head match-up, there may exist combinations of players that could win 6 of 10 scoring categories and thus the match. I’m not a statistician and can’t prove it, but you’re welcome to educate me.

Please visit my fantasy hockey page: Renegade Hockey

Arduino PID Temperature Control

Arduino PID Temperature Control
Running PID Unit

Being a dedicated nerd, I’m always fascinated by Proportional Integral Derivative (PID) process control. So when a local farmer asked me to automate a vegetable canning process, I took it as a challenge to physically realize a PID temperature control device. Of course many fine industrial controllers already exist, say from Omega, but I’d already done that so I strived to do it smaller, cheaper. The Arduino product line provides inexpensive hardware for the home Maker. Also, the Arduino development environment is easy to install and remarkably easy to use. For this reason, I decided to build an Arduino PID Temperature Control unit.

Arduino PID Temperature Control
PID Temperature Sampling

About PID
The PID process control method is well researched and commonly applied in modern industry. PID mathematics can be complicated, but (within limits) PID can compensate for thermal resistance and thermal capacitance in a system for precise temperature control. Ideal PID temperature control assumes both a heating and cooling source (with same linear thermal forcing gain). This can be difficult to achieve practically due to different heat/cool technologies. Under-sampling temperature frequently will also increase errors. There are many ways to go wrong. Even when operating correctly, wild oscillations in temperature are possible. Nevertheless PID control can be adapted to many situations.

Many PID controllers claim to autotune the Kp, Ki, Kd PID coefficients to achieve optimal performance. But I suspect these methods are often home-grown and apply only to a narrow set of conditions. PID mathematics is very complicated. Fortunately there are practical manual methods and online simulations to aid in setting these important PID parameters.

For this controller, the PID proportional output uses Pulse Width Modulation(PWM) on 2 pins. Often PWM is used to create a (time-averaged) variable DC voltage, but here the digital signal duty-cycle modulates a SSR power cycle. One pin outputs the plus(+) and a second pin the minus(-) PID output. The PWM resolution for this microcontroller is 8 bits (0-255 values) for each pin.

Simplified Circuit Diagram
Simplified Circuit Diagram

Hardware
This PID controller device is composed of 3 sub-units (Arduino Uno R3, MAX6675 thermocouple temperature sensor, 1602 LCD keypad display).  An Arduino Nano and breadboard was used for testing purposes. Online cost for these units is about $11 + $5 + $4 = $20. Analog pins A1-A5 were converted to digital function for temperature sensor  power and communication. Fortunately this leaves digital pins 3, 11 (both on 16 bit TIMER2) to use for PWM modulation outputs.

Program Features
Using the LCD Keypad Display it would be possible to control every function but this is beyond the project’s scope. So the best way to experiment is through the Arduino IDE environment.

  • PID output range has units of degrees per sample period . It’s very important to set min/max PID limits for actual physical temperature forcing capability. For example, in a heat-only system the negative PID output is theoretically zero, or perhaps slightly negative (-2 degrees/minute maybe) for an ambient cooling system.
  • An over/under temperature Alarm is programmed to terminate execution immediately if min/max temperature limits are exceeded.
  • If either +/- PID output limit is reached an error is displayed on the LCD screen. While not fatal, this should be a clue that K coefficients may need adjusting.
  • The Arduino Serial Monitor tool displays PID data in progress
  • Commenting one line of code switches between PID simulation and live operation

SSR and PWM Problem
A Solid State Relay (SSR) is capable of switching on/off a 120V heavy load. But, like electromechanical relays, the switching speed is slow.  Because common, low-cost SSRs use a TRIAC Thyrister element, the power is not actually switched until the voltage phase of AC power crosses zero. The zero-crossing effect is shown in graphic below.

zero-crossing timing diagram
zero-crossing timing diagram

At a frequency of 60Hz the maximum delay is 1/2 cycle or 8.33ms. Considering that our PWM duty-cycle ranges from 0-255, a single bit of resolution at 8.33ms would imply a full range PWM switching period of 256 * 8.33ms = 2.133 sec. By selecting a PWM switching frequency, the control resolution is determined because the SSR only responds to higher order bits. See table below.

60Hz PWM Table
60Hz PWM Table

Because the ATmega328P master clock operates at 16MHz and the maximum TIMER2 divisor is 1024, the minimum PWM sampling frequency is ~30Hz yielding 4 levels of resolution control.  However if we further divide the ATmega328P master clock frequency by 8, then  PWM frequency is 3.75Hz with 5 bit (32 level resolution) is achieved.

Considering many practical applications, a sampling rate of ~4Hz with 5 bit resolution was an acceptable trade-off.  Implementing both TIMER2 pre-scalar 1024 and master clock divisor 8 is required to reach a PWM frequency of 16MHz/(1024*256*8*2) = 3.83 Hz. The master clock was divided in software with the clock_prescale_set(clock_div_8); instruction which of course has a similar affect on the delay(); instruction and delay values must be divided proportionally. The Arduino standard PID_v1 library was also edited for the same reason and a modified PID_v1R library is included in zip archive link below.

Testing
To debug and validate this design an Arduino Nano and breadboard was used with 2 LEDs, SSR and 60W incandescent light bulb. It was easy to see the LEDs flash at a slow rate (PWM cycle time) and the LED brightness change (PWM duty-cycle). The definitive test was using an SSR to switch a 120V resistive load (60W light bulb) and to my joy, flashing and variable brightness was observed. This was the extent of testing.

Arduino Nano Test Platform
Arduino Nano Test Platform

Discussion
This was a satisfying project because I learned a lot about PID and made a practical PID temperature control unit for very cheap ($20). Almost all the Arduino I/O pins are used and SSR-PWM problems were overcome to a reasonable degree. Adding a menu structure to edit run-time variables from the existing keypad would approach the functionality of a real commercial unit. Adding timer functions could create temperature profiles and recipes(EEPROM). Some practical applications might include a Rice Cooker, Slow Cooker, alcohol fermentation. Some day I’m going to turn the PID problem around and make a device that senses meat temperature and indicates the remaining cooking time. Mom’s Thanksgiving turkey will be cooked perfectly every time.

References
Arduino_PID_Temp_Control_Program.zip
Wikipedia PID
SSR Types

MongoDB C100DBA Exam Study Guide

mongodb_logo_300x100Today I completed my MongoDB C100DBA database administration certificate exam. This proctored exam is taken remotely online through Examity. The exam session duration was 90 minutes which consisted of 60 multiple choice and all-that-apply questions. The way I study is to condense a summary of my knowledge into a single document. Just the act of writing coherent paragraphs about a subject aids in my memory and understanding. For this purpose I prepared MongoDB_C100DBA_Exam_Study_Guide

This study guide expands on topics referenced in MongoDB’s own C100DBA study guide, and the MongoDB online documentation. My exam was related to database v3.2.  I passed my exam and my certificate license is #295-0761-259.

In my opinion, the exam is rather difficult, requiring deeper knowledge and practical experience than what is nominally presented in M102, M202 online courses. Of course the good student will expand on these introductory courses and establish a working platform to explore the syntax, properties and exceptions of CRUD, indexes, replication, sharding, administration etc. of various possible test cases.

This guide is especially recommended for M102 students to find condensed detail notes and documentation links on this subject.

In PDF format, 35 pages. A_MongoDB_DBA_Exam_Study_Guide