Musings of a network engineer
During the (seemingly endless) stretches between games in the NBA Playoffs this year, I've had too much time to think about the draft.
At one point I started wondering why teams approach drafting any differently from the way a normal company hires for a long-term job. Seemingly there are many similarities.
So why is that? Why would you depart from seemingly sane methods of hiring people? One reason is that a draft pick is considered an 'asset', so you want to get the maximum return on your money.
The problem with the asset thinking is that people aren't things. You need to develop and nurture them, and to do that you need a role that fits them and you have to let them play. Preferably they even win games, or at least see how they could win games at some later stage; otherwise your 'asset' will probably lose value. Some teams have a better record with the draft than others, and it often seems that the teams who are fairly constant basement dwellers (Knicks and Timberwolves spring to mind) have the least success.
While I was pondering this (in multiple Tweets instead of just writing this article), @Nelush asked me the following question: "would be interested to see how a list of teams ranked by draft performance measured up to a list ranked by player retention too".
This is not an easy question to answer. How do we rank teams by draft performance? For that I'm cheating and using work done by Roland Beech, who graded every team on its draft picks from 1989 (the first draft in the current two-round format that we all know) through 2008. For a detailed analysis of his methodology and results you can read his article, but in short he used data from Basketball Reference to form a rating for each drafted player, which is put together as:
Rating = points/game + rebounds/game + assists/game
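To make the formula concrete, here is a minimal sketch in shell (using awk for the arithmetic). The `rating` helper and the stat line are made up purely for illustration; they are not taken from Roland Beech's article.

```shell
#!/bin/bash
# Hypothetical helper: adds per-game points, rebounds and assists
# into the single rating number described above.
rating() {
    awk -v p="$1" -v r="$2" -v a="$3" 'BEGIN { printf "%.1f\n", p + r + a }'
}

# Made-up stat line: 20.5 points, 6.0 rebounds and 4.5 assists per game
rating 20.5 6.0 4.5   # prints 31.0
```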
He then divides players into 6 categories
Then there's the question of retention. That data isn't easy to come by, nor is it simple to calculate. For the sake of simplicity, however, I made a script that dumps the Basketball Reference roster data for each team for each year in question and 3 simple scripts that parse the information so that I can count how many years each player was on the roster.
Then I calculate three values for the retention: the average, the median and the maximum number of years a player stayed on a team's roster.
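As a rough sketch of that counting step on a tiny made-up roster: each input line is one season a player appeared on the roster, and the output is the average, median and maximum number of seasons. The function names and the three placeholder players are hypothetical, and taking the median across every player's season count is an assumption of this sketch.

```shell
#!/bin/bash
# Hypothetical toy input: one line per season a player appeared on the roster.
roster_lines() {
    printf '%s\n' "Player A" "Player B" "Player A" "Player C" "Player A" "Player B"
}

# Count seasons per player, then report average, median and maximum.
retention_summary() {
    sort | uniq -c | awk '{ print $1 }' | sort -n | awk '
        { years[NR] = $1; total += $1 }
        END {
            if (NR == 0) exit 1
            if (NR % 2) median = years[(NR + 1) / 2]
            else        median = (years[NR / 2] + years[NR / 2 + 1]) / 2
            printf "%.2f %g %d\n", total / NR, median, years[NR]
        }'
}

# Player A has 3 seasons, Player B has 2, Player C has 1
roster_lines | retention_summary   # prints 2.00 2 3
```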
This isn't perfect for a number of reasons
The following table uses the RTG (rating above average for draft picks by the team, taken from Roland Beech's 82games.com article on the best/worst drafting teams) and the retention values I calculated from Basketball Reference as explained above.
Draft success vs retention

Team | RTG | Average retention years | Median retention years | Max retention years
---|---|---|---|---
Milwaukee Bucks | 1.8 | 1.95 | 4 | 8
Phoenix Suns | 1.5 | 2.02 | 4 | 9
LA Lakers | 1.5 | 2.23 | 4 | 9
San Antonio Spurs | 1.2 | 2.03 | 3 | 8
Cleveland Cavaliers | 1.1 | 2.11 | 4 | 10
Golden State Warriors | 1.0 | 1.89 | 4 | 9
Boston Celtics | 0.8 | 2.00 | 4 | 10
Sacramento Kings | 0.7 | 2.02 | 4 | 8
Memphis Grizzlies / Vancouver Grizzlies | 0.6 | 1.80 | 4 | 7
Utah Jazz | 0.4 | 2.44 | 5 | 15
Miami Heat | 0.3 | 1.93 | 4 | 11
Washington Wizards / Bullets | 0.2 | 2.09 | 4 | 8
Charlotte Hornets / Bobcats | -0.7 | 1.79 | 2 | 4
Oklahoma City Thunder / Seattle SuperSonics | -0.1 | 2.25 | 5 | 13
Detroit Pistons | -0.1 | 2.10 | 2 | 6
Philadelphia 76ers | -0.1 | 1.80 | 4 | 11
Indiana Pacers | -0.2 | 2.62 | 3 | 9
Chicago Bulls | -0.3 | 2.18 | 5 | 11
Orlando Magic | -0.3 | 2.11 | 5 | 10
Dallas Mavericks | -0.3 | 2.04 | 5 | 10
Houston Rockets | -0.4 | 2.09 | 5 | 13
New Orleans Pelicans / Hornets | -0.6 | 1.49 | 2 | 3
Brooklyn / New Jersey Nets | -0.6 | 1.94 | 4 | 7
Portland Trail Blazers | -0.7 | 2.20 | 4 | 8
Minnesota Timberwolves | -0.8 | 2.06 | 5 | 12
Toronto Raptors | -1.1 | 1.77 | 4 | 7
Denver Nuggets | -1.2 | 1.89 | 4 | 7
LA Clippers | -1.4 | 1.94 | 4 | 8
Atlanta Hawks | -1.5 | 1.91 | 4 | 9
New York Knicks | -1.5 | 2.45 | 6 | 12
So what can we gather from this data? Not a whole lot. There doesn't seem to be any direct tie between a team's player retention and its draft success, at least not with this (admittedly very crude) way of calculating it. Maybe I'll revisit this at a later stage to see if I can find some correlation. In the meantime, this was at least a fun way to spend an evening doing something basketball related.
If you are interested in how I gathered this data, these are the scripts I used to fetch it from Basketball Reference and to calculate these values.
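If I do revisit it, a first pass could be a plain Pearson correlation between RTG and the average retention years. The following is only a minimal sketch: the `pearson` helper is a hypothetical illustration and not part of the scripts I actually ran, and the number pairs are the RTG and average retention columns copied from the table.

```shell
#!/bin/bash
# Hypothetical helper: reads "x y" pairs from stdin and prints the
# Pearson correlation coefficient, computed from running sums.
pearson() {
    awk '
        NF >= 2 { n++; sx += $1; sy += $2; sxx += $1 * $1; syy += $2 * $2; sxy += $1 * $2 }
        END {
            if (n < 2) exit 1
            num = n * sxy - sx * sy
            den = sqrt((n * sxx - sx * sx) * (n * syy - sy * sy))
            if (den == 0) exit 1
            printf "%.3f\n", num / den
        }'
}

# RTG and average retention years, one team per line, in table order
pearson <<'EOF'
1.8 1.95
1.5 2.02
1.5 2.23
1.2 2.03
1.1 2.11
1.0 1.89
0.8 2.00
0.7 2.02
0.6 1.80
0.4 2.44
0.3 1.93
0.2 2.09
-0.7 1.79
-0.1 2.25
-0.1 2.10
-0.1 1.80
-0.2 2.62
-0.3 2.18
-0.3 2.11
-0.3 2.04
-0.4 2.09
-0.6 1.49
-0.6 1.94
-0.7 2.20
-0.8 2.06
-1.1 1.77
-1.2 1.89
-1.4 1.94
-1.5 1.91
-1.5 2.45
EOF
```

A value near 0 would back up the "not a whole lot" reading, while a value closer to 1 or -1 would suggest there is something worth digging into.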
Retention-Calculator.sh
This is a shell script that uses w3m (a text-based web browser) to fetch the data from Basketball Reference. It then uses two Perl scripts to process the data and prints a line of text output for each team.
#!/bin/bash
YEARS="1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009"
TEAMS="TOR BOS BRK PHI NYK POR OKC UTA DEN MIN CLE CHI MIL IND DET GSW LAC PHO SAC LAL ATL WAS MIA CHO ORL HOU MEM SAS DAL NOP"
# For each team
for team in $TEAMS; do
    echo -n "Processing $team: "
    # Check if we already grabbed the stats for this team; if so we won't grab them again
    if [ ! -f "$team.txt" ]; then
        # Create an empty text file called $team.txt
        # (":" writes nothing, so no blank line ends up being counted as a player later)
        : > "$team.txt"
        # And for every year that we're parsing
        for year in $YEARS; do
            # Dump the team information page from BB reference, take out the Roster information and use a small perl script to grab just the player names from that roster,
            # then add that information to the $team.txt file that we created earlier
            w3m -cols 256 -dump "http://www.basketball-reference.com/teams/$team/$year.html#roster" | grep -A 20 '^Roster$' | parse-roster.pl >> "$team.txt"
            # Wait for 3 seconds so as to not bombard the BB ref server
            sleep 3
            # Print a status message for each year we process
            echo -n " $year "
        done
    fi
    # The $team.txt file now includes all players that were on the roster for that team in every season listed in $YEARS, with one line for each year they were there
    # We'll feed that into another perl script which will do 3 things
    # 1.) Count the total number of unique players
    # 2.) Count for how many years each unique player was with the team
    # 3.) Print out the retention values (average, median, max) for the team
    # The average is simple: we already have a count of how many years each player spent with the team, so we add them all together and divide by the total number of players that have played on these rosters. That gives the average number of seasons that a player was retained
    RETVALUE=$(retention-counter.pl < "$team.txt")
    # Finish the status line
    echo
    # Print out the retention values for each team
    echo "For team: $team the average, median and max number of years a player was retained is $RETVALUE"
done
parse-roster.pl
This is a script that takes the dumped roster data (in text format), extracts just the player names and prints them to standard output.
#!/usr/bin/perl
use strict;
use warnings;

# While we have data coming in (the text dump of the webpage)
while (my $line = <STDIN>) {
    # Separate the player part from the rest of the line: the player name sits between
    # the jersey number and the position (PG/SG/SF/PF/C, or plain G/F).
    # Only lines that start with a 1-2 digit number match, which means the line
    # includes a player and isn't a header or footer of some sort.
    # Printing inside the "if" also avoids reusing a stale $1 from an earlier match.
    if ($line =~ /^\s?[0-9]{1,2}\s+(.*)\s[PS]?[CGF]\s/i) {
        print "$1\n";
    }
}
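To see what that pattern does to a single line, here is a small sketch using bash's built-in regex matching instead of Perl. The `player_name` helper and the sample roster line are hypothetical, and the "number, name, position, rest" layout is an assumption about the w3m dump. Unlike the Perl pattern, this version is case-sensitive.

```shell
#!/bin/bash
# Hypothetical helper mirroring the Perl pattern: prints the text between
# the leading jersey number and the position column, or fails if the
# line does not look like a player row.
player_name() {
    local re='^[[:space:]]?[0-9]{1,2}[[:space:]]+(.*)[[:space:]][PS]?[CGF][[:space:]]'
    if [[ $1 =~ $re ]]; then
        printf '%s\n' "${BASH_REMATCH[1]}"
    else
        return 1
    fi
}

# Made-up roster line in the assumed layout
player_name "12 John Example PG 6-3 190"   # prints John Example
```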
retention-counter.pl
This is a Perl script that goes through the player file (a file containing player names, one line for each year that player was on the team's roster) and prints the average number of years players have been retained by the team, the median (middle) number of years and the maximum number of years.
#!/usr/bin/perl
use strict;
use warnings;

# A hash mapping each player name to the number of years he was on the roster
my %players;
# A hash whose keys are the distinct year counts seen (used for calculating the median)
my %playeryears;
# The total years the players have played
my $totalyears = 0;
# A value to hold the median number of years players have been retained
my $median = 0;

# While we have textual input
while (my $line = <STDIN>) {
    # Remove the newline and any additional whitespace from the end of the line
    chomp $line;
    $line =~ s/\s+$//;
    # Skip blank lines so they aren't counted as a player
    next if $line eq '';
    # The player name is the key; the value counts how many years we've seen him.
    # A missing key starts at 0, so the first sighting sets the count to 1.
    $players{$line}++;
}

# The total number of players is simply the number of keys (the players) in the hash
my $totalplayers = keys %players;
# Nothing to report without players (this also avoids dividing by zero)
die "No player data on input\n" unless $totalplayers;

# Then we loop for each player in the hash
foreach my $player (keys %players) {
    my $seasons = $players{$player};
    # Add to the total years
    $totalyears += $seasons;
    # Record this year count; identical counts collapse into a single key
    $playeryears{$seasons} = 1;
}

# The average is simply the total years divided by the number of players
my $retention_average = $totalyears / $totalplayers;

# Sort the distinct year counts numerically, lowest first. Note that the median
# below is therefore the middle of the distinct retention lengths, not the
# middle of every individual player's length.
my @sortedyears = sort { $a <=> $b } keys %playeryears;
my $arrayyears  = scalar @sortedyears;

if ($arrayyears % 2) {
    # Odd number of entries: the median is simply the middle one
    $median = $sortedyears[int($arrayyears / 2)];
} else {
    # Even number of entries: average the two middle ones
    $median = ($sortedyears[$arrayyears / 2 - 1] + $sortedyears[$arrayyears / 2]) / 2;
}

# The maximum is the highest number of years a player has been retained,
# which is simply the last member of the sorted array
my $max = $sortedyears[-1];

# And we print these three values (the median is truncated to a whole number by %d)
printf("%.2f %d %d\n", $retention_average, $median, $max);