Sigurbjörn Lárusson's blog

Musings of a network engineer

Drafting for need

During the (seemingly endless) stretches between games in this year's NBA Playoffs, I've had too much time to think about the draft.

At one point I started wondering why teams approach drafting any differently from the way a company would hire for a long-term job. There seem to be many similarities:
  • You'd want to hire smart and confident people
  • You'd want to hire people whose personalities complement the team they are being hired into
  • You'd want to hire people whose skill sets complement the team they are being hired into (and pay more), or who have shown an ability to learn so that they can be trained to pick up the required skills (and pay less)
There also seem to be principles applied to drafting that don't fit a normal hiring procedure, such as:
  • Hiring someone you don't need because he was the best candidate that applied
  • Hiring someone who doesn't fit into your environment because he was the best candidate that applied

So why is that? Why would you depart from seemingly sane methods of hiring people? One reason is that a draft pick is considered an 'asset', so you want to get the maximum return on your money.

The problem with the asset thinking is that people aren't things. You need to develop and nurture them, and to do that you need a fit for them and you have to let them play, preferably even win games, or at least see how they could win games at some later stage. Otherwise your 'asset' will probably lose value. Some teams have a better record with the draft than others, and it often seems that the teams that are fairly constant basement dwellers (the Knicks and Timberwolves spring to mind) have the least success.

While I was pondering this (in multiple Tweets instead of just writing this article) @Nelush asked me the following question:

"would be interested to see how a list of teams ranked by draft performance measured up to a list ranked by player retention too".

This is not an easy question to answer. How do we rank teams by draft performance? For that I'm cheating and using work done by Roland Beech, who graded every team on the draft picks it made from 1989 (the first draft in the current two-round format that we all know) through 2008. For a detailed analysis of his methodology and results you can read his article, but in short he used data from Basketball Reference to form a rating for each drafted player, calculated as:

Rating = points/game + rebounds/game + assists/game

He then divides players into 6 categories:

  • Stars (Rating of 20+)
  • Solid players (Rating of 15-19.9)
  • Role players (Rating of 10-14.9)
  • Deep Bench (Rating of 5-9.9)
  • Complete Bust (Rating < 5)
  • DNP (never played a single minute)
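
As a rough illustration of the rating and the cut-offs above, here is a minimal bash sketch. The function names `rating` and `category`, the optional "played" flag and the sample stat line are made up for illustration and do not come from the 82games article.

```bash
#!/bin/bash

# rating PPG RPG APG
# Prints points + rebounds + assists per game, rounded to one decimal.
rating() {
    awk -v p="$1" -v r="$2" -v a="$3" 'BEGIN { printf "%.1f\n", p + r + a }'
}

# category RATING [PLAYED]
# Maps a rating to one of the six categories. PLAYED is a hypothetical flag
# (default 1); pass 0 for a player who never played a single minute.
category() {
    awk -v x="$1" -v played="${2:-1}" 'BEGIN {
        if (played == 0)  print "DNP"
        else if (x >= 20) print "Star"
        else if (x >= 15) print "Solid player"
        else if (x >= 10) print "Role player"
        else if (x >= 5)  print "Deep bench"
        else              print "Complete bust"
    }'
}

# Made-up stat line: 12.5 points, 4.0 rebounds and 3.1 assists per game
r=$(rating 12.5 4.0 3.1)
echo "$r $(category "$r")"
# prints: 19.6 Solid player
```

The arithmetic is handed to awk because bash itself only does integer math.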
You can then, using the same ratings, compare each pick to the average rating for that pick, to figure out if the team did well (their pick was better than the average pick), on par (their pick was pretty much at the average) or poorly (their pick was worse than the average pick).
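
A small sketch of that comparison, assuming you already have a player's rating and the average rating for his draft slot. The function name `compare_pick` and the tolerance of 1.0 rating points for "on par" are assumptions made for this example; the article may draw the line differently.

```bash
#!/bin/bash

# compare_pick RATING SLOT_AVERAGE [TOLERANCE]
# Prints "well", "on par" or "poorly" depending on how far the pick's rating
# is from the average rating for that draft slot.
# TOLERANCE (default 1.0) is an assumed cut-off, not a value from the article.
compare_pick() {
    awk -v r="$1" -v avg="$2" -v tol="${3:-1.0}" 'BEGIN {
        diff = r - avg
        if (diff > tol)       print "well"
        else if (diff < -tol) print "poorly"
        else                  print "on par"
    }'
}

# Made-up numbers: the pick is rated 14.2, the average for that slot is 10.5
compare_pick 14.2 10.5
# prints: well
```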

Then there's the question of retention. That data isn't easy to come by, nor is it simple to calculate. For the sake of simplicity, however, I wrote a shell script that dumps the Basketball Reference roster data for each team for each year in question, and two small Perl scripts that parse the information so I can count how many years each player was on the roster.

Then I calculate three values for the retention:

  • The average number of years a player is retained (which is simply the total number of years the players were retained, divided by the number of players who were on the roster during these years)
  • The median (middle) number of retention years. Technically this isn't a true median, since I remove all duplicate values (i.e. if 30 players have been retained for 1 year, the number 1 appears only once in the median list), so it is really the midpoint of the distinct tenure lengths seen on the team rather than the median across all players
  • The maximum number of years the team has kept a player
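
The three values can be sketched with standard shell tools as follows. This is an illustration under the same assumptions as the description above (one player name per line, one line per season on the roster), not the script that produced the table; the function name `retention_stats` is made up.

```bash
#!/bin/bash

# retention_stats
# Reads one player name per line on stdin (one line per season on the roster)
# and prints "average median max", where the median is taken over the
# distinct year counts, as described in the list above.
retention_stats() {
    local counts avg median max
    # Seasons per player: sort the names, count repeats, keep only the counts
    counts=$(sort | uniq -c | awk '{ print $1 }')
    [ -n "$counts" ] || return 1
    # Average: total seasons divided by the number of distinct players
    avg=$(echo "$counts" | awk '{ s += $1 } END { printf "%.2f", s / NR }')
    # Maximum: the largest season count
    max=$(echo "$counts" | sort -n | tail -n 1)
    # Median over the distinct counts (sort -nu drops the duplicates)
    median=$(echo "$counts" | sort -nu | awk '{ v[NR] = $1 }
        END {
            if (NR % 2) print v[(NR + 1) / 2]
            else        print (v[NR / 2] + v[NR / 2 + 1]) / 2
        }')
    echo "$avg $median $max"
}

# Made-up roster lines: A stays 3 seasons, D stays 2, B and C stay 1 each
printf '%s\n' A B A C A D D | retention_stats
# prints: 1.75 2 3
```

Seven roster lines over four players give the 1.75 average, and the distinct counts 1, 2 and 3 give a median of 2.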

This isn't perfect, for a number of reasons:

  • I have no drafting data after the 2008-2009 season, since that's all that is included in the 82games article. Working out the drafting success data for later years is a lot of work, but it could be done
  • Not all teams existed for all seasons: the Toronto Raptors (first season 1995), Vancouver Grizzlies (first season 1995) and New Orleans Pelicans (first season 2002). Those teams suffer from a smaller sample size, making their data less comparable than the rest
  • Some teams have been renamed. I've used the newer names in my data table; the 82games article obviously references the older names
  • I combined the values for the Charlotte Hornets and Charlotte Bobcats since those records now belong to the Charlotte Hornets
  • The methods for calculating retention (a player might have played only one game for the team in a given season, for example) and draft success are not perfect; this is more for fun than an exact science

The following table uses the RTG (rating above average for the team's draft picks, taken from Roland Beech's 82games.com article on the best/worst drafting teams) and the retention values I calculated from Basketball Reference as explained above.

Draft success vs retention
Team | RTG | Average retention (years) | Median retention (years) | Max retention (years)
Milwaukee Bucks | 1.8 | 1.95 | 4 | 8
Phoenix Suns | 1.5 | 2.02 | 4 | 9
LA Lakers | 1.5 | 2.23 | 4 | 9
San Antonio Spurs | 1.2 | 2.03 | 3 | 8
Cleveland Cavaliers | 1.1 | 2.11 | 4 | 10
Golden State Warriors | 1.0 | 1.89 | 4 | 9
Boston Celtics | 0.8 | 2.00 | 4 | 10
Sacramento Kings | 0.7 | 2.02 | 4 | 8
Memphis Grizzlies / Vancouver Grizzlies | 0.6 | 1.80 | 4 | 7
Utah Jazz | 0.4 | 2.44 | 5 | 15
Miami Heat | 0.3 | 1.93 | 4 | 11
Washington Wizards / Bullets | 0.2 | 2.09 | 4 | 8
Charlotte Hornets / Bobcats | -0.7 | 1.79 | 2 | 4
Oklahoma City Thunder / Seattle SuperSonics | -0.1 | 2.25 | 5 | 13
Detroit Pistons | -0.1 | 2.10 | 2 | 6
Philadelphia 76ers | -0.1 | 1.80 | 4 | 11
Indiana Pacers | -0.2 | 2.62 | 3 | 9
Chicago Bulls | -0.3 | 2.18 | 5 | 11
Orlando Magic | -0.3 | 2.11 | 5 | 10
Dallas Mavericks | -0.3 | 2.04 | 5 | 10
Houston Rockets | -0.4 | 2.09 | 5 | 13
New Orleans Pelicans / Hornets | -0.6 | 1.49 | 2 | 3
Brooklyn / New Jersey Nets | -0.6 | 1.94 | 4 | 7
Portland Trail Blazers | -0.7 | 2.20 | 4 | 8
Minnesota Timberwolves | -0.8 | 2.06 | 5 | 12
Toronto Raptors | -1.1 | 1.77 | 4 | 7
Denver Nuggets | -1.2 | 1.89 | 4 | 7
LA Clippers | -1.4 | 1.94 | 4 | 8
Atlanta Hawks | -1.5 | 1.91 | 4 | 9
New York Knicks | -1.5 | 2.45 | 6 | 12

So what can we gather from this data? Not a whole lot. There doesn't seem to be any direct tie between teams' player retention and their draft success, at least using this (admittedly very crude) way of calculating it. Maybe I'll revisit this at a later stage to see if I can find some correlation. In the meantime, this was at least a fun way to spend an evening doing something basketball related.
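
One way to put a number on that "direct tie" would be Pearson's correlation coefficient between the RTG column and one of the retention columns. The sketch below is only a suggestion for that later revisit: the function name `pearson` is made up, and the sample call feeds in just the first three rows of the table, so its result says nothing about the league as a whole.

```bash
#!/bin/bash

# pearson
# Reads "x y" pairs (one pair per line) on stdin and prints Pearson's r,
# rounded to two decimals. Exits non-zero when r is undefined (fewer than two
# pairs, or one of the columns is constant).
pearson() {
    awk 'NF >= 2 {
            n++
            sx  += $1;      sy  += $2
            sxx += $1 * $1; syy += $2 * $2
            sxy += $1 * $2
        }
        END {
            den = (n * sxx - sx * sx) * (n * syy - sy * sy)
            if (n < 2 || den <= 0) exit 1
            printf "%.2f\n", (n * sxy - sx * sy) / sqrt(den)
        }'
}

# First three table rows as "RTG average-retention" pairs (Bucks, Suns, Lakers).
# Feed in all 30 rows for the actual comparison.
printf '%s\n' "1.8 1.95" "1.5 2.02" "1.5 2.23" | pearson
```

A value near 0 would back up the impression that there is no direct tie, while a value near 1 or -1 would suggest a relationship worth a closer look.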

If you are interested in how I gathered this data, these are the scripts I used to fetch the data from Basketball Reference and calculate these values:

Retention-Calculator.sh

This is a shell script that uses w3m (a text-based web browser) to fetch the data from Basketball Reference. It then uses two Perl scripts to process the data, delivering text output for each team at the end.


#!/bin/bash

YEARS="1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009"
TEAMS="TOR BOS BRK PHI NYK POR OKC UTA DEN MIN CLE CHI MIL IND DET GSW LAC PHO SAC LAL ATL WAS MIA CHO ORL HOU MEM SAS DAL NOP"

# For each team
for team in $TEAMS; do
        echo -n "Processing $team: "
        # Check if we already grabbed the stats for this team, if so we won't grab them again
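        if [ ! -f "$team.txt" ]; then
                # Create an empty text file called $team.txt (truncating with ':' avoids the blank first line that 'echo' would write, which would later be counted as a player with an empty name)
                : >"$team.txt"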
                # And for every year that we're parsing
                for year in $YEARS; do
                        # Dump the team information page from BB reference, take out the Roster information and use a small perl script to grab just the player names from that roster
                        # then add that information to the $team.txt file that we created earlier
                        w3m -cols 256 -dump "http://www.basketball-reference.com/teams/$team/$year.html#roster" | grep -A 20 '^Roster$' | parse-roster.pl >>"$team.txt"
                        # Wait for 3 seconds so as to not bombard the BB ref server
                        sleep 3
                        # Print a status message for each year we process
                        echo -n " $year "
                done
        fi
        # The $team.txt file now includes all players that were on the roster for that team in the seasons listed in $YEARS (1989-90 through 2008-09), with one line for each year they were there
        # We'll pipe that into another perl script which will do 3 things
        # 1.)  Count the total number of unique players
        # 2.)  Count for how many years each unique player was with the team
        # 3.)  Print out a retention data value for the team
        # The retention value is a simple average. We already have a count of how many years each player spent with the team, so we add them all together and divide by the total number of players that have played on these rosters. That gives us the average number of seasons that a player was retained
        RETVALUE=$(retention-counter.pl <"$team.txt")
        # Finish the status line
        echo
        # Print out the retention values (average, median and max years) for each team
        echo "For team: $team the average, median and max years a player was retained are $RETVALUE"
done

parse-roster.pl

This is a script that takes the dumped roster data (in text format), extracts just the player names, and prints them to standard output


#!/usr/bin/perl

# Player number
my $playernumber = 0;
# Player name
my $player;

# While we have data coming in on standard input (the text dump of the webpage)
while (<STDIN>) {
        # Separate the player part from the rest of the line, the player part is the part between the jersey number and the PG/SG/SF/PF/C distinction
        # Only print when the line actually matches (which means that this line includes a player, and isn't a header or footer of some sort), since $1 and $2 don't describe this line after a failed match
        if (/^\s?([0-9]{1,2})\s+(.+?)\s+[PS]?[CGF]\s/i) {
                $playernumber = $1;
                $player = $2;
                print "$player\n";
        }
}
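
To sanity-check the name extraction without hitting the site, the same idea can be sketched in awk and run against a made-up roster line. The line layout assumed here (jersey number, name, position, then other columns) and the function name `extract_names` are assumptions for illustration; the real w3m dump may be laid out differently.

```bash
#!/bin/bash

# extract_names
# Reads roster lines on stdin and prints one player name per line.
# Assumed (hypothetical) layout: jersey number, name, position, other columns.
extract_names() {
    awk '$1 ~ /^[0-9]+$/ {
        name = ""
        for (i = 2; i <= NF; i++) {
            # Stop at the first field that looks like a position
            if ($i ~ /^(PG|SG|SF|PF|C|G|F)$/) break
            name = (name == "" ? $i : name " " $i)
        }
        # Only print when a position was found after a non-empty name
        if (i <= NF && name != "") print name
    }'
}

# Made-up sample: one heading line and one player line
printf '%s\n' "Roster" " 33  Larry Bird  SF  6-9  220" | extract_names
# prints: Larry Bird
```

Splitting on fields rather than using one large regular expression keeps header and footer lines out, because they don't start with a number.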

retention-counter.pl

This is a Perl script that goes through the player file (which is just a file containing player names, one line for each year that player has played for the team) and prints out the average number of years players have been retained by the team, the median (middle) number of years and the maximum number of years


#!/usr/bin/perl

# A hash containing the players for each team
my %players;
# A hash containing the number of years a player has played (used for calculating the median)
my %playeryears;
# The total years the players have played
my $totalyears = 0;
# The total number of players is simply the number of keys (which are the players) in the hash
my $totalplayers = 0;
# An array to hold the years retained by each player
my @years;
# A counter for the number of times we've looped
my $loops = 0;
# A value to hold the median number of years players have been retained
my $median = 0;
# A variable to hold the max years a player has been retained
my $max = 0;

# While we have textual input
while (<STDIN>) {
	# Remove the newline from the end of the line
	chomp;
	# Remove any additional whitespace from the end of the line
	$_ =~ s/\s+$//;
	# Skip blank lines so they aren't counted as a player with an empty name
	next if $_ eq '';
	# Take each line, put it into a hash, with a key of the player name
	# Check if it exists first
	if(exists($players{$_})) {
		# If this key already exists, we've already counted the player once, so we'll increment the value (the count of how many years the player has played for the team) by 1
		$players{$_}++;
	} else {
		# Otherwise we'll create a new key with this players name, and put his count as 1 since this is the first time we've seen this player in the data
		$players{$_} = 1;
	}
}

# Set the total players to the number of keys in the hash (the number of different players)
$totalplayers = keys %players;

# Then we loop for each player in the hash
foreach my $player (keys %players) {
	my $hashvalue = $players{$player};
	# Add to the total years
	$totalyears += $hashvalue;
	# Record this number of years as a key in the playeryears hash as well (using a hash removes duplicate values)
	$playeryears{$hashvalue} = 1;
}

# Make an array called years, from the keys of the playeryears hash...
foreach my $yearkey (keys %playeryears) {
	push @years, $yearkey;
}

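# Now we know the total years the players on the roster have been on the roster and we know the number of players that were on the roster
# The average is simply the total years divided by the number of players (we stop here if no players were read, to avoid dividing by zero)
die "No player names were read from standard input\n" unless $totalplayers;
my $retention_average = $totalyears/$totalplayers;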

# We can also find the median value, we have the distinct numbers of years players were retained in the @years array which we'll now sort numerically so that the lowest number
# of retained years is at the bottom of the array while the highest is at the top
my @sortedyears = sort {$a <=> $b} @years;
# Let's see how many seats are in the years array (we'll add one since the counter starts at 0)
my $arrayyears = $#sortedyears + 1;
# If it's an odd number, the median is simply the middle seat in the array
# If it's an even number, then there are two middle numbers so we need to add them together and divide them by two to find one median number
# Let's do the modulus of the array seats by 2 to find the remainder, if there is a remainder then the value is odd
if($arrayyears % 2 > 0) {
	# And let's place the median value as the value in the middle of the array
	$median = $sortedyears[int($arrayyears/2)];
} else {
	# Otherwise there's an even number of seats
	# The median is then the number in the seat right before the middle of the array, plus the number in the middle, divided by 2
	$median = ($sortedyears[int($arrayyears/2)-1] + $sortedyears[int($arrayyears/2)])/2;
}

# Then we'll take the maximum, which is the highest number of years a player has been retained, that is simply the last member of the sorted array
$max = $sortedyears[$#sortedyears];

# And we print these three values (note that %d truncates a median that ends in .5 to a whole number)
printf("%.2f %d %d\n",$retention_average,$median,$max);