Calculating Normalized Probabilities For Recruits

This is the basis for the second set of analyses I've done (the first being the "effective populations" approach described here, which is worth reading first). The main difference is that each population source is partitioned completely among the 120 colleges according to the probability that the recruit goes to each one. In other words, a recruit 500 miles from the University of Wyoming may contribute only a small "effective population" to the school, yet there can still be a very large probability that he chooses Wyoming over any other school, since there may be few competing schools any closer to him.

The calculation is fairly straightforward. For each recruit/population source, use the continuous function of mileage and any other scaling factors (win percentage, conference, etc.) to calculate his willingness to attend each school. Sum all of these "willingness factors" (note that each is less than 1) and then normalize them so they add to one (i.e., the recruit will choose one school and one school only) by dividing each factor by the sum.
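A minimal Python sketch of this normalization step follows. The willingness function is a hypothetical stand-in: the text does not give the actual continuous function of mileage, so the exponential decay with a made-up 300-mile scale and the simple win-percentage multiplier are assumptions, as are the school names, distances, and win percentages.

```python
import math


def willingness(miles: float, win_pct: float = 0.5, scale_miles: float = 300.0) -> float:
    """Hypothetical willingness factor in (0, 1].

    ASSUMPTION: exponential decay with distance, scaled by win percentage.
    The model's actual functional form and parameters are not given here.
    """
    return math.exp(-miles / scale_miles) * (0.5 + 0.5 * win_pct)


def normalize(factors: dict[str, float]) -> dict[str, float]:
    """Divide each willingness factor by their sum so the results add to 1."""
    total = sum(factors.values())
    return {school: f / total for school, f in factors.items()}


# Hypothetical recruit: (distance in miles, win percentage) for three schools.
schools = {"State U": (50, 0.6), "Tech": (200, 0.8), "Wyoming": (500, 0.5)}
factors = {name: willingness(miles, wp) for name, (miles, wp) in schools.items()}
probs = normalize(factors)
for name, p in probs.items():
    print(f"{name}: {p:.3f}")
```

Because every factor is divided by the same total, only the relative sizes of the willingness factors matter; scaling them all by a constant leaves the probabilities unchanged.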

As an example, suppose a recruit has a willingness of 0.80 to go to School A and a willingness of 0.20 to go to each of Schools B, C, D, E, F, and G. The sum of these numbers is 0.80 + 6 x 0.20 = 2.00, so divide each willingness factor by 2. The recruit therefore has a 40% probability of choosing School A and a 10% probability of choosing each of the other six schools. School A thus gains 0.40 recruits from this source, and each of the others gains 0.10.

This essentially simulates a recruiting battle. At the end of the study each school has accumulated a certain (fractional) number of recruits, and the sum over all schools equals the number of recruits in the population source file. The results can therefore be compared with the colleges recruits actually chose. By fitting the model to real data, we can get a sense of which variables are important in attracting recruits and then predict how future classes of recruits will decide.
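The worked example and the "recruiting battle" tally can be sketched in a few lines of Python. The first part reproduces the numbers above; the three-recruit source file in the second part is made up purely for illustration, and the `tally` helper is a hypothetical name.

```python
from collections import defaultdict


def normalize(factors):
    """Divide each willingness factor by their sum so the results add to 1."""
    total = sum(factors.values())
    return {school: f / total for school, f in factors.items()}


# The example from the text: 0.80 for School A, 0.20 for each of B through G.
example = {"A": 0.80, **{s: 0.20 for s in "BCDEFG"}}
probs = normalize(example)
print(round(probs["A"], 2), round(probs["B"], 2))  # 0.4 0.1


def tally(recruits):
    """Sum each school's fractional share over every recruit in a source file."""
    expected = defaultdict(float)
    for factors in recruits:
        for school, p in normalize(factors).items():
            expected[school] += p
    return dict(expected)


# Hypothetical source file of three recruits (willingness factors are made up).
recruits = [example, {"A": 0.5, "B": 0.5}, {"C": 0.9, "D": 0.3}]
scores = tally(recruits)
print(round(sum(scores.values()), 6))  # 3.0, i.e. one recruit per recruit
```

Since each recruit's probabilities sum to 1, the schools' totals always add up to the number of recruits in the file.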

First, the advantages of this approach. Numerically speaking, it is a more logical way to rank schools. Particularly with lists such as the Rivals 250, this method lets you say, "State U should be hauling in about 3 of these players each year, but it isn't. Why?" It provides a good assessment of under- or overperformance. It also helps schools with large land areas and low populations, as the Wyoming example above illustrates. It eliminates the need to define competing schools as "negative populations": the competing influence of a school is already accounted for once the willingness factor is normalized into a probability. Finally, this method is computationally much simpler. Each recruit/population source only has to be scaled to 120 possible destinations, as opposed to every pixel in the country (my maps are 1600 x 750, or 1,200,000 pixels).
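One way to express the under/overperformance check is to subtract each school's expected (model) total from its actual signings. This is a sketch only; the `performance` helper is a hypothetical name, and the numbers below are invented for illustration, not real recruiting data.

```python
def performance(expected: dict[str, float], actual: dict[str, int]) -> dict[str, float]:
    """Actual signings minus model-expected signings (positive = overperforming)."""
    schools = set(expected) | set(actual)
    return {s: actual.get(s, 0) - expected.get(s, 0.0) for s in schools}


# Hypothetical numbers for illustration only, not real recruiting data.
expected = {"State U": 3.1, "Tech": 1.4}
actual = {"State U": 1, "Tech": 2}
perf = performance(expected, actual)
# Print the biggest underperformers first.
for school, diff in sorted(perf.items(), key=lambda kv: kv[1]):
    print(f"{school}: {diff:+.1f}")
```

Here State U would show up at -2.1, the "should be hauling in about 3 but isn't" case, while Tech would be slightly overperforming.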

That last advantage leads into the drawbacks of this approach. No map is really generated, so it doesn't tell you where to go if you're looking for a place to recruit with limited competition (or an overabundance of recruits). This method is really best for simply ranking the 120 teams.

All that said, both approaches (effective populations and normalized probabilities) can, for the most part, be tried on the same data sets with the same scaling parameters. One exception is a pretty important one: how to quantify a recruit's preference for staying in-state. This is much easier to characterize via the probability route, simply because it's easier to know which state each of 120 colleges is in than which state each of a million pixels is in. I may come across other such variables as well.
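As a rough illustration of why the probability route makes the in-state effect easy, the sketch below multiplies the willingness factor of any school in the recruit's home state before normalizing. The multiplicative form, the 1.5 boost, the cap at 1, and the three-school `SCHOOL_STATE` table are all hypothetical assumptions, not fitted values from this model.

```python
def normalize(factors: dict[str, float]) -> dict[str, float]:
    """Divide each willingness factor by their sum so the results add to 1."""
    total = sum(factors.values())
    return {s: f / total for s, f in factors.items()}


# Hypothetical lookup; a full model would list the state of all 120 schools.
SCHOOL_STATE = {"Wyoming": "WY", "Colorado State": "CO", "Nebraska": "NE"}


def apply_in_state(factors, home_state, boost=1.5):
    """Scale willingness for in-state schools by a hypothetical `boost`.

    ASSUMPTION: a multiplicative boost (1.5 is a placeholder, not a fitted
    value), capped at 1.0 so every willingness factor stays at most 1.
    """
    return {
        s: min(1.0, f * boost) if SCHOOL_STATE.get(s) == home_state else f
        for s, f in factors.items()
    }


factors = {"Wyoming": 0.4, "Colorado State": 0.6, "Nebraska": 0.2}
before = normalize(factors)["Wyoming"]
after = normalize(apply_in_state(factors, "WY"))["Wyoming"]
print(f"P(Wyoming) for a WY recruit: {before:.3f} -> {after:.3f}")
```

The lookup needs only one entry per school, which is the point made above: doing the same for every pixel on the map would be far more work.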


Added 12/09/2009  


Tom Brennan, © 2009