Fuzzy Cluster Analysis of University Rankings

 

What is fuzzy cluster analysis?

One of the criticisms of university league tables is that they reduce a variety of criteria to a single score by assigning a relative weight to each criterion and adding up the results. This is also true of the global university rankings compiled by Shanghai Jiao Tong University in China and the Times Higher Education Supplement in the UK.

 

Fuzzy clustering provides an alternative approach, by grouping universities into clusters that are statistically similar across all criteria, without making any assumptions about the relative importance of each criterion. Most people are familiar with the concept of (non-fuzzy) clustering, where objects are sorted into groups that have similar characteristics. Fuzzy clustering is a statistical technique that achieves something similar, but it also tells us how similar every object is to the average of each group. Thus some objects will be very similar to a group average: they are 'quintessential members' of that group, with characteristics that are strongly correlated with the average for that group. Other objects will be only weakly correlated with the average of any particular group, and will share certain characteristics with other groups too.
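To make the idea of partial membership concrete, here is a minimal Python sketch. It assumes the standard fuzzy c-means membership formula with Euclidean distance, which is not necessarily how similarity was measured in the analysis described on this page. The function name, the two group averages and the scores on two criteria are all made up for illustration; they are not taken from any league table.

```python
import numpy as np

def fuzzy_memberships(points, centers, m=2.0):
    """Degree of membership of each point in each group (each row sums to 1).

    Assumption: standard fuzzy c-means formula, Euclidean distance,
    fuzziness exponent m > 1.
    """
    # Distance from every point to every group average: shape (n_points, n_groups)
    d = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    d = np.fmax(d, 1e-12)  # avoid division by zero when a point sits on a center
    inv = d ** (-2.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)

# Hypothetical group averages on two made-up criteria
centers = np.array([[90.0, 85.0],
                    [40.0, 45.0]])

# Hypothetical objects to compare against those averages
points = np.array([
    [88.0, 86.0],  # close to the first group average
    [42.0, 44.0],  # close to the second group average
    [65.0, 65.0],  # equidistant from both averages
])

u = fuzzy_memberships(points, centers)
for row in u:
    print(np.round(row, 3))
```

The first two objects come out as near-complete members of one group, while the third is split evenly between the two, which is the kind of 'in-between' object that a hard clustering would have to force into a single group.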

 

This is a very powerful concept, because it allows us to describe with mathematical rigor the statement that "these objects are kind of similar ... and these objects are kind of different". It is easy to see in a fuzzy cluster analysis whether an object is typical of its peers, or whether it is an 'outlier'. Thus the arbitrary cut-off of a "Top 10" and the question "what's the relative change going from, say, 10th to 11th and from 11th to 20th?" do not arise if we use fuzzy clustering to compare objects across a range of different characteristics. "Objects" in this case, of course, are universities.

 

Fuzzy cluster analysis of the Shanghai Jiao Tong University Ranking

A 'fuzzy k-means' analysis of the SJTU league table identifies 7 statistically different groups, illustrated in these charts. The charts show the membership of each group (arbitrarily labelled 7a through 7g) and, for each university, how similar it is statistically to members of other groups. The SJTU assigned ranks are given in brackets. There is scope for considerable analysis of these results, but the main 'headlines' are

(1) that the 'top 12' universities in the SJTU ranks are strongly correlated in one group 7c; and

(2) that the rest of the SJTU positions seem not to provide much useful information about the comparability of universities, because universities that are widely dispersed in the SJTU rank can be seen to be 'similar' in terms of their fuzzy group membership.

 

Fuzzy cluster analysis of the Times Higher League Table

The fuzzy cluster analysis was repeated using the criteria from the 2005 Times Higher (THES) international ranking of universities. The results suggest there are 6 statistically distinct groups in the THES data, and as usual for fuzzy clusters some universities fall squarely in a particular group, and some universities share common traits with several groups. This is summarised in these charts. Like the analysis of the SJTU ranking, these results suggest that the THES league table correctly groups the top few universities (in this case the top 6 in group 6b). It is interesting to note that the fuzzy cluster analysis suggests that Cornell should also be in this group (THES ranks Cornell at 14), although Cornell also has a lot in common with group 6e, for which the universities of Edinburgh, Chicago and McGill are archetypal members. Of the 'top group' (6b), Oxford and Cambridge also have significant overlap with group 6e, whereas all the other (US) universities in 6b have very little overlap with any other groups.

 

Furthermore, as we found in the fuzzy cluster analysis of the SJTU data, outside the top group, universities that are near neighbors in the THES league table are not necessarily in the same fuzzy cluster groups, once again suggesting that the THES table does not yield much meaningful information about the comparability of universities in the majority of the table. Aside from the question of whether the criteria themselves are really meaningful, this clearly challenges the approach of weighting and adding criteria to produce a single score.

 

Cluster analysis on UK maths departments

Peter, I found your interesting site by googling for university league table cluster analysis! Interesting work. I did a bit of a study comparing UK maths departments last year. The reason I was googling is that I was thinking of using cluster analysis the next time I do the exercise, in January I expect, after the December RAE results. What we expect is that COWI (Cambridge, Oxford, Warwick, Imperial) will form a cluster, but I wonder if there are any other sensible clusterings? Do you have any advice on doing this? My first thought was to use the cluster analysis tools in Mathematica.

Follow the data ...

Bill

If it's any help, I made no attempt to predetermine the clusters in my analysis. I ran the clustering algorithm across a range of parameters, encompassing potential values for fuzziness and number of distinct clusters. Being "fuzzy", of course, there is no single answer.
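A sweep of this kind might be sketched in Python as follows. This is only an illustration under stated assumptions, not the code used for the analysis on this page: it uses a basic fuzzy c-means (another name for fuzzy k-means) with Euclidean distance, scores each run with the partition coefficient (a common validity index, chosen here for simplicity), and runs on synthetic random blobs rather than any real ranking data. All function and variable names are hypothetical.

```python
import numpy as np

def fuzzy_c_means(x, n_clusters, m=2.0, n_iter=200, tol=1e-6, seed=0):
    """Basic fuzzy c-means sketch. Returns (centers, memberships)."""
    rng = np.random.default_rng(seed)
    # Start from random memberships, normalised so each row sums to 1
    u = rng.random((x.shape[0], n_clusters))
    u /= u.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # Cluster centers are membership-weighted means of the data
        w = u ** m
        centers = (w.T @ x) / w.sum(axis=0)[:, None]
        # Update memberships from distances to the new centers
        d = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)
        inv = d ** (-2.0 / (m - 1.0))
        new_u = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(new_u - u).max() < tol
        u = new_u
        if converged:
            break
    return centers, u

def partition_coefficient(u):
    """Mean squared membership: 1.0 is crisp, 1/n_clusters is maximally fuzzy."""
    return float((u ** 2).sum() / u.shape[0])

# Synthetic stand-in data (NOT league table data): three random blobs
rng = np.random.default_rng(42)
blob_centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
x = np.vstack([c + rng.normal(scale=0.5, size=(30, 2)) for c in blob_centers])

# Sweep over number of clusters and fuzziness, recording a score for each run
results = {}
for n_clusters in (2, 3, 4, 5):
    for m in (1.5, 2.0, 2.5):
        _, u = fuzzy_c_means(x, n_clusters, m)
        results[(n_clusters, m)] = partition_coefficient(u)

for (n_clusters, m), score in sorted(results.items()):
    print(f"clusters={n_clusters} fuzziness={m}: partition coefficient={score:.3f}")
```

The score is only a guide. Different validity measures can favour different settings, so it is worth looking at the memberships themselves across several runs rather than relying on one number.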

One aspect I find very interesting in the results is where the data show clustering of institutions that are widely separated in the rankings.

I would be happy to discuss this further.

Peter