Research grants yield Twitter followers?

Mon 1st December 2014

I went to a nice little event on Monday evening, organised by the data.ac.uk guys at the University of Southampton. Effectively, the aim was to coerce people with ideas and people with hacking skills into building some little tools, applications and visualisations using data gathered about institutions under the .ac.uk second-level domain (largely UK universities and other entities related to research or academic pursuits).

I personally had a really rewarding time for a number of reasons. A handful of our first-year undergraduates (from a class I teach called “Space Cadets”) attended, contributed and seemed to enjoy themselves. I also learned about the research topic of one of our new PhD students (Johanna) and was impressed at how pertinent and insightful her topic is, and at how she was using this event as a way to gather preliminary data. I also got to catch up with some friends I hadn’t seen for a while, like Marja and Colin. And all this while almost hacking something together!

Full disclosure: we didn’t quite finish what we aimed to do within the time, but I managed to pull it together in the pub afterwards 🙂

[Image: Gateway to Research homepage]

So what did we do? Well, we started off looking at the data on Gateway to Research (GtR), as we were going to see if we could link it to news stories on university RSS feeds (do universities publish many stories about their research?). Organically, this evolved into looking at universities’ Twitter accounts instead (as the data.ac.uk Observatory already scrapes the Twitter accounts from homepages). As a simple goal, we wondered if there’s any observable link between a university’s number of Twitter followers and the number of research grants it has been awarded.

By the end of the session we’d just about extracted all the relevant data (name of university, Twitter account, followers and number of research grants: 4 bits of data from 4 independent data sources) and displayed it as a list. We were somewhat hampered by my poor decision to attempt this in JavaScript, as the same-origin policy made it impossible to fetch data from the live APIs via AJAX (why make your data available in JSON then not allow me to access it in JavaScript, I say)*. However, a quick rewrite in PHP got us back on track.
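For anyone wanting to try something similar, the joining step might look roughly like the sketch below. It is written in Python rather than the PHP used on the night, and every identifier, field name and sample value is made up for illustration; the real sources use their own formats.

```python
def join_sources(names, twitter_accounts, follower_counts, grant_counts):
    """Combine four independent lookups into one list of records.

    Hypothetical input shapes (assumptions, not the real API formats):
      names:            {institution_id: university name}
      twitter_accounts: {institution_id: Twitter handle}
      follower_counts:  {Twitter handle: number of followers}
      grant_counts:     {institution_id: number of research grants}
    Institutions missing from any source are skipped.
    """
    records = []
    for inst_id, name in names.items():
        handle = twitter_accounts.get(inst_id)
        if handle is None:
            continue
        if handle not in follower_counts or inst_id not in grant_counts:
            continue
        records.append({
            "university": name,
            "twitter_account": handle,
            "followers": follower_counts[handle],
            "grants": grant_counts[inst_id],
        })
    return records


# Made-up sample data: the second institution has no Twitter account recorded.
names = {"ex1": "University of Example", "ex2": "Example Institute"}
twitter_accounts = {"ex1": "@exampleuni"}
follower_counts = {"@exampleuni": 1200}
grant_counts = {"ex1": 35, "ex2": 4}

for record in join_sources(names, twitter_accounts, follower_counts, grant_counts):
    print(record)
```

Skipping incomplete institutions keeps the list clean, at the cost of hiding gaps in the sources; logging the skipped ones would be a sensible next step.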

As I said, we weren’t quite done, as I wanted to visualise this data somehow, as well as fix a few bugs. In the pub, I tried to make use of the (unfortunately deprecated) Google Image Chart API, but it was capping at some weird values. To resolve this, I output the data as CSV, imported it into Google Sheets and generated the graph manually (hack events require cutting some corners and thinking on your feet!). This is what we got:
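The CSV export is the easy part. A minimal sketch in Python (not the code from the night; the column names and sample rows are assumptions) could be:

```python
import csv
import io


def to_csv(rows):
    """Serialise records to CSV text with a header row.

    Each row is assumed to be a dict with the hypothetical keys
    'university', 'twitter_account', 'followers' and 'grants'.
    """
    columns = ["university", "twitter_account", "followers", "grants"]
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(columns)
    for row in rows:
        writer.writerow([row[column] for column in columns])
    return buffer.getvalue()


# Made-up rows; the csv module quotes the name that contains a comma.
rows = [
    {"university": "University of Example", "twitter_account": "@exampleuni",
     "followers": 1200, "grants": 35},
    {"university": "Example, Institute of", "twitter_account": "@exampleinst",
     "followers": 300, "grants": 4},
]
print(to_csv(rows))
```

Using the `csv` module rather than joining strings by hand matters here because institution names can contain commas.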

[Figure: grants vs followers]

This is the number of research grants a university has had funded, plotted against the number of Twitter followers of the first Twitter account on their homepage. It’s on a log scale.

The grants vs followers data is in a Google spreadsheet, in case you want to look.

What does it tell us? Well, it suggests that universities with more funded research grants also tend to have more people listening to them on social media. Is this what we would have guessed anyway? It’s easy to say yes in hindsight, but it’s nice to have some numbers to support it. Of course, I’ve not yet run the correlation to see if this is a significant relationship; that’ll come with a bit more time.
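As a starting point for that check, here is a rough Python sketch that computes a Pearson correlation on the log-transformed counts, matching the log scale of the graph. It is only one possible approach, the sample numbers are invented, and it gives a correlation coefficient only; a significance test would still be needed on top.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov / math.sqrt(var_x * var_y)


def log_log_correlation(pairs):
    """Correlate log10(grants) with log10(followers).

    Pairs with a zero or negative count are dropped, because their
    logarithm is undefined.
    """
    usable = [(g, f) for g, f in pairs if g > 0 and f > 0]
    log_grants = [math.log10(g) for g, _ in usable]
    log_followers = [math.log10(f) for _, f in usable]
    return pearson(log_grants, log_followers)


# Invented (grants, followers) pairs, not the real data.
sample_pairs = [(5, 900), (40, 4000), (120, 9500), (600, 30000), (0, 150)]
print(round(log_log_correlation(sample_pairs), 3))
```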

Perhaps more importantly, it has helped us identify some quirks in the data and the nuances of how to handle it. For example, the Observatory will record all Twitter handles referenced on the homepage. If there’s a widget displaying a Twitter feed on the homepage, it will include all the accounts that were @replied to or retweeted. It also stores the date of an observation as the name of a property in an object; property names are hard to sort, so it’s difficult to get the latest observation (clearly this requires a smidgen of preprocessing). We spotted these by delving into a couple of the outliers and, interestingly, cleaning up the data moved them closer to the centre of the cluster of points.
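The preprocessing for the date problem can be quite small. Below is a Python sketch that picks the most recent entry from an object keyed by date. It assumes the keys are ISO 8601 dates (YYYY-MM-DD), which may not be what the Observatory actually uses, and the sample structure is invented.

```python
from datetime import date


def latest_observation(observations):
    """Return (date_key, value) for the most recent observation.

    `observations` is a dict whose keys are date strings, assumed here
    to be ISO 8601 (YYYY-MM-DD). Returns None for an empty dict.
    """
    if not observations:
        return None
    # Parse each key as a date so the comparison is chronological
    # rather than depending on how the strings happen to sort.
    latest_key = max(observations, key=date.fromisoformat)
    return latest_key, observations[latest_key]


# Invented sample: handles are listed in the order they appear on the page.
sample = {
    "2014-11-03": {"twitter": ["@exampleuni"]},
    "2014-12-01": {"twitter": ["@exampleuni", "@someone_retweeted"]},
    "2014-09-15": {"twitter": []},
}

key, observation = latest_observation(sample)
first_handle = observation["twitter"][0] if observation["twitter"] else None
print(key, first_handle)
```

Taking only the first handle mirrors the choice made for the graph, and sidesteps most of the accounts pulled in by embedded feed widgets.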

To conclude, the event was a great success. I think the 2-hour hack might be the perfect format for exploratory data hacks. It’s demoralising to spend a day or three hacking and have nothing to show for it; spending an hour or three and having a result (even a small one) is massively rewarding. I hope to tidy up this code, check the details of the data (especially what grants GtR includes) and do some stats on it. We’ve observed there’s some link (though with no inference about the cause of that link) between research funding and social media popularity of universities. I became a bit more confident in hacking with data within a time constraint, and had fun doing it!

* I realise now that what I needed was JSONP. Unfortunately, GtR doesn’t support that anyway. I could have used a JSON proxy (e.g. JSONProxy, or one I’d written myself in PHP) but I didn’t think of that until the day after the hack! At least learning has happened 🙂
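For the curious, the core of such a proxy is tiny: fetch the JSON server-side, then wrap it in a call to a function named by the browser. Here is a sketch of just the wrapping step, in Python rather than PHP; the function name `wrap_jsonp`, the callback name and the payload are all hypothetical.

```python
import json
import re

# Only allow simple (optionally dotted) JavaScript identifiers as callback
# names, so a caller cannot inject arbitrary script into the response.
_CALLBACK_PATTERN = re.compile(r"[A-Za-z_$][\w$]*(\.[A-Za-z_$][\w$]*)*")


def wrap_jsonp(callback, payload):
    """Return a JSONP response body: callback(<payload as JSON>);"""
    if not _CALLBACK_PATTERN.fullmatch(callback):
        raise ValueError(f"unsafe callback name: {callback!r}")
    return f"{callback}({json.dumps(payload)});"


# The browser would load this via a <script> tag and define handleGrants itself.
print(wrap_jsonp("handleGrants", {"grants": 42}))
```

Because the response is loaded as a script rather than fetched via AJAX, the same-origin restriction on reading the data doesn’t apply; the trade-off is that the page has to trust whatever the proxy returns.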