Calculating Readability in R

Fri 18th April 2014

I’ve finally got round to exploring an idea around readability, and was excited to find that the programming language R already has a library that will calculate a number of readability metrics. That should save me some time writing my own or using an API.

Having installed this library:

    install.packages('koRpus')

I was hoping it would be as easy as calling the function and giving it some text:

    readability("hello my name is Rikki")

Of course it wasn’t going to be that easy. Here’s my guide to the minimum you have to do to get a readability score out of R. There are plenty of other options to explore, and feel free to ask questions in the comments below.

  1. Install the koRpus library: install.packages('koRpus')
  2. Install TreeTagger. There are installation steps on the TreeTagger site. There are more of them than there should be (i.e. it could be simpler), but go with it.
    1. Choose somewhere sensible to put the directory (I put the files in /usr/bin/TreeTagger/ on my Mac).
    2. Download each of the files it tells you to: tagger package, tagging scripts, install-tagger.sh and a parameter file for the language of the text you will be analysing. I didn’t download the English chunker file yet (I’ll see if it’s necessary later).
    3. Don’t unzip the archives.
    4. chmod u+x install-tagger.sh
    5. ./install-tagger.sh
    6. Define the TreeTagger paths in your ~/.profile or ~/.bash_profile, add $TAGGER_PATH to your PATH variable as well, and then run source ~/.profile:
       export TAGGER_CMD=/usr/bin/TreeTagger/cmd
       export TAGGER_BIN=/usr/bin/TreeTagger/bin
       export TAGGER_PATH=$TAGGER_CMD:$TAGGER_BIN
       export PATH=$PATH:$TAGGER_PATH
    7. Test it from inside the TreeTagger directory: echo 'Hello world!' | cmd/tree-tagger-english
  3. Set up your TreeTagger and readability options in R: set.kRp.env(TT.cmd="/usr/bin/TreeTagger/cmd/tree-tagger-english", lang="en")
  4. Write your text (here a character vector called words) to a file: tf <- tempfile(); write(words, tf)
  5. Run the readability function on that file and keep the result: rdb <- readability(tf)
  6. Get a value out: rdb@Flesch.Kincaid$grade
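
Steps 3 to 6 can be put together in one short script. This is only a minimal sketch, assuming TreeTagger lives at the path used above and that readability() accepts a file path as described in this post. The sample text, the variable tt_cmd and the guard around the koRpus calls are my own additions for illustration. Newer koRpus releases may also need a separate language package (such as koRpus.lang.en) to be installed and loaded, which this sketch does not do.

```r
# Sketch of steps 3-6. The path and the sample text are assumptions;
# change them to match your own TreeTagger install and your own text.
tt_cmd <- "/usr/bin/TreeTagger/cmd/tree-tagger-english"
words  <- "Hello, my name is Rikki. I am trying to measure readability in R."

# Step 4: write the text to a temporary file.
tf <- tempfile(fileext = ".txt")
write(words, tf)

# Only call koRpus if both the package and the TreeTagger script are present.
if (requireNamespace("koRpus", quietly = TRUE) && file.exists(tt_cmd)) {
  library(koRpus)

  # Step 3: tell koRpus where TreeTagger is and which language to use.
  set.kRp.env(TT.cmd = tt_cmd, lang = "en")

  # Step 5: tag the file and calculate only the Flesch-Kincaid index.
  rdb <- readability(tf, index = "Flesch.Kincaid")

  # Step 6: pull the grade level out of the result object.
  print(rdb@Flesch.Kincaid$grade)
} else {
  message("koRpus or TreeTagger not found; complete steps 1 and 2 first.")
}
```

Restricting index to "Flesch.Kincaid" keeps the run short and avoids warnings from indices that need extra word lists. Expect koRpus to warn about very short texts like this one, so use a longer passage for a meaningful score.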

There we go. Way more complicated than it needed to be, but that’s how you do it. Install an application that the R library interfaces with, write your words to a temporary file and then call the function. Any questions, pop them in the comments below!


One Comment
  1. M.J. van Dieijen

    Hi Rikki,

    Thanks for the explanation. I performed all the steps (for Windows I used the explanation on this site: http://www.smo.uhi.ac.uk/~oduibhin/oideasra/interfaces/winttinterface_old.htm), but I’m still not able to calculate it.

    I get the following error in R:

    readability <- readability(text, hyphen = NULL, index = c("Flesch.Kincaid","ARI"))
    Error in matrix(unlist(strsplit(tagged.text, "\t")), ncol = 3, byrow = TRUE, :
    'data' must be of a vector type, was 'NULL'
    In addition: Warning message:
    running command 'C:\Windows\system32\cmd.exe /c C:\Program Files\TreeTagger\bin\tag-english.bat D:\CB speeches txt\2012_04_11_JY.txt' had status 1

    And I have no idea what the problem is. Perhaps I made a mistake with the TreeTagger installation?
    Any input you might have is greatly appreciated, I hope you can help.
    Thanks!
