LE-CAT: Lexicon-based Categorization and Analysis Tool

Software
R package and shiny app for lexicon based categorisation
Author

James Tripp, Noortje Marres

Published

March 20, 2020

Code on GitHub

Project Status: Active – The project has reached a stable, usable state and is being actively developed. codecov R-CMD-check License: MIT DOI

I developed LE-CAT for Professor Noortje Marres. The software is written in R and searches for words in a text. Each word is associated with a category. The presence of the word (e.g., Tony Blair) is assumed to indicate the presence of the category in the text (e.g., politics). Summary information about category co-occurence is displayed displayed to the user as is a network of category and word co-occurence.

An interesting feature of the software is that it is an R package containing a shiny app. I could teach with the tool to users who (a) were comfortable with R, (b) preferred a GUI interface running within R (the shiny app), and (c) much preferred only a web browser interface. At CIM I ran LE-CAT within docker containers via the shinyproxy application so that each user could log into one of our CIM servers and use LE-CAT without R installed.

The software was used for both research and in teaching environments.

Project Description

LE-CAT is a Lexicon-based Categorization and Analysis Tool developed by the Centre for Interdisciplinary Methodologies in collaboration with the Media of Cooperation Group at the University of Siegen.

The tool allows you to apply a set of word queries associated with a category (a lexicon) to a data set of textual sources (the corpus). LE-CAT determines the frequency of occurrence for each query and category in the corpus, as well as the relations between categories (co-occurrence) by source.

The purpose of this technique is to automate and scale up user-led data analysis as it allows the application of a custom-built Lexicon to large data sets. The quick iteration of analysis allows the user to refine a corpus and deeply analyse a given phenomenon.

LE-CAT was coded by James Tripp. It has been used to support the workshops Youtube as Test Society (University of Siegen), Parking on Twitter (University of Warwick) and the Digital Test of the News (University of Warwick) and is part of the CIM module Digital Object, Digital Methods.

Academic correspondence should be sent to Noortje Marres.