Masterclass: Social Media

A masterclass in social media data collection for the QSTEP centre

James Tripp


March 18, 2019

Code on GitHub


I was asked by the QSTEP centre at the University of Warwick to deliver a Social Media Data Analysis workshop in R. I decided to write the masterclass using the tidyverse packages wherever possible. It was a delightful learning experience, and I would encourage anyone to take on the challenge of putting together such a workshop.

The workshop covers methods for downloading, analysing and visualising social media data using the R programming language. We use the ‘tidyverse’ in R and (optionally) the spaCy Python module for natural language processing.


The structure of the workshop is as follows:

| Stage | Title | Detail | R package(s) |
|---|---|---|---|
| Introduction | Overview of the day | | |
| | R intro | An introduction to R | ggplot2, tidyverse |
| Collection | Scraping | Downloading and filtering HTML pages | rvest, tidyverse, magrittr, ggplot2, tibble |
| | API and data dumps | Accessing data directly using APIs | httr, jsonlite, dplyr, textclean, stringr, ggplot2, tidyverse, magrittr, tibble, twitteR, RedditExtractoR |
| Analysis | Summarising | Tidyverse-enabled summaries of our collected data | tidyverse, tidytext, dplyr, tidyr |
| | Text analysis | Applying numerical analysis to our text | tidytext, tidyverse, dplyr, stringr, RedditExtractoR, tidyr, igraph, ggraph, wordcloud, reshape2, tm, topicmodels |
| | Natural Language | Optional section using the cleanNLP package | cleanNLP, tibble, tidyverse, RedditExtractoR, reticulate |
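
To give a flavour of the tidyverse style used in the analysis stages, here is a minimal sketch of counting words across a handful of posts. It is not taken from the workshop materials: the `posts` tibble and its contents are made up for illustration, and the tokenisation uses plain stringr and tidyr rather than tidytext, whose `unnest_tokens()` function performs a similar step.

```r
library(dplyr)
library(stringr)
library(tidyr)
library(tibble)

# Hypothetical posts standing in for collected social media data
posts <- tibble(
  id = 1:3,
  text = c(
    "R makes text analysis fun",
    "Text analysis in R is tidy",
    "Tidy data makes analysis easier"
  )
)

# One row per word: lower-case the text, split on whitespace,
# then unnest the resulting list column
words <- posts %>%
  mutate(word = str_split(str_to_lower(text), "\\s+")) %>%
  select(id, word) %>%
  unnest(word)

# Count how often each word appears across all posts
word_counts <- words %>%
  count(word, sort = TRUE)

print(word_counts)
```

Keeping one word per row means the usual dplyr verbs (`count()`, `filter()`, `group_by()`) can be applied directly to text, which is the idea behind the summarising and text analysis sessions.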