Masterclass: Social Media
Teaching
A masterclass in social media data collection for the QSTEP centre
The QSTEP centre at the University of Warwick asked me to deliver a Social Media Data Analysis workshop in R. I decided to write the masterclass using the tidyverse packages wherever possible. It was a delightful learning experience, and I would encourage anyone to take on the challenge of putting together such a workshop.
The workshop covers methods for downloading, analysing and visualising social media data using the R programming language. We use the ‘tidyverse’ in R and (optionally) the spaCy Python module for natural language processing.
Outline
The structure of the workshop is as follows:
Stage | Title | Detail | R package(s) |
---|---|---|---|
Introduction | Overview of the day | | |
Introduction | R intro | An introduction to R | ggplot2, tidyverse |
Collection | Scraping | Downloading and filtering HTML pages | rvest, tidyverse, magrittr, ggplot2, tibble |
Collection | API and data dumps | Accessing data directly using APIs | httr, jsonlite, dplyr, textclean, stringr, ggplot2, tidyverse, magrittr, tibble, twitteR, RedditExtractoR |
Analysis | Summarising | Tidyverse-enabled summaries of our collected data | tidyverse, tidytext, dplyr, tidyr |
Analysis | Text analysis | Applying numerical analysis to our text | tidytext, tidyverse, dplyr, stringr, RedditExtractoR, tidyr, igraph, ggraph, wordcloud, reshape2, tm, topicmodels |
Analysis | Natural Language | Optional section using the cleanNLP package | cleanNLP, tibble, tidyverse, RedditExtractoR, reticulate |
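
To give a flavour of the summarising stage, here is a minimal sketch of a tidyverse-style word count using dplyr and tidytext. It is not taken from the workshop materials: the `posts` data frame and its contents are made up for illustration, and `word_counts` is just a name chosen here.

```r
library(dplyr)
library(tibble)
library(tidytext)

# Hypothetical toy data standing in for collected social media posts
posts <- tibble(
  id   = 1:3,
  text = c(
    "Collecting social media data with R",
    "Tidy data makes text analysis easier",
    "Visualising data from Reddit threads"
  )
)

# One row per word, drop common English stop words, then count occurrences
word_counts <- posts %>%
  unnest_tokens(word, text) %>%
  anti_join(stop_words, by = "word") %>%
  count(word, sort = TRUE)

print(word_counts)
```

The same one-token-per-row shape feeds naturally into the later stages, for example plotting the counts with ggplot2.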