The code is organized as a package with some additional structure based on the workflowr
package. Both this repo and the workflowr
package need to be installed to rebuild the analysis. (This could also be done directly with rmarkdown
)
install.packages('workflowr')
install.packages('devtools')
devtools::install_github('nateaff/legolda')
The analysis also uses the following packages:
purrr, tidyr, forcats, stringr, ggplot2, dplyr
# For text mining
topicmodels, SpeedReader, tidytext, ldatuning, clues, waffle
After installing required packages and setting your working directory to the legolda
package you need to run:
library(legolda)
workflowr::wflow_build()
This builds the website version of the analysis you are reading now.
There are a few intermediate cached files that are produced by the LDA function. These take a long time to run and to re-run you will need to set from_cache
to FALSE
in the perplexity-cv.R
and train-model.R
files. (I might update how this option is changed.) This takes 8 hours or so to run on a larger AWS instance. You also have the option of passing the load_data
calls the sample_data = 1000
parameter to run on a subset of the data. The sample number refers to the number of lego sets used to build the models.