Description
Prepare the text data for analysis
- Eliminate stop words, punctuation, and digits, and convert the text to lowercase
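The preprocessing step above can be sketched in plain Python. This is a minimal illustration: the stop-word list here is a tiny stand-in (a real analysis would use a fuller list such as NLTK's stopwords corpus), and the regex drops everything except letters, which removes punctuation and digits in one pass.

```python
import re

# Tiny illustrative stop-word list; assumption: a real run would use a
# fuller list (e.g. NLTK's stopwords corpus).
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it"}

def preprocess(text):
    """Lowercase, strip punctuation/digits, and drop stop words."""
    text = text.lower()
    # Replace anything that is not a lowercase letter or whitespace
    # (i.e. punctuation and digits) with a space.
    text = re.sub(r"[^a-z\s]", " ", text)
    return [t for t in text.split() if t not in STOP_WORDS]

sample = "The 2 cats sat on the mat, and the dog barked!"
print(preprocess(sample))  # → ['cats', 'sat', 'on', 'mat', 'dog', 'barked']
```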
Identify the 10 most frequently used words in the text
- Also identify the ten least frequently used words
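Both the most and least frequent words fall out of a single `collections.Counter`. A toy token list is used here for brevity; on the real corpus you would pass the preprocessed tokens and use `n = 10` instead of 3. Note that ties at the low end are ordered arbitrarily, so the "least frequent" list is not unique.

```python
from collections import Counter

# Toy token list standing in for the preprocessed corpus tokens.
tokens = ["data", "text", "data", "word", "text",
          "data", "model", "word", "text", "topic"]

counts = Counter(tokens)
top = counts.most_common(3)        # most frequent (use 10 on real data)
least = counts.most_common()[-3:]  # least frequent; tie order is arbitrary
print(top)
print(least)
```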
How does lemmatization change the most/least frequent words?
Explain and demonstrate the effect
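To show how lemmatization shifts the frequency counts, the sketch below uses a toy suffix-stripping function as a stand-in for a real lemmatizer (an assumption for portability; in practice you would use NLTK's `WordNetLemmatizer` or spaCy). The point is that inflected forms collapse onto one lemma, so their counts merge.

```python
from collections import Counter

def toy_lemmatize(word):
    """Toy suffix stripper standing in for a real lemmatizer
    (e.g. NLTK's WordNetLemmatizer); illustration only."""
    for suffix in ("ies", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            stem = word[: -len(suffix)]
            return stem + "y" if suffix == "ies" else stem
    return word

tokens = ["model", "models", "topic", "topics", "studies", "study"]
before = Counter(tokens)
after = Counter(toy_lemmatize(t) for t in tokens)
print(before.most_common())  # every surface form counted separately
print(after.most_common())   # inflected forms merged into one lemma each
```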
- Generate a word cloud for the text
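A word cloud is usually rendered with the third-party `wordcloud` package. The runnable part below just builds the word→frequency mapping that `WordCloud.generate_from_frequencies` consumes; the rendering call itself is shown commented out since it needs `wordcloud` and `matplotlib` installed.

```python
from collections import Counter

# Toy tokens standing in for the preprocessed corpus.
tokens = ["topic", "model", "topic", "text", "data", "topic", "data"]
freqs = dict(Counter(tokens))
print(freqs)

# With the `wordcloud` package installed, rendering would look like:
# from wordcloud import WordCloud
# import matplotlib.pyplot as plt
# cloud = WordCloud(width=800, height=400).generate_from_frequencies(freqs)
# plt.imshow(cloud); plt.axis("off"); plt.show()
```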
- Demonstrate the generation of n-grams and part of speech tagging
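N-gram generation is a sliding window over the token list, shown here in plain Python. Part-of-speech tagging requires a trained tagger, so the NLTK call is shown commented out (it assumes NLTK and its `averaged_perceptron_tagger` data are available).

```python
def ngrams(tokens, n):
    """Slide a window of size n over the token list."""
    return [tuple(tokens[i : i + n]) for i in range(len(tokens) - n + 1)]

tokens = ["natural", "language", "processing", "is", "fun"]
print(ngrams(tokens, 2))  # bigrams
print(ngrams(tokens, 3))  # trigrams

# Part-of-speech tagging needs a trained model; with NLTK (and its
# 'averaged_perceptron_tagger' data downloaded) it would be:
# import nltk
# print(nltk.pos_tag(tokens))
```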
Create a topic model of the text
- Find the optimal number of topics
- Test the accuracy of your model
Display your results two different ways:
1) Print the topics and explain any insights at this point.
2) Graph the topics and explain any insights at this point.
- Important: Make sure you provide complete and thorough explanations for all of your analysis. You need to defend your thought processes and reasoning.