DATA DOUBLE CONFIRM

Text frequency analysis - Process - R

This tutorial shows how to get the frequency distribution of the words used in a chunk of text. It is a simpler alternative to a more elaborate text mining post that involves automatic removal of stopwords such as "the", "a" and "and".

The script breaks the chunk of text into a dataframe of words, and these words are run through a text cleaning function that removes punctuation and symbols. The function can be modified to include text stemming if necessary; stemming reduces words to their root forms.
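As a rough illustration of that flow, here is a minimal sketch in base R. It is not the original script: the sample text, the `clean_word` helper and the column names `word` and `count` are all assumptions made up for this example, and the cleaning step only lower-cases words and strips ASCII punctuation.

```r
# Hypothetical cleaning helper (an assumption, not the post's own function):
# lower-case each word and strip punctuation/symbols.
clean_word <- function(x) {
  x <- tolower(x)
  gsub("[[:punct:]]", "", x)
}

# Made-up sample text for illustration.
text <- "The cat sat on the mat. The mat, it seems, was warm!"

# Break the chunk of text into a dataframe with one word per row.
words <- data.frame(
  word = unlist(strsplit(text, "[[:space:]]+")),
  stringsAsFactors = FALSE
)

# Run every word through the cleaning function and drop empty leftovers.
words$word <- clean_word(words$word)
words <- words[words$word != "", , drop = FALSE]

# Count how often each word occurs and sort by descending frequency.
freq <- as.data.frame(table(words$word), stringsAsFactors = FALSE)
names(freq) <- c("word", "count")
freq <- freq[order(-freq$count, freq$word), ]
rownames(freq) <- NULL

print(freq)
```

With this sample text, "the" comes out on top with a count of 3, followed by "mat" with 2. Stopwords are deliberately left in, in keeping with the simpler approach described here; a stemming step could be added inside `clean_word` if root words are needed.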
