Analyze text with tokenization, stemming, and word statistics
How to Use
Input Text: Enter or paste any text you want to analyze in the text area.
Options:
Convert to lowercase: Converts all tokens to lowercase for case-insensitive analysis.
Remove punctuation: Strips punctuation marks from tokens.
Remove numbers: Filters out tokens that are purely numeric.
Apply Porter Stemmer: Reduces each word to its stem (e.g., "running" becomes "run").
Tokenization: The text is split into individual words (tokens) based on whitespace and punctuation.
Statistics: View the total number of tokens, the number of unique tokens, the average token length, the longest token, and the token frequency distribution.
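The steps above can be sketched in a few lines of Python. This is only an illustration of the general approach, not the tool's actual code: the function names `tokenize` and `token_statistics`, their parameters, and the regular expression used for splitting are assumptions made for this example.

```python
import re
from collections import Counter


def tokenize(text, lowercase=True, remove_punctuation=True, remove_numbers=False):
    """Split text into tokens and apply the optional filters (hypothetical helper)."""
    if lowercase:
        text = text.lower()
    # Runs of word characters become tokens; each punctuation mark is its own token.
    tokens = re.findall(r"\w+|[^\w\s]", text)
    if remove_punctuation:
        # Keep only tokens made entirely of word characters.
        tokens = [t for t in tokens if re.fullmatch(r"\w+", t)]
    if remove_numbers:
        # Drop tokens that are purely numeric.
        tokens = [t for t in tokens if not t.isdigit()]
    return tokens


def token_statistics(tokens):
    """Compute the summary figures and the frequency table (hypothetical helper)."""
    counts = Counter(tokens)
    total = len(tokens)
    # One row per distinct token: (rank, token, frequency, percentage).
    frequency = [
        (rank, token, count, round(100 * count / total, 1))
        for rank, (token, count) in enumerate(counts.most_common(), start=1)
    ]
    return {
        "total_tokens": total,
        "unique_tokens": len(counts),
        "average_length": round(sum(len(t) for t in tokens) / total, 2) if total else 0,
        "longest_token": max(tokens, key=len) if tokens else "-",
        "frequency": frequency,
    }


tokens = tokenize("The cat sat. The cat ran 42 times!", remove_numbers=True)
stats = token_statistics(tokens)
print(tokens)
print(stats["total_tokens"], stats["unique_tokens"], stats["longest_token"])
```

In this sketch, lowercasing happens before splitting so that "The" and "the" are counted as the same token, and the numeric filter runs last so that it only sees whole tokens.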
About Porter Stemmer
The Porter Stemming Algorithm (or "Porter Stemmer") removes common morphological and inflectional endings from English words, reducing each word to a stem:
running → run
flies → fli
happily → happili
connection → connect
nationalizing → nation
Note: The stems produced are not always valid English words (for example, "fli" and "happili"), but they group related words together for analysis.
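To show why a stem such as "fli" appears, here is a minimal sketch of just the first rule set of the Porter algorithm (Step 1a, which handles plural endings). It is not a complete stemmer: the later steps, which handle endings such as "-ing", "-ly", and "-ion", are omitted, and the function names `porter_step_1a` and `group_by_stem` are made up for this example.

```python
from collections import defaultdict


def porter_step_1a(word):
    """Apply only Step 1a of the Porter algorithm (plural endings)."""
    if word.endswith("sses"):
        return word[:-2]  # caresses -> caress
    if word.endswith("ies"):
        return word[:-2]  # flies -> fli
    if word.endswith("ss"):
        return word       # caress stays caress
    if word.endswith("s"):
        return word[:-1]  # cats -> cat
    return word


def group_by_stem(words, stem=porter_step_1a):
    """Group words that share a stem (hypothetical helper for illustration)."""
    groups = defaultdict(list)
    for word in words:
        groups[stem(word)].append(word)
    return dict(groups)


print(porter_step_1a("flies"))
print(group_by_stem(["cat", "cats", "caress", "caresses"]))
```

The rules are tried from the longest suffix to the shortest, so "caresses" matches "sses" before the plain "s" rule can apply. Grouping by stem is what lets a frequency count treat "cat" and "cats" as one entry.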