Text Tokenizer & Stemmer

How to Use

Input Text: Enter or paste any text you want to analyze in the text area.
Options:
- Convert to lowercase: Converts all tokens to lowercase for case-insensitive analysis.
- Remove punctuation: Strips punctuation marks from tokens.
- Remove numbers: Filters out tokens that are purely numeric.
- Apply Porter Stemmer: Reduces each word to its stem (e.g., "running" becomes "run").
Tokenization: The text is split into individual words (tokens) based on whitespace and punctuation.
Statistics: View the total number of tokens, the number of unique tokens, the average token length, the longest token, and the token frequency distribution.
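
The steps above can be sketched in Python. This is a minimal illustration, not the tool's actual implementation: the function names tokenize and token_stats, the option names, and the regular expression used for splitting are all assumptions, and the real tool may treat apostrophes, hyphens, or decimals differently.

```python
import re
from collections import Counter

def tokenize(text, lowercase=True, remove_punctuation=True, remove_numbers=False):
    """Split text into tokens and apply the selected options (hypothetical helper)."""
    # Assumed splitting rule: a token is a run of word characters,
    # or a single character that is neither a word character nor whitespace.
    tokens = re.findall(r"\w+|[^\w\s]", text)
    if lowercase:
        tokens = [t.lower() for t in tokens]
    if remove_punctuation:
        # Keep only tokens that contain at least one letter or digit.
        tokens = [t for t in tokens if any(ch.isalnum() for ch in t)]
    if remove_numbers:
        # Drop tokens made up entirely of digits.
        tokens = [t for t in tokens if not t.isdigit()]
    return tokens

def token_stats(tokens):
    """Compute the statistics listed above for a list of tokens (hypothetical helper)."""
    counts = Counter(tokens)
    total = len(tokens)
    return {
        "total": total,
        "unique": len(counts),
        "average_length": sum(len(t) for t in tokens) / total if total else 0.0,
        "longest": max(tokens, key=len) if tokens else "",
        "frequency": counts.most_common(),  # (token, count) pairs, most frequent first
    }

tokens = tokenize("The cat sat. The cat ran 2 times!", remove_numbers=True)
stats = token_stats(tokens)
```

With all three options enabled, the sample sentence yields seven tokens, five of them unique, and "times" is the longest.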

About Porter Stemmer

The Porter Stemming Algorithm (or 'Porter Stemmer') is a process for removing common morphological and inflexional endings from words in English. It reduces related word forms to a common stem, for example:

running → run
flies → fli
happily → happili
connection → connect
nationalizing → nation

Note: The stems the algorithm produces are not always valid English words (for example, "fli" and "happili"), but they group related word forms together for analysis.
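
To show why stems such as "fli" and "happili" arise, here is a partial sketch of two steps from the published Porter algorithm: Step 1a (plural endings) and Step 1c (a final "y" becomes "i" when the rest of the word contains a vowel). It is deliberately incomplete. Step 1b and Steps 2 to 5 are omitted, so it does not handle "running", "connection", or "nationalizing", and the function names are illustrative rather than taken from the tool.

```python
VOWELS = "aeiou"

def is_consonant(word, i):
    """Porter's definition: 'y' counts as a vowel when it follows a consonant."""
    ch = word[i]
    if ch in VOWELS:
        return False
    if ch == "y":
        return i == 0 or not is_consonant(word, i - 1)
    return True

def contains_vowel(stem):
    """True if any position in the stem is a vowel under Porter's definition."""
    return any(not is_consonant(stem, i) for i in range(len(stem)))

def step_1a(word):
    """Step 1a: sses -> ss, ies -> i, ss -> ss, s -> (removed)."""
    if word.endswith("sses"):
        return word[:-2]
    if word.endswith("ies"):
        return word[:-2]
    if word.endswith("ss"):
        return word
    if word.endswith("s"):
        return word[:-1]
    return word

def step_1c(word):
    """Step 1c: a final 'y' becomes 'i' if the preceding part contains a vowel."""
    if word.endswith("y") and contains_vowel(word[:-1]):
        return word[:-1] + "i"
    return word

def partial_stem(word):
    """Apply only Steps 1a and 1c; NOT a complete Porter stemmer."""
    return step_1c(step_1a(word.lower()))
```

Under these two rules, partial_stem("flies") gives "fli" and partial_stem("happily") gives "happili", matching the examples above, while "running" is returned unchanged because Step 1b is not implemented here.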
