Most Common Word

EASY

Description

Given a string paragraph and a string array of the banned words banned, return the most frequent word that is not banned. It is guaranteed there is at least one word that is not banned, and that the answer is unique.

The words in paragraph are case-insensitive and the answer should be returned in lowercase.

Note that words can not contain punctuation symbols.

Example 1:

Input: paragraph = "Bob hit a ball, the hit BALL flew far after it was hit.", banned = ["hit"]
Output: "ball"
Explanation: 
"hit" occurs 3 times, but it is a banned word.
"ball" occurs twice (and no other word does), so it is the most frequent non-banned word in the paragraph. 
Note that words in the paragraph are not case sensitive,
that punctuation is ignored (even if adjacent to words, such as "ball,"), 
and that "hit" isn't the answer even though it occurs more because it is banned.

Example 2:

Input: paragraph = "a.", banned = []
Output: "a"

Constraints:

1 <= paragraph.length <= 1000
paragraph consists of English letters, space ' ', or one of the symbols: "!?',;.".
0 <= banned.length <= 100
1 <= banned[i].length <= 10
banned[i] consists of only lowercase English letters.

Approaches

Checkout 2 different approaches to solve Most Common Word. Click on different approaches to view the approach and algorithm in detail.

Optimized Approach with Hash Set

This is the standard and most efficient approach. It improves upon the brute-force method by using a HashSet to store the banned words. This allows for checking if a word is banned in constant average time, significantly speeding up the process.

Algorithm

Create a HashSet<String> from the banned array to store banned words for efficient lookup.
Normalize the paragraph by converting it to lowercase and splitting it into an array of words using spaces and punctuation as delimiters ([\s!?',;.]+).
Initialize a HashMap<String, Integer> to store the frequencies of non-banned words.
Iterate through each word from the split paragraph.
For each word, check if it is present in the banned set. This check has an average time complexity of O(1).
If the word is not empty and not in the banned set, update its count in the HashMap.
After populating the map, find the entry with the maximum value. This can be done efficiently using Collections.max() with a custom comparator.
Return the key (the word) of the entry with the highest frequency.

This optimized solution leverages the strengths of hash-based data structures to achieve linear time complexity. First, the banned words are added to a HashSet. This is a crucial step, as it allows us to check if a word is banned in O(1) average time, a massive improvement over the linear scan of a list.

Next, the paragraph is normalized. It's converted to lowercase, and then all punctuation is effectively removed by splitting the string by a regular expression that matches any sequence of spaces or punctuation marks. This gives us a clean array of words.

We then iterate through these words. For each word, we perform a quick check against the bannedSet. If the word is not banned, we update its frequency in a HashMap. Finally, instead of manually iterating to find the maximum frequency, we can use the elegant Collections.max() method on the map's entry set to find the most common word in a single line.

import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.HashSet;
import java.util.Arrays;
import java.util.Collections;

class Solution {
    public String mostCommonWord(String paragraph, String[] banned) {
        // 1. Store Banned Words in a Set for O(1) lookups
        Set<String> bannedSet = new HashSet<>(Arrays.asList(banned));

        // 2. Normalize and Split Paragraph
        String[] words = paragraph.toLowerCase().split("[\\s!?',;.]+");

        // 3. Count Word Frequencies
        Map<String, Integer> wordCounts = new HashMap<>();
        for (String word : words) {
            if (!word.isEmpty() && !bannedSet.contains(word)) {
                wordCounts.put(word, wordCounts.getOrDefault(word, 0) + 1);
            }
        }

        // 4. Find Most Frequent Word
        // The problem guarantees an answer exists, so the map is not empty.
        return Collections.max(wordCounts.entrySet(), Map.Entry.comparingByValue()).getKey();
    }
}

Complexity Analysis

Time Complexity: O(N + C), where N is the number of characters in the paragraph and C is the total number of characters in the `banned` list. Creating the set is O(C), processing the paragraph is O(N), and counting words is O(N). This is linear in the total input size.Space Complexity: O(N + C), where N is the number of characters in the paragraph and C is the total number of characters in the `banned` list. Space is required for the `bannedSet` and the `wordCounts` map.

Pros and Cons

Pros:

Highly efficient with a linear time complexity relative to the total input size.
Optimal use of data structures (HashSet, HashMap) for their respective tasks (fast lookups and frequency counting).
The code is clean, concise, and scalable for larger inputs.

Cons:

Requires slightly more memory to store the HashSet for banned words, though this is a worthwhile trade-off for the significant performance improvement.

Code Solutions

Checking out 3 solutions in different languages for Most Common Word. Click on different languages to view the code.

Video Solution

Watch the video walkthrough for Most Common Word

Patterns:

Counting

Data Structures:

ArrayHash TableString

Companies:

Datadog |