Most Common Word
Difficulty: Easy

Description
Given a string paragraph and a string array of the banned words banned, return the most frequent word that is not banned. It is guaranteed there is at least one word that is not banned, and that the answer is unique.
The words in paragraph are case-insensitive and the answer should be returned in lowercase.
Note that words cannot contain punctuation symbols.
Example 1:
Input: paragraph = "Bob hit a ball, the hit BALL flew far after it was hit.", banned = ["hit"]
Output: "ball"
Explanation: "hit" occurs 3 times, but it is a banned word. "ball" occurs twice (and no other word does), so it is the most frequent non-banned word in the paragraph. Note that words in the paragraph are not case sensitive, that punctuation is ignored (even if adjacent to words, such as "ball,"), and that "hit" isn't the answer even though it occurs more because it is banned.
Example 2:
Input: paragraph = "a.", banned = []
Output: "a"
Constraints:
- 1 <= paragraph.length <= 1000
- paragraph consists of English letters, space ' ', or one of the symbols: "!?',;.".
- 0 <= banned.length <= 100
- 1 <= banned[i].length <= 10
- banned[i] consists of only lowercase English letters.
Approaches
Check out two different approaches to solving Most Common Word. Click on an approach to view its explanation and algorithm in detail.
Optimized Approach with Hash Set
This is the standard and most efficient approach. It improves upon the brute-force method by using a HashSet to store the banned words. This allows for checking if a word is banned in constant average time, significantly speeding up the process.
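As a quick illustration of the constant-time lookup this approach relies on, the banned array can be loaded into a HashSet like so (a minimal sketch using the banned list from Example 1):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class BannedSetDemo {
    public static void main(String[] args) {
        String[] banned = {"hit"};
        // Loading the array into a HashSet gives O(1) average-time membership checks,
        // versus O(n) for scanning the array on every word.
        Set<String> bannedSet = new HashSet<>(Arrays.asList(banned));
        System.out.println(bannedSet.contains("hit"));  // true
        System.out.println(bannedSet.contains("ball")); // false
    }
}
```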
Algorithm
- Create a HashSet<String> from the banned array to store banned words for efficient lookup.
- Normalize the paragraph by converting it to lowercase and splitting it into an array of words using spaces and punctuation as delimiters ([\s!?',;.]+).
- Initialize a HashMap<String, Integer> to store the frequencies of non-banned words.
- Iterate through each word from the split paragraph.
- For each word, check if it is present in the banned set. This check has an average time complexity of O(1).
- If the word is not empty and not in the banned set, update its count in the HashMap.
- After populating the map, find the entry with the maximum value. This can be done efficiently using Collections.max() with a custom comparator.
- Return the key (the word) of the entry with the highest frequency.
This optimized solution leverages the strengths of hash-based data structures to achieve linear time complexity. First, the banned words are added to a HashSet. This is a crucial step, as it allows us to check if a word is banned in O(1) average time, a massive improvement over the linear scan of a list.
Next, the paragraph is normalized. It's converted to lowercase, and then all punctuation is effectively removed by splitting the string by a regular expression that matches any sequence of spaces or punctuation marks. This gives us a clean array of words.
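To see the normalization step in isolation, here is a small sketch splitting the paragraph from Example 1 with that regular expression. Note that if a paragraph began with punctuation, split would emit a leading empty string, which is why the counting loop skips empty words:

```java
import java.util.Arrays;

public class SplitDemo {
    public static void main(String[] args) {
        String paragraph = "Bob hit a ball, the hit BALL flew far after it was hit.";
        // Lowercase first, then split on any run of whitespace or punctuation.
        String[] words = paragraph.toLowerCase().split("[\\s!?',;.]+");
        System.out.println(Arrays.toString(words));
        // [bob, hit, a, ball, the, hit, ball, flew, far, after, it, was, hit]
    }
}
```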
We then iterate through these words. For each word, we perform a quick check against the bannedSet. If the word is not banned, we update its frequency in a HashMap. Finally, instead of manually iterating to find the maximum frequency, we can use the elegant Collections.max() method on the map's entry set to find the most common word in a single line.
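The Collections.max() call can be sketched on a tiny hand-built frequency map (the counts below mirror Example 1 after "hit" is banned):

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class MaxEntryDemo {
    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("ball", 2);
        counts.put("bob", 1);
        counts.put("flew", 1);
        // Collections.max scans the entry set once, comparing entries by value,
        // and returns the entry with the highest count.
        String mostCommon =
            Collections.max(counts.entrySet(), Map.Entry.comparingByValue()).getKey();
        System.out.println(mostCommon); // ball
    }
}
```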
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.HashSet;
import java.util.Arrays;
import java.util.Collections;

class Solution {
    public String mostCommonWord(String paragraph, String[] banned) {
        // 1. Store banned words in a Set for O(1) lookups
        Set<String> bannedSet = new HashSet<>(Arrays.asList(banned));

        // 2. Normalize and split the paragraph
        String[] words = paragraph.toLowerCase().split("[\\s!?',;.]+");

        // 3. Count word frequencies
        Map<String, Integer> wordCounts = new HashMap<>();
        for (String word : words) {
            if (!word.isEmpty() && !bannedSet.contains(word)) {
                wordCounts.put(word, wordCounts.getOrDefault(word, 0) + 1);
            }
        }

        // 4. Find the most frequent word
        // The problem guarantees an answer exists, so the map is not empty.
        return Collections.max(wordCounts.entrySet(), Map.Entry.comparingByValue()).getKey();
    }
}
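The approach above can be exercised against both examples as follows (the logic is inlined here so the sketch compiles on its own):

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class SolutionDemo {
    // Same logic as the Solution class above, inlined for a standalone demo.
    static String mostCommonWord(String paragraph, String[] banned) {
        Set<String> bannedSet = new HashSet<>(Arrays.asList(banned));
        Map<String, Integer> counts = new HashMap<>();
        for (String word : paragraph.toLowerCase().split("[\\s!?',;.]+")) {
            if (!word.isEmpty() && !bannedSet.contains(word)) {
                counts.put(word, counts.getOrDefault(word, 0) + 1);
            }
        }
        return Collections.max(counts.entrySet(), Map.Entry.comparingByValue()).getKey();
    }

    public static void main(String[] args) {
        String paragraph = "Bob hit a ball, the hit BALL flew far after it was hit.";
        System.out.println(mostCommonWord(paragraph, new String[]{"hit"})); // ball
        System.out.println(mostCommonWord("a.", new String[]{}));           // a
    }
}
```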
Complexity Analysis

Time complexity: O(N + M), where N is the length of paragraph and M is the total length of the banned words. Splitting, counting, and the final maximum scan each touch every word a constant number of times, and each set lookup is O(1) on average.
Space complexity: O(N + M) for the banned set, the split word array, and the frequency map.
Pros and Cons

Pros:
- Highly efficient, with linear time complexity relative to the total input size.
- Optimal use of data structures (HashSet, HashMap) for their respective tasks (fast lookups and frequency counting).
- The code is clean, concise, and scalable for larger inputs.

Cons:
- Requires slightly more memory to store the HashSet of banned words, though this is a worthwhile trade-off for the significant performance improvement.
Code Solutions
Check out solutions in different languages for Most Common Word. Click on a language to view the code.
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Solution {
    private static final Pattern pattern = Pattern.compile("[a-z]+");

    public String mostCommonWord(String paragraph, String[] banned) {
        Set<String> bannedWords = new HashSet<>();
        for (String word : banned) {
            bannedWords.add(word);
        }

        // Extract lowercase letter runs; punctuation and spaces never match.
        Map<String, Integer> counter = new HashMap<>();
        Matcher matcher = pattern.matcher(paragraph.toLowerCase());
        while (matcher.find()) {
            String word = matcher.group();
            if (bannedWords.contains(word)) {
                continue;
            }
            counter.put(word, counter.getOrDefault(word, 0) + 1);
        }

        // Scan the counts for the maximum frequency.
        int max = Integer.MIN_VALUE;
        String ans = null;
        for (Map.Entry<String, Integer> entry : counter.entrySet()) {
            if (entry.getValue() > max) {
                max = entry.getValue();
                ans = entry.getKey();
            }
        }
        return ans;
    }
}
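This variant tokenizes with Pattern/Matcher instead of split. Matching runs of lowercase letters never produces empty tokens, so no emptiness check is needed. A minimal sketch of that extraction step on a fragment of Example 1:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatcherDemo {
    public static void main(String[] args) {
        // "[a-z]+" matches maximal runs of lowercase letters, skipping
        // spaces and punctuation entirely.
        Pattern pattern = Pattern.compile("[a-z]+");
        Matcher matcher = pattern.matcher("Bob hit a ball,".toLowerCase());
        List<String> words = new ArrayList<>();
        while (matcher.find()) {
            words.add(matcher.group());
        }
        System.out.println(words); // [bob, hit, a, ball]
    }
}
```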