Longest Common Prefix of K Strings After Removal

HARD

Description

You are given an array of strings words and an integer k.

For each index i in the range [0, words.length - 1], find the length of the longest common prefix among any k strings (selected at distinct indices) from the remaining array after removing the i^th element.

Return an array answer, where answer[i] is the answer for i^th element. If removing the i^th element leaves the array with fewer than k strings, answer[i] is 0.

Example 1:

Input: words = ["jump","run","run","jump","run"], k = 2

Output: [3,4,4,3,4]

Explanation:

Removing index 0 ("jump"):
- words becomes: ["run", "run", "jump", "run"]. "run" occurs 3 times. Choosing any two gives the longest common prefix "run" (length 3).
Removing index 1 ("run"):
- words becomes: ["jump", "run", "jump", "run"]. "jump" occurs twice. Choosing these two gives the longest common prefix "jump" (length 4).
Removing index 2 ("run"):
- words becomes: ["jump", "run", "jump", "run"]. "jump" occurs twice. Choosing these two gives the longest common prefix "jump" (length 4).
Removing index 3 ("jump"):
- words becomes: ["jump", "run", "run", "run"]. "run" occurs 3 times. Choosing any two gives the longest common prefix "run" (length 3).
Removing index 4 ("run"):
- words becomes: ["jump", "run", "run", "jump"]. "jump" occurs twice. Choosing these two gives the longest common prefix "jump" (length 4).

Example 2:

Input: words = ["dog","racer","car"], k = 2

Output: [0,0,0]

Explanation:

Removing any index results in an answer of 0.

Constraints:

1 <= k <= words.length <= 10⁵
1 <= words[i].length <= 10⁴
words[i] consists of lowercase English letters.
The sum of words[i].length is smaller than or equal 10⁵.

Approaches

Checkout 2 different approaches to solve Longest Common Prefix of K Strings After Removal. Click on different approaches to view the approach and algorithm in detail.

Optimized Trie with Pre-computation

This efficient approach avoids rebuilding the Trie for each word. Instead, it builds a single Trie for all words and pre-computes necessary information. For each word to be removed, it uses this pre-computed data to quickly find the longest common prefix among the remaining strings.

Algorithm

Build a single Trie for all words in the input array. Each node should store a count of words passing through it.
Perform a post-order DFS traversal on the Trie to compute two values for each node u at depth d:
- maxLen1[u]: The maximum length of a prefix in the subtree of u (inclusive) whose original count is >= k.
- maxLen2[u]: The maximum length of a prefix in the subtree of u (inclusive) whose original count is >= k+1.
Initialize an answer array ans of the same size as words.
For each index i from 0 to words.length - 1:
- If words.length - 1 < k, set ans[i] = 0 and continue.
- Initialize currentMax = 0.
- Traverse the path for words[i] in the Trie, starting from the root. Let the current node be p.
- At each step of the traversal for words[i] (for character at depth d):
  - Let the next node on the path be childOnPath.
  - Iterate through all children c of the current node p. If a child c is not childOnPath, it represents a branch of prefixes not found in words[i]. Update currentMax = max(currentMax, c.maxLen1).
  - Move to the next node: p = childOnPath.
  - The prefix ending at the new node p is a prefix of words[i]. Its count will be reduced by 1. It is a valid LCP if its original count was >= k+1. If so, update currentMax = max(currentMax, d + 1).
- After the traversal for words[i] is complete, the final node p might have children. These branches are also not part of words[i]. For all children c of p, update currentMax = max(currentMax, c.maxLen1).
- Set ans[i] = currentMax.
Return ans.

The core idea is to build one Trie for all n words and then, for each i, calculate the result without modifying the Trie structure. First, we insert all words into a single Trie where each node stores a count of how many words pass through it.

The key insight is that when we remove words[i], the count of nodes corresponding to prefixes of words[i] decreases by 1, while counts for all other nodes remain the same. A prefix P is a candidate for the LCP if its count among the remaining n-1 strings is at least k.

If P is a prefix of words[i], its new count is original_count - 1. It's a candidate if original_count - 1 >= k, i.e., original_count >= k+1.
If P is not a prefix of words[i], its count is unchanged. It's a candidate if original_count >= k.

To find the maximum length efficiently, we pre-compute two values for each node u in the Trie using a post-order DFS traversal:

maxLen1: The maximum length of a prefix in the subtree of u (inclusive) whose original count is >= k.
maxLen2: The maximum length of a prefix in the subtree of u (inclusive) whose original count is >= k+1.

After this pre-computation, we iterate through each words[i] to calculate answer[i]. We traverse the path of words[i] in the Trie. At each node p on the path, the maximum LCP length can come from two sources:

Prefixes of words[i] itself. We check if count(p) >= k+1.
Prefixes not belonging to words[i]. These are found in the subtrees of p's children that are not on the path of words[i]. For any such "off-path" child c, the maximum length we can get from its subtree is maxLen1[c], since counts in that subtree are unaffected.

We take the maximum of all these possibilities to find answer[i].

class Solution {
    class TrieNode {
        TrieNode[] children = new TrieNode[26];
        int count = 0;
        int maxLen1 = 0; // Max depth in subtree with count >= k
        int maxLen2 = 0; // Max depth in subtree with count >= k+1
    }

    public int[] longestCommonPrefix(String[] words, int k) {
        int n = words.length;
        int[] answer = new int[n];
        if (k > n - 1) {
            return answer; // All answers are 0
        }

        TrieNode root = new TrieNode();
        for (String word : words) {
            TrieNode curr = root;
            for (char c : word.toCharArray()) {
                if (curr.children[c - 'a'] == null) {
                    curr.children[c - 'a'] = new TrieNode();
                }
                curr = curr.children[c - 'a'];
                curr.count++;
            }
        }

        dfs(root, 0, k);

        for (int i = 0; i < n; i++) {
            String word = words[i];
            TrieNode curr = root;
            int maxLcp = 0;

            for (int j = 0; j < word.length(); j++) {
                char c = word.charAt(j);
                TrieNode nextNode = curr.children[c - 'a'];
                
                // Check other branches
                for (int childIdx = 0; childIdx < 26; childIdx++) {
                    TrieNode child = curr.children[childIdx];
                    if (child != null && child != nextNode) {
                        maxLcp = Math.max(maxLcp, child.maxLen1);
                    }
                }
                
                curr = nextNode;
                if (curr.count >= k + 1) {
                    maxLcp = Math.max(maxLcp, j + 1);
                }
            }
            
            // Check branches from the last node of the word's path
            for(TrieNode child : curr.children){
                if(child != null){
                    maxLcp = Math.max(maxLcp, child.maxLen1);
                }
            }

            answer[i] = maxLcp;
        }

        return answer;
    }

    private void dfs(TrieNode node, int depth, int k) {
        if (node == null) return;

        node.maxLen1 = (node.count >= k) ? depth : 0;
        node.maxLen2 = (node.count >= k + 1) ? depth : 0;

        for (TrieNode child : node.children) {
            if (child != null) {
                dfs(child, depth + 1, k);
                node.maxLen1 = Math.max(node.maxLen1, child.maxLen1);
                node.maxLen2 = Math.max(node.maxLen2, child.maxLen2);
            }
        }
    }
}

Complexity Analysis

Time Complexity: O(S * A), where `S` is the total length of all words and `A` is the alphabet size. Building the Trie and the DFS for pre-computation takes `O(S)`. Calculating each answer takes `O(length(word) * A)`. The total for all words sums up to `O(S * A)`.Space Complexity: O(S * A), where `S` is the total length of all words and `A` is the alphabet size (26). This space is for storing the Trie. Each of the `O(S)` nodes has an array of `A` children pointers and a few integer fields.

Pros and Cons

Pros:

Highly efficient and passes the given constraints.
Avoids redundant computations by using a single Trie and pre-computation.
Scales well with a large number of words.

Cons:

More complex to implement and reason about compared to the brute-force approach.
Requires more space to store the pre-computed values in each Trie node.

Code Solutions

Checking out 2 solutions in different languages for Longest Common Prefix of K Strings After Removal. Click on different languages to view the code.

Video Solution

Watch the video walkthrough for Longest Common Prefix of K Strings After Removal

Data Structures:

ArrayStringTrie