Longest Duplicate Substring
HARDDescription
Given a string s, consider all duplicated substrings: (contiguous) substrings of s that occur 2 or more times. The occurrences may overlap.
Return any duplicated substring that has the longest possible length. If s does not have a duplicated substring, the answer is "".
Example 1:
Input: s = "banana" Output: "ana"
Example 2:
Input: s = "abcd" Output: ""
Constraints:
2 <= s.length <= 3 * 104sconsists of lowercase English letters.
Approaches
Checkout 3 different approaches to solve Longest Duplicate Substring. Click on different approaches to view the approach and algorithm in detail.
Binary Search on Length with Rabin-Karp
A much more efficient approach uses binary search on the length of the substring combined with the Rabin-Karp algorithm for string matching. The problem has a monotonic property: if a duplicate substring of length k exists, a duplicate of any length less than k also exists. This allows us to binary search for the optimal length. For each candidate length, we use a rolling hash to efficiently check for duplicates in linear time.
Algorithm
- The length of the longest duplicate substring can be anywhere from 0 to
n-1. We can binary search for this length. - Define a search space for the length,
low = 1,high = n-1. - In each step of the binary search, pick a
midlength. - Use a helper function,
search(length), to check if a duplicate substring of lengthmidexists. - The
search(length)function uses the Rabin-Karp algorithm with a rolling hash.- It slides a window of size
lengthacross the string. - It calculates the hash of each substring in
O(1)time (after an initialO(length)calculation) using the rolling hash technique. - A
HashSetis used to store hashes of seen substrings. If a hash collision occurs, we have found a potential duplicate. - To be perfectly correct, upon a hash collision, one should compare the actual substrings to rule out false positives, though with a large prime modulus, this is rare.
- It slides a window of size
- If
search(mid)finds a duplicate, we know a solution of at least lengthmidis possible. We store this potential answer and try for a longer one by settinglow = mid + 1. - If
search(mid)finds no duplicate, the lengthmidis too long. We must try a shorter one by settinghigh = mid - 1. - The final answer is the longest valid substring found during the search.
We can binary search for the answer, which is the length of the substring. The search space for the length L is [0, n-1]. For a given L, we need to efficiently check if there's any repeated substring of that length. This check can be performed in O(n) time using a rolling hash.
The rolling hash technique allows us to calculate the hash of a new substring from the previous one in O(1) time. We slide a window of length L over the string. We store the hashes of the substrings we've seen in a HashSet. If we compute a hash that's already in the set, we've likely found a duplicate. We then verify it's a true duplicate (not a hash collision) and proceed with the binary search.
If a duplicate of length mid is found, we know we might be able to do better, so we search in the upper half (low = mid + 1). If not, mid is too large, and we search in the lower half (high = mid - 1).
import java.util.HashSet;
class Solution {
// Using prefix hashes for O(1) hash calculation of any substring
public String longestDupSubstring(String S) {
int n = S.length();
int low = 1, high = n;
String result = "";
while (low <= high) {
int mid = low + (high - low) / 2;
String duplicate = search(S, mid);
if (duplicate != null) {
result = duplicate;
low = mid + 1;
} else {
high = mid - 1;
}
}
return result;
}
private String search(String s, int len) {
long prime = 29;
long power = 1;
for (int i = 0; i < len; i++) {
power = (power * prime);
}
HashSet<Long> seen = new HashSet<>();
long currentHash = 0;
for (int i = 0; i < s.length(); i++) {
currentHash = (currentHash * prime) + (s.charAt(i) - 'a');
if (i >= len) {
currentHash -= (long)(s.charAt(i - len) - 'a') * power;
}
if (i >= len - 1) {
if (seen.contains(currentHash)) {
// For this problem, simple hash check is often sufficient.
// A robust solution would verify the substring match here.
return s.substring(i - len + 1, i + 1);
}
seen.add(currentHash);
}
}
return null;
}
}
Complexity Analysis
Pros and Cons
- Significantly more efficient with O(n log n) time complexity.
- Fast enough to pass the given constraints.
- A standard and powerful technique for optimization problems.
- More complex to implement than the brute-force approach.
- Requires understanding of binary search and hashing (Rabin-Karp).
- Hash collisions are a potential issue, which might require verification or using multiple hash functions (double hashing) for robustness.
Code Solutions
Checking out 3 solutions in different languages for Longest Duplicate Substring. Click on different languages to view the code.
class Solution {
private
long[] p;
private
long[] h;
public
String longestDupSubstring(String s) {
int base = 131;
int n = s.length();
p = new long[n + 10];
h = new long[n + 10];
p[0] = 1;
for (int i = 0; i < n; ++i) {
p[i + 1] = p[i] * base;
h[i + 1] = h[i] * base + s.charAt(i);
}
String ans = "";
int left = 0, right = n;
while (left < right) {
int mid = (left + right + 1) >> 1;
String t = check(s, mid);
if (t.length() > 0) {
left = mid;
ans = t;
} else {
right = mid - 1;
}
}
return ans;
}
private
String check(String s, int len) {
int n = s.length();
Set<Long> vis = new HashSet<>();
for (int i = 1; i + len - 1 <= n; ++i) {
int j = i + len - 1;
long t = h[j] - h[i - 1] * p[j - i + 1];
if (vis.contains(t)) {
return s.substring(i - 1, j);
}
vis.add(t);
}
return "";
}
}
Video Solution
Watch the video walkthrough for Longest Duplicate Substring
Similar Questions
5 related questions you might find useful
Algorithms:
Patterns:
Data Structures:
Companies:
Subscribe to Scale Engineer newsletter
Learn about System Design, Software Engineering, and interview experiences every week.
No spam, unsubscribe at any time.