Distinct Subsequences II
HARDDescription
Given a string s, return the number of distinct non-empty subsequences of s. Since the answer may be very large, return it modulo 109 + 7.
"ace" is a subsequence of "abcde" while "aec" is not.
Example 1:
Input: s = "abc" Output: 7 Explanation: The 7 distinct subsequences are "a", "b", "c", "ab", "ac", "bc", and "abc".
Example 2:
Input: s = "aba" Output: 6 Explanation: The 6 distinct subsequences are "a", "b", "ab", "aa", "ba", and "aba".
Example 3:
Input: s = "aaa" Output: 3 Explanation: The 3 distinct subsequences are "a", "aa" and "aaa".
Constraints:
1 <= s.length <= 2000sconsists of lowercase English letters.
Approaches
Checkout 2 different approaches to solve Distinct Subsequences II. Click on different approaches to view the approach and algorithm in detail.
Dynamic Programming with O(N) Space
This approach uses dynamic programming to build up the count of distinct subsequences. We define dp[i] as the number of distinct non-empty subsequences for the prefix of the string of length i. We iterate through the string and calculate dp[i] based on dp[i-1] and the last occurrence of the current character.
Algorithm
- Let
dp[i]be the number of distinct non-empty subsequences of the prefixs[0...i-1]. - The base case is
dp[0] = 0for an empty string. - To compute
dp[i], we consider the characterc = s[i-1]. The new subsequences are formed by taking alldp[i-1]subsequences ofs[0...i-2]and appendingcto them, plus the single character subsequencec. This givesdp[i-1] + 1new subsequences. - The total count becomes
dp[i] = dp[i-1] (old ones) + (dp[i-1] + 1) (new ones) = 2 * dp[i-1] + 1. - If
chas appeared before at a 1-based indexj, we have overcounted. The number of overcounted subsequences is the number of subsequences that could be formed with the previousc, which isdp[j-1] + 1. - The corrected recurrence is
dp[i] = (2 * dp[i-1] + 1) - (dp[j-1] + 1) = 2 * dp[i-1] - dp[j-1]. - We use an auxiliary array
lastto store the most recent 1-based index of each character. - The final answer is
dp[n], with all calculations performed modulo10^9 + 7.
We can solve this problem using dynamic programming. Let dp[i] be the number of distinct non-empty subsequences of s.substring(0, i). Our goal is to find dp[n], where n is the length of s.
The base case is dp[0] = 0, as there are no non-empty subsequences in an empty string.
Now, let's find the recurrence relation. When we move from s[0...i-2] to s[0...i-1], we introduce a new character c = s[i-1]. The distinct subsequences of s[0...i-1] consist of:
- The distinct subsequences of
s[0...i-2](which isdp[i-1]). - New subsequences formed by appending
cto existing subsequences ofs[0...i-2](including the empty subsequence). The number of such subsequences isdp[i-1] + 1.
If c is a new character that has not appeared in s[0...i-2], then all these dp[i-1] + 1 new subsequences are unique. So, dp[i] = dp[i-1] + (dp[i-1] + 1) = 2 * dp[i-1] + 1.
If c has appeared before, say its last occurrence was at index j-1 (where j < i), then we have double-counted. The subsequences we are adding are {sub + c | sub is a subsequence of s[0...i-2]}. The duplicates are those that are identical to {sub' + c | sub' is a subsequence of s[0...j-2]}. The number of such duplicates is dp[j-1] + 1.
So, we must subtract this count. The recurrence becomes dp[i] = (2 * dp[i-1] + 1) - (dp[j-1] + 1) = 2 * dp[i-1] - dp[j-1]. We can use an array last to keep track of the last seen index of each character to find j efficiently. All calculations must be done modulo 10^9 + 7, being careful with subtraction.
class Solution {
public int distinctSubseqII(String s) {
int MOD = 1_000_000_007;
int n = s.length();
long[] dp = new long[n + 1];
dp[0] = 0;
// last[char] stores the 1-based index i where the character was last seen.
int[] last = new int[26];
for (int i = 1; i <= n; i++) {
char c = s.charAt(i - 1);
int charIndex = c - 'a';
// Base recurrence: double the previous count.
dp[i] = (2 * dp[i - 1]) % MOD;
int prevIdx = last[charIndex];
if (prevIdx == 0) { // First time seeing this character
// Add 1 for the single-character subsequence itself.
dp[i] = (dp[i] + 1) % MOD;
} else { // Character has been seen before
// Subtract the count of subsequences that were formed with the previous occurrence of c.
// This count is dp[prevIdx - 1].
dp[i] = (dp[i] - dp[prevIdx - 1] + MOD) % MOD;
}
last[charIndex] = i;
}
return (int) dp[n];
}
}
Complexity Analysis
Pros and Cons
- A standard and conceptually clear dynamic programming solution.
- Correctly handles all cases, including duplicate characters.
- Requires O(N) extra space for the DP table, which is not optimal.
Code Solutions
Checking out 3 solutions in different languages for Distinct Subsequences II. Click on different languages to view the code.
class Solution {
private
static final int MOD = (int)1 e9 + 7;
public
int distinctSubseqII(String s) {
int[] dp = new int[26];
int ans = 0;
for (int i = 0; i < s.length(); ++i) {
int j = s.charAt(i) - 'a';
int add = (ans - dp[j] + 1) % MOD;
ans = (ans + add) % MOD;
dp[j] = (dp[j] + add) % MOD;
}
return (ans + MOD) % MOD;
}
}
Video Solution
Watch the video walkthrough for Distinct Subsequences II
Similar Questions
5 related questions you might find useful
Patterns:
Data Structures:
Subscribe to Scale Engineer newsletter
Learn about System Design, Software Engineering, and interview experiences every week.
No spam, unsubscribe at any time.