Distinct Subsequences II

HARD

Description

Given a string s, return the number of distinct non-empty subsequences of s. Since the answer may be very large, return it modulo 10⁹ + 7.

A subsequence of a string is a new string that is formed from the original string by deleting some (can be none) of the characters without disturbing the relative positions of the remaining characters. (i.e., "ace" is a subsequence of "abcde" while "aec" is not.

Example 1:

Input: s = "abc"
Output: 7
Explanation: The 7 distinct subsequences are "a", "b", "c", "ab", "ac", "bc", and "abc".

Example 2:

Input: s = "aba"
Output: 6
Explanation: The 6 distinct subsequences are "a", "b", "ab", "aa", "ba", and "aba".

Example 3:

Input: s = "aaa"
Output: 3
Explanation: The 3 distinct subsequences are "a", "aa" and "aaa".

Constraints:

1 <= s.length <= 2000
s consists of lowercase English letters.

Approaches

Checkout 2 different approaches to solve Distinct Subsequences II. Click on different approaches to view the approach and algorithm in detail.

Dynamic Programming with O(N) Space

This approach uses dynamic programming to build up the count of distinct subsequences. We define dp[i] as the number of distinct non-empty subsequences for the prefix of the string of length i. We iterate through the string and calculate dp[i] based on dp[i-1] and the last occurrence of the current character.

Algorithm

Let dp[i] be the number of distinct non-empty subsequences of the prefix s[0...i-1].
The base case is dp[0] = 0 for an empty string.
To compute dp[i], we consider the character c = s[i-1]. The new subsequences are formed by taking all dp[i-1] subsequences of s[0...i-2] and appending c to them, plus the single character subsequence c. This gives dp[i-1] + 1 new subsequences.
The total count becomes dp[i] = dp[i-1] (old ones) + (dp[i-1] + 1) (new ones) = 2 * dp[i-1] + 1.
If c has appeared before at a 1-based index j, we have overcounted. The number of overcounted subsequences is the number of subsequences that could be formed with the previous c, which is dp[j-1] + 1.
The corrected recurrence is dp[i] = (2 * dp[i-1] + 1) - (dp[j-1] + 1) = 2 * dp[i-1] - dp[j-1].
We use an auxiliary array last to store the most recent 1-based index of each character.
The final answer is dp[n], with all calculations performed modulo 10^9 + 7.

We can solve this problem using dynamic programming. Let dp[i] be the number of distinct non-empty subsequences of s.substring(0, i). Our goal is to find dp[n], where n is the length of s.

The base case is dp[0] = 0, as there are no non-empty subsequences in an empty string.

Now, let's find the recurrence relation. When we move from s[0...i-2] to s[0...i-1], we introduce a new character c = s[i-1]. The distinct subsequences of s[0...i-1] consist of:

The distinct subsequences of s[0...i-2] (which is dp[i-1]).
New subsequences formed by appending c to existing subsequences of s[0...i-2] (including the empty subsequence). The number of such subsequences is dp[i-1] + 1.

If c is a new character that has not appeared in s[0...i-2], then all these dp[i-1] + 1 new subsequences are unique. So, dp[i] = dp[i-1] + (dp[i-1] + 1) = 2 * dp[i-1] + 1.

If c has appeared before, say its last occurrence was at index j-1 (where j < i), then we have double-counted. The subsequences we are adding are {sub + c | sub is a subsequence of s[0...i-2]}. The duplicates are those that are identical to {sub' + c | sub' is a subsequence of s[0...j-2]}. The number of such duplicates is dp[j-1] + 1.

So, we must subtract this count. The recurrence becomes dp[i] = (2 * dp[i-1] + 1) - (dp[j-1] + 1) = 2 * dp[i-1] - dp[j-1]. We can use an array last to keep track of the last seen index of each character to find j efficiently. All calculations must be done modulo 10^9 + 7, being careful with subtraction.

class Solution {
    public int distinctSubseqII(String s) {
        int MOD = 1_000_000_007;
        int n = s.length();
        long[] dp = new long[n + 1];
        dp[0] = 0;

        // last[char] stores the 1-based index i where the character was last seen.
        int[] last = new int[26];

        for (int i = 1; i <= n; i++) {
            char c = s.charAt(i - 1);
            int charIndex = c - 'a';
            
            // Base recurrence: double the previous count.
            dp[i] = (2 * dp[i - 1]) % MOD;
            
            int prevIdx = last[charIndex];
            if (prevIdx == 0) { // First time seeing this character
                // Add 1 for the single-character subsequence itself.
                dp[i] = (dp[i] + 1) % MOD;
            } else { // Character has been seen before
                // Subtract the count of subsequences that were formed with the previous occurrence of c.
                // This count is dp[prevIdx - 1].
                dp[i] = (dp[i] - dp[prevIdx - 1] + MOD) % MOD;
            }
            
            last[charIndex] = i;
        }

        return (int) dp[n];
    }
}

Complexity Analysis

Time Complexity: O(N), where N is the length of the string. We iterate through the string once, and all operations inside the loop are constant time.Space Complexity: O(N), where N is the length of the string. This is for the `dp` array. The `last` array uses O(1) space as the alphabet size is constant.

Pros and Cons

Pros:

A standard and conceptually clear dynamic programming solution.
Correctly handles all cases, including duplicate characters.

Cons:

Requires O(N) extra space for the DP table, which is not optimal.

Code Solutions

Checking out 3 solutions in different languages for Distinct Subsequences II. Click on different languages to view the code.

Video Solution

Watch the video walkthrough for Distinct Subsequences II

Patterns:

Dynamic Programming

Data Structures:

String