String Compression II
HARDDescription
Run-length encoding is a string compression method that works by replacing consecutive identical characters (repeated 2 or more times) with the concatenation of the character and the number marking the count of the characters (length of the run). For example, to compress the string "aabccc" we replace "aa" by "a2" and replace "ccc" by "c3". Thus the compressed string becomes "a2bc3".
Notice that in this problem, we are not adding '1' after single characters.
Given a string s and an integer k. You need to delete at most k characters from s such that the run-length encoded version of s has minimum length.
Find the minimum length of the run-length encoded version of s after deleting at most k characters.
Example 1:
Input: s = "aaabcccd", k = 2 Output: 4 Explanation: Compressing s without deleting anything will give us "a3bc3d" of length 6. Deleting any of the characters 'a' or 'c' would at most decrease the length of the compressed string to 5, for instance delete 2 'a' then we will have s = "abcccd" which compressed is abc3d. Therefore, the optimal way is to delete 'b' and 'd', then the compressed version of s will be "a3c3" of length 4.
Example 2:
Input: s = "aabbaa", k = 2 Output: 2 Explanation: If we delete both 'b' characters, the resulting compressed string would be "a4" of length 2.
Example 3:
Input: s = "aaaaaaaaaaa", k = 0 Output: 3 Explanation: Since k is zero, we cannot delete anything. The compressed string is "a11" of length 3.
Constraints:
1 <= s.length <= 1000 <= k <= s.lengthscontains only lowercase English letters.
Approaches
Checkout 2 different approaches to solve String Compression II. Click on different approaches to view the approach and algorithm in detail.
Dynamic Programming with Memoization
The brute-force recursion is slow due to recomputing the same subproblems. We can significantly optimize it by storing the results of each subproblem (i, k) in a cache or memoization table. This ensures that each subproblem is solved only once. This technique is known as memoization or top-down dynamic programming and is efficient enough to solve this problem within the given constraints.
Algorithm
- Initialize a 2D array
memo[n][k+1]with a sentinel value (e.g., -1) to store computed results. - Define a recursive function
solve(i, k)that returns the minimum length fors[i:]withkdeletions. - Base Cases:
- If
k < 0, return a large value (infinity). - If
i >= norn - i <= k, return 0.
- If
- Memoization Check: If
memo[i][k]is not -1, return it. - Recursive Step:
- Calculate the result
resby considering two choices:- Delete
s[i]:res = solve(i + 1, k - 1). - Keep
s[i]: Iteratejfromiton-1. Ins[i...j], counts[i]'s (count) and others (deleted). Ifdeleted <= k, updateres = min(res, getLen(count) + solve(j + 1, k - deleted)).
- Delete
- Calculate the result
- Store the final
resinmemo[i][k]before returning. - The main function calls
solve(0, k).
We use a 2D array, memo[n][k+1], to store the results of our recursive function solve(i, k). memo[i][k] will hold the minimum compressed length for the suffix s[i:] with k deletions.
The logic of the solve function remains the same as the brute-force version. However, before any computation, we check if memo[i][k] has already been computed. If it has, we return the stored value immediately. If not, we compute the result as before, and just before returning, we store it in memo[i][k]. This simple addition prunes the recursion tree drastically, avoiding redundant computations.
The state transition is:
memo[i][k] = min(solve(i+1, k-1), min_{j=i..n-1} (getLen(count) + solve(j+1, k-deleted)))
where count and deleted are for the substring s[i..j] to form a run of s[i].
class Solution {
int[][] memo;
String s;
int n;
// Helper to calculate the length of an encoded run
private int getLen(int count) {
if (count == 1) return 1;
if (count < 10) return 2;
if (count < 100) return 3;
return 4;
}
private int solve(int i, int k) {
// Base case: invalid number of deletions
if (k < 0) {
return 101; // Using a value larger than max possible length
}
// Base case: end of string or can delete all remaining chars
if (i >= n || n - i <= k) {
return 0;
}
// Return memoized result if available
if (memo[i][k] != -1) {
return memo[i][k];
}
// Option 1: Delete the character s[i]
int res = solve(i + 1, k - 1);
// Option 2: Keep s[i] and form a run of this character.
// We iterate through all possible end points 'j' for this run.
int deletedCount = 0;
int keepCount = 0;
for (int j = i; j < n; j++) {
if (s.charAt(j) == s.charAt(i)) {
keepCount++;
} else {
deletedCount++;
}
// If we have enough deletions available for this run
if (deletedCount <= k) {
// Calculate the length of this choice
int currentLen = getLen(keepCount) + solve(j + 1, k - deletedCount);
res = Math.min(res, currentLen);
} else {
// If we need more deletions than available, we can't extend this run further
break;
}
}
// Memoize and return the result
return memo[i][k] = res;
}
public int getLengthOfOptimalCompression(String s, int k) {
this.s = s;
this.n = s.length();
this.memo = new int[n][k + 1];
for (int row = 0; row < n; row++) {
java.util.Arrays.fill(memo[row], -1);
}
return solve(0, k);
}
}
Complexity Analysis
Pros and Cons
- Finds the optimal solution efficiently by avoiding redundant calculations.
- Passes the time limits for the given constraints.
- It's a standard and robust way to solve this type of combinatorial optimization problem.
- The space complexity of O(n*k) might be an issue for problems with much larger constraints, but it is acceptable here.
Code Solutions
Checking out 1 solution in different languages for String Compression II. Click on different languages to view the code.
class Solution {
public
int getLengthOfOptimalCompression(String s, int k) {Video Solution
Watch the video walkthrough for String Compression II
Similar Questions
5 related questions you might find useful
Patterns:
Data Structures:
Companies:
Subscribe to Scale Engineer newsletter
Learn about System Design, Software Engineering, and interview experiences every week.
No spam, unsubscribe at any time.