String Compression II

HARD

Description

Run-length encoding is a string compression method that works by replacing consecutive identical characters (repeated 2 or more times) with the concatenation of the character and the number marking the count of the characters (length of the run). For example, to compress the string "aabccc" we replace "aa" by "a2" and replace "ccc" by "c3". Thus the compressed string becomes "a2bc3".

Notice that in this problem, we are not adding '1' after single characters.

Given a string s and an integer k. You need to delete at most k characters from s such that the run-length encoded version of s has minimum length.

Find the minimum length of the run-length encoded version of s after deleting at most k characters.

 

Example 1:

Input: s = "aaabcccd", k = 2
Output: 4
Explanation: Compressing s without deleting anything will give us "a3bc3d" of length 6. Deleting any of the characters 'a' or 'c' would at most decrease the length of the compressed string to 5, for instance delete 2 'a' then we will have s = "abcccd" which compressed is abc3d. Therefore, the optimal way is to delete 'b' and 'd', then the compressed version of s will be "a3c3" of length 4.

Example 2:

Input: s = "aabbaa", k = 2
Output: 2
Explanation: If we delete both 'b' characters, the resulting compressed string would be "a4" of length 2.

Example 3:

Input: s = "aaaaaaaaaaa", k = 0
Output: 3
Explanation: Since k is zero, we cannot delete anything. The compressed string is "a11" of length 3.

 

Constraints:

  • 1 <= s.length <= 100
  • 0 <= k <= s.length
  • s contains only lowercase English letters.

Approaches

Checkout 2 different approaches to solve String Compression II. Click on different approaches to view the approach and algorithm in detail.

Dynamic Programming with Memoization

The brute-force recursion is slow due to recomputing the same subproblems. We can significantly optimize it by storing the results of each subproblem (i, k) in a cache or memoization table. This ensures that each subproblem is solved only once. This technique is known as memoization or top-down dynamic programming and is efficient enough to solve this problem within the given constraints.

Algorithm

  • Initialize a 2D array memo[n][k+1] with a sentinel value (e.g., -1) to store computed results.
  • Define a recursive function solve(i, k) that returns the minimum length for s[i:] with k deletions.
  • Base Cases:
    • If k < 0, return a large value (infinity).
    • If i >= n or n - i <= k, return 0.
  • Memoization Check: If memo[i][k] is not -1, return it.
  • Recursive Step:
    • Calculate the result res by considering two choices:
      1. Delete s[i]: res = solve(i + 1, k - 1).
      2. Keep s[i]: Iterate j from i to n-1. In s[i...j], count s[i]'s (count) and others (deleted). If deleted <= k, update res = min(res, getLen(count) + solve(j + 1, k - deleted)).
  • Store the final res in memo[i][k] before returning.
  • The main function calls solve(0, k).

We use a 2D array, memo[n][k+1], to store the results of our recursive function solve(i, k). memo[i][k] will hold the minimum compressed length for the suffix s[i:] with k deletions. The logic of the solve function remains the same as the brute-force version. However, before any computation, we check if memo[i][k] has already been computed. If it has, we return the stored value immediately. If not, we compute the result as before, and just before returning, we store it in memo[i][k]. This simple addition prunes the recursion tree drastically, avoiding redundant computations. The state transition is: memo[i][k] = min(solve(i+1, k-1), min_{j=i..n-1} (getLen(count) + solve(j+1, k-deleted))) where count and deleted are for the substring s[i..j] to form a run of s[i].

class Solution {
    int[][] memo;
    String s;
    int n;

    // Helper to calculate the length of an encoded run
    private int getLen(int count) {
        if (count == 1) return 1;
        if (count < 10) return 2;
        if (count < 100) return 3;
        return 4;
    }

    private int solve(int i, int k) {
        // Base case: invalid number of deletions
        if (k < 0) {
            return 101; // Using a value larger than max possible length
        }
        // Base case: end of string or can delete all remaining chars
        if (i >= n || n - i <= k) {
            return 0;
        }

        // Return memoized result if available
        if (memo[i][k] != -1) {
            return memo[i][k];
        }

        // Option 1: Delete the character s[i]
        int res = solve(i + 1, k - 1);

        // Option 2: Keep s[i] and form a run of this character.
        // We iterate through all possible end points 'j' for this run.
        int deletedCount = 0;
        int keepCount = 0;
        for (int j = i; j < n; j++) {
            if (s.charAt(j) == s.charAt(i)) {
                keepCount++;
            } else {
                deletedCount++;
            }

            // If we have enough deletions available for this run
            if (deletedCount <= k) {
                // Calculate the length of this choice
                int currentLen = getLen(keepCount) + solve(j + 1, k - deletedCount);
                res = Math.min(res, currentLen);
            } else {
                // If we need more deletions than available, we can't extend this run further
                break;
            }
        }

        // Memoize and return the result
        return memo[i][k] = res;
    }

    public int getLengthOfOptimalCompression(String s, int k) {
        this.s = s;
        this.n = s.length();
        this.memo = new int[n][k + 1];
        for (int row = 0; row < n; row++) {
            java.util.Arrays.fill(memo[row], -1);
        }
        return solve(0, k);
    }
}

Complexity Analysis

Time Complexity: O(n^2 * k). The number of states is `n * k`. For each state `(i, k)`, we perform a loop that runs at most `n` times to decide the end of the current run.Space Complexity: O(n * k), where n is the string length and k is the max deletions. This is dominated by the size of the memoization table. The recursion stack depth adds O(n).

Pros and Cons

Pros:
  • Finds the optimal solution efficiently by avoiding redundant calculations.
  • Passes the time limits for the given constraints.
  • It's a standard and robust way to solve this type of combinatorial optimization problem.
Cons:
  • The space complexity of O(n*k) might be an issue for problems with much larger constraints, but it is acceptable here.

Code Solutions

Checking out 1 solution in different languages for String Compression II. Click on different languages to view the code.

class Solution {
public
  int getLengthOfOptimalCompression(String s, int k) {

Video Solution

Watch the video walkthrough for String Compression II



Patterns:

Dynamic Programming

Data Structures:

String

Companies:

Subscribe to Scale Engineer newsletter

Learn about System Design, Software Engineering, and interview experiences every week.

No spam, unsubscribe at any time.