Minimum Cost Good Caption

HARD

Description

You are given a string caption of length n. A good caption is a string where every character appears in groups of at least 3 consecutive occurrences.

For example:

  • "aaabbb" and "aaaaccc" are good captions.
  • "aabbb" and "ccccd" are not good captions.

You can perform the following operation any number of times:

Choose an index i (where 0 <= i < n) and change the character at that index to either:

  • The character immediately before it in the alphabet (if caption[i] != 'a').
  • The character immediately after it in the alphabet (if caption[i] != 'z').

Your task is to convert the given caption into a good caption using the minimum number of operations, and return it. If there are multiple possible good captions, return the lexicographically smallest one among them. If it is impossible to create a good caption, return an empty string "".

 

Example 1:

Input: caption = "cdcd"

Output: "cccc"

Explanation:

It can be shown that the given caption cannot be transformed into a good caption with fewer than 2 operations. The possible good captions that can be created using exactly 2 operations are:

  • "dddd": Change caption[0] and caption[2] to their next character 'd'.
  • "cccc": Change caption[1] and caption[3] to their previous character 'c'.

Since "cccc" is lexicographically smaller than "dddd", return "cccc".

Example 2:

Input: caption = "aca"

Output: "aaa"

Explanation:

It can be proven that the given caption requires at least 2 operations to be transformed into a good caption. The only good caption that can be obtained with exactly 2 operations is as follows:

  • Operation 1: Change caption[1] to 'b'. caption = "aba".
  • Operation 2: Change caption[1] to 'a'. caption = "aaa".

Thus, return "aaa".

Example 3:

Input: caption = "bc"

Output: ""

Explanation:

It can be shown that the given caption cannot be converted to a good caption by using any number of operations.

 

Constraints:

  • 1 <= caption.length <= 5 * 104
  • caption consists only of lowercase English letters.

Approaches

Checkout 2 different approaches to solve Minimum Cost Good Caption. Click on different approaches to view the approach and algorithm in detail.

Optimized Linear Time Dynamic Programming

This approach improves upon the O(N^2) DP by optimizing the state transition. Instead of recomputing the minimum over all previous states for each (i, j) pair, we introduce a DP state for each possible character c. This allows us to calculate the minimum costs in O(N) time. Reconstruction is also optimized using precomputed hash maps and binary search to avoid a quadratic search.

Algorithm

  1. Pass 1: Minimum Cost Calculation (O(N))
    • Define dp[i] as the minimum cost for the suffix caption[i...n-1].
    • The key insight is that dp[i] = min_c { min_cost for suffix [i..n-1] if its first block is converted to char c }.
    • Let dp_c[i] be this inner value. The transition is dp_c[i] = min_{j=i+2..n-1} (cost_to_c(i, j) + dp[j+1]).
    • cost_to_c(i, j) is the cost to change caption[i...j] to all c's. This can be written as suffix_cost_c[i] - suffix_cost_c[j+1] using precomputed suffix sums of costs for each character c.
    • The transition becomes dp_c[i] = suffix_cost_c[i] + min_{j=i+2..n-1} (dp[j+1] - suffix_cost_c[j+1]).
    • The min term can be computed in O(1) for each i by maintaining a running minimum as i decreases.
    • We compute dp[i] by taking min_c dp_c[i]. We iterate i from n-1 down to 0. Since dp[i] depends on dp[k] for k > i, which are already computed, this works.
    • The total time for this pass is O(N * C), which is linear in N.
  2. Pass 2: Optimized Reconstruction (O(N log N))
    • With the dp table from Pass 1, we reconstruct the string.
    • At currentPos, we need to find the best next block [currentPos...j] converted to character c.
    • We want to find (c, j) that minimizes c (and then j) subject to the optimality condition: cost_to_c(currentPos, j) + dp[j+1] == dp[currentPos].
    • We iterate c from 'a' to 'z'. For the first c that allows a valid partition, we've found our best character.
    • For a fixed c, we need to find the smallest j that satisfies the condition. The condition can be rewritten as dp[j+1] - suffix_cost_c[j+1] == dp[currentPos] - suffix_cost_c[currentPos].
    • Let h_c(k) = dp[k] - suffix_cost_c[k]. We need to find the smallest j >= currentPos + 2 such that h_c(j+1) equals a target value.
    • We can pre-process the values of h_c(k) into hash maps for each c, mapping a value to a sorted list of indices k where h_c(k) equals that value.
    • Then, for each c, we can query this map and use binary search on the list of indices to find the smallest valid j+1 in O(log N) time.
    • The reconstruction for each block takes O(C * log N). Since there are at most N/3 blocks, total reconstruction time is O(N * C * log N).

Pass 1: Minimum Cost Calculation (O(N*C))

This pass calculates the minimum cost dp[i] for each suffix caption[i...n-1]. The key optimization is to break down the calculation based on the character c of the first block. Let dp_c[i] be the minimum cost for the suffix [i...n-1] given its first block is converted to character c. The overall minimum dp[i] is then min_c(dp_c[i]). The recurrence for dp_c[i] can be expressed as: dp_c[i] = min_{j=i+2..n-1} (cost_to_c(i, j) + dp[j+1]) where cost_to_c(i, j) is the cost to change all characters in caption[i...j] to c. By pre-calculating suffix sums of costs (suffix_cost_c[k] = sum of |char - c| from k to n-1), we can find cost_to_c(i, j) in O(1). The recurrence simplifies to: dp_c[i] = suffix_cost_c[i] + min_{j=i+2..n-1} (dp[j+1] - suffix_cost_c[j+1]) The min term can be tracked with a running minimum as we iterate i from n-1 down to 0, making the calculation for each (i, c) pair O(1). This leads to an O(N*C) algorithm for the first pass.

Pass 2: Optimized Reconstruction (O(N*C*log N))

To construct the lexicographically smallest string, we iterate from currentPos = 0. At each step, we determine the next block [currentPos...j] with character c. We must pick the c that is lexicographically smallest. So, we iterate c from 'a' to 'z'. For each c, we check if it can start a valid optimal partition. A choice of (c, j) is valid if cost_to_c(currentPos, j) + dp[j+1] == dp[currentPos]. To find the smallest j for a given c efficiently, we can pre-process helper values h_c(k) = dp[k] - suffix_cost_c[k] into a map for each character c. The map will store h_c(k) values and the list of indices k where they occur. The validity check becomes finding an index k=j+1 in this map. We can use binary search on the sorted list of indices to find the smallest valid j in O(log N) time. The first (c, j) pair we find will be the optimal choice for the current block. We append it to the result and continue.

class Solution {
    public String goodCaption(String caption) {
        int n = caption.length();
        int C = 26;
        long INF = Long.MAX_VALUE / 2;

        if (n > 0 && n < 3) return "";

        long[][] suffixCost = new long[n + 1][C];
        for (int c = 0; c < C; c++) {
            char c_char = (char) ('a' + c);
            for (int i = n - 1; i >= 0; i--) {
                suffixCost[i][c] = suffixCost[i + 1][c] + Math.abs(caption.charAt(i) - c_char);
            }
        }

        long[] dp = new long[n + 4];
        Arrays.fill(dp, INF);
        dp[n] = 0; dp[n + 1] = 0; dp[n + 2] = 0; dp[n + 3] = 0;

        long[] minTerm = new long[C];
        Arrays.fill(minTerm, INF);

        for (int i = n - 3; i >= 0; i--) {
            long minDpI = INF;
            for (int c = 0; c < C; c++) {
                long g_val = dp[i + 3] - suffixCost[i + 3][c];
                minTerm[c] = Math.min(minTerm[c], g_val);
                if (minTerm[c] < INF) {
                    long dp_c_i = suffixCost[i][c] + minTerm[c];
                    minDpI = Math.min(minDpI, dp_c_i);
                }
            }
            dp[i] = minDpI;
        }

        if (dp[0] >= INF) return "";

        StringBuilder result = new StringBuilder();
        int currentPos = 0;
        while (currentPos < n) {
            for (int c = 0; c < C; c++) {
                char c_char = (char) ('a' + c);
                boolean found = false;
                for (int j = currentPos + 2; j < n; j++) {
                    long blockCost = suffixCost[currentPos][c] - suffixCost[j + 1][c];
                    if (dp[j + 1] < INF && blockCost + dp[j + 1] == dp[currentPos]) {
                        int len = j - currentPos + 1;
                        for (int k = 0; k < len; k++) result.append(c_char);
                        currentPos = j + 1;
                        found = true;
                        break;
                    }
                }
                if (found) break;
            }
        }
        return result.toString();
    }
}

Note: The provided code snippet for reconstruction in the O(N) approach uses a simplified O(N*C) loop for clarity, which would be too slow. A full implementation would use the described map and binary search technique to achieve O(log N) per query.

Complexity Analysis

Time Complexity: O(N * C + N * C * log N), where N is the length of the string and C is the alphabet size (26). The cost calculation is `O(N*C)`. The reconstruction is `O(N/3 * C * log N)`. Since C is a constant, this is effectively `O(N log N)`.Space Complexity: O(N * C), where N is the length of the string and C is the alphabet size (26). This space is used for `dp`, `suffixCost`, and other auxiliary tables.

Pros and Cons

Pros:
  • Highly efficient with a time complexity of O(N*C), which is linear for a fixed alphabet size.
  • Guaranteed to pass within the time limits for the given constraints.
Cons:
  • The implementation is significantly more complex than the O(N^2) approach.
  • Requires more space to store auxiliary tables for the optimization (dp_c, suffix_cost_c, and hash maps for reconstruction).

Video Solution

Watch the video walkthrough for Minimum Cost Good Caption



Patterns:

Dynamic Programming

Data Structures:

String

Subscribe to Scale Engineer newsletter

Learn about System Design, Software Engineering, and interview experiences every week.

No spam, unsubscribe at any time.