Shortest Matching Substring

HARD

Description

You are given a string s and a pattern string p, where p contains exactly two '*' characters.

The '*' in p matches any sequence of zero or more characters.

Return the length of the shortest substring in s that matches p. If there is no such substring, return -1.

Note: The empty substring is considered valid.

 

Example 1:

Input: s = "abaacbaecebce", p = "ba*c*ce"

Output: 8

Explanation:

The shortest matching substring of p in s is "baecebce".

Example 2:

Input: s = "baccbaadbc", p = "cc*baa*adb"

Output: -1

Explanation:

There is no matching substring in s.

Example 3:

Input: s = "a", p = "**"

Output: 0

Explanation:

The empty substring is the shortest matching substring.

Example 4:

Input: s = "madlogic", p = "*adlogi*"

Output: 6

Explanation:

The shortest matching substring of p in s is "adlogi".

 

Constraints:

  • 1 <= s.length <= 105
  • 2 <= p.length <= 105
  • s contains only lowercase English letters.
  • p contains only lowercase English letters and exactly two '*'.

Approaches

Checkout 2 different approaches to solve Shortest Matching Substring. Click on different approaches to view the approach and algorithm in detail.

Linear Time Solution using Precomputed Arrays

This approach achieves optimal linear time complexity by replacing the binary searches of the previous method with precomputed lookup tables. By investing in a linear-time preprocessing step to build these tables, we can find the best corresponding part1 and part3 for each potential part2 in constant time.

Algorithm

  • Parse and Find Occurrences: As in the previous approach, parse p into part1, part2, part3. Find all occurrences of part2 and store them in a list middles.
  • Precomputation for part1: Create an array latestStart of size N (length of s). latestStart[i] will store the starting index of the latest occurrence of part1 that begins at or before index i. This array can be filled in O(N) with a single pass.
  • Precomputation for part3: Create an array earliestEnd of size N+1. earliestEnd[i] will store the starting index of the earliest occurrence of part3 that begins at or after index i. This array can be filled in O(N) with a single pass backwards.
  • Combining: Iterate through each middleIdx from the middles list.
    1. For each middleIdx, the latest part1 must start at or before middleIdx - part1.length(). We can find its starting position i by a direct O(1) lookup in our precomputed array: i = latestStart[middleIdx - part1.length()].
    2. Similarly, the earliest part3 must start at or after middleIdx + part2.length(). We find its starting position l with an O(1) lookup: l = earliestEnd[middleIdx + part2.length()].
    3. If both i and l are valid, we have found a match. Calculate its length (l + part3.length()) - i and update the minimum length.
  • Handle Empty part2: If part2 is empty, the middles list will be empty. In this case, we must iterate through all possible split points i from 0 to N in s, treating i as the boundary between part1 and part3, and use the precomputed arrays to find the best match.

The key insight for optimization is that for every occurrence of part2, we always ask the same questions: what is the latest valid part1 before it, and what is the earliest valid part3 after it? These questions can be pre-answered for all possible positions in s.

We create two arrays:

  1. latestStart[i]: Stores the starting index of the latest part1 that starts at or before index i.
  2. earliestEnd[i]: Stores the starting index of the earliest part3 that starts at or after index i.

These arrays act as lookup tables. latestStart is built by iterating forward through s, and earliestEnd is built by iterating backward. Once these tables are ready, we can iterate through each occurrence of part2 at middleIdx and find the optimal startIdx and endIdx in O(1) time, leading to an overall linear time solution.

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class Solution {
    public int shortestMatchingSubstring(String s, String p) {
        String[] parts = p.split("\\*", -1);
        String part1 = parts[0], part2 = parts[1], part3 = parts[2];
        int n = s.length();

        if (part1.isEmpty() && part2.isEmpty() && part3.isEmpty()) return 0;

        int[] latestStart = new int[n];
        Arrays.fill(latestStart, -1);
        int last = -1;
        for (int i = 0; i < n; i++) {
            if (s.startsWith(part1, i)) {
                last = i;
            }
            latestStart[i] = last;
        }

        int[] earliestEnd = new int[n + 1];
        Arrays.fill(earliestEnd, -1);
        last = -1;
        for (int i = n - 1; i >= 0; i--) {
            if (s.startsWith(part3, i)) {
                last = i;
            }
            earliestEnd[i] = last;
        }
        for (int i = n - 2; i >= 0; i--) {
            if (earliestEnd[i] == -1) {
                earliestEnd[i] = earliestEnd[i + 1];
            }
        }

        long minLength = Long.MAX_VALUE;

        if (part2.isEmpty()) {
            for (int i = 0; i <= n; i++) {
                int p1StartLimit = i - part1.length();
                if (p1StartLimit < 0) continue;
                int startIdx = latestStart[p1StartLimit];
                if (startIdx == -1) continue;

                int endIdx = earliestEnd[i];
                if (endIdx == -1) continue;

                minLength = Math.min(minLength, (long)endIdx + part3.length() - startIdx);
            }
        } else {
            List<Integer> middles = findOccurrences(s, part2);
            for (int middleIdx : middles) {
                int p1StartLimit = middleIdx - part1.length();
                if (p1StartLimit < 0) continue;
                int startIdx = latestStart[p1StartLimit];
                if (startIdx == -1) continue;

                int p3StartLimit = middleIdx + part2.length();
                if (p3StartLimit > n) continue;
                int endIdx = earliestEnd[p3StartLimit];
                if (endIdx == -1) continue;

                minLength = Math.min(minLength, (long)endIdx + part3.length() - startIdx);
            }
        }

        return minLength == Long.MAX_VALUE ? -1 : (int) minLength;
    }

    private List<Integer> findOccurrences(String text, String pattern) {
        List<Integer> occurrences = new ArrayList<>();
        if (pattern.isEmpty()) return occurrences;
        for (int i = 0; (i = text.indexOf(pattern, i)) != -1; i++) {
            occurrences.add(i);
        }
        return occurrences;
    }
}

Complexity Analysis

Time Complexity: O(N*M), where N is `s.length()` and M is `p.length()`. The `startsWith` calls inside loops lead to this complexity. If we pre-calculate all occurrences using an efficient algorithm like KMP, the complexity becomes O(N + M).Space Complexity: O(N), where N is the length of `s`. This is for the two auxiliary arrays `latestStart` and `earliestEnd`.

Pros and Cons

Pros:
  • Optimal time complexity of O(N + M).
  • Extremely fast in practice, as the main loop only involves array lookups.
Cons:
  • More complex to implement correctly, especially handling the edge cases where parts of the pattern are empty.
  • Uses more memory due to the auxiliary arrays of size N.

Video Solution

Watch the video walkthrough for Shortest Matching Substring



Algorithms:

Binary Search

Patterns:

Two PointersString Matching

Data Structures:

String

Subscribe to Scale Engineer newsletter

Learn about System Design, Software Engineering, and interview experiences every week.

No spam, unsubscribe at any time.