◆Beginnerasked at Amazonasked at Googleasked at Meta

Strings

Strings look like simple text, but they hide a brutal trap: naive concatenation in a loop is O(n²). Learn why, how encoding actually works, and the handful of patterns — sliding window, two pointers, hashing — that solve 80% of string interview problems.

12 min read2026-01-20Ironclad Academy

#strings #arrays #fundamentals #data-structures

Complexity cheat sheet

Operation

Average

Worst

Access by indexsame as array — direct offset into byte buffer

O(1)

Lengthcached; O(n) in C where you scan for the null byte

O(1)

Concatenation (s += t)allocates new string of length |s| + |t|; O(n²) in a loop

O(n)

Slice / substringk = length of slice; copies the bytes

O(k)

Search (s.find)naive; KMP/Rabin-Karp achieve O(n+m)

O(n·m)

Build with list + joinsingle allocation, single pass — always prefer this in a loop

O(n)

// DEPTH

the full breakdown — requirements, capacity, evolution, trade-offs

The mental model: an immutable tape

Think of a string as a tape that has already been printed. You can read any character by rewinding to that position (O(1)). You can scan the whole tape (O(n)). But to "change" anything you have to print a new tape from scratch — you cannot erase and overwrite. That is immutability.

Every major language you will use in an interview enforces this:

Python — str is immutable. s[0] = 'x' is a TypeError.
Java — String is immutable. StringBuilder exists specifically to work around it.
JavaScript — strings are immutable primitives. s[0] = 'x' silently does nothing.
Go — strings are immutable byte slices. Convert to []byte to mutate.
C++ — std::string is mutable (the exception). But even here, s + t allocates a new object.

The practical consequence: any algorithm that builds a string incrementally must collect pieces in a mutable buffer and assemble once at the end.

Encoding: what is actually in the byte buffer?

You can skip this section if a problem says "only lowercase ASCII letters." You cannot skip it for anything involving real user data, file I/O, or Unicode characters.

ASCII: the 128-character starting point

ASCII assigns each of 128 characters (English letters, digits, punctuation, control characters) a number from 0 to 127. The letter 'A' is 65, 'a' is 97, '0' is 48. Each character fits in 7 bits, stored in one byte.

This is why the classic trick works: ord('a') gives 97, and ord('z') - ord('a') gives 25, so any lowercase ASCII string maps onto a 26-element integer array by ord(c) - ord('a'). Every anagram problem that says "only lowercase letters" is quietly telling you to use this trick.

Unicode and UTF-8: the real world

Unicode assigns a code point (a number, written U+XXXX) to every character in every writing system ever recorded — over 140,000 of them. The emoji 🔥 is U+1F525. The Japanese character 日 is U+65E5.

UTF-8 is the dominant encoding that turns those code points into bytes:

U+0000 to U+007F (plain ASCII): 1 byte — identical to ASCII, so all ASCII text is valid UTF-8.
U+0080 to U+07FF: 2 bytes — covers most Latin-derived scripts, Greek, Cyrillic, Arabic, Hebrew.
U+0800 to U+FFFF: 3 bytes — CJK characters, most of the rest of the Basic Multilingual Plane.
U+10000 and above: 4 bytes — emoji, rare historical scripts.

What this means for you in practice:

len("café") in Python 3 returns 4 (four characters). The UTF-8 encoding is 5 bytes (é takes 2). If you ever need byte length, use len("café".encode("utf-8")).
Slicing by index in Python 3 slices by code point, not byte, which is usually what you want.
In Go, ranging over a string with for i, r := range s gives you runes (code points). Ranging with for i := range s gives you byte positions. These are different when multi-byte characters are present.

For interview problems, the encoding footnote matters in exactly one way: do not assume len(s) equals the number of bytes. In Python 3 it equals the number of code points, which is fine for character-based algorithms.

The O(n²) concatenation trap, in detail

Here is the example from the summary, with the cost spelled out:

# BAD — O(n²)
def join_words_naive(words):
    result = ""
    for word in words:
        result += word   # allocates len(result) + len(word) bytes, copies both
    return result

After the first iteration, result has len(words[0]) characters. After the second, len(words[0]) + len(words[1]). The copy cost at step i is proportional to the total length so far, which grows linearly. Add it up and you have a triangular sum: O(n²).

The correct pattern:

# GOOD — O(n)
def join_words(words):
    parts = []
    for word in words:
        parts.append(word)   # O(1) amortized — list append
    return "".join(parts)    # single allocation, single copy pass

"".join(parts) knows the total length upfront (it sums them), allocates once, and copies each piece in one pass. O(n) total.

In Java this is StringBuilder:

StringBuilder sb = new StringBuilder();
for (String word : words) {
    sb.append(word);   // amortized O(1) — dynamic buffer
}
String result = sb.toString();  // one final copy

In Go: strings.Builder. In JavaScript: same list-join pattern (arr.join("")). In C++: std::string accumulation actually does amortize, but std::ostringstream or a vector<string> + final join is the clean approach.

The rule: if you are building a string in a loop, you are probably writing O(n²) code by accident. Collect, then join.

Strings as interview bait: the patterns

Most string interview problems reduce to one of three underlying techniques. Once you recognize which one a problem wants, the solution structure is almost mechanical.

Pattern 1: Sliding window

The tell: "find the longest (or shortest) substring with property X." Longest substring without repeating characters. Minimum window containing all characters of t. Longest substring with at most k distinct characters.

The naive approach is a double loop — try every pair of (left, right) and check the substring — which is O(n²) or worse. The sliding window cuts this to O(n) by maintaining a valid window and only moving pointers forward.

def length_of_longest_substring(s: str) -> int:
    seen = {}          # char -> last seen index
    left = 0
    best = 0

    for right, ch in enumerate(s):
        if ch in seen and seen[ch] >= left:
            # character repeats inside the window — shrink from the left
            left = seen[ch] + 1
        seen[ch] = right
        best = max(best, right - left + 1)

    return best

left never moves backward — that is the key property that makes this O(n). For a full treatment of the pattern, see sliding window.

{ "type": "array", "title": "sliding window on \"abcabcbb\" — window [left, right] expands right, contracts left on repeat", "data": ["a", "b", "c", "a", "b", "c", "b", "b"] }

Watch how the window expands one step at a time to the right, then jumps the left boundary when it hits the duplicate 'a'. The window never backtracks. That is the O(n) guarantee.

Pattern 2: Character frequency hashing

The tell: the problem is about character counts — anagrams, permutations in a string, group anagrams, valid anagram. The insight is that two strings are anagrams of each other if and only if their character frequency maps are identical.

from collections import Counter

# Check if s and t are anagrams — O(n)
def is_anagram(s: str, t: str) -> bool:
    return Counter(s) == Counter(t)

# Cleaner for fixed alphabet (only lowercase ASCII):
def is_anagram_array(s: str, t: str) -> bool:
    if len(s) != len(t):
        return False
    freq = [0] * 26
    for c in s:
        freq[ord(c) - ord('a')] += 1
    for c in t:
        freq[ord(c) - ord('a')] -= 1
    return all(x == 0 for x in freq)

The array version with 26 slots is faster in practice than a hash table because there is no hashing overhead — just direct index arithmetic. Use it whenever the problem restricts to lowercase letters.

The group anagrams problem shows the pattern at scale:

from collections import defaultdict

def group_anagrams(strs: list[str]) -> list[list[str]]:
    groups = defaultdict(list)
    for s in strs:
        key = tuple(sorted(s))   # or: tuple(Counter(s).items())
        groups[key].append(s)
    return list(groups.values())

Each word is sorted to a canonical key. Words that are anagrams of each other land on the same key. O(n · k log k) where n is number of strings and k is average length — the sort dominates.

Pattern 3: Two pointers

The tell: palindrome check, reversal, or any problem where you compare characters from opposite ends moving inward.

def is_palindrome(s: str) -> bool:
    left, right = 0, len(s) - 1
    while left < right:
        if s[left] != s[right]:
            return False
        left += 1
        right -= 1
    return True

O(n) time, O(1) space. No extra string allocation, no slicing. The pattern extends naturally: in "valid palindrome II" you skip non-alphanumeric characters; in "longest palindromic substring" you expand from each center outward. The two-pointer intuition stays the same.

When you actually need to mutate: convert to a list

Python strings cannot be mutated in place. When an algorithm logically needs to — reversing, swapping characters, building a result by position — the standard idiom is:

def reverse_words(s: str) -> str:
    chars = list(s)          # O(n) — mutable list of single chars

    # reverse the whole string
    left, right = 0, len(chars) - 1
    while left < right:
        chars[left], chars[right] = chars[right], chars[left]
        left += 1
        right -= 1

    # reverse each word in place
    start = 0
    for i in range(len(chars) + 1):
        if i == len(chars) or chars[i] == ' ':
            left, right = start, i - 1
            while left < right:
                chars[left], chars[right] = chars[right], chars[left]
                left += 1
                right -= 1
            start = i + 1

    return ''.join(chars)    # O(n) — single allocation

The list(s) at the top and ''.join(result) at the bottom are a matched pair — O(n) each, unavoidable in Python. Everything in between is O(1) per swap.

A slightly harder problem: minimum window substring

Given strings s and t, find the minimum-length contiguous substring of s that contains every character of t.

This combines sliding window with character frequency hashing, and it is a canonical medium-hard string problem.

from collections import Counter

def min_window(s: str, t: str) -> str:
    if not t or not s:
        return ""

    need = Counter(t)        # how many of each char we still need
    missing = len(t)         # total characters still needed
    best_left = 0
    best_len = float('inf')
    left = 0

    for right, ch in enumerate(s):
        if need[ch] > 0:
            missing -= 1     # this character satisfies one need
        need[ch] -= 1

        if missing == 0:
            # window contains all of t — try to shrink from the left
            while need[s[left]] < 0:
                need[s[left]] += 1
                left += 1
            # record best
            if right - left + 1 < best_len:
                best_len = right - left + 1
                best_left = left
            # slide: give up the leftmost required character to restart search
            need[s[left]] += 1
            missing += 1
            left += 1

    return s[best_left:best_left + best_len] if best_len < float('inf') else ""

The trick is missing as a single integer rather than checking the entire need dict every iteration. We only decrement it when a character's count goes from 1 to 0 (actually needed), not when it goes from -1 to -2 (already satisfied extras). This keeps the inner logic O(1). Total time: O(|s| + |t|).

A note on ropes and when strings get really large

Everything above assumes strings that fit comfortably in memory. If you are building a text editor that needs O(1) inserts anywhere in a 100MB document, a rope is the answer: a binary tree where each leaf holds a small chunk of string and each internal node stores the total length of its left subtree. Insert and delete are O(log n) by tree surgery. Concatenation is O(1) — just build a new root.

You will not implement a rope in a coding interview. But you might be asked to design a text editor, and knowing the word "rope" and its O(log n) insertion is the right thing to say.

StringBuilder in Java and Go's strings.Builder are much simpler versions of the same idea: a dynamic buffer with amortized O(1) append that produces a string on demand.

Complexity quick-reference

Operation	Cost	Note
`s[i]` — index access	O(1)	direct byte offset
`len(s)`	O(1)	cached in CPython, Go; O(n) in C (null scan)
`s + t` — concatenation	O(len(s) + len(t))	new allocation and copy
`s += t` in a loop (n times)	O(n²)	the trap
`"".join(parts)`	O(total length)	single pass — always prefer
`s[i:j]` — slice	O(j - i)	copies the bytes
`s.find(t)` — naive search	O(n·m)	KMP/Rabin-Karp: O(n+m)
`Counter(s)` or freq array	O(n)	for anagram/permutation problems

Pitfalls that actually bite people

Using += to build output strings. Covered above, but it bears repeating: this is the single most common O(n²) bug in string code. Every competitive programmer learns this lesson the hard way at least once.

Slicing creates a copy. s[1:] in Python does not give you a view — it allocates a new string of length n-1. If you need to track a moving window, track left and right integer indices rather than slicing. s[left:right] at the end for the final answer is fine; doing it every loop iteration is not.

len(s) vs byte length. In Python 3, len("🔥") is 1 (one code point). len("🔥".encode("utf-8")) is 4 (four bytes). This matters for any problem involving non-ASCII input.

Off-by-one in window boundaries. A window from index left to right inclusive has length right - left + 1. A slice s[left:right] in Python is exclusive on the right — it gives s[left] through s[right-1]. Keep these conventions straight and write them in a comment.

Comparing strings with is instead of == in Python. s is t checks object identity (same object in memory). s == t checks value equality. They happen to agree for short interned strings, which makes bugs intermittent and maddening. Always use == for string comparison.

When NOT to use a plain string

You need…	Use instead
Fast membership test ("is this word in dictionary?")	Hash table / set
Count distinct characters in O(1) per query	Frequency array or Counter, precomputed
O(log n) insert anywhere in a huge document	Rope
Mutable char-by-char construction in Python	`list` of chars → `join`
Prefix matching over many strings	Trie
Pattern matching on many strings at once	Aho-Corasick automaton

For most interview problems, a plain string with the right technique (sliding window, two pointers, frequency hash) is exactly the right tool. The data structure is rarely the bottleneck — the algorithm operating on it is.

If you are still building your foundations, make sure you have a solid grip on arrays first — everything in this article about O(1) indexing and O(n) shifting applies there in the mutable, unconstrained form. Then, when sliding window problems keep coming up, go deep on sliding window. And when frequency-based problems need something faster than a Python dict, hash tables explain why that Counter works and what it costs.

// KEY TAKEAWAYS

▸Strings are immutable arrays of characters in most languages — every += in a loop copies the whole string, giving O(n²) total work. Use list + join instead.
▸UTF-8 encodes Unicode code points in 1–4 bytes; ASCII characters stay 1 byte. "Only lowercase letters" means a 26-slot frequency array can replace a hash map.
▸The sliding window pattern handles almost every "longest/shortest substring with property X" problem in O(n) by avoiding the O(n²) naive double-loop.
▸Two pointers on a string are used for palindromes, reversals, and in-place rearrangements — one pointer from each end, working inward.
▸Character frequency hashing (Counter or a 26-slot array) is the key to anagram and permutation problems; recognize when two strings are related by character counts.

// FAQ

Frequently asked questions

▸Why is string concatenation in a loop O(n²)?

In most languages (Python, Java, JavaScript), strings are immutable. Every += creates a brand-new string object by copying both operands into fresh memory. If you do this in a loop that runs n times, the string grows by one character each iteration, so the copies are 1 + 2 + 3 + ... + n = n(n+1)/2 total character moves — O(n²). The fix is to collect pieces in a list and join once at the end, which does a single allocation and a single pass: O(n).

▸What is the difference between ASCII, Unicode, and UTF-8?

ASCII is a 7-bit encoding of 128 characters (English letters, digits, punctuation). Unicode is a standard that assigns a code point (a number) to every character in every human writing system — over 140,000 of them. UTF-8 is an encoding that turns those code points into bytes: it uses 1 byte for ASCII characters (so UTF-8 is ASCII-compatible) and 2–4 bytes for everything else. Most Python, JavaScript, and Go source code and I/O is UTF-8. When a problem says "only lowercase English letters", it is telling you the input is 26 ASCII characters — which means a frequency array of size 26 works as a hash map.

▸Are Python strings immutable? What does that mean in practice?

Yes. Once you create a Python string you cannot change it — s[0] = 'x' raises a TypeError. Every operation that looks like a modification (upper(), replace(), s + t) returns a new string object. For algorithms that need to modify characters, convert to a list first, do your work, then join back: list(s) → ... → ''.join(result). This is O(n) in both directions and is the standard pattern.

▸How do you check if two strings are anagrams of each other?

Two strings are anagrams if they contain exactly the same characters with the same frequencies. The clean O(n) approach: build a frequency counter (a dict or Counter in Python) for each string and compare. Or use a single counter: increment for every character in s, decrement for every character in t, and check all values are zero at the end. If the character set is guaranteed to be small (e.g., only lowercase ASCII), you can use a 26-element integer array instead of a hash map — faster in practice.

▸When should I use a sliding window on a string?

Use a sliding window when the problem asks for a contiguous substring that satisfies some condition — longest without repeating characters, shortest containing all required characters, maximum distinct characters within length k. The window expands the right pointer and contracts the left pointer to maintain validity. A hash map or frequency array tracks the window's current character counts. The window never resets from scratch, so the total work is O(n) even though you are considering all substrings implicitly.

← previous

Tries (Prefix Trees)

Segment Trees

// RELATED