算法 - 6. Trie(前缀树)

何时使用Trie

当您需要对字符串集合进行快速的基于前缀的操作时使用Trie。"Trie"这个名字来源于单词"retrieval"(检索)(发音类似"try"),由Edward Fredkin在1960年发明。虽然源自"retrieval",但它专门设计用于快速字符串检索和前缀匹配操作。

别名: Trie也常被称为**"前缀树(Prefix Tree)"或"数字树(Digital Tree)"**。"前缀树"这个名字特别形象,因为树中的每个节点都代表其后代的公共前缀。一些文献也将其称为"基数树(Radix Tree)"(尽管严格来说基数树是Trie的空间优化变体)。

虽然最常用于单词验证,但它在以下方面表现出色:

自动补全/建议(经典用例 - 查找具有公共前缀的所有单词)
拼写检查器(检查单词是否存在于字典中)
IP路由表(最长前缀匹配)
电话簿(按前缀搜索联系人)
文字游戏(在Scrabble中验证单词,在Boggle中查找单词)
任何需要对多个字符串进行高效前缀匹配或存在性检查的问题

示例问题

自动补全系统: 给定单词字典 ["apple", "app", "apricot", "application", "apply", "banana", "band", "bandana"],实现一个搜索系统:

可以快速检查"apple"是否是有效单词
可以找到所有以"app"开头的单词用于自动补全
可以高效地向字典添加新单词

答案: Trie以O(m)时间处理所有三个操作,其中m是单词/前缀的长度,与字典大小无关!

Trie如何存储我们的字典

以下是我们示例单词形成的Trie结构:

        root
       /    \
      a      b
     /       |
    p        a
   /|\       |\
  p r i      n d
  | | |      | |
  l i c      a a
  | | |      | |
  [e*] c a   [n*] n
      | |        |
      o t        [a*]
      | |
      [t*] i
           |
           o
           |
           [n*]

[*] = 单词结束标记
单词: app*, apple*, apricot*, application*, apply*, banana*, band*, bandana*

注意"app"、"apple"、"application"和"apply"都共享前缀"app" - 只存储一次!

最重要的洞察

Trie通过共享公共前缀来用空间换取速度 - 从根到节点的每条路径代表一个前缀,允许O(m)操作,无论存储了多少字符串,因为您只遍历您正在搜索的字符。

记住要点: Trie的天才之处在于将字符串操作转换为树的遍历。我们不是重复比较整个字符串,而是逐字符地沿着单一路径前进。每个节点的决策很简单:"这个字符是否作为子节点存在?"这种简单的遍历自动利用了所有共享的前缀。

为什么是"Trie"而不是"Tree": 虽然它是一个树结构,但"Trie"这个名字通过强调其专门目的 - 通过前缀共享高效地重新trieval(检索)字符串,将其与普通树区分开来。它是专门为字符串操作优化的树,而不仅仅是任何分层数据。

决策框架

通用Trie决策

每个Trie操作都归结为一个问题:

"当前节点是否有这个字符的子节点?"

是 → 沿着那条路径走
否 → 创建它(插入)或返回结果(搜索/前缀)

这个单一的决策,对每个字符重复,支撑着所有Trie操作。

在操作的每个字符处,您面临这个决策:

插入时: 创建新节点还是沿着现有路径?
搜索时: 继续沿着现有路径还是返回false?
前缀匹配时: 只要路径存在就继续沿着它走

代码

设计选择说明

function createTrieNode() {
  return {
    isEndOfWord: false, // Why: 区分完整单词和前缀
    children: new Map(), // Why: Map高效处理稀疏字符集
  }; // (与为未使用字母浪费内存的array[26]相比)
}

核心Trie实现

function Trie() {
  const root = createTrieNode();

  function insert(word) {
    let currentNode = root;

    for (const char of word) {
      const hasChildWithChar = currentNode.children.has(char);

      // DECISION FRAMEWORK: Create new path or follow existing?
      if (!hasChildWithChar) {
        const newNode = createTrieNode();
        currentNode.children.set(char, newNode);
      }

      currentNode = currentNode.children.get(char);
    }

    // Mark the end of a complete word
    currentNode.isEndOfWord = true;
  }

  function search(word) {
    let currentNode = root;

    for (const char of word) {
      const hasChildWithChar = currentNode.children.has(char);

      // DECISION FRAMEWORK: Path exists or return false?
      if (!hasChildWithChar) {
        return false; // Character not found, word doesn't exist
      }

      currentNode = currentNode.children.get(char);
    }

    // Found all characters, but is it a complete word?
    const isCompleteWord = currentNode.isEndOfWord;
    return isCompleteWord;
  }

  function startsWith(prefix) {
    let currentNode = root;

    for (const char of prefix) {
      const hasChildWithChar = currentNode.children.has(char);

      // DECISION FRAMEWORK: Continue traversal or prefix not found?
      if (!hasChildWithChar) {
        return false;
      }

      currentNode = currentNode.children.get(char);
    }

    // If we've traversed the entire prefix, it exists
    return true;
  }

  // Return the public interface
  return {
    insert,
    search,
    startsWith,
  };
}

// Example usage with our dictionary
const dictionary = Trie();
const words = ["apple", "app", "apricot", "application", "apply"];

for (const word of words) {
  dictionary.insert(word);
}

console.log(dictionary.search("apple")); // true
console.log(dictionary.search("app")); // true
console.log(dictionary.search("appl")); // false (not a complete word)
console.log(dictionary.startsWith("app")); // true

分步骤: 插入"app"

让我们跟踪如何将单词"app"插入空Trie:

起始状态: root (空)

1. 处理'a':
   - root有子节点'a'吗? 否
   - 为'a'创建新节点
   - 移动到节点'a'
   状态: root → a

2. 处理'p':
   - 'a'有子节点'p'吗? 否
   - 为'p'创建新节点
   - 移动到第一个'p'
   状态: root → a → p

3. 处理'p':
   - 第一个'p'有子节点'p'吗? 否
   - 为第二个'p'创建新节点
   - 移动到第二个'p'
   状态: root → a → p → p

4. 标记单词结束:
   - 在第二个'p'上设置isEndOfWord = true
   最终: root → a → p → p[*]

高级操作

查找具有前缀的所有单词(自动补全)

// Add this to the Trie function's returned object
function findAllWordsWithPrefix(prefix) {
  let currentNode = root;

  // First, navigate to the prefix end
  for (const char of prefix) {
    const hasChildWithChar = currentNode.children.has(char);

    if (!hasChildWithChar) {
      return []; // Prefix doesn't exist
    }

    currentNode = currentNode.children.get(char);
  }

  // Collect all words starting from this node
  const results = [];
  collectWords(currentNode, prefix, results);
  return results;
}

function collectWords(node, currentWord, results) {
  const isWord = node.isEndOfWord;

  if (isWord) {
    results.push(currentWord);
  }

  // Recursively collect from all children
  for (const [char, childNode] of node.children) {
    const nextWord = currentWord + char;
    collectWords(childNode, nextWord, results);
  }
}

// Usage example:
console.log(dictionary.findAllWordsWithPrefix("app"));
// Returns: ["app", "apple", "application", "apply"]

常见实现陷阱

忘记单词结束标记: 没有isEndOfWord,您无法区分"app"(完整单词)和"appl"(在"apple"中存在的前缀)
使用数组而不是Map:
- 数组方式: children = new Array(26) 在只使用几个字母时浪费内存
- Map方式: children = new Map() 只存储实际存在的字符
不处理空字符串: 预先决定 - 空字符串在您的Trie中是有效单词吗?大多数实现将其视为无效
混淆前缀存在和单词存在: "appl"作为前缀存在于我们的Trie中(到"apple"的路径),但它本身不是一个单词

疑问解答

常见疑问: "Trie不会使用过多内存吗?如果我有像'a'、'ab'、'abc'、...、'abcdefghijklmnopqrstuvwxyz'这样的单词,我只为一个单词创建了26个节点!"

视觉比较: Trie vs 原始字符串存储

让我们分析存储"algorithm"、"algorithms"、"algorithmic"、"algorithmically":

无Trie(存储原始字符串):
- "algorithm"         → 9个字符
- "algorithms"        → 10个字符
- "algorithmic"       → 11个字符
- "algorithmically"   → 16个字符
总计: 46个字符存储单元

有Trie(共享前缀):
         a-l-g-o-r-i-t-h-m (9个节点)
                          ├─ [end] (单词: "algorithm")
                          ├─ s-[end] (1个节点,用于"algorithms")
                          └─ i-c (2个节点)
                              ├─ [end] (单词: "algorithmic")
                              └─ a-l-l-y-[end] (4个节点,用于"algorithmically")
总计: 16个节点 (9 + 1 + 2 + 4)

前缀共享的关键洞察

魔力: 不是46个字符,我们只需要16个节点,因为"algorithm"被所有四个单词共享!

实际效率 - 考虑英语单词字典:

单词共享大量前缀("run"、"running"、"runner"、"runs")
出现常见模式("pre-"、"un-"、"re-"、"-ing"、"-ed")
添加的单词越多,前缀共享就越多

以"pre-"开头的1000个单词的例子:

简单存储: 约7000个字符("pre"之后平均7个字符)
Trie存储: "pre"3个节点 + 分支路径(远少于7000)

Trie实际浪费空间的情况

Trie效率低下的情况:

没有公共前缀: 存储["xyz", "abc", "qrs"]创建完全独立的路径
非常短的字符串: 节点结构的开销超过字符串长度
随机数据: UUID或哈希没有有意义的前缀模式

空间-时间权衡

Trie的卓越之处: 它们用空间换取卓越的时间保证:

| 操作     | HashMap                  | Trie                           |
| -------- | ------------------------ | ------------------------------ |
| 插入     | 平均O(m),最坏O(n×m)      | 总是O(m)                       |
| 搜索     | 平均O(m),最坏O(n×m)      | 总是O(m)                       |
| 前缀匹配 | O(n×m) - 检查所有字符串  | O(m) - 只沿着路径走            |
| 自动补全 | O(n×m) - 扫描所有内容    | O(m+k) - 前缀m,结果k           |

其中n = 字符串数量,m = 字符串长度,k = 结果数量

保证: 虽然Trie可能对随机字符串使用更多内存,但它们为前缀操作提供了无与伦比的时间复杂度。在具有自然前缀模式的领域(单词、URL、文件路径)中,它们在时间和空间上都是高效的!