封面画师：ツチヤ封面ID：79601805

本文参考视频：小马哥教育(SEEMYGO) 2019年恋上数据结构与算法（第一季）

源码仓库：mofan212/data-structure-and-algorithm (github.com)

辅助学习网址：数据结构和算法动态可视化

0. 需求分析

现在有一需求：判断一堆 不重复 的字符串是否以某个前缀开头？

由于字符串需要是不重复的，因此可以用 Set 或 Map 存储字符串，然后遍历所有字符串，并对它们进行判断，这么做的话，时间复杂度与元素数量（字符串）的个数相关，为 $O(n)$ 。

那有没有更优的数据结构来实现 前缀搜索 呢？😑

那肯定有啊，天降猛男 —— Trie 登场！👊

1. Trie

基本概念

Trie（\traɪ\），字典树，又称前缀树（Prefix Tree）、单词查找树，注意和 Tree（\triː\）发音的区别。

Trie 搜索字符串的效率主要跟字符串的长度有关。

最开始提出的方法的搜索效率和元素数量有关，如果元素有 100w 个，最坏情况下可能需要遍历 100w 个元素。两者相比，单看文字描述就觉得 Trie 的效率要高很多。

自身结构

那么 Trie 是怎么做到的呢？

先得说明一点，Trie 是一种树形结构，但是它不是二叉树，是多叉树。

假设使用 Trie 存储 cat、dog、doggy、does、cast、add 六个单词，最终得到的 Trie 如下：

针对上面给出的 Trie，可以认为 根节点没有存储任何元素，而其每个子节点都存储了一个字母。

假设这个时候需要查找单词 dog，从根节点出发，先搜索字母 d，找到后再搜索 o ，最后搜索 g ，这样一来就可以找到单词 dog 了。正是因为这种查找方式，使得 Trie 搜索字符串的效率主要跟字符串的长度有关。

2. 编码实现

2.1 接口设计

以下接口仅供参考，在此提供两套接口。

第一套接口如下：

int size();
boolean isEmpty();
void clear();
boolean contains(String str);
void add(String str);
void remove(String str);
// 判断前缀字符串是否为 prefix
boolean startsWith(String prefix);

第二套接口如下：

int size();
boolean isEmpty();
void clear();
boolean contains(String str);
V add(String str, V value);
V remove(String str);
// 判断前缀字符串是否为 prefix
boolean startsWith(String prefix);

两套接口的区别在于添加和删除方法：

对于第一套接口：添加方法仅接收一个参数，且类型为 String，删除方法返回值类型为 void；
对于第二套接口：添加方法多传递了一个参数 value，类型为范型 V，删除方法的返回值类型为 V。

这两套接口就类似于 Set 和 Map ，具体使用哪套视情况而定。

选择第二套接口进行实现，之后就很容易实现第一套接口了。

/**
 * @author 默烦
 * @date 2020/8/7
 */
public class Trie<V> {

    public int size() {
        return 0;
    }

    public boolean isEmpty() {
        return false;
    }

    public void clear() {

    }

    public V get(String key) {
        return null;
    }

    public boolean contains(String key) {
        return false;
    }

    public V add(String key, V value) {
        return null;
    }

    public V remove(String key) {
        return null;
    }

    public boolean startsWith(String prefix) {
        return false;
    }
}

2.2 Node 设计

现有如下 Trie 结构：

因为 Trie 是一个树形结构，需要一个 Node 类来存储每个节点的信息。

上图所示的 Trie 中有些矩形是蓝色的，有些矩形是红色的，这表示什么意思呢？

根据存储的单词可以发现，红色的矩形就表示单词结尾。

那为什么要设置一个矩形是红色呢？

假设没有存储单词 dog，那么第一个字母 g 下的矩形应该是蓝色，但又因为存储了单词 doggy，因此整个 Trie 储存的字母是不变的，仅仅是矩形颜色发生改变。在这个时候，如果需要查询单词 dog 是否存在于 Trie 中，就需要对 Trie 进行遍历。由于 g 下的矩形是蓝色，那么这个时候遍历出来的 dog 只是其他单词中的一部分，进而得出单词 dog 不存在于当前 Trie 中的结论，对单词 dog 的存在性进行了错误的判断。因此需要一个 boolean 类型的成员变量 word 表示当前节点是否为单词结尾。

在测试代码中，有这样一句代码 trie.add("cat", 1);，其中的 cat 表示 key，1 表示 value。那么对于 Node 的设计，应该需要成员变量 key 和 value 吧？😰

value 可以存在，用来存储 1 ，但是需要注意的是，只有在 word == true 的节点中，value 才是有值的。至于 key 的存在就显得多余了，因为在查询某个单词是否存在时，经过一番节点查找其实就相当于获取了这个单词。比如查询单词 dog，先找到 d，又找到 o，最后找到 g ，同时 g 所在的节点中word == true，表示前面扫描的字母组成了一个完整的单词，而这个单词就是 dog，那为了节约内存，key 显然是多余的。

那不要 key ，总得来个字段存储字母吧？

其实也不需要。在查询某个字符串时，是从根节点开始查找的，根节点下有很多个“分叉”，并且需要一个字符才找到一个特定的子节点。那应该怎么设计？

先假设字符只有 26 个英文字母，这样的话可以在 Node 类中设计一个数组，取名为 children，数组长度为 26，用于存放 26 个英文字母，之后可以根据 Node 的 children 字段值来判断是否存在某个字母，得以判断是否存在这个字母对应的特定子节点。但很显然，这种做法的局限性非常大，只能存储英文字母，如果存储的是中文该咋办呢？难不成把所有中文存到一个数组里？😱

这个时候可以使用 HashMap，key 表示字符，value 表示 key 对应的当前节点的子节点，这样就可以兼备所有字符情况了，同时，这也表明了节点里没有直接存储字符，字符存储在节点中 HashMap 的 key 中。正因如此，在所画的 Trie 结构示意图中，字符信息存放在路径上，没有直接存储在节点里。

通过上面一通分析，最后 Node 类的设计如下：

private static class Node<V> {
    HashMap<Character, Node<V>> children;
    V value;
    boolean word; // 是否为单词结尾
}

2.3 清空、取值与包含

在 Trie 类中，暂时需要两个成员变量：

size：Tire 的字符串数量
root：Trie 的根节点

此时可以对一些方法进行实现：

public class Trie<V> {
    private int size; // 元素（字符串）数量
    private Node<V> root;

    public int size() {
        return size;
    }

    public boolean isEmpty() {
        return size == 0;
    }

    public void clear() {
        size = 0;
        root = null;
    }
    
    // --snip--
}

接下来准备实现另外两个方法：

V get(String key)：在 Trie 中获取键为 key 的值（value）
boolean contains(String key)：判断当前 Trie 中是否存在键为 key 的数据

这两个方法有一个共同点：需要根据 key 来获取节点。前者是获取节点的 value，后者是判断节点是否存在。

既然如此，可以编写 Node<V> node(String key) 方法，根据传入的 key 在 Trie 中查找对应的节点。

为了方便获取某一节点的子节点，可以在 Node 类中定义 getChildren() 方法，可以用来获取某个节点的子节点信息：

private static class Node<V> {
    HashMap<Character, Node<V>> children;
    V value;
    boolean word; // 是否为单词结尾

    public HashMap<Character, Node<V>> getChildren() {
        return children == null ? (children = new HashMap<>()) : children;
    }
}

在 node() 方法中，还需要判断传入的参数是否为空，可以使用 keyCheck() 方法统一处理参数为空的情况：

private void keyCheck(String key) {
    if (key == null || key.length() == 0) {
        throw new IllegalArgumentException("key must not be empty");
    }
}

node() 方法需要从根节点遍历 Trie，需要在第一步要判断根节点是否为空：

private Node<V> node(String key) {
    if (root == null) return null;
    keyCheck(key);

    Node<V> node = root;
    int len = key.length();
    for (int i = 0; i < len; i++) {
        char c = key.charAt(i); // d o g
        node = node.getChildren().get(c);
        if (node == null) return null;
    }
    // 还得判断node是否是单词结尾
    return node.word ? node : null;
}

获取方法 get() 和包含方法 contains() 的实现如下：

public V get(String key) {
    Node<V> node = node(key);
    return node == null ? null : node.value;
}

public boolean contains(String key) {
    return node(key) != null;
}

根节点的初始化

关于根节点的设置，就是给成员变量 root 初始化：

1	`private Node<V> root = new Node<>();`

既然都进行了初始化，因此需要修改清空方法 clear()，只清空 Map，不设置为空，方便后续添加子节点：

public void clear() {
    size = 0;
    root.getChildren().clear();
}

初始化后还有一个好处，根节点永远不会为空，之后可以删去 node() 方法中的非空判断：

private Node<V> node(String key) {
    // if (root == null) return null;
    keyCheck(key);

    Node<V> node = root;
    int len = key.length();
    for (int i = 0; i < len; i++) {
        char c = key.charAt(i); // d o g
        node = node.getChildren().get(c);
        if (node == null) return null;
    }
    // 得判断node是否是单词结尾
    return node.word ? node : null;
}

2.4 添加字符串

添加元素的接口定义如下：

1	`V add(String key, V value)`

在实现这个接口时，首先要验证 key 是否为空。

之后的添加步骤如下：

获取根节点
获取 key 的长度
使用循环依次取出 key 中每个字符，然后从根节点向下遍历，判断其子节点是否是以 key 中的字符为 HashMap的键
如果添加的元素已经存在，获取这个元素对应的 value，使用新 value 覆盖旧 value，最后返回旧的 value
如果添加的元素不存在，令元素最后一个字符对应的节点的 word 为 true，并使其 value 为传递进来的新 value，元素数量（字符串数量）加一，最后返回空

public V add(String key, V value) {
    keyCheck(key);

    Node<V> node = root;
    int len = key.length();
    for (int i = 0; i < len; i++) {
        char c = key.charAt(i); // d o g
        Node<V> childNode = node.getChildren().get(c);
        if (childNode == null) {
            childNode = new Node<>();
            node.getChildren().put(c, childNode);
        }
        node = childNode;
    }

    if (node.word) { // 已经存在这个单词
        V oldValue = node.value;
        node.value = value;
        return oldValue;
    }

    node.word = true;
    node.value = value;
    size++;
    return null;
}

2.5 前缀判断

所谓前缀判断就是判断当前 Trie 是否存在某字符串的前缀，注意区分与包含方法 contains() 的区别，包含方法是判断当前 Trie 中是否存在某一完整字符串。

举个例子，假设当前 Trie 存在一个字符串 Student：

使用前缀判断方法 startsWith() 判断当前 Trie 中是否存在前缀 Stud，最终结果返回true
使用包含方法 contains() 判断当前 Trie 中是否存在完整字符串 Student，最终返回结果也是true
如果使用前缀判断当前 Trie 中是否存在前缀 Student，结果也返回 true，但是使用包含方法contains() 判断当前 Trie 中是否存在字符串 Stud 时，得到的返回结果却是false

前缀判断的实现与 node() 方法极其相似，具体步骤如下：

对传入的前缀进行非空判断
获取根节点和前缀的长度
使用循环依次取出前缀中每个字符，然后从根节点向下遍历，判断其子节点是否存在，如果不存在就表示当前 Trie 中不存在传入的前缀，反之则存在

public boolean startsWith(String prefix) {
    keyCheck(prefix);

    Node<V> node = root;
    int len = prefix.length();
    for (int i = 0; i < len; i++) {
        char c = prefix.charAt(i);
        node = node.getChildren().get(c);
        if (node == null) return false;
    }
    return true;
}

2.6 代码调整

为了方便后续编写和节约空间，取消成员变量 root 的初始化，等实际用到根节点时才去创建根节点。

又回到最初的起点~🎵

1	`private Node<V> root;`

同时内部类 Node 中也没有必要获取子节点 children 时就创建一个，children 也可以为空：

private static class Node<V> {
    HashMap<Character, Node<V>> children;
    V value;
    boolean word; // 是否为单词结尾
}

清空与添加

然后修改 clear() 方法：

public void clear() {
    size = 0;
    root = null;
}

在这种情况下，根节点 root 不一定存在，当根节点为空时，才创建根节点。同时因为取消了 getChildren() 方法，添加方法 add() 也需要修改：

public V add(String key, V value) {
    keyCheck(key);

    // 创建根节点
    if (root == null) {
        root = new Node<>();
    }

    Node<V> node = root;
    int len = key.length();
    for (int i = 0; i < len; i++) {
        char c = key.charAt(i); // d o g
        boolean emptyChild = node.children == null;
        Node<V> childNode = emptyChild ? null : node.children.get(c);
        if (childNode == null) {
            childNode = new Node<>();
            node.children = emptyChild ? new HashMap<>() : node.children;
            node.children.put(c, childNode);
        }
        node = childNode;
    }

    if (node.word) { // 已经存在这个单词
        V oldValue = node.value;
        node.value = value;
        return oldValue;
    }

    node.word = true;
    node.value = value;
    size++;
    return null;
}

获取节点与前缀判断

因为根节点不再初始化，导致根节点可能存在为空的情况，还需要对 node() 方法和 startsWith() 方法进行调整。

又因为取消了getChildren()方法，对获取子节点的方式也需要进行修改。

node() 和 startsWith() 的逻辑极其相似，存在代码复用的情况：

public boolean startsWith(String prefix) {
    return node(prefix) != null;
}

private Node<V> node(String key) {
    keyCheck(key);

    Node<V> node = root;
    int len = key.length();
    for (int i = 0; i < len; i++) {
        // 注意这里的逻辑判断
        if (node == null || node.children == null || node.children.isEmpty()) return null;
        char c = key.charAt(i); // d o g
        node = node.children.get(c);
    }

    // return node.word ? node : null; // 得判断node是否是单词结尾
    return node; // 返回值可能为null
}

取值与包含

node() 方法的返回值可能为 null，get() 与 contains() 也一并调整：

public V get(String key) {
    Node<V> node = node(key);
    return node != null && node.word ? node.value : null;
}

public boolean contains(String key) {
    Node<V> node = node(key);
    return node != null && node.word;
}

2.7 字符串删除

先理一下删除的逻辑：

获取传递的 key 对应的字符串的最后一个节点，将这个节点记为 node
如果 node 为 null 或者不是单词结尾，那么不做任何处理，直接返回 null
不满足第二步的情况下，node 一定不为 null 且一定是单词结尾，这个时候需要进行删除，先将 size 减一
如果 node 还有子节点，直接令其 word 为 false，value 为null
如果 node 没有子节点，就从 node 往回删。往回删除的时候遇到要被删除的节点有子节点或者其 word 为true 时，就停止删除
删除完成后，记得返回 node 的 value

为了能更好地实现删除方法 reomve()，需要给内部类 Node 增加两个成员变量和一个构造方法：

private static class Node<V> {
    Node<V> parent; // 父节点
    HashMap<Character, Node<V>> children;
    Character character; // 存入的字符
    V value;
    boolean word; // 是否为单词结尾

    public Node(Node<V> parent) {
        this.parent = parent;
    }
}

这样的话，先前实现的 add() 需要调整一波：

public V add(String key, V value) {
    keyCheck(key);

    // 创建根节点
    if (root == null) {
        root = new Node<>(null);
    }

    Node<V> node = root;
    int len = key.length();
    for (int i = 0; i < len; i++) {
        char c = key.charAt(i); // d o g
        boolean emptyChild = node.children == null;
        Node<V> childNode = emptyChild ? null : node.children.get(c);
        if (childNode == null) {
            childNode = new Node<>(node);
            // 将字符存入Node成员变量中
            childNode.character = c;
            node.children = emptyChild ? new HashMap<>() : node.children;
            node.children.put(c, childNode);
        }
        node = childNode;
    }

    if (node.word) { // 已经存在这个单词
        V oldValue = node.value;
        node.value = value;
        return oldValue;
    }

    node.word = true;
    node.value = value;
    size++;
    return null;
}

最后，完成删除方法 remove() 的实现：

public V remove(String key) {
    // 找到最后一个节点
    Node<V> node = node(key);
    // 如果最后一个节点是空或不是单词结尾，不做任何处理
    if (node == null || !node.word) return null;
    size--;
    V oldValue = node.value;
    // 如果还有子节点
    if (node.children != null && !node.children.isEmpty()) {
        node.word = false;
        node.value = null;
        return oldValue;
    }
    // 如果没有子节点
    Node<V> parent = null;
    while ((parent = node.parent) != null) {
        parent.children.remove(node.character);
        if (parent.word || !parent.children.isEmpty()) break;
        node = parent;
    }
    return oldValue;
}

测试

到此对先前定义的接口都进行了实现，可以尝试运行下面的测试方法对实现的接口进行测试：

@Test
public void test() {
    Trie<Integer> trie = new Trie<>();
    trie.add("cat", 1);
    trie.add("dog", 2);
    trie.add("catalog", 3);
    trie.add("cast", 4);
    trie.add("默烦", 5);

    assertEquals(5, trie.size());
    assertTrue(trie.startsWith("do"));
    assertTrue(trie.startsWith("c"));
    assertTrue(trie.startsWith("ca"));
    assertTrue(trie.startsWith("cat"));
    assertTrue(trie.startsWith("cata"));
    assertFalse(trie.startsWith("hehe"));

    assertEquals(5, trie.get("默烦"));
    assertEquals(1, trie.remove("cat"));
    assertEquals(3, trie.remove("catalog"));
    assertEquals(4, trie.remove("cast"));

    assertEquals(2, trie.size());
    assertTrue(trie.startsWith("默"));
    assertTrue(trie.startsWith("do"));
    assertFalse(trie.startsWith("c"));
}