生产就绪的算法：以随机句子为例

rustum 发表于 2021-12-20 13:02

本文记述笔者的算法学习心得，由于笔者不熟悉算法，如有错误请不吝指正。随机句子问题

这几天得见一个算法题：生成随机句子。推测这道题可能来源于某搜索引擎公司的工作实践？
题意为：输入一个句子（即一个单词序列）和一个预期长度，需从输入句子随机抽取单词组成一个新句子（只是一个单词序列，不考虑语法），新句子的单词数量等于预期长度。随机抽取的具体规则为：随机取一个单词作为首个单词加入新句子，每当新句子加入一个单词，对于这个单词在输入句子中的所有出现位置，随机选择一个出现位置的下一个单词（即下邻单词）加入新句子，这一加入又会引起新单词的加入，由此类推，直到新句子达到预期长度，则输出新句子。也就是要实现一个函数String generateSentence(String sentence, int length)。
例如，输入sentence=&#34;this is a sentence it is not a good one and it is also bad&#34; length=5，若&#34;sentence&#34;被选为首个单词，则下一个单词只有一个选择&#34;it&#34;，然后&#34;it&#34;的下一个单词可以在两个位置选择，但这两个位置都刚好是&#34;is&#34;，而&#34;is&#34;的下一个单词就可以选择&#34;not&#34;或&#34;also&#34;，若选择&#34;not&#34;，则下一个单词只有一个选择&#34;a&#34;，此时凑足了5个单词，得到新句子&#34;sentence it is not a&#34;。
以上是此题的基础版，它还有一个难度加强版：对现有规则做一个修改，再给定一个输入m，首次随机取输入句子的连续m个单词加入新句子，之后每次仍加入一个单词，每当新句子加入一个单词，不再是找到这个单词在输入句子中的出现位置，而是对于新句子的“最后m个单词所形成的词组”在输入句子中的所有出现位置，随机选择一个出现位置的下邻单词加入新句子。基础版可看做m=1的特殊情况。
基础版的解法

解法基本上遵循如此结构：

[*]将输入句子转换为单词序列(String -> String[]，即tokenize)，为新句子准备一个内存缓冲区。
[*]随机选择首个单词，放入内存缓冲区。
[*]循环搜索上一个被选择的单词在输入句子中的所有位置，随机选择一个位置，将其下邻单词放入内存缓冲区，直到新句子达到预期长度，则停止并输出新句子。
注意，把输入句子当作一个环，若搜索到输入句子的句尾单词，则其下邻单词是句首单词，即从句尾跳回了句首。为了在计算下邻位置时做环跳转的处理，我起初是这么写的（在越界时减一次句子总长）：
int nextPos(int pos, String[] words) {
int next = pos + 1;
if (next >= words.length) {
next -= words.length;
}
return next;
}后来听说取模就可以了，即：
int nextPos(int pos, String[] words) {
return (pos + 1) % words.length;
}对于基础版，减法是OK的，但对于难度加强版呢？如果m>=输入句子长度（虽然这种输入不是很合理），则下邻位置可能跨环两次，需要减两次才正确。因此，取模确实更简洁优雅，更sound。
现在介绍三种算法实现。第一种是蛮力搜索法，每次遍历输入句子，找到给定单词的所有位置。第二种是优化搜索法，对蛮力搜索法做了优化，每次不是找到给定单词的所有位置，而是从一个随机位置起步，找到给定单词的某一个位置。第三种是哈希索引法，为输入句子预构建一个哈希表，能快速查找任一个单词对应的下邻单词。
要提前声明，优化搜索法的随机选择有时可能是很不公平的。例如，连续2个相同单词如&#34;go go&#34;，从左到右的搜索几乎总会选中左边那个。这个问题能解决，每次随机决定搜索方向即可（从左到右或从右到左）。但是，连续3个相同单词如&#34;go go go &#34;呢？中间的单词被选中的概率微乎其微。解决这一问题的关键在于以非线性方式遍历数组，我们以后来提供这一算法。（2021.08.15更新：算法在文尾的“附录：非线性搜索法”提供。）
蛮力搜索法的实现代码

class Solution1 {
public static void main(String[] args) {
System.out.println(generateSentence(
   &#34;this is a sentence it is not a good one and it is also bad&#34;, 5));
}

public static String generateSentence(String sentence, int length) {
String[] words = sentence.split(&#34; &#34;);
if (words.length == 0) {
   return sentence;
}

List<String> newWords = new ArrayList<>();

int randIdx = new Random().nextInt(words.length);
String prev = words;
newWords.add(prev);

for (int i = 1; i < length; i++) {
   List<Integer> nexts = searchNexts(words, prev);
   int chosen = new Random().nextInt(nexts.size());
   String chosenWord = words;
   newWords.add(chosenWord);
   prev = chosenWord;
}

return String.join(&#34; &#34;, newWords);
}

// 在words中找到prev的所有出现位置，收集其下邻位置(nexts)
private static List<Integer> searchNexts(String[] words, String prev) {
List<Integer> nexts = new ArrayList<>();
for (int i = 0; i < words.length; i++) {
   if (words.equals(prev)) {
   nexts.add(nextPos(i, words));
   }
}

return nexts;
}

private static int nextPos(int pos, String[] words) {
return (pos + 1) % words.length;
}
}优化搜索法的实现代码

class Solution1 {
public static void main(String[] args) {
System.out.println(generateSentence(
   &#34;this is a sentence it is not a good one and it is also bad&#34;, 5));
}

public static String generateSentence(String sentence, int length) {
String[] words = sentence.split(&#34; &#34;);
if (words.length == 0) {
   return sentence;
}

List<String> newWords = new ArrayList<>();

int randIdx = new Random().nextInt(words.length);
String prev = words;
newWords.add(prev);

for (int i = 1; i < length; i++) {
   String chosenWord = randomNextWord(words, prev);
   newWords.add(chosenWord);
   prev = chosenWord;
}

return String.join(&#34; &#34;, newWords);
}

private static String randomNextWord(String[] words, String prev) {
int randomBeginIndex = new Random().nextInt(words.length);

for (int _i = 0; _i < words.length; _i++) {
   int idx = (randomBeginIndex + _i) % words.length;
   if (words.equals(prev)) {
   return words;
   }
}

return null;
}

private static int nextPos(int pos, String[] words) {
return (pos + 1) % words.length;
}
}
哈希索引法的实现代码

class Solution1 {
public static void main(String[] args) {
System.out.println(generateSentence(
   &#34;this is a sentence it is not a good one and it is also bad&#34;, 5));
}

public static String generateSentence(String sentence, int length) {
String[] words = sentence.split(&#34; &#34;);
if (words.length == 0) {
   return sentence;
}

Table table = new Table(words);

List<String> newWords = new ArrayList<>();

int randIdx = new Random().nextInt(words.length);
String prev = words;
newWords.add(prev);

for (int i = 1; i < length; i++) {
   String chosenWord = table.randomNextWord(prev);
   newWords.add(chosenWord);
   prev = chosenWord;
}

return String.join(&#34; &#34;, newWords);
}

private static int nextPos(int pos, String[] words) {
return (pos + 1) % words.length;
}

static class Table {
private Map<String, List<String>> map = new HashMap<>();

Table(String[] words) {
   for (int i = 0; i < words.length; i++) {
   List<String> nexts = map.computeIfAbsent(words, key -> new ArrayList<>());
   nexts.add(words);
   }
}

String randomNextWord(String word) {
   List<String> nexts = map.get(word);
   int chosen = new Random().nextInt(nexts.size());
   return nexts.get(chosen);
}
}
}哈希索引法引入一个Table结构来封装建表和查询的逻辑，不但理论上更高效，代码也更易理解。
算法复杂度怎么分析呢？设n=words.length，l=length，则：

[*]蛮力搜索法主要花时间在n次迭代中每次都要遍历搜索n个单词，时间复杂度为O(n^2 + l)，空间复杂度为O(l)。
[*]优化搜索法主要花时间在n次迭代中每次都要遍历搜索平均n/(k+1)个单词，k为被搜索单词的出现次数（只出现一次时k=1，则平均遍历n/2个单词，即半个句子），时间复杂度为O(n^2/k + l)，空间复杂度为O(l)。k一般比较小，因此只是常数级优化，实际效果待测试。
[*]哈希索引法主要花时间在构建一个大小为n的哈希表，时间复杂度为O(n+l)，空间复杂度为O(n+l)。事实上，如果对于相同输入反复调用，由于哈希表可以重用，其时间成本摊销后可忽略不计，即O(l)。
难度加强版的解法

解法基本上遵循如此结构：

[*]将输入句子转换为单词序列(String -> String[]，即tokenize)，为新句子准备一个内存缓冲区。
[*]随机选择连续m个单词，放入内存缓冲区。
[*]循环搜索新句子“最后m个单词所形成的词组”在输入句子中的所有位置，随机选择一个位置，将其下邻单词放入内存缓冲区，直到新句子达到预期长度，则停止并输出新句子。
若用m作为输入参数的变量名，岂不毫无含义？按m的语义，此变量可名为lookBack，即“回向查看的项数”（在此顺带一提，编译器的语法分析有一概念叫lookAhead，即“前向查看的项数”）。
现在，我们发现可以重构代码，使其更安全优雅：处理环跳转时，对nextPos取模是怕数组越界，既然如此，那就对数组访问都加一道安全防线，数组偏移量都取模不就好了么！
设计这个可复用的数组访问助手函数以代替nextPos函数：
String safeGetWord(String[] words, int index) {
return words;
}现在再次实现三种算法。
蛮力搜索法的实现代码

class Solution2 {
public static void main(String[] args) {
System.out.println(generateSentence(
   &#34;this is a sentence it is not a good one and it is also bad&#34;, 5, 2));
}

public static String generateSentence(String sentence, int length, int lookBack) {
String[] words = sentence.split(&#34; &#34;);
if (words.length == 0) {
   return sentence;
}
if (lookBack > length) {
   throw new IllegalArgumentException(&#34;lookBack exceeds length&#34;);
}

List<String> newWords = new ArrayList<>();

generateLeading(newWords, words, lookBack);

for (int _i = lookBack; _i < length; _i++) {
   List<String> prevs = newWords.subList(newWords.size() - lookBack, newWords.size());
   List<Integer> nexts = searchNexts(words, prevs);

   int chosen = new Random().nextInt(nexts.size());
   String chosenWord = safeGetWord(words, nexts.get(chosen));
   newWords.add(chosenWord);
}

return String.join(&#34; &#34;, newWords);
}

// 生成最初的几个单词
private static void generateLeading(List<String> newWords, String[] words, int lookBack) {
int randIdx = new Random().nextInt(words.length);
for (int i = 0; i < lookBack; i++) {
   newWords.add(safeGetWord(words, randIdx + i));
}
}

private static List<Integer> searchNexts(String[] words, List<String> prevs) {
List<Integer> nexts = new ArrayList<>();

for (int i = 0; i < words.length; i++) {
   // 试匹配一个词组
   int matchedCount = 0;

   for (int j = 0; j < prevs.size(); j++) {
   if (!safeGetWord(words, i + j).equals(prevs.get(j))) {
      matchedCount = -1;
      break;
   }

   matchedCount++;
   }

   if (matchedCount == prevs.size()) {
   nexts.add(i + prevs.size());
   }
}

return nexts;
}

private static String safeGetWord(String[] words, int index) {
return words;
}
}优化搜索法的实现代码

class Solution2 {
public static void main(String[] args) {
System.out.println(generateSentence(
   &#34;this is a sentence it is not a good one and it is also bad&#34;, 5, 2));
}

public static String generateSentence(String sentence, int length, int lookBack) {
String[] words = sentence.split(&#34; &#34;);
if (words.length == 0) {
   return sentence;
}
if (lookBack > length) {
   throw new IllegalArgumentException(&#34;lookBack exceeds length&#34;);
}

List<String> newWords = new ArrayList<>();

generateLeading(newWords, words, lookBack);

for (int ig = lookBack; ig < length; ig++) {
   List<String> prevs = newWords.subList(newWords.size() - lookBack, newWords.size());
   String chosenWord = randomNextWord(words, prevs);
   newWords.add(chosenWord);
}

return String.join(&#34; &#34;, newWords);
}

// 生成最初的几个单词
private static void generateLeading(List<String> newWords, String[] words, int lookBack) {
int randIdx = new Random().nextInt(words.length);
for (int i = 0; i < lookBack; i++) {
   newWords.add(safeGetWord(words, randIdx + i));
}
}

private static String randomNextWord(String[] words, List<String> prevs) {
int randomBeginIndex = new Random().nextInt(words.length);

for (int _i = 0; _i < words.length; _i++) {
   int idx = randomBeginIndex + _i;
   // 试匹配一个词组
   int matchedCount = 0;

   for (int j = 0; j < prevs.size(); j++) {
   if (!safeGetWord(words, idx + j).equals(prevs.get(j))) {
      matchedCount = -1;
      break;
   }

   matchedCount++;
   }

   if (matchedCount == prevs.size()) {
   return safeGetWord(words, idx + prevs.size());
   }
}

return null;
}

private static String safeGetWord(String[] words, int index) {
return words;
}
}哈希索引法的实现代码

class Solution2 {
public static void main(String[] args) {
System.out.println(generateSentence(
   &#34;this is a sentence it is not a good one and it is also bad&#34;, 5, 2));
}

public static String generateSentence(String sentence, int length, int lookBack) {
String[] words = sentence.split(&#34; &#34;);
if (words.length == 0) {
   return sentence;
}
if (lookBack > length) {
   throw new IllegalArgumentException(&#34;lookBack exceeds length&#34;);
}

Table table = new Table(words, lookBack);

List<String> newWords = new ArrayList<>();

generateLeading(newWords, words, lookBack);

for (int ig = lookBack; ig < length; ig++) {
   Phrase phrase = new Phrase(newWords.subList(newWords.size() - lookBack, newWords.size()));
   String chosenWord = table.randomNextWord(phrase);
   newWords.add(chosenWord);
}

return String.join(&#34; &#34;, newWords);
}

private static void generateLeading(List<String> newWords, String[] words, int lookBack) {
int randIdx = new Random().nextInt(words.length);
for (int i = 0; i < lookBack; i++) {
   newWords.add(safeGetWord(words, randIdx + i));
}
}

private static String safeGetWord(String[] words, int index) {
return words;
}

static class Phrase {
private List<String> elements;

Phrase(List<String> elements) {
   Objects.requireNonNull(elements);
   // TODO 应当拷贝一份以确保不可变性
   this.elements = elements;
}

Phrase(int lookBack, String[] words, int beginIndex) {
   elements = new ArrayList<>(lookBack);
   for (int j = 0; j < lookBack; j++) {
   elements.add(safeGetWord(words, beginIndex + j));
   }
}

@Override
public boolean equals(Object o) {
   if (this == o) return true;
   if (!(o instanceof Phrase)) return false;
   Phrase phrase = (Phrase) o;
   return elements.equals(phrase.elements);
}

@Override
public int hashCode() {
   return elements.hashCode();
}
}

static class Table {
private Map<Phrase, List<String>> map = new HashMap<>();

Table(String[] words, int lookBack) {
   for (int i = 0; i < words.length; i++) {
   Phrase phrase = new Phrase(lookBack, words, i);

   List<String> nexts = map.computeIfAbsent(phrase, key -> new ArrayList<>());
   nexts.add(safeGetWord(words, i + lookBack));
   }
}

String randomNextWord(Phrase phrase) {
   List<String> nexts = map.get(phrase);
   int chosen = new Random().nextInt(nexts.size());
   return nexts.get(chosen);
}
}
}这一回，哈希索引法又引入了一个Phrase结构来作为HashMap key。为什么不直接用List<String>作为key呢？有两个原因：

[*]可读性更好。
[*]key应当是不可变的（immutable），若写入HashMap用的key值不同于查询用的key值，就会查询不到数据。Phrase是一个不可变对象，将List<String>封装了起来。
[*]Phrase的hashCode方法是可以修改的，未来可以优化它以提高性能。现在的hashCode是用List<String>实现的，算法全程的hashCode计算成本为O(mn)，m即lookBack。String会缓存hashCode，因此主要成本在于将每个String的hashCode相加，其实性能还好。
算法复杂度怎么分析呢？设n=words.length，l=length，m=lookBack，则：

[*]蛮力搜索法，时间复杂度为O(mn^2 + l)，空间复杂度为O(l)。
[*]优化搜索法，时间复杂度为O(mn^2/k + l)，空间复杂度为O(l)。
[*]哈希索引法主要花时间在构建一个大小为n的索引表，时间复杂度为O(mn + l)，空间复杂度为O(mn + l)。如果对于相同输入反复调用，由于哈希表可以重用，其时间成本摊销后为O(m+l)。
向生产就绪进发

以上的算法达到生产就绪(production ready)了吗？让我们来考察。
正确性

优化搜索法由于某些情况不公平，在满足功能需求上有所折扣。哈希索引法既高效又易理解，容易正确实现，预计适合用于生产环境。
算法的实现正确吗？需要做功能测试。算法很适合单元测试，但是包含随机数，怎么稳定重现呢，更特别地说，怎么确保测到边界条件呢？
就不放代码了，只说最重要的问题：怎么稳定重现，乃至确保测到边界条件。
答案很简单：应当mock随机数。将用于生成随机数的代码抽成一个函数，在单元测试中将它mock为返回确定的数值。这是因为随机性质不需要放在一起测试，需要被测试的是其他性质。
随机性

注意，随机数的生成要引入熵，才有足够的随机性。以Java的Random伪随机数为例，若全程共用一个Random对象，就只是伪随机，若多次new Random()，由于中间的间隔时间可能随机波动，就引入了时间熵，得到真随机数。也可以全程共用一个SecureRandom对象，这个能提供真随机数，但是速度稍慢一些。
性能

哈希索引法在理论上快n倍，实际有多快呢？
实际性能如何，应当做性能测试，俗话说跑个分（benchmark）。
我们做一个小的性能测试，再做一个大的性能测试，以蛮力搜索法为基准来评估性能。
小的性能测试（测试代码略）：
将例句重复8遍得到一个8倍长的句子，其余参数不变（length=5，lookBack=2）。将算法预热5万遍，再执行20万遍，多次测试取平均值。（20万遍）花费时间如下：

[*]基本版：
[*]蛮力搜索法：1358ms
[*]优化搜索法：850ms
[*]哈希索引法：（不重用哈希表）1212ms，（重用哈希表）699ms

[*]难度加强版：
[*]蛮力搜索法：1186ms
[*]优化搜索法：480ms
[*]哈希索引法：（不重用哈希表）1032ms，（重用哈希表）382ms

结果让人惊讶，哈希索引法（不重用哈希表）没有显著快于基准（蛮力搜索法）！优化搜索法则表现很好，几乎快1倍！
可能是数据量太小，没有发挥哈希表的优势，那么试一试更大数据量吧！
大的性能测试：
加大数据量，将例句随机重排词序，并重复100遍得到一个100倍长的句子，length=100，lookBack=2。将算法预热1千遍，再执行5千遍，多次测试取平均值。
测试代码要复杂一些，因此提供代码如下：
public static void main(String[] args) {
String sentence = &#34;this is a sentence it is not a good one and it is also bad&#34;;
sentence = times(sentence, 100);

// 预热
for (int i = 0; i < 1000; i++) {
generateSentence(
   sentence, 100, 2);
}

// 测试
long start = System.currentTimeMillis();
for (int i = 0; i < 5000; i++) {
generateSentence(
   sentence, 100, 2);
}
long cost = System.currentTimeMillis() - start;
System.out.println(cost);
}

// 生成长句子
static String times(String input, int n) {
List<String> buffer = new ArrayList<>();
List<String> words = Arrays.asList(input.split(&#34; &#34;));
for (int i = 0; i < n; i++) {
Collections.shuffle(words);
buffer.addAll(words);
}
return String.join(&#34; &#34;, buffer);
}
（5千遍）花费时间如下：

[*]基本版：
[*]蛮力搜索法：4600ms
[*]优化搜索法：450ms
[*]哈希索引法：（不重用哈希表）500ms ，（重用哈希表）405ms

[*]难度加强版：
[*]蛮力搜索法：10000ms
[*]优化搜索法：1000ms
[*]哈希索引法：（不重用哈希表）1000ms （重用哈希表）420ms

总算发挥了哈希索引法的优势！
在这个数据量，优化搜索法仍然不亚于哈希索引法，而且空间复杂度更优！
如果能从理论上解释这一现象，那就更好了。怎么解释呢？因为是把例句重复100遍生成的输入句子，使得优化搜索法的k值较高，当然会性能很好啊。那么试一试大量生成随机单词，以降低k值：优化搜索法的性能降低了一半，哈希索引法不受影响。
顺便测试了对hashCode的优化，若Phrase只计算第一个单词的hashCode，当lookBack=2时性能无提升，当lookBack=20时性能提升8%。
结论

哈希索引法的高效性、高稳定性、易理解性，使其成为生产环境的首选。注意几个问题：空间复杂度、哈希表只能对相同输入重用、随机性、key不可变、hashCode可优化。
优化搜索法的有趣性，使其值得进一步研究。
附录：非线性搜索法

前文已提到：
优化搜索法的随机选择有时可能是很不公平的。例如，连续2个相同单词如&#34;go go&#34;，从左到右的搜索几乎总会选中左边那个。这个问题能解决，每次随机决定搜索方向即可（从左到右或从右到左）。但是，连续3个相同单词如&#34;go go go &#34;呢？中间的单词被选中的概率微乎其微。解决这一问题的关键在于以非线性方式遍历数组。如此就能绕过连续相同单词了。
算法要满足如下需求：

[*]搜索时遇到连续相同单词，中间或后面的单词不会被前面的单词挡住。
[*]每次搜索要确保遍历整个数组（即搜索空间），以免找不到单词或算法不能停止。
这里不追求绝对公平，其实常用的随机数也没有那么公平。我们需要的是每个单词都有可观的概率被选中。
一种简单的非线性遍历法是“余数跳步遍历法”。以2的余数为例：在数组上随机选择一个起点和一种遍历方向（从左到右“→”或从右到左“←”）；对于数组上每一个位置，将其与起点的距离对2取模（模的值只能为0或1），也就是将数组划分为奇偶两组，模为1即奇组，模为0即偶组；依序遍历偶组，再依序遍历奇组，如此就遍历了整个数组。以上算法又可称为“奇偶跳步遍历法”，最多支持连续3个相同单词。若用3的余数将数组划分为3组，则最多支持连续5个相同单词。
有没有更好的方法呢？能支持任意多个连续相同单词吗？有的，“对数跳步遍历法”。以2作为底数为例，你知道怎么用数组来存储二叉树吗？模仿这一原理，但用不同的计算公式：将数组长度取以2为底的对数，结果只保留整数部（不要小数部）；在数组上随机选择一个起点（可选可不选遍历方向）；对于数组上每一个位置，将其减去它所能减的2的最大次数幂，按余数划分到对应的组（例如9只能减8，8即2的3次幂，9-8=1所以划分到1组）（又如15只能减8，15-8=7所以划分到7组）；对于每一组，依序遍历其成员，遍历所有组即遍历整个数组。举一个例子，若数组长度为15，起点为1，遍历顺序可为：, , , , 12, 13, 14, 0。尾部5个数退化到了线性遍历，这其实是算法的最坏情况，即数组长度为2的幂减1。不用过于担心。由于随机选择起点，连续相同单词处于尾部线性遍历区间的概率很小(对于n=15为1/3，对于较大的n约为1/√n，即小到可忽略)，即多数时候是能被非线性遍历的。
以下提供代码，注意当数组小到不足以跳跃时应使用线性遍历法。
非线性搜索法的主要实现代码

private static String randomNextWord(String[] words, List<String> prevPhrase) {
int foundIndex = nonlinearSearch(words, prevPhrase);
if (foundIndex >= 0) {
return safeGet(words, foundIndex + 1);
} else {
return null;
}
}

private static <T> T safeGet(T[] array, int index) {
return array;
}

private static int random(int bound) {
return new Random().nextInt(bound);
}

private static <T> int nonlinearSearch(T[] array, List<T> phrase) {
int randomBeginIndex = random(array.length);

int maxBatch = floorLog2(array.length) - 1;

if (maxBatch <= 0) {
return linearSearch(array, phrase);
}

int foundIndex = -1;
for (int batch = 0; batch < maxBatch && foundIndex < 0; batch++) {
foundIndex = nonlinearSearchBatch(array, phrase, randomBeginIndex, batch);
}
return foundIndex;
}

private static <T> int nonlinearSearchBatch(T[] array, List<T> phrase, int beginIndex, int batch) {
for (int i = 1; i <= array.length; i *= 2) {
if (batch < i) {
   int index = i + beginIndex + batch;
   if (match(array, index, phrase)) {
   return index;
   }
}
}

return -1;
}

private static int floorLog2(int n) {
int r = 0;
while ((n >>= 1) > 0) {
r++;
}
return r;
}

private static <T> int linearSearch(T[] array, List<T> phrase) {
int randomBeginIndex = random(array.length);

for (int _i = 0; _i < array.length; _i++) {
int index = randomBeginIndex + _i;
if (match(array, index, phrase)) {
   return index;
}
}

return -1;
}

private static <T> boolean match(T[] array, int beginIndex, List<T> phrase) {
int matchedCount = 0;

for (int j = 0; j < phrase.size(); j++) {
if (!safeGet(array, beginIndex + j).equals(phrase.get(j))) {
   return false;
}
matchedCount++;
}

return matchedCount == phrase.size();
}《三体》中提到从高维世界能不受阻碍地访问低维世界的任一处。非线性搜索法也可以称为高维搜索法，从二维树访问一维序列，使得没有单词会被挡住，再利用随机起步得到的随机性，实现了需求。

页: [1]

Unity开发者联盟's Archiver

生产就绪的算法：以随机句子为例