
Index 1,600,000,000 Keys with Automata and Rust (4)

  In the previous two sections, I have been careful to avoid talking about the construction of finite state machines that are used to represent ordered sets or maps. The reason is that construction is a bit more complex than simple traversal.
  To keep things simple, we place a restriction on the elements in our set or map: they must be added in lexicographic order. This is an onerous restriction, but we will see later how to mitigate it.
  To motivate construction of finite state machines, let’s talk about tries.

Trie construction

  A trie can be thought of as a deterministic acyclic finite state acceptor. Therefore, everything you learned in the previous section on ordered sets applies equally well to them. The only difference between a trie and the FSAs shown in this article is that a trie permits the sharing of prefixes between keys while an FSA permits the sharing of both prefixes and suffixes.
  Consider a set with the keys mon, tues and thurs. Here is the corresponding FSA that benefits from sharing both prefixes and suffixes:
  And here is the corresponding trie, which only shares prefixes:
  Notice that there are now three distinct final states, and the keys tues and thurs require duplicating the final transition for s to the final state.
  Constructing a trie is reasonably straightforward. Given a new key to insert, all one needs to do is perform a normal lookup. If the input is exhausted, then the current state should be marked as final. If the machine stops before the input is exhausted because there are no valid transitions to follow, then simply create a new transition and node for each remaining input. The last node created should be marked final.

FSA construction

  Recall that the only difference between a trie and an FSA is that an FSA permits the sharing of suffixes between keys. Since a trie is itself an FSA, we could construct a trie and then apply a general minimization algorithm, which would achieve our goal of sharing suffixes.
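  To make the trie-building step concrete, here is a minimal sketch of the insertion procedure described under trie construction. The `Trie` and `Node` types and the flat `Vec` layout are assumptions chosen for illustration; they are not taken from any particular library.

```rust
use std::collections::BTreeMap;

/// One trie node: a final flag plus outgoing transitions keyed by input byte.
#[derive(Default)]
struct Node {
    is_final: bool,
    children: BTreeMap<u8, usize>, // input byte -> index into `Trie::nodes`
}

/// A trie stored as a flat vector of nodes; index 0 is the root.
struct Trie {
    nodes: Vec<Node>,
}

impl Trie {
    fn new() -> Self {
        Trie { nodes: vec![Node::default()] }
    }

    /// Follow existing transitions while possible, create a new node for
    /// each remaining byte, then mark the last node reached as final.
    fn insert(&mut self, key: &str) {
        let mut cur = 0;
        for &b in key.as_bytes() {
            let found = self.nodes[cur].children.get(&b).copied();
            cur = match found {
                Some(next) => next,
                None => {
                    let next = self.nodes.len();
                    self.nodes.push(Node::default());
                    self.nodes[cur].children.insert(b, next);
                    next
                }
            };
        }
        self.nodes[cur].is_final = true;
    }

    /// A normal lookup: the key is present if its path ends in a final node.
    fn contains(&self, key: &str) -> bool {
        let mut cur = 0;
        for b in key.as_bytes() {
            match self.nodes[cur].children.get(b) {
                Some(&next) => cur = next,
                None => return false,
            }
        }
        self.nodes[cur].is_final
    }
}

fn main() {
    let mut trie = Trie::new();
    for key in ["mon", "thurs", "tues"] {
        trie.insert(key);
    }
    println!("states: {}", trie.nodes.len());
    println!("contains tues: {}", trie.contains("tues"));
}
```

  Only the leading t of thurs and tues is shared in this sketch, so each key still gets its own final node. That duplicated suffix structure is what minimization would remove.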
  However, general minimization algorithms can be expensive both in time and space. For example, a trie can often be much larger than an FSA that shares structure between suffixes of keys. Instead, if we can assume that keys are added in lexicographic order, we can do better. The essential trick is realizing that when inserting a new key, any parts of the FSA that don’t share a prefix with the new key can be frozen. Namely, no new key added to the FSA can possibly make that part of the FSA smaller.
  Some pictures might help explain this better. Consider again the keys mon, tues and thurs. Since we must add them in lexicographic order, we’ll add mon first, then thurs and then tues. Here’s what the FSA looks like after the first key has been added:
  This isn’t so interesting. Here’s what happens when we insert thurs:
  The insertion of thurs caused the first key, mon, to be frozen (indicated by blue coloring in the image). When a particular part of the FSA has been frozen, then we know that it will never need to be modified in the future. Namely, since all future keys added will be >= thurs, we know that no future keys will start with mon. This is important because it lets us reuse that part of the automaton without worrying about whether it might change in the future. Stated differently, states that are colored blue are candidates for reuse by other keys.
  The dotted lines represent that thurs hasn’t actually been added to the FSA yet. Indeed, adding it requires checking whether any reusable states exist. Unfortunately, we can’t do that yet. For example, it is true that states 3 and 8 are equivalent: both are final and neither has any transitions. However, it is not true that state 8 will always be equal to state 3. Namely, the next key we add could, for example, be thursday. That would change state 8 to having a d transition, which would make it not equal to state 3. Therefore, we can’t quite conclude what the key thurs looks like in the automaton yet.
  Let’s move on to inserting the next key, tues:
  In the process of adding tues, we deduced that the hurs part of the thurs key could be frozen. Why? Because no future key inserted could possibly minimize the path taken by hurs since keys are inserted in lexicographic order. For example, we now know that the key thursday cannot ever be part of the set, so we can conclude that the final state of thurs is equivalent to the final state of mon: they are both final and both have no transitions, and this will forever be true.
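  One way to see which part of the previous key becomes frozen is to compare it with the incoming key: everything after their common prefix can no longer change. The sketch below illustrates that rule on byte strings; the function names `common_prefix_len` and `frozen_suffix` are hypothetical helpers, not part of any library.

```rust
/// Length of the longest common prefix of two byte strings.
fn common_prefix_len(a: &[u8], b: &[u8]) -> usize {
    a.iter().zip(b).take_while(|(x, y)| x == y).count()
}

/// The tail of `prev` whose states can be frozen once `next` arrives.
/// Assumes keys arrive in lexicographic order, i.e. `prev <= next`.
fn frozen_suffix<'a>(prev: &'a [u8], next: &[u8]) -> &'a [u8] {
    assert!(prev <= next, "keys must be inserted in lexicographic order");
    let shared = common_prefix_len(prev, next);
    &prev[shared..]
}

fn main() {
    let keys: [&[u8]; 4] = [b"mon", b"thurs", b"tues", b"zon"];
    // Compare each key with the one inserted right after it.
    for pair in keys.windows(2) {
        let frozen = frozen_suffix(pair[0], pair[1]);
        println!(
            "inserting {} freezes {:?}",
            String::from_utf8_lossy(pair[1]),
            String::from_utf8_lossy(frozen)
        );
    }
}
```

  For the keys in this example, inserting thurs freezes all of mon, inserting tues freezes hurs, and inserting zon freezes tues. If the next key were thursday instead, the common prefix would be the whole of thurs and nothing would be frozen, which matches the earlier observation about state 8.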
  Notice that state 4 remained dotted: it is possible that state 4 could change upon subsequent key insertions, so we cannot consider it equal to any other state just yet.
  Let’s add one more key to drive the point home. Consider the insertion of zon:
  We see here that state 4 has finally been frozen because no future insertion after zon can possibly change state 4. Additionally, we could also conclude that thurs and tues share a common suffix, and that, indeed, states 7 and 9 (from the previous image) are equivalent because neither of them is final and both have a single transition with input s that points to the same state. It is critical that both of their s transitions point to the same state, otherwise we cannot reuse the same structure.
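  The equivalence test described here can be expressed as plain structural equality, provided the target states have already been frozen and deduplicated. The following is a small sketch under that assumption; the `State` type, the `state` helper and the target ids 3 and 8 are illustrative only.

```rust
use std::collections::BTreeMap;

/// A frozen state: finality plus transitions to other frozen states.
/// Deriving `Eq` and `Hash` makes "same finality, same transitions to the
/// same targets" the definition of equality.
#[derive(Clone, PartialEq, Eq, Hash, Debug)]
struct State {
    is_final: bool,
    transitions: BTreeMap<u8, usize>, // input byte -> id of the target state
}

/// Hypothetical convenience constructor for the examples below.
fn state(is_final: bool, edges: &[(u8, usize)]) -> State {
    State { is_final, transitions: edges.iter().copied().collect() }
}

fn main() {
    // Two non-final states whose single `s` transition reaches the same target.
    let state7 = state(false, &[(b's', 3)]);
    let state9 = state(false, &[(b's', 3)]);
    // Same shape, but the `s` transition reaches a different target.
    let other = state(false, &[(b's', 8)]);

    println!("7 equivalent to 9: {}", state7 == state9);
    println!("7 equivalent to other: {}", state7 == other);
}
```

  Comparing target ids is only sound because equivalent targets have already been merged into one id. Deriving `Hash` as well means such states can serve as hash table keys.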
  Finally, we must signal that we are done inserting keys. We can now freeze the last portion of the FSA, zon, and look for redundant structure:
  And of course, since mon and zon share a common suffix, there is indeed redundant structure. Namely, state 9 in the previous image is equivalent in every way to state 1. This is only true because states 10 and 11 are also equivalent to states 2 and 3. If that weren’t true, then we couldn’t consider states 9 and 1 equal. For example, if we had inserted the key mom into our set and still assumed that states 9 and 1 were equal, then the resulting FSA would look something like this:
  And this would be wrong! Why? Because this FSA will claim that the key zom is in the set, but we never actually added it.
  Finally, it is worth noting that the construction algorithm outlined here can run in O(n) time where n is the number of keys (treating the length of each key as bounded by a constant). It is easy to see that inserting a key initially into the FST without checking for redundant structure does not take any longer than looping over each character in the key, assuming that looking up a transition in each state takes constant time. The trickier bit is: how do we find redundant structure in constant time? The short answer is a hash table, but I will explain some of the challenges with that in the section on construction in practice.
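  Putting the pieces of this section together, the sketch below builds a minimal acyclic FSA from keys inserted in lexicographic order, using a hash table to find reusable frozen states. It is a simplified illustration, not the implementation discussed later in the article: the `Builder` type, its fields and the free function `contains` are all hypothetical names, and states are hashed by their full structure for clarity rather than speed.

```rust
use std::collections::{BTreeMap, HashMap};

/// A state is identified purely by its structure: finality plus transitions.
#[derive(Clone, Default, PartialEq, Eq, Hash)]
struct State {
    is_final: bool,
    trans: BTreeMap<u8, usize>, // input byte -> id of an already frozen state
}

/// Hypothetical builder for a minimal acyclic FSA over sorted keys.
struct Builder {
    frozen: Vec<State>,              // frozen states, indexed by id
    registry: HashMap<State, usize>, // structure -> id, used to find reusable states
    path: Vec<State>,                // path[i]: unfrozen state after i bytes of `prev`
    prev: Vec<u8>,                   // the most recently inserted key
}

impl Builder {
    fn new() -> Self {
        Builder {
            frozen: Vec::new(),
            registry: HashMap::new(),
            path: vec![State::default()], // the root state
            prev: Vec::new(),
        }
    }

    /// Return the id of an equivalent frozen state, creating one if needed.
    fn intern(&mut self, state: State) -> usize {
        if let Some(&id) = self.registry.get(&state) {
            return id;
        }
        let id = self.frozen.len();
        self.frozen.push(state.clone());
        self.registry.insert(state, id);
        id
    }

    /// Freeze unfrozen states, deepest first, until only `keep` remain.
    fn freeze_to(&mut self, keep: usize) {
        while self.path.len() > keep {
            let state = self.path.pop().unwrap();
            let id = self.intern(state);
            // The byte of `prev` that led from the parent to the popped state.
            let byte = self.prev[self.path.len() - 1];
            self.path.last_mut().unwrap().trans.insert(byte, id);
        }
    }

    fn insert(&mut self, key: &[u8]) {
        assert!(self.prev.as_slice() <= key, "keys must be added in order");
        let shared = self.prev.iter().zip(key).take_while(|(a, b)| a == b).count();
        // Everything past the common prefix of the previous key is now frozen.
        self.freeze_to(shared + 1);
        for _ in &key[shared..] {
            self.path.push(State::default());
        }
        self.path.last_mut().unwrap().is_final = true;
        self.prev = key.to_vec();
    }

    /// Freeze everything that is left and return (states, root id).
    fn finish(mut self) -> (Vec<State>, usize) {
        self.freeze_to(1);
        let root = self.path.pop().unwrap();
        let root_id = self.intern(root);
        (self.frozen, root_id)
    }
}

/// Walk the automaton from `root`; the key is present if we end on a final state.
fn contains(states: &[State], root: usize, key: &[u8]) -> bool {
    let mut cur = root;
    for b in key {
        match states[cur].trans.get(b) {
            Some(&next) => cur = next,
            None => return false,
        }
    }
    states[cur].is_final
}

fn main() {
    let mut builder = Builder::new();
    for key in ["mon", "thurs", "tues", "zon"] {
        builder.insert(key.as_bytes());
    }
    let (states, root) = builder.finish();
    println!("states: {}", states.len());
    println!("tues: {}", contains(&states, root, b"tues"));
    println!("zom: {}", contains(&states, root, b"zom"));
}
```

  For the four keys used in this section, the sketch ends up with nine states, where a plain trie over the same keys would need fifteen. Because a state is only compared after all of its children have been frozen and deduplicated, adding mom to the input does not cause zom to be accepted.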
