
Index 1,600,000,000 Keys with Automata and Rust (4)

  In the previous two sections, I have been careful to avoid talking about the construction of finite state machines that are used to represent ordered sets or maps. The reason is that construction is a bit more complex than simple traversal.
  To keep things simple, we place a restriction on the elements in our set or map: they must be added in lexicographic order. This is an onerous restriction, but we will see later how to mitigate it.
  To motivate construction of finite state machines, let's talk about tries.

Trie construction

  A trie can be thought of as a deterministic acyclic finite state acceptor. Therefore, everything you learned in the previous section on ordered sets applies equally well to them. The only difference between a trie and the FSAs shown in this article is that a trie permits the sharing of prefixes between keys, while an FSA permits the sharing of both prefixes and suffixes.
  Consider a set with the keys mon, tues and thurs. Here is the corresponding FSA that benefits from sharing both prefixes and suffixes:
  And here is the corresponding trie, which only shares prefixes:
  Notice that there are now three distinct final states, and the keys tues and thurs each need their own copy of the final s transition into a final state.
  Constructing a trie is reasonably straightforward. Given a new key to insert, all one needs to do is perform a normal lookup. If the input is exhausted, then the current state should be marked as final. If the machine stops before the input is exhausted because there are no valid transitions to follow, then simply create a new transition and node for each remaining input. The last node created should be marked final.

FSA construction

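  As a baseline for the construction that follows, the trie insertion procedure described above can be sketched in Rust. This is only an illustration: the names `Node`, `Trie`, `insert`, `contains` and `final_states` are made up for this example, keys are treated as byte strings, and nodes live in a flat vector addressed by index.

```rust
use std::collections::BTreeMap;

/// One trie node: outgoing transitions keyed by byte, plus a final marker.
#[derive(Default)]
struct Node {
    is_final: bool,
    next: BTreeMap<u8, usize>, // input byte -> index into `Trie::nodes`
}

/// A trie stored as a flat vector of nodes; index 0 is the root.
struct Trie {
    nodes: Vec<Node>,
}

impl Trie {
    fn new() -> Self {
        Trie { nodes: vec![Node::default()] }
    }

    /// Follow existing transitions as far as possible, then create one new
    /// transition and node per remaining byte. The node reached at the end
    /// of the key is marked final.
    fn insert(&mut self, key: &str) {
        let mut cur = 0;
        for &b in key.as_bytes() {
            cur = match self.nodes[cur].next.get(&b).copied() {
                Some(child) => child,
                None => {
                    let child = self.nodes.len();
                    self.nodes.push(Node::default());
                    self.nodes[cur].next.insert(b, child);
                    child
                }
            };
        }
        self.nodes[cur].is_final = true;
    }

    /// A key is in the set if its path exists and ends in a final node.
    fn contains(&self, key: &str) -> bool {
        let mut cur = 0;
        for b in key.as_bytes() {
            match self.nodes[cur].next.get(b) {
                Some(&child) => cur = child,
                None => return false,
            }
        }
        self.nodes[cur].is_final
    }

    /// Number of nodes marked final.
    fn final_states(&self) -> usize {
        self.nodes.iter().filter(|n| n.is_final).count()
    }
}
```

  Inserting mon, tues and thurs into this sketch yields three final nodes, one per key, because nothing here ever merges suffixes.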
  Recall that the only difference between a trie and an FSA is that an FSA permits the sharing of suffixes between keys. Since a trie is itself an FSA, we could construct a trie and then apply a general minimization algorithm, which would achieve our goal of sharing suffixes.
  However, general minimization algorithms can be expensive both in time and space. For example, a trie can often be much larger than an FSA that shares structure between suffixes of keys. Instead, if we can assume that keys are added in lexicographic order, we can do better. The essential trick is realizing that when inserting a new key, any parts of the FSA that don't share a prefix with the new key can be frozen. That is, no new key added to the FSA can possibly make that part of the FSA smaller.
  Some pictures might help explain this better. Consider again the keys mon, tues and thurs. Since we must add them in lexicographic order, we'll add mon first, then thurs and then tues. Here's what the FSA looks like after the first key has been added:
  This isn't so interesting. Here's what happens when we insert thurs:
  The insertion of thurs caused the first key, mon, to be frozen (indicated by blue coloring in the image). When a particular part of the FSA has been frozen, we know that it will never need to be modified in the future. Since all future keys added will be >= thurs, we know that no future keys will start with mon. This is important because it lets us reuse that part of the automaton without worrying about whether it might change in the future. Stated differently, states that are colored blue are candidates for reuse by other keys.
  The dotted lines represent that thurs hasn't actually been added to the FSA yet. Indeed, adding it requires checking whether there exist any reusable states. Unfortunately, we can't do that yet. For example, it is true that states 3 and 8 are currently equivalent: both are final and neither has any transitions. However, it is not true that state 8 will always be equal to state 3. The next key we add could, for example, be thursday. That would give state 8 a d transition, which would make it no longer equal to state 3. Therefore, we can't quite conclude what the key thurs looks like in the automaton yet.
  Let's move on to inserting the next key, tues:
  In the process of adding tues, we deduced that the hurs part of the thurs key could be frozen. Why? Because keys are inserted in lexicographic order, no future key can change the path taken by hurs. For example, we now know that the key thursday cannot ever be part of the set, so we can conclude that the final state of thurs is equivalent to the final state of mon: they are both final and both have no transitions, and this will forever be true.
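  The freezing rule used in the last two insertions can be phrased in terms of the longest common prefix of the previous key and the new key: every state on the previous key's path beyond that prefix can be frozen. The sketch below is illustrative only; `common_prefix_len` and `frozen_suffix` are hypothetical helper names, and keys are compared as byte strings.

```rust
/// Length of the longest common prefix of two byte strings.
fn common_prefix_len(a: &[u8], b: &[u8]) -> usize {
    a.iter().zip(b.iter()).take_while(|(x, y)| x == y).count()
}

/// Given the previously inserted key and the next key, return the suffix of
/// `prev` whose states can be frozen. Returns `None` when `next` does not
/// sort strictly after `prev`, since the algorithm requires sorted input.
fn frozen_suffix<'a>(prev: &'a [u8], next: &[u8]) -> Option<&'a [u8]> {
    if next <= prev {
        return None;
    }
    let shared = common_prefix_len(prev, next);
    Some(&prev[shared..])
}
```

  With mon followed by thurs, nothing is shared, so all of mon is frozen. With thurs followed by tues, only t is shared, so hurs is frozen while the state reached after t stays open. With thurs followed by thursday, the frozen suffix is empty, which is why the final state of thurs could not be frozen any earlier.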
  Notice that state 4 remained dotted: it is possible that state 4 could change upon subsequent key insertions, so we cannot consider it equal to any other state just yet.
  Let's add one more key to drive the point home. Consider the insertion of zon:
  We see here that state 4 has finally been frozen, because no insertion after zon can possibly change state 4. Additionally, we can conclude that thurs and tues share a common suffix, and that, indeed, states 7 and 9 (from the previous image) are equivalent because neither of them is final and both have a single transition with input s that points to the same state. It is critical that both of their s transitions point to the same state, otherwise we cannot reuse the same structure.
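  One way to make this equivalence test concrete is to compare frozen states by value: the final flag plus the set of (input, target state) pairs. The following is a sketch under that assumption. The `State` type and the `state` helper are hypothetical, and the ids 3 and 8 are borrowed from the figures' numbering purely for illustration.

```rust
use std::collections::BTreeMap;

/// A frozen state, identified by whether it is final and by where each
/// input byte leads. Two frozen states are interchangeable only if both
/// parts match exactly, including the ids of the target states.
#[derive(Clone, PartialEq, Eq, Hash, Debug)]
struct State {
    is_final: bool,
    trans: BTreeMap<u8, usize>, // input byte -> id of the target state
}

/// Convenience constructor for the examples.
fn state(is_final: bool, trans: &[(u8, usize)]) -> State {
    State { is_final, trans: trans.iter().copied().collect() }
}
```

  Under this definition, two non-final states that each have a single s transition to state 3 compare equal. If one of them still pointed at a separate state 8, they would not, even though 3 and 8 look alike. Deriving `Hash` alongside `Eq` is a deliberate choice here, since it allows such states to be used as hash table keys.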
  Finally, we must signal that we are done inserting keys. We can now freeze the last portion of the FSA, zon, and look for redundant structure:
  And of course, since mon and zon share a common suffix, there is indeed redundant structure. State 9 in the previous image is equivalent in every way to state 1. This is only true because states 10 and 11 are also equivalent to states 2 and 3. If that weren't true, then we couldn't consider states 9 and 1 equal. For example, if we had inserted the key mom into our set and still assumed that states 9 and 1 were equal, then the resulting FSA would look something like this:
  And this would be wrong! Why? Because this FSA will claim that the key zom is in the set, but we never actually added it.
  Finally, it is worth noting that the construction algorithm outlined here can run in O(n) time where n is the number of keys. It is easy to see that inserting a key initially into the FST without checking for redundant structure does not take any longer than looping over each character in the key, assuming that looking up a transition in each state takes constant time. The trickier bit is: how do we find redundant structure in constant time? The short answer is a hash table, but I will explain some of the challenges with that in the section on construction in practice.
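  Putting the pieces together, here is a minimal sketch of sorted-input construction with a hash table of frozen states. It is an illustration under simplifying assumptions, not the implementation the article goes on to discuss: all names (`Builder`, `Unfrozen`, `registry`, `Fsa`) are made up for this example, every frozen state is kept in memory, and hash lookups are constant time only in expectation.

```rust
use std::collections::{BTreeMap, HashMap};

#[derive(Clone, Default, PartialEq, Eq, Hash)]
struct State {
    is_final: bool,
    trans: BTreeMap<u8, usize>, // input byte -> id of a frozen state
}

/// A state on the path of the most recent key; it may still change.
struct Unfrozen {
    state: State,
    last: Option<u8>, // byte leading to the next unfrozen state, if any
}

struct Builder {
    states: Vec<State>,              // frozen states, addressed by id
    registry: HashMap<State, usize>, // frozen state -> id, used to find reuse
    path: Vec<Unfrozen>,             // path[i] is reached after i bytes of `prev`
    prev: Option<Vec<u8>>,           // most recently inserted key
}

struct Fsa {
    states: Vec<State>,
    root: usize,
}

fn common_prefix_len(a: &[u8], b: &[u8]) -> usize {
    a.iter().zip(b).take_while(|(x, y)| x == y).count()
}

impl Builder {
    fn new() -> Self {
        let root = Unfrozen { state: State::default(), last: None };
        Builder { states: Vec::new(), registry: HashMap::new(), path: vec![root], prev: None }
    }

    /// Reuse an equivalent frozen state if one exists, else store a new one.
    fn compile(&mut self, s: State) -> usize {
        if let Some(&id) = self.registry.get(&s) {
            return id;
        }
        let id = self.states.len();
        self.states.push(s.clone());
        self.registry.insert(s, id);
        id
    }

    /// Freeze states from the end of the path until `keep + 1` remain.
    fn freeze_to(&mut self, keep: usize) {
        while self.path.len() > keep + 1 {
            let done = self.path.pop().unwrap();
            let id = self.compile(done.state);
            let parent = self.path.last_mut().unwrap();
            let b = parent.last.take().unwrap();
            parent.state.trans.insert(b, id);
        }
    }

    fn insert(&mut self, key: &[u8]) -> Result<(), &'static str> {
        let shared = match &self.prev {
            Some(p) if key <= p.as_slice() => return Err("keys must be strictly increasing"),
            Some(p) => common_prefix_len(p, key),
            None => 0,
        };
        self.freeze_to(shared);
        for &b in &key[shared..] {
            self.path.last_mut().unwrap().last = Some(b);
            self.path.push(Unfrozen { state: State::default(), last: None });
        }
        self.path.last_mut().unwrap().state.is_final = true;
        self.prev = Some(key.to_vec());
        Ok(())
    }

    /// Signal that no more keys will arrive: freeze everything, root included.
    fn finish(mut self) -> Fsa {
        self.freeze_to(0);
        let root_state = self.path.pop().unwrap().state;
        let root = self.compile(root_state);
        Fsa { states: self.states, root }
    }
}

impl Fsa {
    fn contains(&self, key: &[u8]) -> bool {
        let mut cur = self.root;
        for b in key {
            match self.states[cur].trans.get(b) {
                Some(&next) => cur = next,
                None => return false,
            }
        }
        self.states[cur].is_final
    }
}
```

  Feeding mon, thurs, tues and zon to this sketch in that order and calling `finish` produces an automaton in which mon and zon share their on path and thurs and tues share their final s transition, while zom is correctly rejected. Hashing a state costs time proportional to its number of transitions, and the registry grows with the number of distinct frozen states, so this version trades memory for simplicity.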
