Suffix Trees and Ukkonen’s Algorithm in Swift Part 2

This article is a continuation of Suffix Trees and Ukkonen’s Algorithm in Swift Part 1.

I mentioned at the end of part 1, that we would apply a series of optimizations and tricks to get to an O(n) algorithm.

Optimizations

So in this article we will talk about various optimizations. As is the practice in an introduction, let me tell you what I am going to tell you, then let me tell you, then let me tell you what I told you. So without further ado, we will talk about suffix links, various tips and speed implementation tricks, active points, we will do a walkthrough of a comprehensive example, and finally we will look at some implementation details in Swift. Then of course I will tell you again what I just told you.

Suffix Links

What is a suffix link? If you look up the definition you will see the following:

For an internal node A with path label where x denotes a single character and α denotes a possibly empty substring, if there is another node B with path label α, then a pointer from A to B is called a suffix link.

Wow, isn’t that a mouthful? So what does this mean? What this means is that if you have a node A which has path-label ana and another node B that has path label na then a pointer exists from A to B and we call that pointer a suffix link.

In the tree below, there is a suffix link from the node labeled A to the node labeled B since A is an internal node and A has path label ana and B has path label na. See the turquoise arrow below:

But remember that pesky part of the definition possibly empty ? So what that means is that if you have an internal node A with a single character path label, such as the label a in the tree below, then it points to a node whose path is empty? What node has an empty path? ROOT. See the turquoise arrow below:

Use of Suffix Links to gain speed

Now that we know exactly what a suffix link is, we need to understand how they help us gain speed.
Let’s consider that we are in phase i+1 step j. The algorithm dictates that we need to find the end of the path S[j..<(i+1)] in the tree. If I am trying to locate node B then, let’s say I had kept a record of the end of path S[j-1..<(i+1)] namely A, in that case all I would need to do is walk up one node, follow the suffix link and then walk down… This is going to give us a better run time than traversing from root.
What this means is that if we create an internal node B in some extension j of some phase i and that is not the first internal node created in that extension (i.e. we have previously created an internal node A in a previous extension j-k of the same phase i where 0 < k <= j), then we can create a suffix link from A to B. In code what we will typically do is as soon as we create an internal node we will set its suffix link to point to root. Then later if we end up creating another internal node in another extension of the same phase, we will change the suffix link to point to that new internal node rather than root.

This is the basic idea. I realize there is a little bit of hand waving here, but this will become alot clearer shortly.

Skip / Count Trick

Another optimization we can do is the following. After we traverse a suffix link from C to D in the above tree we know that if there exists a path with label def from C to leaf, then there must also exist a path with label def from D to leaf since D is the suffix link of C. Using this knowledge we can quickly skip over edges if the length of the edge label is less than the number of characters we are trying to match. Now it would be great if we could obtain the length of an edge in constant time. Well, yes we can.

Edge Label Compression

You may have noticed that I have been including some strange numbers underneath the edge labels. For example under na above I have [2,3] and under a I have [1,1]. Rather than copy substrings all over the place we have a global reference to a String and all edge labels refer to the substrings which they define by using indices. So [2,3] means string[2...3] in Swift. As an aside, you can’t actually do that in Swift. You cannot create a substring using integer subscripts. In fact, I wrote an article just on that topic. So if you would like to see how you can make strings integer subscriptable in Swift, I encourage you to read my article Making Strings Integer-Subscriptable.
You may be wondering why there are several labels listed as [0,#] for example with a # instead of a number. Patience you must have my young padawan.

Rule 3 is a show stopper

If we are in phase i+1 step j, that means we are trying to locate the end of path S[j..<(i+1)] in the tree. Now if a Rule 3 extension applies it means that the path p = S[j..<(i+1)] continues with S[i+1]. It stands to reason that if path p = S[j..<(i+1)] continues with S[i+1], then so too will q = S[j+1..<(i+1)] continue with S[i+1] and so too will path r = S[j+2..<(i+1)] continue with S[i+1], and so on. Therefore, during a given phase of i, if a Rule 3 extension is encountered then we can short circuit and get out of the loop. This seems like a good place to use a break as we will see later. This trick will also afford us some speed increases.

Once a leaf, always a leaf

If you look at the definition of the algorithm I gave in Part 1, you will see that Rule 1 and Rule 3 never create any nodes. Rule 2 will either add a new node to root (thereby adding a leaf node), will add a new leaf node to some other node, or will insert a new internal node and add a leaf node. At no point does a leaf node ever become an internal node. We can exploit this to quickly handle all of the Rule 1 extensions for a given phase i in constant time. Recall that we are now storing indices into the string S in lieu of edge labels. For edges which terminate at a leaf node, rather than storing a scalar integer as the end index, we can store a pointer to the global end. This is what I referred to as # in the diagrams (a global end variable). Then for any phase i we need only increment the global end by 1 and we have instantly performed all the Rule 1 extensions for phase i. Now in c or c++ this is relatively straightforward, you just make sure you pass the address of the global end &end and you are good to go. But this requires a little extra work in Swift. We could always just declare the global end an inout type and pass it around by pointer, but this definitely doesn’t feel swifty. Since Int is a struct in swift, a value-type allocated on the stack, we can construct a class based wrapper around Int. Since the swift class is a reference type allocated on the heap, we are always referring to it by pointer anyway. In Swift, this looks something like this.

These tricks plus the definitions from Part 1 comprise Ukkonen’s algorithm and will allow you to construct a suffix tree in linear time.

Now I will talk about implementation.

Implementation

At the end of the day, what I would like to have is a SuffixTree data type that would allow me to construct a tree from a string or series of strings and then perform queries on it so it makes sense that SuffixTree will be the exposed datatype. Internally, I am dealing with edges and nodes. Some implementations only model nodes. I prefer to model both edges and nodes.

Active Point

Recall the example of Ukkonen’s algorithm that was given at the end of the last post, and that there are n phases with each phase having at most i steps. At most, because if we incorporate the tricks introduced here, that in a Rule 3 step the entire phase will short circuit and exit early. Also notice that phase 1 starts with Rule 2 and all of the remaining steps start with Rule 1. Notice further that all of the Rule 1 extensions will be performed, followed by zero or more Rule 2 extensions followed by zero or more Rule 3 extensions.

Also using the trick above, all of the Rule 1 extensions can be handled in constant time using a pointer to the global end variable. For Rules 2 & 3, we need to traverse down the tree to determine if the path exists. Assume we have the tree below, and we are in some phase i where we have completed all of the Rule 1 extensions for some string S = ...abcdef....

So now at the point where all the Rule 1 extensions have been performed, we need to check if a exists in the tree. Sure enough we see that it does exist. And according to Rule 3, the algorithm states that we do nothing. And we also know that now the phase will short circuit and the next phase will start. In the next phase we will again perform all the Rule 1 extensions and we will get to the point where we need to check if ab exists in the tree. Again we start from root and traverse. Eventually you can see where this is going. Because a check in some phase is going to build upon the location of the previous phase, we can keep track of the active node from where we can begin our checks. Let’s define an Active Point which is comprised of an Active Node, an Active Edge and an Active Length. This lets us know from which node to commence our traversal, so we don’t always have to start traversal from root. The Active Edge lets us know on which edge to walk down, and the Active Length lets us know how many characters along that edge were previously considered.

Rules governing alteration of the Active Point

Under certain conditions we alter the active point so that subsequent traversals are as efficient as possible. There are certain rules for alteration of the active point based on Rule 2 or Rule 3 extensions. However, before we introduce those rules, we need to discuss one other variable.
Since we can handle all of the Rule 1 extensions in one fell swoop by incrementing the global end variable by one, it makes sense to keep track of the remaining “real” suffixes that need to be inserted into the tree. We will call this variable remaining.

Rule 2 Active Point Alterations

When a Rule 2 extension is applied we consider two cases:
if Active Node == root AND Active Length > 0 then we:
1. Decrement Active Length by 1
2. Set Active Edge to S[i - remaining + 1]
if Active Node != root
1. Follow the suffix link from the Active Node and that becomes the new Active Node
2. No change to Active Edge or Active Length

Rule 3 Active Point Alterations

When a Rule 3 extension is applied, Active Node remains the same, Active Edge remains the same, and Active Length is incremented by 1.

Walkdown Rules for Active Point Alteration

If at any point when walking down an edge, another internal node is encountered, then that node will become the Active Node, and Active Edge and Active Length will change make sense with the new active node.

In that case the algorithm from the last post can be further summarized in a more optimized way as such in the general form:

  • Initialize remaining to 0.
  • Initialize end to -1.
    • Read in next character of string S
    • Increment the global end variable by one.
    • Increment remaining by one.
    • Enter into a loop remaining times.
      • if the active length is zero then set the active edge to the current character
      • check if there is an edge emanating from the active node for the active edge.
        • If no, then invoke Rule 2 and add a leaf node and possibly an internal node if needed. Then decrement remaining since a real suffix was added. If, during the furtherance of invoking Rule 2 an internal node is created, we set that node’s suffix link to root and keep track of that node. If this is not the first internal node, then create a suffix link from the previous internal node to the current one. We also check to make sure we alter the Active Point if needed according to the Rule 2 alteration rules above.
        • If yes, then we check to see if the current Active Length is less than the current edge length. If so we can compare the character at the active point and the current character in our string. If they are equal, then we can invoke Rule 3, otherwise we must create it using Rule 2. If, however, we found that the current active length was greater than the edge length we need to perform a walkdown (changing active node, active edge, active length as needed) and we need to keep performing the walk down until such time as Active Length < Edge Length, then we compare the character at the new active point to the current character. If they are equal, then we can invoke Rule 3, otherwise we must create it using Rule 2. If we invoked Rule 2 we must remember to decrement remaining since a suffix was added. Then whether we invoked Rule 2 or Rule 3 we need to perform a suffix link check and an active point alteration check and set those accordingly. In either case once Rule 3 is encountered we can break out of the loop and advance to the next phase.

This is the general idea of the O(n) implementation of Ukkonen’s algorithm. Let’s do a walk through.

Walk Through of Suffix Tree Creation for mississippi

To really understand the Active Point and all of the implementation tricks, we will do a complete walk through of the string mississippi$ for all 11 phases. Also, convention dictates that the Active Point is referred to as a tuple of (Active Node, Active Edge, Active Length)

Let’s initialize some variables.

Active Node is set to root.
Active Edge is set to nil.
Active Length is set to 0.
In this case this means the Active Point is (root,nil,0)
end is set to -1.
remaining is set to 0.

We start with i=0. We read in the char at S[i] = S[0] = m:
Increment end to 0 (This takes care of all the Rule 1 extensions in one fell swoop, of course in this case there are none yet).
Increment remaining to 1
We enter a loop remaining = 1 time:

  • Since Active Length == 0 we set Active Edge to be equal to the current character which is m. So now the Active Point is (root,m,0).
  • We check if there is an edge emanating from the Active Node for Active Edge. If not, we must create it in accordance with Rule 2.
  • Once we create the node and edge, since we performed a Rule 2 extension we decrement remaining to 0.

Now i=1. We read in the char at S[i] = S[1] = i:
Increment end to 1. (This takes care of all the Rule 1 extensions in one fell swoop).
Increment remaining to 1
We enter a loop remaining = 1 time:

  • Since Active Length == 0 we set Active Edge to be equal to the current character which is i. So now the Active Point is (root,i,0).
  • We check if there is an edge emanating from the Active Node for Active Edge. If not, we must create it in accordance with Rule 2.
  • Once we create the node and edge, since we performed a Rule 2 extension we decrement remaining to 0.

Now i=2. We read in the char at S[i] = S[2] = s:
Increment end to 2. (This takes care of all the Rule 1 extensions in one fell swoop).
Increment remaining to 1
We enter a loop remaining = 1 time:

  • Since Active Length == 0 we set Active Edge to be equal to the current character which is s. So now the Active Point is (root,s,0).
  • We check if there is an edge emanating from the Active Node for Active Edge. If not, we must create it in accordance with Rule 2.
  • Once we create the node and edge, since we performed a Rule 2 extension we decrement remaining to 0.

Now i=3. We read in the char at S[i] = S[3] = s:
Increment end to 3. (This takes care of all the Rule 1 extensions in one fell swoop).
Increment remaining to 1
We enter a loop remaining = 1 time:

  • Since Active Length == 0 we set Active Edge to be equal to the current character which is s. So now the Active Point is (root,s,0).
  • We check if there is an edge emanating from the Active Node for Active Edge. There is. So we compare the Active Length of 0 with length(Active Edge) of 2. Since active point falls on the middle of the edge, no walkdown is needed and this we compare the active point with the current character. The active point is s and the current character is also s so we have a Rule 3 extension. We can immediately break out of the loop and advance to the next phase. However before doing that we need to do two things, a suffix link check and an active point check. No internal node has been created any time during this phase so there is nothing to do on the suffix link check. For the active point check, we know that after a Rule 3 extension we increment the Active Length by 1 so we do that. Now the Active Point is (root,s,1).

Now i=4. We read in the char at S[i] = S[4] = i:
Increment end to 4. (This takes care of all the Rule 1 extensions in one fell swoop).
Increment remaining to 2
We enter a loop remaining = 2 times:

  • Since Active Length > 0 we don’t need to change the Active Edge.
  • We check if there is an edge emanating from the Active Node (root) for Active Edge (s). There is. So we compare the Active Length of 1 with length(Active Edge) of 3. Since active point falls on the middle of the edge, no walkdown is needed and this we compare the active point with the current character. The active point is s and the current character is i. Since the characters are not equal, we need to perform a Rule 2 extension.
  • We break the ssi edge after the first s and add an internal node there. We will label this internal node A. We will then add the si edge to A and also create a new edge for the current character i also emanating from A.
  • Now that we performed a Rule 2 extension we need to decrement remaining to 1.
  • We perform the Suffix link check. Since an internal node was created we can direct its suffix link to point to root.
  • We also need to perform the Active point alteration check. We have the Active Node set to root and the Active Length of 1 > 0 so we can follow the Rule 2 Active Point Alteration rule and decrement Active Length by 1 and set Active Edge to S[i - remaining + 1] which is S[4-1+1] or S[4] or i. So the Active Point is now (root, i, 0).

Now we enter the 2nd iteration of the loop.

  • Since Active Length == 0 we set the Active Edge to the current character which is i.
  • We check if there is an edge emanating from the Active Node for the Active Edge. Yes there is an i edge.
  • So we compare lengths to see if a walkdown is necessary. The Active Length is 0 and the length(Active Edge) is 4. Thus no walkdown is needed.
  • We check if the active point i is equal to the current character i. Yes it is. So we have a Rule 3 extension. We will be able to break out of the loop, but before doing so, some housekeeping.
  • We perform a suffixlink check. Since we did not create any internal node in this extension step, the internal node from the previous extension, node A continues having its suffix link pointed at root.
  • We perform an Active Point Alteration for Rule 3 check which means that we need to increment the Active Length by 1. Now the Active Point is (root,i,1).

Now i=5. We read in the char at S[i] = S[5] = s:
Increment end to 5. (This takes care of all the Rule 1 extensions in one fell swoop).
Increment remaining to 2
We enter a loop remaining = 2 times:

  • Since Active Length > 0 we don’t need to change the Active Edge.
  • We check if there is an edge emanating from the Active Node (root) for Active Edge (i). There is.
  • So we compare the Active Length of 1 with length(Active Edge) of 5. Since the active point falls on the middle of the edge, no walkdown is needed and this we compare the active point with the current character. The active point is s and the current character is s. Since the characters are equal, we need to perform a Rule 3 extension. We can break out of the loop, but as usual our normal housekeeping.
  • For our suffixlink check, we have not created any internal node at all in this phase so there is nothing to do there.
  • For our Active Point Alteration check we increment Active Length to 2. So now Active Point is now (root, i, 2).

Now i=6. We read in the char at S[i] = S[6] = s:
Increment end to 6. (This takes care of all the Rule 1 extensions in one fell swoop).
Increment remaining to 3
We enter a loop remaining = 3 times:

  • Since Active Length > 0 we don’t need to change the Active Edge.
  • We check if there is an edge emanating from the Active Node (root) for Active Edge (i). There is.
  • So we compare the Active Length of 2 with length(Active Edge) of 6. Since the active point falls on the middle of the edge, no walkdown is needed and this we compare the active point with the current character. The active point is s and the current character is s. Since the characters are equal, we need to perform a Rule 3 extension. We can break out of the loop, but as usual our normal housekeeping.
  • For our suffixlink check, we have not created any internal node at all in this phase so there is nothing to do there.
  • For our Active Point Alteration check we increment Active Length to 3. So now Active Point is now (root, i, 3).

Now i=7. We read in the char at S[i] = S[7] = i:
Increment end to 7. (This takes care of all the Rule 1 extensions in one fell swoop).
Increment remaining to 4
We enter a loop remaining = 4 times:

  • Since Active Length > 0 we don’t need to change the Active Edge.
  • We check if there is an edge emanating from the Active Node (root) for Active Edge (i). There is.
  • So we compare the Active Length of 3 with length(Active Edge) of 7. Since the active point falls on the middle of the edge, no walkdown is needed and this we compare the active point with the current character. The active point is i and the current character is i. Since the characters are equal, we need to perform a Rule 3 extension. We can break out of the loop, but as usual our normal housekeeping.
  • For our suffixlink check, we have not created any internal node at all in this phase so there is nothing to do there.
  • For our Active Point Alteration check we increment Active Length to 4. So now Active Point is now (root, i, 4).

Now i=8. We read in the char at S[i] = S[8] = p:
Increment end to 5. (This takes care of all the Rule 1 extensions in one fell swoop).
Increment remaining to 5
We enter a loop remaining = 5 times:

First iteration of the loop. (for extension issip)

  • Since Active Length > 0 we don’t need to change the Active Edge.
  • We check if there is an edge emanating from the Active Node (root) for Active Edge (s). There is.
  • We compare the Active Length of 4 with length(Active Edge) of 8. Since active point falls on the middle of the edge, no walkdown is needed.
  • We compare the active point with the current character. The active point is s and the current character is p. Since the characters are not equal, we need to perform a Rule 2 extension.
  • We break the ississip edge after issi and add an internal node there. We will label this internal node B. We will then add the ssip edge to B and also create a new edge for the current character p also emanating from B.
  • Now that we performed a Rule 2 extension we need to decrement remaining to 4.
  • We perform the Suffix link check. Since an internal node B was created we can direct its suffix link to point to root. We will also keep track of this node in case we end up making more internal nodes in this phase.
  • We also need to perform the Active Point Alteration for Rule 2 check. We have the Active Node set to root and the Active Length of 4 > 0 so we can follow the Rule 2 Active Point Alteration rule and decrement Active Length by 1 and set it to 3 and set Active Edge to S[i - remaining + 1] which is S[8-4+1] or S[5] or s. So the Active Point is now (root, s, 3).
  • Because this is a complicated phase involving 5 extensions, I will break from tradition and show a tree diagram during each extension of this phase. But don’t get too used to it.

Now we enter the 2nd iteration of the loop. (for extension ssip)

  • Since Active Length > 0 we don’t need to change the Active Edge.
  • We check if there is an edge emanating from the Active Node of root for the Active Edge of s. Yes there is an s edge.
  • So we compare lengths to see if a walkdown is necessary. The Active Length is 3 and the length(Active Edge) is 1. Thus we need to do a walkdown.
  • Since the current extension is ssip and the Active Length is 3 we start doing a walkdown. We consume the first s from ssip and encounter node A. Remember when performing a walkdown, if another internal node is encountered, that node must become the Active Node and the Active Edge and Active Length must change to be in accordance therewith. So we set the Active Node to A and reduce the Active Length by 1 to 2. Since we still need to consume si from the current extension we must embark on the sissip edge. So we can set the Active Edge to be s. Since the length of the sissip edge of 6 is greater than the Active Length of 2, there will not be any need for a further walkdown. We can complete our walkdown with Active Point now being set to (A,s,2).
  • Now that the walkdown portion of the festivities are over, we compare current active point the 2nd s on the sissip edge with the current character p. There are not equal, so we will need to do a Rule 2 extension.
  • We break the sissip edge after si and create a new internal node C which will have as an edge the remainder of the broken edge ssip and also add a leaf node for p.
  • Since we performed a Rule 2 extension we decrement remaining from 4 to 3.
  • We perform our suffix link check. Since we just created node C and a previous extension in the current phase created node B. We point B‘s suffix link to C. We also point C‘s suffix link to root, and keep track of it as the last internal node created in this phase.
  • We also perform our Active Point Alteration check. Since the Active Node is not root, but rather is node A, we follow the suffix link from A and make that the new active node. A‘s suffix link is root. So the new Active Point is now (root,s,2).
  • A diagram ensues:

Now we enter the 3rd iteration of the loop. (for extension sip)

  • Since Active Length > 0 we don’t need to change the Active Edge.
  • We check if there is an edge emanating from the Active Node of root for the Active Edge of s. Yes there is an s edge.
  • So we compare lengths to see if a walkdown is necessary. The Active Length is 2 and the length(Active Edge) is 1. Thus we need to do a walkdown.
  • Since the current extension is sip and the Active Length is 2 we start doing a walkdown. We consume the first s from sip and encounter node A. Remember when performing a walkdown, if another internal node is encountered, that node must become the Active Node and the Active Edge and Active Length must change to be in accordance therewith. So we set the Active Node to A and reduce the Active Length by 1 to 1. Since we still need to consume i from the current extension we must embark on the issip edge. So we can set the Active Edge to be i. Since the length of the issip edge of 5 is greater than the Active Length of 1, there will not be any need for a further walkdown. We can complete our walkdown with Active Point now being set to (A,i,1).
  • Now that the walkdown portion of the festivities are over, we compare the current active point on the i edge which is the 1st s with the current character p. There are not equal, so we will need to do a Rule 2 extension.
  • We break the issip edge after i and create a new internal node D which will have as an edge the remainder of the broken edge ssip and also add a leaf node for p.
  • Since we performed a Rule 2 extension we decrement remaining from 3 to 2.
  • We perform our suffix link check. Since we just created node D and a previous extension in the current phase created node C. We point C‘s suffix link to D. We also point D‘s suffix link to root, and keep track of it as the last internal node created in this phase.
  • We also perform our Active Point Alteration check. Since the Active Node is not root, but rather is node A, we follow the suffix link from A and make that the new active node. A‘s suffix link is root. So the new Active Point is now (root,i,1).
  • A diagram ensues:

Now we enter the 4th iteration of the loop. (for extension ip)

  • Since Active Length > 0 we don’t need to change the Active Edge.
  • We check if there is an edge emanating from the Active Node of root for the Active Edge of i. Yes there is an i edge.
  • So we compare lengths to see if a walkdown is necessary. The Active Length is 1 and the length(Active Edge) is 4. Thus no walkdown is necessary. Since active point falls on the middle of the edge, no walkdown is needed.
  • We compare the active point with the current character. The active point is s and the current character is p. Since the characters are not equal, we need to perform a Rule 2 extension.
  • We break the issi edge after i and create a new internal node E which will have as an edge the remainder of the broken edge issi, namely ssi and also add a leaf node for p.
  • Since we performed a Rule 2 extension we decrement remaining from 2 to 1.
  • We perform our suffix link check. Since we just created node E and a previous extension in the current phase created node D. We point D‘s suffix link to E. We also point E‘s suffix link to root, and keep track of it as the last internal node created in this phase.
    • We also perform our Active Point Alteration check. Since the Active Node is root, and Active Length of 1 > 0, we decrement Active Length from 1 to 0 and we set the Active Edge to S[i - remaining + 1] which is S[8-1+1] or S[8] or p. So the new Active Point is now (root,p,0).
  • A diagram ensues:

Now we enter the 5th iteration of the loop. (for extension p)

  • Since Active Length == 0 we change the Active Edge to the current character which is p.
  • We check if there is an edge emanating from the Active Node of root for the Active Edge of p. There is no such edge. So we must create it in accordance with Rule 2
  • We add a leaf node with edge label p emanating from root.
  • Since we performed a Rule 2 extension we decrement remaining from 1 to 0.
  • We perform our suffix link check. Since we didn’t create any new internal node in this step, the previous internal node E remains with its suffix link pointed at root.
    • We also perform our Active Point Alteration check. Since the Active Node is root, but Active Length of 0 == 0, so there is no change here. So the Active Point is still (root,p,0).
  • A diagram ensues:

And we are finally done with that phase… onwards to the next phase.

Now i=9. We read in the char at S[i] = S[9] = p:
Increment end to 9. (This takes care of all the Rule 1 extensions in one fell swoop).
Increment remaining to 1
We enter a loop remaining = 1 time:

  • Since Active Length == 0 we change the Active Edge to the current character which is p.
  • We check if there is an edge emanating from the Active Node (root) for Active Edge (p). There is.
  • So we compare the Active Length of 0 with length(Active Edge) of 2. Since the active point falls on the middle of the edge, no walkdown is needed and this we compare the active point with the current character. The active point is p and the current character is p. Since the characters are equal, we need to perform a Rule 3 extension. We can break out of the loop, but as usual our normal housekeeping.
  • For our suffixlink check, we have not created any internal node at all in this phase so there is nothing to do there.
  • For our Active Point Alteration check we increment Active Length to 1. So now Active Point is now (root, p, 1).
  • A diagram ensues:

Now i=10. We read in the char at S[i] = S[10] = i:
Increment end to 10. (This takes care of all the Rule 1 extensions in one fell swoop).
Increment remaining to 2
We enter a loop remaining = 2 times:

Now we enter the 1st iteration of the loop. (for extension pi)

  • Since Active Length > 0 we don’t need to change the Active Edge.
  • We check if there is an edge emanating from the Active Node (root) for Active Edge (p). There is.
  • So we compare the Active Length of 1 with length(Active Edge) of 3. Since the active point falls on the middle of the edge, no walkdown is needed and this we compare the active point with the current character. The active point is points to the 2nd p and the current character is i. Since the characters are not equal, we need to perform a Rule 2 extension.
  • We break the ppi edge after the first p and create a new internal node F which will have as an edge the remainder of the broken edge ppi, namely pi and also add a leaf node for i.
  • Since we performed a Rule 2 extension we decrement remaining from 2 to 1.
  • We perform our suffix link check. Since we just created node F, we point F‘s suffix link to root, and keep track of it as the last internal node created in this phase.
    • We also perform our Active Point Alteration check. Since the Active Node is root, and Active Length of 1 > 0, we decrement Active Length from 1 to 0 and we set the Active Edge to S[i - remaining + 1] which is S[10-1+1] or S[10] or i. So the new Active Point is now (root,i,0).

Now we enter the 2nd iteration of the loop. (for extension i)

  • Since Active Length == 0 we change the Active Edge to the current character of i
  • We check if there is an edge emanating from the Active Node (root) for Active Edge (i). There is.
  • So we compare the Active Length of 0 with length(Active Edge) of 1. Since the active point falls on the middle of the edge, no walkdown is needed and this we compare the active point with the current character. The active point is points to the i and the current character is i. Since the characters are equal equal, we need to perform a Rule 3 extension. We can break out of the loop, but as usual our normal housekeeping.
  • For our suffixlink check, we have not created any internal node so the previous node F remains with suffix link pointed to root.
  • For our Active Point Alteration check we increment Active Length to 1. So now Active Point is now (root, i, 1).
    • A diagram ensues:

Now i=11. We read in the char at S[i] = S[11] = $:
Increment end to 11. (This takes care of all the Rule 1 extensions in one fell swoop).
Increment remaining to 2
We enter a loop remaining = 2 times:

Now we enter the 1st iteration of the loop. (for extension i$)

  • Since Active Length > 0 we don’t need to change the Active Edge.
  • We check if there is an edge emanating from the Active Node (root) for Active Edge (i). There is.
  • So we compare the Active Length of 1 with length(Active Edge) of 1. Since they are equal a walkdown is needed. We Consume 1 character and end up at node E. So the Active Node now becomes E and the Active Length is decremented from 1 to 0 accordingly. We now check if the next character on the “edge” (i.e. where the active point would be) is equal to the current character of $. It is not, so we need to effect a Rule 2 extension. Since we are at an actual node, no internal node needs to be created. We can simply create a leaf node from node E with edge label $.
  • Since we performed a Rule 2 extension we decrement remaining from 2 to 1.
  • We perform our suffix link check. No internal node has been previously saved in this phase, and we did not create one now. so there is nothing to do here.
    • We also perform our Active Point Alteration check. Since the Active Node is not root, we follow the suffix link from the active node or E and make that the new Active Node. E‘s suffix link points to root, so root becomes the Active Node now. So the new Active Point is now (root,i,0).

Now we enter the 2nd iteration of the loop. (for extension $)

  • Since Active Length == 0 we change the Active Edge to the current character of $
  • We check if there is an edge emanating from the Active Node (root) for Active Edge ($). There is not.
  • So we must create the leaf node and edge label $ in accordance with Rule 2 from root.
  • Since we performed a Rule 2 extension we decrement remaining from 1 to 0.
  • We perform our suffix link check. No internal node has been previously saved in this phase, and we did not create one now. so there is nothing to do here.
    • We also perform our Active Point Alteration check. Since the Active Node is root, and Active Length of 0 == 0, we do nothing here. So the Active Point is still (root,$,0).
    • A diagram ensues:

The Code

Finally, we can see how this would be implemented in Swift.