2.4/dev/core/src/com/google/gwt/dev/util/editdistance/ModifiedBerghelRoachEditDistance.java - gwt - Git at Google

 /*
  * Copyright 2010 Google Inc.
  *
  * Licensed under the Apache License, Version 2.0 (the "License"); you may not
  * use this file except in compliance with the License. You may obtain a copy of
  * the License at
  *
  * http://www.apache.org/licenses/LICENSE-2.0
  *
  * Unless required by applicable law or agreed to in writing, software
  * distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
  * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
  * License for the specific language governing permissions and limitations under
  * the License.
  */
 package com.google.gwt.dev.util.editdistance;

 /**
  * A modified version of a string edit distance described by Berghel and
  * Roach that uses only O(d) space and O(n*d) worst-case time, where n is
  * the pattern string length and d is the edit distance computed.
  * We achieve the space reduction by keeping only those sub-computations
  * required to compute edit distance, giving up the ability to
  * reconstruct the edit path.
  */
 public class ModifiedBerghelRoachEditDistance implements GeneralEditDistance {
   /*
    * This is a modification of the original Berghel-Roach edit
    * distance (based on prior work by Ukkonen) described in
    *   ACM Transactions on Information Systems, Vol. 14, No. 1,
    *   January 1996, pages 94-106.
    *
    * I observed that only O(d) prior computations are required
    * to compute edit distance.  Rather than keeping all prior
    * f(k,p) results in a matrix, we keep only the two "outer edges"
    * in the triangular computation pattern that will be used in
    * subsequent rounds.  We cannot reconstruct the edit path,
    * but many applications do not require that; for them, this
    * modification uses less space (and empirically, slightly
    * less time).
    *
    * First, some history behind the algorithm necessary to understand
    * Berghel-Roach and our modification...
    *
    * The traditional algorithm for edit distance uses dynamic programming,
    * building a matrix of distances for substrings:
    * D[i,j] holds the distance for string1[0..i]=>string2[0..j].
    * The matrix is initially populated with the trivial values
    * D[0,j]=j and D[i,0]=i; and then expanded with the rule:
    * <pre>
    *    D[i,j] = min( D[i-1,j]+1,       // insertion
    *                  D[i,j-1]+1,       // deletion
    *                  (D[i-1,j-1]
    *                   + (string1[i]==string2[j])
    *                      ? 0           // match
    *                      : 1           // substitution ) )
    * </pre>
    *
    * Ukknonen observed that each diagonal of the matrix must increase
    * by either 0 or 1 from row to row.  If D[i,j] = p, then the
    * then the matching rule requires that D[i+x,j+x] = p for all
    * x where string1[i..i+x) matches string2[j..j+j+x).  Ukkonen
    * defined a function f(k,p) as the highest row number in which p
    * appears on the k-th diagonal (those D[i,j] where k=(i-j), noting
    * that k may be negative).  The final result of the edit
    * distance is the D[n,m] cell, on the (n-m) diagonal; it is
    * the value of p for which f(n-m, p) = m.  The function f can
    * also be computed dynamically, according to a simple recursion:
    * <pre>
    *    f(k,p) {
    *      contains_p = max(f(k-1,p-1), f(k,p-1)+1, f(k+1,p-1)+1)
    *      while (string1[contains_p] == string2[contains_p + k])
    *        contains_p++;
    *      return contains_p;
    *    }
    * </pre>
    * The max() expression finds a row where the k-th diagonal must
    * contain p by virtue of an edit from the prior, same, or following
    * diagonal (corresponding to an insert, substitute, or delete);
    * we need not consider more distant diagonals because row-to-row
    * and column-to-column changes are at most +/- 1.
    *
    * The original Ukkonen algorithm computed f(k,p) roughly as
    * follows:
    * <pre>
    *    for (p = 0; ; p++) {
    *      compute f(k,p) for all valid k
    *      if (f(n-m, p) == m) return p;
    *    }
    * </pre>
    *
    * Berghel and Roach observed that many values of f(k,p) are
    * computed unnecessarily, and reorganized the computation into
    * a just-in-time sequence.  In each iteration, we are primarily
    * interested in the terminating value f(main,p), where main=(n-m)
    * is the main diagonal.  To compute that we need f(x,p-1) for
    * three values of x: main-1, main, and main+1.  Those depend on
    * values for p-2, and so forth.  We will already have computed
    * f(main,p-1) in the prior round, and thus f(main-1,p-2) and
    * f(main+1,p-2), and so forth.  The only new values we need to compute
    * are on the edges: f(main-i,p-i) and f(main+i,p-i).  Noting that
    * f(k,p) is only meaningful when abs(k) is no greater than p,
    * one of the Berghel-Roach reviewers noted that we can compute
    * the bounds for i:
    * <pre>
    *    (main+i &le p-i) implies (i &le; (p-main)/2)
    * </pre>
    * (where main+i is limited on the positive side) and similarly
    * <pre>
    *    (-(main-i) &le p-i) implies (i &le; (p+main)/2).
    * </pre>
    * (where main-i is limited on the negative side).
    *
    * This reduces the computation sequence to
    * <pre>
    *   for (i = (p-main)/2; i > 0; i--) compute f(main+i,p-i);
    *   for (i = (p+main)/2; i > 0; i--) compute f(main-i,p-i);
    *   if (f(main, p) == m) return p;
    * </pre>
    *
    * The original Berghel-Roach algorithm recorded prior values
    * of f(k,p) in a matrix, using O(distance^2) space, enabling
    * reconstruction of the edit path, but if all we want is the
    * edit *distance*, we only need to keep O(distance) prior computations.
    *
    * The requisite prior k-1, k, and k+1 values are conveniently
    * computed in the current round and the two preceding it.
    * For example, on the higher-diagonal side, we compute:
    * <pre>
    *    current[i] = f(main+i, p-i)
    * </pre>
    * We keep the two prior rounds of results, where p was one and two
    * smaller.  So, from the preceidng round
    * <pre>
    *    last[i] = f(main+i, (p-1)-i)
    * </pre>
    *  and from the prior round, but one position back:
    * <pre>
    *    prior[i-1] = f(main+(i-1), (p-2)-(i-1))
    * </pre>
    * In the current round, one iteration earlier:
    * <pre>
    *    current[i+1] = f(main+(i+1), p-(i+1))
    * </pre>
    * Note that the distance in all of these evaluates to p-i-1,
    * and the diagonals are (main+i) and its neighbors... just
    * what we need.  The lower-diagonal side behaves similarly.
    *
    * We need to materialize values that are not computed in prior
    * rounds, for either of two reasons: <ul>
    *    <li> Initially, we have no prior rounds, so we need to fill
    *     all of the "last" and "prior" values for use in the
    *     first round.  The first round uses only on one side
    *     of the main diagonal or the other.
    *    <li> In every other round, we compute one more diagonal than before.
    * </ul>
    * In all of these cases, the missing f(k,p) values are for abs(k) > p,
    * where a real value of f(k,p) is undefined.  [The original Berghel-Roach
    * algorithm prefills its F matrix with these values, but we fill
    * them as we go, as needed.]  We define
    * <pre>
    *    f(-p-1,p) = p, so that we start diagonal -p with row p,
    *    f(p+1,p) = -1, so that we start diagonal p with row 0.
    * </pre>
    * (We also allow f(p+2,p)=f(-p-2,p)=-1, causing those values to
    * have no effect in the starting row computation.]
    *
    * We only expand the set of diagonals visited every other round,
    * when (p-main) or (p+main) is even.  We keep track of even/oddness
    * to save some arithmetic.  The first round is always even, as p=abs(main).
    * Note that we rename the "f" function to "computeRow" to be Googley.
    */

   private static final int [] EMPTY_INT_ARRAY = new int[0];

   /**
    * Creates an instance for computing edit distance from {@code pattern}.
    * @param pattern string from which distances are measured
    * @return an instance for computing distances from the pattern
    */
   public static ModifiedBerghelRoachEditDistance
       getInstance(CharSequence pattern) {
     return getInstance(pattern.toString());
   }

   /**
    * Creates an instance for computing edit distance from {@code pattern}.
    * @param pattern string from which distances are measured
    * @return an instance for computing distances from the pattern
    */
   public static ModifiedBerghelRoachEditDistance
       getInstance(String pattern) {
     return new ModifiedBerghelRoachEditDistance(pattern.toCharArray());
   }

   /*
    * The current and two preceding sets of Ukkonen f(k,p) values for diagonals
    * around the main, computed by the main loop of {@code getDistance}.  These
    * arrays are retained between calls to save allocation costs.  (They are all
    * initialized to a real array so that we can indiscriminately use length
    * when ensuring/resizing.)
    */
   private int[] currentLeft = EMPTY_INT_ARRAY;

   private int[] currentRight = EMPTY_INT_ARRAY;

   private int[] lastLeft = EMPTY_INT_ARRAY;

   private int[] lastRight = EMPTY_INT_ARRAY;

   /**
    * The "pattern" string against which others are compared.
    */
   private final char[] pattern;

   private int[] priorLeft = EMPTY_INT_ARRAY;

   private int[] priorRight = EMPTY_INT_ARRAY;

   private ModifiedBerghelRoachEditDistance(char[] pattern) {
     this.pattern = pattern;
   }

   public ModifiedBerghelRoachEditDistance duplicate() {
     return new ModifiedBerghelRoachEditDistance(pattern);
   }

   public int getDistance(CharSequence targetSequence, int limit) {
     final int targetLength = targetSequence.length();

     /*
      * Compute the main diagonal number.
      * The final result lies on this diagonal.
      */
     final int main = pattern.length - targetLength;

     /*
      * Compute our initial distance candidate.
      * The result cannot be less than the difference in
      * string lengths, so we start there.
      */
     int distance = Math.abs(main);
     if (distance > limit) {
       /* More than we wanted.  Give up right away */
       return Integer.MAX_VALUE;
     }

     /* Snag a copy of the second string for efficiency */
     final char[] target = new char[targetLength];
     for (int i = 0; i < targetLength; i++) {
       target[i] = targetSequence.charAt(i);
     }

     /*
      * In the main loop below, the current{Right,Left} arrays record results
      * from the current outer loop pass.  The last{Right,Left} and
      * prior{Right,Left} arrays hold the results from the preceding two passes.
      * At the end of the outer loop, we shift them around (reusing the prior
      * array as the current for the next round, to avoid reallocating).
      * The Right reflects higher-numbered diagonals, Left lower-numbered.
      */

     /*
      * Fill in "prior" values for the first two passes through
      * the distance loop.  Note that we will execute only one side of
      * the main diagonal in these passes, so we only need
      * initialize one side of prior values.
      */

     if (main <= 0) {
       ensureCapacityRight(distance, false);
       for (int j = 0; j <= distance; j++) {
         lastRight[j] = distance - j - 1;  /* Make diagonal -k start in row k */
         priorRight[j] = -1;
       }
     } else {
       ensureCapacityLeft(distance, false);
       for (int j = 0; j <= distance; j++) {
         lastLeft[j] = -1;                 /* Make diagonal +k start in row 0 */
         priorLeft[j] = -1;
       }
     }

     /*
      * Keep track of even rounds.  Only those rounds consider new diagonals,
      * and thus only they require artificial "last" values below.
      */
     boolean even = true;

     /*
      * MAIN LOOP: try each successive possible distance until one succeeds.
      */
     while (true) {
       /*
        * Before calling computeRow(main, distance), we need to fill in
        * missing cache elements.  See the high-level description above.
        */

       /*
        * Higher-numbered diagonals
        */

       int offDiagonal = (distance - main) / 2;
       ensureCapacityRight(offDiagonal, true);

       if (even) {
         /* Higher diagonals start at row 0 */
         lastRight[offDiagonal] = -1;
       }

       int immediateRight = -1;
       for (; offDiagonal > 0; offDiagonal--) {
         currentRight[offDiagonal] = immediateRight = computeRow(
             (main + offDiagonal),
             (distance - offDiagonal),
             pattern,
             target,
             priorRight[offDiagonal - 1],
             lastRight[offDiagonal],
             immediateRight);
       }

       /*
        * Lower-numbered diagonals
        */

       offDiagonal = (distance + main) / 2;
       ensureCapacityLeft(offDiagonal, true);

       if (even) {
         /* Lower diagonals, fictitious values for f(-x-1,x) = x */
         lastLeft[offDiagonal] = (distance - main) / 2 - 1;
       }

       int immediateLeft = even ? -1 : (distance - main) / 2;

       for (; offDiagonal > 0; offDiagonal--) {
         currentLeft[offDiagonal] = immediateLeft = computeRow(
             (main - offDiagonal),
             (distance - offDiagonal),
             pattern, target,
             immediateLeft,
             lastLeft[offDiagonal],
             priorLeft[offDiagonal - 1]);
       }

       /*
        * We are done if the main diagonal has distance in the last row.
        */
       int mainRow = computeRow(main, distance, pattern, target,
                                immediateLeft, lastLeft[0], immediateRight);

       if ((mainRow == targetLength) || (++distance > limit) || (distance < 0)) {
         break;
       }

       /* The [0] element goes to both sides. */
       currentLeft[0] = currentRight[0] = mainRow;

       /* Rotate rows around for next round: current=>last=>prior (=>current) */
       int[] tmp = priorLeft;
       priorLeft = lastLeft;
       lastLeft = currentLeft;
       currentLeft = priorLeft;

       tmp = priorRight;
       priorRight = lastRight;
       lastRight = currentRight;
       currentRight = tmp;

       /* Update evenness, too */
       even = !even;
     }

     return distance;
   }

   /**
    * Computes the highest row in which the distance {@code p} appears
    * in diagonal {@code k} of the edit distance computation for
    * strings {@code a} and {@code b}.  The diagonal number is
    * represented by the difference in the indices for the two strings;
    * it can range from {@code -b.length()} through {@code a.length()}.
    *
    * More precisely, this computes the highest value x such that
    * <pre>
    *     p = edit-distance(a[0:(x+k)), b[0:x)).
    * </pre>
    *
    * This is the "f" function described by Ukkonen.
    *
    * The caller must assure that abs(k) &le; p, the only values for
    * which this is well-defined.
    *
    * The implementation depends on the cached results of prior
    * computeRow calls for diagonals k-1, k, and k+1 for distance p-1.
    * These must be supplied in {@code knownLeft}, {@code knownAbove},
    * and {@code knownRight}, respectively.
    * @param k diagonal number
    * @param p edit distance
    * @param a one string to be compared
    * @param b other string to be compared
    * @param knownLeft value of {@code computeRow(k-1, p-1, ...)}
    * @param knownAbove value of {@code computeRow(k, p-1, ...)}
    * @param knownRight value of {@code computeRow(k+1, p-1, ...)}
    */
   private int computeRow(int k, int p, char[] a, char[] b,
                          int knownLeft, int knownAbove, int knownRight) {
     assert (Math.abs(k) <= p);
     assert (p >= 0);

     /*
      * Compute our starting point using the recurrance.
      * That is, find the first row where the desired edit distance
      * appears in our diagonal.  This is at least one past
      * the highest row for
      */
     int t;
     if (p == 0) {
       t = 0;
     } else {
       /*
        * We look at the adjacent diagonals for the next lower edit distance.
        * We can start in the next row after the prior result from
        * our own diagonal (the "substitute" case), or the next diagonal
        * ("delete"), but only the same row as the prior result from
        * the prior diagonal ("insert").
        */
       t = Math.max(Math.max(knownAbove, knownRight) + 1, knownLeft);
     }

     /*
      * Look down our diagonal for matches to find the maximum
      * row with edit-distance p.
      */
     int tmax = Math.min(b.length, (a.length - k));

     while ((t < tmax) && b[t] == a[t + k]) {
       t++;
     }

     return t;
   }

   /*
    * Ensures that the Left arrays can be indexed through {@code index},
    * inclusively, resizing (and copying) as necessary.
    */
   private void ensureCapacityLeft(int index, boolean copy) {
     if (currentLeft.length <= index) {
       index++;
       priorLeft = resize(priorLeft, index, copy);
       lastLeft = resize(lastLeft, index, copy);
       currentLeft = resize(currentLeft, index, false);
     }
   }

   /*
    * Ensures that the Right arrays can be indexed through {@code index},
    * inclusively, resizing (and copying) as necessary.
    */
   private void ensureCapacityRight(int index, boolean copy) {
     if (currentRight.length <= index) {
       index++;
       priorRight = resize(priorRight, index, copy);
       lastRight = resize(lastRight, index, copy);
       currentRight = resize(currentRight, index, false);
     }
   }

   /* Resize an array, copying old contents if requested */
   private int[] resize(int[] array, int size, boolean copy) {
     int[] result = new int[size];
     if (copy) {
       System.arraycopy(array, 0, result, 0, array.length);
     }
     return result;
   }
 }
	/*
	* Copyright 2010 Google Inc.
	*
	* Licensed under the Apache License, Version 2.0 (the "License"); you may not
	* use this file except in compliance with the License. You may obtain a copy of
	* the License at
	*
	* http://www.apache.org/licenses/LICENSE-2.0
	*
	* Unless required by applicable law or agreed to in writing, software
	* distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
	* WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
	* License for the specific language governing permissions and limitations under
	* the License.
	*/
	package com.google.gwt.dev.util.editdistance;

	/**
	* A modified version of a string edit distance described by Berghel and
	* Roach that uses only O(d) space and O(n*d) worst-case time, where n is
	* the pattern string length and d is the edit distance computed.
	* We achieve the space reduction by keeping only those sub-computations
	* required to compute edit distance, giving up the ability to
	* reconstruct the edit path.
	*/
	public class ModifiedBerghelRoachEditDistance implements GeneralEditDistance {
	/*
	* This is a modification of the original Berghel-Roach edit
	* distance (based on prior work by Ukkonen) described in
	* ACM Transactions on Information Systems, Vol. 14, No. 1,
	* January 1996, pages 94-106.
	*
	* I observed that only O(d) prior computations are required
	* to compute edit distance. Rather than keeping all prior
	* f(k,p) results in a matrix, we keep only the two "outer edges"
	* in the triangular computation pattern that will be used in
	* subsequent rounds. We cannot reconstruct the edit path,
	* but many applications do not require that; for them, this
	* modification uses less space (and empirically, slightly
	* less time).
	*
	* First, some history behind the algorithm necessary to understand
	* Berghel-Roach and our modification...
	*
	* The traditional algorithm for edit distance uses dynamic programming,
	* building a matrix of distances for substrings:
	* D[i,j] holds the distance for string1[0..i]=>string2[0..j].
	* The matrix is initially populated with the trivial values
	* D[0,j]=j and D[i,0]=i; and then expanded with the rule:
	* <pre>
	* D[i,j] = min( D[i-1,j]+1, // insertion
	* D[i,j-1]+1, // deletion
	* (D[i-1,j-1]
	* + (string1[i]==string2[j])
	* ? 0 // match
	* : 1 // substitution ) )
	* </pre>
	*
	* Ukknonen observed that each diagonal of the matrix must increase
	* by either 0 or 1 from row to row. If D[i,j] = p, then the
	* then the matching rule requires that D[i+x,j+x] = p for all
	* x where string1[i..i+x) matches string2[j..j+j+x). Ukkonen
	* defined a function f(k,p) as the highest row number in which p
	* appears on the k-th diagonal (those D[i,j] where k=(i-j), noting
	* that k may be negative). The final result of the edit
	* distance is the D[n,m] cell, on the (n-m) diagonal; it is
	* the value of p for which f(n-m, p) = m. The function f can
	* also be computed dynamically, according to a simple recursion:
	* <pre>
	* f(k,p) {
	* contains_p = max(f(k-1,p-1), f(k,p-1)+1, f(k+1,p-1)+1)
	* while (string1[contains_p] == string2[contains_p + k])
	* contains_p++;
	* return contains_p;
	* }
	* </pre>
	* The max() expression finds a row where the k-th diagonal must
	* contain p by virtue of an edit from the prior, same, or following
	* diagonal (corresponding to an insert, substitute, or delete);
	* we need not consider more distant diagonals because row-to-row
	* and column-to-column changes are at most +/- 1.
	*
	* The original Ukkonen algorithm computed f(k,p) roughly as
	* follows:
	* <pre>
	* for (p = 0; ; p++) {
	* compute f(k,p) for all valid k
	* if (f(n-m, p) == m) return p;
	* }
	* </pre>
	*
	* Berghel and Roach observed that many values of f(k,p) are
	* computed unnecessarily, and reorganized the computation into
	* a just-in-time sequence. In each iteration, we are primarily
	* interested in the terminating value f(main,p), where main=(n-m)
	* is the main diagonal. To compute that we need f(x,p-1) for
	* three values of x: main-1, main, and main+1. Those depend on
	* values for p-2, and so forth. We will already have computed
	* f(main,p-1) in the prior round, and thus f(main-1,p-2) and
	* f(main+1,p-2), and so forth. The only new values we need to compute
	* are on the edges: f(main-i,p-i) and f(main+i,p-i). Noting that
	* f(k,p) is only meaningful when abs(k) is no greater than p,
	* one of the Berghel-Roach reviewers noted that we can compute
	* the bounds for i:
	* <pre>
	* (main+i &le p-i) implies (i ≤ (p-main)/2)
	* </pre>
	* (where main+i is limited on the positive side) and similarly
	* <pre>
	* (-(main-i) &le p-i) implies (i ≤ (p+main)/2).
	* </pre>
	* (where main-i is limited on the negative side).
	*
	* This reduces the computation sequence to
	* <pre>
	* for (i = (p-main)/2; i > 0; i--) compute f(main+i,p-i);
	* for (i = (p+main)/2; i > 0; i--) compute f(main-i,p-i);
	* if (f(main, p) == m) return p;
	* </pre>
	*
	* The original Berghel-Roach algorithm recorded prior values
	* of f(k,p) in a matrix, using O(distance^2) space, enabling
	* reconstruction of the edit path, but if all we want is the
	* edit distance, we only need to keep O(distance) prior computations.
	*
	* The requisite prior k-1, k, and k+1 values are conveniently
	* computed in the current round and the two preceding it.
	* For example, on the higher-diagonal side, we compute:
	* <pre>
	* current[i] = f(main+i, p-i)
	* </pre>
	* We keep the two prior rounds of results, where p was one and two
	* smaller. So, from the preceidng round
	* <pre>
	* last[i] = f(main+i, (p-1)-i)
	* </pre>
	* and from the prior round, but one position back:
	* <pre>
	* prior[i-1] = f(main+(i-1), (p-2)-(i-1))
	* </pre>
	* In the current round, one iteration earlier:
	* <pre>
	* current[i+1] = f(main+(i+1), p-(i+1))
	* </pre>
	* Note that the distance in all of these evaluates to p-i-1,
	* and the diagonals are (main+i) and its neighbors... just
	* what we need. The lower-diagonal side behaves similarly.
	*
	* We need to materialize values that are not computed in prior
	* rounds, for either of two reasons: <ul>
	* <li> Initially, we have no prior rounds, so we need to fill
	* all of the "last" and "prior" values for use in the
	* first round. The first round uses only on one side
	* of the main diagonal or the other.
	* <li> In every other round, we compute one more diagonal than before.
	* </ul>
	* In all of these cases, the missing f(k,p) values are for abs(k) > p,
	* where a real value of f(k,p) is undefined. [The original Berghel-Roach
	* algorithm prefills its F matrix with these values, but we fill
	* them as we go, as needed.] We define
	* <pre>
	* f(-p-1,p) = p, so that we start diagonal -p with row p,
	* f(p+1,p) = -1, so that we start diagonal p with row 0.
	* </pre>
	* (We also allow f(p+2,p)=f(-p-2,p)=-1, causing those values to
	* have no effect in the starting row computation.]
	*
	* We only expand the set of diagonals visited every other round,
	* when (p-main) or (p+main) is even. We keep track of even/oddness
	* to save some arithmetic. The first round is always even, as p=abs(main).
	* Note that we rename the "f" function to "computeRow" to be Googley.
	*/

	private static final int [] EMPTY_INT_ARRAY = new int[0];

	/**
	* Creates an instance for computing edit distance from {@code pattern}.
	* @param pattern string from which distances are measured
	* @return an instance for computing distances from the pattern
	*/
	public static ModifiedBerghelRoachEditDistance
	getInstance(CharSequence pattern) {
	return getInstance(pattern.toString());
	}

	/**
	* Creates an instance for computing edit distance from {@code pattern}.
	* @param pattern string from which distances are measured
	* @return an instance for computing distances from the pattern
	*/
	public static ModifiedBerghelRoachEditDistance
	getInstance(String pattern) {
	return new ModifiedBerghelRoachEditDistance(pattern.toCharArray());
	}

	/*
	* The current and two preceding sets of Ukkonen f(k,p) values for diagonals
	* around the main, computed by the main loop of {@code getDistance}. These
	* arrays are retained between calls to save allocation costs. (They are all
	* initialized to a real array so that we can indiscriminately use length
	* when ensuring/resizing.)
	*/
	private int[] currentLeft = EMPTY_INT_ARRAY;

	private int[] currentRight = EMPTY_INT_ARRAY;

	private int[] lastLeft = EMPTY_INT_ARRAY;

	private int[] lastRight = EMPTY_INT_ARRAY;

	/**
	* The "pattern" string against which others are compared.
	*/
	private final char[] pattern;

	private int[] priorLeft = EMPTY_INT_ARRAY;

	private int[] priorRight = EMPTY_INT_ARRAY;

	private ModifiedBerghelRoachEditDistance(char[] pattern) {
	this.pattern = pattern;
	}

	public ModifiedBerghelRoachEditDistance duplicate() {
	return new ModifiedBerghelRoachEditDistance(pattern);
	}

	public int getDistance(CharSequence targetSequence, int limit) {
	final int targetLength = targetSequence.length();

	/*
	* Compute the main diagonal number.
	* The final result lies on this diagonal.
	*/
	final int main = pattern.length - targetLength;

	/*
	* Compute our initial distance candidate.
	* The result cannot be less than the difference in
	* string lengths, so we start there.
	*/
	int distance = Math.abs(main);
	if (distance > limit) {
	/* More than we wanted. Give up right away */
	return Integer.MAX_VALUE;
	}

	/* Snag a copy of the second string for efficiency */
	final char[] target = new char[targetLength];
	for (int i = 0; i < targetLength; i++) {
	target[i] = targetSequence.charAt(i);
	}

	/*
	* In the main loop below, the current{Right,Left} arrays record results
	* from the current outer loop pass. The last{Right,Left} and
	* prior{Right,Left} arrays hold the results from the preceding two passes.
	* At the end of the outer loop, we shift them around (reusing the prior
	* array as the current for the next round, to avoid reallocating).
	* The Right reflects higher-numbered diagonals, Left lower-numbered.
	*/

	/*
	* Fill in "prior" values for the first two passes through
	* the distance loop. Note that we will execute only one side of
	* the main diagonal in these passes, so we only need
	* initialize one side of prior values.
	*/

	if (main <= 0) {
	ensureCapacityRight(distance, false);
	for (int j = 0; j <= distance; j++) {
	lastRight[j] = distance - j - 1; /* Make diagonal -k start in row k */
	priorRight[j] = -1;
	}
	} else {
	ensureCapacityLeft(distance, false);
	for (int j = 0; j <= distance; j++) {
	lastLeft[j] = -1; /* Make diagonal +k start in row 0 */
	priorLeft[j] = -1;
	}
	}

	/*
	* Keep track of even rounds. Only those rounds consider new diagonals,
	* and thus only they require artificial "last" values below.
	*/
	boolean even = true;

	/*
	* MAIN LOOP: try each successive possible distance until one succeeds.
	*/
	while (true) {
	/*
	* Before calling computeRow(main, distance), we need to fill in
	* missing cache elements. See the high-level description above.
	*/

	/*
	* Higher-numbered diagonals
	*/

	int offDiagonal = (distance - main) / 2;
	ensureCapacityRight(offDiagonal, true);

	if (even) {
	/* Higher diagonals start at row 0 */
	lastRight[offDiagonal] = -1;
	}

	int immediateRight = -1;
	for (; offDiagonal > 0; offDiagonal--) {
	currentRight[offDiagonal] = immediateRight = computeRow(
	(main + offDiagonal),
	(distance - offDiagonal),
	pattern,
	target,
	priorRight[offDiagonal - 1],
	lastRight[offDiagonal],
	immediateRight);
	}

	/*
	* Lower-numbered diagonals
	*/

	offDiagonal = (distance + main) / 2;
	ensureCapacityLeft(offDiagonal, true);

	if (even) {
	/* Lower diagonals, fictitious values for f(-x-1,x) = x */
	lastLeft[offDiagonal] = (distance - main) / 2 - 1;
	}

	int immediateLeft = even ? -1 : (distance - main) / 2;

	for (; offDiagonal > 0; offDiagonal--) {
	currentLeft[offDiagonal] = immediateLeft = computeRow(
	(main - offDiagonal),
	(distance - offDiagonal),
	pattern, target,
	immediateLeft,
	lastLeft[offDiagonal],
	priorLeft[offDiagonal - 1]);
	}

	/*
	* We are done if the main diagonal has distance in the last row.
	*/
	int mainRow = computeRow(main, distance, pattern, target,
	immediateLeft, lastLeft[0], immediateRight);

	if ((mainRow == targetLength) \|\| (++distance > limit) \|\| (distance < 0)) {
	break;
	}

	/* The [0] element goes to both sides. */
	currentLeft[0] = currentRight[0] = mainRow;

	/* Rotate rows around for next round: current=>last=>prior (=>current) */
	int[] tmp = priorLeft;
	priorLeft = lastLeft;
	lastLeft = currentLeft;
	currentLeft = priorLeft;

	tmp = priorRight;
	priorRight = lastRight;
	lastRight = currentRight;
	currentRight = tmp;

	/* Update evenness, too */
	even = !even;
	}

	return distance;
	}

	/**
	* Computes the highest row in which the distance {@code p} appears
	* in diagonal {@code k} of the edit distance computation for
	* strings {@code a} and {@code b}. The diagonal number is
	* represented by the difference in the indices for the two strings;
	* it can range from {@code -b.length()} through {@code a.length()}.
	*
	* More precisely, this computes the highest value x such that
	* <pre>
	* p = edit-distance(a[0:(x+k)), b[0:x)).
	* </pre>
	*
	* This is the "f" function described by Ukkonen.
	*
	* The caller must assure that abs(k) ≤ p, the only values for
	* which this is well-defined.
	*
	* The implementation depends on the cached results of prior
	* computeRow calls for diagonals k-1, k, and k+1 for distance p-1.
	* These must be supplied in {@code knownLeft}, {@code knownAbove},
	* and {@code knownRight}, respectively.
	* @param k diagonal number
	* @param p edit distance
	* @param a one string to be compared
	* @param b other string to be compared
	* @param knownLeft value of {@code computeRow(k-1, p-1, ...)}
	* @param knownAbove value of {@code computeRow(k, p-1, ...)}
	* @param knownRight value of {@code computeRow(k+1, p-1, ...)}
	*/
	private int computeRow(int k, int p, char[] a, char[] b,
	int knownLeft, int knownAbove, int knownRight) {
	assert (Math.abs(k) <= p);
	assert (p >= 0);

	/*
	* Compute our starting point using the recurrance.
	* That is, find the first row where the desired edit distance
	* appears in our diagonal. This is at least one past
	* the highest row for
	*/
	int t;
	if (p == 0) {
	t = 0;
	} else {
	/*
	* We look at the adjacent diagonals for the next lower edit distance.
	* We can start in the next row after the prior result from
	* our own diagonal (the "substitute" case), or the next diagonal
	* ("delete"), but only the same row as the prior result from
	* the prior diagonal ("insert").
	*/
	t = Math.max(Math.max(knownAbove, knownRight) + 1, knownLeft);
	}

	/*
	* Look down our diagonal for matches to find the maximum
	* row with edit-distance p.
	*/
	int tmax = Math.min(b.length, (a.length - k));

	while ((t < tmax) && b[t] == a[t + k]) {
	t++;
	}

	return t;
	}

	/*
	* Ensures that the Left arrays can be indexed through {@code index},
	* inclusively, resizing (and copying) as necessary.
	*/
	private void ensureCapacityLeft(int index, boolean copy) {
	if (currentLeft.length <= index) {
	index++;
	priorLeft = resize(priorLeft, index, copy);
	lastLeft = resize(lastLeft, index, copy);
	currentLeft = resize(currentLeft, index, false);
	}
	}

	/*
	* Ensures that the Right arrays can be indexed through {@code index},
	* inclusively, resizing (and copying) as necessary.
	*/
	private void ensureCapacityRight(int index, boolean copy) {
	if (currentRight.length <= index) {
	index++;
	priorRight = resize(priorRight, index, copy);
	lastRight = resize(lastRight, index, copy);
	currentRight = resize(currentRight, index, false);
	}
	}

	/* Resize an array, copying old contents if requested */
	private int[] resize(int[] array, int size, boolean copy) {
	int[] result = new int[size];
	if (copy) {
	System.arraycopy(array, 0, result, 0, array.length);
	}
	return result;
	}
	}