| /* |
| * Copyright 2010 Google Inc. |
| * |
| * Licensed under the Apache License, Version 2.0 (the "License"); you may not |
| * use this file except in compliance with the License. You may obtain a copy of |
| * the License at |
| * |
| * http://www.apache.org/licenses/LICENSE-2.0 |
| * |
| * Unless required by applicable law or agreed to in writing, software |
| * distributed under the License is distributed on an "AS IS" BASIS, WITHOUT |
| * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the |
| * License for the specific language governing permissions and limitations under |
| * the License. |
| */ |
| package com.google.gwt.dev.util.editdistance; |
| |
| /** |
| * An engine definition for computing string edit distances, and |
| * implementation generators. |
| * |
| * Generally, edit distance is the minimum number of edit operations required |
| * to transform one string into another. Being a minimum transform, it |
| * forms a metric space (one that is non-negative, symmetric, and obeys |
| * triangle inequality). |
| * |
| * The most common form is Levenshtein distance, where the set of operations |
| * consists of: |
| * deleting one character; |
| * inserting one character; or, |
| * substituting one character for another, |
| * each having a "cost" of 1. |
| * |
| * Another common form is Damerau-Levenshtein distance, which adds |
| * transpose two adjacent characters |
| * to the set of operations. Note that many implementations of |
| * Damerau distance do not account for insertion after transposition; |
| * this "restricted" Damerau distance does not obey triangle inequality, |
| * but is substantially easier to compute than the unrestricted metric. |
| * |
| * Another variation is to apply differential costs for the various |
| * operations. For example, substitutions could be weighted by |
| * character type (e.g., vowel<=>vowel being a cheaper edit than |
| * vowel<=>consonant). |
| * |
| * This class defines an engine for computing edit distance from |
| * a fixed ("pattern") string to other ("target") strings. The pattern |
| * is fixed at instantiation time; this allows an implementation to |
| * perform precomputations to make individual target computations faster. |
| * |
| * To allow further performance optimization, the GeneralEditDistance |
| * contract does not require thread-safety. A <tt>duplicate</tt> method is |
| * defined to allow additional equivalent instances to be created |
| * for separate threads; it is essentially a type-knowledgeable <tt>clone</tt>. |
| */ |
| public interface GeneralEditDistance { |
| /** |
| * Creates a duplicate (clone) of this distance engine, typically |
| * in order to use it in another thread (as the GeneralEditDistance |
| * contract does not specify thread safety). |
| * |
| * This method differs from Object.clone() in two ways: |
| * it is typed (to GeneralEditDistance); and, it must not throw |
| * an exception. |
| * |
| * @return a clone of this instance. |
| */ |
| GeneralEditDistance duplicate(); |
| |
| /** |
| * Computes edit distance from the pattern string (used to |
| * construct this GeneralEditDistance instance) to a given target |
| * string, bounded by a limit of interest. |
| * |
| * The specified limit must be non-negative. If the actual distance |
| * exceeds the limit, this method may return *any* value above the limit; |
| * it need not be the actual distance. The behavior of |
| * illegal limit values is undefined; an implementation may |
| * return an incorrect result or raise a runtime exception. |
| * |
| * @param target string for which distance is to be compared |
| * against a predefined pattern |
| * @param limit maximum distance for which exact result is |
| * required (and beyond which any over-limit result |
| * can be returned) |
| * @return edit distance from the predefined pattern to the |
| * target (or any value above limit, if the actual distance |
| * exceeds the prescribed limit) |
| */ |
| int getDistance(CharSequence target, int limit); |
| } |