blob: db3f45aaf3d4b3f9df631132da9632cab17f0ef2 [file] [log] [blame]
/*
* Copyright 2010 Google Inc.
*
* Licensed under the Apache License, Version 2.0 (the "License"); you may not
* use this file except in compliance with the License. You may obtain a copy of
* the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
* WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
* License for the specific language governing permissions and limitations under
* the License.
*/
package com.google.gwt.dev.util.editdistance;
/**
* An engine definition for computing string edit distances, and
* implementation generators.
*
* Generally, edit distance is the minimum number of edit operations required
* to transform one string into another. Being a minimum transform, it
* forms a metric space (one that is non-negative, symmetric, and obeys
* triangle inequality).
*
* The most common form is Levenshtein distance, where the set of operations
* consists of:
* deleting one character;
* inserting one character; or,
* substituting one character for another,
* each having a "cost" of 1.
*
* Another common form is Damerau-Levenshtein distance, which adds
* transpose two adjacent characters
* to the set of operations. Note that many implementations of
* Damerau distance do not account for insertion after transposition;
* this "restricted" Damerau distance does not obey triangle inequality,
* but is substantially easier to compute than the unrestricted metric.
*
* Another variation is to apply differential costs for the various
* operations. For example, substitutions could be weighted by
* character type (e.g., vowel<=>vowel being a cheaper edit than
* vowel<=>consonant).
*
* This class defines an engine for computing edit distance from
* a fixed ("pattern") string to other ("target") strings. The pattern
* is fixed at instantiation time; this allows an implementation to
* perform precomputations to make individual target computations faster.
*
* To allow further performance optimization, the GeneralEditDistance
* contract does not require thread-safety. A <tt>duplicate</tt> method is
* defined to allow additional equivalent instances to be created
* for separate threads; it is essentially a type-knowledgeable <tt>clone</tt>.
*/
public interface GeneralEditDistance {
/**
* Creates a duplicate (clone) of this distance engine, typically
* in order to use it in another thread (as the GeneralEditDistance
* contract does not specify thread safety).
*
* This method differs from Object.clone() in two ways:
* it is typed (to GeneralEditDistance); and, it must not throw
* an exception.
*
* @return a clone of this instance.
*/
GeneralEditDistance duplicate();
/**
* Computes edit distance from the pattern string (used to
* construct this GeneralEditDistance instance) to a given target
* string, bounded by a limit of interest.
*
* The specified limit must be non-negative. If the actual distance
* exceeds the limit, this method may return *any* value above the limit;
* it need not be the actual distance. The behavior of
* illegal limit values is undefined; an implementation may
* return an incorrect result or raise a runtime exception.
*
* @param target string for which distance is to be compared
* against a predefined pattern
* @param limit maximum distance for which exact result is
* required (and beyond which any over-limit result
* can be returned)
* @return edit distance from the predefined pattern to the
* target (or any value above limit, if the actual distance
* exceeds the prescribed limit)
*/
int getDistance(CharSequence target, int limit);
}