Package translate :: Package search :: Module lshtein :: Class LevenshteinComparer

Class LevenshteinComparer

Instance Methods
 
__init__(self, max_len=200)
 
similarity(self, a, b, stoppercentage=40)
 
similarity_real(self, a, b, stoppercentage=40)
Returns the similarity between a and b based on Levenshtein distance.
Method Details

similarity_real(self, a, b, stoppercentage=40)

Returns the similarity between a and b based on Levenshtein distance. It
can stop prematurely as soon as it sees that a and b will be no more similar
than the percentage specified in stoppercentage.

The Levenshtein distance is calculated, but the following should be noted:
    * Only the first MAX_LEN characters are considered. Long strings differing
      at the end will therefore seem to match better than they should. See the
      use of the variable penalty to lessen the effect of this.
    * Strings with widely different lengths allow a shortcut. This follows
      from the definition of the Levenshtein distance: the distance will be
      at least as much as the difference in string length.
    * Calculation is stopped as soon as a similarity of stoppercentage becomes
      unattainable. See the use of the variable stopvalue.
    * Implementation uses memory O(min(len(a), len(b)))
    * Execution time is O(len(a)*len(b))
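
The points above can be illustrated with a minimal sketch. This is not the
module's actual source: the name similarity_sketch is hypothetical, the
penalty adjustment for truncated strings is left out, and returning 0.0 on
an early stop is an assumption made for this example only.

```python
def similarity_sketch(a, b, stoppercentage=40, max_len=200):
    """Return a 0-100 similarity score derived from Levenshtein distance.

    Hypothetical illustration only; not the LevenshteinComparer code.
    """
    # Only the first max_len characters are considered.
    a, b = a[:max_len], b[:max_len]
    # Keep the shorter string in b so each row holds min(len) + 1 cells.
    if len(a) < len(b):
        a, b = b, a
    longest = len(a)
    if longest == 0:
        return 100.0  # assumption: two empty strings count as identical

    # Largest distance that still allows a similarity of stoppercentage.
    stopvalue = longest * (100 - stoppercentage) / 100.0

    # Shortcut: the distance is at least the difference in length.
    if len(a) - len(b) > stopvalue:
        return 0.0  # assumption: report 0.0 when giving up early

    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        # The row minimum never decreases from one row to the next, so it
        # is a lower bound on the final distance.
        if min(current) > stopvalue:
            return 0.0
        previous = current

    distance = previous[-1]
    return 100.0 * (longest - distance) / longest
```

In this sketch, similarity_sketch("kitten", "sitting") gives roughly 57.1: the
distance is 3 and the longer string has 7 characters. Two strings that differ
only after the first max_len characters score 100.0, which is the effect the
first note warns about.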