lcs.3valkey - Man Page

Finds the longest common substring.

Synopsis

LCS key1 key2 [LEN] [IDX] [MINMATCHLEN min-match-len] [WITHMATCHLEN]

Description

The LCS command implements the longest common subsequence algorithm. Note that this is different than the longest common string algorithm, since matching characters in the string does not need to be contiguous.

For instance the LCS between “foo” and “fao” is “fo”, since scanning the two strings from left to right, the longest common set of characters is composed of the first “f” and then the “o”.

LCS is very useful in order to evaluate how similar two strings are. Strings can represent many things. For instance if two strings are DNA sequences, the LCS will provide a measure of similarity between the two DNA sequences. If the strings represent some text edited by some user, the LCS could represent how different the new text is compared to the old one, and so forth.

Note that this algorithm runs in O(N*M) time, where N is the length of the first string and M is the length of the second string. So either spin a different Valkey instance in order to run this algorithm, or make sure to run it against very small strings.

> MSET key1 ohmytext key2 mynewtext
OK
> LCS key1 key2
"mytext"

Sometimes we need just the length of the match:

> LCS key1 key2 LEN
(integer) 6

However what is often very useful, is to know the match position in each strings:

> LCS key1 key2 IDX
1) "matches"
2) 1) 1) 1) (integer) 4
         2) (integer) 7
      2) 1) (integer) 5
         2) (integer) 8
   2) 1) 1) (integer) 2
         2) (integer) 3
      2) 1) (integer) 0
         2) (integer) 1
3) "len"
4) (integer) 6

Matches are produced from the last one to the first one, since this is how the algorithm works, and it more efficient to emit things in the same order. The above array means that the first match (second element of the array) is between positions 2-3 of the first string and 0-1 of the second. Then there is another match between 4-7 and 5-8.

To restrict the list of matches to the ones of a given minimal length:

> LCS key1 key2 IDX MINMATCHLEN 4
1) "matches"
2) 1) 1) 1) (integer) 4
         2) (integer) 7
      2) 1) (integer) 5
         2) (integer) 8
3) "len"
4) (integer) 6

Finally to also have the match len:

> LCS key1 key2 IDX MINMATCHLEN 4 WITHMATCHLEN
1) "matches"
2) 1) 1) 1) (integer) 4
         2) (integer) 7
      2) 1) (integer) 5
         2) (integer) 8
      3) (integer) 4
3) "len"
4) (integer) 6

Reply

Resp2

One of the following:

  • valkey-protocol(7) Bulk string reply: the longest common subsequence.
  • valkey-protocol(7) Integer reply: the length of the longest common subsequence when LEN is given.
  • valkey-protocol(7) Array reply: an array with the LCS length and all the ranges in both the strings when IDX is given.

Resp3

One of the following:

  • valkey-protocol(7) Bulk string reply: the longest common subsequence.
  • valkey-protocol(7) Integer reply: the length of the longest common subsequence when LEN is given.
  • valkey-protocol(7) Map reply: a map with the LCS length and all the ranges in both the strings when IDX is given.

Complexity

O(N*M) where N and M are the lengths of s1 and s2, respectively

Acl Categories

@read @slow @string

History

See Also

append(3valkey), decr(3valkey), decrby(3valkey), get(3valkey), getdel(3valkey), getex(3valkey), getrange(3valkey), incr(3valkey), incrby(3valkey), incrbyfloat(3valkey), mget(3valkey), mset(3valkey), msetnx(3valkey), set(3valkey), setrange(3valkey), strlen(3valkey)

Info

2024-09-23 8.0.0 Valkey Command Manual