Source
ClosestString
Source model issue: #1032
Target
ILP/i32
Motivation
This rule gives ClosestString immediate solver connectivity through the existing integer-linear-programming hub.
The formulation is compact: choose one alphabet symbol at each center position and minimize a radius variable that upper-bounds the Hamming distance to every input string.
Reference
Ming Li, Bin Ma, and Lusheng Wang, "On the closest string and substring problems," Journal of the ACM 49(2):157-171, 2002. https://doi.org/10.1145/506147.506150
Reduction Algorithm
Let the source instance have:
- alphabet size
q,
n input strings s_1, ..., s_n,
- common string length
m.
Construct an ILP/i32 instance as follows.
-
For every center position j in {0,...,m-1} and alphabet symbol a in {0,...,q-1}, create an integer variable x_{j,a}.
It will be forced to be binary and means: the center has symbol a at position j.
-
Create one nonnegative integer radius variable R.
-
For each position j, add the assignment constraint:
sum_a x_{j,a} = 1.
Since all ILP variables are nonnegative integers, this makes the x_{j,a} variables binary.
-
For each input string s_i, add the radius constraint:
R + sum_{j=0}^{m-1} x_{j, s_i[j]} >= m.
This is equivalent to:
R >= m - sum_j x_{j, s_i[j]},
and the right-hand side is exactly the Hamming distance from the chosen center to s_i.
-
Minimize R.
-
Recover the center string by, for each position j, choosing the unique symbol a with x_{j,a} = 1.
Correctness Argument
The assignment constraints ensure that every feasible ILP solution encodes exactly one center string c in Sigma^m.
For a fixed input string s_i, the quantity sum_j x_{j, s_i[j]} counts the number of positions where c matches s_i. Therefore m - sum_j x_{j, s_i[j]} is exactly d_H(c, s_i).
The constraint R + sum_j x_{j, s_i[j]} >= m forces R >= d_H(c, s_i) for every input string. Minimizing R therefore minimizes the maximum Hamming distance from the center to all input strings.
Thus optimal ILP solutions are in one-to-one correspondence with optimal closest strings, ignoring the auxiliary radius variable.
Size Overhead
Let:
q = alphabet_size,
m = string_length,
n = num_strings.
| Target metric |
Formula |
num_vars |
q * m + 1 |
num_constraints |
m + n |
Validation Method
Use closed-loop validation on small instances:
- enumerate all center strings directly and compute the optimum radius,
- solve the ILP instance,
- extract the center string from the
x variables,
- verify that the extracted center has the same optimum radius.
Include examples where:
- several centers tie for optimum,
- the optimum radius is nonzero,
- no input string itself is uniquely optimal,
- the alphabet has more than two symbols.
Example
Use the source instance from #1032:
- alphabet
{0,1},
- strings
000, 011, 101, 110.
Here q = 2, m = 3, n = 4.
The ILP has:
2 * 3 = 6 center-symbol variables,
- one radius variable
R,
3 assignment constraints,
4 radius constraints.
The center 000 is represented by:
x_{0,0} = 1, x_{0,1} = 0,
x_{1,0} = 1, x_{1,1} = 0,
x_{2,0} = 1, x_{2,1} = 0.
For input string 011, the radius constraint is:
R + x_{0,0} + x_{1,1} + x_{2,1} >= 3.
Substituting the center 000 gives:
R + 1 + 0 + 0 >= 3,
so R >= 2.
The other nonzero-distance strings similarly force R >= 2, and setting R = 2 is feasible. Therefore the ILP optimum is 2, matching the ClosestString optimum.
BibTeX
@article{LiMaWang2002ClosestStringSubstring,
author = {Ming Li and Bin Ma and Lusheng Wang},
title = {On the Closest String and Substring Problems},
journal = {Journal of the ACM},
volume = {49},
number = {2},
pages = {157--171},
year = {2002},
doi = {10.1145/506147.506150},
url = {https://doi.org/10.1145/506147.506150}
}
Source
ClosestString
Source model issue: #1032
Target
ILP/i32
Motivation
This rule gives
ClosestStringimmediate solver connectivity through the existing integer-linear-programming hub.The formulation is compact: choose one alphabet symbol at each center position and minimize a radius variable that upper-bounds the Hamming distance to every input string.
Reference
Ming Li, Bin Ma, and Lusheng Wang, "On the closest string and substring problems," Journal of the ACM 49(2):157-171, 2002. https://doi.org/10.1145/506147.506150
Reduction Algorithm
Let the source instance have:
q,ninput stringss_1, ..., s_n,m.Construct an
ILP/i32instance as follows.For every center position
j in {0,...,m-1}and alphabet symbola in {0,...,q-1}, create an integer variablex_{j,a}.It will be forced to be binary and means: the center has symbol
aat positionj.Create one nonnegative integer radius variable
R.For each position
j, add the assignment constraint:sum_a x_{j,a} = 1.Since all ILP variables are nonnegative integers, this makes the
x_{j,a}variables binary.For each input string
s_i, add the radius constraint:R + sum_{j=0}^{m-1} x_{j, s_i[j]} >= m.This is equivalent to:
R >= m - sum_j x_{j, s_i[j]},and the right-hand side is exactly the Hamming distance from the chosen center to
s_i.Minimize
R.Recover the center string by, for each position
j, choosing the unique symbolawithx_{j,a} = 1.Correctness Argument
The assignment constraints ensure that every feasible ILP solution encodes exactly one center string
c in Sigma^m.For a fixed input string
s_i, the quantitysum_j x_{j, s_i[j]}counts the number of positions wherecmatchess_i. Thereforem - sum_j x_{j, s_i[j]}is exactlyd_H(c, s_i).The constraint
R + sum_j x_{j, s_i[j]} >= mforcesR >= d_H(c, s_i)for every input string. MinimizingRtherefore minimizes the maximum Hamming distance from the center to all input strings.Thus optimal ILP solutions are in one-to-one correspondence with optimal closest strings, ignoring the auxiliary radius variable.
Size Overhead
Let:
q = alphabet_size,m = string_length,n = num_strings.num_varsq * m + 1num_constraintsm + nValidation Method
Use closed-loop validation on small instances:
xvariables,Include examples where:
Example
Use the source instance from #1032:
{0,1},000,011,101,110.Here
q = 2,m = 3,n = 4.The ILP has:
2 * 3 = 6center-symbol variables,R,3assignment constraints,4radius constraints.The center
000is represented by:x_{0,0} = 1,x_{0,1} = 0,x_{1,0} = 1,x_{1,1} = 0,x_{2,0} = 1,x_{2,1} = 0.For input string
011, the radius constraint is:R + x_{0,0} + x_{1,1} + x_{2,1} >= 3.Substituting the center
000gives:R + 1 + 0 + 0 >= 3,so
R >= 2.The other nonzero-distance strings similarly force
R >= 2, and settingR = 2is feasible. Therefore the ILP optimum is2, matching the ClosestString optimum.BibTeX