Skip to content

[Rule] ClosestString to ILP #1034

@zhangjieqingithub

Description

@zhangjieqingithub

Source

ClosestString

Source model issue: #1032

Target

ILP/i32

Motivation

This rule gives ClosestString immediate solver connectivity through the existing integer-linear-programming hub.

The formulation is compact: choose one alphabet symbol at each center position and minimize a radius variable that upper-bounds the Hamming distance to every input string.

Reference

Ming Li, Bin Ma, and Lusheng Wang, "On the closest string and substring problems," Journal of the ACM 49(2):157-171, 2002. https://doi.org/10.1145/506147.506150

Reduction Algorithm

Let the source instance have:

  • alphabet size q,
  • n input strings s_1, ..., s_n,
  • common string length m.

Construct an ILP/i32 instance as follows.

  1. For every center position j in {0,...,m-1} and alphabet symbol a in {0,...,q-1}, create an integer variable x_{j,a}.
    It will be forced to be binary and means: the center has symbol a at position j.

  2. Create one nonnegative integer radius variable R.

  3. For each position j, add the assignment constraint:
    sum_a x_{j,a} = 1.
    Since all ILP variables are nonnegative integers, this makes the x_{j,a} variables binary.

  4. For each input string s_i, add the radius constraint:
    R + sum_{j=0}^{m-1} x_{j, s_i[j]} >= m.

    This is equivalent to:
    R >= m - sum_j x_{j, s_i[j]},
    and the right-hand side is exactly the Hamming distance from the chosen center to s_i.

  5. Minimize R.

  6. Recover the center string by, for each position j, choosing the unique symbol a with x_{j,a} = 1.

Correctness Argument

The assignment constraints ensure that every feasible ILP solution encodes exactly one center string c in Sigma^m.

For a fixed input string s_i, the quantity sum_j x_{j, s_i[j]} counts the number of positions where c matches s_i. Therefore m - sum_j x_{j, s_i[j]} is exactly d_H(c, s_i).

The constraint R + sum_j x_{j, s_i[j]} >= m forces R >= d_H(c, s_i) for every input string. Minimizing R therefore minimizes the maximum Hamming distance from the center to all input strings.

Thus optimal ILP solutions are in one-to-one correspondence with optimal closest strings, ignoring the auxiliary radius variable.

Size Overhead

Let:

  • q = alphabet_size,
  • m = string_length,
  • n = num_strings.
Target metric Formula
num_vars q * m + 1
num_constraints m + n

Validation Method

Use closed-loop validation on small instances:

  • enumerate all center strings directly and compute the optimum radius,
  • solve the ILP instance,
  • extract the center string from the x variables,
  • verify that the extracted center has the same optimum radius.

Include examples where:

  • several centers tie for optimum,
  • the optimum radius is nonzero,
  • no input string itself is uniquely optimal,
  • the alphabet has more than two symbols.

Example

Use the source instance from #1032:

  • alphabet {0,1},
  • strings 000, 011, 101, 110.

Here q = 2, m = 3, n = 4.

The ILP has:

  • 2 * 3 = 6 center-symbol variables,
  • one radius variable R,
  • 3 assignment constraints,
  • 4 radius constraints.

The center 000 is represented by:

  • x_{0,0} = 1, x_{0,1} = 0,
  • x_{1,0} = 1, x_{1,1} = 0,
  • x_{2,0} = 1, x_{2,1} = 0.

For input string 011, the radius constraint is:
R + x_{0,0} + x_{1,1} + x_{2,1} >= 3.

Substituting the center 000 gives:
R + 1 + 0 + 0 >= 3,
so R >= 2.

The other nonzero-distance strings similarly force R >= 2, and setting R = 2 is feasible. Therefore the ILP optimum is 2, matching the ClosestString optimum.

BibTeX

@article{LiMaWang2002ClosestStringSubstring,
  author  = {Ming Li and Bin Ma and Lusheng Wang},
  title   = {On the Closest String and Substring Problems},
  journal = {Journal of the ACM},
  volume  = {49},
  number  = {2},
  pages   = {157--171},
  year    = {2002},
  doi     = {10.1145/506147.506150},
  url     = {https://doi.org/10.1145/506147.506150}
}

Metadata

Metadata

Assignees

No one assigned

    Labels

    ruleA new reduction rule to be added.

    Type

    No type

    Projects

    Status

    No status

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions