Learning to Remove Cuts in Integer Linear Programming

Pol Puigdemont    Stratis Skoulakis    Grigorios G Chrysos    Volkan Cevher
Abstract

Cutting plane methods are a fundamental approach for solving integer linear programs (ILPs). In each iteration of such methods, additional linear constraints (cuts) are introduced to the constraint set with the aim of excluding the previous fractional optimal solution while not affecting the optimal integer solution. In this work, we explore a novel approach within cutting plane methods: instead of only adding new cuts, we also consider the removal of previous cuts introduced at any of the preceding iterations of the method under a learnable parametric criteria. We demonstrate that in fundamental combinatorial optimization settings such cut removal policies can lead to significant improvements over both human-based and machine learning-guided cut addition policies even when implemented with simple models.

Machine Learning, ICML

1 Introduction

Integer linear programming (ILP) has numerous applications in engineering (Miller et al., 1960), operational research (Eiselt & Sandblom, 2000), and finance (Konno & Yamamoto, 2008). In fact, any combinatorial optimization problem can be formulated as an ILP (Conforti et al., 2014). ILP is a constrained optimization formulation in which the variables need to take integer values while satisfying some linear constraints. More precisely, an ILP with n𝑛nitalic_n variables and m𝑚mitalic_m linear constraints admits the form,

zint:=minxn{cx:Axb,xj},assignsubscriptsuperscript𝑧intsubscript𝑥superscript𝑛:superscript𝑐top𝑥formulae-sequence𝐴𝑥𝑏subscript𝑥𝑗z^{\star}_{\mathrm{int}}:=\min_{x\in\mathbb{R}^{n}}\{c^{\top}x\>:\>Ax\leq b,~{% }~{}x_{j}\in\mathbb{Z}\},italic_z start_POSTSUPERSCRIPT ⋆ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT roman_int end_POSTSUBSCRIPT := roman_min start_POSTSUBSCRIPT italic_x ∈ blackboard_R start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT end_POSTSUBSCRIPT { italic_c start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_x : italic_A italic_x ≤ italic_b , italic_x start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ∈ blackboard_Z } , (1)

where cn𝑐superscript𝑛c\in\mathbb{Z}^{n}italic_c ∈ blackboard_Z start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT, Am×n𝐴superscript𝑚𝑛A\in\mathbb{Z}^{m\times n}italic_A ∈ blackboard_Z start_POSTSUPERSCRIPT italic_m × italic_n end_POSTSUPERSCRIPT and bm𝑏superscript𝑚b\in\mathbb{Z}^{m}italic_b ∈ blackboard_Z start_POSTSUPERSCRIPT italic_m end_POSTSUPERSCRIPT.

Despite the fact that ILPs are 𝒩𝒫𝒩𝒫\mathcal{NP}caligraphic_N caligraphic_P-hard (Bixby et al., 2004), heuristics, such as the cutting plane method (Gomory, 1960) or branch and bound (Land & Doig, 1960a; Zarpellon et al., 2021) have proven to be extremely valuable.

The cutting plane method, proposed by Gomory (Gomory, 1960), is one of the most fundamental approaches for solving ILPs. The idea is to iteratively solve relaxed versions of the original problem (1) by dropping the integrality requirement, thus obtaining a linear program (LP) that is computationally tractable to solve. More precisely, by dropping the integrality constraints, we obtain the following LP:

zfrac:=minxn{cx:Axb,xn}.assignsubscriptsuperscript𝑧fracsubscript𝑥superscript𝑛:superscript𝑐top𝑥formulae-sequence𝐴𝑥𝑏𝑥superscript𝑛z^{\star}_{\mathrm{frac}}:=\min_{x\in\mathbb{R}^{n}}\{c^{\top}x\>:\>Ax\leq b,% \>x\in\mathbb{R}^{n}\}.italic_z start_POSTSUPERSCRIPT ⋆ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT roman_frac end_POSTSUBSCRIPT := roman_min start_POSTSUBSCRIPT italic_x ∈ blackboard_R start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT end_POSTSUBSCRIPT { italic_c start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_x : italic_A italic_x ≤ italic_b , italic_x ∈ blackboard_R start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT } . (2)

It is clear that zfraczintsubscriptsuperscript𝑧fracsubscriptsuperscript𝑧intz^{\star}_{\mathrm{frac}}\leq z^{\star}_{\mathrm{int}}italic_z start_POSTSUPERSCRIPT ⋆ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT roman_frac end_POSTSUBSCRIPT ≤ italic_z start_POSTSUPERSCRIPT ⋆ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT roman_int end_POSTSUBSCRIPT. The cornerstone idea of the cutting plane methods is to tighten the bound zfracsubscriptsuperscript𝑧fracz^{\star}_{\mathrm{frac}}italic_z start_POSTSUPERSCRIPT ⋆ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT roman_frac end_POSTSUBSCRIPT by adding cutting planes (Gomory, 1960) that increase zfracsubscriptsuperscript𝑧fracz^{\star}_{\mathrm{frac}}italic_z start_POSTSUPERSCRIPT ⋆ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT roman_frac end_POSTSUBSCRIPT but leave zintsubscriptsuperscript𝑧intz^{\star}_{\mathrm{int}}italic_z start_POSTSUPERSCRIPT ⋆ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT roman_int end_POSTSUBSCRIPT unchanged.

Modern solvers (Bestuzheva et al., 2023; Gurobi, 2021) use cutting planes to tighten the bounds on the linear programs (LPs) that are solved iteratively in the cutting plane method (Gomory, 1960) and the branch-and-bound method (Land & Doig, 1960b). There are multiple types of cutting planes that can increase the value of the respective LP relaxation. Namely, Gomory showed that for ILPs with m𝑚mitalic_m constraints, there are as much as m𝑚mitalic_m possible Gomory cuts (Gomory, 1960) (Apendix A.2 contains details on how to obtain Gomory cuts). Although adding all possible cutting planes would yield stronger tightening of the LP relaxation and thus faster convergence, the number of constraints would grow exponentially over the iterations, making the problem infeasible (Wesselmann & Stuhl, 2012). Consequently, all cutting plane methods select a small number of possible cuts to add at each iteration (Balas et al., 1993; Amaldi et al., 2014; Andreello et al., 2007; Tang et al., 2020; Paulus et al., 2022).

In practice, the cut addition policy—determining which cut to add at each iteration—plays a crucial role in the convergence properties of the method and over several hand-crafted heuristics have been proposed (Balas et al., 1993; Andreello et al., 2007; Amaldi et al., 2014; Coniglio & Tieves, 2015). For instance, the SCIP solver (Bestuzheva et al., 2023) employs a weighted sum of features such as integer support and parallelism to rank the cuts.

A more recent line of research focuses on machine learning-based cut addition policies that, after training, can adjust to the problem distributions of interest (see (Deza & Khalil, 2023) for a survey). These approaches leverage different machine learning techniques, including imitation learning (Paulus et al., 2022), reinforcement learning (Wang et al., 2023; Tang et al., 2020), and multiple instance learning (Huang et al., 2021). It is worth noting that irrespective of the specific training strategy, all previous approaches concentrate on cut addition policies that, at each iteration, include a small set of new cuts in the constraints.

1.1 Our Approach and Results

In this work, we explore cutting plane methods with learning-guided policies that do not only add cuts but also possess the capability to remove them. To the best of knowledge, this is the first work examining such cut removal policies in the context of cutting plane methods.

As already mentioned, adding multiple cutting planes at each iteration of the cut selection method is very beneficial with respect to convergence since larger sizes of the fractional polytope are removed. However, the latter leads to an exponential increase in the number of linear constraints (Wesselmann & Stuhl, 2012).

We consider the concept of cut removals as a strategy to mitigate the exponential growth in the number of constraints, while still capitalizing on the rapid convergence rates associated with incorporating multiple cuts (Wesselmann & Stuhl, 2012). Specifically, in each iteration of the cutting plane method, we opt to include all potential Gomory cuts in the set of linear constraints. As previously noted, this approach provides the advantage of excluding larger regions from the feasibility polytope. Subsequently, we proceed with the cut removal step, where previously introduced cuts are eliminated from the set of linear constraints. This ensures that the overall number of linear constraints increases by just one cut (or a small constant number) from iteration to iteration.

Cut Addition vs Cut Removal As mentioned earlier, our approach differs significantly from previous cut addition policies (Tang et al., 2020; Paulus et al., 2022). Unlike these policies, our method introduces multiple cuts at each iteration but also removes multiple cuts to achieve a balanced increase. In contrast, cut addition policies introduce a small number of new cuts in the constraints, leading to a gradual increment in the number of cuts with each iteration.

A fundamental distinction between the two approaches lies in the permanent addition of cuts. In cut addition, once a cut is incorporated into the linear constraints, it persists in all subsequent iterations of the method. However, a cut that may have been effective in the early iterations may lose its relevance as more cuts are introduced. Our cutting removal approach addresses this limitation by continually assessing the effectiveness of each cut. This allows us to replace multiple outdated cuts from early iterations with fresh cuts.

Learning to Remove Cuts As in the context of cut addition, the strategy for removing cuts plays a crucial role in the convergence properties of our method. An intuitive measure of cut quality is the difference in the LP bound with and without the cut (Coniglio & Tieves, 2015; Paulus et al., 2022). Therefore, a straightforward cut removal strategy is to eliminate cuts with the small difference in the LP bound difference. However computing the LP difference for all candidate cuts requires solving numerous LPs which leads to a significant computational overhead.

To address this challenge, we adopt an imitation learning approach, similar to that of (Paulus et al., 2022) in the case of cut addition. Specifically, we train a simple model that uses hand-crafted features of the cuts (Achterberg, 2007; Wesselmann & Stuhl, 2012) and an additional MLP layer to predict the LP bound difference for each candidate cut.

Interfacing with an implementation of the Cutting Plane method with Gomory Cuts (Gomory, 1960) we experiment on five families of MILPs by training the neural networks to predict the quality of various cuts and remove them acoordingly. Our experimental evaluations indicate that our algorithm outperforms cut addition strategies.

1.2 Related Work

There is a recent line of works examining the applications of machine learning techniques in cutting plane methods. Radu et al. (2018) were the first to employ machine learning models for predicting the bound improvement of cuts in the context of semi-definite programming. In the realm of Integer Linear Programming, Tang et al. (2020) utilize reinforcement learning to devise cut addition strategies for Gomory cuts. Huang et al. (2021) develop alternative cut addition strategies through multiple instance learning. Paulus et al. (2022) employ imitation learning to create efficient cut selection policies, approximating the computationally expensive look-ahead policy that calculates the LP bound improvement for each candidate cut. In a related but slightly orthogonal task, Wang et al. (2023) leverage the RL framework to determine the order of cuts presented in the LP solver, minimizing the solving time of the LP. Deza & Khalil (2023) provides a great survey in recent work for learning to add cuts in the cutting plane method.

At the same time machine learning techniques have been used in the context of other approaches of ILPs such as Column Generation (Chi et al., 2022) and Branch&\&&Bound (Chmiela et al., 2021; Gupta et al., 2020; Khalil et al., 2016; Khalil, 2016; Zarpellon et al., 2021). On the theoretical front (Balcan et al., 2022a, b, 2021) study sample complexity and generalization bounds for cutting plane methods.

2 Background and Preliminaries

We denote with \mathbb{R}blackboard_R the set of real numbers and with \mathbb{Z}blackboard_Z the set of integer numbers. To simplify notation, we describe the linear constraints Axb𝐴𝑥𝑏{Ax\leq b}italic_A italic_x ≤ italic_b with a set of hyperplanes \mathcal{H}caligraphic_H. More precisely, each hyperplane (α,β)𝛼𝛽(\alpha,\beta)\in\mathcal{H}( italic_α , italic_β ) ∈ caligraphic_H (where αn𝛼superscript𝑛\alpha\in\mathbb{Z}^{n}italic_α ∈ blackboard_Z start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT and β𝛽\beta\in\mathbb{Z}italic_β ∈ blackboard_Z) denotes the hyperplane αxβsuperscript𝛼top𝑥𝛽\alpha^{\top}x\leq\betaitalic_α start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_x ≤ italic_β.

With a slight abuse of notation, we will write that an n𝑛nitalic_n-dimensional vector x𝑥x\in\mathcal{H}italic_x ∈ caligraphic_H if and only if αxβsuperscript𝛼top𝑥𝛽\alpha^{\top}x\leq\betaitalic_α start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_x ≤ italic_β for all (α,β)𝛼𝛽(\alpha,\beta)\in\mathcal{H}( italic_α , italic_β ) ∈ caligraphic_H. We also denote with (,c)𝑐(\mathcal{H},c)( caligraphic_H , italic_c ) an instance of Integer Linear Program, minxn{cx:x}subscript𝑥superscript𝑛:superscript𝑐top𝑥𝑥\min_{x\in\mathbb{Z}^{n}}\{c^{\top}x~{}:~{}x\in\mathcal{H}\}roman_min start_POSTSUBSCRIPT italic_x ∈ blackboard_Z start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT end_POSTSUBSCRIPT { italic_c start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_x : italic_x ∈ caligraphic_H }.

We denote with xfrac:=argminxn{cx:x}assignsubscriptsuperscript𝑥fracsubscriptargmin𝑥superscript𝑛conditional-setsuperscript𝑐top𝑥𝑥x^{\star}_{\mathrm{frac}}:=\mathrm{argmin}_{x\in\mathbb{R}^{n}}\{c^{\top}x:x% \in\mathcal{H}\}italic_x start_POSTSUPERSCRIPT ⋆ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT roman_frac end_POSTSUBSCRIPT := roman_argmin start_POSTSUBSCRIPT italic_x ∈ blackboard_R start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT end_POSTSUBSCRIPT { italic_c start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_x : italic_x ∈ caligraphic_H } the optimal fractional solution of (,c)𝑐(\mathcal{H},c)( caligraphic_H , italic_c ) and with xint:=argminxn{cx:x}assignsubscriptsuperscript𝑥intsubscriptargmin𝑥superscript𝑛conditional-setsuperscript𝑐top𝑥𝑥x^{\star}_{\mathrm{int}}:=\mathrm{argmin}_{x\in\mathbb{Z}^{n}}\{c^{\top}x:x\in% \mathcal{H}\}italic_x start_POSTSUPERSCRIPT ⋆ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT roman_int end_POSTSUBSCRIPT := roman_argmin start_POSTSUBSCRIPT italic_x ∈ blackboard_Z start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT end_POSTSUBSCRIPT { italic_c start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_x : italic_x ∈ caligraphic_H } the optimal integral solution. We remark that xfracsubscriptsuperscript𝑥fracx^{\star}_{\mathrm{frac}}italic_x start_POSTSUPERSCRIPT ⋆ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT roman_frac end_POSTSUBSCRIPT can be computed in polynomial time (Eiselt & Sandblom, 2007). On the contrary computing xintsubscriptsuperscript𝑥intx^{\star}_{\mathrm{int}}italic_x start_POSTSUPERSCRIPT ⋆ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT roman_int end_POSTSUBSCRIPT is an 𝒩𝒫𝒩𝒫\mathcal{NP}caligraphic_N caligraphic_P-hard problem.

Notice that cxfraccxintsuperscript𝑐topsubscriptsuperscript𝑥fracsuperscript𝑐topsubscriptsuperscript𝑥intc^{\top}x^{\star}_{\mathrm{frac}}\leq c^{\top}x^{\star}_{\mathrm{int}}italic_c start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_x start_POSTSUPERSCRIPT ⋆ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT roman_frac end_POSTSUBSCRIPT ≤ italic_c start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_x start_POSTSUPERSCRIPT ⋆ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT roman_int end_POSTSUBSCRIPT and thus solving the fractional problem provides a lower bound on the cost of optimal integral solution. In Definition 2.1 we introduce the notion of cutting plane (α,β)𝛼𝛽(\alpha,\beta)( italic_α , italic_β ) that will be crucial throughout the paper.

Definition 2.1.

Let the instance (,c)𝑐(\mathcal{H},c)( caligraphic_H , italic_c ). A hyperplane (α,β)𝛼𝛽(\alpha,\beta)( italic_α , italic_β ) is called cutting plane if and only if

αxfrac>β and αxintβformulae-sequencesuperscript𝛼topsuperscriptsubscript𝑥frac𝛽 and superscript𝛼topsuperscriptsubscript𝑥int𝛽\alpha^{\top}x_{\mathrm{frac}}^{\star}>\beta~{}~{}~{}~{}\text{ and }~{}~{}~{}~% {}\alpha^{\top}x_{\mathrm{int}}^{\star}\leq\betaitalic_α start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_x start_POSTSUBSCRIPT roman_frac end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⋆ end_POSTSUPERSCRIPT > italic_β and italic_α start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_x start_POSTSUBSCRIPT roman_int end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⋆ end_POSTSUPERSCRIPT ≤ italic_β

In case (α,β)𝛼𝛽(\alpha,\beta)( italic_α , italic_β ) is a cutting plane, then the new instance ((α,β),c)𝛼𝛽𝑐\left(\mathcal{H}\cup(\alpha,\beta),c\right)( caligraphic_H ∪ ( italic_α , italic_β ) , italic_c ) admits a higher optimal fractional value but the same integral optimal value as (,c)𝑐(\mathcal{H},c)( caligraphic_H , italic_c ). More precisely,

cxfraccx^fraccxintsuperscript𝑐topsuperscriptsubscript𝑥fracsuperscript𝑐topsuperscriptsubscript^𝑥fracsuperscript𝑐topsuperscriptsubscript𝑥intc^{\top}x_{\mathrm{frac}}^{\star}\leq c^{\top}\hat{x}_{\mathrm{frac}}^{\star}% \leq c^{\top}x_{\mathrm{int}}^{\star}italic_c start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_x start_POSTSUBSCRIPT roman_frac end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⋆ end_POSTSUPERSCRIPT ≤ italic_c start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT over^ start_ARG italic_x end_ARG start_POSTSUBSCRIPT roman_frac end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⋆ end_POSTSUPERSCRIPT ≤ italic_c start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_x start_POSTSUBSCRIPT roman_int end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⋆ end_POSTSUPERSCRIPT

where x^frac:=argminxn{cx:x(α,β)}assignsubscriptsuperscript^𝑥fracsubscriptargmin𝑥superscript𝑛conditional-setsuperscript𝑐top𝑥𝑥𝛼𝛽\hat{x}^{\star}_{\mathrm{frac}}:=\mathrm{argmin}_{x\in\mathbb{R}^{n}}\{c^{\top% }x:x\in\mathcal{H}\cup(\alpha,\beta)\}over^ start_ARG italic_x end_ARG start_POSTSUPERSCRIPT ⋆ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT roman_frac end_POSTSUBSCRIPT := roman_argmin start_POSTSUBSCRIPT italic_x ∈ blackboard_R start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT end_POSTSUBSCRIPT { italic_c start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_x : italic_x ∈ caligraphic_H ∪ ( italic_α , italic_β ) }.

Algorithm 1 Cutting plane method
1:  Input: An integer linear program (,c)𝑐(\mathcal{H},c)( caligraphic_H , italic_c )
2:  Set 𝒫1:=assignsubscript𝒫1\mathcal{P}_{1}:=\varnothingcaligraphic_P start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT := ∅
3:  for  k=1,,𝑘1k=1,\ldots,italic_k = 1 , … , do
4:     Compute the fractional solution of (𝒫k,c)subscript𝒫𝑘𝑐(\mathcal{H}\cup\mathcal{P}_{k},c)( caligraphic_H ∪ caligraphic_P start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT , italic_c )
xk:=argminxn{cx:x𝒫k}assignsuperscriptsubscript𝑥𝑘subscriptargmin𝑥superscript𝑛conditional-setsuperscript𝑐top𝑥𝑥subscript𝒫𝑘x_{k}^{\star}:=\mathrm{argmin}_{x\in\mathbb{R}^{n}}\{c^{\top}x~{}:~{}x\in% \mathcal{H}\cup\mathcal{P}_{k}\}italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⋆ end_POSTSUPERSCRIPT := roman_argmin start_POSTSUBSCRIPT italic_x ∈ blackboard_R start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT end_POSTSUBSCRIPT { italic_c start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_x : italic_x ∈ caligraphic_H ∪ caligraphic_P start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT }
5:     if xksuperscriptsubscript𝑥𝑘x_{k}^{\star}italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⋆ end_POSTSUPERSCRIPT is integral   return xksuperscriptsubscript𝑥𝑘x_{k}^{\star}italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⋆ end_POSTSUPERSCRIPT.
6:     Compute cutpool 𝒞ksubscript𝒞𝑘\mathcal{C}_{k}caligraphic_C start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT and pick (αk,βk)𝒞ksubscript𝛼𝑘subscript𝛽𝑘subscript𝒞𝑘(\alpha_{k},\beta_{k})\in\mathcal{C}_{k}( italic_α start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT , italic_β start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) ∈ caligraphic_C start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT.
7:     Set 𝒫k+1:=𝒫k(αk,βk)assignsubscript𝒫𝑘1subscript𝒫𝑘subscript𝛼𝑘subscript𝛽𝑘\mathcal{P}_{k+1}:=\mathcal{P}_{k}\cup(\alpha_{k},\beta_{k})caligraphic_P start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT := caligraphic_P start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ∪ ( italic_α start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT , italic_β start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) ##\## insert new constraint
8:  end for

In his seminal work, Gomory (1960) showed that if (,c)𝑐(\mathcal{H},c)( caligraphic_H , italic_c ) admits a strictly fractional solution xfracsuperscriptsubscript𝑥fracx_{\mathrm{frac}}^{\star}italic_x start_POSTSUBSCRIPT roman_frac end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⋆ end_POSTSUPERSCRIPT then there exists a set of separating hyperplanes 𝒞𝒞\mathcal{C}caligraphic_C, i.e. each hyperplane (α,β)𝒞𝛼𝛽𝒞(\alpha,\beta)\in\mathcal{C}( italic_α , italic_β ) ∈ caligraphic_C is a separating hyperplane.

Theorem 2.2.

[Gomory (1960)] Let an instance (,c)𝑐(\mathcal{H},c)( caligraphic_H , italic_c ) with a strictly fractional solution xfracsuperscriptsubscript𝑥fracx_{\mathrm{frac}}^{\star}italic_x start_POSTSUBSCRIPT roman_frac end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⋆ end_POSTSUPERSCRIPT. Then there exists a set of hyperplanes 𝒞𝒞\mathcal{C}caligraphic_C with |𝒞|=||𝒞|\mathcal{C}|=|\mathcal{H}|| caligraphic_C | = | caligraphic_H | such that any (α,β)𝒞𝛼𝛽𝒞(\alpha,\beta)\in\mathcal{C}( italic_α , italic_β ) ∈ caligraphic_C is a cutting plane. 𝒞𝒞\mathcal{C}caligraphic_C is also referred as cutpool.

Gomory (1960) proposed the seminal cutting plane method (Algorithm 8) that solves ILPs by iteratively adding new cutting planes to the constraints.

The cornerstone idea of the cutting plane method is that in case the solution xksuperscriptsubscript𝑥𝑘x_{k}^{\star}italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⋆ end_POSTSUPERSCRIPT at iteration k𝑘kitalic_k is not integral then the cutting plane (αk,βk)subscript𝛼𝑘subscript𝛽𝑘(\alpha_{k},\beta_{k})( italic_α start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT , italic_β start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) added at Step 6666 will render xksuperscriptsubscript𝑥𝑘x_{k}^{\star}italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⋆ end_POSTSUPERSCRIPT infeasible while keeping the optimal integer solution xintsubscriptsuperscript𝑥intx^{\star}_{\mathrm{int}}italic_x start_POSTSUPERSCRIPT ⋆ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT roman_int end_POSTSUBSCRIPT of (,c)𝑐(\mathcal{H},c)( caligraphic_H , italic_c ) intact. As a result, Algorithm 8 guarantees that cxk+1cxksuperscript𝑐topsuperscriptsubscript𝑥𝑘1superscript𝑐topsuperscriptsubscript𝑥𝑘c^{\top}x_{k+1}^{\star}\geq c^{\top}x_{k}^{\star}italic_c start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_x start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⋆ end_POSTSUPERSCRIPT ≥ italic_c start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⋆ end_POSTSUPERSCRIPT until a final integral solution is reached.

Cut Selection We remark that regardless the way (αk,βk)subscript𝛼𝑘subscript𝛽𝑘(\alpha_{k},\beta_{k})( italic_α start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT , italic_β start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) is selected at Step 6666, Algorithm 8 is always guaranteed to converge to the optimal integral solution. However in practice the cut selection policy plays a major role on the convergence properties of the cutting plane method. As a result, over the years various handcrafted heuristics have been proposed (Balas et al., 1993; Andreello et al., 2007; Amaldi et al., 2014; Coniglio & Tieves, 2015).

One of the most natural heuristics to select cutting planes is the one that directly looks ahead on the improvement of the fractional LP bound (Amaldi et al., 2014; Coniglio & Tieves, 2015; Paulus et al., 2022). As in Paulus et al. (2022) we will refer to this as the look-ahead policy. More precisely,

(αk,βk):=max(α,β)𝒞k[minx{cx:xk𝒫k(α,β)].(\alpha_{k},\beta_{k}):=\mathrm{max}_{{\color[rgb]{0,0,1}\definecolor[named]{% pgfstrokecolor}{rgb}{0,0,1}(\alpha,\beta)}\in\mathcal{C}_{k}}\left[\min_{x}\{c% ^{\top}x:x\in\mathcal{H}_{k}\cup\mathcal{P}_{k}\cup{\color[rgb]{0,0,1}% \definecolor[named]{pgfstrokecolor}{rgb}{0,0,1}(\alpha,\beta)}\right].( italic_α start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT , italic_β start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) := roman_max start_POSTSUBSCRIPT ( italic_α , italic_β ) ∈ caligraphic_C start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT end_POSTSUBSCRIPT [ roman_min start_POSTSUBSCRIPT italic_x end_POSTSUBSCRIPT { italic_c start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_x : italic_x ∈ caligraphic_H start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ∪ caligraphic_P start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ∪ ( italic_α , italic_β ) ] .

On the positive side, look-ahead policy admits very favorable convergence properties to the optimal integral solution (Paulus et al., 2022). On the negative side, implementing the above cut selection policy is very time-consuming since it requires the solution of |Ck|subscript𝐶𝑘|C_{k}|| italic_C start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT | linear programs at each iteration k𝑘kitalic_k which creates a significant computational overhead.

Evaluating Cut Selection Policies

We can utilize the number of iterations until convergence to the optimal integer solution as a measure of performance of a cutting plane method. However cutting plane methods can take a lot of iterations before converging to the the optimal integral solution, rendering the latter metric not a very practical (Tang et al., 2020; Paulus et al., 2022).

An alternative performance metric is to measure how far is the solution of method at iteration k𝑘kitalic_k is from the optimal integer solution. The latter is referred as integrality gap (Tang et al., 2020; Paulus et al., 2022) which at iteration k𝑘kitalic_k is defined as gk:=cxintcxk0assignsubscript𝑔𝑘superscript𝑐topsuperscriptsubscript𝑥intsuperscript𝑐topsuperscriptsubscript𝑥𝑘0g_{k}:=c^{\top}x_{\mathrm{int}}^{\star}-c^{\top}x_{k}^{\star}\geq 0italic_g start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT := italic_c start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_x start_POSTSUBSCRIPT roman_int end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⋆ end_POSTSUPERSCRIPT - italic_c start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⋆ end_POSTSUPERSCRIPT ≥ 0.

Different MILP problems, even from the same distribution, will have different magnitudes for gksubscript𝑔𝑘g_{k}italic_g start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT. To compare different instances we can measure the factor by which the integrality gap is closed between the first relaxation and the current round. At iteration k𝑘kitalic_k of the CP method, the integrality gap closure (IGC) (Tang et al., 2020; Paulus et al., 2022) is defined as

IGCk:=g1gkg1=cxkcx1cxintcx1[0,1].assign𝐼𝐺subscript𝐶𝑘subscript𝑔1subscript𝑔𝑘subscript𝑔1superscript𝑐topsuperscriptsubscript𝑥𝑘superscript𝑐topsuperscriptsubscript𝑥1superscript𝑐topsuperscriptsubscript𝑥intsuperscript𝑐topsuperscriptsubscript𝑥101IGC_{k}:=\frac{g_{1}-g_{k}}{g_{1}}=\frac{c^{\top}x_{k}^{\star}-c^{\top}x_{1}^{% \star}}{c^{\top}x_{\mathrm{int}}^{\star}-c^{\top}x_{1}^{\star}}\in[0,1].italic_I italic_G italic_C start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT := divide start_ARG italic_g start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT - italic_g start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT end_ARG start_ARG italic_g start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_ARG = divide start_ARG italic_c start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⋆ end_POSTSUPERSCRIPT - italic_c start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_x start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⋆ end_POSTSUPERSCRIPT end_ARG start_ARG italic_c start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_x start_POSTSUBSCRIPT roman_int end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⋆ end_POSTSUPERSCRIPT - italic_c start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_x start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⋆ end_POSTSUPERSCRIPT end_ARG ∈ [ 0 , 1 ] . (3)

3 Cut Removal Policies

The starting point of this work concerns the design of novel cutting plane methods that, at each iteration, not only add cuts but are also capable of removing cuts. As we shall see shortly, the latter offers significant advantages compared to cutting plane methods that only add cuts at each iteration.

Adding Multiple Cuts Notice that at each iteration of Algorithm 8, only one new cutting plane (αk,βk)subscript𝛼𝑘subscript𝛽𝑘(\alpha_{k},\beta_{k})( italic_α start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT , italic_β start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) is added. However, it would make perfect sense at iteration k𝑘kitalic_k to include the entire cutpool 𝒞ksubscript𝒞𝑘\mathcal{C}_{k}caligraphic_C start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT of Theorem 2.2. The latter would result in an even greater increase in the objective function and, consequently, better convergence properties. However since |𝒞k|subscript𝒞𝑘|\mathcal{C}_{k}|| caligraphic_C start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT | can have size up to ||+|𝒫k|subscript𝒫𝑘|\mathcal{H}|+|\mathcal{P}_{k}|| caligraphic_H | + | caligraphic_P start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT |, the latter would lead to an exponential increase in the number of constraints, rendering the solution of the linear program at Step 4444 impossible. This the reason that previous cutting plane methods have focused on adding just one or small constant number of cutting planes at each iteration (Tang et al., 2020; Huang et al., 2021; Paulus et al., 2022).

Cut Removal Our approach revolves around addressing the exponential surge resulting from the inclusion of multiple cuts in each iteration. We incorporate an additional cut-removal procedure to prevent the number of constraints from escalating exponentially. In Algorithm 9, we outline the fundamental pipeline of our approach.

Algorithm 2 Cutting Plane method with Cut Removal
1:  Input: An integer linear program (,c)𝑐(\mathcal{H},c)( caligraphic_H , italic_c )
2:  Set 𝒫1:=assignsubscript𝒫1\mathcal{P}_{1}:=\varnothingcaligraphic_P start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT := ∅
3:  for  k=1,,𝑘1k=1,\ldots,italic_k = 1 , … , do
4:     Compute the cutpool 𝒞ksubscript𝒞𝑘\mathcal{C}_{k}caligraphic_C start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT for (𝒫k,c)subscript𝒫𝑘𝑐(\mathcal{H}\cup\mathcal{P}_{k},c)( caligraphic_H ∪ caligraphic_P start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT , italic_c ).
5:     Compute the fractional solution of (𝒫k𝒞k,c)subscript𝒫𝑘subscript𝒞𝑘𝑐(\mathcal{H}\cup\mathcal{P}_{k}\cup\mathcal{C}_{k},c)( caligraphic_H ∪ caligraphic_P start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ∪ caligraphic_C start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT , italic_c )
xk:=argminxn{cx:x𝒫k𝒞k}assignsuperscriptsubscript𝑥𝑘subscriptargmin𝑥superscript𝑛conditional-setsuperscript𝑐top𝑥𝑥subscript𝒫𝑘subscript𝒞𝑘x_{k}^{\star}:=\mathrm{argmin}_{x\in\mathbb{R}^{n}}\{c^{\top}x:x\in\mathcal{H}% \cup\mathcal{P}_{k}\cup\mathcal{C}_{k}\}italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⋆ end_POSTSUPERSCRIPT := roman_argmin start_POSTSUBSCRIPT italic_x ∈ blackboard_R start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT end_POSTSUBSCRIPT { italic_c start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_x : italic_x ∈ caligraphic_H ∪ caligraphic_P start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ∪ caligraphic_C start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT }
##\## include all possible cuts
6:     if xksuperscriptsubscript𝑥𝑘x_{k}^{\star}italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⋆ end_POSTSUPERSCRIPT is integral   return xksuperscriptsubscript𝑥𝑘x_{k}^{\star}italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⋆ end_POSTSUPERSCRIPT.
7:     Select a subset P^k+1𝒫k𝒞ksubscript^𝑃𝑘1subscript𝒫𝑘subscript𝒞𝑘\hat{P}_{k+1}\subseteq\mathcal{P}_{k}\cup\mathcal{C}_{k}over^ start_ARG italic_P end_ARG start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT ⊆ caligraphic_P start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ∪ caligraphic_C start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT with |P^k+1|=k+1subscript^𝑃𝑘1𝑘1|\hat{P}_{k+1}|=k+1| over^ start_ARG italic_P end_ARG start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT | = italic_k + 1.##\## removal of cuts
8:     𝒫k+1:=P^k+1{cxcxk}assignsubscript𝒫𝑘1subscript^𝑃𝑘1superscript𝑐top𝑥superscript𝑐topsuperscriptsubscript𝑥𝑘\mathcal{P}_{k+1}:=\hat{P}_{k+1}\cup\{c^{\top}x\geq\lceil c^{\top}x_{k}^{\star% }\rceil\}caligraphic_P start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT := over^ start_ARG italic_P end_ARG start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT ∪ { italic_c start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_x ≥ ⌈ italic_c start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⋆ end_POSTSUPERSCRIPT ⌉ } ##\## new constraints for the next iteration
9:  end for

Let us explain how the Algorithm 9 is able to combine both the advantages of selecting multiple cuts at each iteration while at the same time avoiding the exponential increase in the number of linear constraints.

In Step 4444, Algorithm 9 computes the cutpool 𝒞ksubscript𝒞𝑘\mathcal{C}_{k}caligraphic_C start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT (see Definition 2.1) for the problem (H𝒫k,c)𝐻subscript𝒫𝑘𝑐(H\cup\mathcal{P}_{k},c)( italic_H ∪ caligraphic_P start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT , italic_c ) that includes the original constraints \mathcal{H}caligraphic_H plus the additional cuts 𝒫ksubscript𝒫𝑘\mathcal{P}_{k}caligraphic_P start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT that have been added so far.

In Step 5, Algorithm 9 calculates the optimal solution x^ksubscript^𝑥𝑘\hat{x}_{k}over^ start_ARG italic_x end_ARG start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT for the instance (H𝒫k𝒞k,c)𝐻subscript𝒫𝑘subscript𝒞𝑘𝑐(H\cup\mathcal{P}_{k}\cup\mathcal{C}_{k},c)( italic_H ∪ caligraphic_P start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ∪ caligraphic_C start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT , italic_c ) meaning that the algorithm completely incorporates all the cutpool 𝒞ksubscript𝒞𝑘\mathcal{C}_{k}caligraphic_C start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT. To this end we remark that due to Definition 2.1, the optimal integral solution of (,c)𝑐(\mathcal{H},c)( caligraphic_H , italic_c ) is the same with the optimal integral solution of (𝒫k𝒞k,c)subscript𝒫𝑘subscript𝒞𝑘𝑐(\mathcal{H}\cup\mathcal{P}_{k}\cup\mathcal{C}_{k},c)( caligraphic_H ∪ caligraphic_P start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ∪ caligraphic_C start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT , italic_c ). Namely,

argminxn{cx:x}=argminxn{cx:x𝒫k𝒞k}.subscriptargmin𝑥superscript𝑛:superscript𝑐top𝑥𝑥subscriptargmin𝑥superscript𝑛:superscript𝑐top𝑥𝑥subscript𝒫𝑘subscript𝒞𝑘\operatorname*{argmin}_{x\in\mathbb{Z}^{n}}\{c^{\top}x:x\in\mathcal{H}\}=% \operatorname*{argmin}_{x\in\mathbb{Z}^{n}}\{c^{\top}x:x\in\mathcal{H}\cup% \mathcal{P}_{k}\cup\mathcal{C}_{k}\}.roman_argmin start_POSTSUBSCRIPT italic_x ∈ blackboard_Z start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT end_POSTSUBSCRIPT { italic_c start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_x : italic_x ∈ caligraphic_H } = roman_argmin start_POSTSUBSCRIPT italic_x ∈ blackboard_Z start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT end_POSTSUBSCRIPT { italic_c start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_x : italic_x ∈ caligraphic_H ∪ caligraphic_P start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ∪ caligraphic_C start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT } .

The idea behind this step is to achieve a substantial improvement in the objective function compared to the strategy of introducing just one cut (α,β)𝒞k𝛼𝛽subscript𝒞𝑘(\alpha,\beta)\in\mathcal{C}_{k}( italic_α , italic_β ) ∈ caligraphic_C start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT. In other words,

minxn{cx:x𝒫k𝒞k}minxn{cx:x𝒫k(α,β)}subscript𝑥superscript𝑛:superscript𝑐top𝑥𝑥subscript𝒫𝑘subscript𝒞𝑘subscript𝑥superscript𝑛:superscript𝑐top𝑥𝑥subscript𝒫𝑘𝛼𝛽\min_{x\in\mathbb{R}^{n}}\{c^{\top}x:x\in\mathcal{H}\cup\mathcal{P}_{k}\cup% \mathcal{C}_{k}\}\geq\min_{x\in\mathbb{R}^{n}}\{c^{\top}x:x\in\mathcal{H}\cup% \mathcal{P}_{k}\cup(\alpha,\beta)\}roman_min start_POSTSUBSCRIPT italic_x ∈ blackboard_R start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT end_POSTSUBSCRIPT { italic_c start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_x : italic_x ∈ caligraphic_H ∪ caligraphic_P start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ∪ caligraphic_C start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT } ≥ roman_min start_POSTSUBSCRIPT italic_x ∈ blackboard_R start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT end_POSTSUBSCRIPT { italic_c start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_x : italic_x ∈ caligraphic_H ∪ caligraphic_P start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ∪ ( italic_α , italic_β ) }

where the last inequality comes from the fact that the left problems admit a superset of constraints with respect to the right one.

In Step 6 of Algorithm 9, the cut removal process is executed to prevent an exponential rise in the number of constraints. To be more specific, only k+1𝑘1k+1italic_k + 1 cuts from the set of previous cuts, 𝒫k𝒞ksubscript𝒫𝑘subscript𝒞𝑘\mathcal{P}_{k}\cup\mathcal{C}_{k}caligraphic_P start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ∪ caligraphic_C start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT, are kept. This ensures that only one additional cut is introduced in each iteration, preserving the tractability of the solution obtained in Step 4 of the linear programming formulation.

Step 7 plays a crucial role in ensuring the algorithm’s monotonic improvement. Note that 𝒫ksubscript𝒫𝑘\mathcal{P}_{k}caligraphic_P start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT may not necessarily be a subset of 𝒫^ksubscript^𝒫𝑘\hat{\mathcal{P}}_{k}over^ start_ARG caligraphic_P end_ARG start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT, implying that a cut introduced in a previous iteration can be removed in a subsequent iteration if it becomes uninformative. The positive aspect of this approach lies in the ability to replace inactive cuts from earlier iterations with more valuable ones. On the downside, as 𝒫ksubscript𝒫𝑘\mathcal{P}_{k}caligraphic_P start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT is not guaranteed to be a subset of 𝒫^k+1subscript^𝒫𝑘1\hat{\mathcal{P}}_{k+1}over^ start_ARG caligraphic_P end_ARG start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT, there is a potential for the objective function to decrease. To mitigate this, a special constraint cxcxksuperscript𝑐top𝑥superscript𝑐topsuperscriptsubscript𝑥𝑘c^{\top}x\geq\lceil c^{\top}x_{k}^{\star}\rceilitalic_c start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_x ≥ ⌈ italic_c start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⋆ end_POSTSUPERSCRIPT ⌉ is included. This constraint ensures that the fractional value of the problem (𝒫k+1,c)subscript𝒫𝑘1𝑐(\mathcal{H}\cup\mathcal{P}_{k+1},c)( caligraphic_H ∪ caligraphic_P start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT , italic_c ) can only increase with respect to the fractional value of the problem (𝒫k,c)subscript𝒫𝑘𝑐(\mathcal{H}\cup\mathcal{P}_{k},c)( caligraphic_H ∪ caligraphic_P start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT , italic_c ).

Remark 3.1.

The additional special constraint cxcxksuperscript𝑐top𝑥superscript𝑐topsuperscriptsubscript𝑥𝑘c^{\top}x\geq\lceil c^{\top}x_{k}^{\star}\rceilitalic_c start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_x ≥ ⌈ italic_c start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⋆ end_POSTSUPERSCRIPT ⌉ does not affect the optimal integral solution. The latter is formally stated and proven in Corollary 3.2.

Corollary 3.2.

It holds that argminxn{cx:x𝒫k+1}=argminxn{cx:x}subscriptargmin𝑥superscript𝑛:superscript𝑐top𝑥𝑥subscript𝒫𝑘1subscriptargmin𝑥superscript𝑛:superscript𝑐top𝑥𝑥\operatorname*{argmin}_{x\in\mathbb{Z}^{n}}\{c^{\top}x:x\in\mathcal{H}\cup% \mathcal{P}_{k+1}\}=\operatorname*{argmin}_{x\in\mathbb{Z}^{n}}\{c^{\top}x:x% \in\mathcal{H}\}roman_argmin start_POSTSUBSCRIPT italic_x ∈ blackboard_Z start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT end_POSTSUBSCRIPT { italic_c start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_x : italic_x ∈ caligraphic_H ∪ caligraphic_P start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT } = roman_argmin start_POSTSUBSCRIPT italic_x ∈ blackboard_Z start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT end_POSTSUBSCRIPT { italic_c start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_x : italic_x ∈ caligraphic_H }

Proof.

The reason is that the following holds

minxn{cx:x}subscript𝑥superscript𝑛:superscript𝑐top𝑥𝑥\displaystyle\min_{x\in\mathbb{Z}^{n}}\{c^{\top}x:x\in\mathcal{H}\}roman_min start_POSTSUBSCRIPT italic_x ∈ blackboard_Z start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT end_POSTSUBSCRIPT { italic_c start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_x : italic_x ∈ caligraphic_H } =\displaystyle== minxn{cx:x𝒫k}subscript𝑥superscript𝑛:superscript𝑐top𝑥𝑥subscript𝒫𝑘\displaystyle\min_{x\in\mathbb{Z}^{n}}\{c^{\top}x:x\in\mathcal{H}\cup\mathcal{% P}_{k}\}roman_min start_POSTSUBSCRIPT italic_x ∈ blackboard_Z start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT end_POSTSUBSCRIPT { italic_c start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_x : italic_x ∈ caligraphic_H ∪ caligraphic_P start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT }
=\displaystyle== minxn{cx:x𝒫^k+1}subscript𝑥superscript𝑛:superscript𝑐top𝑥𝑥subscript^𝒫𝑘1\displaystyle\min_{x\in\mathbb{Z}^{n}}\{c^{\top}x:x\in\mathcal{H}\cup\mathcal{% \hat{P}}_{k+1}\}roman_min start_POSTSUBSCRIPT italic_x ∈ blackboard_Z start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT end_POSTSUBSCRIPT { italic_c start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_x : italic_x ∈ caligraphic_H ∪ over^ start_ARG caligraphic_P end_ARG start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT }
\displaystyle\geq minxn{cx:x𝒫^k+1cxk}\displaystyle\underbrace{\min_{x\in\mathbb{R}^{n}}\{c^{\top}x:x\in\mathcal{H}% \cup\mathcal{\hat{P}}_{k+1}}_{c^{\top}x_{k}^{\star}}\}under⏟ start_ARG roman_min start_POSTSUBSCRIPT italic_x ∈ blackboard_R start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT end_POSTSUBSCRIPT { italic_c start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_x : italic_x ∈ caligraphic_H ∪ over^ start_ARG caligraphic_P end_ARG start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT end_ARG start_POSTSUBSCRIPT italic_c start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⋆ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT }

The first equality holds inductively and the second comes from the fact that 𝒫^k+1𝒫ksubscript^𝒫𝑘1subscript𝒫𝑘\hat{\mathcal{P}}_{k+1}\subseteq\mathcal{H}\cup\mathcal{P}_{k}over^ start_ARG caligraphic_P end_ARG start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT ⊆ caligraphic_H ∪ caligraphic_P start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT. Notice that since cn𝑐superscript𝑛c\in\mathbb{Z}^{n}italic_c ∈ blackboard_Z start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT, minxn{cx:x}cxksubscript𝑥superscript𝑛:superscript𝑐top𝑥𝑥superscript𝑐topsuperscriptsubscript𝑥𝑘\min_{x\in\mathbb{Z}^{n}}\{c^{\top}x:x\in\mathcal{H}\}\geq\lceil c^{\top}x_{k}% ^{\star}\rceilroman_min start_POSTSUBSCRIPT italic_x ∈ blackboard_Z start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT end_POSTSUBSCRIPT { italic_c start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_x : italic_x ∈ caligraphic_H } ≥ ⌈ italic_c start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⋆ end_POSTSUPERSCRIPT ⌉. Thus minxn{cx:x𝒫^k+1}cxksubscript𝑥superscript𝑛:superscript𝑐top𝑥𝑥subscript^𝒫𝑘1superscript𝑐topsuperscriptsubscript𝑥𝑘\min_{x\in\mathbb{Z}^{n}}\{c^{\top}x:x\in\mathcal{H}\cup\mathcal{\hat{P}}_{k+1% }\}\geq\lceil c^{\top}x_{k}^{\star}\rceilroman_min start_POSTSUBSCRIPT italic_x ∈ blackboard_Z start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT end_POSTSUBSCRIPT { italic_c start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_x : italic_x ∈ caligraphic_H ∪ over^ start_ARG caligraphic_P end_ARG start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT } ≥ ⌈ italic_c start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⋆ end_POSTSUPERSCRIPT ⌉. As a result, adding the additional constraint cxcxksuperscript𝑐top𝑥superscript𝑐topsuperscriptsubscript𝑥𝑘c^{\top}x\geq\lceil c^{\top}x_{k}^{\star}\rceilitalic_c start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_x ≥ ⌈ italic_c start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⋆ end_POSTSUPERSCRIPT ⌉ does not affect the optimal integral solution. ∎

3.1 Learning to Remove Cuts

In this section, we will examine what constitutes a reasonable cut removal criterion for Step 6 of Algorithm 9 that can contribute to fast convergence properties. Motivated by the look-ahead policy presented in Section 2, the most intuitive cut-removal criterion is the one that maximizes the objective function. Namely:

P^k+1:=argmax𝒫𝒫k𝒞k[cx:x𝒫].\hat{P}_{k+1}:=\mathrm{argmax}_{{\color[rgb]{0,0,1}\definecolor[named]{% pgfstrokecolor}{rgb}{0,0,1}\mathcal{P}\subseteq\mathcal{P}_{k}\cup\mathcal{C}_% {k}}}\left[c^{\top}x:x\in\mathcal{H}\cup{\color[rgb]{0,0,1}\definecolor[named]% {pgfstrokecolor}{rgb}{0,0,1}\mathcal{P}}\right].over^ start_ARG italic_P end_ARG start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT := roman_argmax start_POSTSUBSCRIPT caligraphic_P ⊆ caligraphic_P start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ∪ caligraphic_C start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT end_POSTSUBSCRIPT [ italic_c start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_x : italic_x ∈ caligraphic_H ∪ caligraphic_P ] .

Unfortunately, there is no efficient algorithm for solving the problem above. Consequently, the most natural approach is to consider the greedy look-ahead criterion where we keep the k+1𝑘1k+1italic_k + 1 cuts of 𝒫k𝒞ksubscript𝒫𝑘subscript𝒞𝑘\mathcal{P}_{k}\cup\mathcal{C}_{k}caligraphic_P start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ∪ caligraphic_C start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT that lead to highest bounds in the objective function and remove the rest. More precisely, we associate each cutting plane (α,β)𝒫k𝒞k𝛼𝛽subscript𝒫𝑘subscript𝒞𝑘(\alpha,\beta)\in\mathcal{P}_{k}\cup\mathcal{C}_{k}( italic_α , italic_β ) ∈ caligraphic_P start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ∪ caligraphic_C start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT with the following score,

SC(α,β)SC𝛼𝛽\displaystyle\mathrm{SC}(\alpha,\beta)roman_SC ( italic_α , italic_β ) =\displaystyle== minxn{cx:x𝒫k𝒞k}subscript𝑥superscript𝑛:superscript𝑐top𝑥𝑥subscript𝒫𝑘subscript𝒞𝑘\displaystyle\min_{x\in\mathbb{R}^{n}}\{c^{\top}x~{}:~{}x\in\mathcal{H}\cup% \mathcal{P}_{k}\cup\mathcal{C}_{k}\}roman_min start_POSTSUBSCRIPT italic_x ∈ blackboard_R start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT end_POSTSUBSCRIPT { italic_c start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_x : italic_x ∈ caligraphic_H ∪ caligraphic_P start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ∪ caligraphic_C start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT }
\displaystyle-- minxn{cx:x𝒫k𝒞k/(α,β)},subscript𝑥superscript𝑛:superscript𝑐top𝑥𝑥subscript𝒫𝑘subscript𝒞𝑘𝛼𝛽\displaystyle\min_{x\in\mathbb{R}^{n}}\{c^{\top}x~{}:~{}x\in\mathcal{H}\cup% \mathcal{P}_{k}\cup\mathcal{C}_{k}/(\alpha,\beta)\},roman_min start_POSTSUBSCRIPT italic_x ∈ blackboard_R start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT end_POSTSUBSCRIPT { italic_c start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_x : italic_x ∈ caligraphic_H ∪ caligraphic_P start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ∪ caligraphic_C start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT / ( italic_α , italic_β ) } ,

which measures the difference of the fractional LP value once (α,β)𝒫k𝒞k𝛼𝛽subscript𝒫𝑘subscript𝒞𝑘(\alpha,\beta)\in\mathcal{P}_{k}\cup\mathcal{C}_{k}( italic_α , italic_β ) ∈ caligraphic_P start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ∪ caligraphic_C start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT is removed. Then 𝒫^k+1subscript^𝒫𝑘1\hat{\mathcal{P}}_{k+1}over^ start_ARG caligraphic_P end_ARG start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT is defined as the k+1𝑘1k+1italic_k + 1 cuts of 𝒫k𝒞ksubscript𝒫𝑘subscript𝒞𝑘\mathcal{P}_{k}\cup\mathcal{C}_{k}caligraphic_P start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ∪ caligraphic_C start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT with the highest score.

Parametrizing Cut Removal Unfortunately, similar to the case of single-cut addition, the cut removal policy of Eq. (3.1) is highly computationally demanding to implement since it requires solving numerous linear programs. To circumvent this additional computational overhead, we will approximate the scores of Eq. (3.1) through an adequately parametrized model πθ()subscript𝜋𝜃\pi_{\theta}(\cdot)italic_π start_POSTSUBSCRIPT italic_θ end_POSTSUBSCRIPT ( ⋅ ). More precisely, for each cut (α,β)𝒫k𝒞k𝛼𝛽subscript𝒫𝑘subscript𝒞𝑘(\alpha,\beta)\in\mathcal{P}_{k}\cup\mathcal{C}_{k}( italic_α , italic_β ) ∈ caligraphic_P start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ∪ caligraphic_C start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT we compute the model πθ()subscript𝜋𝜃\pi_{\theta}(\cdot)italic_π start_POSTSUBSCRIPT italic_θ end_POSTSUBSCRIPT ( ⋅ ) to compute the scores

SC(α,β)=πθ(,𝒫k,𝒞k,(α,β)).SC𝛼𝛽subscript𝜋𝜃subscript𝒫𝑘subscript𝒞𝑘𝛼𝛽\mathrm{SC}(\alpha,\beta)=\pi_{\theta}(\mathcal{H},\mathcal{P}_{k},\mathcal{C}% _{k},(\alpha,\beta)).roman_SC ( italic_α , italic_β ) = italic_π start_POSTSUBSCRIPT italic_θ end_POSTSUBSCRIPT ( caligraphic_H , caligraphic_P start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT , caligraphic_C start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT , ( italic_α , italic_β ) ) .

More precisely, we first convert each cut (α,β)𝛼𝛽(\alpha,\beta)( italic_α , italic_β ) into a 14141414-dimensional feature vector. Due to the fact that the coordinate of the feature vectors involve also the optimal fractional solution xk:=min{cx:xH𝒫k𝒞k}assignsuperscriptsubscript𝑥𝑘:superscript𝑐top𝑥𝑥𝐻subscript𝒫𝑘subscript𝒞𝑘x_{k}^{\star}:=\min\{c^{\top}x:x\in H\cup\mathcal{P}_{k}\cup\mathcal{C}_{k}\}italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⋆ end_POSTSUPERSCRIPT := roman_min { italic_c start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_x : italic_x ∈ italic_H ∪ caligraphic_P start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ∪ caligraphic_C start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT } these feature embeddings naturally encode the overall structure of the linear constraints. The details of feature vectors can be found in Appendix B.3. The encoded state is then fed to a Multi Layer Perceptron (MLP) with a sigmoid (σ𝜎\sigmaitalic_σ) activation.

A summary of our approach is illustrated in Figure 1.


Refer to caption
Figure 1: Visual representation of Algorithm 9.
Remark 3.3.

More complicated models such as graph neural networks over a graph encoding of the state as in Paulus et al. (2022); Gasse et al. (2019) could also be used. Although graph architectures draw an interesting line of improvements they are outside the scope of this study.

3.2 Training the Model

The goal of training process is to approximately select the parameters θ𝜃\thetaitalic_θ so as to predict the scores of Eq. (3.1). In other words select θ𝜃\thetaitalic_θ such that

πθ(,𝒫k,𝒞k,(α,β))minx{cx:x𝒫k𝒞k}similar-to-or-equalssubscript𝜋𝜃subscript𝒫𝑘subscript𝒞𝑘𝛼𝛽subscript𝑥:superscript𝑐top𝑥𝑥subscript𝒫𝑘subscript𝒞𝑘\displaystyle\pi_{\theta}(\mathcal{H},\mathcal{P}_{k},\mathcal{C}_{k},(\alpha,% \beta))\simeq\min_{x}\{c^{\top}x~{}:~{}x\in\mathcal{H}\cup\mathcal{P}_{k}\cup% \mathcal{C}_{k}\}italic_π start_POSTSUBSCRIPT italic_θ end_POSTSUBSCRIPT ( caligraphic_H , caligraphic_P start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT , caligraphic_C start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT , ( italic_α , italic_β ) ) ≃ roman_min start_POSTSUBSCRIPT italic_x end_POSTSUBSCRIPT { italic_c start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_x : italic_x ∈ caligraphic_H ∪ caligraphic_P start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ∪ caligraphic_C start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT }
minx{cx:x𝒫k𝒞k/{αxβ}}.subscript𝑥:superscript𝑐top𝑥𝑥subscript𝒫𝑘subscript𝒞𝑘superscript𝛼top𝑥𝛽\displaystyle-\min_{x}\{c^{\top}x~{}:~{}x\in\mathcal{H}\cup\mathcal{P}_{k}\cup% \mathcal{C}_{k}/\{\alpha^{\top}x\leq\beta\}\}.- roman_min start_POSTSUBSCRIPT italic_x end_POSTSUBSCRIPT { italic_c start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_x : italic_x ∈ caligraphic_H ∪ caligraphic_P start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ∪ caligraphic_C start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT / { italic_α start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_x ≤ italic_β } } .

We note that Paulus et al. (2022) adopt a similar imitation learning approach to approximate the scores of the look-ahead policy presented in Section 2.

Dataset Generation

In order to generate our training data we use trajectories produced by the cutting plane method with the look-ahead policy for various classes of ILPs of such as Packing, Bin Packing, Set Cover, Max Cut and Production Planning.

For each intermediate ILP produced at iteration k𝑘kitalic_k, denoted as (𝒫k,c)subscript𝒫𝑘𝑐(\mathcal{H}\cup\mathcal{P}_{k},c)( caligraphic_H ∪ caligraphic_P start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT , italic_c ), we utilize the corresponding cut pool 𝒞ksubscript𝒞𝑘\mathcal{C}_{k}caligraphic_C start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT and the previously added cuts 𝒫ksubscript𝒫𝑘\mathcal{P}_{k}caligraphic_P start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT to generate the following data set. For every cutting plane (α,β)𝒞k𝒫k𝛼𝛽subscript𝒞𝑘subscript𝒫𝑘(\alpha,\beta)\in\mathcal{C}_{k}\cup\mathcal{P}_{k}( italic_α , italic_β ) ∈ caligraphic_C start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ∪ caligraphic_P start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT, a new data point is created with a value equal to the normalized LP improvement,

cxk(α,β)cxkcxk>0superscript𝑐topsubscriptsuperscript𝑥𝑘𝛼𝛽superscript𝑐topsubscriptsuperscript𝑥𝑘superscript𝑐topsubscriptsuperscript𝑥𝑘0\frac{c^{\top}x^{\star}_{k\cup(\alpha,\beta)}-c^{\top}x^{\star}_{k}}{c^{\top}x% ^{\star}_{k}}>0divide start_ARG italic_c start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_x start_POSTSUPERSCRIPT ⋆ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_k ∪ ( italic_α , italic_β ) end_POSTSUBSCRIPT - italic_c start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_x start_POSTSUPERSCRIPT ⋆ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT end_ARG start_ARG italic_c start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_x start_POSTSUPERSCRIPT ⋆ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT end_ARG > 0

where xk:=argminx{cx:x𝒫k}assignsuperscriptsubscript𝑥𝑘subscriptargmin𝑥:superscript𝑐top𝑥𝑥subscript𝒫𝑘x_{k}^{\star}:=\operatorname*{argmin}_{x}\{c^{\top}x:x\in\mathcal{H}\cup% \mathcal{P}_{k}\}italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⋆ end_POSTSUPERSCRIPT := roman_argmin start_POSTSUBSCRIPT italic_x end_POSTSUBSCRIPT { italic_c start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_x : italic_x ∈ caligraphic_H ∪ caligraphic_P start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT } and xk(α,β):=argminx{cx:x𝒫𝒞k(α,β)}assignsuperscriptsubscript𝑥𝑘𝛼𝛽subscriptargmin𝑥:superscript𝑐top𝑥𝑥𝒫subscript𝒞𝑘𝛼𝛽x_{k\cup(\alpha,\beta)}^{\star}:=\operatorname*{argmin}_{x}\{c^{\top}x:x\in% \mathcal{P}\cup\mathcal{C}_{k}\cup(\alpha,\beta)\}italic_x start_POSTSUBSCRIPT italic_k ∪ ( italic_α , italic_β ) end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⋆ end_POSTSUPERSCRIPT := roman_argmin start_POSTSUBSCRIPT italic_x end_POSTSUBSCRIPT { italic_c start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_x : italic_x ∈ caligraphic_P ∪ caligraphic_C start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ∪ ( italic_α , italic_β ) }. The reason that we use the normalized LP improvement over the actual LP improvement cxk(α,β)cxksuperscript𝑐topsubscriptsuperscript𝑥𝑘𝛼𝛽superscript𝑐topsubscriptsuperscript𝑥𝑘c^{\top}x^{\star}_{k\cup(\alpha,\beta)}-c^{\top}x^{\star}_{k}italic_c start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_x start_POSTSUPERSCRIPT ⋆ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_k ∪ ( italic_α , italic_β ) end_POSTSUBSCRIPT - italic_c start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_x start_POSTSUPERSCRIPT ⋆ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT is to get target values that are not as dependent on the specific instance of the problem family.

Once all training data are generated, we solve a regression problem with quadratic loss to choose the parameters θ𝜃\thetaitalic_θ.

4 Experiments

We divide our experiments in two main parts. The first one focuses on evaluating the performance of cut removal acting against multiple benchmark policies by rolling them out on synthetic test MILP instances for each of the problem families in a controlled environment. Next, we investigate how well do our trained models generalize to larger instances.

Refer to caption
Refer to caption
Refer to caption
Refer to caption
Refer to caption
Refer to caption
Figure 2: Mean IGC for the benchmark instances: We report the mean Integral Gap Closed (IGC) that measures quality of the solution with respect to the optimal integer solution (see Section 2 for details), a policy achieving larger IGC values with fewer iterations is better. The highlighted region represents the variance. Our cut removal algorithm outperforms or matches all cut addition methods in all benchmarks. For cut addition policies the look-ahead outperforms all of the others except in Packing where Neural Cut, this behavior matches the results showcased in the equivalent benchmark of Paulus et al. (2022).

4.1 Evaluation Setup

As in Tang et al. (2020); Paulus et al. (2022), we assess the performance of the various cutting plane methods by considering the improvement over the Integral Gap Closure value (IGC) (see Section 2) with respect to the number of iterations. This evaluation metric offers a robust estimate of the actual running time of the cutting plane methods, as the primary bottleneck in running time arises from solving linear programs (LPs). Simultaneously, this metric provides a cleaner benchmark since running time significantly depends on factors such as implementation, the choice of backbone solver, and available compute resources.

Paulus et al. (2022) employ imitation learning to train a cut addition policy, mimicking the behavior of the look-ahead policy discussed in Section 2. Their study demonstrates that the imitation learning approach outperforms the reinforcement learning method proposed by Tang et al. (2020). However, both approaches are surpassed by the look-ahead policy, which involves solving numerous LPs at each iteration. Unfortunately, neither Paulus et al. (2022) nor Tang et al. (2020) provide a public implementation of their code. To ensure a fair comparison, we consider the look-ahead policy, which outperforms both previous approaches, and an in-house implementation of the Neural Cut method proposed by Paulus et al. (2022) using the same feature encoding as in our model (see Appendix B.3). We also include several human-crafted heuristics such as the min similar, max normalized violation, max violation, lexicographical and random (see Appendix B.2 for the definitions).

Remark 4.1.

Paulus et al. (2022) utilize the SCIP solver, which incorporates instance pre-solving and various types of cuts (Achterberg, 2007). To ensure a fair benchmarking environment focused solely on cut decisions, we benchmark on an implementation of the Cutting Plane method from scratch. The implementation considers only Gomory Cuts without any pre-solving modes.

In order to stress-test our environment and compute the optimal solution required to obtain the IGC metrics we use the SCIP solver (Bestuzheva et al., 2023).

Our models are trained with (Robbins & Monro, 1951) using Pytorch (Paszke et al., 2019) for the implementation.

4.2 Datasets

We experiment with five families of MILPs: packing, bin packing, max cut, production planning (as in Paulus et al. (2022); Tang et al. (2020)) and set cover. For each family, instances are generated randomly. Packing, bin packing, max cut and production planning are generated under the random formulations used in Tang et al. (2020); Paulus et al. (2022). For set cover we suggest our own probabilistic formulation. Details on the generation of the instances can be found in the Appendix B.1.

4.3 In Distribution Evaluation

The core experiments in this work aim to respond the following question: Can our cut removal acting algorithm (after training) surpass the look-ahead, Paulus et al. (2022) and other cut addition baselines on unseen instances? We answer the previous question positively on the five different ILP benchmarks.

ILP Benchmarks

We generate a total of 3000300030003000 instances for each of the five problem families. Regarding the dimensions, the small and medium instances suggested in Tang et al. (2020) were too small for our comparisons and they were being solved after too few iterations to extract differences. For this reason we only consider large instances (similar-to\sim100 variables, 100 constraints) as done in Paulus et al. (2022).

For each problem family, we use 2000200020002000 instances for training, 500500500500 instances for validation and 500500500500 instances for testing as done in Paulus et al. (2022). We collect trajectories of the look-ahead policy for the train and validation instances. We train these parametric models πθsubscript𝜋𝜃\pi_{\theta}italic_π start_POSTSUBSCRIPT italic_θ end_POSTSUBSCRIPT with these trajectories to predict the bound improvement as previously specified in Section 3. Finally, the various cutting plane methods are compared in terms of their performance on the test instances.

The dimensions (variables, constraints) for training, validation and testing are the same for each problem. The specifications for each problem can be found in Appendix B.1.

Test Evaluation Metrics

In order to assess which method is best we compare the mean IGC after each iteration across all instances as done in Paulus et al. (2022); Tang et al. (2020). Larger IGC with fewer iterations denotes better performance. The IGC is 00 at the start and 1111 in case of convergence to the optimal intergral solution.

We don not compare execution times because the focus of our experiments relies in evaluating how large is the improvement at each round as in Paulus et al. (2022); Tang et al. (2020). A time detailed evaluation would depend on the LP solver, pre-solving of the instances, ordering of the constraints (Wang et al., 2023) and other factors which are outside the scope of this evaluation. Our experimental findings are presented in Figure 2.

Our Experimental Results

After training on the test instances, we select the best πθsubscript𝜋𝜃\pi_{\theta}italic_π start_POSTSUBSCRIPT italic_θ end_POSTSUBSCRIPT according to the validation loss. Our cut removal algorithm outperforms all cut addition policies in packing, bin-packing, production planning and set cover families. For packing and production planning the mean IGC of our method outpeforms significantly the comparing methods. For bin-packing and set cover the mean IGC is significantly larger in the first half of the iterations. We observe that this match in performance occurs in IGC values close to optimal pointing that in mean the algorithms are able to reach the optimal integral solution before iteration 30.

For max cut cut removal acting does not outperform the expert and neural cut addition methods. Nevertheless, the differences in performance are very small, for instance, a random policy performs as well as the best policies in the first third of the iterations. This behaviour is consistent with the findings in Paulus et al. (2022).

Refer to caption
Refer to caption
Refer to caption
Refer to caption
Refer to caption
Refer to caption
Figure 3: Mean IGC for out-of-distribution instances Our cut removal algorithm outperforms or matches all cut addition methods in all benchmarks. Our method shows a much stronger generalization ability onto the larger instances. The scaling improvement is especially accentuated in packing, bin packing and set cover.

4.4 Generalization on Larger Instances

After having studied the performance that cut removal acting yields we aim to evaluate how well do our trained policies generalize to larger instances. We aim to answer the following questions: Can our cut removal acting algorithm with a model trained on smaller instances generalize for bigger instances? How does it compare with the previous benchmarks?

We remark that generalization to instances of lager size is a very desirable property since the annotation of the data becomes harder as the size increases.

ILP Benchmarks

We use the same instances as in the first experiment (see subsection 4.3) to train the models. We run the evaluation with 500 fresh instances of larger problems 50%similar-toabsentpercent50\sim 50\%∼ 50 % larger. Details on the specific dimensions are contained in Appendix B.1.

Test Evaluation Metrics

In order to asses which method is best we use the IGC as in the previous experiment.

Our Experimental Results

The model πθsubscript𝜋𝜃\pi_{\theta}italic_π start_POSTSUBSCRIPT italic_θ end_POSTSUBSCRIPT is pretrained according to Section 4.3. Figure 3 shows the test IGC for cut removal acting against the benchmark policies for packing, bin-packing, max cut, production planning and set cover.

Even after training πθsubscript𝜋𝜃\pi_{\theta}italic_π start_POSTSUBSCRIPT italic_θ end_POSTSUBSCRIPT on smaller instances than the ones used for testing, our cut removal algorithm outperforms all cut addition policies in packing, bin-packing, production planning and set cover families. For packing, bin-packing and set cover the margin betweeen cut removal acting and the look-ahead policy significantly grows when compared to the margins in training and testing in the medium instances. This shows that the cut removal algorithm scales better than the cut addition benchmarks for this families of instances. For production planning the margin is slightly decreased when compared to the results in the previous experiment as the look-ahead policy matches the performance of the cut removal method in the last third of the iterations.

For the max cut family the performance is the same as in the previous experiment. Again, the cut removal acting does not outperform all cut addition methods but the differences in performance are very small and the random policy performs almost as well as the best policies in the first third of the iterations.

5 Conclusion

Cutting plane methods play a crucial role in solving Integer Linear Programs. Recent works have employed machine learning techniques to design cut addition methods that, after training, are able to ensure fast convergence to the optimal integral solution (Tang et al., 2020; Paulus et al., 2022). In this work, we propose a novel cutting plane method that, at each iteration, introduces multiple cutting planes which are subsequently removed based on the output of a well-trained machine learning model. Our experimental evaluations demonstrate that our method outperforms both human-based heuristics and more recent machine learning-based approaches for cut addition.

Several intriguing research directions emerge for future work. The first involves the use of models capable of capturing the joint combinatorial structure of sets of cuts. The second focuses on designing cutting plane methods that do not maintain a constant increase in the number of linear constraints from iteration to iteration, but rather select the number of linear constraints to add based on an appropriate model. We are confident these approaches have the potential to significantly enhance the convergence properties of modern cutting plane methods.

Impact Statement

This paper presents work whose goal is to advance the field of Machine Learning. There are many potential societal consequences of our work, none which we feel must be specifically highlighted here.

Acknowledgements

Authors acknowledge the constructive feedback of reviewers and the work of ICML’24 program and area chairs. ARO - Research was sponsored by the Army Research Office and was accomplished under Grant Number W911NF-24-1-0048. Hasler AI - This work was supported by Hasler Foundation Program: Hasler Responsible AI (project number 21043). SNF project – Deep Optimisation - This work was supported by the Swiss National Science Foundation (SNSF) under grant number 200021_205011. Stratis Skoulakis is supported by Innosuisse through an Innovation project (contract agreement 100.960 IP-ICT)

References

  • Achterberg (2007) Achterberg, T. Constraint Integer Programming. PhD thesis, 2007.
  • Amaldi et al. (2014) Amaldi, E., Coniglio, S., and Gualandi, S. Coordinated cutting plane generation via multi-objective separation. Mathematical Programming, 143, 02 2014. doi: 10.1007/s10107-012-0596-x.
  • Andreello et al. (2007) Andreello, G., Caprara, A., and Fischetti, M. Embedding 0,1/2-cuts in a braneh-and-cut framework: A computational study. INFORMS Journal on Computing, 19:229–238, 03 2007. doi: 10.1287/ijoc.l050.0162.
  • Balas et al. (1993) Balas, E., Ceria, S., and Cornuéjols, G. A lift-and-project cutting plane algorithm for mixed 0-1 programs. Math. Program., 58(3):295–324, feb 1993. ISSN 0025-5610.
  • Balcan et al. (2021) Balcan, M., Prasad, S., Sandholm, T., and Vitercik, E. Sample complexity of tree search configuration: Cutting planes and beyond. In Ranzato, M., Beygelzimer, A., Dauphin, Y. N., Liang, P., and Vaughan, J. W. (eds.), Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, virtual, pp.  4015–4027, 2021.
  • Balcan et al. (2022a) Balcan, M., Prasad, S., Sandholm, T., and Vitercik, E. Structural analysis of branch-and-cut and the learnability of gomory mixed integer cuts. In Koyejo, S., Mohamed, S., Agarwal, A., Belgrave, D., Cho, K., and Oh, A. (eds.), Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, NeurIPS 2022, New Orleans, LA, USA, November 28 - December 9, 2022, 2022a.
  • Balcan et al. (2022b) Balcan, M., Prasad, S., Sandholm, T., and Vitercik, E. Structural analysis of branch-and-cut and the learnability of gomory mixed integer cuts. In Koyejo, S., Mohamed, S., Agarwal, A., Belgrave, D., Cho, K., and Oh, A. (eds.), Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, NeurIPS 2022, New Orleans, LA, USA, November 28 - December 9, 2022, 2022b.
  • Bestuzheva et al. (2023) Bestuzheva, K., Besançon, M., Chen, W.-K., Chmiela, A., Donkiewicz, T., van Doornmalen, J., Eifler, L., Gaul, O., Gamrath, G., Gleixner, A., Gottwald, L., Graczyk, C., Halbig, K., Hoen, A., Hojny, C., van der Hulst, R., Koch, T., Lübbecke, M., Maher, S. J., Matter, F., Mühmer, E., Müller, B., Pfetsch, M. E., Rehfeldt, D., Schlein, S., Schlösser, F., Serrano, F., Shinano, Y., Sofranac, B., Turner, M., Vigerske, S., Wegscheider, F., Wellner, P., Weninger, D., and Witzig, J. Enabling research through the scip optimization suite 8.0. ACM Trans. Math. Softw., 49(2), jun 2023. ISSN 0098-3500. doi: 10.1145/3585516. URL https://doi.org/10.1145/3585516.
  • Bixby et al. (2004) Bixby, R. E., Fenelon, M., Gu, Z., Rothberg, E., and Wunderling, R. Mixed-integer programming: A progress report. In The sharpest cut: the impact of Manfred Padberg and his work, pp.  309–325. SIAM, 2004.
  • Chi et al. (2022) Chi, C., Aboussalah, A. M., Khalil, E. B., Wang, J., and Sherkat-Masoumi, Z. A deep reinforcement learning framework for column generation. In Koyejo, S., Mohamed, S., Agarwal, A., Belgrave, D., Cho, K., and Oh, A. (eds.), Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, NeurIPS 2022, New Orleans, LA, USA, November 28 - December 9, 2022, 2022.
  • Chmiela et al. (2021) Chmiela, A., Khalil, E. B., Gleixner, A. M., Lodi, A., and Pokutta, S. Learning to schedule heuristics in branch and bound. In Ranzato, M., Beygelzimer, A., Dauphin, Y. N., Liang, P., and Vaughan, J. W. (eds.), Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, virtual, pp.  24235–24246, 2021.
  • Conforti et al. (2014) Conforti, M., Cornuéjols, G., and Zambelli, G. Integer programming, volume 271. Springer, 2014.
  • Coniglio & Tieves (2015) Coniglio, S. and Tieves, M. On the generation of cutting planes which maximize the bound improvement. volume 9125, pp.  97–109, 06 2015. ISBN 978-3-319-20085-9. doi: 10.1007/978-3-319-20086-6˙8.
  • Dey & Molinaro (2018) Dey, S. S. and Molinaro, M. Theoretical challenges towards cutting-plane selection. Mathematical Programming, 170(1):237–266, 2018. doi: 10.1007/s10107-018-1302-4. URL https://doi.org/10.1007/s10107-018-1302-4.
  • Deza & Khalil (2023) Deza, A. and Khalil, E. B. Machine learning for cutting planes in integer programming: A survey. In Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, IJCAI-2023. International Joint Conferences on Artificial Intelligence Organization, August 2023. doi: 10.24963/ijcai.2023/739. URL http://dx.doi.org/10.24963/IJCAI.2023/739.
  • Eiselt & Sandblom (2000) Eiselt, H. and Sandblom, C.-L. Integer Programming and Network Models, pp.  457–477. 01 2000. ISBN 978-3-642-08651-9. doi: 10.1007/978-3-662-04197-0˙20.
  • Eiselt & Sandblom (2007) Eiselt, H. A. and Sandblom, C. L. Linear Programming and its Applications. Number 978-3-540-73671-4 in Springer Books. Springer, June 2007. ISBN ARRAY(0x5513f500). doi: 10.1007/978-3-540-73671-4. URL https://ideas.repec.org/b/spr/sprbok/978-3-540-73671-4.html.
  • Gasse et al. (2019) Gasse, M., Chetelat, D., Ferroni, N., Charlin, L., and Lodi, A. Exact combinatorial optimization with graph convolutional neural networks. In Wallach, H., Larochelle, H., Beygelzimer, A., d'Alché-Buc, F., Fox, E., and Garnett, R. (eds.), Advances in Neural Information Processing Systems, volume 32. Curran Associates, Inc., 2019. URL https://proceedings.neurips.cc/paper_files/paper/2019/file/d14c2267d848abeb81fd590f371d39bd-Paper.pdf.
  • Gomory (1960) Gomory, R. An algorithm for the mixed integer problem. Technical report, RAND CORP SANTA MONICA CA, 1960.
  • Gupta et al. (2020) Gupta, P., Gasse, M., Khalil, E. B., Mudigonda, P. K., Lodi, A., and Bengio, Y. Hybrid models for learning to branch. In Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M., and Lin, H. (eds.), Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6-12, 2020, virtual, 2020.
  • Gurobi (2021) Gurobi. Gurobi solver, https://www.gurobi.com/, 2021.
  • Huang et al. (2021) Huang, Z., Wang, K., Liu, F., ling Zhen, H., Zhang, W., Yuan, M., Hao, J., Yu, Y., and Wang, J. Learning to select cuts for efficient mixed-integer programming, 2021.
  • Khalil (2016) Khalil, E. B. Machine learning for integer programming. In Kambhampati, S. (ed.), Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, IJCAI 2016, New York, NY, USA, 9-15 July 2016, pp.  4004–4005. IJCAI/AAAI Press, 2016. URL http://www.ijcai.org/Abstract/16/576.
  • Khalil et al. (2016) Khalil, E. B., Bodic, P. L., Song, L., Nemhauser, G. L., and Dilkina, B. Learning to branch in mixed integer programming. In Schuurmans, D. and Wellman, M. P. (eds.), Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, February 12-17, 2016, Phoenix, Arizona, USA, pp.  724–731. AAAI Press, 2016. doi: 10.1609/AAAI.V30I1.10080. URL https://doi.org/10.1609/aaai.v30i1.10080.
  • Konno & Yamamoto (2008) Konno, H. and Yamamoto, R. Applications of Integer Programming to Financial Optimization, volume 18, pp.  25–48. 01 2008. doi: 10.1007/978-0-387-76682-9˙2.
  • Land & Doig (1960a) Land, A. H. and Doig, A. G. An automatic method of solving discrete programming problems. Econometrica, 28(3):497–520, 1960a. ISSN 00129682, 14680262. URL http://www.jstor.org/stable/1910129.
  • Land & Doig (1960b) Land, A. H. and Doig, A. G. An automatic method of solving discrete programming problems. Econometrica, 28(3):497–520, 1960b. ISSN 00129682, 14680262. URL http://www.jstor.org/stable/1910129.
  • Miller et al. (1960) Miller, C. E., Tucker, A. W., and Zemlin, R. A. Integer programming formulation of traveling salesman problems. J. ACM, 7(4):326–329, oct 1960. ISSN 0004-5411.
  • Nair et al. (2020) Nair, V., Bartunov, S., Gimeno, F., von Glehn, I., Lichocki, P., Lobov, I., O’Donoghue, B., Sonnerat, N., Tjandraatmadja, C., Wang, P., Addanki, R., Hapuarachchi, T., Keck, T., Keeling, J., Kohli, P., Ktena, I., Li, Y., Vinyals, O., and Zwols, Y. Solving mixed integer programs using neural networks, 2020.
  • Paszke et al. (2019) Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., Antiga, L., Desmaison, A., Kopf, A., Yang, E., DeVito, Z., Raison, M., Tejani, A., Chilamkurthy, S., Steiner, B., Fang, L., Bai, J., and Chintala, S. PyTorch: An imperative style, high-performance deep learning library. In Wallach, H., Larochelle, H., Beygelzimer, A., d’ Alché-Buc, F., Fox, E., and Garnett, R. (eds.), Advances in Neural Information Processing Systems 32, pp.  8024–8035. Curran Associates, Inc., 2019.
  • Paulus et al. (2022) Paulus, M. B., Zarpellon, G., Krause, A., Charlin, L., and Maddison, C. J. Learning to cut by looking ahead: Cutting plane selection via imitation learning, 2022.
  • Radu et al. (2018) Radu, B.-L., Bonami, P., Misener, R., and Tramontani, A. Scoring positive semidefinite cutting planes for quadratic optimization via trained neural networks., 2018. http://www.optimization-online.org/DB_HTML/2018/11/6943.html.
  • Robbins & Monro (1951) Robbins, H. and Monro, S. A Stochastic Approximation Method. The Annals of Mathematical Statistics, 22(3):400 – 407, 1951. doi: 10.1214/aoms/1177729586. URL https://doi.org/10.1214/aoms/1177729586.
  • Tang et al. (2020) Tang, Y., Agrawal, S., and Faenza, Y. Reinforcement learning for integer programming: Learning to cut, 2020.
  • Wang et al. (2023) Wang, Z., Li, X., Wang, J., Kuang, Y., Yuan, M., Zeng, J., Zhang, Y., and Wu, F. Learning cut selection for mixed-integer linear programming via hierarchical sequence model, 2023.
  • Wesselmann & Stuhl (2012) Wesselmann, F. and Stuhl, U. Implementing cutting plane management and selection techniques. Technical report, Technical report, University of Paderborn, 2012.
  • Zarpellon et al. (2021) Zarpellon, G., Jo, J., Lodi, A., and Bengio, Y. Parameterizing branch-and-bound search trees to learn branching policies. In Thirty-Fifth AAAI Conference on Artificial Intelligence, AAAI 2021, Thirty-Third Conference on Innovative Applications of Artificial Intelligence, IAAI 2021, The Eleventh Symposium on Educational Advances in Artificial Intelligence, EAAI 2021, Virtual Event, February 2-9, 2021, pp.  3931–3939. AAAI Press, 2021.

Contents of the Appendix

We describe the contents of the supplementary material below:

  • In Appendix A, we provide additional details on how the randomized ILP instances are generated and the description on how Gomory Cuts are obtained.

  • In Appendix B, we present additional implementation details including the dataset dimensions, definitions for both the baselines and the feature extraction step and the specification for the training hyperparameters.

  • In Appendix C, we evaluate the performance of our method in interaction with an ILP solver. We include performance results for both end-to-end and isolated cutting plane stage solving.

  • In Appendix D, we discuss runtime considerations and provide a side-by-side comparison on that axis.

  • In Appendix E, we explore how the cut quality is distributed in cutpools for each of the different families. We gain some insights on the experimental results in Section 4

Appendix A ILP generation and primer on Gomory Cuts

A.1 Integer Programming Domains

We used instances from five integer programming domains. The first four: (i) packing, (ii) bin packing, (iii) maximum cut and (iv) production planning are the ones first suggested by Tang et al. (2020) and also used by Paulus et al. (2022). The exact mathematical formulation for the four first families is given in Tang et al. (2020). We extend the benchmarks with instances from the set cover family.

For set cover, we generate those instances probabilistically. Starting with m𝑚mitalic_m elements and n𝑛nitalic_n subsets, we add each element to a subset with probability p𝑝pitalic_p. After the full iteration, we achieve feasibility by ensuring: (i) that no subset is empty by adding a random element to empty subsets, (ii) that all elements are included in at least one subset by adding non-included elements to a random set. Let E={e1,e2,,en}𝐸subscript𝑒1subscript𝑒2subscript𝑒𝑛E=\{e_{1},e_{2},\dots,e_{n}\}italic_E = { italic_e start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_e start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT , … , italic_e start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT } be a set of n𝑛nitalic_n elements. Let S1,S2,,Smsubscript𝑆1subscript𝑆2subscript𝑆𝑚S_{1},S_{2},\dots,S_{m}italic_S start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_S start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT , … , italic_S start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT be subsets of E𝐸Eitalic_E with associated costs c1,c2,,cmsubscript𝑐1subscript𝑐2subscript𝑐𝑚c_{1},c_{2},\dots,c_{m}italic_c start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_c start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT , … , italic_c start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT. Let Xisubscript𝑋𝑖X_{i}italic_X start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT be a random variable associated with each subset Sisubscript𝑆𝑖S_{i}italic_S start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT. Xi=1subscript𝑋𝑖1X_{i}=1italic_X start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT = 1 if Sisubscript𝑆𝑖S_{i}italic_S start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT is in the solution, and 00 otherwise. The ILP formulation is as follows:

Minimize i=1mciXisuperscriptsubscript𝑖1𝑚subscript𝑐𝑖subscript𝑋𝑖\displaystyle\sum_{i=1}^{m}c_{i}X_{i}∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_m end_POSTSUPERSCRIPT italic_c start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT italic_X start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT
subject to eE,i:eSiXi1,formulae-sequencefor-all𝑒𝐸subscript:𝑖𝑒subscript𝑆𝑖subscript𝑋𝑖1\displaystyle\forall e\in E,\sum_{i:e\in S_{i}}X_{i}\geq 1,∀ italic_e ∈ italic_E , ∑ start_POSTSUBSCRIPT italic_i : italic_e ∈ italic_S start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_POSTSUBSCRIPT italic_X start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ≥ 1 ,
Xi,Xi{0,1}.for-allsubscript𝑋𝑖subscript𝑋𝑖01\displaystyle\forall X_{i},X_{i}\in\{0,1\}.∀ italic_X start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , italic_X start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ∈ { 0 , 1 } .

The constraints in (1) ensure that every element is present in at least one of the chosen subsets. The constraints in (2) indicate that every subset is either chosen or not. The objective function chooses a feasible solution with the minimum cost. We use p=0.2𝑝0.2p=0.2italic_p = 0.2 and 𝒄=𝟏𝒄1\bm{c}=\bm{1}bold_italic_c = bold_1 for our experiments.

A.2 Generating Gomory Cuts

When an LP is tackled using a simplex algorithm, the primary step involves converting the original LP into standard form. This entails transforming all inequalities into equalities through the introduction of slack variables:

minimizecTxminimizesuperscript𝑐𝑇𝑥\displaystyle\text{minimize}\quad c^{T}xminimize italic_c start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT italic_x
subject toAx+Is=b,subject to𝐴𝑥𝐼𝑠𝑏\displaystyle\text{subject to}\quad Ax+Is=b,subject to italic_A italic_x + italic_I italic_s = italic_b ,
andx0,s0,formulae-sequenceand𝑥0𝑠0\displaystyle\text{and}\quad x\geq 0,\quad s\geq 0,and italic_x ≥ 0 , italic_s ≥ 0 ,

where I𝐼Iitalic_I denotes an identity matrix, and s𝑠sitalic_s represents the set of slack variables. The simplex method iterates on the tableau formed by [A,I]𝐴𝐼[A,I][ italic_A , italic_I ], b𝑏bitalic_b, and c𝑐citalic_c. Upon convergence, the simplex method furnishes a final optimal tableau composed by a constraint matrix L𝐿Litalic_L with a constraint vector v𝑣vitalic_v. A Gomory cut in the standard form space is generated by utilizing the row of the tableau corresponding to a fractional variable in the optimal solution xsuperscript𝑥x^{\star}italic_x start_POSTSUPERSCRIPT ⋆ end_POSTSUPERSCRIPT. For each fractional element xisubscriptsuperscript𝑥𝑖x^{\star}_{i}italic_x start_POSTSUPERSCRIPT ⋆ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT of xsuperscript𝑥x^{\star}italic_x start_POSTSUPERSCRIPT ⋆ end_POSTSUPERSCRIPT we can generate a Gomory cut

(Li+Li)Txvi+vi,superscriptsubscript𝐿𝑖subscript𝐿𝑖𝑇𝑥subscript𝑣𝑖subscript𝑣𝑖(-L_{i}+\lfloor L_{i}\rfloor)^{T}x\leq-v_{i}+\lfloor v_{i}\rfloor,( - italic_L start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT + ⌊ italic_L start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ⌋ ) start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT italic_x ≤ - italic_v start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT + ⌊ italic_v start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ⌋ , (5)

where Lisubscript𝐿𝑖L_{i}italic_L start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT is the i𝑖iitalic_i-th row of the matrix L𝐿Litalic_L and \lfloor\cdot\rfloor⌊ ⋅ ⌋ means component-wise rounding down. We can decompose the generated cuts cutting plane of the following form:

eTx+rTsdsuperscript𝑒𝑇𝑥superscript𝑟𝑇𝑠𝑑e^{T}x+r^{T}s\leq ditalic_e start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT italic_x + italic_r start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT italic_s ≤ italic_d (6)

where e,xn𝑒𝑥superscript𝑛e,x\in\mathbb{R}^{n}italic_e , italic_x ∈ blackboard_R start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT, r,sm𝑟𝑠superscript𝑚r,s\in\mathbb{R}^{m}italic_r , italic_s ∈ blackboard_R start_POSTSUPERSCRIPT italic_m end_POSTSUPERSCRIPT, and d𝑑d\in\mathbb{R}italic_d ∈ blackboard_R. Despite the presence of slack variables, they can be eliminated by multiplying both sides of the linear constraints in (6) by r𝑟ritalic_r:

rTAx+rTs=rTbsuperscript𝑟𝑇𝐴𝑥superscript𝑟𝑇𝑠superscript𝑟𝑇𝑏r^{T}Ax+r^{T}s=r^{T}bitalic_r start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT italic_A italic_x + italic_r start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT italic_s = italic_r start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT italic_b (7)

and subtracting the new cutting plane (7) from the equation above. This results in an equivalent αβ𝛼𝛽\alpha\leq\betaitalic_α ≤ italic_β cutting plane:

(eTrTA)xdrTbsuperscript𝑒𝑇superscript𝑟𝑇𝐴𝑥𝑑superscript𝑟𝑇𝑏(e^{T}-r^{T}A)x\leq d-r^{T}b( italic_e start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT - italic_r start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT italic_A ) italic_x ≤ italic_d - italic_r start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT italic_b (8)

This cutting plane exclusively involves variables within the original variable space. Slack variables do not provide additional information about the polytope and operations for the encoding described in B.3 are defined in the original space.

Appendix B Implementation Details

B.1 Dataset Dimensions

Tang et al. (2020) consider three different sizes (small, medium large) for each domain. In Paulus et al. (2022) the authors discard small and medium sizes because they are solved at presolving time or after adding a few number of cuts. Although we do not use pre-solving in our study out method has also shown to converge too fast for small and medium instances in (Tang et al., 2020).

For the in-distribution experiment we generate 2000200020002000 train, 500500500500 validation, 500500500500 test instances of the following dimensions:

  • Packing: 50505050 variables, 50505050 (resource) constraints.

  • Bin Packing: 50505050 variables, 50505050 (resource) constraints + 50505050 binary constraints.

  • Max Cut: |V|=9,|E|=25formulae-sequence𝑉9𝐸25|V|=9,|E|=25| italic_V | = 9 , | italic_E | = 25.

  • Production Planning: T=10𝑇10T=10italic_T = 10.

  • Set Cover: |E|=35,|S|=35formulae-sequence𝐸35𝑆35|E|=35,|S|=35| italic_E | = 35 , | italic_S | = 35.

Note that for each of the training and validation instances a trajectory of the look-ahead generates up to 30303030 cutpools of size around m+302𝑚302\frac{m+30}{2}divide start_ARG italic_m + 30 end_ARG start_ARG 2 end_ARG where m is the number of constraints which leads to approximately 200030m+302200030𝑚3022000\cdot 30\cdot\frac{m+30}{2}2000 ⋅ 30 ⋅ divide start_ARG italic_m + 30 end_ARG start_ARG 2 end_ARG training datapoints per family. For example, on packing m=50𝑚50m=50italic_m = 50 this is approximately 2410524superscript10524\cdot 10^{5}24 ⋅ 10 start_POSTSUPERSCRIPT 5 end_POSTSUPERSCRIPT datapoints.

For the generalization into larger instances experiment we generate 500500500500 test instances of the following dimensions:

  • Packing: 100100100100 variables, 100100100100 (resource) constraints.

  • Bin Packing: 100100100100 variables, 100100100100 (resource) constraints + 100100100100 binary constraints.

  • Max Cut: |V|=14,|E|=40formulae-sequence𝑉14𝐸40|V|=14,|E|=40| italic_V | = 14 , | italic_E | = 40.

  • Production Planning: T=15𝑇15T=15italic_T = 15.

  • Set Cover: |E|=50,|S|=50formulae-sequence𝐸50𝑆50|E|=50,|S|=50| italic_E | = 50 , | italic_S | = 50.

B.2 Baselines

Consider 𝒞𝒞\mathcal{C}caligraphic_C to be a cutpool. The baseline heuristics that we use are defined as follows:

  • Random: Choose (αk,βk)𝒞subscript𝛼𝑘subscript𝛽𝑘𝒞(\alpha_{k},\beta_{k})\in\mathcal{C}( italic_α start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT , italic_β start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) ∈ caligraphic_C uniformly at random.

  • Max Violation (MV): Let xsuperscript𝑥x^{\star}italic_x start_POSTSUPERSCRIPT ⋆ end_POSTSUPERSCRIPT be the basic feasible solution of the current LP. MV selects the cut corresponding to the maximum fractional component, this is the cut corresponding to the index is=argmaxi{|xiround(xi)|}subscript𝑖𝑠subscriptargmax𝑖subscriptsuperscript𝑥𝑖roundsubscriptsuperscript𝑥𝑖i_{s}=\mathrm{argmax}_{i}\{|x^{\star}_{i}-\text{round}(x^{\star}_{i})|\}italic_i start_POSTSUBSCRIPT italic_s end_POSTSUBSCRIPT = roman_argmax start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT { | italic_x start_POSTSUPERSCRIPT ⋆ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT - round ( italic_x start_POSTSUPERSCRIPT ⋆ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) | }.

  • Max Normalized Violation (MNV). Recall that L𝐿Litalic_L denotes the optimal tableau returned by the simplex algorithm. Let Lisubscript𝐿𝑖L_{i}italic_L start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT be the i𝑖iitalic_ith row of L𝐿Litalic_L. Then, MNV selects the cut corresponding to index is=argmaxi{|xiround(xi)|/Li}subscript𝑖𝑠subscriptargmax𝑖subscriptsuperscript𝑥𝑖roundsubscriptsuperscript𝑥𝑖normsubscript𝐿𝑖i_{s}=\mathrm{argmax}_{i}\{|x^{\star}_{i}-\text{round}(x^{\star}_{i})|/\|L_{i}\|\}italic_i start_POSTSUBSCRIPT italic_s end_POSTSUBSCRIPT = roman_argmax start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT { | italic_x start_POSTSUPERSCRIPT ⋆ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT - round ( italic_x start_POSTSUPERSCRIPT ⋆ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) | / ∥ italic_L start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ∥ }.

  • Lexicographic: Add the first cut with fractional index is=argminxiis fractional{i}subscript𝑖𝑠subscriptargminsubscriptsuperscript𝑥𝑖is fractional𝑖i_{s}=\mathrm{argmin}_{x^{*}_{i}\text{is fractional}}\{i\}italic_i start_POSTSUBSCRIPT italic_s end_POSTSUBSCRIPT = roman_argmin start_POSTSUBSCRIPT italic_x start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT is fractional end_POSTSUBSCRIPT { italic_i }.

  • Min Similar: Takes the cut argmin(αk,βk)𝒞{(αk,βk)Tc}subscriptargminsubscript𝛼𝑘subscript𝛽𝑘𝒞superscriptsubscript𝛼𝑘subscript𝛽𝑘𝑇𝑐\mathrm{argmin}_{(\alpha_{k},\beta_{k})\in\mathcal{C}}\{(\alpha_{k},\beta_{k})% ^{T}c\}roman_argmin start_POSTSUBSCRIPT ( italic_α start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT , italic_β start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) ∈ caligraphic_C end_POSTSUBSCRIPT { ( italic_α start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT , italic_β start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT italic_c } where c𝑐citalic_c is the objective coefficient vector.

Wesselmann & Stuhl (2012) is a useful resource for a more detailed description of heuristic cut selection rules.

B.3 Feature Encoding

We design 14 cut features to represent the state for the cut selection task. The first 13 follow Wang et al. (2023); Huang et al. (2021); Wesselmann & Stuhl (2012); Achterberg (2007); Dey & Molinaro (2018). The 14-th is a binary variable indicating if a cut belongs to the latest cutpool. Table 1 provides a description of such features.

Table 1: Designed cut features for a generated cut (αk,βk)subscript𝛼𝑘subscript𝛽𝑘(\alpha_{k},\beta_{k})( italic_α start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT , italic_β start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ). c𝑐citalic_c is objective coefficient vector and xsuperscript𝑥x^{\star}italic_x start_POSTSUPERSCRIPT ⋆ end_POSTSUPERSCRIPT is the latest LP solution.
Feature Description Number
Cut Coefficients Mean, Max, Min, Std of (αk,βk)subscript𝛼𝑘subscript𝛽𝑘(\alpha_{k},\beta_{k})( italic_α start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT , italic_β start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) 4
Objective Coefficients Mean, Max, Min, Std of c𝑐citalic_c 4
Parallelism Parallelism between the objective and the cut (αk,βk)Tc|c||(αk,βk)|superscriptsubscript𝛼𝑘subscript𝛽𝑘𝑇𝑐𝑐subscript𝛼𝑘subscript𝛽𝑘\frac{(\alpha_{k},\beta_{k})^{T}c}{|c||(\alpha_{k},\beta_{k})|}divide start_ARG ( italic_α start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT , italic_β start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT italic_c end_ARG start_ARG | italic_c | | ( italic_α start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT , italic_β start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) | end_ARG 1
Efficacy Euclidean distance of the cut hyperplane to xsuperscript𝑥x^{\star}italic_x start_POSTSUPERSCRIPT ⋆ end_POSTSUPERSCRIPT 1
Support Proportion of non-zero coefficients of (αk,βk)subscript𝛼𝑘subscript𝛽𝑘(\alpha_{k},\beta_{k})( italic_α start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT , italic_β start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) 1
Integral Support Proportion of non-zero coefficients with respect to integer variables of (αk,βk)subscript𝛼𝑘subscript𝛽𝑘(\alpha_{k},\beta_{k})( italic_α start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT , italic_β start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) 1
Normalized Violation Violation of the cut to the current LP solution max{0,αkTxβk|βk|}0superscriptsubscript𝛼𝑘𝑇superscript𝑥subscript𝛽𝑘subscript𝛽𝑘\max\{0,\frac{\alpha_{k}^{T}x^{\star}-\beta_{k}}{|\beta_{k}|}\}roman_max { 0 , divide start_ARG italic_α start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT italic_x start_POSTSUPERSCRIPT ⋆ end_POSTSUPERSCRIPT - italic_β start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT end_ARG start_ARG | italic_β start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT | end_ARG } 1
Latest Cutpool Wheter (αk,βk)𝒞ksubscript𝛼𝑘subscript𝛽𝑘subscript𝒞𝑘(\alpha_{k},\beta_{k})\in\mathcal{C}_{k}( italic_α start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT , italic_β start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) ∈ caligraphic_C start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT or not 1

B.4 Training Hyperparameters

We trained our models with SGD with a lr of 51035superscript1035\cdot 10^{-3}5 ⋅ 10 start_POSTSUPERSCRIPT - 3 end_POSTSUPERSCRIPT for 50505050 epochs using a batch size of 104superscript10410^{4}10 start_POSTSUPERSCRIPT 4 end_POSTSUPERSCRIPT with a patience parameter of 5555.

Appendix C Interfacing with an ILP Solver

As remarked in the experiments section 4.1 our method is tested in a clean and isolated environment using an implementation of the cutting plane method from scratch. Nevertheless, we acknowledge the interest of evaluating our approach perform inside of an ILP Solver. As a result, we have incorporated our cut-removal approach in the SCIP (Achterberg, 2007) solver which also enables solving larger instances as well as using other types of cuts implemented natively (Mixed Integer Gomory cuts, Strong Chvátal-Gomory cuts, Complemented Mixed Integer cuts and Implied Bound cuts).

In particular, we have developed an implementation of our method in the SCIP solver through the PySCIPOpt python interface and used it on the “Neural Network Verification” (Nair et al., 2020) dataset instances. More precisely, we evaluated the performance of the Branch-and-Cut mode of SCIP with the vanilla cut-addition policy (B&C-Cut_Addition) with the performance of Branch-and-Cut mode with our cut-removal approach (B&C-Cut_Removal).

C.1 End-to-end Performance Comparison

In Table 2 we present the percentage improvement of the solution found by B&C-Cut_Removal with respect to the solution found by B&C-Cut_Addition in 26 randomly selected instances with the node limit set to 100. Our experiments reveal that B&C-Cut_Removal finds on average a 35%percent3535\%35 % better solution than B&C-Cut_Addition. We also observe that B&C-Cut_Removal is able to find a better solution on 88.46%percent88.4688.46\%88.46 % of instances. In the remainder 11.54%percent11.5411.54\%11.54 % of instances both methods reach the same solution.

Table 2: Improvement (%) for Each Instance (id)
Instance 1317 1891 1941 1987 2229 2891 2959 321 3736 3853 3964 4173 4329
Imp. (%) 27.87 4.69 0.00 125.80 46.38 76.00 22.55 17.32 22.75 74.23 20.69 0.00 3.52
Instance 4743 495 5119 5424 5463 5757 5833 6392 6481 7064 8509 8627 8630
Imp. (%) 0.00 47.85 25.24 33.36 21.32 52.88 195.50 34.01 19.20 12.12 11.36 12.87 12.91

C.2 Isolated Performance Comparison

Branching algorithms can be interpreted by the tree they describe. The starting problem formulation relies on the top node of a tree. After making a branching decision, children problem formulations with more restrictive constraints appear. At each of this sub-problem formulations (nodes) the Cutting Plane procedure is invoked, this is where our Cut Removal Algorithm is executed.

We present a second experiment in order to evaluate the improvement that our method yields specifically at a single-node level in the Cutting-Plane stage of the SCIP solver by isolating all parts of the SCIP workflow except the cutting plane step. This setting is analogous to our Cutting Plane implementation but in SCIP. Interfacing with SCIP allows for extended capabilities such as having different kinds of cuts (besides vanilla Gomory Cuts) and having the problem modified through pre-solving at the starting node. At each Cutting Plane method call we measure the improvement of the LP bound of B&C-Cut_Removal against B&C-Cut_Addition. The results are contained in Table 3

Table 3: Mean Improvement (%) for Each Instance (id)
Instance 8630 2891 3853 5424 4329 8627 6481 1941 3964 495 2959 4173 8509
MImp. (%) 137.83 29.69 0.09 0.69 109.31 276.44 3.09 0.01 47.71 0.37 22.20 0.00 31.97
Instance 7064 1317 5463 2229 5757 1891 4743 3736 1987 321 5833 5119 6392
MImp. (%) 91.77 117.55 322.77 59.97 38.47 0.51 2.44 0.00 8.21 1.78 9.61 102.77 35.61

Appendix D Runtime Considerations

Wall-clock time highly depends on external factors such as implementation and programming language (e.g., C++ vs Python/PyTorch). This variability is why the number of cutting planes (number of iterations) is considered a more robust evaluation metric and has also been adopted by Paulus et al. (2022); Tang et al. (2020). This being said, we acknowledge the interest in providing results for this metric. In Table 4, we present the speedup that our method achieves when compared to the baselines in terms of wall-clock time. We measure the total time elapsed from start to finish when solving the instance. We consider instances where at least one of the policies converged. Max-Cut and Planning are not included in this table as none of the methods were able to achieve an ICG value of 1 (see Figure 2 in the paper). Thus, these problems do not permit a fair comparison. We notice our method attains an improvement over all baselines.

Table 4: Performance Comparison
Problem/Method Paulus et al. Look-ahead Expert Other Non-neural Baselines
Bin Packing 1.14x 24.59x 2.71x
Packing 1.61x 24.30x 2.15x
Set Cover 5.12x 47.26x 5.41x

We emphasize that our current implementation on cut removal could be further optimized with data structures. For this reason, we believe that the reported speedups could be improved further.

Appendix E Cut Pool Distributions

In this section we present results on the quality distribution for Gomory Cuts. We aim to answer the following questions: How is the quality in cuts distributed for different ILP families? How does this distribution evolve across the iterations? How do the cutpool quality distributions vary if acting with a random policy versus the lookahead rule?

MILP Benchmarks

We collect trajectories of the random policy and lookahead policy for a total of 500 instances per problem family. At each iteration k𝑘kitalic_k we save each of the generated cuts (α,β)𝛼𝛽(\alpha,\beta)( italic_α , italic_β ), the previous LP value xksubscriptsuperscript𝑥𝑘x^{\star}_{k}italic_x start_POSTSUPERSCRIPT ⋆ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT and we calculate the LP value after adding the cut xk(α,β)subscriptsuperscript𝑥𝑘𝛼𝛽x^{\star}_{k\cup(\alpha,\beta)}italic_x start_POSTSUPERSCRIPT ⋆ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_k ∪ ( italic_α , italic_β ) end_POSTSUBSCRIPT.

Test Evaluation Metrics

In section 3 we presented the bound improvement as a criteria to measure the quality of a cut (α,β)𝛼𝛽(\alpha,\beta)( italic_α , italic_β ).

Recall the formulation for the normalized LP improvement by the previous LP value:

cTxk(α,β)cTxkcTxk.superscript𝑐𝑇subscriptsuperscript𝑥𝑘𝛼𝛽superscript𝑐𝑇subscriptsuperscript𝑥𝑘superscript𝑐𝑇subscriptsuperscript𝑥𝑘\frac{c^{T}x^{\star}_{k\cup(\alpha,\beta)}-c^{T}x^{\star}_{k}}{c^{T}x^{\star}_% {k}}.divide start_ARG italic_c start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT italic_x start_POSTSUPERSCRIPT ⋆ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_k ∪ ( italic_α , italic_β ) end_POSTSUBSCRIPT - italic_c start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT italic_x start_POSTSUPERSCRIPT ⋆ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT end_ARG start_ARG italic_c start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT italic_x start_POSTSUPERSCRIPT ⋆ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT end_ARG .

This metric (M1) serves as an indicator on how valuable is a cut for the solution of the LP.

Consider also the similar construction but normalizing by the largest bound improvement instead.

cTxk(α,β)cTxkmax(α~,β~)𝒞k{cTxk(α~,β~)cTxk}[0,1].superscript𝑐𝑇subscriptsuperscript𝑥𝑘𝛼𝛽superscript𝑐𝑇subscriptsuperscript𝑥𝑘subscript~𝛼~𝛽subscript𝒞𝑘superscript𝑐𝑇subscriptsuperscript𝑥𝑘~𝛼~𝛽superscript𝑐𝑇subscriptsuperscript𝑥𝑘01\frac{c^{T}x^{\star}_{k\cup(\alpha,\beta)}-c^{T}x^{\star}_{k}}{\max_{(\tilde{% \alpha},\tilde{\beta})\in\mathcal{C}_{k}}\{c^{T}x^{\star}_{k\cup(\tilde{\alpha% },\tilde{\beta})}-c^{T}x^{\star}_{k}\}}\in[0,1].divide start_ARG italic_c start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT italic_x start_POSTSUPERSCRIPT ⋆ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_k ∪ ( italic_α , italic_β ) end_POSTSUBSCRIPT - italic_c start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT italic_x start_POSTSUPERSCRIPT ⋆ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT end_ARG start_ARG roman_max start_POSTSUBSCRIPT ( over~ start_ARG italic_α end_ARG , over~ start_ARG italic_β end_ARG ) ∈ caligraphic_C start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT end_POSTSUBSCRIPT { italic_c start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT italic_x start_POSTSUPERSCRIPT ⋆ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_k ∪ ( over~ start_ARG italic_α end_ARG , over~ start_ARG italic_β end_ARG ) end_POSTSUBSCRIPT - italic_c start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT italic_x start_POSTSUPERSCRIPT ⋆ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT } end_ARG ∈ [ 0 , 1 ] .

This metric (M2) serves as an indicator on how valuable is a cut with respect to its cutpool.

Results

For each problem family (packing, bin packing, maximum cut, production planning, set cover), each metric (M1, M2) and each policy generating the trajectory (random, look-ahead) we compute distribution matrix D𝐷Ditalic_D. Consider M(·) to be a function that calculates metric M for each element of an array and sortd(·) to be a function that sorts an array in decreasing order. Let 𝒞k(l)superscriptsubscript𝒞𝑘𝑙\mathcal{C}_{k}^{(l)}caligraphic_C start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ( italic_l ) end_POSTSUPERSCRIPT to be the cutpool generated at iteration k𝑘kitalic_k for instance l𝑙litalic_l. Then, the i-th row of D𝐷Ditalic_D is calculated by aggregating sortd(M(Ci(l)))sortd𝑀superscriptsubscript𝐶𝑖𝑙{\text{sortd}(M(C_{i}^{(l)}))}sortd ( italic_M ( italic_C start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ( italic_l ) end_POSTSUPERSCRIPT ) ) and scaling. For the scaling note that for some problems the cutpool dimensions may vary across different instances and iterations. For this reason, we divide each component by the number of existing components in the cutpools in the aggregation. Figures 4, 5, 6, 7, 8 show the heatmaps for the different distribution matrices.

The top plots (a), (b) for each figure explain how the ”cut quality with respect to fellow cutpool cuts” distribution evolves across iterations for trajectories of the random and look-ahead policies respectively. The starting cutpools are uneven for all problems and that picking the best cuts in the first iterations (look-ahead) leads to more uniform cutpools compared with random picking in all problems except production planning where the uneven distribution holds.

The bottom plots (c), (d) show how the ”cut quality with respect to previous LP value” distribution evolves across iterations for trajectories of the random and look-ahead policies respectively. For trajectories of both policies most of the bound improvement with respect to the previous solution is yielded in the first iterations except for production planning where the magnitudes hold. This explains the results observed in Figures 2, 3 where the IGC plateaus after the first iterations for all problems except for production planning where it shows a linear trend. We also observe that for the max cut the number of high quality cuts in the first iterations is significantly larger than in other families. This justifies the behaviour observed in the maxcut plot in Figures 2, 3 where for the first iterations many policies perform as well as the best one.

Refer to caption
Refer to caption
Refer to caption
Refer to caption
Figure 4: Packing Cutpool Distributions
Refer to caption
Refer to caption
Refer to caption
Refer to caption
Figure 5: Bin Packing Cutpool Distributions
Refer to caption
Refer to caption
Refer to caption
Refer to caption
Figure 6: Max Cut Cutpool Distributions
Refer to caption
Refer to caption
Refer to caption
Refer to caption
Figure 7: Production Planning Cutpool Distributions
Refer to caption
Refer to caption
Refer to caption
Refer to caption
Figure 8: Set Cover Cutpool Distributions