Surprising Limits Discovered in Quest for Optimal Solutions
Introduction
Our lives are a succession of optimization problems. They occur when we search for the fastest route home from work or attempt to balance cost and quality on a trip to the store, or even when we decide how to spend limited free time before bed.
These scenarios and many others can be represented as mathematical optimization problems. Making the best decisions is a matter of finding their optimal solutions. And for a world steeped in optimization, two recent results provide both good and bad news.
In a paper posted in August 2020, Amir Ali Ahmadi of Princeton University and his former student, Jeffrey Zhang, who is now at Carnegie Mellon University, established that for some quadratic optimization problems — in which pairs of variables can interact — it’s computationally infeasible to find even locally optimal solutions in a time-efficient manner.
But then, two days later, Zhang and Ahmadi released a second paper with a positive takeaway. They proved that it’s always possible to quickly identify whether a cubic polynomial — which can feature three-way interactions between variables — has a local minimum, and to find it if it does.
Neither result was what its discoverers expected.
“I wouldn’t have thought that something magical happens with cubics which makes their local minima tractable to find,” said Ahmadi.
Altogether, the results establish two important markers in the study of computational complexity, proving that certain kinds of problems are easy to solve while others are necessarily hard.
They also provide new guardrails for researchers interested in optimization across a variety of fields, from finance to autonomous systems.
Life Into Math
Suppose you’re in charge of a car factory that only makes two models, the Cheapo and the Deluxe. The Deluxe sells for more than the Cheapo, costs more to manufacture and takes up more time on the production line. How many of each should the factory build?
This dilemma translates into a polynomial optimization problem.
To perform this translation, split the problem into three distinct elements. First come the decision variables waiting to be optimized, like the number of each car to produce. Then come the constraints, like budgets and production capacity. Finally there is the objective function, which adds up how much each choice swings you toward your goal or away from it.
“The objective function takes the decision variables as input and spits out a number,” said Ahmadi. “This is something that we always want to either minimize or maximize.”
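As a concrete illustration, here is a minimal sketch of the car factory decision written as a linear optimization problem and handed to an off-the-shelf solver. Every number in it (the per-car profits, the assembly hours, the budget) is invented purely for illustration.

    # A minimal sketch of the car-factory decision as a linear program.
    # All figures are hypothetical, chosen only to make the example concrete.
    from scipy.optimize import linprog

    # Decision variables: x[0] = Cheapos to build, x[1] = Deluxes to build.
    # Objective: maximize profit (in $1,000s), so we minimize its negative.
    profit = [-1.0, -2.5]            # assume a Cheapo earns $1,000 and a Deluxe $2,500

    # Constraints, one row each of A @ x <= b:
    #   assembly-line hours:   1*cheapo +  2*deluxe <= 1,000 hours
    #   manufacturing budget: 10*cheapo + 30*deluxe <= 12,000 (in $1,000s)
    A = [[1, 2], [10, 30]]
    b = [1000, 12000]

    result = linprog(profit, A_ub=A, b_ub=b, bounds=[(0, None), (0, None)])
    print(result.x)     # optimal plan: roughly 600 Cheapos and 200 Deluxes
    print(-result.fun)  # maximum profit: roughly 1,100, i.e. $1.1 million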
Our car factory example is a simple optimization problem. As we’ve described it, we’re assuming that none of the variables interact with each other, meaning they can be packaged in a linear function. But most real-world problems are messier. The math describing them is too.
For example, imagine you’re trying to pinpoint the optimal hub for an airline. Each airport has its own inherent value, a linear contribution based on its traffic and operating costs. But then quadratic terms capture the effect of choosing pairs of airports that interact with each other in particular ways: If you have a lot of traffic out of Los Angeles, you’ll benefit more from pairing it with a San Francisco hub.
And of course problems can be even more complicated than that. Three-way interactions between variables necessitate more complex cubic functions.
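To make those interaction terms concrete, here is a toy objective function with invented payoffs for choosing airline hubs in three cities: linear terms value each city on its own, quadratic terms reward pairs of hubs that feed each other traffic, and a cubic term captures a three-way effect, here a congestion penalty, that only kicks in when all three hubs are open.

    # A toy objective for choosing airline hubs, with invented numbers.
    # Each variable is 1 if we open that hub and 0 if we don't.
    def network_value(la, sf, sea):
        linear = 3*la + 1*sf - 2*sea      # each city's standalone payoff (Seattle alone loses money)
        quadratic = 2*la*sf + 4*sf*sea    # pairs of hubs that reinforce each other
        cubic = -6*la*sf*sea              # a three-way congestion penalty if all three open
        return linear + quadratic + cubic

    # Check every yes/no combination of the three candidate hubs.
    choices = [(la, sf, sea) for la in (0, 1) for sf in (0, 1) for sea in (0, 1)]
    best = max(choices, key=lambda hubs: network_value(*hubs))
    print(best, network_value(*best))     # (1, 1, 0) with value 6: the cubic term rules out opening all three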
Each step up in the complexity of functions allows you to model a wider range of problems. But that complexity comes at a cost — there’s no guarantee you’ll still be able to compute the optimal solutions.
Optimal Trouble
Modern optimization theory developed during World War II when a scientist named George Dantzig devised a procedure for finding solutions to linear optimization problems. His pioneering work helped the U.S. Department of Defense make informed decisions about everything from procuring airplanes to conveying supplies overseas.
Over the next decades researchers followed his lead, developing faster algorithms for finding optimal solutions to increasingly complicated problems.
But in the 1980s these advances hit an immovable barrier. Researchers proved that for many kinds of optimization problems, a fast algorithm for finding the best solution cannot exist. They established that these questions are fundamentally too complex to allow it.
So where do you go if optimal solutions are out of reach? You can lower your sights and search for “locally” optimal solutions, as they’re known.
Locally optimal solutions might not represent the best possible outcome, but they’re better than any similar solutions. They’re “good enough” ways of making decisions, such as how many of each car to manufacture, that can’t be improved by little tweaks to some of the variables. Only a large reshuffling could lead to the absolute best outcome, but for large problems, such a reshuffling is too computationally intensive.
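Here is a minimal sketch of that distinction, using an invented one-variable function with two dips. A standard local search started near the shallow dip stays there, because no small tweak improves it, even though a deeper dip sits elsewhere.

    # f has a shallow dip near x = 1 and a deeper one near x = -1 (numbers invented).
    from scipy.optimize import minimize

    f = lambda x: (x[0]**2 - 1)**2 + 0.3*x[0]

    local_run = minimize(f, x0=[1.0])     # small downhill tweaks from x = 1 get stuck here
    global_run = minimize(f, x0=[-1.0])   # starting near the deeper dip finds the better answer

    print(local_run.x, local_run.fun)     # roughly x = 0.96, f = 0.29  (locally optimal)
    print(global_run.x, global_run.fun)   # roughly x = -1.04, f = -0.31 (the true minimum)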
Given all this, since the early 1990s researchers have tried to determine whether there’s a fast method for finding locally optimal solutions. In two important cases, Ahmadi and Zhang finally discovered the answer.
Bad News for Quadratics
When researchers want to determine whether a problem is computationally difficult to solve, they typically equate it to some other question whose complexity they already know. If you know that Problem A is intractable, and you can show that solving Problem B would provide a way of solving A, there’s your proof.
“That must mean the Problem B I had isn’t easy,” said Zhang.
In their first paper, Ahmadi and Zhang matched the challenge of quadratic optimization — in which pairs of variables interact — with something called the maximum stable set problem, a famously (and provably) difficult problem to solve.
A “stable set” is any collection of nodes in a graph in which no two are directly linked by an edge. The maximum stable set problem asks for a graph’s largest such set. Even determining whether a stable set of some given size exists is computationally difficult.
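As a toy illustration, the brute-force sketch below examines every subset of nodes in a made-up five-node graph. That approach only works because the graph is tiny; no fast method is known for large graphs.

    # Brute-force maximum stable set on a tiny made-up graph (a five-node cycle).
    from itertools import combinations

    nodes = range(5)
    edges = {(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)}

    def is_stable(subset):
        # Stable means no two chosen nodes are joined by an edge.
        return all((a, b) not in edges and (b, a) not in edges
                   for a, b in combinations(subset, 2))

    def max_stable_set():
        for size in range(len(nodes), -1, -1):       # try the biggest sets first
            for subset in combinations(nodes, size):
                if is_stable(subset):
                    return set(subset)

    print(max_stable_set())   # {0, 2}: no stable set in this graph has more than two nodes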
In that paper, Ahmadi and Zhang recast the maximum stable set problem as a special case of the search for locally optimal solutions. They came up with a method for representing the stable set question as a quadratic optimization problem. Finding a stable set of a certain size then became a matter of finding locally optimal solutions of this optimization problem.
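One classical bridge of this kind, which predates their work and is not necessarily the construction they used, is the Motzkin–Straus theorem: take a graph’s adjacency matrix, add the identity matrix, and minimize the resulting quadratic form over nonnegative weights on the nodes that sum to 1. The smallest achievable value is exactly 1 divided by the size of the largest stable set, so solving that quadratic problem would reveal how large a stable set can be.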
And given that they already knew there is no fast way to find these stable sets, they knew there’s no fast way to solve this problem. This means that locally optimal solutions are just as hard to find for these kinds of problems as truly optimal ones — a surprising correspondence.
“Intuitively, it should be easier. The surprising thing is that they show it’s hard,” said Monique Laurent of CWI, the national research institute for mathematics and computer science in the Netherlands.
Good News on Cubics
Ahmadi and Zhang ruled out the existence of an efficient algorithm that always finds locally optimal solutions to certain quadratic optimization problems. At the same time, they wanted to know whether such an algorithm could exist for cubic optimization problems, under the simplifying condition that they carry no constraints.
Cubic polynomials are important in many practical ways. They provide a mathematical framework for thinking about third-order interactions between variables. The boost in clarity can massively improve tools, like those used for text mining, where you want algorithms to extract meaning from big data sets.
For example, imagine you feed a block of text into a computer and ask it to determine what the text is about. The computer observes that the word “apple” appears frequently. But without more information, the topic is still ambiguous.
“It could be the fruit. It could be the company,” said Anima Anandkumar of the California Institute of Technology.
Knowing that both “apple” and “orange” appear would make you more confident, but you could still be wrong (Orange is a company too). A third term like “melon” introduces a cubic relationship and might allow you to confidently conclude the text is talking about produce.
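As a toy sketch of that idea (far simpler than the tensor methods used in real text mining), counting how often triples of words appear together in a few invented snippets already separates the fruit reading from the company reading:

    # Toy third-order co-occurrence counts on invented text snippets.
    from itertools import combinations
    from collections import Counter

    snippets = [
        "apple orange melon smoothie recipe",
        "apple orange juice and melon salad",
        "apple unveils new orange branded phone",
    ]

    triples = Counter()
    for text in snippets:
        words = sorted(set(text.split()))
        triples.update(combinations(words, 3))

    print(triples[("apple", "melon", "orange")])   # 2: the fruit reading shows up in two snippets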
But the added clarity brings added complexity — and that’s why Zhang didn’t initially get his hopes up.
“When I went into the problem of cubic local minima, I was actually expecting it to be intractable,” he said.
Starting in early 2019, he explored different ways of approaching the problem. He was stuck until Ahmadi suggested he try a technique called sum of squares that Ahmadi had used previously to solve other optimization problems.
A Brighter Future
“Sum of squares” refers to the idea that some polynomials can be represented as a sum of squares of other polynomials. For example, 2x² + 16x + 34 = (x + 5)² + (x + 3)².
The sum-of-squares perspective immediately reveals properties of the polynomial you started with. Squares of real numbers can’t be negative, so if you can express a polynomial as a sum of squares — and there’s a fast way to check if you can — you’ve proved that it always outputs a nonnegative value. This approach does not work, however, in quadratic optimization problems with constraints, which is why Ahmadi and Zhang could not take advantage of it in their quadratic work.
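For the small example above, that fast check comes down to a single matrix computation, sketched below. Writing 2x² + 16x + 34 as a quadratic form in the monomials 1 and x pins down one symmetric matrix Q, and the polynomial is a sum of squares exactly when Q has no negative eigenvalues. (For larger polynomials Q is not unique, and the check becomes a search over candidate matrices, carried out by solving a semidefinite program.)

    # Check whether 2x^2 + 16x + 34 is a sum of squares by testing whether its
    # Gram matrix Q, with respect to the monomials (1, x), is positive semidefinite.
    import numpy as np

    Q = np.array([[34.0, 8.0],    # constant term and half the coefficient of x
                  [8.0,  2.0]])   # half the coefficient of x and the coefficient of x^2

    print(np.linalg.eigvalsh(Q))  # both eigenvalues are positive, so Q is positive
                                  # semidefinite and the polynomial is a sum of squares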
But for cubic optimization problems with no constraints, sum of squares becomes a handy test in the search for local minima. Picture the graph of a polynomial function as a curve floating above the horizontal axis. Its low point corresponds to a particular arrangement of the variables.
A fast algorithm can repeatedly shift the polynomial down by a constant and test whether the shifted version is still a sum of squares. In effect, it drags the curve down until it just touches the axis and is barely nonnegative. At that point, another fast algorithm can efficiently report the coordinates of the low point.
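To see this on the earlier example, subtracting a constant c slides the whole curve of 2x² + 16x + 34 straight down, and the shifted polynomial 2x² + 16x + 34 - c remains a sum of squares only as long as c is at most 2. At c = 2 it becomes 2(x + 4)², which just grazes the axis at x = -4. That point is exactly where the original polynomial bottoms out, and the largest workable shift, 2, is its minimum value.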
Zhang and Ahmadi’s breakthrough came when they discovered that searching for locally optimal solutions of cubic functions can be done by finding low points of certain polynomials with the sum-of-squares test.
In the graph of a cubic polynomial like x³ + 1, one end always goes off to negative infinity. So cubics can never be nonnegative everywhere or a sum of squares. But Ahmadi and Zhang worked out a way of focusing exclusively on the part of the graph that curves upward. And that’s where they applied their sum-of-squares test.
“For cubics, we can always drag our function down to where we want,” said Zhang.
The result settles an important theoretical question about the difficulty of finding locally optimal solutions of cubic functions. Ahmadi and Zhang are now trying to enhance its practical value by upgrading a widely used algorithm so that it can work with cubic relationships instead of only quadratic ones. That upgrade would make the algorithm less erratic and could lead to improved performance on a range of machine learning tasks.
“If they succeed to make this work,” said Laurent, “this could be very useful.”