Simple polynomial maximum finding

The algorithm I describe here is intended to find the maximum of a given real-valued polynomial, inside a given interval. It seems robust enough for my own specific purpose, but it has not really been well tested yet.

Code snapshot for this article (tagged polynomial_maximum_finder-2016-10-01): .zip, .tar.gz. For latest version, go to the github page.

The basic algorithm

The initial, naive, algorithm

My first attempt at this algorithm went as follows: First find the zeros of the derivative of the polynomial inside the interval. Then evaluate the polynomial at the interval boundaries as well as on the places inside the interval where the derivative is zero. The largest of those values will be the maximum we seek, as is well known from basic calculus.

That's it! Or is it? Read on to find out.

Root finding? Not so fast!

Oh yeah, that's right – I glossed over the root finding part. But that's pretty straight-forward, isn't it?

Two simple methods for finding zeros of a polynomial are Newton's method and the Bisection method. I won't go into detail about these methods here. I chose the Bisection method just to get started, because it's robust and easy to implement. I plan on switching it out for Newton in the future though.

Both of these methods will find at best one zero, so an appropriate name for this kind of method is isolated root finders. We also need to do the isolation part somehow.

Root isolation?

So we have been forced on a second detour. But that's fine – at least we have broken off a sub-problem which will be simpler than root finding, right?

So how do we isolate our roots? Let's partition our given interval into sub-intervals on which the polynomial increases or decreases monotonically. This is just another way of saying that we want to partition the interval by the sign of the derivative of our polynomial. Easy! We just need the boundary points of those intervals. By the Intermediate value theorem, this is just the points where the derivative is zero. In other words, we just need to find the roots of the derivative! But that's pretty straight-forward, isn't it?

Déjà vu

To summarize: in order to find our maxima, we need to find roots of the derivative of our polynomial. In order to find those roots, we just need to isolate them and then apply an isolated root finder. In order to isolate the roots, we just need to find the roots of the second derivative of our polynomial, and so on.

The saving grace is that the derivative has a strictly smaller degree than our original polynomial, so we have made progress in that sense. We just need to reduce the derivative of our polynomial until it's of degree 1 (i.e. "linear") or less, because in that case it's trivial to find the zeros or conclude that there are none.

We have our basic algorithm, but we're not out of the woods yet. We still need to implement it in terms of floating point arithmetic, and we need to make sure that it's robust.

Implementation

Classifying the partition points

Let's call our polynomial $p$, and let's denote it's $k$-th derivative by $D_k p$. As we figured out in the previous section, we can easily partition our interval by the sign of the derivative.

The zeros can now be classified into two categories: ones which are safely isolated within our sub-intervals, and ones which aren't. Suppose that the points of our partition are the points $x_1, x_2, ..., x_n$. We can classify them as

  1. $ p(x_i) > +\epsilon $
  2. $ p(x_i) < -\epsilon $
  3. $ |p(x_j)| < \epsilon $
Here, $\epsilon > 0$ is some number that we consider "small".

Analyzing the cases

The plan is to loop trough the $x_i$ and find zeros as we go. For example, if $x_i$ is of class 1, and $x_{i+1}$ is of class 2, then we have safely isolated a zero in the interval $[x_{i}, x_{i+1}]$, and so it is safe to give this interval as a starting point for our isolated root finder.

The following picture shows an example of this situation. The vertical dashed lines indicate our partition, the signs along the top indicate how the derivative is behaving, and the safely isolated zero is circled.

A safely isolated zero
Figure 1: A safely isolated zero.

Points of class 3 are less clear-cut. Here we have the following two interesting cases, again illustrated by images.

First, we have a potential zero, which is surrounded by two points which are safely on the same side of the x axis. In other words, we have a local maximum/minimum which may or may not be a zero. So, let's name this the local extremum potential zero, since it is a potential zero which ocurrs at a local extremum.

Figure 2: A local maximum, but what if it's it a zero?

Second, we have an inflection point with a small value at $x_i$, but on each side ($x_{i-1}$ and $x_{i+1}$) of the inflection point the function is large and has opposite sign. In other words, $x_{i-1}$ and $x_{i+1}$ safely isolate a zero. Let's call this the Inflection-point zero, since it's a zero that happens on or very close to an inflection point.

A zero adjacent to an inflection point.
Figure 3: A zero adjacent to an inflection point.

Clearly, these two are pretty sketchy cases.

Considering the sketchy cases

First of all, we should note that the second sketchy case, namely the inflection-point zero, is easily handled. If we detect an inflection point zero, we can just extend the isolation interval to be $(x_{i-1}, x_{i+1})$. An appropriate isolated root finder should then be able to find it, in our case that means the bisection method.

I have no good strategies for finding the so-called local extremum potential zero. Here's where it pays to remember why we are finding roots in the first place – we are interested in the maximum, remember?

Failing to find the local extremum potential zero kind of zero is not actually a very big deal for this particular use-case!

If there is indeed a zero at a local extremum potential zero point, and we fail to find it, we have two cases to consider. The first case is that we are trying to find the zeros of $D p$. That is, the zeros we finding are direct candidates for the maximum we are ultimately seeking. But then, by definition of local extremum potential zero, the value at $x_{i-1}$ or $x_{i+1}$ is safely larger than that at $x_{i}$, so for purposes of finding a maximum it does not matter if we fail to find this potential zero.

The other case to consider is when we are finding the zeros for $D_k p$, for some k > 1. In other words, the only reason we are trying to find this zero is to construct the partition which we will use to isolate the zeros of $D_{k-1}p$. Consider the effect on the resulting partition if we fail to find this zero. By definition $D_k p$ has the same sign on both sides of the zero, it only dips down close to zero momentarily. That means that missing the potential zero will not violate our basic invariant too seriously – we are just missing to record an isolated dip to zero in an interval that has otherwise a constant sign for the derivative.

A better algorithm, considering the sketchy cases

With all of the above in mind, let's work out a more detailed algorithm that will address those sketchy cases. As we have noted, it's fine to skip over points of our partition which are of class 3, because we are not really interested in finding all the zeros — just the ones which are important for finding the maximum in the end. Intervals such as those in Figure 1 are easy to form by simply skipping over such points.

          
inline int
try_find_roots(
    float const min_x,
    float const max_x,
    float coefficients[MAX_DEGREE][MAX_DEGREE],
    int const highest_degree,
    float roots[MAX_DEGREE][MAX_DEGREE],
    float const bracket_epsilon = 1e-10f,
    float const isolated_root_finder_epsilon = 1e-6f
    )
{
    using namespace Numerics;
        
    ENSURE(min_x < max_x);
    ENSURE(highest_degree > 0);

    int num_roots = 0;
        
    // NOTE: Try to find the linear root if it exists inside the interval.
    {
        int const degree = 1;
        float const root = -coefficients[degree][0]/coefficients[degree][1];
        if(root > min_x && root < max_x)
        {
            roots[degree][0] = root;
            num_roots++;
        }
    }

    // NOTE:
    // Try to find roots of successively lower derivatives.
    // The roots of the one-higher derivative at each step provides
    // a partition of our interval into pieces of monotone increase/decrease, which
    // helps to provide brackets to the roots we want to find.
    for(int degree = 2; degree <= highest_degree; degree++)
    {

        // NOTE: All roots of derivative which are inside our interval.
        int const num_derivative_roots = num_roots;

        // NOTE: Points inside our interval which give us our partition.
        int const num_internal_points = num_derivative_roots;
            
        num_roots = 0;

        // NOTE: our first bracket candidate is always the left limit of the interval
        float lo_x = min_x;
        float lo_y = evaluate(coefficients[degree], degree, lo_x);

        // NOTE: skip candidates so long as the left bracket candidate is small
        int lo_idx = -1;
        while(abs(lo_y) < bracket_epsilon)
        {
            lo_idx++;
            if(lo_idx == num_internal_points)
                goto level_done;
            lo_x = roots[degree-1][num_internal_points];
            lo_y = evaluate(coefficients[degree], degree, lo_x);
        }

        while(true)
        {

            int hi_idx = lo_idx;
            if(hi_idx == num_internal_points)
                goto level_done;

            float hi_x;
            float hi_y;

            // NOTE: search for the high limit, or if none can be found we skip to the next level
            do
            {                    
                hi_idx++;
                if(hi_idx == num_internal_points)
                {
                    hi_x = max_x;
                    hi_y = evaluate(coefficients[degree], degree, hi_x);
                    if(abs(hi_y) < bracket_epsilon)
                        goto level_done;
                }
                else
                {
                    hi_x = roots[degree-1][hi_idx];
                    hi_y = evaluate(coefficients[degree], degree, hi_x);
                }
            }
            while(abs(hi_y) < bracket_epsilon);

            // NOTE: We now know that our bracket does not take small values at it's limits:
            ENSURE(abs(lo_y) >= bracket_epsilon);
            ENSURE(abs(hi_y) >= bracket_epsilon);
            // ... however, we still have to check if it actually contains a zero, by comparing a signs
            if(
                (lo_y < float(0) && hi_y > float(0)) ||
                (lo_y > float(0) && hi_y < float(0))
                )
            {
                // NOTE:
                // The interval [lo_x, hi_x] brackets a zero, and does not take small values at it's limits.
                // In fact it is an isolated zero because
                // the polynomial is monotonically increasing or decreasing on [x_lo, x_hi], so
                // we use our isolated root finder.

                float const root = isolated_root(lo_x, hi_x, coefficients, degree, isolated_root_finder_epsilon);
                ENSURE(root >= min_x && root <= max_x);
                ENSURE(root > lo_x && root < hi_x);

                ENSURE(num_roots < MAX_DEGREE);
                roots[degree][num_roots] = root;
                num_roots++;

            }

            // NOTE: low bracket limit for next root is just the high limit of the current bracket
            lo_idx = hi_idx;
            lo_x = hi_x;
            lo_y = hi_y;
                
        }

    level_done:;
            
    }
        
    return num_roots;
}
          
        

Analysis

Overview

I run four tests consisting of $10^6$ test cases each. Each test case, I generate a polynomial by picking a few random zeros in the complex plane, forcing a certain number of them to fall on the real axis. By the complex conjugate root theorem, I can obtain a real polynomial from these zeros by having a factor, $(x - \alpha)(x - \alpha^*)$, for each complex zero I've generated. I also multiply in a linear, $x-a$, for each real zero, $a$ that I generate. Since I know all of the zeros at this point, I then integrate once in order to get a polynomial with a derivative for which I know the real zeros "exactly". Thus I have a polynomial and a handful of candidates for maxima (including also the boundaries of the interval), so I just have to do a short scan to find the "exact" maximum. And so, I have my reference to test against.

The main metric I use to determine the quality of my maxima is what I call the quotient in the listings below. It's calculated as $$ q = \begin{cases} m, m^* < 0 & \frac{m}{m^*} \\ m, m^* > 0 & \frac{m^*}{m}\\ \end{cases} $$ A quotient larger than 1 means the algorithm we're testing beat the "exact" maximum, and a quotient less than 1 means the "exact" maximum beat our algorithm. The cases where $q$ is not defined, I discard.

Results

          
Test 1
~~~~~~~~~~~~
overall quotient distribution: 
+0.00000 ... +0.99985: 0
+0.99985 ... +0.99986: 1
+0.99986 ... +0.99996: 0
+0.99996 ... +0.99997: 3
+0.99997 ... +0.99998: 69
+0.99998 ... +0.99999: 637
+0.99999 ... +1.00000: 151700
+1.00000 ... +1.00001: 832419
+1.00001 ...         : 737
num_discarded_test_cases = 14434

best case: 
test_case->seed = 11653076851549073771
quotient = 1.00003934e+00
polynomial = 
    7.31465816e-01 + 
    -1.10191856e+02*x^1 + 
    9.67053070e+01*x^2 + 
    3.64049263e+01*x^3 + 
    -2.69319534e+01*x^4 + 
    -1.21738100e+01*x^5 + 
    2.38386250e+00*x^6 + 
    1.33199763e+00*x^7 + 
    1.25000000e-01*x^8
reference_condition_number = 8.11721191e+02
test_case->x_min = 5.83633304e-01
test_case->x_max = 1.82704270e+00
reference_result->maximum_x = 1.41652012e+00
reference_result->maximum_y = 8.23834896e-01
result->maximum_x = 1.41652024e+00
result->maximum_y = 8.23867321e-01

worst case: 
test_case->seed = 16422157614463026085
quotient = 9.99854445e-01
polynomial = 
    4.66688061e+00 + 
    1.65968307e+02*x^1 + 
    2.33758453e+02*x^2 + 
    1.76358109e+02*x^3 + 
    7.68706055e+01*x^4 + 
    1.93191204e+01*x^5 + 
    2.58797812e+00*x^6 + 
    1.42857149e-01*x^7
reference_condition_number = 6.54640884e+01
test_case->x_min = -2.07651210e+00
test_case->x_max = -1.27939868e+00
reference_result->maximum_x = -1.71352625e+00
reference_result->maximum_y = -4.40320015e+01
result->maximum_x = -2.07651210e+00
result->maximum_y = -4.40384102e+01            
          
        
          
Test 2
~~~~~~~~~~~~
overall quotient distribution: 
+0.00000 ... +0.93591: 0
+0.93591 ... +0.93592: 1
+0.93592 ... +0.99996: 0
+0.99996 ... +0.99997: 6
+0.99997 ... +0.99998: 79
+0.99998 ... +0.99999: 650
+0.99999 ... +1.00000: 151141
+1.00000 ... +1.00001: 832674
+1.00001 ...         : 760
num_discarded_test_cases = 14689

best case: 
test_case->seed = 5167433566048733080
quotient = 1.00005305e+00
polynomial = 
    -4.11319494e+00 + 
    -7.05925989e+00*x^1 + 
    6.91078663e-01*x^2 + 
    3.25178361e+00*x^3 + 
    -1.19849980e+00*x^4 + 
    -3.38911116e-01*x^5 + 
    1.66666672e-01*x^6
reference_condition_number = 7.24009888e+02
test_case->x_min = -1.96452165e+00
test_case->x_max = 1.52205300e+00
reference_result->maximum_x = -8.26906919e-01
reference_result->maximum_y = -1.79643631e-02
result->maximum_x = -8.26906562e-01
result->maximum_y = -1.79634094e-02

worst case: 
test_case->seed = 8760772879342514659
quotient = 9.35910523e-01
polynomial = 
    -3.16986561e+00 + 
    -5.25545850e+03*x^1 + 
    -6.25271387e+03*x^2 + 
    -3.57517017e+03*x^3 + 
    -9.45244690e+02*x^4 + 
    -4.62977314e+00*x^5 + 
    6.88757858e+01*x^6 + 
    1.94058971e+01*x^7 + 
    2.34972501e+00*x^8 + 
    1.11111112e-01*x^9
reference_condition_number = 9.84797668e+00
test_case->x_min = -3.38641953e+00
test_case->x_max = 2.65139198e+00
reference_result->maximum_x = -1.01432991e+00
reference_result->maximum_y = 1.68593005e+03
result->maximum_x = -3.38641953e+00
result->maximum_y = 1.57787964e+03
          
        
          
Test 3
~~~~~~~~~~~~
overall quotient distribution: 
+0.00000 ... +0.99955: 0
+0.99955 ... +0.99956: 1
+0.99956 ... +0.99976: 0
+0.99976 ... +0.99977: 1
+0.99977 ... +0.99980: 0
+0.99980 ... +0.99981: 1
+0.99981 ... +0.99996: 0
+0.99996 ... +0.99997: 7
+0.99997 ... +0.99998: 72
+0.99998 ... +0.99999: 679
+0.99999 ... +1.00000: 152390
+1.00000 ... +1.00001: 831561
+1.00001 ...         : 725
num_discarded_test_cases = 14563

best case: 
test_case->seed = 13365033600434058427
quotient = 1.00003707e+00
polynomial = 
    -2.25687551e+00 + 
    -6.86918564e+01*x^1 + 
    1.05109520e+02*x^2 + 
    -3.59691696e+01*x^3 + 
    -1.30134506e+01*x^4 + 
    8.11739540e+00*x^5 + 
    5.57884276e-01*x^6 + 
    -8.71161044e-01*x^7 + 
    1.25000000e-01*x^8
reference_condition_number = 9.39889221e+02
test_case->x_min = 5.34300983e-01
test_case->x_max = 2.23442101e+00
reference_result->maximum_x = 1.77653694e+00
reference_result->maximum_y = 1.07413125e+00
result->maximum_x = 1.77653646e+00
result->maximum_y = 1.07417107e+00

worst case: 
test_case->seed = 7009989640267011176
quotient = 9.99556363e-01
polynomial = 
    -3.85069942e+00 + 
    -6.88398486e+03*x^1 + 
    -7.17389111e+03*x^2 + 
    -3.73377612e+03*x^3 + 
    -8.51928101e+02*x^4 + 
    6.47777786e+01*x^5 + 
    8.92707138e+01*x^6 + 
    2.23621426e+01*x^7 + 
    2.52107525e+00*x^8 + 
    1.11111112e-01*x^9
reference_condition_number = 2.45309925e+01
test_case->x_min = -2.59941697e+00
test_case->x_max = 2.48432851e+00
reference_result->maximum_x = -1.73627210e+00
reference_result->maximum_y = 2.67517480e+03
result->maximum_x = -2.59941697e+00
result->maximum_y = 2.67398804e+03
          
        
          
Test 4
~~~~~~~~~~~~
overall quotient distribution: 
+0.00000 ... +0.99510: 0
+0.99510 ... +0.99511: 1
+0.99511 ... +0.99931: 0
+0.99931 ... +0.99932: 1
+0.99932 ... +0.99993: 0
+0.99993 ... +0.99994: 1
+0.99994 ... +0.99996: 0
+0.99996 ... +0.99997: 6
+0.99997 ... +0.99998: 51
+0.99998 ... +0.99999: 615
+0.99999 ... +1.00000: 151516
+1.00000 ... +1.00001: 832292
+1.00001 ...         : 770
num_discarded_test_cases = 14747

best case: 
test_case->seed = 1352265963130428803
quotient = 1.00004983e+00
polynomial = 
    -7.67269135e-02 + 
    8.43146086e-01*x^1 + 
    1.64526153e+00*x^2 + 
    9.78637409e+00*x^3 + 
    1.34945641e+01*x^4 + 
    7.20085001e+00*x^5 + 
    1.67434728e+00*x^6 + 
    1.42857149e-01*x^7
reference_condition_number = 9.83588257e+02
test_case->x_min = -3.42859769e+00
test_case->x_max = -1.11246765e+00
reference_result->maximum_x = -2.19944143e+00
reference_result->maximum_y = 1.04267907e+00
result->maximum_x = -2.19943905e+00
result->maximum_y = 1.04273105e+00

worst case: 
test_case->seed = 9225821850979580020
quotient = 9.95102882e-01
polynomial = 
    1.02076626e+00 + 
    1.05465670e+01*x^1 + 
    1.23443422e+01*x^2 + 
    6.96026039e+00*x^3 + 
    1.89423573e+00*x^4 + 
    2.00000003e-01*x^5
reference_condition_number = 3.47852936e+01
test_case->x_min = -2.41138721e+00
test_case->x_max = -1.20899677e+00
reference_result->maximum_x = -1.54509473e+00
reference_result->maximum_y = -2.44404721e+00
result->maximum_x = -1.20899677e+00
result->maximum_y = -2.45607495e+00
          
        

When I ran these tests the first time, I noticed that almost all of the best and worst cases had maxima which were tiny, even though all of the coefficients, in particular the constant coefficient, of the test polynomials were of order 1 or larger. This means that when the last addition in the polynomial evaluation is carried out (i.e. when the constant term is added), something of order 1 is added to some number which is also of order 1, and the result is a tiny number. That means loss of precision.

I don't know any techniques to generate well-conditioned polynomials (if you do, please let me know!). Therefore I wanted something simple to filter on, so that they would not crap-up my data. I chose the condition number for polynomial evaluation, as defined here.

The definition is $$ \operatorname{cond}(p, x) = \frac{\sum_{i=0}^n |a_i| |x|^i}{\left| \sum_{i=1}^n a_i x^i \right|} $$ where $p$ is a polynomial of degree $n$ with coefficients $a_i$, and $x$ is the point at which you're evaluating. Assuming single precision (32 bit) floating point, the number $\alpha(n) \operatorname{cond}(p, x) \times 2^{-24}$ where $\alpha(n) \approx 2 n$ gives an upper bound on the error.

Indeed, when I added a calculation of the condition number for the evaluation, it turns out that in all of the best and worst cases, the problem of evaluating the polynomial at the maximum had a large condition number. Sometimes on the order of $10^5$ and even larger.

For my purposes I don't care to be robust in the face of ill-conditined polynomials, so I discard test cases with a condition number larger than $10^3$, in order to get interesting results to analyze.