How To Find a Formula For a Set of Numbers - Island of Sanity

Education

# How To Find a Formula For a Set of Numbers

Perhaps you have seen one of those math problems that says, "What's the next number in this series: 2, 6, 12, 20, ...". Or in the real world, scientists and engineers routinely find a set of numbers through experiments and would then like to find a formula that fits these numbers. A few years ago I developed a simple (though admittedly sometimes tedious) technique for finding a formula to fit a set of numbers.

Disclaimer: While I developed this technique myself, the mathematics behind it is simple enough that I would not be in the least surprised to learn that someone else invented it before me. If anyone reading this is aware of this or a similar technique being published elsewhere, I'd be interested to here about it.

A mathematically astute person might immediately object that what I am saying is impossible, that there is no "one formula" that fits any given set of numbers. This is quite true. What this technique finds is the simplest polynomial that fits the numbers. A polynomial is an equation in the form

a0 + a1n + a2n2 + a3n3 + a4n4 + ...

To explain my technique, let's work through an example. Suppose you are given the following series of numbers and want to find a formula for them, and then compute the next number in the sequence.

2, 8, 9, 11, 20

The first step is to arrange them in a column. To the left of this column we write an ascending list of counting numbers, like this:

 1 2 2 8 3 9 4 11 5 20

We number the column with the original values "0". We then create a column 1, where each value is the difference between each pair of values in column 0. Always take the lower number minus the upper number. That is, we first compare the first two numbers: 2 and 8. 8 - 2 = 6, so we write 6 to the right of and somewhat between 2 and 8. Then we compare the third number, 9, to the second number 8. 9 - 8 = 1. We proceed down the column this way, like so:

0 1
1 2
6
2 8
1
3 9
2
4 11
9
5 20

Now we make a column 2 by taking the differences between the values in column 1, like so:

0 1 2
1 2
6
2 8 -5
1
3 9 1
2
4 11 7
9
5 20

Note that negative results -- as in 1 - 6 -- are normal and expected.

We continue making new columns in this way until all the values in a column are the same. In this case, that just takes one more column.

0 1 2 3
1 2
6
2 8 -5
1 6
3 9 1
2 6
4 11 7
9
5 20

Recall that subtracting a negative is the same as adding a positive. Thus 1 minus negative 5 = 1 + 5 = 6.

What happens, you may ask, if we never reach a point where all the numbers in a column are the same? Simple: sooner or later we will get to the point where there is only one number in the column. As this number must always be the same as itself, then we are done. More on this later.

We are now ready to construct the first term of the polynomial. The column number where we stopped is the power of n. In this case we stopped at column 3, so the first term involves n3. We multiply this by the value in the column divided by the factorial of the column number. (A "factorial" is the product of all the integers from 1 to that number. For example, 5 factorial -- which is written "5!" -- is 5 x 4 x 3 x 2 x 1 = 120.) The value in this column is 6 and 3! is 6, so our first term is:

6 / 3! x n3
= 6 / 6 x n3
= n3

Now we evaluate this term for 1, 2, 3, and so on, and subtract each result from the corresponding starting value.

We started with value #1 was 2. So we plug 1 into n3 = 13 = 1, then subtract this from 2, 2 - 1 = 1.

Value #2 is 8. So we plug 2 into n3 = 23 = 8. Subtract this from 8, 8 - 8 = 0.

We proceed thus through all the values:

Original - n3
1 2 - 13 = 2 - 1 = 1
2 8 - 23 = 8 - 8 = 0
3 9 - 33 = 9 - 27 = -18
4 11 - 43 = 11 - 64 = -53
5 20 - 53 = 20 - 125 = -105

We create a new table from these values, calling these our new column 0. Like so:

0
1 1
2 0
3 -18
4 -53
5 -105

Now we go through the same process as we did for the first table.

0 1 2
1 1
-1
2 0 -17
-18
3 -18 -17
-35
4 -53 -17
-52
5 -105

Applying the same rules gives our second term:

-17 / 2! x n2
= -17/2 x n2

Note that this table was completed in fewer columns than the first table. That is, the first table ended with column 3, while this table ended with column 2. Every table must end with fewer columns than the previous table. If that doesn't happen, you have made an arithmetic mistake. (Otherwise you could get two n3 terms or whatever, which doesn't make sense.)

Again we plug the successive integers into this term and subtract from the starting values. Note we subtract with the values that we started with on this table -- we don't go back to the original values. Note that we are subtracting a negative, which is the same as adding a positive.

Original - -17/2 n2
1 1 + 17/2 12 = 1 + 17/2 = 19/2
2 0 + 17/2 22 = 0 + 34 = 34
3 -18 + 17/2 32 = -18 + 153/2 = 117/2
4 -53 + 17/2 42 = -53 + 272/2 = 83
5 -105 + 17/2 52 = -105 + 425/2 = 215/2

We build a new table with these values and go through the whole process again:

0 1
1 19/2
49/2
2 34
49/2
3 117/2
49/2
4 83
49/2
5 215/2

Giving our next term as:

49/2 / 1! x n1

= 49/2 n

Plug integers into this term to get the next table:

Original - 49/2 n
1 19/2 - 49/2 x 1 = 19/2 - 49/2 = -15
2 34 - 49/2 x 2 = 34 - 49 = -15
3 117/2 - 49/2 x 3 = 117/2 - 147/2 = -15
4 83 - 49/2 x 4 = 83 - 98 = -15
5 215/2 - 49/2 x 5 = 215/2 - 245/2 = -15

Building another table and following through the process yields the simple result:

0
1 -15
2 -15
3 -15
4 -15
5 -15

This is simple, of course, because column 0 already contains all the same value, so we are quickly done.

This gives a final term of:

-15 x n0

= -15

(Recall that any number to the zero power is simply one, and can be dropped from the expression.)

Once we get a table that is complete in the zero column, we are done. So now we put all these terms together to get the full equation:

n3 - 17/2 n2 + 49/2 n - 15

To check, try this on each value:

Original Formula
1 2 13 - 17/2 x 12 + 49/2 x 1 - 15 = 1 - 17/2 + 49/2 - 15 = 2
2 8 23 - 17/2 x 22 + 49/2 x 2 - 15 = 8 - 34 + 49 - 15 = 8
3 9 33 - 17/2 x 32 + 49/2 x 3 - 15 = 27 - 153/2 + 147/2 - 15 = 9
4 11 43 - 17/2 x 42 + 49/2 x 4 - 15 = 64 - 136 + 98 - 15 = 11
5 20 53 - 17/2 x 52 + 49/2 x 5 - 15 = 125 - 425/2 + 245/2 - 15 = 20

To find the next value in the table is now simply a matter of plugging 6 into the formula:

Original Formula
6 ? 63 - 17/2 x 62 + 49/2 x 6 - 15 = 216 - 306 + 147 - 15 = 42

And we see that the answer is the near-mythical value, 42.

## Some real-world considerations

### Coming to a single value in a column

As I noted earlier, a table is complete when all the values in a column are the same. If you come to a single value in a column, then it is the same as itself, and the table is complete. Mathematically, this is a perfectly valid solution. But if this happens, it should introduce some doubt into your mind. Do the numbers really follow some pattern, or am I just coming up with a formula to fit a set of random numbers? It is, after all, possible to come up with a formula for any set of numbers. The fact that this technique works should demonstrate that: pick any set of numbers, run them through this method, and you will get a formula to describe them. If you get to a column with two or three numbers the same, this is a good indication that there really is a pattern described by this formula. You would be unlikely to get such a result by chance.

Note this also illustrates the fact that any problem asking "what's the next number in this sequence" has an infinite number of possible answers. If someone gives you the sequence, say, "1, 4, 9, 16", you could run them through the above process and get the answer that the person is probably looking for: the rule is n2 so the next value is 25. But you could also invent any number as the next number in the sequence, say 42, and come up with a rule for "1, 4, 9, 16, 42". Feel free to work it out. It comes out to:

17/24 n4 - 85/12 n3 + 619/24 n2 - 425/12 n + 17

and the next term is then 121.

So if you want to be obnoxious, the next time you are given a quiz of "find the next number in the series" problems, just pick any number you like and fill it in, and you'll be completely correct. You'll probably get a failing grade on the test, but you can enjoy the smug satisfaction of knowing you were right.

### Rouding errors

In the above discussion I've been assuming that all numbers were exact. For a math problem, this would normally be the case. But if you've gotten a set of numbers from real-world measurements, there is probably some margin of error on your measurements. In this case, you run into an additional problem: If two numbers look close, are they equal within some margin of error? Or are they really different? And if you do consider them equal, what value do you use when you proceed to the next step?

I've tinkered with setting a margin of error, for example, differences less than .001 are considered insignificant, and then averaging the numbers together to go to the next step. I haven't found any solution better than trial and error for selecting the margin of error. If the next table does not end in fewer columns than the previous, then the margin of error was not correctly chosen. Perhaps there is something more rigorous one could do here; I haven't pursued it.

## Why This Works

While I haven't constructed a formal mathematical proof that this works, the principle is simple if you know a little calculus.

When we take the difference between pairs of values, we are, in effect, finding the derivative. You may note that the value you get is the derivative at some value in between the two values where you took the difference. That is, when you take the difference between the Y values corresponding to X values of, say, 2 and 3, you get the derivative at some X between 2 and 3. Not necessarily 2.5, but between 2 and 3.

When you continue taking differences until you get a constant, this is the equivalent of taking derivatives until you get to a constant function.

So for example, suppose the equation is y=2 x3 + 3 x - 7. (Of course the whole point is that we don't know that initially, but let's suppose that's what it is.) So as we take successive differences, we get:

Column Equation
0 y = 2 x3 + 3 x - 7
1 y' = 6 x2 + 3
2 y'' = 12 x
3 y''' = 12

Thinking backwards, then, we see that when we reach a constant, the column number must be the highest power in the original equation. Each time we take a derivative, we multiply by the exponent, so if we take derivatives all the way to a constant, we will have multiplied by all the integers from 1 to the original power, that is, by the power factorial. So when we get to a constant row, we know that the first term in the original equation must have been this constant times x to the column power divided by that power factorial, that is, if k is the constant and p is the column number, the first term must have been (k / p!) xp. In this example, the final column would be column 3 which would contain all 12's. We get (12 / 3!) x3 = (12 / 6) x3 = 2 x3. Which indeed is the first term.

If we substract this value from each of the original Y's, then we are effectively dropping the first term from the equation. If we run through the whole process again, this should find us the second term. Etc.

## What Computers Are For

Of course, I don't expect you to do all this arithmetic yourself. That's what computers are for.

So I've written a little program to do the work for you.