Bill Hart: Toom 7
<<TableOfContents>>
Motivation: be fast for certain ranges of integer multiplication. On core 2, will kick "their pants".
Work with 64-bit computer.
 
A "limb" is a "machine word".
To multiply integers
- split into polynomials
 - evaluate polynomials
 - multiplications
 interpolate --> polynomials
- reconstruct our product
 
Now, write in symbols 
 and 
, for sufficiently big 
, where 
, and  
, where the 
 and 
 are the "digits in base 
".  
Then 
. Evaluating 
 (reconstruct -- part 5) is easy bit shifting.   So now we focus on polynomial multiplication.  
Karatsuba (1962)
![$$(c_0 + c_1y)(d_0 + d_1y) = c_0d_0 + c_1d_1 y^2 + [(c_0 + c_1)(d_0 + d_1) -c_0d_0 - c_1d_1]y$$ $$(c_0 + c_1y)(d_0 + d_1y) = c_0d_0 + c_1d_1 y^2 + [(c_0 + c_1)(d_0 + d_1) -c_0d_0 - c_1d_1]y$$](attachments/09(2f)583e(2f)schedule(2f)toom7/latex_b05967414ce37abe376c5ad8fb0b7c7d5d4994e8_p1.png)
- 3 distinct multiplications
 - 5 additions/subtractions
 
For larger polynomials, divide and conquer.
583e(2f)schedule(2f)toom7/latex_bafbf183e2d938ea766cf2dcc2aa9e18b53c427e_p1.png)
583e(2f)schedule(2f)toom7/latex_7342e226a8619d7da7c97141421a4fe95a290f77_p1.png)
583e(2f)schedule(2f)toom7/latex_c3c595f3ed6dfd7ab7bf1464e95b4383f44c10ab_p1.png)
583e(2f)schedule(2f)toom7/latex_5c56ff2dba967d351006f4ca93c409abb9310300_p1.png)
Knuth Multiplication
Very very hard to find in the literature (impossible?). I couldn't find a reference.
This is better, since some intermediate results are smaller. Overflowing into the next limb in Karatsuba is potentially quite nasty. Worrying about sign is less work than dealing with an extra limb. You have to use signed representations -- there is extra branching in both cases, and it really does make a difference. GMP really uses this and it does make a difference. At this level, saving one cycle per limb is a big improvement.
Another way of thinking
The above is difficult to generalize without being more systematic. Toom 3 is the next logical thing, where we break into 3 pieces instead of 2.
Consider
583e(2f)schedule(2f)toom7/latex_0656775960e676c659c27a125717734c18945909_p1.png)
 where 
 have length 
, where the length is the degree plus 
.  Then 
 has length 
.  To find the coefficients of 
 we only need values of 
 at 
 distinct points.   
Example: Karatsuba
Evaluate at
. 
 
 
. 
So we end up doing the same three multiplies as with Karatsuba.
Example: Knuth
Same, but evaluate at 
. 
Interpolation
Now generalizing we evaluate at 
 points and need to find the resulting polynomial. 
583e(2f)schedule(2f)toom7/latex_67a7a8944e5b1f3f1b01a1a7111fabdeb8741931_p1.png)
 and we have 
 for 
 values 
. We recover 
 by solving a linear system with 
 rows:  
 so 
. 
Theorem: generically 
 is invertible. 
So solve the system explicitly and get a set of linear expressions.
Toom 3 (Toom 1963, Cook 1966)
583e(2f)schedule(2f)toom7/latex_6554d8e486075a3485503a16ea30e95aaa9f4811_p1.png)
583e(2f)schedule(2f)toom7/latex_0d63828d16bd3bcf2ed2646a21e9ebec8921fa20_p1.png)
583e(2f)schedule(2f)toom7/latex_e843465c9605ca832dda36bd421334a84df0200d_p1.png)
Evaluate at 
 points.  One gets a messy identity. 
New Idea: Winograd (1980)
Winograd made some suggestions:
Evaluate at 
 and fractions.  Amazingly, nobody thought of this for quite a while -- sometimes the good ideas are the simple ones. Say 
, so  
583e(2f)schedule(2f)toom7/latex_9ae16ea3ca3745860a3fbda855abe3f52b42a4a5_p1.png)
This greatly improves the linear algebra we have to do.
Toom 3 matrix.  Evaluate at 
.  The matrix is: $$\left(\begin{array}{rrrrr} 1 & 0 & 0 & 0 & 0 \\ 16 & 8 & 4 & 2 & 1 \\ 1 & -1 & 1 & -1 & 1 \\ 1 & 1 & 1 & 1 & 1 \\ 0 & 0 & 0 & 0 & 1 \end{array}\right)$$ One then worries about finding a minimal list of row operations to reduce this to the identity. 
Evaluate at 
: $$\left(\begin{array}{rrrrr} 1 & 0 & 0 & 0 & 0 \\ 1 & -1 & 1 & -1 & 1 \\ 1 & 1 & 1 & 1 & 1 \\ 1 & 2 & 4 & 8 & 16 \\ 0 & 0 & 0 & 0 & 1 \end{array}\right)$$ One can solve the resulting systems more efficiently on actual hardware with this matrix. 
Unbalanced operands
583e(2f)schedule(2f)toom7/latex_5f0222353283f52e115cd0ee87c50c186c547a84_p1.png)
and
583e(2f)schedule(2f)toom7/latex_aed0117bc24632222e39f1307303436ba92c7158_p1.png)
Compute 
 simply by evaluating at 
 distinct points. 
Toom 4 2
For example, suppose we have
 and 
.   Evaluate at 
 points.    Pretty much the same as Toom 3... the interpolation (the most involved part) is exactly the same as Toom 3.  
Odd/even Karatsuba (Hanrot-Zimmerman)
- David Harvey reinvented this, not knowing about the original paper.
 
The trick here is to write
583e(2f)schedule(2f)toom7/latex_a1a62d7361a1cad82a57acad9a347d49892966b6_p1.png)
and similarly
583e(2f)schedule(2f)toom7/latex_bbbabe1a879138b72c1f5f7bd72810ef048ef1e5_p1.png)
 Then 
  
Advantage: 
 and 
 do not have to be the same size, so get balanced multiplication no matter what. 
So far though, nobody has figured out how to make this really practical/fast/better than anything else.