Linear Algebra

This chapter is about several algorithms for matrix algebra over the rational numbers and cyclotomic fields. Algorithms for linear algebra over exact fields are necessary in order to implement the modular symbols algorithms that we will describe in Chapter Linear Algebra. This chapter partly overlaps with [Coh93, Sections 2.1–2.4].

Note: We view all matrices as defining linear transformations by acting on row vectors from the right.

Echelon Forms of Matrices

Definition 7.1

A matrix is in (reduced row) echelon form if each row in the matrix has more zeros at the beginning than the row above it, the first nonzero entry of every row is 1, and the first nonzero entry of any row is the only nonzero entry in its column.

Given a matrix A, there is another matrix B such that B is obtained from A by left multiplication by an invertible matrix and B is in reduced row echelon form. This matrix B is called the echelon form of A. It is unique.

A pivot column of A is a column of A such that the reduced row echelon form of A contains a leading 1.

Example 7.2

The following matrix is not in reduced row echelon form:

\left(\begin{array}{rrrrr}
14&2&7&228&-224\\
0&0&3&78&-70\\
0&0&0&-405&381
\end{array}\right).

The reduced row echelon form of the above matrix is

\left(\begin{array}{rrrrr}
1&\frac{1}{7}&0&0&-\frac{1174}{945}\vspace{2pt}\\
0&0&1&0&\frac{152}{135}\vspace{2pt}\\
0&0&0&1&-\frac{127}{135}
\end{array}\right).

Notice that the entries of the reduced row echelon form can be rationals with large denominators even though the entries of the original matrix A are integers. Another example is the simple looking matrix

\left(\begin{array}{rrrrrrrr}
-9&6&7&3&1&0&0&0\\
-10&3&8&2&0&1&0&0\\
3&-6&2&8&0&0&1&0\\
-8&-6&-8&6&0&0&0&1
\end{array}\right)

whose echelon form is

\left(\begin{array}{rrrrrrrr}
1&0&0&0&\frac{42}{1025}&-\frac{92}{1025}&\frac{1}{25}&-\frac{9}{205}\vspace{2pt}\\
0&1&0&0&\frac{716}{3075}&-\frac{641}{3075}&-\frac{2}{75}&-\frac{7}{615}\vspace{2pt}\\
0&0&1&0&-\frac{83}{1025}&\frac{133}{1025}&\frac{1}{25}&-\frac{23}{410}\vspace{2pt}\\
0&0&0&1&\frac{184}{1025}&-\frac{159}{1025}&\frac{2}{25}&\frac{9}{410}
\end{array}\right).

A basic fact is that two matrices A and B have the same reduced row echelon form if and only if there is an invertible matrix E such that EA = B. Also, many standard operations in linear algebra, e.g., computation of the kernel of a linear map, intersection of subspaces, membership checking, etc., can be encoded as a question about computing the echelon form of a matrix.

The following standard algorithm computes the echelon form of a matrix.

Algorithm 7.3

Given an m\times n matrix A over a field, the algorithm outputs the reduced row echelon form of A. Write a_{i,j} for the i,j entry of A, where 0\leq i \leq m-1 and 0\leq j \leq n-1.

  1. [Initialize] Set k=0.
  2. [Clear Each Column] For each column c=0,1,\ldots, n-1, clear the c^{th} column as follows:
    1. [First Nonzero] Find the smallest r such that a_{r,c}\neq 0, or if there is no such r, go to the next column.
    2. [Rescale] Replace row r of A by \frac{1}{a_{r,c}} times row r.
    3. [Swap] Swap row r with row k.
    4. [Clear] For each i=0,\ldots, m-1 with i\neq k, if a_{i,c}\neq 0, add -a_{i,c} times row k of A to row i to clear the leading entry of the i^{th} row.
    5. [Increment] Set k=k+1.

This algorithm takes O(mn^2) arithmetic operations in the base field, where A is an m\times n matrix. If the base field is \Q, the entries can become huge and arithmetic operations are then very expensive. See Section Echelon Forms over for ways to mitigate this problem.

To conclude this section, we mention how to convert a few standard problems into questions about reduced row echelon forms of matrices. Note that one can also phrase some of these answers in terms of the echelon form, which might be easier to compute, or an LUP decomposition (lower triangular times upper triangular times permutation matrix), which the numerical analysts use.

  1. Kernel of A : We explain how to compute the kernel of A acting on column vectors from the right (first transpose to obtain the kernel of A acting on row vectors). Since passing to the reduced row echelon form of A is the same as multiplying on the left by an invertible matrix, the kernel of the reduce row echelon form E of A is the same as the kernel of A. There is a basis vector of \ker(E) that corresponds to each nonpivot column of E. That vector has a 1 at the nonpivot column, 0‘s at all other nonpivot columns, and for each pivot column, the negative of the entry of A at the nonpivot column in the row with that pivot element.

  2. Intersection of Subspaces: Suppose W_1 and W_2 are subspace of a finite-dimensional vector space V. Let A_1 and A_2 be matrices whose columns form a basis for W_1 and W_2, respectively. Let A=[A_1 | A_2] be the augmented matrix formed from A_1 and A_2. Let K be the kernel of the linear transformation defined by A. Then K is isomorphic to the desired intersection. To write down the intersection explicitly, suppose that \dim(W_1) \leq \dim(W_2) and do the following: For each b in a basis for K, write down the linear combination of a basis for W_1 obtained by taking the first \dim(W_1) entries of the vector b. The fact that b is in \Ker(A) implies that the vector we just wrote down is also in W_2. This is because a linear relation

    \sum a_i w_{1,i}  + \sum b_j w_{2,j} = 0,

    i.e., an element of that kernel, is the same as

    \sum a_i w_{1,i} = \sum -b_j w_{2,j}.

    For more details, see [Coh93, Alg. 2.3.9].

Rational Reconstruction

Rational reconstruction is a process that allows one to sometimes lift an integer modulo m uniquely to a bounded rational number.

Algorithm 7.4

Given an integer a\geq 0 and an integer m> 1, this algorithm computes the numerator n and denominator d of the unique rational number n/d, if it exists, with

(1)|n|, d \leq \sqrt{\frac{m}{2}}
\qquad\text{and}\qquad n \con a d \pmod{m},

or it reports that there is no such number.

  1. [Reduce mod m] Replace a with the least integer between 0 and m-1 that is congruent to a modulo m.
  2. [Trivial Cases] If a=0 or a=1, return a. item{} [Initialize] Let b=\sqrt{m/2}, u=m, v=a, and set U=(1,0,u) and V=(0,1,v). Use the notation U_i and V_i to refer to the i^{th} entries of U, V, for i=0,1,2.
  3. [Iterate] Do the following as long as |V_2|>b: Set q = \lfloor U_2 / V_2 \rfloor, set T = U - q V, and set U=V and V=T.
  4. [Numerator and Denominator] Set d=|V_1| and n=V_2.
  5. [Good?] If d\leq b and \gcd(n,d)=1, return n/d; otherwise report that there is no rational number as in (1).

Algorithm 7.4 for rational reconstruction is described (with proof) in [Knu, pgs. 656–657] as the solution to Exercise 51 on page 379 in that book. See, in particular, the paragraph right in the middle of page 657, which describes the algorithm. Knuth attributes this rational reconstruction algorithm to Wang, Kornerup, and Gregory from around 1983.

We now give an indication of why Algorithm 7.4 computes the rational reconstruction of a \pmod{m}, leaving the precise details and uniqueness to [Knu, pgs. 656–657]. At each step in Algorithm 7.4, the 3-tuple V=(v_0, v_1,
v_2) satisfies

(2)m \cdot v_0 + a\cdot v_1 = v_2,

and similarly for U. When computing the usual extended \gcd, at the end v_2=\gcd(a,m) and v_0, v_1 give a representation of the v_2 as a \Z-linear combination of m and a. In Algorithm 7.4, we are instead interested in finding a rational number n/d such that n \con a\cdot d\pmod{m}. If we set n=v_2 and d=v_1 in (2) and rearrange, we obtain

n  = a \cdot d + m \cdot v_0.

Thus at every step of the algorithm we find a rational number n/d such that n\con ad\pmod{m}. The problem at intermediate steps is that, e.g., v_0 could be 0, or n or d could be too large.

Example 7.5

We compute an example using Sage.

sage: p = 389
sage: k = GF(p)
sage: a = k(7/13); a
210
sage: a.rational_reconstruction()
7/13

Echelon Forms over \Q

A difficulty with computation of the echelon form of a matrix over the rational numbers is that arithmetic with large rational numbers is time-consuming; each addition potentially requires a \gcd and numerous additions and multiplications of integers. Moreover, the entries of A during intermediate steps of Algorithm 7.3 can be huge even though the entries of A and the answer are small. For example, suppose A is an invertible square matrix. Then the echelon form of A is the identity matrix, but during intermediate steps the numbers involved could be quite large. One technique for mitigating this is to compute the echelon form using a multimodular method.

If A is a matrix with rational entries, let H(A) be the height of A, which is the maximum of the absolute values of the numerators and denominators of all entries of A. If x,y are rational numbers and p is a prime, we write x\con y\pmod{p} to mean that the denominators of x and y are not divisible by p but the numerator of the rational number x-y (in reduced form) is divisible by p. For example, if x=5/7 and y=2/11, then x-y=41/77, so x\con y\pmod{41}.

Algorithm 7.6

Given an m\times n matrix A with entries in \Q, this algorithm computes the reduced row echelon form of A.

  1. Rescale the input matrix A to have integer entries. This does not change the echelon form and makes reduction modulo many primes easier. We may thus assume A has integer entries.

  2. Let c be a guess for the height of the echelon form.

  3. List successive primes p_1, p_2, \ldots such that the product of the p_i is greater than n \cdot c \cdot H(A) + 1, where n is the number of columns of A.

  4. Compute the echelon forms B_i of the reduction A\pmod{p_i} using, e.g., Algorithm 7.3 or any other echelon algorithm. % (This is where most of the work takes place.)

  5. Discard any B_i whose pivot column list is not maximal among pivot lists of all B_j found so far. (The pivot list associated to B_i is the ordered list of integers k such that the k^{th} column of B_j is a pivot column. We mean maximal with respect to the following ordering on integer sequences: shorter integer sequences are smaller, and if two sequences have the same length, then order in reverse lexicographic order. Thus [1,2] is smaller than [1,2,3], and [1,2,7] is smaller than [1,2,5]. Think of maximal as “optimal”, i.e., best possible pivot columns.)

  6. Use the Chinese Remainder Theorem to find a matrix B with integer entries such that B \con B_i \pmod{p_i} for all p_i.

  7. Use Algorithm 7.4 to try to find a matrix C whose coefficients are rational numbers n/r such that |n|, r \leq \sqrt{M/2}, where M=\prod p_i, and C \con B_i \pmod{p_i} for each prime p. If rational reconstruction fails, compute a few more echelon forms mod the next few primes (using the above steps) and attempt rational reconstruction again. Let E be the matrix over \Q so obtained. (A trick here is to keep track of denominators found so far to avoid doing very many rational reconstructions.)

  8. Compute the denominator d of E, i.e., the smallest positive integer such that dE has integer entries. If

    (3)H(dE) \cdot H(A)
\cdot n < \prod p_i,

    then E is the reduced row echelon form of A. If not, repeat the above steps with a few more primes.

Proof

We prove that if (3) is satisfied, then the matrix E computed by the algorithm really is the reduced row echelon form R of A. First note that E is in reduced row echelon form since the set of pivot columns of all matrices B_i used to construct E are the same, so the pivot columns of E are the same as those of any B_i and all other entries in the B_i pivot columns are 0, so the other entries of E in the pivot columns are also 0.

Recall from the end of Section Echelon Forms of Matrices that a matrix whose columns are a basis for the kernel of A can be obtained from the reduced row echelon form R. Let K be the matrix whose columns are the vectors in the kernel algorithm applied to E, so EK=0. Since the reduced row echelon form is obtained by left multiplying by an invertible matrix, for each i, there is an invertible matrix V_i mod p_i such that A \con V_i B_i\pmod{p_i} so

A\cdot d K \con V_i B_i \cdot d K \con V_i \cdot dE \cdot K \con 0\pmod{p_i}.

Since d K and A are integer matrices, the Chinese remainder theorem implies that

A \cdot dK \con 0\,\,\,\left(\text{mod } \prod{p_i}\right).

The integer entries a of A\cdot dK all satisfy |a| \leq H(A)\cdot{} H(dK)\cdot{} n, where n is the number of columns of A. Since H(K) \leq H(E), the bound (3) implies that A \cdot dK = 0. Thus A K = 0, so \Ker(E) \subset \Ker(A). On the other hand, the rank of E equals the rank of each B_i (since the pivot columns are the same), so

\rank(E) = \rank(B_i) = \rank(A\!\!\!\pmod{p_i}) \leq
\rank(A).

Thus \dim(\Ker(A)) \leq \dim(\Ker(E)), and combining this with the bound obtained above, we see that \Ker(E) = \Ker(A). This implies that E is the reduced row echelon form of A, since two matrices have the same kernel if and only if they have the same reduced row echelon form (the echelon form is an invariant of the row space, and the kernel is the orthogonal complement of the row space).

The reason for step (5) is that the matrices B_i need not be the reduction of R modulo p_i, and indeed this reduction might not even be defined, e.g., if p_i divides the denominator of some element of R, then this reduction makes no sense. For example, set p=p_i and suppose A=\abcd{p}{1}{0}{0}. Then R=\abcd{1}{1/p}{0}{0}, which has no reduction modulo p; also, the reduction of A modulo B_i is B_i = \abcd{0}{1}{0}{0}
\pmod{p}, which is already in reduced row echelon form. However if we were to combine B_i with the echelon form of A modulo another prime, the result could never be lifted using rational reconstruction. Thus the reason we exclude all B_i with nonmaximal pivot column sequence is so that a rational reconstruction will exist. There are only finitely many primes that divide denominators of entries of R, so eventually all B_i will have maximal pivot column sequences, i.e., they are the reduction of the true reduced row echelon form R, so the algorithm terminates.

Remark 7.7

Algorithm 7.6, with sparse matrices seems to work very well in practice. A simple but helpful modification to Algorithm 7.3 in the sparse case is to clear each column using a row with a minimal number of nonzero entries, so as to reduce the amount of “fill in” (denseness) of the matrix. There are much more sophisticated methods along these lines called “intelligent Gauss elimination”. (Cryptographers are interested in linear algebra mod p with huge sparse matrices, since they come up in attacks on the discrete log problem and integer factorization.)

One can adapt Algorithm 7.6 to computation of echelon forms of matrices A over cyclotomic fields \Q(\zeta_n). Assume A has denominator 1. Let p be a prime that splits completely in \Q(\zeta_n). Compute the homomorphisms f_i : \Z_p[\zeta_n] \to \F_p by finding the elements of order n in \F_p^*. Then compute the mod p matrix f_i(A) for each i, and find its reduced row echelon form. Taken together, the maps f_i together induce an isomorphism \Psi:\F_p[X]/\Phi_n(X) \isom \F_p^d, where \Phi_n(X) is the n^{th} cyclotomic polynomial and d is its degree. It is easy to compute \Psi(f(x)) by evaluating f(x) at each element of order n in \F_p. To compute \Psi^{-1}, simply use linear algebra over \F_p to invert a matrix that represents \Psi. Use \Psi^{-1} to compute the reduced row echelon form of A\pmod{p}, where (p) is the nonprime ideal in \Z[\zeta_n] generated by p. Do this for several primes p, and use rational reconstruction on each coefficient of each power of \zeta_n, to recover the echelon form of A.

Echelon Forms via Matrix Multiplication

In this section we explain how to compute echelon forms using matrix multiplication. This is valuable because there are asymptotically fast, i.e., better than O(n^3) field operations, algorithms for matrix multiplication, and implementations of linear algebra libraries often include highly optimized matrix multiplication algorithms. We only sketch the basic ideas behind these asymptotically fast algorithms (following [Ste]), since more detail would take us too far from modular forms.

The naive algorithm for multiplying two m\times m matrices requires O(m^3) arithmetic operations in the base ring. In [Str69], Strassen described a clever algorithm that computes the product of two m\times m matrices in O(m^{\log_2(7)})
= O(m^{2.807\ldots}) arithmetic operations in the base ring. Because of numerical stability issues, Strassen’s algorithm is rarely used in numerical analysis. But for matrix arithmetic over exact base rings (e.g., the rational numbers, finite fields, etc.) it is of extreme importance.

In [Str69], Strassen also sketched a new algorithm for computing the inverse of a square matrix using matrix multiplication. Using this algorithm, the number of operations to invert an m\times m matrix is (roughly) the same as the number needed to multiply two m\times m matrices. Suppose the input matrix is 2^n\times 2^n and we write it in block form as \abcd{A}{B}{C}{D} where A,B,C,D are all 2^{n-1}\times 2^{n-1} matrices. Assume that any intermediate matrices below that we invert are invertible. Consider the augmented matrix

\left(\begin{array}{cc|cc}
A&B&I&0\\
C&D&0&I
\end{array}\right).

Multiply the top row by A^{-1} to obtain

\left(\begin{array}{cc|cc}
I&A^{-1}B&A^{-1}&0\\
C&D&0&I
\end{array}\right),

and write E=A^{-1}B. Subtract C times the first row from the second row to get

\left(\begin{array}{cc|cc}
I&E&A^{-1}&0\\
0&D - CE &-CA^{-1}&I
\end{array}\right).

Set F=D-CE and multiply the bottom row by F^{-1} on the left to obtain

\left(\begin{array}{cc|cc}
I&E&A^{-1}&0\\
0&I &-F^{-1}CA^{-1}&F^{-1}
\end{array}\right).

Set G=-F^{-1} C A^{-1}, and subtract E times the second from the first row to arrive at

\left(\begin{array}{cc|cc}
I&0&A^{-1}-EG&-EF^{-1}\\
0&I &G & F^{-1}
\end{array}\right).

The idea listed above can, with significant work, be extended to a general algorithm (as is done in [Ste06]).

Next we very briefly sketch how to compute echelon forms of matrices using matrix multiplication and inversion. Its complexity is comparable to the complexity of matrix multiplication.

As motivation, recall the standard algorithm from undergraduate linear algebra for inverting an invertible square matrix A: form the augmented matrix [A|I], and then compute the echelon form of this matrix, which is [I|A^{-1}]. If T is the transformation matrix to echelon form, then T[A|I] = [I|T], so T=A^{-1}. In particular, we could find the echelon form of [A|I] by multiplying on the left by A^{-1}. Likewise, for any matrix B with the same number of rows as A, we could find the echelon form of [A|B] by multiplying on the left by A^{-1}. Next we extend this idea to give an algorithm to compute echelon forms using only matrix multiplication (and echelon form modulo one prime).

Algorithm 7.8

Given a matrix A over the rational numbers (or a number field), this algorithm computes the echelon form of A.

1. [Find Pivots] Choose a random prime p (coprime to the denominator of any entry of A) and compute the echelon form of A\pmod{p}, e.g., using Algorithm 7.3. Let c_0, \ldots,
c_{n-1} be the pivot columns of A\pmod{p}. When computing the echelon form, save the positions r_0,\ldots, r_{n-1} of the rows used to clear each column.

  1. [Extract Submatrix] Extract the n\times n submatrix B of A whose entries are A_{r_i,c_j} for 0\leq i,j \leq {n-1}.
  2. [Compute Inverse] Compute the inverse B^{-1} of B. Note that B must be invertible since its reduction modulo p is invertible.
  3. [Multiply] Let C be the matrix whose rows are the rows r_0,\ldots, r_{n-1} of A. Compute E=B^{-1}C. If E is not in echelon form, go to ste (1).
  4. [Done?] Write down a matrix D whose columns are a basis for \ker(E) as explained in the Kernel Algorithm. Let F be the matrix whose rows are the rows of A other than rows r_0,\ldots, r_{n-1}. Compute the product FD. If FD=0, output E, which is the echelon form of A. If FD\neq 0, go to step (1)) and run the whole algorithm again.

Proof

We prove both that the algorithm terminates and that when it terminates, the matrix E is the echelon form of A.

First we prove that the algorithm terminates. Let E be the echelon form of A. By Exercise 7.3, for all but finitely many primes p (i.e., any prime where A\pmod{p} has the same rank as A) the echelon form of A\pmod{p} equals E\pmod{p}. For any such prime p the pivot columns of E\pmod{p} are the pivot columns of E, so the algorithm will terminate for that choice of p.

We next prove that when the algorithm terminates, E is the echelon form of A. By assumption, E is in echelon form and is obtained by multiplying C on the left by an invertible matrix, so E must be the echelon form of C. The rows of C are a subset of those of A, so the rows of E are a subset of the rows of the echelon form of A. Thus \ker(A) \subset \ker(E). To show that E equals the echelon form of A, we just need to verify that \ker(E)\subset \ker(A), i.e., that AD=0, where D is as in step (5). Since E is the echelon form of C, we know that CD=0. By step (5) we also know that FD=0. Thus AD=0, since the rows of A are the union of the rows of F and C.

Example 7.9

Let A be the 4\times 8 matrix

A =\left(\begin{array}{rrrrrrrr}
-9&6&7&3&1&0&0&0\\
-10&3&8&2&0&1&0&0\\
3&-6&2&8&0&0&1&0\\
-8&-6&-8&6&0&0&0&1
\end{array}\right)

from Example 7.2.

sage: M = MatrixSpace(QQ,4,8)
sage: A = M([[-9,6,7,3,1,0,0,0],[-10,3,8,2,0,1,0,0],
[3,-6,2,8,0,0,1,0],[-8,-6,-8,6,0,0,0,1]])

First choose the “random” prime p=41, which does not divide any of the entries of A, and compute the echelon form of the reduction of A modulo 41.

sage: A41 = MatrixSpace(GF(41),4,8)(A)
sage: E41 = A41.echelon_form()

The echelon form of A\pmod{41} is

\left(\begin{array}{rrrrrrrr}
1&0&0&2&0&20&33&18\\
0&1&0&40&0&30&7&1\\
0&0&1&39&0&19&13&17\\
0&0&0&0&1&31&0&37
\end{array}\right).

Thus we take c_0=0, c_1=1, c_2=2, and c_3 = 4. Also r_i=i for i=0,1,2,3. Next extract the submatrix B.

sage: B = A.matrix_from_columns([0,1,2,4])

The submatrix B is

B = \left(\begin{array}{rrrr}
-9&6&7&1\\
-10&3&8&0\\
3&-6&2&0\\
-8&-6&-8&0
\end{array}\right).

The inverse of B is

B^{-1} = \left(\begin{array}{rrrr}
0&-\frac{5}{92}&\frac{1}{46}&-\frac{9}{184}\vspace{2pt}\\
0&-\frac{1}{138}&-\frac{3}{23}&-\frac{11}{276}\vspace{2pt}\\
0&\frac{11}{184}&\frac{7}{92}&-\frac{17}{368}\vspace{2pt}\\
1&-\frac{159}{184}&\frac{41}{92}&\frac{45}{368}
\end{array}\right).

Multiplying by A yields

E = B^{-1} A =
\left(\begin{array}{rrrrrrrr}
1&0&0&-\frac{21}{92}&0&-\frac{5}{92}&\frac{1}{46}&-\frac{9}{184}\vspace{2pt}\\
0&1&0&-\frac{179}{138}&0&-\frac{1}{138}&-\frac{3}{23}&-\frac{11}{276}\vspace{2pt}\\
0&0&1&\frac{83}{184}&0&\frac{11}{184}&\frac{7}{92}&-\frac{17}{368}\vspace{2pt}\\
0&0&0&\frac{1025}{184}&1&-\frac{159}{184}&\frac{41}{92}&\frac{45}{368}
\end{array}\right).

sage: E = B^(-1)*A

This is not the echelon form of A. Indeed, it is not even in echelon form, since the last row is not normalized so the leftmost nonzero entry is 1. We thus choose another random prime, say p=43. The echelon form mod 43 has columns 0,1,2,3 as pivot columns. We thus extract the matrix

B = \left(\begin{array}{rrrr}
-9&6&7&3\\
-10&3&8&2\\
3&-6&2&8\\
-8&-6&-8&6
\end{array}\right).

sage: B = A.matrix_from_columns([0,1,2,3])

This matrix has inverse

B^{-1} =
\left(\begin{array}{rrrr}
\frac{42}{1025}&-\frac{92}{1025}&\frac{1}{25}&-\frac{9}{205}\vspace{2pt}\\
\frac{716}{3075}&-\frac{641}{3075}&-\frac{2}{75}&-\frac{7}{615}\vspace{2pt}\\
-\frac{83}{1025}&\frac{133}{1025}&\frac{1}{25}&-\frac{23}{410}\vspace{2pt}\\
\frac{184}{1025}&-\frac{159}{1025}&\frac{2}{25}&\frac{9}{410}
\end{array}\right).

Finally, the echelon form of A is E = B^{-1}A. No further checking is needed since the product so obtained is in echelon form, and the matrix F of the last step of the algorithm has 0 rows.

Remark 7.10

Above we have given only the barest sketch of asymptotically fast “block” algorithms for linear algebra. For optimized algorithms that work in the general case, please see the source code of [Ste06].

Decomposing Spaces under the Action of Matrix

Efficiently solving the following problem is a crucial step in computing a basis of eigenforms for a space of modular forms (see Sections Computing Using Eigenvectors and Newforms: Systems of Eigenvalues).

Problem 7.11

Suppose T is an n\times n matrix with entries in a field K (typically a number field or finite field) and that the minimal polynomial of T is square-free and has degree n. View T as acting on V=K^n. Find a simple module decomposition W_0 \oplus
\cdots \oplus W_m of V as a direct sum of simple K[T]-modules. Equivalently, find an invertible matrix A such that A^{-1} T A is a block direct sum of matrices T_0, \ldots, T_m such that the minimal polynomial of each T_i is irreducible.

Remark 7.12

A generalization of Problem 7.11 to arbitrary matrices with entries in \Q is finding the rational Jordan form (or rational canonical form, or Frobenius form) of T. This is like the usual Jordan form, but the resulting matrix is over \Q and the summands of the matrix corresponding to eigenvalues are replaced by matrices whose minimal polynomials are the minimal polynomials (over \Q) of the eigenvalues. The rational Jordan form was extensively studied by Giesbrecht in his Ph.D. thesis and many successive papers, where he analyzes the complexity of his algorithms and observes that the limiting step is factoring polynomials over K. The reason is that given a polynomial f\in K[x], one can easily write down a matrix T such that one can read off the factorization of f from the rational Jordan form of T (see also [Ste97]).

Characteristic Polynomials

The computation of characteristic polynomials of matrices is crucial to modular forms computations. There are many approaches to this problems: compute \det(xI-A) symbolically (bad), compute the traces of the powers of A (bad), or compute the Hessenberg form modulo many primes and use CRT (bad; see for [Coh93, Section 2.2.4] the definition of Hessenberg form and the algorithm). A more sophisticated method is to compute the rational canonical form of A using Giesbrecht’s algorithm [1] (see [GS02]), which involves computing Krylov subspaces (see Remark Remark 7.13 below), and building up the whole space on which A acts. This latter method is a generalization of Wiedemann’s algorithm for computing minimal polynomials (see Section Wiedemann’s Minimal Polynomial Algorithm), but with more structure to handle the case when the characteristic polynomial is not equal to the minimal polynomial.

Polynomial Factorization

Factorization of polynomials in \Q[X] (or over number fields) is an important step in computing an explicit basis of Hecke eigenforms for spaces of modular forms. The best algorithm is the van Hoeij method [BHKS06], which uses the LLL lattice basis reduction algorithm [LLL82] in a novel way to solve the optimization problems that come up in trying to lift factorizations mod p to \Z. It has been generalized by Belebas, van Hoeij, Kl”uners, and Steel to number fields.

Wiedemann’s Minimal Polynomial Algorithm

In this section we describe an algorithm due to Wiedemann for computing the minimal polynomial of an n\times n matrix A over a field.

Choose a random vector v and compute the iterates

(4)v_0=v, \quad v_1 = A(v),\quad v_2 = A^2(v),\quad \ldots, \quad v_{2n-1} = A^{2n-1}(v).

If f=x^m + c_{m-1}x^{m-1} + \cdots + c_1 x + c_0 is the minimal polynomial of A, then

A^m + c_{m-1} A^{m-1} + \cdots + c_0 I_n = 0,

where I_n is the n\times n identity matrix. For any k\geq 0, by multiplying both sides on the right by the vector A^kv, we see that

A^{m+k} v + c_{m-1} A^{m-1+k}v + \cdots + c_0 A^k v = 0;

hence

v_{m+k} + c_{m-1} v_{m-1+k} + \cdots + c_0 v_k = 0, \qquad \text{all $k\geq 0$}.

Wiedemann’s idea is to observe that any single component of the vectors v_0, \ldots, v_{2n-1} satisfies the linear recurrence with coefficients 1, c_{m-1}, \ldots, c_0. The Berlekamp-Massey algorithm (see Algorithm 7.14 below) was introduced in the 1960s in the context of coding theory to find the minimal polynomial of a linear recurrence sequence \{a_r\}. The minimal polynomial of this linear recurrence is by definition the unique monic polynomial g, such that if \{a_r\} satisfies a linear recurrence a_{j+k} + b_{j-1} a_{j-1+k} + \cdots + b_0 a_k=0 (for all k\geq
0), then g divides the polynomial x^j + \sum_{i=0}^{j-1} b_i x^i. If we apply Berlekamp-Massey to the top coordinates of the v_i, we obtain a polynomial g_0, which divides f. We then apply it to the second to the top coordinates and find a polynomial g_1 that divides f, etc. Taking the least common multiple of the first few g_i, we find a divisor of the minimal polynomial of f. One can show that with “high probability” one quickly finds f, instead of just a proper divisor of f.

Remark 7.13

In the literature, techniques that involve iterating a vector as in (4) are often called Krylov methods. The subspace generated by the iterates of a vector under a matrix is called a Krylov subspace.

Algorithm 7.14

Suppose a_{0},\ldots , a_{2n-1} are the first 2n terms of a sequence that satisfies a linear recurrence of degree at most n. This algorithm computes the minimal polynomial f of the sequence.

  1. Let R_0 = x^{2n}, R_1 = \sum_{i=0}^{2n-1} a_i x^i, V_0 = 0, V_1 = 1.
  2. While \deg(R_1) \geq n, do the following:
    1. Compute Q and R such that R_0 = Q R_1 + R.
    2. Let (V_0, V_1, R_0, R_1) = (V_1, V_0 - QV_1, R_1, R).
  3. Let d = \max(\deg(V_1), 1+\deg(R_1)) and set P = x^d V_1(1/x).
  4. Let c be the leading coefficient of P and output f = P/c.

The above description of Berlekamp-Massey is taken from [AD04], which contains some additional ideas for improvements.

Now suppose T is an n\times n matrix as in Problem Problem 7.11. We find the minimal polynomial of T by computing the minimal polynomial of T\!\!\pmod{p} using Wiedemann’s algorithm, for many primes p, and using the Chinese Remainder Theorem. (One has to bound the number of primes that must be considered; see, e.g., [Coh93].)

One can also compute the characteristic polynomial of T directly from the Hessenberg form of T, which can be computed in O(n^4) field operations, as described in [Coh93]. This is simple but slow. Also, the T we consider will often be sparse, and Wiedemann is particularly good when T is sparse.

Example 7.15

We compute the minimal polynomial of

A= \left(\begin{array}{ccc}
3&0&0\\
0&0&2\\
-1&1/2&-1
\end{array}\right)

using Wiedemann’s algorithm. Let v=(1,0,0)^t. Then

v&=(1,0,0)^t,\quad Av = (3,0,-1)^t,\quad A^2v = (9,-2,-2)^t, \\
A^3v&=(27,-4,-8)^t, \quad A^4v=(81,-16,-21)^t, \quad A^5v=(243,-42,-68)^t.

The linear recurrence sequence coming from the first entries is

1, 3, 9, 27, 81, 243.

This sequence satisfies the linear recurrence

a_{k+1} -3 a_k = 0, \qquad\text{all `k>0`},

so its minimal polynomial is x-3. This implies that x-3 divides the minimal polynomial of the matrix A. Next we use the sequence of second coordinates of the iterates of v, which is

0, 0, -2, -4, -16, -42.

The recurrence that this sequence satisfies is slightly less obvious, so we apply the Berlekamp-Massey algorithm to find it, with n=3.

  1. We have R_0 = x^6, R_1 = -42x^5 - 16x^4 - 4x^3 - 2x^2, V_0 = 0, V_1 = 1.

    1. Dividing R_0 by R_1, we find

      R_0 = R_1 \left(  -\frac{1}{42}x + \frac{4}{441}  \right) +
\left(\frac{22}{441}x^4 - \frac{5}{441}x^3 + \frac{8}{441}x^2 \right).

    2. The new V_0, V_1, R_0, R_1 are

      V_0 &= 1,\\
V_1 &= \frac{1}{42}x - \frac{4}{441},\\
R_0 &= -42x^5 - 16x^4 - 4x^3 - 2x^2,\\
R_1 &= \frac{22}{441}x^4 - \frac{5}{441}x^3 + \frac{8}{441}x^2.\\

      Since \deg(R_1)\geq n=3, we do the above three steps again.

  2. We repeat the above three steps.

    1. Dividing R_0 by R_1, we find

      R_0 = R_1 \left(-\frac{9261}{11}x - \frac{123921}{242}\right) +
\left( \frac{1323}{242}x^3 + \frac{882}{121}x^2\right).

    2. The new V_0, V_1, R_0, R_1 are:

      V_0 &= \frac{1}{42}x - \frac{4}{441},\\
V_1 &= \frac{441}{22}x^{2} + \frac{2205}{484}x + \frac{441}{121},\\
R_0 &= \frac{22}{441}x^{4} - \frac{5}{441}x^{3} + \frac{8}{441}x^{2},\\
R_1 &= \frac{1323}{242}x^{3} + \frac{882}{121}x^{2}.\\

  3. We have to repeat the steps yet again:

    V_0 &= \frac{441}{22}x^{2} + \frac{2205}{484}x + \frac{441}{121},\\
V_1 &= -\frac{242}{1323}x^{3} + \frac{968}{3969}x^{2} +
\frac{484}{3969}x - \frac{242}{3969},\\
R_0 &= \frac{1323}{242}x^{3} + \frac{882}{121}x^{2},\\
R_1 &= \frac{484}{3969}x^{2}.

  4. We have d=3, so P = -\frac{242}{3969}x^{3} + \frac{484}{3969}x^{2} + \frac{968}{3969}x - \frac{242}{1323}.

  5. Multiply through by -3969/242 and output

    x^3 - 2x^2 - 4x + 3 = (x - 3)(x^2 + x - 1).

The minimal polynomial of T_2 is (x - 3)(x^2 + x - 1), since the minimal polynomial has degree at most 3 and is divisible by (x - 3)(x^2 + x - 1).

p-adic Nullspace

We will use the following algorithm of Dixon [Dix82] to compute p-adic approximations to solutions of linear equations over \Q. Rational reconstruction modulo p^n then allows us to recover the corresponding solutions over \Q.

Algorithm 7.16

Given a matrix A with integer entries and nonzero kernel, this algorithm computes a nonzero element of \ker(A) using successive p-adic approximations.

  1. [Prime] Choose a random prime p.

  2. [Echelon] Compute the echelon form of A modulo p.

  3. [Done?] If A has full rank modulo p, it has full rank, so we terminate the algorithm.

  4. [Setup] Let b_0 = 0.

  5. [Iterate] For each m=0,1,2,\ldots, k, use the echelon form of A modulo p to find a vector y_m with integer entries such that A y_m \con b_m\pmod{p}, and then set

    b_{m+1} = \frac{b_m - A y_m}{p}.

    (If m=0, choose y_m\neq 0.)

  6. [p-adic Solution] Let \ds x=y_0 + y_1 p + y_2 p^2 + y_3 p^3 + \cdots + y_k p^k.

  7. [Lift] Use rational reconstruction (Algorithm 7.4) to find a vector z with rational entries such that z \con x \pmod{p^{k+1}}, if such a vector exists. If the vector does not exist, increase k or use a different p. Otherwise, output z.

Proof

We verify the case k=2 only. We have A y_0 = 0\pmod{p} and Ay_1 = -\frac{Ay_0}{p}\pmod{p}. Thus

A y_0 + p A y_1 \con A y_0 + (-Ay_0) \pmod{p^2}.

Decomposition Using Kernels

We now know enough to give an algorithm to solve Problem Problem 7.11.

Algorithm 7.17

Given an n\times n matrix T over a field K as in Problem 7.11, this algorithm computes the decomposition of V as a direct sum of simple K[T] modules.

  1. [Minimal Polynomial] Compute the minimal polynomial f of T, e.g., using the multimodular Wiedemann algorithm.
  2. [Factorization] Factor f using the algorithm in Section Polynomial Factorization.
  3. [Compute Kernels] For each irreducible factor g_i of f, compute the following.
    1. Compute the matrix A_i = g_i(T).
    2. Compute W_i = \ker(A_i), e.g., using Algorithm 7.16.
  4. [Output Answer] Then V=\bigoplus W_i.

Remark 7.18

As mentioned in Remark Remark 7.12, if one can compute such decompositions V=\bigoplus W_i, then one can easily factor polynomials f; hence the difficulty of polynomial factorization is a lower bound on the complexity of writing V as a direct sum of simples.

Exercises

Exercise 7.1

Given a subspace W of k^n, where k is a field and n\geq 0 is an integer, give an algorithm to find a matrix A such that W = \Ker(A).

Exercise 7.2

If \rref(A) denotes the row reduced echelon form of A and p is a prime not dividing any denominator of any entry of A or of \rref(A), is \rref(A \pmod{p}) = \rref(A) \pmod{p}?

Exercise 7.3

Let A be a matrix with entries in \Q. Prove that for all but finitely many primes p we have \rref(A \pmod{p}) = \rref(A) \pmod{p}.

Exercise 7.4

Let

A = \left(\begin{array}{rrr}
1&2&3\\
4&5&6\\
7&8&9
\end{array}\right).

  1. Compute the echelon form of A using each of Algorithm 7.3, Algorithm 7.6, and Algorithm 7.8.
  2. Compute the kernel of A.
  3. Find the characteristic polynomial of A using the algorithm of Section Wiedemann’s Minimal Polynomial Algorithm.

Exercise 7.5

The notion of echelon form extends to matrices whose entries come from certain rings other than fields, e.g., Euclidean domains. In the case of matrices over \Z we define a matrix to be in echelon form (or Hermite normal form) if it satisfies

  • a_{ij}=0, for i>j,
  • a_{ii}\geq 0,
  • a_{ij}<a_{ii} for all j<i (unless a_{ii}=0, in which case all a_{ij}=0).

There are algorithms for computing with finitely generated modules over \Z that are analogous to the ones in this chapter for vector spaces, which depend on computation of Hermite forms.

  1. Show that the Hermite form of \left(\begin{matrix}1&2&3\\4&5&6\\7&8&9\end{matrix}\right) is \left(\begin{matrix}1&2&3\\0&3&6\\0&0&0\end{matrix}\right).
  2. Describe an algorithm for transforming an n\times n matrix A with integer entries into Hermite form using row operations and the Euclidean algorithm.

Footnotes

[1]Allan Steel also invented a similar algorithm.