Definitions
Nearby Words

# Moore-Penrose pseudoinverse

In mathematics, and in particular linear algebra, the pseudoinverse $A^+$ of an $m times n$ matrix $A$ is a generalization of the inverse matrix. More precisely, this article talks about the Moore-Penrose pseudoinverse, which was independently described by E. H. Moore in 1920 and Roger Penrose in 1955. Earlier, Fredholm had introduced the concept of a pseudoinverse of integral operators in 1903. The term generalized inverse is sometimes used as a synonym for pseudoinverse.

A common use of the pseudoinverse is to compute a 'best fit' (least squares) solution to a system of linear equations (see below under Applications). The pseudoinverse is defined and unique for all matrices whose entries are real or complex numbers. The pseudoinverse can be computed using the singular value decomposition.

## Definition

The pseudoinverse $A^+$ of an m-by-n matrix $A$ (whose entries can be real or complex numbers) is defined as the unique n-by-m matrix satisfying all of the following four criteria:

1. $A A^+A = A$       ($A A^+$ need not be the general identity matrix, but it maps all column vectors of $A$ to themselves);
2. $A^+A A^+ = A^+$       ($A^+$ is a weak inverse for the multiplicative semigroup);
3. $\left(AA^+\right)^* = AA^+$       ($AA^+$ is Hermitian); and
4. $\left(A^+A\right)^* = A^+A$       ($A^+A$ is also Hermitian).

Here $M^*$ is the conjugate transpose of a matrix M. For matrices whose elements are real numbers instead of complex numbers, $M^* = M^T$.

## Properties

Proofs for some of these relations may be found on the proofs page.

• The pseudoinverse exists and is unique: for any matrix $A$, there is precisely one matrix $A^+$ that satisfies the four properties of the definition.
• If the matrix $A$ is invertible, the pseudoinverse and the inverse coincide: $A^+=A^\left\{-1\right\}$.
• Pseudoinversion is reversible. It is its own inverse: $\left(A^+\right)^+=A$.
• $AA^+$ is the orthogonal projector on the range of $A$, and $A^+A$ is the orthogonal projector on the range of $A^*$.
• The pseudoinverse of a zero matrix is its transpose.
• The pseudoinverse can be computed via a limiting process:

$A^+ = lim_\left\{delta to 0\right\} \left(A^* A + delta I\right)^\left\{-1\right\} A^*$
= lim_{delta to 0} A^* (A A^* + delta I)^{-1}
(see Tikhonov regularization). These limits exist even if $\left(A A^*\right)^\left\{-1\right\}$ and $\left(A^*A\right)^\left\{-1\right\}$ do not exist.

• Pseudoinversion commutes with transposition, conjugation, and taking the conjugate transpose:

begin\left\{align\right\}
(A^T)^+ &= (A^+)^T, overline{A}^+ &= overline{A^+}, (A^*)^+ &= (A^+)^*. end{align}

• The pseudoinverse of a scalar multiple of $A$ is the reciprocal multiple of $A^+$:

$\left(alpha A\right)^+ = alpha^\left\{-1\right\} A^+$ for $alphaneq 0$.

• If the pseudoinverse of $A^*A$ is already known, it may be used to compute $A^+$:

$A^+ = \left(A^*A\right)^+A^*$.

• Likewise, if $\left(AA^*\right)^+$ is already known:

$A^+ = A^*\left(AA^*\right)^+$.

If A and B are such that the product $AB$ is defined and

• A has orthonormal columns ($A^* A = I$, unitarity is a special case), or
• B has orthonormal rows ($B B^* = I$), or
• A is of full column rank, and B is of full row rank,

then

$\left(AB\right)^+ = B^+ A^+$.

The third case does not cover the first two; an orthonormal (including non-square) matrix must be of full rank, but otherwise there is no assumption made on the matrix it multiplies.

### Identity transformations

These relations have in common that the right hand side equals the left hand side multiplied with some expression. They can be used to cancel certain subexpressions or expand expressions.
$begin\left\{array\right\}\left\{lcllll\right\}$
A^+ &=& A^+ & A^{+*} & A^* &. A^+ &=& A^* & A^{+*} & A^+ &. A &=& A^{+*}& A^* & A &. A &=& A & A^* & A^{+*}&. A^* &=& A^* & A & A^+ &. A^* &=& A^+ & A & A^* &. end{array}

## Special cases

### Orthonormal columns or rows

If A has orthonormal columns ($A^* A = I$) or orthonormal rows ($A A^* = I$), then $A^+ = A^*$.

### Full rank

If the columns of $A$ are linearly independent, then $A^* A$ is invertible. In this case, an explicit formula is:

$A^+ = \left(A^* A\right)^\left\{-1\right\} A^*$.
It follows that $A^+$ is a left inverse of A:   $A^+ A = I$.

If the rows of $A$ are linearly independent, then $A A^*$ is invertible. In this case, an explicit formula is:

$A^+ = A^*\left(A A^*\right)^\left\{-1\right\}$.
It follows that $A^+$ is a right inverse of A:   $A A^+ = I$.

If both columns and rows are linearly independent (that is, for square regular matrices), the pseudoinverse is just the inverse:

$A^+ = A^\left\{-1\right\}$.

### Scalars and vectors

It is also possible to define a pseudoinverse for scalars and vectors. This amounts to treating these as matrices. The pseudoinverse of a scalar $x$ is zero if $x$ is zero and the reciprocal of $x$ otherwise:

$x^+ = left\left\{begin\left\{matrix\right\} 0, & mbox\left\{if \right\}x=0;$
x^{-1}, & mbox{otherwise}. end{matrix}right.

The pseudoinverse of the null vector is the transposed null vector. The pseudoinverse of a non-null vector is the conjugate transposed vector divided by its squared magnitude:

$x^+ = left\left\{begin\left\{matrix\right\} 0^T, & mbox\left\{if \right\}x = 0;$
{x^* over x^* x}, & mbox{otherwise}. end{matrix}right.

For a proof, simply check that these definitions meet the defining criteria for the pseudoinverse.

### Circulant matrices

For a Circulant matrix $C$ the singular value decomposition is given by the Fourier transform, that is the singular values are the Fourier coefficients. Let $mathcal\left\{F\right\}$ be the Fourier matrix, then

$C = mathcal\left\{F\right\}cdotSigmacdotmathcal\left\{F\right\}^*$
$C^+ = mathcal\left\{F\right\}cdotSigma^+cdotmathcal\left\{F\right\}^*$

### Block matrices

Optimized approaches exist for calculating the pseudoinverse of block structured matrices.

## Finding the pseudoinverse of a matrix

### Using regular inverses

Let k be the rank of a $m times n$ matrix A. Then A can be decomposed as $A = BC$, where B is a $m times k$-matrix and C is a $k times n$ matrix. Then

$A^+ = C^*\left(CC^*\right)^\left\{-1\right\}\left(B^*B\right)^\left\{-1\right\}B^*$.

If A has full row rank, so that k = m, then B can be chosen to be the identity matrix and the formula reduces to $A^+ = A^*\left(AA^*\right)^\left\{-1\right\}$. Similarly, if A has full column rank (that is, k = n), we have $A^+ = \left(A^*A\right)^\left\{-1\right\}A^*.$

### The QR method

However, computing the product $AA^*$ or $A^*A$ or its inverse explicitly is unnecessary and only causes additional rounding errors and computational cost. Consider first the case when A is full column rank. Then all that is needed is the Cholesky decomposition
$A^*A = R^*R$,
where R is an upper triangular matrix. Multiplication by the inverse is then done easily by solving a system with multiple right-hand-side,
$M = \left(A^*A\right)^\left\{-1\right\}N qquad Leftrightarrow qquad \left(A^*A\right)M = N qquad Leftrightarrow qquad R^*RM = N$
by back substitution. To compute the Cholesky decomposition without forming $A^*A$ explicitly, use the QR decomposition of A, $A = QR$ where Q is a unitary matrix, $Q^*Q = QQ^* = I$, and R is upper triangular. Then
$A^*A ,=, \left(QR\right)^*\left(QR\right) ,=, R^*Q^*QR ,=, R^*IR ,=, R^*R$,
so R is the Cholesky factor of $A^*A$. The case of full row rank is obtained by swapping $A$ and $A^*$.

### The general case and the SVD method

A computationally simpler and more accurate way to get the pseudoinverse is by using the singular value decomposition. If $A = USigma V^*$ is the singular value decomposition of A, then $A^+ = VSigma^+ U^*$. For a diagonal matrix such as $Sigma$, we get the pseudoinverse by taking the reciprocal of each non-zero element on the diagonal, and leaving the zeros in place. In numerical computation, only elements larger than some small tolerance are taken to be nonzero, and the others are replaced by zeros. For example, in the Matlab function pinv, the tolerance is taken to be t = ε•max(m,n)•max(Σ), where ε is the machine epsilon.

The cost of this method is dominated by the cost of computing the SVD, which is several times higher than matrix-matrix multiplication, if a state-of-the art implementation (such as that of LAPACK) is used.

### Updating the pseudoinverse

For the cases where A has full row or column rank, and the inverse of the correlation matrix ($AA^*$ for A with full row rank or $A^*A$ for full column rank) is already known, the pseudoinverse for matrices related to $A$ can be computed by applying the Sherman–Morrison–Woodbury formula to update the inverse of the correlation matrix, which may need less work. In particular, if the related matrix differs from the original one by only a changed, added or deleted row or column, incremental algorithms exist that exploit the relationship.

Similarly, it is possible to update the Cholesky factor when a row or column is added, without creating the inverse of the correlation matrix explicitly. However, updating the pseudoinverse in the general rank-deficient case is much more complicated.

### Software libraries

High quality implementations of SVD, QR, and back substitution are available in standard libraries, such as LAPACK. Writing one's own implementation of SVD is a major programming project that requires a significant numerical expertise. In special circumstances, such as parallel computing or embedded computing, however, alternative implementations by QR or even the use of an explicit inverse might be preferable, and custom implementations may be unavoidable.

## Applications

The pseudoinverse provides a least squares solution to a system of linear equations.

Given a system of linear equations

$A x = b,$

in general we cannot always expect to find a vector $x$ which will solve the system; even if there exists such a solution vector, then it may not be unique. We can however always ask for a vector $x$ that brings $Ax$ "as close as possible" to $b$, i.e. a vector that minimizes the Euclidean norm

$|A x - b|^2.$

If there are several such vectors $x$, we could ask for the one among them with the smallest Euclidean norm. Thus formulated, the problem has a unique solution, given by the pseudoinverse:

$x=A^+b.$

Note that $AA^+$ is the orthogonal projection onto the image (range) of A (the space spanned by the column vectors of A), while $\left(1 - A^+ A\right)$ is the orthogonal projection onto the kernel (null space) of A.