Matrix derivatives¶
Matrix and array expressions in SymPy¶
SymPy provides matrices and general N-dimensional arrays through the modules
sympy.matrices and sympy.tensor.array, respectively. Both modules are
component-explicit: every element is stored and directly accessible, so the
object contains the full set of component values rather than representing them
implicitly.
SymPy also supports symbolic matrix expressions and array expressions through dedicated modules. These represent matrices and arrays abstractly as symbols, defined solely by an identifying name (such as \(X\) or \(M\)) and a shape tuple. For matrices, the shape is always a 2-element tuple, representing the number of rows and columns. The dimensions within this tuple can be either fixed integers (for known sizes) or arbitrary symbolic expressions (when dimensions are undetermined).
Unlike component-explicit matrices, which perform elementwise calculations
immediately, symbolic matrix expressions retain operations unevaluated in
expression trees. Arithmetic operations (addition, multiplication, transpose,
etc.) are formally applied but frozen as symbolic representations rather than
computed results. This aligns with conventional mathematical notation in
textbooks, where expressions like M*N.T*P (also written as
\(\mathbf{M}\mathbf{N}' \mathbf{P}\)) represent abstract matrix multiplication
rather than explicit numerical results.
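For instance, such a product stays an unevaluated expression tree (a minimal doctest, with fixed \(3 \times 3\) shapes chosen for illustration):

>>> from sympy import MatrixSymbol
>>> M = MatrixSymbol("M", 3, 3)
>>> N = MatrixSymbol("N", 3, 3)
>>> P = MatrixSymbol("P", 3, 3)
>>> M*N.T*P
M*N.T*P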
Array expressions are analogous; the only difference is that they may have any number of dimensions, whereas matrices are defined to have exactly two.
In SymPy both matrix symbols and array symbols support Python’s [ ]
operator to reference individual elements. For example:
M[i, j] denotes the element at row \(i\) (0-indexed) and column \(j\) of the matrix symbol M, while A[k, l, m] accesses an element of the 3D array symbol A.
SymPy uses zero-based indexing (starting at 0) for all positions, contrasting
with the one-based indexing (starting at 1) common in mathematical literature.
The indices themselves can be arbitrary symbolic expressions, not limited to
integers—allowing representations like M[2*i, j+1]. When combined with
symbolic indices (e.g., undefined variables \(i\) and \(j\)), these element
references collectively span the entire matrix/array, facilitating
component-wise operations while preserving the abstract nature of the symbolic
expression.
Using SymPy code:
>>> from sympy import symbols, MatrixSymbol
>>> from sympy.tensor.array.expressions import ArraySymbol
>>> i, j, k, l, m = symbols("i j k l m")
>>> d = symbols("d")
>>> M = MatrixSymbol("M", d, d)
>>> M.shape
(d, d)
>>> M[i, j]
M[i, j]
>>> A = ArraySymbol("A", (d, d, d))
>>> A.shape
(d, d, d)
>>> A[k, l, m]
A[k, l, m]
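Indices can themselves be arbitrary symbolic expressions, continuing the session above:

>>> M[2*i, j + 1]
M[2*i, j + 1]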
Operations on matrix and array expressions¶
The matrix expression module in SymPy reflects the common convention used in
mathematics to represent matrices when indices are not explicit. Therefore,
M*N or \(\mathbf{M}\cdot\mathbf{N}\) denotes matrix multiplication, or in
index-explicit form \(\sum_j M_{ij} N_{jk}\) with final free indices
\({}_{\{ik\}}\). Common operators such as determinant, trace, Hadamard product
and Hadamard power are provided.
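For example, these operators stay unevaluated when applied to matrix symbols:

>>> from sympy import MatrixSymbol, Trace, Determinant
>>> M = MatrixSymbol("M", 3, 3)
>>> Trace(M)
Trace(M)
>>> Determinant(M)
Determinant(M)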
When a function is applied to a matrix, it is understood in the mathematical
sense, so exp(M) or \(\exp(\mathbf{M})\) denotes the matrix exponential.
Element-wise operations on matrices are provided by the .applyfunc method.
For example, to exponentiate every entry of a matrix \(\mathbf{M}\) you write
M.applyfunc(exp). This is denoted as \(\exp_{\circ}(\mathbf{M})\), with the
subscript \({}_{\circ}\) flagging an entry-wise application, distinguishing it
from the matrix exponential.
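A short comparison of the two notions (the printed form of the elementwise application may vary across SymPy versions, so it is only bound to a name here):

>>> from sympy import MatrixSymbol, exp
>>> M = MatrixSymbol("M", 3, 3)
>>> exp(M)   # matrix exponential, a matrix-level object
exp(M)
>>> elementwise = M.applyfunc(exp)   # exp applied entry by entry, kept symbolic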
The array expressions module, by contrast, is designed to express arbitrary operations on arrays of any dimension.
| operation name | SymPy object | representation | index-explicit equivalent |
|---|---|---|---|
| tensor product | ArrayTensorProduct | \(A \otimes B\) | \(A_{ij} B_{kl}\) |
| contraction | ArrayContraction | \(A\) on axes \(a\), \(b\) | \(A_{i_{1} i_{2} \ldots i_{a} \ldots i_{b} \ldots } \Rightarrow \sum_j A_{i_{1} \ldots j \ldots j \ldots}\) |
| diagonal | ArrayDiagonal | \(A\) on axes \(a\), \(b\) | \(A_{i_{1} i_{2} \ldots i_{a} \ldots i_{b} \ldots } \Rightarrow A_{i_{1} \ldots j \ldots j \ldots}\) |
| permutation of dimensions | PermuteDims | permutation \(\sigma\) on axes of \(A\) | \(A_{i_{1} i_{2} \ldots} \Rightarrow A_{i_{\sigma(1)} i_{\sigma(2)} \ldots}\) |
In SymPy, array expressions are built from four primitive nodes,
ArrayTensorProduct, ArrayContraction, ArrayDiagonal, and
PermuteDims, that compose and manipulate arrays of arbitrary dimension.
These four operators have a canonical form where, when all four are present, they must be nested in the following sequence (from outermost to innermost): permutation, diagonalization, contraction, then tensor product.
The fixed relative order must be respected regardless of which operators are present. The canonical form is not applied automatically upon object creation; it can be attained by explicitly calling .doit().
For example, nested contractions can be flattened:
>>> from sympy import symbols
>>> from sympy.tensor.array.expressions import ArraySymbol, ArrayContraction, PermuteDims
>>> k = symbols("k")
>>> A = ArraySymbol("A", (k, k, k, k, k, k, k, k))
>>> expr = ArrayContraction(ArrayContraction(A, (1, 6), (2, 5)), (0, 3))
>>> expr
ArrayContraction(ArrayContraction(A, (1, 6), (2, 5)), (0, 3))
>>> expr.doit()
ArrayContraction(A, (0, 7), (1, 6), (2, 5))
>>> expr = ArrayContraction(PermuteDims(A, [6, 3, 0, 7, 1, 5, 4, 2]), (2, 4), (1, 7))
>>> expr
ArrayContraction(PermuteDims(A, (0 6 4 1 3 7 2)), (1, 7), (2, 4))
>>> expr.doit()
PermuteDims(ArrayContraction(A, (0, 1), (2, 3)), (0 2 1 3))
Matrix expressions can be represented using array expressions and their associated operators. However, the converse is not always true, as array expressions encompass a more general set of operations.
| matrix operation | matrix expression form | index form | array expression form |
|---|---|---|---|
| matrix multiplication | \(\mathbf{M} \mathbf{N}\) | \({}_{\{ij\}} \Rightarrow \sum_k M_{ik} N_{kj}\) | contraction of \(M \otimes N\) on 2nd and 3rd axes |
| trace | \(\mbox{tr}(\mathbf{M})\) | \({}_{\{\}} \Rightarrow \sum_i M_{ii}\) | contraction of \(M\) on 1st and 2nd axes |
| diagonal | \(\mbox{diag}(\mathbf{M})\) | \({}_{\{i\}} \Rightarrow M_{ii}\) | diagonalization of \(M\) on 1st and 2nd axes |
| transposition | \(\mathbf{M}'\) | \({}_{\{ij\}} \Rightarrow M_{ji}\) | permutation of \(M\) on 1st and 2nd axes |
| Hadamard product | \(\mathbf{M} \circ \mathbf{N}\) | \({}_{\{ij\}} \Rightarrow M_{ij} N_{ij}\) | diagonalization of \(M \otimes N\) on 1st-3rd and 2nd-4th axes |
| Kronecker product | \(\mathbf{M} \boxtimes \mathbf{N}\) | \({}_{\{m=i d_1+k,\, n=j d_2+l\}} \Rightarrow M_{ij} N_{kl}\) | permutation of \(M \otimes N\) on 2nd and 3rd axes, then reshape |
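These rewritings can be performed explicitly with the conversion functions (import path as in recent SymPy versions):

>>> from sympy import MatrixSymbol
>>> from sympy.tensor.array.expressions import convert_matrix_to_array
>>> M = MatrixSymbol("M", 3, 3)
>>> N = MatrixSymbol("N", 3, 3)
>>> convert_matrix_to_array(M*N)
ArrayContraction(ArrayTensorProduct(M, N), (1, 2))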
Kronecker product versus tensor product¶
In SymPy, the Kronecker product \(\boxtimes\) and tensor product \(\otimes\) represent very similar underlying operations. Given matrices \(\mathbf{A}\) and \(\mathbf{B}\), the elements of the resulting tensor product are combined as the product of the individual elements:

\[(\mathbf{A} \otimes \mathbf{B})_{ijkl} = A_{ij} B_{kl}\]
When representing this expression as an array, different approaches arise depending on the index-order and the reshaping of the 4-dimensional array into a 2-dimensional array by joining pairs of dimensions.
The array tensor product, as implemented by both ArrayTensorProduct and the
tensorproduct function, performs a direct nesting of the input arrays. This
operation preserves the index order, resulting in a new array with the
combined index sequence:

\[{}_{\{ijkl\}} \Longrightarrow A_{ij} B_{kl}\]

The Kronecker product, implemented by KroneckerProduct and kronecker_product, instead combines the rows and columns of \(\mathbf{A}\) and \(\mathbf{B}\). It can be defined in terms of indices \({}_{\{mn\}}\), where \(m\) spans the rows of both \(\mathbf{A}\) and \(\mathbf{B}\) while \(n\) spans their columns:

\[(\mathbf{A} \boxtimes \mathbf{B})_{mn} = A_{ij} B_{kl}, \qquad m = i\,d_1 + k, \quad n = j\,d_2 + l\]

where \([d_1 \times d_2]\) is the shape of \(\mathbf{B}\).
In explicit component form, the Kronecker product and tensor
product appear very similar, differing only in their nesting structure.
However, when using a Reshape operation to convert an array of shape
\([2 \times 2 \times 2 \times 2]\) into one of shape \([4 \times 4]\), the
inner-outer index ordering is preserved. As a result, the inner \([2\times 2]\)
blocks are rearranged into rows in an order that does not match the layout
produced by the Kronecker product. Converting a tensor product into a
Kronecker product by reshaping requires first permuting the \(j\) and \(k\)
indices:

\[{}_{\{ikjl\}} \Longrightarrow A_{ij} B_{kl}\]
This array can be reshaped into a \([4 \times 4]\) Kronecker product of \(\mathbf{A}\) and \(\mathbf{B}\). After reshaping, the ordering of the elements in the inner blocks matches exactly the row-wise ordering of the Kronecker product.
>>> from sympy import MatrixSymbol, kronecker_product, tensorproduct
>>> A = MatrixSymbol("A", 2, 2).as_explicit()
>>> B = MatrixSymbol("B", 2, 2).as_explicit()
>>> kronecker_product(A, B)
Matrix([
[A[0, 0]*B[0, 0], A[0, 0]*B[0, 1], A[0, 1]*B[0, 0], A[0, 1]*B[0, 1]],
[A[0, 0]*B[1, 0], A[0, 0]*B[1, 1], A[0, 1]*B[1, 0], A[0, 1]*B[1, 1]],
[A[1, 0]*B[0, 0], A[1, 0]*B[0, 1], A[1, 1]*B[0, 0], A[1, 1]*B[0, 1]],
[A[1, 0]*B[1, 0], A[1, 0]*B[1, 1], A[1, 1]*B[1, 0], A[1, 1]*B[1, 1]]])
>>> tensorproduct(A, B)
[[[[A[0, 0]*B[0, 0], A[0, 0]*B[0, 1]], [A[0, 0]*B[1, 0], A[0, 0]*B[1, 1]]], [[A[0, 1]*B[0, 0], A[0, 1]*B[0, 1]], [A[0, 1]*B[1, 0], A[0, 1]*B[1, 1]]]], [[[A[1, 0]*B[0, 0], A[1, 0]*B[0, 1]], [A[1, 0]*B[1, 0], A[1, 0]*B[1, 1]]], [[A[1, 1]*B[0, 0], A[1, 1]*B[0, 1]], [A[1, 1]*B[1, 0], A[1, 1]*B[1, 1]]]]]
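The relationship can be checked on the explicit matrices above: permuting the \(j\) and \(k\) axes of the tensor product and then reshaping reproduces the Kronecker product (a minimal check, relying on the axis conventions of SymPy's N-dimensional arrays):

>>> from sympy import permutedims
>>> tp = tensorproduct(A, B)
>>> permuted = permutedims(tp, (0, 2, 1, 3))  # swap the j and k axes
>>> permuted.reshape(4, 4).tomatrix() == kronecker_product(A, B)
True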
Matrix derivatives¶
You can differentiate a matrix with respect to a scalar by differentiating each element independently. Likewise, differentiating a scalar with respect to a matrix produces a matrix of the same shape, where each entry is the derivative of the scalar with respect to the corresponding matrix element.
While the trace and determinant of a matrix are scalars, their derivatives with respect to a matrix variable can yield nontrivial expressions involving the matrix itself.
Differentiating a matrix \(\mathbf{Y}\) with respect to another matrix \(\mathbf{X}\) results in a 4-dimensional array, where each entry is the derivative of an element of \(\mathbf{Y}\) with respect to an element of \(\mathbf{X}\). This is sometimes informally called a tensor, although in SymPy the term tensor refers to objects from differential geometry and is handled by different modules. Derivatives extend naturally through index notation: differentiating a matrix expression \(Y_{ij}\) with respect to a matrix \(X_{mn}\) yields a four-dimensional array whose entries are given by

\[\frac{\partial Y_{ij}}{\partial X_{mn}}\]
SymPy adopts a denominator-first index ordering for derivatives, placing the indices of the differentiation variable \({}_{\{mn\}}\) before those of the derivand \({}_{\{ij\}}\). This \({}_{\{mnij\}}\) convention aligns with Wolfram Mathematica but differs from PyTorch/NumPy, which implicitly follow the \({}_{\{ijmn\}}\) index-order convention.
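The convention is visible in the shape of a derivative (concrete shapes chosen for illustration; the denominator's dimensions come first):

>>> from sympy import MatrixSymbol
>>> Y = MatrixSymbol("Y", 2, 3)
>>> X = MatrixSymbol("X", 3, 4)
>>> (Y*X).diff(X).shape   # derivative of a [2 x 4] expression by a [3 x 4] matrix
(3, 4, 2, 4)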
This structure also extends to expressions involving mixtures of scalars and matrices: scalar-by-matrix derivatives produce matrices, matrix-by-scalar derivatives produce matrices, and matrix-by-matrix derivatives yield 4-dimensional arrays reflecting the relationship between all component pairs.
To explicitly track index reading order, which controls axis transpositions in multi-dimensional representations, we will use the convention of mapping the index sequence to the full expression. For example:

\[{}_{\{mnij\}} \Longrightarrow \frac{\partial Y_{ij}}{\partial X_{mn}}\]

this shows that differentiation indices (\(mn\)) precede the derivand indices (\(ij\)). Importantly, this notation inherently captures index permutations, eliminating the need for operators such as transposition. For example:

\[{}_{\{ij\}} \Longrightarrow M_{ji}\]

demonstrates how transposition is encoded solely through index-order reversal: the \({}_{\{ij\}}\) mapping of \(M_{ji}\) directly yields the component expression of the transposition \(\mathbf{M}'\).
Reconstructing a matrix expression¶
In SymPy, differentiating a matrix expression with respect to a matrix symbol normally yields a 4-dimensional array. When one or more of its axes happen to be singleton (dimension 1), the tensor collapses to a lower dimensional object. SymPy automatically detects this and returns the simplest equivalent matrix expression instead of keeping the full 4-dimensional form.
For example, let \(\mathbf{x}\) be a matrix-vector of shape \([k \times 1]\), that is, a vector of size \([k]\) disguised as a single-column matrix by adding a singleton dimension to its shape. The derivative of \(\mathbf{x}\) with respect to \(\mathbf{x}\) is then the 4-dimensional identity array of shape \([k \times 1 \times k \times 1]\), but because the second axis of both the numerator and the denominator has length 1, the result can be collapsed to the \([k \times k]\) identity matrix \(\mathbf{I}_{[k]}\) instead of keeping the full tensor form. Explicitly:

\[{}_{\{mnij\}} \Longrightarrow \frac{\partial x_{ij}}{\partial x_{mn}} = \delta_{mi} \delta_{nj}, \qquad n = j = 0\]

Here, the second and fourth axes are singletons; they appear in the derivative as the scalar identity matrix \(\mathbf{I}_{[1]}\), equivalent to the scalar unit, and in index notation they survive only in the Kronecker delta \(\delta_{nj}\).
Likewise, \(\frac{\partial}{\partial \mathbf{x}}\mathbf{x}'\) has shape \([k \times 1 \times 1 \times k]\), and again collapses to the same \([k \times k]\) identity matrix \(\mathbf{I}_{[k]}\).
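Both collapses can be verified directly (the simplification to a matrix expression happens automatically):

>>> from sympy import MatrixSymbol, symbols
>>> k = symbols("k")
>>> x = MatrixSymbol("x", k, 1)
>>> x.diff(x)
I
>>> x.T.diff(x)
I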
Derive matrix by itself¶
As an example, the derivative of a \([k \times l]\) matrix \(\mathbf{X}\) by itself is given by identity relationships between indices:

\[{}_{\{mnij\}} \Longrightarrow \frac{\partial X_{ij}}{\partial X_{mn}} = \delta_{mi} \delta_{nj}\]
where \(\delta\) is the Kronecker delta, which satisfies \(\delta_{ab} = 1\) if \(a = b\) and \(\delta_{ab} = 0\) otherwise. Indeed, matrix \(\mathbf{X}\) is made of element variables that are to be considered different from one another, so \(X_{mn}\) is the same variable as \(X_{ij}\) if and only if \(m=i\) and \(n=j\). The previous expression also shows how the product of two Kronecker deltas on four different free indices may be viewed as the tensor product of two identity matrices, with the free index order properly permuted.
This matrix derivative returns a 4-dimensional array expression when computed in SymPy:
>>> from sympy import MatrixSymbol
>>> X = MatrixSymbol("X", 3, 3)
>>> X.diff(X)
PermuteDims(ArrayTensorProduct(I, I), (3)(1 2))
The permutation \((3)(1 2)\) maps \({}_{\{mnij\}}\) into \({}_{\{minj\}}\). Note that
SymPy uses zero-offset indexing and the trivial cycle \((3)\) is included simply
to indicate the maximum size of the permutation. The PermuteDims object in
SymPy interprets a permutation as mapping the new index order to the old index
order (i.e., the index order of the wrapped expression). This convention
aligns with most scientific Python libraries but is the opposite of the
convention used by Wolfram Mathematica.
Notice that the result of X.diff(X) can also be expressed as
Reshape(KroneckerProduct(I, I), (k, l, k, l));
indeed some authors represent it as

\[\frac{\partial \mathbf{X}}{\partial \mathbf{X}} = \mathbf{I} \,\bar\boxtimes\, \mathbf{I}\]

where \(\bar\boxtimes\) represents the 4-dimensionally reshaped Kronecker product.
Mixing symbols and elements¶
SymPy distinguishes between a matrix symbol M (or, more generally, a matrix
expression), an algebraic object whose indices are left implicit, and the
explicit scalar entries M[i, j] that carry visible indices. While most of
SymPy focuses on derivatives of matrix expressions, it can also handle
expressions that freely mix matrix-level and index-level notation. For
example:

\[\frac{\partial X_{ij}}{\partial \mathbf{X}} = \mathbf{E}_{ij}\]

where \(\mathbf{E}_{ij}\) is the matrix unit, a matrix with only one non-zero element, with value 1 at position \({}_{\{ij\}}\).
Likewise,

\[\frac{\partial \mathbf{X}}{\partial X_{ij}} = \mathbf{E}_{ij}\]
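At the element level, entries of a matrix symbol behave like independent scalar variables (a minimal check):

>>> from sympy import MatrixSymbol
>>> X = MatrixSymbol("X", 3, 3)
>>> (X[0, 0] + X[1, 1]).diff(X[0, 0])
1
>>> X[1, 1].diff(X[0, 0])
0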
Deriving an elementwise function of matrices¶
A common example in neural networks involves deriving expressions in which matrices undergo elementwise functions. For instance, consider a matrix \(\mathbf{W}\) of shape \([k \times l]\) and a matrix-vector \(\mathbf{x}\) of shape \([l \times 1]\):

\[\frac{\partial}{\partial \mathbf{x}} f_{\circ}(\mathbf{W} \mathbf{x}) = \mathbf{W}'\, \mbox{diag}\left( f'_{\circ}(\mathbf{W} \mathbf{x}) \right)\]

The symbol \({}_{\circ}\) is used to mark the function as acting elementwise. Here, we have used the diag operator, which maps a matrix-vector to a diagonal matrix:

\[\mbox{diag}(\mathbf{v})_{ij} = \delta_{ij}\, v_{i0}\]
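A sketch of this computation in SymPy, using an undefined function f applied elementwise (the exact printed form of the result varies between SymPy versions, so it is only bound to a name here):

>>> from sympy import MatrixSymbol, Function, symbols
>>> k, l = symbols("k l")
>>> W = MatrixSymbol("W", k, l)
>>> x = MatrixSymbol("x", l, 1)
>>> f = Function("f")
>>> expr = (W*x).applyfunc(f)   # f applied entry by entry
>>> grad = expr.diff(x)         # equivalent to W' diag(f'(Wx)) above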
How matrix derivatives work¶
The matrix derivation algorithm takes a matrix expression and converts it into an equivalent array expression using array operators. From there, differentiation becomes straightforward, using a variation of the chain rule that also keeps track of free indices and their ordering at each step.
After computing the derivative, the algorithm attempts to reconstruct an equivalent matrix expression, removing any singleton or unnecessary diagonal dimensions. If no suitable matrix expression is found, the derivative is returned as an array expression.
The symbolic derivation algorithm produces an array expression equivalent to the derivative, effectively altering the operation-flow graph. Unlike platforms optimized for numeric computation, which evaluate derivatives without rewriting the overall operation graph, this approach finds a closed symbolic form of the derivative.
Chain rule¶
The chain rule for matrix derivatives is applied in a similar manner to the standard derivative chain rule, but in case the dimension of the intermediate gradient increases, the axes have to be properly rearranged into the right order.
Given \(\mathbf{Y}\) (or \(\mathbf{Z}\)) as a generic matrix expression, the derivative of a function \(f(\mathbf{Y})\) can be expanded with the chain rule of \(f\) into an intermediate expression in terms of \(\partial \mathbf{Y}\). The usual non-commuting principle holds for matrices, so the relative order of \(\partial \mathbf{Y}\) and \(\mathbf{Y}\) cannot be changed.
If you are deriving by a scalar, \(\partial \mathbf{Y}\) will not increase the number of dimensions, so no permutations of axes will be required.
In case you are deriving by a matrix, \(\partial \mathbf{Y}\) will be a 4-dimensional array: the indices coming from \(\mathbf{Y}\) are connected to the chain rule expression, while the deriving indices need to be brought in front of all others (remember, we use the convention that the indices of the deriving variable precede the indices of the expression to be derived).
| operation | expression | chain rule |
|---|---|---|
| matrix addition | \(\mathbf{Y} + \mathbf{Z}\) | \(\partial \mathbf{Y} + \partial\mathbf{Z}\) |
| matrix multiplication | \(\mathbf{Y}\mathbf{Z}\) | \((\partial \mathbf{Y})\mathbf{Z} + \mathbf{Y} (\partial\mathbf{Z})\) |
| Hadamard product | \(\mathbf{Y} \circ \mathbf{Z}\) | \((\partial \mathbf{Y})\circ\mathbf{Z} + \mathbf{Y}\circ(\partial\mathbf{Z})\) |
| tensor product | \(\mathbf{Y}\otimes\mathbf{Z}\) | \((\partial \mathbf{Y})\otimes\mathbf{Z} + \mathbf{Y}\otimes(\partial\mathbf{Z})\) |
| inverse | \(\mathbf{Y}^{-1}\) | \(-\mathbf{Y}^{-1} (\partial \mathbf{Y}) \mathbf{Y}^{-1}\) |
| trace | \(\mbox{tr}(\mathbf{Y})\) | \(\mbox{tr}(\partial\mathbf{Y})\) |
| determinant | \(\mbox{det}(\mathbf{Y})\) | \(\mbox{det}(\mathbf{Y}) \, \mbox{tr}(\mathbf{Y}^{-1}\partial\mathbf{Y})\) |
| transposition | \(\mathbf{Y}'\) | \((\partial\mathbf{Y})'\) |
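For instance, combining the multiplication and transposition rules gives the standard gradients of linear and quadratic forms (outputs as printed by recent SymPy versions):

>>> from sympy import MatrixSymbol, symbols
>>> k = symbols("k")
>>> A = MatrixSymbol("A", k, k)
>>> x = MatrixSymbol("x", k, 1)
>>> (A*x).diff(x)
A.T
>>> (x.T*x).diff(x)
2*x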
When differentiating with respect to a matrix \(\mathbf{X}\), we treat the gradient \(\partial \mathbf{Y}\) as a 4-dimensional array whose first two axes are the indices of \(\mathbf{X}\) (row, column). After the chain rule step for the derivative is computed, these leading axes have to be permuted so that the final tensor carries the indices in the order expected by the downstream matrix or array expression.
Chain rule in array expressions¶
When the operands are N-dimensional arrays, matrix multiplication has to be rewritten as a tensor product followed by a contraction along the appropriate axes. The chain rule for the tensor product is the familiar one, but contractions and diagonal extractions need extra care: their chain rules have to shuffle the axes they act on so that the summation or diagonalization indices still line up correctly after the derivative.
SymPy computes a matrix derivative by first lifting the whole expression to the array level, differentiating there, and then folding the result back into the simplest matrix form that respects the original axis structure.
Example: Derivative of the inverse matrix¶
For example, when deriving the inverse of a matrix expression \(\mathbf{Y}\), the chain rule gives

\[\partial \left( \mathbf{Y}^{-1} \right) = -\mathbf{Y}^{-1} (\partial \mathbf{Y}) \mathbf{Y}^{-1}\]

Using our array expression syntax, this can be represented as a tensor product followed by contractions: array_derive(Y, X) returns a 4-dimensional array expression for \(\partial \mathbf{Y}\), whose tensor product with the two \(\mathbf{Y}^{-1}\) factors creates an 8-dimensional array expression, which is then contracted on the matching axes to recover the four free indices of the derivative.
A concrete example involving the inverse matrix:
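(Sketched here with the internal helper array_derive; the module path is as in recent SymPy versions, and printed forms vary across versions, so the results are bound to names rather than shown.)

>>> from sympy import MatrixSymbol
>>> from sympy.tensor.array.expressions.arrayexpr_derivatives import array_derive
>>> from sympy.tensor.array.expressions import convert_array_to_matrix
>>> X = MatrixSymbol("X", 3, 3)
>>> dinv = array_derive(X.I, X)                 # 4-dimensional array expression
>>> as_matrix = convert_array_to_matrix(dinv)   # folded back where possible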
Array expression to matrix expression conversion¶
The hardest part of the matrix derivation algorithm is converting the derivative back to a matrix expression. The algorithm returns an array expression whose axes have been contracted, diagonalized and permuted; to recover the matrix form, we must recognize which matrix operations those manipulations represent.
Matrix derivation may produce array expressions that lack a closed-form matrix counterpart, so the result cannot always be rewritten in matrix form.
When the contractions line up in clear pairs that mirror a sequential pattern, spotting the equivalent matrix multiplication or trace forms is straightforward. Generally, open two-paired contraction lines are matrix multiplications:

\[\sum_{j} A_{ij} B_{kj} \Longrightarrow \mathbf{A} \mathbf{B}' \]

while closed two-paired contraction lines are traces:

\[\sum_{ij} A_{ij} B_{ij} \Longrightarrow \mbox{tr}\Big(\mathbf{A} \mathbf{B}'\Big) \]
Matrix multiplication is the most frequent pattern, but traces, diag-expansions, and Hadamard products may also appear.
Repeated indices without summation can be identified as Hadamard products:

\[A_{ij} B_{ij} \Longrightarrow \mathbf{A} \circ \mathbf{B}\]

This is often the case with ArrayDiagonal operators. A double single-index contraction on both axes of the same matrix or matrix expression sums all of its elements: supposing that matrix \(\mathbf{M}\) has shape \([m \times n]\), then

\[\sum_{i,j} M_{ij} \Longrightarrow \mathbf{1}_{[1\times m]} \, \mathbf{M} \, \mathbf{1}_{[n\times 1]}\]

where \(\mathbf{1}\) denotes a matrix of ones.
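The conversion function recognizes these patterns (function and import names as in recent SymPy versions):

>>> from sympy import MatrixSymbol
>>> from sympy.tensor.array.expressions import ArrayTensorProduct, ArrayContraction, convert_array_to_matrix
>>> A = MatrixSymbol("A", 3, 3)
>>> B = MatrixSymbol("B", 3, 3)
>>> convert_array_to_matrix(ArrayContraction(ArrayTensorProduct(A, B), (1, 2)))
A*B
>>> convert_array_to_matrix(ArrayContraction(ArrayTensorProduct(A, B), (0, 3), (1, 2)))
Trace(A*B)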
Conversions from array to matrix with dimensional reduction¶
The simplification step tries to drop any singleton or diagonal dimensions; if this reduces the array to two dimensions, an equivalent matrix expression is sought. A handful of tricks is used to collapse the array dimensions:
- The tensor product of two matrix-vectors \(\mathbf{a}\) and \(\mathbf{b}\), of shape \([k \times 1]\) each, \(\mathbf{a} \otimes \mathbf{b}\), can be turned into a matrix multiplication over their trivial dimension: \(\mathbf{a} \cdot \mathbf{b}'\). Indeed

\[\mathbf{a} \otimes \mathbf{b} = a_{i0} b_{j0} \Longrightarrow \sum_{k=0}^0 a_{ik} b_{jk} = \mathbf{a} \mathbf{b}' \]

This has shrunk the array from \([k \times 1 \times k \times 1]\) to \([k \times k]\) by squeezing out the singleton dimensions.

- Similar to the previous point, but more complex:

\[\mathbf{a} \otimes \mathbf{b} \otimes \mathbf{x}'\mathbf{x} \Longrightarrow \mathbf{a} \mathbf{x}' \mathbf{x} \mathbf{b}' \]

- If one term of the tensor product has shape \([1 \times 1]\), the result may be expressed as a Kronecker product \(\boxtimes\) between matrices:

\[\mathbf{x}' \mathbf{x} \otimes \mathbf{A} \Longrightarrow \left( \mathbf{x}' \mathbf{x} \right) \boxtimes \mathbf{A} \]

This has shrunk the array from \([1 \times 1 \times k \times k]\) to \([k \times k]\).

- The triple contraction of two matrices and a matrix-vector may be reinterpreted in terms of matrix multiplication:

\[\sum_{j} A_{ij} b_{j0} C_{jk} \Longrightarrow \mathbf{A}\, \mbox{diag}(\mathbf{b}) \, \mathbf{C}\]

This has shrunk \([k\times 1 \times k]\) to \([k \times k]\).

- Diagonalization of a matrix and a matrix-vector can be turned into matrix multiplications:

\[A_{ij} b_{j0} = \sum_{k} A_{ik} \, \mbox{diag}(\mathbf{b})_{kj} \Longrightarrow \mathbf{A} \, \mbox{diag}(\mathbf{b})\]

This has shrunk \([k \times k \times 1]\) to \([k \times k]\).

- Multi-index contractions that span up to two matrices and any number of vectors reduce to ordinary matrix products once the vectors are promoted to diagonal matrices via diag( ):

\[\sum_{j} A_{ij} b_{j0} c_{j0} d_{j0} E_{jk} \Longrightarrow \mathbf{A} \, \mbox{diag}(\mathbf{b}) \, \mbox{diag}(\mathbf{c}) \, \mbox{diag}(\mathbf{d}) \, \mathbf{E} \]

Notice that an arbitrary number of matrix-vectors can be added; each contributes to the final expression with a singleton dimension, which can be dropped.

- Multi-index contractions involving identity matrices may create redundant diagonal axes that can be simplified away:

\[\begin{split}\left[\begin{matrix}\left[\begin{matrix}{A}_{00} & 0 & 0\\0 & {A}_{01} & 0\\0 & 0 & {A}_{02}\end{matrix}\right] & \left[\begin{matrix}{A}_{10} & 0 & 0\\0 & {A}_{11} & 0\\0 & 0 & {A}_{12}\end{matrix}\right] & \left[\begin{matrix}{A}_{20} & 0 & 0\\0 & {A}_{21} & 0\\0 & 0 & {A}_{22}\end{matrix}\right]\end{matrix}\right]\Longrightarrow \left[\begin{matrix}{A}_{00} & {A}_{01} & {A}_{02}\\{A}_{10} & {A}_{11} & {A}_{12}\\{A}_{20} & {A}_{21} & {A}_{22}\end{matrix}\right]\end{split}\]

this is equivalent to:

\[\sum_{j} A_{ij} I_{jk} I_{jl} \Longrightarrow \mathbf{A}\]

This collapse has shrunk a non-singleton dimension, from \([k \times k \times k]\) to \([k \times k]\). Unlike squeezing singleton dimensions, this simplification cannot be done by a simple array reshaping. Similarly,

\[\begin{split}\left[\begin{matrix}\left[\begin{matrix}{A}_{00} & 0 & 0\\{A}_{01} & 0 & 0\\{A}_{02} & 0 & 0\end{matrix}\right] & \left[\begin{matrix}0 & {A}_{10} & 0\\0 & {A}_{11} & 0\\0 & {A}_{12} & 0\end{matrix}\right] & \left[\begin{matrix}0 & 0 & {A}_{20}\\0 & 0 & {A}_{21}\\0 & 0 & {A}_{22}\end{matrix}\right]\end{matrix}\right] \Longrightarrow \left[\begin{matrix}{A}_{00} & {A}_{10} & {A}_{20}\\{A}_{01} & {A}_{11} & {A}_{21}\\{A}_{02} & {A}_{12} & {A}_{22}\end{matrix}\right] \end{split}\]

which is equivalent, in explicit form, to:

\[\sum_{j} I_{ij} A_{kj} I_{jl} \Longrightarrow \mathbf{A}'\]

This last expression has three axes, but two of them (both tied to the contracted index \(j\) by an identity matrix) are diagonal: with the remaining index held fixed, they form a diagonal matrix, so the expression is nonzero only when those two indices coincide. One of those axes can therefore be dropped without loss of information.
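For instance, the first trick is applied automatically by the conversion function (a minimal check; output as produced by recent SymPy versions):

>>> from sympy import MatrixSymbol, symbols
>>> from sympy.tensor.array.expressions import ArrayTensorProduct, convert_array_to_matrix
>>> k = symbols("k")
>>> a = MatrixSymbol("a", k, 1)
>>> b = MatrixSymbol("b", k, 1)
>>> convert_array_to_matrix(ArrayTensorProduct(a, b))
a*b.T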