July 31, 2025
There seems to be some uncertainty in the computer science community about exactly what a tensor is. The only things involved are functions and their composition. This short note gives names to functions then composes them. The mathematical definitions can be translated directly into any computer language that allows functions to be defined and called. We cover topics described in The Tensor Cookbook but have no need for their graphical notations that do not have a direct translation to computer implementation.
Everything in classical mathematics is a set. We assume the reader knows what a set is and use \in to denote set membership. The set exponential of the sets A and B is B^A = \{f\colon A\to B\}, the set of all functions from A to B. We also write this as \{A\to B\}. The domain of a function f\colon A\to B is \operatorname{dom}f = A and the codomain is \operatorname{cod}f = B.
The cartesian product of sets A and B is {A\times B = \{(a,b)\mid a\in A, b\in B\}}, the set of ordered pairs of elements from the two sets.
Exercise. (Kuratowski) Show we can identify the ordered pair (a,b) with the set \{\{a\},\{a,b\}\}.
Hint: An ordered pair is defined by (a,b) = (c,d) if and only if a = c and b = d. Show \{\{a\}, \{a,b\}\} = \{\{c\}, \{c,d\}\} if and only if a = c and b = d.
Exercise. A function can be identified with its \operatorname{graph}f = \{(a, f(a))\mid a\in A\}\subseteq A\times B.
Hint: f(a) = b if and only if (a,b)\in\operatorname{graph}f.
If f\colon A\to B we say a,a'\in A are equivalent if f(a) = f(a'). The equivalence class of a\in A is [a] = \{a'\in A\mid f(a) = f(a')\}. The kernel of a f is \ker f = \{[a]\mid a\in A\}.
The kernel of a function is a partition of its domain.
Exercise. Show the union of sets in \ker f is A.
Hint. Show \cup_{a\in A} [a] = A using a\in[a] for a\in A.
Exercise. Show the elements of \ker f are either disjoint or equal.
Hint. If x\in [a]\cap[a'] then f(x) = f(a) and f(x) = f(a') so f(a) = f(a'). If y\in [a] then f(y) = f(a) = f(a') so y\in [a']. Show the converse also holds.
The last two exercises show \ker f is a partiion of A. Partitions are a way to loosen the notion of equality to equivalence. They are also a way of modeling partial information. Full information is knowing the element a\in A, partial information is only knowing [a], no information is only knowing a\in A.
Exercise. Show the parition \{\{a\}\mid a\in A\} represents full information and the partition \{A\} represents no information.
A function f\colon A\to B is one-to-one, or injective, if [a] is a singleton (has exactly one element) for all a\in A.
Exercise. If f\colon A\to B is injective then f(a) = f(a') implies a = a'.
Define [f]\colon\ker A\to B by [a]\mapsto f(a).
Exercise. Show [f] is well-defined.
Hint. Show if [a] = [a'] then f(a) = f(a').
Exercise. Show [f] is injective.
The range of f\colon A\to B is \operatorname{ran}f = \{f(a)\mid a\in A\}. It is onto, or surjective if \operatorname{ran}f = B.
Exercise. If f\colon A\to B is surjective then for every b\in B there exists at least one a\in A with f(a) = b.
Set exponential, domain, codomain…
The composition of two compatible functions is a function \circ\colon\{A\to B\}\times\{B\to C\}\to\{A\to C\} defined by \circ(f,g)(a) = g(f(a)), a\in A. This is also written as g\circ f or simply gf when it is clear composition is intended. Note the order of f and g are reversed. If \operatorname{cod}f = \operatorname{dom}g then the composition {\circ(f,g) = g\circ f = gf} exists. We could write B^A as \{B\leftarrow A\} and define composition as \circ\colon\{C\leftarrow B\}\times\{B\leftarrow A\}\to\{C\leftarrow A\} by \circ(g,f)(a) = g(f(a)) so we don’t have to reverse the order of f and g, but we resist this temptation and stick to standard mathematical notation.
Exercise. Show composition of functions is associative.
Hint: If f\colon A\to B, g\colon B\to C, and h\colon C\to D show h\circ(g\circ f) = (h\circ g)\circ f.
This allows us to write h\circ g\circ f unambiguosly.
Given a set A define the identity function 1_A\colon A\to A by 1_A(a) = a for a\in A.
Exercise. If f\colon A\to B is a function then f1_A = f and 1_Bf = f.
The last two exercises shows sets and functions are the objects and arrows of a category denoted by \text{\bf{Set}}.
Exercise. Show if f\colon A\to B is injective then there exists g\colon B\to A with gf = 1_A.
Hint: If f(a) = b then b\in[a]. Since [a] is a singleton we can define g(b) = a.
Exercise. Show if f\colon A\to B is surjective then there exists g\colon B\to A with fg = 1_B.
Hint: For every b\in B there exists a\in A with f(a) = b. Let g(b) = a. In general, there are many right inverses.
These two exercises show it is possible to define injective and surjective without mentioning elements of sets.
Currying and uncurrying provides a connection between set exponential and cartesian product. The set {\{A\times B\to C\}} is in one-to-one correspondence with the set {\{A\to\{B\to C\}\}}. Given {f\in\{A\times B\to C\}} define {\operatorname{curry}f\colon A\to\{B\to C\}} by {((\operatorname{curry}f)(a))(b) = f(a,b)}. The inverse is uncurrying. Given {g\in\{A\to\{B\to C\}\}} define {\operatorname{uncurry}g\colon A\times B\to C} by {(\operatorname{uncurry}g)(a,b) = (g(a))(b) = (ga)b}.
Exercise. Show \operatorname{uncurry}(\operatorname{curry}f) = f and \operatorname{curry}(\operatorname{uncurry}g) = g.
Given a function f\colon A\to B and a\in A then we can apply the function to a and get f(a)\in B. This is called evaluation and we define the function \operatorname{eval}\colon\{A\to B\}\times A\to B defined by \operatorname{eval}(f,a) = f(a), f\in\{A\to B\}, a\in A.
If f\colon A\times B\to C an a\in A then we can define a function from B to C by b\mapsto f(a,b). Specifying some arguments for a multi-argument function to make it a function on the remaining arguments is called partial evaluation.
A relation on sets A and B is a subset R\subseteq A\times B. We write aRb whenever (a,b)\in R. The left coset is Rb = \{a\in A\mid aRb\} and the right coset iis aRb = \{b\in B\mid aRb\}. If for every a\in A the right coset aR is a singleton then R is a function and we write R(a) = b where b\in B is the unique element of aR.
The composition of relations R\subseteq A\times B and S\subseteq B\times C is SR = \{a\in A, c\in C\mid aRb\text{ and }bSc\text{
for some }b\in B\}. This is used in the definition of the various
JOIN
commands in SQL.
The transpose of a relation R\subseteq A\times B is R^T\subseteq B\times A where bR^Ta if and only if aRb.
Exercise. Show (RS)^T = S^T R^T.
The relation equal is \Delta\subseteq\{(a,a)\mid a\in A\}. It is a left and right identity for composition.
Exercise. Show \Delta R, R, and R\Delta are all equal.
Relations on A\times A are classified by properties they satisfy
Property | Definition | Formula |
---|---|---|
reflexive | aRa is always true. | \Delta\subseteq R |
irreflexive | aRa is always false. | \Delta\cap R = \emptyset |
symmetric | aRb implies bRa. | R^T = R |
antisymmetric | aRb and bRa imply a = b. | \Delta\subseteq R\cap R^T |
transitive | aRb and bRc imply aRc. | R^2\subseteq R |
Exercise. Prove the formulas from the definitions.
Exercise. Show R is symmetric if and only if aR = Ra, a\in A.
Hint. If b\in aR then aRb so bRa if R is symmetric. This shows b\in Ra. Similarly for a\in Rb. If aR = Ra, a\in A, then aRb implies a\in Rb = bR so bRa.
The Kleen plus of a relation R\subseteq A\times A is R^+ = \cup_{n>0} R^n where R^n is R composed with itself n times.
Exercise. Show the Kleen plus of any relation is transitive.
Hint. Note aR^+b if and only if aR^nb for some n>0 and R^nR^m = R^{n+m}.
An equivalence relation is reflexive, symmetric, and transitive.
Exercise. Show \Delta (equal) is an equivalence relation.
Hint: Recall \Delta = \{(a,a)\mid a\in A\}.
Given an equivalence relation R on A define the equivalence class of a\in A by [a] = \{b\in A\mid aRb\}. Note [a] is the left coset aR and aR = Ra since R is symmetric.
Exercise. If R is an equivalence relation show the set of equivalence classes \{[a]\mid a\in A\} is a partition of A.
Hint. Show \cup_{a\in A}[a] = A and if $[a]= then [a] = [b].
Exercise. If \{A_i\}_{i\in I} is a partition of A and we define R by aRb if and only if there exists i\in I with a,b\in A_i then R is an equivalence relation.
Every equivalence relation on A\times A correponds to a partition of A. Every partition of A corresponds to an equivalence relation on A\times A.
If the base of a set exponential is a field then \bm{F}^I is a vector space for any index set I. Typically \bm{F} is the real numbers \bm{R}, the complex numbers \bm{C}, the rational numbers \bm{Q}, or integers modulo p \bm{Z}_p where p is prime. If I = \{1,\ldots,n\} or I = \{0,\ldots,n-1\} we write \bm{F}^{\bm{n}} when the base index of 1 or 0 is understood. Scalar multiplication and vector addition are defined pointwise: (ax)(i) = a(x(i)) and (x + y)(i) = x(i) + y(i) for a\in\bm{F}, x,y\in\bm{F}^I. The additive identity is \bm{0} = 0_I defined by \bm{0}(i) = 0, i\in I. The additive inverse of x\in\bm{F}^I is -x defined by (-x)(i) = -x(i), i\in I.
Exercise. Show \bm{F}^I satisfies the vector space axioms.
The standard basis of \bm{R}^I is (e_i)_{i\in I} where e_i(j) = \delta_{ij} is the Kronecker delta. We write e_i^I to indicate e_i\in\bm{F}^I.
Exercise. Show x = \sum_{i\in I} x(i) e_i when I is finite.
Hint. If x,y\in\bm{F}^I then x = y if and only if x(j) = y(j) for all j\in I. What is x(j)?
The dot product of x,y\in\bm{F}^I when I is finite is x\cdot y = \sum_{i\in I} x(i) y(i). If \bm{F} is the complex numbers \bm{C} then x\cdot y = \sum_{i\in I} x(i) \overline{y(i)}. Recall if z = x + iy\in\bm{C} where x,y\in\bm{R} then the complex conjugate of z is \overline{z} = x - iy.
Exercise. Show for x\in\bm{F}^I that x^*\colon\bm{F}^I\to\bm{F} defined by {x^*y = x\cdot y} is linear if \bm{F} is not \bm{C}.
Hint: Show x^*(ay) = ax^*y and x^*(y + z) = x^*y + x^*z, a\in\bm{F}, x,y,z\in\bm{F}^I.
Exercise. _Show for y\in\bm{F}^I that y^*\colon\bm{F}^I\to\bm{F} defined by y^*x = x\cdot\overline{y} is linear (or conjugate linear if \bm{F} is \bm{C}).
Hint: Show y^*(ax) = ay^*x and y^*(x + w) = y^*x + y^*w, a\in\bm{F}, w,x,y\in\bm{F}^I and y^*(ax) = ay^*x and y^*(x + w) = y^*x + y^*w
An abstract vector space is only required to satisfy the vector space axioms. It does not have a standard basis that can be used to define the dot product.
The norm of x\in\bm{F}^I is \|x\| = \sqrt{x\cdot x}.
Exercise. Show \|x\| \ge 0, \|x\| = 0 implies x = \bm{0}, \|ax\| = |a|\|x\| and \|x + y\| \le \|x\| + \|y\|.
Exercise (Cauchy-Schwartz) Show |x\cdot y| \le \|x\| \|y\| and if equality holds then ax = by for some a,b\in\bm{F}.
Hint: Use 0\le\|x - ty\|^2 for t\in\bm{F} and facts about quadratic equations you learned in high school.
Define the distance from x to y by d(x,y) = \|x - y\|.
Exercise. Show d(x,y) = 0 implies x = y, d(x,y) = d(y,x) and d(x,y) + d(y,z) \ge d(x,z).
This shows d is a metric that defines a topology on \bm{F}^I.
We can identify \bm{F}^I with the cartesian product of I copies of the field \bm{F}. The element (x_i)_{i\in I} in the cartesian product \Pi_{i\in I}\bm{F} corresponds to the function x\in\bm{F}^I by x(i) = x_i, i\in I.
A simple but quite useful observation is that if \sigma\colon J\to I is any function then \circ\sigma\colon\bm{F}^I\to\bm{F}^J by x\mapsto x\circ\sigma, x\in\bm{F}^I.
Exercise. Show \circ\sigma(ax) = a(\circ\sigma(x)) and \circ\sigma(x + y) = \circ\sigma(x) + \circ\sigma(y) for a\in\bm{R} and x,y\in\bm{F}^I.
Hint: Show (ax)\sigma = a(x\sigma) and (x + y)\sigma = x\sigma + y\sigma on \bm{F}^J.
If i\in I and {\sigma_i\colon\{i\}\to I} is the inclusion {i\mapsto i} then {\circ\sigma_i\colon\bm{F}^I\to\bm{F}^{\{i\}}}. We can identify \bm{F}^{\{i\}} with \bm{F} by (x)\leftrightarrow x and get a map {e_i^* = e_i^{I*}\colon\bm{F}^I\to\bm{F}}.
Exercise. Show x = \sum_{i\in I}e_i^*(x) e_i, x\in\bm{F}^I, if I is finite.
Hint: Show e_i^*(x) = x(i).
A subspace is a subset V\subseteq\bm{R}^I that is closed under scalar multiplication and vector addition. There are two trivial subspaces.
Exercise. Show \{0\} and \bm{F}^I are subspaces of \bm{F}^I.
Exercise. Show if v\in\bm{F}^I then \{av\mid a\in\bm{F}\} is a subspace.
Hint: b(av) = (ba)v and av + bv = (a + b)v for a,b\in\bm{F}.
Given v_1,\ldots,v_k\in\bm{F}^I define their span as the set of linear combinations \operatorname{span}\{v_j\}_{j=1}^k = \{\sum_{j=1}^k a_j v_j\mid a_f\in\bm{F}\}.
Exercise. Show the span is a subspace.
Hint. Use a(\sum_j a_j v_j) = \sum_j aa_j v_j and \sum_j a_j v_j + \sum_j b_j v_j = \sum_j (a_j + b_j) v_j.
The vectors v_1,\ldots,v_k\in\bm{F}^I are independent if for a_1,\ldots,a_k\in\bm{F}, \sum_j a_j v_j = 0 implies a_j = 0 for all j.
The vectors v_1,\ldots,v_k\in\bm{F}^I are dependent (not independent) if there exist a_1,\ldots,a_k\in\bm{F}, not all 0, with \sum_j a_j v_j = 0. If a_j\not=0 then v_j = -\sum_{i\not=j} (a_i/a_j) v_i.
A collection of vectors v_1,\ldots,v_n\in\bm{F}^I are a basis if they span \bm{F}^I and are independent.
Exercise Show if \{v_i\}_{i=1}^n is a basis of \bm{F}^I then for every x\in\bm{F}^I there exist unique x_i, 1\le i\le n, with x = \sum_{i=1}^n x_i v_i.
Hint. The vectors \{v_i\}_{i=1}^n span \bm{F}^I so the x_i\in\bm{F} exist. Use independence to show they are unique.
This defines a bijection from \bm{F}^I to \bm{F}^{\bm{n}}. This map is also a linear operator.
Any function T\colon\bm{F}^I\to\bm{F}^J satisfying T(ax) = a(Tx) and T(x + y) = Tx + Ty for a\in\bm{F}, x,y\in\bm{F}^I is a linear operator. We use \mathcal{L}(\bm{F}^I, \bm{F}^J) and [\bm{R}^I\to\bm{F}^J] to denote the space of all linear operators. Linear operators also are a vector space with scalar multiplication (aT)(x) = aT(x) and vector addition (T + U)(x) = Tx + Ux, a\in\bm{F}, T,U\in[\bm{F}^I\to\bm{F}^J], x\in\bm{F}^I.
Exercise. Prove T\bm{0}= \bm{0}.
Hint: Use x + x = x implies x=\bm{0}.
The kernel of a linear operator T\colon\bm{F}^I\to\bm{F}^J is {\ker T = \{x\in\bm{F}^I\mid Tx = 0\}\subseteq\bm{F}^J}.
Exercise. Show \ker T is a subspace of \bm{F}^I.
Hint. Show if Tx = \bm{0} then T(ax) = \bm{0} and that Tx = \bm{0}, Ty = \bm{0} implies T(x + y) = $.
Exercise. Show if Tx = Ty then x - y\in\ker T.
Hint. Use T is linear.
Exercise. Show if \ker T = \{\bm{0}\} then T is injective.
If T is injective and surjective then T is an isomorphism, a bijection that preserves the vector space structure.
The matrix of a linear operator T\in[\bm{F}^I\to\bm{F}^J] is [T]\in\bm{F}^{I\times J} defined by [t_{ij}] = [T](i,j) = e_j^{J*} Te_i^I, i\in I, j\in J.
Since I\times J is a set \bm{F}^{I\times J} is a vector space and [T] = \sum_{i,j} t_{ij} e_{ij} where (e_{ij})_{i\in I, j\in J} is the standard basis of \bm{F}^{I\times J}.
Exercise. Show T\mapsto [T] is linear.
Hint. Show [aT] = a[T] and [T + U] = [T] + [U].
Exercise. If T\colon\bm{F}^I\to\bm{F}^J and U\colon\bm{F}^J\to\bm{F}^K then [UT](i,k) = \sum_{j\in J} t(i,j) u(j,k).
Hint. Use [UT](i,k) = e_k^*UTe_i and Te_i = \sum_j e_j^*(Te_i)e_j.
We have no need for the Einstein summation convention…
Lemma. If T\colon\bm{F}^I\to\bm{F}^J is injective then |I| \le |J|.
Lemma. If T\colon\bm{F}^I\to\bm{F}^J is surjective then |I| \ge |J|.
The dual of \bm{R}^I is \bm{F}^* = [\bm{F}^I\to\bm{F}], the set of linear functionals from a vector space to its underlying field. We have already seen an example of this e_i^*\colon\bm{F}^I\to\bm{F}. If I is finite the linear transformation *\colon\bm{F}\to\bm{F}^* defined by e_i\mapsto e_i^*, i\in I, is an isomorphism.
It seems to be popular to say “a tensor is something that transforms like a tensor.” A more useful definition is a tensor is a vector space indexed by a cartesian product. The number of products is the rank of the tensor.
For example, \bm{F}^{I\times J} is a rank 2 tensor. As we have just seen it corresponds to a linear operator in [\bm{F}^I\to\bm{F}^J]. Define a function with the slightly odd name superscript T by {}^T\colon J\times I\to I\times J by (j,i)\mapsto (i,j). Note \circ{}^T\colon\bm{F}^{J\times I}\to\bm{F}^{I\times J}.
Exercise. Show if T\colon\bm{F}^I\to\bm{F}^J with matrix [t_{ij}] then the matrix of [T]\circ^T is [t_{ji}].
A tensor is an element of \bm{F}^{\bm{n_1}}\times\cdots\times\bm{n_r} where r is the rank of the tensor.
[V\to V]\to\mathcal{P} from linear operators to characteristic polynomial is continuous.
v\wedge w can be thought of as the oriented parallelogram determined by v and w.
Exercise. _Show v\wedge w = -w\wedge v.
The Kleene star of a set S is S^* = \cup_{n\ge0} S^n so an element s\in S^*\in S^n for some n. The Kleene plus of a set S is S^+ = \cup_{n > 0} S^n. Define ?\colon S^*\to\bm{B} by ?s to be false$ if s\in S^0 and true otherwise. Define *\colon S^+\to S by *s = s_1 if s = (s_1, \ldots, s_n). Define +\colon S^+\to S^* by +(s_1, s_2, \ldots, s_n) = (s_2, \ldots, s_n).