Vector Space

Keith A. Lewis

June 10, 2025

Abstract
A mathematical sweet spot

You are probably familiar with the vector space of n-tuples of real numbers \boldsymbol{R}^n = \{x = (x_1,\dots,x_n)\mid x_j\in\boldsymbol{R}, 1\le j \le n\}. Scalar multiplication by a\in\boldsymbol{R} is a(x_1,\ldots,x_n) = (ax_1,\ldots,ax_n) and vector addition is (x_1,\dots,x_n) + (y_1,\dots,y_n) = (x_1 + y_1,\dots,x_n + y_n). The standard basis is {e_i = (0,\ldots,1,\ldots,0)}, {1\le i\le n}, where all entries are 0 except the i-th which is 1.

We can write this without \ldots’s using projections {\pi_i^n\colon\boldsymbol{R}^n\to\boldsymbol{R}} where {\pi_i(x) = x_i}. We write {\pi_i^n} as \pi_i when n is evident from the context. Note {x = y} if and only if {\pi_i(x) = \pi_i(y)}, {1\le i\le n}, for {x,y\in\boldsymbol{R}^n}.
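Here is a minimal Python sketch of these definitions (the names smul, vadd, and pi are ours); the asserts anticipate the next exercise:

```python
# A minimal sketch of R^n as tuples; the helper names are ours.
def smul(a, x):
    """Scalar multiplication a(x_1,...,x_n) = (a x_1,..., a x_n)."""
    return tuple(a * xi for xi in x)

def vadd(x, y):
    """Vector addition (x_1,...,x_n) + (y_1,...,y_n), componentwise."""
    return tuple(xi + yi for xi, yi in zip(x, y))

def pi(i, x):
    """Projection pi_i(x) = x_i, with 1 <= i <= n."""
    return x[i - 1]

x, y = (1, 2, 3), (4, 5, 6)
assert pi(2, smul(10, x)) == 10 * pi(2, x)       # pi_i(ax) = a pi_i(x)
assert pi(2, vadd(x, y)) == pi(2, x) + pi(2, y)  # pi_i(x + y) = pi_i(x) + pi_i(y)
```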

Exercise. Show \pi_i(ax) = a\pi_i(x) and \pi_i(x + y) = \pi_i(x) + \pi_i(y) for a\in\boldsymbol{R}, x,y\in\boldsymbol{R}^n.

Projections preserve the linear structure of \boldsymbol{R}^n.

Exercise. Show the standard basis satisfies \pi_i(e_j) = \delta_{ij}.

Hint. Recall the Kronecker delta \delta_{ij} is 1 if i = j and 0 if i\not=j.

Every element of \boldsymbol{R}^n is a linear combination of standard basis vectors.

Exercise. Show x = \sum_j x_j e_j for x\in\boldsymbol{R}^n.

Hint. Apply \pi_i to each side and use the definition of the Kronecker delta.

Define \boldsymbol{0}\in\boldsymbol{R}^n by \pi_i(\boldsymbol{0}) = 0 for 1\le i\le n.

Exercise. Show 0x = \boldsymbol{0} for x\in\boldsymbol{R}^n.

Exercise. Show x + \boldsymbol{0} = x = \boldsymbol{0} + x for all x\in\boldsymbol{R}^n.

This shows \boldsymbol{0} is the identity element for vector addition.

Exercise. Show (a + b)x = ax + bx and a(x + y) = ax + ay for a,b\in\boldsymbol{R} and x,y\in\boldsymbol{R}^n.

Hint. Consider \pi_i((a + b)x) and \pi_i(a(x + y)).

This establishes the distributive laws connecting scalar multiplication and vector addition.

Exercise. Show -x = (-1)x satisfies x + (-x) = \boldsymbol{0}.

Hint. Consider x + (-x) = 1x + (-1)x = (1 + (-1))x = 0x = \boldsymbol{0}.

This shows -x is the additive inverse of x\in\boldsymbol{R}^n.

Exercise. Show a(bx) = (ab)x for a,b\in\boldsymbol{R}, x\in\boldsymbol{R}^n.

Hint: Use the fact that multiplication is associative in \boldsymbol{R}, so \pi_i(a(bx)) = a\pi_i(bx) = a(b\pi_i(x)) = (ab)\pi_i(x).

Vector Space

There exist vector spaces beyond the special case of \boldsymbol{R}^n. A more powerful approach is to extract the essential properties of \boldsymbol{R}^n as abstract axioms, giving a clearer and more general perspective. A vector space is any set V equipped with a scalar multiplication and a vector addition satisfying the rules established in the preceding exercises. Specifically, vector addition makes V an abelian (commutative) group, and scalar multiplication satisfies the distributive laws a(x + y) = ax + ay and (a + b)x = ax + bx, the associativity condition a(bx) = (ab)x, and the identity property 1x = x, for all a, b \in \boldsymbol{R} and x, y \in V.

A subspace is a subset W\subseteq V that is also a vector space. Define the notation \boldsymbol{R}W = \{ax\mid a\in\boldsymbol{R}, x\in W\} and W + W = \{x + y\mid x,y\in W\}.

Exercise. Show if W\subseteq V, \boldsymbol{R}W\subseteq W, and W + W\subseteq W then W is a vector space.

Hint: The assumptions guarantee scalar multiplication and vector addition are well-defined on W. The other axioms follow from V being a vector space.

The span of a finite collection \{v_i\}, i\in I, is the set of all linear combinations \operatorname{span}\{v_i\} = \{\sum_{i\in I} a_i v_i\mid a_i\in\boldsymbol{R}\}.

Exercise. Show \operatorname{span}\{v_i\}_{i\in I} is a subspace.

Hint: Use a(\sum_i a_i v_i) = \sum_i a a_i v_i and (\sum_i a_i v_i) + (\sum_i b_i v_i) = \sum_i (a_i + b_i) v_i.

Exercise. If v_1,\ldots,v_m are independent and v is not in the span of the v_i then v_1,\ldots,v_m,v are independent.

Hint. Suppose a v + \sum_i a_i v_i = \boldsymbol{0}. If a = 0 then a_i = 0 for all i. If a\not=0 then v is in the span of the v_i.

This shows any finite-dimensional subspace is the span of independent vectors.

A vector space is a set V with scalar multiplication \boldsymbol{R}\times V\to V and vector addition V\times V\to V.

The span of a collection of vectors v_i\in\boldsymbol{R}^n, i\in I, is the set of all finite linear combinations \sum_{i\in I} a_i v_i, a_i\in\boldsymbol{R}. A collection of vectors is independent if \sum_i a_i v_i = \boldsymbol{0} implies a_i = 0 for all i. A basis of \boldsymbol{R}^n is a collection of vectors that are independent and span \boldsymbol{R}^n.

Exercise. If (v_i)_{i\in I} are dependent then there exists j\in I with v_j = \sum_{i\not=j} b_i v_i.

Hint. Dependent means not independent. If \sum_i a_i v_i = \boldsymbol{0} but not all a_i = 0 then there exists j\in I with a_j\not=0. Show b_i = -a_i/a_j for i\not=j.

Vector Space

\boldsymbol{R}^n is a special case of the vector space of all functions from a set I to \boldsymbol{R}, \boldsymbol{R}^I = \{x\colon I\to\boldsymbol{R}\}, where I = \{1,\ldots,n\}. Scalar multiplication is {(ax)(i) = ax(i)} and vector addition is {(x + y)(i) = x(i) + y(i)} for i\in I.
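As a sketch of this point of view, vectors in \boldsymbol{R}^I can be modeled as Python functions with the operations defined pointwise (the names are ours):

```python
# Vectors in R^I as Python functions I -> float; operations are pointwise.
def smul(a, x):
    return lambda i: a * x(i)

def vadd(x, y):
    return lambda i: x(i) + y(i)

x = lambda i: float(i)   # the vector with x(i) = i
y = lambda i: 1.0        # the constant vector 1
assert vadd(smul(2.0, x), y)(3) == 7.0   # (2x + y)(3) = 2*3 + 1
```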

Examples

A linear ordinary differential equation constrains a function y = y(x) by requiring its derivatives satisfy \sum_{i=0}^n a_i(x)\,d^iy/dx^i = b(x). If b(x) = 0 the equation is homogeneous and we call its solutions homogeneous solutions.

Exercise. The set of homogeneous solutions of a linear ODE forms a vector space.

Hint. If y is a homogeneous solution then so is ay for a\in\boldsymbol{R}. If y and z are homogeneous solutions then so is y + z.

If, by hook or by crook, we can find a single inhomogeneous solution y_b then we can find all solutions.

Exercise. Show every solution to the inhomogeneous equation has the form y_b + z where z is a homogeneous solution.

Hint. If y is any solution to the inhomogeneous equation then y_b - y is a homogeneous solution.

For first order linear ordinary differential equations we can assume the coefficient of dy/dx is a_1(x) = 1 and normalize to dy/dx + a(x) y = b(x).

Exercise. If A'(x) = a(x) then d(e^{A(x)}y)/dx = e^{A(x)}(dy/dx + a(x)y).

Multiplying both sides of dy/dx + a(x)y = b(x) by the integrating factor e^{A(x)} gives d(e^{A(x)}y)/dx = e^{A(x)}b(x), so e^{A(x)}y = \int e^{A(x)} b(x)\,dx + C for some constant C.

Exercise. Show every solution of dy/dx + a(x)y = b(x) has the form y = e^{-A(x)}\bigl(\int e^{A(x)} b(x)\,dx + c\bigr) for some constant c.
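A computer algebra system can spot-check this formula; here is a minimal sketch using sympy (assumed available), with a(x) = 2x and b(x) = x chosen only for illustration:

```python
# Verify the integrating-factor formula on an example with sympy.
import sympy as sp

x = sp.symbols('x')
y = sp.Function('y')
a, b = 2*x, x                        # a(x) = 2x, so A(x) = x^2; b(x) = x
sol = sp.dsolve(y(x).diff(x) + a*y(x) - b, y(x))
print(sol)  # Eq(y(x), C1*exp(-x**2) + 1/2): e^{-A}(integral of e^{A} b dx + C)
```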

In the case of second order linear ODEs there is a wonderland of solutions.

Linear Transformation

A linear transformation is a function T\colon\boldsymbol{R}^n\to\boldsymbol{R}^m that respects the vector space structure: T(ax) = a(Tx) and T(x + y) = Tx + Ty for a\in\boldsymbol{R}, x,y\in\boldsymbol{R}^n.

Exercise. Show if T is linear then T\boldsymbol{0} = \boldsymbol{0}.

Hint. Note T\boldsymbol{0} + T\boldsymbol{0} = T(\boldsymbol{0} + \boldsymbol{0}) = T\boldsymbol{0} and add -T\boldsymbol{0} to both sides.

Exercise. If T is linear show T(ax + y) = aTx + Ty, a\in\boldsymbol{R}, x,y\in\boldsymbol{R}^n.

Proof: Note T(ax + y) = T(ax) + Ty = aTx + Ty.

Exercise. Show T is linear if T(ax + y) = aTx + Ty for a\in\boldsymbol{R}, x,y\in\boldsymbol{R}^n.

Hint. Take y = \boldsymbol{0} and a = 1, noting 1x = x.

The collection of all linear transformations from \boldsymbol{R}^n to \boldsymbol{R}^m also forms a vector space. Scalar multiplication is (aT)x = a(Tx) and vector addition is (T + U)x = Tx + Ux for a\in\boldsymbol{R}, T,U\colon\boldsymbol{R}^n\to\boldsymbol{R}^m linear, and x\in\boldsymbol{R}^n. This is our first inkling that there is more to vector spaces than just \boldsymbol{R}^n.

The matrix of a linear transformation T\colon\boldsymbol{R}^n\to\boldsymbol{R}^m under the standard basis is [t_{ij}] where the t_{ij}\in\boldsymbol{R}, 1\le i\le n, 1\le j\le m, are defined by Te^n_i = \sum_j t_{ij}e^m_j. This follows from Te^n_i\in\boldsymbol{R}^m being a linear combination of e^m_j, 1\le j\le m.

If S\colon\boldsymbol{R}^m\to\boldsymbol{R}^k is a linear transformation then the matrix of the composition U = ST\colon\boldsymbol{R}^n\to\boldsymbol{R}^k is {u_{il} = \sum_j t_{ij}s_{jl}}.

Exercise. Prove this.

Hint. Start with STe^n_i = S(\sum_j t_{ij}e^m_j) = \sum_j t_{ij}Se^m_j = \sum_j t_{ij}\sum_l s_{jl}e^k_l = \sum_l\bigl(\sum_j t_{ij}s_{jl}\bigr)e^k_l.

This shows matrix multiplication is just composition of linear transformations. It also shows calculations involving the standard basis are cumbersome.
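A sketch of this in code: representing matrices as dictionaries indexed by pairs, composition is exactly matrix multiplication (the encoding and names are ours).

```python
# Matrices as dictionaries on index pairs; composition is matrix multiplication.
def compose(t, s, n, m, k):
    """Matrix of ST where T: R^n -> R^m has entries t[i, j] and
    S: R^m -> R^k has entries s[j, l]: u_{il} = sum_j t_{ij} s_{jl}."""
    return {(i, l): sum(t[i, j] * s[j, l] for j in range(m))
            for i in range(n) for l in range(k)}

t = {(i, j): i + j for i in range(2) for j in range(3)}   # T: R^2 -> R^3
s = {(j, l): j * l for j in range(3) for l in range(2)}   # S: R^3 -> R^2
u = compose(t, s, 2, 3, 2)                                # ST: R^2 -> R^2
assert u[1, 1] == 1*0 + 2*1 + 3*2                         # sum_j t[1,j] s[j,1]
```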

Werner Heisenberg reinvented matrix multiplication. His theory of the orbital levels of a hydrogen atom used e_{ij} to represent a jump of an electron from orbital level i to orbital level j. His physical intuition led him to the rule e_{ij}e_{kl} was a jump from level i to level l if and only if j = k so e_{ij}e_{kl} = \delta_{jk}e_{il}.

Exercise. Show (\sum_{ij} t_{ij}e_{ij})(\sum_{kl} s_{kl} e_{kl}) = \sum_{il}\bigl(\sum_j t_{ij}s_{jl}\bigr) e_{il}.

This shows we can identify the vector space of linear transformations from \boldsymbol{R}^n to \boldsymbol{R}^m with {\{(t_{ij})\mid 1\le i\le n, 1\le j\le m\}}. In basic linear algebra courses it is common to define the row vectors and column vectors of a matrix. Row i is the vector having j-th projection t_{ij} and column j is the vector having i-th projection t_{ij}, but this does not generalize to higher dimensions.

Tensor

Given two sets A and B the set exponential {B^A = \{f\colon A\to B\}} is the set of all functions from A to B. The exponential tensor with index set I is \boldsymbol{R}^I. It is a vector space with scalar multiplication (ax)(i) = ax(i) and vector addition (x + y)(i) = x(i) + y(i), a\in\boldsymbol{R}, x,y\in\boldsymbol{R}^I, i\in I.

Exercise. Show \boldsymbol{R}^n can be identified with \boldsymbol{R}^{\boldsymbol{n}} where \boldsymbol{n} = \{1,\ldots,n\}.

Hint. The vector x\in\boldsymbol{R}^n corresponds to the function \boldsymbol{x}\in\boldsymbol{R}^{\boldsymbol{n}} via x_i = \boldsymbol{x}(i), i\in\boldsymbol{n}.

Exercise. Show linear transformations from \boldsymbol{R}^n to \boldsymbol{R}^m can be identified with \boldsymbol{R}^{\boldsymbol{n}\times\boldsymbol{m}}.

Hint. Recall \boldsymbol{n}\times\boldsymbol{m} = \{(i,j)\mid 1\le i\le n, 1\le j\le m\}. The matrix [t_{ij}] can be identified with \boldsymbol{t}\in\boldsymbol{R}^{\boldsymbol{n}\times\boldsymbol{m}} by t_{ij} = \boldsymbol{t}(i,j).

A linear functional is a function f\colon\boldsymbol{R}^n\to\boldsymbol{R}, i.e., a linear transformation where m = 1.

Projections are linear transformations with m = 1, so they are linear functionals.

There is more to a vector space than n-tuples of real numbers, \boldsymbol{R}^n = \{(x_1,\ldots,x_n)\mid x_j\in\boldsymbol{R}, 1\le j\le n\}.

Given two sets A and B the set exponential {B^A = \{f\colon A\to B\}} is the set of all functions from A to B. Let’s write B^A suggestively as \{A\to B\}, so {f\in\{A\to B\}} means f\colon A\to B. If A and B are sets with structure we want to consider functions that preserve the structure and write that as {[A\to B]}. A function {f\in [A\to B]} is a homomorphism, derived from the Greek homos (ὁμός), meaning alike, and morphe (μορφή), meaning form.

A homomorphism f\in [A\to B] that is one-to-one (injective) and onto (surjective) is a one-to-one correspondence (isomorphism). We use the notation A\cong B to denote the existence of such a bijection.

Exercise. Show A\cong A, A\cong B implies B\cong A, and A\cong B, B\cong C imply A\cong C.

The exercise shows \cong is an equivalence relation. This generalizes the notion of equality. Two things can be the “same” without being equal. In the case of sets, two sets are isomorphic if and only if they have the same cardinality.

A game mathematicians like to play is to determine when sets with structure are isomorphic. Sets and vector spaces hold a sweet spot in the mathematical world of sets with structure. Sets have no structure; they are just a bag of elements that are members of the set. Two sets have the same cardinality if and only if there is a bijection between them.

Two vector spaces are isomorphic if and only if they have the same dimension. This is very different from the case for, e.g., groups, rings, and fields. Only a Dedekind complete totally ordered Archimedean field can top this; it must be isomorphic to the real numbers. Lie groups take a distant second. If a group has a topology making the multiplication continuous there is a very beautiful theory due to Killing, fixed up in Élie Cartan’s PhD thesis, providing a complete classification.

The next game they like to play is determining when two homomorphisms are equivalent. A homomorphism between vector spaces is called a linear operator. Keep reading to find out how the Jordan canonical form solves this in finite dimensions. Along the way we will develop a theory that can be widely applied to any linear system.

Set

Everything in math is a set. We write a\in A to indicate a is an element, or member, of A. The set of functions from A to B is the set exponential B^A = \{f\colon A\to B\}. A function is a set defined by its graph \{(a,f(a))\mid a\in A\} contained in the cartesian product A\times B. Kazimierz Kuratowski improved Norbert Wiener’s definition of an ordered pair (a,b)\in A\times B as the set \{\{a\},\{a,b\}\}.

Exercise. Show \cap\{\{a\},\{a,b\}\} = \{a\} and \cup\{\{a\},\{a,b\}\} = \{a,b\}.

Hint. For any set A, \cap A is the intersection of the elements of A; x belongs to \cap A if and only if x is a member of every element of A. For any set A, \cup A is the union of the elements of A; x belongs to \cup A if and only if x is a member of some element of A.

The intersection identifies the first item of the pair. If the union is a singleton, a set containing only one element, then the second item is equal to the first item, otherwise we remove the first item to identify the second item.
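A small Python sketch makes this bookkeeping concrete, encoding \{\{a\},\{a,b\}\} with frozensets (the helper names are ours):

```python
# Kuratowski pairs as frozensets: (a, b) encoded as {{a}, {a, b}}.
def kpair(a, b):
    return frozenset({frozenset({a}), frozenset({a, b})})

def first(p):
    # The intersection of the members of p is the singleton {a}.
    (x,) = frozenset.intersection(*p)
    return x

def second(p):
    u = frozenset.union(*p)        # the union is {a, b}
    f = first(p)
    rest = u - {f}                 # remove the first item ...
    return f if not rest else next(iter(rest))   # ... unless u is a singleton

p = kpair(1, 2)
assert first(p) == 1 and second(p) == 2
assert first(kpair(3, 3)) == 3 and second(kpair(3, 3)) == 3
```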

Functions f\colon A\to B and g\colon B\to C can be composed. The composition of f and g is g\circ f\colon A\to C defined by g\circ f(a) = g(f(a)), a\in A. We also write gf for g\circ f.

Exercise. If in addition h\colon C\to D show h(gf) = (hg)f.

Function composition is associative. Every set has an identity function 1_A\colon A\to A defined by 1_A(a) = a, a\in A.

Exercise. If f\colon A\to B show f1_A = f and 1_B f = f.

A function f\colon A\to B is one-to-one, or injective, if f(a) = f(a') implies a = a', a,a'\in A. A function is onto, or surjective, if for every b\in B there exists an a\in A with f(a) = b. A function is a one-to-one correspondence, or bijective, if it is one-to-one and onto. Two finite sets are the same size if they have the same number of elements. Bijections classify possibly infinite sets by cardinality.

Category

Okay, I lied. Not everything in math is a set. There is an alternate foundation of mathematics called category theory involving objects and arrows, with an associative composition of arrows and an identity arrow for each object. The above exercises show sets and functions are the objects and arrows of the category \mathbf{Set}. Samuel Eilenberg and Saunders Mac Lane noticed many proofs in their area, algebraic topology, were based on these simple notions. The goal of algebraic topology is to classify when two topological spaces are isomorphic, i.e., when there is a continuous bijection with a continuous inverse between them. As of 2025, this problem remains unsolved.

A funny thing happened when people tried to define sets using only objects and arrows. Logicians like to say, “The language of set theory is epsilon.” By that they mean set theory can be defined using first-order logic and axioms for set membership. It turned out the notion of membership in category theory is a little tricky.

Topos theory tries to define sets using only objects and arrows. The best anyone has come up with so far for membership is a subobject classifier \Omega for which there exists (fasten your seatbelt) a morphism from a terminal object 1\to\Omega such that for each monomorphism j\colon U\to X there exists a unique \chi_j\colon X\to\Omega such that \chi_j j\colon U\to\Omega is equal to the composition of U\to 1 and 1\to\Omega. Phew! No wonder some people think category theory is useless.

Putting on your category theory glasses can help see classical set theory results more clearly. It turns out set membership is a special case. For example, the points of a sphere are members of the sphere, but each point also determines a unique tangent plane to the sphere. This is an example of a sheaf that generalizes set membership.

The axioms of category theory are so simple compared to set theory it is no surprise their application is more complicated. Fighting to establish true statements with one hand tied behind your back forces you to get to the bottom of what is essential.

We can define one-to-one and onto using only functions.

Exercise. Show f\colon A\to B is one-to-one if and only if fg = fh implies g = h for g,h\colon C\to A.

Hint: fg = fh if and only if f(g(c)) = f(h(c)) for all c\in C.

Exercise. Show f\colon A\to B is onto if and only if gf = hf implies g = h for g,h\colon B\to C.

Hint: gf = hf if and only if g(f(a)) = h(f(a)) for all a\in A. For all b\in B there exists a\in A with f(a) = b.

In a general category these two cancellation properties define mono and epi arrows. In the category \mathbf{Set} these turn out to be the same as injective and surjective.

Exercise. Show f\colon A\to B is one-to-one if and only if it has a left inverse (retraction).

Hint: A left inverse is a function f^\dashv\colon B\to A with f^\dashv f = 1_A.

Solution. If f is one-to-one and f(a) = b define f^\dashv(b) = a. This is well-defined since f is one-to-one, i.e., if f(a') = b then a = a'. For b not in the image of f, define f^\dashv(b) to be any fixed element of A.

Exercise. Show if f\colon A\to B has a left inverse (retraction) then f is mono.

Hint: If a left inverse exists and fg = fh then pre-compose the left inverse to get g = h.

Exercise. Show f\colon A\to B is onto if and only if it has a right inverse (section).

Hint: A right inverse is a function f^\vdash\colon B\to A with ff^\vdash = 1_B.

Exercise. Show if f\colon A\to B has a right inverse (section) then f is epi.

Hint: If a right inverse exists and gf = hf then post-compose the right inverse to get g = h.

It is also possible to define an ordered pair using only functions. Define projections \pi_A\colon A\times B\to A by \pi_A(a,b) = a and \pi_B\colon A\times B\to B by \pi_B(a,b) = b.

Exercise. If f\colon C\to A and g\colon C\to B show there exists h\colon C\to A\times B with \pi_A h = f and \pi_B h = g.

Hint: Of course h(c) = (f(c), g(c)), c\in C.
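In code the mediating function h is just pairing; a minimal sketch (the names are ours):

```python
# The function h: C -> A x B with pi_A h = f and pi_B h = g.
def pairing(f, g):
    return lambda c: (f(c), g(c))

pi_A = lambda p: p[0]
pi_B = lambda p: p[1]

f = lambda c: c + 1          # f: C -> A
g = lambda c: 2 * c          # g: C -> B
h = pairing(f, g)
assert pi_A(h(3)) == f(3) and pi_B(h(3)) == g(3)
```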

This characterizes the cartesian product A\times B. If P is any set with functions p_A\colon P\to A, p_B\colon P\to B having the property in the previous exercise, then P is in one-to-one correspondence with A\times B.

Exercise. Prove this.

Solution. Since p_A\colon P\to A and p_B\colon P\to B there exists h\colon P\to A\times B with \pi_A h = p_A and \pi_B h = p_B. Likewise, there exists k\colon A\times B\to P with p_A k = \pi_A and p_B k = \pi_B. Then \pi_A(hk) = p_A k = \pi_A and \pi_B(hk) = \pi_B, so hk = 1_{A\times B}; similarly kh = 1_P. Hence h is a bijection with inverse k.

Set product and exponential are connected by currying. Functions from A\times B to C are in one-to-one correspondence with functions from A to functions from B to C. If f\colon A\times B\to C define f^\wedge\colon A\to(B\to C) by f^\wedge(a)(b) = f(a,b). If g\colon A\to(B\to C) define g^\vee\colon A\times B\to C by g^\vee(a,b) = g(a)(b).

Exercise. If f\colon A\times B\to C show (f^\wedge)^\vee = f and if g\colon A\to(B\to C) show (g^\vee)^\wedge = g.

Evaluation of a function is defined by \operatorname{eval} = \operatorname{eval}_A^B\colon A^B\times B\to A with \operatorname{eval}(f,b) = f(b).

Exercise. Connect currying and eval by showing \operatorname{eval}(f^\wedge(a), b) = f(a,b) for f\colon A\times B\to C, a\in A, b\in B.
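A minimal Python sketch of currying, uncurrying, and eval (the names curry, uncurry, and ev are ours) that checks both exercises:

```python
# Currying, uncurrying, and eval in Python.
def curry(f):                 # f: A x B -> C  gives  f^: A -> (B -> C)
    return lambda a: lambda b: f(a, b)

def uncurry(g):               # g: A -> (B -> C)  gives  g_: A x B -> C
    return lambda a, b: g(a)(b)

def ev(f, b):                 # eval: C^B x B -> C is function application
    return f(b)

add = lambda a, b: a + b
assert uncurry(curry(add))(1, 2) == add(1, 2)   # (f^)_ = f on an example
assert ev(curry(add)(1), 2) == add(1, 2)        # eval(f^(a), b) = f(a, b)
```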

Our theme is to define everything in terms of sets and functions with an eye toward computer implementation. Every programming language allows you to define functions and call them on arguments of given types. We provide mathematical names that can be translated into any computer language.

\boldsymbol{R}^n

There are two ways to think about \boldsymbol{R}^n: as a set of tuples or as a set of functions. These two perspectives are a source of confusion and insight.

An impoverished notion of a vector is that it is a tuple of real numbers {x = (x_1,\ldots,x_n)}. Given a natural number n\in\boldsymbol{N}, let {\boldsymbol{R}^n = \{(x_1,\ldots,x_n)\mid x_i\in\boldsymbol{R}, 1\le i\le n\} = \prod_{1\le i\le n}\boldsymbol{R}} be the cartesian product of n copies of the real numbers. If bold {\boldsymbol{n}} is the set {\{1,\ldots,n\}} then {i\in\boldsymbol{n}} is a shorter notation for {1\le i\le n}. We can identify \boldsymbol{R}^n with \boldsymbol{R}^{\boldsymbol{n}} where the tuple {(x_i)_{i\in\boldsymbol{n}}} corresponds to the function \boldsymbol{x}\colon\boldsymbol{n}\to\boldsymbol{R} defined by \boldsymbol{x}(i) = x_i, i\in\boldsymbol{n}.

Exercise. Show \prod_{i\in\boldsymbol{n}}\boldsymbol{R} is in one-to-one correspondence with \boldsymbol{R}^{\boldsymbol{n}}.

Hint: If {\pi_i\colon\prod_{i\in\boldsymbol{n}}\boldsymbol{R}\to\boldsymbol{R}} are the projections \pi_i(x) = x_i, i\in\boldsymbol{n}, define x\mapsto\boldsymbol{x} by \boldsymbol{x}(i) = \pi_i(x) and show it is bijective.

A more powerful notion is to consider a vector as an element of the vector space of all functions from an index set I to the real numbers, \boldsymbol{R}^I. The tuple x = (x_i)_{i\in I} in \prod_{i\in I}\boldsymbol{R} corresponds to a function {\boldsymbol{x}\colon I\to\boldsymbol{R}} in \boldsymbol{R}^I defined by \boldsymbol{x}(i) = x_i, i\in I. In what follows we just write x for \boldsymbol{x} and leave it to you to figure out from context if a vector is a tuple or a function.

Exercise. For any set I, show \prod_{i\in I}\boldsymbol{R} is in one-to-one correspondence with \boldsymbol{R}^{I}.

Hint: If \pi_i\colon \prod_{i\in I}\boldsymbol{R}\to\boldsymbol{R} are the projections defining the product then \boldsymbol{x}(i) = \pi_i(x).


Scalar Multiplication and Vector Addition

Scalar multiplication and vector addition on \boldsymbol{R}^I are defined pointwise by {(ax)(i) = a(x(i))} and {(x + y)(i) = x(i) + y(i)}, a\in\boldsymbol{R}, x,y\in\boldsymbol{R}^I.

Axioms

For any set I, a,b\in\boldsymbol{R}, and x, y, z\in\boldsymbol{R}^I show the following:

Exercise. x + (y + z) = (x + y) + z.

Exercise. x + y = y + x.

Exercise. \boldsymbol{0} + x = x where \boldsymbol{0}(i) = 0, i\in I.

Exercise. x + (-x) = \boldsymbol{0} where (-x)(i) = -(x(i)), for i\in I.

Exercise. a(bx) = (ab)x.

Exercise. 1x = x.

Exercise. a(x + y) = ax + ay.

Exercise. (a + b)x = ax + bx.

Hint: Use the pointwise definitions and the properties of real numbers.
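These are routine, and they can also be spot-checked numerically. Here is a minimal sketch on I = \{0,1,2\} using exact rational arithmetic so the equality tests are exact (all helper names are ours):

```python
# Spot-check the vector space axioms on R^I with I = {0, 1, 2}.
from fractions import Fraction as F

I = range(3)
x = {i: F(i, 2) for i in I}          # sample vectors and scalars
y = {i: F(1, i + 1) for i in I}
a, b = F(2, 3), F(-5, 7)

smul = lambda a, x: {i: a * x[i] for i in I}
vadd = lambda x, y: {i: x[i] + y[i] for i in I}
zero = {i: F(0) for i in I}
neg  = lambda x: smul(F(-1), x)

assert vadd(x, y) == vadd(y, x)                              # commutativity
assert vadd(zero, x) == x                                    # additive identity
assert vadd(x, neg(x)) == zero                               # additive inverse
assert smul(a, smul(b, x)) == smul(a * b, x)                 # a(bx) = (ab)x
assert smul(a, vadd(x, y)) == vadd(smul(a, x), smul(a, y))   # a(x+y) = ax+ay
assert smul(a + b, x) == vadd(smul(a, x), smul(b, x))        # (a+b)x = ax+bx
```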

The exercises are the axioms for an abstract vector space with scalar multiplication {\boldsymbol{R}\times V\to V} where {(a,x)\mapsto ax = xa} and binary addition {V\times V\to V} where {(x,y)\mapsto x + y}. The first four axioms show vector addition is an abelian (commutative) group. The last two axioms are the distributive laws connecting scalar multiplication and vector addition. Every abstract vector space can be represented by \boldsymbol{R}^I for some set I, but this is not a trivial result.

Proofs involving only the abstract axioms are considered more elegant.

If x\in\boldsymbol{R}^I then (0x)(i) = 0(x(i)) = 0 and (1x)(i) = 1(x(i)) = x(i) for all i\in I so 0x = \boldsymbol{0} and 1x = x. These hold for any vector space, not just \boldsymbol{R}^I.

Exercise. (Zero is unique) Show if \boldsymbol{0}' + v = v for all v\in V then \boldsymbol{0}' = \boldsymbol{0}.

Hint: \boldsymbol{0} + v = v so \boldsymbol{0}' + v = \boldsymbol{0} + v. Add -v to both sides.

Exercise. Show 0v = \boldsymbol{0} for all v\in V.

Hint: Show 0v + v = v and use the previous exercise.

Exercise. For any vector space V show {v + v = v} implies {v = \boldsymbol{0}} for all {v\in V}.

Solution. \begin{aligned} v + v &= v \\ &\quad\langle x = y\Rightarrow x + z = y + z\mid x\leftarrow v + v,y\leftarrow v,z\leftarrow -v\rangle\\ (v + v) + (-v) &= v + (-v) \\ &\quad\langle (x + y) + z = x + (y + z)\mid x\leftarrow v, y\leftarrow v, z\leftarrow -v\rangle\\ v + (v + (-v)) &= v + (-v) \\ &\quad\langle x + (-x) = \boldsymbol{0}\mid x\leftarrow v\text{ twice }\rangle\\ v + \boldsymbol{0} &= \boldsymbol{0}\\ &\quad\langle x + \boldsymbol{0} = x\mid x\leftarrow v\rangle\\ v &= \boldsymbol{0} \end{aligned}

This may seem a tedious way to prove an obvious result, but it is a simple example of the power of mathematics. Unlike other areas of human endeavour, whenever two mathematicians disagree it is possible to get to the bottom of things.

Standard Basis

The standard basis of \boldsymbol{R}^n is {e_i\in\boldsymbol{R}^n}, {i\in\boldsymbol{n}}, where {e_i = (0,\ldots,1,\ldots,0)} with all elements 0 except for a 1 in the i-th position. It is plausible that {x = (x_1,\ldots,x_n) = x_1 e_1 + \cdots + x_n e_n} for {x\in\boldsymbol{R}^n}, but you should always be wary of definitions involving dots.

More generally, the standard basis of \boldsymbol{R}^I is {e_i\in\boldsymbol{R}^I}, {i\in I}, where {e_i(j) = \delta_{ij}} for j\in I, where \delta_{ij} is the Kronecker delta defined by {\delta_{ij} = 1} if {i=j} and {\delta_{ij} = 0} if {i\not=j}. We write e^I_i to indicate the domain if needed.

Exercise. If I is finite, show x = \sum_{i\in I} x(i)e_i for x\in\boldsymbol{R}^I.

Hint: Evaluate the putative equality at j\in I.
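A quick numeric spot-check of this identity on I = \{0,1,2\}, encoding vectors as Python dictionaries (the encoding is ours):

```python
# Spot-check x = sum_i x(i) e_i on I = {0, 1, 2}.
I = range(3)
e = {i: {j: (1 if j == i else 0) for j in I} for i in I}   # e_i(j) = delta_ij
x = {0: 5, 1: -2, 2: 7}
recon = {j: sum(x[i] * e[i][j] for i in I) for j in I}
assert recon == x
```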

Linear Transformation

A linear transformation between vector spaces is a function {T\colon V\to W} that preserves scalar multiplication and vector addition: T(ax) = aTx and {T(x + y) = Tx + Ty}, {a\in\boldsymbol{R}}, {x,y\in V}. The collection of all linear operators from V to W is denoted \mathcal{L}(V,W).

Exercise. Show if T\colon V\to W is a function with {T(ax + y) = aTx + Ty}, {a\in\boldsymbol{R}}, {x,y\in V} then T is a linear operator.

Hint: Take y = \boldsymbol{0} and a = 1.

The collection \mathcal{L}(V,W) can be made into a vector space. Define scalar multiplication by (aT)v = a(Tv) and vector addition by (T + S)v = Tv + Sv, a\in\boldsymbol{R}, T,S\in\mathcal{L}(V,W). One could check this scalar multiplication and vector addition satisfy the axioms for an abstract vector space, but we will show a simpler way to establish this once we define the matrix of a linear operator.

Matrix

If T\in\mathcal{L}(\boldsymbol{R}^I,\boldsymbol{R}^J) then the image of the i-th standard basis element in \boldsymbol{R}^I under T can be written Te^I_i = \sum_{j\in J} t_{ij} e^J_j for some scalars t_{ij}\in\boldsymbol{R} when I and J are finite. The matrix of T is [t_{ij}]_{i\in I, j\in J}.

If S\colon\boldsymbol{R}^J\to\boldsymbol{R}^K then U = ST\colon\boldsymbol{R}^I\to\boldsymbol{R}^K.

Exercise. Show the matrix of the composition U = ST is {u_{ik} = \sum_{j\in J} t_{ij}s_{jk}}, for i\in I, k\in K.

Matrix multiplication is just composition of linear operators. For i\in I, j\in J define E_{ij}\colon\boldsymbol{R}^I\to\boldsymbol{R}^J to be the linear operator whose matrix has all entries 0 except for a 1 in position (i,j).

Exercise. When I = J show E_{ij}E_{kl} = \delta_{jk}E_{il}.
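A quick check of this rule with the dictionary encoding of matrices (the helper names are ours):

```python
# Matrix units on I = {0, 1, 2}: E_ij E_kl = delta_jk E_il.
n = 3
def E(i0, j0):
    return {(i, j): (1 if (i, j) == (i0, j0) else 0)
            for i in range(n) for j in range(n)}

def mul(t, s):   # u_il = sum_j t_ij s_jl
    return {(i, l): sum(t[i, j] * s[j, l] for j in range(n))
            for i in range(n) for l in range(n)}

zero = {(i, j): 0 for i in range(n) for j in range(n)}
assert mul(E(0, 1), E(1, 2)) == E(0, 2)     # j = k: a jump 0 -> 1 -> 2
assert mul(E(0, 1), E(2, 0)) == zero        # j != k: no jump
```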

Werner Heisenberg reinvented matrix multiplication. His interpretation was that E_{ij} represented a jump of an electron from orbital level i in a hydrogen atom to orbital level j. He posited E_{ij}E_{kl} was a jump from level i to level l if and only if j = k. Eventually Pascual Jordan pointed out to him that this was matrix multiplication.

This shows linear operators from \boldsymbol{R}^I to \boldsymbol{R}^J are in one-to-one correspondence with \boldsymbol{R}^{I\times J}. Define a function from T\in\mathcal{L}(\boldsymbol{R}^I,\boldsymbol{R}^J) to \boldsymbol{T}\in\boldsymbol{R}^{I\times J} by \boldsymbol{T}(i,j) = t_{ij} where [t_{ij}] is the matrix of T.

Exercise. Show \mathcal{L}(\boldsymbol{R}^I,\boldsymbol{R}^J) is isomorphic to \boldsymbol{R}^{I\times J}.

Since I\times J is a set, \boldsymbol{R}^{I\times J} is a vector space. This shows \mathcal{L}(\boldsymbol{R}^I,\boldsymbol{R}^J) is a vector space.

Dual

The dual of a vector space V is the set of all linear functionals from V to \boldsymbol{R}, V^* = \mathcal{L}(V,\boldsymbol{R}). We write the dual pairing \langle v, v^*\rangle = v^*(v) for v\in V, v^*\in V^*.

The standard dual basis of \boldsymbol{R}^I, e_i^*\colon\boldsymbol{R}^I\to\boldsymbol{R}, is defined by e_i^*(x) = x(i), i\in I.

Exercise. Show e_i^* is linear for all i\in I.

Exercise. Show \langle e_i, e^*_j\rangle = \delta_{ij}, i,j\in I.

Exercise. Show the matrix of T\in\mathcal{L}(\boldsymbol{R}^I,\boldsymbol{R}^J) is {t_{ij} = \langle Te_i, e^*_j\rangle}, i\in I, j\in J.

Operator Structure

We now turn to the problem of characterizing linear operators up to similarity. Operators T,U\colon V\to V are similar if and only if there exists an invertible S\colon V\to V with T = S^{-1}US.

Exercise. The matrix of T under any basis (v_i) equals the matrix of U under the basis (Sv_i).

Hint: The matrix of T is t_{ij} = v_j^*Tv_i where (v_j^*) is the dual basis.

Linear operators are overgrown numbers. We can add and multiply them. Addition is commutative but multiplication might not be.

If V is one-dimensional then Tv = tv, v\in V, for some number t\in\boldsymbol{C}. The number t corresponds to the linear operator of scalar multiplication by t.

Exercise. Prove this.

Hint: Pick any v\in V that is not the zero vector. Since Tv\in V there exists \lambda\in\boldsymbol{C} with Tv = \lambda v.

We say T is diagonalizable (d’able for short) if there exists a basis (v_i) of V with Tv_i = t_i v_i for some t_i\in\boldsymbol{C}. We say v_i is an eigenvector with eigenvalue t_i.

Exercise. Show the matrix of T under the basis (v_i) is t_{ij} = t_i\delta_{ij}.

T = \begin{bmatrix} t_1 & 0 & 0 \\ 0 & \ddots & 0 \\ 0 & 0 & t_n \\ \end{bmatrix}
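A numeric sketch using numpy (assumed available) illustrates this on a 2-by-2 example:

```python
# Eigenvalues/eigenvectors of a diagonalizable operator, checked numerically.
import numpy as np

T = np.array([[2.0, 1.0],
              [0.0, 3.0]])          # distinct eigenvalues 2 and 3
t, V = np.linalg.eig(T)             # columns of V are eigenvectors
for i in range(len(t)):
    assert np.allclose(T @ V[:, i], t[i] * V[:, i])   # T v_i = t_i v_i
# Under the basis of eigenvectors the matrix of T is diagonal:
assert np.allclose(np.linalg.inv(V) @ T @ V, np.diag(t))
```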

We will prove later that every operator T\colon V\to V is almost d’able in the sense that given \epsilon > 0 there exists d’able T_\epsilon\colon V\to V with \|T_\epsilon - T\| < \epsilon.

You will prove now that every d’able operator is near a d’able operator with distinct eigenvalues.

Exercise. If T is d’able and \epsilon > 0 then there exists T_\epsilon with distinct eigenvalues satisfying {\|T_\epsilon - T\| < \epsilon}.

Hint: The norm of a d’able operator is the maximum of the modulus of every eigenvalue.

(This uses a norm on operators; norms on vector spaces are developed later.)

For any T\colon V\to V we say v\in V is an eigenvector corresponding to the eigenvalue \lambda\in\boldsymbol{C} if {v\not=\boldsymbol{0}} and {Tv = \lambda v}. The first important fact about eigenvectors is that if they have different corresponding eigenvalues they must be independent.

Exercise. If Tv = \lambda v, Tw = \mu w, \lambda,\mu\in\boldsymbol{C}, v, w\in V, and \lambda\not=\mu, then v and w are independent.

Hint: If av + bw = \boldsymbol{0} then T(av + bw) = a\lambda v + b\mu w = \boldsymbol{0}. Subtract \lambda(av + bw) = \boldsymbol{0} from both sides to get b(\mu - \lambda)w = \boldsymbol{0}. Conclude b = 0 and a = 0.

In general, if (\lambda_j) are m distinct eigenvalues corresponding to eigenvectors (v_j) then the eigenvectors are linearly independent. The previous exercise establishes this for m = 2 and the induction step is similar to the proof for two distinct eigenvalues.

Exercise. Show (T - \lambda I)\boldsymbol{0} = \boldsymbol{0} for any linear operator T and \lambda\in\boldsymbol{C}.

Hint: \boldsymbol{0} is in the kernel of every linear operator.

Exercise. Show non-zero v\in V is an eigenvector corresponding to the eigenvalue \lambda\in\boldsymbol{C} if and only if v\in\operatorname{ker}(T - \lambda I).

The spectrum of an operator is \sigma(T) = \{\lambda\in\boldsymbol{C}\mid T - \lambda I\text{ is not invertible}\}. Recall T - \lambda I is not invertible if and only if \operatorname{ker}(T - \lambda I) \not= \{\boldsymbol{0}\}.

Exercise. Show if \lambda\in\sigma(T) there exists an eigenvector having eigenvalue \lambda.

If the cardinality of \sigma(T) equals the dimension of V we say T is diagonalizable. Let’s abbreviate this to d’able. If V has dimension n and \sigma(T) = \{\lambda_1,\ldots,\lambda_n\} then there are independent eigenvectors \{v_1,\ldots,v_n\} with Tv_j = \lambda_j v_j.

Exercise. Show \{v_1,\ldots,v_n\} is a basis of V.

If T is d’able then the spectrum completely determines T. For \lambda\in\sigma(T) there exists v_\lambda\not=\boldsymbol{0} with Tv_\lambda = \lambda v_\lambda. Since \{v_\lambda\mid\lambda\in\sigma(T)\} is a basis of V every v\in V can be written v = \sum_{\lambda\in\sigma(T)} t_\lambda v_\lambda for some t_\lambda\in\boldsymbol{C}. The overgrown number T satisfies Tv = \sum_{\lambda\in\sigma(T)} t_\lambda \lambda v_\lambda.

You might be thinking d’able operators are a special case, but we will prove later that every linear operator can be approximated by them. Not every result about d’able operators carries over to the general theory, but results are often easy to prove in the d’able case.

If T\colon V\to V then I, T, T^2, \ldots, T^{n^2} must be dependent since \mathcal{L}(V,V) is a vector space with dimension n^2, where n is the dimension of V. Hence there exists a polynomial p of degree at most n^2 with p(T) = 0. For a d’able operator T there is a more drastic result: there is a polynomial of degree at most n with p(T) = 0.

Exercise. If T\colon V\to V is d’able and p(z) = \prod_{\lambda\in\sigma(T)} (z - \lambda) then p(T) = 0.

Hint: Show p(T)v_\lambda = 0 if Tv_\lambda = \lambda v_\lambda.
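A numeric spot-check of this exercise, again with numpy (assumed available):

```python
# For d'able T, the product of (T - lambda I) over the spectrum is 0.
import numpy as np
from functools import reduce

T = np.array([[2.0, 1.0],
              [0.0, 3.0]])                  # sigma(T) = {2, 3}
eigvals = np.linalg.eigvals(T)
p_of_T = reduce(lambda A, B: A @ B,
                [T - lam * np.eye(2) for lam in eigvals])
assert np.allclose(p_of_T, 0)               # p(T) = (T - 2I)(T - 3I) = 0
```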

If it is true that every linear operator can be approximated by d’able operators, then facts preserved by norm limits extend to all operators.

Exercise. If T_n\to T in norm then p(T_n) \to p(T) in norm for any polynomial p.

If T is a linear operator on V and M is an invariant subspace, i.e., TM\subseteq M, define T/M\colon V/M\to V/M by v + M \mapsto Tv + M.

Exercise. Show T/M is well-defined.

Hint: Show if v + M = w + M then Tv + M = Tw + M. Since v - w\in M we have T(v - w)\in M.

Exercise. Show T|_M\oplus T/M\colon M\oplus V/M\to M\oplus V/M is equivalent to T, where T|_M\colon M\to M is the restriction of T to M.

T\colon V\to V and U\colon W\to W are equivalent if there exists an isomorphism E\colon V\to W with ET = UE.

Every linear operator T\colon\boldsymbol{R}^I\to\boldsymbol{R}^I is an element of the vector space \boldsymbol{R}^{I\times I}, so I, T, T^2,\ldots,T^{|I\times I|} are linearly dependent and there exists a polynomial p of degree at most |I\times I| with p(T) = 0. But there is a drastic reduction: for every linear operator T\colon\boldsymbol{R}^I\to\boldsymbol{R}^I there is a polynomial p of degree |I| with p(T) = 0, the Cayley–Hamilton theorem.