Transcript
  • Calculus Revisited: Multivariable Calculus, as given by Herbert Gross, MIT

    Notes by Aleksandar Petrov

    March 2015

  • Contents

    1 Vector Arithmetic
      1.1 The Game of Mathematics
      1.2 The Structure of Vector Arithmetic
      1.3 Applications to 3-Dimensional Space
      1.4 The Dot Product
      1.5 The Cross Product
      1.6 Equations of Lines and Planes

    2 Vector Calculus
      2.1 Vector Functions of a Scalar Variable
      2.2 Tangential and Normal Vectors
      2.3 Polar Coordinates
      2.4 Vectors in Polar Coordinates

    3 Partial Derivatives
      3.1 n-Dimensional Vector Spaces
      3.2 An Introduction to Partial Derivatives
      3.3 Differentiability and the Gradient
      3.4 The Chain Rule
      3.5 Exact Differentials

    4 Matrix Algebra
      4.1 Linearity Revisited
      4.2 Introduction to Matrix Algebra
      4.3 Inverting a Matrix
      4.4 Maxima and Minima in Several Variables

    5 Multiple Integration
      5.1 The Fundamental Theorem
      5.2 Multiple Integration and the Jacobian
      5.3 Line Integrals
      5.4 Green's Theorem

  • Chapter 1

    Vector Arithmetic

    1.1 The Game of Mathematics

    We can define a game to be any system consisting of definitions, rules, and objectives, where the objectives are carried out as inescapable consequences of the definitions and the rules by means of strategy. Not all definitions can be clearly set. For example, no definition of number can be given. That means that some of the definitions are subjective. However, using only specific objective facts about these concepts (the rules) allows us to draw inescapable conclusions.

    It is important to distinguish between truth and validity. Truth is subjective and can change with time. Validity means that the conclusion is an inescapable result of the definitions and the rules. Simply put, an argument is valid when it follows logically from the definitions and the rules. If our premises are true and our argument is valid, then the conclusions are also going to be true. However, our conclusion may also be true if the premises are not true or the argument is not valid. Mathematics deals with the argumentation part of this problem. It draws valid conclusions. However, it is not necessary that they are true. That will be the case only if the premises are true. To conclude, one can be sure that a conclusion is true only if the assumptions (definitions and rules) are true and the argumentation is valid.

    That allows us to draw a line between pure and applied mathematics. Applied mathematics deals with problems whose definitions and rules we believe to describe reality. Pure mathematics focuses on the consistency of the rules and the validity of the argument. However, these rules need not be true. An example could be Lobachevsky's geometry, which was pure math, as it did not correspond to any physical truths known back then, until Einstein noticed that it served as a realistic model for his theory of relativity.



    To show that an argument is invalid, all we need to do is give one set of conditions in which the assumptions are obeyed but the conclusion is false. On the other hand, proving that something is true is rather difficult: one has to find a way to show that the statement is always true.

    1.2 The Structure of Vector Arithmetic

    An important fact to keep in mind is that the operations in vector arithmetic are a result of definitions, not nature. We define a vector to be an object that has a magnitude and direction. A more modern approach defines a vector as an ordered sequence of numbers. A vector has magnitude (length), direction (orientation) and sense (each direction has two possible senses). The mathematical concept of a vector is geometrically represented by an arrow. Furthermore, as a vector is defined solely by its magnitude, direction and sense, two vectors can be equal even if they do not have the same starting and ending points. The equality of vectors, the zero vector, and the summation and subtraction of vectors are all operations that are defined in such a way that they are easy and useful. Bear in mind that the mathematical structure of vector arithmetic is different from the structure of scalar arithmetic. For example, we talk about summation of vectors, but this operation is not the same operation as summation of scalars. Furthermore, multiplication of vectors is ambiguous, while multiplication of scalars is clearly defined.

    Some of the properties that vectors have are:

    \vec{a} + \vec{b} = \vec{b} + \vec{a}
    \vec{a} + (\vec{b} + \vec{c}) = (\vec{a} + \vec{b}) + \vec{c}
    \vec{a} + \vec{0} = \vec{a}
    \vec{a} + \vec{b} = \vec{c} \iff \vec{a} = \vec{c} - \vec{b}
    c(\vec{a} + \vec{b}) = c\vec{a} + c\vec{b}
    (c + d)\vec{a} = c\vec{a} + d\vec{a}

    Keep in mind that these properties were defined. They are not intrinsic to all mathematical structures. For example, if you subtract set B from A ∪ B you will not get set A (apart from the case when A and B have no common elements).
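As an illustration (not from the lecture), the defined properties can be checked numerically by modelling vectors as Python tuples; the helper names `add` and `scale` are my own:

```python
def add(a, b):
    """Component-wise vector addition."""
    return tuple(x + y for x, y in zip(a, b))

def scale(c, a):
    """Scalar multiplication."""
    return tuple(c * x for x in a)

a, b, c = (1, 2, 3), (4, 5, 6), (7, 8, 9)
zero = (0, 0, 0)

assert add(a, b) == add(b, a)                                # commutativity
assert add(a, add(b, c)) == add(add(a, b), c)                # associativity
assert add(a, zero) == a                                     # additive identity
assert scale(2, add(a, b)) == add(scale(2, a), scale(2, b))  # distributivity
```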

    1.3 Applications to 3-Dimensional Space

    When talking about three-dimensional vectors we actually mean vectors with three coordinates. Of course, a vector is geometrically represented by an arrow, and an arrow is a two-dimensional element. One extremely useful property (or, if you wish, definition) of vectors is that the components of a vector connecting the origin of a Cartesian coordinate system with a point are the same as the coordinates of the point. An important note about the mathematical structure of vectors is that the recipes stay the same no matter the dimensionality of the vector. That means that the property stated above is true for two-, three-, four- and n-dimensional vectors. Furthermore, the definition of the magnitude of a vector as the square root of the sum of the squares of its components is also true for any n-dimensional vector, provided that the components are given in a Cartesian coordinate system.

    But why do we use the Cartesian coordinate system? In a Cartesian system \vec{a} + \vec{b} = (a_1 + b_1, a_2 + b_2). However, in a polar system \vec{a} + \vec{b} \neq (r_a + r_b, \theta_a + \theta_b). The vector properties do not change with different coordinate systems. However, the convenient methods that we mentioned above are a result of properties of the coordinate system. As a result, it is suggested to use the Cartesian coordinate system as often as possible. A very important consideration, though, is that as the vector properties do not change with different coordinate systems, if a given property is proven to hold in one coordinate system then it is a vector property that always works in all coordinate systems.
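A quick numeric sketch of the point above, assuming nothing beyond the standard library: adding components works in Cartesian form, while naively adding (r, θ) pairs gives a different (wrong) vector:

```python
import math

def polar_to_cartesian(r, theta):
    """Convert a polar representation (r, theta) to Cartesian (x, y)."""
    return (r * math.cos(theta), r * math.sin(theta))

a = (2.0, math.pi / 6)   # (r_a, theta_a)
b = (3.0, math.pi / 3)   # (r_b, theta_b)

ax, ay = polar_to_cartesian(*a)
bx, by = polar_to_cartesian(*b)
true_sum = (ax + bx, ay + by)                          # component-wise, Cartesian

naive = polar_to_cartesian(a[0] + b[0], a[1] + b[1])   # (r_a + r_b, theta_a + theta_b)

# the naive polar "sum" does not equal the true vector sum
assert abs(true_sum[0] - naive[0]) > 0.1
```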

    1.4 The Dot Product

    Let's start with a physical motivation for the dot product. We know that work is the product of a path and the component of a force in the direction of the path. That can be written in vector notation as W = |\vec{A}||\vec{B}|\cos\theta, where \theta is the angle between the force vector \vec{A} and the path \vec{B}. Now, we define this to be equal to the dot product of the two vectors:

    \vec{A} \cdot \vec{B} = |\vec{A}||\vec{B}|\cos\theta    (1.1)

    As finding the angle between the two vectors is often a pretty hard task, let's try to get rid of the cosine.

    Figure 1.1: the triangle formed by \vec{A}, \vec{B} and \vec{A} - \vec{B}.

    Figure 1.2: Vector projection; the projection of \vec{A} on \vec{B} has length |\vec{A}|\cos\theta.

    As can be seen from Figure 1.1, using the law of cosines we get the following:

    |\vec{A} - \vec{B}|^2 = |\vec{A}|^2 + |\vec{B}|^2 - 2|\vec{A}||\vec{B}|\cos\theta

    |\vec{A}||\vec{B}|\cos\theta = \frac{|\vec{A}|^2 + |\vec{B}|^2 - |\vec{A} - \vec{B}|^2}{2}

    \vec{A} \cdot \vec{B} = \frac{|\vec{A}|^2 + |\vec{B}|^2 - |\vec{A} - \vec{B}|^2}{2}    (1.2)

    Note that this result does not depend on the coordinate system in use. For a Cartesian coordinate system only, this simplifies to

    \vec{A} \cdot \vec{B} = a_1 b_1 + a_2 b_2 + a_3 b_3    (1.3)
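As a sanity check (my own example values, not from the notes), Equations 1.2 and 1.3 agree for an arbitrary pair of 3-dimensional vectors:

```python
import math

def dot(a, b):
    """Component-wise dot product (Equation 1.3)."""
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    return math.sqrt(dot(a, a))

A = (1.0, 2.0, 3.0)
B = (4.0, -1.0, 2.0)
AmB = tuple(x - y for x, y in zip(A, B))

lhs = dot(A, B)                                       # Equation 1.3
rhs = (norm(A)**2 + norm(B)**2 - norm(AmB)**2) / 2    # Equation 1.2
assert abs(lhs - rhs) < 1e-9
```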

    Projections. Let's take a look at Figure 1.2. We can see that the projection of \vec{A} on \vec{B} looks like a dot product, but with the magnitude of one vector missing. Furthermore, note that the length of the projection of \vec{A} on \vec{B} does not depend on the magnitude of \vec{B}. We define a unit vector \vec{u}_B to have the same direction and sense as \vec{B} but with a magnitude of one.

    \vec{u}_B = \frac{\vec{B}}{|\vec{B}|}    (1.4)

    Then, as |\vec{u}_B| = 1, |\vec{A}|\cos\theta = |\vec{u}_B||\vec{A}|\cos\theta. As a result,

    \text{Proj}_B \vec{A} = \vec{u}_B \cdot \vec{A}    (1.5)

    Structural properties:

    \vec{a} \cdot \vec{b} = \vec{b} \cdot \vec{a}
    \vec{a} \cdot (\vec{b} + \vec{c}) = \vec{a} \cdot \vec{b} + \vec{a} \cdot \vec{c}
    (c\vec{A}) \cdot \vec{B} = c(\vec{A} \cdot \vec{B})

    This may seem trivial, but one always has to keep in mind which operations and conclusions are applicable in which situations. For example, \vec{A} \cdot \vec{B} = 0 does not mean that \vec{A} or \vec{B} is zero. It could be the case that they are orthogonal vectors and the cosine of the angle between them is zero.
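The projection formula and the orthogonality caveat can be sketched numerically (example values are mine):

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    return math.sqrt(dot(a, a))

A = (3.0, 4.0, 0.0)
B = (2.0, 0.0, 0.0)
u_B = tuple(x / norm(B) for x in B)    # unit vector along B (Eq. 1.4)
proj = dot(u_B, A)                     # scalar projection of A on B (Eq. 1.5)
assert abs(proj - 3.0) < 1e-9          # A's component along the x-axis

# A . C = 0 does not force A or C to be zero: orthogonal vectors suffice
C = (0.0, 0.0, 5.0)
assert dot(A, C) == 0 and norm(A) > 0 and norm(C) > 0
```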

    1.5 The Cross Product

    Although the cross product has vast applications in the physical sciences, our focus will be on the geometry. The cross product is also called the vector product because its result is a vector (contrary to the dot product, whose result is a scalar). As a result, we need to define three parameters of the cross product: magnitude, direction and sense. For \vec{A} \times \vec{B} the magnitude is defined to be |\vec{A}||\vec{B}|\sin\theta, the direction is perpendicular to both \vec{A} and \vec{B} (that is, perpendicular to the plane defined by the two vectors), and the sense comes from the right-hand rule: going from the first vector to the second through the smaller angle.

    As a result of the definition of the sense of the cross product, \vec{A} \times \vec{B} is not equal to \vec{B} \times \vec{A}. They have the same magnitude and direction but opposite sense, thus:

    \vec{A} \times \vec{B} = -\vec{B} \times \vec{A}    (1.6)

    Now, let's consider \vec{A} \times (\vec{B} \times \vec{C}). What is the direction of this vector? It should be perpendicular to a vector that is perpendicular to the plane defined by \vec{B} and \vec{C}. That means that \vec{A} \times (\vec{B} \times \vec{C}) is in fact parallel to the plane containing \vec{B} and \vec{C}. However, this vector is not equal to (\vec{A} \times \vec{B}) \times \vec{C}, for the simple reason that each of them is parallel to one of two non-parallel planes.

    Finally for cross product the following holds:

    \vec{A} \times (\vec{B} + \vec{C}) = \vec{A} \times \vec{B} + \vec{A} \times \vec{C}    (1.7)

    The cross product of two vectors can be found through direct multiplication of their components (keep in mind the signs of the vector products of the unit vectors) or through the determinant method.

    An interesting conclusion is that the magnitude of the cross product equals the area of the parallelogram which is enclosed by the two vectors.
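A component-wise `cross` helper (hypothetical name, example vectors mine) lets us check anticommutativity, the parallelogram area, and the non-associativity discussed above:

```python
import math

def cross(a, b):
    """Cross product via components (the direct-multiplication recipe)."""
    return (a[1]*b[2] - a[2]*b[1],
            a[2]*b[0] - a[0]*b[2],
            a[0]*b[1] - a[1]*b[0])

A = (1.0, 0.0, 0.0)
B = (1.0, 1.0, 0.0)
AxB = cross(A, B)
assert AxB == (0.0, 0.0, 1.0)

# anticommutativity (Eq. 1.6)
assert cross(B, A) == tuple(-x for x in AxB)

# |A x B| equals the area of the parallelogram spanned by A and B (here 1)
assert abs(math.sqrt(sum(x*x for x in AxB)) - 1.0) < 1e-9

# A x (B x C) is generally not equal to (A x B) x C
C = (0.0, 1.0, 1.0)
assert cross(A, cross(B, C)) != cross(cross(A, B), C)
```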


    1.6 Equations of Lines and Planes

    Planes are to surfaces what lines are to curves. In the calculus of a single variable the topic of the tangent line comes up pretty frequently. When doing calculus of two variables we will use the concept of a tangent plane.

    Let's start with the derivation of the equation of a plane. There are several ways to define a plane, but for this discussion it is useful to define it with a point that the plane passes through and the normal vector to the plane. We call the fixed point P_0(x_0, y_0, z_0) and the normal vector will be \vec{N} = (a, b, c). We want to find an equation for the components of any point P(x, y, z) lying in the plane. Now, a smart way to approach this problem is to see that the vector \vec{P_0P} = (x - x_0, y - y_0, z - z_0) is on the plane and thus is perpendicular to \vec{N}. That means that \vec{N} \cdot \vec{P_0P} = 0. As a result:

    \vec{N} \cdot \vec{P_0P} = 0
    (a, b, c) \cdot (x - x_0, y - y_0, z - z_0) = 0
    a(x - x_0) + b(y - y_0) + c(z - z_0) = 0    (1.8)

    Now, several things can be observed from this equation. First, this is the equation for a plane that has a normal vector (a, b, c) and passes through the point (x_0, y_0, z_0). Second, a plane can be expressed with an infinite number of different equations of this kind. This can be easily deduced from the fact that this equation can be derived for any point P_0 on the plane. The normal vector would be the same and only the values of x_0, y_0 and z_0 would change. Third, if we replace the coordinates of P_0 with those of a point that does not lie on the original plane, we will get an equation for another plane that is parallel to the original one. This is because we keep the normal vector (the component that determines the orientation of the plane) and change only its position in space.

    Finally, there are two important things to note. The equation of a plane is linear. Furthermore, it has two degrees of freedom. That means that we have to fix two of the variables so that we can calculate the third.
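A small sketch of Equation 1.8, with assumed example values for \vec{N} and P_0 (not from the notes):

```python
N = (1.0, 2.0, -1.0)    # normal vector (a, b, c)
P0 = (0.0, 1.0, 2.0)    # fixed point (x0, y0, z0)

def on_plane(p):
    """True when p satisfies a(x-x0) + b(y-y0) + c(z-z0) = 0 (Eq. 1.8)."""
    return abs(sum(n * (x - x0) for n, x, x0 in zip(N, p, P0))) < 1e-9

assert on_plane(P0)                   # the fixed point itself lies in the plane
assert on_plane((2.0, 1.0, 4.0))      # 1*2 + 2*0 - 1*2 = 0
assert not on_plane((1.0, 1.0, 2.0))  # 1*1 + 2*0 - 1*0 = 1, off the plane
```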

    Next, let's shift our focus to the equation of a line. We choose to fix a line based on a point P_0(x_0, y_0, z_0) that it passes through and a vector parallel to the line (giving its direction), \vec{v} = (a, b, c). Just as we did in the case of a plane, we want to find an equation for any arbitrary point P(x, y, z) on the line. Then, the vector \vec{P_0P} = (x - x_0, y - y_0, z - z_0) is on the line, as it connects two points that are on the line. As the line is parallel to \vec{v}, that means that


    \vec{P_0P} is a scalar multiple of \vec{v}. As a result:

    t\vec{v} = \vec{P_0P}
    t(a, b, c) = (x - x_0, y - y_0, z - z_0)
    ta = x - x_0, \quad tb = y - y_0, \quad tc = z - z_0

    \frac{x - x_0}{a} = \frac{y - y_0}{b} = \frac{z - z_0}{c}    (1.9)

    One can observe that the components of the position vector \vec{P_0P} are proportional to the components of the direction vector \vec{v} with the same constant of proportionality t. Furthermore, the equation of a line has one degree of freedom. If we fix one of the three coordinates we can easily find the other two.

    A very important point to stress is that the three parts of the equation define a line together. If you use only two parts you will get an equation of a plane (although you will have only two variables). This can be understood if the difference between the following two sets is understood:

    \{(x, y) : 4y - 3x = 17\}
    \{(x, y, z) : 4y - 3x = 17\}

    In the second case, which is our case, z is free to take any value. However, if you want to define a line, all three variables should be constrained.

  • Chapter 2

    Vector Calculus

    2.1 Vector Functions of a Scalar Variable

    Functions can be divided into four types. They can have a scalar or a vector as an input and a scalar or a vector as an output - four different combinations in total. Single-variable calculus deals only with the case of scalar input and output. In this section we will discuss functions that have a scalar input and a vector output.

    What is suggested by Mr. Gross is that if there is a direct correspondence between the definitions and the rules of scalar limits and vector limits, then all the consequences coming from scalar limits that use only rules also accepted for vector arithmetic will be true for vectors as well. That means that if we define the limit of a vector in a way that is analogous to the limit of a scalar, and we use only rules (operations) that are defined both for scalars and vectors, then vector limits and derivatives should be the same as the scalar ones, but with the appropriate variables vectorized.

    Following this strategy the following conclusions can be drawn:

    \lim_{x \to a} \vec{f}(x) = \vec{L} means that given any \epsilon > 0 we can find \delta > 0 such that whenever 0 < |x - a| < \delta, then |\vec{f}(x) - \vec{L}| < \epsilon

    \vec{f}\,'(x) = \lim_{\Delta x \to 0} \frac{\vec{f}(x + \Delta x) - \vec{f}(x)}{\Delta x}

    if \vec{h}(x) = \vec{f}(x) + \vec{g}(x), then \vec{h}'(x) = \vec{f}\,'(x) + \vec{g}\,'(x)

    \frac{d}{dx}[f(x)\vec{g}(x)] = f(x)\vec{g}\,'(x) + f'(x)\vec{g}(x)

    \frac{d}{dx}[\vec{f}(x) \cdot \vec{g}(x)] = \vec{f}(x) \cdot \vec{g}\,'(x) + \vec{f}\,'(x) \cdot \vec{g}(x)

    \frac{d}{dx}[\vec{f}(x) \times \vec{g}(x)] = \vec{f}(x) \times \vec{g}\,'(x) + \vec{f}\,'(x) \times \vec{g}(x)
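The dot-product rule in the list above can be verified numerically; the functions and their hand-computed derivatives below are my own example:

```python
import math

# an example pair of vector functions with hand-computed derivatives
def f(x):  return (x, x**2, math.sin(x))
def fp(x): return (1.0, 2*x, math.cos(x))
def g(x):  return (math.cos(x), x, 1.0)
def gp(x): return (-math.sin(x), 1.0, 0.0)

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

x, h = 0.7, 1e-6
# central-difference derivative of the scalar function f(x) . g(x)
numeric = (dot(f(x + h), g(x + h)) - dot(f(x - h), g(x - h))) / (2 * h)
# the product rule: f . g' + f' . g
product_rule = dot(f(x), gp(x)) + dot(fp(x), g(x))
assert abs(numeric - product_rule) < 1e-6
```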

    2.2 Tangential and Normal Vectors

    Curves in planes or space have shapes that are independent of the coordinate system or the parametrization. Thus, it makes sense to try to express them and their properties solely through their shape rather than an external coordinate system. For this reason local coordinates shall be used. We call these the tangential, normal and binormal vectors. That is what we will try to do here.

    Let \vec{r}(t) be the position vector as a function of the parameter t. The derivative of the position vector (the so-called velocity vector) is always tangent to the curve. Then the tangent vector \vec{T} should be just a unit vector in the direction of d\vec{r}/dt:

    \vec{T} = \frac{d\vec{r}/dt}{|d\vec{r}/dt|} = \frac{d\vec{r}/dt}{ds/dt} = \frac{d\vec{r}}{ds}    (2.1)

    This result comes naturally, as d\vec{r} is in the tangent direction. Furthermore, as we are talking about infinitesimal quantities, the magnitudes of d\vec{r} and ds are the same. Here ds is an arc length differential, thus ds/dt has the physical interpretation of speed. So we divide the velocity vector d\vec{r}/dt by the speed ds/dt, and it makes sense that the result is a unit vector in the direction of the velocity (velocity is always tangential to the path). This is a beautiful result, as it does not depend on any coordinate system or parametrization. However, it is quite impractical in real life, as one is rarely given s(t). That is why the unit tangent vector is usually calculated from d\vec{r}/dt divided by its magnitude.

    Next, let's derive the unit normal vector \vec{N}. Intuitively, it should be in the direction of the derivative of the unit tangent vector. Why? Because the magnitude of a unit vector is always one, thus no change in this direction is possible. Moreover, as the only change of the unit tangent vector can be in its direction, its derivative has to be perpendicular to the unit tangent vector itself. We also define the normal vector to be perpendicular to the tangent vector. So:

    \vec{N} = \frac{d\vec{T}/dt}{|d\vec{T}/dt|}    (2.2)


    Now, we can prove this very same conclusion with more rigor. First, consider any function \vec{r}(t) such that |\vec{r}(t)| = c. Then the dot product of \vec{r}(t) with itself will equal:

    \vec{r}(t) \cdot \vec{r}(t) = |\vec{r}(t)||\vec{r}(t)|\cos 0 = |\vec{r}(t)|^2 = c^2

    Let's take the derivative of this expression:

    \frac{d}{dt}[\vec{r}(t) \cdot \vec{r}(t)] = \frac{d(c^2)}{dt} = 0

    However, recall from the previous section that:

    \frac{d}{dt}[\vec{r}(t) \cdot \vec{r}(t)] = \vec{r}(t) \cdot \vec{r}\,'(t) + \vec{r}\,'(t) \cdot \vec{r}(t) = 2\vec{r}(t) \cdot \vec{r}\,'(t)

    Combining the two expressions we get:

    2\vec{r}(t) \cdot \vec{r}\,'(t) = 0

    This proves that the derivative of a vector with constant magnitude is always perpendicular to the original vector. In our discussion the magnitude of \vec{T} is always one, so its derivative is always orthogonal to it. Thus \vec{T}' is in the normal direction. The only thing left is to make sure that its length is one, so we divide it by its magnitude. In this way we get Equation 2.2.

    As the definitions of the tangent and the normal unit vectors do not depend on the coordinate system, they also hold in three dimensions. However, when we deal with space curves we can also define a third unit vector that is normal to the osculating plane - the plane defined by the unit normal and tangent vectors:

    \vec{B} = \vec{T} \times \vec{N}    (2.3)

    We call \vec{B} the binormal vector.
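A numerical sketch computing \vec{T}, \vec{N} and \vec{B} for a helix via finite differences (the helix and the step sizes are my choices, not from the lecture):

```python
import math

def r(t):
    """A helix: x = cos t, y = sin t, z = t."""
    return (math.cos(t), math.sin(t), t)

def deriv(func, t, h=1e-6):
    """Central-difference derivative of a vector-valued function."""
    return tuple((a - b) / (2 * h) for a, b in zip(func(t + h), func(t - h)))

def unit(v):
    m = math.sqrt(sum(x * x for x in v))
    return tuple(x / m for x in v)

def cross(a, b):
    return (a[1]*b[2] - a[2]*b[1], a[2]*b[0] - a[0]*b[2], a[0]*b[1] - a[1]*b[0])

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

t = 1.0
T = unit(deriv(r, t))                                     # unit tangent (Eq. 2.1)
N = unit(deriv(lambda s: unit(deriv(r, s)), t, h=1e-4))   # unit normal (Eq. 2.2)
B = cross(T, N)                                           # binormal (Eq. 2.3)

assert abs(dot(T, N)) < 1e-3                              # T is orthogonal to N
assert abs(dot(T, B)) < 1e-3 and abs(dot(N, B)) < 1e-3    # B is orthogonal to both
```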

    2.3 Polar Coordinates

    Polar coordinates are another way of representing coordinates in a plane. A point P is defined by its distance r from the origin and the angle \theta between the line connecting P with the origin and the horizontal axis. One can easily go from polar coordinates to Cartesian or the other way around:

    x = r\cos\theta \qquad r = \sqrt{x^2 + y^2}
    y = r\sin\theta \qquad \theta = \arctan(y/x)

    Figure 2.1: the position vector \vec{r} of length r, with the polar unit vectors \vec{u}_r and \vec{u}_\theta.

    A complication that arises when using polar coordinates is that one point can have many representations. Recall that in a Cartesian coordinate system each point has one set of coordinates, and no point with other coordinates can be the same point. However, this is not the case with polar coordinates. For example, we can have the following two cases in which one point can be represented by different sets of coordinates:

    (r, \theta) = (r, \theta + 2k\pi)
    (-r, \theta) = (r, \theta + \pi)

    An extremely important observation is that a point is to satisfy an equation, not its representation. There can be a case in which a representation does not satisfy the equation, but the point satisfies it because there is another representation that fits the equation. An example can be the following equation: r = \sin^2\theta. The point P(-\frac{1}{4}, \frac{7\pi}{6}) clearly does not satisfy the equation, as r cannot be negative. However, the very same point P can be represented by (\frac{1}{4}, \frac{\pi}{6}), which satisfies the equation r = \sin^2\theta.

    2.4 Vectors in Polar Coordinates

    In order to use vectors in polar coordinates we define two new unit vectors, \vec{u}_r and \vec{u}_\theta. \vec{u}_r is a unit vector in the direction of increasing r and \vec{u}_\theta is a positive 90-degree rotation of it. This can be seen in Figure 2.1. The position of a point is defined by a position vector \vec{r} = r\vec{u}_r. It can be easily found that:

    \vec{u}_r = \cos\theta\,\vec{i} + \sin\theta\,\vec{j}    (2.4)


    \vec{u}_\theta is a positive 90-degree rotation of \vec{u}_r, so:

    \vec{u}_\theta = \cos(90° + \theta)\,\vec{i} + \sin(90° + \theta)\,\vec{j} = -\sin\theta\,\vec{i} + \cos\theta\,\vec{j} = \frac{d\vec{u}_r}{d\theta}    (2.5)

    In fact, it turns out that each differentiation of a unit vector with respect to \theta gives a unit vector that is rotated a positive 90 degrees from the original one. Thus differentiating \vec{u}_\theta will give a unit vector in the same direction as \vec{u}_r but with opposite sense: d\vec{u}_\theta/d\theta = -\vec{u}_r.

    Another important thing to note is that the velocity vector expressed in polar coordinates will generally have components along both \vec{u}_r and \vec{u}_\theta. This is because neither of the two polar unit vectors is always tangent to the path. Furthermore, straightforward differentiation of the position vector will give the velocity vector, and differentiation of the velocity vector will give the acceleration vector.

    The instantaneous velocity \vec{v} is obtained by taking the time derivative of the position vector:

    \vec{v} = \frac{d\vec{r}}{dt} = \frac{dr}{dt}\vec{u}_r + r\frac{d\vec{u}_r}{dt}

    Now it can be seen from Equation 2.5 that \frac{d\vec{u}_r}{dt} = \frac{d\theta}{dt}\vec{u}_\theta. Thus,

    \vec{v} = \frac{dr}{dt}\vec{u}_r + r\frac{d\theta}{dt}\vec{u}_\theta    (2.6)

    If we differentiate Equation 2.6 with respect to time we can obtain the instantaneous acceleration:

    \vec{a} = \frac{d^2r}{dt^2}\vec{u}_r + \frac{dr}{dt}\frac{d\vec{u}_r}{dt} + \frac{dr}{dt}\frac{d\theta}{dt}\vec{u}_\theta + r\frac{d^2\theta}{dt^2}\vec{u}_\theta + r\frac{d\theta}{dt}\frac{d\vec{u}_\theta}{dt}

    \vec{a} = \frac{d^2r}{dt^2}\vec{u}_r + \frac{dr}{dt}\frac{d\vec{u}_r}{d\theta}\frac{d\theta}{dt} + \frac{dr}{dt}\frac{d\theta}{dt}\vec{u}_\theta + r\frac{d^2\theta}{dt^2}\vec{u}_\theta + r\frac{d\theta}{dt}\frac{d\vec{u}_\theta}{d\theta}\frac{d\theta}{dt}

    \vec{a} = \frac{d^2r}{dt^2}\vec{u}_r + \frac{dr}{dt}\frac{d\theta}{dt}\vec{u}_\theta + \frac{dr}{dt}\frac{d\theta}{dt}\vec{u}_\theta + r\frac{d^2\theta}{dt^2}\vec{u}_\theta + r\frac{d\theta}{dt}\frac{d\theta}{dt}(-\vec{u}_r)

    \vec{a} = \frac{d^2r}{dt^2}\vec{u}_r + 2\frac{dr}{dt}\frac{d\theta}{dt}\vec{u}_\theta + r\frac{d^2\theta}{dt^2}\vec{u}_\theta - r\left(\frac{d\theta}{dt}\right)^2\vec{u}_r

    \vec{a} = \left[\frac{d^2r}{dt^2} - r\left(\frac{d\theta}{dt}\right)^2\right]\vec{u}_r + \left[2\frac{dr}{dt}\frac{d\theta}{dt} + r\frac{d^2\theta}{dt^2}\right]\vec{u}_\theta    (2.7)
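Equation 2.7 can be checked numerically against a direct second derivative of the Cartesian position; the trajectory r(t) = t², θ(t) = t and the finite-difference step are my own choices:

```python
import math

# an example trajectory in polar form, with hand-computed derivatives
r    = lambda t: t**2
rp   = lambda t: 2 * t        # dr/dt
rpp  = lambda t: 2.0          # d2r/dt2
th   = lambda t: t
thp  = lambda t: 1.0          # dtheta/dt
thpp = lambda t: 0.0          # d2theta/dt2

def cartesian(t):
    return (r(t) * math.cos(th(t)), r(t) * math.sin(th(t)))

def second_deriv(func, t, h=1e-4):
    """Central second finite difference of a vector-valued function."""
    ff, fc, fb = func(t + h), func(t), func(t - h)
    return tuple((a - 2 * b + c) / h**2 for a, b, c in zip(ff, fc, fb))

t = 1.3
a_r  = rpp(t) - r(t) * thp(t)**2               # radial component of Eq. 2.7
a_th = 2 * rp(t) * thp(t) + r(t) * thpp(t)     # transverse component of Eq. 2.7
u_r  = (math.cos(th(t)), math.sin(th(t)))
u_th = (-math.sin(th(t)), math.cos(th(t)))
a_polar = tuple(a_r * ur + a_th * ut for ur, ut in zip(u_r, u_th))

a_numeric = second_deriv(cartesian, t)
assert all(abs(p - q) < 1e-3 for p, q in zip(a_polar, a_numeric))
```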

  • Chapter 3

    Partial Derivatives

    3.1 n-Dimensional Vector Spaces

    In the last section we discussed the case of the function box having a scalar as an input and a vector as an output. Now we will consider the opposite idea: vector input and scalar output. These functions are called scalar functions of vector variables.

    Although until now we used vectors and arrows interchangeably, vectors do not need to be arrows. Consider the following function:

    V(r, h) = \pi r^2 h

    This is a function that gives the volume of a cylinder with radius r and height h. The arrow representation of the input (r, h) has no physical meaning. That is why it is more natural to view this input not as an arrow but as an ordered 2-tuple. Furthermore, as we no longer link vectors with arrows, the notation x shall be used for denoting n-tuples.

    Now, as we have outgrown the graphical representation of a vector, we can talk about vectors that have more than three components. It makes perfect sense for 4-tuples, 5-tuples and n-tuples to exist. Furthermore, as we have liberated ourselves from the constraints of physical space, space coordinates like (x, y, z) do not make much sense anymore. That is why an n-tuple is defined as:

    x = (x_1, x_2, x_3, \ldots, x_n)

    Let's talk about the mathematical structure of vectors. We have already defined n-tuples. However, they are useless without any operations that we can do with them. We need to empower them, give them special abilities. The insight here is that only when our set (the n-tuples) is endowed with the structure of equality, summation and scalar multiplication can we call the resultant structure an n-dimensional vector space. What is to be remembered is that the n-tuples together with the structure (equality, summation and scalar multiplication) are called a vector space, not the n-tuples alone. This structure is easily defined:

    1. If a = (a_1, a_2, \ldots, a_n) and b = (b_1, b_2, \ldots, b_n), then a = b means that a_1 = b_1, a_2 = b_2, \ldots, a_n = b_n.

    2. If a = (a_1, a_2, \ldots, a_n) and b = (b_1, b_2, \ldots, b_n), then a + b = (a_1 + b_1, a_2 + b_2, \ldots, a_n + b_n).

    3. If c is any scalar, scalar multiplication is defined as c(a_1, a_2, \ldots, a_n) = (ca_1, ca_2, \ldots, ca_n).

    We should also note that the length of an n-tuple can be found by:

    \|x\| = \sqrt{x_1^2 + x_2^2 + \ldots + x_n^2}    (3.1)

    Furthermore, the dot product and its properties are also applicable to n-tuples. Finding limits is quite tricky, as in 2-, 3-, 4- or n-dimensional space you can approach a point from an infinite number of directions. A limit exists only if the limits along all paths are the same. So, one needs to prove that all the paths (an infinite amount) approach the same limit. The epsilon-delta limit proof can be used in n dimensions to solve this issue. An important consequence is that a function is continuous at a point if its limit exists at this point and its value is the value of the function at this point.
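Two small sketches of these points (examples mine): the n-tuple norm from Equation 3.1, and a function whose limit at the origin depends on the approach path:

```python
import math

def norm(x):
    """Length of an n-tuple (Equation 3.1)."""
    return math.sqrt(sum(c * c for c in x))

x = (1.0, 2.0, 2.0, 4.0)           # a 4-tuple: no arrow picture needed
assert abs(norm(x) - 5.0) < 1e-9   # sqrt(1 + 4 + 4 + 16) = 5

# f approaches 1/2 along y = x but 0 along y = 0, so the limit at (0, 0)
# does not exist: different paths give different limits
f = lambda x, y: x * y / (x**2 + y**2)
assert all(abs(f(t, t) - 0.5) < 1e-12 for t in (0.1, 0.01, 0.001))
assert all(f(t, 0.0) == 0.0 for t in (0.1, 0.01, 0.001))
```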

    3.2 An Introduction to Partial Derivatives

    We can't take the derivative of functions of multiple variables in the same fashion as we do with functions of a single variable, because we only know how to take the derivative with respect to one variable. However, there is a workaround for functions of several variables: let all variables but one be fixed, treat them as constants, and take the derivative with respect to the variable that we left unfixed. Note that in order for this method to work we need to take the derivative with respect to an independent variable. Most of the usual derivative properties still hold. However, we cannot treat differentials as fractions anymore. At least not in all cases:

    \frac{\partial u}{\partial x} \neq \left(\frac{\partial x}{\partial u}\right)^{-1}


    We can do this only in the case when in both derivatives the same variables are held constant:

    \left(\frac{\partial u}{\partial x}\right)_y = \left(\frac{\partial x}{\partial u}\right)_y^{-1}

    Derivatives of functions of multiple variables can be taken in an infinite number of directions. Partial derivatives are only a few of these. However, they are very representative. A partial derivative with respect to one variable gives the slope of the function in the direction in which all variables but that one are held constant.

    If we narrow our discussion to functions of two variables we can obtain some intuition and valuable results. If we have a function w(x, y), then for each point in the xy plane for which the function is defined there will exist a value w. This can be depicted graphically as a surface in three dimensions. Now, the partial derivative at some point P with respect to x will give a slice of the function parallel to the xw plane and passing through P(x_0, y_0). If we vectorize these derivatives we will get vectors that are tangent to the surface at point P. In order to do this, remember that the derivative is the change in the function (w) for a unit length of the variable (x or y). We don't care about the magnitude of the vector, so we can just take one in \vec{i} (or \vec{j}) and the value of the derivative in \vec{k}. Note that the remaining component (\vec{j} or \vec{i} respectively) will be zero.

    \vec{V}_1 = \vec{j} + \frac{\partial w}{\partial y}\Big|_{(x_0, y_0)}\vec{k}

    \vec{V}_2 = \vec{i} + \frac{\partial w}{\partial x}\Big|_{(x_0, y_0)}\vec{k}

    The normal vector to the surface at point P can be found from the cross product of the two tangent vectors:

    \vec{N} = \vec{V}_1 \times \vec{V}_2 = \frac{\partial w}{\partial x}\Big|_{(x_0, y_0)}\vec{i} + \frac{\partial w}{\partial y}\Big|_{(x_0, y_0)}\vec{j} - \vec{k}    (3.2)

    Then, as was shown in Section 1.6, the equation of the tangent plane with normal vector \vec{N} at point P(x_0, y_0) is:

    \frac{\partial w}{\partial x}\Big|_{(x_0, y_0)}(x - x_0) + \frac{\partial w}{\partial y}\Big|_{(x_0, y_0)}(y - y_0) - (w - w_0) = 0    (3.3)

    From here, the change in w on the tangent plane as a function of the change in x and y is:

    \Delta w_{tan} = \frac{\partial w}{\partial x}\Big|_{(x_0, y_0)}\Delta x + \frac{\partial w}{\partial y}\Big|_{(x_0, y_0)}\Delta y    (3.4)

    Note that this equation holds for the tangent plane but is only an approximation to the function itself.
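A numeric sketch of Equation 3.4 (the surface and step sizes are my own example): the tangent-plane change approximates the true change of w for small Δx, Δy:

```python
import math

w  = lambda x, y: x**2 * y + math.sin(y)    # an example surface w(x, y)
wx = lambda x, y: 2 * x * y                 # partial w / partial x
wy = lambda x, y: x**2 + math.cos(y)        # partial w / partial y

x0, y0 = 1.0, 0.5
dx, dy = 1e-3, -2e-3
dw_tan = wx(x0, y0) * dx + wy(x0, y0) * dy       # Equation 3.4
dw_true = w(x0 + dx, y0 + dy) - w(x0, y0)        # true change of the function
# the tangent-plane change matches the true change to second order in dx, dy
assert abs(dw_tan - dw_true) < 1e-4
```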


    3.3 Differentiability and the Gradient

    Lets continue our two dimensional discussion. Why should we restrict our-selves to derivatives only in the x and y directions? It makes perfect sense totalk about a derivative in direction s at (a, b) fs(a, b) or dw/ds. Note that weare using dw/ds instead of w/s. That is because when we talk about anarbitrary path no variable is held constant. Or rather, x and y are no longerindependent. They are linked through the equation of the line s. Now, howdo we find how much dw/ds is? We can start with the definition of limit:

    f_s'(a, b) = \frac{dw}{ds} = \lim_{\Delta s \to 0} \frac{\Delta w}{\Delta s}

    Recall from the previous section that

    \Delta w_{tan} = f_x(a, b)\Delta x + f_y(a, b)\Delta y

    Then, dividing both sides by \Delta s, we get

    \frac{\Delta w_{tan}}{\Delta s} = f_x(a, b)\frac{\Delta x}{\Delta s} + f_y(a, b)\frac{\Delta y}{\Delta s}

    If we let \Delta s \to 0:

    f_s'(a, b) = \frac{dw}{ds} = f_x(a, b)\frac{dx}{ds} + f_y(a, b)\frac{dy}{ds}

    One can recognize this as the dot product of two vectors:

    \nabla f = (f_x(a, b), f_y(a, b)) \qquad \vec{u}_s = \left(\frac{dx}{ds}, \frac{dy}{ds}\right)

    Note that we call the second vector a unit vector. Why is that? One can easily see that its magnitude is always one, as ds = \sqrt{dx^2 + dy^2}. Additionally, the first vector is the gradient of f. Now, rewriting the equation for f_s'(a, b) as a dot product we get:

    f_s'(a, b) = \nabla f \cdot \vec{u}_s    (3.5)

    An interesting observation is that the maximum possible directional derivative occurs when \vec{u}_s is parallel to \nabla f. In this case the direction of s is the same as the direction of \nabla f. Thus the directional derivative is maximum in the direction of the gradient. Keep in mind that the definition of the gradient is not dependent on a coordinate system. However, in Cartesian coordinates

    \nabla f = f_x(a, b)\,\vec{i} + f_y(a, b)\,\vec{j}. For example, the gradient vector expressed in polar coordinates is:

    \nabla f = \frac{\partial w}{\partial r}\vec{u}_r + \frac{1}{r}\frac{\partial w}{\partial \theta}\vec{u}_\theta    (3.6)
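A sketch checking Equation 3.5 with forward differences, and the claim that the gradient direction maximizes the directional derivative (the example function is mine):

```python
import math

f  = lambda x, y: x**2 + 3 * x * y    # an example function
fx = lambda x, y: 2 * x + 3 * y       # hand-computed partials
fy = lambda x, y: 3 * x

a, b = 1.0, 2.0
grad = (fx(a, b), fy(a, b))           # gradient at (a, b): (8, 3)

def dir_deriv(theta, h=1e-6):
    """Forward-difference directional derivative along the unit vector u_s."""
    u = (math.cos(theta), math.sin(theta))
    return (f(a + h * u[0], b + h * u[1]) - f(a, b)) / h

# Equation 3.5: f_s' = grad f . u_s, checked in a few directions
for theta in (0.0, 0.7, 2.0):
    u = (math.cos(theta), math.sin(theta))
    assert abs(dir_deriv(theta) - (grad[0]*u[0] + grad[1]*u[1])) < 1e-4

# the directional derivative is largest in the direction of the gradient,
# where it equals |grad f|
gnorm = math.sqrt(grad[0]**2 + grad[1]**2)
best = math.atan2(grad[1], grad[0])
assert dir_deriv(best) <= gnorm + 1e-4
assert dir_deriv(0.7) < gnorm
```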

    These conclusions seem very nice, but if you recall, we derived all these results only after we restricted ourselves to a 2-dimensional vector space. Now, is it possible to scale the idea of differentiation to vectors of more than two variables? It seems reasonable. Let's first see how the limit definition of the derivative would look:

    f'(x) = \lim_{\Delta x \to 0} \frac{f(x + \Delta x) - f(x)}{\Delta x}    (3.7)

This looks good at first sight. However, if one looks closely, they will see that the numerator of this fraction is a real number while the denominator is a vector. Wait, have we defined how to divide a scalar by a vector? Not yet. Let's first see how we define division of scalars. The number c/x is such that when multiplied by x it equals c. This definition also explains why it is impossible to divide by zero: if one tried, they would get that c/0 multiplied by zero equals c, yet there is no number that multiplied by zero gives anything other than zero. Now, if we go back to our problem of dividing a scalar by a vector, we can use the very same definition. That is, c/\vec{x} is a vector that, multiplied with \vec{x}, equals c. What kind of multiplication is this? It should obviously be the dot product, as the result of the multiplication has to be a scalar. Notice that we said "the number" for the first case and "a vector" for the second. The reason is that while there is only one number that equals a quotient of two scalars, there are infinitely many vectors that equal the quotient of a scalar by a vector. Now, for the sake of simplicity, and since we would like to get only one derivative out of the differentiation, we reduce the possible answers to one:

\frac{c}{\vec{v}} = \frac{c}{\|\vec{v}\|^2}\,\vec{v} \quad (3.8)

Now let's rewrite Equation 3.7 a bit. First of all, note that \Delta \vec{x} is a vector. What do we mean by \Delta \vec{x} \to 0? We mean that its direction is kept constant while its magnitude approaches zero. Then we can substitute \Delta \vec{x} by t\vec{u}, where \vec{u} is a unit vector in the direction of \Delta \vec{x} and t is a positive real number. It is obvious that if t \to 0, then \Delta \vec{x} \to 0. If we make this


substitution in Equation 3.7 and use Equation 3.8, we get:

f'(\vec{x}) = \lim_{t \to 0} \frac{f(\vec{x} + t\vec{u}) - f(\vec{x})}{t\vec{u}}

f'(\vec{x}) = \lim_{t \to 0} \frac{f(\vec{x} + t\vec{u}) - f(\vec{x})}{\|t\vec{u}\|^2}\, t\vec{u}

f'(\vec{x}) = \lim_{t \to 0} \frac{f(\vec{x} + t\vec{u}) - f(\vec{x})}{t^2 \|\vec{u}\|^2}\, t\vec{u}

f'(\vec{x}) = \lim_{t \to 0} \frac{f(\vec{x} + t\vec{u}) - f(\vec{x})}{t}\, \vec{u} \quad (3.9)

Here, Equation 3.9 represents the instantaneous rate of change in the direction of \vec{u}. This can also be denoted as the directional derivative f_{\vec{u}}(\vec{x}). Recall that we did not put any constraints on \vec{x}, so this result holds for any function of an n-tuple.
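The limit in Equation 3.9 is easy to approximate numerically. Below is a minimal sketch (the function of a 3-tuple, the point, and the direction are our own choices, not from the lecture) comparing the finite-t quotient against the analytic gradient dotted with the unit vector:

```python
import math

def f(v):
    # a sample function of a 3-tuple (our choice, not from the text)
    x, y, z = v
    return x * y + z**2

def directional_derivative(f, x, u, t=1e-6):
    # Equation 3.9: [f(x + t*u) - f(x)] / t for a unit vector u and small t
    shifted = [xi + t * ui for xi, ui in zip(x, u)]
    return (f(shifted) - f(x)) / t

x = [1.0, 2.0, 3.0]
u = [1 / math.sqrt(3)] * 3              # a unit vector
grad = [2.0, 1.0, 6.0]                  # analytic gradient (y, x, 2z) at x
expected = sum(g * ui for g, ui in zip(grad, u))
approx = directional_derivative(f, x, u)
```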

    Now we can define what differentiability is. A function f(x) is differen-tiable at x = a if and only if fu(a) exists in every direction u. That meansthat the existence of Equation 3.9 should be independent from the directionu. We can also define what a smooth surface is: a smooth surface is a sur-face for which the directional derivative exists in each direction at a point.Finally, another definition we can make is for the derivative of f(x). Thatis defined to be the directional derivative of f at a which has the greatestmagnitude.

    3.4 The Chain Rule

The Chain Rule allows one to link a function to the functions that determine its variables. Just as an illustration, consider the following case:

    w = f(x, y) x = g(r, s) y = h(r, s)

It is easy to see that w can be expressed as a function of r and s. Then one can find the partial derivative with respect to r or s. However, the Chain Rule allows us to do this without substituting variables:

\frac{\partial w}{\partial r} = \frac{\partial w}{\partial x}\frac{\partial x}{\partial r} + \frac{\partial w}{\partial y}\frac{\partial y}{\partial r}

Now, this holds only if the functions are continuously differentiable. Furthermore, it is not allowed to cancel the \partial x-es and the \partial y-s. That would lead to an expression of the type 1 = 2. The reason for this is that the different


    partial derivatives are taken with different variables being held fixed. Thiscan be illustrated as:

\frac{\partial w}{\partial r} = \left(\frac{\partial w}{\partial x}\right)_{\!y} \left(\frac{\partial x}{\partial r}\right)_{\!s} + \left(\frac{\partial w}{\partial y}\right)_{\!x} \left(\frac{\partial y}{\partial r}\right)_{\!s}

    The Chain Rule works for the general kind of functions whose parametersare functions of other parameters. This nesting can continue even further.
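The Chain Rule for w = f(x, y), x = g(r, s), y = h(r, s) can be checked numerically. In this sketch (the particular f, g, h and the point are our own choices) we compute \partial w/\partial r once via the Chain Rule and once by substituting first and differentiating directly:

```python
def g(r, s): return r * s          # x = g(r, s)
def h(r, s): return r + s          # y = h(r, s)
def f(x, y): return x**2 * y       # w = f(x, y)

r, s, eps = 1.5, 0.5, 1e-6
x, y = g(r, s), h(r, s)

# Chain Rule: dw/dr = (dw/dx)(dx/dr) + (dw/dy)(dy/dr),
# with each partial approximated by a central difference
dwdx = (f(x + eps, y) - f(x - eps, y)) / (2 * eps)
dwdy = (f(x, y + eps) - f(x, y - eps)) / (2 * eps)
dxdr = (g(r + eps, s) - g(r - eps, s)) / (2 * eps)
dydr = (h(r + eps, s) - h(r - eps, s)) / (2 * eps)
chain = dwdx * dxdr + dwdy * dydr

# direct route: substitute first, then differentiate w(r, s) = f(g(r,s), h(r,s))
w = lambda r, s: f(g(r, s), h(r, s))
direct = (w(r + eps, s) - w(r - eps, s)) / (2 * eps)
```

Both routes give the same number, which is exactly what the Chain Rule promises.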

The Chain Rule also holds for higher-order derivatives. The logic behind this is that if \partial w/\partial x is a partial derivative of w, which is a function of both x and y, then in the general case \partial w/\partial x is also a function of both x and y. If \partial w/\partial x is a continuous function then it can be differentiated again.

    In most cases fxy = fyx. However this is not always the case.

Theorem: If f, f_x, f_y, f_{xy} exist and are continuous in the neighborhood of the point (a, b), then f_{yx} also exists at (a, b) and in fact f_{yx}(a, b) = f_{xy}(a, b).

Even if f, f_x and f_y exist and are continuous, it is possible that f_{xy} and f_{yx} are not continuous.
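For a smooth function the theorem's conclusion f_{xy} = f_{yx} is easy to observe numerically. A minimal sketch (the sample function and the point are our own choices), differentiating in the two orders explicitly:

```python
def f(x, y):
    return x**3 * y + y**2  # a smooth sample function, so f_xy = f_yx

h = 1e-4

def fx(x, y):   # partial in x by central difference
    return (f(x + h, y) - f(x - h, y)) / (2 * h)

def fy(x, y):   # partial in y by central difference
    return (f(x, y + h) - f(x, y - h)) / (2 * h)

x0, y0 = 1.2, -0.7
f_xy = (fx(x0, y0 + h) - fx(x0, y0 - h)) / (2 * h)  # (f_x)_y
f_yx = (fy(x0 + h, y0) - fy(x0 - h, y0)) / (2 * h)  # (f_y)_x
```

Here the analytic value is f_{xy} = 3x^2, and both nested differences land on it.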

    3.5 Exact Differentials

Although for illustration purposes we will use an example with a function w = f(x, y), the principles are the same for functions of more than two variables.

Recall that

\Delta w = f_x\,\Delta x + f_y\,\Delta y

If w is differentiable we can turn \Delta x and \Delta y into the infinitesimals dx and dy:

dw = f_x\,dx + f_y\,dy \quad (3.10)

    This is the equation for the total differential of w. Any expression of theform M(x, y)dx+N(x, y)dy is called a differential.

If we want to get back to w from the differential, we integrate with respect to the first variable, keeping in mind that the "constant" of integration may be a function of the other variables. Then we differentiate with respect to the second variable and establish that constant of integration. Repeat this for all variables; the last constant should be a plain number. We call a differential exact if this method is able to find a function w whose partial derivatives form the differential. If such a function does not exist, then the differential is inexact. Finally, it turns out that for an exact differential of a function of two variables M_y = N_x (since M_y = f_{xy} = f_{yx} = N_x). The converse is also true: if M_y = N_x, then the differential is exact.
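The recovery procedure can be sketched numerically. In this example (M, N and the candidate w are our own choices, not from the lecture) we take M = 2xy and N = x^2 + 3y^2: integrating M in x gives x^2 y + C(y); differentiating in y and matching against N forces C'(y) = 3y^2, so w = x^2 y + y^3. The code checks both the exactness condition and the recovered w:

```python
def M(x, y): return 2 * x * y            # M = w_x
def N(x, y): return x**2 + 3 * y**2      # N = w_y

# candidate recovered by the integrate-then-match procedure described above
def w(x, y): return x**2 * y + y**3

h = 1e-6
x0, y0 = 1.3, 0.4
My = (M(x0, y0 + h) - M(x0, y0 - h)) / (2 * h)   # exactness test: M_y ...
Nx = (N(x0 + h, y0) - N(x0 - h, y0)) / (2 * h)   # ... must equal N_x
wx = (w(x0 + h, y0) - w(x0 - h, y0)) / (2 * h)
wy = (w(x0, y0 + h) - w(x0, y0 - h)) / (2 * h)
```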

  • Chapter 4

    Matrix Algebra

    4.1 Linearity Revisited

Linear functions are simple and nice to work with. One property they share is that all linear functions have an inverse function. Unfortunately, most functions are non-linear. However, most functions are locally linear. By this we mean that, provided the function f is differentiable at x = a, then \Delta f \approx f'(a)\,\Delta x near x = a. In other words, if f is continuously differentiable at x = a then locally (near x = a) f(x) \approx f(a) + f'(a)(x - a). This is also true for functions of multiple variables. If w = f(x_1, \ldots, x_n) and f is continuously differentiable at \vec{x} = \vec{a}, then:

\Delta w_{\mathrm{lin}} = f_{x_1}(\vec{a})\,\Delta x_1 + \ldots + f_{x_n}(\vec{a})\,\Delta x_n \quad (4.1)

    This motivates the use of linear systems:

a_{11}x_1 + \ldots + a_{1n}x_n = b_1
\vdots
a_{m1}x_1 + \ldots + a_{mn}x_n = b_m

Let's start with the definition of a matrix: by an m by n matrix we mean a rectangular array of numbers arranged in m rows and n columns. Now, one can put the n coefficients of each of the m linear equations in a matrix. It turns out that the chain rule motivates the definition of matrix multiplication. We can multiply two matrices if the number of columns of the first one equals the number of rows of the second. We dot the i-th row of the first matrix with the j-th column of the second to obtain the term in the i-th row, j-th column of the product. This can be used to change the variables of the linear equations. Namely, if the first matrix gives y_1, y_2, y_3 as functions of x_1, x_2 and the second one gives x_1, x_2 as functions of q_1, q_2, q_3, q_4, then the product of the two matrices will give us y_1, y_2, y_3 as functions of q_1, q_2, q_3, q_4.
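The substitution-of-variables picture above can be sketched directly (the particular coefficient values below are our own, chosen only to make the shapes 3×2 and 2×4 concrete):

```python
def matmul(A, B):
    # dot the i-th row of A with the j-th column of B
    assert len(A[0]) == len(B), "columns of A must equal rows of B"
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

A = [[1, 2], [3, 4], [5, 6]]          # y_1..y_3 in terms of x_1, x_2  (3x2)
B = [[1, 0, 2, 1], [0, 1, 1, 3]]      # x_1, x_2 in terms of q_1..q_4  (2x4)
C = matmul(A, B)                      # y_1..y_3 in terms of q_1..q_4  (3x4)

q = [1, 2, 3, 4]
x = [sum(B[i][j] * q[j] for j in range(4)) for i in range(2)]
y_via_x = [sum(A[i][j] * x[j] for j in range(2)) for i in range(3)]   # q -> x -> y
y_direct = [sum(C[i][j] * q[j] for j in range(4)) for i in range(3)]  # q -> y via C
```

Going q → x → y in two steps and going q → y through the product C give identical results, which is precisely why multiplication is defined this way.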


    4.2 Introduction to Matrix Algebra

    We defined what matrices are but without defining their structure, theyare pretty useless. Now, lets start with equating matrices. Any two m nmatrices (with the same dimensions) are equal if they are equal term-by-term.This is they are equal if [aij] = [bij]. Next, the sum of two m n matricesequals to the matrix that is obtained by the term-by-term summation: [cij] =[aij] + [bij]. The same situation holds for scalar multiplication: it is term-by-term multiplication with the scalar. For all these definitions the sizes of thematrices do not matter, as long as they have the same size.

    If we want to define multiplication of matrices though this wont be thecase. Of course, we can define the multiplication of matrices to be term-by-term, then we will be able to do it with any size matrices and will beabsolutely feasible abstract mathematics definition. However, it would haveno physical application. That is why we define multiplication of matricesas dotting the i-th row of the first matrix with the j-th column of thesecond to obtain the term in the i-th row, j-th column of the product. Oneconsequence from this is that the order of the matrices does matter. Thus,generally AB 6= BA. Of course, there are cases when this is true, butgenerally you get different result if you switch the matrices.

    Some other properties also follow:

1. A + B = B + A

2. A + (B + C) = (A + B) + C

3. If 0 = \begin{bmatrix} 0 & \cdots & 0 \\ \vdots & & \vdots \\ 0 & \cdots & 0 \end{bmatrix}, then A + 0 = A

4. -A = [-a_{ij}]

5. A(BC) = (AB)C

6. A(B + C) = AB + AC

7. If I_n = \begin{bmatrix} 1 & 0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{bmatrix}, then AI_n = I_nA = A

The last result is pretty important. The identity matrix I_n is an n \times n matrix that has ones on the major (top-left to bottom-right) diagonal and zeros everywhere else. It comes from our definition of multiplication that


this result is true. Note that although generally AB \neq BA, in this case AI_n = I_nA.

The inverse A^{-1} of a matrix A is another matrix that, multiplied by the original matrix, gives the identity matrix. A very important fact is that A^{-1} need not exist. An interesting link to systems of linear equations: if A^{-1} does not exist, then for some reason we cannot invert the corresponding system of linear equations. The matrices for which A^{-1} exists are called non-singular matrices.

We can prove that if AB = AC for a non-singular A, then B = C. First, take a look at the following equation, where a, b and c are real numbers:

    ab = ac

It is clear that b = c unless a = 0. But why is that? If we multiply both sides of the equation by the inverse of a, namely 1/a, we get b = c. The very same train of thought can be applied to matrices:

AB = AC

A^{-1}AB = A^{-1}AC

I_nB = I_nC

B = C

Keep in mind that in this derivation we assumed that A^{-1} exists; without it the cancellation would not be possible. So, if AB = AC, then B = C provided A is non-singular. If A is singular, it could be that AB = AC yet A \neq 0 and B \neq C. When we talk about A^{-1} we assume that A is a square matrix; non-square matrices do not have inverses.
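The failure of cancellation for a singular A is easy to exhibit concretely (the matrices below are our own example): take a singular A and two different B and C whose products with A coincide.

```python
def matmul(A, B):
    # row-by-column product of two matrices given as lists of rows
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

A = [[1, 1], [1, 1]]   # singular: determinant is 0, no inverse exists
B = [[1, 0], [0, 1]]
C = [[0, 1], [1, 0]]
AB = matmul(A, B)      # both products collapse to [[1, 1], [1, 1]]
AC = matmul(A, C)
```

So AB = AC even though B ≠ C: with no A^{-1} available, nothing lets us cancel A from the left.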

    Finally, if A is any matrix, we define the transpose of A, written AT , tobe the matrix obtained when we interchange the rows and columns of A.That is, the columns of A are the rows of AT .

    4.3 Inverting a Matrix

    If we have

    y1 = ax1 + bx2 + cx3

    y2 = dx1 + ex2 + fx3

    y3 = gx1 + hx2 + ix3

we can rewrite it as

\begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix} = \begin{bmatrix} a & b & c \\ d & e & f \\ g & h & i \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}

    And this can be further rewritten as

    Y = AX (4.2)

Now, if one wants to solve for X, meaning find x_1, x_2 and x_3, we can rearrange the equation provided that the inverse of A exists. A^{-1} exists if there are exactly as many independent linear equations as unknowns, meaning that no equation is a constant multiple of another. If A^{-1} exists then x_1, x_2 and x_3 can be expressed as functions of y_1, y_2 and y_3.

If A^{-1} exists:

A^{-1}Y = A^{-1}AX

A^{-1}Y = I_nX

A^{-1}Y = X \quad (4.3)

So far so good. It is clear that we can solve a system of linear equations if only we knew the inverse of the matrix that contains the coefficients. But how do we compute the inverse? We perform matrix row operations: row switching, multiplication of a row by a scalar, and addition or subtraction of one row from another. To start, write down the matrix that contains the coefficients and, to the right of it, the identity matrix:

\left[\begin{array}{ccc|ccc} a & b & c & 1 & 0 & 0 \\ d & e & f & 0 & 1 & 0 \\ g & h & i & 0 & 0 & 1 \end{array}\right]

Now, using the row operations stated above, transform this matrix to:

\left[\begin{array}{ccc|ccc} 1 & 0 & 0 & j & k & l \\ 0 & 1 & 0 & m & n & o \\ 0 & 0 & 1 & p & r & s \end{array}\right]

The right-hand part is the inverse:

A^{-1} = \begin{bmatrix} j & k & l \\ m & n & o \\ p & r & s \end{bmatrix}
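The augment-and-row-reduce procedure can be sketched as a short routine (the 3×3 matrix at the bottom is our own test case; partial pivoting is an implementation detail added for numerical stability, not something the notes discuss):

```python
def invert(A):
    # Gauss-Jordan elimination: augment A with the identity matrix and
    # row-reduce the left half to the identity; the right half becomes A^-1
    n = len(A)
    M = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        # row switching: bring the row with the largest pivot into place
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]          # scale the pivot row to 1
        for r in range(n):
            if r != col:
                # subtract a multiple of the pivot row to clear the column
                factor = M[r][col]
                M[r] = [v - factor * w for v, w in zip(M[r], M[col])]
    return [row[n:] for row in M]

A = [[2, 1, 0], [1, 3, 1], [0, 1, 2]]
Ainv = invert(A)
```

Multiplying A by the returned matrix reproduces the identity, which is the defining property of A^{-1}.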

If the determinant of a matrix is non-zero then it has an inverse. Furthermore, if the determinant of the matrix of coefficients of a system of linear equations is non-zero, then the system has a unique solution; when the determinant is zero there are either no solutions or infinitely many solutions.


    4.4 Maxima and Minima in Several Variables

A local maximum is a point \vec{a} for which f(\vec{a}) \geq f(\vec{x}) for each \vec{x} in the neighborhood of \vec{a}. The definition of a local minimum is similar. There are three steps one should take when looking for max-min candidates. Why candidates? All the maxima and minima that the function has (on the given domain) will be in the set of candidate points. However, some of these candidates might not be minima or maxima, so further investigation is necessary.

1. Solve the system

f_{x_1}(x_1, \ldots, x_n) = 0
\vdots
f_{x_n}(x_1, \ldots, x_n) = 0

This will give the points where all the partial derivatives are zero. Note that such a point will not be a maximum or a minimum if a directional derivative in some other direction is non-zero.

2. Find the points where f is not differentiable, as these points were not included in the analysis of the previous step

3. Check the boundaries of the domain. If the domain is bounded, then there is at least one maximum and one minimum, and it is possible that these occur on the boundary.

When we have found a candidate point (a, b) we must look at the sign of f(a + \Delta x, b + \Delta y) - f(a, b). For a maximum this should be negative for all small values of \Delta x and \Delta y, and for a minimum it should be positive. This is not always easy to show, in which case it is usually easier to use the second derivatives. However, we will restrict the further discussion of this matter to functions of two variables. We will use the values of f_{xx}, f_{yy} and f_{xy} = f_{yx}, so this will hold only if the function and its second partial derivatives exist and are continuous at (a, b). If f_x(a, b) = f_y(a, b) = 0, then:

1. If f_{xx}f_{yy} - f_{xy}^2 > 0, then (a, b) is a local minimum if f_{xx} > 0 and a local maximum if f_{xx} < 0

2. If f_{xx}f_{yy} - f_{xy}^2 < 0, then (a, b) is a saddle point

3. If f_{xx}f_{yy} - f_{xy}^2 = 0, the test is inconclusive and f(a + \Delta x, b + \Delta y) - f(a, b) should be used to investigate further
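The three-way test above can be sketched as a classifier (the sample function f = x^3 - 3x + y^2, whose critical points are (\pm 1, 0), is our own choice; second partials are approximated by finite differences):

```python
def f(x, y):
    return x**3 - 3 * x + y**2   # critical points at (1, 0) and (-1, 0)

h = 1e-4

def second_partials(f, x, y):
    # f_xx, f_yy by second central differences; f_xy by the cross stencil
    fxx = (f(x + h, y) - 2 * f(x, y) + f(x - h, y)) / h**2
    fyy = (f(x, y + h) - 2 * f(x, y) + f(x, y - h)) / h**2
    fxy = (f(x + h, y + h) - f(x + h, y - h)
           - f(x - h, y + h) + f(x - h, y - h)) / (4 * h**2)
    return fxx, fyy, fxy

def classify(f, x, y):
    fxx, fyy, fxy = second_partials(f, x, y)
    D = fxx * fyy - fxy**2
    if D > 1e-6:
        return "min" if fxx > 0 else "max"
    if D < -1e-6:
        return "saddle"
    return "inconclusive"
```

At (1, 0) we have f_{xx} = 6, f_{yy} = 2, f_{xy} = 0, so D = 12 > 0 with f_{xx} > 0: a local minimum; at (-1, 0), D = -12 < 0: a saddle point.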

  • Chapter 5

    Multiple Integration

    5.1 The Fundamental Theorem

    Integrating multiple integrals of a continuous function of several variablesis done in an iterative manner. First, the innermost integral is computedwhile keeping all other variables constant and evaluating with the limits.In this fashion all instances of the variable that we integrated with respectto should be gone and we are left with an integral of a function dependentonly on the other variables. Then, the same procedure is repeated withthe next integral. Keep in mind, that although the order of integrationis not important provided that the function is continuous, it is impossibleto evaluate integrals with limits of integration dependent on x after onehas already integrated with respect to x. That means, that the order ofintegration should be such that no limits of integration depend on variablesthat were already integrated with respect to.
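The inner-integral-first procedure can be sketched numerically (the integrand and the region [0, 1] × [0, 1] are our own choices; the midpoint rule stands in for exact integration):

```python
def f(x, y):
    return x * y + 1.0   # a sample continuous integrand (our choice)

def iterated(f, n=400, inner="y"):
    # double integral over [0,1] x [0,1] by iterated midpoint sums:
    # compute the inner integral first, holding the outer variable constant
    h = 1.0 / n
    pts = [(i + 0.5) * h for i in range(n)]
    total = 0.0
    for a in pts:                     # outer variable held fixed ...
        inner_sum = 0.0
        for b in pts:                 # ... while the inner one is integrated out
            inner_sum += (f(a, b) if inner == "y" else f(b, a)) * h
        total += inner_sum * h
    return total

I_dy_dx = iterated(f, inner="y")      # integrate in y first, then x
I_dx_dy = iterated(f, inner="x")      # integrate in x first, then y
```

Since the integrand is continuous and the limits are constants, both orders give the same value (here the exact answer is 1/4 + 1 = 1.25).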

    5.2 Multiple Integration and the Jacobian

Let's discuss variable substitution. We know that our integration can often be greatly simplified if we use substitution. However, one thing that we always need to keep in mind when mapping integrals is that the area elements get scaled. That is, we not only have to perform the substitution and change the limits of integration, but also introduce a scaling factor. This can be illustrated with the following example:

If \int_1^3 2x(x^2 + 1)\,dx and u = x^2 + 1, then

\int_1^3 2x(x^2 + 1)\,dx \neq \int_2^{10} 2\sqrt{u - 1}\,u\,du

The key idea is that although the scaling is not always linear, the fact that we are dealing with infinitesimal values means that the error that arises from using linearization goes to zero. The general form of the scaling factor (also known as the Jacobian) is:

J = \frac{dF}{dx} = \begin{bmatrix} \dfrac{\partial F}{\partial x_1} & \cdots & \dfrac{\partial F}{\partial x_n} \end{bmatrix} = \begin{bmatrix} \dfrac{\partial F_1}{\partial x_1} & \cdots & \dfrac{\partial F_1}{\partial x_n} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial F_m}{\partial x_1} & \cdots & \dfrac{\partial F_m}{\partial x_n} \end{bmatrix} \quad (5.1)

or, component-wise:

J_{i,j} = \frac{\partial F_i}{\partial x_j} \quad (5.2)
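The one-dimensional substitution example from this section can be checked numerically: with u = x^2 + 1 the scaling factor is dx/du = 1/(2\sqrt{u - 1}), which collapses the substituted integrand to plain u. (The midpoint-rule helper below is our own scaffolding.)

```python
def integrate(f, a, b, n=20000):
    # midpoint-rule approximation of the definite integral of f on [a, b]
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

# the original integral: value 48
I_x = integrate(lambda x: 2 * x * (x**2 + 1), 1.0, 3.0)

# naive substitution u = x^2 + 1 WITHOUT the scaling factor: wrong value
I_wrong = integrate(lambda u: 2 * (u - 1)**0.5 * u, 2.0, 10.0)

# with the scaling factor dx/du = 1/(2*sqrt(u-1)) the integrand becomes u
I_right = integrate(lambda u: u, 2.0, 10.0)
```

Only the version with the scaling factor reproduces the original integral; dropping it produces a wildly different number, which is the whole point of the Jacobian.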

    5.3 Line Integrals

    Line integrals are often used in physics to calculate the work a force has done.They can be written in several ways:

\int_C \vec{F} \cdot d\vec{r}

or, if \vec{F} = (M, N) and d\vec{r} = (dx, dy),

\int_C M\,dx + N\,dy

Often \vec{F} and \vec{r} are expressed in terms of the same variable, so the integration is further simplified. Line integrals depend not only on the starting and final positions but also on the path taken; the path is taken care of by \vec{r}.

    5.4 Greens Theorem

Let's first introduce the concept of a connected region. A connected region is a region in which any point can be connected to any other point with a line (not necessarily straight) that does not leave the region. Furthermore, connected regions are divided into simply-connected and multiply-connected. Intuitively, a simply-connected region is a region without any holes. Rigorously defined, a simply-connected region is a region whose complement (inverted region) is also connected.

Green's Theorem: If R is a simply-connected region with boundary C, then

\oint_C M\,dx + N\,dy = \iint_R \left( \frac{\partial N}{\partial x} - \frac{\partial M}{\partial y} \right) dA


provided that M, N, M_y and N_x exist and are continuous on R.

    Here the positive direction of C is defined to be such that when one is movingalong C the region is on their left side.

There are two interesting consequences of Green's theorem. First, note that if M\,dx + N\,dy is an exact differential (i.e. there exists a potential function whose partial derivatives are M and N), then \partial N/\partial x - \partial M/\partial y = 0 and the integral equals zero. That is to be expected, as a closed line integral in a conservative field is always zero.

Note that Green's theorem is defined for a simply-connected region. However, it is easy to show that it in fact holds for any closed region. If we have a region with a hole, we can split it into two separate regions. As the cut has no thickness, the sum of the areas of the new regions will be the same as the original area. Furthermore, the cuts will be traversed twice but in opposite directions, so they cancel out, and the total boundary traversed is the outside boundary plus the inside boundary. How do we evaluate integrals like this? An integral over a region with a hole will be the sum of the line integrals over the outside boundary and the inside boundary (provided that both are traversed in the positive direction defined above).
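Green's theorem itself can be verified numerically on a simple case (the field M = -y, N = x, the unit circle, and the midpoint sampling are all our own choices): here N_x - M_y = 2 everywhere, so the double integral over the unit disk is 2\pi, and the circulation around the boundary should match.

```python
import math

def line_integral(path, M, N, n=4000):
    # approximate the closed line integral of M dx + N dy along a
    # parametrized path(t), t in [0, 1], by sampling chord midpoints
    total = 0.0
    for i in range(n):
        x0, y0 = path(i / n)
        x1, y1 = path((i + 1) / n)
        xm, ym = (x0 + x1) / 2, (y0 + y1) / 2
        total += M(xm, ym) * (x1 - x0) + N(xm, ym) * (y1 - y0)
    return total

M = lambda x, y: -y
N = lambda x, y: x          # so N_x - M_y = 2 everywhere

def unit_circle(t):
    # counterclockwise: the enclosed region stays on the left,
    # which is the positive direction defined above
    return math.cos(2 * math.pi * t), math.sin(2 * math.pi * t)

circulation = line_integral(unit_circle, M, N)
# Green's theorem predicts: double integral of 2 over the unit disk = 2*pi
```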
