Wednesday, September 30, 2009

Nothing new

Unless I miraculously complete 3 problem sets, find the errors in the calculations I've been working on for 4 months and the bugs in the code based on those calculations, and mow the lawn by a reasonable hour today, there will likely not be a new Behind the Guesses post of substance this month.

However, I am considering a new method of posting, using Google Document Viewer, rather than converting LaTeX into pictures, etc. Here is the previous post, Noncommuting Rotation and Angular Momentum Operators, using the new method. Would you prefer this, or the old way? Or do you just download the PDF, and it makes no difference?


Monday, August 31, 2009

Noncommuting Rotation and Angular Momentum Operators

[Click here for a PDF of this post with nicer formatting]
The Setup
Avi Ziskind1 asked me to cover non-commuting operators in quantum mechanics, specifically why angular momentum operators do not commute. He pointed out that Griffiths [1] gives an intuitive argument for understanding why position and momentum operators do not commute but does not present any rationale for why the different components of angular momentum have the commutation relation

$$[L_i, L_j] = i\hbar\,\epsilon_{ijk} L_k \qquad (1)$$
Additionally, Schwabl [2], for example, defines the angular momentum operator, presents the commutation relations, and at least attempts (I think) to show (in a post-facto way) why they should have such relations. Likewise, in a related (as we'll see) problem, Goldstein et al. [3] discuss the commutation relations of generators of rotation without any physical argument.

However, both Sakurai [4] and Landau and Lifshitz [5], to some degree, present physical rationales for these relations. Landau and Lifshitz derive the notion of angular momentum in quantum theory quite nicely, and succinctly, but do not argue for why the commutation relations should hold. Sakurai develops a set of commutation relations independently of QM (as I will, shortly), but, I feel, bridges the gap to angular momentum rather poorly.

This post assumes familiarity with the ``generator of transformation'' ideas in [6].


The Generator of Rotation
Figure 1: The rotation of a vector around the $z$-axis. (a) In 3D. (b) The projection of (a) onto the $xy$-plane.

In a previous post I covered the notion of ``generators of transformations,''[6] and claimed, as an example, that the ``generator of rotation'' is the angular momentum. Actually, I was getting ahead of myself there, and the statement in that context was not entirely correct. As I did not derive this result in that post, I will now, and will hopefully clear things up.

Suppose we have a function and we want to rotate it in space around the axis through some angle to . To do this, we'll find an ``angle rotation'' operator , which, when applied to , gives . That is,

(2)
The shift in coordinates can be derived from regular vector analysis, see Fig. 1 and Ref. [7], applied inside the arguments of the function.

Now the tricky part -- the Taylor expansion. Unlike the last time where the translated function had a simple argument, here we have inside sines and cosines. Since I'm really too lazy to do this expansion by hand I had Mathematica do it for me (click to see full-size):
(3)
In Mathematica's notation, raised to those parenthetical powers denotes partial derivatives. Say, means , for example. This expression is a bit of a mess, but we are not completely lost. From our discussion in the beginning of [6], we know that at least one similar operator takes an exponential form. So, we'll guess that here, as well, our operator will take an exponential form. We just need to process the mess of (3) to find that hidden exponential.

The first two terms in the series give us hope. They can be written as

(4)
which are, indeed, what we would expect to see at the beginning of an exponential expansion, where is the generator. Now we check that this keeps up for higher powers.

Continuing with the quadratic term, let's see if we can write as which would be the next term in an exponential. We check:





(5)
which does match the mess for the term in the expansion (3). You can verify on your own that this pattern continues in the higher powers.

Thus we conclude that

(6)
where we now identify

(7)
as the generator of the rotation.

This generator is the -component of the cross product

where and . Thus, we can simplify

(8)
If we carry through these same calculations for rotations around the or axes (try it yourself!) we get similar generators

(9)

(10)
This allows us to write the rotation operator for a rotation around an arbitrary axis , as

(11)
where for

(12)
is the generator of the transformation.


Commutators in general
In general, rotations do not commute. That is, rotating an object first around the -axis and then around the -axis will give a different result than rotating in the opposite order. You can convince yourself of this by the ultimate hand-waving argument2 -- twist your hand around different axes in different orders. Or see Fig. 2.

We'd like to find a way to quantify the difference between applying the rotations in different order, but, for the sake of generality, we'll discuss this for any two arbitrary operators $A$ and $B$. The most natural way to quantify a difference is to look at, well, the difference. That is, if these operators act on a vector $\vec{v}$, we'd like to know what

$$AB\vec{v} - BA\vec{v} \qquad (13)$$
is. This difference (for linear operators) does not depend on the particular vector $\vec{v}$, so we'll define the commutator of two operators as

$$[A, B] = AB - BA \qquad (14)$$
Thus, a commutator of two operators is another operator which enacts this difference. If the order of operator application does not affect the end result the commutator is 0, and the operators are said to ``commute.''
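To make the definition concrete, here is a minimal sketch that computes a commutator for small matrices, taking matrices as a stand-in for linear operators. The matrices `A`, `B`, `D` and `E` are hypothetical examples chosen for illustration; they do not come from the discussion above.

```python
def matmul(a, b):
    """Multiply two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def commutator(a, b):
    """Return the commutator [a, b] = ab - ba."""
    ab, ba = matmul(a, b), matmul(b, a)
    n = len(a)
    return [[ab[i][j] - ba[i][j] for j in range(n)] for i in range(n)]


# Hypothetical example operators (illustration only).
A = [[0, 1], [0, 0]]
B = [[0, 0], [1, 0]]
D = [[2, 0], [0, 3]]  # diagonal
E = [[5, 0], [0, 7]]  # diagonal

print(commutator(A, B))  # non-zero: the order of application matters
print(commutator(D, E))  # all zeros: diagonal matrices commute
```

The first pair does not commute, so the order of application changes the result; the two diagonal matrices commute and their commutator vanishes.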

In quantum mechanics, the issue of non-commuting operators is closely tied to the problem of measurement and the uncertainty principle. For example, if I have a state $\psi$ and I want to measure the position I apply the position operator $\hat{x}$. Likewise, if I want to measure the momentum I apply the momentum operator $\hat{p}$. However, in quantum mechanics, the order of taking these measurements affects the results, such that $\hat{x}\hat{p}\,\psi \neq \hat{p}\hat{x}\,\psi$, for example. However, the applicability of commutators is not relegated only to quantum mechanics.


Commutators for rotation
This brings us back to our original question of the commutator of rotations. Because any two rotations through arbitrary angles, done in opposite orders give drastically different results depending on the angles, we'll consider rotations through small angles , such that we can approximate (11) by the first two terms in the expansion:

(15)
This simpler expression makes calculating the commutator much simpler. For rotations around and , the commutator depends only on the commutator of the generators .3 This commutator is the generator of the transformation for ``the difference between the order of the rotations.'' That is

(16)
where is the parameter for this transformation. Then, just as any rotation can then be built up from repeated applications of the generator (as in that exponential), the commutators for larger angles can be built up from repeated applications of the commutators of the generators.

Figure 3: Graphical commutator of . Blue vector is application of either or . Red is further application of to and green is further application of to . Brown is difference between the two.

For ease of illustration, we'll consider small rotations around the - and -axes (i.e. and ). There are two ways to find the commutator . One way is by brute force calculation which I encourage you to try on your own (use the expressions for (9) and (10)). However, I prefer showing it graphically, see Fig. 3. Starting with a vector in the -plane, we apply a small rotation around . This directs the vector upwards (blue in the picture). Then we apply another small rotation around , which directs the vector along the red line.

If we start with the same vector, and apply a small rotation around , the vector follows the blue line again. However, when we then rotate around , the vector veers off in the opposite direction at the same rate. The difference between the red and green vectors, as well as that difference added to the initial vector is shown in brown. The picture illustrates that

(17)
i.e. the generator of rotation around the -axis. Similar relationships

(18)
hold for other permutations of .
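For readers who would rather check this numerically than graphically, the sketch below uses 3x3 rotation matrices acting on ordinary vectors. This is an assumption on my part: the generators in the text are differential operators acting on functions, so the overall sign convention may differ from the one used here. The matrix `G_Z` is the derivative of a rotation matrix about the $z$-axis at zero angle.

```python
import math


def matmul(a, b):
    """Multiply two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def rot_x(t):
    """Rotation of a vector by angle t around the x-axis."""
    c, s = math.cos(t), math.sin(t)
    return [[1, 0, 0], [0, c, -s], [0, s, c]]


def rot_y(t):
    """Rotation of a vector by angle t around the y-axis."""
    c, s = math.cos(t), math.sin(t)
    return [[c, 0, s], [0, 1, 0], [-s, 0, c]]


# Matrix generator of rotations around z (derivative of rot_z at t = 0).
G_Z = [[0, -1, 0], [1, 0, 0], [0, 0, 0]]

eps = 1e-3  # small rotation angle
xy = matmul(rot_x(eps), rot_y(eps))
yx = matmul(rot_y(eps), rot_x(eps))

# The difference between the two orderings, divided by eps^2,
# should approach the z generator as eps shrinks.
scaled = [[(xy[i][j] - yx[i][j]) / eps**2 for j in range(3)] for i in range(3)]
max_error = max(abs(scaled[i][j] - G_Z[i][j])
                for i in range(3) for j in range(3))
print(max_error)  # of order eps
```

The leftover error is of order `eps`, which is the size of the cubic terms dropped in the small-angle approximation.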


Angular momentum
Looking back at the expression for the generator of rotations (12), we see that we can re-write this in terms of the momentum operator

(19)
in quantum mechanics:


(20)
where we call the ``quantum mechanical angular momentum'' operator.4 Flipping this around to solve for in terms of :

(21)
In other words, the quantum mechanical angular momentum is the same (up to a constant) as the generator of rotations. Thus, the reason that quantum angular momentum has commutation relations (1) is due to the fact that it's simply a generator of rotation masquerading as a quantum mechanical operator.


References
[1] D.J. Griffiths. Introduction to Electrodynamics. Pearson Prentice Hall, 3rd edition, 1999.
[2] F. Schwabl. Quantum Mechanics. Springer, 3rd edition, 2005.
[3] H. Goldstein, C. Poole, and J. Safko. Classical Mechanics. Addison Wesley, San Francisco, CA, 3rd edition, 2002.
[4] J.J. Sakurai. Modern Quantum Mechanics. Addison-Wesley, San Francisco, CA, revised edition, 1993.
[5] L.D. Landau and E.M. Lifshitz. Quantum Mechanics. Butterworth-Heinemann, Oxford, UK, 3rd edition, 1977.
[6] E. Lansey. The Schrödinger Equation -- Corrections [online]. June 2009. Available from: http://behindtheguesses.blogspot.com/2009/06/schrodinger-equation-corrections.html.
[7] D.C. Lay. Linear Algebra and Its Applications. Addison-Wesley, Reading, MA, 3rd edition, 2003.
[8] C.T.J. Dodson and T. Poston. Tensor Geometry: The Geometric Viewpoint and its Uses. Springer, 2nd edition, 1997.



1 Everyone congratulate him on the birth of a son!
2 Borrowing a joke from Dodson and Poston, [8]
3 If this isn't obvious, work it out for yourself. Hint: The identity operator 1 commutes with everything.
4 There are better arguments (see [5]) using symmetry for why should actually be the angular momentum, not just called it, as I've argued, but they require much more talking. And this post is long enough already.

Wednesday, July 29, 2009

Transverse Electric and Magnetic Fields in a Waveguide

[Click here for a PDF of this post with nicer formatting]
The Setup

Figure 1: An example of a section of a cylindrical waveguide with embedded coordinate axes.

A conducting waveguide is a metal tube -- think pipe or air conditioning duct, for example -- through which electromagnetic waves can propagate. If you want to know what real-life waveguides look like, just do a quick internet image search. We'll assume the length of the tube is oriented along the $z$-direction, see Fig. 1. There is no loss of generality in doing this, since we can always choose a coordinate system as we like. So really, we're picking a coordinate system such that the $z$-axis points along the tube.

Now, we can decompose the electric field and magnetic (induction) field vectors into two parts each. One part points along the $z$ (normal) direction while the other points somewhere in the $xy$ (transverse) plane. Explicitly:

(1a)

(1b)

In the first ([1], Eq. (8.24)) and third ([2], Eq. (8.26)) editions of Classical Electrodynamics, J.D. Jackson gives the transverse fields in terms of the $z$-components of the fields. (I have no idea why he left the complete expression out of the second edition.) In the third edition, for example, he assumes plane wave propagation in the positive $z$ direction -- that is an $e^{ikz}$ dependence -- and simply states, without any real explanation:
the transverse fields are

where I've converted his new choice of MKSA units back into the clearer CGS units. However, back in the first edition he does not insist on the assumption of positive propagation. Moreover, he does not just state the fields; he suggests a method for getting them -- namely, manipulation of the curl equations in Maxwell's equations. However, in that edition, he does not expand the curl equations in light of the separation of the fields into transverse and parallel components as he does in the second and third editions.

Because of all this confusion, I'm going to derive the cavity modes fully, starting from Maxwell's equations, once and for all. This derivation is based on a combination of all three editions of Jackson's book. This is a tedious, although not completely trivial exercise. Brace yourselves for quite a bit of algebra.


Maxwell's Equations - The Curls
Here we'll deal with the two curl equations in Maxwell's equations:

(2a)

(2b)
where $\vec{H}$ is the magnetic field and $\vec{D}$ is the electric displacement field. We will assume the inside of the waveguide has uniform permittivity and permeability, so $\vec{D} = \epsilon\vec{E}$ and $\vec{B} = \mu\vec{H}$. Also, we'll assume the absence of any currents, so $\vec{J} = 0$ and we'll drop it from here on. Additionally, we'll assume the same sinusoidal time dependence $e^{-i\omega t}$ for both the fields. Thus, the time derivatives ``bring down'' a factor of $-i\omega$.

Furthermore, since we're splitting up and into normal and transverse parts, we'll do the same with the gradient operator :


Because curl equations are annoying, and because we're ultimately looking for an equation for the transverse fields, I'm going to try and get rid of the 's. The symmetry of form in (2) means that we'll only need to do these calculations once; I will use in place of either or .
First, we'll expand :


(3)
We've killed one term through this expansion. However, the leftmost cross product term gives a quantity with only a component. The right-hand side of these equations also has a term. We can get rid of both by multiplying the entire equation(s) by :


(4)

Figure 2: Vectors , and .

For why see Fig. 2. Also, we note that

(5)
for the same reason. We could have used the vector multiplication identity

to simplify both of these expressions, or expanded and and carried through even more algebra, but I think the picture is clearer.
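If the picture is not convincing, a quick numerical check is easy. The sketch below assumes the relation being illustrated is the standard double cross product result, namely that applying $\hat{z}\times$ twice to a purely transverse vector returns the negative of that vector. The vector `a_t` is a hypothetical transverse vector picked for the example.

```python
def cross(u, v):
    """Cross product of two 3-component vectors."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


z_hat = (0.0, 0.0, 1.0)
a_t = (2.0, -3.0, 0.0)  # hypothetical transverse vector (no z component)

once = cross(z_hat, a_t)    # a_t rotated by 90 degrees in the xy-plane
twice = cross(z_hat, once)  # rotated by 180 degrees, i.e. -a_t

print(once)
print(twice)
```

Each application of $\hat{z}\times$ rotates a transverse vector by a quarter turn in the transverse plane, so two applications flip its sign.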

Thus,

(6)
and we can write (2) as

(7a)

(7b)

At this point, it's time to introduce the explicit dependence and process the derivatives.


Some $\pm$ and $\mp$ notes
Unlike Jackson, who works with the assumption of upward propagating waves -- i.e. an $e^{ikz}$ dependence -- we'll work with an assumed $e^{\pm ikz}$ dependence, thus allowing both upward and downward propagating waves. Thus, the $z$ derivatives ``bring down'' a factor of $\pm ik$. Whenever we have $\pm$ or $\mp$ the upper symbol is the sign for upward propagating waves, the lower symbol is for downward propagating. Because we'll be mucking about with these plus-minus guys in some algebra, I want to get a few issues out of the way.

The first thing to keep in mind about these plus-minus operators is that an equation like

(8)
is shorthand for two different equations:

(9a)

(9b)

So, there are essentially two ways to approach these things. One way is to carefully trace at the outset what happens to or under various arithmetic operations like addition, multiplication, etc. This has the benefit of being more concise -- you only need to write each equation once -- but it makes errors a lot easier and hides the double-equation nature of the symbol. I'll admit, though, that when I'm writing a paper I'm generally inclined to take this path.

However, for the purposes of this blog post, I'll explicitly carry out the calculations in parallel equations. (This really looks much better in the PDF. If anyone has any suggestions for improving the web version, please, let me know!) The left-hand column corresponds to , the right-hand column to . At the end I will also show what the results look like in the shorthand notation and I encourage you to work out the rules on your own. Perhaps in another post I'll address the shorthand notation in detail.


Some more algebra
Now, it's time for some more algebra.1 Taking the derivative in (7) gives:

(10)
and

(11)

Solving (10) for gives

(12)
Substituting this into (11) and simplifying:

(13)

Solving this for gives:

(14)
Or, in $\pm$ form:

(15)

In the first edition, Jackson converts the back into to get rid of the , but I feel this confuses things, as this expression only holds for a plane wave in the direction. In any case, we now substitute this expression for back into (12) and simplify:

(16)
Or, in $\pm$ form:

(17)

So, we've finally achieved Jackson's result, allowing for both upward and downward propagating waves.


References
[1] J.D. Jackson. Classical Electrodynamics. John Wiley & Sons, Inc., 1st edition, 1966.
[2] J.D. Jackson. Classical Electrodynamics. John Wiley & Sons, Inc., 3rd edition, 1998.




1 In case you were wondering why Jackson left out the whole calculation...

Tuesday, June 30, 2009

Derivative and Integral of the Heaviside Step Function

[Click here for a PDF of this post with nicer formatting]
The Setup
Figure 1: The Heaviside step function. (a) Large horizontal scale. (b) ``Zoomed in.'' Note how it doesn't matter how close we get to $x = 0$, the function looks exactly the same.

The Heaviside step function $\theta(x)$, sometimes called the Heaviside theta function, appears in many places in physics, see [1] for a brief discussion. Simply put, it is a function whose value is zero for $x < 0$ and one for $x > 0$. Explicitly,

$$\theta(x) = \begin{cases} 0, & x < 0 \\ 1, & x > 0 \end{cases} \qquad (1)$$
We won't worry about precisely what its value is at zero for now, since it won't affect our discussion, see [2] for a lengthier discussion. Fig. 1 plots $\theta(x)$. The key point is that crossing zero flips the function from 0 to 1.
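The definition translates directly into a few lines of code. The sketch below is one possible implementation; the function name `heaviside` and the value 0.5 returned at exactly zero are assumptions made for the example, since the value at zero is a matter of convention. The loop at the end mirrors the ``zoomed in'' panel of Fig. 1: rescaling the horizontal axis by any positive factor leaves the function unchanged.

```python
def heaviside(x, at_zero=0.5):
    """Heaviside step: 0 for x < 0 and 1 for x > 0.

    The value at x == 0 is a convention; 0.5 is an assumption here.
    """
    if x < 0:
        return 0.0
    if x > 0:
        return 1.0
    return at_zero


points = [-2.0, -0.5, 0.5, 2.0]
reference = [heaviside(x) for x in points]

# "Zooming in" means looking at the same points on a smaller scale.
zoomed = {scale: [heaviside(scale * x) for x in points]
          for scale in (1.0, 1e-3, 1e-9)}

for scale, values in zoomed.items():
    print(scale, values)
```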


Derivative -- The Dirac Delta Function
(a) Dirac delta function

(b) Ramp function
Figure 2: The derivative (a), and the integral (b) of the Heaviside step function.
Say we wanted to take the derivative of . Recall that a derivative is the slope of the curve at a point. One way of formulating this is

(2)
Now, for any points or , graphically, the derivative is very clear: is a flat line in those regions, and the slope of a flat line is zero. In terms of (2), does not change, so and . But if we pick two points, equally spaced on opposite sides of , say and , then and . It doesn't matter how small we make , stays the same. Thus, the fraction in (2) is


(3)
Graphically, again, this is very clear: jumps from 0 to 1 at zero, so its slope is essentially vertical, i.e. infinite. So basically, we have

(4)
This function is, loosely speaking, a ``Dirac Delta'' function, usually written as , which has seemingly endless uses in physics.
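The blow-up at zero is easy to see numerically. The sketch below takes the slope between two points equally spaced on either side of a chosen location, which is my reading of the argument above. The helper names `heaviside` and `symmetric_slope` and the spacing `h` are hypothetical.

```python
def heaviside(x):
    """Heaviside step; the value at exactly zero is not needed here."""
    return 1.0 if x > 0 else 0.0


def symmetric_slope(f, x, h):
    """Slope of f between the points x - h and x + h."""
    return (f(x + h) - f(x - h)) / (2 * h)


# Away from zero the function is flat, so the slope vanishes.
away = [symmetric_slope(heaviside, x, 0.01) for x in (-1.0, 1.0)]

# At zero the numerator stays 1 while the spacing shrinks,
# so the slope grows without bound.
at_zero = [symmetric_slope(heaviside, 0.0, h) for h in (1e-1, 1e-3, 1e-6)]

print(away)
print(at_zero)
```

Shrinking `h` by a factor multiplies the slope at zero by the same factor, which is the numerical version of ``essentially vertical.''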

We'll note a few properties of the delta function that we can derive from (4). First, integrating it from to any :



(5)
since . On the other hand, integrating the delta function to any point greater than :



(6)
since .

At this point, I should point out that although the delta function blows up to infinity at , it still has a finite integral. An easy way of seeing how this is possible is shown in Fig. 2(a). If the width of the box is and the height is , the area of the box (i.e. its integral) is , no matter how large is. By letting go to infinity we have a box with infinite height, yet, when integrated, has finite area.
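The box argument can be checked with a crude numerical integral. In the sketch below the box has width $1/a$ and height $a$; the parameter name `a`, the integration range, and the midpoint rule are choices made for this example rather than anything taken from the figure.

```python
def box(x, a):
    """Box of width 1/a and height a, centered on x = 0."""
    return a if abs(x) <= 0.5 / a else 0.0


def integrate(f, lo, hi, n=20000):
    """Midpoint-rule estimate of the integral of f from lo to hi."""
    dx = (hi - lo) / n
    return sum(f(lo + (i + 0.5) * dx) for i in range(n)) * dx


# The box gets taller and narrower, but its area stays (close to) 1.
areas = {a: integrate(lambda x, a=a: box(x, a), -1.0, 1.0)
         for a in (1.0, 10.0, 100.0)}

for a, area in areas.items():
    print(a, area)
```

Pushing `a` much higher would need a finer grid, since the box eventually becomes narrower than the integration step.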


Integral -- The Ramp Function
Now that we know about the derivative, it's time to evaluate the integral. I have two methods of doing this. The most straightforward way, which I first saw from Prof. T.H. Boyer, is to integrate piece by piece. The integral of a function is the area under the curve,1 and when there is no area, so the integral from to any point less than zero is zero. On the right side, the integral to a point is the area of a rectangle of height 1 and length , see Fig. 1(a). So, we have

(7)
We'll call this function a ``ramp function,'' . We can actually make use of the definition of and simplify the notation:

(8)
since and . See Fig. 2(b) for a graph -- and the reason for calling this a ``ramp'' function.
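As a sanity check on the piece-by-piece argument, the sketch below integrates the step function numerically from a point well to the left of zero and compares the result with the product of $x$ and the step function, which is the standard closed form of the ramp. The lower limit of -5, the midpoint rule, and the function names are assumptions for the example.

```python
def heaviside(x):
    """Heaviside step; the value at exactly zero does not affect the integral."""
    return 1.0 if x > 0 else 0.0


def ramp(x):
    """Closed form of the ramp function: x times the step function."""
    return x * heaviside(x)


def integrate(f, lo, hi, n=10000):
    """Midpoint-rule estimate of the integral of f from lo to hi."""
    dx = (hi - lo) / n
    return sum(f(lo + (i + 0.5) * dx) for i in range(n)) * dx


# Compare the numerical integral with the closed form at a few points.
checks = [(x, integrate(heaviside, -5.0, x), ramp(x))
          for x in (-1.0, 0.5, 2.0)]

for x, numeric, exact in checks:
    print(x, numeric, exact)
```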

But I have another way of doing this which makes use of a trick that's often used by physicists: We can always add zero for free, since . Often we do this by adding and subtracting the same thing,

(9)
for example. But we can use the delta function (4) to add zero in the form

(10)
Since is zero for , the part doesn't do anything in those regions and this expression is zero. And, although at , at , so the expression is still zero.

So we'll add this on to :





(11)
where the last step follows from the ``product rule'' for differentiation. At this point, to take the integral of a full differential is trivial, and we get (8).


References
[1] M. Springer. Sunday function [online]. February 2009. Available from: http://scienceblogs.com/builtonfacts/2009/02/sunday_function_22.php [cited 30 June 2009].
[2] E.W. Weisstein. Heaviside step function [online]. Available from: http://mathworld.wolfram.com/HeavisideStepFunction.html [cited 30 June 2009].



1 To be completely precise, it's the (signed) area between the curve and the line .

Thursday, June 4, 2009

The Schrödinger Equation - Corrections

[Click here for a PDF of this post with nicer formatting]
In my last post, I claimed
Additionally, we can extend from here that any quantum operator is written in terms of its classical counterpart by
Peeter Joot correctly pointed out that this result does not follow from the argument involving the Hamiltonian. While it is true that
any arbitrary unitary transformation, , can be written as
where is an Hermitian operator,
the relationship between a classical and its quantum counterpart is not as straightforward as I claimed. In reality, we can only relate the classical Poisson brackets to the quantum mechanical commutators, and we must work from there. Perhaps I will discuss this further in a later post.

In any case, though, the derivation of the Schrödinger equation only makes use of the relationship between the classical and quantum mechanical Hamiltonians, so the remainder of the derivation still holds. I am leaving the original post up as reference, but the corrected, restructured version (with some additional, although slight, notation changes) is below.


A brief walk through classical mechanics
Say we have a function of and we want to translate it in space to a point , where need not be small. To do this, we'll find a ``space translation'' operator which, when applied to , gives . That is,

(1)
We'll expand in a Taylor series:


(2)
which can be simplified using the series expansion of the exponential1 to

(3)
from which we can conclude that

(4)
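Here is a minimal numerical sketch of this result. It assumes the translation is a shift of the argument from $x$ to $x + a$ and that the operator is the exponential of $a$ times the derivative, expanded as a power series. The test function is a sine, chosen because its derivatives simply cycle; the name `translate_sin` and the number of terms are hypothetical.

```python
import math


def translate_sin(x, a, terms=30):
    """Apply sum_n (a^n / n!) d^n/dx^n to sin, evaluated at the point x."""
    # The derivatives of sin repeat with period four.
    derivs = [math.sin(x), math.cos(x), -math.sin(x), -math.cos(x)]
    return sum(a**n / math.factorial(n) * derivs[n % 4] for n in range(terms))


x, a = 0.3, 2.0  # the shift a is deliberately not small
shifted = translate_sin(x, a)

print(shifted, math.sin(x + a))
```

The two printed numbers agree to floating-point precision even though the shift is not small, which is the point of summing the whole series rather than stopping at first order.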
If you do a similar thing with rotations around the -axis, you'll find that the rotation operator is

(5)
where is the -component of the angular momentum.

Comparing (4) and (5), we see that both have an exponential with a parameter (distance or angle) multiplied by something ( or ). We'll call the something the ``generator of the transformation.'' So, the generator of space translation is and the generator of rotation is . So, we'll write an arbitrary transformation operator through a parameter

(6)
where is the generator of this particular transformation.2 See [1] for an example with Lorentz transformations.


From classical to quantum
Generalizing (6), we'll postulate that any arbitrary quantum mechanical (unitary) transformation operator through a parameter can be written as

(7)
where is the quantum mechanical version of the classical operator . We'll call this the ``quantum mechanical generator of the transformation.'' If we have a way of relating a classical generator to a quantum mechanical one, then we have a way of finding a quantum mechanical transformation operator.

For example, in classical dynamics, the time derivative of a quantity is given by the Poisson bracket:

(8)
where is the classical Hamiltonian of the system and is shorthand for a messy equation.[2] In quantum mechanics this equation is replaced with

(9)
where the square brackets signify a commutation relation and is the quantum mechanical Hamiltonian.[3] This holds true for any quantity , and is a number which commutes with everything, so we can argue that the quantum mechanical Hamiltonian operator is related to the classical Hamiltonian by

(10)


Time translation of a quantum state
Consider a quantum state at time described by the wavefunction . To see how the state changes with time, we want to find a ``time-translation'' operator which, when applied to the state , will give . That is,

(11)
From our previous discussion we know that if we know the classical generator of time translation we can write using (7). Classically, the generator of time translations is the Hamiltonian![4] So we can write


(12)
where we've made the substitution from (10). Then (11) becomes

(13)

This holds true for any time translation, so we'll consider a small time translation and expand (13) using a Taylor expansion3 dropping all quadratic and higher terms:

(14)
Moving things around gives

(15)
In the limit the right-hand side becomes a partial derivative giving the Schrödinger equation

(16)
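To see the small-step expansion at work, the sketch below evolves a single energy eigenstate, where the Hamiltonian acts as multiplication by a number. It assumes the usual sign convention, in which the time-translation operator is $e^{-iHt/\hbar}$, and uses units with $\hbar = 1$. The energy, total time, and step count are hypothetical values picked for the example.

```python
import cmath


def evolve_first_order(psi0, energy, t, steps, hbar=1.0):
    """Apply the first-order update psi -> (1 - i E dt / hbar) psi repeatedly."""
    dt = t / steps
    psi = psi0
    for _ in range(steps):
        psi = psi - 1j * energy * dt / hbar * psi
    return psi


psi0 = 1.0 + 0.0j
energy, t = 1.0, 1.0  # hypothetical values, with hbar = 1

approx = evolve_first_order(psi0, energy, t, steps=100000)
exact = cmath.exp(-1j * energy * t) * psi0

print(approx, exact)
```

Many small first-order steps rebuild the full exponential, which is the reverse of the expansion used to reach the differential equation.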

For a system with conserved total energy, the classical Hamiltonian is the total energy

(17)
which, making the substitution for quantum mechanical momentum and substituting into (16) gives the familiar differential equation form of the Schrödinger equation

(18)


References
[1] J.D. Jackson. Classical Electrodynamics. John Wiley & Sons, Inc., 3rd edition, 1998.
[2] L.D. Landau and E.M. Lifshitz. Mechanics. Pergamon Press, Oxford, UK.
[3] L.D. Landau and E.M. Lifshitz. Quantum Mechanics. Butterworth-Heinemann, Oxford, UK.
[4] H. Goldstein, C. Poole, and J. Safko. Classical Mechanics. Addison Wesley, San Francisco, CA, 3rd edition, 2002.



1 $e^{x} = \sum_{n=0}^{\infty} \frac{x^{n}}{n!}$
2 There are other ways to do this, differing by factors of in the definition of the generators and in the construction of the exponential, but I'm sticking with this one for now.
3 Kind of the reverse of how we got to this whole exponential notation in the first place...

Tuesday, May 26, 2009

The Schrödinger Equation

Update: A corrected and improved version of this post is now up: http://behindtheguesses.blogspot.com/2009/06/schrodinger-equation-corrections.html

[Click here for a PDF of this post with nicer formatting]
notElon asked me to discuss, and to try and derive the Schrödinger equation, so I'll give it a shot. This derivation is partially based on Sakurai,[1] with some differences.

A brief walk through classical mechanics
Say we have a function of and we want to translate it in space to a point . To do this, we'll find a ``space translation'' operator which, when applied to , gives . That is,

(1)
We'll expand in a Taylor series:


(2)
which can be simplified using the series expansion of the exponential1 to

(3)
from which we can conclude that

(4)
If you do a similar thing with rotations around the -axis, you'll find that the rotation operator is

(5)
where is the -component of the angular momentum.

Comparing (4) and (5), we see that both have an exponential with a parameter (distance or angle) multiplied by something ( or ). We'll call the something the ``generator of the transformation.'' So, the generator of space translation is and the generator of rotation is . So, we'll write an arbitrary transformation operator through a parameter as

(6)
where is the generator of this particular transformation.2 See [2] for an example with Lorentz transformations.


From classical to quantum
In classical dynamics, the time derivative of a quantity is given by the Poisson bracket:

(7)
where is the classical Hamiltonian of the system and is shorthand for a messy equation.[3] In quantum mechanics this equation is replaced with

(8)
where the square brackets signify a commutation relation and is the quantum mechanical Hamiltonian.[4] This holds true for any quantity , and is a number which commutes with everything, so we can argue that the quantum mechanical Hamiltonian operator is related to the classical Hamiltonian by

(9)
specifically.

Additionally, we can extend from here that any quantum operator is written in terms of its classical counterpart by

(10)

So, using (4) the quantum mechanical space translation operator is given by

(11)
and, using (5), the rotation operator by

(12)
or, from (6) any arbitrary (unitary) transformation, , can be written as

(13)
where is (an Hermitian operator and is) the classical generator of the transformation.


Time translation of a quantum state
Consider a quantum state at time described by the wavefunction . To see how the state changes with time, we want to find a ``time-translation'' operator which, when applied to the state , will give . That is,

(14)
From our previous discussion we know that if we know the classical generator of time translation we can write using (13). Well, classically, the generator of time translations is the Hamiltonian![5] So we can write

(15)
and (14) becomes

(16)

This holds true for any time translation, so we'll consider a small time translation and expand (16) using a Taylor expansion3 dropping all quadratic and higher terms:

(17)
Moving things around gives

(18)
In the limit the right-hand side becomes a partial derivative giving the Schrödinger equation

(19)

For a system with conserved total energy, the classical Hamiltonian is the total energy

(20)
which, making the substitution for quantum mechanical momentum and substituting into (19) gives the familiar differential equation form of the Schrödinger equation

(21)


References
[1] J.J. Sakurai. Modern Quantum Mechanics. Addison-Wesley, San Francisco, CA, revised edition, 1993.
[2] J.D. Jackson. Classical Electrodynamics. John Wiley & Sons, Inc., 3rd edition, 1998.
[3] L.D. Landau and E.M. Lifshitz. Mechanics. Pergamon Press, Oxford, UK.
[4] L.D. Landau and E.M. Lifshitz. Quantum Mechanics. Butterworth-Heinemann, Oxford, UK.
[5] H. Goldstein, C. Poole, and J. Safko. Classical Mechanics. Addison Wesley, San Francisco, CA, 3rd edition, 2002.



1 $e^{x} = \sum_{n=0}^{\infty} \frac{x^{n}}{n!}$
2 There are other ways to do this, differing by factors of in the definition of the generators and in the construction of the exponential, but I'm sticking with this one for now.
3 Kind of the reverse of how we got to this whole exponential notation in the first place...

Monday, April 27, 2009

The Dot and Cross Products

[Click here for a PDF of this post with nicer formatting]
A bad way
The dot product and cross product of two vectors are tools which are heavily used in physics. As such, they are typically introduced at the beginning of first semester physics courses, just after vector addition, subtraction, etc. Although they are not strictly required for these intro courses (see [1], for example), they make the development and computations of work and energy, torque, and electromagnetism far simpler.

Unfortunately, they are consistently introduced in an awful way: by straight definition. That is, using the dot product for example, for two vectors and they say something like
We define the dot product between and as:
or,
where is the angle between them.
Then, for the cross product, either they use an equation like the latter of the above two equations coupled with the ``right-hand rule,'' or a strange algebraic combination of the components of and , often ``simplified'' with the help of a startling determinant.1 See [2], [3], [4], [5] and [6] as a few examples. Although a few of these give a geometric interpretation after the fact, it is usually in passing, and does not really contribute to their discussion. These approaches are not limited to textbooks, either. See [7] for an in-class lecture example.

In these examples, the dot product is introduced first and then the cross product. From one standpoint this makes some sense -- the dot product is definitionally simpler and usually easier to calculate. However, from a conceptual standpoint, I think this order is backwards. Furthermore, in my experience, students, by and large, miss the physical and graphical significance of these definitions, and upon encountering the concepts of work or torque later on, take the resulting expressions purely as definitions as well.2 This is yet another example of the fact that definition $\neq$ explanation.

Personally, it is my inclination to wait to introduce these products until they're needed, thus motivating the discussion in the first place. However, I do understand the notion of ``getting it over with,'' and it's possible that introducing them as abstract concepts lends itself to easier application of the concepts to general problems. In any case, my discussion follows the latter approach (for better insertion into standard texts) and presupposes understanding of vector basics: addition, decomposition, etc.


A better way
The Cross Product
(a) Geometrical view of the cross product as the parallelogram area.

(b) Graphical derivation of area for two 2D arbitrary vectors, from [8].
Figure 1: The 2D cross product of vectors $\vec{A}$ and $\vec{B}$.

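Say we have two vectors $\vec{A}$ and $\vec{B}$ with lengths $A$ and $B$, and we want to find something which is a measure of how much of $\vec{B}$ is perpendicular to $\vec{A}$. Looking at Fig. 1(a), we can see that the area of the parallelogram sided by the two vectors is such a measure. The area of a parallelogram is
$$\text{Area} = \text{base} \times \text{height}, \tag{1}$$
which, for our case, is the same as
$$\text{Area} = (\text{length of } \vec{A}) \times (\text{part of } \vec{B} \text{ perpendicular to } \vec{A}). \tag{2}$$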
That is, you can only have an area if you have a ``base'' and a ``height'' perpendicular to the base. Thus area is a good measure of perpendicularity.3

There are two different ways of calculating this area. If the angle between the two vectors is $\theta$, as in Fig. 1(a), we see that, choosing $\vec{A}$ as the ``base,'' we can write the ``height'' as $B\sin\theta$. Alternatively, choosing $\vec{B}$ as the base, we write the perpendicular part as $A\sin\theta$. Then the area is
$$\text{Area} = A(B\sin\theta) = B(A\sin\theta) = AB\sin\theta. \tag{3}$$
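However, if we don't know the angle between them, we're not completely out of luck. If you look at Fig. 1(b), you can see that for a simple, two-dimensional case, we can express the area in terms of the $x$ and $y$ components of $\vec{A}$ and $\vec{B}$:
$$\text{Area}_{xy} = A_x B_y - A_y B_x. \tag{4a}$$
Of course, I could just as easily have labeled the axes $y$ and $z$, which would give a different area
$$\text{Area}_{yz} = A_y B_z - A_z B_y, \tag{4b}$$
or $z$ and $x$, which would give yet another area
$$\text{Area}_{zx} = A_z B_x - A_x B_z. \tag{4c}$$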
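As a quick sanity check that the two routes to the area agree, here is a minimal Python sketch. The example vectors and the function names are my own choices for illustration, and it assumes the component form of the xy-plane area is $A_x B_y - A_y B_x$ and the angle form is $AB\sin\theta$ with $\theta$ measured from the first vector to the second.

```python
import math

def area_from_angle(a, b):
    """Parallelogram area as A * B * sin(theta), theta measured from a to b."""
    length_a = math.hypot(a[0], a[1])
    length_b = math.hypot(b[0], b[1])
    # Angle of each vector from the x axis; their difference is theta.
    theta = math.atan2(b[1], b[0]) - math.atan2(a[1], a[0])
    return length_a * length_b * math.sin(theta)

def area_from_components(a, b):
    """Same area using only the x and y components."""
    return a[0] * b[1] - a[1] * b[0]

# Hypothetical example vectors in the xy-plane.
a = (3.0, 1.0)
b = (1.0, 2.0)

area_angle = area_from_angle(a, b)
area_components = area_from_components(a, b)
print(area_angle, area_components)
```

Up to floating-point rounding, both numbers should come out the same, which is the whole point: the component form needs no knowledge of the angle.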

If all we've done is relabel our axes, keeping $\vec{A}$ and $\vec{B}$ fixed, then we wouldn't expect the size of these areas to be different -- and they're not. However, although the amount of area is the same, in a way the areas are different in that they're facing different directions in each case. So, we need a way to distinguish these three areas from each other, and from an arbitrarily oriented area. What we'll do is pick a vector perpendicular to both $\vec{A}$ and $\vec{B}$ -- and thus perpendicular to the area of the parallelogram -- with magnitude equal to the area. We'll call this vector
$$\vec{A}\times\vec{B} \tag{5}$$
and say it's the result of a ``cross product'' of $\vec{A}$ and $\vec{B}$. However, in principle, we have a choice of two such perpendicular vectors. In Fig. 1, for example, we could choose the vector pointing in either the $+\hat{z}$ or $-\hat{z}$ direction. Additionally, this arbitrariness can be seen in choosing whether to measure the angle in (3) from $\vec{A}$ to $\vec{B}$ or vice versa.

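So, as a matter of convention, we'll decide to always measure angles from the first term in the cross product ($\vec{A}$ in (5)) such that
$$\vec{A}\times\vec{B} = AB\sin\theta_{AB}\,\hat{n}, \tag{6}$$
where $\theta_{AB}$ is the angle measured from $\vec{A}$ to $\vec{B}$ and $\hat{n}$ is the unit vector perpendicular to both, so if the fingers of your right hand point along the little arcs we draw for angles, your thumb points in the direction that this vector goes. Thus,
$$\vec{B}\times\vec{A} = -\vec{A}\times\vec{B}, \tag{7}$$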
since your hand would curl in the other direction. This is called the ``Right-Hand Rule.'' Then, the areas we discussed in equations (4) become
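$$(\vec{A}\times\vec{B})_{xy} = (A_x B_y - A_y B_x)\,\hat{z}, \tag{8a}$$
$$(\vec{A}\times\vec{B})_{yz} = (A_y B_z - A_z B_y)\,\hat{x}, \tag{8b}$$
and
$$(\vec{A}\times\vec{B})_{zx} = (A_z B_x - A_x B_z)\,\hat{y}, \tag{8c}$$
where the subscripts tell us which coordinate plane the two crossed vectors are in. Thus, the cross product represents how much these two vectors point in perpendicular directions, and is a signed area vector perpendicular to the plane described by $\vec{A}$ and $\vec{B}$.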

(a) Geometrical view of the 3D cross product as the parallelogram area.
(b) Looking at the area from the xy-plane (dashed outline), the yz-plane (shaded) and the zx-plane (solid).
Figure 2: The 3D cross product of vectors $\vec{A}$ and $\vec{B}$ and the decomposed area.

So far, though, we've only discussed vectors which have only two coplanar components. But it's fairly straightforward to generalize to arbitrary 3D vectors. See Fig. 2(a), for example. Here the area vector, and hence the cross product vector, is pointing in a complicated direction. However, we know we can decompose any vector into its $x$, $y$ and $z$ components, and this area vector is no different:
$$\vec{A}\times\vec{B} = (\vec{A}\times\vec{B})_{yz} + (\vec{A}\times\vec{B})_{zx} + (\vec{A}\times\vec{B})_{xy}. \tag{9}$$

All we need to do is find out how much area is pointing in each direction. To do that, look at Fig. 2(b). This picture shows what the area between the two vectors looks like if we look only at two coplanar components at a time -- in other words the $x$, $y$ and $z$ components of the area. But we already know what each of these areas is from (8)! So, then we can combine these equations and write the cross product
$$\vec{A}\times\vec{B} = (A_y B_z - A_z B_y)\,\hat{x} + (A_z B_x - A_x B_z)\,\hat{y} + (A_x B_y - A_y B_x)\,\hat{z}. \tag{10}$$
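To make the assembly of the three plane areas concrete, here is a small Python sketch. It assumes the standard component form of the cross product, with the yz-plane area along $x$, the zx-plane area along $y$, and the xy-plane area along $z$. The function names and example vectors are hypothetical, and the helper `sum_of_component_products` is used only as a perpendicularity check (it is zero for perpendicular vectors).

```python
def cross(a, b):
    """Cross product assembled from the yz-, zx- and xy-plane areas."""
    ax, ay, az = a
    bx, by, bz = b
    return (
        ay * bz - az * by,  # yz-plane area, points along x
        az * bx - ax * bz,  # zx-plane area, points along y
        ax * by - ay * bx,  # xy-plane area, points along z
    )

def sum_of_component_products(a, b):
    """Zero when a and b are perpendicular; used here only as a check."""
    return sum(p * q for p, q in zip(a, b))

# Hypothetical example vectors.
a = (1.0, 2.0, 3.0)
b = (4.0, 5.0, 6.0)

c = cross(a, b)
print(c)                                # the area vector
print(sum_of_component_products(c, a))  # 0.0: perpendicular to a
print(sum_of_component_products(c, b))  # 0.0: perpendicular to b
print(cross(b, a))                      # same size, opposite direction
```

Swapping the order of the two vectors flips the sign of every component, which is the right-hand rule showing up in the arithmetic.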





The Dot Product
(a) When B < A.
(b) When B > A.
Figure 3: The projection of vector $\vec{B}$ on to vector $\vec{A}$.

Having discussed the perpendicularity of two vectors, it's natural to ask if there's a similar measure of the parallelity of two vectors. There are two ways of doing this. The way I'll do it first is explicitly geometrical, the second way is only implicitly geometrical.

Say we have two vectors $\vec{A}$ and $\vec{B}$ again, and we want to know how much of $\vec{B}$ is pointing (projected) along $\vec{A}$. From Fig. 3 we see that this is equal to
$$B\cos\theta. \tag{11}$$
Similarly, the amount of $\vec{A}$ that is projected along $\vec{B}$ is
$$A\cos\theta. \tag{12}$$
Now, it would be nice if we could have one statement which somehow combined these two statements and gave a measure both of how much of $\vec{B}$ is pointing along $\vec{A}$ and of how much of $\vec{A}$ is pointing along $\vec{B}$; that is, a measure of how much these two vectors point in the same direction. Additionally, since (2) used a multiplicative combination of the two vectors as a measure of perpendicularity, we'll try a similar multiplicative measure here, as well.

If we multiply (11) by $A$ and (12) by $B$ we can write a single, symmetric statement
$$\vec{A}\cdot\vec{B} = AB\cos\theta \tag{13}$$
and say it's the result of a ``dot product'' of $\vec{A}$ and $\vec{B}$, which amounts to multiplying together the parallel parts of two vectors. Here, too, if we don't know the angle between them, we're not out of luck. For vectors written in component form, it's straightforward to multiply the parallel parts together:
$$\vec{A}\cdot\vec{B} = A_x B_x + A_y B_y + A_z B_z. \tag{14}$$
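The same kind of check works here. This is a minimal Python sketch, with hypothetical function names and example vectors, assuming the projection form $A(B\cos\theta)$ and the component form (sum of products of like components) for two vectors in the xy-plane.

```python
import math

def dot_from_angle(a, b):
    """Length of a times the projection of b along a."""
    length_a = math.hypot(a[0], a[1])
    length_b = math.hypot(b[0], b[1])
    theta = math.atan2(b[1], b[0]) - math.atan2(a[1], a[0])
    projection_of_b_on_a = length_b * math.cos(theta)
    return length_a * projection_of_b_on_a

def dot_from_components(a, b):
    """Multiply the parallel parts (like components) and add them up."""
    return a[0] * b[0] + a[1] * b[1]

# Hypothetical example vectors in the xy-plane.
a = (3.0, 1.0)
b = (1.0, 3.0)

dot_angle = dot_from_angle(a, b)
dot_components = dot_from_components(a, b)
print(dot_angle, dot_components)
```

Unlike the signed area, swapping the two vectors changes nothing here, since cosine does not care which way the angle is measured.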

However, unlike the cross product which gave us an actual area with a natural direction, this area-like structure is actually a measure of ``non-area'' and doesn't really have a natural direction. Although we could, completely arbitrarily, define a direction for this dot product,4 and thus make it a vector as well, to the best of my knowledge such a quantity does not have any uses in physics, so we'll leave it alone and treat it only as a number (scalar).

Alternatively, we know that the largest area possible between two vectors occurs when they are perpendicular to each other, where the area is $AB$ (you can also see this from (3)). If we are interested in the maximal ``amount perpendicular'' we can write
$$(\text{area})^2 \le (AB)^2, \tag{15}$$
where they are squared to take care of sign problems. Now, when they are completely parallel there is no area, and we're left only with non-area, which, also, can't be larger than the total maximum area, so
$$(\text{non-area})^2 \le (AB)^2 \tag{16}$$
as well.

Then using a rough analogue to the Pythagorean theorem we see that
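$$(\text{non-area})^2 = (AB)^2 - (\text{area})^2 = (AB)^2 - (AB\sin\theta)^2 = (AB\cos\theta)^2, \tag{17}$$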
which, choosing the positive root, is the same as (13).
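This Pythagorean-style bookkeeping can be checked numerically. The Python sketch below assumes the component forms of the cross and dot products and uses hypothetical example vectors; it compares the squared area plus the squared non-area against the maximum possible squared area $(AB)^2$.

```python
def cross(a, b):
    """Component form of the cross product (the area vector)."""
    ax, ay, az = a
    bx, by, bz = b
    return (ay * bz - az * by, az * bx - ax * bz, ax * by - ay * bx)

def dot(a, b):
    """Component form of the dot product (the non-area)."""
    return sum(p * q for p, q in zip(a, b))

# Hypothetical example vectors.
a = (1.0, 2.0, 3.0)
b = (4.0, 5.0, 6.0)

area_squared = dot(cross(a, b), cross(a, b))  # squared length of the area vector
non_area_squared = dot(a, b) ** 2
max_area_squared = dot(a, a) * dot(b, b)      # (AB)^2

print(area_squared, non_area_squared, max_area_squared)
```

For these vectors the first two numbers add up to the third, so neither the area nor the non-area can exceed $AB$ on its own.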


References
[1] F.W. Sears and M.Z. Zemansky. University Physics. Addison-Wesley, Reading, MA, 2nd edition, 1955.
[2] D. Halliday, R. Resnick, and J. Walker. Fundamentals of Physics. John Wiley & Sons, Inc., 7th edition, 2005.
[3] G.R. Fowles and G.L. Cassiday. Analytical Mechanics. Thomson Brooks/Cole, Belmont, CA, 7th edition, 2005.
[4] J.R. Reitz, F.J. Milford, and R.W. Christy. Foundations of Electromagnetic Theory. Addison-Wesley, 4th edition, 1992.
[5] D.J. Griffiths. Introduction to Quantum Mechanics. Pearson Prentice Hall, 2nd edition, 2005.
[6] J. Stewart. Multivariable Calculus. Brooks/Cole Publishing Company, Pacific Grove, CA, 4th edition, 1999.
[7] W. Lewin. Lec 3 | 8.01 Physics I: Classical Mechanics, Fall 1999 [online]. Available from: http://www.youtube.com/watch?v=fwNQKjTj-0w#t=13m45s [cited 16 March 2009].
[8] C.T.J. Dodson and T. Poston. Tensor Geometry: The Geometric Viewpoint and its Uses. Springer, 2nd edition, 1997.





1 Of course, not all first semester physics students even know what a determinant is, but that is not my point.
2 Work $W = \vec{F}\cdot\vec{d}$, and torque $\vec{\tau} = \vec{r}\times\vec{F}$.
3 Another way to approach this is to start by calculating the area, and then explain that this can also be viewed as a measure of perpendicularity.
4 i.e. along either $\vec{A}$ or $\vec{B}$, or along a line midway between them, or perpendicular to them, or some other arbitrary choice.