All good database professionals have a set of tools they can apply to various problems they encounter.
Of course, to use any one tool effectively you have to understand how, when and where to apply it.
The ‘Cartesian join’ is a tool that should be in every professional’s toolbox, but is so misunderstood it’s often overlooked.
The word Cartesian means “relating to René Descartes”, the influential French philosopher and mathematician.
A Cartesian product in mathematics is the direct product of two sets. In relational databases, a table is, in essence, a set of data, so a Cartesian join is performed when we join every row in one table to every row in another table.
Rather sadly, you often hear the term used abusively, as in “You complete idiot! That was a Cartesian join! Don’t you know that…” This is mainly because the answer table produced by a Cartesian join can be what we call, in these politically correct times, ‘significantly challenged in the size department’ (or terrifyingly, massively vast).
To see why this happens, imagine two tables – one with four rows and the other with three. An example of a ‘normal’ query we could run is:
SELECT Car, Person
FROM Car INNER JOIN People ON Car.CarID=People.CarID
WHERE Car=”Land Rover”;
This returns one row because only one person was assigned the Land Rover. If we leave out the WHERE clause:
SELECT Car, Person
FROM Car INNER JOIN People ON Car.CarID=People.CarID;
then the query returns three rows, one for each person. However, if we don’t give the database engine any specific instructions about how to join the two tables:
SELECT Car.Car, People.Person
FROM Car, People;
then it assumes we want every single row in the first table joined to every single row in the second table. In other words, it assumes a Cartesian join.
So the number of rows returned is equal to the number of rows in the first table, multiplied by the number of rows in the second – in this case 12. Given these tiny tables, that’s hardly a problem, but tables are often larger in production systems – 100,000 row tables are perfectly common. Performing a Cartesian join on two such tables will generate an answer table of 10 billion rows.
All Software Applications Tags: Hands On, Databases
