In database design, there’s a concept of a many-to-many relationship, where one item of a particular type is related to many items of another type, but each one of those can in turn be related to many of the first type. A typical example might be where you have a list of people and a list of fruit, and you want to record which fruits are liked by which people. It’s simple to do if you restrict each person to picking only one type of fruit or, indeed, if you let each fruit be picked by only one person, but neither of those scenarios really is anything like the real world, in which Amy likes apples and bananas and Fred likes bananas, oranges and pears. When drawing such a relationship, you usually draw a box for each entity and lines joining them, with crow’s feet at both ends.
This is the simplest representation of a many-to-many relationship, but you can’t actually put such a design into a database such as SQL Server or Access, because neither of these products directly support many-to-many relationships. Relational databases just don’t do them because they don’t form part of the relational grammar on which RDBMSs (relational database management systems) are based. When you want to physically implement such a relationship, you have to slip an extra table in between the two lists of items – imagine cutting the connecting lines in the middle, slipping in an extra box and then flipping the line ends around, so that crow’s feet are on the inside, not on the outside.
In the absence of any more meaningful name, it’s usual to name this new intermediary table with both the names of the other tables, so, in our example, this would be called PersonsFruits. If all you’re holding is the fact that a person likes this fruit, the only columns in this table would be the primary keys of the Persons table and the Fruits table. The mere presence of a record tells you that “John likes Bananas” because it contains the IDs of John and Bananas. If there’s no record for John and Bananas, John doesn’t like that particular fruit.
From these tables, we can see that Amy likes Banana, and Brian likes Banana and Pear. We can also infer that Charles doesn’t like anything and no-one likes Apple or Kumquat.
The combination of these two fields could be used as the primary key for the PersonsFruits table, as it makes no logical sense to have two records that both say “Barry likes Oranges”. You can have a separate primary key if you wanted to and, indeed, you might want a non-clustered primary key of GUIDs (Globally Unique Identifiers) if you were going to replicate the tables to other servers or occasionally connected systems. As only one of the indexes on a table can be clustered, it would make sense to set the clustered index to be the PersonID followed by the FruitID if you normally need to know which fruit is liked by a person rather than vice versa. If, on the other hand, you normally need to know which person likes a particular fruit, you might create an index with FruitID followed by PersonID. The index that should be clustered is the one you most often search on. The data in the table is physically stored in the order specified by the clustered index, so accessing the data through the clustered index is the most efficient way. The primary key is usually the clustered index, but it doesn’t have to be and any index can be defined as unique, not just the primary key.
Now, it’s all very well being able to record this data in a database, but how do you show it on a screen? Usually, you’d display this sort of data for one person at a time and, if it’s read-only, you could list the fruits either one per line in a list box or you could concatenate all the names of the fruits this person likes into a string of text, separating each item with a comma or semi-colon.