How do indices work

Working with database indexes

Database queries are a common reason for slow loading times. If a website is just starting up, the databases are usually not that full. Little by little, new entries are added to the database and the database queries become slower. After all, more data records now have to be searched for SELECTs.

This search for the data records can be accelerated using so-called indices (singular index). This means that not every data record is run through and checked, which saves a lot of loading time.

How a database query without an index works:

Let's assume that we have the following table (no index stored):

Table without an index

If we now want to look for all models from BMW, we can do this with the following SELECT:

SELECT * FROM cars WHERE manufacturer = 'BMW';

The database now goes in the background and makes the following comparisons:

  1. "Opel" (Astra) = "BMW"?
  2. "Mercedes" (C-Class) = "BMW"?
  3. "Audi" (A4) = "BMW"?
  4. "BMW" (1 Series) = "BMW"?
  5. "VW" (Tiguan) = "BMW"?
  6. "BMW" (3 Series) = "BMW"?
  7. "Audi" (A3) = "BMW"?
  8. "VW" (Passat) = "BMW"?
  9. "VW" (Polo) = "BMW"?
  10. "VW" (Gold) = "BMW"?

He went through every data set and had to compare 10 manufacturers. We can do that faster!

Primary Key: A special index

There are several types of indexes. The best known is probably the primary key. This specifies for a table the criteria based on which an individual data record can be recognized. An ID is often used here, but a primary key via a varchar column or even several columns is also possible. In our case, a primary key on both columns would make sense. It works like this, for example:

ALTER TABLE`autos` CHANGE COLUMN `manufacturer`` manufacturer` VARCHAR (255) NOT NULL, CHANGE COLUMN `model`` model` VARCHAR (255) NOT NULL, ADD PRIMARY KEY (`manufacturer`,` model`);

In addition to the ADD PRIMARY KEY, the columns were also set to NOT NULL because NULL values ​​are not permitted in a primary key.

If you now do a SELECT on the table, you can see that the sorting has changed:

Table with primary key

Now it is only sorted on the basis of the manufacturer and, if they are the same, on the basis of the model. If you were to list the columns in our ADD PRIMARY KEY the other way around, the sorting would also be the other way around.

Now when we run our SELECT, the comparisons look something like this:

  1. "Mercedes" (C-Class) = "BMW"?
  2. "Audi" (A4) = "BMW"?
  3. "BMW" (1 Series) = "BMW"?
  4. "BMW" (3 Series) = "BMW"?

In principle, the database now goes there and fetches the data record from the middle (data record 5) and checks whether it matches. In this case it is the Mercedes C-Class. It's not a BMW. Since Mercedes is behind BMW in the alphabet, BMW has to be further ahead. So he now takes the middle of the four data records in front of it, which would then be the second (Audi A4). But it's not a BMW either. That means the data sets must lie between the Audi and Mercedes. So he goes back in the middle. This time this is only one record further and then he has found the first record. Now he goes through the next data records until he can no longer find a BMW. Then he would then go backwards, but since he has already checked the Audi, he doesn't need to do that anymore.

If you were to specify the manufacturer and model now, and you would no longer have to search up and down, the query would be even faster.

It comes around that we have now saved 6 comparisons with the primary key. In theory, this would mean that the query now runs twice as fast. With 10 data sets this is not really noticeable. But let's assume it was 10,000. The loading time can even be many times faster, depending on the query.

Create an index

But now you can also create normal indices without the special PRIMARY-KEY function. This means that the columns taken do not have to be unique. Let's create such an index via the "model" column:

ALTER TABLE `demo`.`autos` INDEX ADD` ix_modell` (`model` ASC);

Since there can be several indexes on a table, a name should also be specified here. I usually write an "ix_" in front of my indexes and then the name of the column or a term that describes the selected columns. Perhaps you will notice the word "ASC" here. You probably already know that from the ORDER BY command. As described above, indexes work via sorting. With the ASC or DESC you can determine how the table should be sorted internally.

By creating the index, a kind of assignment table is created in the background. In our case it could look like this:

Build an index

(Please note that this structure is only used for explanation. The actual internal structure can be quite different)

The models have now been sorted in this type of table and their entry positions noted in the main table. Since an index is not unique, there can be several positions for one model.

If the database now recognizes that you are only going to the column model in your WHERE condition, it will look for the records via this table. Before each search, the database weighs up which index fits best and is therefore executed the fastest.

However, inserting into the database is slightly slower with an index. After all, a new entry must now also be made in the assignment table. For this reason, you should ideally only create indices where you really need them.

UNIQUE indices

In addition to the normal index, there are also other types of indexes. One that you might want to know to get started is the UNIQUE index. Like the primary key, this only contains unique data records. In contrast to the primary key, it can also be created multiple times. However, it should not be seen as a replacement for the primary key, but only as an addition. Important: The UNIQUE index ignores NULL values. This means that a NULL value can appear more than once in it.

Optimizing database queries on an index

When optimizing the database queries, a new index does not always have to be created. You can also extend the WHERE conditions so that an existing index can be used. If you are looking for the “A4” model in our example and you already know that this is only available from Audi, you can simply add “Audi” to the Select and then go to the Primary Key. An index would no longer be necessary.

Find out which INDEX is being used

Ideally, you should take a look at the exact problem before optimizing a slow query. Especially when several tables are joined, it is sometimes not immediately obvious where an index is missing. In MySQL this is done via Explain:

EXPLAIN SELECT * FROM demo.autos where model = 'A4';

As a result, you can see in the "key" column which index is currently being used. The number in the "rows" column should be as small as possible. This is the number of lines that are selected for a JOIN, for example. The "Extra" column tells how the records are found. Ideally, "Using index" should appear here. If it says "Filesort" it should definitely be optimized here because every data record is checked here.

Conclusion

Indexes are an integral part of every major programming project. With them, a database query that previously took a few minutes can suddenly be carried out in less than a second. However, they should only be used in a targeted manner because they also slightly increase memory consumption and loading time when inserting and changing data records.

Related articles

Comments

Neeldarax wrote on 08/07/2013:

Hi, for "If we want to look for all models from Audi now, we can do this with the following SELECT: SELECT * FROM cars WHERE manufacturer = 'BMW';" you have to decide ma for Audi or BMW :) Otherwise, I find the article helpful!

Stefan Wienströer wrote on 08/07/2013:

Lol had caught me ^^ I had first chosen Audi, but then I noticed that this is right at the beginning and is therefore rather poorly suited to explain the search out ^^

Timo wrote on 08/07/2013:

In any case, this is really helpful if the search is only logarithmic instead of linear. Have already tried out a lot on a project, so you should always include it in your considerations about the database.

Handling IP addresses in MySQL wrote on 08/12/2013:

[& # 8230;] Avoid saving the IP as a string if possible. Because so you do not go over an index, what the whole query [& # 8230;]

about us