Saturday, July 14, 2007

AUTO_INCREMENT in InnoDB

It's very easy to have an auto_increment column in a table and maintain primary keys. Many people think so, and also they think that MySQL will do the best job for them. With InnoDB, the case is a little different as supposed to be. Basic assumption is that since InnoDB works with row-level locking, combining it with auto_increment should ward off worries. But it is just the beginning.

Let me clarify how auto_increment works. InnoDB maintains the auto_increment value in memory. Good approach. When an insert statement requests the auto_increment value, InnoDB takes the stored value, increment it by one and return. The value in memory is not incremented at this time. This is not bad, because who knows how many rows the insert statement has.

But, this means that InnoDB cannot provide auto_increment values to other threads until this finishes, hence we cannot have concurrent inserts even in InnoDB when using an auto_increment value. This is assured by a special AUTO-INC lock by InnoDB, which is an exclusive table level lock. We will see how this can affect your application and some workarounds.

Application may suffer

If you have concurrent inserts into an InnoDB table, start worrying. Single row inserts won't affect the performance noticeably. But, if you are doing bulk inserts and your inserts are taking enough time, one of your insert threads may exit because of lock wait timeout. Also, this will cause problems for other components of your application (even if they are read only) as these insert statements will keep connected and you may hit your max_connections.

Workarounds

  1. As you might think, moving to MyISAM won't help. MyISAM by default takes a table lock while writing.
  2. Do some magic and make sure your inserts do not execute concurrently. This is also not much possible in the case of web applications, where one doesn't have the control over the inserts, but neither web applications do bulk inserts. This can be managed where one is using a cron job to do bulk inserts, say from a log file.
  3. Get rid of auto_increment. If your application actually needs to do bulk inserts from multiple sources at the same time and the above solution doesn't make sense, figure out your own primary key. Something like a combination of MD5 checksum over data combined with the length of the data may be a good option in most cases, but do not rely on it. Also, remember a multi-column primary key is not always a good option.

Hope this blog was helpful to you. Keep posting your comments.

del.icio.us Tags: , ,

3 comments:

burtonator said...

MD5 isn't necessary a good idea EITHER since the primary key is using a clustered index......

Parvesh Garg said...

Kevin,

Good point, its always not advisable to have a multi column primary key, but than a generic solution for generating a unique key for every data item is also an unsolved problem. The decision is more or less dependent on the nature of the application. There are no generic answers in MySQL.

I mentioned it may be a good option, but do not rely on it.

Thanks for pointing this out. I have updated the post.

Regards,
Parvesh

Anonymous said...

Multi columns is not a good idea? Auto numbers always? You guys should really go look at tree structures(data structures) and how it affects layout of the data on the physical device i.e. the hard drive. In the MyISAM engine this is not so much of a problem but on InnoDB and MS SQL server having proper keys can make a huge difference.

It amazes me how many people in Relation database design dont understand basic Relation design.

Code in haste Debug forever