Thursday, November 22, 2007

GET_LOCK & FAMILY

Again I have a well-documented yet often-neglected feature of MySQL. We all use locks in MySQL often, yet we tend to forget (or not bother about) MySQL internals and how they can cause trouble. For example, try something like LOCK TABLE ... WRITE on an InnoDB table inside a transaction and watch the same transaction time out while waiting for a lock on one of the rows, Ref: [Bug 5998]. Problems like these occur whenever the semantics of a statement differ between the MySQL level and the storage engine level. But recently we figured out a good technique for keeping the locking logic in our own hands instead of relying too much on MySQL. Even this technique is not foolproof in all cases, though.

The secret is: use the GET_LOCK function of MySQL. GET_LOCK(str, timeout) tries to obtain an exclusive lock with the name str, waiting up to timeout seconds. The return value is tri-state: 1 if the lock was obtained successfully, 0 if the attempt timed out, and NULL in case of an error. For applications concerned only with whether the lock was obtained, the following can be used.

   SELECT COALESCE(GET_LOCK('lock_string_name', 2), 0)
This is guaranteed to return either 1 or 0, meaning the lock was taken or not, respectively. Locks taken using the GET_LOCK function are released under the following conditions:
  • The lock is released explicitly using the RELEASE_LOCK function.
  • The same connection issues a new GET_LOCK request (a connection can hold only one such lock at a time).
  • The connection terminates.
Other functions of this family are IS_FREE_LOCK and IS_USED_LOCK, which are used to check the status of a lock string. IS_FREE_LOCK checks whether the string is free: it returns 1 if the string is free, 0 if it is in use and NULL in case of an error. IS_USED_LOCK checks whether the lock is in use and returns the connection identifier of the client that holds it, or NULL if the lock is free.
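A quick two-connection session sketch shows the whole family together (the lock name is illustrative):

```sql
-- Connection 1: grab the named lock, waiting up to 2 seconds
SELECT GET_LOCK('job_queue_lock', 2);     -- returns 1 on success

-- Connection 2: inspect the lock without blocking
SELECT IS_FREE_LOCK('job_queue_lock');    -- returns 0 (in use)
SELECT IS_USED_LOCK('job_queue_lock');    -- returns connection 1's id

-- Connection 1: done with the critical section, release it
SELECT RELEASE_LOCK('job_queue_lock');    -- returns 1
```

Note that RELEASE_LOCK returns 1 only when the calling connection actually holds the lock; it returns 0 if some other connection holds it.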

USEFULNESS:

  • This is one way of maintaining mutual exclusivity for a process in a distributed environment while also ensuring business continuity (BCP). Just have a named lock that can be obtained by only one process at a time, with the rest of the processes timing out and retrying for the lock after a sound sleep.
  • The lock is very lightweight and doesn't eat up too many resources at the MySQL server end.
  • The lock lives at the MySQL level and the storage engines are not involved at all, saving you from reading bundles of engine documentation before using it.
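A minimal sketch of that worker pattern (lock name and timeout are illustrative; the retry loop itself lives in application code):

```sql
-- Each worker, before entering the critical section:
SELECT COALESCE(GET_LOCK('nightly_job', 5), 0);
-- if it returned 1: we are the chosen worker, run the job, then
SELECT RELEASE_LOCK('nightly_job');
-- if it returned 0: another worker holds the lock;
-- sleep in the application and retry, or simply skip this round
```

If the winning worker crashes mid-job, its connection drops and the lock is released automatically, which is what makes this usable for failover.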
CONS:
  • It requires due diligence in the application to ensure that all conflicting transactions are taken care of. Even a single left-out transaction may spoil the whole game. So I would suggest such techniques are not good for large applications (read: lines of code), unless you have a good QA team.
  • It cannot fully substitute for other kinds of locks, as a connection can obtain only one such lock at a time.
MISC:
  • These locks are held at the connection level and do not respect transactions. That is, committing a transaction has no effect on these locks. Whether that is a pro or a con depends on the use case, hence this point was put under the MISC category :)

Hope this blog was helpful to you. Keep posting your comments.

Wednesday, September 12, 2007

Customized Order By Sequence - A Small Hack

What if you have to customize the ordering sequence of a column in an ORDER BY clause? Say you have an integer column 'weight' that can have values ranging from 1 to 5, and you want the rows back in the sequence 4, 2, 5, 1, 3. So what do you do? Here is a simple trick using the FIELD function.

   mysql> SELECT * FROM `table` ORDER BY FIELD(weight, 4, 2, 5, 1, 3);
Now what does it do? The FIELD function returns the index (position) of its first argument within the list of remaining arguments, or 0 if it is not found. So the ORDER BY logic stays the same; it is just that the values being ordered are generated on the fly.
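A few standalone calls make the mapping concrete, including the one caveat worth remembering: any weight value missing from the list maps to 0 and therefore sorts before everything else.

```sql
SELECT FIELD(4, 4, 2, 5, 1, 3);  -- 1: 4 is first in the list
SELECT FIELD(3, 4, 2, 5, 1, 3);  -- 5: 3 is last in the list
SELECT FIELD(9, 4, 2, 5, 1, 3);  -- 0: not in the list, sorts first
```

If stray values should instead sort last, that needs explicit handling in the query.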

Another approach can be to use a CASE ... WHEN ... END expression, as mentioned in the MySQL Documentation. The above code can be rewritten as:

   mysql> SELECT *, CASE
       ->     WHEN weight = 1 THEN 4
       ->     WHEN weight = 2 THEN 2
       ->     WHEN weight = 3 THEN 5
       ->     WHEN weight = 4 THEN 1
       ->     WHEN weight = 5 THEN 3
       ->     END AS weight_custom
       -> FROM `table`
       -> ORDER BY weight_custom;
But I'm a little biased towards the first approach due to its simplicity. Though it's a matter of personal choice.

Such requirements are rare; just food for thought.

Hope this blog was helpful to you. Keep posting your comments.

InnoDB Deadlock - Next Key Locking

So you use InnoDB, have indexes on your table, think of row-level locking and concurrent queries, feel good and go to sleep. All this while forgetting that UPDATE and SELECT ... FOR UPDATE statements will (or may) also use the same index for scanning or updating. Then what, you may ask?

Well, InnoDB row level locking works in a somewhat different manner when using indexes. In this case, InnoDB locks index records in shared or exclusive mode when searching or scanning an index. Therefore, as mentioned in MySQL Documentation, the row level locks are actually index record locks.

To complicate matters (or resolve issues) further, the lock taken is a next-key lock: it locks the index record together with the gap before it. A gap lock on its own refers to a lock that only locks a gap before some index record.

As per the example in the MySQL Documentation, let's say we have a query like this.

   SELECT * FROM child WHERE id > 100 FOR UPDATE;
And there is an index on the id column. The query will lock all the index records where id is greater than 100, like 103, 107, etc. It also locks out inserts into the gaps, say for 101, 102, 104, 105, 106, 108 and above. If this were not done, other transactions could insert phantom rows into the range while the transaction is under way.
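The behavior is easiest to see from two sessions (assuming the child table has an indexed integer id column and no rows between 100 and 103):

```sql
-- Session 1
START TRANSACTION;
SELECT * FROM child WHERE id > 100 FOR UPDATE;  -- locks records and gaps above 100

-- Session 2
INSERT INTO child (id) VALUES (102);  -- blocks until session 1 commits or rolls back
INSERT INTO child (id) VALUES (90);   -- typically proceeds: below the locked range
```

The second session's first insert waits (and eventually hits a lock wait timeout) precisely because of the gap lock, even though no row with id 102 exists yet.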

Now let us consider a slightly tricky problem. Suppose we have a table table_name with an index on colx, and two queries running in parallel. Each query works on only one row, and as per the application logic, they always target different rows.

The queries we have are:
   mysql> SELECT * FROM table_name WHERE
       ->     colx IS NULL
       ->     LIMIT 1
       ->     FOR UPDATE;
and
   mysql> UPDATE table_name SET
       ->     colx = NULL, coly = NULL
       ->     WHERE colz = 'something';
The first query starts locking the index records. But the locking is not done in a single shot; it proceeds record by record. Meanwhile, the second query acquires a lock on the index record for the current value of colx and proceeds towards the NULL-value record. By the time the second query reaches the NULL-value record, that record has been locked by the first query, while the first query in turn waits for the lock already grabbed by the second one. Deadlock!

You need to be very (un)lucky to achieve this.

Workarounds
  1. Figure out all the indexes and queries where such a deadlock can happen. In most cases theoretical investigation leads only to unnecessary paranoia, so it's better to run a regression test on your system and monitor for deadlocks. If you can, part ways with those indexes.
  2. If you think you cannot live without your indexes, and the queries involved in deadlocks are mutually exclusive, you can go ahead with enabling innodb_locks_unsafe_for_binlog without citing this blog as a reference. It's all your choice.
  3. The tests I have conducted were done on MySQL 4.1.x. There is a claim that from MySQL 5.0.2 onward, UPDATE and DELETE lock only the rows that are actually affected. This should reduce the probability of deadlocks, though I haven't tested it.
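For the monitoring suggested above, the server itself tells you about the most recent deadlock. When a deadlock occurs, one victim transaction gets rolled back with an error, and the details can be pulled from the InnoDB monitor (on the 4.1 series the older spelling SHOW INNODB STATUS works as well):

```sql
-- The victim transaction sees:
-- ERROR 1213 (40001): Deadlock found when trying to get lock;
--                     try restarting transaction

-- From any session, dump the InnoDB monitor output:
SHOW ENGINE INNODB STATUS\G
-- and read the "LATEST DETECTED DEADLOCK" section, which lists
-- both transactions, their queries and the locks they were waiting for
```

Running your regression load while periodically capturing this output is usually enough to catch the index/query pairs that deadlock in practice.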
Hope this blog was helpful to you. Keep posting your comments.

Wednesday, August 8, 2007

InnoDB Repeating AUTO_INCREMENT Values

This is a continuation of my earlier post with the same title. Recently, while solving a problem where values of an auto_increment field were getting repeated, a well-documented and often-neglected feature of InnoDB came forward. Let's first have a glimpse of the system and the problem.

We are maintaining a job queue in an InnoDB table for a distributed resource environment. The job queue is populated by distributed schedulers, each job is assigned to a resource, and once a job is done it is removed from the queue. The system runs on the assumption that the job id, an auto_increment field, remains unique throughout the life of the system. I acknowledge that this usage is debatable, but let's keep that separate from this discussion.

The problem: The operations team restarted the MySQL server in between. After that restart, some job id's got repeated. This broke the basic assumption of unique job id's and affected the whole business at large.

What caused this: After some hours of debugging and hypothesizing, we figured out that the way InnoDB handles auto_increment values at server startup is completely different from how MyISAM does, and somewhat against the basic assumption about auto_increment values. InnoDB does not persist the auto_increment counter across server restarts. Instead, on the first insert into the table after startup, InnoDB executes the equivalent of this statement.

SELECT MAX(ai_col) FROM T FOR UPDATE;
As I said, this is a well-documented feature, but an often-neglected one. In our case, before the server shutdown a few of the values from the upper bracket had been deleted, as rows normally would be in a job queue, so those ids were handed out again after the restart.
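A small reproduction sketch (table and column names are illustrative):

```sql
CREATE TABLE jobs (
    id      INT AUTO_INCREMENT PRIMARY KEY,
    payload VARCHAR(64)
) ENGINE=InnoDB;

INSERT INTO jobs (payload) VALUES ('a'), ('b'), ('c');  -- ids 1, 2, 3
DELETE FROM jobs WHERE id = 3;                          -- top of the bracket gone

-- ... MySQL server restart ...

INSERT INTO jobs (payload) VALUES ('d');
-- InnoDB re-seeds its counter from SELECT MAX(id), which is now 2,
-- so 'd' gets id 3 again: a repeated job id
```

Without the restart, the in-memory counter would have stayed at 3 and 'd' would have received id 4; the restart is what discards that state.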

Workarounds:

If you have such a system, where you rely too much on the auto_increment values from InnoDB, delete values frequently and assume them to be unique throughout the system, beware!
  1. Give up the auto_increment column, if you can afford to, and maintain the incrementing logic in your own code. There might be other concerns, but then it's all your own choice.
  2. Have another table which has only one auto_increment column, reference this from your original table and make sure you don't delete values from the upper bracket. You can safely expire the older entries in order to keep this table small enough.
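The second workaround can be sketched like this (table and column names are hypothetical; the key invariant is that the row holding the maximum id is never deleted):

```sql
-- Dedicated id-generator table
CREATE TABLE job_seq (
    id INT AUTO_INCREMENT PRIMARY KEY
) ENGINE=InnoDB;

-- To allocate a new, never-repeating job id:
INSERT INTO job_seq VALUES (NULL);
SELECT LAST_INSERT_ID();   -- per-connection, safe under concurrency

-- Older rows below the maximum can be purged to keep the table small;
-- since InnoDB re-seeds from MAX(id), keeping the top row preserves uniqueness
DELETE FROM job_seq WHERE id < LAST_INSERT_ID() - 1000;
```

LAST_INSERT_ID() is scoped to the connection, so two schedulers inserting concurrently each get their own id without any extra locking.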
Hope this blog was helpful to you. Keep posting your comments.
