Tuesday, May 20, 2008

Variable's Day Out #13: binlog_format

Properties:

Applicable To MySQL Server
Introduced In 5.1.5
Server Startup Option --binlog-format=<value>
Scope Both
Dynamic Yes
Possible Values enum(ROW, STATEMENT, MIXED)
Default < 5.1.12: STATEMENT
>= 5.1.12: MIXED
Categories Replication, Performance

Description:

Starting with 5.1.5, MySQL has implemented ROW based replication format which logs the physical changes to individual row changes. This looks like the most optimal way to many users. But it is not always, rather not optimal most of the times. E.g. consider a statement that does bulk insert of thousands of rows. In ROW based logging, there will be those many entries in binlog and otherwise it would have been just one single statement.

STATEMENT based replication, the old good historical way, propagates SQL statements from master to slave and has been working good all those years except for few cases like statements using non deterministic UDFs.

In MIXED format, MySQL uses STATEMENT based replication by default other than a few cases like

  • when USER(), CURRENT_USER() are used
  • when a call to UDF is involved
  • when 2 or more tables with AUTO_INCREMENT columns are updated.

... For full list of all such cases, refer the official documentation.

What about UDF's?

A user defined function or stored procedure is very hard to predict. In such cases, statement based replication can create inconsistencies. Few days back, I saw a case where a table  included in Replicate_Ignore_Table list was propagating statements from a procedure. If one has procedures and such cases, consider using ROW or MIXED mode replication.

What to use?

Though ROW based replication is the safest in case of creating inconsistencies, it may lead to sheer performance degradation thanks to binlog size. On the other hand STATEMENT based replication has drawbacks like with UDFs etc. IMHO, MIXED mode is the best way out unless you have some very special case. The only problem is that there might be more cases that need to be handled by mixed mode than currently being served. We need time's stamp on it. :)

Read more:

 

Hope you enjoyed reading this.

2 comments:

Artem Russakovskii said...

If you're not sure how a certain statement is going to be logged, run a sample query and watch the log. If you see the query, that means it went through in STATEMENT mode; if you see a load of binary data, that means it went through in ROW mode.

A general rule of thumb is if a query matches a lot of rows, then it should use STATEMENT, otherwise if the query itself may take minutes/hours to run but only modifies a few rows, ROW format is a lot better.

One can change the format without restarting the server with a different option, like so:

mysql> SET SESSION binlog_format = 'STATEMENT';
mysql> SET SESSION binlog_format = 'ROW';
mysql> SET SESSION binlog_format = 'MIXED';

Parvesh Garg said...

Artem,

Thanks for the pointers. Yes, of course if everything in query/transaction is deterministic, it makes sense to follow it that way and would add a lot to performance of replication.

But there can be a few exceptions to the general rule of thumb as captured in the post, e.g. a statement using functions like USER() or other non-deterministic functions like RAND().

--
Parvesh