Properties:
Applicable To | MySQL Server |
Introduced In | 5.1.5 |
Server Startup Option | --binlog-format=<value> |
Scope | Both |
Dynamic | Yes |
Possible Values | enum(ROW, STATEMENT, MIXED) |
Default | < 5.1.12: STATEMENT >= 5.1.12: MIXED |
Categories | Replication, Performance |
Description:
Starting with 5.1.5, MySQL has implemented ROW based replication format which logs the physical changes to individual row changes. This looks like the most optimal way to many users. But it is not always, rather not optimal most of the times. E.g. consider a statement that does bulk insert of thousands of rows. In ROW based logging, there will be those many entries in binlog and otherwise it would have been just one single statement.
STATEMENT based replication, the old good historical way, propagates SQL statements from master to slave and has been working good all those years except for few cases like statements using non deterministic UDFs.
In MIXED format, MySQL uses STATEMENT based replication by default other than a few cases like
- when USER(), CURRENT_USER() are used
- when a call to UDF is involved
- when 2 or more tables with AUTO_INCREMENT columns are updated.
... For full list of all such cases, refer the official documentation.
What about UDF's?
A user defined function or stored procedure is very hard to predict. In such cases, statement based replication can create inconsistencies. Few days back, I saw a case where a table included in Replicate_Ignore_Table
list was propagating statements from a procedure. If one has procedures and such cases, consider using ROW or MIXED mode replication.
What to use?
Though ROW based replication is the safest in case of creating inconsistencies, it may lead to sheer performance degradation thanks to binlog size. On the other hand STATEMENT based replication has drawbacks like with UDFs etc. IMHO, MIXED mode is the best way out unless you have some very special case. The only problem is that there might be more cases that need to be handled by mixed mode than currently being served. We need time's stamp on it. :)
Read more:
- MySQL manual entry on binlog_format
- MySQL manual on Replication Formats
- MySQL manual: Mixed Binary Logging Format
- Statement based vs Row based replication
Hope you enjoyed reading this.
2 comments:
If you're not sure how a certain statement is going to be logged, run a sample query and watch the log. If you see the query, that means it went through in STATEMENT mode; if you see a load of binary data, that means it went through in ROW mode.
A general rule of thumb is if a query matches a lot of rows, then it should use STATEMENT, otherwise if the query itself may take minutes/hours to run but only modifies a few rows, ROW format is a lot better.
One can change the format without restarting the server with a different option, like so:
mysql> SET SESSION binlog_format = 'STATEMENT';
mysql> SET SESSION binlog_format = 'ROW';
mysql> SET SESSION binlog_format = 'MIXED';
Artem,
Thanks for the pointers. Yes, of course if everything in query/transaction is deterministic, it makes sense to follow it that way and would add a lot to performance of replication.
But there can be a few exceptions to the general rule of thumb as captured in the post, e.g. a statement using functions like USER() or other non-deterministic functions like RAND().
--
Parvesh
Post a Comment