Optim MySQL: May 2008

Friday, May 30, 2008

Variable's Day Out #14: log_queries_not_using_indexes

Properties:

Applicable To	MySQL Server
Server Startup Option	`--log-queries-not-using-indexes`
Scope	Global
Dynamic	Yes
Possible Values	Boolean
Default	False
Category	Performance, Monitoring, Best Practices

Description:

If you have slow query logs enabled (with --log-slow-queries), this variable will help interpret all those queries that are not using indexes as slow queries.

Usage:

While designing a MySQL oriented application, I generally design my schema first without introducing any index (only exception being the PRIMARY ones). Get the application ready. Next enable log_queries_not_using_indexes and start index usage testing. Analyze all queries, introduce indexes one by one and you are all set to go. This helps a lot in rewriting queries, figuring out the best possible index etc. Many times I have seen the need for swapping WHERE clause CONDITIONS for making it possible for the query to use an existing index.

For already deployed systems' performance glitches and production outages, this flag do serve as a starting point.

Though as always, usage of a variable is subject to circumstances. For my conditions and usage pattern, this variable comes very handy.

Read more:

Hope you enjoyed reading this.

Tuesday, May 20, 2008

Variable's Day Out #13: binlog_format

Properties:

Applicable To	MySQL Server
Introduced In	5.1.5
Server Startup Option	--binlog-format=<value>
Scope	Both
Dynamic	Yes
Possible Values	enum(ROW, STATEMENT, MIXED)
Default	< 5.1.12: STATEMENT >= 5.1.12: MIXED
Categories	Replication, Performance

Description:

Starting with 5.1.5, MySQL has implemented ROW based replication format which logs the physical changes to individual row changes. This looks like the most optimal way to many users. But it is not always, rather not optimal most of the times. E.g. consider a statement that does bulk insert of thousands of rows. In ROW based logging, there will be those many entries in binlog and otherwise it would have been just one single statement.

STATEMENT based replication, the old good historical way, propagates SQL statements from master to slave and has been working good all those years except for few cases like statements using non deterministic UDFs.

In MIXED format, MySQL uses STATEMENT based replication by default other than a few cases like

when USER(), CURRENT_USER() are used
when a call to UDF is involved
when 2 or more tables with AUTO_INCREMENT columns are updated.

... For full list of all such cases, refer the official documentation.

What about UDF's?

A user defined function or stored procedure is very hard to predict. In such cases, statement based replication can create inconsistencies. Few days back, I saw a case where a table included in Replicate_Ignore_Table list was propagating statements from a procedure. If one has procedures and such cases, consider using ROW or MIXED mode replication.

What to use?

Though ROW based replication is the safest in case of creating inconsistencies, it may lead to sheer performance degradation thanks to binlog size. On the other hand STATEMENT based replication has drawbacks like with UDFs etc. IMHO, MIXED mode is the best way out unless you have some very special case. The only problem is that there might be more cases that need to be handled by mixed mode than currently being served. We need time's stamp on it. :)

Read more:

Hope you enjoyed reading this.

Monday, May 19, 2008

Variable's Day Out #12: innodb_flush_method

Properties:

Applicable To	InnoDB on Unix like OS
Server Startup Option	--innodb_flush_method=<value>
Scope	Global
Dynamic	No
Possible Values	enum(O_DSYNC, O_DIRECT, <<none>> )
Default Value	<<none>>
Category	Performance

Description:

This variable changes the way InnoDB open files and flush data to disk and is should be considered as very important for InnoDB performance. By default, InnoDB uses fsync() (without O_DSYNC) to flush both log and data files.

Setting this variable to O_DIRECT will result in InnoDB using O_DIRECT while opening files and fsync() to flush both data and log files. O_DIRECT is useful when an application maintains it's own caching mechanism which is very well true for MySQL/InnoDB. O_DIRECT is the option that should be used in most of the cases as it takes the overhead of double buffering and reduces swap pressure. You should be careful if you are not using a battery backed up RAID cache as it may lead to data loss in case of a crash. Also, the MySQL documentation warns against using O_DIRECT when files are located on a SAN, I don't have any idea on that.

O_DSYNC makes InnoDB to use O_SYNC to open and flush the log files, and uses fsync() to flush data files.

For Windows, the flush method is always async_unbuffered.

Recommendation:

If you are not doing anything unusual like SAN storage etc (which otherwise also you should reconsider before doing), always use O_DIRECT for this. This leads to a significant improvement in InnoDB performance by removing double buffering.

Read More:

MySQL manual entry on innodb_flush_method
Heikki Tuuri recommends O_DIRECT: Check Q35 @ mysqlperformanceblog.com
man 2 fsync
man 2 open

Hope you enjoyed reading this.

Sunday, May 18, 2008

Variable's Day Out #11: large_pages

Properties:

Applicable To	MySQL/Linux
Server Startup Option	--large-pages
Scope	Global
Dynamic	No way
Possible Values	True\|False flag
Default Value	False
Category	Performance, Administration

Description:

This option, currently available only for Linux, if set enables the usage of large pages by MySQL. Many operating systems/system architectures support optional memory pages bigger than the default size (4 KB). Usually the large page size is 2 MB, this can be checked in the large_page_size variable in MySQL.

For applications that do a lot of memory accesses, as MySQL can, reduced TLB (Translation Lookaside Buffer) misses lead to better performance. Also having large pages, makes MySQL buffers less susceptible of being swapped out.

When to Use?

Try to keep it enabled, I have never heard of any side effects (In other words if you find any do let me know, I will update the blog post :) ). Other things that can be considered are:

How much of swapping is happening on the system?
What are the other applications running on that system? Though it is a bad idea to run other heavy applications on the same box as MySQL (at least in production), but if you can't get rid of them, using large-pages will make MySQL avoid swapping.

How to configure?

Before large pages can be used on Linux, it is necessary to configure the HugeTLB memory pool.

Check your total memory (MemTotal), huge page size (Hugepagesize) etc and decide how many huge pages you want to run with. This information is available in /proc/meminfo file.
Set the number of huge pages to be used by the system. use sysctl -w vm.nr_hugepages = <value>
Ensure to set memlock for mysql user in /etc/security/limits.conf to allow this user using large memory.
Stop MySQL.
Restart your system.
Start MySQL server with large-pages option.

If nothing goes wrong, you should be all set and going.

Read more:

Hope you enjoyed reading this.

Thursday, May 15, 2008

Memcached, but do you need it?

With all due respect to the technology and it's advocates (myself included), after a surge in articles describing the merits of using memcached I'm just pushing a thought breakpoint for developers to think whether they actually need it or not?

Recently, I ran into cases where the developers have decided to use memcached over MySQL style architecture after reading some/many good/nice articles about it without giving a damn to their requirements. I would like to list few things as a checklist for developers to decide on their architecture. There is still no precise answer but sometimes few cases can be just ruled out :).

What is the total size of your data? It might be a possibility that you can keep the data in memory in each node, or MySQL can just keep the whole thing (data+indexes) in a buffer.
How frequently your data is updated? Very frequent updates may lead to low cache hit ratio for memcached data. And refreshing memcached too many times may lead to unnecessary overhead. Remember doing [get,get,set] vs [get].
What is the peak load on your system? Consider if MySQL itself can handle the peak load or otherwise if even memcached cannot handle the peak load with given infrastructure.

I generally ask people a simple question, Why they think they should be using memcached (or something else even)? To shock them, I even ask them "Why they think they should be using MySQL?". And believe me, this is what I believe developers should be asking themselves.

There is only one good argument against this, what if tomorrow you need to suddenly scale or what if your projections need memcached? In such cases, I suggest people to design their data layers in a flexible way, flexible enough to allow things in and out.

Wednesday, May 14, 2008

Long Live Open Source

“Don’t worry about people stealing an idea. If it’s original, you will have to ram it down their throats.”
— Howard Aiken

MySQL is back on Open Source track and that is definitely the best news for all (including community, MySQL and Sun as well). I think that now Sun/MySQL have agreed to the importance of community, it becomes community's responsibility to give them more reasons to believe so. Let's participate like never before.

Kaj, in his post says "...model to be useful for both those who spend money to save time, and those who spend time to save money". This is what Open Source is, isn't it?

All in all, a decision most awaited and most welcome is taken. Thanks a ton to everybody who made this happen.

Tuesday, May 13, 2008

Last 10 Variable Day Outs

Well, the first summary of last 10 variable day outs.

Day Outs:

Day Out #1: delay_key_write
Day Out #2: key_buffer_size
Day Out #3: max_connect_errors
Day Out #4: innodb_buffer_pool_size
Day Out #5: innodb_thread_concurrency
Day Out #6: Innodb_buffer_pool_reads
Day Out #7: innodb_autoinc_lock_mode
Day Out #8: innodb_additional_mem_pool_size
Day Out #9: long_query_time
Day Out #10: innodb_file_per_table

(If you want some specific variable to be discussed here, do let me know. Contact me by posting a comment on the blog or send me a mail.)

Monday, May 12, 2008

Variable's Day Out #10: innodb_file_per_table

Properties:

Applicable To	InnoDB
Server Startup Option	--innodb-file-per-table
Scope	Global
Dynamic	General InnoDB Engine: No InnoDB plug-in: Yes
Possible Values	Enable\|Disable Flag
Default Value	Disabled
Category	Maintenance

Description:

This variable if enabled, makes InnoDB to create a separate .ibd file for storing indexes and data. Setting this variable makes it easy to manage disks with huge tables. Having this option enabled, makes it a lot easier to move bigger (or smaller) .ibd files to separate physical disks and/or have backups of certain tables without affecting others.

As I have mentioned in my earlier post, that InnoDB does not return back the space once claimed for a tablespace, this variable comes handy in managing such situations. Though I have not seen any performance gain while using this option, people do cite such metrics. Again, I would say as always that it may be case dependent.

It should be noted that this option only effects tables that are created after this option is enabled (and server is restarted). Tables that are created with this option disabled are created in the shared tablespace and remain there even if this option is enabled in between. Also, even if all the tables are created with this option enabled, InnoDB still needs the shared tablespace to store its internal data dictionary and undo logs.

Speical case: InnoDB plug-in

With the introduction of InnoDB plug-in, this variable becomes dynamic in nature and holds much more importance than just disk maintenance. You should have this variable enabled to be able to use the new file format. But even in InnoDB plug-in the restriction on tablespaces not returning disk space is intact.

Read More:

Friday, May 30, 2008

Variable's Day Out #14: log_queries_not_using_indexes

Tuesday, May 20, 2008

Variable's Day Out #13: binlog_format

Monday, May 19, 2008

Variable's Day Out #12: innodb_flush_method

Sunday, May 18, 2008

Variable's Day Out #11: large_pages

Thursday, May 15, 2008

Memcached, but do you need it?

Wednesday, May 14, 2008

Long Live Open Source

Tuesday, May 13, 2008

Last 10 Variable Day Outs

Monday, May 12, 2008

Variable's Day Out #10: innodb_file_per_table

$ grep -Ri posts/

$ whoami

$ sort tags | uniq -c

$ history

$ tail -f planetmysql.org

$ ls -l /home

Powered By