Optimize do not_print

Mysql optimization

Page 1: Optimize do not_print

explain

• When you precede a SELECT statement with the keyword EXPLAIN, MySQL explains how it would process the SELECT, showing how the tables are joined and in which order. EXPLAIN is the main tool for query optimization analysis, for e.g. EXPLAIN SELECT * FROM students;

• The type column of the EXPLAIN output shows ALL when MySQL uses a full table scan to resolve a query. The possible types are, from best to worst: system, const, eq_ref, ref, range, index and ALL.

• Using index in the Extra column indicates that the information will be retrieved from the index alone, without reading the data file.

• If the optimizer picks a poor index, you can force one: SELECT * FROM t1, t2 FORCE INDEX (index_for_column) WHERE t1.col_name=t2.col_name;
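
A minimal sketch of how the type column reacts to an index. The students table comes from the slide; the stud_name column being filtered on, the sample value and the index name idx_stud_name are assumptions made only for this illustration.

```sql
-- Assumption: students has a stud_name column with no index on it yet.
EXPLAIN SELECT * FROM students WHERE stud_name = 'Asha';
-- Without a usable index, expect type: ALL (full table scan).

-- Add an index (the name idx_stud_name is made up for this example).
ALTER TABLE students ADD INDEX idx_stud_name (stud_name);

EXPLAIN SELECT * FROM students WHERE stud_name = 'Asha';
-- With the index in place, type typically becomes ref and key shows idx_stud_name.

-- Selecting only the indexed column lets MySQL answer from the index alone,
-- which EXPLAIN reports as "Using index" in the Extra column.
EXPLAIN SELECT stud_name FROM students WHERE stud_name = 'Asha';
```

The exact plan depends on table size and statistics, so treat the comments as what to look for rather than guaranteed output.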

Page 2: Optimize do not_print

PROCEDURE ANALYSE()

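• SELECT * FROM mytable PROCEDURE ANALYSE();

• EXPLAIN tells you about indexes and keys, whereas PROCEDURE ANALYSE() examines the data returned and suggests an optimal type for each column. (PROCEDURE ANALYSE() is deprecated as of MySQL 5.7.18 and removed in MySQL 8.0.)

• Sample output for one column:

Min_value: 34
Max_value: 232
Empties_or_zeros: 0
Nulls: 0
Avg_value: 133
Optimal_fieldtype: ENUM('34','232') NOT NULL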

Page 3: Optimize do not_print

Remove duplicate entries

• Assume the following table and data.

• CREATE TABLE IF NOT EXISTS dupTest (pkey int(11) NOT NULL auto_increment, a int, b int, c int, timeEnter timestamp(14), PRIMARY KEY (pkey));

• INSERT INTO dupTest (a,b,c) VALUES (1,2,3),(1,2,3),(1,5,4),(1,6,4);

• Note that the first two rows are duplicates in columns a and b. The table contains other repeated values too (for example c = 4 in the last two rows), but the statement below leaves those rows alone because it only enforces uniqueness on (a, b).

• ALTER IGNORE TABLE dupTest ADD UNIQUE INDEX (a,b);
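
The IGNORE clause of ALTER TABLE was removed in MySQL 5.7.4, so on newer servers the duplicates have to be deleted before the unique index is added. One possible sketch, which assumes you want to keep the row with the lowest pkey in each (a, b) group:

```sql
-- Delete every row that has a twin with the same (a, b) and a smaller pkey.
DELETE d1
FROM dupTest AS d1
JOIN dupTest AS d2
  ON d1.a = d2.a
 AND d1.b = d2.b
 AND d1.pkey > d2.pkey;

-- With the duplicates gone, the unique index can be added without IGNORE.
ALTER TABLE dupTest ADD UNIQUE INDEX (a, b);
```

With the four sample rows above, only the second (1,2,3) row is deleted. Rows where a or b is NULL are never matched by the join, which is consistent with a unique index accepting repeated NULLs.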

Page 4: Optimize do not_print

Load data is faster than insert statements

• LOAD DATA INFILE '/tmp/history.dmp' REPLACE INTO TABLE tpcb.history FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n';

• load data local infile 'sample.csv' into table raw_data (id, createid, @a, country, telco_name) set ip = left(@a,12);

• Use INTO OUTFILE to create a text file: SELECT enroll_no, stud_name, stud_lname, stud_address INTO OUTFILE '/home/CAT_ADV_OCT_07.tsv' FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n'

Page 5: Optimize do not_print

Performance Tips Part I

• Optimize WHERE clauses by following the rule "column operator constant": keep the indexed column alone on one side of the comparison.

• Slow query: WHERE `birthdate` + INTERVAL 16 YEAR < NOW()

• Fast query: WHERE `birthdate` < NOW() - INTERVAL 16 YEAR

• Each of the following 3 conditions is better than the one before it:

WHERE TO_DAYS(date_col) - TO_DAYS(CURRENT_DATE) < 30
WHERE TO_DAYS(date_col) < 30 + TO_DAYS(CURRENT_DATE)
WHERE date_col < DATE_ADD(CURRENT_DATE, INTERVAL 30 DAY)

• Running OPTIMIZE TABLE periodically helps keep the performance of a MyISAM table from degrading.
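
The "column operator constant" rule can be checked with EXPLAIN. This is a sketch only: the members table and the index name idx_birthdate are hypothetical, and the plan you see depends on your data.

```sql
-- Hypothetical table used only for this illustration.
CREATE TABLE members (
  id        INT UNSIGNED NOT NULL AUTO_INCREMENT,
  birthdate DATE NOT NULL,
  PRIMARY KEY (id),
  KEY idx_birthdate (birthdate)
);

-- The column is wrapped in an expression, so idx_birthdate cannot be used
-- for a range lookup; every row has to be evaluated.
EXPLAIN SELECT id FROM members
WHERE birthdate + INTERVAL 16 YEAR < NOW();

-- The column stands alone and the right-hand side is a constant computed once,
-- so the optimizer can do a range scan on idx_birthdate.
EXPLAIN SELECT id FROM members
WHERE birthdate < NOW() - INTERVAL 16 YEAR;
```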

Page 6: Optimize do not_print

Performance Tips Part III

• An index will not be used if the pattern starts with a % sign, as when the % sign is on both sides of the string: WHERE col_name LIKE '%Mac%'

• An index can be used if the % sign appears only at the end of the string: WHERE col_name LIKE 'Mac%'

• You may try using STRAIGHT_JOIN to force a join to be done using tables in a particular order.

• The SQL_CALC_FOUND_ROWS keyword tells MySQL to calculate the total number of rows matching the query, ignoring LIMIT. This total can then be retrieved by calling the FOUND_ROWS() function.

• To retrieve all records from a given offset to the end of the result set, pass a very large number as the row count. Old MySQL versions accepted -1 for this, but current versions reject it. For e.g. SELECT * FROM tbl1 LIMIT 18, 18446744073709551615;
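
A short sketch of the keywords mentioned on this slide, using the tbl1, t1 and t2 names from the deck. The page size of 10 is an arbitrary choice. SQL_CALC_FOUND_ROWS is deprecated as of MySQL 8.0.17, so the plain COUNT(*) form is shown next to it.

```sql
-- First page of tbl1, plus the total row count as if there were no LIMIT.
SELECT SQL_CALC_FOUND_ROWS * FROM tbl1 LIMIT 0, 10;
SELECT FOUND_ROWS();

-- The same information without SQL_CALC_FOUND_ROWS.
SELECT * FROM tbl1 LIMIT 0, 10;
SELECT COUNT(*) FROM tbl1;

-- STRAIGHT_JOIN makes the optimizer read the tables in the order listed
-- (t1 first, then t2) instead of choosing the order itself.
SELECT STRAIGHT_JOIN t1.*, t2.*
FROM t1 JOIN t2 ON t1.col_name = t2.col_name;
```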

Page 7: Optimize do not_print

Choose the right Data Type

• Suppose day is in the range 1 to 31. You could save 3 bytes per row by changing day from INT (4 bytes) to TINYINT (1 byte). Similarly, you could save 1 byte per row by changing yearmonth from INT to MEDIUMINT (3 bytes). 4 bytes per row * 38 million rows = about 150 MB saved. Smaller rows make disk reads faster and require less memory to process and cache. Smaller columns also make for smaller indexes.

• Consider making all character columns CHAR rather than VARCHAR. The tradeoff is that the table uses more space, but if you can afford it, fixed-length rows can be processed more quickly than variable-length rows (this applies mainly to MyISAM tables).

• If you are only storing positive numbers, declare the column UNSIGNED. This essentially doubles the largest positive value you can store without changing the column type.

• Declare columns NOT NULL where possible, so that queries need not check for NULL as a special case.

• Consider using ENUM columns, since ENUM values are represented as numbers internally.
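
The day and yearmonth example above could look like this in practice. The table name daily_stats and the yearmonth format (values like 201308) are assumptions; the slide does not give either.

```sql
-- Hypothetical table with the two columns discussed above, both 4-byte INTs.
CREATE TABLE daily_stats (
  yearmonth INT NOT NULL,   -- assumed to hold values like 201308
  `day`     INT NOT NULL    -- 1 to 31
);

-- Shrink the columns: TINYINT UNSIGNED holds 0..255 in 1 byte,
-- MEDIUMINT UNSIGNED holds 0..16777215 in 3 bytes.
ALTER TABLE daily_stats
  MODIFY `day` TINYINT UNSIGNED NOT NULL,
  MODIFY yearmonth MEDIUMINT UNSIGNED NOT NULL;

-- Bytes saved per row: (4 - 1) + (4 - 3) = 4.
-- For 38 million rows, expressed in MB (about 152):
SELECT 4 * 38000000 / 1000000 AS mb_saved;
```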

Page 8: Optimize do not_print

Constraints: Primary, Unique and Index keys

• A table can have only 1 primary key, but multiple unique constraints.

• Columns in primary keys must be NOT NULL.

• Columns in unique keys can be NULL (if they are NOT NULL, then the unique key is functionally the same as a primary key).

• To enforce uniqueness, simply create a UNIQUE index on the fields which you wish to be unique.

• Either add this line to your table creation statement: UNIQUE (firstname, lastname)

• Or alter the table once created: ALTER TABLE tablename ADD UNIQUE (Column1, Column2)
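
A small sketch of how a composite UNIQUE constraint behaves, including the NULL case mentioned above. The people table and its sample rows are made up for the example; only the UNIQUE (firstname, lastname) line comes from the slide.

```sql
-- Hypothetical table: one primary key plus a composite unique constraint.
CREATE TABLE people (
  id        INT UNSIGNED NOT NULL AUTO_INCREMENT,
  firstname VARCHAR(50) NOT NULL,
  lastname  VARCHAR(50) NULL,
  PRIMARY KEY (id),
  UNIQUE (firstname, lastname)
);

INSERT INTO people (firstname, lastname) VALUES ('Ada', 'Lovelace');

-- Fails with a duplicate-key error (1062): same (firstname, lastname) pair.
INSERT INTO people (firstname, lastname) VALUES ('Ada', 'Lovelace');

-- Both succeed: NULL is never equal to NULL, so a unique key accepts
-- several rows whose nullable column is NULL.
INSERT INTO people (firstname, lastname) VALUES ('Ada', NULL);
INSERT INTO people (firstname, lastname) VALUES ('Ada', NULL);
```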

Page 9: Optimize do not_print

Keys explained

Maximum key length is 500 bytes

Type | Null | Duplicates | Purpose
Primary | No | No | Primary key values never change; it is selected to be the key of first importance. There can be only one primary key on a table.
Unique | Yes | No | To avoid duplicate records in a single column or in a combination of columns
Key | Yes | Yes | To index a single column or multiple columns

• There can be only one AUTO_INCREMENT column, and it must be defined as a key.

• A single column can be part of multiple keys.

• Use a fulltext index to avoid the 500-byte limitation or to search for words that are shorter than 3 characters.

• A UNIQUE index that does not allow NULL is functionally equivalent to a PRIMARY KEY.

• A key made up of more than one column is a composite key.

• The keyword INDEX may be used instead of KEY.

• You can name an index by including the name just before the column list.

• For a PRIMARY KEY, you don't specify a name because its name is always PRIMARY.
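
The naming rules above, sketched on a hypothetical orders_demo table. All table, column and index names here are invented for illustration.

```sql
CREATE TABLE orders_demo (
  order_id    INT UNSIGNED NOT NULL AUTO_INCREMENT,  -- the one AUTO_INCREMENT column
  customer_id INT UNSIGNED NOT NULL,
  order_ref   CHAR(12) NOT NULL,
  created_on  DATE NOT NULL,
  PRIMARY KEY (order_id),                             -- its name is always PRIMARY
  UNIQUE KEY uq_order_ref (order_ref),                -- name goes before the column list
  INDEX idx_customer_date (customer_id, created_on),  -- composite key, INDEX keyword
  KEY idx_created_on (created_on)                     -- created_on is part of two keys
);

-- Lists every index with its name, columns and uniqueness.
SHOW INDEX FROM orders_demo;
```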

Page 10: Optimize do not_print

Indexes Part I

• An index is a separate data object in the database that lists the table rows in order, to allow rapid lookup.

• Each index for each table is a separate object.

• Primary keys, unique keys and foreign keys are automatically indexed.

• Disadvantage of indexes: each index may have to be updated when a row is updated, so indexes slow down updates, insertions and deletes.

• Disadvantage of indexes: the index file takes up disk space.

• Practical maximum of 3 or 4 indexes per table. If others are needed on occasion, add and drop them as needed.

• If a database is mostly read, use many indexes to speed performance.

• If a database is mostly updated, use as few indexes as possible.

• An index on a numeric column should be faster than one on a char or varchar column of the same size.

• When you compare indexed columns, use columns that are of the same type.

• Make sure your column will accommodate your needs, both current and future.

• Basic rule: everything after ON or in a WHERE clause should either be a primary key or indexed, at least when there are many records in the table.

Page 11: Optimize do not_print

Indexes Part II

• MySQL generally uses only one index per table in a query (index merge is the exception), so having more indexes doesn't always help.

• Creating a key may make one query execute very fast, but if that is the only reason for the key, you are trading quite a lot of space for the speed of one query. How often are you going to run this query? If you have 324 million rows, the index is going to consume somewhere in the order of 2 GB or more of disk space. Is it worth using all that space to make one query faster?

• MySQL uses multiple-column indexes in such a way that queries are fast when you specify a known quantity for the first column of the index in a WHERE clause, even if you don't specify values for the other columns.

• A fulltext search needs FULLTEXT (title,body) defined in the CREATE TABLE statement: SELECT * FROM articles WHERE MATCH (title,body) AGAINST ('india');
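
For the fulltext query above, a table definition along these lines is needed. The column types and the sample rows are assumptions; only the articles name and the (title, body) pair come from the slide. MyISAM is used here because it supports FULLTEXT in all versions.

```sql
CREATE TABLE articles (
  id    INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  title VARCHAR(200),
  body  TEXT,
  FULLTEXT (title, body)
) ENGINE=MyISAM;

-- Made-up sample rows: only one of the four mentions the search word.
INSERT INTO articles (title, body) VALUES
  ('Travel notes', 'A week in india by train'),
  ('Cooking',      'How to bake bread at home'),
  ('Databases',    'Tuning queries with EXPLAIN'),
  ('Gardening',    'Growing tomatoes in pots');

-- The column list in MATCH() must be the same as in the FULLTEXT definition.
SELECT * FROM articles WHERE MATCH (title, body) AGAINST ('india');
```

The sample deliberately keeps the search word in fewer than half of the rows, because a MyISAM natural-language search ignores words that appear in 50% or more of the rows.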

Page 12: Optimize do not_print

Possible indexes on 3 column table

ID | Columns
1 | a
2 | b
3 | c
4 | a,b
5 | a,c
6 | b,a
7 | b,c
8 | c,a
9 | c,b
10 | a,b,c
11 | a,c,b
12 | b,a,c
13 | b,c,a
14 | c,a,b
15 | c,b,a

An index on a,b,c in that order covers the single-column index on 'a' as well as the index on 'a,b'.
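
The leftmost-prefix behaviour described above can be sketched as follows. The table t_abc, the extra column d and the index name idx_abc are hypothetical.

```sql
CREATE TABLE t_abc (
  a INT NOT NULL,
  b INT NOT NULL,
  c INT NOT NULL,
  d INT NOT NULL,
  KEY idx_abc (a, b, c)
);

-- These can use idx_abc: each constrains a leftmost prefix of (a, b, c).
EXPLAIN SELECT d FROM t_abc WHERE a = 1;
EXPLAIN SELECT d FROM t_abc WHERE a = 1 AND b = 2;
EXPLAIN SELECT d FROM t_abc WHERE a = 1 AND b = 2 AND c = 3;

-- This cannot use idx_abc for a lookup: column a is not constrained.
EXPLAIN SELECT d FROM t_abc WHERE b = 2 AND c = 3;
```

So separate indexes on (a) and (a,b) would be redundant next to idx_abc, while a query filtering only on b or c would need its own index.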

Page 13: Optimize do not_print

Indexes Part IV

• Avoid single column indexes whenever practical. Most useful indexes contain from 2 to 5 fields.

• Don't forget that PRIMARY keys and UNIQUE constraints are also indexes.

• Design your indexes after your most common or frequently used query patterns. Analyze your WHERE clauses first, then look at speeding up certain queries by considering the columns in your ORDER BY clauses.

• Learn how to use EXPLAIN. It will give you excellent advice on how to help your queries.

• If the query doesn't use the index, you can use FORCE INDEX to ensure it does. If that can't be used, try USE INDEX.

• A function call or an arithmetic expression on a column prevents MySQL from using an index on it. In short, indexes are not used if you apply functions like lower(col_name) while comparing the text. You will need to reorganize the query, if possible, to take advantage of indexes.
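
One way to reorganize a lower(col_name) comparison, sketched on a hypothetical products table. It assumes the column uses a case-insensitive collation (a name ending in _ci), which is the default for the common character sets.

```sql
-- Hypothetical table; name uses the default case-insensitive collation.
CREATE TABLE products (
  id   INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  name VARCHAR(100) NOT NULL,
  KEY idx_name (name)
);

-- The function call on the column prevents a lookup on idx_name.
EXPLAIN SELECT id FROM products WHERE LOWER(name) = 'macbook';

-- With a case-insensitive collation, a plain comparison already matches
-- 'MacBook', 'MACBOOK' and so on, and it can use idx_name.
EXPLAIN SELECT id FROM products WHERE name = 'macbook';
```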

Page 14: Optimize do not_print

Indexes Part V

• A column that has only 'yes' or 'no' for content won't be improved by indexing. On the other hand, a column where the values are unique (for example, Social Security Number) can benefit greatly from indexing.

• The smallest or largest value for an indexed column can be found quickly without examining every row when you use the MIN() and MAX() functions.

• MySQL can often use indexes to perform sorting operations quickly for an ORDER BY clause.

• Sometimes MySQL can avoid reading the data file entirely. Suppose you're selecting values from an indexed numeric column and you're not selecting other columns from the table. In this case, by reading an index value, you've already got the value you'd get by reading the data file. There's no reason to read values twice, so the data file need not even be consulted. (This is called a covering index.)

Page 15: Optimize do not_print

Looks like a good query using index !

mysql> explain select max(DCDATE) from StagewisePassenger where DEPOT_CD ='TSGN'\G

*************************** 1. row ***************************

id: 1

select_type: SIMPLE

table: StagewisePassenger

type: index

possible_keys: NULL

key: bus_type

key_len: 55

ref: NULL

rows: 27143580

Extra: Using where; Using index

• Notice Using where in Extra!

• The query scans 27 million rows, which is almost the entire table!

• The key length used is 55 bytes.

Page 16: Optimize do not_print

Check out the table structure

mysql> explain select max(DCDATE) from StagewisePassenger where DEPOT_CD ='TSGN'\G

*************************** 1. row ***************************

id: 1

select_type: SIMPLE

table: StagewisePassenger

type: index

possible_keys: NULL

key: bus_type

key_len: 55

ref: NULL

rows: 27143580

Extra: Using where; Using index

mysql> show create table StagewisePassenger

*************************** 1. row ***************************

Table: StagewisePassenger

Create Table: CREATE TABLE `StagewisePassenger` (

`id` int(11) NOT NULL auto_increment,

`bus_type` varchar(5) NOT NULL,

`ttl_stages` varchar(20) NOT NULL,

`passenger` int(11) NOT NULL,

`full` int(11) NOT NULL,

`half` int(11) NOT NULL,

`DEPOT_CD` varchar(20) NOT NULL,

`status` varchar(1) NOT NULL,

`DCDATE` date default NULL,

PRIMARY KEY (`id`),

UNIQUE KEY `bus_type` (`bus_type`,`ttl_stages`,`DEPOT_CD`,`DCDATE`)

) ENGINE=InnoDB AUTO_INCREMENT=27161970 DEFAULT CHARSET=latin1

• A composite index on DEPOT_CD + DCDATE is required.

• In a composite index, the columns used in the WHERE clause come first, before the aggregated column such as DCDATE.
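
The index asked for above could be added like this. The index name idx_depot_dcdate is an assumption; any name works.

```sql
-- WHERE column first, aggregated column second.
ALTER TABLE StagewisePassenger
  ADD INDEX idx_depot_dcdate (DEPOT_CD, DCDATE);

-- Re-check the plan after adding the index.
EXPLAIN SELECT MAX(DCDATE) FROM StagewisePassenger WHERE DEPOT_CD = 'TSGN';
```

With DEPOT_CD leading the index, MySQL can jump straight to the last DCDATE entry for 'TSGN' instead of scanning the whole bus_type index. Building an index on a table of this size takes time, so this is something to schedule rather than run casually.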

Page 17: Optimize do not_print

Query not using any index is fast ! ??

mysql> select max(DCDATE) from StagewisePassenger where DEPOT_CD ='TSGN';

+-------------+

| max(DCDATE) |

+-------------+

| 2013-08-18 |

+-------------+

1 row in set (0.00 sec)

mysql> explain select sql_no_cache max(DCDATE) from StagewisePassenger where DEPOT_CD ='TSGN'\G

*************************** 1. row ***************************

id: 1

select_type: SIMPLE

table: NULL

type: NULL

possible_keys: NULL

key: NULL

key_len: NULL

ref: NULL

rows: NULL

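Extra: Select tables optimized away

• Select tables optimized away means the optimizer resolved MAX(DCDATE) with a single index lookup while planning the query, so no table rows are read at execution time. The query is fast precisely because it does use an index.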

Page 23: Optimize do not_print

Mysql Bugs: Data is at stake!

• Long texts are chopped to the column size, not rejected. MySQL really should raise an error rather than quietly trim your invalid data and accept it. When your data is critical and your business depends on it, you can't have bad data quietly going into the database. Likewise, if you have declared a column as int instead of bigint, MySQL will silently convert any larger value to 2147483647.

• The MySQL server only performs basic checking on the validity of a date: days 00-31, months 00-12, years 1000-9999. Any date outside these ranges is stored as 0000-00-00. Note that this still allows you to store invalid dates such as 2002-04-31. It allows web applications to store data from a form without further checking. To ensure a date is valid, perform a check in your application.

• If no DEFAULT value is specified for a column, MySQL automatically assigns one. If the column may take NULL as a value, the default value is NULL. If the column is declared as NOT NULL, the default value depends on the column type:

• Only static data is allowed as default values; functions, derived data etc. are not allowed (the exception is CURRENT_TIMESTAMP for TIMESTAMP columns).

Page 24: Optimize do not_print

Mysql Bugs Part II

• Can't rename a database: it might be better to create a new database and then RENAME TABLE each table from the original database into the new database. Then drop the original database.

• FULLTEXT indexes on InnoDB tables are not supported before MySQL 5.6. On a MyISAM table, a fulltext search will not return any rows if the word appears too often (in 50% or more of the rows). You have to use LIKE in such a case.

• You can't refer to a temporary table more than once in the same query. Also, a query will not be cached if it uses temporary tables.

• If you use any option to ALTER TABLE other than RENAME, MySQL always creates a temporary copy of the table, even if the data wouldn't strictly need to be copied (such as when you change the name of a column). In other words, almost all alters cause MySQL to rebuild the table, which means reading every row. An InnoDB table is already stored in primary key order because InnoDB uses clustered indexes, but a full scan of an InnoDB table is slow because of its MVCC overhead. That's why the ALTER TABLE command takes a long time.

• Apparently CREATE INDEX, DROP INDEX and ALTER TABLE operations on InnoDB tables cause the entire table to be rebuilt. Slow index DROP and CREATE is a top complaint among MySQL users. Therefore, rather than dropping an index, disable keys or make the query skip it with IGNORE INDEX.
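
The database-rename workaround from the first bullet above might be sketched like this. The names old_db, new_db, customers and orders are placeholders, not taken from the deck.

```sql
-- Create the database that will carry the new name.
CREATE DATABASE new_db;

-- RENAME TABLE can move a table between databases on the same server.
RENAME TABLE old_db.customers TO new_db.customers,
             old_db.orders    TO new_db.orders;

-- For many tables, generate the statements instead of typing them.
SELECT CONCAT('RENAME TABLE old_db.', table_name,
              ' TO new_db.', table_name, ';') AS stmt
FROM information_schema.tables
WHERE table_schema = 'old_db';

-- Only after every table has been moved:
DROP DATABASE old_db;
```

Views, triggers and stored routines are not carried along by RENAME TABLE and would need separate handling, as would privileges granted on the old database name.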

Page 25: Optimize do not_print

Mysql Bugs Part III

• InnoDB bug: TRUNCATE TABLE does not reset the auto_increment counter (on MyISAM tables it does reset as expected).

• By default, there must be no whitespace between a function name and the parenthesis following it.

• If a range covers more than roughly 30% of the table, a table scan is performed instead of an index scan.

• MySQL limits how many tables you can join in a single SQL statement (61 according to the reference manual; older releases allowed fewer). It allows up to 64 indexes for each table, and each index can incorporate up to 16 columns.

Page 26: Optimize do not_print

Online Services

• Format your SQL query: http://sqlformat.appspot.com/

• SQL Fiddle helps you build test cases that can be shared with others: http://sqlfiddle.com/#%212/07c83d/1/1

Page 27: Optimize do not_print

Finding logical errors using online validator

AND takes precedence over OR

• Query as written, where the trailing OR is not grouped with the condition it was meant for:

SELECT a.*
FROM table_a a
LEFT JOIN table_b b
ON b.lid = a.lid
WHERE a.published > 0
AND a.published <= 123
AND (a.expired = 0
OR a.expired > 123)
AND a.offline = 0
AND b.cid = a.cid
OR (a.cid = 1
OR b.cid = 1)
ORDER BY published DESC

• Query with the intended grouping made explicit:

SELECT a.*
FROM table_a a
LEFT JOIN table_b b
ON b.lid = a.lid
WHERE a.published > 0
AND a.published <= 123
AND (a.expired = 0
OR a.expired > 123)
AND a.offline = 0 AND (b.cid = a.cid OR (a.cid = 1 OR b.cid = 1))
ORDER BY published DESC
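
The precedence rule behind the two queries above can be seen on constants, without any table:

```sql
-- AND binds tighter than OR, so this is parsed as 1 OR (0 AND 0).
SELECT 1 OR 0 AND 0;      -- returns 1

-- Parentheses change the grouping.
SELECT (1 OR 0) AND 0;    -- returns 0
```

In the first query on the slide this means every row with a.cid = 1 or b.cid = 1 is returned, even if it fails the published, expired and offline checks.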

Page 29: Optimize do not_print

Use google for query syntax

• alter "on update cascade" filetype:sql

• Note the double quotes, which keep the 3 words together.

• filetype:sql restricts the search to pages that end with the .sql extension.

Page 30: Optimize do not_print

Table engine gets replaced

• mysql> CREATE TABLE myload (id int(11) DEFAULT NULL, age int(11) DEFAULT NULL, salary int(11) DEFAULT NULL) ENGINE=abc;
Query OK, 0 rows affected, 2 warnings (0.24 sec)

• mysql> show warnings;
+---------+------+-----------------------------------------------+
| Level   | Code | Message                                       |
+---------+------+-----------------------------------------------+
| Warning | 1286 | Unknown storage engine 'abc'                  |
| Warning | 1266 | Using storage engine InnoDB for table 'todel' |
+---------+------+-----------------------------------------------+

Page 31: Optimize do not_print

Change sql mode to keep table engine

• mysql> set sql_mode='no_engine_substitution';
Query OK, 0 rows affected (0.00 sec)

• mysql> CREATE TABLE myload (id int(11) DEFAULT NULL, age int(11) DEFAULT NULL, dob date, salary int(11) DEFAULT NULL) ENGINE=abc;

• ERROR 1286 (42000): Unknown storage engine 'abc'

Page 32: Optimize do not_print

Data getting truncated

• mysql> insert into myload values ('123a', 2, '1970-12-30', 3);
Query OK, 1 row affected, 1 warning (0.01 sec)

• mysql> show warnings;
+---------+------+-----------------------------------------+
| Level   | Code | Message                                 |
+---------+------+-----------------------------------------+
| Warning | 1265 | Data truncated for column 'id' at row 1 |
+---------+------+-----------------------------------------+

• mysql> select * from myload;
+------+------+------------+--------+
| id   | age  | dob        | salary |
+------+------+------------+--------+
|  123 |    2 | 1970-12-30 |      3 |
+------+------+------------+--------+
1 row in set (0.00 sec)

Page 33: Optimize do not_print

Text is replaced with an integer

• mysql> insert into myload values ('xyz', 2, '1982-09-21', 3);
Query OK, 1 row affected, 1 warning (0.01 sec)

• mysql> show warnings;
+---------+------+---------------------------------------------------------+
| Level   | Code | Message                                                 |
+---------+------+---------------------------------------------------------+
| Warning | 1366 | Incorrect integer value: 'xyz' for column 'id' at row 1 |
+---------+------+---------------------------------------------------------+
1 row in set (0.00 sec)

• mysql> select * from myload;
+------+------+------------+--------+
| id   | age  | dob        | salary |
+------+------+------------+--------+
|  123 |    2 | 1970-12-30 |      3 |
|    0 |    2 | 1982-09-21 |      3 |
+------+------+------------+--------+
2 rows in set (0.00 sec)

Page 34: Optimize do not_print

Data truncated part III

• mysql> insert into myload values ('9876543210123xyz', 2, '1999-02-31', 3);
Query OK, 1 row affected, 2 warnings (0.02 sec)

• mysql> show warnings;
+---------+------+---------------------------------------------+
| Level   | Code | Message                                     |
+---------+------+---------------------------------------------+
| Warning | 1264 | Out of range value for column 'id' at row 1 |
| Warning | 1265 | Data truncated for column 'dob' at row 1    |
+---------+------+---------------------------------------------+
2 rows in set (0.00 sec)

• mysql> select * from myload;
+------------+------+------------+--------+
| id         | age  | dob        | salary |
+------------+------+------------+--------+
|        123 |    2 | 1970-12-30 |      3 |
|          0 |    2 | 1982-09-21 |      3 |
| 2147483647 |    2 | 0000-00-00 |      3 |
+------------+------+------------+--------+
3 rows in set (0.00 sec)

Page 35: Optimize do not_print

Only solution: strict sql mode

• mysql> set sql_mode='no_engine_substitution,strict_all_tables,no_zero_date';

• mysql> insert into myload values ('9876543210123x', 2, '1999-02-31', 3);
ERROR 1264 (22003): Out of range value for column 'id' at row 1

• mysql> insert into myload values ('98765', 2, '1999-02-31', 3);
ERROR 1292 (22007): Incorrect date value: '1999-02-31' for column 'dob' at row 1

Page 36: Optimize do not_print

Only full group by sql mode

• mysql> select salary, dob from myload group by salary;
+--------+------------+
| salary | dob        |
+--------+------------+
|      3 | 1970-12-30 |
+--------+------------+

• mysql> set sql_mode='no_engine_substitution,strict_all_tables,no_zero_date,only_full_group_by';

• mysql> select salary, dob from myload group by salary;
ERROR 1055 (42000): 'test.myload.dob' isn't in GROUP BY
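
Once only_full_group_by is on, the query has to say which dob it wants for each salary. Two possible rewrites, shown as a sketch against the myload table from the slides (the alias latest_dob is made up):

```sql
-- Pick one value per group explicitly with an aggregate.
SELECT salary, MAX(dob) AS latest_dob
FROM myload
GROUP BY salary;

-- Or group by every selected column, which returns one row per (salary, dob) pair.
SELECT salary, dob
FROM myload
GROUP BY salary, dob;
```

Which rewrite is right depends on what the original query was meant to report; the non-strict result simply returned an arbitrary dob from each group.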

Page 37: Optimize do not_print

Implicit type conversion while selecting data

• mysql> select * from myload;
+------------+------+------------+--------+
| id         | age  | dob        | salary |
+------------+------+------------+--------+
|        123 |    2 | 1970-12-30 |      3 |
|          0 |    2 | 1982-09-21 |      3 |
| 2147483647 |    2 | 0000-00-00 |      3 |
+------------+------+------------+--------+
3 rows in set (0.01 sec)

• mysql> select * from myload where id = 'abc';
+------+------+------------+--------+
| id   | age  | dob        | salary |
+------+------+------------+--------+
|    0 |    2 | 1982-09-21 |      3 |
+------+------+------------+--------+
1 row in set, 1 warning (0.00 sec)

• mysql> show warnings;
+---------+------+-----------------------------------------+
| Level   | Code | Message                                 |
+---------+------+-----------------------------------------+
| Warning | 1292 | Truncated incorrect DOUBLE value: 'abc' |
+---------+------+-----------------------------------------+

Page 38: Optimize do not_print

Strict sql mode works only while inserting data

• There is no strict sql mode for selects.

• The value 123abc is implicitly converted to 123.

• mysql> select * from myload where id = '123abc';
+------+------+------------+--------+
| id   | age  | dob        | salary |
+------+------+------------+--------+
|  123 |    2 | 1970-12-30 |      3 |
+------+------+------------+--------+
1 row in set, 1 warning (0.00 sec)

• mysql> show warnings;
+---------+------+--------------------------------------------+
| Level   | Code | Message                                    |
+---------+------+--------------------------------------------+
| Warning | 1292 | Truncated incorrect DOUBLE value: '123abc' |
+---------+------+--------------------------------------------+

Page 39: Optimize do not_print

There is no sql mode for this

• When you select data, make sure that you compare integers with integers and varchars with varchars.

• id = 'abc' will be converted to id = 0 internally if the id column is an integer.

• This bug is still pending.

• Take care when you write a query like this: select * from myload where id = '123abc';
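
A few defensive patterns for the comparison problem above, sketched against the myload table from the slides. These are suggestions, not a fix for the underlying behaviour.

```sql
-- Compare the integer column with an integer literal.
SELECT * FROM myload WHERE id = 123;

-- If the value arrives as text, test that it is purely numeric
-- before using it. This returns 0 for '123abc' and 1 for '123'.
SELECT '123abc' REGEXP '^[0-9]+$';

-- Or compare as strings, so nothing is silently truncated.
-- '123abc' matches no row here, but the CAST prevents index use on id.
SELECT * FROM myload WHERE CAST(id AS CHAR) = '123abc';
```

The first form is the one to aim for in application code, typically by binding the parameter as an integer; the string comparison is only a fallback when the input type cannot be controlled.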