Monday, May 10, 2010
Shan Tuyet Tea - shantuyet.com
http://www.shantuyet.com - Shan Tuyet tea: special green tea called Snow Green Tea. Only Shan Tuyet tea trees in Ha Giang, Lao Cai, Lai Chau, Son La are growing naturally, no fertilizer, no insecticides. Shan Tuyet tea is a natural and clean tea. Those tea trees in Ha Giang have a long-standing form with 6-9 meters of height. These special features bring a subtle drink. Green Tea in bulk Speciality in Diversity. Tea is a part of Vietnamese lifetime, we appreciate and enjoy our unique green tea. Green tea is a delicate, delicious-tasting beverage.
Wednesday, May 5, 2010
Tuesday, May 4, 2010
Đồng bộ file .htaccess bằng lệnh SCP
Hịên tại chỉ biết có thể đồng bộ file .htacces tới các server khác bằng cách dùng scp trực tiếp cho 1 file
scp -P 22 /path/to/.htaccess user@server_name:/path/to/
Monday, May 3, 2010
mysql - Covering index and prefix indexes
Covering index and prefix indexes
I already wrote in the blog as well mentioned in presentation there is often a choice you have to make between having prefix index - which can be significantly smaller in size and having index being covering index, which means query can be executed using only data from the index without reading the row itself.
Today I had a chance to do couple of experiments to see when exactly it works or does not work:
- CREATE TABLE `t` (
- `i` int(11) DEFAULT NULL,
- `j` char(10) DEFAULT NULL,
- `k` int(11) DEFAULT NULL,
- KEY `i` (`i`,`j`(5),`k`)
- ) ENGINE=MyISAM
Now lets see if index can be used as covering index if it has some key parts which are prefixes:
- mysql> EXPLAIN SELECT k FROM t WHERE i=5 \G
- *************************** 1. row ***************************
- id: 1
- select_type: SIMPLE
- TABLE: t
- type: ref
- possible_keys: i
- KEY: i
- key_len: 5
- ref: const
- rows: 1
- Extra: USING WHERE; USING INDEX
- 1 row IN SET (0.00 sec)
Great. As you see you actually can use index which has prefix key parts as covering index if you do not touch columns which only have prefixes in the index. Notice "Using Index" in Extra column.
- mysql> EXPLAIN SELECT j,k FROM t WHERE i=5 \G
- *************************** 1. row ***************************
- id: 1
- select_type: SIMPLE
- TABLE: t
- type: ref
- possible_keys: i
- KEY: i
- key_len: 5
- ref: const
- rows: 1
- Extra: USING WHERE
- 1 row IN SET (0.00 sec)
In this case index can't be used as covering index as we only have portion of "j" in the index. It is interesting I tested it with single character column values which all do fit in the index but MySQL does not look at the actual data in this case it only looks at definitions.
- mysql> EXPLAIN SELECT k FROM t WHERE i=5 AND j LIKE "a%" \G
- *************************** 1. row ***************************
- id: 1
- select_type: SIMPLE
- TABLE: t
- type: ref
- possible_keys: i
- KEY: i
- key_len: 5
- ref: const
- rows: 1
- Extra: USING WHERE
- 1 row IN SET (0.00 sec)
This example is less obvious - one may think why can't we read only from the index as we only select k column ? The reason is - we're using column j in where clause. Even if this particular like check can be done only by using index, MySQL is not smart enough to notice it - it simply checks if column is used in the query and if it does, covering index can't be used.
Note: MySQL is however smart enough to make sure prefix specified is actually prefix, not the full key length. If you would specify key length of 10 in this case it will convert it to the full key instead of prefix key. If you would use length longer than column length, lets say 20, MySQL will refuse to create an index.
So in general handling of prefix keys in MySQL is smart in this respect.
MySQL: Power of covering index
So what is covering index ?
Lets say you have the query "select b from tbl where a=5" Assuming column "a" is indexed and table is not clustered
by this column MySQL will need to perform at least 2 reads, at first to read index page to find row pointer and when
to read the row itself. If you would have "covering index" (a,b) instead of indexing just column "a" you will have only
index read needed, so you could expect 2 times performance improvement, sometimes a lot more.
When covering index helps a lot:
- When you have large tables. If your tables fits in memory this extra read required is inexpensive
- When you have long rows (ie with BLOBs), in such case you can have order of magnitude performance improvement
- When extra columns do not increase key length a lot.
- When you have large join with a lot of lookups on second table (see below)
- when a lot of rows match the same key value (by reading single key block you save tons of random IO)
- MyISAM tables benefit more than Innodb as MyISAM does not cache rows (only in OS cache) so random data IO is CPU intensive.
Things to watch:
- Innodb tables already clustered by PRIMARY KEY so all primary key lookups are already sort of "covered"
- If column "b" is frequently updated updating covering index on updates can be expensive
- Only full indexed columns work, if you will index column prefix, index will not be used as "covering index" for this
column.
- If you have large range scans or index scans they can slow down by increasing index length
- Watch out for adding VARCHAR part to INT index to make it covered. MySQL will pack such keyblocks and
your lookups can become unexpectedly slower
- Make sure MySQL actually chooses covered index, especially if you have several indexes with same prefix
- If there are no matching rows for the key, covering index does not help.
Example:
Here is example from BenchW (http://sourceforge.net/projects/benchw
No Covering index:
mysql> explain SELECT d0.dmth, count(f.fval ) FROM dim0 AS d0, fact0 AS f WHERE d0.d0key = f.d0key AND d0.ddate BETWEEN '2010-01-01' AND '2010-12-28' GROUP BY d0.dmth;
+----+-------------+-------+-------+----
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+----
| 1 | SIMPLE | d0 | range | dim0_d0key,ddate | ddate | 3 | NULL | 334 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | f | ref | d0key | d0key | 4 | benchw.d0.d0key | 1000 | |
+----+-------------+-------+-------+----
2 rows in set (0.00 sec)
Time: 1.94 sec
Query with key (dkey0) expanded to "covering index" (dkey0,fval0)
mysql> explain SELECT d0.dmth, count(f.fval ) FROM dim0 AS d0, fact0 AS f WHERE d0.d0key = f.d0key AND d0.ddate BETWEEN '2010-01-01' AND '2010-12-28' GROUP BY d0.dmth;
+----+-------------+-------+-------+----
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+----
| 1 | SIMPLE | d0 | range | dim0_d0key,ddate | ddate | 3 | NULL | 334 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | f | ref | d0key | d0key | 4 | benchw.d0.d0key | 1000 | Using index |
+----+-------------+-------+-------+----
2 rows in set (0.00 sec)
Time: 0.66 sec
Note, in this case we had working set in memory completely. In disk bound case I would guess larger difference.
"Using Index" in Extra column is what tells us "covering index" is being properly used for this table.
Saturday, May 1, 2010
25 Point Basic MySQL Setup/Optimization Checklist
Daily I run into new web programmers that are using PHP and MySQL to create their blogs and websites. I created this checklist as a guide for new and experienced to make sure they are covering the basics of a MySQL server setup.
This guide is by no means all inclusive, but should help to cover some of the major gaps in knowledge and commonly overlooked fundamentals that I run into on a daily basis.
The checklist is separated into 5 equal sections: Server Setup, Schema Design, Table Design, Index Optimization, Query Optimization, and a 6th Bonus Tips section.
You can also download a simplified summary on PDF form.
Section 1 – Server Setup
- Root User
For security reasons, the root MYSQL user must be setup with a secure password, and should only have access from localhost. It is a bad idea to allow outside access to the root account. Create additional users if you need to access the database remotely! - Backup and Restore
Before allowing a database to be used in a production environment, there should be a usable backup and restore process. I use the phrase “in case the database server is completely destroyed” because the backup location and method needs to be completely independent from the database server. Note: Even a weekly database backup is better than no backup at all. - Benchmarking
There’s no easy way to determine bottlenecks and trouble unless a method to benchmark performance is in place. The slow query log should always be enabled, and it’s a good idea to install a benchmarking program. Monyog is an external program that provides a number of real-time reports useful for monitoring and performance tuning. - DNS
If you do not allow outside access, or you can access your server from known IP addresses, disabling DNS look-ups can speed up server operations. Additionally, if the MySQL server loses it’s DNS look-up service, the usability of the entire Database can all but halt. - Privileges
When adding users to a database, only give them the permissions that are absolutely required, and be specific in where they can access from. “GRANT ALL ON EVERYTHING TO USER@ANYWHERE” is a really bad idea. If you do need to give users full permissions for installations or another purpose, it’s a good practice to change them back to the minimum once complete.
Section 2 – Schema Design
- Naming
A standardized naming scheme should always be used. The best practice is to use lowercase letters with an underscore connecting names, such as `my_personal_database`. Tables, and individual column names should carry the same naming convention. Use descriptive names for every column including id columns. `id` is not descriptive whereas `contact_id` is. - Collation
Use the same collation for all parts of the database, and avoid using UTF-8 or multi-byte formats unless you specifically need them. Keeping the same format on all tables and columns can help prevent data corruption and conversion problems. UTF-8 requires significantly more disk space and overhead which can reduce performance. If you need UTF-8, use it , but don’t make your entire database UTF-8 because you’re lazy. - Foreign Keys
Always use foreign keys to ensure that bad / incomplete data stays out of the database. Nothing replaces good application level programming, but foreign keys are the best way to prevent putting bad data into your database in the first place. You will need InnoDB to use foreign keys, but the benefit is worth it. - Logically Segmented Data in Tables
Tables should be segmented logically by the data they contain and their association with other tables. In this manner, there may be more total tables, but will help eliminate tables with a huge number of columns which can really hurt performance. Additionally, it will make querying easier as it’s unlikely that every column is needed for every query. This also allows for Single-to-many relationships such as storing multiple addresses related to a single entry. Don’t be afraid of 20 tables with 20 columns each, be afraid of 1 table with 400 columns! - Reserved Words
Avoid using reserved words for any name in your database schema. Words like date, time, decimal, etc. are often used, and can wreak havoc trying to get queries to work properly, and can cause even more difficulty in debugging. You can technically use these words if they are placed in back-ticks (`date`), but this is a bad practice and should always be avoided.
Section 3 – Table Design
- Data types
MySQL has many data types, probably more than any other database. Using the correct data type for the data being stored is one of the most important aspects in design. Failing at this step can break a database’s speed and the usability of an application.Whole Numbers – BIGINT, INT, MEDIUMINT, SMALLINT, TINYINT
Decimals – DECIMAL, FLOAT, DOUBLE, REAL
Dates – TIMESTAMP*, DATETIME, DATE, TIME, YEAR
Strings – CHAR, VARCHAR, BINARY, VARBINARY, BLOB, TEXT, ENUM, SETAdditionally, using the correct data type allows for the use of MySQL’s built-in functions which can sort, do math, comparisons, date conversions, etc. For example, I often see dates stored in VARCHAR columns, which completely prevents MySQL from sorting, or using any date related function.
- Large Numerical Keys
It’s common for new programmers to use a BIGINT(20) when they need a key column. While admirable, this is a waste of disk space. An UNSIGNED INT(10) has over 4 billion possible numbers, which is more than most will ever use. Even so, by that time, you will want to look into partitioning, and will have a variety of other problems on your hand. Don’t use BIGINT’s unless you need to store very-very large numbers. - Smallest Length
Using the smallest length data length is important. Every byte of savings adds up when a database’s size and usage goes up. Lazy programming by using VARCHAR(255) or DECIMAL(20,2) creates unnecessary overhead and causes problems down the road. Give yourself one extra byte of space if needed, but 100 is a overkill. - Avoid TEXT and BLOB Columns
TEXT and BLOB type columns can eat up a server’s resources when being selected. While these are most certainly needed to store larger amounts of data, they should only be used for that purpose. VARCHAR can hold up to 255 bytes and should always be used before TEXT whenever possible. - Non‐Relational Storage
A huge design mistake is storing data in a non-relational format. It’s common to see data stored in a CSV format like (value1,value2,value3) in a single column. This effectively bypasses MySQL’s ability to use the data. It’s best to use multiple tables for single-to-many relationships, as this allows for MySQL to handle the data in an elegant manner. There are some situations where storing csv-like data would make sense, but for all intensive purposes, avoid storing data like this.
Section 4 – Index Optimization
- Use proper indexes
MySQL supports several types of indexes (PRIMARY, UNIQUE, NORMAL, PARTIAL, and FULLTEXT). It is important to use the correct type of index for the job. It is also important to only use indexes when needed, and not to create duplicate indexes. For example a primary key column already has an index, so adding a second UNIQUE INDEX on the primary key is a complete waste of overhead and disk space. - Multi-Column Indexes
If there is a data set that is constantly queried with more than one column in the WHERE clause, it may be a good idea to create a multi-column index. If you have an index on (`user_id`,`user_category`) the index will work when both are in the where clause or the first column (`user_id`) is in the where clause. However, the index will not be used if only `user_category` is in the where clause. - Modifying Indexed Fields During a Query
Unless you specify the length of an index, modifying an indexed column will prevent the index from being used.
For example, if there is an index on `credit_card_number` and you perform a query like this:SELECT `user_id` FROM `my_table` WHERE LEFT(`credit_card_number`,4) = '5666';
The index will not be used. If this was a common scenario, you could create a partial index of (`credit_card_number`,4), and the above query would use the index. - Indexes With a High Cardinality
Indexes work best when there are many unique values in relation to the total number of rows. This allows the database engine to quickly reduce the number of possible rows in the result set. Indexes on columns with only a few unique values are inefficient and will end up being a waste of overhead. - Unique and NULL Column Indexes
Allowing NULLS in index columns adds an additional byte of storage per row to the index. This again equates to a waste of space and overhead and will slow down MySQL’s performance. It’s better to use no value rather than NULL.
Section 5 – Query Optimization
- Specific Column Names
Always use specific column names instead of * when querying a table. SELECT * is lazy programming. While it is completely valid syntax, you won’t know the columns that will be returned. If you don’t know what you’re going to get with a query, there’s no reason to use it.. right? Write out any column names that you need data from. This way your code is intuitive, you won’t have problems trying to use data from a column that doesn’t exist, and the next person using your script wont hate you. - MySQL’s Built-in Functions
MySQL has a variety of very advanced, and very fast, built-in functions. They probably are much more efficient than php or most other application level scripts. These functions can greatly increase your application’s speed, and reduce its complexity. MySQL has everything from mathematical operations, date comparisons, even spacial functions for calculation geographic equations. Learn to use them. - Selecting TEXT and BLOB Columns
When a TEXT or BLOB column is select in a query, MySQL will create a temporary internal table. If large result sets are selected with TEXT or BLOB columns, this can create a major load on the database, and unnecessary overhead. This relates back to SELECT *, don’t select a TEXT or BLOB type column unless you actually need to use the data. - Use Transactions
Transactions are another great way of preventing incomplete or corrupted data while inserting or altering data. When using a transaction, you can insert or alter any number of rows of data. If there is an error, all of the queries in the transaction will be aborted. Think of inserting 50,000 rows into a report table, and having 10 arbitrary rows not insert correctly. That entire set of data is now corrupt, and a transaction would have prevented that. - SQL_NO_CACHE
SQL_NO_CACHE is a great way to prevent MySQL from caching a query’s result. This is important for results with a rapidly changing data, or very large result sets. Both of these situations can eat up server resources without any gain to the application or end-user.
Bonus
- TIMESTAMP vs. DATETIME
TIMESTAMP and DATETIME store dates in the exact same format (YYYY-MM-DD HH:MM:SS) but TIMESTAMP uses less space to do so. The only limitation is that TIMESTAMP cannot be used for dates older than Jan 1st, 1970. - SIGNED INT
Unless you need to store negative numbers, only use UNSIGNED INT and other numerical data type fields. There’s no reason to allow for negative numbers if they will never be used. - Collation: _ci vs. _cs
The _ci in a collation stands for “case insensitive”. If you care about case sensitivity use a collation that ends in _cs. The can be very important for searching and other operations where John ≠ john! - InnoDB vs. MyISAM
If you’re using MyISAM as a storage engine only because it was the default, you may be making a mistake. InnoDB is superior in several areas (Reliability, Backups, Foreign Keys, and Performance in many situations) and while maybe not always the best option (Full Text Indexing), you should know why you’re using the engine you’re using. You can also mix the 2, but this can make performance tuning especially difficult. - Consult a Professional
When you get a project and the database design, usage or other factor is just over your head, it’s a good idea to consult a professional. It may cost a fair sum, but the cost down the road could be substantial. Planning is always cheaper than reacting.