person:yoshinori matsunobu

  • Proper handling of insert-mostly, select-recently datasets | Yoshinori Matsunobu’s blog
    http://yoshinorimatsunobu.blogspot.com/2011/05/proper-handling-of-insert-mostly-select.html

    c’est la problématique typique de #rezo ou de #seenthis : plein de données, dont seules les plus récentes servent vraiment souvent.

    One of the best approaches in #MySQL is using range partitioning, partitioned by date or primary key. This is one of my favorite features in MySQL. By using range partitioning, only the latest #partition is actively accessed. Data/indexes in the rest partitions are much less accessed so they don’t occupy buffer pool. Each partition size can be small enough to fit in memory, so insert performance does not drop.

    CREATE TABLE message (
    message_id BINGINT UNSIGNED,
    user_id INT UNSIGNED,
    body VARCHAR(255),
    ...
    created DATETIME,
    INDEX(message_id)
    INDEX(user_id)
    ) engine=InnoDB
    PARTITION BY RANGE(to_days(d1)) (
    PARTITION p201103 VALUES LESS THAN (to_days(’2011-03-01’)),
    PARTITION p201104 VALUES LESS THAN (to_days(’2011-04-01’)),
    PARTITION p201105 VALUES LESS THAN (to_days(’2011-05-01’)),
    PARTITION p201106 VALUES LESS THAN (to_days(’2011-06-01’)),
    PARTITION p201107 VALUES LESS THAN (to_days(’2011-07-01’)),
    ...
    );