Archiving data – great article

http://dev.mysql.com/tech-resources/articles/storage-engine.html

Some implementation notes:

in gentoo you must enable the storage engines in /etc/make.conf

USE="unicode big-tables extraengine

In debian

This article is so valuable to me so I will copy/paste the last part of it here…

Data Archiving Techniques

Data archiving, a subset of Information Lifecycle Management (ILM), typically involves horizontally partitioning data so that a set of current information resides in one or more objects, while the historical/archived data is placed in separate objects. Oftentimes, historical data can be transferred to lower-cost storage devices (like EMC Centera) that allow easy disk access, but provide significant financial savings. Of course, the idea behind data archiving is to lessen response times by reducing the sheer amount of data needed for current access. However, for analytic purposes, there are times when both the current and archive data needs to be accessed together.

For data archiving environments, Archive tables can easily be accessed in standalone fashion or be set up in union views, which is the primary vehicle used to reference both current and archived data in one object. By combining MySQL’s Archive storage engine with the new 5.0 Federated table support, a DBA can smartly transfer historical data to cheapter storage devices and then reference both current and archived data when needed.

For example, let’s say you have a set of recent information regarding customer transactions in a transactional InnoDB table called CLIENT_TRANSACTION, and a set of historical transactions stored in an Archive table called CLIENT_TRANSACTION_HIST. To lessen load on the active server, you can

Transfer the history table to another MySQL server that utilizes cheaper storage devices.
Set it up to use the Archive storage engine.
Create a federated table link to the historical data from the current database.
Build a union view to reference both together when needed.

On the historical MySQL Server:

mysql> alter table client_transaction_hist engine=archive;
Query OK, 112050 rows affected, 0 warning (1.98 sec)
Records: 112050  Duplicates: 0  Warnings: 0

On the MySQL Server that contains active data:

mysql> select count(*) from client_transaction;
+----------+
| count(*) |
+----------+
|    18675 |
+----------+
1 row in set (0.00 sec)

mysql> CREATE TABLE client_transaction_hist (
    ->   client_transaction_id int(11) NOT NULL,
    ->   client_id int(11) NOT NULL,
    ->   investment_id int(11) NOT NULL,
    ->   action varchar(10) NOT NULL,
    ->   price decimal(12,2) NOT NULL,
    ->   number_of_units int(11) NOT NULL,
    ->   transaction_status varchar(10) NOT NULL,
    ->   transaction_sub_timestamp datetime NOT NULL,
    ->   transaction_comp_timestamp datetime NOT NULL,
    ->   description varchar(200) default NULL,
    ->   broker_id bigint(10) default NULL,
    ->   broker_commission decimal(10,2) default NULL
    -> ) ENGINE=FEDERATED
    -> COMMENT='mysql://mysql:password@serv1:3306/gim/client_transaction_hist';
Query OK, 0 rows affected (0.14 sec)

mysql> CREATE VIEW client_transaction_all as
    -> select * from client_transaction
    -> union all
    -> select * from client_transaction_hist;
Query OK, 0 rows affected (0.08 sec)

mysql> select count(*) from client_transaction_all;
+----------+
| count(*) |
+----------+
|    130725|
+----------+
1 row in set (1.20 sec)

Note: rather than use a SELECT * in the view definition above, the actual columns should be listed out; the above was only done to save space for this article.

The targets for Archive tables in data warehousing or data archiving environments are fact and summary tables as they usually contain the bulk of the information. Dimension tables are rarely large enough to warrant a switch to Archive.

Miscellaneous Archive Engine Notes

We talked earlier about the Archive engine being a good choice for storing data auditing results as only INSERT and SELECT are supported. Let’s check to be sure:

mysql> delete from test_archive where client_id = 50;
ERROR 1031 (HY000): Table storage engine for 'test_archive' doesn't have this option
mysql> update test_archive set price=120 where client_id = 50;
ERROR 1031 (HY000): Table storage engine for 'test_archive' doesn't have this option

We also talked about how Archive doesn’t support indexes at this time:

mysql> create index my_index on test_archive (client_transaction_id);
ERROR 1069 (42000): Too many keys specified; max 0 keys allowed

One special thing to note about Archive tables and indexes is that, if you want to alter another storage engine table to be an Archive table, you must first drop any indexes that have been defined on the table.

Archive tables are supported by the new sql_mode data integrity enhancements in MySQL 5.0, so you can ensure only proper data finds its way into your Archive tables:

mysql> create table test (c1 int) engine=archive;
Query OK, 0 rows affected (0.11 sec)

mysql> insert into test values ('hi');
Query OK, 1 row affected, 1 warning (0.05 sec)

mysql> select * from test;
+------+
| c1   |
+------+
|    0 |
+------+
1 row in set (0.00 sec)

mysql> set sql_mode='strict_all_tables';
Query OK, 0 rows affected (0.00 sec)

mysql> insert into test values ('hi');
ERROR 1264 (22003): Out of range value adjusted for column 'c1' at row 1

Conclusion

DBAs facing the problem of corporate data explosion have an excellent new tool to help them in the MySQL 5.0 Archive storage engine. Whether it’s a data warehousing, data archiving, or data auditing situation, MySQL Archive tables can be just what the doctor ordered when it comes to maintaining large amounts of standard or sensitive information, while keeping storage costs at a bare-bones minimum.

Updated: 2005-09-16

Ivo Bardarov

Software development

Recent Posts

Categories

Katz Got Your Tongue

Archiving data – great article

Data Archiving Techniques

Miscellaneous Archive Engine Notes

Conclusion

Google analytics