Minnu's Blog on Informatica & Data warehouse concepts


Wednesday, October 29, 2008

DB Sizing Model

I'm trying to size tables; I've read about numerous ways to calculate the size of the table, such as multiplying the record length by the number of records to find the total size of the table. Is this calculation correct? If by "record" you mean raw input record length, then no, this calculation is not correct. It can give you a ballpark estimate, but it's not completely accurate by any means.

For starters, for each column in the record, you must also consider the overhead. For example, say I load the number 1,000,000,000 into a numeric field. That first number will take 2 bytes of storage. Say I then load the number 99,999 into that same field; that will take 4 bytes of storage. Dates always take 7 bytes. For varchars and chars, it depends on the actual number of characters and the field length, so storage overhead for these types of fields can vary widely.

You also have to consider block overhead. The system uses a couple of bytes for each block. In addition, you have to take into account the block size itself: The bigger the block size, the more rows you can store per block. The result is less total block overhead.

For example, if I configure my system to use 2K blocks, the system might get two rows for each block. If I configure 8K blocks, however, my system would get about nine rows per block. Although the block overhead would be about the same on a block-by-block basis, my 2K block size would result in four times as many blocks and hence four times as much block overhead. With that said,

my tried-and-true method for sizing a table involves three simple steps (you'll see these steps in action in the answer to the "More Sizing Matters" question):
Load some representative rows into a table.
Analyze the table.
Multiply the blocks column found in user_tables by the factor needed to make your sample as large as the anticipated number of total rows;

this will be the estimated table size.

If you get a decent sample, meaning a fairly representative set of the data your table will be holding, this process works well. If you cannot generate a decent sample, you won't be able to size this table no matter what. No simple formula is going to give you an answer if you don't know what the input data will be and the impact of overhead in your particular database tables and their contents. More Sizing Matters

How can I find out the size of a table in bytes? There are two ways of looking at table size.

One is the amount of space physically allocated to the table, whether it uses the space or not. This aspect of table size is visible in the USER_SEGMENTS view; you can use this query to get the number of bytes:

select segment_name, bytesfrom user_segments wheresegment_type = 'TABLE';

The other aspect of table size is the amount of space a table actually is using within the allocated space. To find the amount, you can use this query:
scott@dev8i> analyze table emp compute statistics;
Table analyzed.
scott@dev8i> select num_rows * avg_row_len from user_tableswhere table_name = 'EMP';
scott@dev8i> select bytesfrom user_segmentswhere segment_name = 'EMP';

The result of the query shows that the emp table is using 560 bytes of the 524,288 bytes allocated to it. One final note: Make sure you are aware of the ramifications of using the analyze command and how it may affect the optimizer. Either analyze all of your tables regularly or delete the statistics after running the analyze command.

You can use Informatica Velocity Template to compute estimates