Friday, September 2, 2016

Exadata - Hybrid Columnar Compression

Why HCC?

Hybrid Columnar Compression (HCC) is a method for organizing data within a set of database blocks.
A logical construct called the compression unit (CU) is used to store a set of HCC-compressed data.
HCC uses a choice of compression algorithms: LZO, ZLIB, and BZIP2. On the compute node, only the optimizer is aware of the compression.

The Hybrid Columnar Compression option is available only for tables residing on Exadata storage, Axiom, SPARC SuperCluster, or Sun ZFS storage. It is useful for data that is bulk loaded and then queried.

How to Save Maximum Storage with HCC?

To maximize storage savings with Hybrid Columnar Compression, data must be loaded using data warehouse bulk loading techniques. 
Examples of bulk load operations commonly used in data warehouse environments are: 

- Insert statements with the APPEND hint 
- Parallel DML
- Direct Path SQL*LDR
- Create Table as Select (CTAS)
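
As a minimal sketch (the table names SALES_HCC and SALES_STAGE are hypothetical), a direct-path bulk load into an HCC-compressed table could look like this:

-- Create a compressed table and populate it in one direct-path operation (CTAS)
CREATE TABLE sales_hcc
COMPRESS FOR QUERY HIGH
AS SELECT * FROM sales_stage;

-- Subsequent loads can use a direct-path insert via the APPEND hint
INSERT /*+ APPEND */ INTO sales_hcc
SELECT * FROM sales_stage;
COMMIT;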

Having data in the table that is a mix of OLTP and HCC compression does not create any specific overhead for a range scan (which implies an index range scan and rowid fetches from the table). It does mean that the data will not be as compressed as it could be, so it may be worthwhile to use ALTER TABLE ... MOVE to recompress it, as shown in the sketch below.
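
For example (the table and index names here are hypothetical), recompressing such a table could be done with:

-- Rewrite the table so all rows are stored in fresh HCC compression units
ALTER TABLE sales_hcc MOVE COMPRESS FOR QUERY HIGH;

-- ALTER TABLE ... MOVE marks existing indexes UNUSABLE, so rebuild them afterwards
ALTER INDEX sales_hcc_pk REBUILD;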

Advantages and Downsides of HCC:

Queries on hybrid columnar compressed data often run in the Exadata storage cells with Smart Scans, using a high-performance query engine that utilizes special columnar processing techniques. Data sent back to the database server(s) is usually compressed (and is typically much less data than is read from disk), and the compressed data is subsequently processed by the database server(s). Note that data remains compressed not only on disk, but also in the Exadata Smart Flash Cache, over InfiniBand, and in the database server buffer cache, as well as when doing backups or log shipping to Data Guard.

One of the key benefits of the hybrid columnar approach is that it provides both the compression and performance benefits of columnar storage without sacrificing the robust feature set of the Oracle Database. For example, while optimized for scan-level access, Oracle is still able to provide efficient row-level access, with entire rows typically being retrieved with a single I/O, because row data is self-contained within compression units. 

In contrast, pure columnar formats require at least one I/O per column for row-level access. With data warehousing tables generally having hundreds of columns, it is easy to see the performance benefits of Hybrid Columnar Compression on Exadata. Further, tables using Hybrid Columnar Compression on Exadata still benefit from all of the high availability, performance, and security features of the Oracle Database. 

Note that while data in Hybrid Columnar compressed tables can be modified using conventional Data Manipulation Language (DML) operations - INSERT, UPDATE, DELETE - performing such operations could result in a reduction of the HCC compression ratio. It is recommended that HCC be enabled on tables or partitions with no or infrequent DML operations. If frequent DML operations are planned on a table or partition, then the Oracle Advanced Compression Option is better suited for such data.
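
To see the effect of DML on individual rows, DBMS_COMPRESSION.GET_COMPRESSION_TYPE can report how a given row is currently stored. The sketch below is illustrative only; it reuses the MART.CC_TRIAD table from the results later in this post and simply inspects an arbitrary row:

set serveroutput on
declare
    m_rowid  rowid;
    m_type   number;
begin
    -- Pick an arbitrary row to inspect
    select rowid into m_rowid from mart.cc_triad where rownum = 1;

    -- Returns a code matching the DBMS_COMPRESSION COMP_* constants
    m_type := dbms_compression.get_compression_type('MART', 'CC_TRIAD', m_rowid);

    if m_type = dbms_compression.comp_for_query_high then
        dbms_output.put_line('Row is stored as HCC QUERY HIGH');
    elsif m_type = dbms_compression.comp_nocompress then
        dbms_output.put_line('Row is stored uncompressed (e.g. after conventional DML)');
    else
        dbms_output.put_line('Compression type code: ' || m_type);
    end if;
end;
/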

HCC modes:

Warehouse compression modes

Query Low  - Recommended for data warehouses where data load and query performance time are the critical factors
Query High - Recommended for data warehouses with a focus on space savings

Online archival compression modes (for historical data)

Archive Low  - Recommended for archival data where data load and query performance time are the critical factors
Archive High - Recommended for archival data requiring maximum space savings





How Does HCC Work?

- Traditionally, data is organized within a database block in ‘row’ format, where all the column data for a particular row is stored sequentially within a single database block.









- In real-world situations, data is more often repeated in columns, not rows. 

For example:



- The value “Smith” is repeated multiple times. Compression can be achieved by replacing the repeated “Smith” values with a smaller token, which reduces the size of the data significantly.




Using HCC:

- A column vector is created for each column
- The column vectors are compressed
- The compressed column vectors are stored in data blocks
- The collection of these data blocks is called a Compression Unit (CU)
- The blocks in the CU contain all the columns for a set of rows



Compression Unit:

CU Header: Contains the offsets into the compression unit for each of the columns.
Bitmap: Identifies the rows that have been deleted or migrated to a different location (by an update).
Columns (1..n): Each column consists of the values extracted from the rows, stored in compressed format.
Once Oracle has constructed the CU, the values of each column are stored together within it.
Issuing “alter table t1 move compress for archive high;” might pack the contents of the entire table into a single compression unit.

Compression Syntax:
Online Archival Compression
CREATE TABLE emp (…) COMPRESS FOR ARCHIVE [LOW | HIGH];
Warehouse Compression
CREATE TABLE emp (…) COMPRESS FOR QUERY [LOW | HIGH];
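
Compression can also be declared at the partition level, which is how it is typically applied to partitioned warehouse tables. A hedged sketch (the table, column, and partition names are hypothetical):

CREATE TABLE sales (
    sale_id    NUMBER,
    sale_date  DATE,
    amount     NUMBER
)
PARTITION BY RANGE (sale_date) (
    PARTITION p_hist VALUES LESS THAN (DATE '2015-01-01') COMPRESS FOR ARCHIVE HIGH,
    PARTITION p_curr VALUES LESS THAN (MAXVALUE)          COMPRESS FOR QUERY LOW
);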

Decompression:
Queries that run directly against HCC data do not require the whole data set to be decompressed; only the columns and rows being returned to the client are decompressed in memory.
The decompression process typically takes place on the Oracle Exadata Storage Server in order to maximize performance and offload processing from the database server.

Compression Advisor: 
Oracle Compression Advisor provides an estimate of the compression ratio for a given table or partition.
For databases prior to Oracle Database 11g Release 2, use the DBMS_COMP_ADVISOR package.
From Oracle Database 11g Release 2 onwards, use the DBMS_COMPRESSION package that is included with the database.
No specific hardware is required to run the advisor.


HCC Testing:

Script to derive the compression ratio using various levels of compression
-----------------------------------------------------------------------------------------------

set serveroutput on
declare

    -- Replace the placeholder values below with the target tablespace, schema,
    -- table and partition name (use NULL if the table is not partitioned).
    m_scratchtbsname    varchar2(32) := '<Tablespace Name>';
    m_ownname       varchar2(32) := '<Schema name>';
    m_tabname       varchar2(32) := '<Table name>';
    m_partname      varchar2(32) := '<Partition Name>';

    -- Pick one compression type; uncomment the level to be estimated.
    m_comptype      number(10,0) := dbms_compression.comp_for_archive_high;
--    m_comptype      number(10,0) := dbms_compression.comp_for_archive_low;
--    m_comptype      number(10,0) := dbms_compression.comp_for_query_high;
--    m_comptype      number(10,0) := dbms_compression.comp_for_query_low;
    m_blkcnt_cmp        pls_integer;
    m_blkcnt_uncmp      pls_integer;
    m_row_cmp       pls_integer;
    m_row_uncmp         pls_integer;
    m_cmp_ratio         number;
    m_comptype_str      varchar2(4000);

begin

    DBMS_COMPRESSION.GET_COMPRESSION_RATIO (
        scratchtbsname  => m_scratchtbsname,
        ownname     => m_ownname,
        tabname     => m_tabname,
        partname    => m_partname,
        comptype    => m_comptype,
        blkcnt_cmp  => m_blkcnt_cmp,
        blkcnt_uncmp    => m_blkcnt_uncmp,
        row_cmp     => m_row_cmp,
        row_uncmp   => m_row_uncmp,
        cmp_ratio   => m_cmp_ratio,
        comptype_str    => m_comptype_str
    );

    dbms_output.put_line('Blocks compressed:           ' || m_blkcnt_cmp);
    dbms_output.put_line('Blocks uncompressed:         ' || m_blkcnt_uncmp);
    dbms_output.put_line('Rows per block compressed:   ' || m_row_cmp);
    dbms_output.put_line('Rows per block uncompressed: ' || m_row_uncmp);
    dbms_output.put_line('Compression Ratio:           ' || m_cmp_ratio);
    dbms_output.put_line('Comment:                     ' || m_comptype_str);

end;
/


HCC Results and Explanation

I managed to run the HCC compression advisor script in one of our warehouse databases and obtained compression estimates for some of the older partitions of varying sizes. The estimates and approximate space savings are shown in the table below. Each of the listed tables was run through the advisor with the different compression options. The compression advisor requires a minimum of 1 million rows in a table / partition, so I chose the first partition based on row count and picked the subsequent ones by roughly doubling the row count, giving us estimates for a range of table sizes. I ran the scripts multiple times for the tables below, and this should give us sufficient information to implement HCC.

Each compression technique column shows Comp. Ratio / Approx. Space Saving.

Schema | Table Name  | Partition Name | Row Count  | Archive High   | Archive Low    | Query High      | Query Low
-------|-------------|----------------|------------|----------------|----------------|-----------------|---------------
MART   | CC_TRIAD    | CC_MAY2012     |  1,539,453 | 46.2 / 1.49 GB | 38.5 / 1.48 GB | 28 / 1.47 GB    | 7.2 / 1.31 GB
MART   | CC_CONSUMER | CC_AUG2015     |  8,144,678 | 97.1 / 2.51 GB | 67 / 2.505 GB  | 61.8 / 2.502 GB | 22.9 / 2.43 GB
MART   | CC_AR       | CC_MAY2012     | 16,423,764 | 8.2 / 710 MB   | 6.8 / 690 MB   | 5.4 / 660 MB    | 2.5 / 495 MB
MART   | CC_LOG      | CC_JUL2014     | 30,186,021 | 18.1 / 2.43 GB | 17.5 / 2.39 GB | 12.7 / 2.34 GB  | 4.7 / 2 GB
MART   | CC_AM0C     | CC_AUG2015     | 61,966,728 | 7.5 / 1.1 GB   | 6.9 / 1.09 GB  | 5.3 / 1.03 GB   | 2.6 / 818 MB











- Archival data is data that is treated as read-only and used less frequently for reporting (queried rarely).

- Data warehouse type data undergoes regular data loads and is looked up more frequently for reporting (queried regularly).


I consciously chose older partitions (from past years) for this exercise, as those are the ones that could be safely targeted in the initial round of HCC compression activity. I would need confirmation from the MI application team on whether the older partitions are used only for query look-ups (read-only) or otherwise.

- For archival-type data, our recommendation would be to implement “Archive High” compression to benefit from maximum space savings.
- For data warehouse type data, our recommendation would be to implement “Query Low” compression to benefit from substantial space savings without compromising query performance or data load times.


When it comes to implementing the compression technique against the partitions, it can be done without taking the table / partition offline for queries, and we have two options here.

1)      ALTER TABLE MOVE PARTITION COMPRESS … (preferred method)

This enables compression on the chosen table / partition while keeping the object available for read-only queries, but it prevents any DML transactions on the object until the compression activity is completed. This is the faster of the two methods.
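
As an illustrative sketch (the index name is hypothetical; the table and partition names are taken from the estimates above), option 1 could look like this:

-- Recompress one partition in place; queries continue, DML is blocked until it completes
ALTER TABLE mart.cc_triad MOVE PARTITION cc_may2012 COMPRESS FOR ARCHIVE HIGH;

-- The move marks local index partitions UNUSABLE, so rebuild them afterwards
ALTER INDEX mart.cc_triad_ix REBUILD PARTITION cc_may2012;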

2)      Online table redefinition using DBMS_REDEFINITION


This enables compression on the chosen table / partition whilst the object remains fully available for queries as well as DML transactions. However, this method is slower than the ALTER TABLE MOVE option.
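
A minimal sketch of option 2, assuming an interim table MART.CC_AR_INTERIM has already been created with the same column structure as MART.CC_AR and the desired COMPRESS FOR clause (the interim table name is hypothetical; CC_AR is reused from the estimates above):

declare
    m_errors  pls_integer;
begin
    -- Confirm the table can be redefined online (by default uses the primary key)
    dbms_redefinition.can_redef_table('MART', 'CC_AR');

    -- Start copying the data into the compressed interim table
    dbms_redefinition.start_redef_table(
        uname      => 'MART',
        orig_table => 'CC_AR',
        int_table  => 'CC_AR_INTERIM');

    -- Copy indexes, triggers, constraints and grants onto the interim table
    dbms_redefinition.copy_table_dependents(
        uname      => 'MART',
        orig_table => 'CC_AR',
        int_table  => 'CC_AR_INTERIM',
        num_errors => m_errors);

    -- Swap the two tables; CC_AR now holds the compressed data
    dbms_redefinition.finish_redef_table('MART', 'CC_AR', 'CC_AR_INTERIM');
end;
/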


Enabling HCC with Axiom:




















Enabling HCC with Sun ZFS:


Conclusion:
- Compression technologies can:
  • Save space
  • Improve performance
  • Reduce the cost of storing and managing data
- With HCC, data does not need to be decompressed before being transmitted into memory (the SGA).

- Studies have shown that, on a retail data schema, HCC was able to reduce the required capacity by 19x and improve performance by 6x.
