Friday, September 2, 2016

Exadata - Hybrid Columnar Compression

Why HCC?

Hybrid Columnar Compression (HCC) is a method for organizing data within a set of database blocks.
A logical construct called the compression unit (CU) is used to store a set of HCC-compressed data.
HCC uses a choice of compression algorithms: LZO, ZLIB, and BZIP2. On the compute node, only the optimizer is aware of the compression.

The Hybrid Columnar Compression option is available only for tables residing on Exadata storage, Axiom, SPARC SuperCluster, or Sun ZFS storage. It is useful for data that is bulk loaded and then queried.

How to Save Maximum Storage with HCC?

To maximize storage savings with Hybrid Columnar Compression, data must be loaded using data warehouse bulk loading techniques. 
Examples of bulk load operations commonly used in data warehouse environments are: 

- Insert statements with the APPEND hint 
- Parallel DML
- Direct Path SQL*LDR
- Create Table as Select (CTAS)
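
As a minimal sketch (the table names SALES_HCC and SALES_STAGE are hypothetical), a direct-path bulk load into an HCC-compressed table could look like this:

-- Create a compressed table and populate it in one direct-path operation (CTAS)
CREATE TABLE sales_hcc
COMPRESS FOR QUERY HIGH
AS SELECT * FROM sales_stage;

-- Subsequent loads can use a direct-path insert via the APPEND hint
INSERT /*+ APPEND */ INTO sales_hcc
SELECT * FROM sales_stage;
COMMIT;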

Having data in the table that is a mix of OLTP and HCC compression does not create any specific overhead for a range scan (which implies an index range scan and rowid fetches from the table). It does mean that the data will not be as compressed as it could be, so it may be worthwhile to use ALTER TABLE ... MOVE to recompress it, as shown in the sketch below.
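
For example (the table and index names here are hypothetical), recompressing such a table could be done with:

-- Rewrite the table so all rows are stored in fresh HCC compression units
ALTER TABLE sales_hcc MOVE COMPRESS FOR QUERY HIGH;

-- ALTER TABLE ... MOVE marks existing indexes UNUSABLE, so rebuild them afterwards
ALTER INDEX sales_hcc_pk REBUILD;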

Advantages and Downsides of HCC:

Queries on hybrid columnar compressed data often run in the Exadata storage cells with Smart Scans, using a high-performance query engine that utilizes special columnar processing techniques. Data sent back to the database server(s) is usually compressed (and is typically much less data than is read from disk), and the compressed data is subsequently processed by the database server(s). Note that data remains compressed not only on disk, but also in the Exadata Smart Flash Cache, over InfiniBand, and in the database server buffer cache, as well as when doing backups or log shipping to Data Guard.

One of the key benefits of the hybrid columnar approach is that it provides both the compression and performance benefits of columnar storage without sacrificing the robust feature set of the Oracle Database. For example, while optimized for scan-level access, Oracle is still able to provide efficient row-level access, with entire rows typically being retrieved with a single I/O, because row data is self-contained within compression units. 

In contrast, pure columnar formats require at least one I/O per column for row-level access. With data warehousing tables generally having hundreds of columns, it is easy to see the performance benefits of Hybrid Columnar Compression on Exadata. Further, tables using Hybrid Columnar Compression on Exadata still benefit from all of the high availability, performance, and security features of the Oracle Database. 

Note that while data in Hybrid Columnar compressed tables can be modified using conventional Data Manipulation Language (DML) operations - INSERT, UPDATE, DELETE - performing such operations could result in a reduction of the HCC compression ratio. It is recommended that HCC be enabled on tables or partitions with no or infrequent DML operations. If frequent DML operations are planned on a table or partition, then the Oracle Advanced Compression Option is better suited for such data.
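
To see the effect of DML on individual rows, DBMS_COMPRESSION.GET_COMPRESSION_TYPE can report how a given row is currently stored. The sketch below is illustrative only; it reuses the MART.CC_TRIAD table from the results later in this post and simply inspects an arbitrary row:

set serveroutput on
declare
    m_rowid  rowid;
    m_type   number;
begin
    -- Pick an arbitrary row to inspect
    select rowid into m_rowid from mart.cc_triad where rownum = 1;

    -- Returns a code matching the DBMS_COMPRESSION COMP_* constants
    m_type := dbms_compression.get_compression_type('MART', 'CC_TRIAD', m_rowid);

    if m_type = dbms_compression.comp_for_query_high then
        dbms_output.put_line('Row is stored as HCC QUERY HIGH');
    elsif m_type = dbms_compression.comp_nocompress then
        dbms_output.put_line('Row is stored uncompressed (e.g. after conventional DML)');
    else
        dbms_output.put_line('Compression type code: ' || m_type);
    end if;
end;
/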

HCC modes:

Warehouse compression modes

Query Low  - Recommended for data warehouses where data load and query performance time are the critical factors
Query High - Recommended for data warehouses with a focus on space savings

Online archival compression modes (for historical data)

Archive Low  - Recommended for archival data where data load and query performance time are the critical factors
Archive High - Recommended for archival data requiring maximum space savings





How Does HCC Work?

- Traditionally, data is organized within a database block in ‘row’ format, where all the column data for a particular row is stored sequentially within a single database block.









- In real-world situations, data is more often repeated in columns, not rows. 

For example:



- The value “Smith” is repeated multiple times. Compression can be achieved by replacing the repeated “Smith” values with a smaller token, which reduces the size of the data significantly.




Using HCC:

- A column vector is created for each column
- The column vectors are compressed
- The compressed column vectors are stored in data blocks
- The collection of these data blocks is called a Compression Unit (CU)
- The blocks in the CU contain all the columns for a set of rows



Compression Unit:

CU Header: Contains the offsets into the compression unit for each of the columns.
Bitmap: Identifies the rows that have been deleted or migrated to a different location (by an update).
Columns (1..n): Each column consists of the values extracted from the rows, stored in compressed format.
Once Oracle has constructed the CU, the values of each column are stored together within it.
Issuing “alter table t1 move compress for archive high;” might pack the contents of the entire table into a single compression unit.

Compression Syntax:
Online Archival Compression
CREATE TABLE emp (…) COMPRESS FOR ARCHIVE [LOW | HIGH];
Warehouse Compression
CREATE TABLE emp (…) COMPRESS FOR QUERY [LOW | HIGH];
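
Compression can also be declared at the partition level, which is how it is typically applied to partitioned warehouse tables. A hedged sketch (the table, column, and partition names are hypothetical):

CREATE TABLE sales (
    sale_id    NUMBER,
    sale_date  DATE,
    amount     NUMBER
)
PARTITION BY RANGE (sale_date) (
    PARTITION p_hist VALUES LESS THAN (DATE '2015-01-01') COMPRESS FOR ARCHIVE HIGH,
    PARTITION p_curr VALUES LESS THAN (MAXVALUE)          COMPRESS FOR QUERY LOW
);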

Decompression:
Queries that run directly against HCC data do not require the whole data set to be decompressed; only the columns and rows being returned to the client are decompressed in memory.
The decompression process typically takes place on the Oracle Exadata Storage Server in order to maximize performance and offload processing from the database server.

Compression Advisor: 
Oracle Compression Advisor provides an estimate of the compression ratio for a given table or partition.
For databases prior to Oracle Database 11g Release 2, use the DBMS_COMP_ADVISOR package.
From Oracle Database 11g Release 2 onwards, use the DBMS_COMPRESSION package that is included with the database.
No specific hardware is required to run the advisor.


HCC Testing:

Script to derive the compression ratio using various levels of compression
-----------------------------------------------------------------------------------------------

set serveroutput on
declare

    -- Replace the placeholder values below with the target tablespace, schema,
    -- table and partition name (use NULL if the table is not partitioned).
    m_scratchtbsname    varchar2(32) := '<Tablespace Name>';
    m_ownname       varchar2(32) := '<Schema name>';
    m_tabname       varchar2(32) := '<Table name>';
    m_partname      varchar2(32) := '<Partition Name>';

    -- Pick one compression type; uncomment the level to be estimated.
    m_comptype      number(10,0) := dbms_compression.comp_for_archive_high;
--    m_comptype      number(10,0) := dbms_compression.comp_for_archive_low;
--    m_comptype      number(10,0) := dbms_compression.comp_for_query_high;
--    m_comptype      number(10,0) := dbms_compression.comp_for_query_low;
    m_blkcnt_cmp        pls_integer;
    m_blkcnt_uncmp      pls_integer;
    m_row_cmp       pls_integer;
    m_row_uncmp         pls_integer;
    m_cmp_ratio         number;
    m_comptype_str      varchar2(4000);

begin

    DBMS_COMPRESSION.GET_COMPRESSION_RATIO (
        scratchtbsname  => m_scratchtbsname,
        ownname     => m_ownname,
        tabname     => m_tabname,
        partname    => m_partname,
        comptype    => m_comptype,
        blkcnt_cmp  => m_blkcnt_cmp,
        blkcnt_uncmp    => m_blkcnt_uncmp,
        row_cmp     => m_row_cmp,
        row_uncmp   => m_row_uncmp,
        cmp_ratio   => m_cmp_ratio,
        comptype_str    => m_comptype_str
    );

    dbms_output.put_line('Blocks compressed:           ' || m_blkcnt_cmp);
    dbms_output.put_line('Blocks uncompressed:         ' || m_blkcnt_uncmp);
    dbms_output.put_line('Rows per block compressed:   ' || m_row_cmp);
    dbms_output.put_line('Rows per block uncompressed: ' || m_row_uncmp);
    dbms_output.put_line('Compression Ratio:           ' || m_cmp_ratio);
    dbms_output.put_line('Comment:                     ' || m_comptype_str);

end;
/


HCC Results and Explanation

I managed to run the HCC compression advisor script in one of our warehouse databases and obtained compression estimates for some of the older partitions of varying sizes. The estimates and approximate space savings are shown in the table below. Each of the listed tables was run through the advisor with the different compression options. The compression advisor requires a minimum of 1 million rows in a table / partition, so I chose the first partition based on row count and picked the subsequent ones by roughly doubling the row count, giving us estimates for a range of table sizes. I ran the scripts multiple times for the tables below, and this should give us sufficient information to implement HCC.

Each compression technique column shows Comp. Ratio / Approx. Space Saving.

Schema | Table Name  | Partition Name | Row Count  | Archive High   | Archive Low    | Query High      | Query Low
-------|-------------|----------------|------------|----------------|----------------|-----------------|---------------
MART   | CC_TRIAD    | CC_MAY2012     |  1,539,453 | 46.2 / 1.49 GB | 38.5 / 1.48 GB | 28 / 1.47 GB    | 7.2 / 1.31 GB
MART   | CC_CONSUMER | CC_AUG2015     |  8,144,678 | 97.1 / 2.51 GB | 67 / 2.505 GB  | 61.8 / 2.502 GB | 22.9 / 2.43 GB
MART   | CC_AR       | CC_MAY2012     | 16,423,764 | 8.2 / 710 MB   | 6.8 / 690 MB   | 5.4 / 660 MB    | 2.5 / 495 MB
MART   | CC_LOG      | CC_JUL2014     | 30,186,021 | 18.1 / 2.43 GB | 17.5 / 2.39 GB | 12.7 / 2.34 GB  | 4.7 / 2 GB
MART   | CC_AM0C     | CC_AUG2015     | 61,966,728 | 7.5 / 1.1 GB   | 6.9 / 1.09 GB  | 5.3 / 1.03 GB   | 2.6 / 818 MB











- Archival data is data that is treated as read-only and used less frequently for reporting (queried rarely).

- Data warehouse type data undergoes regular data loads and is looked up more frequently for reporting (queried regularly).


I consciously chose older partitions (from past years) for this exercise, as those are the ones that could be safely targeted in the initial round of HCC compression activity. I would need confirmation from the MI application team on whether the older partitions are used only for query look-ups (read-only) or otherwise.

- For archival-type data, our recommendation would be to implement “Archive High” compression to benefit from maximum space savings.
- For data warehouse type data, our recommendation would be to implement “Query Low” compression to benefit from substantial space savings without compromising query performance or data load times.


When it comes to implementing the compression technique against the partitions, it can be done without taking the table / partition offline for queries, and we have two options here.

1)      ALTER TABLE MOVE PARTITION COMPRESS … (preferred method)

This enables compression on the chosen table / partition while keeping the object available for read-only queries, but it prevents any DML transactions on the object until the compression activity is completed. This is the faster of the two methods.
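
As an illustrative sketch (the index name is hypothetical; the table and partition names are taken from the estimates above), option 1 could look like this:

-- Recompress one partition in place; queries continue, DML is blocked until it completes
ALTER TABLE mart.cc_triad MOVE PARTITION cc_may2012 COMPRESS FOR ARCHIVE HIGH;

-- The move marks local index partitions UNUSABLE, so rebuild them afterwards
ALTER INDEX mart.cc_triad_ix REBUILD PARTITION cc_may2012;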

2)      Online table redefinition using DBMS_REDEFINITION


This enables compression on the chosen table / partition whilst the object remains fully available for queries as well as DML transactions. However, this method is slower than the ALTER TABLE MOVE option.
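
A minimal sketch of option 2, assuming an interim table MART.CC_AR_INTERIM has already been created with the same column structure as MART.CC_AR and the desired COMPRESS FOR clause (the interim table name is hypothetical; CC_AR is reused from the estimates above):

declare
    m_errors  pls_integer;
begin
    -- Confirm the table can be redefined online (by default uses the primary key)
    dbms_redefinition.can_redef_table('MART', 'CC_AR');

    -- Start copying the data into the compressed interim table
    dbms_redefinition.start_redef_table(
        uname      => 'MART',
        orig_table => 'CC_AR',
        int_table  => 'CC_AR_INTERIM');

    -- Copy indexes, triggers, constraints and grants onto the interim table
    dbms_redefinition.copy_table_dependents(
        uname      => 'MART',
        orig_table => 'CC_AR',
        int_table  => 'CC_AR_INTERIM',
        num_errors => m_errors);

    -- Swap the two tables; CC_AR now holds the compressed data
    dbms_redefinition.finish_redef_table('MART', 'CC_AR', 'CC_AR_INTERIM');
end;
/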


Enabling HCC with Axiom:




















Enabling HCC with Sun ZFS:


Conclusion:
- Compression technologies can:
  • Save space
  • Improve performance
  • Reduce the cost of storing and managing data
- With HCC, data does not need to be decompressed before being transmitted into memory (the SGA).

- Studies have shown that, on a retail data schema, HCC was able to reduce the required capacity by 19x and improve performance by 6x.
