Posted by: kalisipudi | November 17, 2011


Posted by: kalisipudi | May 1, 2008

Cluster the Clutter

I have been working with Fannie Mae (the failed mortgage giant) these days, as one of the senior members of the Database Architecture team, and came across an interesting proposal. The P&E (Performance and Engineering) team submitted a white paper on something they call ‘Data Co-Location’, apparently referring to clustering data according to how it is used. The solution was partially implemented in a combination of Perl, shell, SQL and PL/SQL, and I was authorized to re-engineer this beast to a meaningful and logical conclusion.

The theory sounded perfect, but the implementation was not child’s play. Complexity aside, the requirements were vague at the lower level. Nevertheless, why fear when I am here, right? Anyway, the juice of this post is not how I got to the solution but what the talk is all about.

In these days of partitioning, parallel processing, EMC disk arrays and striping, does the way we store data really make a difference? Apparently it does. One of E. F. Codd’s principles, set in stone, is that “in a relational database, the order of tuples or columns is not important”. He was absolutely right, but only theoretically. When milliseconds make a difference to your system’s performance, you need to think outside the box, and clustering does exactly that. When data is physically stored in the same order in which it is read, rows that are used together sit in the same blocks, and a query has to touch fewer blocks to fetch them.

Once data has reached a stage where read-only operations dominate manipulations, it makes even more sense to set the cluster right, much like tuning a well-oiled machine. All those Oracle gurus out there know about the ‘Clustering Factor’ and its importance. I can guarantee that the gain in performance outweighs the effort to accomplish the task. When data is ‘ordered’, it shows up favorably in the so-called performance paraphernalia of statistics, explain plans, I/O, buffer gets, cache hits, etc.
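To make the idea concrete, here is a minimal Python sketch of why physical order matters. It is a toy model and not Oracle's actual implementation: the table is just a list of keys in physical order, a "block" is a fixed run of 100 rows, and the function names `clustering_factor` and `blocks_touched` are my own inventions for illustration. The first function mimics the spirit of the Clustering Factor by walking the rows in key order and counting how often the block changes; the second counts how many blocks a range query has to read.

```python
import random


def clustering_factor(row_keys, rows_per_block):
    """Walk the rows in key order and count how often the block changes.

    row_keys[i] is the key of the row stored at physical position i.
    A low count means rows with neighboring keys share blocks.
    """
    positions = sorted(range(len(row_keys)), key=lambda i: row_keys[i])
    switches = 0
    previous_block = None
    for pos in positions:
        block = pos // rows_per_block  # physical block holding this row
        if block != previous_block:
            switches += 1
            previous_block = block
    return switches


def blocks_touched(row_keys, rows_per_block, low, high):
    """Number of distinct blocks read to fetch rows with low <= key < high."""
    return len({i // rows_per_block
                for i, key in enumerate(row_keys)
                if low <= key < high})


ROWS_PER_BLOCK = 100  # assumed block capacity, purely illustrative

random.seed(42)
scattered = list(range(10_000))
random.shuffle(scattered)          # rows stored in arrival order
clustered = sorted(scattered)      # same rows, stored in key order

for name, table in (("scattered", scattered), ("clustered", clustered)):
    print(name,
          "clustering factor:", clustering_factor(table, ROWS_PER_BLOCK),
          "blocks for a 100-key range:",
          blocks_touched(table, ROWS_PER_BLOCK, 2000, 2100))
```

In this toy model the clustered layout gives a clustering factor equal to the number of blocks, and the range query reads a single block, while the shuffled layout drags in many blocks for the same 100 rows. Real systems add caching, indexes and read-ahead on top of this, so treat the numbers as a direction, not a benchmark.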

Think outside the box, cluster the clutter.

Posted by: kalisipudi | September 12, 2007

ETL or ELT?

ETL, or Extract, Transform and Load, has been the norm in the Data Warehousing world since its inception. It became the essence of Data Warehousing, and everything else is built around it. ETL has been both the simplest and the most complex piece of the puzzle whenever IT enthusiasts speak about DWs. Here is my take on it.

ETL in plain English is nothing but Extracting, Transforming and Loading of data. Is it really so? I don’t think so. ETL in my view should have been ELT; unfortunately, ETL has been depicting non-real-world scenarios in most cases. IT systems live and breathe on data, and each byte of data eventually transforms itself into the company’s revenue stream. If a company fails to understand its data, it fails to understand itself. So it is vital that these assets are preserved, and preserved well. With the traditional ETL approach, you lose that essence. When the source is transformed before it is stored, you lose the original. Transformation of data should happen after you load it, not before. With storage costs at an all-time low, companies should focus on retaining data that is immediately available in its purest possible form, and what could beat the ‘source’?

Transforming data means reshaping the source data to fit the requirements and, speaking from the bottom of my heart, most of the time the requirements are ill-defined and often cut short. It is like the road to perdition, and one has to understand the power of the source. Transformation should not be mistaken for translation or deformation of data.

In ETL or ELT, the ‘T’ literally dictates the boundary of the data and its core value. If transformation crosses the boundary of its ‘purview’, it often leads to a mistaken identity for the data, just like looking at the colors of a rainbow through a pair of sunglasses.

ELT, on the other hand, gives the organization an opportunity to retain the source in its original form (do not mistake this for format), preserving each element of it. Transformation should then be applied based on organizational needs. These needs change, and since the source is always available, changing needs simply dictate changing transformations, and companies can adapt to these new nuances of data.
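Here is a small sketch of that idea in Python with the standard `sqlite3` module. Everything in it is made up for illustration: the `raw_loans` table, the sample rows and the two views `loans_v1` and `loans_v2` are hypothetical, and a real warehouse would use its own staging area and tooling. The point is only the shape of ELT: the source lands untouched, the transformation lives in a view, and when the requirements change you write a new view instead of reloading anything.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Extract + Load: land the source rows exactly as they arrived (all text).
conn.execute("CREATE TABLE raw_loans (loan_id TEXT, amount TEXT, state TEXT)")
source_rows = [
    ("L1", " 250000 ", "va"),
    ("L2", "410000", "MD"),
    ("L3", "n/a", "Va"),   # messy row, kept as-is in the raw table
]
conn.executemany("INSERT INTO raw_loans VALUES (?, ?, ?)", source_rows)

# Transform, first requirement: clean numeric amounts only, drop the rest.
conn.execute("""
    CREATE VIEW loans_v1 AS
    SELECT loan_id,
           CAST(TRIM(amount) AS INTEGER) AS amount,
           UPPER(state) AS state
    FROM raw_loans
    WHERE TRIM(amount) GLOB '[0-9]*'
""")

# Transform, changed requirement: keep every loan, amount is NULL if unknown.
conn.execute("""
    CREATE VIEW loans_v2 AS
    SELECT loan_id,
           CASE WHEN TRIM(amount) GLOB '[0-9]*'
                THEN CAST(TRIM(amount) AS INTEGER)
           END AS amount,
           UPPER(state) AS state
    FROM raw_loans
""")

v1 = conn.execute("SELECT * FROM loans_v1 ORDER BY loan_id").fetchall()
v2 = conn.execute("SELECT * FROM loans_v2 ORDER BY loan_id").fetchall()
raw = conn.execute("SELECT * FROM raw_loans ORDER BY loan_id").fetchall()

print("v1:", v1)
print("v2:", v2)
print("raw still intact:", raw == source_rows)
```

Had the first transformation been applied before loading, the `n/a` row would have been gone by the time the second requirement showed up. With the raw table in place, the second view simply reads the same source again.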

Modern-day ETL can be compared to making lumber from a tree. Once the wood is cut, you cannot get the tree back. ELT is like gold. No matter what you make out of it, you can always melt it back to its original form, gold.