I would like to ask about the different uses of column-oriented and row-oriented formats in a DBMS, and the pros and cons of each.
I built a TPC-C query engine with a row-oriented format as a college project. In my opinion, the row format is the more intuitive idea, since we can simply store all the data of a row as one object and collect those objects in a container, e.g. std::vector or std::unordered_map.
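To make the idea concrete, here is a minimal sketch of what I mean by a row-oriented store. The names EmployeeRow and RowStore are hypothetical, the fields are borrowed from the example further down, and this is only an illustration under those assumptions, not the code of my actual project (it needs C++17):

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical row type: one object holds every attribute of a record.
struct EmployeeRow {
    std::int32_t id;
    std::string last_name;
    std::string first_name;
    std::int32_t salary;
};

// Hypothetical row store: whole rows kept in a hash map keyed by id.
class RowStore {
public:
    // Inserts a row, replacing any existing row with the same id.
    void insert(EmployeeRow row) {
        const std::int32_t key = row.id;  // read the key before moving the row
        rows_.insert_or_assign(key, std::move(row));
    }

    // Point lookup: returns a copy of the whole row, or nullopt if absent.
    std::optional<EmployeeRow> find(std::int32_t id) const {
        const auto it = rows_.find(id);
        if (it == rows_.end()) return std::nullopt;
        return it->second;
    }

private:
    std::unordered_map<std::int32_t, EmployeeRow> rows_;
};
```

A lookup by id returns all attributes of the record at once, which is why this layout feels natural for transaction-style access.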
The reason I'm curious is that, from what I have read, the column format seems to be more commonly used in practice. Here are some starting points for the discussion:
- What are the respective advantages and disadvantages of these two formats with regard to DBMS design?
- In which cases is the column format or the row format the right choice? (I read that the column format is better for OLAP workloads and the row format is better for OLTP. Is that true, and why?)
- How can the column format be implemented effectively? Some description would be nice.
A concrete example for the discussion:
Row-oriented format:
001: 10, Smith, Joe, 40000;
002: 12, Jones, Mary, 50000;
003: 11, Johnson, Cathy, 44000;
004: 22, Jones, Bob, 55000;
Column-oriented format:
10: 001, 12: 002, 11: 003, 22: 004;
Smith: 001, Jones: 002, Johnson: 003, Jones: 004;
Joe: 001, Mary: 002, Cathy: 003, Bob: 004;
40000: 001, 50000: 002, 44000: 003, 55000: 004;
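For comparison, the column-oriented layout of the same four records could be sketched roughly as follows. This is a simplification under my own assumptions: EmployeeColumns, total_salary and example_table are hypothetical names, and instead of storing an explicit row id next to each value as in the listing above, the position inside each vector stands in for the row id (001 is index 0, and so on).

```cpp
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <string>
#include <utility>
#include <vector>

// Hypothetical column store: one contiguous vector per attribute.
// The index into each vector plays the role of the row id.
struct EmployeeColumns {
    std::vector<std::int32_t> id;
    std::vector<std::string> last_name;
    std::vector<std::string> first_name;
    std::vector<std::int32_t> salary;

    // Appending one logical row means appending to every column.
    void append(std::int32_t i, std::string last, std::string first,
                std::int32_t s) {
        id.push_back(i);
        last_name.push_back(std::move(last));
        first_name.push_back(std::move(first));
        salary.push_back(s);
    }

    std::size_t size() const { return id.size(); }
};

// Aggregate query: reads only the salary column, nothing else.
inline std::int64_t total_salary(const EmployeeColumns& t) {
    return std::accumulate(t.salary.begin(), t.salary.end(), std::int64_t{0});
}

// Loads the four records from the example above.
inline EmployeeColumns example_table() {
    EmployeeColumns t;
    t.append(10, "Smith", "Joe", 40000);
    t.append(12, "Jones", "Mary", 50000);
    t.append(11, "Johnson", "Cathy", 44000);
    t.append(22, "Jones", "Bob", 55000);
    return t;
}
```

In this sketch an aggregate such as total_salary scans a single contiguous array, whereas inserting or reading one complete record has to touch all four vectors.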
I hope the example is helpful for the discussion.
This question is not about any particular existing DBMS. It is a discussion of which format should be chosen when designing a DBMS.