5 Pivot-Table: 2-dimensional Representation of Datacube

Datacubes must be represented on sheet of papers for reports or on a screen in order to be viewed by human beings, that is, they must be projected onto a two-dimensional space. We call a two-dimensional representation of a cube pivot-table.


PIC

Figure 14: Pivot Tables by Switching Dimensions Horizontally/Vertically

Datacubes can be represented in many ways on a two-dimensional space. Such representations are shown in Figure 14. The standard view – or database view, is a vertical (normally top-down) listing of all tuples in the datacube in a given order. Another is the horizontal view, other are mixtures. The Figure shows an example is the 3-dimensional cube ai,j,k with i ∈{i1,i2}, j ∈{j1,j2,j3}, and k ∈{k1,k2,k3,k4}. The measurement of the cube is i * j * k. In general for an n-dimensional cube this kind of switching horizontally/vertically the dimensions gives us n + 1 pivot-table representations.

Furthermore, one may consider any permutation order on the dimensions, to obtain n! possibilities of pivot-tables. These permutations are a rich source on projecting the datacube on to a 2-dimensional space. We call this going from one permutation to another pivoting. Hence the name of pivot-table. Three examples of these permutations are shown in Figure 15.


PIC

Figure 15: Pivot Tables by Permuting Dimensions

In the left one, 1 dimension is projected horizontally and the permutation is (i,k,j). The second projects 2 dimensions horizontally and the permutation is (k,j,i). The right one also projects 2 dimensions horizontally and the permutation is (k,i,j).

The aggregated cubes can be viewed by “taking out” one or several dimensions from the original cube. So Figure 16 represents the aggregated cubes bj,k = iai,j,k and ci,k = jai,j,k. The aggregate operator was SUM.


PIC

Figure 16: Two Aggregated Cubes

To summarize: Given any n-dimensional cube and a permutation on the dimension as well as two numbers h and k and an aggregate operator, one can generate any pivot-table of the cube or one of its aggregated cubes, h being the number of dimensions that are projected horizontally, and k being the number of dimensions of “taken out”. The permutation is than interpreted as follows: project the first dimensions vertically, the h following horizontally and the last k once are “taken out”. For example, the standard view of a 5-dimensional cube would be given as: perm = (1, 2, 3, 4, 5),h = 0,k = 0. The two pivot-tables in Figure 16 are defined by:

perm  = (2,3,1),h = 1, k = 1,op = sum    and    perm  = (1,3,2),h =  1,k = 1,op = sum

Furthermore, ordering can be taken into account. Each cell in the datacube is determined by the tuple of its members. Hence, the order in which the tuples are given does not matter. However, one could exploit this freedom to impose a specified order on each dimension’s member list. The order than imposed the sequence in which the cells are listed in a particular pivot-table. Any permutation order on the members on each dimension gives a particular pivot-table. The ordering is easy to specify: Given an initial order of the members, one only need to attach a permutation vector to each dimension.

Another operation in showing a cube as a particular pivot-table is selection. One can dice a cube first and then display the diced cube as a pivot-table. Another way to view this is to attach a Boolean on each member, and set its value to TRUE if the particular member should be displayed in the pivot-table and FALSE else wise. Hence, each dimension needs a boolean vector of the size of the dimension to specify a selection.

The augmented cube can also be integrated easily into the pivot-table presentations. In the terminology we used earlier, we first size the cube with its aggregates and then show the sized cube as a pivot-table. Another way again is to integrate this information into the displaying operations: given a cube, we add (1) a permutation, (2) define a h and k and an (3) an aggregate-op.

Normally however, there is no need to compute all aggregated cubes in order to display them in a particular pivot-table. Let’s explain this in the example of a 2-dimensional and then of a 3-dimensional cube.


PIC

Figure 17: A Pivot Table with Aggregated Cubes

Figure 17 shows a cube of dimension 2 on the left side a pivot-table with h = 1. All three aggregated cubes (a*, *b, **) are visible and attached as last row and last column. All aggregated cubes are needed (ab, a*, *b, and **), see Figure 18. The middle pivot-table with h = 0 displays also all aggregates. However, one could argue that in the last four rows only the last is needed (the total of all totals), hence, the aggregated cube *b is not needed. A more “natural way” to display the pivot-table would be the picture on the right.


PIC

Figure 18: A Pivot Table with Aggregated Cubes

For a 3-dimensional cube with h=0 only the aggregates abc, ab*, a**, *** (see Figure 18) are needed, with h = 1, we need abc, ab*, a**, ***, and a*b, **c. The rule is the following: We need all aggregates along maximally two paths from abc to *** in the lattice. It is easy to find them. If we have a 5-dimensional cube, for example, with the dimensions abcde (the top node of the lattice) then the dimensions are partitioned into the vertically and horizontally displayed dimensions for a particular pivot-table (say h = 2, then we have abc|de). The two aggregates needed – following the paths in the lattice – are : ab*|de and abc|d*. Hence, following the path in which the stars (*) are filled from right to left beginning with the very last entry and beginning with the entry left to |. Following the path down to ***** this way, generates to paths and the corresponding aggregates to calculate and integrate into the pivot-table are given by collecting them in the two paths. Now we also see, why in the cases of h = 0 (all vertically) and h = n - k (all horizontally displayed) a single path is needed only. It is easy to see, why this works in general.

Formatting: A pivot-table is – first of all – a 2-dimensional representation of a datacube. The parts of the tables can be thought to be “printed” in rows and columns, in vertically/horizontally arranged cells as shown in Figure 19 where the parts are just displayed in the grid without formatting.


PIC

Figure 19: Unformatted Pivot Table

However, not all cells in the grid have the same meaning. While some “cells” in the pivot-table display the name of the dimensions, others display the data. Basically, a pivot-table can be partitioned into 4 sections: (1) a header, where the dimensions are displayed, (2) the member names of the dimensions to identify a row and a column (3) the aggregates which are "SUM" rows/columns, and (4) the data part. The formatting of these sections is independent from the layout and we are free to enforce visually by colors and other attributes the different sections. An example is shown in Figure 20. Different formatting could be chosen.


PIC

Figure 20: Formatted Pivot Table

Another kind of formatting is data formatting: (1) the data can be formatted along a mask like ##.### (with three decimals, if they are numbers), (2) certain data can be shown in a different color (for example if they are negative), (3) The data can be shown as percent of a total, or as difference from a given value, etc.

The spreadsheet software Excel offers the functionality of pivot-tables. But it is a somewhat neglected tool and have serious disadvantages: : (1) The table is a one-way construct, one cannot change any data, (2) certain formats get lost if the table is manipulated by pivoting, (3) it takes quite a time to understand, what you can do, many operations can be done by mans different ways

In the LPL modeling system pivot table manipulation are fully integrated and easy to handle (see user manual of LPL). The LPL modeling system is designed to define and manipulate datacubes. The dimensions must be modeled as SETs and a datacube then is a multi-indexed entity in LPL. For example, to define a 3-dimensional cube one needs the declaration of four entities: 3 SETs, representing the dimensions and a parameter (or a variable, or whatever is indexed).

  set i:=[i1 i2]; j:=[j1 j2 j3]; k:=[k1 k2 k3 k4];
  parameter a{i,j,k} := i*j*k;

The different operations on datacubes can be implemented in various ways depending often from the context.

Slicing:

  parameter b{j,k} := a[’i1’,j,k];

Dicing:

  parameter c{i,j,k|a>=10} := a[i,j,k];

Sizing:

  set h:=[1 i2 i3];
  parameter d{h,j,k|a>=10} := if(h in i, a[h,j,k], sum{i} a[h,j,k];
                                                                                       
                                                                                       

Rising:

  set i:=[1..3]; j:=[a b c]; k:=[1 2 3 a b c]; m:=[1..2];
  parameter a{i}:=i; b{j}:=10*j;
    c{k}:=if(k<=#i,a[k],b[k-#i]);
    d{m,i}:=if(m=1,a[i],b[i]);

The generation of a pivot-table is a function of an extended Write statement. The input information that an algorithm must have to generate a particular pivot-table is:

  1. a datacube

  2. a permutation of the dimensions , h , k, aggregate operator

  3. a order for the members of each dimensions

  4. a Boolean for each member indicating of selecting it or not

  5. Formatting: (1) of the cell: mask, alignment, font, border, pattern, (2) of the value: as percent, as difference etc., (3) depend on an expression (p.e., negative number with another color).