Direct naar content

Impact of Data Modeling Often Underestimated

With our clients, it often comes up quickly during a QuickScan or HealthCheck. A data model that can be improved. We’re not just talking about applying the principles of standardization, but also the actual technical data model.

One indication of a “heavy” technical data model is memory pressure. The database system eats up memory. In the most ideal case, there is enough memory for the entire database, data and indexes. In many cases this is not possible, make sure that at least the clustered indexes and then the non-clustered indexes fit into memory.

Martijn Wallet

Principal DBA Consultant
Ellert van Koperen

Indexing

Regarding indexing, it is important that clustered indexes be narrow (smallest possible storage) and preferably incremental (increasing). This prevents sorting when adding each new record. Rather an integer than a biginteger, and rather an integer than a varchar(…).

If there is no suitable data type in the table then the advice is add an automatically generated incrementing integer field at the beginning of the table. The choice to use it or not in the future is independent of this decision. The option of not creating a clustered index is unwise because it runs the risk that not every record (every line) will be uniquely identifiable.

The following steps can be distinguished when applying an indexing strategy:

  • establish the primary key, which is purely and simply intended as a logical primary key for integrity checking of unique records;
  • establishing the clustered index, which is not necessarily the same as the primary key.
  • indexing for integrity checking:
  • unique columns or combinations of columns;
  • foreign key columns;
  • other useful indexes for performance.

Data types

De keuze van datatypen dient zorgvuldig te worden gemaakt. Wanneer bijvoorbeeld huisnummers moeten worden opgeslagen kan doorgaans worden volstaan met een smallint of int en zorgt een bigint voor verspilling van ruimte. Dit geldt voor alle Wat betreft varchar (varying character) kolommen is dit niet het geval.

 Data type  Range  Storage
 bigint  -2^63 (-9,223,372,036,854,775,808) to 2^63-1 (9,223,372,036,854,775,807)  8 Bytes
 int  -2^31 (-2,147,483,648) to 2^31-1 (2,147,483,647)  4 Bytes
 smallint  -2^15 (-32,768) to 2^15-1 (32,767)  2 Bytes
 tinyint  0 to 255  1 Byte

Order of columns

The order of columns or fields can also materially affect the final performance of the database. It is often overlooked that the specifics of columns (mandatory/non-mandatory) and data type (fixed/variable width) affect the actual storage on the data pages. This will affect DML (data manipulation language) statements. Usually the following order is followed:

  • Primary key columns
  • Foreign key columns
  • Many selected columns
  • Many columns mutated
  • Nullable columns last
  • Few used nullable columns after more used nullable columns

Curious?

With the information above , we hope to have provided some guidelines for a “fast” data model. Would you like to know more? Feel free to contact us.