Big data, artificial intelligence and machine learning have recently become popular buzzwords in medicine. Much effort is put into biomarker discovery at different omics levels. Omics data undoubtedly provide important information for diagnostics and treatment. Nevertheless, one should not forget about the classical lab values. Most methods for the analysis of multivariate data require a normalization of the individual variables so that the results do not depend on measurement units and differing scales. Typical normalization schemes in statistics and machine learning are z-score or min-max normalization, sometimes also involving Box-Cox transformations.
For lab values, additional information is available in the form of reference intervals that, in a certain sense, reflect the range of values within a healthy population. These reference intervals can strongly depend on age and sex, and a normalization should take these effects into account. We propose a reference interval-based normalization and demonstrate that it is superior to the standard normalization techniques that ignore the information provided by reference intervals.
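To make the contrast concrete, the following minimal sketch places the two standard schemes next to one possible reference interval-based scaling. The linear mapping used here (lower limit to 0, upper limit to 1) is an assumption chosen for illustration and is not necessarily the normalization proposed in this work; the function names and the numeric limits are hypothetical and are not clinical values.

```python
import numpy as np


def z_score(x):
    """Standard z-score: subtract the sample mean, divide by the sample SD."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std(ddof=1)


def min_max(x):
    """Standard min-max scaling of the observed values to [0, 1]."""
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())


def reference_interval_scale(x, lower, upper):
    """Illustrative scaling (an assumption, not the paper's formula):
    the lower reference limit maps to 0 and the upper limit to 1.

    `lower` and `upper` may be arrays with one entry per measurement,
    so that age- and sex-specific limits can be supplied per patient.
    """
    x = np.asarray(x, dtype=float)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    return (x - lower) / (upper - lower)


# Two patients with the same raw value but different (hypothetical) limits.
raw = [13.0, 13.0]
lower_limits = [12.0, 14.0]
upper_limits = [16.0, 18.0]
scaled = reference_interval_scale(raw, lower_limits, upper_limits)
```

In this toy example the identical raw value 13.0 is mapped to 0.25 for the first patient (inside the interval) and to -0.25 for the second (below the lower limit), a distinction that z-score and min-max normalization cannot make because they ignore the reference limits.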
Although no reference intervals are available for most omics data, our results can still be transferred to such data by estimating reference intervals from the data themselves, at least when the sample size is sufficiently large.
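One simple way such an estimate could be obtained is sketched below. It assumes that the sample can be treated as a reference population and that the interval is defined as the central 95% of the observed values, which is a common convention but is not specified in the text above; the function name and the `coverage` parameter are hypothetical.

```python
import numpy as np


def estimate_reference_interval(values, coverage=0.95):
    """Nonparametric estimate of a reference interval (illustrative sketch).

    Returns the empirical quantiles that enclose the central `coverage`
    fraction of the sample, e.g. the 2.5th and 97.5th percentiles for 0.95.
    Missing values (NaN) are dropped before the quantiles are computed.
    """
    values = np.asarray(values, dtype=float)
    values = values[~np.isnan(values)]
    tail = (1.0 - coverage) / 2.0 * 100.0
    lower, upper = np.percentile(values, [tail, 100.0 - tail])
    return float(lower), float(upper)
```

Empirical quantiles in the tails are unstable for small samples, which is consistent with the requirement of a sufficiently large sample size. Age- or sex-specific intervals would require applying the estimate separately within each subgroup, further increasing the number of samples needed.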