Scales and Transformation

Notes on Scales and Transformation

This page discusses the concept of the “Transformed Ratio” and why we use it to compare signaling in flow cytometry experiments.

Calculating Distance

Scale transformation is important for making quantitative comparisons of distance in flow cytometry.  If you’re just quantifying “percent positive” or calculating median fluorescence intensity (MFI), scale transformations are less important because both of those metrics are scale independent: a monotonic transformation does not change which events lie above a threshold or which event is the median.  However, if you want to compare the distance between two MFIs (e.g. fold change of stimulated vs. unstimulated, or a graph of basal MFIs for two sample sets), then the scale becomes important.
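As a rough illustration of why these two metrics are scale independent, the sketch below uses made-up event values, a made-up threshold of 1,000, and an arcsinh scale with an assumed cofactor of 150 (dividing the raw value by the cofactor is also an assumption here).  It checks that transforming the median gives the same answer as taking the median of the transformed events, and that percent positive is unchanged.

```python
import math
import statistics

def arcsinh_scale(x, cofactor=150.0):
    # Assumed form of the arcsinh display scale: asinh(x / cofactor).
    return math.asinh(x / cofactor)

# Hypothetical raw intensities for one channel (odd count, so the
# median is a single event rather than an average of two).
events = [-20.0, 5.0, 40.0, 300.0, 2500.0, 9000.0, 40000.0]
threshold = 1000.0  # hypothetical "positive" threshold

# Median: transform-then-median equals median-then-transform.
median_then_transform = arcsinh_scale(statistics.median(events))
transform_then_median = statistics.median([arcsinh_scale(v) for v in events])

# Percent positive: the same events lie above the threshold on either scale.
pct_raw = sum(v > threshold for v in events) / len(events)
pct_scaled = sum(
    arcsinh_scale(v) > arcsinh_scale(threshold) for v in events
) / len(events)

print(median_then_transform, transform_then_median)
print(pct_raw, pct_scaled)
```

With an even number of events the median averages two values, so the two orders of operation can differ slightly; the odd count here sidesteps that.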

The main rule to keep in mind is that you want one scale for everything: statistical comparisons of distance, plotting data in a histogram, and graphs.  You don’t want to have a biexponential or Logicle scale for the display and then calculate distance on a linear scale.  This is because the distance between two fluorescence intensity values will be different on those two scales.  Much of the time this disconnect does not have a large impact on the perceived result, but when it does it can generate some surprising artifacts.

To get an intuitive sense for this, imagine using a ruler to measure the distance between three fluorescence intensity values on a plot: 2, 200, and 20,000.  On a log10 scale, these three values would be evenly spaced.  However, on a biexponential or arcsinh scale, it is likely that 2 and 200 would be much closer together than 200 and 20,000.  Thus, the biexponential and arcsinh scales are often used to de-emphasize the distance between points near zero that have a large numerical difference (2 vs. 200) but are not significantly different in the experimental context.
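The ruler comparison can be sketched numerically.  The snippet below is only an illustration: the cofactor of 150 and the asinh(x / cofactor) form are assumptions, and a biexponential scale would give different numbers with a similar pattern.

```python
import math

values = [2.0, 200.0, 20000.0]
COFACTOR = 150.0  # assumed arcsinh cofactor, for illustration only

def gaps(transform, points):
    """Distances between consecutive points after applying `transform`."""
    scaled = [transform(p) for p in points]
    return [b - a for a, b in zip(scaled, scaled[1:])]

log_gaps = gaps(math.log10, values)
asinh_gaps = gaps(lambda v: math.asinh(v / COFACTOR), values)

# On log10 both gaps are 2 decades; on arcsinh the first gap
# (2 to 200) is far smaller than the second (200 to 20,000).
print("log10 gaps:  ", [round(g, 2) for g in log_gaps])
print("arcsinh gaps:", [round(g, 2) for g in asinh_gaps])
```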

What is the right scale transformation?  Usually the scale used to visualize your data in plots is selected correctly by the analysis software and should also be used for statistical comparisons, graphing, and modeling.  This may not always be the case, however.  It is critical to get to know your data well, so that you can ensure statistical analysis and modeling results reflect what is apparent when you look at plots.

It is therefore important to be aware of the scale being used for each channel in your plots and to make sure that the scales selected for visualization are appropriate.  If it looks wrong — common artifacts are spikes and holes in the graph near zero — it probably is wrong.  This is another reason why it is extremely important to have all of your data “on scale” and not cut off the data display at 1 or 0 if you have a lot of values below these numbers.  When too many events pile up on the left of your plot you cannot tell whether the scale around zero is appropriate (and often in these cases it is not).

What is Transformed Ratio?

Transformed Ratio is shorthand for calculating fold change on the scale displayed in the plot.  The fold change equation used for transformed ratio depends on each channel’s scale settings.

In flow cytometry, display scales are usually linear, log10, arcsinh, or biexponential (a modified form of arcsinh based on the Logicle display).  The key benefit of using a transformed ratio is that distance calculations on each scale always reflect the appearance of the data in figures, so the statistical comparisons match the visualization.

Where this can get tricky is when channels within the same data file have different scale settings (e.g. FITC vs. PE-Cy7 typically have different scale transformations near zero).  This case is increasingly the norm for newer, digital cytometers, which store the raw data in linear format and then transform the display for visualization.
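One way to picture per-channel scale settings is a small lookup that chooses the fold change equation for each channel.  This is a hypothetical sketch, not any particular software's implementation: the channel names, the `channel_scales` dictionary, the cofactor values, and the asinh(x / cofactor) form are all assumptions, and the biexponential case is left out because its parameters are implementation specific.

```python
import math

def transformed_ratio(x, control, scale, cofactor=150.0):
    """Fold change of x vs. control on the named display scale."""
    if scale == "linear":
        return x / control
    if scale == "log10":
        return math.log10(x) - math.log10(control)
    if scale == "arcsinh":
        # Assumed arcsinh form: asinh(value / cofactor).
        return math.asinh(x / cofactor) - math.asinh(control / cofactor)
    raise ValueError(f"unsupported scale: {scale!r}")

# Hypothetical scale settings for three channels in one file.
channel_scales = {
    "FSC-A": {"scale": "linear"},
    "FITC-A": {"scale": "arcsinh", "cofactor": 150.0},
    "PE-Cy7-A": {"scale": "arcsinh", "cofactor": 50.0},
}

# Same pair of MFIs, different answer depending on the channel's scale.
for channel, settings in channel_scales.items():
    ratio = transformed_ratio(1000.0, 100.0, **settings)
    print(channel, round(ratio, 3))
```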

Linear

If the scale type is linear (x) for a channel (e.g. Forward Scatter), the Transformed Ratio is the fold change with no scale transformation:

x / control

On this scale, “±1.0” is no change, a shift from 1,000 to 100,000 is “+100.0”, and a shift from 100,000 to 1,000 is “+0.01” (sometimes written as “-100.0”).  Linear fold change is not appropriate for channels with values below 1.

Issues with this scale: when decreases are written as negative values (e.g. a fold change of “0.5” written as “-2”), the fold change scale is not continuous, because there is a “gap” in the scale between +1 and -1.  The same change can therefore appear written in two different ways.
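The two ways of writing a linear fold change can be sketched as follows.  The function names are hypothetical, and the signed form is simply the negative reciprocal convention described above.

```python
def linear_fold_change(x, control):
    """Plain ratio on the linear scale; both inputs must be positive."""
    if x <= 0 or control <= 0:
        raise ValueError("linear fold change needs positive values")
    return x / control

def signed_fold_change(ratio):
    """Rewrite a ratio below 1 as a negative reciprocal (0.5 becomes -2)."""
    return ratio if ratio >= 1.0 else -1.0 / ratio

up = linear_fold_change(100_000.0, 1_000.0)    # increase
down = linear_fold_change(1_000.0, 100_000.0)  # decrease

print(up, signed_fold_change(up))
print(down, signed_fold_change(down))
# Signed values never fall strictly between -1 and +1,
# which is the gap in the scale.
```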

Log10

If the scale type is log10(x) for a channel (e.g. CD3 on a FACSCalibur), the Transformed Ratio is fold change on the log10 scale:

log10(x) - log10(control)

On this scale, “0.0” is no change, a shift from 1,000 to 100,000 is “+2.0”, and a shift from 100,000 to 1,000 is “-2.0”.

Issues with this scale: Log10 fold change is not appropriate for channels whose raw values are below 1.
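A minimal sketch of the log10 calculation, with a guard that follows the guidance above by rejecting raw values below 1.  The function name and the `min_value` parameter are hypothetical; mathematically, log10 is only undefined at zero and below, so the cutoff of 1 is a policy choice taken from the text.

```python
import math

def log10_ratio(x, control, min_value=1.0):
    """Fold change on the log10 scale: log10(x) - log10(control)."""
    if x < min_value or control < min_value:
        raise ValueError(
            f"log10 fold change is not appropriate below {min_value}"
        )
    return math.log10(x) - math.log10(control)

increase = log10_ratio(100_000.0, 1_000.0)  # two decades up
decrease = log10_ratio(1_000.0, 100_000.0)  # two decades down
print(increase, decrease)
```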

Arcsinh and biexponential

If the scale type is arcsinh(x) or biexponential(x) for a channel (e.g. CD3 on an LSRII), the Transformed Ratio is fold change on the arcsinh or biexponential scale:

arcsinh(x) - arcsinh(control)

biexponential(x) - biexponential(control)

On an arcsinh scale with a cofactor of 150, “0.0” is no change, a shift from 1,000 to 100,000 is “+4.6”, and a shift from 100,000 to 1,000 is “-4.6”.
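The numbers above can be checked with a short sketch.  It assumes the cofactor divides the raw value before the inverse hyperbolic sine is taken, i.e. asinh(x / cofactor); the function name is hypothetical.

```python
import math

def arcsinh_ratio(x, control, cofactor=150.0):
    """Fold change on an arcsinh scale (assumed form: asinh(v / cofactor))."""
    return math.asinh(x / cofactor) - math.asinh(control / cofactor)

up = arcsinh_ratio(100_000.0, 1_000.0)
down = arcsinh_ratio(1_000.0, 100_000.0)
print(round(up, 1), round(down, 1))  # 4.6 -4.6

# Unlike log10, the calculation is defined for zero and negative values.
print(arcsinh_ratio(-50.0, 20.0))
```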

Note that both biexponential and arcsinh are versions of the inverse hyperbolic sine function (arcsinh).  Arcsinh scaling uses 1 cofactor and biexponential uses 5 cofactors.  In both cases, the cofactors are used in a channel-specific manner to correct for spread in peaks near zero that can occur due to fluorophore spectral properties, autofluorescence, cytometer settings, and compensation.

Issues with this scale: the cofactor can emphasize or de-emphasize changes near zero.  This scale is typically used to de-emphasize noise near zero, but setting the cofactor “too high” can result in a spike of data near zero.  Setting the cofactor “too low” can create an artificial “hole in the data” near zero.
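To see how the cofactor stretches or compresses the region near zero, the sketch below measures the same pair of raw values (2 and 200) on arcsinh scales with three cofactors.  The cofactor values are arbitrary choices for illustration, and the asinh(x / cofactor) form is an assumption.

```python
import math

def arcsinh_distance(a, b, cofactor):
    """Distance between two raw values on an arcsinh scale."""
    return math.asinh(b / cofactor) - math.asinh(a / cofactor)

low, high = 2.0, 200.0
cofactors = [5.0, 150.0, 2000.0]  # arbitrary illustrative values

distances = [arcsinh_distance(low, high, c) for c in cofactors]
for c, d in zip(cofactors, distances):
    print(f"cofactor {c:>6}: distance {d:.2f}")
# A larger cofactor squeezes 2 and 200 together (toward a spike);
# a smaller one spreads them apart.
```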

Because of these issues, biexponential scale implementations usually look at the structure of your data in one plot and use that information to set cofactors.

Why Use Transformed Ratio?

Fold change values returned by the Transformed Ratio are scaled to match the plots: what you see is what you get.  If you printed out your data and measured the distance between two peaks with a ruler, the ruler-measured distances would be directly comparable to the Transformed Ratio values.  When is this not the case?  If you did not use the Transformed Ratio and instead calculated linear fold change (x/control) for a channel whose plots display the data on a biexponential scale, the distance between peaks measured with a ruler (or by your eye) would not match the linear fold change values, especially near zero.
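The mismatch can be illustrated with two pairs of peaks that share the same linear fold change but sit at different places on the axis.  This sketch uses an arcsinh scale with an assumed cofactor of 150 as a stand-in for a biexponential display; the peak values are made up.

```python
import math

COFACTOR = 150.0  # assumed cofactor, for illustration only

def arcsinh_ratio(x, control, cofactor=COFACTOR):
    return math.asinh(x / cofactor) - math.asinh(control / cofactor)

# (control, stimulated) peak positions; both pairs are a 10x linear change.
near_zero = (10.0, 100.0)
far_from_zero = (10_000.0, 100_000.0)

linear_near = near_zero[1] / near_zero[0]
linear_far = far_from_zero[1] / far_from_zero[0]

display_near = arcsinh_ratio(near_zero[1], near_zero[0])
display_far = arcsinh_ratio(far_from_zero[1], far_from_zero[0])

print("linear fold change:", linear_near, linear_far)
print("distance on plot:  ", round(display_near, 2), round(display_far, 2))
# The linear values are identical, but on the plot the pair near zero
# is much closer together than the pair far from zero.
```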