What is “sf4”?

Simple Statistics Can Detect Systematic Errors?

It Detects Systematic Chromatography Errors in GC, HPLC, PLC/HPTLC.
By the way: this time-based statistics concept also works in many other areas of data reproduction.


“sf4” is an abbreviation. It is the repeatability standard deviation “s” calculated from N = 4 consecutively measured data and moved forward in single “one next measurement” steps.
If we have only three repetitions, no “sf4” value can be calculated.
If we have five repetitions, we can calculate two “sf4” values. Twenty repetitions allow 20 - 3 = 17 “sf4” values.
The power of such a series of s-values is that it offers time-related information.
If, for instance, a series of repetitions shows strictly constant “sf4” values, then we can state: everything is OK. The tested series of repetitions shows NO time-correlated effects. In this case “sf4” cannot detect any error.
If, however, the first three or four “sf4” values are equal and small relative to the following ones, and the following values grow or swing, then one must find out the reason(s). Something “goes bad” during this part of the measurement series. If the first “sf4” values are large and the later ones small and finally constant, then the measurements have a starting problem. This means: do not take the early birds; something must stabilize first.

NOTE: a single “s” value from a series of 20...30 repetitions carries NO time-related information. The corresponding mean value has NO quality value.
This is so important that you will find “repetitions” of this statement on this site.
“sf4” calculations for systematic errors are of course done only in instrument tests or procedure quality tests, not in daily routine work.

We control the data quality of repeated quantity measurements such as the peak area found by integration. Final quantities such as the weight-% of one selected substance in a sample can also be used to calculate repeatability standard deviation values. We MUST repeat the measurement at least 7 times (N = 7), better about 20 times (N = 20). Then we get four or up to 17 (N minus 3) quality-controlled quantity data for one selected substance. Moving forward by one run at a time, we always calculate the standard deviation for only four consecutive values. This procedure is named “sf4”: “s” stands for standard deviation, “f” means forwarding, and “4” says that each value is calculated from 4 consecutive runs, moving forward by one run per step. Three values always overlap between neighbouring steps. This brings time-related information into “s” which is not available otherwise - see the procedure structure below:


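    x1...x8: eight repeated measurements, quantitative data
    s (1-4) = standard deviation for the data [x1, x2, x3, x4].
    s (2-5) = standard deviation for the data [x2, x3, x4, x5].
    s (3-6) = standard deviation for the data [x3, x4, x5, x6].
    s (4-7) = standard deviation for the data [x4, x5, x6, x7].
    s (5-8) = standard deviation for the data [x5, x6, x7, x8].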
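
The forwarding scheme can be sketched in a few lines of Python. This is only a minimal sketch, not the spreadsheet procedure used on this site: the function name `sf4` and the parameter `window` are our own choices for illustration, and we assume the sample standard deviation (divisor n - 1), which is what `statistics.stdev` computes.

```python
from statistics import stdev


def sf4(values, window=4):
    """Standard deviation of each block of `window` consecutive values,
    moving forward by one value per step.

    Returns an empty list when there are fewer than `window` values,
    otherwise len(values) - window + 1 results (N - 3 for window = 4).
    """
    if len(values) < window:
        return []
    return [
        stdev(values[i:i + window])          # sample s, divisor n - 1
        for i in range(len(values) - window + 1)
    ]


# Made-up numbers, only to show how many results come back.
demo = [1.0, 2.0, 4.0, 8.0, 16.0]
print(len(sf4(demo[:3])))   # three repetitions: no sf4 value
print(len(sf4(demo)))       # five repetitions: two sf4 values
```

Because neighbouring blocks share three values, a change between two neighbouring results is caused by the one value that left the block and the one that entered it.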

It is important to see the change of the “sf4” values (if any change exists) over the N scale, which in fact corresponds to a time scale because every single chromatogram needs time until the analysis is done. This procedure offers information not yet seen in mathematical statistics by other modes of data analysis. How clear this new type of information is can be seen in figures 2 and 3 below. The fundamental concept is shown graphically in figure 1 below, representing sf4 data based on eight consecutive runs:

xi:     3.240    3.241    3.245    3.255    3.251    3.222    2.994    3.782     mean:     3.2788
sf4:      -        -        -      0.0068   0.0062   0.0148   0.1252   0.3336    s total:  0.2213

N = 8; number of “sf4” values: 8 minus 3 = 5. The single value “s total”, the standard deviation of all 8 runs, shows nothing time-based. Even poorer: a single number has no quality control for itself. Therefore the series of “sf4” values is much more informative than “s total” alone.
From only 5 time-correlated sf4 data one realizes: something happened from measurement number 6 onward. At first the method looked good. Now one must find the source of error. A longer run - more “sf4” values - will probably help to clear up the problem(s). The new run should, however, start only after a waiting period as long as the run of 8 repetitions given above.
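
For readers who want to check the eight runs without a spreadsheet, here is a small Python sketch. It assumes the sample standard deviation (divisor n - 1). The rule used to flag a change, "first sf4 value larger than twice the first one", and the name `FACTOR` are purely hypothetical choices for this illustration, not a criterion defined on this site.

```python
from statistics import mean, stdev

# The eight consecutive runs from the table above.
runs = [3.240, 3.241, 3.245, 3.255, 3.251, 3.222, 2.994, 3.782]

# One s value per block of four runs, moving forward by one run.
sf4_values = [stdev(runs[i:i + 4]) for i in range(len(runs) - 3)]
s_total = stdev(runs)

# Hypothetical flagging rule: compare every sf4 value with the first one.
FACTOR = 2.0
baseline = sf4_values[0]
first_suspect_run = next(
    (i + 4 for i, s in enumerate(sf4_values) if s > FACTOR * baseline),
    None,  # no block exceeded the limit
)

print("mean    :", round(mean(runs), 4))
print("s total :", round(s_total, 4))
print("sf4     :", [round(s, 4) for s in sf4_values])
print("first block that exceeds the limit ends with run", first_suspect_run)
```

Block number i (counted from 0) covers runs i + 1 to i + 4, so `i + 4` is the newest run in the flagged block. For these data the flagged block ends with run 6.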


How to get the graphics of figure 1: “sf4” over time (“N-3”)? It is automatically available when using table calculation programs like Microsoft EXCEL. Be aware of the systematic calculation error that occurs when the standard deviation function (STDEV in English-language Excel) is given its cells with the semicolon as delimiter. The correct formula is STDEV(A1:H1) (for N = 8 as given above). The formula STDEV(A1;H1) gives wrong data because cells are excluded unnoticed: in program versions where the semicolon separates arguments, only the two cells A1 and H1 enter the calculation. Unfortunately several other delimiters besides the colon are accepted by Excel without any error warning, which is a serious program code weakness. For the figure above the x-data were taken from positions A1 to D1, B1 to E1, C1 to F1 and so on.
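
The size of this delimiter error can be made visible with a short Python sketch. It assumes that the wrong delimiter hands the spreadsheet function only the first and the last cell of the intended range, as happens where the semicolon separates arguments. The variable names are our own.

```python
from statistics import stdev

# First block of four runs, as if stored in cells A1..D1.
block = [3.240, 3.241, 3.245, 3.255]

# Colon range A1:D1 - all four cells enter the calculation.
s_range = stdev(block)

# Two separate arguments A1 and D1 - only the end points enter.
s_endpoints = stdev([block[0], block[-1]])

print("all four cells :", round(s_range, 4))
print("end points only:", round(s_endpoints, 4))
```

Both results look like plausible standard deviations, which is exactly why the error goes unnoticed. A quick safeguard is to count the cells that enter the formula.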

Practical application example: figure 2 shows data from a seriously improved micro capillary laboratory GC instrument; figure 3 shows data from a micro capillary GC process instrument.

When we measured the concentration of methane in wet natural gas 14 times, the one-step-forward calculation of s from sets of four consecutive methane concentrations resulted in 11 “sf4” values. When we made a series of 30 consecutive methane quantitations, we got 30 minus 3 = 27 “sf4” values.
This series of time-correlated data is calculated and shown graphically as sf4 values on the Y-axis over the number of repeated runs N on the X-axis. Obviously the X-axis represents a time axis - see the figures below.

Normally the analyst who took 30 consecutively repeated runs would calculate one mean value and one standard deviation value. Whilst the mean is a quality control number for the single values, s is a quality control number for the mean. There is no quality control number available for the total s under these conditions. But one understands that the “sf4” values are quality control numbers for the one total standard deviation value based on all N repetitions. An instrument or a method would get a quality certification based on these 30 runs, their mean and their total s-value. But nothing is seen of systematic errors which may be time correlated. If, however, the 30 - 3 = 27 sf4 values show what is seen in figure 3 below, the producer, user, regulator or certifier would very probably change his mind and return to development and quality control procedures. This might be the only “bad” effect of “sf4”. As this is such a new mode of method/instrument/sampling/sample-stability quality control, and as even statistically well-trained experts have problems with “sf4”, it should be stressed why “sf4” is so informative:

Reason A: ONE single “s” value is just a number. However much work was invested in a large N, one single “s” value carries NO information about its own quality. The mean value, however, has a quality control number: the total s based on N. X ± 7 % is a poor X; X ± 0.02 % is a good X.

Reason B: a consecutive series of three or many more correlated “sf4” values shows TIME dependence. It carries TIME-related information. Most systematic errors in chromatography are time correlated, as already mentioned elsewhere on this site. N repeated analytical runs need N times the single runtime. Thus anything that can change with time will change. If such changes take place - like the amount of non-volatile or insoluble “dirt” in a sample inlet system growing towards a critical limit - they will affect the sample composition. This MUST result in changed and now falsified qualitative and/or quantitative analytical results. Thus the analytical result no longer represents the real sample composition. But this is not the only effect. Not all chromatography modes have this problem. In PLC or HPTLC it does not exist, because in PLC every next sample starts chromatography on a (hopefully) clean new stationary phase. Any GC or HPLC column is no longer untouched after the very first run.

These are the real main effects detected by “sf4”:
- Temperature, pressure and flow may change with time and will then change the action of the mobile phase.
- Polarity changes, density changes and film thickness changes of the stationary phase are time correlated in all chromatography modes.
- Changes of the sample composition, qualitative and/or quantitative, from run to run are nearly always a source of systematic errors. The reasons are the fundamental chromatography effect on any solid (or liquid) surface and the very often existing “dead volumes”, which cause long-lasting sample remixing effects. Thus the situation may look constant and stable after a sample has been given three times, but the “sf4” data detect even the smallest composition changes very sensitively. They may tell that “sf4” data still change even after twenty repetitions - which means that something invisible to other modes of data quality checks exists and should be found. What, then, is the quality of only one or two runs?
“sf4” immediately found systematic errors in capillary GC, in micro gas capillary process instruments, in HPTLC scanners, in differing sampling procedures, in the use of pressure regulators or large volume needle valves, and in sample lines which were too long, too wide, too chemically active or of non-constant temperature.

The whole procedure is simple and fast when using table calculation programs with graphics display. The latter is necessary for a correct evaluation - see the two examples below.
Repeated WARNING: older EXCEL software (Microsoft Excel 2003), the corresponding table calculation software on the Mac under MacOS 9 and higher (AppleWorks 7) and under LINUX (OpenOffice.org 1.1) - all three - calculate “s” data seriously wrong, without any warning, if delimiters other than the COLON are used. Other delimiters reduce the set of table data actually taken into the calculation without warning.




The following figure 2 represents an example with sampling problems. Only after a total of 9 consecutive runs is the true quantitative composition found correctly. The “sf4” values drop from ± 0.06 mole-% to below ± 0.01 %. Using “sf4” we could correct the sampling trouble so completely that we finally reached a repeatability standard deviation of ± 0.002 % for methane in wet natural gas. (No misprint!)

[Figure 2: sf4 (mole-%) over N - methane in wet natural gas, Lab micro GC. X-axis: N-3 (number of repeated analyses minus 3)]

Error analysis based upon “sf4” over N (or N-3) values:
If there is NO error visible by the “sf4” mode - that is: all sf4 data are more or less equal - the repeated analytical data remained constant over the whole repetition time. Thus nothing has changed with time: neither the sample composition nor the sampling procedure nor the instrument / column-capillary / detector characteristics, and all electronics remained stable. That means no temperature, pressure or flow change is visible which could alter the quantitative (and qualitative) analytical values. If the “sf4” data are constant, then their value roughly equals the total repeatability standard deviation based on N runs.
IF the data look like those in figure 2 above, the quite often experienced “standard trouble”, there is a time-correlated problem to check for. There are errors: any change of the sample source which happens within a period of fewer than 10 (ten!) consecutive runs will be seriously falsified if the sample composition changes. This serious problem may be caused by too small a product flow, too large a sample inlet volume, too large dead volumes ahead of the sample inlet point, and many other reasons for sample falsification, including falsification caused by surface chromatography.

Changing “sf4” values are visible quantitatively and, best of all, in the graphics display which is drawn automatically by any qualified table calculation software. In the following, Microsoft EXCEL has been used, and special care was taken to have all calculations done exclusively with the delimiter “:”, the COLON.

The following figure 3 represents an example with hardware problems in a newly designed micro GC instrument. The data represent a test run with 30 repeated calibration analyses using test cylinder gas. After about six consecutive runs showing “sf4” data of about ± 0.008 to 0.01 mole-% standard deviation, the instrument becomes unstable. sf4 rises to a multiple of the values at the start of the test run, where sf4 looked good for at least 9 to 10 runs.
NOTE: the X-axis in figure 3 shows N-3 values.

[Figure 3: sf4 for methane (mole-%) in wet natural gas, Process micro GC. X-axis: N-3 (number of repeated analyses minus 3)]

What to do with data like those seen in figure 3?

Return to the basics. Use sf4 after each step of method, application and instrument improvement.
It will pay back.
This procedure allowed us to reduce the repetition number N from 4 to 3 in mass routine analysis, still with s-values of around ± 0.0015 mole-% absolute for the main compound. This shows how incredibly good quantitative modern micro capillary GC can be. It was important to make the sampling process error free, which was possible only by total micro dimensioning. This allowed us to avoid any pressure regulator. It is technically possible; we found the solution.
Of course, traces of substances in wet natural gas at concentrations far below 0.1 mole-% easily showed several % relative standard deviation. Any integration software has limits. Regulation rules, however, which insist that all data, independent of their absolute concentration, have relative standard deviation values (RSD values) below ± 1 % only show that there is some limit to the knowledge of mathematical statistics at the regulating laboratories or tables.
The same level of knowledge can exist in the certification of calibration mixtures. In some countries it has to be guaranteed that, independent of the substance concentration, the certified accuracy (accuracy!) in a product mixture is 0.1 % relative.

Error detection for which the use of “sf4” is not necessary

One national metrological institute found, by gas chromatography of a simple gas mix of 5 stable substances all far above any critical concentration level, perfectly constant and equal repeatability standard deviation values. The five “s” values belonged to five different substances. All “found” standard deviation values were 1.00 %, 1.00 %, 1.00 %, 1.00 %, 1.00 % (relative).
This of course is so far from practice that one can fear the analytical job was done by certified standard robots.
We should remember that national metrological institutes regulate, standardize and certify. They have quite some power. The free internet is a good place to discuss such problems globally, as correct analytical data are vital for all of us, or at least for a majority.
