Operating Profit Indicators Using Robust Regression

Operating profit indicators such as the operating profit margin (μ), defined as the ratio of operating profit to net sales revenue, can vary among enterprises in the same industry:


(1)      μ(i) = { S(i) – C(i) } / S(i)

where S denotes net sales revenue, C denotes cost and expenses, and i = 1, 2, 3, …, N indexes the observations (N is the sample size).

Solve eq. (1) for S, and obtain the “return on cost”  (ROC) equation in which sales revenue is proportional to supply cost:

(2)      S(i) = λ(i) C(i)

where each quotient λ(i) = 1 / (1 − μ(i)) is the operating profit markup per enterprise.
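As a quick check of eqs. (1) and (2), the sketch below computes the margin and the markup for one enterprise and confirms that the markup times cost recovers sales. The sales and cost figures are hypothetical and are not taken from the exhibits.

```python
# Hypothetical figures for one enterprise (illustration only).
sales = 1200.0   # S: net sales revenue
cost = 1080.0    # C: cost and expenses

margin = (sales - cost) / sales   # eq. (1): operating profit margin, mu
markup = 1.0 / (1.0 - margin)     # lambda = 1 / (1 - mu)

print(f"margin = {margin:.4f}, markup = {markup:.4f}")

# eq. (2): S = lambda * C, up to floating-point rounding
assert abs(markup * cost - sales) < 1e-9
```

The markup is simply S / C, so a margin of 10% corresponds to a markup of about 1.11, not 1.10.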

The operating profit markup per product or per service is a romantic ideal deprived of accounting reality because enterprises used as comparables (peer group) to the tested party are multi-product entities reporting to the public consolidated group accounts. E.g., COGS, OPEX (aka XSGA), Depreciation (DFXA), or Amortization (AM) are reported in aggregate (multi-product) accounts; they are not reported per product.

The price markup over cost in eq. (2) is a vintage idea shared by many economists, including Adam Smith (1776), David Ricardo (1818), Augustin Cournot (1838), Karl Marx (1867), Piero Sraffa (1926), Abba Lerner (1934), and Michał Kalecki (1943).

If λ = λ(1) = λ(2) = … = λ(N), which is a standard convergence assumption, we can use ordinary least squares (OLS) or robust regression algorithms to calculate the expected (uniform) slope coefficient or operating profit markup for the combined group or the individual N enterprises:

(3)      S(i) = λ C(i) + U(i)

Random uncertainty (U) is added to eq. (2) to morph into the stochastic eq. (3) because the functional relationship between sales revenue and total cost is not exact. See Hacking (2006) for the origins of probabilistic versus deterministic conceptions of nature and society.

We can obtain the profit margin per enterprise, or per comparable group of enterprises, from the price markup eq. (3) by indirect least squares (ILS), because the operating profit markup and the operating profit margin are related by the coefficient equations:

(4)      λ = 1 / (1 – μ) implies μ = (λ – 1) / λ for λ > 1
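The two steps above (estimate the uniform markup λ from eq. (3), then convert it to a margin with eq. (4)) can be sketched in plain Python. This is a minimal OLS illustration under stated assumptions: the regression is run through the origin because eq. (3) has no intercept, the function names are hypothetical, and the data are made up for the example.

```python
import math

def ols_through_origin(cost, sales):
    """OLS slope of S = lambda * C + U (no intercept) and its standard error."""
    sum_cc = sum(c * c for c in cost)
    slope = sum(c * s for c, s in zip(cost, sales)) / sum_cc
    residuals = [s - slope * c for c, s in zip(cost, sales)]
    dof = len(cost) - 1                       # one estimated coefficient
    sigma2 = sum(r * r for r in residuals) / dof
    return slope, math.sqrt(sigma2 / sum_cc)

def margin_from_markup(markup):
    """eq. (4): mu = (lambda - 1) / lambda."""
    return (markup - 1.0) / markup

# Hypothetical observations (not from the exhibits).
cost = [100.0, 150.0, 200.0, 250.0, 300.0]
sales = [112.0, 163.0, 221.0, 274.0, 331.0]

slope, std_err = ols_through_origin(cost, sales)
margin = margin_from_markup(slope)
print(f"markup = {slope:.4f} (s.e. {std_err:.4f}), ILS margin = {margin:.4f}")
```

The margin is never regressed directly; it is recovered from the estimated slope, which is what makes the estimate indirect.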

Var(U(i)) = σ² is assumed to be constant across the N enterprises in the sample. However, this equal variance assumption is troubling in the presence of outliers. Ordinary least squares (OLS) produces unreliable estimates of the regression coefficients and their standard errors if the sample is contaminated with outliers.

In RoyaltyStat, we built a Robust Regression module, and Exhibits 1, 2, and 3 below show Huber regression results, which are resilient to outliers. See Lawrence & Arthur (1990), Chapter 13 (A Comparison of Regression Estimators), Staudte & Sheather (1990), Chapter 7 (Regression), or Huber & Ronchetti (2009), Chapter 7 (Regression).
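To illustrate why a Huber estimator resists outliers, here is a minimal sketch of Huber regression by iteratively reweighted least squares (IRLS) for the no-intercept model in eq. (3). It is not the RoyaltyStat implementation, which is not described here. The tuning constant k = 1.345, the MAD scale estimate, the function name, and the data (one deliberately contaminated observation) are all assumptions made for the example.

```python
import statistics

def huber_slope(cost, sales, k=1.345, tol=1e-10, max_iter=100):
    """Huber M-estimate of lambda in S = lambda * C + U (no intercept), via IRLS."""
    # Start from the OLS slope, i.e. all weights equal to one.
    slope = sum(c * s for c, s in zip(cost, sales)) / sum(c * c for c in cost)
    for _ in range(max_iter):
        resid = [s - slope * c for c, s in zip(cost, sales)]
        # Robust scale: median absolute deviation, rescaled for normal errors.
        med = statistics.median(resid)
        scale = statistics.median([abs(r - med) for r in resid]) / 0.6745
        if scale == 0.0:
            break
        # Huber weights: 1 for small residuals, shrinking for large ones.
        cut = k * scale
        weights = [1.0 if abs(r) <= cut else cut / abs(r) for r in resid]
        new_slope = (sum(w * c * s for w, c, s in zip(weights, cost, sales))
                     / sum(w * c * c for w, c in zip(weights, cost)))
        converged = abs(new_slope - slope) < tol
        slope = new_slope
        if converged:
            break
    return slope

# Hypothetical data: the markup is close to 1.10 except for the last observation.
cost = [100.0, 150.0, 200.0, 250.0, 300.0, 350.0]
sales = [110.0, 166.0, 219.0, 276.0, 329.0, 600.0]

ols = sum(c * s for c, s in zip(cost, sales)) / sum(c * c for c in cost)
huber = huber_slope(cost, sales)
print(f"OLS slope = {ols:.4f}, Huber slope = {huber:.4f}")
```

On this sample the single contaminated observation pulls the OLS slope well above 1.10, while the Huber slope stays close to the markup of the remaining observations. The exhibits also report an intercept test, which this through-origin sketch omits.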

We estimate eq. (3) defining Total Cost in two accounting variants:

First, Total Cost (Lato) = (COGS + XSGA) + (DP − AM), where COGS is cost of goods sold, XSGA is operating (selling, general & administration) expenses, and DP is the sum of the depreciation of tangible assets (DFXA) and the amortization of acquired intangibles (AM). In Standard & Poor’s Compustat mnemonics, DP = DFXA + AM, so DP − AM = DFXA.

Plugged into eq. (3), Total Cost (Lato) can produce an operating profit markup after “a reasonable allowance for depreciation and amortization.” However, reported DP is a dirty accounting number, especially when it includes impairment charges. A more reliable alternative is to exclude DP before computing operating profit indicators, and then allow a reasonable deduction for depreciation before making a transfer pricing adjustment.

Second, Total Cost (Stricto) = (COGS + XSGA), excluding DFXA and AM. Exhibits 1 and 2 show that the reported allowance for depreciation (even after excluding amortization) may not be reasonable, or at least that DFXA varies substantially among enterprises in the same industry. Exhibit 3 compares visually the results of Exhibits 1 and 2.
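The two cost definitions can be written out as a small sketch. It assumes that the Lato variant adds only the depreciation of tangible assets (DFXA, obtained as DP less AM) to COGS + XSGA; that reading, the class name, and the figures are assumptions for illustration only.

```python
from dataclasses import dataclass

@dataclass
class Accounts:
    cogs: float   # COGS: cost of goods sold
    xsga: float   # XSGA: selling, general & administration expenses
    dp: float     # DP: depreciation and amortization (DFXA + AM)
    am: float     # AM: amortization of acquired intangibles

    def total_cost_stricto(self) -> float:
        # Stricto: operating costs before any depreciation or amortization.
        return self.cogs + self.xsga

    def total_cost_lato(self) -> float:
        # Lato (assumed reading): Stricto plus DFXA, where DFXA = DP - AM.
        return self.cogs + self.xsga + (self.dp - self.am)

# Hypothetical enterprise-year (not from the exhibits).
acc = Accounts(cogs=700.0, xsga=200.0, dp=80.0, am=20.0)
sales = 1000.0

stricto = acc.total_cost_stricto()
lato = acc.total_cost_lato()
print(f"Stricto cost = {stricto}, markup = {sales / stricto:.4f}")
print(f"Lato cost    = {lato}, markup = {sales / lato:.4f}")
```

Because the Lato cost base is larger, its markup is lower than the Stricto markup for the same sales, which matches the direction of the slope coefficients in Exhibits 1 and 2.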

Operating profit indicators such as the operating profit markups, or the ILS derived operating profit margins, obtained from applying robust regression algorithms are reliable, enabling the computation of efficient confidence intervals for the slope coefficients (reliable ranges of profit indicators).
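As a sketch of how a range of profit indicators follows from a slope coefficient, the snippet below takes the "All" row of Exhibit 1 (slope 1.099, standard error 0.001), forms an approximate 95% interval for the markup, and maps its endpoints to margins with eq. (4). The normal critical value z = 1.96 is an assumption (a large-sample approximation), and the function names are hypothetical.

```python
def margin_from_markup(markup):
    """eq. (4): mu = (lambda - 1) / lambda."""
    return (markup - 1.0) / markup

def markup_interval(slope, std_err, z=1.96):
    """Approximate confidence interval for the slope (normal approximation)."""
    return slope - z * std_err, slope + z * std_err

slope, std_err = 1.099, 0.001          # "All" row, Exhibit 1
low, high = markup_interval(slope, std_err)

# eq. (4) is increasing in lambda, so interval endpoints map to endpoints.
margin_low, margin_high = margin_from_markup(low), margin_from_markup(high)

print(f"markup: {slope:.3f} in [{low:.4f}, {high:.4f}]")
print(f"margin: {margin_from_markup(slope):.4f} in [{margin_low:.4f}, {margin_high:.4f}]")
```

Mapping the endpoints works only because the margin is a monotone function of the markup; the resulting margin interval is not symmetric around the point estimate.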

We can infer two useful findings from our empirical illustration using annual financials from big oil multinational enterprises (MNEs): (i) the operating profit markups vary among the peer MNEs in the sample; and (ii) reported DFXA accounts show substantial variations among the sampled enterprises.

From these two empirical findings, we can infer further that (like the precedent set in thin-capitalization rules) the most reliable comparable for the controlled tested party may be the consolidated group’s operating profit indicator (after intra-group account eliminations). Otherwise, the operating profit markups (or the ILS derived operating profit margins) may be more reliable if they are computed before DP (not verified in Exhibits 1, 2, or 3), or if operating profit indicators selected on sound economic grounds are computed using robust regression methods.


Ian Hacking, The Emergence of Probability (2nd edition), Cambridge University Press, 2006. Hacking traces the shift in the dominant paradigm from deterministic to probabilistic (or stochastic) conceptions of nature.

Peter Huber & Elvezio Ronchetti, Robust Statistics (2nd edition), Wiley, 2009.

Kenneth Lawrence & Jeffrey Arthur (editors), Robust Regression, Marcel Dekker, 1990. Like R, the statsmodels library in Python includes several algorithms to compute robust statistics. R and statsmodels contain richer statistical algorithms than the IBM Scientific Subroutines that I used at the University of California at Berkeley in 1979-1980 to program in Fortran the survey data of my Ph.D. (Econ.) dissertation. I operated a screenless IBM 1130 single-user computer system combined with a magnetic tape deck and keypunch machine.

Robert Staudte & Simon Sheather, Robust Estimation and Testing, Wiley, 1990. The authors of this textbook use Minitab macros to supplement internal statistics functions. We built a more agile Huber robust regression algorithm into RoyaltyStat.

Exhibit 1: Sales v. Total Cost (Lato), Robust Regression

| GVKEY | Company Name | Count | Slope Coef. | Std. Err. | t-Stat | R² (%) | Intercept at 5% |
|-------|--------------|-------|-------------|-----------|--------|--------|-----------------|
| 2410 | BP PLC | 59 | 1.068 | 0.003 | 317.842 | 99.640 | Significant |
| 2991 | Chevron Corp | 62 | 1.131 | 0.004 | 324.640 | 99.380 | Significant |
| 8549 | ConocoPhillips | 72 | 1.123 | 0.002 | 532.872 | 99.700 | Insignificant |
| 61616 | Eni SpA | 33 | 1.166 | 0.033 | 35.501 | 98.100 | Insignificant |
| 4503 | Exxon Mobil Corp | 72 | 1.116 | 0.004 | 264.331 | 99.410 | Insignificant |
| 7017 | Marathon Oil Corp | 66 | 1.084 | 0.004 | 278.778 | 99.520 | Insignificant |
| 12384 | Shell Plc | 40 | 1.079 | 0.008 | 127.807 | 99.490 | Insignificant |
| 24625 | TotalEnergies SE | 33 | 1.159 | 0.014 | 85.565 | 99.370 | Insignificant |
| 15247 | Valero Energy Corp | 43 | 1.039 | 0.001 | 1114.045 | 99.920 | Insignificant |
| | All | 493 | 1.099 | 0.001 | 912.865 | 99.520 | Insignificant |

Exhibit 2: Sales v. Total Cost (Stricto), Robust Regression

| GVKEY | Company Name | Count | Slope Coef. | Std. Err. | t-Stat | R² (%) | Intercept at 5% |
|-------|--------------|-------|-------------|-----------|--------|--------|-----------------|
| 2410 | BP PLC | 59 | 1.113 | 0.0033 | 332.836 | 99.710 | Significant |
| 2991 | Chevron Corp | 62 | 1.222 | 0.0051 | 240.659 | 99.780 | Significant |
| 8549 | ConocoPhillips | 72 | 1.176 | 0.0020 | 593.169 | 99.430 | Significant |
| 61616 | Eni SpA | 33 | 1.284 | 0.0337 | 38.152 | 98.310 | Insignificant |
| 4503 | Exxon Mobil Corp | 72 | 1.177 | 0.0042 | 282.376 | 99.670 | Insignificant |
| 7017 | Marathon Oil Corp | 66 | 1.126 | 0.0048 | 232.167 | 99.080 | Significant |
| 12384 | Shell Plc | 40 | 1.118 | 0.0083 | 135.054 | 99.670 | Significant |
| 24625 | TotalEnergies SE | 33 | 1.240 | 0.0079 | 156.418 | 99.640 | Significant |
| 15247 | Valero Energy Corp | 43 | 1.050 | 0.0019 | 559.532 | 99.920 | Insignificant |
| | All | 493 | 1.162 | 0.0015 | 768.774 | 99.500 | Significant |

Exhibit 3: ROC Regression Slope

[Chart: ROC regression slope coefficients from Exhibits 1 and 2]