Disney BSSRDF, sample scattering profile, lower

Percentage Accurate: 61.5% → 99.3%
Time: 24.8s
Alternatives: 15
Speedup: 2.8×

Specification

\[\left(0 \leq s \land s \leq 256\right) \land \left(2.328306437 \cdot 10^{-10} \leq u \land u \leq 0.25\right)\]
\[s \cdot \log \left(\frac{1}{1 - 4 \cdot u}\right) \]
(FPCore (s u)
  :precision binary32
  :pre (and (and (<= 0.0 s) (<= s 256.0))
     (and (<= 2.328306437e-10 u) (<= u 0.25)))
  (* s (log (/ 1.0 (- 1.0 (* 4.0 u))))))
float code(float s, float u) {
	return s * logf((1.0f / (1.0f - (4.0f * u))));
}
real(4) function code(s, u)
use fmin_fmax_functions
    real(4), intent (in) :: s
    real(4), intent (in) :: u
    code = s * log((1.0e0 / (1.0e0 - (4.0e0 * u))))
end function
function code(s, u)
	return Float32(s * log(Float32(Float32(1.0) / Float32(Float32(1.0) - Float32(Float32(4.0) * u)))))
end
function tmp = code(s, u)
	tmp = s * log((single(1.0) / (single(1.0) - (single(4.0) * u))));
end

Local Percentage Accuracy by Input

The average percentage accuracy by input value. The horizontal axis shows the value of an input variable; the variable is chosen in the title. The vertical axis is accuracy; higher is better. Red represents the original program, while blue represents Herbie's suggestion. These can be toggled with buttons below the plot. The line is an average, while dots represent individual samples.

Accuracy vs Speed

Herbie found 15 alternatives:

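Alternative   Accuracy   Speedup
1             99.3%      1.0×
2             98.4%      0.7×
3             98.3%      0.7×
4             98.1%      0.8×
5             97.8%      0.8×
6             96.9%      0.9×
7             86.9%      1.3×
8             86.7%      1.6×
9             86.4%      1.6×
10            73.8%      2.8×
11            73.6%      2.8×
12            22.0%      2.6×
13            20.8%      4.8×
14            20.3%      4.0×
15            16.5%      19.4×
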
The accuracy (vertical axis) and speed (horizontal axis) of each alternative. Up and to the right is better. The red square shows the initial program, and each blue circle shows an alternative. The line shows the best available speed-accuracy tradeoffs.
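
The tradeoff line can be approximated from the headline numbers alone. The following is a minimal sketch in C, assuming the rounded accuracy and speedup figures from the alternative headings in this report and a plain dominance test (at least as good on both axes, strictly better on one). Herbie computes its own frontier from internal cost and error estimates, so this is only an approximation. The names Alt, ALTS, dominates, and pareto_front are hypothetical.

```c
#include <stddef.h>

/* One alternative: accuracy in percent, speedup relative to the initial program. */
typedef struct {
    int id;
    double accuracy;
    double speedup;
} Alt;

/* Rounded figures copied from the alternative headings in this report. */
static const Alt ALTS[] = {
    {1, 99.3, 1.0},  {2, 98.4, 0.7},  {3, 98.3, 0.7},  {4, 98.1, 0.8},
    {5, 97.8, 0.8},  {6, 96.9, 0.9},  {7, 86.9, 1.3},  {8, 86.7, 1.6},
    {9, 86.4, 1.6},  {10, 73.8, 2.8}, {11, 73.6, 2.8}, {12, 22.0, 2.6},
    {13, 20.8, 4.8}, {14, 20.3, 4.0}, {15, 16.5, 19.4},
};
#define N_ALTS (sizeof ALTS / sizeof ALTS[0])

/* a dominates b if it is no worse on both axes and strictly better on one. */
static int dominates(const Alt *a, const Alt *b) {
    return a->accuracy >= b->accuracy && a->speedup >= b->speedup &&
           (a->accuracy > b->accuracy || a->speedup > b->speedup);
}

/* Writes the ids of non-dominated alternatives to out (capacity >= N_ALTS)
   in listing order and returns how many were written. */
size_t pareto_front(int *out) {
    size_t count = 0;
    for (size_t i = 0; i < N_ALTS; i++) {
        int dominated = 0;
        for (size_t j = 0; j < N_ALTS && !dominated; j++) {
            if (j != i && dominates(&ALTS[j], &ALTS[i])) {
                dominated = 1;
            }
        }
        if (!dominated) {
            out[count++] = ALTS[i].id;
        }
    }
    return count;
}
```

With the figures listed in this report, the non-dominated alternatives under this test are 1, 7, 8, 10, 13, and 15.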

Initial Program: 61.5% accurate, 1.0× speedup

\[\left(0 \leq s \land s \leq 256\right) \land \left(2.328306437 \cdot 10^{-10} \leq u \land u \leq 0.25\right)\]
\[s \cdot \log \left(\frac{1}{1 - 4 \cdot u}\right) \]
(FPCore (s u)
  :precision binary32
  :pre (and (and (<= 0.0 s) (<= s 256.0))
     (and (<= 2.328306437e-10 u) (<= u 0.25)))
  (* s (log (/ 1.0 (- 1.0 (* 4.0 u))))))
float code(float s, float u) {
	return s * logf((1.0f / (1.0f - (4.0f * u))));
}
real(4) function code(s, u)
use fmin_fmax_functions
    real(4), intent (in) :: s
    real(4), intent (in) :: u
    code = s * log((1.0e0 / (1.0e0 - (4.0e0 * u))))
end function
function code(s, u)
	return Float32(s * log(Float32(Float32(1.0) / Float32(Float32(1.0) - Float32(Float32(4.0) * u)))))
end
function tmp = code(s, u)
	tmp = s * log((single(1.0) / (single(1.0) - (single(4.0) * u))));
end

Alternative 1: 99.3% accurate, 1.0× speedup

\[\left(0 \leq s \land s \leq 256\right) \land \left(2.328306437 \cdot 10^{-10} \leq u \land u \leq 0.25\right)\]
\[-1 \cdot \left(s \cdot \mathsf{log1p}\left(-4 \cdot u\right)\right) \]
(FPCore (s u)
  :precision binary32
  :pre (and (and (<= 0.0 s) (<= s 256.0))
     (and (<= 2.328306437e-10 u) (<= u 0.25)))
  (* -1.0 (* s (log1p (* -4.0 u)))))
float code(float s, float u) {
	return -1.0f * (s * log1pf((-4.0f * u)));
}
function code(s, u)
	return Float32(Float32(-1.0) * Float32(s * log1p(Float32(Float32(-4.0) * u))))
end
Derivation
  1. Initial program 61.5%

    \[s \cdot \log \left(\frac{1}{1 - 4 \cdot u}\right) \]
  2. Taylor expanded in u around 0

    \[\leadsto s \cdot \left(u \cdot \left(4 + u \cdot \left(8 + \frac{64}{3} \cdot u\right)\right)\right) \]
  3. Applied rewrites 91.1%

    \[\leadsto s \cdot \left(u \cdot \left(4 + u \cdot \left(8 + 21.333333333333332 \cdot u\right)\right)\right) \]
  4. Applied rewrites 91.1%

    \[\leadsto s \cdot \left(u \cdot \mathsf{fma}\left(\mathsf{fma}\left(u, 21.333333333333332, 8\right), u, 4\right)\right) \]
  5. Taylor expanded in s around 0

    \[\leadsto -1 \cdot \left(s \cdot \log \left(1 + -4 \cdot u\right)\right) \]
  6. Applied rewrites 63.9%

    \[\leadsto -1 \cdot \left(s \cdot \log \left(1 + -4 \cdot u\right)\right) \]
  7. Applied rewrites 99.3%

    \[\leadsto -1 \cdot \left(s \cdot \mathsf{log1p}\left(-4 \cdot u\right)\right) \]
  8. Add Preprocessing

Alternative 2: 98.4% accurate, 0.7× speedup

\[\left(0 \leq s \land s \leq 256\right) \land \left(2.328306437 \cdot 10^{-10} \leq u \land u \leq 0.25\right)\]
\[\begin{array}{l} \mathbf{if}\;4 \cdot u \leq 0.014499999582767487:\\ \;\;\;\;s \cdot \mathsf{fma}\left(u \cdot u, \mathsf{fma}\left(u, 21.333333333333332, 8\right), u \cdot 4\right)\\ \mathbf{else}:\\ \;\;\;\;\log \left(\mathsf{fma}\left(u, -4, 1\right)\right) \cdot \left(-s\right)\\ \end{array} \]
(FPCore (s u)
  :precision binary32
  :pre (and (and (<= 0.0 s) (<= s 256.0))
     (and (<= 2.328306437e-10 u) (<= u 0.25)))
  (if (<= (* 4.0 u) 0.014499999582767487)
  (* s (fma (* u u) (fma u 21.333333333333332 8.0) (* u 4.0)))
  (* (log (fma u -4.0 1.0)) (- s))))
float code(float s, float u) {
	float tmp;
	if ((4.0f * u) <= 0.014499999582767487f) {
		tmp = s * fmaf((u * u), fmaf(u, 21.333333333333332f, 8.0f), (u * 4.0f));
	} else {
		tmp = logf(fmaf(u, -4.0f, 1.0f)) * -s;
	}
	return tmp;
}
function code(s, u)
	tmp = Float32(0.0)
	if (Float32(Float32(4.0) * u) <= Float32(0.014499999582767487))
		tmp = Float32(s * fma(Float32(u * u), fma(u, Float32(21.333333333333332), Float32(8.0)), Float32(u * Float32(4.0))));
	else
		tmp = Float32(log(fma(u, Float32(-4.0), Float32(1.0))) * Float32(-s));
	end
	return tmp
end
Derivation
  1. Split input into 2 regimes
  2. if (*.f32 #s(literal 4 binary32) u) < 0.0144999996

    1. Initial program 61.5%

      \[s \cdot \log \left(\frac{1}{1 - 4 \cdot u}\right) \]
    2. Taylor expanded in u around 0

      \[\leadsto s \cdot \left(u \cdot \left(4 + u \cdot \left(8 + \frac{64}{3} \cdot u\right)\right)\right) \]
    3. Applied rewrites 91.1%

      \[\leadsto s \cdot \left(u \cdot \left(4 + u \cdot \left(8 + 21.333333333333332 \cdot u\right)\right)\right) \]
    4. Applied rewrites 91.3%

      \[\leadsto s \cdot \mathsf{fma}\left(u \cdot u, \mathsf{fma}\left(u, 21.333333333333332, 8\right), u \cdot 4\right) \]

    if 0.0144999996 < (*.f32 #s(literal 4 binary32) u)

    1. Initial program 61.5%

      \[s \cdot \log \left(\frac{1}{1 - 4 \cdot u}\right) \]
    2. Taylor expanded in u around 0

      \[\leadsto s \cdot \left(u \cdot \left(4 + u \cdot \left(8 + \frac{64}{3} \cdot u\right)\right)\right) \]
    3. Applied rewrites 91.1%

      \[\leadsto s \cdot \left(u \cdot \left(4 + u \cdot \left(8 + 21.333333333333332 \cdot u\right)\right)\right) \]
    4. Applied rewrites 91.1%

      \[\leadsto s \cdot \left(u \cdot \mathsf{fma}\left(\mathsf{fma}\left(u, 21.333333333333332, 8\right), u, 4\right)\right) \]
    5. Taylor expanded in s around 0

      \[\leadsto -1 \cdot \left(s \cdot \log \left(1 + -4 \cdot u\right)\right) \]
    6. Applied rewrites 63.9%

      \[\leadsto -1 \cdot \left(s \cdot \log \left(1 + -4 \cdot u\right)\right) \]
    7. Applied rewrites 63.9%

      \[\leadsto \log \left(\mathsf{fma}\left(u, -4, 1\right)\right) \cdot \left(-s\right) \]
  3. Recombined 2 regimes into one program.
  4. Add Preprocessing

Alternative 3: 98.3% accurate, 0.7× speedup

\[\left(0 \leq s \land s \leq 256\right) \land \left(2.328306437 \cdot 10^{-10} \leq u \land u \leq 0.25\right)\]
\[\begin{array}{l} \mathbf{if}\;4 \cdot u \leq 0.014499999582767487:\\ \;\;\;\;u \cdot \mathsf{fma}\left(4, s, u \cdot \left(s \cdot \mathsf{fma}\left(u, 21.333333333333332, 8\right)\right)\right)\\ \mathbf{else}:\\ \;\;\;\;\log \left(\mathsf{fma}\left(u, -4, 1\right)\right) \cdot \left(-s\right)\\ \end{array} \]
(FPCore (s u)
  :precision binary32
  :pre (and (and (<= 0.0 s) (<= s 256.0))
     (and (<= 2.328306437e-10 u) (<= u 0.25)))
  (if (<= (* 4.0 u) 0.014499999582767487)
  (* u (fma 4.0 s (* u (* s (fma u 21.333333333333332 8.0)))))
  (* (log (fma u -4.0 1.0)) (- s))))
float code(float s, float u) {
	float tmp;
	if ((4.0f * u) <= 0.014499999582767487f) {
		tmp = u * fmaf(4.0f, s, (u * (s * fmaf(u, 21.333333333333332f, 8.0f))));
	} else {
		tmp = logf(fmaf(u, -4.0f, 1.0f)) * -s;
	}
	return tmp;
}
function code(s, u)
	tmp = Float32(0.0)
	if (Float32(Float32(4.0) * u) <= Float32(0.014499999582767487))
		tmp = Float32(u * fma(Float32(4.0), s, Float32(u * Float32(s * fma(u, Float32(21.333333333333332), Float32(8.0))))));
	else
		tmp = Float32(log(fma(u, Float32(-4.0), Float32(1.0))) * Float32(-s));
	end
	return tmp
end
Derivation
  1. Split input into 2 regimes
  2. if (*.f32 #s(literal 4 binary32) u) < 0.0144999996

    1. Initial program 61.5%

      \[s \cdot \log \left(\frac{1}{1 - 4 \cdot u}\right) \]
    2. Taylor expanded in u around 0

      \[\leadsto u \cdot \left(4 \cdot s + u \cdot \left(8 \cdot s + \frac{64}{3} \cdot \left(s \cdot u\right)\right)\right) \]
    3. Applied rewrites 91.3%

      \[\leadsto u \cdot \mathsf{fma}\left(4, s, u \cdot \mathsf{fma}\left(8, s, 21.333333333333332 \cdot \left(s \cdot u\right)\right)\right) \]
    4. Taylor expanded in s around 0

      \[\leadsto u \cdot \mathsf{fma}\left(4, s, u \cdot \left(s \cdot \left(8 + \frac{64}{3} \cdot u\right)\right)\right) \]
    5. Applied rewrites 91.3%

      \[\leadsto u \cdot \mathsf{fma}\left(4, s, u \cdot \left(s \cdot \left(8 + 21.333333333333332 \cdot u\right)\right)\right) \]
    6. Applied rewrites 91.3%

      \[\leadsto u \cdot \mathsf{fma}\left(4, s, u \cdot \left(s \cdot \mathsf{fma}\left(u, 21.333333333333332, 8\right)\right)\right) \]

    if 0.0144999996 < (*.f32 #s(literal 4 binary32) u)

    1. Initial program 61.5%

      \[s \cdot \log \left(\frac{1}{1 - 4 \cdot u}\right) \]
    2. Taylor expanded in u around 0

      \[\leadsto s \cdot \left(u \cdot \left(4 + u \cdot \left(8 + \frac{64}{3} \cdot u\right)\right)\right) \]
    3. Applied rewrites 91.1%

      \[\leadsto s \cdot \left(u \cdot \left(4 + u \cdot \left(8 + 21.333333333333332 \cdot u\right)\right)\right) \]
    4. Applied rewrites 91.1%

      \[\leadsto s \cdot \left(u \cdot \mathsf{fma}\left(\mathsf{fma}\left(u, 21.333333333333332, 8\right), u, 4\right)\right) \]
    5. Taylor expanded in s around 0

      \[\leadsto -1 \cdot \left(s \cdot \log \left(1 + -4 \cdot u\right)\right) \]
    6. Applied rewrites 63.9%

      \[\leadsto -1 \cdot \left(s \cdot \log \left(1 + -4 \cdot u\right)\right) \]
    7. Applied rewrites 63.9%

      \[\leadsto \log \left(\mathsf{fma}\left(u, -4, 1\right)\right) \cdot \left(-s\right) \]
  3. Recombined 2 regimes into one program.
  4. Add Preprocessing

Alternative 4: 98.1% accurate, 0.8× speedup

\[\left(0 \leq s \land s \leq 256\right) \land \left(2.328306437 \cdot 10^{-10} \leq u \land u \leq 0.25\right)\]
\[\begin{array}{l} \mathbf{if}\;4 \cdot u \leq 0.014499999582767487:\\ \;\;\;\;\left(s \cdot \mathsf{fma}\left(\mathsf{fma}\left(u, 21.333333333333332, 8\right), u, 4\right)\right) \cdot u\\ \mathbf{else}:\\ \;\;\;\;\log \left(\mathsf{fma}\left(u, -4, 1\right)\right) \cdot \left(-s\right)\\ \end{array} \]
(FPCore (s u)
  :precision binary32
  :pre (and (and (<= 0.0 s) (<= s 256.0))
     (and (<= 2.328306437e-10 u) (<= u 0.25)))
  (if (<= (* 4.0 u) 0.014499999582767487)
  (* (* s (fma (fma u 21.333333333333332 8.0) u 4.0)) u)
  (* (log (fma u -4.0 1.0)) (- s))))
float code(float s, float u) {
	float tmp;
	if ((4.0f * u) <= 0.014499999582767487f) {
		tmp = (s * fmaf(fmaf(u, 21.333333333333332f, 8.0f), u, 4.0f)) * u;
	} else {
		tmp = logf(fmaf(u, -4.0f, 1.0f)) * -s;
	}
	return tmp;
}
function code(s, u)
	tmp = Float32(0.0)
	if (Float32(Float32(4.0) * u) <= Float32(0.014499999582767487))
		tmp = Float32(Float32(s * fma(fma(u, Float32(21.333333333333332), Float32(8.0)), u, Float32(4.0))) * u);
	else
		tmp = Float32(log(fma(u, Float32(-4.0), Float32(1.0))) * Float32(-s));
	end
	return tmp
end
Derivation
  1. Split input into 2 regimes
  2. if (*.f32 #s(literal 4 binary32) u) < 0.0144999996

    1. Initial program 61.5%

      \[s \cdot \log \left(\frac{1}{1 - 4 \cdot u}\right) \]
    2. Taylor expanded in u around 0

      \[\leadsto u \cdot \left(4 \cdot s + u \cdot \left(8 \cdot s + \frac{64}{3} \cdot \left(s \cdot u\right)\right)\right) \]
    3. Applied rewrites 91.3%

      \[\leadsto u \cdot \mathsf{fma}\left(4, s, u \cdot \mathsf{fma}\left(8, s, 21.333333333333332 \cdot \left(s \cdot u\right)\right)\right) \]
    4. Applied rewrites 91.0%

      \[\leadsto \mathsf{fma}\left(u \cdot u, \mathsf{fma}\left(8, s, 21.333333333333332 \cdot \left(s \cdot u\right)\right), 4 \cdot \left(s \cdot u\right)\right) \]
    5. Applied rewrites 91.1%

      \[\leadsto \left(s \cdot \mathsf{fma}\left(\mathsf{fma}\left(u, 21.333333333333332, 8\right), u, 4\right)\right) \cdot u \]

    if 0.0144999996 < (*.f32 #s(literal 4 binary32) u)

    1. Initial program 61.5%

      \[s \cdot \log \left(\frac{1}{1 - 4 \cdot u}\right) \]
    2. Taylor expanded in u around 0

      \[\leadsto s \cdot \left(u \cdot \left(4 + u \cdot \left(8 + \frac{64}{3} \cdot u\right)\right)\right) \]
    3. Applied rewrites 91.1%

      \[\leadsto s \cdot \left(u \cdot \left(4 + u \cdot \left(8 + 21.333333333333332 \cdot u\right)\right)\right) \]
    4. Applied rewrites 91.1%

      \[\leadsto s \cdot \left(u \cdot \mathsf{fma}\left(\mathsf{fma}\left(u, 21.333333333333332, 8\right), u, 4\right)\right) \]
    5. Taylor expanded in s around 0

      \[\leadsto -1 \cdot \left(s \cdot \log \left(1 + -4 \cdot u\right)\right) \]
    6. Applied rewrites 63.9%

      \[\leadsto -1 \cdot \left(s \cdot \log \left(1 + -4 \cdot u\right)\right) \]
    7. Applied rewrites 63.9%

      \[\leadsto \log \left(\mathsf{fma}\left(u, -4, 1\right)\right) \cdot \left(-s\right) \]
  3. Recombined 2 regimes into one program.
  4. Add Preprocessing

Alternative 5: 97.8% accurate, 0.8× speedup

\[\left(0 \leq s \land s \leq 256\right) \land \left(2.328306437 \cdot 10^{-10} \leq u \land u \leq 0.25\right)\]
\[\begin{array}{l} \mathbf{if}\;4 \cdot u \leq 0.014499999582767487:\\ \;\;\;\;\left(s \cdot u\right) \cdot \mathsf{fma}\left(\mathsf{fma}\left(u, 21.333333333333332, 8\right), u, 4\right)\\ \mathbf{else}:\\ \;\;\;\;\log \left(\mathsf{fma}\left(u, -4, 1\right)\right) \cdot \left(-s\right)\\ \end{array} \]
(FPCore (s u)
  :precision binary32
  :pre (and (and (<= 0.0 s) (<= s 256.0))
     (and (<= 2.328306437e-10 u) (<= u 0.25)))
  (if (<= (* 4.0 u) 0.014499999582767487)
  (* (* s u) (fma (fma u 21.333333333333332 8.0) u 4.0))
  (* (log (fma u -4.0 1.0)) (- s))))
float code(float s, float u) {
	float tmp;
	if ((4.0f * u) <= 0.014499999582767487f) {
		tmp = (s * u) * fmaf(fmaf(u, 21.333333333333332f, 8.0f), u, 4.0f);
	} else {
		tmp = logf(fmaf(u, -4.0f, 1.0f)) * -s;
	}
	return tmp;
}
function code(s, u)
	tmp = Float32(0.0)
	if (Float32(Float32(4.0) * u) <= Float32(0.014499999582767487))
		tmp = Float32(Float32(s * u) * fma(fma(u, Float32(21.333333333333332), Float32(8.0)), u, Float32(4.0)));
	else
		tmp = Float32(log(fma(u, Float32(-4.0), Float32(1.0))) * Float32(-s));
	end
	return tmp
end
Derivation
  1. Split input into 2 regimes
  2. if (*.f32 #s(literal 4 binary32) u) < 0.0144999996

    1. Initial program 61.5%

      \[s \cdot \log \left(\frac{1}{1 - 4 \cdot u}\right) \]
    2. Taylor expanded in u around 0

      \[\leadsto u \cdot \left(4 \cdot s + u \cdot \left(8 \cdot s + \frac{64}{3} \cdot \left(s \cdot u\right)\right)\right) \]
    3. Applied rewrites 91.3%

      \[\leadsto u \cdot \mathsf{fma}\left(4, s, u \cdot \mathsf{fma}\left(8, s, 21.333333333333332 \cdot \left(s \cdot u\right)\right)\right) \]
    4. Applied rewrites 91.0%

      \[\leadsto \mathsf{fma}\left(u \cdot u, \mathsf{fma}\left(8, s, 21.333333333333332 \cdot \left(s \cdot u\right)\right), 4 \cdot \left(s \cdot u\right)\right) \]
    5. Applied rewrites 91.1%

      \[\leadsto \left(s \cdot \mathsf{fma}\left(\mathsf{fma}\left(u, 21.333333333333332, 8\right), u, 4\right)\right) \cdot u \]
    6. Applied rewrites 90.8%

      \[\leadsto \left(s \cdot u\right) \cdot \mathsf{fma}\left(\mathsf{fma}\left(u, 21.333333333333332, 8\right), u, 4\right) \]

    if 0.0144999996 < (*.f32 #s(literal 4 binary32) u)

    1. Initial program 61.5%

      \[s \cdot \log \left(\frac{1}{1 - 4 \cdot u}\right) \]
    2. Taylor expanded in u around 0

      \[\leadsto s \cdot \left(u \cdot \left(4 + u \cdot \left(8 + \frac{64}{3} \cdot u\right)\right)\right) \]
    3. Applied rewrites 91.1%

      \[\leadsto s \cdot \left(u \cdot \left(4 + u \cdot \left(8 + 21.333333333333332 \cdot u\right)\right)\right) \]
    4. Applied rewrites 91.1%

      \[\leadsto s \cdot \left(u \cdot \mathsf{fma}\left(\mathsf{fma}\left(u, 21.333333333333332, 8\right), u, 4\right)\right) \]
    5. Taylor expanded in s around 0

      \[\leadsto -1 \cdot \left(s \cdot \log \left(1 + -4 \cdot u\right)\right) \]
    6. Applied rewrites 63.9%

      \[\leadsto -1 \cdot \left(s \cdot \log \left(1 + -4 \cdot u\right)\right) \]
    7. Applied rewrites 63.9%

      \[\leadsto \log \left(\mathsf{fma}\left(u, -4, 1\right)\right) \cdot \left(-s\right) \]
  3. Recombined 2 regimes into one program.
  4. Add Preprocessing

Alternative 6: 96.9% accurate, 0.9× speedup

\[\left(0 \leq s \land s \leq 256\right) \land \left(2.328306437 \cdot 10^{-10} \leq u \land u \leq 0.25\right)\]
\[\begin{array}{l} \mathbf{if}\;4 \cdot u \leq 0.004000000189989805:\\ \;\;\;\;u \cdot \mathsf{fma}\left(4, s, 8 \cdot \left(s \cdot u\right)\right)\\ \mathbf{else}:\\ \;\;\;\;\log \left(\mathsf{fma}\left(u, -4, 1\right)\right) \cdot \left(-s\right)\\ \end{array} \]
(FPCore (s u)
  :precision binary32
  :pre (and (and (<= 0.0 s) (<= s 256.0))
     (and (<= 2.328306437e-10 u) (<= u 0.25)))
  (if (<= (* 4.0 u) 0.004000000189989805)
  (* u (fma 4.0 s (* 8.0 (* s u))))
  (* (log (fma u -4.0 1.0)) (- s))))
float code(float s, float u) {
	float tmp;
	if ((4.0f * u) <= 0.004000000189989805f) {
		tmp = u * fmaf(4.0f, s, (8.0f * (s * u)));
	} else {
		tmp = logf(fmaf(u, -4.0f, 1.0f)) * -s;
	}
	return tmp;
}
function code(s, u)
	tmp = Float32(0.0)
	if (Float32(Float32(4.0) * u) <= Float32(0.004000000189989805))
		tmp = Float32(u * fma(Float32(4.0), s, Float32(Float32(8.0) * Float32(s * u))));
	else
		tmp = Float32(log(fma(u, Float32(-4.0), Float32(1.0))) * Float32(-s));
	end
	return tmp
end
Derivation
  1. Split input into 2 regimes
  2. if (*.f32 #s(literal 4 binary32) u) < 0.00400000019

    1. Initial program 61.5%

      \[s \cdot \log \left(\frac{1}{1 - 4 \cdot u}\right) \]
    2. Taylor expanded in u around 0

      \[\leadsto u \cdot \left(4 \cdot s + 8 \cdot \left(s \cdot u\right)\right) \]
    3. Applied rewrites 86.9%

      \[\leadsto u \cdot \mathsf{fma}\left(4, s, 8 \cdot \left(s \cdot u\right)\right) \]

    if 0.00400000019 < (*.f32 #s(literal 4 binary32) u)

    1. Initial program 61.5%

      \[s \cdot \log \left(\frac{1}{1 - 4 \cdot u}\right) \]
    2. Taylor expanded in u around 0

      \[\leadsto s \cdot \left(u \cdot \left(4 + u \cdot \left(8 + \frac{64}{3} \cdot u\right)\right)\right) \]
    3. Applied rewrites 91.1%

      \[\leadsto s \cdot \left(u \cdot \left(4 + u \cdot \left(8 + 21.333333333333332 \cdot u\right)\right)\right) \]
    4. Applied rewrites 91.1%

      \[\leadsto s \cdot \left(u \cdot \mathsf{fma}\left(\mathsf{fma}\left(u, 21.333333333333332, 8\right), u, 4\right)\right) \]
    5. Taylor expanded in s around 0

      \[\leadsto -1 \cdot \left(s \cdot \log \left(1 + -4 \cdot u\right)\right) \]
    6. Applied rewrites 63.9%

      \[\leadsto -1 \cdot \left(s \cdot \log \left(1 + -4 \cdot u\right)\right) \]
    7. Applied rewrites 63.9%

      \[\leadsto \log \left(\mathsf{fma}\left(u, -4, 1\right)\right) \cdot \left(-s\right) \]
  3. Recombined 2 regimes into one program.
  4. Add Preprocessing

Alternative 7: 86.9% accurate, 1.3× speedup

\[\left(0 \leq s \land s \leq 256\right) \land \left(2.328306437 \cdot 10^{-10} \leq u \land u \leq 0.25\right)\]
\[u \cdot \mathsf{fma}\left(4, s, 8 \cdot \left(s \cdot u\right)\right) \]
(FPCore (s u)
  :precision binary32
  :pre (and (and (<= 0.0 s) (<= s 256.0))
     (and (<= 2.328306437e-10 u) (<= u 0.25)))
  (* u (fma 4.0 s (* 8.0 (* s u)))))
float code(float s, float u) {
	return u * fmaf(4.0f, s, (8.0f * (s * u)));
}
function code(s, u)
	return Float32(u * fma(Float32(4.0), s, Float32(Float32(8.0) * Float32(s * u))))
end
Derivation
  1. Initial program 61.5%

    \[s \cdot \log \left(\frac{1}{1 - 4 \cdot u}\right) \]
  2. Taylor expanded in u around 0

    \[\leadsto u \cdot \left(4 \cdot s + 8 \cdot \left(s \cdot u\right)\right) \]
  3. Applied rewrites 86.9%

    \[\leadsto u \cdot \mathsf{fma}\left(4, s, 8 \cdot \left(s \cdot u\right)\right) \]
  4. Add Preprocessing
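
The Taylor steps in these derivations can be read off the standard series for the logarithm. As a sketch, write x = 4·u (a symbol introduced here, not in the report) and note that the series converges only for x < 1, that is, for u < 0.25:

\[-\log \left(1 - x\right) = \sum_{k=1}^{\infty} \frac{x^{k}}{k}, \qquad 0 \leq x < 1 \]
\[s \cdot \log \left(\frac{1}{1 - 4 \cdot u}\right) = s \cdot \left(4 \cdot u + 8 \cdot u^{2} + \frac{64}{3} \cdot u^{3} + 64 \cdot u^{4} + \cdots\right) \]

Dividing the first dropped term by the leading term 4·s·u gives a rough relative truncation error: about 2·u after one term, (16/3)·u² after two terms, and 16·u³ after three. These are leading-order estimates rather than bounds, and they ignore binary32 rounding.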

Alternative 8: 86.7% accurate, 1.6× speedup

\[\left(0 \leq s \land s \leq 256\right) \land \left(2.328306437 \cdot 10^{-10} \leq u \land u \leq 0.25\right)\]
\[\left(s \cdot \mathsf{fma}\left(u, 8, 4\right)\right) \cdot u \]
(FPCore (s u)
  :precision binary32
  :pre (and (and (<= 0.0 s) (<= s 256.0))
     (and (<= 2.328306437e-10 u) (<= u 0.25)))
  (* (* s (fma u 8.0 4.0)) u))
float code(float s, float u) {
	return (s * fmaf(u, 8.0f, 4.0f)) * u;
}
function code(s, u)
	return Float32(Float32(s * fma(u, Float32(8.0), Float32(4.0))) * u)
end
Derivation
  1. Initial program 61.5%

    \[s \cdot \log \left(\frac{1}{1 - 4 \cdot u}\right) \]
  2. Taylor expanded in u around 0

    \[\leadsto u \cdot \left(4 \cdot s + 8 \cdot \left(s \cdot u\right)\right) \]
  3. Applied rewrites 86.9%

    \[\leadsto u \cdot \mathsf{fma}\left(4, s, 8 \cdot \left(s \cdot u\right)\right) \]
  4. Applied rewrites 87.2%

    \[\leadsto \mathsf{fma}\left(s, u \cdot 4, \left(s \cdot 8\right) \cdot \left(u \cdot u\right)\right) \]
  5. Applied rewrites 86.7%

    \[\leadsto \left(s \cdot \mathsf{fma}\left(u, 8, 4\right)\right) \cdot u \]
  6. Add Preprocessing

Alternative 9: 86.4% accurate, 1.6× speedup

\[\left(0 \leq s \land s \leq 256\right) \land \left(2.328306437 \cdot 10^{-10} \leq u \land u \leq 0.25\right)\]
\[\left(s \cdot u\right) \cdot \mathsf{fma}\left(u, 8, 4\right) \]
(FPCore (s u)
  :precision binary32
  :pre (and (and (<= 0.0 s) (<= s 256.0))
     (and (<= 2.328306437e-10 u) (<= u 0.25)))
  (* (* s u) (fma u 8.0 4.0)))
float code(float s, float u) {
	return (s * u) * fmaf(u, 8.0f, 4.0f);
}
function code(s, u)
	return Float32(Float32(s * u) * fma(u, Float32(8.0), Float32(4.0)))
end
Derivation
  1. Initial program 61.5%

    \[s \cdot \log \left(\frac{1}{1 - 4 \cdot u}\right) \]
  2. Taylor expanded in u around 0

    \[\leadsto u \cdot \left(4 \cdot s + 8 \cdot \left(s \cdot u\right)\right) \]
  3. Applied rewrites 86.9%

    \[\leadsto u \cdot \mathsf{fma}\left(4, s, 8 \cdot \left(s \cdot u\right)\right) \]
  4. Applied rewrites 87.2%

    \[\leadsto \mathsf{fma}\left(s, u \cdot 4, \left(s \cdot 8\right) \cdot \left(u \cdot u\right)\right) \]
  5. Applied rewrites 86.7%

    \[\leadsto \left(s \cdot \mathsf{fma}\left(u, 8, 4\right)\right) \cdot u \]
  6. Applied rewrites 86.4%

    \[\leadsto \left(s \cdot u\right) \cdot \mathsf{fma}\left(u, 8, 4\right) \]
  7. Add Preprocessing

Alternative 10: 73.8% accurate, 2.8× speedup

\[\left(0 \leq s \land s \leq 256\right) \land \left(2.328306437 \cdot 10^{-10} \leq u \land u \leq 0.25\right)\]
\[s \cdot \left(u \cdot 4\right) \]
(FPCore (s u)
  :precision binary32
  :pre (and (and (<= 0.0 s) (<= s 256.0))
     (and (<= 2.328306437e-10 u) (<= u 0.25)))
  (* s (* u 4.0)))
float code(float s, float u) {
	return s * (u * 4.0f);
}
real(4) function code(s, u)
use fmin_fmax_functions
    real(4), intent (in) :: s
    real(4), intent (in) :: u
    code = s * (u * 4.0e0)
end function
function code(s, u)
	return Float32(s * Float32(u * Float32(4.0)))
end
function tmp = code(s, u)
	tmp = s * (u * single(4.0));
end
Derivation
  1. Initial program 61.5%

    \[s \cdot \log \left(\frac{1}{1 - 4 \cdot u}\right) \]
  2. Taylor expanded in u around 0

    \[\leadsto s \cdot \left(u \cdot \left(4 + 8 \cdot u\right)\right) \]
  3. Applied rewrites 86.7%

    \[\leadsto s \cdot \left(u \cdot \left(4 + 8 \cdot u\right)\right) \]
  4. Taylor expanded in u around 0

    \[\leadsto s \cdot \left(u \cdot 4\right) \]
  5. Applied rewrites 73.8%

    \[\leadsto s \cdot \left(u \cdot 4\right) \]
  6. Add Preprocessing

Alternative 11: 73.6% accurate, 2.8× speedup

\[\left(0 \leq s \land s \leq 256\right) \land \left(2.328306437 \cdot 10^{-10} \leq u \land u \leq 0.25\right)\]
\[4 \cdot \left(s \cdot u\right) \]
(FPCore (s u)
  :precision binary32
  :pre (and (and (<= 0.0 s) (<= s 256.0))
     (and (<= 2.328306437e-10 u) (<= u 0.25)))
  (* 4.0 (* s u)))
float code(float s, float u) {
	return 4.0f * (s * u);
}
real(4) function code(s, u)
use fmin_fmax_functions
    real(4), intent (in) :: s
    real(4), intent (in) :: u
    code = 4.0e0 * (s * u)
end function
function code(s, u)
	return Float32(Float32(4.0) * Float32(s * u))
end
function tmp = code(s, u)
	tmp = single(4.0) * (s * u);
end
Derivation
  1. Initial program 61.5%

    \[s \cdot \log \left(\frac{1}{1 - 4 \cdot u}\right) \]
  2. Taylor expanded in u around 0

    \[\leadsto 4 \cdot \left(s \cdot u\right) \]
  3. Applied rewrites 73.6%

    \[\leadsto 4 \cdot \left(s \cdot u\right) \]
  4. Add Preprocessing

Alternative 12: 22.0% accurate, 2.6× speedup

\[\left(0 \leq s \land s \leq 256\right) \land \left(2.328306437 \cdot 10^{-10} \leq u \land u \leq 0.25\right)\]
\[\begin{array}{l} \mathbf{if}\;s \leq 4.307599612825561 \cdot 10^{-29}:\\ \;\;\;\;0\\ \mathbf{else}:\\ \;\;\;\;s + s\\ \end{array} \]
(FPCore (s u)
  :precision binary32
  :pre (and (and (<= 0.0 s) (<= s 256.0))
     (and (<= 2.328306437e-10 u) (<= u 0.25)))
  (if (<= s 4.307599612825561e-29) 0.0 (+ s s)))
float code(float s, float u) {
	float tmp;
	if (s <= 4.307599612825561e-29f) {
		tmp = 0.0f;
	} else {
		tmp = s + s;
	}
	return tmp;
}
real(4) function code(s, u)
use fmin_fmax_functions
    real(4), intent (in) :: s
    real(4), intent (in) :: u
    real(4) :: tmp
    if (s <= 4.307599612825561e-29) then
        tmp = 0.0e0
    else
        tmp = s + s
    end if
    code = tmp
end function
function code(s, u)
	tmp = Float32(0.0)
	if (s <= Float32(4.307599612825561e-29))
		tmp = Float32(0.0);
	else
		tmp = Float32(s + s);
	end
	return tmp
end
function tmp_2 = code(s, u)
	tmp = single(0.0);
	if (s <= single(4.307599612825561e-29))
		tmp = single(0.0);
	else
		tmp = s + s;
	end
	tmp_2 = tmp;
end
Derivation
  1. Split input into 2 regimes
  2. if s < 4.30759961e-29

    1. Initial program 61.5%

      \[s \cdot \log \left(\frac{1}{1 - 4 \cdot u}\right) \]
    2. Applied rewrites 16.5%

      \[\leadsto 0 \]

    if 4.30759961e-29 < s

    1. Initial program 61.5%

      \[s \cdot \log \left(\frac{1}{1 - 4 \cdot u}\right) \]
    2. Applied rewrites 17.3%

      \[\leadsto s \cdot 2 \]
    3. Applied rewrites 17.3%

      \[\leadsto s + s \]
  3. Recombined 2 regimes into one program.
  4. Add Preprocessing

Alternative 13: 20.8% accurate, 4.8× speedup

\[\left(0 \leq s \land s \leq 256\right) \land \left(2.328306437 \cdot 10^{-10} \leq u \land u \leq 0.25\right)\]
\[s \cdot 0.015625 \]
(FPCore (s u)
  :precision binary32
  :pre (and (and (<= 0.0 s) (<= s 256.0))
     (and (<= 2.328306437e-10 u) (<= u 0.25)))
  (* s 0.015625))
float code(float s, float u) {
	return s * 0.015625f;
}
real(4) function code(s, u)
use fmin_fmax_functions
    real(4), intent (in) :: s
    real(4), intent (in) :: u
    code = s * 0.015625e0
end function
function code(s, u)
	return Float32(s * Float32(0.015625))
end
function tmp = code(s, u)
	tmp = s * single(0.015625);
end
Derivation
  1. Initial program 61.5%

    \[s \cdot \log \left(\frac{1}{1 - 4 \cdot u}\right) \]
  2. Applied rewrites 20.8%

    \[\leadsto s \cdot 0.015625 \]
  3. Add Preprocessing

Alternative 14: 20.3% accurate, 4.0× speedup

\[\left(0 \leq s \land s \leq 256\right) \land \left(2.328306437 \cdot 10^{-10} \leq u \land u \leq 0.25\right)\]
\[\begin{array}{l} \mathbf{if}\;s \leq 1.235573164014442 \cdot 10^{-17}:\\ \;\;\;\;0\\ \mathbf{else}:\\ \;\;\;\;u\\ \end{array} \]
(FPCore (s u)
  :precision binary32
  :pre (and (and (<= 0.0 s) (<= s 256.0))
     (and (<= 2.328306437e-10 u) (<= u 0.25)))
  (if (<= s 1.235573164014442e-17) 0.0 u))
float code(float s, float u) {
	float tmp;
	if (s <= 1.235573164014442e-17f) {
		tmp = 0.0f;
	} else {
		tmp = u;
	}
	return tmp;
}
real(4) function code(s, u)
use fmin_fmax_functions
    real(4), intent (in) :: s
    real(4), intent (in) :: u
    real(4) :: tmp
    if (s <= 1.235573164014442e-17) then
        tmp = 0.0e0
    else
        tmp = u
    end if
    code = tmp
end function
function code(s, u)
	tmp = Float32(0.0)
	if (s <= Float32(1.235573164014442e-17))
		tmp = Float32(0.0);
	else
		tmp = u;
	end
	return tmp
end
function tmp_2 = code(s, u)
	tmp = single(0.0);
	if (s <= single(1.235573164014442e-17))
		tmp = single(0.0);
	else
		tmp = u;
	end
	tmp_2 = tmp;
end
Derivation
  1. Split input into 2 regimes
  2. if s < 1.23557316e-17

    1. Initial program 61.5%

      \[s \cdot \log \left(\frac{1}{1 - 4 \cdot u}\right) \]
    2. Applied rewrites 16.5%

      \[\leadsto 0 \]

    if 1.23557316e-17 < s

    1. Initial program 61.5%

      \[s \cdot \log \left(\frac{1}{1 - 4 \cdot u}\right) \]
    2. Applied rewrites 11.9%

      \[\leadsto 1 \cdot u \]
    3. Applied rewrites 11.9%

      \[\leadsto u \]
  3. Recombined 2 regimes into one program.
  4. Add Preprocessing

Alternative 15: 16.5% accurate, 19.4× speedup

\[\left(0 \leq s \land s \leq 256\right) \land \left(2.328306437 \cdot 10^{-10} \leq u \land u \leq 0.25\right)\]
\[0 \]
(FPCore (s u)
  :precision binary32
  :pre (and (and (<= 0.0 s) (<= s 256.0))
     (and (<= 2.328306437e-10 u) (<= u 0.25)))
  0.0)
float code(float s, float u) {
	return 0.0f;
}
real(4) function code(s, u)
use fmin_fmax_functions
    real(4), intent (in) :: s
    real(4), intent (in) :: u
    code = 0.0e0
end function
function code(s, u)
	return Float32(0.0)
end
function tmp = code(s, u)
	tmp = single(0.0);
end
Derivation
  1. Initial program 61.5%

    \[s \cdot \log \left(\frac{1}{1 - 4 \cdot u}\right) \]
  2. Applied rewrites 16.5%

    \[\leadsto 0 \]
  3. Add Preprocessing

Reproduce

herbie shell --seed 2026089 +o generate:egglog
(FPCore (s u)
  :name "Disney BSSRDF, sample scattering profile, lower"
  :precision binary32
  :pre (and (and (<= 0.0 s) (<= s 256.0)) (and (<= 2.328306437e-10 u) (<= u 0.25)))
  (* s (log (/ 1.0 (- 1.0 (* 4.0 u))))))