Beckmann Distribution sample, tan2theta, alphax == alphay

Percentage Accurate: 56.2% → 99.0%
Time: 3.3s
Alternatives: 9
Speedup: 2.4×

Specification

\[\left(0.0001 \leq \alpha \land \alpha \leq 1\right) \land \left(2.328306437 \cdot 10^{-10} \leq u0 \land u0 \leq 1\right)\]
\[\left(\left(-\alpha\right) \cdot \alpha\right) \cdot \log \left(1 - u0\right) \]
(FPCore (alpha u0)
  :precision binary32
  :pre (and (and (<= 0.0001 alpha) (<= alpha 1.0))
     (and (<= 2.328306437e-10 u0) (<= u0 1.0)))
  (* (* (- alpha) alpha) (log (- 1.0 u0))))
float code(float alpha, float u0) {
	return (-alpha * alpha) * logf((1.0f - u0));
}
real(4) function code(alpha, u0)
use fmin_fmax_functions
    real(4), intent (in) :: alpha
    real(4), intent (in) :: u0
    code = (-alpha * alpha) * log((1.0e0 - u0))
end function
function code(alpha, u0)
	return Float32(Float32(Float32(-alpha) * alpha) * log(Float32(Float32(1.0) - u0)))
end
function tmp = code(alpha, u0)
	tmp = (-alpha * alpha) * log((single(1.0) - u0));
end
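
The lower bound on u0 in the precondition, 2.328306437e-10, is a rounded decimal; once rounded to binary32 it equals 2^-32. The sketch below checks that and wraps the precondition in a helper. The names `f32` and `in_domain` are hypothetical and not part of the Herbie output, and binary32 is emulated here by rounding Python floats through `struct`.

```python
import struct


def f32(x: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


# Bounds copied from the :pre clause of the specification.
ALPHA_MIN, ALPHA_MAX = f32(0.0001), f32(1.0)
U0_MIN, U0_MAX = f32(2.328306437e-10), f32(1.0)


def in_domain(alpha: float, u0: float) -> bool:
    """Return True when (alpha, u0), rounded to binary32, satisfy the precondition."""
    alpha, u0 = f32(alpha), f32(u0)
    return ALPHA_MIN <= alpha <= ALPHA_MAX and U0_MIN <= u0 <= U0_MAX


if __name__ == "__main__":
    print(U0_MIN == 2.0 ** -32)  # the decimal bound rounds to 2**-32
    print(in_domain(0.5, 0.25))
    print(in_domain(0.5, 1e-12))
```

A bound of 2^-32 would be consistent with u0 coming from a 32-bit uniform random generator, but that reading is an assumption; the report itself does not say where the bound comes from.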

Local Percentage Accuracy vs Input Value

The average percentage accuracy by input value. The horizontal axis shows the value of an input variable; the variable is chosen in the title. The vertical axis is accuracy; higher is better. Red represents the original program, while blue represents Herbie's suggestion. The line is an average, while dots represent individual samples.

Accuracy vs Speed

Herbie found 9 alternatives:

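Alternative      Accuracy   Speedup
Alternative 1    99.0%      0.9×
Alternative 2    99.0%      0.9×
Alternative 3    96.7%      0.7×
Alternative 4    96.7%      0.7×
Alternative 5    96.7%      0.7×
Alternative 6    87.1%      1.1×
Alternative 7    87.1%      1.1×
Alternative 8    74.3%      1.8×
Alternative 9    74.3%      2.4×
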
The accuracy (vertical axis) and speed (horizontal axis) of each alternative. Up and to the right is better. The red square shows the initial program, and each blue circle shows an alternative. The line shows the best available speed-accuracy tradeoffs.

Initial Program: 56.2% accurate, 1.0× speedup

\[\left(0.0001 \leq \alpha \land \alpha \leq 1\right) \land \left(2.328306437 \cdot 10^{-10} \leq u0 \land u0 \leq 1\right)\]
\[\left(\left(-\alpha\right) \cdot \alpha\right) \cdot \log \left(1 - u0\right) \]
(FPCore (alpha u0)
  :precision binary32
  :pre (and (and (<= 0.0001 alpha) (<= alpha 1.0))
     (and (<= 2.328306437e-10 u0) (<= u0 1.0)))
  (* (* (- alpha) alpha) (log (- 1.0 u0))))
float code(float alpha, float u0) {
	return (-alpha * alpha) * logf((1.0f - u0));
}
real(4) function code(alpha, u0)
use fmin_fmax_functions
    real(4), intent (in) :: alpha
    real(4), intent (in) :: u0
    code = (-alpha * alpha) * log((1.0e0 - u0))
end function
function code(alpha, u0)
	return Float32(Float32(Float32(-alpha) * alpha) * log(Float32(Float32(1.0) - u0)))
end
function tmp = code(alpha, u0)
	tmp = (-alpha * alpha) * log((single(1.0) - u0));
end

Alternative 1: 99.0% accurate, 0.9× speedup

\[\left(0.0001 \leq \alpha \land \alpha \leq 1\right) \land \left(2.328306437 \cdot 10^{-10} \leq u0 \land u0 \leq 1\right)\]
\[\left(\left(-\alpha\right) \cdot \alpha\right) \cdot \mathsf{log1p}\left(-u0\right) \]
(FPCore (alpha u0)
  :precision binary32
  :pre (and (and (<= 0.0001 alpha) (<= alpha 1.0))
     (and (<= 2.328306437e-10 u0) (<= u0 1.0)))
  (* (* (- alpha) alpha) (log1p (- u0))))
float code(float alpha, float u0) {
	return (-alpha * alpha) * log1pf(-u0);
}
function code(alpha, u0)
	return Float32(Float32(Float32(-alpha) * alpha) * log1p(Float32(-u0)))
end
Derivation
  1. Initial program 56.2%

    \[\left(\left(-\alpha\right) \cdot \alpha\right) \cdot \log \left(1 - u0\right) \]
  2. Step-by-step derivation
    1. Applied rewrites 99.0%

      \[\leadsto \left(\left(-\alpha\right) \cdot \alpha\right) \cdot \mathsf{log1p}\left(-u0\right) \]
    2. Add Preprocessing
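
The jump from 56.2% to 99.0% comes from a single rewrite: log(1 - u0) becomes log1p(-u0). In binary32, 1 - u0 rounds to exactly 1 once u0 falls below about 2^-25, so the logarithm returns 0 and all information about u0 is lost; log1p receives -u0 directly and avoids that subtraction. The sketch below illustrates the effect by emulating binary32 in Python. The helper `f32` and the function names are hypothetical, and rounding `math.log` and `math.log1p` from binary64 only approximates what a native `logf` or `log1pf` would return.

```python
import math
import struct


def f32(x: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def initial(alpha: float, u0: float) -> float:
    """Initial program: (-alpha * alpha) * log(1 - u0), rounded after every operation."""
    alpha, u0 = f32(alpha), f32(u0)
    one_minus = f32(1.0 - u0)  # rounds to 1.0 when u0 is tiny
    return f32(f32(-alpha * alpha) * f32(math.log(one_minus)))


def alternative_1(alpha: float, u0: float) -> float:
    """Alternative 1: (-alpha * alpha) * log1p(-u0), rounded after every operation."""
    alpha, u0 = f32(alpha), f32(u0)
    return f32(f32(-alpha * alpha) * f32(math.log1p(-u0)))


if __name__ == "__main__":
    for u0 in (1e-9, 1e-6, 1e-3, 0.5):
        reference = -math.log1p(-f32(u0))  # binary64 reference with alpha = 1
        print(u0, initial(1.0, u0), alternative_1(1.0, u0), reference)
```

For u0 = 1e-9 the initial program evaluates to zero in this emulation, while the log1p form stays close to u0, which is the leading term of the exact result.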

    Alternative 2: 99.0% accurate, 0.9× speedup

    \[\left(0.0001 \leq \alpha \land \alpha \leq 1\right) \land \left(2.328306437 \cdot 10^{-10} \leq u0 \land u0 \leq 1\right)\]
    \[\left(\mathsf{log1p}\left(-u0\right) \cdot \left(-\alpha\right)\right) \cdot \alpha \]
    (FPCore (alpha u0)
      :precision binary32
      :pre (and (and (<= 0.0001 alpha) (<= alpha 1.0))
         (and (<= 2.328306437e-10 u0) (<= u0 1.0)))
      (* (* (log1p (- u0)) (- alpha)) alpha))
    float code(float alpha, float u0) {
    	return (log1pf(-u0) * -alpha) * alpha;
    }
    
    function code(alpha, u0)
    	return Float32(Float32(log1p(Float32(-u0)) * Float32(-alpha)) * alpha)
    end
    
    Derivation
    1. Initial program 56.2%

      \[\left(\left(-\alpha\right) \cdot \alpha\right) \cdot \log \left(1 - u0\right) \]
    2. Step-by-step derivation
      1. Applied rewrites 56.2%

        \[\leadsto \left(\left(-\log \left(1 - u0\right)\right) \cdot \left(-\left|\alpha\right|\right)\right) \cdot \left(-\left|\alpha\right|\right) \]
      2. Step-by-step derivation
        1. Applied rewrites 56.2%

          \[\leadsto \left(\log \left(1 - u0\right) \cdot \left(-\alpha\right)\right) \cdot \alpha \]
        2. Step-by-step derivation
          1. Applied rewrites 99.0%

            \[\leadsto \left(\mathsf{log1p}\left(-u0\right) \cdot \left(-\alpha\right)\right) \cdot \alpha \]
          2. Add Preprocessing

          Alternative 3: 96.7% accurate, 0.7× speedup

          \[\left(0.0001 \leq \alpha \land \alpha \leq 1\right) \land \left(2.328306437 \cdot 10^{-10} \leq u0 \land u0 \leq 1\right)\]
          \[\begin{array}{l} \mathbf{if}\;1 - u0 \leq 0.9959999918937683:\\ \;\;\;\;\left(\left(-\alpha\right) \cdot \alpha\right) \cdot \log \left(1 - u0\right)\\ \mathbf{else}:\\ \;\;\;\;\mathsf{fma}\left(u0 \cdot 0.5, u0 \cdot \alpha, u0 \cdot \alpha\right) \cdot \alpha\\ \end{array} \]
          (FPCore (alpha u0)
            :precision binary32
            :pre (and (and (<= 0.0001 alpha) (<= alpha 1.0))
               (and (<= 2.328306437e-10 u0) (<= u0 1.0)))
            (if (<= (- 1.0 u0) 0.9959999918937683)
            (* (* (- alpha) alpha) (log (- 1.0 u0)))
            (* (fma (* u0 0.5) (* u0 alpha) (* u0 alpha)) alpha)))
          float code(float alpha, float u0) {
          	float tmp;
          	if ((1.0f - u0) <= 0.9959999918937683f) {
          		tmp = (-alpha * alpha) * logf((1.0f - u0));
          	} else {
          		tmp = fmaf((u0 * 0.5f), (u0 * alpha), (u0 * alpha)) * alpha;
          	}
          	return tmp;
          }
          
          function code(alpha, u0)
          	tmp = Float32(0.0)
          	if (Float32(Float32(1.0) - u0) <= Float32(0.9959999918937683))
          		tmp = Float32(Float32(Float32(-alpha) * alpha) * log(Float32(Float32(1.0) - u0)));
          	else
          		tmp = Float32(fma(Float32(u0 * Float32(0.5)), Float32(u0 * alpha), Float32(u0 * alpha)) * alpha);
          	end
          	return tmp
          end
          
          Derivation
          1. Split input into 2 regimes
          2. if (-.f32 #s(literal 1 binary32) u0) < 0.995999992

            1. Initial program 56.2%

              \[\left(\left(-\alpha\right) \cdot \alpha\right) \cdot \log \left(1 - u0\right) \]

            if 0.995999992 < (-.f32 #s(literal 1 binary32) u0)

            1. Initial program 56.2%

              \[\left(\left(-\alpha\right) \cdot \alpha\right) \cdot \log \left(1 - u0\right) \]
            2. Step-by-step derivation
              1. Applied rewrites 56.2%

                \[\leadsto \left(\left(-\log \left(1 - u0\right)\right) \cdot \left(-\left|\alpha\right|\right)\right) \cdot \left(-\left|\alpha\right|\right) \]
              2. Step-by-step derivation
                1. Applied rewrites 56.2%

                  \[\leadsto \left(\log \left(1 - u0\right) \cdot \left(-\alpha\right)\right) \cdot \alpha \]
                2. Taylor expanded in u0 around 0

                  \[\leadsto \left(u0 \cdot \left(\alpha + \frac{1}{2} \cdot \left(\alpha \cdot u0\right)\right)\right) \cdot \alpha \]
                3. Step-by-step derivation
                  1. Applied rewrites 87.1%

                    \[\leadsto \left(u0 \cdot \left(\alpha + 0.5 \cdot \left(\alpha \cdot u0\right)\right)\right) \cdot \alpha \]
                  2. Step-by-step derivation
                    1. Applied rewrites 87.1%

                      \[\leadsto \mathsf{fma}\left(u0 \cdot 0.5, u0 \cdot \alpha, u0 \cdot \alpha\right) \cdot \alpha \]
                  3. Recombined 2 regimes into one program.
                  4. Add Preprocessing

                  Alternative 4: 96.7% accurate, 0.7× speedup

                  \[\left(0.0001 \leq \alpha \land \alpha \leq 1\right) \land \left(2.328306437 \cdot 10^{-10} \leq u0 \land u0 \leq 1\right)\]
                  \[\begin{array}{l} \mathbf{if}\;1 - u0 \leq 0.9959999918937683:\\ \;\;\;\;\left(\left(-\alpha\right) \cdot \alpha\right) \cdot \log \left(1 - u0\right)\\ \mathbf{else}:\\ \;\;\;\;\left(u0 \cdot \left(\alpha + 0.5 \cdot \left(\alpha \cdot u0\right)\right)\right) \cdot \alpha\\ \end{array} \]
                  (FPCore (alpha u0)
                    :precision binary32
                    :pre (and (and (<= 0.0001 alpha) (<= alpha 1.0))
                       (and (<= 2.328306437e-10 u0) (<= u0 1.0)))
                    (if (<= (- 1.0 u0) 0.9959999918937683)
                    (* (* (- alpha) alpha) (log (- 1.0 u0)))
                    (* (* u0 (+ alpha (* 0.5 (* alpha u0)))) alpha)))
                  float code(float alpha, float u0) {
                  	float tmp;
                  	if ((1.0f - u0) <= 0.9959999918937683f) {
                  		tmp = (-alpha * alpha) * logf((1.0f - u0));
                  	} else {
                  		tmp = (u0 * (alpha + (0.5f * (alpha * u0)))) * alpha;
                  	}
                  	return tmp;
                  }
                  
                  real(4) function code(alpha, u0)
                  use fmin_fmax_functions
                      real(4), intent (in) :: alpha
                      real(4), intent (in) :: u0
                      real(4) :: tmp
                      if ((1.0e0 - u0) <= 0.9959999918937683e0) then
                          tmp = (-alpha * alpha) * log((1.0e0 - u0))
                      else
                          tmp = (u0 * (alpha + (0.5e0 * (alpha * u0)))) * alpha
                      end if
                      code = tmp
                  end function
                  
                  function code(alpha, u0)
                  	tmp = Float32(0.0)
                  	if (Float32(Float32(1.0) - u0) <= Float32(0.9959999918937683))
                  		tmp = Float32(Float32(Float32(-alpha) * alpha) * log(Float32(Float32(1.0) - u0)));
                  	else
                  		tmp = Float32(Float32(u0 * Float32(alpha + Float32(Float32(0.5) * Float32(alpha * u0)))) * alpha);
                  	end
                  	return tmp
                  end
                  
                  function tmp_2 = code(alpha, u0)
                  	tmp = single(0.0);
                  	if ((single(1.0) - u0) <= single(0.9959999918937683))
                  		tmp = (-alpha * alpha) * log((single(1.0) - u0));
                  	else
                  		tmp = (u0 * (alpha + (single(0.5) * (alpha * u0)))) * alpha;
                  	end
                  	tmp_2 = tmp;
                  end
                  
                  Derivation
                  1. Split input into 2 regimes
                  2. if (-.f32 #s(literal 1 binary32) u0) < 0.995999992

                    1. Initial program 56.2%

                      \[\left(\left(-\alpha\right) \cdot \alpha\right) \cdot \log \left(1 - u0\right) \]

                    if 0.995999992 < (-.f32 #s(literal 1 binary32) u0)

                    1. Initial program 56.2%

                      \[\left(\left(-\alpha\right) \cdot \alpha\right) \cdot \log \left(1 - u0\right) \]
                    2. Step-by-step derivation
                      1. Applied rewrites 56.2%

                        \[\leadsto \left(\left(-\log \left(1 - u0\right)\right) \cdot \left(-\left|\alpha\right|\right)\right) \cdot \left(-\left|\alpha\right|\right) \]
                      2. Step-by-step derivation
                        1. Applied rewrites 56.2%

                          \[\leadsto \left(\log \left(1 - u0\right) \cdot \left(-\alpha\right)\right) \cdot \alpha \]
                        2. Taylor expanded in u0 around 0

                          \[\leadsto \left(u0 \cdot \left(\alpha + \frac{1}{2} \cdot \left(\alpha \cdot u0\right)\right)\right) \cdot \alpha \]
                        3. Step-by-step derivation
                          1. Applied rewrites 87.1%

                            \[\leadsto \left(u0 \cdot \left(\alpha + 0.5 \cdot \left(\alpha \cdot u0\right)\right)\right) \cdot \alpha \]
                        4. Recombined 2 regimes into one program.
                        5. Add Preprocessing
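
Alternatives 3 to 5 split the input into two regimes on the value of 1 - u0: the original logarithm is kept where 1 - u0 is at or below the threshold, and the two-term Taylor polynomial is used where u0 is small. A minimal Python sketch of Alternative 4 under emulated binary32 rounding follows. The names `f32` and `alternative_4` are hypothetical, and rounding `math.log` from binary64 is only an approximation of a native `logf`.

```python
import math
import struct

# Branch point reported by Herbie (the binary32 value nearest to 0.996).
THRESHOLD = 0.9959999918937683


def f32(x: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def alternative_4(alpha: float, u0: float) -> float:
    """Two-regime program: logarithm for larger u0, Taylor polynomial for small u0."""
    alpha, u0 = f32(alpha), f32(u0)
    one_minus = f32(1.0 - u0)
    if one_minus <= THRESHOLD:
        # Regime 1: the subtraction 1 - u0 still keeps enough digits of u0.
        return f32(f32(-alpha * alpha) * f32(math.log(one_minus)))
    # Regime 2: (u0 * (alpha + 0.5 * (alpha * u0))) * alpha
    inner = f32(alpha + f32(0.5 * f32(alpha * u0)))
    return f32(f32(u0 * inner) * alpha)


if __name__ == "__main__":
    for u0 in (1e-9, 1e-3, 0.004, 0.5):
        print(u0, alternative_4(1.0, u0))
```

The polynomial branch never forms 1 - u0 in its result, so tiny values of u0 survive; the logarithm branch is used only where the rounding of 1 - u0 costs comparatively little.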

                        Alternative 5: 96.7% accurate, 0.7× speedup

                        \[\left(0.0001 \leq \alpha \land \alpha \leq 1\right) \land \left(2.328306437 \cdot 10^{-10} \leq u0 \land u0 \leq 1\right)\]
                        \[\begin{array}{l} \mathbf{if}\;1 - u0 \leq 0.9959999918937683:\\ \;\;\;\;\left(\log \left(1 - u0\right) \cdot \left(-\alpha\right)\right) \cdot \alpha\\ \mathbf{else}:\\ \;\;\;\;\left(u0 \cdot \left(\alpha + 0.5 \cdot \left(\alpha \cdot u0\right)\right)\right) \cdot \alpha\\ \end{array} \]
                        (FPCore (alpha u0)
                          :precision binary32
                          :pre (and (and (<= 0.0001 alpha) (<= alpha 1.0))
                             (and (<= 2.328306437e-10 u0) (<= u0 1.0)))
                          (if (<= (- 1.0 u0) 0.9959999918937683)
                          (* (* (log (- 1.0 u0)) (- alpha)) alpha)
                          (* (* u0 (+ alpha (* 0.5 (* alpha u0)))) alpha)))
                        float code(float alpha, float u0) {
                        	float tmp;
                        	if ((1.0f - u0) <= 0.9959999918937683f) {
                        		tmp = (logf((1.0f - u0)) * -alpha) * alpha;
                        	} else {
                        		tmp = (u0 * (alpha + (0.5f * (alpha * u0)))) * alpha;
                        	}
                        	return tmp;
                        }
                        
                        real(4) function code(alpha, u0)
                        use fmin_fmax_functions
                            real(4), intent (in) :: alpha
                            real(4), intent (in) :: u0
                            real(4) :: tmp
                            if ((1.0e0 - u0) <= 0.9959999918937683e0) then
                                tmp = (log((1.0e0 - u0)) * -alpha) * alpha
                            else
                                tmp = (u0 * (alpha + (0.5e0 * (alpha * u0)))) * alpha
                            end if
                            code = tmp
                        end function
                        
                        function code(alpha, u0)
                        	tmp = Float32(0.0)
                        	if (Float32(Float32(1.0) - u0) <= Float32(0.9959999918937683))
                        		tmp = Float32(Float32(log(Float32(Float32(1.0) - u0)) * Float32(-alpha)) * alpha);
                        	else
                        		tmp = Float32(Float32(u0 * Float32(alpha + Float32(Float32(0.5) * Float32(alpha * u0)))) * alpha);
                        	end
                        	return tmp
                        end
                        
                        function tmp_2 = code(alpha, u0)
                        	tmp = single(0.0);
                        	if ((single(1.0) - u0) <= single(0.9959999918937683))
                        		tmp = (log((single(1.0) - u0)) * -alpha) * alpha;
                        	else
                        		tmp = (u0 * (alpha + (single(0.5) * (alpha * u0)))) * alpha;
                        	end
                        	tmp_2 = tmp;
                        end
                        
                        Derivation
                        1. Split input into 2 regimes
                        2. if (-.f32 #s(literal 1 binary32) u0) < 0.995999992

                          1. Initial program 56.2%

                            \[\left(\left(-\alpha\right) \cdot \alpha\right) \cdot \log \left(1 - u0\right) \]
                          2. Step-by-step derivation
                            1. Applied rewrites 56.2%

                              \[\leadsto \left(\left(-\log \left(1 - u0\right)\right) \cdot \left(-\left|\alpha\right|\right)\right) \cdot \left(-\left|\alpha\right|\right) \]
                            2. Step-by-step derivation
                              1. Applied rewrites 56.2%

                                \[\leadsto \left(\log \left(1 - u0\right) \cdot \left(-\alpha\right)\right) \cdot \alpha \]

                              if 0.995999992 < (-.f32 #s(literal 1 binary32) u0)

                              1. Initial program 56.2%

                                \[\left(\left(-\alpha\right) \cdot \alpha\right) \cdot \log \left(1 - u0\right) \]
                              2. Step-by-step derivation
                                1. Applied rewrites 56.2%

                                  \[\leadsto \left(\left(-\log \left(1 - u0\right)\right) \cdot \left(-\left|\alpha\right|\right)\right) \cdot \left(-\left|\alpha\right|\right) \]
                                2. Step-by-step derivation
                                  1. Applied rewrites 56.2%

                                    \[\leadsto \left(\log \left(1 - u0\right) \cdot \left(-\alpha\right)\right) \cdot \alpha \]
                                  2. Taylor expanded in u0 around 0

                                    \[\leadsto \left(u0 \cdot \left(\alpha + \frac{1}{2} \cdot \left(\alpha \cdot u0\right)\right)\right) \cdot \alpha \]
                                  3. Step-by-step derivation
                                    1. Applied rewrites 87.1%

                                      \[\leadsto \left(u0 \cdot \left(\alpha + 0.5 \cdot \left(\alpha \cdot u0\right)\right)\right) \cdot \alpha \]
                                  4. Recombined 2 regimes into one program.
                                  5. Add Preprocessing

                                  Alternative 6: 87.1% accurate, 1.1× speedup

                                  \[\left(0.0001 \leq \alpha \land \alpha \leq 1\right) \land \left(2.328306437 \cdot 10^{-10} \leq u0 \land u0 \leq 1\right)\]
                                  \[\left(u0 \cdot \left(\alpha + 0.5 \cdot \left(\alpha \cdot u0\right)\right)\right) \cdot \alpha \]
                                  (FPCore (alpha u0)
                                    :precision binary32
                                    :pre (and (and (<= 0.0001 alpha) (<= alpha 1.0))
                                       (and (<= 2.328306437e-10 u0) (<= u0 1.0)))
                                    (* (* u0 (+ alpha (* 0.5 (* alpha u0)))) alpha))
                                  float code(float alpha, float u0) {
                                  	return (u0 * (alpha + (0.5f * (alpha * u0)))) * alpha;
                                  }
                                  
                                  real(4) function code(alpha, u0)
                                  use fmin_fmax_functions
                                      real(4), intent (in) :: alpha
                                      real(4), intent (in) :: u0
                                      code = (u0 * (alpha + (0.5e0 * (alpha * u0)))) * alpha
                                  end function
                                  
                                  function code(alpha, u0)
                                  	return Float32(Float32(u0 * Float32(alpha + Float32(Float32(0.5) * Float32(alpha * u0)))) * alpha)
                                  end
                                  
                                  function tmp = code(alpha, u0)
                                  	tmp = (u0 * (alpha + (single(0.5) * (alpha * u0)))) * alpha;
                                  end
                                  
                                  Derivation
                                  1. Initial program 56.2%

                                    \[\left(\left(-\alpha\right) \cdot \alpha\right) \cdot \log \left(1 - u0\right) \]
                                  2. Step-by-step derivation
                                    1. Applied rewrites 56.2%

                                      \[\leadsto \left(\left(-\log \left(1 - u0\right)\right) \cdot \left(-\left|\alpha\right|\right)\right) \cdot \left(-\left|\alpha\right|\right) \]
                                    2. Step-by-step derivation
                                      1. Applied rewrites 56.2%

                                        \[\leadsto \left(\log \left(1 - u0\right) \cdot \left(-\alpha\right)\right) \cdot \alpha \]
                                      2. Taylor expanded in u0 around 0

                                        \[\leadsto \left(u0 \cdot \left(\alpha + \frac{1}{2} \cdot \left(\alpha \cdot u0\right)\right)\right) \cdot \alpha \]
                                      3. Step-by-step derivation
                                        1. Applied rewrites 87.1%

                                          \[\leadsto \left(u0 \cdot \left(\alpha + 0.5 \cdot \left(\alpha \cdot u0\right)\right)\right) \cdot \alpha \]
                                        2. Add Preprocessing
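
The "Taylor expanded in u0 around 0" step rests on the standard power series of the logarithm. As a worked sketch (the series is a textbook identity, not something printed in the Herbie output):

\[-\log \left(1 - u0\right) = u0 + \frac{{u0}^{2}}{2} + \frac{{u0}^{3}}{3} + \cdots, \qquad \left|u0\right| < 1 \]
\[\left(\left(-\alpha\right) \cdot \alpha\right) \cdot \log \left(1 - u0\right) = {\alpha}^{2} \cdot \left(u0 + \frac{{u0}^{2}}{2} + \frac{{u0}^{3}}{3} + \cdots\right) \approx \left(u0 \cdot \left(\alpha + \frac{1}{2} \cdot \left(\alpha \cdot u0\right)\right)\right) \cdot \alpha \]

Keeping the first two terms gives the expression of Alternative 6; the first dropped term is of relative size about u0^2 / 3, which is why this form is accurate only for small u0.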

                                        Alternative 7: 87.1% accurate, 1.1× speedup

                                        \[\left(0.0001 \leq \alpha \land \alpha \leq 1\right) \land \left(2.328306437 \cdot 10^{-10} \leq u0 \land u0 \leq 1\right)\]
                                        \[\left(\mathsf{fma}\left(0.5 \cdot \alpha, u0, \alpha\right) \cdot u0\right) \cdot \alpha \]
                                        (FPCore (alpha u0)
                                          :precision binary32
                                          :pre (and (and (<= 0.0001 alpha) (<= alpha 1.0))
                                             (and (<= 2.328306437e-10 u0) (<= u0 1.0)))
                                          (* (* (fma (* 0.5 alpha) u0 alpha) u0) alpha))
                                        float code(float alpha, float u0) {
                                        	return (fmaf((0.5f * alpha), u0, alpha) * u0) * alpha;
                                        }
                                        
                                        function code(alpha, u0)
                                        	return Float32(Float32(fma(Float32(Float32(0.5) * alpha), u0, alpha) * u0) * alpha)
                                        end
                                        
                                        Derivation
                                        1. Initial program 56.2%

                                          \[\left(\left(-\alpha\right) \cdot \alpha\right) \cdot \log \left(1 - u0\right) \]
                                        2. Step-by-step derivation
                                          1. Applied rewrites 56.2%

                                            \[\leadsto \left(\left(-\log \left(1 - u0\right)\right) \cdot \left(-\left|\alpha\right|\right)\right) \cdot \left(-\left|\alpha\right|\right) \]
                                          2. Step-by-step derivation
                                            1. Applied rewrites 56.2%

                                              \[\leadsto \left(\log \left(1 - u0\right) \cdot \left(-\alpha\right)\right) \cdot \alpha \]
                                            2. Taylor expanded in u0 around 0

                                              \[\leadsto \left(u0 \cdot \left(\alpha + \frac{1}{2} \cdot \left(\alpha \cdot u0\right)\right)\right) \cdot \alpha \]
                                            3. Step-by-step derivation
                                              1. Applied rewrites 87.1%

                                                \[\leadsto \left(u0 \cdot \left(\alpha + 0.5 \cdot \left(\alpha \cdot u0\right)\right)\right) \cdot \alpha \]
                                              2. Step-by-step derivation
                                                1. Applied rewrites 87.1%

                                                  \[\leadsto \left(\mathsf{fma}\left(0.5 \cdot \alpha, u0, \alpha\right) \cdot u0\right) \cdot \alpha \]
                                                2. Add Preprocessing

                                                Alternative 8: 74.3% accurate, 1.8× speedup

                                                \[\left(0.0001 \leq \alpha \land \alpha \leq 1\right) \land \left(2.328306437 \cdot 10^{-10} \leq u0 \land u0 \leq 1\right)\]
                                                \[\left(\left(-u0\right) \cdot \left(-\alpha\right)\right) \cdot \alpha \]
                                                (FPCore (alpha u0)
                                                  :precision binary32
                                                  :pre (and (and (<= 0.0001 alpha) (<= alpha 1.0))
                                                     (and (<= 2.328306437e-10 u0) (<= u0 1.0)))
                                                  (* (* (- u0) (- alpha)) alpha))
                                                float code(float alpha, float u0) {
                                                	return (-u0 * -alpha) * alpha;
                                                }
                                                
                                                real(4) function code(alpha, u0)
                                                use fmin_fmax_functions
                                                    real(4), intent (in) :: alpha
                                                    real(4), intent (in) :: u0
                                                    code = (-u0 * -alpha) * alpha
                                                end function
                                                
                                                function code(alpha, u0)
                                                	return Float32(Float32(Float32(-u0) * Float32(-alpha)) * alpha)
                                                end
                                                
                                                function tmp = code(alpha, u0)
                                                	tmp = (-u0 * -alpha) * alpha;
                                                end
                                                
                                                Derivation
                                                1. Initial program 56.2%

                                                  \[\left(\left(-\alpha\right) \cdot \alpha\right) \cdot \log \left(1 - u0\right) \]
                                                2. Taylor expanded in u0 around 0

                                                  \[\leadsto \left(\left(-\alpha\right) \cdot \alpha\right) \cdot \left(-1 \cdot u0\right) \]
                                                3. Step-by-step derivation
                                                  1. Applied rewrites 74.3%

                                                    \[\leadsto \left(\left(-\alpha\right) \cdot \alpha\right) \cdot \left(-1 \cdot u0\right) \]
                                                  2. Step-by-step derivation
                                                    1. Applied rewrites 74.3%

                                                      \[\leadsto \left(\left(-u0\right) \cdot \left(-\alpha\right)\right) \cdot \alpha \]
                                                    2. Add Preprocessing

                                                    Alternative 9: 74.3% accurate, 2.4× speedup

                                                    \[\left(0.0001 \leq \alpha \land \alpha \leq 1\right) \land \left(2.328306437 \cdot 10^{-10} \leq u0 \land u0 \leq 1\right)\]
                                                    \[\left(\alpha \cdot \alpha\right) \cdot u0 \]
                                                    (FPCore (alpha u0)
                                                      :precision binary32
                                                      :pre (and (and (<= 0.0001 alpha) (<= alpha 1.0))
                                                         (and (<= 2.328306437e-10 u0) (<= u0 1.0)))
                                                      (* (* alpha alpha) u0))
                                                    float code(float alpha, float u0) {
                                                    	return (alpha * alpha) * u0;
                                                    }
                                                    
                                                    real(4) function code(alpha, u0)
                                                    use fmin_fmax_functions
                                                        real(4), intent (in) :: alpha
                                                        real(4), intent (in) :: u0
                                                        code = (alpha * alpha) * u0
                                                    end function
                                                    
                                                    function code(alpha, u0)
                                                    	return Float32(Float32(alpha * alpha) * u0)
                                                    end
                                                    
                                                    function tmp = code(alpha, u0)
                                                    	tmp = (alpha * alpha) * u0;
                                                    end
                                                    
                                                    Derivation
                                                    1. Initial program 56.2%

                                                      \[\left(\left(-\alpha\right) \cdot \alpha\right) \cdot \log \left(1 - u0\right) \]
                                                    2. Taylor expanded in u0 around 0

                                                      \[\leadsto u0 \cdot \left(\frac{1}{2} \cdot \left({\alpha}^{2} \cdot u0\right) + {\alpha}^{2}\right) \]
                                                    3. Step-by-step derivation
                                                      1. Applied rewrites 87.1%

                                                        \[\leadsto u0 \cdot \mathsf{fma}\left(0.5, {\alpha}^{2} \cdot u0, {\alpha}^{2}\right) \]
                                                      2. Step-by-step derivation
                                                        1. Applied rewrites 87.2%

                                                          \[\leadsto u0 \cdot \mathsf{fma}\left(-\left(-\left(-\left|\alpha\right|\right)\right), -\left(-\left(-\left|\alpha\right|\right)\right), \left(--0.5 \cdot \left(\alpha \cdot \alpha\right)\right) \cdot u0\right) \]
                                                        2. Taylor expanded in alpha around inf

                                                          \[\leadsto u0 \cdot \left({\alpha}^{4} \cdot {\left(\left|\frac{1}{\alpha}\right|\right)}^{2}\right) \]
                                                        3. Step-by-step derivation
                                                          1. Applied rewrites 74.0%

                                                            \[\leadsto u0 \cdot \left({\alpha}^{4} \cdot {\left(\left|\frac{1}{\alpha}\right|\right)}^{2}\right) \]
                                                          2. Applied rewrites 74.3%

                                                            \[\leadsto \left(\alpha \cdot \alpha\right) \cdot u0 \]
                                                          3. Add Preprocessing
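
Alternative 9 keeps only the leading term of the series for -log(1 - u0), which is what buys the 2.4× speedup and costs the accuracy. The sketch below compares the truncation error of the one-term form (Alternative 9) and the two-term form (Alternative 6) against a binary64 reference. It measures relative truncation error only, not Herbie's accuracy percentage, and the function names are hypothetical.

```python
import math


def reference(alpha: float, u0: float) -> float:
    """Reference value -alpha^2 * log(1 - u0), evaluated in binary64 via log1p."""
    return -(alpha * alpha) * math.log1p(-u0)


def first_order(alpha: float, u0: float) -> float:
    """Alternative 9: (alpha * alpha) * u0."""
    return (alpha * alpha) * u0


def second_order(alpha: float, u0: float) -> float:
    """Alternative 6: (u0 * (alpha + 0.5 * (alpha * u0))) * alpha."""
    return (u0 * (alpha + 0.5 * (alpha * u0))) * alpha


def rel_error(approx: float, exact: float) -> float:
    """Relative error of approx with respect to a nonzero exact value."""
    return abs(approx - exact) / abs(exact)


if __name__ == "__main__":
    alpha = 0.5
    for u0 in (1e-4, 1e-2, 0.5):
        exact = reference(alpha, u0)
        print(
            u0,
            rel_error(first_order(alpha, u0), exact),
            rel_error(second_order(alpha, u0), exact),
        )
```

For small u0 the one-term error behaves like u0 / 2 and the two-term error like u0^2 / 3, so both forms degrade as u0 approaches 1, with the one-term form degrading much sooner.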

                                                          Reproduce

                                                          herbie shell --seed 2026070 
                                                          (FPCore (alpha u0)
                                                            :name "Beckmann Distribution sample, tan2theta, alphax == alphay"
                                                            :precision binary32
                                                            :pre (and (and (<= 0.0001 alpha) (<= alpha 1.0)) (and (<= 2.328306437e-10 u0) (<= u0 1.0)))
                                                            (* (* (- alpha) alpha) (log (- 1.0 u0))))