A Cheap Sine Implementation

Jan 03, 2024

Introduction

I recently took a shot at implementing a custom sine function in go, to compare it’s performance against the built-in sine implementation. It trades accuracy for speed.

How Did I Implement It?

One interesting result of introductory Calculus is Taylor’s Theorem which allows you to approximate a differentiable function (like $sin(x)$) with a polynomial, called a Taylor polynomial. If the function is $k$-times differentiable, you can approximate the function with up to a $k$-degree polynomial.

In particular, if we calculate the Taylor polynomial centered around $x=0$, we get what’s referred to as a Maclaurin series expansion which looks like this for $sin(x)$:

sin(x) \approx x - \frac{1}{6}x^3 + \frac{1}{120}x^5 - \frac{1}{5040}x^7 + ...

In my implementation, I take the first 3 terms and use that as an approximation for $sin(x)$. As a result, we end up with an implementation that only uses a few multiplication, division, and addition operations.

How Fast

On my Apple M1, here’s my benchmark results:

goos: darwin
goarch: arm64
pkg: chipbell4.github.com/m/v2
BenchmarkOriginalSine-8         255438804                4.412 ns/op
BenchmarkFastSine-8             1000000000               0.3128 ns/op
PASS
ok      chipbell4.github.com/m/v2       2.132s

So, this looks like a ~14x speed-up.

How Off Are We?

The entrypoint main.go prints a table of relative errors. At π / 2, we end up with an error of ~0.45%. Here’s a few selected values:

Angle (radians)	Relative Error
0	0
π / 8	0.000075%
π / 4	0.005129%
3π / 8	0.066355%
π / 2	0.45%

We never get above 1% error.

How Does math.Sin Even Work Anyways? Are we better off?

The original source is here, https://cs.opensource.google/go/go/+/refs/tags/go1.21.5:src/math/sin.go

But if you do some reading, they’re doing a few things worth noting:

They’re using the same approximation I’m using: Taylor series expansion around $x=0$.
However, they’re doing some work to handle the modular nature of trigonometric functions, which I am not.
They also will call a system-specific sine function if it’s available.

Then Why are We That Much Faster?

When I compile my OriginalSine function using Godbolt, the following bit of assembly seems to be the culprit:

CALL    math.sin(SB)

It looks like the code is making a call to an external sine (probably system-specific) implementation. I imagine that overhead is the majority of runtime cost. I suspect there’s some compile flags to inline that function that would minimize or eliminate some of that overhead.

So What?

Well, this may be helpful in a game development setting where minor optimizations can sometimes add up to improved framerate (like the fast inverse square root algorithm from Quake III). But there’s some caveats:

This implementation doesn’t currently handle the modular nature of sine. We’d have to add some code to support that.
The accuracy is lower, which may not be acceptable for certain applications.

Either way, it was pretty fun to build! The final code is here if you’re interested.