TL;DR: this article compares the performance of mattn/sqlite3 (a wrapper around SQLite's C code, requiring `cgo` to compile and embed it) with modernc.org/sqlite (an automatic translation of the C code to Go), and finds that the "native" Go code is half as fast as the cgo version.
Which brings us to the obvious question: what improvements can be made? What if the Go code were handcrafted instead of automatically generated?
https://gitlab.com/cznic/sqlite/-/issues/39 is one issue from a year ago where they talk about insert performance and some optimizations as well, so I believe the author is aware of it.
Anyway, I hope a native Go version does pick up (for everything currently depending on cgo for that matter), it makes cross-compilation a lot easier.
Yeah, this seemed fishy to me too (I made a similar comment here: https://www.reddit.com/r/golang/comments/uo5mix/comment/i8dn...). It seems the mattn/cgo version is doing a constant amount of work, but the modernc/non-cgo version is doing an amount of work linear(ish) in the number of rows. The latter makes more sense for this query, so I wonder if there's something wrong here?
How could one be doing constant work and the other O(rows) work if it's the same code (just compiled from one language to another)?
I also thought the sub-ms numbers can't be right — SQLite is fast but not that fast on millions of rows, but didn't look into it until now.
Turns out the benchmarking code is wrong: it didn't read the rows returned from db.Query, so the mattn version simply didn't wait for the results to arrive. Once you apply this patch:
diff --git a/cgo/main.go b/cgo/main.go
index 8796b3d..9a74a2f 100644
--- a/cgo/main.go
+++ b/cgo/main.go
@@ -82,11 +82,15 @@ CREATE TABLE people (
panic(err)
}
}
- fmt.Printf("%f,%d,insert,cgo\n", float64(time.Now().Sub(t1)) / 1e9, rows)
+ fmt.Printf("%f,%d,insert,cgo\n", float64(time.Now().Sub(t1))/1e9, rows)
t1 = time.Now()
- _, err = db.Query("SELECT COUNT(1), age FROM people GROUP BY age ORDER BY COUNT(1) DESC")
- fmt.Printf("%f,%d,group_by,cgo\n", float64(time.Now().Sub(t1)) / 1e9, rows)
+ res, _ := db.Query("SELECT COUNT(1), age FROM people GROUP BY age ORDER BY COUNT(1) DESC")
+ for res.Next() {
+ var count, age int
+ _ = res.Scan(&count, &age)
+ }
+ fmt.Printf("%f,%d,group_by,cgo\n", float64(time.Now().Sub(t1))/1e9, rows)
}
}
}
modernc SELECT performance becomes pretty comparable, actually a little bit faster than mattn on my Intel Mac with high row count.
Not only that, modernc INSERT is noticeably faster on my Intel Mac...
Good point, thanks for checking out the code! I replaced db.Query with a db.Exec for the SELECT count (just to avoid the row iteration/deserialization) and I'm seeing closer performance.
I hope no one is going to attempt to rewrite SQLite in Go. The C version is very well tested; a rewrite can only add bugs. Automatic transcription is the best way forward, the performance hit (when you want to avoid cgo) be damned.
I do suppose that the group_by performance can be improved by working on the c->go compiler, though. I think that's a more worthwhile effort than rewriting sqlite.
> Automatic transcription is the best way forward, the performance hit (when you want to avoid cgo) be damned
But if performance is worse than the C wrapper, there is surely no point whatsoever for such a native version to exist? Isn't the whole point of a native Go/whatever version to avoid the interop penalty?
> The C version is very well tested
Couldn't the same test suite be used for a Go/whatever rewrite, or at least be ported to Go too?
> Couldn't the same test suite be used for a Go/whatever rewrite, or at least be ported to Go too?
And then port each and every update as well? The only use of such a rewrite is performance improvement in go programs. Surely improving the c-to-go compiler is a better investment of effort? It would benefit other projects too. Rewriting sqlite in go will only lead to less effort on the c-to-go compiler.
Dumb question, on which system with a go compiler is a C compiler hard to install?
Most *nix distros either come with gcc (and/or clang) already or have it easily available as a package. AFAIK the same applies to macOS with Homebrew.
Is this a problem for Windows users, or am I overlooking someone? I'm genuinely interested, since I've recently started with Go and would like to use cgo in the future.
My primary use case is windows and embedded.
Also, even when the compiler itself is easy to install, making C compilation work seamlessly across multiple machines is another matter: unreliable vendor toolchains, or weird path issues where the compiler can't find the correct headers/libraries, are not that uncommon.
Thank you for your response and perspective. I didn't consider the various toolchains and environments from this POV. Your points make sense and are very helpful.
I agree. I've recently been working on https://github.com/goplus/c2go. Its goal is to convert any C project into Go without any human intervention while keeping performance close to C.