
TL;DR this article compares the performance of mattn/sqlite3 (a wrapper around sqlite's C code, requiring `cgo` to compile and embed) with modernc.org/sqlite, an automatic conversion of C to Go code, and finds that the "native" Go code is half as fast as the cgo version.

Which brings us to the obvious question, what improvements can be made? What if the Go code was handcrafted instead of automatically generated?

https://gitlab.com/cznic/sqlite/-/issues/39 is one issue from a year ago where they talk about insert performance and some optimizations as well, so I believe the author is aware of it.

Anyway, I hope a native Go version does pick up (for everything currently depending on cgo for that matter), it makes cross-compilation a lot easier.



> and finds that the "native" Go code is half as fast as the cgo version.

That's not what TFA finds.

What it finds is that INSERT may be about half as fast, but SELECT with a GROUP BY is ridiculously slow with modernc as your row count grows:

  # rows   mattn avg (s)  modernc avg (s)
  10000    0.000048       0.003762
  479827   0.000048       0.230283
  4798270  0.000051       2.791617

I wish more usage patterns were tested.


Yeah, this seemed fishy to me too (I made a similar comment here: https://www.reddit.com/r/golang/comments/uo5mix/comment/i8dn...). It seems the mattn/cgo version is doing a constant amount of work, but the modernc/non-cgo version is doing an amount of work linear(ish) to the number of rows. The latter makes more sense for this query, so I wonder if there's something wrong here?

How could one be doing constant work and the other O(rows) work if it's the same code (just compiled from one language to another)?
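One quick way to check the constant-versus-linear reading is to divide each average from the table above by its row count. The sketch below does only that arithmetic; the `sample` type and `perRowMicros` helper are names made up for illustration, and the numbers are copied from the table, not re-measured.

```go
package main

import "fmt"

// sample holds one row of the table quoted in the thread.
// The type and field names are illustrative, not from the benchmark code.
type sample struct {
	rows    int
	mattn   float64 // average seconds reported for mattn/sqlite3
	modernc float64 // average seconds reported for modernc.org/sqlite
}

// perRowMicros converts a total time in seconds into microseconds per row.
func perRowMicros(seconds float64, rows int) float64 {
	return seconds / float64(rows) * 1e6
}

func main() {
	samples := []sample{
		{10000, 0.000048, 0.003762},
		{479827, 0.000048, 0.230283},
		{4798270, 0.000051, 2.791617},
	}
	for _, s := range samples {
		fmt.Printf("%8d rows: mattn %.6f us/row, modernc %.6f us/row\n",
			s.rows, perRowMicros(s.mattn, s.rows), perRowMicros(s.modernc, s.rows))
	}
}
```

The modernc figure stays in the same ballpark per row (roughly 0.4 to 0.6 µs) across all three sizes, while the mattn figure shrinks toward zero as the table grows, which is what a constant total time looks like.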


I also thought the sub-ms numbers can't be right — SQLite is fast but not that fast on millions of rows, but didn't look into it until now.

Turns out the benchmarking code is wrong: it didn't read the rows returned from db.Query, so the mattn version simply didn't wait for the results to arrive. Once you apply this patch:

  diff --git a/cgo/main.go b/cgo/main.go
  index 8796b3d..9a74a2f 100644
  --- a/cgo/main.go
  +++ b/cgo/main.go
  @@ -82,11 +82,15 @@ CREATE TABLE people (
        panic(err)
       }
      }
  -   fmt.Printf("%f,%d,insert,cgo\n", float64(time.Now().Sub(t1)) / 1e9, rows)
  +   fmt.Printf("%f,%d,insert,cgo\n", float64(time.Now().Sub(t1))/1e9, rows)
   
      t1 = time.Now()
  -   _, err = db.Query("SELECT COUNT(1), age FROM people GROUP BY age ORDER BY COUNT(1) DESC")
  -   fmt.Printf("%f,%d,group_by,cgo\n", float64(time.Now().Sub(t1)) / 1e9, rows)
  +   res, _ := db.Query("SELECT COUNT(1), age FROM people GROUP BY age ORDER BY COUNT(1) DESC")
  +   for res.Next() {
  +    var count, age int
  +    _ = res.Scan(&count, &age)
  +   }
  +   fmt.Printf("%f,%d,group_by,cgo\n", float64(time.Now().Sub(t1))/1e9, rows)
     }
    }
   }

modernc SELECT performance becomes pretty comparable, and is actually a little faster than mattn on my Intel Mac at high row counts.

Not only that, modernc INSERT is noticeably faster on my Intel Mac...
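For anyone adapting the benchmark, here is a minimal sketch of the same idea as the patch: stop the timer only after every row has been read. It is not the article's code. `rowIter`, `timeQuery` and `fakeRows` are hypothetical names; `rowIter` lists the `*sql.Rows` methods the loop needs, and `fakeRows` stands in for a real result set so the example runs without a SQLite driver. Unlike the patch, it also checks the errors.

```go
package main

import (
	"fmt"
	"time"
)

// rowIter is the subset of *sql.Rows (database/sql) used below.
// *sql.Rows satisfies it, so a real db.Query result can be passed in.
type rowIter interface {
	Next() bool
	Scan(dest ...interface{}) error
	Err() error
	Close() error
}

// timeQuery times a query from issue through the last row, so a driver that
// produces rows lazily cannot look artificially fast. With a real database,
// query would be: func() (rowIter, error) { return db.Query(sqlText) }.
func timeQuery(query func() (rowIter, error)) (time.Duration, int, error) {
	start := time.Now()
	rows, err := query()
	if err != nil {
		return 0, 0, err
	}
	defer rows.Close()

	n := 0
	for rows.Next() {
		var count, age int
		if err := rows.Scan(&count, &age); err != nil {
			return 0, n, err
		}
		n++
	}
	// Err reports any failure that ended the iteration early.
	if err := rows.Err(); err != nil {
		return 0, n, err
	}
	return time.Since(start), n, nil
}

// fakeRows is a hypothetical stand-in that yields a fixed number of rows.
type fakeRows struct{ remaining int }

func (f *fakeRows) Next() bool {
	if f.remaining == 0 {
		return false
	}
	f.remaining--
	return true
}

func (f *fakeRows) Scan(dest ...interface{}) error {
	for _, d := range dest {
		if p, ok := d.(*int); ok {
			*p = 1
		}
	}
	return nil
}

func (f *fakeRows) Err() error   { return nil }
func (f *fakeRows) Close() error { return nil }

func main() {
	elapsed, n, err := timeQuery(func() (rowIter, error) {
		return &fakeRows{remaining: 3}, nil
	})
	if err != nil {
		panic(err)
	}
	fmt.Printf("%f,%d,group_by,fake\n", elapsed.Seconds(), n)
}
```

Timing through `rows.Err()` means both drivers are charged for the full GROUP BY, whichever point in the call sequence each one does the work.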


Good point, thanks for checking out the code! I replaced db.Query with a db.Exec for the SELECT count (just to avoid the row iteration/deserialization) and I'm seeing closer performance.

I'll post an update and credit you.


It's up now, thank you! The result is that INSERTs are still the same on my machine but SELECTs are at worst twice as bad and at best 10% as bad.


The code is on GitHub, links in the post. You can add whatever other usage patterns you'd like!


Unfortunately the benchmark isn't measuring SELECT performance right, see my other post https://news.ycombinator.com/item?id=31366759.


I hope no one attempts to rewrite sqlite in Go. The C version is very well tested. A rewrite can only add bugs. Automatic transcription is the best way forward, the performance hit (when you want to avoid cgo) be damned.

I do suppose that the group_by performance can be improved by working on the c->go compiler, though. I think that's a more worthwhile effort than rewriting sqlite.


> Automatic transcription is the best way forward, the performance hit (when you want to avoid cgo) be damned

But if performance is worse than the C wrapper, there is surely no point whatsoever for such a native version to exist? Isn't the whole point of a native Go/whatever version to avoid the interop penalty?

> The C version is very well tested

Couldn't the same test suite be used for a Go/whatever rewrite, or at least be ported to Go too?


> there is surely no point whatsoever for such a native version to exist

If you want to avoid cgo. There are two reasons for that: easy cross-compilation, and https://dave.cheney.net/2016/01/18/cgo-is-not-go

> Couldn't the same test suite be used for a Go/whatever rewrite, or at least be ported to Go too?

And then port each and every update as well? The only use of such a rewrite is performance improvement in go programs. Surely improving the c-to-go compiler is a better investment of effort? It would benefit other projects too. Rewriting sqlite in go will only lead to less effort on the c-to-go compiler.


> Isn't the whole point of a native Go/whatever version to avoid the interop penalty?

cgo requires a C compiler (not always easily available) and makes cross-compilation harder.


Dumb question: on which system with a Go compiler is a C compiler hard to install? Most *nix distros either come with gcc (and/or clang) already or have it easily available as a package. AFAIK the same applies to macOS with Homebrew. Is this a problem for Windows users, or am I overlooking someone? I'm genuinely interested since I've recently started with Go and would like to use cgo in the future.


My primary use cases are Windows and embedded. Also, even when the compiler is easy to install, making C compilation work seamlessly across multiple machines is another matter: unreliable vendor toolchains and weird path issues where the compiler can't find the correct header or library are not that uncommon.


Thank you for your response and perspective. I didn't consider the various toolchains and environments from this POV. Your points make sense and are very helpful.


It also makes it harder to debug, to do performance profiling, or to manage builds in CI/CD. There is a lot of cost to cgo.

I still would not rewrite SQLite. It's just too good how it is.


I agree. I've recently been working on https://github.com/goplus/c2go. Its goal is to convert any C project into Go without any human intervention while keeping performance close to C.



