I think the challenge for most people is that the PostGIS query planner does the indexing for you in most queries, while a naive all-pairs comparison in geopandas/shapely won't tell you to use the .sindex attribute instead.
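For illustration, a minimal sketch of the difference in geopandas — the file names are placeholders, though .sindex.query is the actual spatial-index API:

  import geopandas as gpd

  # Hypothetical layers; any two GeoDataFrames work here.
  polys = gpd.read_file("polygons.shp")
  pts = gpd.read_file("points.shp")

  # Naive all-pairs: every polygon tested against every point, O(n*m).
  naive = [
      (i, j)
      for i, poly in polys.geometry.items()
      for j, pt in pts.geometry.items()
      if poly.intersects(pt)
  ]

  # Indexed: the R-tree prunes to bounding-box candidates first and
  # only evaluates the predicate on those.
  indexed = [
      (i, pts.index[j])
      for i, poly in polys.geometry.items()
      for j in pts.sindex.query(poly, predicate="intersects")
  ]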
But many R tools are already vectorised, so the shift from lapply() to mclapply() is about as fair a comparison as claiming it's "just" a shift from Python's builtin map() to pool.map(). Anybody can play this game, and it's not helpful. I've been using+teaching R now for nearly seven years, and the number of times I've used lapply can be counted on one hand.
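To make that analogy concrete, here's the Python side of that "shift" — a minimal sketch with a stand-in worker function:

  from multiprocessing import Pool

  def transform(x):
      # Stand-in for whatever per-element work you'd hand to lapply/mclapply.
      return x ** 2

  if __name__ == "__main__":
      data = range(10)

      # Serial: the builtin map.
      serial = list(map(transform, data))

      # Parallel: the same call shape, routed through a worker pool.
      with Pool(processes=4) as pool:
          parallel = pool.map(transform, data)

      assert serial == parallel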
I use sapply all the time to transform data. It tends to be less code (no counter, no output initialisation) and easier to follow if that style is familiar.
I wonder whether this just creates another "underclass" of scientific laborers... I think we need tool-builders as academics. The only people academics recognize intellectually are peers. The rise of RSEs is a kind of intellectual outsourcing... implementation of your science should be part and parcel of its creation, not deferred to second-class non-academic "technicians." The friends and former students I've talked to who've entered RSE careers have largely been treated as second-class citizens in a research environment that they are integral to!
I think, like many things, the buck stops with academia itself: its metrics, demands, and incentives. We need more research-engineering and science-of-science academics within compute-intensive science departments themselves. Things like JoSS [1] or Scientific Data [2] are awesome first steps at addressing this.
This also sounds a bit like Planetside 2 [1], which had a similar structure. A relatively large open world where small "provinces" were contested by factions in FPS King-of-the-Hill combat. This meant that any one province action was a part of a larger "front," across which factions would often mass & press offensives. Capturing the entire map led to some kind of reward, and then a reset iirc.
Nothing like rolling up in an APC with 12 people in voice chat on the tip of the spear, or coordinating an entire battery of MAXs keeping the skies clear. Some of the best gaming I've ever experienced. Gradually, though, pay-to-win mechanics pushed me away, and I've not played since 2014.
No, nulls matter a great deal. If you want to test a claim in Null Hypothesis Significance Testing, the "significance" of the claim is in direct reference to the null. Changing a null will change the significance of the alternative. My favorite statement of this is from Gelman:
> the p-value is a strongly nonlinear transformation of data that is interpretable only under the null hypothesis, yet the usual purpose of the p-value in practice is to reject the null. My criticism here is not merely semantic or a clever tongue-twister or a “howler” (as Deborah Mayo would say); it’s real. In settings where the null hypothesis is not a live option, the p-value does not map to anything relevant.
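A quick way to see this concretely is to run the same one-sample t-test against two different nulls. Purely an illustrative sketch with simulated data:

  import numpy as np
  from scipy import stats

  rng = np.random.default_rng(42)
  data = rng.normal(loc=0.3, scale=1.0, size=50)  # simulated sample

  # Identical data, two different nulls -> two different p-values.
  for null_mean in (0.0, 0.3):
      result = stats.ttest_1samp(data, popmean=null_mean)
      print(f"H0: mu = {null_mean}  ->  p = {result.pvalue:.4f}")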
Given the explosion in the number of journals and the impossibility of effective peer review, being published in a journal does not mean what it used to. This is one of the material drivers of the replication crisis (journals can no longer effectively gatekeep scientific validity), but it also reflects something real about the practice of science: little social cliques come up with pet theories and, over time, "fight" with these theories on epistemic common ground. The successful ones, we'd like to think, are the ones that last the most rounds in the fight, but that probably only holds in the long run. Contradiction, in itself, is normal (and was before!).
We use Jupyter Book (geographicdata.science/book), and it has seriously simplified our workflow. The project is building useful features very quickly and is very responsive to feedback & requests. Big props to their team.
I disagree that it's a "true horror to use." The set of supported built-in classes grows significantly by the day. It's not as good when used for wholly unstructured streams of data (e.g. tuples of mixed type, dicts with complex objects inside of them), but if you can spend the design time to arrange things in a structured manner, it's super easy to use and can seriously boost performance on simple algos.
I've had a ton of success using it in statistical and computational geometry applications.
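For flavor, here's a minimal sketch of the kind of structured, loop-heavy geometry kernel that tends to benefit — assuming numba's @njit; the closest-pair function itself is just an illustrative example:

  import numpy as np
  from numba import njit

  @njit
  def closest_pair_dist(points):
      # Brute-force closest-pair distance over an (n, 2) float64 array.
      # Tight loops over homogeneous arrays are exactly what nopython
      # mode compiles well.
      n = points.shape[0]
      best = np.inf
      for i in range(n):
          for j in range(i + 1, n):
              dx = points[i, 0] - points[j, 0]
              dy = points[i, 1] - points[j, 1]
              d2 = dx * dx + dy * dy
              if d2 < best:
                  best = d2
      return np.sqrt(best)

  pts = np.random.random((1000, 2))
  print(closest_pair_dist(pts))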
Let me explain my line of reasoning here (I've been there at least three times, with three different outcomes):
- case 1: needed to do calculations with a specialized C library (precision arithmetic). Built a bare FFI, later replaced it with CFFI (see the sketch after this list).
- case 2: loop-intensive code over very simple calculations, with time-constrained development. Identified horrible performance in Python: tried numba, which didn't work; ended up using Cython, which worked really well.
- case 3: optimize a numpy-intensive routine for performance. Tried numba, didn't work. Too expensive to recode in Cython. Looked at numpy's C/C++ interfaces and numpy-friendly C++ alternatives. Also tried to trick numpy into performing better (don't ever try this; it gives worse results). Ended up doing nothing, as development time was the main constraint here.
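For case 1, a minimal sketch of what a CFFI binding in ABI mode looks like; the library name and function signature below are hypothetical stand-ins:

  from cffi import FFI

  ffi = FFI()

  # Declare the C signature we want to call (a hypothetical
  # precision-arithmetic routine).
  ffi.cdef("double hp_add(double a, double b);")

  # ABI mode: load the shared library at runtime (placeholder name).
  lib = ffi.dlopen("./libhp.so")

  print(lib.hp_add(1.0, 2.0))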
If you have a lot of time and are working on a small program, maybe you can spend the time to optimize. If you're a team producing lots of complicated algorithms with no way to re-develop everything, stick to Python, identify the perf losses, and choose wisely what you'll optimize.
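"Identify the perf losses" mostly means profiling first; a minimal sketch with the stdlib profiler, where workload() is a stand-in for your real entry point:

  import cProfile
  import pstats

  def workload():
      # Stand-in for your actual pipeline entry point.
      total = 0
      for i in range(1_000_000):
          total += i * i
      return total

  # Profile, then show the 10 most expensive calls by cumulative time.
  cProfile.run("workload()", "profile.out")
  pstats.Stats("profile.out").sort_stats("cumulative").print_stats(10)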
Numba has progressed but is still not the "drop-in decorator" it's advertised as. It can even give worse performance in some cases. Nevertheless, the idea is good and I applaud the effort; when it's done, it'll be massive!
> This situation is very annoying, especially for a touch typist as my fingers are always on hjkl and my thumb on the spacebar. This makes my thumb knuckle constantly brush the trackpad and activate it.
I thought the standard touch-typing position was index fingers on the notches (f & j on US QWERTY). With that & some fairly large hands, the trackpad rests comfortably outside my palm area? The trackpad seems pretty well designed to just miss the palm in the standard position.
I think this is exactly how R accreted features to make it do things it was never designed to do. When the "tool you use" becomes the "kind of developer you are," things get a little restrictive.
It's also how languages like Scala become "everything and the kitchen sink" multi-paradigm monsters. People making the tool want it to do everything, so that they can get all the developers, so they make it a functional, object-oriented, imperative, procedural, declarative hodgepodge of 15 different barely-interoperable sub-languages, and you get a mess.
https://geopandas.org/en/stable/docs/reference/sindex.html