I've been using R nonstop for pretty much 5+ years. I'm happy that there's established competition coming from Python and new competition coming from Julia. Having these languages compete over similar types of programmers pushes each one to be better, which is awesome. I'm not a die-hard R person, I'd be more than happy to switch under the right circumstances.
But...I think one thing gets overlooked way too often. For "data scientists" or "statisticians" or [insert new term here], the majority of our non-modeling time is spent on just plain old data wrangling. To me, R is unbeatable here. I tried Python ~2 years ago and pre-1.0 Julia.
Using tidyverse you can do pretty much anything to any dataset, often *without a monstrous amount of keystrokes*. (The pipe syntax is awesome). If you really need speed you can always switch over to data.table for uglier but faster code. I really tried but I could never replicate the "brain cycles to keystrokes" speed of R in Python/Julia. That is, being able to intuitively and quickly just convert my thoughts into readable data wrangling code.
Sure the base R language is not that "fast" and Julia/Python benchmarks are way faster. But in practice this doesn't matter to me. Most of the performance sensitive packages are written in C/C++/Fortran anyway (rstan, brms, glmnet, caret). I don't care that I could write 3x faster loops. The extra 5 seconds for that one piece of code doesn't make up for the absence of a good data wrangling ecosystem.
My message to the Julia team: You can get a very large portion of the R userbase to switch over if you focus on a Julia version of the tidyverse (especially dplyr). I know that DataFrames.jl exists but it just doesn't even come close. There's a difference between "you can do this in Julia too" and "here's a clean/intuitive way to do this better without extra baggage".
I'm sorry if the above seems harsh. I genuinely appreciate the Julia team's efforts. I can only imagine how hard it is to create a new language. I just wanted to be honest.
I deeply loathe R for its terrible type idiosyncrasies, syntax, and slowness.
However, even I must admit that it is incredibly good at what it was meant to do - analyse and display data. (And yes, the tidyverse is a huge improvement to the syntax, although it's telling that they basically reinvented the language to do so.)
As an ecological modeller, I create my actual simulation models in Julia, because it is a much, much better language for any real programming. But I still analyse the output in R.
I don't understand how people can loathe R. If you take a functional approach, especially using pipes, dplyr and a split-apply-combine style, it is quite beautiful. Much nicer than trying to, say, divide a time period by an integer in Go.
> If you take a functional approach, especially using pipes, dplyr and a split, apply, combine style, it is quite beautiful
Sure, but what if you don't? Sometimes, this is the right way to do things, other times there are other approaches that are more natural/beautiful. In many cases, a loop with conditionals is much easier to understand.
I use a lot of R, and like many aspects of it. But the fact that `f(stop("Hi!"))` may or may not throw an error depending on the internals of `f` is a little maddening. (And there are tons of similar issues.)
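The `f(stop("Hi!"))` gotcha comes from R's lazy argument evaluation: the error only fires if `f` actually forces its argument. Python is eager, but a rough analogue using explicit thunks (zero-argument callables; all names below are made up for illustration) shows the same "may or may not throw" behaviour:

```python
# Rough Python analogue of R's lazy argument evaluation.
# In R, f(stop("Hi!")) errors only if f forces its argument;
# here we simulate that by passing a thunk instead of a value.

def boom():
    raise RuntimeError("Hi!")

def uses_arg(thunk):
    return thunk()          # forces the "argument" -> the error fires

def ignores_arg(thunk):
    return "never looked"   # never forces it -> no error at all

print(ignores_arg(boom))    # fine: prints "never looked"
try:
    uses_arg(boom)
except RuntimeError as e:
    print("error:", e)      # error: Hi!
```

So whether a call site blows up depends on the callee's internals, which is exactly what makes it hard to reason about from the outside.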
When it comes to data wrangling, one huge advantage of Julia over tidyverse/R dataframes/Pandas is that you can write a damn for loop and it won't be brutally slow.
It's so much simpler and faster to use a loop that says "pick this row only if this and that and this other thing are sometimes true" vs having to construct an algebra of column filters to do the same.
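To make the contrast concrete, here's a small sketch with plain Python rows (the data and fields are hypothetical); in Julia the loop form runs at native speed, whereas in R/Pandas you are pushed toward the mask/filter form for performance:

```python
# Hypothetical rows; in a dataframe library these would be columns.
rows = [
    {"age": 34, "active": True,  "score": 0.9},
    {"age": 17, "active": True,  "score": 0.4},
    {"age": 52, "active": False, "score": 0.8},
    {"age": 41, "active": True,  "score": 0.7},
]

# Loop style: the condition reads exactly like the sentence
# "pick this row only if this and that and this other thing".
picked = []
for r in rows:
    if r["age"] >= 18 and r["active"] and r["score"] > 0.5:
        picked.append(r)

# Filter style: the same predicate as a single expression,
# closer to what dplyr::filter or a boolean mask expresses.
picked2 = [r for r in rows
           if r["age"] >= 18 and r["active"] and r["score"] > 0.5]

assert picked == picked2  # both select the first and last rows
```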
I think that is absolutely a fair criticism. Personally, I rarely run into an issue where I absolutely am bottlenecked by a slow loop. But this sort of thing drew me to Julia in the first place.
There was also an R update in ~2017 that introduced some JIT speed-ups for loops, which made a noticeable difference.
If this is a problem you run into often, I suggest converting your object to a data.table. You can pass a function row-wise over the object very quickly.
I think loops are not ideal for data analysis. They are prone to human error, especially ones that modify the data, and in a way that can be hard to sort out (e.g. iterating over the dimensions of the wrong object). A stepwise creation of new logical fields using mutate, followed by a vectorised ifelse command, is more robust, and you can clearly see the steps of the logic.
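The mutate-then-ifelse pattern can be sketched in plain Python using lists as stand-in columns (field names and data are hypothetical):

```python
# Sketch of the "stepwise logical fields, then a vectorised ifelse"
# pattern (dplyr's mutate + ifelse), using plain lists as columns.
age    = [34, 17, 52, 41]
active = [True, True, False, True]

# Step 1: build each piece of logic as its own named flag,
# so every intermediate step can be inspected on its own.
is_adult = [a >= 18 for a in age]
eligible = [ad and ac for ad, ac in zip(is_adult, active)]

# Step 2: a single ifelse over the combined flag.
label = ["keep" if e else "drop" for e in eligible]

print(label)  # ['keep', 'drop', 'drop', 'keep']
```

Because each flag is a named column rather than a condition buried inside a loop body, a mistake in the logic is visible in the intermediate results.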
I mean, I get your point.
Julia suffers a bit from the Lisp Curse:
http://winestockwebdesign.com/Essays/Lisp_Curse.html
Writing a performant and easy to use data wrangling library for R is a bunch of work and means dealing with C/C++ etc.
So few people are willing to do so, and just contribute to a small number of libraries like dplyr.
(I feel like there are at least 2 other major competitors to that in R?)
Whereas in Julia it's really easy to write a new data wrangling library.
It's just not that much work. So people:
A) do it just for fun / student projects (none of the major ones are that, though), or
B) do it because they have a nontrivially resolvable difference of opinion (e.g. Queryverse has a marginally more performant but marginally harder-to-use system for missing data).
The nice thing about Julia, especially for tabular data (thanks to Tables.jl), is that everything works together.
It's actually completely possible to mix and match all of those libraries in a single data processing pipeline.
While that's generally a weird thing to do, it does mean that if an external package uses any of them, it plugs into a pipeline built on another.
(One common case: Queryverse has CSVFiles.jl, but CSV.jl is generally faster, and you can just swap one for the other inside a Query.jl pipeline.)
I absolutely agree this makes learning harder.
---
Also that particular example:
> "I need pipes to help me wrangle data more efficiently do I use Base Julia, Chain.jl, Pipe.jl, or Lazy.jl?"
It's piping.
Something would have to be massively screwed up if any of those options were more or less efficient than the others.
The only question is what semantics do you want.
Each is pretty opinionated about how piping should look.
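That point generalizes: a pipe is just sugar for nested function application, so the libraries can only differ in semantics and syntax, not in any meaningful runtime cost. A minimal sketch of what a pipe desugars to (the `pipe` helper below is hypothetical, not any of the libraries named above):

```python
from functools import reduce

def pipe(value, *fns):
    """x |> f |> g desugars to g(f(x)); the pipe itself adds no real work."""
    return reduce(lambda acc, fn: fn(acc), fns, value)

double = lambda x: x * 2
inc    = lambda x: x + 1

# 5 |> double |> inc  is exactly  inc(double(5))
assert pipe(5, double, inc) == inc(double(5)) == 11
```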
The Lisp Curse was written by a then-inexperienced web developer with (then, and likely still) zero Lisp experience, based on extrapolating something he read about Lisp in an essay by Mark Tarver. He prefers it not be submitted to HN due to the embarrassment, yet for some reason keeps the article up (probably because it generates traffic).
Yeah, NSE (non-standard evaluation) is really annoying to work with in dplyr/tidyverse codebases, and this definitely inhibits people from building on top of them.
They are an 80% solution for a lot of data analytic needs, but base-R is 100% the right choice if you want your code to run for a long time without needing updates.
I've never really gotten into data.table for some reason, normally dplyr is fast enough, or I'm using something more efficient than R.
What a constructive, positive, down-to-earth, well-written comment, and what a nice reprieve from everything that's broken about the tone of web discussions these days. You point out that there's still another player in this space (R), but not in a way that's whiny, dismissive, or doctrinaire, and you celebrate the healthy competition. You suggest a streamlined path toward Julia ecosystem maturity, rooted in real-world needs. Nicely done!
I have no real dog in this fight, but I hope Julia team members (and/or aspiring Julia ecosystem contributors) will read and consider your point.
This whole thread seems to be quite civilized. I can see no name-calling or off-topic rants, only a frank exchange of opinions, mixed in with some facts.
Your post seems to indicate that there is some sort of 'fight' going on, or that the tone is broken. I disagree. If most web discussions were like this one, we would have fewer problems in this world.
Oh, that's exactly what I mean -- when I say "everything that's broken about the tone of web discussions these days", I'm talking about threads and topics other than this one. I don't see any 'fight' here, and that's what's so refreshing.
All right! I got the impression you were contrasting that particular post with the rest of this discussion, but apparently not. Still slightly confused here. Oh well, carry on.
Another big area where R has an edge over Python (and I guess Julia, though I'm not sure) is making quick yet presentable plots of data that contain different factors you want to show together. The matplotlib equivalent requires tracking different indices and manually adding layers for each one.
I've worked with R and Python over the last 3 years, but I've been learning and dabbling with Julia since 0.6. With the availability of [PyCall.jl] and [RCall.jl], the transition to Julia can already be easier for Python/R users.
I agree that most of the time data wrangling is super comfortable in R, due to the syntax flexibility exploited by the big packages (tidyverse/data.table/etc). At the same time, Julia and R share a bigger heritage of Lisp influence than Python does, because R is also a Lisp-ish language (see [Advanced R, Metaprogramming]). My main gripe with the R ecosystem is not that most of the performance-sensitive packages are written in C/C++/Fortran, but that they are so deeply interconnected with the R environment that porting them to Julia (which also provides an easy, good interface to C/C++/Fortran, and more; see the [Julia Interop] repo) seems impossible for some of them.
I also think Julia reaches a broader scientific programming public than R does: it overlaps with Python sometimes, but it also offers the Matlab/Octave public a better alternative. I don't expect to see all the habits from those communities merge into the Julia ecosystem. On the other hand, I think Julia's bigger reach will help it avoid falling into the "base" vs "tidyverse" vs "something else in-between" split that R is in now.
Out of curiosity, when was the last time you looked at DataFrames.jl? A huge amount has happened in the last year. Plus, if you want more tidy-like syntax, you can go with Query.jl (or DataFramesMeta.jl, though that isn't quite finished updating to the new DataFrames syntax), or if you just want pipes on DataFrame operations, there's Pipe.jl and Chain.jl.
I don't think your comments are harsh, you need what you need and you like what you like. I do mostly data wrangling too, but feel much less constrained with Julia than with tidyr. Sometimes having constraints and one right way to do things is good, but it's not for me.
Also worth noting it's not necessarily on the language developers to do this. Even in R, tidyverse is in packages, not in the base language.
My experience with R was somewhat different. R was my first computational language in 2006 (version 2.3, IIRC), and parsing real life data (biological, in my case) into a format acceptable to R was a non-trivial exercise. I had somebody write me a perl script to parse the raw data into a clean CSV, but that has its own problems. The tools that were the kernel of the tidyverse (created 2014) were just beginning to show up, and even magrittr pipes were many years away. The only tidyverse tool even close to mature at the time was ggplot. For me data munging was the limiting factor, and at some point I discovered many people prefer Python for these initial steps. In 2013 I learnt Python with the explicit aim of data munging, while continuing analyses in R. With Pandas I could cover 80% of my use case for R, and eventually dropped it completely. Again, this predates the creation of the tidyverse, which I noted with some irony.
For what it's worth, Hadley Wickham was asked in a Reddit AMA several years ago which platform he'd choose if he were just starting out. He pointed to Julia as his pick.
> My message to the Julia team: You can get a very large portion of the R userbase to switch over if you focus on a Julia version of the tidyverse (especially dplyr).
If we removed dplyr, then R scripts would absolutely scream, so I find the speed argument for "why switch to X" unconvincing. If users cared so deeply about speed, almost no one would be using the tidyverse; instead we'd all be using base R or data.table.
Multiple dispatch? Hmm, is this really a problem I'm going to come across in the real world, when 90% of our time is spent ingesting a poorly-formatted CSV, doing some quick plots, and perhaps building a model to test something out? If the goal of Julia is to replace R/Python, then their priorities feel way off the mark.
> If the goal of Julia is to replace R/Python then their priorities feel way off the mark
There's a lot more to scientific computing than wrangling tabular data. Julia is competing in that overall space with R/Python/Fortran/Java/C++. If R or Pandas is better at data wrangling, then Julia won't win out there. But so be it. No PL is best at everything.
> There's a lot more to scientific computing than wrangling tabular data.
Also a point that gets ignored way too often. My original post differentiated between time spent writing models and time spent data wrangling.
I would never even attempt to write a symplectic integrator in base R (OK maybe Rcpp would be fine but that's not really "R"). Julia, by design, is better at that. But the R ecosystem is so good that I can use the best practical implementation of a symplectic integrator to solve common modeling problems via RStan.
Yes, Stan is a standalone framework that can be accessed from Julia as well. But the following workflow can be done in R much easier:
1) Read in badly formatted CSV data
2) Wrangle the data into a usable form
3) Do some basic exploratory analysis (including plots)
4) Write several models in brms/raw Stan (via rstan)
5) Simulate from the priors and reset them to more sensible values
6) Run the model over the data to generate the posterior
7) Plot/run posterior predictive checks, counterfactual analysis, outlier analysis (PSIS or WAIC), etc.
Again, the above represents my common use case. I fully appreciate that people use Julia to do awesome stuff like "the exploration of chaos and nonlinear dynamics." [0]. I understand that the modern R ecosystem isn't really built for this.
Totally agree there. It is not a replacement and it is trying to solve a different problem. I don't believe Julia contributors are lying awake at night upset that other languages exist and feel they need to put a stop to that. My point (put across clumsily, I see) is that IF that were their goal, then they are going about it the wrong way, as most R/Python users have different priorities. But it is a moot point, as that would be an absurd motivation to create a whole new language.
> is this really a problem that I'm going to come across in the real-world when 90% of our time is spent ingesting a poorly-formatted csv, doing some quick plots and perhaps building a model to test something out
Yes, multiple dispatch is not some highfalutin ivory tower concept that only comes up in specialized code. For example, the model in question could define custom plotting recipes[1] so that you can just call plot() and have it produce something useful.
Also, why shouldn't dplyr perform comparably to data.table? There would be no need for a fragmented library ecosystem here if the abstractions the tidyverse is built upon were lower-cost. Moreover, what if my data isn't CSV, or isn't in a table-like shape at all? "Real world" does not mean the same thing across different domains.
> Yes, multiple dispatch is not some highfalutin ivory tower concept that only comes up in specialized code. For example, the model in question could define custom plotting recipes[1] so that you can just call plot() and have it produce something useful.
This is literally the whole conception behind generic functions in R (print, plot, summary etc).
I agree it's great, but Julia is building on a lot of prior art here.
For sure, and one would be remiss not to mention Dylan, CL/CLOS and Clojure here as well. My quibble was with the claim that multiple dispatch rarely shows up in practice, which you've pretty clearly shown is not the case in R!
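For readers coming from Python, the same idea exists in the standard library as `functools.singledispatch`. It dispatches on the type of the first argument only (single dispatch, not Julia-style multiple dispatch), but it illustrates the mechanism behind R's generic `plot()`/`print()`/`summary()` functions; the `describe` generic below is a made-up example:

```python
# Illustrative sketch: a generic function whose behaviour is chosen
# by the type of its argument, like R's S3 generics (single dispatch).
from functools import singledispatch

@singledispatch
def describe(obj):
    return "some object"            # fallback method

@describe.register
def _(obj: int):
    return f"an integer: {obj}"     # method for integers

@describe.register
def _(obj: list):
    return f"a list of {len(obj)} items"  # method for lists

print(describe(3))       # an integer: 3
print(describe([1, 2]))  # a list of 2 items
print(describe(2.5))     # some object
```

Julia generalizes this so that all argument types participate in dispatch, which is what makes things like plotting recipes compose so well.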
'highfalutin ivory tower' is a great name for a band :D
Naturally you are correct, and I am wrong to dismiss it as unimportant. What I'm saying is that the majority of R/Python users today are not looking for ultimate speed or sophisticated programming paradigms. Most users are doing the unsexy bread and butter of "take some tabular data -> analyse it -> report on it", and I want to dismiss the argument of "users will migrate to Julia because of these nifty features" because it ignores the very reasons existing users use these tools in the first place. It would be as absurd as proclaiming Excel users will switch to Python because the accounts department suddenly cares about NLP.