The first programming language I learned was Java. And for us non-native speakers who didn't know English very well at that point, "public static void" did indeed sound like a magic spell. It sat behind both an understanding barrier and a language barrier.
When I first saw Java, I had already seen multiple dialects of BASIC, plus Turing (a Pascal dialect), HyperTalk (the scripting language of HyperCard, and predecessor of AppleScript), J (an APL derivative), C and C++. I'm also a native speaker of English.
Your perception is still warranted. It was clear enough to me what all of that meant, but I was well aware that static is an awkward, highly overloaded term, and I already had the sense that all this boilerplate was a negative.
One of the problems is that a lot of bioinformatics formats nowadays have to hold so much data that most text editors stop working properly. For example, FASTA splits DNA data into lines of 50-80 characters for readability. But in FASTQ, where the '@' and '+' record markers collide with the quality scores, as far as I know, the DNA and the quality data are always put onto one line each. Trying to find a location in a 10,000-character line gets very awkward. And I'm sure some people can eyeball Phred scores from ASCII, but I think they are a minority, even among researchers.
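For reference, a FASTQ record is four lines; here's a toy example (real sequence and quality lines can run to thousands of characters):

    @read1
    GATTACAGATTACA
    +
    IIIIIHHHHGGGFF

And since Phred+33 quality strings can themselves begin with '@' or '+', a wrapped quality line would be indistinguishable from a record marker, hence the one-line convention.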
Similarly, NEXUS files are also human-readable, but it'd be tough to discern the shape of an inlined 200-node Newick tree.
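To illustrate, here's a Newick tree with just four leaves; now picture a 200-node version of this on a single line:

    ((A:0.1,B:0.2):0.05,(C:0.3,D:0.4):0.06);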
When I asked people who do actual bioinformatics (well, genomics) what some of their annoyances with bioinformatics software were, having to do a bunch of busywork on files in between pipeline steps (compressing/decompressing, indexing) was one of the complaints mentioned.
I think there's a place in bioinformatics for a unified format which can take care of compression, indexing, and metadata, and with that list of requirements it'd have to be binary. Data analysis moved from CSVs and Excel files to Parquet, and I think there's a similar transition waiting to happen here.
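If it helps make that concrete, here's a rough sketch of what reads-in-Parquet could look like with the Rust arrow/parquet crates (the column layout is my invention, not a proposal for a real spec):

    use std::fs::File;
    use std::sync::Arc;

    use arrow::array::{ArrayRef, StringArray};
    use arrow::datatypes::{DataType, Field, Schema};
    use arrow::record_batch::RecordBatch;
    use parquet::arrow::ArrowWriter;
    use parquet::basic::Compression;
    use parquet::file::properties::WriterProperties;

    fn main() -> Result<(), Box<dyn std::error::Error>> {
        // A made-up columnar layout for sequencing reads.
        let schema = Arc::new(Schema::new(vec![
            Field::new("id", DataType::Utf8, false),
            Field::new("sequence", DataType::Utf8, false),
            Field::new("quality", DataType::Utf8, false),
        ]));
        let batch = RecordBatch::try_new(
            schema.clone(),
            vec![
                Arc::new(StringArray::from(vec!["read1"])) as ArrayRef,
                Arc::new(StringArray::from(vec!["GATTACA"])) as ArrayRef,
                Arc::new(StringArray::from(vec!["IIIIIII"])) as ArrayRef,
            ],
        )?;
        // Compression is a property of the format, not a separate gzip step.
        let props = WriterProperties::builder()
            .set_compression(Compression::SNAPPY)
            .build();
        let file = File::create("reads.parquet")?;
        let mut writer = ArrowWriter::try_new(file, schema, Some(props))?;
        writer.write(&batch)?;
        writer.close()?;
        Ok(())
    }

Indexing is the one item on the list Parquet only partially covers (row-group statistics rather than, say, genomic-coordinate indexes), so a real format would need more than this.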
My hypothesis is that bioinformatics favors text files, because open source tools usually start as research code.
That means two things. First, the initial developers are rarely software engineers, and they have limited experience developing software. They use text files because they are not familiar with the alternatives.
Second, the tools are usually intended to solve research problems. The developers rarely have a good idea of what the tools will eventually end up doing and what data the files will need to store. Text-based formats are a convenient choice, as it's easy to extend and change them. By the time anyone understands the problem well enough to write a useful specification, the existing file format may already be popular, and it's difficult to convince people to switch to a new one.
Yes, most bioinformatics tools are the result of research projects.
However, the most common bioinformatics file formats have actually been devised by excellent software engineers (e.g. SAM/BAM, VCF, BED).
I think it is just very convenient to have text-based formats as you don't need any special libraries to read/modify the files and can reach for basic Unix text-processing tools instead. Such modifications are often needed in a research context.
Also, space-efficient file formats (e.g. CRAM) are often within reach once disk space becomes a pressing issue. Now you only need to convince the team to use them. :)
Totally. A good chunk of the formats are just TSV files with some metadata in the header. Setting aside the drawbacks, this approach is both straightforward and flexible.
I think we're seeing some change in that regard, though. VCF got BCF, and SAM got BAM.
Yeah, I found `anyhow`'s `Context` to be a great way of annotating
bubbled-up errors. The only problem is that using the lazy `with_context`
can get somewhat unwieldy. For all the grief people give Go's
`if err != nil`, Rust's method chaining can get out of hand too. One
particular offender I wrote:
    match operator.propose(py).with_context(|| {
        anyhow!(
            "Operator {} failed while generating a proposal",
            operator.repr(py).unwrap()
        )
    })? {
This is a combination of `rustfmt` giving up on long lines and also not
formatting macro invocations as well as it does function calls.
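One workaround, sketched on the same call, is to hoist the message out and use the eager `context` instead, at the price of formatting the message even on the success path (which is what `with_context` exists to avoid):

    // Hypothetical refactor: build the message first so the chain stays short.
    let what = format!(
        "Operator {} failed while generating a proposal",
        operator.repr(py).unwrap()
    );
    match operator.propose(py).context(what)? {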
Thanks! So I guess the best recourse then is to resize the table? It seems like that should be part of the analysis, even if the probability of it happening is low. I haven't read the paper, though, so no strong opinion here...
(By the way, the text fragment does work somewhat in Firefox. Not on the first load, but it works if you load the page, then focus the URL field and press Enter.)
Yeah, I presume so. At least that's what Swiss Tables do. The paper is focused more on the asymptotics than on real-world hardware performance, so I can see why they chose not to handle such edge cases.
This bothered me too when reading it; the sample implementations I've found so far just bail out. I thought one of the benefits of hash tables was that they don't have a predefined size?
The hash tables a programmer interacts with do generally have a fixed size at any given moment, but they resize on demand. A fixed size is very much part of the open-addressing style of hash table; how else could they even talk about how full a hash table is?
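The distinction is easier to see in code. Here's a hypothetical open-addressing table with linear probing that doubles its slot array when the load factor would pass 7/8 (hashing elided; the key stands in for its own hash):

    // Sketch: the slot array has a fixed size at any instant, which is
    // what makes "how full is the table" (len / slots.len()) meaningful.
    struct Table<V> {
        slots: Vec<Option<(u64, V)>>,
        len: usize,
    }

    impl<V> Table<V> {
        fn new() -> Self {
            Table { slots: (0..8).map(|_| None).collect(), len: 0 }
        }

        fn insert(&mut self, key: u64, value: V) {
            // Resize on demand: grow before the load factor exceeds 7/8.
            if 8 * (self.len + 1) > 7 * self.slots.len() {
                self.grow();
            }
            let mut i = key as usize % self.slots.len();
            // Linear probing: walk until we hit the key or a free slot.
            while let Some((k, _)) = &self.slots[i] {
                if *k == key {
                    break;
                }
                i = (i + 1) % self.slots.len();
            }
            if self.slots[i].is_none() {
                self.len += 1;
            }
            self.slots[i] = Some((key, value));
        }

        fn grow(&mut self) {
            // Double the capacity and re-insert every existing entry.
            let bigger = (0..self.slots.len() * 2).map(|_| None).collect();
            let old = std::mem::replace(&mut self.slots, bigger);
            self.len = 0;
            for (k, v) in old.into_iter().flatten() {
                self.insert(k, v);
            }
        }
    }

Everything inside `insert` assumes a fixed capacity; the resize is bolted on around it, which is how "fixed size" and "grows on demand" coexist.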
A spreadsheet engine. It's a React app with a Rust backend, but it impressed me with how snappy it is [0]. Of course, it's not nearly as feature-rich as Google Sheets, not to mention Excel.
"backend" seemed to imply it was contacting some server, but https://github.com/ironcalc/ironcalc#early-testing claims (and the network tab confirms) it is just Rust compiled to wasm, no "backend" required
MIT or Apache 2 (player's choice) if anyone else has grown deeply suspicious about any "open source" HN headlines of late
Right, I made a mistake! I keep getting surprised by the fact that it's
possible to simply compile a Rust crate with a WASM target and run it in
the browser.
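In case it helps anyone else, the minimal shape of it is roughly this (a sketch with a made-up function; built with `wasm-pack build --target web` and imported from JS as the generated module):

    // Sketch: expose one Rust function to JavaScript via wasm-bindgen.
    use wasm_bindgen::prelude::*;

    #[wasm_bindgen]
    pub fn greet(name: &str) -> String {
        format!("Hello from Rust, {name}!")
    }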
Backend is a general word, not limited to client-server or the web. You can have a rendering backend with various configurable choices, like in Matplotlib (https://matplotlib.org/stable/users/explain/figure/backends....), or the deep learning library Keras has a choice between PyTorch, JAX and TensorFlow backends.
what's the backend of a spreadsheet engine going to be doing? updating the data structures of the spreadsheet.
is it going to be local or remote? that's not part of the question.
is it foreground or background? that's an implementation choice. Apple II, yeah, everything freezes while it recalcs. Windows? recalcs when it can, don't let the mouse freeze.
Yep, I misunderstood; I realized it after seeing mdaniel's comment.
Thanks for making this in the first place! I saw IronCalc in the list
of projects supported by NLnet and it grabbed my attention.
By the way, if you don't mind me asking, how did Tuta end up sponsoring
IronCalc? It seems that lately they and Proton have been trying to
expand their business beyond just email. The fact that Tuta is
interested in IronCalc makes me think they want to have an office-like
offering.
Tuta sponsors by providing us with free email accounts, that's all. I reached out months ago, they liked the project and were kind enough to help us out with the email.
I haven't had talks with them about integrating IronCalc, but it is something that's on my mind.
There are a few projects where I'd love to see a modern spreadsheet
implementation. CryptPad comes to mind. They use OnlyOffice, which is
quite featureful, but it takes a while to load and isn't as responsive.
Contrary to the name, I don't think it's very good. The whole thing is
a canvas rendered via WASM, so scrolling isn't smooth, selection doesn't
work, and accessibility is seemingly non-existent.
But I think the technology itself is interesting. While most modern UI
toolkits use HTML or React-like components, this one uses a set of JSON
documents that describe the page.
I was searching for a Meilisearch alternative (Meilisearch sends out
telemetry by default) and found Tantivy. It's more of a search-engine
builder, but the setup looks pretty simple [0].
Hm, I am interested, but I would love to use it as a Rust lib and just have Rust types instead of some JSON config...
The Java SDK of Meilisearch was also nice; same thing: no need for a CLI or manual configuration. I just pointed it at a DB entity and indexed whole tables...
But instead of this, I would prefer some way to just hand it JSON and have it index all the fields...
For comparison, this is my Meilisearch SDK code:
    // Assumed imports (Meilisearch Java SDK, kotlinx.serialization, Exposed):
    import com.meilisearch.sdk.Client
    import com.meilisearch.sdk.Config
    import kotlinx.serialization.builtins.ListSerializer
    import kotlinx.serialization.json.Json
    import org.jetbrains.exposed.sql.transactions.transaction

    fun createCustomers() {
        val client = Client(Config("http://localhost:7700", "password"))
        val index = client.index("customers")
        // Serialize all Customer rows to a JSON array string.
        val customers = transaction {
            val customers = Customer.all()
            val json = customers.map { CustomerJson.from(it) }
            Json.encodeToString(ListSerializer(CustomerJson.serializer()), json)
        }
        // "id" is the primary-key field Meilisearch will use.
        index.addDocuments(customers, "id")
    }
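For what it's worth, using Tantivy directly as a crate, rather than through a CLI's JSON config, does give you typed Rust; a minimal sketch (the field name and memory budget are made up):

    use tantivy::schema::{Schema, STORED, TEXT};
    use tantivy::{doc, Index};

    fn main() -> tantivy::Result<()> {
        // The schema is ordinary Rust values, no JSON config file.
        let mut builder = Schema::builder();
        let name = builder.add_text_field("name", TEXT | STORED);
        let schema = builder.build();

        let index = Index::create_in_ram(schema);
        let mut writer: tantivy::IndexWriter = index.writer(50_000_000)?;
        writer.add_document(doc!(name => "Ada Lovelace"))?;
        writer.commit()?;
        Ok(())
    }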
OP is entitled to make political choices when selecting software.
Some of us have specific principles of which things like opt-out telemetry might run afoul.
OP will choose their software, I choose mine and you choose yours; none of us need to call each other petty or otherwise cast such negative judgement; a free market is a free market.
Suggesting you should be less judgemental is not white-knighting, nor is it irrational. Sorry bud, but not everyone thinks the way you do; different people have different principles.
Feel free to explain how either of the two comments of yours I've replied to represent principled discussion or added value, because I'm not seeing it.
It's a minor complaint, but I'm also evaluating it for a minor project.
I just don't like the fact that I can forget to add a flag once and, oh,
now I'm sending telemetry on my personal medical documents.
Meilisearch only sends anonymized telemetry events. We only send API endpoint usage; nothing like raw documents goes over the wire. You can look at the exhaustive list of all collected data on our website [1].
Hey PSeitz, Meilisearch CEO here. Sorry to hear that you failed to index a low volume of data. When did you last try Meilisearch? We have made significant improvements to indexing speed. We have a customer with hundreds of gigabytes of raw data on our cloud, and it scales amazingly well. https://x.com/Kerollmops/status/1772575242885484864
Frankly, I'm okay with Meilisearch for instant search because y'all are clear about analytics choices, offer understandable FOSS Rust, and have a non-AGPL license. If/when we make some money, I'm in favor of $upporting and consulting on the tools we use, out of self-interest in keeping them alive.
It's an old project, based on Sphinx [0]. But unlike many other code searches,
this one indexes Codeberg, SourceHut, and a number of other non-GitHub
forges.