Perhaps look into dlt (https://dlthub.com), combined with pyarrow or polars. It handles large datasets well, especially when you use generators to process the data in chunks.
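To make the "generators in chunks" part concrete, here is a minimal standard-library sketch of the pattern. It does not call dlt itself; the helper name `read_in_chunks`, the chunk size, and the sample CSV are all made up for illustration.

```python
import csv
import io
from itertools import islice
from typing import Iterable, Iterator


def read_in_chunks(rows: Iterable[dict], chunk_size: int) -> Iterator[list[dict]]:
    """Yield lists of at most chunk_size rows, holding one chunk in memory at a time."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    it = iter(rows)
    while True:
        # islice pulls only the next chunk_size rows from the underlying iterator.
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield chunk


# Hypothetical sample data standing in for a large file on disk.
sample = io.StringIO("id,name\n1,a\n2,b\n3,c\n4,d\n5,e\n")

chunks = list(read_in_chunks(csv.DictReader(sample), chunk_size=2))
chunk_sizes = [len(chunk) for chunk in chunks]
```

As far as I know, dlt can take a generator like this as the data for a pipeline run, so each chunk gets loaded without reading the whole file first. Check the dlt docs for the exact resource and pipeline calls before relying on that.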
It was not uncommon to draw a line through the zero to distinguish it from the letter ‘O’. Similarly, a slash was often added to the letter ‘Z’ to prevent confusion with the number 2.
Given that many data engineers have a data science, data analytics, BI, or software engineering background, I'm curious if you've noticed any trends in their approach to data security?
Yes. Generally it can be summarized as "What data security?".
Snark aside, there's usually a reflexive assumption that more data is always better and that anything that gets in the way of more data is bad. Anything that limits how data is analyzed is bad. Anything that limits or restricts their choice of tooling or where they use it is rejected.
Data scientists and engineers are people who are often working with a company's crown jewels. They are trusted with data representing the private lives of hundreds of millions of people, assembled in a data warehouse. I want them to have a care. To treat this with due gravitas. All too often, all they seem to see is a neat data set to feed into R on their MacBook.
Gemini keeps disappointing me. It keeps making up code that is not accurate. I asked it some questions about a Python library and the answers were wrong. I even instructed it to refer to the docs, but it still failed and made up methods that don't exist.
I also asked Gemini about git and that didn't go well either.
It disappoints me because once I installed it, setting a reminder by saying "ok, Google, set a reminder" no longer worked. That wasn't available in the fancy new AI. Somehow they overlooked one of the most basic functions that people use with that keyword.
I used it for a few days, and it would spin up the fans on my MacBook Pro after a few minutes. I also wasn’t pleased with the tab management. And then I forgot about it.