One thing to know about data science: there's (1) the job posting and (2) the actual work. Often you'll be hired for one thing and end up doing entirely different work.
To improve your chops in this field: (a) learn the basics of NLP, and (b) build yourself a RAG pipeline using LlamaIndex or LangChain (or another framework, but build it).
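To see what those frameworks are actually doing for you, here's a toy sketch of the RAG loop in plain Python: chunk a corpus, retrieve the most relevant chunks, and assemble an augmented prompt. The term-overlap scoring is a crude stand-in for real embedding similarity, and the corpus is made up for illustration.

```python
import re

def tokens(text):
    # crude tokenizer; dropping short words acts as a cheap stopword filter
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if len(w) > 3}

def retrieve(query, corpus, k=2):
    """Rank chunks by term overlap with the query -- a toy stand-in for
    vector similarity search over embeddings."""
    q = tokens(query)
    return sorted(corpus, key=lambda c: len(q & tokens(c)), reverse=True)[:k]

def build_prompt(query, corpus):
    """The 'augment' step: prepend retrieved context to the question
    before handing it to the LLM for generation."""
    context = "\n".join(retrieve(query, corpus))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

corpus = [
    "Our refund policy allows returns within 30 days.",
    "The office is closed on public holidays.",
    "Support tickets are answered within 24 hours.",
]
print(build_prompt("What is the refund policy?", corpus))
```

Swap in an embedding model and a vector store and you have the skeleton of what LlamaIndex and LangChain package up for you.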
As for fine-tuning and deep learning: without experience in the field, it's tough. Like any complex skill (deep learning more so than fine-tuning), knowledge comes with time and exposure. So go find a reason to fine-tune a model, or build and train a neural network, and go friggin do it.
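If you want a concrete starting point for the "build and train a neural network" part, here's a minimal from-scratch network in numpy that learns XOR with one hidden layer. The architecture and hyperparameters (8 hidden units, learning rate 0.5, 5000 steps) are arbitrary choices for the exercise, not recommendations.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)  # XOR targets

W1 = rng.normal(size=(2, 8)); b1 = np.zeros(8)
W2 = rng.normal(size=(8, 1)); b2 = np.zeros(1)
sigmoid = lambda z: 1 / (1 + np.exp(-z))
lr = 0.5

for step in range(5000):
    # forward pass: tanh hidden layer, sigmoid output
    h = np.tanh(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # backward pass: gradients of squared error through both layers
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * (1 - h ** 2)
    # gradient descent update
    W2 -= lr * h.T @ d_out; b2 -= lr * d_out.sum(0)
    W1 -= lr * X.T @ d_h;   b1 -= lr * d_h.sum(0)

mse = float(((out - y) ** 2).mean())
print("final MSE:", mse)
```

Writing the backward pass by hand once, before switching to PyTorch or JAX, is exactly the kind of exposure that makes the rest of deep learning less mysterious.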
Private data, especially in the enterprise, cannot be sent to public LLMs like GPT-4 (or 5, or N). Use cases requiring data privacy have to use an internally hosted LLM application. Currently, RAG is a concrete and pragmatic enterprise use of LLMs (aside from summarization), and it is not amenable to using GPT-4.
GPT-5 may very well be amazing. But unless it runs on-prem, it can't be used in many scenarios because of data privacy.
To the OP: learning how to run LLMs locally via, say, Ollama (see ollama.ai) will get you started in a hands-on manner. See the /r/LocalLLaMA subreddit for a very active community around running LLMs locally.
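Once `ollama serve` is running, you can talk to it over its local REST API (default port 11434) with nothing but the standard library. A minimal sketch, assuming a model such as "llama3" has already been fetched with `ollama pull` (substitute whatever model you have):

```python
import json
import urllib.request

def build_payload(prompt, model="llama3"):
    """Request body for Ollama's /api/generate endpoint.
    stream=False asks for one complete JSON response instead of chunks."""
    return {"model": model, "prompt": prompt, "stream": False}

def ollama_generate(prompt, model="llama3", host="http://localhost:11434"):
    """Send a prompt to a locally running Ollama server and return the
    generated text. Requires `ollama serve` to be up on the host."""
    data = json.dumps(build_payload(prompt, model)).encode()
    req = urllib.request.Request(
        f"{host}/api/generate",
        data=data,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Call `ollama_generate("Summarize this contract clause: ...")` and the data never leaves your machine, which is the whole point for the privacy-constrained scenarios above.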
I think you're missing my point. RAG is something you currently implement yourself, or pay someone who already has. With GPT-5 (or GPT-6 at the latest), you just give it the same access you would give a RAG system and describe what you want it to do. It will do the rest.
Edit: I’m assuming the scenario where you do want to use the best model.