You’ll almost certainly have to write your own splitting code for anything nontrivial. LlamaIndex, for example, breaks down hard when there’s a lot of markup in the document. You’ll also want control over the vector search strategy (just using the query or chunk embedding may not be enough).
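
To give a rough idea of what "your own splitting code" can look like for markup-heavy input, here is a minimal sketch using only the Python standard library. It assumes the markup is HTML, and the names `split_markup`, `BLOCK_TAGS`, `SKIP_TAGS` and the `max_chars` default are made up for illustration; nothing here is LlamaIndex's API or a recommended configuration.

```python
from html.parser import HTMLParser

# Hypothetical tag sets: which tags mark a text boundary, and which to drop entirely.
BLOCK_TAGS = {"p", "div", "li", "h1", "h2", "h3", "h4", "h5", "h6", "br", "tr", "section"}
SKIP_TAGS = {"script", "style"}


class _TextExtractor(HTMLParser):
    """Collects visible text, inserting newlines at block-level tag boundaries."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        # Ignore the contents of <script> and <style>.
        if self._skip_depth == 0:
            self.parts.append(data)


def split_markup(html, max_chars=500):
    """Strip tags, then greedily pack whole text blocks into chunks of about max_chars."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    text = "".join(parser.parts)

    # One block per line, with inner whitespace collapsed and empty blocks dropped.
    blocks = [" ".join(line.split()) for line in text.split("\n")]
    blocks = [b for b in blocks if b]

    chunks, current = [], ""
    for block in blocks:
        candidate = f"{current} {block}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = block  # a single oversized block is kept whole, not cut
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    sample = "<div><p>Hello <b>world</b></p><script>var x = 1;</script><p>Second paragraph</p></div>"
    for chunk in split_markup(sample, max_chars=15):
        print(repr(chunk))
```

The point of doing it this way is that chunk boundaries follow the document's block structure instead of landing in the middle of a tag or an inline element. A real pipeline would likely need more than this (tables, nested lists, code blocks, overlap between chunks), which is exactly why it tends to end up as custom code.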