That's a full Debevec-style light stage, but you can do this kind of thing with a couple of DSLRs and a home lighting setup with polarised light. It's fairly standard photogrammetry plus high-quality texture and normal map generation.
You can see from the normal map in the video that it's pretty detailed, but at least for a single face capture you can do this at home. I'm not sure what secret sauce they have for capturing multiple facial expressions, or what ML magic they use to morph/animate between them.
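To give a rough idea of why the polarised light matters: if you shoot the same pose twice, once with the lens polariser parallel to the light's polariser and once crossed, the crossed shot blocks most of the surface shine, so the difference between the two is roughly the specular layer. Here's a minimal sketch of that idea, assuming two already-aligned greyscale images stored as rows of floats in [0, 1]. The function name and the data layout are my own for illustration, not anything from the video, and a real pipeline would use proper image arrays and calibration.

```python
def separate_polarised(parallel, cross):
    """Estimate diffuse and specular layers from a polarised image pair.

    Assumption (idealised): the cross-polarised shot contains only diffuse
    light, and the parallel-polarised shot contains diffuse plus specular.
    Both inputs are aligned greyscale images as rows of floats in [0, 1].
    """
    diffuse = []
    specular = []
    for row_p, row_c in zip(parallel, cross):
        # Diffuse estimate is simply the cross-polarised image.
        diffuse.append(list(row_c))
        # Specular estimate is the difference, clamped so noise
        # cannot produce negative intensities.
        specular.append([max(p - c, 0.0) for p, c in zip(row_p, row_c)])
    return diffuse, specular


# Tiny 1x3 example: a highlight, a matte pixel, and a noisy pixel.
diffuse, specular = separate_polarised(
    parallel=[[0.75, 0.5, 0.25]],
    cross=[[0.25, 0.5, 0.5]],
)
```

The diffuse layer is what you'd use as the clean texture, and the specular layer is useful for recovering fine surface detail.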
I don't see why an iPhone 15 Pro wouldn't be able to capture these scans, especially with the new "spatial video" feature, which takes a "3d" video using multiple lenses.
You'll get decent results, but they won't be as good as with DSLRs and polarised studio light if you want super detailed textures, the ability to relight them, etc.
They should open up Codec Avatar creation. Lots of people have the hardware and the time/expertise to create them.
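On the relighting point, this is why the clean texture and normal map matter: once you have them, you can shade the face under any light direction you like. A minimal sketch, assuming simple Lambertian (matte) shading and the usual RGB encoding of tangent-space normals where each channel maps [0, 1] to [-1, 1]. The function names are made up for illustration, and real skin rendering is a lot more involved than this.

```python
import math


def decode_normal(rgb):
    """Convert an RGB normal-map texel in [0, 1] to a unit vector."""
    x, y, z = (2.0 * c - 1.0 for c in rgb)
    length = math.sqrt(x * x + y * y + z * z)
    return (x / length, y / length, z / length)


def relight_pixel(albedo, normal_rgb, light_dir):
    """Lambertian shading: albedo * max(N . L, 0).

    albedo is the diffuse texture value, normal_rgb is the normal-map
    texel, and light_dir points from the surface towards the light.
    """
    n = decode_normal(normal_rgb)
    length = math.sqrt(sum(c * c for c in light_dir))
    l = tuple(c / length for c in light_dir)
    n_dot_l = sum(a * b for a, b in zip(n, l))
    # Surfaces facing away from the light receive nothing.
    return albedo * max(n_dot_l, 0.0)


# A texel facing straight out of the surface, lit head-on and from behind.
front = relight_pixel(0.5, (0.5, 0.5, 1.0), (0.0, 0.0, 1.0))
back = relight_pixel(0.5, (0.5, 0.5, 1.0), (0.0, 0.0, -1.0))
```

If the texture still has highlights baked in, which is what you tend to get without polarisers, they stay stuck in place when you move the light, and that's the main reason phone captures relight worse.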