Diarization (recognizing who is speaking at each point in the recording) might be my next step.
Combining information from multiple sources, as you say, will give you a more complete view. For example, location history and the time of the recording could tell you whether you were speaking with a colleague or with your spouse.
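As a rough sketch of the location-plus-time idea, you could look up where you were when a recording started. This assumes the location history has been exported as a time-sorted list of `(timestamp, place_label)` fixes. The function `location_at` and the labels are hypothetical, and inferring who you were talking to from the place ("office" suggests a colleague, "home" suggests your spouse) is only a heuristic.

```python
from bisect import bisect_right
from datetime import datetime
from typing import List, Optional, Tuple


def location_at(history: List[Tuple[datetime, str]], when: datetime) -> Optional[str]:
    """Return the place label of the most recent fix at or before `when`.

    Assumes `history` is a hypothetical export of (timestamp, place_label)
    pairs sorted by timestamp. Returns None if no fix precedes `when`.
    """
    times = [t for t, _ in history]
    i = bisect_right(times, when)  # number of fixes at or before `when`
    if i == 0:
        return None  # no location recorded before this time
    return history[i - 1][1]


if __name__ == "__main__":
    # Hypothetical location history for one day.
    history = [
        (datetime(2024, 5, 6, 8, 30), "home"),
        (datetime(2024, 5, 6, 9, 15), "office"),
        (datetime(2024, 5, 6, 18, 0), "home"),
    ]
    recording_start = datetime(2024, 5, 6, 10, 0)
    print(location_at(history, recording_start))  # office
```

A binary search over the sorted timestamps keeps each lookup cheap even for a long history. In practice you might also check how old the matching fix is before trusting it.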