Of course, whether a particular speech synthesis system supports such features is another thing.
It also has a `contour` attribute for specifying the pitch contour: https://www.w3.org/TR/speech-synthesis11/#pitch_contour
Coincidentally enough, I pretty much only know any of this because a few years ago I created a GUI for a client that let an assistive technology researcher "draw" the pitch contour required for a word or phrase (not unlike the project demonstrated in your video :) ), from which the SSML was then generated.
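
For anyone curious what "generating the SSML" might look like, here is a rough sketch (not the code from that project). It assumes the drawn contour has already been reduced to a list of (position in percent, pitch change in Hz) points. The function name `contour_to_ssml` and that input format are made up for illustration; the output uses the `contour` attribute of the `prosody` element from the spec linked above.

```python
from xml.sax.saxutils import escape, quoteattr


def contour_to_ssml(text, points):
    """Wrap `text` in an SSML <prosody> element with a pitch contour.

    `points` is a hypothetical input format: a list of
    (position_percent, pitch_change_hz) tuples, e.g. sampled from a drawn curve.
    """
    # SSML expects space-separated "(position%,target)" pairs in time order.
    pairs = " ".join(f"({pos:g}%,{hz:+g}Hz)" for pos, hz in sorted(points))
    # quoteattr adds the surrounding quotes; escape protects the text content.
    return f"<prosody contour={quoteattr(pairs)}>{escape(text)}</prosody>"


if __name__ == "__main__":
    # Rising then falling pitch across the word.
    print(contour_to_ssml("hello", [(0, 20), (50, 40), (100, -10)]))
```

Whether the synthesizer actually honours the contour is, as noted, up to the particular engine.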