Leverage Gemini 2.5 Pro Preview's native audio capabilities for STT and TTS

* When calling the real-time API, the separate STT and TTS stages can be skipped, because the model itself takes audio in and produces audio out. The open question in that case is how to obtain a text transcript of the user's speech for vector indexing
* The newest Gemini API audio output sounds like it can produce Journey-like speech
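
On the transcript question in the first bullet, one possible approach is sketched below. It assumes the real-time session can be set up to emit text transcript fragments alongside the audio; that has not been verified here. The `TranscriptEvent` shape, its field names, and both helper functions are hypothetical and do not mirror any real SDK's message format. The sketch only shows how streamed fragments could be joined into one string per user turn before embedding and indexing.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class TranscriptEvent:
    """Hypothetical event shape; a real API's message fields will differ."""
    role: str                    # "user" or "model"
    text: str                    # partial transcript fragment
    turn_complete: bool = False  # True on the last fragment of a turn


@dataclass
class Utterance:
    role: str
    text: str


def collect_utterances(events: Iterable[TranscriptEvent]) -> List[Utterance]:
    """Join streamed transcript fragments into one utterance per speaker turn."""
    utterances: List[Utterance] = []
    buffer: List[str] = []
    current_role: Optional[str] = None

    def flush() -> None:
        # Emit whatever has been buffered for the current speaker, if anything.
        if buffer and current_role is not None:
            utterances.append(Utterance(current_role, "".join(buffer).strip()))
        buffer.clear()

    for event in events:
        # A change of speaker closes the previous utterance even if no
        # explicit turn_complete flag was seen (e.g. an interruption).
        if current_role is not None and event.role != current_role:
            flush()
        current_role = event.role
        buffer.append(event.text)
        if event.turn_complete:
            flush()
            current_role = None

    flush()  # keep a trailing partial turn rather than dropping it
    return utterances


def user_texts_for_indexing(events: Iterable[TranscriptEvent]) -> List[str]:
    """Return only the user's utterances, ready to embed and index."""
    return [u.text for u in collect_utterances(events) if u.role == "user" and u.text]


if __name__ == "__main__":
    demo = [
        TranscriptEvent("user", "What is "),
        TranscriptEvent("user", "the weather?", turn_complete=True),
        TranscriptEvent("model", "It is "),
        TranscriptEvent("model", "sunny."),
        TranscriptEvent("user", " Thanks", turn_complete=True),
    ]
    print(user_texts_for_indexing(demo))
```

If the real-time API turns out not to expose transcripts, the same `user_texts_for_indexing` step could be fed from a separate STT pass over the recorded user audio, at the cost of reintroducing that stage for indexing only.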