Yunyi-Liu (Rein)

I’m currently a research intern at Dolby Laboratories in Beijing, working on text to object-based spatial audio generation with multi-modal large language models (MMLLMs). I recently submitted my PhD thesis in Electrical and Information Engineering at the University of Sydney, titled ‘Towards controllable neural audio synthesis: An exploration of sound effects creation with generative models’. My research focuses on creating controllable and interactive sound generation tools that empower creativity. To this end, I proposed several methods that use pre-trained large audio representation models (CLAP, PANNs, VGGish, etc.) to distill information from limited audio datasets, and then use that information to guide and control audio synthesis. I typically work with generative models (Diffusion, Transformer, VAE, GAN) and differentiable digital signal processing (DDSP).
My research interests are:
Machine Learning, Deep Learning, Generative AI, Controllable Generative Models, Audio Signal Processing, Time Series Prediction, Data Analysis, Sound Synthesis and Sound Design, Multi-Modal Large Audio-Language Models (MLLM), Human-Computer Interaction.
I come from a highly diverse background. Prior to my PhD, I received my B.A. in Music and Sound Design from the University of Technology Sydney and my M.A. in Interaction Design and Electronic Arts from the University of Sydney. You can access my portfolio from before 2021, when I was a media artist and sound designer. In 2023, I was a research intern at the Dolby Laboratories Advanced Technology Group (ATG), working on DDSP for general audio synthesis and control.
Apart from work, I play and teach the Theremin, one of the earliest electronic instruments. You can find my performances here.
Skill sets:
- Machine learning, deep learning, generative AI, large language models (LLMs), multi-modal data synthesis, time series data analysis and synthesis (Python, PyTorch, TensorFlow, Keras, Linux, AWS)
- Audio signal processing, digital signal processing, spatial audio (MATLAB, Python, Librosa, Essentia, FFmpeg)
- Creative audio programming (Max/MSP, C++, ChucK, Faust)
- Sound design for film and games (Ableton, Pro Tools, Wwise, FMOD, Unity)
- Web development and visual programming (JavaScript, HTML, p5.js, Processing)