I am a master’s student at Carnegie Mellon University. I work with Professor Bhiksha Raj on speech processing and audio language models, and with Professor Chris Donahue on text-to-audio models. My research interests are speech and audio processing and LLMs.
Previously, I worked with Dr. Satrajit Ghosh at MIT on the explainability of self-supervised learning (SSL) embeddings, such as WavLM, for speech emotion recognition. I have also worked at EPFL on room acoustics simulation. I completed my undergraduate degree in Electrical Engineering at IIT Delhi, where I concentrated on signal processing and ML.
Education
- M.S., Computer Engineering
- Carnegie Mellon University
- August 2023 - December 2024 (Expected)
- B. Tech., Electrical Engineering
- Indian Institute of Technology, Delhi
- August 2019 - August 2023
Publications and Preprints
MACE: Leveraging Audio for Evaluating Audio Captioning Systems
Satvik Dixit, Soham Deshmukh, Bhiksha Raj
- Under review at ICASSP 2025 Speech and Audio Language Models (SALMA) Workshop
- Paper
- Code
Vision Language Models Are Few-Shot Audio Spectrogram Classifiers
Satvik Dixit, Laurie Heller, Chris Donahue
- Accepted at NeurIPS Audio Imagination Workshop
- Paper
Improving Speaker Representations Using Contrastive Losses on Multi-scale Features
Satvik Dixit, Massa Baali, Rita Singh, Bhiksha Raj
- Under review at ICASSP 2025 (Main)
- Paper
- Code
Explaining Deep Learning Embeddings for Speech Emotion Recognition by Predicting Interpretable Acoustic Features
Satvik Dixit, Daniel M. Low, Gasser Elbanna, Fabio Catania, Satrajit S. Ghosh
- Under review at ICASSP 2025 (Main)
- Paper
- Code