Publications
2025
- Aaron Broukhim, Yiran Shen, Prithviraj Ammanabrolu, and 1 more author. arXiv preprint, Nov 2025
Despite the parallel challenges that audio and text domains face in evaluating generative model outputs, preference learning remains remarkably underexplored in audio applications. Through a PRISMA-guided systematic review of approximately 500 papers, we find that only 30 (6%) apply preference learning to audio tasks. Our analysis reveals a field in transition: pre-2021 works focused on emotion recognition using traditional ranking methods (rankSVM), while post-2021 studies have pivoted toward generation tasks employing modern RLHF frameworks. We identify three critical patterns: (1) the emergence of multi-dimensional evaluation strategies combining synthetic, automated, and human preferences; (2) inconsistent alignment between traditional metrics (WER, PESQ) and human judgments across different contexts; and (3) convergence on multi-stage training pipelines that combine reward signals. Our findings suggest that while preference learning shows promise for audio, particularly in capturing subjective qualities like naturalness and musicality, the field requires standardized benchmarks, higher-quality datasets, and systematic investigation of how temporal factors unique to audio impact preference learning frameworks.
- Robert Kaufman, Aaron Broukhim, and Michael Haupt. Proc. ACM Hum.-Comput. Interact., Oct 2025
Social media platforms enhance the propagation of online misinformation by providing large user bases with a quick means to share content. One way to disrupt the rapid dissemination of misinformation at scale is through warning tags, which label content as potentially false or misleading. However, past warning tag mitigation studies yield mixed results for diverse audiences. We hypothesize that personalizing warning tags to the individual characteristics of their diverse users may enhance mitigation effectiveness. To reach the goal of personalization, we need to understand how people differ and how those differences predict a person’s attitudes and behaviors toward tags and tagged content. In this study, we leverage Amazon Mechanical Turk (n = 132) and undergraduate students (n = 112) to provide this foundational understanding. With all participants combined, we find attitudes towards warning tags and self-described behaviors are significantly influenced by factors such as Need for Cognitive Closure (NFCC), Political orientation, and Trust in Medical Scientists when controlling for covariates such as age and recruiting platform. Analyses of each sample further show that tag attitudes were influenced by Trust in Religious Leaders and by Big Five Inventory (BFI) traits for Openness and Conscientiousness. We synthesize these results into design insights and a future research agenda for more effective and personalized warning tags and misinformation mitigation strategies more generally.
- Robert A Kaufman, Aaron Broukhim, David Kirsh, and 1 more author. In Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems, May 2025
Explanations for autonomous vehicle (AV) decisions may build trust; however, explanations can contain errors. In a simulated driving study (n = 232), we tested how AV explanation errors, driving context characteristics (perceived harm and driving difficulty), and personal traits (prior trust and expertise) affected a passenger’s comfort in relying on an AV, preference for control, confidence in the AV’s ability, and explanation satisfaction. Errors negatively affected all outcomes. Surprisingly, despite identical driving, explanation errors reduced ratings of the AV’s driving ability. Severity and potential harm amplified the negative impact of errors. Contextual harm and driving difficulty directly impacted outcome ratings and influenced the relationship between errors and outcomes. Prior trust and expertise were positively associated with outcome ratings. Results emphasize the need for accurate, contextually adaptive, and personalized AV explanations to foster trust, reliance, satisfaction, and confidence. We conclude with design, research, and deployment recommendations for trustworthy AV explanation systems.
2022
- Esin Darici Haritaoglu, Nicholas Rasmussen, Daniel CH Tan, and 20 more authors. arXiv preprint, Jan 2022
The COVID-19 pandemic has been one of the most devastating events in recent history, claiming the lives of more than 5 million people worldwide. Even with the worldwide distribution of vaccines, there is an apparent need for affordable, reliable, and accessible screening techniques to serve parts of the world that do not have access to Western medicine. Artificial Intelligence can provide a solution utilizing cough sounds as a primary screening mode for COVID-19 diagnosis. This paper presents multiple models that have achieved relatively respectable performance on the largest evaluation dataset currently presented in academic literature. Through investigation of a self-supervised learning model (Area under the ROC curve, AUC = 0.807) and a convolutional neural network (CNN) model (AUC = 0.802), we observe the possibility of model bias with limited datasets. Moreover, we observe that performance increases with training data size, showing the need for the worldwide collection of data to help combat the COVID-19 pandemic with non-traditional means.