Data Dropping: Media
Imagine dropping less than 1% of the data in your dataset and seeing the conclusions of your analysis change. It would be nice to have a fast way to check whether such a small, conclusion-flipping subset exists! Recent work has proposed approximations for performing this check. We show that on realistic data (yep, even basic linear regression!) these approximations can break down, and we suggest ways forward for users and developers.
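To make the idea concrete, here is a minimal sketch of this kind of check for ordinary least squares. Everything in it (the simulated heavy-tailed data, the particular one-step approximation that omits the leverage correction) is an illustrative assumption for this post, not the method or results from our paper:

```python
import numpy as np

# Toy setup: linear regression with heavy-tailed noise, so that a handful of
# points carry outsized influence on the fitted slope.
rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
y = 0.1 * x + rng.standard_t(df=2, size=n)

X = np.column_stack([np.ones(n), x])  # design matrix with intercept
beta_full, *_ = np.linalg.lstsq(X, y, rcond=None)

# One-step approximation: dropping point i shifts the coefficients by roughly
# -(X^T X)^{-1} x_i r_i. (The exact leave-one-out formula divides by 1 - h_i,
# where h_i is the leverage; this simple approximation omits that factor.)
XtX_inv = np.linalg.inv(X.T @ X)
resid = y - X @ beta_full
per_point = (X @ XtX_inv) * resid[:, None]  # n x 2 per-point contributions

# Drop the 1% of points whose removal the approximation predicts will most
# decrease the slope, then refit exactly to see what actually happens.
k = max(1, n // 100)
drop = np.argsort(per_point[:, 1])[-k:]   # largest positive contributions
keep = np.setdiff1d(np.arange(n), drop)
beta_drop, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)

approx_slope = beta_full[1] - per_point[drop, 1].sum()
print(f"full-data slope:    {beta_full[1]:.3f}")
print(f"approx. after drop: {approx_slope:.3f}")
print(f"exact refit slope:  {beta_drop[1]:.3f}")
```

Comparing the approximate post-drop slope to the exact refit is exactly the kind of sanity check a user can run; the interesting failures are the cases where the two disagree.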
Venues where our work has appeared:
- International Conference on Continuous Optimization, Robustness Session. USC, Los Angeles, California, July 2025. (upcoming!) [invited talk]
- Second Workshop on Navigating and Addressing Data Problems for Foundation Models @ ICLR. Singapore, April 2025. [poster]
- Second Workshop on Attributing Model Behavior at Scale @ NeurIPS. Vancouver, Canada, December 2024. [poster]
- MIT Robustness and Influence Functions Workshop. Cambridge, Massachusetts, August 2024. [invited talk]
- Women in Machine Learning (WiML) Symposium @ ICML. Vienna, Austria, July 2024. [poster]