The MIT School of Science and the MIT Libraries presented the inaugural MIT Prize for Open Data in 2022. The following winners and honorable mentions were selected from more than 70 nominees representing all five schools and several research centers across MIT.
Recipients were honored at the “Open Data @ MIT” event on October 28, 2022, in Hayden Library, featuring remarks from School of Science Dean Nergis Mavalvala and MIT Libraries Director Chris Bourg, award presentations, and short talks by the winners. Read more in MIT News.
Winners
- Yunsie Chung
Graduate student, Department of Chemical Engineering
SolProp, the largest open-source dataset with temperature-dependent solubility values of organic compounds
- Matthew Groh, graduate student, MIT Media Lab; Caleb Harris, MEng, MIT Media Lab; Luis Soenksen, postdoc, MIT; Felix Lau, research engineer, Scale AI; Rachel Han, software engineer, Scale AI; Aerin Kim, manager, Scale AI; Arash Koochek, dermatologist, Banner Health; Omar Badri, dermatologist, Northeast Dermatology Associate
Fitzpatrick 17k dataset, an open dataset consisting of 16,577 images of skin disease alongside skin disease and skin tone annotations
- Tom Pollard, research scientist; Benjamin Moody, programmer analyst; Li-Wei Lehman, research scientist; Brian Gow, technical associate; Chen Xie, engineer; Jesse Raffa, research scientist; Dana Moukheiber, technical associate; Lama Moukheiber; Ken Paik, research scientist; Leo Celi, principal research scientist; Alistair Johnson, research scientist; Roger Mark, Distinguished Professor of Health Sciences and Technology; Laboratory for Computational Physiology, Institute for Medical Engineering & Science
PhysioNet, a data-sharing platform that enables thousands of clinical and machine-learning research studies each year and allows researchers to share sensitive resources that could not be shared through typical data-sharing platforms
- Joseph Replogle
Graduate student, Whitehead Institute
Genome-wide Perturb-seq dataset, the largest publicly available, single-cell transcriptional dataset collected to date
- Pedro Reynolds-Cuéllar, graduate student, MIT Media Lab/ACT; Diana Duarte, co-founder at Diversa; the Diversa team; and the Retos network of community partners and universities
Retos, an open-data platform for detailed documentation and sharing of local innovations from under-resourced settings, which also helps match hundreds of university students with challenges from rural collectives
- Maanas Sharma
Undergraduate student
States of Emergency, a nationwide project analyzing and grading the responses of prison systems to COVID-19 using data scraped from public databases and manually collected data
- Djuna von Maydell
Graduate student, Department of Brain and Cognitive Sciences
First publicly available dataset of single-cell gene expression from post-mortem human brain tissue of patients who are carriers of APOE4, the major Alzheimer’s disease risk gene
- Raechel Walker, graduate researcher; Olivia Dias, undergraduate researcher; Zeynep Yalcin, undergraduate researcher; Lina Henriquez, undergraduate researcher; Sophia Brady, undergraduate researcher; Matt Taylor, senior research team; Cynthia Breazeal, director, Personal Robots group, MIT Media Lab
Data Activism Curriculum for high school students through the Mayor’s Summer Youth Employment Program (Cambridge); activities involved using data science and open data to challenge power inequalities, such as racism, and students learned how to use data science to recognize, mitigate, and advocate for people who are disproportionately impacted by systemic inequality
- Suyeol Yun
Graduate student, Department of Political Science
DeepWTO, a project creating open data for use in legal natural language processing (NLP) research using cases from the World Trade Organization (WTO)
- Jonathan Zheng
Graduate student, Department of Chemical Engineering
An open IUPAC dataset for acid dissociation constants, or “pKas,” physicochemical properties that govern how acidic a chemical is in a solution, transformed into FAIR (findable, accessible, interoperable and reusable) data from verified data locked in print
Honorable Mentions
- Awad Abdelhalim and Ilham Ali
KhartouMap Initiative Project: Mapping the semi-formal bus system of Khartoum, generating public maps, data for research, and GTFS-compliant feeds.
- Alvina Adimoelja and Advait Athreya
Policy position paper recommending the establishment of detailed federal open science guidelines
- Simon Axelrod
Open-source Geometric Ensemble of Molecules dataset (GEOM)
- Junyi Chu, Gal Raz, Sabrina Piccolo, Catherine Mei, Peter Hart, Peng Cao, Shari Liu, Melissa Kline Struhl, Katherine Fairchild, Joshua Tenenbaum
iCatcher+: Robust and automated annotation of infant gaze from videos collected in laboratory, field, and online studies
- Neil S. Gaikwad (@neilsgaikwad)
Data-driven Humanitarian Mapping and Policymaking: a Global Initiative in Data Science Research for Planetary-Scale Resilience, Equity, and Sustainability, a collaboration across academia, industry, governments, and communities (@HumanitarianSys; ACM KDD paper)
- Rikab Gambhir, in collaboration with Benjamin Nachman and Jesse Thaler
Using open data to improve the calibration of particle physics measurements with machine learning
- Carmelo Ignaccolo
Living Heritage Atlas | Beirut: Mapping and activating Beirut’s crafts
Joint work with Daniella Maamari, Ashley Louie, Sarah Williams, Azra Aksamija
- Benjamin Lahner
Algonauts Action Video (AAV) dataset
- Barrett M. Powell
Mining cryo-electron tomography open data resource with tomoDRGN
- Suhas Eswarappa Prameela
Magnesium Database Project (MDP)
- Julian Rippy
A Mixed Methods Approach to Force Estimation in Military Operations Other Than War
- Martin Schrimpf, Tiago Marques, and Michael Ferguson
Brain-Score.org
- Rahul Singh
Causal Inference with Corrupted Data: Measurement Error, Missing Values, Discretization, and Differential Privacy, a framework for policy analysis using privacy-protected open data. Joint work with Anish Agarwal (Amazon)
- Dylan Walsh, Nathan Rebello, Wizhong Zou, Jiale Shi, Bruno Salomão, Ardiana Osmani, Reid Mello, Mike Deagen, Navid Harari, Berenger Dalle-Cort, Carlos Villa, Klavs Jensen, Tzyy-Shyang Lin, Brad Olsen
Community Resource for Innovation in Polymer Technology (CRIPT)
- Victory Yinka-Banjo, Tessa Bertozzi, Yi Hua Chen
Curation of a list of genes that are recalcitrant to the effect of CRISPRi knockdown
- Jonathan Zong and J. Nathan Matias
Bartleby
Committee
Committee Co-Chairs
- Chris Bourg, Director, MIT Libraries
- Rebecca Saxe, Associate Dean of Science, School of Science (SoS)
Committee Members
- Michael Bishop, School of Science Events Planner
- Iain Cheeseman, Herman and Margaret Sokol Professor of Biology, SoS and Whitehead
- Fotini Christia, Ford International Professor in the Social Sciences, School of Humanities, Arts, and Social Sciences (SHASS) and Institute for Data, Systems, and Society (IDSS)
- Katharine Dunn, Scholarly Communications Librarian, MIT Libraries
- Satrajit Ghosh, McGovern Institute, SoS, and Director of Data Models and Integration, ReproNim
- Nick Lindsay, Director of Journals and Open Access, MIT Press
- Amy Nurnberger, Program Head, Data Management Services, MIT Libraries
- Jack Payette, graduate student, Earth and Planetary Sciences, SoS
- Dave Rand, Erwin H. Schell Professor and Professor of Management Science and Brain and Cognitive Sciences, Sloan School of Management
- Devavrat Shah, Andrew (1956) and Erna Viterbi Professor of EECS, School of Engineering and IDSS
- Virginia Spanoudaki, Scientific Director, Preclinical Imaging and Testing Facility, Koch Institute, SoS
- Greg Wagner, Research Scientist, Earth and Planetary Sciences, SoS