Abstract: Recent advances in machine learning and large language models have produced extraordinary results. For example, AlphaFold has solved a grand challenge in biology that had been open for over fifty years. However, a closer look shows that AlphaFold was possible only because high-quality, well-curated data already existed, specifically the Protein Data Bank. Realizations like this have led to the emergence of data-centric AI, an approach that focuses on creating high-quality data rather than on creating new models. In this talk, I will describe our efforts to create high-quality FAIR data for reproducible machine learning applications. I will also present examples from the FaceBase data repository and from smaller-scale collaborations in glaucoma detection.
Speaker Bio:
Carl Kesselman is the William M. Keck Professor of Engineering in the USC Viterbi School of Engineering. He is a professor in the Daniel J. Epstein Department of Industrial and Systems Engineering and holds positions in the Department of Computer Science, the Department of Population and Public Health Sciences in the Keck School of Medicine, and Biomedical Sciences in the Ostrow School of Dentistry. Dr. Kesselman is a USC Information Sciences Institute Fellow, where he directs the Informatics Systems Research Division, and he is the Director of the Center of Excellence for Discovery Informatics in the Michelson Center for Convergent Biosciences. He has been the PI on collaboration, data management, and analysis infrastructure for numerous large-scale NIH-funded initiatives in areas such as craniofacial development, kidney reconstruction, synaptic mapping, and genitourinary tract development.
Dr. Kesselman has received numerous honors for his pioneering research, including the Lovelace Medal from the British Computer Society, the Goode Memorial Award from the IEEE Computer Society, and the IEEE Internet Award. He is a Fellow of the British Computer Society, the IEEE, and the Association for Computing Machinery (ACM).