Chloe Hsu

I'm currently a final-year PhD student at the University of California, Berkeley, advised by Jennifer Listgarten and Moritz Hardt.

My research contributes to the machine learning foundations for the design, engineering, and interpretation of proteins. This includes new methods for structure-based protein design and for ranking sequence variants in protein engineering campaigns.

Most recently, I have been focusing on immune repertoires. I believe that machine learning advances in structural biology and in protein sequence modeling will lead to progress in immunology, such as a better understanding of autoimmunity and immunogenicity.

Some of my recent work was done during an internship with Adam Lerer and the Protein team (led by Alex Rives and Tom Sercu) at Facebook AI Research. I was also fortunate to explore the topics of protein design and immunology from a commercial perspective as a Bio-IT Fellow at 8VC.

Earlier work experience as a machine learning engineer at Google Health helped shape my interest in human health. I am also deeply grateful to Chris Umans and Peter Schröder for their kind and inspiring mentorship during my undergraduate years at Caltech.

(Last updated: January 2023)

Email  /  Google Scholar  /  Twitter


Research


Learning inverse folding from millions of predicted structures


Chloe Hsu, Robert Verkuil, Jason Liu, Zeming Lin, Brian Hie, Tom Sercu, Adam Lerer*, Alexander Rives* (*equal contribution)
ICML (long oral presentation; Outstanding Paper Runner Up Award), 2022
paper | code | slides | colab notebook

Inverse folding aims to design sequences that fold into desired structures. To better learn inverse folding, we used AlphaFold2 to predict structures for 12M protein sequences, augmenting the training data by nearly three orders of magnitude. With this additional data, our new model is more accurate at structure-based sequence design and also generalizes to a variety of more complex tasks, including the design of protein complexes, partially masked structures, binding interfaces, and multiple states.


Learning protein fitness models from evolutionary and assay-labeled data


Chloe Hsu, Hunter Nisonoff, Clara Fannjiang, and Jennifer Listgarten
Nature Biotechnology, 2022
paper | talk | code

We present a simple yet highly effective combination approach to protein fitness prediction that learns from both (unlabeled) evolutionarily related protein sequences and variant protein sequences with experimentally measured labels. Our analysis also highlights the importance of systematic evaluations and sufficient baselines.


Design and source code from Jon Barron's website and Leonid Keselman's Jekyll fork.