Today at Berkeley Lab

Seeking Abstracts for Berkeley Lab’s First ML4Sci Workshop

ML4Sci, a new workshop on machine learning (ML) for science at Berkeley Lab, will take place Sept. 4–6. Lab scientists and affiliates are encouraged to submit abstracts describing their ML-for-science projects. The workshop will feature overviews of ML applications and provide hands-on training on NERSC systems. Abstracts are due Aug. 20, and the registration deadline for the workshop is Aug. 27.


  1. Title:
    Predicting Bacterial Two-Component Signaling with a Deep Recurrent Neural Network

    Two-component systems (2CS) are a primary mechanism that bacteria use to detect and respond to environmental stimuli. A receptor histidine kinase (HK) detects an environmental signal and activates the appropriate response regulator (RR). Genes for such ‘cognate’ HK-RR pairs are often located next to each other on the chromosome, which makes it easier to identify the target for a particular signal. However, almost half of all HK and RR proteins are ‘orphans’ with no known nearby partner, which complicates identifying the proteins that respond to a particular signal.
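    The proximity rule that makes cognate pairs easy to spot, and that leaves orphans behind, can be sketched as a simple heuristic. This is a hypothetical simplification, not the procedure used in this work: genes are given as a list in chromosomal order, distance is counted in gene positions rather than base pairs, and `pair_by_proximity` and its `window` parameter are illustrative names. Any HK or RR without a partner inside the window is reported as an orphan.

```python
def pair_by_proximity(genes, window=1):
    """Hypothetical proximity heuristic for cognate HK-RR pairs.

    genes: list of (gene_id, kind) tuples in chromosomal order,
           where kind is "HK", "RR", or "other".
    window: maximum distance, in gene positions (an assumption; real
            analyses typically use operon or base-pair distances).
    Returns (pairs, orphans).
    """
    paired = set()
    pairs = []
    for i, (gene_id, kind) in enumerate(genes):
        if kind != "HK":
            continue
        best = None
        # Look for the closest still-unpaired RR within the window.
        for j in range(max(0, i - window), min(len(genes), i + window + 1)):
            other_id, other_kind = genes[j]
            if other_kind == "RR" and other_id not in paired:
                if best is None or abs(j - i) < abs(best - i):
                    best = j
        if best is not None:
            rr_id = genes[best][0]
            pairs.append((gene_id, rr_id))
            paired.update((gene_id, rr_id))
    # HKs and RRs that never found a neighbour are "orphans".
    orphans = [g for g, k in genes if k in ("HK", "RR") and g not in paired]
    return pairs, orphans


example_genome = [
    ("g1", "HK"), ("g2", "RR"), ("g3", "other"), ("g4", "RR"),
    ("g5", "other"), ("g6", "other"), ("g7", "HK"),
]
pairs, orphans = pair_by_proximity(example_genome, window=1)
print(pairs, orphans)  # → [('g1', 'g2')] ['g4', 'g7']
```

    Orphans such as `g4` and `g7` are exactly the cases where proximity gives no answer and a sequence-based model is needed.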

    This work describes the use of a long short-term memory (LSTM)-based neural network to score the likelihood that two arbitrary proteins, defined by their amino acid sequences, interact. A single ML model was trained on 500k known cognate and non-cognate HK-RR pairs from over 1,300 bacterial species. The trained model can then score any pair of HK and RR sequences, allowing a recommender algorithm to rank the HKs most likely to pair with a specific RR in a given species.
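    A minimal sketch of the scoring and ranking steps, in plain Python with no ML library. The abstract says only that the network is LSTM-based, so the way the two sequences are combined (concatenated with a hypothetical `|` separator), the hidden size, the logistic read-out, and the names `TinyLSTMScorer` and `rank_hks` are all assumptions. The weights are random and untrained, so the scores only illustrate the data flow; in practice the model would be trained on labeled cognate and non-cognate pairs with a framework such as PyTorch or JAX.

```python
import math
import random

# Assumption: sequences use only the 20 standard amino-acid letters;
# "|" is a hypothetical separator placed between the HK and RR sequences.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
SEP = "|"
VOCAB = AMINO_ACIDS + SEP


def one_hot(ch):
    """Encode one residue (or the separator) as a one-hot vector over VOCAB."""
    vec = [0.0] * len(VOCAB)
    vec[VOCAB.index(ch)] = 1.0  # raises ValueError for non-standard letters
    return vec


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyLSTMScorer:
    """Untrained toy LSTM mapping an (HK, RR) sequence pair to a score in (0, 1)."""

    def __init__(self, hidden=8, seed=0):
        rng = random.Random(seed)
        n_in = len(VOCAB) + hidden  # each step sees [x_t, h_{t-1}]
        self.hidden = hidden
        # One weight matrix and bias vector per gate:
        # input (i), forget (f), candidate (g), output (o).
        self.W = {
            gate: [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(hidden)]
            for gate in "ifgo"
        }
        self.b = {gate: [0.0] * hidden for gate in "ifgo"}
        # Logistic read-out from the final hidden state.
        self.w_out = [rng.gauss(0.0, 0.1) for _ in range(hidden)]
        self.b_out = 0.0

    def _gate(self, gate, z, act):
        return [
            act(sum(w * x for w, x in zip(row, z)) + b)
            for row, b in zip(self.W[gate], self.b[gate])
        ]

    def score(self, hk_seq, rr_seq):
        """Run the LSTM over HK + SEP + RR and return a pseudo-probability."""
        h = [0.0] * self.hidden  # hidden state
        c = [0.0] * self.hidden  # cell state
        for ch in hk_seq + SEP + rr_seq:
            z = one_hot(ch) + h  # list concatenation: [x_t, h_{t-1}]
            i = self._gate("i", z, sigmoid)
            f = self._gate("f", z, sigmoid)
            g = self._gate("g", z, math.tanh)
            o = self._gate("o", z, sigmoid)
            c = [ft * ct + it * gt for ft, ct, it, gt in zip(f, c, i, g)]
            h = [ot * math.tanh(ct) for ot, ct in zip(o, c)]
        return sigmoid(sum(w * x for w, x in zip(self.w_out, h)) + self.b_out)


def rank_hks(model, rr_seq, hk_candidates):
    """Order candidate HK ids (dict id -> sequence) from most to least likely partner."""
    return sorted(
        hk_candidates,
        key=lambda hk_id: model.score(hk_candidates[hk_id], rr_seq),
        reverse=True,
    )


# Example (rankings are meaningless until the weights are trained):
model = TinyLSTMScorer()
candidates = {"hkA": "MRLHTEL", "hkB": "MSEQLAR", "hkC": "MTVKDAL"}
print(rank_hks(model, "MKKILIVDD", candidates)[:2])  # top-2 recommendations
```

    Because a single model scores every (HK, RR) pair, the same function serves every species: ranking an RR only requires the HK sequences from that species' genome.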

    We demonstrate that the top-2 recommendations contain known cognate pairs for the majority of investigated species, and that the approach can predict the most likely 2CS pairs for all orphan proteins in more than 1,300 bacterial species.
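    The top-2 result can be read as a top-k hit rate: how often the known cognate HK appears among the k highest-ranked candidates for an RR. A minimal sketch, assuming per-RR rankings are already available (for example from a recommender like the one described above). The abstract reports the result per species without saying how RRs within a species are aggregated, so this per-RR fraction is one plausible choice; `top_k_hit_rate` and the toy data are hypothetical and illustrate the metric only.

```python
def top_k_hit_rate(rankings, known_cognate, k=2):
    """Fraction of RRs with a known cognate HK whose partner is in the top-k list.

    rankings: dict rr_id -> list of HK ids, best first.
    known_cognate: dict rr_id -> id of that RR's known cognate HK.
    """
    # Only RRs that have both a known partner and a ranking can be scored.
    evaluated = [rr for rr in known_cognate if rr in rankings]
    if not evaluated:
        return 0.0
    hits = sum(1 for rr in evaluated if known_cognate[rr] in rankings[rr][:k])
    return hits / len(evaluated)


# Toy data (illustrative only, not results from this work).
rankings = {
    "rr1": ["hkA", "hkB", "hkC"],
    "rr2": ["hkA", "hkC", "hkB"],
    "rr3": ["hkB", "hkC", "hkA"],
}
known = {"rr1": "hkA", "rr2": "hkB", "rr3": "hkC"}
print(top_k_hit_rate(rankings, known, k=2))  # → 0.6666666666666666
```

    Raising `k` trades precision for recall: with `k=3` every known partner in this toy set is recovered, while `k=1` recovers only `rr1`.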