Item
|
Palarea‐Albaladejo, Javier
Sánchez, Mateo |
|
| Universitat de Girona. Escola Politècnica Superior | |
| Garçon, Albert | |
| June 2024 | |
|
This project addresses enzyme annotation, specifically of Fe- and α-ketoglutarate-dependent dioxygenases (aKGs) and halogenases. These enzymes are very similar: halogenases are a subclass of aKGs, differentiated only by one amino acid in their binding site.
Proteins are catalogued in huge public databases (over 250 million entries). These databases annotate proteins with multiple algorithms that rely on manually entered information and properties. As the number of known proteins grows ever faster, automatic annotation algorithms are needed.
Enzymes are characterised by their amino acid sequence. This sequence encodes all relevant information about the protein, but extracting that information is not trivial. Automatic algorithms nevertheless have to rely on the sequence alone, as it is the only thing known about a protein in a vacuum.
Because a protein sequence is a string of letters, powerful transformer models have recently been developed to convert these sequences into embeddings, imitating NLP embedding models. This project uses ProtBert and ProtT5, which are based on BERT and T5.
With the embeddings, multiple downstream tasks can be performed. The one of interest here is classification: determining whether a protein is an aKG or not, or a halogenase or not.
To test these algorithms, a high-quality dataset is required. This dataset is the central part of the project, as it needs to be sufficiently large for ML tasks |
|
| application/pdf | |
| 26592 | |
| http://hdl.handle.net/10256/27574 | |
| eng | |
| Attribution-NonCommercial-NoDerivatives 4.0 International | |
| http://creativecommons.org/licenses/by-nc-nd/4.0/ | |
|
Enzymes; Proteins -- Structure; Machine learning; Bioinformatics |
|
| Machine learning guided identification of 2-oxoglutarate dependent halogenases | |
| info:eu-repo/semantics/masterThesis | |
| DUGiDocs |
