“Unambiguous Discrimination of All 20 Proteinogenic Amino Acids and Their Modifications by Nanopore”, 2023-09-25:
Natural proteins are composed of 20 proteinogenic amino acids and their post-translational modifications (PTMs). However, because no suitable nanopore sensor can simultaneously discriminate between all 20 amino acids and their PTMs, direct sequencing of proteins with nanopores has not yet been realized.
Here, we present an engineered hetero-octameric Mycobacterium smegmatis porin A (MspA) nanopore containing a sole Ni²⁺ modification. It enables full discrimination of all 20 proteinogenic amino acids and 4 representative modified amino acids: Nω,N′ω-dimethyl-arginine (Me-R), O-acetyl-threonine (Ac-T), N4-(β-N-acetyl-D-glucosaminyl)-asparagine (GlcNAc-N), and O-phosphoserine (P-S).
Assisted by machine learning, an accuracy of 98.6% was achieved. Amino acid supplement tablets and peptidase-digested amino acids from peptides were also analyzed using this strategy.
This capacity for simultaneous discrimination of all 20 proteinogenic amino acids and their PTMs suggests the potential to achieve protein sequencing using this nanopore-based strategy.
…7 inbuilt classifiers were evaluated: ensemble, SVM (support vector machine), decision trees, naive Bayes, neural network, discriminant analysis, and KNN (k-nearest neighbor). To avoid overfitting, model performance was evaluated with 10-fold cross-validation.
The derived quadratic SVM model, which has a 98.8% validation accuracy, was found to be the best-performing model.
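The cross-validation procedure described above can be illustrated with a minimal sketch. It is not the authors' pipeline. It assumes synthetic two-feature data for three hypothetical classes in place of real nanopore event features, and it swaps in a small hand-written KNN classifier (one of the seven families listed) for the quadratic SVM so that the example runs on the Python standard library alone. All function names are illustrative.

```python
import random
from collections import Counter


def make_synthetic_events(n_per_class=60, seed=0):
    """Generate labelled 2-feature points for three hypothetical classes.

    These are stand-ins for real event features, not measured data.
    """
    rng = random.Random(seed)
    centers = {"A": (0.0, 0.0), "B": (4.0, 0.0), "C": (0.0, 4.0)}
    data = []
    for label, (cx, cy) in centers.items():
        for _ in range(n_per_class):
            data.append(((rng.gauss(cx, 0.5), rng.gauss(cy, 0.5)), label))
    rng.shuffle(data)
    return data


def knn_predict(train, x, k=5):
    """Majority vote among the k training points closest to x."""
    nearest = sorted(
        train,
        key=lambda item: (item[0][0] - x[0]) ** 2 + (item[0][1] - x[1]) ** 2,
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def cross_validate(data, n_folds=10, k=5):
    """Return per-fold accuracies; each point is held out exactly once."""
    folds = [data[i::n_folds] for i in range(n_folds)]
    accuracies = []
    for i, held_out in enumerate(folds):
        # Train on every fold except the one being held out.
        train = [item for j, fold in enumerate(folds) if j != i for item in fold]
        correct = sum(knn_predict(train, x, k) == y for x, y in held_out)
        accuracies.append(correct / len(held_out))
    return accuracies


data = make_synthetic_events()
fold_accuracies = cross_validate(data)
mean_accuracy = sum(fold_accuracies) / len(fold_accuracies)
print(f"mean 10-fold accuracy: {mean_accuracy:.3f}")
```

Because every point is scored only by a model that never saw it during training, the mean over the ten folds estimates performance on unseen events, which is why a validation accuracy obtained this way guards against overfitting.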