A data-driven bioinformatic investigation into protein post-translational modifications
LIFUYI
2020

To address the problem of identifying post-translational modification (PTM) sites, this thesis develops data-driven bioinformatic approaches that predict PTM substrates and sites from sequence and/or structural information. Specifically, it presents four novel machine learning frameworks and a comprehensive 3D structure database of PTMs. Each of the four frameworks addresses one type of PTM prediction problem. Extensive benchmarking experiments demonstrate that the frameworks achieve competitive and robust performance on their respective PTM site prediction problems. In addition, publicly available web servers have been developed and deployed as implementations of these frameworks to facilitate bioinformatics studies of novel PTM sites and to help generate new biological hypotheses.