Publications
You can also find my articles on my Google Scholar profile.
Pre-prints
Liu Z-Y, Berthel A, Czech E, Stitzer M, Hsu S-K, Pennell M, Buckler ES, Zhai J. GeneCAD: Plant Genome Annotation with a DNA Foundation Model. bioRxiv. 2025 Oct 31:2025-10
Zhai J, Gokaslan A, Hsu S-K, Chen S-P, Liu Z-Y, Marroquin E, Czech E, Cannon B, Berthel A, Romay MC, Pennell M, Kuleshov V, Buckler ES. PlantCAD2: A Long-Context DNA Language Model for Cross-Species Functional Annotation in Angiosperms. bioRxiv. 2025 Aug 27:2025-08
Oren E, Zhai J, Rooney TE, Angelovici R, Hale C, Brindisi L, Hsu SK, Gault CM, Hua J, La T, Lepak N, Fu Q, Buckler ES, Romay MC. Grass Rhizome Proteomics Reveals Convergent Freezing-Tolerance Strategies. bioRxiv. 2025 May 15:2025-05
Selected publications
Zhai, J., Gokaslan, A., Schiff, Y., Berthel, A., Liu, Z. Y., Lai, W. L., Miller, Z. R., Scheben, A., Stitzer, M. C., Romay, M. C., Buckler, E. S., & Kuleshov, V. (2025). Cross-species modeling of plant genomes at single nucleotide resolution using a pretrained DNA language model. Proceedings of the National Academy of Sciences, 122(24), e2421738122.
Hale CO., Hsu SK., Zhai J., Schulz AJ., AuBuchon-Elder T., Costa-Neto G., Gelfond A., El-Walid M., Hufford M., Kellogg EA., La T., Marand AP., Seetharam AS., Scheben A., Stitzer M., Wrightsman T., Romay MC., Buckler ES. (2025). Widespread turnover of a conserved cis-regulatory code across 589 grass species. Molecular Biology and Evolution.
Schulz AJ, Zhai J, AuBuchon-Elder T, Andorf CM, El-Walid MZ, Ferebee TH, Gilmore EH, Hufford MB, Johnson LC, Kellogg EA, La T, Long E, Miller ZR, Portwood JL II, Romay MC, Seetharam AS, Stitzer MC, Woodhouse MR, Wrightsman T, Buckler ES, Monier B, Hsu SK. Fishing for a reelGene: evaluating gene models with evolution and machine learning. The Plant Journal. 2025 Sep 22.
Zhai J, Zhang Y, Zhang C, Yi X, Song M, Tang C, Ding P, Li Z, Ma C. deepTFBS: Improving within- and cross-species prediction of transcription factor binding using deep multi-task and transfer learning. Advanced Science. 2025 Mar 20:e03135.
Zhai J, Song J, Zhang T, Xie S, Ma C. deepEA: a containerized web server for interactive analysis of epitranscriptome sequencing data. Plant Physiology, 2021, 185(1):29-33.
Zhai J, Song J, Cheng Q, Tang Y, Ma C. PEA: an integrated R toolkit for plant epitranscriptome analysis. Bioinformatics, 2018, 34(21):3747-3749.
Song J, Zhai J, Bian E, Song Y, Yu J, Ma C. Transcriptome-wide annotation of m5C RNA modifications using machine learning. Frontiers in Plant Science, 2018, 9:519 (Co-first author).
Chen S, Ren C, Zhai J, Yu J, Zhao X, Li Z, Zhang T, Ma W, Han Z, Ma C. CAFU: a galaxy framework for exploring unmapped RNA-Seq data. Briefings in Bioinformatics, 2020, 21(2): 676-686 (Co-first author).
Zhai J, Tang Y, Yuan H, Wang L, Shang H, Ma C. A meta-analysis based method for prioritizing candidate genes involved in a pre-specific function. Frontiers in Plant Science, 2016, 7:1914 (Co-first author).
Zhang T, Zhai J, Zhang X, Ling L, Li M, Xie S, Song M, Ma C. Interactive web-based annotation of plant microRNAs with iwa-miRNA. Genomics, Proteomics & Bioinformatics, 2021, doi: 10.1016/j.gpb.2021.02.010.
Cui H, Zhai J, Ma C. miRLocator: machine learning-based prediction of mature microRNAs within plant pre-miRNA sequences. PLoS ONE, 2015, 10(11): e0142753.
