bitBites (2026-01-28): Posit Recap, EcoTyper and H&E to proteomics

bitBite
Author: Chun Su

Published: January 28, 2026

This week we recap Posit's summary of 2025, Claude-assisted refactoring stories, a few new R packages relevant to bioinformatics, and some exciting work predicting spatial biology from H&E images.

R, data science and AI

Posit 2025 recap

The Posit 2025 recap provides a comprehensive list of packages supported or developed by Posit. The parts new to me include:

  • {nanonext} is an R binding to NNG (Nanomsg Next Generation), a C messaging library that implements scalability protocols for distributed systems. By transferring data between languages through sockets, it enables seamless interoperability between R and other modern programming languages.

  • {mirai} is a foundation for asynchronous and parallel computing across the R ecosystem. I encountered asynchronous computing in my previous recap of the LLM Shiny app written by Veerle Eeftink - van Leemput. Besides async Shiny applications ({promises}), mirai also powers parallel map in {purrr} (vs. {furrr}, {future}) and parallel hyperparameter tuning in {tune}.

  • Tidymodels: {important} and {filtro} became part of the tidymodels universe for feature selection with pipes. Sparse matrices are now supported in tidymodels.

  • AI tools: instead of rephrasing the updates that Posit made on AI, I would like to directly cite Stephen Turner’s summary of AI tools in the R universe.1

Figure made by Stephen Turner in the Paired Ends blog.

Claude Code assisted code update

Claude Code popped up twice in blogs I read this week: “From scripts to pipelines in the age of LLMs” by Bruno Rodrigues and “Semi-automating 200 Pull Requests with Claude Code” by Davis Vaughan.

Both are about using Claude to modernize legacy R code, but in very different ways. Rodrigues showed how he used LLMs to help turn messy, undocumented packages into something more structured and pipeline-friendly, basically getting machines to understand old code without spending days digging through documentation. Vaughan used Claude to fix reverse dependencies, semi-automating ~200 pull requests to replace the now-obsolete dplyr::id() across a huge R ecosystem, and he even shared the exact prompts he used (which is gold if you’re learning this stuff).

What struck me is that both posts landed on the same lesson: when you ask an agent to code for you, limit the context and force a clear structure.

New/Updated R packages of interest

  • blastar: NCBI BLAST implemented in R, including functions to fetch sequences by accession (GenBank API access), BLAST nucleotide and protein sequences, and build a phylogenetic tree from sequences.

  • mascarade: draws clear boundaries around the cell clusters of a Seurat object (Idents) and labels the clusters. It is useful for labeling a UMAP with so many clusters that the colors become hard to tell apart by eye.

  • orthogene: a Bioconductor package to map orthologs within and across species.

  • querychat: an LLM tool that converts user input into SQL to filter data. It is built for the Shiny context. However, the tool-calling part can be isolated and customized for specific uses (e.g. building an app across multiple datasets).

Querychat is a particularly good fit for Shiny apps that have:

  1. A single data source (or a set of related tables that can be joined)

  2. Multiple filters that let users slice and explore the data in different ways

  3. Several visualizations and outputs that all depend on the same filtered data
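To make the "user input to SQL to filtered data" idea concrete, here is a minimal, language-agnostic sketch in Python. It is not querychat's API: the table, the `fake_llm_to_sql` stand-in for the LLM call, and the `run_filter_tool` guard are all hypothetical names invented for illustration. The point is only that the tool-calling step can live on its own as a small function that accepts SQL and returns rows.

```python
import sqlite3


def make_demo_db():
    """Build a tiny in-memory table (made-up data, for illustration only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE cells (cell_id TEXT, cluster TEXT, n_genes INTEGER)")
    conn.executemany(
        "INSERT INTO cells VALUES (?, ?, ?)",
        [
            ("c1", "T cell", 1200),
            ("c2", "B cell", 800),
            ("c3", "T cell", 300),
            ("c4", "Myeloid", 1500),
        ],
    )
    return conn


def fake_llm_to_sql(question):
    """Hypothetical stand-in for the LLM: maps a question to canned SQL."""
    canned = {
        "T cells with more than 500 genes": (
            "SELECT cell_id FROM cells "
            "WHERE cluster = 'T cell' AND n_genes > 500 ORDER BY cell_id"
        ),
    }
    return canned[question]


def run_filter_tool(conn, sql):
    """The 'tool' the model would call: run one read-only SELECT and return rows."""
    stripped = sql.strip().rstrip(";").strip()
    # Assumption: a crude guard that only accepts a single SELECT statement.
    if not stripped.lower().startswith("select") or ";" in stripped:
        raise ValueError("only a single SELECT statement is allowed")
    return conn.execute(stripped).fetchall()


if __name__ == "__main__":
    conn = make_demo_db()
    sql = fake_llm_to_sql("T cells with more than 500 genes")
    print(run_filter_tool(conn, sql))  # [('c1',)]
```

Because the tool only sees a connection and a SQL string, the same function could in principle be pointed at several datasets, which is the kind of customization mentioned above.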

Bioinformatics and Biology

EcoTyper

EcoTyper, introduced in 2021, is a framework for the systematic identification of cell states and cellular communities (ecotypes) from bulk, single-cell and spatial gene expression data. It is an extension of the in silico cytometry tool CIBERSORTx (Newman et al., Nature Biotechnology 2019), adding finer “sorting” into cell states and the discovery of state-state communities. The innovation relies on:

  1. Adaptive false positive index (AFI) metric to filter spurious clusters after cNMF without prior knowledge
  2. Using co-association between states to identify multicellular ecotypes without prior spatial information
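The second point can be sketched with a toy example. The code below is not EcoTyper's implementation. It assumes a Jaccard index over the samples in which each state is abundant as the co-association measure, and it swaps in a simple threshold plus connected-components grouping for whatever clustering the published method uses. The state names, sample IDs and the 0.5 threshold are made up.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index of two sets of sample IDs."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def group_states(state_samples, threshold=0.5):
    """Group cell states that co-occur across samples.

    state_samples maps a state name to the set of samples where that
    state is abundant. States whose Jaccard index reaches the threshold
    are linked, and each connected group of two or more states is
    reported as a candidate community.
    """
    states = sorted(state_samples)
    parent = {s: s for s in states}

    def find(s):
        # Union-find lookup with path halving.
        while parent[s] != s:
            parent[s] = parent[parent[s]]
            s = parent[s]
        return s

    for a, b in combinations(states, 2):
        if jaccard(state_samples[a], state_samples[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for s in states:
        groups.setdefault(find(s), []).append(s)
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)


if __name__ == "__main__":
    toy = {
        "Tcell_S1": {"p1", "p2", "p3"},
        "Fibro_S2": {"p1", "p2", "p3", "p4"},
        "Mac_S1": {"p5", "p6"},
        "Bcell_S3": {"p5", "p6", "p7"},
        "Endo_S1": {"p8"},
    }
    print(group_states(toy))
    # [['Bcell_S3', 'Mac_S1'], ['Fibro_S2', 'Tcell_S1']]
```

Note that no spatial coordinates enter the calculation: the grouping comes purely from which states show up together in the same samples.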

H&E image to spatial proteomics

Two recent papers made me realize how serious H&E images are becoming for biomarker discovery, especially for clinical trials where spatial assays are still expensive and hard to scale.

  • Valanarasu et al., Cell 2026, introduce GigaTIME, a model trained on paired H&E and multiplex immunofluorescence (mIF) data across 21 proteins to generate virtual spatial protein maps. They predicted the 21 proteins in silico across 24 cancer types and 300+ subtypes, and then linked predicted biomarkers to clinical phenotypes like tumor stage and survival.

  • Li et al., Nature Medicine 2026, introduce HEX, which predicts the spatial expression of ~40 biomarkers (immune, structural, functional) directly from routine H&E slides. The prediction accuracy is pretty solid (AUC around 0.7–0.8), and combining these virtual proteomics features improved lung cancer prognosis and immunotherapy response prediction compared with standard clinical and molecular biomarkers.
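As a reminder of what an AUC in that range means, here is a small sketch of the pairwise definition: the probability that a randomly chosen positive gets a higher predicted score than a randomly chosen negative. The scores are invented toy numbers, not values from either paper, and `pairwise_auc` is a hypothetical helper name.

```python
def pairwise_auc(pos_scores, neg_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


if __name__ == "__main__":
    # Toy predicted scores for marker-positive and marker-negative regions.
    positives = [0.9, 0.6, 0.4]
    negatives = [0.7, 0.3, 0.2]
    print(round(pairwise_auc(positives, negatives), 3))  # 0.778
```

In this toy case 7 of the 9 positive-negative pairs are ordered correctly, so an AUC near 0.78 means the model ranks most, but not all, such pairs the right way round.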

These papers really make H&E feel like a compressed “omics file”, cheap to generate, but increasingly rich enough to pull out spatial biology and potential biomarkers at scale.


Footnotes

  1. The Modern R Stack for Production AI, Paired Ends, https://blog.stephenturner.us/p/r-production-ai