bitBites (2026-01-01): LLM shiny, Advent of Agent, FOCUS, Scalable Perturb-seq
Inspired by Stephen Turner’s¹ weekly recap series, I’m starting a series called bitBites to regularly review what I read and learn. This is part of my New Year’s resolution. In an era of information overload, I benefit greatly from curators like Stephen who consistently surface content aligned with my interests. At the same time, I’ve realized that because everyone’s interests and perspectives differ, it’s important to create a personal “recap” as an output of my own digestion. It allows me to highlight ideas that resonate most with me and to internalize them more deeply.
Even in the AI era, where curation is easier than ever, the act of digestion remains uniquely human. Knowledge written on paper is not the same as knowledge embedded in the brain. Fifteen years after graduating from college, I can clearly feel that my memory has declined, not entirely because of age but because modern work rarely requires memorization. Even in graduate school, memorization was no longer the key determinant of success. As a result, I’ve become heavily dependent on my “second brain” (Notion in particular) to retrieve information I’ve encountered before. We’ve long known that the best way to learn is to produce. Writing, teaching, and summarizing force us to transform passive input into active understanding. That’s why I’m starting the bitBites series: as a record of my own learning journey and a way to turn passive consumption into active creation.
Enough rambling, let’s get to business.
R, data science and AI
The shiny side of LLM blog series: this was probably my biggest learning activity over the holiday. The series has three parts: What LLMs Actually Do (and What They Don’t), Talking to LLMs: From Prompt to Response, and Build Your First LLM App with Shiny for Python or R. Each one is a delight to read. It covers the basics of LLMs, the tools that talk to LLMs (ellmer for R and chatlas for Python), and finally building a Shiny app that uses an LLM to retrieve, summarize, and evaluate content. I also learned a new programming concept, “asynchronous operation”, and expanded my reading to the package documentation of {promises}, non-blocking operations, and extended tasks.
R code optimization: this is also a blog series, summarizing many tricks I had picked up along the way while using R (e.g., vectorization, on-disk memory, parallelization). It also covers areas that I, as a pure R programmer, had never touched, and it paints a complete picture of program optimization that is not limited to R.
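To make the “asynchronous operation” idea from the LLM series concrete, here is a minimal Python sketch (Python being the language of chatlas and Shiny for Python). It uses only the standard library `asyncio`; `fake_llm_call` is a hypothetical stand-in for a slow LLM request, not a chatlas or ellmer API. Three simulated calls run concurrently, so the total wait is roughly one delay rather than three.

```python
import asyncio
import time


async def fake_llm_call(prompt: str, delay: float) -> str:
    # Hypothetical stand-in for a slow network request to an LLM API.
    # `await asyncio.sleep` yields control to the event loop instead of blocking it.
    await asyncio.sleep(delay)
    return f"summary of {prompt!r}"


async def main():
    prompts = ["paper A", "paper B", "paper C"]
    start = time.perf_counter()
    # Launch all calls at once; gather returns results in the order of the inputs.
    results = await asyncio.gather(*(fake_llm_call(p, 0.2) for p in prompts))
    elapsed = time.perf_counter() - start
    return results, elapsed


results, elapsed = asyncio.run(main())
print(results)
print(f"elapsed: {elapsed:.2f}s (vs ~0.60s if the calls ran one after another)")
```

This is roughly the same non-blocking idea that {promises} and extended tasks bring to Shiny: the app can keep responding while slow work is still pending.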
Introducing docorator to the pharmaverse: an extension for {gt} tables that “decorates” them with custom headers, footers, etc. to produce production-ready outputs.
UMAP in R and Python is an interesting read that surprised me by showing how loaded in-memory objects can influence UMAP results in R. Although, in my opinion, the author does not fully get to the root cause, the post serves as a stark warning for me (a computational biologist who frequently represents data using UMAP) about reproducibility issues that go beyond seed randomness and parameter choices.
Advent of Agent 2025: a 25-day tutorial series created by Google that covers how to build an AI agent, from the basics to applications. Although I have not gone through the whole series yet, it is next on my to-do list for the rest of the holiday.
FOCUS: an AI-assisted reading workflow for information overload, highlighted by Stephen in his weekly recap, was a timely and reassuring read. The short article outlines the Find–Organize–Condense–Understand–Synthesize workflow, and I especially appreciated the concrete prompts that can be incorporated directly into a custom ChatGPT workflow. The article resonated strongly with my long-standing anxiety about information overload, particularly in research, where I constantly feel behind. I’ll admit that I’m a slow reader and a working mother, and I simply don’t have the capacity to keep up with hundreds of social media threads, RSS feeds, and subscribed articles, unlike the impressive system Stephen recently described in Staying Current in Data Science and Computational Biology: 2026 Edition. While this workflow won’t magically solve the problem, it offers a practical and compassionate structure that helps reduce some of the anxiety of staying current at work, which makes it valuable to me.
biology and bioinformatics
Linking regulatory variants to target genes by integrating single-cell multiome methods and genomic distance (Nature Genetics, 2025): the authors introduce pgBoost, a modeling framework that integrates single-cell multiome data and genomic distance with eQTL information in a non-linear way to produce a score for each candidate SNP-gene link.
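pgBoost combines its inputs non-linearly, and gradient-boosted trees are a common way to do that. The sketch below is a generic, from-scratch illustration of gradient boosting with decision stumps under squared loss, written with NumPy; it is not pgBoost’s actual model, features, or training labels. The two features (a scaled SNP-to-gene distance and a single multiome link score) and the rule that generates the labels are hypothetical.

```python
import numpy as np


def fit_stump(X, r):
    """Best single-feature threshold split for residuals r under squared error."""
    best = None
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j])[:-1]:  # candidate thresholds (right side never empty)
            left = X[:, j] <= t
            lv, rv = r[left].mean(), r[~left].mean()
            sse = ((r - np.where(left, lv, rv)) ** 2).sum()
            if best is None or sse < best[0]:
                best = (sse, j, t, lv, rv)
    return best[1:]  # (feature index, threshold, left value, right value)


def boost(X, y, n_rounds=50, lr=0.1):
    """Gradient boosting with squared loss: each stump is fit to the current residuals."""
    base = y.mean()
    pred = np.full(len(y), base)
    stumps = []
    for _ in range(n_rounds):
        j, t, lv, rv = fit_stump(X, y - pred)
        pred = pred + lr * np.where(X[:, j] <= t, lv, rv)
        stumps.append((j, t, lv, rv))
    return base, lr, stumps


def predict(model, X):
    base, lr, stumps = model
    pred = np.full(X.shape[0], base)
    for j, t, lv, rv in stumps:
        pred = pred + lr * np.where(X[:, j] <= t, lv, rv)
    return pred


rng = np.random.default_rng(1)
n = 200
distance = rng.random(n)  # hypothetical scaled SNP-to-gene distance
multiome = rng.random(n)  # hypothetical link score from one multiome method
X = np.column_stack([distance, multiome])
# Hypothetical non-linear rule: a pair is "linked" only if close AND supported by multiome
y = ((distance < 0.3) & (multiome > 0.5)).astype(float)

model = boost(X, y)
base_mse = ((y - y.mean()) ** 2).mean()
fit_mse = ((y - predict(model, X)) ** 2).mean()
print(f"baseline MSE {base_mse:.3f} -> boosted MSE {fit_mse:.3f}")
```

Because each stump splits on a single feature, this sketch learns an additive model; deeper trees would be needed to capture an interaction such as the “close AND supported” rule directly.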
Scalable genetic screening for regulatory circuits using compressed Perturb-seq (Nature Biotechnology 2023): a compressed design, in which perturbations are pooled into composite samples, combined with a sparsity-promoting inference algorithm makes Perturb-seq cheaper and more efficient. The approach relies on the assumption of “biological sparsity and modularity in cell circuits”, so it seems best suited to illustrating the effects of GWAS SNPs.
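The core idea behind a compressed design, recovering a few large effects from fewer measurements than perturbations, can be illustrated with a generic compressed-sensing sketch. This is not the paper’s inference algorithm: it assumes a hypothetical random pooling design and a single gene readout per composite sample, and it uses L1-penalized regression (the lasso), solved by iterative soft-thresholding (ISTA), as the sparsity-promoting step.

```python
import numpy as np


def ista(A, y, lam, n_iter=5000):
    """Solve min_x 0.5*||y - A x||^2 + lam*||x||_1 by iterative soft-thresholding."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2  # 1 / Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        z = x - step * A.T @ (A @ x - y)  # gradient step on the squared-error term
        x = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # L1 proximal step
    return x


rng = np.random.default_rng(0)
n_perturb, n_pools, k = 60, 40, 3  # fewer composite samples than perturbations
# Hypothetical pooling design: each pool contains each perturbation with probability 0.25
A = (rng.random((n_pools, n_perturb)) < 0.25).astype(float)
x_true = np.zeros(n_perturb)
support = rng.choice(n_perturb, size=k, replace=False)  # the few perturbations with real effects
x_true[support] = [2.0, -1.5, 3.0]
y = A @ x_true + 0.01 * rng.standard_normal(n_pools)  # one gene's readout per pool

x_hat = ista(A, y, lam=0.1)
recovered = np.sort(np.argsort(-np.abs(x_hat))[:k])
print("true:", np.sort(support), "recovered:", recovered)
```

Sparsity is what makes this work: with 40 pools and 60 perturbations the linear system is underdetermined, and the L1 penalty favors the explanation that uses only a few perturbations.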
Systematic reconstruction of molecular pathway signatures using scalable single-cell perturbation screens (Nature Cell Biology 2025): a nice Perturb-seq dataset that can be used for atlas building and for training in silico perturbation models.
SMMILe enables accurate spatial quantification in digital pathology using multiple-instance learning: a new method for spatial labeling of pathology images based on weakly supervised multiple-instance learning (MIL). Analogous to the single-cell, sample-level analysis method multiMIL, which treats each cell as an “instance” and each sample as a “bag”, SMMILe treats each image patch as an instance in order to classify the whole sample.
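To make the instance/bag vocabulary concrete, here is a generic MIL pooling sketch in Python with NumPy. It is not SMMILe’s architecture; the patch probabilities are hypothetical, and the code only shows how instance-level scores can be pooled (max, mean, or softmax attention) into one bag-level prediction, while the instance scores themselves still point to where the signal is.

```python
import numpy as np


def bag_score(instance_probs, pooling="max"):
    """Aggregate per-instance (patch) probabilities into one bag (slide) probability."""
    s = np.asarray(instance_probs, dtype=float)
    if pooling == "max":
        # Standard MIL assumption: a bag is positive if at least one instance is positive
        return float(s.max())
    if pooling == "mean":
        return float(s.mean())
    raise ValueError(f"unknown pooling: {pooling!r}")


def attention_pool(instance_probs, attention_logits):
    """Softmax-weighted average of instance probabilities (attention-style MIL pooling)."""
    s = np.asarray(instance_probs, dtype=float)
    a = np.asarray(attention_logits, dtype=float)
    w = np.exp(a - a.max())  # subtract the max for numerical stability
    w /= w.sum()
    return float(w @ s), w


# Hypothetical patch-level tumor probabilities for one slide (the bag)
patch_probs = [0.05, 0.10, 0.92, 0.08]
print(bag_score(patch_probs, "max"))   # slide called by its most suspicious patch
print(bag_score(patch_probs, "mean"))  # diluted by the many normal-looking patches
print(int(np.argmax(patch_probs)))     # index of the patch driving the call
```

Max pooling matches the classic MIL assumption, while attention pooling learns how much each instance should count; in both cases only the bag label is needed for training, which is what makes the setting weakly supervised.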
Footnotes
Steve served on my thesis committee for the defense, and he also introduced me to the {tidyverse} through a workshop at UVa. Over the years, I’ve been deeply inspired by his blog posts and social media presence. Each time he made a career transition, from academia to industry and then back to academia, I found myself re-examining my own career path. His moves consistently reminded me how important it is to stay current in both knowledge and skills, especially in today’s uncertain job market and fast-paced field. I feel fortunate to have had such a mentor in my career and would like to dedicate this first bitBites post to him.