Functional programming with purrr, parallelized with mirai
purrr
functional programming
mirai
parallel computing
map() replaces the for-loop; mirai replaces ‘one core at a time’. Learn purrr’s functional style, then parallelize it with a single wrapper function – no rewrite required.
Author
Nelson Amaya
Published
July 5, 2026
Modified
July 5, 2026
“We build our computer (systems) the way we build our cities: over time, without a plan, on top of ruins.” –Ellen Ullman
PART I: The for-loop has to go
You already know the shape of this problem. You have a list of things –countries, files, model subsets, API responses– and you need to do the same operation to each one. The instinct from other languages is a for loop:
results <- vector("list", length(inputs))
for (i in seq_along(inputs)) {
  results[[i]] <- some_function(inputs[[i]])
}
It works. It is also four lines of bookkeeping (pre-allocate, index, assign, remember the loop variable) around one line of actual logic. purrr::map() is that one line:
results <- purrr::map(inputs, some_function)
map() always returns a list. Its typed siblings (map_lgl(), map_int(), map_dbl(), map_chr()) return an atomic vector of that type instead, and fail loudly if your function doesn’t produce the type you promised. That is a feature, not an inconvenience: silently getting a list back when you expected numbers is a classic source of bugs.
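To make the difference concrete, here is a minimal sketch, assuming purrr is installed. The object names (lengths_list, lengths_int, type_error) are illustrative only. It contrasts map() with a typed sibling, then shows the loud failure when the function returns the wrong type.

```r
library(purrr)

words <- c("a", "bb", "ccc")

# map() always hands back a list, one element per input
lengths_list <- map(words, nchar)

# map_int() hands back an integer vector, because nchar() returns integers
lengths_int <- map_int(words, nchar)

# Promise doubles, deliver strings: map_dbl() raises an error instead of
# quietly returning something else. tryCatch() captures the message here.
type_error <- tryCatch(
  map_dbl(1:3, \(x) as.character(x)),
  error = function(e) conditionMessage(e)
)

str(lengths_list)
lengths_int
type_error
```

The typed call acts as a cheap assertion: if an upstream change makes the function return something unexpected, the pipeline stops at that line rather than several steps later.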
map2() and pmap() extend this to two or many parallel inputs walked together:
map2_dbl(1:3, 4:6, \(x, y) x + y)
list(a = 1:3, b = 4:6, c = 7:9) |>
  pmap_dbl(\(a, b, c) mean(c(a, b, c)))
1. map2_dbl() walks two vectors together, element by element.
2. pmap_dbl() generalizes this to as many parallel inputs as you like, taken from a list.
[1] 5 7 9
[1] 4 5 6
And when you want the side effect, not the return value –printing, saving a file, sending a Slack message– walk() is map()’s quiet twin: it runs your function for every element and returns its input invisibly, so nothing clutters your console.
1:3 |> walk(\(x) cat("Processing", x, "\n"))
Processing 1
Processing 2
Processing 3
A real (slightly slow) example
Here’s where this stops being a toy. Say you need to fit a model per group –a common pattern: split a dataset, fit the same model shape to every piece, compare the results. We’ll pretend each fit takes real work (a tenth of a second) so the timing later in this session means something:
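A minimal sketch of such a per-group fit follows. It assumes the built-in mtcars data split by cyl, which gives three groups, and the formula mpg ~ disp. Both are assumptions chosen for illustration. slow_lm is the helper name used later in this session; its body here is a guess at the simplest thing that fits the description.

```r
library(purrr)

# Assumed helper: an ordinary lm() fit with an artificial 0.1 s delay
slow_lm <- function(formula, data) {
  Sys.sleep(0.1)
  lm(formula, data = data)
}

# Assumed data: mtcars split into three groups (4, 6 and 8 cylinders)
groups <- split(mtcars, mtcars$cyl)

# Fit one model per group, sequentially, and keep each fit's R-squared
timing_seq <- system.time(
  r2_seq <- map_dbl(
    groups,
    \(df) summary(slow_lm(mpg ~ disp, data = df))$r.squared
  )
)

r2_seq
timing_seq[["elapsed"]]
```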
Three groups, a tenth of a second each: about 0.3 seconds if nothing overlaps. Hold that number.
PART II: When parallel is actually worth it
Everything above ran sequentially –one group after another, on one CPU core. Your computer almost certainly has several cores sitting idle while that happened. The obvious next thought is “why not do all three at once?” Sometimes you should. Sometimes you shouldn’t:
Note: Parallel helps when…
Each iteration takes a real amount of time –at least a millisecond, ideally much more. Model fitting, API calls, web scraping, heavy simulation.
The work is CPU-bound (lots of computation) or I/O-bound (waiting on a network/disk) –both benefit, for different reasons: CPU-bound work spreads across cores, I/O-bound work overlaps the waiting.
The data passed to each worker isn’t enormous –sending gigabytes to every parallel process costs more than it saves.
Warning: …and hurts when
Spinning up parallel workers has fixed overhead –starting processes, sending data and code to them, collecting results back. For quick operations (simple arithmetic, a fast filter, anything sub-millisecond), that overhead outweighs the work itself. Sequential map() will often be faster on small, fast tasks. Parallelize the slow thing, not the whole script.
This is a judgment call, not a rule you can memorize –the right response to “should I parallelize this?” is usually to time both versions and look, which is exactly what the rest of this session does.
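As a rough illustration of "time both versions and look", the sketch below runs a deliberately tiny task both ways. It assumes purrr 1.1.0 or later and mirai are installed, and it uses mirai::daemons() and purrr::in_parallel(), which are introduced later in this session. The task (add_one) and the input size are arbitrary choices for the example, and the timings you see will depend on your machine.

```r
library(purrr)

# Sub-millisecond work per element: a poor candidate for parallelism
add_one <- \(x) x + 1

# Sequential version
t_seq <- system.time(res_seq <- map_dbl(1:1000, add_one))

# Parallel version: same computation, routed to two background daemons
mirai::daemons(2)
t_par <- system.time(
  res_par <- map_dbl(1:1000, in_parallel(\(x) x + 1))
)
mirai::daemons(0)  # shut the daemons down again

# The results should agree; only the elapsed time is expected to differ
all.equal(res_seq, res_par)
c(sequential = t_seq[["elapsed"]], parallel = t_par[["elapsed"]])
```

For work this small, the parallel timing is likely to be the larger of the two, since it includes starting the daemons and shipping every element to them and back.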
PART III: mirai, purrr’s parallel engine
mirai is a lightweight async/parallel framework from Posit’s Charlie Gao, and as of purrr 1.1.0 it’s purrr’s own official parallel backend. daemons() launches persistent background R processes (“daemons”) that stick around waiting for work:
.progress shows a live progress bar; .flat returns a flat vector instead of a list of length-1 results. Both are optional collection options, written inside square brackets when you subset a mirai_map() result, as in mirai_map(...)[.flat].
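A minimal sketch of that workflow, assuming the mirai package is installed. The number of daemons and the toy computations are arbitrary, and the names answer and squares are illustrative only.

```r
library(mirai)

# Launch three persistent background R processes
daemons(3)

# A single asynchronous task: mirai() returns immediately and the
# expression is evaluated on a daemon. `x` is passed in explicitly.
m <- mirai({
  Sys.sleep(0.1)
  x * 2
}, x = 21)
answer <- m[]  # waits for the task to finish and returns its value

# Map over inputs in parallel. [.flat] collects the results into a flat
# vector; [.progress] could be used instead to show a progress bar.
squares <- mirai_map(1:3, \(x) x^2)[.flat]

answer
squares

# Shut the daemons down when finished
daemons(0)
```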
PART IV: The one-line bridge –purrr::in_parallel()
You don’t have to rewrite your map() calls to use mirai directly. purrr::in_parallel() wraps the function you’re already passing to map(), and purrr routes the work to whatever daemons are running:
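A minimal sketch of the wrapped call, assuming purrr 1.1.0 or later and mirai are installed. As before, the mtcars data split by cyl and the formula mpg ~ disp are assumptions made for illustration, and slow_lm is redefined here so the block runs on its own.

```r
library(purrr)

# Assumed helper: an ordinary lm() fit with an artificial 0.1 s delay
slow_lm <- function(formula, data) {
  Sys.sleep(0.1)
  lm(formula, data = data)
}

# Assumed data: three groups (4, 6 and 8 cylinders)
groups <- split(mtcars, mtcars$cyl)

# One daemon per group
mirai::daemons(3)

timing_par <- system.time(
  r2_par <- map_dbl(
    groups,
    in_parallel(
      \(df) summary(slow_lm(mpg ~ disp, data = df))$r.squared,
      slow_lm = slow_lm  # ship the helper to the daemons explicitly
    )
  )
)

mirai::daemons(0)  # shut the daemons down again

r2_par
timing_par[["elapsed"]]
```

The first parallel call in a session may also include the time the daemons take to start, so a single timing on a three-element example can understate the benefit.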
1. The only change from PART I’s code: slow_lm(...) wrapped in in_parallel(...), with slow_lm = slow_lm passed through explicitly (more on why in PART V).
2. Same R² values as the sequential version: parallelizing doesn’t change what you compute, only how long it takes.
Same three groups, same tenth-of-a-second delay each, same results –but this time the three slow_lm() calls overlap instead of queueing, so the wall-clock time drops toward “however long the slowest single group takes,” not “the sum of all of them.” If you never call daemons(), in_parallel() silently falls back to running sequentially –it’s a real gotcha: your code won’t error, it just won’t be any faster, and you might not notice for a while.
PART V: The gotcha –parallel functions must be self-contained
A mirai daemon is a separate R process with no access to your R session’s workspace. Whatever in_parallel() wraps gets serialized and sent over, which means it can’t casually reach for variables sitting in your global environment:
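A minimal sketch of the failure and the fix, assuming purrr 1.1.0 or later and mirai are installed. The variable name bonus is hypothetical. The first call is wrapped in tryCatch() so the script keeps running; the exact wording of the error may vary between versions, so it is captured rather than quoted.

```r
library(purrr)

mirai::daemons(2)

bonus <- 10  # exists only in the main session's global environment

# Not self-contained: `bonus` is never sent to the daemons, so the
# function cannot find it there
broken <- tryCatch(
  map_dbl(1:3, in_parallel(\(x) x + bonus)),
  error = function(e) e
)
inherits(broken, "error")

# Self-contained: pass `bonus` through in_parallel() by name
fixed <- map_dbl(1:3, in_parallel(\(x) x + bonus, bonus = bonus))
fixed

mirai::daemons(0)  # shut the daemons down again
```

Anything the function needs, whether a value or a helper function, travels the same way: as a named argument to in_parallel().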
The same applies to packages: reference functions with pkg::fun(), or call library() inside the function body, rather than relying on a package you loaded in your main session:
map(1:3, in_parallel(\(x) vctrs::vec_init(integer(), x)))  # ✅ explicit namespace
map(1:3, in_parallel(\(x) {
  library(vctrs)
  vec_init(integer(), x)
}))  # ✅ also fine
Important: Why this is the #1 way parallel code silently misbehaves
Forgetting to pass a dependency doesn’t always error clearly –sometimes it errors confusingly, sometimes (if the daemon happens to have something similarly named loaded) it does the wrong thing quietly. When a map(in_parallel(...)) call breaks and a plain map() call with the same function doesn’t, check this first.
Already used this, one level up
Session 9’s ellmer::parallel_chat_structured() –scoring 48 songs concurrently– is this exact idea, wrapped: ellmer manages the daemons and the request batching for you, because “call this LLM prompt many times at once” is common enough to deserve its own shortcut. Now you know what’s underneath it, and that the same mechanism is available for any R function, not just LLM calls.
Tip: Where this goes next: responsive Shiny apps
mirai also ships event-driven promises and is the recommended async backend for Shiny’s ExtendedTask –meaning a slow computation (a big model fit, a batch API call) can run in the background without freezing the app for every other user. That’s a production-Shiny concern, not something we build out today, but it’s the same daemons()/mirai() machinery from PART III underneath.
Practice
Note: Easy
Rewrite one for loop from an earlier session (session 2’s API calls, or session 6’s scraping) as a map() call.
Time the sequential PART I example again, but with 6 groups instead of 3 (hint: you’ll need a bigger dataset, or repeat the same groups) –does the gap between sequential and parallel grow?
Important: Intermediate
Take one of session 6’s scraping functions (fetching one page per artist/URL) and parallelize the fetch with daemons() + in_parallel(). Time both versions –I/O-bound tasks like network requests often parallelize better than CPU-bound ones, since the wait, not the computation, is what overlaps.
Deliberately break the self-containment rule (reference a variable from your global environment without passing it to in_parallel()) and read the resulting error –this is the error you want to recognize instantly in the wild.
Caution: Advanced
Read mirai’s docs on remote daemons (SSH, or an HPC scheduler like Slurm) –the purrr code you write doesn’t change at all; only the daemons() call does. What does that separation buy you?