Run Puppeteer with Docker on Fly.io

Walking through the process of running Dockerized Node & Puppeteer on Fly.io.

I've been building an "analyzer" tool for PicPerf.dev (it's not live yet). Give it a URL, and you'll get a snapshot of how many kilobytes you could save by running your images through PicPerf.

The heavy lifting for the tool lives in a small Node application deployed on Fly.io. To collect images from the provided page, I'm using Google's Puppeteer to spin up a headless browser, render the page, and extract the image URLs. Locally, it works fine. Deploying it, however, is more challenging.
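For context, the Puppeteer side of things only takes a handful of lines. Here's a simplified sketch (not the actual analyzer code) that renders a page and pulls out every image URL it finds:

const puppeteer = require("puppeteer");

async function collectImageUrls(url) {
	// Spin up a headless browser and render the page.
	const browser = await puppeteer.launch();
	const page = await browser.newPage();
	await page.goto(url, { waitUntil: "networkidle0" });

	// Grab the resolved `src` of every <img> on the rendered page.
	const imageUrls = await page.$$eval("img", (imgs) => imgs.map((img) => img.src));

	await browser.close();

	return imageUrls;
}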

Puppeteer requires a series of system-level dependencies in order to function, and you don't get them for free when running a Docker image. Without them, you'll get an error that resembles this:

Could not find Chrome (ver. 114.0.5735.133). This can occur if either
 1. you did not perform an installation before running the script (e.g. `npm install`) or
 2. your cache path is incorrectly configured (which is: /root/.cache/puppeteer).
For (2), check out our guide on configuring puppeteer at https://pptr.dev/guides/configuration.

Because things were simple, I was allowing Fly to auto-detect my Node application, so I had no Dockerfile – running fly deploy did everything I needed. But this meant I lacked these necessary Puppeteer dependencies. As a result, I decided to go the more fine-tuned, controlled Docker route.

To be clear, there's no shortage of resources out there to help you Dockerize Puppeteer. But the ones I found were often no longer maintained, relied on old versions of Node, or came bundled with other baggage I didn't want. And knowing that Fly is extremely Docker-friendly, starting with a more vanilla image and then tacking on more goodies was a much more attractive path. There didn't seem to be much help out there on that front, so you're getting this post.

Generating a Dockerfile

Being so Docker-friendly, Fly makes it really easy to generate a basic Dockerfile for a Node application. Let's kick it off by running the following:

npx @flydotio/dockerfile 

That'll get you a basic Dockerfile from the dockerfile-node project. At the time mine was generated, here's how it looked:

# syntax = docker/dockerfile:1

# Adjust NODE_VERSION as desired
ARG NODE_VERSION=20.3.0
FROM node:${NODE_VERSION}-slim as base

LABEL fly_launch_runtime="Node.js"

# Node.js app lives here
WORKDIR /app

# Set production environment
ENV NODE_ENV=production

# Throw-away build stage to reduce size of final image
FROM base as build

# Install packages needed to build node modules
RUN apt-get update -qq && \
    apt-get install -y python-is-python3 pkg-config build-essential 

# Install node modules
COPY --link package-lock.json package.json ./
RUN npm ci --include=dev

# Copy application code
COPY --link . .

# Build application
RUN npm run build

# Remove development dependencies
RUN npm prune --omit=dev

# Final stage for app image
FROM base

# Copy built application
COPY --from=build /app /app

# Start the server by default, this can be overwritten at runtime
EXPOSE 3000
CMD [ "npm", "run", "start" ]

It's a fairly straightforward multi-stage setup:

  • Start with a base Node image.
  • Install my dependencies & build the application code in a separate stage.
  • Dump that built code into a "fresh" stage.
  • Start the application.

The result is a slim container that only has what's needed to run the application.

Finding Puppeteer's Dependencies

Resources out there on which dependencies are required to run Puppeteer are inconsistent. It was confusing when I started digging in. Eventually, I realized that Puppeteer's documentation is already clear about what's needed.

On its troubleshooting page (https://pptr.dev/troubleshooting), there's an example Dockerfile that lists those dependencies. It's even built on a Node base image similar to what I wanted, so it's largely ready to copy & paste. The most important piece is placing that copied portion into the right stage of our Dockerfile.

If you look back, the final deployed stage inherits from base. The build stage has nothing to do with what's needed at runtime; its only purpose is to build the code and then be thrown away, keeping the final image smaller. Here's a slimmer, annotated version of that file:

# syntax = docker/dockerfile:1

FROM node:20.3.0-slim as base

# Start with a plain Node image.

FROM base as build

# Install dependencies & build application code.

FROM base

# Copy the application code from `build` and start it up.
COPY --from=build /app /app

EXPOSE 3000
CMD [ "npm", "run", "start" ]

And that means I needed to install those dependencies either in the final stage or the initial base stage. We're going with the latter:

# syntax = docker/dockerfile:1

FROM node:20.3.0-slim as base

# Start with a plain Node image.

+ # Install latest chrome dev package and fonts to support major charsets (Chinese, Japanese, Arabic, Hebrew, Thai and a few others)
+ # Note: this installs the necessary libs to make the bundled version of Chrome that Puppeteer
+ # installs, work.
+ RUN apt-get update \
+     && apt-get install -y wget gnupg \
+     && wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | gpg --dearmor -o /usr/share/keyrings/googlechrome-linux-keyring.gpg \
+     && sh -c 'echo "deb [arch=amd64 signed-by=/usr/share/keyrings/googlechrome-linux-keyring.gpg] http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google.list' \
+     && apt-get update \
+     && apt-get install -y google-chrome-stable fonts-ipafont-gothic fonts-wqy-zenhei fonts-thai-tlwg fonts-khmeros fonts-kacst fonts-freefont-ttf libxss1 \
+       --no-install-recommends \
+     && rm -rf /var/lib/apt/lists/*

FROM base as build

# Install dependencies & build application code.

FROM base

# Copy the application code from `build` and start it up.
COPY --from=build /app /app

EXPOSE 3000
CMD [ "npm", "run", "start" ]

Since the application is run in a stage inheriting from base, we'll have those dependencies available when everything's deployed.

No Need to Use Bundled Chromium

You might've noticed that google-chrome-stable is being installed as a system dependency – the Linux version of Chrome, ready to use inside the container. But since puppeteer is also a dependency in our package.json file, we end up downloading Chrome twice. Wasteful.

To use that Linux version and bypass a full download by the puppeteer package itself, we can set this environment variable before npm install is run:

ENV PUPPETEER_SKIP_CHROMIUM_DOWNLOAD true
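As for where that line goes: in the Dockerfile generated above, the base stage is a reasonable spot, since both the build stage (where npm ci runs) and the final stage inherit from it. A rough sketch of the placement:

FROM node:20.3.0-slim as base

# Keep Puppeteer from downloading its own copy of Chrome;
# we'll point it at the apt-installed google-chrome-stable instead.
ENV PUPPETEER_SKIP_CHROMIUM_DOWNLOAD true

# ... Chrome & font installation from earlier ...

FROM base as build

# The variable set in `base` is inherited here, so no extra
# Chrome download happens when node modules are installed.
COPY --link package-lock.json package.json ./
RUN npm ci --include=dev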

And then, when initializing a browser, the executablePath is set to this installed version:

const browser = await puppeteer.launch({
	executablePath: "/usr/bin/google-chrome",
	// ... other options
});

That'll save a bit of build time, since Chrome will no longer be downloaded twice.

Testing It

It sucks to deploy something only to find out it still doesn't work. Because of that, let's build the image locally to try things out before shipping.

First, build it:

docker build -t my-puppeteer-image .

And then, run it:

docker run -p 3000:3000 my-puppeteer-image

The application should be accessible at http://localhost:3000, ready to run. Assuming you're running an HTTP service, sending a request should get you a successful response with no Puppeteer-related errors.
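For example, if your service exposes an endpoint that kicks off the analysis (the route below is made up; substitute your own), a quick smoke test from another terminal might look like this:

curl "http://localhost:3000/analyze?url=https://example.com"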

Deploying It

The easiest part (thanks, Fly). Run fly deploy. Wait a bit. Then party.

