GitOps with R and K8s Part 1: A New Hope

PreReqs: R, Rmarkdown, git, Docker, and an IBM Cloud account

Once upon a time…

If you ain’t one for poetry and you ain’t one for prose, or the reasoning behind my choices, then skip to the next blog post.

I’ve had a WordPress blog for the last 6 years. While it was initially nice and easy to get my genius out to the world, I eventually ran into some limitations.

I’m first and foremost an R user. Secondly, I spend too much time on Slack and reddit, where Markdown has been the formatting of choice for as long as I can remember. Naturally, this has made Rmarkdown my tool of choice for authoring everything and anything. Over time I realized that I could write in Rmarkdown, compile to HTML, View Page Source, and then copypasta the HTML into WordPress. This works well enough… with some exceptions:

  1. Images/plots don’t render. Those have to be manually inserted.
  2. Shiny apps, Leaflet maps, etc. still don’t work.
  3. It’s a lot of manual work for a subpar experience/outcome!

Enter Blogdown

If you don’t know blogdown or its cousin bookdown, you’ll soon see what you’ve been missing all of your life. As the names suggest, they’re R packages that allow you to generate blogs or books entirely from Rmarkdown! Thank you Yihui Xie!

In short, you write your posts/projects in Rmarkdown, which then gets dressed up and compiled into a fully functioning static HTML site via Hugo:

Hugo is one of the most popular open-source static site generators. With its amazing speed and flexibility, Hugo makes building websites fun again.

To review:

Rmarkdown <- R + markdown
static.html.site <- Hugo(Blogdown(Rmarkdown))
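
To make that pseudo-code a bit more concrete, here is a rough sketch of the same pipeline in actual R. It assumes the blogdown package is already installed from CRAN. `build_blog()` and the `my-blog` directory are names I made up for illustration, and the defaults (theme, whether Hugo gets downloaded for you) may differ with your blogdown version.

```r
# Minimal sketch, not a definitive setup. Assumes blogdown is installed:
#   install.packages("blogdown")
# build_blog() and "my-blog" are hypothetical names used for illustration.
build_blog <- function(site_dir = "my-blog") {
  if (!requireNamespace("blogdown", quietly = TRUE)) {
    stop("blogdown is not installed; run install.packages('blogdown') first")
  }

  # Make sure the target directory exists and remember its absolute path.
  dir.create(site_dir, showWarnings = FALSE, recursive = TRUE)
  site_dir <- normalizePath(site_dir)

  # Scaffold a Hugo site with a sample post and a default theme.
  # In an interactive session this may offer to preview the site.
  blogdown::new_site(dir = site_dir)

  # Build from inside the site directory, then restore the old working dir.
  old_wd <- setwd(site_dir)
  on.exit(setwd(old_wd), add = TRUE)

  # Rmarkdown -> Hugo -> static HTML (written to public/ by default).
  blogdown::build_site()

  # Return the folder that holds the generated static site.
  invisible(file.path(site_dir, "public"))
}
```

Calling `build_blog()` from an R session should leave a `my-blog/public/` folder of static files, and running `blogdown::serve_site()` from inside the site directory gives you a live local preview while you write.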

Kubernetes

The next question is:

Where do I host my website?

There are options. WordPress ain’t it. The blogdown book suggests a few, such as Netlify and GitHub/GitLab. These suggestions are all prefaced by Yihui with:

Since the website is basically a folder containing static files, it is much easier to deploy than websites that require dynamic server-side languages such as PHP or databases.

While this is a great way to get your website hosted, it:

  1. still requires hosting Shiny apps elsewhere (and iframe-ing them into your pages)
  2. is not my employer’s preferred answer; i.e. Kubernetes (k8s)

So that’s what this guide is: a guide to deploying a blogdown site to k8s. You can use any k8s, even a local instance running on your laptop; e.g. MicroK8s, oc cluster up, CodeReady Containers, etc. Without a static IP address at your house, though, maintaining a functioning website will be difficult. Still, part of the value of a GitOps/Infrastructure as Code setup is that it’s easy to scale up. But…

I’ll be running on the IBM Kubernetes Service (IKS) because I get free compute there :) The instructions are probably not that different from deploying to any other k8s service (e.g. EKS, GKE, AKS), but I promise you: The IBM Cloud is the best! ;)

Deploying to k8s has several advantages:

  1. You can run Shiny apps on your website!
  2. You get to use, and gain some experience with:
    1. k8s
    2. containers
    3. git
    4. all that continuous integration/continuous delivery (CI/CD) stuff
  3. With GitOps, k8s can scale to do real compute; e.g. Kubeflow… which brings me to my next point

Hell hath no fury…

“What about Kubeflow!?” you ask. Well… Kubeflow (KF) is made for Python data scientists. RStudio? No. KF uses Jupyter Notebooks. Tools like Plumber, OpenCPU, or RestRserve for APIs? Nope. Gotta use Seldon or KFServing. Shiny apps? Nope. You gotta use Flask.

After realizing that the R ecosystem already has an answer to almost every KF component (except Argo CD), and with many of those components already neatly packaged by The Rocker Project, I decided:

Moreover, after working on KF-like systems for a few years now, I’ve come to the conclusion that I really like working on my own machine(s). Don’t get me wrong, The Cloud™ is great and all, but it’s kind of a pain when doing data work. Need a C++ library not on the platform? Gotta ask the admin. Data connection not working? Go bother your DBA again! And despite all of the well-deserved hype around deep learning and Spark, I rarely need a distributed or deep learning computing framework.

What I need is a webserver for my Rmarkdown docs and Shiny apps that can scale when my genius hits the masses. k8s is extensible, conforms to the CI/CD GitOps workflow style, and will continue to allow me to predominantly work on my own machines. That got me excited enough to write it all down! Then that original blog post kept getting larger and larger until I had to break it up :)

The Plan

First, we’ll get a website built, containerized, and hosted on Docker Hub; that’s continuous integration (CI). Then we’ll stand up a k8s cluster and deploy your container/website to the cluster using Argo CD; that’s continuous delivery (CD). This will give you a system where you update your website locally, push your changes to GitHub, and your website is then updated automatically.

There are endless variations of this recipe. You can use any git solution, not just GitHub. You can use any container registry (even a self-hosted one on your k8s cluster), not just Docker Hub. You can run local k8s or hosted. You could use Flux instead of Argo CD… or even a cron job looking for changes to your repos. I tried to use the most popular tools.

In Part 2, I’ll discuss the continuous integration (CI) part. In Part 3, I’ll discuss the continuous delivery (CD) part.