Getting started with CockroachDB in R

CockroachDB is known for handling terabytes of mission-critical transactions where the data and users are globally distributed.  R is known for “statistical computing and graphics” where the data is in the RAM of a single system used by a single user.  With CockroachDB’s recent vectorized SQL engine, R users can take advantage of CockroachDB’s globally distributed data.  

Screen Shot 2020-01-02 at 9.06.36 PM

CockroachDB is built to be largely compatible with PostgreSQL, meaning that software written to use PostgreSQL can sometimes (often!) be used with CockroachDB without changes.  This means R’s existing Postgres DBI interface should work with CockroachDB.

The original inspiration was taken from Getting started with PostgresSQL in R from R-bloggers.  Changes were made to reflect the use of RPostgres, CockroachDB MovR sample application, and CockroachDB.

Start CockroachDB and MovR

Open a terminal and start the following.  We will start a simple single-node CockroachDB.  Then start the MovR application running in the background while we work on R.  

cockroach start --insecure --port=26257 --http-port=26258 --store=cockroach-data/1 --cache=256MiB --background
cockroach workload init movr 'postgresql://root@localhost:26257?sslmode=disable'
cockroach workload run movr --concurrency 1

Start R with CockroachDB and ggplot2

Open a second terminal and start the following.  R DBI interface and ggplot2 setup are the same. Note the default port number on CockroachDB is 26257 (highlighted below).

# Install the latest RPostgres release from CRAN:
install.packages("RPostgres")
# Install ggplot and related to graph MovR activities
install.packages("dplyr")
install.packages("ggplot2")
# Load the libraries
require("ggplot2")
library(DBI)

# Connect to the default CockroachDB instance and MovR database
con <- dbConnect(RPostgres::Postgres(),dbname = "movr",
                 host = "localhost", port = 26257,
                 user = "root")

Differences between CockroachDB and Postgres

CockroachDB uses 64-bit integer whereas Postgres uses 32-bit integer.  ggplot2 will complain regarding the integer64.  The fix on ggplot2 can be tracked via #2377.  A workaround, for now, is to cast 64-bit integer to 32-bit integer using ::int4 in CockroachDB or as.integer in R.  In the R CLI, the following will demonstrate both workarounds.

# below will return CockroachDB default 64 bit integer
> df_users_64 <- dbGetQuery(con, "select city, count(*) from users group by city;")
> str(df_users_64)
'data.frame': 9 obs. of  2 variables:
 $ city : chr  "amsterdam" "boston" "los angeles" "new york" ...
 $ count:integer64 1897 1924 1938 1882 1938 1902 1929 1893 ...

# below will return count with Postgres default 32 bit integer
> df_users <- dbGetQuery(con, "select city, count(*)::int4 from users group by city;")
> str(df_users)
'data.frame': 9 obs. of  2 variables:
 $ city : chr  "amsterdam" "boston" "los angeles" "new york" ...
 $ count: int  1986 2009 2029 1963 2021 1989 2019 1960 2023

# df_users has count as int4 which maps to R integer
> ggplot(df_users, aes(city,count,fill=city)) + geom_col()

# df_users_64 has count as integer64.  Use as.integer to convert 64 bit integer
> ggplot(df_users_64, aes(city,as.integer(count),fill=city)) + geom_col()
This entry was posted in cockroachdb, R and tagged , , , , , , . Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s