CockroachDB is known for handling terabytes of mission-critical transactions where the data and users are globally distributed. R is known for “statistical computing and graphics” where the data is in the RAM of a single system used by a single user. With CockroachDB’s recent vectorized SQL engine, R users can take advantage of CockroachDB’s globally distributed data.
CockroachDB is built to be largely compatible with PostgreSQL, meaning that software written to use PostgreSQL can sometimes (often!) be used with CockroachDB without changes. This means R’s existing Postgres DBI interface should work with CockroachDB.
Start CockroachDB and MovR
cockroach start --insecure --port=26257 --http-port=26258 --store=cockroach-data/1 --cache=256MiB --background cockroach workload init movr 'postgresql://root@localhost:26257?sslmode=disable' cockroach workload run movr --concurrency 1
Start R with CockroachDB and ggplot2
Open a second terminal and start the following. R DBI interface and ggplot2 setup are the same. Note the default port number on CockroachDB is 26257 (highlighted below).
# Install the latest RPostgres release from CRAN: install.packages("RPostgres") # Install ggplot and related to graph MovR activities install.packages("dplyr") install.packages("ggplot2") # Load the libraries require("ggplot2") library(DBI) # Connect to the default CockroachDB instance and MovR database con <- dbConnect(RPostgres::Postgres(),dbname = "movr", host = "localhost", port = 26257, user = "root")
Differences between CockroachDB and Postgres
CockroachDB uses 64-bit integer whereas Postgres uses 32-bit integer. ggplot2 will complain regarding the integer64. The fix on ggplot2 can be tracked via #2377. A workaround, for now, is to cast 64-bit integer to 32-bit integer using ::int4 in CockroachDB or as.integer in R. In the R CLI, the following will demonstrate both workarounds.
# below will return CockroachDB default 64 bit integer > df_users_64 <- dbGetQuery(con, "select city, count(*) from users group by city;") > str(df_users_64) 'data.frame': 9 obs. of 2 variables: $ city : chr "amsterdam" "boston" "los angeles" "new york" ... $ count:integer64 1897 1924 1938 1882 1938 1902 1929 1893 ... # below will return count with Postgres default 32 bit integer > df_users <- dbGetQuery(con, "select city, count(*)::int4 from users group by city;") > str(df_users) 'data.frame': 9 obs. of 2 variables: $ city : chr "amsterdam" "boston" "los angeles" "new york" ... $ count: int 1986 2009 2029 1963 2021 1989 2019 1960 2023 # df_users has count as int4 which maps to R integer > ggplot(df_users, aes(city,count,fill=city)) + geom_col() # df_users_64 has count as integer64. Use as.integer to convert 64 bit integer > ggplot(df_users_64, aes(city,as.integer(count),fill=city)) + geom_col()