Example of using DuckDB to convert from JSON to Parquet
Load needed packages
library(httr)
library(jsonlite)
library(openalexR)
Thank you for using openalexR!
To acknowledge our work, please cite the package by calling `citation("openalexR")`.
To suppress this message, add `openalexR.message = suppressed` to your .Renviron file.
## For doing the conversion
library(DBI)
library(duckdb)
if (packageVersion("duckdb") < "1.0.99.9000") {
  warning(
    "`duckdb` versions smaller than 1.0.99.9000 have a bug \nin the export to hive partitioned parquet files \nwhich can result in invalid parquet datasets!"
  )
}
library(tibble)
## Just for timing and other useful stuff in the report
library(tictoc)
library(knitr)
## Source all helper scripts in the R/ directory
## (the dot is escaped so that only files ending in ".R" match)
list.files(
  path = "R",
  pattern = "\\.R$",
  full.names = TRUE,
  recursive = FALSE
) |>
  sapply(FUN = source)
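The version check above guards DuckDB's export to hive-partitioned Parquet. As a rough illustration of what such an export looks like, here is a minimal sketch that writes a few toy JSON records and converts them with `COPY ... TO ... (FORMAT PARQUET, PARTITION_BY ...)`. The directory names, the toy records, and the choice of `publication_year` as the partition column are assumptions made for this example. They are not taken from this report's actual conversion code.

```r
library(DBI)
library(duckdb)

# Hypothetical scratch directories (assumption: not the paths used in this report)
base_dir <- normalizePath(tempdir(), winslash = "/")
json_dir <- file.path(base_dir, "json_demo")
parquet_dir <- file.path(base_dir, "parquet_demo")
unlink(c(json_dir, parquet_dir), recursive = TRUE, force = TRUE)
dir.create(json_dir)

# Two small newline-delimited JSON files standing in for downloaded result pages
writeLines(
  c(
    '{"id": "W1", "publication_year": 2020}',
    '{"id": "W2", "publication_year": 2021}'
  ),
  file.path(json_dir, "page_1.json")
)
writeLines(
  '{"id": "W3", "publication_year": 2021}',
  file.path(json_dir, "page_2.json")
)

# In-memory DuckDB database; nothing is persisted besides the Parquet files
con <- dbConnect(duckdb())

# Read all JSON files at once and write one Parquet folder per publication year
dbExecute(con, sprintf(
  "COPY (SELECT * FROM read_json_auto('%s'))
   TO '%s' (FORMAT PARQUET, PARTITION_BY (publication_year))",
  file.path(json_dir, "*.json"),
  parquet_dir
))

# Read the partitioned dataset back to check that every record arrived
n_rows <- dbGetQuery(con, sprintf(
  "SELECT count(*) AS n FROM read_parquet('%s', hive_partitioning = true)",
  file.path(parquet_dir, "*", "*.parquet")
))$n

dbDisconnect(con, shutdown = TRUE)
```

With this layout the output directory should contain one `publication_year=<year>` subfolder per distinct year, which is the hive-partitioned structure the version warning refers to.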
unlink("data", recursive = TRUE, force = TRUE)
dir.create("data")
# search_term <- "toast AND biodiversity" # about 800 records
# search_term <- '"deep sea" AND fishing AND illegal' # about 2500 records
search_term <- '"deep sea" AND fishing' # about 18600 records
Warning in oa_request(oa_query(filter = filter_i, multiple_id = multiple_id, :
The following work(s) have truncated lists of authors: W2088891049, W2970419732, W4250796065.
Query each work separately by its identifier to get full list of authors.
For example:
lapply(c("W2088891049", "W2970419732"), \(x) oa_fetch(identifier = x))
Details at https://docs.openalex.org/api-entities/authors/limitations.
Warning in oa_request(openalexR::oa_query(entity = "works", fulltext.search = search_term), :
The following work(s) have truncated lists of authors: W2088891049, W2970419732, W4250796065.
Query each work separately by its identifier to get full list of authors.
For example:
lapply(c("W2088891049", "W2970419732"), \(x) oa_fetch(identifier = x))
Details at https://docs.openalex.org/api-entities/authors/limitations.