Convenience wrapper around pro_request(), pro_request_jsonl() and pro_request_jsonl_parquet().
Usage
pro_fetch(
query_url,
pages = 10000,
project_folder = NULL,
overwrite = FALSE,
api_key = Sys.getenv("openalexPro.apikey"),
workers = 1,
verbose = FALSE,
progress = TRUE,
count_only,
error_log = NULL
)
Arguments
- query_url
The URL of the API query, or a list of URLs returned from pro_query().
- pages
The number of pages to download. Default: 10000, which corresponds to 2,000,000 works. Increasing it beyond 100000 is not recommended because of server load; use the snapshot instead. If NULL, all pages are downloaded.
- project_folder
Directory where all intermediate (json, jsonl) and final (parquet) results are stored. If it does not exist, it is created. If NULL, a temporary directory is created.
- overwrite
Logical. If TRUE, output will be deleted if it already exists.
- api_key
Character string API key or NULL. Defaults to Sys.getenv("openalexPro.apikey"). If NULL or "", requests are sent without an API key (subject to OpenAlex's unauthenticated limits).
- workers
Number of parallel workers to use if query_url is a list. Defaults to 1.
- verbose
Logical indicating whether to show verbose messages.
- progress
Logical indicating whether to show a progress bar. Default TRUE.
- count_only
Do not use it here. The function aborts if it is set to TRUE and gives a warning if it is set to FALSE.
- error_log
Location of the error log of API calls. Default: NULL (none).
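As a rough illustration of two of the arguments above, the following base R sketch stores a key in the environment variable that api_key reads by default and works out the upper bound on downloaded works implied by a given pages value. The key string is a placeholder, and the figure of 200 works per page is only inferred from the statement that 10000 pages correspond to 2,000,000 works.

```r
# Store the key in the environment variable that api_key reads by default.
# "my-secret-key" is a placeholder, not a real key.
Sys.setenv(openalexPro.apikey = "my-secret-key")
api_key <- Sys.getenv("openalexPro.apikey")

# Works per page, inferred from the documented default:
# 10000 pages correspond to 2,000,000 works.
works_per_page <- 2000000 / 10000

# Upper bound on the number of works for a chosen pages value (assumed here).
pages <- 500
max_works <- pages * works_per_page
```

Setting the variable in a startup file such as .Renviron instead of in a script would keep the key out of version control.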
Value
Invisibly, the normalized path of the parquet subfolder inside project_folder, i.e. the value returned by pro_request_jsonl_parquet().
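Because the value is returned invisibly, it has to be assigned to be used. The sketch below shows one way this might look; it assumes the package is installed under the name openalexPro, that the arrow package is available for reading the dataset, and that the query URL (a hypothetical example) is one the API accepts. The call is guarded so that nothing is downloaded in non-interactive sessions.

```r
# Hypothetical query URL; any URL accepted by pro_fetch() would do.
query_url <- "https://api.openalex.org/works?filter=publication_year:2023"

if (interactive() &&
    requireNamespace("openalexPro", quietly = TRUE) &&
    requireNamespace("arrow", quietly = TRUE)) {
  # Assign the invisible return value: the path of the parquet subfolder.
  parquet_dir <- openalexPro::pro_fetch(
    query_url,
    pages = 1,                               # keep the download small
    project_folder = tempfile("openalex_")   # created if it does not exist
  )

  # Open the Parquet dataset lazily and inspect its schema.
  ds <- arrow::open_dataset(parquet_dir)
  print(ds$schema)
}
```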
Details
The function
- downloads records from OpenAlex via pro_request() into a "json" subfolder of project_folder,
- converts the JSON files to jsonl via pro_request_jsonl() into a "jsonl" subfolder, and
- converts the jsonl files to an Apache Parquet dataset via pro_request_jsonl_parquet() into a "parquet" subfolder.
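The three steps above imply a simple folder layout inside project_folder. The following base R sketch only constructs the expected subfolder paths; it does not call any package function, and the project folder name is an assumption chosen for illustration.

```r
# Hypothetical project folder; pro_fetch() would create it if missing.
project_folder <- file.path(tempdir(), "openalex_project")

# One subfolder per pipeline stage, in the order the stages run.
stages <- c("json", "jsonl", "parquet")
stage_dirs <- file.path(project_folder, stages)
names(stage_dirs) <- stages

# After a completed run, the final dataset would be expected here.
parquet_dir <- stage_dirs[["parquet"]]
```

Keeping the intermediate json and jsonl folders makes it possible to inspect or re-run a single stage with the individual functions.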
This is a high-level helper for the common workflow of going from an OpenAlex query URL to a local Parquet dataset in a single call. In most cases, this function should be sufficient, but if more control is needed, the individual functions have to be called separately.
This function assumes count_only == FALSE.