Extracts DOIs or specific DOI components (resolver, prefix, or suffix) from a character vector. Assumes that each element of `x` contains at most one DOI (with or without resolver).
Usage
extract_doi(
x,
non_doi_value = "",
normalize = TRUE,
what = c("doi", "resolver", "prefix", "suffix")
)
Arguments
- x
A character vector potentially containing DOIs (e.g., raw DOIs, DOI URLs, or strings with embedded DOIs).
- non_doi_value
Value to use for elements where no DOI or component is found. If `NULL`, only matched elements are returned.
- normalize
Logical. If `TRUE` (default), convert extracted DOIs and suffixes to lowercase and trim surrounding whitespace. Has no effect for `what = "prefix"` or `what = "resolver"`.
- what
What to extract from each element. One of:
- "doi"
The full DOI name (prefix + "/" + suffix), e.g., `"10.5281/zenodo.1234567"`. This is the default.
- "resolver"
The resolver URL (e.g., `"https://doi.org/"`, `"http://dx.doi.org/"`) if present
- "prefix"
The DOI prefix only (e.g., `"10.5281"`)
- "suffix"
The DOI suffix only (e.g., `"zenodo.1234567"`)
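To make the relationship between the four `what` options concrete, here is a minimal base R sketch of how a single string might be split into resolver, prefix, and suffix. The helper name `split_doi_sketch` and its regular expression are illustrative assumptions, not the package's actual implementation, and the pattern is deliberately simpler than a full DOI matcher.

```r
# Hypothetical helper, not part of the package: splits one non-NA string
# into resolver, prefix, and suffix using a simplified pattern.
split_doi_sketch <- function(s) {
  # Optional resolver URL, then "10.<registrant code>", "/", and a suffix
  # that runs up to the next whitespace character.
  pattern <- "(https?://[a-z.]*doi\\.org/)?(10\\.[0-9]{4,9})/([^[:space:]]+)"
  m <- regmatches(s, regexec(pattern, s))[[1]]
  if (length(m) == 0) {
    return(NULL)  # no DOI-like substring found
  }
  # m[1] is the full match; m[2], m[3], m[4] are the capture groups.
  # Lowercasing the suffix mirrors what `normalize = TRUE` is documented to do.
  c(resolver = m[2], prefix = m[3], suffix = tolower(m[4]))
}

split_doi_sketch("https://doi.org/10.5281/zenodo.1234567")
split_doi_sketch(" 10.1000/XYZ456 ")
split_doi_sketch("no doi here")
```

Under this sketch, the full DOI for `what = "doi"` would be the prefix and suffix joined by `"/"`, and the resolver is an empty string when the input is a bare DOI.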
Value
A character vector:
- If `non_doi_value` is not `NULL`, a vector of the same length as `x`, with unmatched entries replaced.
- If `non_doi_value` is `NULL`, a vector of only matched entries.
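The two return shapes can be illustrated with a small base R sketch. The helper `fill_or_drop` is hypothetical and only mimics the documented behavior of `non_doi_value`; it assumes unmatched positions are represented internally as `NA`, which the documentation does not state.

```r
# Hypothetical helper, not part of the package: shows how the length of the
# result depends on whether `non_doi_value` is NULL.
fill_or_drop <- function(matches, non_doi_value = "") {
  # Assumption: `matches` holds NA wherever nothing was found.
  if (is.null(non_doi_value)) {
    return(matches[!is.na(matches)])  # drop unmatched entries
  }
  matches[is.na(matches)] <- non_doi_value  # keep length, fill the gaps
  matches
}

found <- c("10.5281/zenodo.1234567", NA, "10.1000/xyz456")
fill_or_drop(found)                 # same length as `found`, gap filled with ""
fill_or_drop(found, NA_character_)  # same length, gap stays NA
fill_or_drop(found, NULL)           # shorter: only the two matched entries
```

Because `non_doi_value = NULL` changes the length of the result, it is not suitable when the output has to stay aligned with `x`, for example as a new column in a data frame.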
Examples
x <- c(
"https://doi.org/10.5281/zenodo.1234567",
" 10.1000/XYZ456 ",
"no doi here",
NA
)
extract_doi(x) # Full DOIs (default)
#> [1] "10.5281/zenodo.1234567" "10.1000/xyz456" ""
#> [4] ""
extract_doi(x, what = "resolver")
#> [1] "https://doi.org/" "" "" ""
extract_doi(x, what = "prefix")
#> [1] "10.5281" "10.1000" "" ""
extract_doi(x, what = "suffix")
#> [1] "zenodo.1234567" "xyz456" "" ""
extract_doi(x, non_doi_value = NA_character_)
#> [1] "10.5281/zenodo.1234567" "10.1000/xyz456" NA
#> [4] NA
extract_doi(x, non_doi_value = NULL)
#> [1] "10.5281/zenodo.1234567" "10.1000/xyz456"