# kagiPro: First End-to-End Workflow
This quickstart is a complete first run with kagiPro: install the package, create a secure connection, build queries, execute requests, and store responses in a format ready for analysis.
The package is intentionally structured around a stable pattern:
- Create query objects with endpoint-specific constructors.
- Reuse a single `kagi_connection()` object.
- Either use `kagi_fetch()` for project-folder workflows, or use `kagi_request()` + `kagi_request_parquet()` directly.
- Keep JSON as the request audit trail, and parquet as analysis-ready output.
Once this pattern is familiar, every endpoint follows the same operational logic.
## Project-folder-first workflow
For endpoint-scoped project outputs (aligned with openalexPro conventions), use `kagi_fetch()`. This assumes the connection object `conn` created in the sections below:
```r
q_project <- query_search("biodiversity policy", expand = FALSE)

kagi_fetch(
  connection = conn,
  query = q_project,
  project_folder = "kagi_project"
)
```

This writes to:

- `kagi_project/search/json`
- `kagi_project/search/parquet`
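A quick way to confirm what was written is to list the endpoint folder; this is plain base R and assumes the `kagi_fetch()` call above has already run:

```r
# List everything kagi_fetch() wrote under the search endpoint folder
list.files("kagi_project/search", recursive = TRUE, full.names = TRUE)
```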
## Install and load the package
If kagiPro is not installed yet, install it from GitHub and load it:
```r
if (!requireNamespace("remotes", quietly = TRUE)) {
  install.packages("remotes")
}
remotes::install_github("rkrug/kagiPro")

library(kagiPro)
```

## Create a secure API connection
Store your API key in your local keyring once:
```r
keyring::key_set("API_kagi")
```

Then create a reusable connection object:
```r
conn <- kagi_connection(
  api_key = function() keyring::key_get("API_kagi")
)
```

Using a function for `api_key` keeps credentials out of scripts and supports reproducible batch runs.
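The same pattern works without keyring, for example on a headless server; a minimal sketch, assuming you export an environment variable (the name `KAGI_API_KEY` is illustrative, not prescribed by the package):

```r
# Alternative: read the key from an environment variable.
# KAGI_API_KEY is a hypothetical name; use whatever you export.
conn <- kagi_connection(
  api_key = function() Sys.getenv("KAGI_API_KEY")
)
```

Because the key is resolved lazily by the function, it is never stored in the script or in the saved connection object.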
## Build your first search query
Search is a good first endpoint because it shows how query constructors work:
```r
q <- query_search(
  query = 'biodiversity "annual report"',
  filetype = c("pdf", "docx"),
  site = c("example.com", "gov"),
  inurl = c("2024", "report"),
  intitle = "summary",
  expand = FALSE
)
```

`query_search()` returns a named list, even for a single input. That consistency makes it easy to scale the same code from one query to many.
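Scaling to several topics can be sketched with `lapply()`; the topic vector here is illustrative, and combining the per-topic results with `c()` is an assumption based on each `query_search()` call returning a named list:

```r
# Build one query object per topic and combine them into a single
# named list (combining with c() is an assumption about the list shape).
topics <- c("pollinator decline", "wetland restoration")
q_many <- do.call(c, lapply(topics, query_search, expand = FALSE))
```

The combined list can then be passed to `kagi_request()` exactly like a single query.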
If you want to inspect the generated search string interactively:
```r
open_search_query(q[[1]])
```

## Execute and persist search results
Create an output folder and run the request:
```r
out_search <- tempfile("kagiPro-search-")
dir.create(out_search, recursive = TRUE, showWarnings = FALSE)

kagi_request(
  connection = conn,
  query = q,
  limit = 5,
  output = out_search,
  overwrite = TRUE
)

list.files(out_search, pattern = "\\.json$", full.names = TRUE)
```

Each request writes a JSON file. This makes reruns and audits straightforward.
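For an audit pass, the JSON files can be read back with the jsonlite package; `str()` shows the structure without assuming a particular response schema:

```r
library(jsonlite)

json_files <- list.files(out_search, pattern = "\\.json$", full.names = TRUE)
if (length(json_files) > 0) {
  resp <- fromJSON(json_files[[1]])
  str(resp, max.level = 2)  # inspect top-level structure only
}
```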
## Run one example from each endpoint
The remaining endpoints use the same connection and request function. Only the query constructor changes.
### Enrich web
```r
q_web <- query_enrich_web("open data portals", site = "gov", expand = FALSE)
out_web <- tempfile("kagiPro-enrich-web-")
dir.create(out_web, recursive = TRUE, showWarnings = FALSE)

kagi_request(
  connection = conn,
  query = q_web,
  output = out_web,
  overwrite = TRUE
)
```

### Enrich news
```r
q_news <- query_enrich_news("biodiversity policy", expand = FALSE)
out_news <- tempfile("kagiPro-enrich-news-")
dir.create(out_news, recursive = TRUE, showWarnings = FALSE)

kagi_request(
  connection = conn,
  query = q_news,
  output = out_news,
  overwrite = TRUE
)
```

### Summarize (text input)
```r
q_sum_text <- query_summarize(
  text = paste(
    "Biodiversity underpins ecosystem services including pollination,",
    "soil fertility, water purification, and climate regulation.",
    "Species decline has implications for resilience and wellbeing."
  ),
  engine = "cecil",
  summary_type = "summary",
  target_language = "EN",
  cache = TRUE
)

out_sum <- tempfile("kagiPro-summarize-")
dir.create(out_sum, recursive = TRUE, showWarnings = FALSE)

kagi_request(
  connection = conn,
  query = q_sum_text,
  output = out_sum,
  overwrite = TRUE
)
```

### FastGPT
```r
q_fast <- query_fastgpt(
  query = "What are ecosystem services?",
  cache = TRUE,
  web_search = TRUE
)

out_fast <- tempfile("kagiPro-fastgpt-")
dir.create(out_fast, recursive = TRUE, showWarnings = FALSE)

kagi_request(
  connection = conn,
  query = q_fast,
  output = out_fast,
  overwrite = TRUE
)
```

## Convert JSON results to parquet
When you move from inspection to analysis pipelines, parquet is usually more convenient:
```r
parquet_dir <- tempfile("kagiPro-parquet-")

kagi_request_parquet(
  input_json = out_search,
  output = parquet_dir,
  overwrite = TRUE
)
```

## Bridge to OpenAlex-style vector input
If you want to pass results into openalexPro/openalexVectorComp workflows, use the modular content pipeline: download content -> extract markdown -> summarize markdown.
```r
download_content(
  project_folder = "kagi_project",
  endpoint = "search"
)

content_markdown(
  project_folder = "kagi_project",
  endpoint = "search"
)

markdown_abstract(
  project_folder = "kagi_project",
  endpoint = "search",
  summarizer_fn = summarize_with_openai,
  model = "gpt-4.1-mini"
)

# abstract parquet output is written under:
# kagi_project/search/abstract/query=<query_name>/
# each id is a deterministic hash of the normalized URL
```
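Once parquet files exist (from `kagi_request_parquet()` or `kagi_fetch()`), they can be loaded for analysis with the arrow package; the path below assumes the `kagi_fetch()` project layout shown earlier:

```r
library(arrow)
library(dplyr)

# Read all parquet files under the endpoint's parquet directory
# into a single data frame (path assumes the project-folder layout).
results <- open_dataset("kagi_project/search/parquet") |>
  collect()
```

`open_dataset()` treats the directory as one dataset, so multiple query outputs written to the same endpoint folder are combined automatically.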
## Where to go next
For deeper endpoint-specific workflows (batching patterns, robust error handling, and endpoint-focused examples), continue with: