kagiPro: First End-to-End Workflow

This quickstart is a complete first run with kagiPro: install the package, create a secure connection, build queries, execute requests, and store responses in a format ready for analysis.

The package is intentionally structured around a stable pattern:

  1. Create query objects with endpoint-specific constructors.
  2. Reuse a single kagi_connection() object.
  3. Either use kagi_fetch() for project-folder workflows, or use kagi_request() + kagi_request_parquet() directly.
  4. Keep JSON as the request audit trail, and parquet as analysis-ready output.

Once this pattern is familiar, every endpoint follows the same operational logic.

Project-folder first workflow

For endpoint-scoped project outputs (aligned with openalexPro conventions), use kagi_fetch(). This example assumes the connection object conn created in the "Create a secure API connection" section below:

q_project <- query_search("biodiversity policy", expand = FALSE)

kagi_fetch(
  connection = conn,
  query = q_project,
  project_folder = "kagi_project"
)

This writes to:

  • kagi_project/search/json
  • kagi_project/search/parquet
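The layout above can be verified after a run with list.files(). The sketch below only simulates the folder structure in a temporary directory, so it runs without calling the API:

```r
# Simulate the kagi_fetch() output layout and list it (illustration only)
root <- file.path(tempdir(), "kagi_project", "search")
for (d in c("json", "parquet")) {
  dir.create(file.path(root, d), recursive = TRUE, showWarnings = FALSE)
}
dirs <- list.dirs(root, recursive = FALSE, full.names = FALSE)
dirs
```

In a real project, replace the temporary root with your project_folder.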

Install and load the package

If kagiPro is not installed yet, install it from GitHub and load it:

if (!requireNamespace("remotes", quietly = TRUE)) {
  install.packages("remotes")
}
remotes::install_github("rkrug/kagiPro")
library(kagiPro)

Create a secure API connection

Store your API key in your local keyring once:

keyring::key_set("API_kagi")

Then create a reusable connection object:

conn <- kagi_connection(
  api_key = function() keyring::key_get("API_kagi")
)

Using a function for api_key keeps credentials out of scripts and supports reproducible batch runs.
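Any zero-argument function that returns the key works here. As one sketch, a credential function could prefer an environment variable for CI runs and fall back to the keyring locally (KAGI_API_KEY is an assumed name, not a package default):

```r
# Hypothetical credential helper: env var first, keyring as fallback
get_kagi_key <- function() {
  key <- Sys.getenv("KAGI_API_KEY", unset = "")
  if (nzchar(key)) {
    return(key)
  }
  keyring::key_get("API_kagi")
}
```

It would then be passed as api_key = get_kagi_key.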

Build your first search query

Search is a good first endpoint because it shows how query constructors work:

q <- query_search(
  query = 'biodiversity "annual report"',
  filetype = c("pdf", "docx"),
  site = c("example.com", "gov"),
  inurl = c("2024", "report"),
  intitle = "summary",
  expand = FALSE
)

query_search() returns a named list, even for a single input. That consistency makes it easy to scale the same code from one query to many.
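Because each constructor returns a named list, scaling to many queries is a plain lapply() over a character vector. The sketch below uses a toy stand-in for query_search() so it runs without the package; with kagiPro loaded you would call the real constructor instead:

```r
# Toy stand-in for query_search(), for illustration only
make_query <- function(term) list(query = term, expand = FALSE)

terms <- c("biodiversity policy", "ecosystem services", "habitat loss")
queries <- setNames(lapply(terms, make_query), terms)
names(queries)
```

Each element of queries can then be passed to kagi_request() in a loop.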

If you want to inspect the generated search string interactively, print the query object (for example with str(q)) before sending the request.

Execute and persist search results

Create an output folder and run the request:

out_search <- tempfile("kagiPro-search-")
dir.create(out_search, recursive = TRUE, showWarnings = FALSE)

kagi_request(
  connection = conn,
  query = q,
  limit = 5,
  output = out_search,
  overwrite = TRUE
)

list.files(out_search, pattern = "\\.json$", full.names = TRUE)

Each request writes a JSON file. This makes reruns and audits straightforward.
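The JSON audit trail can be re-read at any time, for example with jsonlite. The round trip below uses a toy file with illustrative field names, not the package's actual response schema:

```r
library(jsonlite)

# Write a toy response file, then read it back as kagi_request() output could be
f <- file.path(tempdir(), "audit-example.json")
write_json(list(query = "biodiversity", n_results = 5), f, auto_unbox = TRUE)

resp <- read_json(f)
resp$query
```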

Run one example from each endpoint

The remaining endpoints use the same connection and request function. Only the query constructor changes.

Enrich web

q_web <- query_enrich_web("open data portals", site = "gov", expand = FALSE)

out_web <- tempfile("kagiPro-enrich-web-")
dir.create(out_web, recursive = TRUE, showWarnings = FALSE)

kagi_request(
  connection = conn,
  query = q_web,
  output = out_web,
  overwrite = TRUE
)

Enrich news

q_news <- query_enrich_news("biodiversity policy", expand = FALSE)

out_news <- tempfile("kagiPro-enrich-news-")
dir.create(out_news, recursive = TRUE, showWarnings = FALSE)

kagi_request(
  connection = conn,
  query = q_news,
  output = out_news,
  overwrite = TRUE
)

Summarize (text input)

q_sum_text <- query_summarize(
  text = paste(
    "Biodiversity underpins ecosystem services including pollination,",
    "soil fertility, water purification, and climate regulation.",
    "Species decline has implications for resilience and wellbeing."
  ),
  engine = "cecil",
  summary_type = "summary",
  target_language = "EN",
  cache = TRUE
)

out_sum <- tempfile("kagiPro-summarize-")
dir.create(out_sum, recursive = TRUE, showWarnings = FALSE)

kagi_request(
  connection = conn,
  query = q_sum_text,
  output = out_sum,
  overwrite = TRUE
)

FastGPT

q_fast <- query_fastgpt(
  query = "What are ecosystem services?",
  cache = TRUE,
  web_search = TRUE
)

out_fast <- tempfile("kagiPro-fastgpt-")
dir.create(out_fast, recursive = TRUE, showWarnings = FALSE)

kagi_request(
  connection = conn,
  query = q_fast,
  output = out_fast,
  overwrite = TRUE
)

Convert JSON results to parquet

When you move from inspection to analysis pipelines, parquet is usually more convenient:

parquet_dir <- tempfile("kagiPro-parquet-")

kagi_request_parquet(
  input_json = out_search,
  output = parquet_dir,
  overwrite = TRUE
)

Bridge to OpenAlex-style vector input

If you want to pass results into openalexPro/openalexVectorComp workflows, use the modular content pipeline: download content -> extract markdown -> summarize markdown.

download_content(
  project_folder = "kagi_project",
  endpoint = "search"
)

content_markdown(
  project_folder = "kagi_project",
  endpoint = "search"
)

markdown_abstract(
  project_folder = "kagi_project",
  endpoint = "search",
  summarizer_fn = summarize_with_openai,
  model = "gpt-4.1-mini"
)

# abstract parquet output is written under:
# kagi_project/search/abstract/query=<query_name>/

The id column is a deterministic hash of the normalized URL.
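The exact normalization rules are internal to the package, but the idea can be sketched with digest. Both the normalization below and the hash algorithm are assumptions for illustration; do not expect these hashes to match kagiPro's ids:

```r
library(digest)

# Hypothetical normalization: lowercase, drop trailing slashes
normalize_url <- function(url) {
  sub("/+$", "", tolower(url))
}

make_id <- function(url) digest(normalize_url(url), algo = "sha1")

# Variants of the same URL normalize to the same id
make_id("https://Example.com/report/")
make_id("https://example.com/report")
```

The point of hashing a normalized URL is that trivially different spellings of the same resource deduplicate to one id.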

Where to go next

For deeper workflows (batching patterns, robust error handling, and endpoint-focused examples), continue with the endpoint-specific articles.

Session info