Summarize Markdown into Query-Level Abstract Parquet
Source: R/markdown_abstract.R
Read markdown files generated for a specific endpoint/query and summarize each record with either OpenAI or Kagi text summarization. The result is written as a single parquet file per query under `abstract/`.
Usage
markdown_abstract(
  project_folder,
  endpoint = NULL,
  query_name = NULL,
  workers = 4,
  progress = interactive(),
  verbose = FALSE,
  summarizer_fn = summarize_with_openai,
  model = "gpt-4.1-mini",
  connection = NULL,
  provider_args = list(),
  markdown_root = "markdown",
  abstract_root = "abstract"
)
Arguments
- project_folder
Root project folder containing endpoint subfolders.
- endpoint
Optional endpoint selector (for example `"search"` or `"enrich_news"`). If `NULL`, all supported endpoints are considered.
- query_name
Optional query selector. If `NULL`, all query partitions are considered.
- workers
Number of parallel workers to use for summarization.
- progress
Logical indicating whether progress messages should be shown.
- verbose
Logical indicating whether detailed messages should be shown.
- summarizer_fn
Function with signature `fn(text, model, ...) -> character(1) | NA_character_`.
- model
Provider-specific model or engine name passed to `summarizer_fn` (default `"gpt-4.1-mini"`).
- connection
Optional `kagi_connection()` object. Used for `summarize_with_kagi()` when not supplied via `provider_args`.
- provider_args
Optional named list forwarded to `summarizer_fn`.
- markdown_root
Root folder name containing markdown files.
- abstract_root
Root folder name for abstract parquet outputs.
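Examples
The `summarizer_fn` argument accepts any function with the documented signature, so a custom summarizer can stand in for the OpenAI or Kagi ones. The following is a minimal sketch under stated assumptions, not part of the package: `summarize_first_words` and its `max_words` argument are hypothetical names, the project path is a placeholder, and it assumes that the entries of `provider_args` reach the function as named arguments.

```r
# Hypothetical offline summarizer matching the documented signature
# fn(text, model, ...) -> character(1) | NA_character_.
# `max_words` is an illustrative extra argument, not part of the package.
summarize_first_words <- function(text, model = NULL, ..., max_words = 30L) {
  # Return NA for anything that is not a single non-blank string
  if (length(text) != 1L || is.na(text) || !nzchar(trimws(text))) {
    return(NA_character_)
  }
  # Split on whitespace and keep the first `max_words` words
  words <- strsplit(trimws(text), "\\s+")[[1]]
  paste(head(words, max_words), collapse = " ")
}

# Quick local check of the summarizer on its own
summarize_first_words("# Title\n\nSome body text here.", max_words = 3L)

# Sketch of a call using the custom summarizer (not run; assumes the
# package is attached and "my_project" is a placeholder project folder):
# markdown_abstract(
#   project_folder = "my_project",
#   endpoint       = "search",
#   summarizer_fn  = summarize_first_words,
#   provider_args  = list(max_words = 50L)
# )
```

A deterministic summarizer like this can be handy for checking the folder layout and parquet output before spending API calls on a real provider.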