Read corpus from Parquet Dataset — read

This function reads a corpus stored in Apache Parquet format. It returns either an ArrowObject representing the corpus, which can be fed into a dplyr pipeline, or a tibble containing all the data.

Usage

read_corpus(corpus, return_data = FALSE)

Arguments

corpus: Path to the directory containing the Parquet files.
return_data: Logical. If FALSE (the default), an ArrowObject representing the corpus is returned; if TRUE, a tibble containing the whole corpus is returned.

Value

An ArrowObject representing the corpus (when return_data = FALSE) or a tibble containing the whole corpus (when return_data = TRUE).
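
Examples

The following is a minimal sketch of the two return modes, not the package's actual implementation. It assumes the arrow and dplyr packages are installed. The helper read_corpus_sketch(), the toy data, and the directory name toy_corpus are hypothetical stand-ins introduced only for illustration; the helper simply mimics the documented behaviour with arrow::open_dataset() and dplyr::collect().

```r
library(arrow)
library(dplyr)

# Hypothetical stand-in that mimics the documented behaviour of read_corpus().
# Assumption: the real function may be implemented differently.
read_corpus_sketch <- function(corpus, return_data = FALSE) {
  ds <- arrow::open_dataset(corpus)  # lazy handle; Parquet is the default format
  if (return_data) dplyr::collect(ds) else ds
}

# Write a tiny toy corpus to a temporary directory (illustrative data only).
corpus_dir <- file.path(tempdir(), "toy_corpus")
toy <- data.frame(
  doc_id = c("a", "b", "c"),
  year   = c(2019L, 2020L, 2020L),
  text   = c("first document", "second document", "third document"),
  stringsAsFactors = FALSE
)
arrow::write_dataset(toy, corpus_dir, format = "parquet")

# Default mode: an Arrow object that is only read when collect() is called,
# so filtering happens before any data is pulled into memory.
lazy <- read_corpus_sketch(corpus_dir)
recent <- lazy %>%
  filter(year == 2020L) %>%
  collect()

# return_data = TRUE: the whole corpus is loaded into memory at once.
full <- read_corpus_sketch(corpus_dir, return_data = TRUE)
```

The default mode is likely the better choice for corpora that do not fit comfortably in memory, since rows and columns can be narrowed down in the dplyr pipeline before collect() materialises the result.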