Title and abstract screening with Gemini API models using function calls
Source: R/tabscreen_gemini.R
tabscreen_gemini.Rd
This function supports title and abstract screening using API models in R.
Specifically, it allows users to draw on Gemini's API completion models, including fine-tuned versions.
The function enables title and abstract screening across multiple prompts, with
repeated questions to assess consistency across responses. All of this can be performed in parallel.
The function utilizes function calling, which is invoked via the
tools argument in the request body. See Vembye, Christensen, Mølgaard, and Schytt (2025)
for guidance on how to adequately conduct title and abstract screening with GPT models.
Usage
tabscreen_gemini(data, prompt, studyid, title, abstract,
api_url = "https://generativelanguage.googleapis.com", model = "gemini-3.1-flash-lite",
role = "user", tools = NULL, tool_choice = NULL, top_p = 1,
time_info = TRUE, token_info = TRUE, api_key = get_api_key_gemini(), max_tries = 16,
max_seconds = NULL, is_transient = gpt_is_transient, backoff = NULL,
after = NULL, rpm = 10000, reps = 1, seed_par = NULL, progress = TRUE,
decision_description = FALSE, messages = TRUE, incl_cutoff_upper = NULL,
incl_cutoff_lower = NULL, force = FALSE, custom_model = FALSE,
reasoning_effort = "medium", overinclusive = TRUE, ...)
Arguments
- data
Dataset containing the titles and abstracts.
- prompt
Prompt(s) to be added before the title and abstract.
- studyid
Unique Study ID. If missing, this is generated automatically.
- title
Name of the variable containing the title information.
- abstract
Name of variable containing the abstract information.
- api_url
Character string with the Gemini API base URL. Default is "https://generativelanguage.googleapis.com". The v1beta path and model endpoint will be appended automatically.
- model
Character string with the name of the Gemini completion model. Can take multiple models. Default is "gemini-3.1-flash-lite". Find available models at https://ai.google.dev/gemini-api/docs/models.
- role
Character string indicating the role of the user. Default is "user" (required for Gemini API).
- tools
This argument allows users to apply customized function declarations. See https://ai.google.dev/gemini-api/docs/function-calling. Default is NULL. If not specified, the default function calls from AIscreenR (Gemini format) are used.
- tool_choice
If a customized function is provided, this argument controls which mode Gemini uses for function calling ("auto", "any", "none"). Default is NULL. If set to NULL when using a customized function, the default is "auto". See https://ai.google.dev/gemini-api/docs/function-calling.
- top_p
'An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered.' Default is 1.
- time_info
Logical indicating whether the run time of each request/question should be included in the data. Default is TRUE.
- token_info
Logical indicating whether token information should be included in the output data. Default is TRUE. When TRUE, the output object will include price information of the conducted screening.
- api_key
Character string with your personal API key. By default, get_api_key_gemini() retrieves the API key from the R environment, so that the key is not compromised. The API key can be added to the R environment via set_api_key() or by using usethis::edit_r_environ(). In the .Renviron file, write CHATGPT_KEY=INSERT_YOUR_KEY_HERE. After entering the API key, close and save the .Renviron file and restart RStudio (Ctrl + Shift + F10). Alternatively, one can use httr2::secret_make_key(), httr2::secret_encrypt(), and httr2::secret_decrypt() to scramble and decrypt the API key.
- max_tries, max_seconds
'Cap the maximum number of attempts with max_tries or the total elapsed time from the first request with max_seconds. If neither option is supplied (the default), httr2::req_perform() will not retry' (Wickham, 2023). The default of max_tries is 16.
- is_transient
'A predicate function that takes a single argument (the response) and returns TRUE or FALSE specifying whether or not the response represents a transient error' (Wickham, 2023). This function runs automatically in AIscreenR but can be customized by the user if necessary.
- backoff
'A function that takes a single argument (the number of failed attempts so far) and returns the number of seconds to wait' (Wickham, 2023).
- after
'A function that takes a single argument (the response) and returns either a number of seconds to wait or NULL, which indicates that a precise wait time is not available that the backoff strategy should be used instead' (Wickham, 2023).
- rpm
Numerical value indicating the number of requests per minute (rpm) available for the specified model. Rate limits are not available through the Gemini API and must be manually checked via your Google AI Studio dashboard at https://aistudio.google.com/app/apikey under the rate limits section. Default is 10000 rpm, but adjust this based on your actual quota.
- reps
Numerical value indicating the number of times the same question should be sent to the server. This can be useful to test consistency between answers, and/or can be used to make inclusion judgments based on how many times a study has been included across the given number of screenings. Default is 1.
- seed_par
Numerical value for a seed to ensure that proper, parallel-safe random numbers are produced.
- progress
Logical indicating whether a progress line should be shown when running the title and abstract screening in parallel. Default is TRUE.
- decision_description
Logical indicating whether a detailed description should follow the decision made by Gemini. Default is FALSE. When conducting large-scale screening, we generally recommend not using this feature as it will substantially increase the cost of the screening. We generally recommend using it when encountering disagreements between Gemini and human decisions.
- messages
Logical indicating whether to print messages embedded in the function. Default is TRUE.
- incl_cutoff_upper
Numerical value indicating the probability threshold at or above which a study should be included. ONLY relevant when the same question is requested multiple times (i.e., when any reps > 1). Default is 0.1, indicating that titles and abstracts should only be included if the model has included the study in at least 10 percent of the screenings (e.g., 1 out of 10 screenings). This has been shown by Vembye et al. (2025) to work well with cheaper models.
- incl_cutoff_lower
Numerical value indicating the probability threshold above which studies should be checked by a human. ONLY relevant when the same question is requested multiple times (i.e., when any reps > 1) and incl_cutoff_upper > 0.1. Records with inclusion probabilities between incl_cutoff_lower and incl_cutoff_upper will be flagged for human checking. Default is NULL, which means that no studies will be flagged for human checking.
- force
Logical argument indicating whether to force the function to use more than 10 iterations for gpt-3.5 models and more than 1 iteration for gpt-4 models other than gpt-4o-mini. This argument is intended to prevent erroneous and excessively large screenings. Default is FALSE.
- custom_model
Logical indicating whether a fine-tuned or custom model is used. Default is FALSE.
- reasoning_effort
Character string indicating the level of reasoning effort required for the task. Default is "low". Can take the values "minimal", "low", "medium", and "high". For Gemini 3.1 Pro, "minimal" is not supported. Be aware that 2.5 models don't support thinkingLevel/reasoning_effort, but use thinkingBudget instead. Therefore, reasoning_effort is mapped to budget values when using 2.5 models. See https://ai.google.dev/gemini-api/docs/thinking#rest for more information.
- overinclusive
Logical indicating whether uncertain decisions ("1.1") should be allowed in the default function calling setup. Default is TRUE, which means that the default function calling setup will allow for uncertain decisions. If FALSE, the default function calling setup will not allow for uncertain decisions and will only return binary decisions (i.e., "1" or "0"). This argument only affects the default function calling setup.
- ...
Further arguments to pass to the request body. See https://ai.google.dev/gemini-api/docs/text-generation#rest.
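As a rough illustration of the backoff argument described above, the sketch below defines a capped exponential wait time. The function name capped_backoff, the base of 2, and the 60-second cap are assumptions chosen for illustration; they are not defaults of AIscreenR or httr2.

```r
# Hypothetical backoff function: takes the number of failed attempts so far
# and returns the number of seconds to wait before the next attempt.
# The wait doubles with each failure and is capped at 60 seconds.
capped_backoff <- function(attempts) {
  min(2^attempts, 60)
}

# Wait times (in seconds) after 1 to 8 failed attempts.
wait_times <- sapply(1:8, capped_backoff)
print(wait_times)
```

A function of this shape could be supplied as backoff = capped_backoff, assuming the argument is passed on to the httr2 retry logic as the quoted description suggests.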
Value
An object of class 'gpt'. The object is a list containing the following
datasets and components:
- answer_data
dataset of class 'gpt_tbl' with all individual answers.
- price_dollar
numerical value indicating the total price (in USD) of the screening.
- price_data
dataset with prices across all gpt models used for screening.
- run_date
string indicating the date when the screening was run. In some frameworks, time details are considered important to report (see e.g., Thomas et al., 2024).
- ...
some additional attributed values/components, including an attributed list with the arguments used in the function. These are used in screen_errors() to re-screen transient errors.
If the same question is requested multiple times, the object will also contain the following dataset with results aggregated across the iterated requests/questions.
- answer_data_aggregated
dataset of class 'gpt_agg_tbl' with the summarized, probabilistic inclusion decision for each title and abstract across multiple repeated questions.
Note
The answer_data data contains the following mandatory variables:
| studyid | integer | indicating the study ID of the reference. |
| title | character | indicating the title of the reference. |
| abstract | character | indicating the abstract of the reference. |
| promptid | integer | indicating the prompt ID. |
| prompt | character | indicating the prompt. |
| model | character | indicating the specific Gemini model used. |
| iterations | numeric | indicating the number of times the same question has been sent to Gemini API. |
| question | character | indicating the final question sent to Gemini API. |
| top_p | numeric | indicating the applied top_p. |
| decision_gpt | character | indicating the raw Gemini decision - either "1", "0", "1.1" for inclusion, exclusion, or uncertainty, respectively. |
| detailed_description | character | indicating detailed description of the decision made by Gemini. ONLY included if the detailed function calling is used. |
| decision_binary | integer | indicating the binary decision (1 = include, 0 = exclude). |
| prompt_tokens | integer | indicating the number of prompt tokens used. |
| completion_tokens | integer | indicating the number of completion tokens used. |
| submodel | character | indicating the exact model version used for screening. |
| run_time | numeric | indicating the time it took to obtain a response from the server. |
| run_date | character | indicating the date the response was received. |
| n | integer | indicating iteration ID (only different from 1 when reps > 1). |
If any requests failed, the gpt object contains an
error dataset (error_data) containing the same variables as answer_data
but with failed request references only.
When the same question is requested multiple times, the answer_data_aggregated data contains the following mandatory variables:
| studyid | integer | indicating the study ID of the reference. |
| title | character | indicating the title of the reference. |
| abstract | character | indicating the abstract of the reference. |
| promptid | integer | indicating the prompt ID. |
| prompt | character | indicating the prompt. |
| model | character | indicating the specific Gemini model used. |
| question | character | indicating the final question sent to Gemini's API models. |
| top_p | numeric | indicating the applied top_p. |
| incl_p | numeric | indicating the probability of inclusion calculated across multiple repeated responses on the same title and abstract. |
| final_decision_gpt | character | indicating the final decision reached by gpt - either 'Include', 'Exclude', or 'Check'. |
| final_decision_gpt_num | integer | indicating the final numeric decision reached by gpt - either 1 or 0. |
| longest_answer | character | indicating the longest gpt response obtained across multiple repeated responses on the same title and abstract. Only included when decision_description = TRUE. See 'Examples' below for how to use this function. |
| reps | integer | indicating the number of times the same question has been sent to Gemini's API models. |
| n_mis_answers | integer | indicating the number of missing responses. |
| submodel | character | indicating the exact (sub)model used for screening. |
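The relationship between repeated decisions, incl_p, and the final 'Include', 'Check', or 'Exclude' decision can be sketched in base R as follows. This is a minimal illustration, not the package's internal code: the toy data, the helper classify_incl_p(), the cutoff values, and the choice to treat the cutoffs as inclusive lower bounds are all assumptions.

```r
# Toy binary decisions (1 = include, 0 = exclude) from 10 repeated screenings
# of three hypothetical records; this is not output from the package.
decisions <- data.frame(
  studyid = rep(1:3, each = 10),
  decision_binary = c(rep(0, 10),
                      c(1, 1, rep(0, 8)),
                      c(rep(1, 6), rep(0, 4)))
)

# Hypothetical helper: map an inclusion probability to a final decision.
# Without a lower cutoff there is no 'Check' band.
classify_incl_p <- function(incl_p, upper, lower = NULL) {
  if (is.null(lower)) lower <- upper
  ifelse(incl_p >= upper, "Include",
         ifelse(incl_p >= lower, "Check", "Exclude"))
}

# Share of inclusions per record across the repeated screenings.
agg <- aggregate(decision_binary ~ studyid, data = decisions, FUN = mean)
names(agg)[2] <- "incl_p"

# Illustrative cutoffs only; choose values suited to the review at hand.
agg$final_decision <- classify_incl_p(agg$incl_p, upper = 0.5, lower = 0.1)
print(agg)
```

With these toy values, a record that is never included is excluded, a record included in 2 of 10 screenings falls between the two cutoffs and is flagged for human checking, and a record included in 6 of 10 screenings is included.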
The price_data data contains the following variables:
| prompt | character | if multiple prompts are used this variable indicates the given prompt-id. |
| model | character | the specific gpt model used. |
| iterations | integer | indicating the number of times the same question was requested. |
| input_price_dollar | integer | price for all prompt/input tokens for the corresponding model. |
| output_price_dollar | integer | price for all completion/output tokens for the corresponding model. |
| total_price_dollar | integer | total price for all tokens for the corresponding model. |
Find current token pricing at https://ai.google.dev/gemini-api/docs/pricing or model_prizes.
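To make the price columns concrete, the sketch below shows one plausible way to turn token counts into dollar amounts. The per-million-token rates and the token counts are placeholders invented for illustration, and the arithmetic is an assumption about how such prices are typically derived, not a description of the package's internal calculation.

```r
# Hypothetical per-million-token rates in USD; placeholders, not real prices.
input_rate_per_million <- 0.10
output_rate_per_million <- 0.40

# Toy token counts for three requests (cf. prompt_tokens and
# completion_tokens in answer_data).
prompt_tokens <- c(420, 515, 388)
completion_tokens <- c(12, 15, 11)

# Price = total tokens / 1,000,000 * rate per million tokens.
input_price_dollar <- sum(prompt_tokens) / 1e6 * input_rate_per_million
output_price_dollar <- sum(completion_tokens) / 1e6 * output_rate_per_million
total_price_dollar <- input_price_dollar + output_price_dollar

print(c(input = input_price_dollar,
        output = output_price_dollar,
        total = total_price_dollar))
```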
References
Vembye, M. H., Christensen, J., Mølgaard, A. B., & Schytt, F. L. W. (2025). Generative Pretrained Transformer Models Can Function as Highly Reliable Second Screeners of Titles and Abstracts in Systematic Reviews: A Proof of Concept and Common Guidelines. Psychological Methods. doi:10.1037/met0000769
Thomas, J. et al. (2024). Responsible AI in Evidence SynthEsis (RAISE): guidance and recommendations. https://osf.io/cn7x4
Wickham, H. (2023). httr2: Perform HTTP Requests and Process the Responses. https://httr2.r-lib.org, https://github.com/r-lib/httr2.
Examples
if (FALSE) { # \dontrun{
library(AIscreenR)
library(future)

# Make the API key available to R (see the api_key argument).
set_api_key()

prompt <- "Is this study about a Functional Family Therapy (FFT) intervention?"

# Screen the first two references in parallel.
plan(multisession)

tabscreen_gemini(
  data = filges2015_dat[1:2, ],
  prompt = prompt,
  studyid = studyid,
  title = title,
  abstract = abstract
)

plan(sequential)

# Get detailed descriptions of the Gemini decisions.
plan(multisession)

tabscreen_gemini(
  data = filges2015_dat[1:2, ],
  prompt = prompt,
  studyid = studyid,
  title = title,
  abstract = abstract,
  decision_description = TRUE
)

plan(sequential)
} # }