Function to write/save fine tune dataset in required jsonl format
Source:R/generate_fine_tune_data.R
write_ft_data.Rd
This function creates jsonl
training data that can be used to fine tune models from OpenAI.
To generate a fine tuned model, this writing data can be uploaded to
https://platform.openai.com/finetune/.
Usage
write_ft_data(
data,
role_and_subject,
file,
true_answer,
roles = c("system", "user", "assistant")
)
Arguments
- data
Dataset with questions strings that should be used for training. The data must be of class
'ft_data'
, containing two variables named question and true_answer.- role_and_subject
Descriptions of the role of the GPT model and the subject under review, respectively.
- file
A character string naming the file to write to. If not specified the written file name and format will be
"fine_tune_data.jsonl"
.- true_answer
Optional name of the variable containing the true answers/decisions used for training. Only relevant, if the the dataset contains a variable with the name true_answer.
- roles
String variable defining the various role the model should take. Default is
roles = c("system", "user", "assistant")
.
Examples
if (FALSE) { # \dontrun{
# Extract 5 irrelevant and relevant records, respectively.
library(dplyr)
dat <- filges2015_dat[c(1:5, 261:265),]
prompt <- "Is this study about functional family therapy?"
ft_dat <-
generate_ft_data(
data = dat,
prompt = prompt,
studyid = studyid,
title = title,
abstract = abstract
) |>
mutate(true_answer = if_else(human_code == 1, "Include", "Exclude"))
role_subject <- paste0(
"Act as a systematic reviewer that is screening study titles and ",
"abstracts for your systematic reviews regarding the the effects ",
"of family-based interventions on drug abuse reduction for young ",
"people in treatment for non-opioid drug use."
)
# Saving data in jsonl format (required format by OpenAI)
write_ft_data(
data = ft_dat,
role_and_subject = role_subject,
file = "fine_tune_data.jsonl"
)
} # }