Reading and Writing RIS Files

Important note

These functions were developed because existing R tools for loading RIS files often fail to preserve the original formatting from PsycINFO. The new functions are designed to handle this RIS file structure along with other database formats. For a full-scale use of these functions, see the Using OpenAI’s GPT API models for Title and Abstract Screening in Systematic Reviews vignette. If you experience any issues with these functions, please report them on GitHub. Alternatively, you can try synthesisr::read_refs().

Overview

This vignette introduces two helpers for working with RIS files. The function read_ris_to_dataframe(file_path) parses an RIS file into a data frame with the following features:

Automatically maps RIS tags to descriptive column names (e.g., AU → author, TI → title, PY → year)
Preserves the order of tags as they first appear in the file
Collapses repeated tags within a record into a single semicolon-separated string
Stores metadata to preserve original formatting when writing back
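
To make the parsing behaviour listed above more concrete, here is a minimal base R sketch. It is not the AIscreenR implementation: parse_ris_sketch() is a hypothetical helper, it keeps the raw two-character tags as column names, and it ignores the metadata handling. It only illustrates first-appearance column order and the collapsing of repeated tags.

```r
# Hypothetical helper, NOT the AIscreenR implementation: a rough sketch of
# turning RIS lines into a data frame keyed by raw two-character tags.
parse_ris_sketch <- function(lines) {
  # Keep only lines shaped like "TG  - value"
  pattern <- "^([A-Z][A-Z0-9])  - ?(.*)$"
  is_tag <- grepl(pattern, lines)
  tags <- sub(pattern, "\\1", lines[is_tag])
  values <- trimws(sub(pattern, "\\2", lines[is_tag]))

  # Each ER tag closes a record; count the ER tags seen before each line
  is_end <- tags == "ER"
  record_id <- cumsum(is_end) - is_end + 1
  tags <- tags[!is_end]
  values <- values[!is_end]
  record_id <- record_id[!is_end]
  if (length(tags) == 0) return(data.frame())

  # Columns follow the order in which tags first appear in the file
  tag_order <- unique(tags)
  out <- as.data.frame(
    matrix(NA_character_, nrow = max(record_id), ncol = length(tag_order),
           dimnames = list(NULL, tag_order)),
    stringsAsFactors = FALSE
  )
  for (tag in tag_order) {
    idx <- tags == tag
    # Collapse repeated tags within a record into one "; "-separated string
    collapsed <- vapply(split(values[idx], record_id[idx]),
                        paste, character(1), collapse = "; ")
    out[as.integer(names(collapsed)), tag] <- unname(collapsed)
  }
  out
}

ris_demo <- c(
  "TY  - JOUR", "AU  - Author, One", "AU  - Author, Two",
  "TI  - An example title", "PY  - 2020", "ER  - ", "",
  "TY  - CHAP", "TI  - Another title", "AU  - Author, Three", "ER  - "
)
parsed_demo <- parse_ris_sketch(ris_demo)
```

In this sketch a tag that is missing from a record is left as NA, and the columns come out as TY, AU, TI, PY because that is the order in which the tags first occur.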

The function save_dataframe_to_ris(df, file_path) writes a data frame back to RIS format:

Writes TY (source type) first for each record, followed by all other fields
Splits semicolon-separated values into multiple RIS tag lines
Preserves original formatting when available (from metadata)
Terminates each record with an ER  - line followed by a blank line
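
The writing rules above can be sketched in a few lines of base R. This is an illustration under simplifying assumptions, not the package's code: write_ris_sketch() is a hypothetical helper that expects character columns named by raw RIS tags, returns the lines instead of writing a file, and does not use any stored formatting metadata.

```r
# Hypothetical helper, NOT the AIscreenR implementation: build RIS lines
# from a data frame whose character columns are named by raw RIS tags.
write_ris_sketch <- function(df) {
  other_tags <- setdiff(names(df), "TY")
  lines <- character(0)
  for (i in seq_len(nrow(df))) {
    # TY always comes first in a record
    lines <- c(lines, paste0("TY  - ", df$TY[i]))
    for (tag in other_tags) {
      value <- df[[tag]][i]
      if (is.na(value) || !nzchar(value)) next  # skip missing or empty fields
      # Split "a; b" into one tag line per value
      parts <- trimws(strsplit(value, ";", fixed = TRUE)[[1]])
      parts <- parts[nzchar(parts)]
      if (length(parts) == 0) next
      lines <- c(lines, paste0(tag, "  - ", parts))
    }
    # Close the record with ER and a blank separator line
    lines <- c(lines, "ER  - ", "")
  }
  lines
}

sketch_df <- data.frame(
  TY = c("JOUR", "CHAP"),
  AU = c("Author, One; Author, Two", "Author, Three"),
  PY = c("2020", ""),
  stringsAsFactors = FALSE
)
sketch_lines <- write_ris_sketch(sketch_df)
```

Note that this naive split treats every semicolon as a separator, so a field whose text legitimately contains a semicolon would be broken across several tag lines in the sketch.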

Load the package

library(AIscreenR)

Read an RIS file

The example below builds a small RIS file in a temporary location and reads it.

ris <- c(
  "TY  - JOUR",
  "AU  - Author, One",
  "AU  - Author, Two",
  "TI  - An example title",
  "PY  - 2020",
  "ER  - ",
  "",
  "TY  - CHAP",
  "TI  - Another title",
  "AU  - Author, Three",
  "ER  - "
)

tmp_in <- tempfile(fileext = ".ris")
writeLines(ris, tmp_in, useBytes = TRUE)

df <- read_ris_to_dataframe(tmp_in)
df

  source_type                   author            title year
1        JOUR Author, One; Author, Two An example title 2020
2        CHAP            Author, Three    Another title   NA

The output data frame has descriptive column names instead of RIS tags. For example:

TY (type) becomes source_type
AU (author) becomes author
TI (title) becomes title
PY (publication year) becomes year

Repeated tags, such as multiple AU lines, are collapsed into a single string with the values separated by “; ” (e.g., “Author, One; Author, Two”).
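
If you need the individual values again, for example to count authors per record, a collapsed column can be split with base R. The snippet below is a small sketch that uses a stand-in vector shaped like the author column above, so it runs without the package.

```r
# Stand-in for df$author as printed above
author_col <- c("Author, One; Author, Two", "Author, Three")

# Split on the semicolon and any whitespace that follows it
author_list <- strsplit(author_col, ";\\s*")

# Number of authors in each record
n_authors <- lengths(author_list)
```

Here author_list is a list with one character vector per record, and n_authors holds 2 and 1 for the two example records.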

Write a data frame to RIS

Create a data frame and write it to a .ris file. You can use either descriptive column names (as returned by read_ris_to_dataframe()) or raw RIS tags. Semicolon-separated values are automatically split into multiple tag lines.

# Using raw RIS tags
df_out <- data.frame(
  TY = c("JOUR", "CHAP"),
  AU = c("Author, One; Author, Two", "Author, Three"),
  TI = c("An example title", "Another title"),
  PY = c("2020", ""),
  stringsAsFactors = FALSE
)

tmp_out <- tempfile(fileext = ".ris")
invisible(capture.output(save_dataframe_to_ris(df_out, tmp_out)))
readLines(tmp_out, encoding = "UTF-8")

 [1] "TY  - JOUR"             "AU  - Author, One"      "AU  - Author, Two"
 [4] "TI  - An example title" "PY  - 2020"             "ER  - "
 [7] ""                       "TY  - CHAP"             "AU  - Author, Three"
[10] "TI  - Another title"    "ER  - "                 ""

Each record writes the TY tag first, splits any field value containing “;” into multiple RIS tag lines, and ends with ER - followed by a blank line.

You can also use descriptive column names (they will be automatically mapped back to RIS tags):

# Using descriptive names
df_descriptive <- data.frame(
  source_type = c("JOUR", "CHAP"),
  author = c("Author, One; Author, Two", "Author, Three"),
  title = c("An example title", "Another title"),
  year = c("2020", ""),
  stringsAsFactors = FALSE
)

tmp_out2 <- tempfile(fileext = ".ris")
invisible(capture.output(save_dataframe_to_ris(df_descriptive, tmp_out2)))
readLines(tmp_out2, encoding = "UTF-8")

 [1] "TY  - JOUR"             "AU  - Author, One"      "AU  - Author, Two"
 [4] "TI  - An example title" "PY  - 2020"             "ER  - "
 [7] ""                       "TY  - CHAP"             "AU  - Author, Three"
[10] "TI  - Another title"    "ER  - "                 ""

Both approaches produce identical RIS output.
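
One way to picture why the two inputs are equivalent is a simple lookup from descriptive names back to RIS tags. The sketch below is an assumption about the idea, not the package's internal mapping: tag_map contains only the four pairs shown in this vignette, and to_ris_tags() is a hypothetical helper.

```r
# Only these four tag/name pairs are confirmed by the examples in this vignette
tag_map <- c(TY = "source_type", AU = "author", TI = "title", PY = "year")

# Hypothetical helper: rename descriptive columns back to raw RIS tags,
# leaving any column without a known mapping untouched
to_ris_tags <- function(df, map = tag_map) {
  reverse_map <- setNames(names(map), map)  # e.g. "author" -> "AU"
  known <- names(df) %in% names(reverse_map)
  names(df)[known] <- unname(reverse_map[names(df)[known]])
  df
}

descriptive_demo <- data.frame(
  source_type = "JOUR",
  author = "Author, One; Author, Two",
  title = "An example title",
  year = "2020",
  stringsAsFactors = FALSE
)
tagged_demo <- to_ris_tags(descriptive_demo)
```

After the renaming step, tagged_demo has the columns TY, AU, TI and PY, which is the same shape as the raw-tag data frame used in the first example.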