| Title: | Import Data from Spanish Sociological Research Center (CIS) |
|---|---|
| Description: | Search and import data directly into R from the Spanish Sociological Research Center (CIS) <https://www.cis.es/inicio>. The CIS is a public institution that conducts electoral and sociological research studies on Spanish society. The CIS has a large database of surveys that can be accessed through its website. The package includes functions to search for surveys, survey questions and time series, and to import the data directly into R. |
| Authors: | Héctor Meleiro [aut, cre] |
| Maintainer: | Héctor Meleiro <[email protected]> |
| License: | GPL (>= 3) |
| Version: | 0.1.0 |
| Built: | 2026-05-12 09:04:06 UTC |
| Source: | https://github.com/hmeleiro/opencis |
Opens a PDF document from a CIS study in the default browser.
browse_pdf(study_code, wanted_file = "cues")

| study_code | A string with the study code. |
|---|---|
| wanted_file | A keyword used to match the PDF filename inside the ZIP. Use "cues" (the default) for the questionnaire or "ft" for the technical sheet. |

CIS study ZIP files typically contain two PDF documents:

- The questionnaire (cuestionario): use `wanted_file = "cues"`.
- The technical sheet (ficha técnica): use `wanted_file = "ft"`.

Called for its side effect of opening the PDF in the browser.
Returns NULL invisibly.

    if (interactive()) {
      # Open the questionnaire (cuestionario) for study 3328
      browse_pdf("3328")
      # Open the technical sheet (ficha técnica) for study 3328
      browse_pdf("3328", wanted_file = "ft")
    }

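
The `wanted_file` keyword is matched against the file names inside the study ZIP. As a sketch of one plausible matching rule, assuming a case-insensitive substring match restricted to `.pdf` entries, the hypothetical `pick_pdf()` below returns the first matching file. The file names are invented, and this is not necessarily how `browse_pdf()` behaves internally.

```r
# Hypothetical helper, not part of opencis: choose the PDF whose name
# contains the keyword, assuming a case-insensitive substring match.
pick_pdf <- function(files, wanted_file = "cues") {
  # Keep only PDF entries from the ZIP listing
  pdfs <- files[grepl("\\.pdf$", files, ignore.case = TRUE)]
  # Compare in lower case and treat the keyword as a fixed string, not a regex
  hits <- pdfs[grepl(tolower(wanted_file), tolower(pdfs), fixed = TRUE)]
  if (length(hits) == 0) {
    stop("No PDF matching '", wanted_file, "' found in the ZIP.")
  }
  hits[1]
}

# Invented file names; real CIS ZIP entries may be named differently
files <- c("cues3328.pdf", "FT3328.pdf", "3328.sav")
pick_pdf(files)        # questionnaire
pick_pdf(files, "ft")  # technical sheet
```

Treating the keyword as a fixed string avoids surprises if it ever contains regex metacharacters; the trade-off is that a short keyword such as "ft" could also match an unrelated PDF whose name happens to contain those letters.
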
Clears the in-memory cache used by search_cis and
read_cis. Call this when you want to force fresh data
to be retrieved from the CIS server within the same R session.
clear_cache()
NULL invisibly.
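
An in-memory cache of this kind can be pictured as memoisation keyed on the request: the first call fetches from the server, later calls with the same key reuse the stored result, and clearing the cache forces the next call to fetch again. The sketch below assumes an environment-based store; `.cis_cache`, `cached_fetch()` and `clear_cache_sketch()` are hypothetical names and do not describe the package internals.

```r
# Hypothetical in-memory cache stored in an environment (not opencis internals)
.cis_cache <- new.env(parent = emptyenv())

cached_fetch <- function(key, fetch) {
  # Return the stored result if this key was fetched before
  if (exists(key, envir = .cis_cache, inherits = FALSE)) {
    return(get(key, envir = .cis_cache, inherits = FALSE))
  }
  value <- fetch()
  assign(key, value, envir = .cis_cache)
  value
}

clear_cache_sketch <- function() {
  # Drop every stored entry so the next request goes back to the server
  rm(list = ls(envir = .cis_cache, all.names = TRUE), envir = .cis_cache)
  invisible(NULL)
}

# Count how many times the simulated server is contacted
calls <- 0
fake_server <- function() {
  calls <<- calls + 1
  data.frame(study = "3328", stringsAsFactors = FALSE)
}

cached_fetch("3328", fake_server)
cached_fetch("3328", fake_server)  # served from the cache
clear_cache_sketch()
cached_fetch("3328", fake_server)  # fetched again after clearing
```
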
Downloads the data ZIP file for a CIS study to a specified directory, instead of a temporary folder. Useful for projects that need to keep the raw data files.
download_study(study_code, destdir = ".")

| study_code | A string with the study code. |
|---|---|
| destdir | A string with the directory where the ZIP file will be saved. Defaults to the current working directory. |

The path to the saved ZIP file, invisibly.

    # Save the ZIP file to a temporary directory
    path <- download_study("3328", destdir = tempdir())
    cat("Saved to:", path, "\n")

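
For projects that keep raw files, it can help to create a dedicated folder first and then inspect what the saved ZIP contains. This is a sketch under stated assumptions: `prepare_raw_dir()` is a hypothetical helper (it is not documented here whether `download_study()` creates `destdir` itself), the `data-raw/cis` layout is only an example, and the download runs only interactively because it needs network access. `utils::unzip(list = TRUE)` lists the archive entries without extracting them.

```r
# Hypothetical helper, not part of opencis: create the target folder if needed,
# since it is not documented here whether download_study() does so itself.
prepare_raw_dir <- function(dir) {
  if (!dir.exists(dir)) {
    dir.create(dir, recursive = TRUE)
  }
  normalizePath(dir)
}

# Example layout for raw files; any folder inside the project would work
raw_dir <- prepare_raw_dir(file.path(tempdir(), "data-raw", "cis"))

# Downloading needs network access, so only run it interactively
if (interactive() && requireNamespace("opencis", quietly = TRUE)) {
  zip_path <- opencis::download_study("3328", destdir = raw_dir)
  # List the entries stored in the ZIP without extracting them
  print(utils::unzip(zip_path, list = TRUE)$Name)
}
```
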
Returns a tibble listing each variable in the data along with its
variable label and value labels, as loaded by haven.
get_data_dictionary(data)

| data | A data.frame loaded from a CIS |
|---|---|

A tibble with columns:

- `variable`: Variable name.
- `label`: Variable label, or NA if none.
- `value_labels`: A named numeric vector of value labels, or NULL for unlabelled variables (list-column).


    # Create a small labelled data frame
    df <- data.frame(
      SEXO = haven::labelled(c(1, 2, 1), labels = c(Hombre = 1, Mujer = 2)),
      EDAD = c(34, 51, 29)
    )
    attr(df$SEXO, "label") <- "Sexo"
    attr(df$EDAD, "label") <- "Edad"

    # Inspect its variable dictionary
    dict <- get_data_dictionary(df)
    print(dict)

    # Find variables with a specific keyword in their label
    dict[grepl("sexo", dict$label, ignore.case = TRUE), ]

    # Inspect value labels for a specific variable
    sex_var <- match("SEXO", dict$variable)
    if (!is.na(sex_var)) {
      dict$value_labels[[sex_var]]
    }

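
haven keeps a variable label in the `label` attribute of a column and its value labels in the `labels` attribute, which is the information the dictionary exposes. The base-R sketch below rebuilds a similar table from those attributes. `build_dictionary()` is hypothetical, returns a plain data.frame rather than a tibble, and is not the package's implementation.

```r
# Hypothetical helper, not part of opencis: a base-R dictionary built from
# the attributes haven uses ("label" for variables, "labels" for values).
build_dictionary <- function(data) {
  var_labels <- vapply(data, function(x) {
    lab <- attr(x, "label", exact = TRUE)
    if (is.null(lab)) NA_character_ else as.character(lab)
  }, character(1))
  out <- data.frame(
    variable = names(data),
    label = unname(var_labels),
    stringsAsFactors = FALSE
  )
  # List-column: a named vector of value labels, or NULL when there are none
  out$value_labels <- unname(lapply(data, function(x) attr(x, "labels", exact = TRUE)))
  out
}

# Plain numeric columns with haven-style attributes set by hand
df <- data.frame(SEXO = c(1, 2, 1), EDAD = c(34, 51, 29))
attr(df$SEXO, "label") <- "Sexo"
attr(df$SEXO, "labels") <- c(Hombre = 1, Mujer = 2)

dict <- build_dictionary(df)
dict$label
dict$value_labels[[1]]
```
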
Retrieves the technical metadata of a CIS study from its detail page, including study dates, type, country, author, and thematic indices.
get_metadata(study_code)

| study_code | A string with the study code. |
|---|---|

A tibble with two columns: field and value.

    # Get metadata for study 3328
    meta <- get_metadata("3328")
    print(meta)

    # Access a specific field
    meta$value[meta$field == "Tipo de estudio"]

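
Because the metadata comes back as a long `field`/`value` table, repeated lookups can be easier on a named list. A minimal sketch: `metadata_as_list()` is a hypothetical helper, the table is hand-made with invented values, and only the field name "Tipo de estudio" is taken from the example above. If a field name were repeated, `[[` would return only its first value.

```r
# Hypothetical helper, not part of opencis: turn a field/value table
# (the shape returned by get_metadata()) into a named list.
metadata_as_list <- function(meta) {
  stats::setNames(as.list(meta$value), meta$field)
}

# Hand-made table with invented values; real ones come from the CIS page
meta <- data.frame(
  field = c("Tipo de estudio", "Autor"),
  value = c("made-up type", "made-up author"),
  stringsAsFactors = FALSE
)

info <- metadata_as_list(meta)
info[["Tipo de estudio"]]
```
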
Download and import the data of a CIS study.
read_cis(study_code)

| study_code | A string with the study code. |
|---|---|

A data.frame with the study data.

    # If you know the study code, you can read it into R directly
    df <- read_cis("3328")
    print(df)

    # If you don't know the study code, search for it first with search_cis()
    studies <- search_cis(q = "gastronomia")
    print(studies)
    df <- read_cis(studies$study[1])
    print(df)

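
Combining a search with `read_cis()` often means reading several studies in one go. The sketch below wraps each read in `tryCatch()` so that one failing study does not stop the rest. `read_many()` and `fake_reader()` are hypothetical; the reader is passed in as an argument so the helper can be demonstrated offline, and the real calls run only interactively.

```r
# Hypothetical helper, not part of opencis: read several studies and keep
# going when one of them fails, returning a named list of data frames.
read_many <- function(codes, reader) {
  results <- lapply(codes, function(code) {
    tryCatch(
      reader(code),
      error = function(e) {
        warning("Study ", code, " could not be read: ",
                conditionMessage(e), call. = FALSE)
        NULL
      }
    )
  })
  names(results) <- codes
  # Drop the studies that failed
  Filter(Negate(is.null), results)
}

# Offline stand-in for read_cis(), used only to demonstrate the helper
fake_reader <- function(code) {
  if (code == "0000") stop("not found")
  data.frame(study = code, x = 1:2, stringsAsFactors = FALSE)
}

studies_read <- suppressWarnings(read_many(c("3328", "0000"), fake_reader))
names(studies_read)

if (interactive() && requireNamespace("opencis", quietly = TRUE)) {
  studies <- opencis::search_cis(q = "gastronomia")
  dfs <- read_many(head(studies$study, 3), opencis::read_cis)
}
```
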
Calls search_cis repeatedly, incrementing the page index until
no more results are returned, and returns all results in a single tibble.
search_all_cis(q = "", from = NULL, to = NULL, sort = "relevance", catalogo = "estudio", ...)

| q | String. The search query. Default is an empty string. |
|---|---|
| from | Date or NULL. The start date for filtering results. Default is NULL. The date format must be "YYYY-MM-DD". |
| to | Date or NULL. The end date for filtering results. Default is NULL. The date format must be "YYYY-MM-DD". |
| sort | String. The sorting order for the results ("publishDate-", "publishDate+", "relevance"). Default is "relevance". |
| catalogo | String. The catalog type ("estudio", "pregunta", "serie"). Default is "estudio". |
| ... | Additional parameters passed to |

A tibble with all search results across all pages.

    # Retrieve all postelectoral studies (all pages)
    all_studies <- search_all_cis(q = "postelectoral")
    print(nrow(all_studies))

    # Filter by date range
    studies_2010_2020 <- search_all_cis(
      q = "ideologia",
      from = "2010-01-01",
      to = "2020-12-31"
    )
    print(studies_2010_2020)

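
The paging strategy described for `search_all_cis()` (increment the page index until no more results come back, then combine everything) can be sketched as a plain loop. This is not the package source: `fetch_all_pages()` and `fake_page()` are hypothetical, and the page fetcher is passed in so the loop can be tested without network access.

```r
# Hypothetical sketch of the pagination loop, not the package source:
# request pages 1, 2, 3, ... until an empty page comes back.
fetch_all_pages <- function(fetch_page, max_pages = 500) {
  pages <- list()
  for (start in seq_len(max_pages)) {
    page <- fetch_page(start)
    # An empty (or missing) page means there are no more results
    if (is.null(page) || nrow(page) == 0) break
    pages[[length(pages) + 1]] <- page
  }
  if (length(pages) == 0) return(data.frame())
  # Stack all pages into one table
  do.call(rbind, pages)
}

# Offline stand-in for search_cis(start = ...): three pages of two rows each
fake_page <- function(start) {
  if (start > 3) return(data.frame(study = character(0), stringsAsFactors = FALSE))
  data.frame(study = paste0("S", start, c("a", "b")), stringsAsFactors = FALSE)
}

all_rows <- fetch_all_pages(fake_page)
nrow(all_rows)
```

Against the real service one would pass something like `function(p) opencis::search_cis(start = p, q = "postelectoral")`, assuming `search_cis()` returns a zero-row table once the pages run out. The `max_pages` cap is a safety net so that a server which never returns an empty page cannot cause an endless loop.
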
Searches for CIS studies using the CIS search engine.
search_cis(start = 1, q = "", from = NULL, to = NULL, sort = "relevance", catalogo = "estudio", ...)

| start | Integer. The starting page for the search results. Default is 1; increment it to retrieve further pages. |
|---|---|
| q | String. The search query. Default is an empty string. |
| from | Date or NULL. The start date for filtering results. Default is NULL. The date format must be "YYYY-MM-DD". |
| to | Date or NULL. The end date for filtering results. Default is NULL. The date format must be "YYYY-MM-DD". |
| sort | String. The sorting order for the results ("publishDate-", "publishDate+", "relevance"). Default is "relevance". |
| catalogo | String. The catalog type ("estudio", "pregunta", "serie"). Default is "estudio". |
| ... | Additional parameters (not used). |

A data.frame with the search results.

    # Search by search terms
    studies <- search_cis(q = "postelectoral")
    print(studies)

    # Narrow the search by dates
    studies <- search_cis(q = "postelectoral", from = "2011-01-01", to = "2020-01-01")
    print(studies)

    # Use the catalogo parameter to search for questions ("pregunta") or data series ("serie")
    studies <- search_cis(q = "ideologia", from = "2011-01-01", to = "2020-01-01", catalogo = "serie")
    print(studies)

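
The accepted `sort` and `catalogo` values and the "YYYY-MM-DD" date format listed above can be checked locally before any request is made. A minimal sketch using base R's `match.arg()` and `as.Date()`; `check_search_args()` is a hypothetical helper and does not claim to reflect how `search_cis()` validates its input.

```r
# Hypothetical validator, not part of opencis: checks arguments against the
# values documented for search_cis() before any request is sent.
check_search_args <- function(sort = "relevance", catalogo = "estudio",
                              from = NULL, to = NULL) {
  sort <- match.arg(sort, c("publishDate-", "publishDate+", "relevance"))
  catalogo <- match.arg(catalogo, c("estudio", "pregunta", "serie"))
  for (d in list(from, to)) {
    if (is.null(d)) next
    d <- as.character(d)
    # Require the literal YYYY-MM-DD shape and a real calendar date
    if (!grepl("^[0-9]{4}-[0-9]{2}-[0-9]{2}$", d) ||
        is.na(as.Date(d, format = "%Y-%m-%d"))) {
      stop("Dates must use the format YYYY-MM-DD, got: ", d)
    }
  }
  list(sort = sort, catalogo = catalogo, from = from, to = to)
}

ok <- check_search_args(from = "2011-01-01", to = "2020-01-01", catalogo = "serie")
ok$catalogo
```

Note that `match.arg()` also accepts unambiguous partial matches (for example "pre" for "pregunta"), which may be more lenient than the CIS search engine itself.
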