---
title: "Database diagnostics"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{a01_DatabaseDiagnostics}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  message = FALSE,
  warning = FALSE,
  fig.width = 7
)

library(CDMConnector)

# Use a temporary folder for the Eunomia data if none has been configured
if (Sys.getenv("EUNOMIA_DATA_FOLDER") == "") Sys.setenv("EUNOMIA_DATA_FOLDER" = tempdir())
if (!dir.exists(Sys.getenv("EUNOMIA_DATA_FOLDER"))) dir.create(Sys.getenv("EUNOMIA_DATA_FOLDER"))
if (!eunomia_is_available()) downloadEunomiaData(datasetName = "synpuf-1k")
```

## Introduction

In this example we use the Eunomia synthetic data (the `synpuf-1k` dataset).

```{r}
library(CDMConnector)
library(CohortConstructor)
library(CodelistGenerator)
library(PhenotypeR)
library(dplyr)
library(ggplot2)

# Connect to a local DuckDB copy of the Eunomia data
con <- DBI::dbConnect(duckdb::duckdb(),
                      CDMConnector::eunomiaDir("synpuf-1k", "5.3"))

# Create the cdm reference used throughout this vignette
cdm <- CDMConnector::cdmFromCon(con = con,
                                cdmName = "Eunomia Synpuf",
                                cdmSchema = "main",
                                writeSchema = "main",
                                achillesSchema = "main")
```

## Database diagnostics

Making analytic decisions and interpreting results for a study cohort requires an understanding of the dataset from which the cohort is derived. The `databaseDiagnostics()` function helps us better understand a data source.

To run database diagnostics we only need to pass our cdm reference to the function.

```{r}
db_diagnostics <- databaseDiagnostics(cdm)
db_diagnostics |>
  glimpse()
```

From our results we can create a table summarising the metadata for the data source.

```{r}
OmopSketch::tableOmopSnapshot(db_diagnostics)
```

We can also see a summary of individuals' observation periods. This shows whether there are individuals with multiple, non-overlapping observation periods and how long each observation period lasts on average.

```{r}
OmopSketch::tableObservationPeriod(db_diagnostics)
```
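
To make the idea of multiple observation periods per person concrete, here is a minimal sketch using plain `dplyr` on a small toy table. The table `toy_observation_period` and its values are made up for illustration (they are not taken from the Eunomia data), and this is not how `databaseDiagnostics()` computes its summary; it only illustrates the kind of quantity the table above reports.

```{r}
library(dplyr)

# Hypothetical toy stand-in for an observation_period table
# (column names follow the OMOP CDM; the values are illustrative only)
toy_observation_period <- tibble(
  person_id = c(1L, 1L, 2L, 3L),
  observation_period_start_date = as.Date(
    c("2008-01-01", "2009-06-01", "2008-03-15", "2008-01-01")
  ),
  observation_period_end_date = as.Date(
    c("2008-12-31", "2010-05-31", "2010-12-31", "2009-12-31")
  )
)

# Number of observation periods and mean duration (in days) for each person
periods_per_person <- toy_observation_period |>
  mutate(
    duration_days = as.integer(
      observation_period_end_date - observation_period_start_date
    ) + 1L
  ) |>
  group_by(person_id) |>
  summarise(
    n_periods = n(),
    mean_duration_days = mean(duration_days)
  )

# How many people have one, two, ... observation periods
periods_per_person_counts <- periods_per_person |>
  count(n_periods, name = "n_persons")

periods_per_person_counts
```

The counting steps (`group_by()`, `summarise()`, `count()`) could in principle be pointed at `cdm$observation_period` instead of the toy table, followed by `collect()`; date arithmetic on a database-backed table may need backend-specific handling, so the duration step is best treated as an assumption there.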