Using synr with real data: Coloured vowels



This is an example of how to use synr with actual data. For an in-depth tutorial which explains how synr works, including the functionality that comes up here, please see the main tutorial.

The data used here are from a study by Cuskley, Dingemanse, van Leeuwen & Kirby (2019) and are available through this GitHub repository. Note that the user (mdingemanse) who owns the repository is not involved in synr development, so please don’t send questions about synr to him or other co-authors of the article.

The data are from an experiment where participants listened to 16 different recordings of spoken vowel sounds and responded with colors they experienced/associated with the vowels. You can find much more detail in the article and repository. In this tutorial, we’ll assume that each of the recordings can be thought of as a single object, analogous to a grapheme. Note that even though synr always refers to trial stimuli as ‘graphemes’ or ‘symbols’ (in order to make the documentation less abstract), you can apply synr to other types of consistency test data. Just remember that ‘grapheme’ then really means e. g. ‘vowel sound’ or whatever other type of stimuli are presented to participants.

Load required libraries

We’ll use tidyr to reformat the raw data, explained below, and ggplot2 for producing custom plots. You should already have ggplot2 installed since synr depends on it, but may you need to run install.packages("tidyr") to get tidyr.


Download and reformat raw data

We’ll first download the raw data from GitHub.

# download the 'coloured vowels' data
githuburl <- ''
dingemanse_voweldata <- read.csv(githuburl, sep=' ')

The downloaded data are in a ‘one-row-per-stimulus’ (vocal sound) format. synr needs the data to be in either ‘long’ or ‘wide’ format, as described in the article Creating ParticipantGroup objects. Generally, ‘long’ format is preferred, so let’s reformat the data to this.

# 'pivot' the data into a long (one row per observation/trial) format,
# using tidyr's pivot_longer function (and the 'pipe' %>% operator)
cvow_long <- dingemanse_voweldata %>% 
    cols=c('color1', 'color2', 'color3',
      'timing1', 'timing2', 'timing3'),
    names_to=c(".value", "trial"),
    values_to=c('color', 'timing')
#> # A tibble: 6 × 6
#>   anonid                               setname  item trial color   timing
#>   <chr>                                <chr>   <int> <chr> <chr>    <int>
#> 1 0045dbc0-8936-4c47-b8a2-333f29f3a505 set1       12 1     #A82816  18689
#> 2 0045dbc0-8936-4c47-b8a2-333f29f3a505 set1       12 2     #B2282B   6599
#> 3 0045dbc0-8936-4c47-b8a2-333f29f3a505 set1       12 3     #BB322B   5893
#> 4 0045dbc0-8936-4c47-b8a2-333f29f3a505 set1        9 1     #3C3899  12561
#> 5 0045dbc0-8936-4c47-b8a2-333f29f3a505 set1        9 2     #F44E50   4046
#> 6 0045dbc0-8936-4c47-b8a2-333f29f3a505 set1        9 3     #EE5A3D   7648

Roll up data into a ParticipantGroup object

Note that this might take a couple of minutes, since the data set is quite large.

pg <- create_participantgroup(

We use the L*u*v color space here, since that’s what the article’s authors used.

Apply various synr functions

Single participant plot

Let’s produce a plot for the participant with ID “0086e9c0-418c-404c-8f3c-219de93cc3dc” (the raw data use anonymized ID’s, which is why they are rather unwieldy).

example_plot <- pg$participants[["0086e9c0-418c-404c-8f3c-219de93cc3dc"]]$get_plot(
# inspect the plot

Remember that the ‘grapheme’ axis here is really a ‘recording/vowel sounds’ axis.

The participant appears to have provided valid responses with a fair bit of variation. Their mean consistency score (green line) was slightly above 200, far higher than the cutoff (blue line) of 135.30 that was suggested in Rothen, Seth, Witzel & Ward (2013) for the L*u*v color space.

Of course, it’s not expected that most users will want to produce plots for all participants and look through them one by one. It can however be useful to look at a few of them to get a better feel for the data, or to inspect data more closely if something about a participant’s responses seems off.

Mean consistency scores

Here, we force a mean consistency score to be calculated even where participants have some (but not all) items with less than 3 valid color responses by specifying na.rm=TRUE. We specify that consistency scores should be calculated by using euclidean distances (synr uses euclidean distances by default, but it’s specified here for extra clarity).

mean_cons_scores <- pg$get_mean_consistency_scores( 

We’ll also put the consistency scores in a data frame where scores are linked to their respective participant ID’s:

participant_ids <- pg$get_ids()
cons_score_df <- data.frame(
#>                         participant_id mean_cons_score
#> 1 0045dbc0-8936-4c47-b8a2-333f29f3a505        85.22161
#> 2 005c48a4-acb7-4e25-8e4a-447e08a2dc85       225.60228
#> 3 0086e9c0-418c-404c-8f3c-219de93cc3dc       205.05324
#> 4 009f9049-bb4f-4dec-a889-2a408d9b7fc4        84.60556
#> 5 00ac0df1-ee81-4357-86c9-ce8691a6a4f1       171.22806
#> 6 0127d04d-0070-4bec-a004-2e658edac121       227.25244

Number of graphemes with all-valid responses

Let’s add information for each participant about the number of graphemes that they provided all-valid (3 non-NA, ie they correctly chose a color) responses for.

num_valid <- pg$get_numbers_all_colored_graphemes()

cons_score_df[['num_allvalid_sounds']] <- num_valid
#>                         participant_id mean_cons_score num_allvalid_sounds
#> 1 0045dbc0-8936-4c47-b8a2-333f29f3a505        85.22161                  16
#> 2 005c48a4-acb7-4e25-8e4a-447e08a2dc85       225.60228                  16
#> 3 0086e9c0-418c-404c-8f3c-219de93cc3dc       205.05324                  16
#> 4 009f9049-bb4f-4dec-a889-2a408d9b7fc4        84.60556                  16
#> 5 00ac0df1-ee81-4357-86c9-ce8691a6a4f1       171.22806                  16
#> 6 0127d04d-0070-4bec-a004-2e658edac121       227.25244                  16

We might want to only include participants who had a minimum of 8 (50%) stimuli with all-valid responses, based on what the original article says:

A total of 34 participants were removed from the sample because they chose “No color” for more than half of the items in the vowel association task, making it impossible to calculate a valid vowel consistency score.

print(paste('number of participants before filtering:', nrow(cons_score_df)))
#> [1] "number of participants before filtering: 1164"

enough_responses_filter <- cons_score_df$num_allvalid_sounds >= 8
cons_score_df <- cons_score_df[enough_responses_filter, ]

print(paste('number of participants after filtering:', nrow(cons_score_df)))
#> [1] "number of participants after filtering: 1105"

It’s not clear why there are seemingly more participants which fit the exclusion criteria in the data from GitHub compared to what is reported in the original article. (if you notice an error in this article, please write about it to )

Validate participant data

synr includes a unique procedure for automatically classifying test data as valid or invalid, as explained in a specific synr article. Cuskley et al.’s article itself doesn’t mention any validation procedure apart from excluding participants with too many “No color” responses, so we might try to apply very lenient criteria in order to only catch obviously invalid data, such as participants having used very similar colors (eg slightly different shades of black) for more than 80% of all their responses. Please see the validation article for details about what’s happening here.

validation_df <- pg$check_valid_get_twcv_scores(
  min_complete_graphemes = 8,
  dbscan_eps = 20,
  dbscan_min_pts = 4,
  max_var_tight_cluster = 150,
  max_prop_single_tight_cluster = 0.8,
  safe_num_clusters = 2,
  safe_twcv = 300

# again, use filter to only keep data from participants with
# >=8 all-valid responses graphemes
validation_df <- validation_df[enough_responses_filter, ]

cons_score_df[['data_valid']] <- validation_df$valid
cons_score_df[['reason_invalid']] <- validation_df$reason_invalid

# show only participants whose data were classified as invalid,
# and only relevant columns
cons_score_df[!cons_score_df$data_valid, c("participant_id", "mean_cons_score", "reason_invalid")]
#>                            participant_id mean_cons_score        reason_invalid
#> 11   01e8370f-22a2-434a-b86b-ebfc65f28327     63.30219869 few_clusters_low_twcv
#> 26   06644552-815c-47ad-a1a4-d774ac61f1e8     38.09724961 hi_prop_tight_cluster
#> 37   0b40b649-8cd0-4252-bcad-04b607694eb9     36.04509901 hi_prop_tight_cluster
#> 100  1fcd6ef5-057d-4655-a314-17aee2cbac99     25.20097383 hi_prop_tight_cluster
#> 134  28f1a586-d2c5-4c44-ba33-60e59768847c     59.66737827 few_clusters_low_twcv
#> 288  58156a1e-2633-4435-b404-750ec370bd51     36.42790210 hi_prop_tight_cluster
#> 408  7ae98022-d5d8-460e-a1c3-68b43487e191     16.41104298 hi_prop_tight_cluster
#> 554  ad9219a1-9566-4654-ba28-ac277175a347     36.19079247 hi_prop_tight_cluster
#> 606  bc3ed730-ceed-44f0-97ac-8bf7fe1b8615     39.27415198 hi_prop_tight_cluster
#> 645  ca108b61-6b84-4e1e-ab31-4c111ff885a6     27.54943999 hi_prop_tight_cluster
#> 674  d47c0e32-e3e2-4acf-84d0-08bf7375308b     44.21594367 hi_prop_tight_cluster
#> 717  de916a2b-5041-4ecf-af86-4e11feaa5d4c      0.03445631 hi_prop_tight_cluster
#> 931  5b84fcd8-7982-46bd-9f13-1771a4f2b6df     44.28342864 hi_prop_tight_cluster
#> 935  660ed2f1-7631-4c97-a46e-f9481d37bc4b     56.10827516 hi_prop_tight_cluster
#> 992  e8dd5ab3-8d3d-4137-9fd4-3b6fdf42cb97     69.07712926 few_clusters_low_twcv
#> 1104 43dd077f-b955-4ba0-8ebb-a442a4ad913f     34.78898926 hi_prop_tight_cluster
#> 1154 2b20c245-5ca1-4a1e-9b71-daa17ef445e1     46.99291493 few_clusters_low_twcv

All of the identified data sets that were classified by synr as invalid appear to have 80% or more of all responses be in roughly the same color. One of the participants, ‘de916a2b-5041-4ecf-af86-4e11feaa5d4c’, very clearly has invalid data even based on just their consistency score (0.034), as it is extremely low (due to having responded with black on all trials). What might be more interesting is to take a closer look at the participant with the highest consistency score (44.2), as their case might be somewhat less clear-cut.


The participant has actually used what we might consider to be different colors, ie hues. Note however that their consistency score (44.2) means that they would be considered a synesthete, even though there is actually very little consistency in what hues are used for the different sounds, due to all colors being very light. For instance, the participant used a ‘greyish’, a ‘brownish’ and a ‘bluish’ color for the 12th grapheme/sound, but even the consistency score for this particular grapheme is well below the cutoff of 135.3. This means that even though one could argue that responses have fairly different hues, at least in the context of consistency tests as they are currently commonly used and evaluated, synr is correct in classifying the participant’s data as invalid, ie having too little color variation.

It appears then that if synr were to be used in this manner, some misclassifications of participants as synesthetes could be prevented.

Use synr-generated data with other libraries

Once data have been exported to a data frame, they can be used with other libraries as usual.

For example, we might be interested in the distribution of mean consistency scores.

ggplot(cons_score_df, aes(x=mean_cons_score)) +  
  geom_density() +
  geom_vline(xintercept = 135.3, color="blue") +
  labs(x='Mean consistency score', y='Probability density')

ggplot(cons_score_df, aes(x=mean_cons_score)) +
  stat_ecdf(geom = "step") +
  geom_vline(xintercept = 135.3, color="blue") +
  labs(x='Mean consistency score', y='Cumulative proportion of participants')

It appears that about 30% of all participants scored below the suggested cut-off of 135, as already reported by Cuskley et al.


Much of the ‘results’ included here can already be found in the article by Cuskley et al. However, this article has hopefully given you an idea of how synr can be used with different types of data, and what role it fulfills.


Cuskley, C.1, Dingemanse, M.1, van Leeuwen, T. & Kirby, S. 2019. Cross-modal associations and synaesthesia: Categorical perception and structure in vowel-colour mappings in a large online sample. Behaviour Research Methods, doi: 10.3758/s13428-019-01203-7

Rothen, N., Seth, A. K., Witzel, C., & Ward, J. (2013). Diagnosing synaesthesia with online colour pickers: maximising sensitivity and specificity. Journal of neuroscience methods, 215(1), 156-160.