Oralstats. Code and brief tutorial

Author

Adrián Cabedo Nebot Grupo Val.Es.Co Universitat de València (adrian.cabedo@uv.es)

“Get to the choppa!” (Major Alan “Dutch” Schaefer character, Predator movie [1987])

2 Structure and functionality of Oralstats

Oralstats (Cabedo, 2021) has been developed as an exploration environment combining information from speech transcripts with pitch and intensity data obtained from Praat.

Note

(a) It creates three data frames (phones, words (POS tags included) and intonational phrases),

(b) provides the prosodic information for all of them, like pitch range, mean and median, intensity mean, or duration, for intonational phrases, and

(c) assigns a ToBI tag and MAS pattern tag. Further data frames can be generated with a slight modification of code (for instance, for wider units, like intonational clauses or paratones).

3 License

Oralstats code is licensed under GNU General Public License v3.0. Permissions of this strong copyleft license are conditioned on making available complete source code of licensed works and modifications, which include larger works using a licensed work, under the same license. Copyright and license notices must be preserved. Contributors provide an express grant of patent rights.

This document uses a CC BY 4.0 license:

Oralstats. Code and brief tutorial© 2024 by Adrián Cabedo Nebotis licensed under CC BY 4.0

Developed with R and R Studio.

4 The idea

I must give an special credit to Radiant, Business analytics using R and Shiny (Vnjis 2016). Although it is focused mainly on Business data mining, this tool was for me a deeply inspirational idea: in my case, it developed the idea of analyzing data online/offline, using Shiny, in a dynamic mode, adding and/or filtering this data to observe linguistic patterns from speech transcriptions.

Other inspiration for me was all the work done by Davies the last twenty years (Davies 2021, 2005, 2009, 2010; Davies and Kim 2019; Davies 2012, among many others) on the construction of several linguistic corpora; specially, I got really interested about joining data with SQL instances. This is really a good approach to structured data, as it could be the one coming from oral linguistic data, in which phonemes relate to words and these last two relate to utterances and all of them relate to speakers and so on.

Finally, about the ideas that have been key to develop Oralstats, I want to express my gratitude to web platforms like Spokes (http://pelcra.clarin-pl.eu/SpokesBNC/) and to software tools like the ones developed by Laurence Anthony (https://www.laurenceanthony.net/). Also, it wouldn’t have been possible to make speech and phonic analysis without the aid of tools like ELAN (https://archive.mpi.nl/tla/elan) and PRAAT (Boersma and Weenink 2022) (https://www.fon.hum.uva.nl/praat/); also in PRAAT, one of the most inspirational script was “analyse_tier.praat”, developed by Daniel Hirst..

5 Before beginning

To start using Oralstats for processing and analyzing speech data, you’ll need some files. Essentially, you’ll require a transcription file with alignment times (from Textgrid or TXT tabbed files) and an audio file.

5.1 Transcription

The transcription of the audio can be done manually or using other automatic transcription tools. Currently, when working primarily with non-overlapping genres, Whisper (large V2 or V3) performs admirably. However, it may miss some potentially interesting phenomena for linguists, such as question tags or paralinguistic nuances like laughing or coughing.

Adapting Whisper output to a TextGrid file can be somewhat challenging, but there are options available that can automate this process. For instance, Whisper transcriptions can be imported into ELAN files, from which they can then be exported into TextGrid files.

One of the primary issues with Whisper transcriptions is that they do not separate speakers. This task may need to be done manually.

5.2 Alignment

At present, Oralstats requires data to be structured in three aligned layers: intonational phrases (optional), words (required), and phones (required). These layers must be synchronized with the audio. Currently, there are three methods available for performing this alignment, which are outlined in the following sections.

5.2.1 PRAAT alignment by hand

Aligning long files such as interviews or conversations manually, annotating and aligning words and phonemes one by one, is not feasible and would be an unattainable task.

5.2.2 PRAAT using interval alignment

While the process is relatively straightforward, it is time-consuming. However, experienced users of PRAAT will appreciate being able to perform the alignment within their familiar environment. As of the time of writing this document, PRAAT alignment cannot be automated through scripting and can only be carried out within the TextGrid editor.

5.2.3 Using Forced Montreal Alignment

Learning curve: a bit steep. However, the benefits are truly remarkable. It’s worth noting that languages like Catalan are still not fully supported by native acoustic models, which may pose some challenges for users seeking to work with such languages.

5.2.4 SPPAS

SPPAS, a scientific computer software package, offers automated speech annotation and analysis. Continuously evolving, SPPAS aims to deliver a robust and dependable tool for researchers. It is freely available with open-source code. Whether you require automatic speech annotation, analysis of annotated data, or conversion of file formats, SPPAS simplifies the process.

5.2.5 Using Webmaus

The process is identical to Forced Montreal Alignment and is supported by the CLARIN infrastructure. However, it’s important to note that the results may not be as accurate as those obtained from Montreal alignment.

5.3 Google colab script

To automatically generate the necessary files to work with Oralstats, we have established a basic pipeline. This pipeline begins with an audio file and proceeds to generate a transcription using Whisper (attempting to run it directly in your terminal, although success is not guaranteed). Subsequently, this output will be aligned both by word and by phone using Montreal forced alignment.

The resulting files will be TextGrid files, which will not be separated by speaker. This task will need to be carried out manually. However, if your recording consists of a monologue, this won’t cause any problem.

6 Libraries

Libraries in R, also known as packages, are collections of functions, data sets, and other tools designed to extend the capabilities of R programming language.

library(phonfieldwork)
library(tidyverse)
library(udpipe)
library(data.table)
library(pool)
library(tidytext)
library(duckdb)
library(patchwork)
library(DataExplorer)
library(party)
library(ggparty)
library(FactoMineR)
library(factoextra)
library(gridExtra)

7 Where do your dataset come from?

7.1 From duckdB database

This option considers that there has been done a previous converting process with Oralstats; so, that there are already data frames created for intonational phrases, words, phones and turns. Frequently, the next step of using this method is to access directly Statistics (Section 10) section or visualization with Oralstats_Viewer (Section 11). Uncomment the following lines if it is the case that you have already created a database (and do not execute importing from TextGrid or TXT section):

# mydb <- dbConnect(drv = duckdb(),  "oralstats.duckdb")
# words <- mydb%>% tbl("oralstats_words")%>%collect()
# ips <- mydb%>% tbl("oralstats_ips")%>%collect()
# phones <- mydb%>% tbl("oralstats_phones")%>%collect()
# dbDisconnect(mydb, shutdown = TRUE)

7.2 Transcriptions from TextGrid

This method relies on an external library (phonfieldwork). Please note that you may need to make slight modifications to the code below for speaker variable definition. Currently, Oralstats expects the following labels for each tier:

1.  "Speaker - utterance"
2.  "Speaker - word"
3.  "Speaker - phone"

For example, if you’re working with a TextGrid file from the Catalan folder containing an interview with Rosalia, you might encounter tier names such as “R - word” and “R - phone”. Since this TextGrid originates from WebMaus, there may not be an “utterance” tier.

# dirtextgrids <- "textgrid/spanish"
# # dirtextgrids <- "textgrid/catalan"
# 
# textgrids <- list.files(dirtextgrids,full.names = TRUE,pattern = ".*textgrid|.*TextGrid")
# data <- map2_df(basename(textgrids),textgrids, ~ textgrid_to_df(.y)%>%mutate(filename=.x))
# data <-data %>% filter(content != "",grepl("(utterances|word|phone)", tier_name))
# data <-data %>% mutate(filename = gsub(".textgrid", "", filename),
# filename = gsub(".TextGrid", "", filename)) %>% select(-source, -id, -tier)  %>% mutate(spk = gsub(" -.*", "", tier_name), spk = paste(filename, spk, sep = "_"))
# data <- data %>% na.omit()
# rm(dirtextgrids)
# rm(textgrids)

7.3 Transcriptions from TXT

Uncomment the following lines if your data does not come from a TextGrid file, but you have a .txt file with these columns separated by a tab character. You can download these files from PRAAT converting TextGrid to table and then downloaded it as a tab separated text. This action will download a UTF-16 txt file that should be converted to UTF-8 format (it can be easily done with a text editor like Sublime Text (MAC) or Notepad++(Windows):

tmin tier text tmax
dirtextgrids <- "textgrid/spanish"
txts <- list.files(dirtextgrids,full.names = TRUE,pattern = ".*txt")
data <- map2_df(basename(txts),txts, ~fread(.y,sep = "\t",header =  TRUE)%>% mutate(filename = .x))
data <- data %>% filter(text != "",grepl("(utterances|word|phone)", tier))%>%rename(content=text)
data <- data  %>% mutate(filename = gsub("TXT", "", filename),
filename = gsub(".txt", "", filename)) %>%rename(tier_name=tier,time_start=tmin,time_end=tmax)%>% mutate(spk = gsub(" -.*", "", tier_name), spk = paste(filename, spk, sep = "_"))%>%relocate(time_end,.after = time_start)%>%relocate(tier_name,.after = content)
data <- data %>% na.omit()

7.4 Prosody files

7.4.1 Get pitch tier and intensity tier files from PRAAT Script

This section will create pitch and intensity files for each WAV file uploaded in a specified folder. The Speakr library (Coretta 2024) is used to connect R with PRAAT, but it is not obligatory to use it that way. You can find the corresponding PRAAT script in the ‘scripts_PRAAT’ folder and run it from there. By default, this chunk won’t be executed unless you change ‘eval’ from false to true.

library(speakr)

script <- system.file("extdata", "scripts_PRAAT/create_pitchtiers_intensitytiers.praat", package = "speakr")
pitchl <- "prosody_files/pitch"
pitchinterpolate <- "prosody_files/pitch"
intensity <- "prosody_files/pitch"
dir.create(file.path(pitch), showWarnings = FALSE)
dir.create(file.path(pitchinterpolate), showWarnings = FALSE)
dir.create(file.path(intensity), showWarnings = FALSE)
# setwd(file.path(mainDir, subDir))
start_praat()
praat_run(script = "scripts_PRAAT/create_pitchtiers_intensitytiers.praat")

old_files_interpolate <- list.files("prosody_files/pitchinterpolate/", pattern = "*.txt")
new_files_interpolate <- gsub("pitchinterpolate","",old_files_interpolate)
new_files_interpolate  <- gsub("\\.wav","",new_files_interpolate )

file.copy(from = paste0("prosody_files/pitchinterpolate/",old_files_interpolate), to = paste0("prosody_files/pitchinterpolate/",new_files_interpolate ))

old_files_pitch <- list.files("prosody_files/pitch/", pattern = "*.txt")
new_files_pitch <- gsub("pitch","",old_files_pitch)
new_files_pitch  <- gsub("\\.wav","",new_files_pitch )

file.copy(from = paste0("prosody_files/pitch/",old_files_pitch), to = paste0("prosody_files/pitch/",new_files_pitch ))

old_files_intensity <- list.files("prosody_files/intensity/", pattern = "*.txt")
new_files_intensity <- gsub("intensity","",old_files_intensity)
new_files_intensity  <- gsub("\\.wav","",new_files_intensity )

file.copy(from = paste0("prosody_files/intensity/",old_files_intensity), to = paste0("prosody_files/intensity/",new_files_intensity ))

7.4.2 Pitch

In this section, pitch information from PitchTier files is imported into the same dataframe:

# Define pitch or pitchinterpolate (uncomment the line below if it is the case)

dirpitch <- "prosody_files/pitch"
# dirpitch <- "pitchinterpolate"

pitch_list <- list.files(dirpitch,full.names = TRUE,pattern = ".*txt|.*TXT")
pitch <- map2_df(basename(pitch_list),pitch_list,~ fread(.y) %>% mutate(filename = .x))
pitch <- pitch %>% rename(time = V1, pitch = V2) %>% mutate(
            time = round(time, 2),
            time_ms = round(time, 2) * 1000,
            filename = gsub("\\.txt", "", filename),pitch=round(pitch,1))
pitch <- pitch%>%mutate(pitch_st= 12*log2(pitch/1))

7.4.3 Intensity

In this section, intensity information from IntensityTier files is imported into the same dataframe:

intensity_list <- list.files("prosody_files/intensity",full.names = TRUE,pattern = ".*txt|.*TXT")
intensity <- map2_df(basename(intensity_list),intensity_list, ~ fread(.y) %>% mutate(filename = .x)) %>% rename(time = `Time (s)`, intensity = `Intensity (dB)`) %>% mutate(time = round(time, 2),time_ms = round(time, 2) * 1000, filename = gsub("\\.txt", "", filename),intensity=round(intensity,1))%>%select(-rowLabel)

7.4.4 Prosody combined

In this section, we pair each pitch point with its corresponding intensity point.

prosody <- pitch[intensity, on = c('filename', 'time'), nomatch = 0]
prosody[, c("filename","time", "time_ms","pitch","intensity")]
prosody[order(filename,time), id_prosody := paste(filename,time_ms,sep="_")]
prosody[,i.time_ms := NULL]

7.4.5 Interpolate

By applying this section, missing pitch and intensity points can be interpolated. Although this process is an approximation for the data, it enables the generation of more precise units and processing parts for melodic analysis of speech and TOBI annotation.

interpolate_by_filename <- function(data) {
new_times <- seq(min(data$time), max(data$time), by=0.01)
# Linear interpolation for pitch using the 'approx' function
interpolated_pitch <- approx(data$time, data$pitch, xout = new_times, method = "linear")
# Linear interpolation for intensity using the 'approx' function
interpolated_intensity <- approx(data$time, data$intensity, xout = new_times, method = "linear")
# Create a new data frame for the interpolated values
interpolated_df <- data.frame(filename = unique(data$filename),time = rep(new_times, each = length(unique(data$filename))), pitch = rep(interpolated_pitch$y, each = length(unique(data$filename))), intensity = rep(interpolated_intensity$y, each = length(unique(data$filename)))) 
      }
# Group by filename and apply the interpolation function
interpolated <- prosody %>% group_split(filename) %>%
map_df(interpolate_by_filename)

prosody <- interpolated%>%mutate(time_ms= time*1000,id_prosody= paste(filename,time_ms,sep="_"),pitch=round(pitch,1),intensity=round(intensity,1))
interpolated <- NULL
pitch <- NULL
intensity <- NULL
rm(dirpitch,intensity,intensity_list,interpolated,pitch,pitch_list,textgrids, interpolate_by_filename, intensitydir,pitchinterpolate,pitchoriginal,script)  

8 Speech units

The speech units covered by Oralstats range from smaller ones, such as phones, to broader ones, such as speech turns.

Speech units
  1. Words. Words aligned to timeline.

  2. Phones. Phones aligned to timeline.

  3. Ips (intonational phrases). Basically, combination of words consecutive between silences of circa 150 milliseconds.

  4. Speech turns. Combination of ips defined by change of speaker.

8.1 Define language

As of now, Oralstats attempts to recognize the tonic structure or accent of languages such as Spanish or Catalan. If you are using data from another language, you can attempt to modify the computational rules below, or alternatively, add them manually (although this process is more time-consuming):

language <- "spanish"
# language <- "catalan"

8.2 Words

First of all, Oralstats will filter a word data frame from the general input acquired from Section 1.

words <-    data %>% filter(grepl("word", tier_name)) %>% mutate(id_word = paste(spk, "word", time_start, time_end, sep ="_"),time_start_word=time_start,time_end_word=time_end,word=content)

8.2.1 Assign accent

Words need to have accents. At this point, Oralstats automatically assigns an accent to each word, considering two possible options: Catalan or Spanish. If you are using Oralstats for another language, you can configure specific rules here for assigning accent structure to the word (oxytone, paroxytone, proparoxytone), or you can perform this task using a lexicon.

setDT(words)

if(language=="spanish"){

words[, vowels_structure := gsub("[^áéíóúaeiouÁÉÍÓÚAEIOUÜü]", "", word)]
words[,accent := ifelse(grepl("[aeiou][^aeiouáéíóúns]$", word) & !grepl("[áéíóú]", vowels_structure), "oxitone",NA)]
words[,accent := ifelse(grepl("[áéíóú][ns]$", word),"oxitone", accent)]
words[,accent := ifelse(grepl("[áéíóú]$", word),"oxitone", accent)]
words[,accent := ifelse(grepl("[aeons]$", word) & !grepl("[áéíóú]", vowels_structure),"paroxitone",accent)]
words[,accent :=  ifelse(grepl("[áéíóú][aeiou][aeiou]", vowels_structure),"proparoxitone",accent)]
words[,accent :=  ifelse(vowels_structure %in% c("a", "e", "u", "o", "i"),"oxitone", accent)]
words[,accent :=  ifelse(vowels_structure == "í" |vowels_structure == "ú","oxitone", accent)]
words[,accent :=  ifelse(
grepl("ía$", vowels_structure) |
grepl("íe$", vowels_structure) |
grepl("ío$", vowels_structure) |
grepl("úa$", vowels_structure) | grepl("úe$", vowels_structure) | grepl("úo$", vowels_structure), "paroxitone",accent)]
words[,accent := ifelse(word %in% c( "tu", "pero", "desde", "nuestro", "vuestro", "aun", "medio", "tan", "mi", "su", "me", "te", "se", "el", "la", "los", "las", "que", "a", "ante", "con", "de", "pues", "por", "para", "desde", "en", "al", "san", "porque", "y", "o", "u", "donde", "cuando", "como", "entre", "del"),
            "non_tonic",
            accent
)]} 

if(language=="catalan")
  
{
setDT(words) 
words[, vowels_structure := gsub("[^àáèéíòóúaeiouÀÁÈÉÍÒÓÚAEIOUÜü]", "", word)]
words[,accent := ifelse(grepl("(à|é|è|í|ó|ò|ú|én|èn|ín|às|és|ès|ís|ós|òs|ús)$", word), "oxitone",NA)]
words[,accent := ifelse(grepl("(a|e|i|o|u|en|in|as|es|is|os|us|ia)$",word) & !grepl("[àéèíòóú]",vowels_structure), "paroxitone",accent)]
words[,accent := ifelse(grepl("(à|é|è|í|ó|ò|í|ú)[aeiou]$",vowels_structure) , "paroxitone",accent)]
words[,accent := ifelse(grepl("(à|é|è|í|ó|ò|í|ú)[aeiou][aeiou]$", vowels_structure), "proparoxitone",accent)]
words[,accent := ifelse(vowels_structure %in% c("a","e","i","o","u","à","è","ò","é","í","ó","ú"),"oxitone",accent)]
words[,accent := ifelse(word %in% c("A","a","El","el","la","La","Els","els","com","per","amb","Amb","em","al","quan","doncs","Per","del","Doncs","ets","se'm","tan","Em","LA","dels","I","i"),"non_tonic",accent)]
words[,accent := ifelse(is.na(accent),"oxitone",accent)]
  
  
} 

setDF(words)
words <- words%>%select(-word)
# From lexicon. It must have to columns (word and accent).
# words <- words %>% left_join(lexicon,by="word")
setDT(words)

8.2.2 Assign ip id and ip text

In this section, and from previous words data frame, it will compute how many words come together before a pause of at least 0.15 (this treshold can be modified in the code below). This operation will create intonational phrases along with an specific id for them.

# Ips_coming_from <- "transcriptions or textgrids"

Ips_coming_from <- "words"

if(Ips_coming_from!="words")
  
{

ips <-    data %>% filter(grepl("utterance", tier_name) | grepl("_ip_", tier_name)) %>% mutate(id_ip = paste(spk, "ip", time_start, time_end, sep ="_"),time_start_ip=time_start,time_end_ip=time_end,ip=content)
words <- words %>%arrange(filename,time_start)%>% mutate(pause = lead(time_start) - time_end)
  
setDT(words) # make a data.table
setDT(ips) # make a data.table
# setDT(ips)
setkey(ips, filename,spk,time_start, time_end)
setkey(words, filename,spk,time_start, time_end)
# setkey(ips, filename, spk,time_start, time_end)
words_1 <- foverlaps(ips, words, nomatch = NA)%>%select(-i.time_start,-i.time_end,-i.tier_name)
words <- words_1 %>%group_by(id_ip)%>%mutate(token_id=row_number(),token_id=as.character(token_id))%>%ungroup()%>%arrange(filename,time_start) 
rm(words_1)

}

if(Ips_coming_from=="words")

{
words <- words %>%arrange(filename,time_start)%>% mutate(pause = lead(time_start) - time_end,changeip = ifelse(pause > 0.15 | lead(filename)!=filename, "change", NA),changeturn = ifelse(lead(spk)!=spk|lead(filename)!=filename, "change", NA)) %>% group_by(filename, spk, changeip) %>%
          mutate(id_ip = row_number(),
                 id_ip = ifelse(
                   changeip != "change",
                   NA,
                   paste(spk,time_start, time_end, id_ip, sep = "_")
                 )) %>%ungroup()%>%fill(id_ip, .direction = "up")%>%group_by(id_ip)%>%mutate(token_id=row_number(),token_id=as.character(token_id))%>%ungroup()%>%arrange(filename,time_start) %>% group_by(filename, id_ip) %>% mutate(ip = paste(content, collapse = " ")) %>%ungroup()%>%group_by(filename,id_ip)%>%mutate(id_ip= paste(spk,first(time_start), last(time_end), sep = "_"),time_start_ip=first(time_start),time_end_ip=last(time_end))%>%ungroup()}
rm(Ips_coming_from)

8.2.3 Assign turn id and turn text

In this section, turns will be taken directly from words. It computes the quantity of words uttered before a change of speaker. If data does not come from a dialogue, there will be no turns, as it will be a continuous monologue (that will be computed at any case).

words <- words %>% group_by(filename, spk, changeturn) %>%
          mutate(id_turn = row_number(),
                 id_turn = ifelse(
                   changeturn != "change",
                   NA,
                   paste(spk, time_start, time_end, id_turn, sep = "_")
                 ))%>%ungroup()%>%fill(id_turn, .direction = "up")%>%group_by(id_turn)%>%ungroup()%>%arrange(filename,time_start) %>% group_by(filename, id_turn) %>% mutate(pause_cod=ifelse(changeip=="change","/ ",""),pause_cod=ifelse(is.na(pause_cod),"",pause_cod),turn = paste(paste(content,pause_cod,sep=""), collapse = " ")) %>%ungroup()%>%group_by(filename,id_turn)%>%mutate(id_turn= paste(spk,first(time_start), last(time_end), sep = "_"))%>%ungroup()%>%arrange(filename,time_start)

8.2.4 Morphosyntactic tagging

8.2.4.1 Charge UdPipe model and select language

To automatically tag the words, Oralstats uses UdPipe (Wijffels 2023):

modeldownl <- udpipe_download_model(language = language,model_dir = "udpipe/")
spmodel <- udpipe_load_model(modeldownl)

8.2.4.2 POS tag

ips <- words%>%group_by(id_ip,ip)%>%summarise(ip=first(ip))%>%ungroup()
tokenized_udpipe <- udpipe_annotate(x=ips$ip,object = spmodel, tokenizer = "horizontal",tagger = "default", doc_id=ips$id_ip)%>%as.data.frame()%>%rename(id_ip=doc_id)
setDT(tokenized_udpipe)
setDT(words)
words <- words[tokenized_udpipe[,c("id_ip","upos","lemma","token_id")], on = c('id_ip',"token_id"), nomatch = 0]
words<- words%>%mutate(accent=ifelse(upos%in%c("ADP","CCONJ"),"non_tonic",accent),accent=ifelse(content=="según","oxitone",accent),accent=ifelse(content%in%c("lo","los","si","le","les","mis","mi"),"non_tonic",accent),first_word=ifelse(token_id==1,"yes","no"))%>%group_by(id_ip)%>%mutate(toneme_word = ifelse(id_word==last(id_word)&accent!="non_tonic","yes","no"))%>%ungroup()

words_non_tonic_first <- words%>%filter(accent=="non_tonic") %>%group_by(id_ip)%>%mutate(first_non_tonic_ip=ifelse(id_word ==first(id_word)&token_id==1,"yes","no"))%>%ungroup()%>%select(id_word,first_non_tonic_ip)

words_tonic_first <- words%>%filter(accent!="non_tonic") %>%group_by(id_ip)%>%mutate(first_tonic_ip=ifelse(id_word ==first(id_word)&token_id==1,"yes","no"))%>%ungroup()%>%select(id_word,first_tonic_ip)

words_tonic_non_first <- words%>%filter(accent!="non_tonic") %>%group_by(id_ip)%>%mutate(tonic_ip_no_first=ifelse(id_word ==first(id_word)&token_id>1,"yes","no"))%>%ungroup()%>%select(id_word,tonic_ip_no_first)

words <- words%>%left_join(words_non_tonic_first,by="id_word")
words <- words%>%left_join(words_tonic_first,by="id_word")
words <- words%>%left_join(words_tonic_non_first,by="id_word")

rm(spmodel,tokenized_udpipe,modeldownl,words_non_tonic_first,words_tonic_first,words_tonic_non_first)

8.2.5 Sentiment tagging

In this section, a very basic sentiment annotation derived from a lexicon will be added to the word structure. Currently, there is no application of quantifiers.

sentiment <- read.csv("sentiment/sentiment.csv",header = TRUE,sep=",")
# sentiment <- read.csv("sentiment/sentiment_cat.csv",header = TRUE,sep=",")
sentiment <- sentiment %>%distinct(content,.keep_all = T)
setDT(sentiment)

words <- merge(words, sentiment,by = "content", all.x = T)
words <- words%>%arrange(filename,time_start)
rm(sentiment)

8.3 Words with proximities

With corpus approach, although not strictly covered in this version of Oralstats, proximity words are included in the general words data frame.

setDF(words)
words <- words%>%group_by(id_ip)%>%mutate(word=content,
  wordleft2 = lag(word,2),
  wordleft = lag(word,1),
  wordright = lead(word,1),
  wordright2 = lead(word,2),
  uposleft2 = lag(upos,2),
  uposleft = lag(upos,1),
  uposright = lead(upos,1),
  uposright2 = lead(upos,2)
  
)%>%ungroup()

8.4 Vowels

phones <- data %>% filter(grepl("phone", tier_name)) %>% mutate(id_phone = paste(spk, "phone", time_start*1000, time_end*1000, sep ="_"),category = ifelse(grepl("[aeiou@EO]", content),"vowel", ifelse(grepl("[jw]", content), "glide", "consonant")),category=ifelse(grepl("[nlñ]",content),"vowel",category),time_start = time_start + 0.001, time_end = time_end - 0.001)

# words<-words%>%mutate(time_start_word = time_start, time_end_word = time_end,pause=lead(time_start)-time_end)
# ips <- ips%>% mutate(time_start_ip = time_start, time_end_ip = time_end)

8.4.1 Combine with words

setDT(words) # make a data.table
setDT(phones) # make a data.table
# setDT(ips)
setkey(words, filename,spk,time_start, time_end)
setkey(phones, filename,spk,time_start, time_end)
# setkey(ips, filename, spk,time_start, time_end)
phones_1 <- foverlaps(words, phones, nomatch = NA)%>%select(-i.time_start,-i.time_end,-i.tier_name)
# setkey(phones_1, filename, spk, time_start, time_end)
# phones_2 <- foverlaps(ips,phones_1,nomatch=NA)%>%mutate_if(is.numeric,round,2)%>%select(-i.time_start,-i.time_end,-i.tier_name)%>%rename(ip=i.content)
  
vowels <- phones_1 %>% filter(category=="vowel",grepl("[aeiou@EOIAU]",content)|(grepl("[nlñ]",content) & lead(id_word)!=id_word) )%>%arrange(filename,time_start)

phones <- phones_1
rm(phones_1)

8.4.2 Combine vowels and prosody

setDT(vowels) # make a data.table
setDT(prosody) # make a data.table
prosody <- prosody[, dummy := time]
prosody <- prosody[order(filename,time)]  # sorting by time so I can choose first match
setkey(vowels, filename, time_start, time_end)  # setting keys tells data.table what to join on
setkey(prosody, filename, time, dummy) # setting keys tells data.table what to join on
summary <- foverlaps(vowels, prosody, nomatch = NA)[, dummy := NULL]
summary <- summary[, dur := time_end - time_start]
summary <- summary[,quarter := ifelse(time <= (time_start + (0.25 * dur)), "q1", NA)]
summary <- summary[,quarter := ifelse(time >= (time_start + (0.25 * dur)) & time <= (time_start + (0.5 * dur)),"q2",quarter)]
summary <- summary[,quarter := ifelse(time >= (time_start + (0.5 * dur)) &
time <= (time_start + (0.75 * dur)),"q3",quarter)]
summary <- summary[,quarter := ifelse(time >= (time_start + (0.75 * dur)) &
time <= (time_start + (dur)), "q4",quarter)]
q1alof <- summary[quarter == "q1", .(q1piHz = mean(pitch, na.rm = TRUE),q1piHz_time = mean(time, na.rm = TRUE)),by=(id_phone)]
q2alof <- summary[quarter == "q2", .(q2piHz = mean(pitch, na.rm = TRUE),q2piHz_time = mean(time, na.rm = TRUE)),by=(id_phone)]
q3alof <- summary[quarter == "q3", .(q3piHz = mean(pitch, na.rm = TRUE),q3piHz_time = mean(time, na.rm = TRUE)),by=(id_phone)]
q4alof <- summary[quarter == "q4", .(q4piHz = mean(pitch, na.rm = TRUE),q4piHz_time = mean(time, na.rm = TRUE)),by=(id_phone)]

data2 = summary[, .(pitch_mean_HZ = mean(pitch, na.rm = TRUE),IQR_Hz = IQR(pitch, na.rm = TRUE), intensity_mean_dB = mean(intensity, na.rm = TRUE),IQR_dB = IQR(intensity, na.rm = TRUE)), by=.(id_phone,id_ip,id_word,filename,spk,content,word,ip,first_tonic_ip,first_non_tonic_ip,tonic_ip_no_first,time_start,time_end,time_start_word,time_end_word,time_start_ip,time_end_ip,toneme_word,category,accent,upos,lemma,sentiment)]%>%mutate_if(is.numeric,round,2)
setDF(data2)
data2 <- data2%>%left_join(q1alof,by="id_phone")
data2 <- data2%>%left_join(q2alof,by="id_phone")
data2 <- data2%>%left_join(q3alof,by="id_phone")
data2 <- data2%>%left_join(q4alof,by="id_phone")

data2 <- data2%>%group_by(id_ip)%>%mutate(pitch_mean_HZ=ifelse(is.na(pitch_mean_HZ),mean(pitch_mean_HZ,na.rm = TRUE),pitch_mean_HZ),q1piHz=ifelse(is.na(q1piHz),mean(q1piHz,na.rm = TRUE),q1piHz),q2piHz=ifelse(is.na(q2piHz),mean(q2piHz,na.rm = TRUE),q2piHz),q3piHz=ifelse(is.na(q3piHz),mean(q3piHz,na.rm = TRUE),q3piHz),q4piHz=ifelse(is.na(q4piHz),mean(q4piHz,na.rm = TRUE),q4piHz),intensity_mean_dB=ifelse(is.na(intensity_mean_dB),mean(intensity_mean_dB,na.rm = TRUE),intensity_mean_dB))%>%ungroup()



rm(q1alof,q2alof,q3alof,q4alof,summary)

8.4.3 Assigning tonicity

setDT(data2)
data2[, first := ifelse(id_phone==first(id_phone),"yes","no"), by=id_word]
# data2[, first_ip := ifelse(id_phone==first(id_phone),"yes","no"), by=id_ip]
data2[, last := ifelse(id_phone==last(id_phone),"yes","no"), by=id_word]
data2[, tonic := ifelse(last=="yes"&accent=="oxitone","yes","no")]
data2[, tonic := ifelse(accent=="paroxitone"&shift(last,1,type="lead")=="yes","yes",tonic), by=id_word]
data2[, tonic := ifelse(accent=="proparoxitone"&shift(last,2,type="lead")=="yes","yes",tonic), by=id_word]

data2[, tonic := ifelse(is.na(tonic),"no",tonic)]
setDF(data2)
data2 <- data2%>%relocate(tonic, .after = "content")
setDT(data2)

8.4.4 Compute tonal differences

setDT(data2)
percent_treshold <- 10

data2[,q1piHz := ifelse(is.na(q1piHz),rowMeans(data2[,c("q2piHz","q3piHz","q4piHz")],na.rm = TRUE),q1piHz)]
            data2[,q2piHz := ifelse(is.na(q2piHz),rowMeans(data2[,c("q1piHz","q3piHz","q4piHz")],na.rm = TRUE),q2piHz)]
            data2[,q3piHz := ifelse(is.na(q3piHz),rowMeans(data2[,c("q2piHz","q1piHz","q4piHz")],na.rm = TRUE),q3piHz)]
            data2[,q4piHz := ifelse(is.na(q4piHz),rowMeans(data2[,c("q2piHz","q3piHz","q1piHz")],na.rm = TRUE),q4piHz)]
            data2[,center_Hz := rowMeans(data2[,c("q2piHz","q3piHz")],na.rm=TRUE)]
            data2[, ':='(
              
              inflexion_percent_Hz_from_prev = ((center_Hz - shift(center_Hz, 1,type="lag"
              )) / shift(center_Hz, 1,type="lag")) * 100,
              inflexion_ST_from_prev =  12 * log2(center_Hz / shift(center_Hz, 1,type="lag")),
              inflexion_percent_Hz_to_next = ((shift(
                center_Hz, 1,type="lead"
              ) - center_Hz) / center_Hz) * 100)]
            
            data2[,':='(
              inflexion_percent_Hz_to_next = ifelse(
                accent == "oxitone",
                ((q4piHz - q1piHz) / q1piHz) * 100,
                inflexion_percent_Hz_to_next
              ),
              inflexion_ST_to_next =  12 * log2(shift(center_Hz, 1,type="lead") /
                                                  center_Hz))]
            
            data2[,':='(
              inflexion_ST_to_next = ifelse(
                accent == "oxitone",
                12*log2(q4piHz/q1piHz),
                inflexion_ST_to_next
              ),
              q1piHz_q2piHZ = ((q2piHz - q1piHz) / q2piHz) * 100,
              q2piHz_q3piHZ = ((q3piHz - q2piHz) / q3piHz) * 100,
              q3piHz_q4piHZ = ((q4piHz - q3piHz) / q4piHz) * 100,
              q1piHz_q3piHZ = ((q3piHz - q1piHz) / q3piHz) * 100,
              q1piHz_q4piHZ = ((q4piHz - q1piHz) / q4piHz) * 100,
              q2piHz_q4piHZ = ((q4piHz - q2piHz) / q4piHz) * 100,
              q1piHz_q4piHZ_ST = 12*log2(q4piHz/q1piHz),
              q1piHz_q2piHZ_ST = 12*log2(q2piHz/q1piHz),
              q2piHz_q3piHZ_ST = 12*log2(q3piHz/q2piHz),
              q3piHz_q4piHZ_ST = 12*log2(q4piHz/q3piHz),
              q2piHz_q4piHZ_ST = 12*log2(q4piHz/q2piHz)
            )]
setDF(data2)
data2 <- data2%>%mutate(circumflex =  
        case_when(
q1piHz_q2piHZ <= -percent_treshold & q2piHz_q3piHZ >= percent_treshold & q3piHz_q4piHZ <= -percent_treshold ~ "desc_asc_desc",
q1piHz_q2piHZ >= percent_treshold & q2piHz_q3piHZ <= -percent_treshold & q3piHz_q4piHZ >= percent_treshold ~ "asc_desc_asc",
q1piHz_q2piHZ >= percent_treshold & between(q2piHz_q3piHZ,-percent_treshold,percent_treshold) & q3piHz_q4piHZ <= -percent_treshold ~ "asc_desc",
q1piHz_q2piHZ >= percent_treshold & q2piHz_q3piHZ <= -percent_treshold & q3piHz_q4piHZ <= -percent_treshold ~ "asc_desc",
q1piHz_q2piHZ >= percent_treshold & between(q2piHz_q3piHZ,-percent_treshold,percent_treshold) & between(q3piHz_q4piHZ,-percent_treshold,percent_treshold) ~ "asc_q1",
 between(q1piHz_q2piHZ,-percent_treshold,percent_treshold)&  q2piHz_q3piHZ >= percent_treshold &between(q3piHz_q4piHZ,-percent_treshold,percent_treshold) ~ "asc_q2",
 between(q1piHz_q2piHZ,-percent_treshold,percent_treshold)& between(q2piHz_q3piHZ,-percent_treshold,percent_treshold) & q3piHz_q4piHZ >= percent_treshold ~ "asc_q3",
q1piHz_q2piHZ <= -percent_treshold & between(q2piHz_q3piHZ,-percent_treshold,percent_treshold) & between(q3piHz_q4piHZ,-percent_treshold,percent_treshold) ~ "desc_q1",
q1piHz_q2piHZ <= -percent_treshold & between(q2piHz_q3piHZ,-percent_treshold,percent_treshold) & q3piHz_q4piHZ <= -percent_treshold ~ "desc_q1q3",
 between(q1piHz_q2piHZ,-percent_treshold,percent_treshold)&  q2piHz_q3piHZ <= -percent_treshold &between(q3piHz_q4piHZ,-percent_treshold,percent_treshold) ~ "desc_q2",
 between(q1piHz_q2piHZ,-percent_treshold,percent_treshold)& between(q2piHz_q3piHZ,-percent_treshold,percent_treshold) & q3piHz_q4piHZ <= -percent_treshold ~ "desc_q3",
 between(q1piHz_q2piHZ,-percent_treshold,percent_treshold) & q2piHz_q3piHZ >= percent_treshold & q3piHz_q4piHZ >= percent_treshold ~ "asc_q2q3",
q1piHz_q2piHZ >= percent_treshold & q2piHz_q3piHZ >= percent_treshold & between(q3piHz_q4piHZ,-percent_treshold,percent_treshold) ~ "asc_q1q2",
q1piHz_q2piHZ >= percent_treshold & q2piHz_q3piHZ >= percent_treshold & q3piHz_q4piHZ >= percent_treshold ~"asc_q1q2q3",
q1piHz_q2piHZ <= -percent_treshold & q2piHz_q3piHZ <= -percent_treshold & q3piHz_q4piHZ <= -percent_treshold ~"desc_q1q2q3",
q1piHz_q2piHZ <= -percent_treshold & q2piHz_q3piHZ <= -percent_treshold & between(q3piHz_q4piHZ, -percent_treshold,percent_treshold) ~"desc_q1q2",
between(q1piHz_q2piHZ, -percent_treshold,percent_treshold) & between(q2piHz_q3piHZ, -percent_treshold,percent_treshold)&between(q3piHz_q4piHZ, -percent_treshold,percent_treshold) ~ "no_circumflex",
is.na(center_Hz) ~"non_applicable"))
setDT(data2)
 data2[,dur := time_end - time_start]
            data2[order(filename,time_start), midpoint := (time_start + time_end)/2]
            data2[order(filename,time_start), distancias := midpoint - shift(midpoint,1,type="lag")]
             data2[order(filename,time_start), pause_prev := time_start - shift(time_end,1,type="lag")]
             data2 <-data2%>%group_by(id_ip)%>%mutate(order_ip=row_number())%>%ungroup()%>%mutate(distancias := ifelse(order_ip==1,distancias-pause_prev,distancias))
            setDT(data2)
            data2[order(filename,time_start), ':=' ( 
              dur_percent_from_prev = ((distancias - shift(
                distancias, 1,type="lag"
              )) / shift(distancias, 1,type="lag"))*100,
              intensity_percent_from_prev = ((intensity_mean_dB - shift(
                intensity_mean_dB, 1,type="lag"
              )) / shift(intensity_mean_dB, 1,type="lag")) * 100)]
setDF(data2)
data2 <- data2%>%mutate_if(is.numeric, round,2)
data2 <- data2%>%group_by(id_ip)%>%mutate(maximum_Hz = ifelse(center_Hz == max(center_Hz),"yes","no"),displacement = ifelse(tonic == "yes" & lead(center_Hz) > (center_Hz+ percent_treshold) & lead(tonic)!="yes","yes","no"),displacement=ifelse(is.na(displacement),"no",displacement))%>%ungroup()
tonics <- data2%>%filter(tonic=="yes")%>%group_by(id_ip)%>%summarise(q_tonic_vowels=n())%>%mutate(multiple_tonic=ifelse(q_tonic_vowels>1,"yes","no"))
data2 <- data2%>%left_join(tonics, by="id_ip")
setDT(data2)

8.4.5 MAS (Melodic Analysis of Speech)

Fco. José Cantero Serena (MAS creator)

Linguists and phonetics experts apply MAS to:

Functionality of MAS
  1. Identify melodic patterns: MAS allows for the identification and description of intonational patterns, such as rising tones, falling tones, flat tones, and variations in pitch height that can indicate questions, statements, affirmative or negative intonations, among others.

  2. Analyze intonation structure: It helps break down discourse into intonational units, such as word groups or phrases, and label these units with specific melodic patterns.

  3. Understand communicative intention: MAS can reveal how variations in intonation reflect the speaker’s intention, such as emphasizing information, expressing surprise, or indicating uncertainty.

  4. Linguistic research: It is used in linguistic research to understand intonational differences in different dialects or languages, as well as to study how intonation varies in different communicative contexts.

How are calculations made?

To do this, vowels and tonal segments produced in them are identified, relying on the sonogram, and the mean value is collected—if stable—or the central value. In the case of tonic vowels, an inflection can occur, and there may be two or even three tonal segments, circumflex inflection; these values ​​are taken from the ends of the inflection (Mateo Ruiz 2010: 51).

8.4.5.1 Micromelodic variations in Oralstats

For computing micromelodic variations within vowels, they have been split into four quarters. If the difference among them is higher than 10 percent, we consider this micromelodic variation significant and, for this reason, it can be considered a circumflex tone. The specific cases considered are described in the following chart:

8.4.5.2 Get MAS structure

We assign parts to vowels: anacrusis, body and toneme.

setDF(data2)
data2 <- data2%>%group_by(id_ip)%>%mutate(MAS_structure=
                                      case_when(.default = "body",
toneme_word=="yes"~"toneme",
first_non_tonic_ip=="yes" ~"anacrusis",
lead(first_tonic_ip)=="yes" ~"body",
lead(tonic_ip_no_first) =="yes"~"body",
first_tonic_ip=="yes" ~"body",
tonic_ip_no_first =="yes"~"body"
                                        
                                      ))

8.4.5.3 Basic computation

setDT(data2)
toneme <- data2[tonic=="yes" & toneme_word=="yes"]
body <- data2[multiple_tonic=="yes"  & MAS_structure=="body",c("id_phone","time_start","time_end","id_ip","content","upos","word","ip","center_Hz","displacement","circumflex","MAS_structure","intensity_mean_dB","tonic","inflexion_percent_Hz_from_prev")]

body2 <- body%>%group_by(id_ip,ip)%>%mutate(body=cumsum(inflexion_percent_Hz_from_prev))%>%summarise(body=last(body)/n())

body_saw <- body%>%group_by(id_ip,ip,circumflex,tonic)%>%summarise(freq=n())%>%group_by(id_ip,ip)%>%summarise(freq_total=sum(freq[tonic=="yes"]),freq_tonic_circumflex=sum(freq[circumflex!="no_circumflex"&tonic=="yes"]))%>%ungroup()%>%mutate(body_structure = ifelse(freq_tonic_circumflex/freq_total>0.5,"yes","no"))

anacrusis_non_tonic <- data2[multiple_tonic=="yes" & tonic=="no" & first_non_tonic_ip=="yes",c("id_phone","id_ip","content","upos","word","ip","center_Hz","displacement")]

first_tonic <- data2[multiple_tonic=="yes" & tonic=="yes" & first_tonic_ip=="yes",c("id_phone","id_ip","content","accent","upos","word","ip","center_Hz","displacement","inflexion_percent_Hz_from_prev")]

anacrusis_tonic <- data2[multiple_tonic=="yes" & tonic=="yes" & tonic_ip_no_first =="yes",c("id_phone","id_ip","accent","content","upos","word","ip","center_Hz","displacement","inflexion_percent_Hz_from_prev")]

MAS <- data2[tonic=="yes"&toneme_word=="yes",c("id_phone","id_ip","ip","content","word","accent","center_Hz","q1piHz_q4piHZ","circumflex","displacement","inflexion_percent_Hz_from_prev","inflexion_percent_Hz_to_next","multiple_tonic","intensity_mean_dB","time_start","time_end","id_word")]

setDF(MAS)
MAS <- MAS%>%left_join(anacrusis_non_tonic%>%select(center_Hz,displacement,word,id_ip)%>%rename(anacrusis_center_Hz = center_Hz,anacrusis_non_tonic_word=word,anacrusis_displacement = displacement),by="id_ip")

MAS <- MAS%>%left_join(body_saw,by="id_ip")

MAS <- MAS%>%left_join(anacrusis_tonic%>%select(center_Hz,displacement,word,id_ip,inflexion_percent_Hz_from_prev)%>%rename(anacrusis_tonic_center_Hz = center_Hz,anacrusis_tonic_inflexion_percent_Hz_from_prev = inflexion_percent_Hz_from_prev,anacrusis_tonic_displacement = displacement,anacrusis_tonic_word=word),by="id_ip")
MAS <- MAS%>%left_join(first_tonic%>%select(center_Hz,displacement,word,id_ip,inflexion_percent_Hz_from_prev)%>%rename(
  first_tonic_inflexion_percent_Hz_from_prev = inflexion_percent_Hz_from_prev,
  first_tonic_center_Hz = center_Hz,first_tonic_displacement = displacement,first_tonic_word=word),by="id_ip")

MAS <- MAS%>%mutate(anacrusis= anacrusis_tonic_inflexion_percent_Hz_from_prev,anacrusis= ifelse(is.na(anacrusis), first_tonic_inflexion_percent_Hz_from_prev,anacrusis),body = ((center_Hz - first_tonic_center_Hz)/first_tonic_center_Hz)*100, body=ifelse(is.na(body), ((center_Hz - anacrusis_tonic_center_Hz)/anacrusis_tonic_center_Hz)*100,body),toneme=ifelse(accent=="oxitone"& !grepl("[nl]$",word),q1piHz_q4piHZ,inflexion_percent_Hz_to_next))%>%mutate_if(is.numeric,round,2)%>%mutate(relocated = ifelse((center_Hz < first_tonic_center_Hz +10 & center_Hz > first_tonic_center_Hz -10) |(center_Hz < anacrusis_tonic_center_Hz +10 & center_Hz > anacrusis_tonic_center_Hz -10),"yes","no"),anacrusis_displacement= first_tonic_displacement,anacrusis_displacement=ifelse(is.na(anacrusis_displacement),anacrusis_tonic_displacement,anacrusis_displacement),anacrusis_displacement=ifelse(is.na(anacrusis_displacement),"no",anacrusis_displacement))


MAS <- MAS %>%mutate(body_2=body)%>%select(-body)%>%left_join(body2,by="id_ip")%>%mutate_if(is.numeric,round,2)

MAS <- MAS%>%distinct(id_phone, .keep_all = TRUE)

8.4.5.4 Standard values

data2 <- data2 %>%
  group_by(id_ip) %>%
  arrange(filename, time_start) %>%
  mutate(
    standard_q1_pitch = cumprod(c(100, (1 + ((((q1piHz - shift(q1piHz, 1,type="lag")) / shift(q1piHz, 1,type="lag")) * 100) / 100))[-1])),    
    standard_q2_pitch = cumprod(c(100, (1 + ((((q2piHz - shift(q2piHz, 1,type="lag")) / shift(q2piHz, 1,type="lag")) * 100) / 100))[-1])),
        standard_q3_pitch = cumprod(c(100, (1 + ((((q3piHz - shift(q3piHz, 1,type="lag")) / shift(q3piHz, 1,type="lag")) * 100) / 100))[-1])),
        standard_q4_pitch = cumprod(c(100, (1 + ((((q4piHz - shift(q4piHz, 1,type="lag")) / shift(q4piHz, 1,type="lag")) * 100) / 100))[-1])),
    standard_pitch = cumprod(c(100, (1 + (inflexion_percent_Hz_from_prev / 100))[-1])),
    standard_dur = cumprod(c(100, (1 + (dur_percent_from_prev / 100))[-1])),
    standard_intensity = cumprod(c(100, (1 + (intensity_percent_from_prev / 100))[-1]))
  ) %>%mutate_if(is.numeric,round,2)%>%
  ungroup()%>%relocate(standard_pitch,.after=center_Hz)

8.4.5.5 Assigning MAS patterns

Anacrusis Displacement Body multiple_tonic body_structure Toneme relocated circumflex Pattern
(anacrusis<40 | is.na(anacrusis)) no between(body,-60,-10) between(toneme,-40,15) no PI
(anacrusis<40 | is.na(anacrusis)) no between(body,-60,-10) >70 no PII
(anacrusis<40 | is.na(anacrusis)) yes between(body,-60,-10) between(toneme,40,60) no PIII
(anacrusis<40 | is.na(anacrusis)) yes between(body,-60,-10)

asc_desc,

desc_q3,

desc_q2

PIVa
between(body,-9,10)

asc_desc,

desc_q3,

desc_q2

PIVb
(anacrusis<40 | is.na(anacrusis)) no between(body,-60,-10) between(toneme,-9,9) PV
(anacrusis<40 | is.na(anacrusis)) no between(body,-60,-10) between(toneme,15,70) PVIa
(anacrusis<40 | is.na(anacrusis)) yes between(body,-60,-10) between(toneme,15,40) PVIb
(anacrusis<40 | is.na(anacrusis)) yes toneme < -10 yes PVII
(anacrusis<40 | is.na(anacrusis)) no no PVIII
(anacrusis<40 | is.na(anacrusis)) no between(body,-60,-10) toneme < -40 PIX
(anacrusis<40 | is.na(anacrusis)) yes between(body,-60,-10)

asc_desc,

desc_q3,

desc_q2

PXa
(anacrusis<40 | is.na(anacrusis)) yes yes

desc_asc,

asc_q3,

asc_q2,

asc_desc_asc

PXb
(anacrusis<40 | is.na(anacrusis)) yes toneme >60 yes PXI
(anacrusis<40 | is.na(anacrusis)) yes between(body,-9,10) no toneme < -10 PXIIa
between(body,-9,10) yes toneme< -10 PXIIb
between(body,-60,-10) yes toneme< -10 PXIIc
between(body,20,140) toneme >40 PXIII
setDF(MAS)
 
# First round

MAS <- MAS %>%
  mutate(
         pattern = case_when(
      ((between(anacrusis,-10,40) & anacrusis_displacement == "no")| is.na(anacrusis)) & (between(body_2,-80,10)|is.na(body)) & between(toneme, -40, 15) ~ "PI",
      ((between(anacrusis,-10,40) & anacrusis_displacement == "no")| is.na(anacrusis)) & (between(body_2,-80,10)|is.na(body)) & toneme > 70 ~ "PII",
      ((between(anacrusis,-10,40) & anacrusis_displacement == "no")| is.na(anacrusis)) & (between(body_2,-80,10)) & between(toneme, 40, 60) ~ "PIII",
     ((between(anacrusis,-10,40) & anacrusis_displacement == "no")| is.na(anacrusis)) & (between(body_2,-80,10)|is.na(body)) & circumflex %in% c("asc_desc", "desc_q3","desc_q2")  ~ "PIVa",
      (is.na(anacrusis) | between(anacrusis,-10,10)) & between(body_2, -3, 10) & circumflex %in% c("asc_desc","desc_q3","desc_q2")  ~ "PIVb",
      ((between(anacrusis,-10,40) & anacrusis_displacement == "no")| is.na(anacrusis))  & (between(body_2,-80,10)|is.na(body)) & (between(toneme, -9, 9) | (between(inflexion_percent_Hz_from_prev,-5,10) & between(toneme,-5,5))) ~ "PV",
     ((between(anacrusis,-10,40) & anacrusis_displacement == "no")| is.na(anacrusis)) & (between(body_2,-80,10)|is.na(body)) & between(toneme, 15, 70) ~ "PVIa",
      ((between(anacrusis,-10,40) & anacrusis_displacement == "yes")| is.na(anacrusis)) & (between(body_2,-80,10)|is.na(body)) & between(toneme, 15, 60) ~ "PVIb",
      ((between(anacrusis,-10,40) & anacrusis_displacement == "yes")| is.na(anacrusis)) & relocated == "yes" & toneme < -10 ~ "PVII",
       is.na(anacrusis) & (between(body_2,-80,10)|is.na(body)) & multiple_tonic=="no" & !is.na(toneme) & toneme !="NaN"  ~ "PVIII",
      ((between(anacrusis,-10,40) & anacrusis_displacement == "yes")| is.na(anacrusis)) & (between(body_2,-80,10)|is.na(body)) & toneme < -40  ~ "PIX",
      ((between(anacrusis,-10,40) & anacrusis_displacement == "yes")| is.na(anacrusis))& between(toneme, -10, 60) & relocated == "yes" & circumflex %in% c("asc_desc","desc_q3","desc_q2") ~ "PXa",
      ((between(anacrusis,-10,40) & anacrusis_displacement == "yes")| is.na(anacrusis))  & relocated == "yes" & circumflex %in% c("desc_asc","asc_q3","asc_q2","asc_desc_asc") ~ "PXb",
      ((between(anacrusis,-10,40) & anacrusis_displacement == "yes")| is.na(anacrusis)) & between(body_2, -9, 10) & toneme < -10 ~ "PXI",
      ((between(anacrusis,-10,40) & anacrusis_displacement == "yes")| is.na(anacrusis)) & between(body_2, -3, 10) & toneme < -10 & body_structure == "no" ~ "PXIIa",
      is.na(anacrusis) & between(body_2, -3, 10) & toneme < -10 & body_structure == "yes" ~ "PXIIb",
      is.na(anacrusis) & (between(body,-80,10)|is.na(body)) & toneme < -10 & body_structure == "yes" ~ "PXIIc",
       between(body_2, 15, 140) & toneme > 40 ~ "PXIII",
     is.na(toneme) | toneme=="NaN" ~ "undefined_toneme"
    )

  )

# Second round (uncomment below if you want to apply; it gets a more general approach)
 
# MAS <- MAS%>%mutate(
#     
#     pattern= ifelse(is.na(pattern) &relocated=="yes"&anacrusis <0 & toneme<0,"PVII",pattern),
#     pattern= ifelse(is.na(pattern) &anacrusis <0 & inflexion_percent_Hz_from_prev > 60 & toneme<0 & body_2>60 & body<10,"PIII",pattern),
#     pattern = ifelse(is.na(pattern) & anacrusis <0 & body_2< -20 & body < 0 & between(toneme,-10,10), "PI+",pattern),
#     pattern = ifelse(is.na(pattern) &freq_tonic_circumflex >= 1 & between(toneme,-10,10) &multiple_tonic=="yes" & body_2 < -20,"PXIIc",pattern),
#         pattern = ifelse(is.na(pattern) & between(toneme,-15,30) &multiple_tonic=="yes" & body_2 < -10 & between(anacrusis,-15,40) & between(inflexion_percent_Hz_from_prev,-20,30),"PI",pattern),
#      pattern = ifelse(is.na(pattern) & between(toneme,-15,30) & body_2 > 0 & between(anacrusis,-15,40) ,"PIII",pattern),
#     pattern = ifelse(is.na(pattern) & between(toneme,-15,30) & body_2 > 0 & between(anacrusis,-15,70) & relocated=="yes","PXb",pattern),
#     pattern = ifelse(is.na(pattern) & accent=="oxitone"&inflexion_percent_Hz_from_prev < -30,"PIX",pattern),
#     pattern = ifelse(is.na(pattern) & between(inflexion_percent_Hz_from_prev,-5,10) & between(toneme,-5,5) ,"PV",pattern),
#     pattern = ifelse(is.na(pattern)&is.na(anacrusis)&body>10&toneme>10,"PIII",pattern),
#         pattern = ifelse(is.na(pattern)&multiple_tonic=="yes"&body_2 < 0 &toneme < 0,"PIX",pattern),
#     pattern = ifelse(is.na(pattern)&multiple_tonic=="yes"&body_structure=="yes"&body_2 < 0 &toneme > 5,"PXIId",pattern),
#     pattern = ifelse(is.na(pattern) & body_2 < 0 & between(inflexion_percent_Hz_from_prev,-9.9,9.9) & between(toneme,-4.9,4.9),"PV",pattern),
#       pattern = ifelse(is.na(pattern) & body_2 < 0 & between(inflexion_percent_Hz_from_prev,-9.9,9.9) & between(toneme,-4.9,4.9),"PV",pattern),
#     pattern = ifelse(is.na(pattern) & freq_total>4 &body_2 < 0 & inflexion_percent_Hz_from_prev< -20 &toneme>20,"PXI",pattern),
#     pattern = ifelse(is.na(pattern)  &body > 10 & inflexion_percent_Hz_from_prev< -10 &between(toneme,-5,5),"PXIb",pattern),
#     pattern = ifelse(is.na(pattern) &anacrusis > 5 & body_2 < -10 & inflexion_percent_Hz_from_prev< -5 &toneme>30,"PXb",pattern),
#     pattern = ifelse(is.na(pattern) & body_2 > 10 &toneme>30,"PXI",pattern),
#     pattern = ifelse(is.na(pattern) & relocated=="yes" & toneme> 0, "PXI",pattern),
#     pattern = ifelse(is.na(pattern) &toneme < 0 & inflexion_percent_Hz_from_prev>0,"PXa",pattern ),
#     pattern = ifelse(is.na(pattern) &toneme > 0 & inflexion_percent_Hz_from_prev<0,"PXb",pattern ),
#     pattern = ifelse(is.na(pattern) & between(toneme,20,60) & between(body_2,-10,10),"PIII",pattern ),
#      pattern = ifelse(is.na(pattern) & between(toneme,10,20) & inflexion_percent_Hz_from_prev>10,"PIII",pattern ),
#      pattern = ifelse(is.na(pattern) & between(toneme,0,10) & between(body_2,10,100) & inflexion_percent_Hz_from_prev>10,"Xb",pattern ),
#        pattern = ifelse(is.na(pattern) & between(toneme,0,50) & between(body_2,-20,20) ,"Xb",pattern ),
#     pattern= ifelse(is.na(pattern) & between(toneme,-100,30), "PI",pattern),
#     pattern= ifelse(is.na(pattern) & toneme >30, "PII",pattern)
#   )
 
MAS <- MAS %>% mutate( MAS_phonological= ifelse(pattern=="PI","assertion",NA))
MAS <- MAS %>% mutate( MAS_phonological = ifelse(pattern %in% c("PII","PIII","PIVa","PIVb"), "question",MAS_phonological))
MAS <- MAS %>% mutate( MAS_phonological = ifelse( pattern %in% c("PV","PVIa","VIb"), "continuous",MAS_phonological))
MAS <- MAS %>% mutate( MAS_phonological = ifelse(is.na(MAS_phonological),"emphatic",MAS_phonological ))

rm(anacrusis_non_tonic,anacrusis_tonic,body,body_saw,body2,first_tonic,toneme,tonics,percent_treshold)

MAS <- MAS%>%group_by(pattern)%>%mutate(order_pattern_group = row_number())%>%ungroup()

data2<- data2%>%left_join(MAS%>%select(id_ip,pattern,order_pattern_group), by="id_ip")

8.4.5.6 Frequent words anacrusis

anacrusis_words <- MAS%>%group_by(anacrusis_non_tonic_word)%>%summarise(freq=n())%>%filter(freq >1,!is.na(anacrusis_non_tonic_word))%>%arrange(desc(freq))%>%head(n=1)

anacrusis_barplot <-ggplot(anacrusis_words, aes(x=anacrusis_non_tonic_word, y = freq, fill = anacrusis_non_tonic_word)) +
  geom_bar(stat="identity")+theme_minimal()
anacrusis_barplot 

8.4.5.7 Frequent words toneme

toneme_words <- MAS%>%group_by(word)%>%summarise(freq=n())%>%filter(freq >1,!is.na(word))%>%arrange(desc(freq))%>%head(n=1)

anacrusis_barplot <-ggplot(toneme_words, aes(x=word, y = freq, fill = word)) +
  geom_bar(stat="identity")+theme_minimal()
anacrusis_barplot 

8.4.5.8 Examples of patterns

8.4.5.8.1 Pattern I
setDF(data2)
data3 <- data2%>%filter(pattern=="PI",order_pattern_group==1)%>%mutate(time_center = time_start+((time_end-time_start)/2))
data3q1 <- data3%>%filter(q1piHz_q2piHZ>10 | q1piHz_q2piHZ< -10)%>%rowwise()%>%mutate(center_Hz = (q1piHz + q2piHz)/2, time_center = (q1piHz_time + q2piHz_time)/2,standard_pitch = standard_pitch + (standard_pitch* (q1piHz_q2piHZ/100)), time_center=ifelse(is.na(time_center),(time_start+((time_end-time_start)/2))-0.01,time_center))
data3q4 <- data3%>%filter(q3piHz_q4piHZ>10 | q3piHz_q4piHZ< -10)%>%rowwise()%>%mutate(center_Hz = (q3piHz + q4piHz)/2, time_center = (q3piHz_time + q4piHz_time)/2, standard_pitch = standard_pitch + (standard_pitch* (q3piHz_q4piHZ/100)),time_center=ifelse(is.na(time_center),(time_start+((time_end-time_start)/2))+0.01,time_center))

dataplot <- rbind(data3,data3q1,data3q4)%>%arrange(filename,time_center)%>%mutate(time_center_table=ifelse( 
 between(lag(time_center,1),time_center-0.05,time_center+0.05),time_center+0.04,time_center))
plot_data <- dataplot
p1 <- ggplot() +
    # geom_line(data = plot_data, aes(x = q3piHz_time, y = standard_q3_pitch))+
geom_bar(data = plot_data,stat = "identity", aes(x = time_center, y = standard_dur),fill="lightgrey")+
geom_line(data = plot_data, aes(x = time_center, y = standard_pitch),color="blue")+
geom_line(data = plot_data, aes(x = time_center, y = standard_intensity),color="green") +
geom_point(data = plot_data, aes(x = time_center, y = standard_pitch, color = word, shape=MAS_structure),size=5) +
geom_text(data = plot_data, aes(x = time_center, y = standard_pitch, label = content), vjust = 0.1, hjust = 0.5,size=3.2) +
    # geom_text(data = plot_data, aes(x = q3piHz_time, y = standard_q3_pitch, label = content), vjust = -0.5, hjust = 0.5)+
    # geom_text(data = plot_data, aes(x = q4piHz_time, y = standard_q4_pitch, label = content), vjust = -0.5, hjust = 0.5) +
ggtitle(paste("Group:", dataplot$id_ip,dataplot$ip,dataplot$pattern)) + theme_light() +
    theme(legend.position = "right",plot.title = element_text(size = 10),legend.text = element_text(size=10),legend.title = element_text(size = 10), axis.title=element_text(size=10,face="bold"))+guides(size="none") + xlab("Timeline") + ylab("Prosodic features")

mytheme <- gridExtra::ttheme_default(
    core = list(fg_params=list(cex = 0.6)),
    colhead = list(fg_params=list(cex = 0.6)),
    rowhead = list(fg_params=list(cex = 0.6)))
  
 p2 <- plot_data %>% 
  mutate(
    Hz = standard_pitch,
    dB = standard_intensity,
    dur= standard_dur) %>%mutate(Hz=round(Hz,1),dB=round(dB,1),dur=round(dur,1))%>%arrange(time_start)%>% select(Hz,dB,dur,content)
 p3 <- t(p2)
 
if (nrow(p2) > 0) {
  table_grob <- tableGrob(p3, theme = mytheme)

} else {
  empty_df <- data.frame (Hz  = "0",
                  dB = "0",
                  dur = "0",
                  content ="0"
                  )
  table_grob <- tableGrob(empty_df, theme = mytheme)
}

# Combine plot and table
grid.arrange(p1, table_grob, ncol = 1, heights = c(3, 1))

  # pivot_longer(c("Hz","dB","dur"), names_to = "layer", values_to = "label") %>% 
#   ggplot(aes(x = time_center_table)) +
#   geom_text(aes(y = factor(layer, c("Hz","dB","dur")), label = label), size = 1.4) +
#   labs(y = "", x = NULL) +
#  theme_grey()+
#   theme(axis.line = element_blank(), axis.ticks = element_blank(), axis.text.x = element_blank(),
#         panel.grid = element_blank(),legend.text = element_text(size=1), strip.text = element_blank())
# 
# p3 <- p1 + p2 + plot_layout(ncol = 1,heights = c(5,1))
# p3
8.4.5.8.2 Pattern II
setDF(data2)
data3 <- data2%>%filter(pattern=="PII",order_pattern_group==1)%>%mutate(time_center = time_start+((time_end-time_start)/2))
data3q1 <- data3%>%filter(q1piHz_q2piHZ>10 | q1piHz_q2piHZ< -10)%>%rowwise()%>%mutate(center_Hz = (q1piHz + q2piHz)/2, time_center = (q1piHz_time + q2piHz_time)/2,standard_pitch = standard_pitch + (standard_pitch* (q1piHz_q2piHZ/100)), time_center=ifelse(is.na(time_center),(time_start+((time_end-time_start)/2))-0.01,time_center))
data3q4 <- data3%>%filter(q3piHz_q4piHZ>10 | q3piHz_q4piHZ< -10)%>%rowwise()%>%mutate(center_Hz = (q3piHz + q4piHz)/2, time_center = (q3piHz_time + q4piHz_time)/2, standard_pitch = standard_pitch + (standard_pitch* (q3piHz_q4piHZ/100)),time_center=ifelse(is.na(time_center),(time_start+((time_end-time_start)/2))+0.01,time_center))

dataplot <- rbind(data3,data3q1,data3q4)%>%arrange(filename,time_center)%>%mutate(time_center_table=ifelse( 
 between(lag(time_center,1),time_center-0.05,time_center+0.05),time_center+0.04,time_center))
plot_data <- dataplot
p1 <- ggplot() +
    # geom_line(data = plot_data, aes(x = q3piHz_time, y = standard_q3_pitch))+
geom_bar(data = plot_data,stat = "identity", aes(x = time_center, y = standard_dur),fill="lightgrey")+
geom_line(data = plot_data, aes(x = time_center, y = standard_pitch),color="blue")+
geom_line(data = plot_data, aes(x = time_center, y = standard_intensity),color="green") +
geom_point(data = plot_data, aes(x = time_center, y = standard_pitch, color = word, shape=MAS_structure),size=5) +
geom_text(data = plot_data, aes(x = time_center, y = standard_pitch, label = content), vjust = 0.1, hjust = 0.5,size=3.2) +
    # geom_text(data = plot_data, aes(x = q3piHz_time, y = standard_q3_pitch, label = content), vjust = -0.5, hjust = 0.5)+
    # geom_text(data = plot_data, aes(x = q4piHz_time, y = standard_q4_pitch, label = content), vjust = -0.5, hjust = 0.5) +
ggtitle(paste("Group:", dataplot$id_ip,dataplot$ip,dataplot$pattern)) + theme_light() +
    theme(legend.position = "right",plot.title = element_text(size = 10),legend.text = element_text(size=10),legend.title = element_text(size = 10), axis.title=element_text(size=10,face="bold"))+guides(size="none") + xlab("Timeline") + ylab("Prosodic features")
  
   p2 <- plot_data %>%    mutate(     Hz = standard_pitch,     dB = standard_intensity,     dur= standard_dur) %>%mutate(Hz=round(Hz,1),dB=round(dB,1),dur=round(dur,1))%>%arrange((time_start))%>% select(Hz,dB,dur,content)
 p3 <- t(p2)
 
if (nrow(p2) > 0) {
  table_grob <- tableGrob(p3, theme = mytheme)

} else {
  empty_df <- data.frame (Hz  = "0",
                  dB = "0",
                  dur = "0",
                  content ="0"
                  )
  table_grob <- tableGrob(empty_df, theme = mytheme)
}

# Combine plot and table
grid.arrange(p1, table_grob, ncol = 1, heights = c(3, 1))

8.4.5.8.3 Pattern III
setDF(data2)
data3 <- data2%>%filter(pattern=="PIII",order_pattern_group==1,(time_end_ip-time_start_ip)>1)%>%mutate(time_center = time_start+((time_end-time_start)/2))
data3q1 <- data3%>%filter(q1piHz_q2piHZ>10 | q1piHz_q2piHZ< -10)%>%rowwise()%>%mutate(center_Hz = (q1piHz + q2piHz)/2, time_center = (q1piHz_time + q2piHz_time)/2,standard_pitch = standard_pitch + (standard_pitch* (q1piHz_q2piHZ/100)), time_center=ifelse(is.na(time_center),(time_start+((time_end-time_start)/2))-0.01,time_center))%>%arrange(time_start)
data3q4 <- data3%>%filter(q3piHz_q4piHZ>10 | q3piHz_q4piHZ< -10)%>%rowwise()%>%mutate(center_Hz = (q3piHz + q4piHz)/2, time_center = (q3piHz_time + q4piHz_time)/2, standard_pitch = standard_pitch + (standard_pitch* (q3piHz_q4piHZ/100)),time_center=ifelse(is.na(time_center),(time_start+((time_end-time_start)/2))+0.01,time_center))%>%arrange(time_start)

dataplot <- rbind(data3,data3q1,data3q4)%>%arrange(filename,time_center)%>%mutate(time_center_table=ifelse( 
 between(lag(time_center,1),time_center-0.05,time_center+0.05),time_center+0.04,time_center))
plot_data <- dataplot
p1 <- ggplot() +
    # geom_line(data = plot_data, aes(x = q3piHz_time, y = standard_q3_pitch))+
geom_bar(data = plot_data,stat = "identity", aes(x = time_center, y = standard_dur),fill="lightgrey")+
geom_line(data = plot_data, aes(x = time_center, y = standard_pitch),color="blue")+
geom_line(data = plot_data, aes(x = time_center, y = standard_intensity),color="green") +
geom_point(data = plot_data, aes(x = time_center, y = standard_pitch, color = word, shape=MAS_structure),size=5) +
geom_text(data = plot_data, aes(x = time_center, y = standard_pitch, label = content), vjust = 0.1, hjust = 0.5,size=3.2) +
    # geom_text(data = plot_data, aes(x = q3piHz_time, y = standard_q3_pitch, label = content), vjust = -0.5, hjust = 0.5)+
    # geom_text(data = plot_data, aes(x = q4piHz_time, y = standard_q4_pitch, label = content), vjust = -0.5, hjust = 0.5) +
ggtitle(paste("Group:", dataplot$id_ip,dataplot$ip,dataplot$pattern)) + theme_light() +
    theme(legend.position = "right",plot.title = element_text(size = 10),legend.text = element_text(size=10),legend.title = element_text(size = 10), axis.title=element_text(size=10,face="bold"))+guides(size="none") + xlab("Timeline") + ylab("Prosodic features")
  
 p2 <- plot_data %>%    mutate(     Hz = standard_pitch,     dB = standard_intensity,     dur= standard_dur) %>%mutate(Hz=round(Hz,1),dB=round(dB,1),dur=round(dur,1))%>%arrange((time_start))%>% select(Hz,dB,dur,content)
 p3 <- t(p2)
 
if (nrow(p2) > 0) {
  table_grob <- tableGrob(p3, theme = mytheme)

} else {
  empty_df <- data.frame (Hz  = "0",
                  dB = "0",
                  dur = "0",
                  content ="0"
                  )
  table_grob <- tableGrob(empty_df, theme = mytheme)
}

# Combine plot and table
grid.arrange(p1, table_grob, ncol = 1, heights = c(3, 1))

8.4.5.8.4 Pattern IVa
setDF(data2)
data3 <- data2%>%filter(pattern=="PIVa",order_pattern_group==1)%>%mutate(time_center = time_start+((time_end-time_start)/2))
data3q1 <- data3%>%filter(q1piHz_q2piHZ>10 | q1piHz_q2piHZ< -10)%>%rowwise()%>%mutate(center_Hz = (q1piHz + q2piHz)/2, time_center = (q1piHz_time + q2piHz_time)/2,standard_pitch = standard_pitch + (standard_pitch* (q1piHz_q2piHZ/100)), time_center=ifelse(is.na(time_center),(time_start+((time_end-time_start)/2))-0.01,time_center))%>%arrange(time_start)
data3q4 <- data3%>%filter(q3piHz_q4piHZ>10 | q3piHz_q4piHZ< -10)%>%rowwise()%>%mutate(center_Hz = (q3piHz + q4piHz)/2, time_center = (q3piHz_time + q4piHz_time)/2, standard_pitch = standard_pitch + (standard_pitch* (q3piHz_q4piHZ/100)),time_center=ifelse(is.na(time_center),(time_start+((time_end-time_start)/2))+0.01,time_center))%>%arrange(time_start)

dataplot <- rbind(data3,data3q1,data3q4)%>%arrange(filename,time_center)%>%mutate(time_center_table=ifelse( 
 between(lag(time_center,1),time_center-0.05,time_center+0.05),time_center+0.04,time_center))
plot_data <- dataplot
p1 <- ggplot() +
    # geom_line(data = plot_data, aes(x = q3piHz_time, y = standard_q3_pitch))+
geom_bar(data = plot_data,stat = "identity", aes(x = time_center, y = standard_dur),fill="lightgrey")+
geom_line(data = plot_data, aes(x = time_center, y = standard_pitch),color="blue")+
geom_line(data = plot_data, aes(x = time_center, y = standard_intensity),color="green") +
geom_point(data = plot_data, aes(x = time_center, y = standard_pitch, color = word, shape=MAS_structure),size=5) +
geom_text(data = plot_data, aes(x = time_center, y = standard_pitch, label = content), vjust = 0.1, hjust = 0.5,size=3.2) +
    # geom_text(data = plot_data, aes(x = q3piHz_time, y = standard_q3_pitch, label = content), vjust = -0.5, hjust = 0.5)+
    # geom_text(data = plot_data, aes(x = q4piHz_time, y = standard_q4_pitch, label = content), vjust = -0.5, hjust = 0.5) +
ggtitle(paste("Group:", dataplot$id_ip,dataplot$ip,dataplot$pattern)) + theme_light() +
    theme(legend.position = "right",plot.title = element_text(size = 10),legend.text = element_text(size=10),legend.title = element_text(size = 10), axis.title=element_text(size=10,face="bold"))+guides(size="none") + xlab("Timeline") + ylab("Prosodic features")
  
  
 p2 <- plot_data %>%    mutate(     Hz = standard_pitch,     dB = standard_intensity,     dur= standard_dur) %>%mutate(Hz=round(Hz,1),dB=round(dB,1),dur=round(dur,1))%>%arrange((time_start))%>% select(Hz,dB,dur,content)
 p3 <- t(p2)
 
if (nrow(p2) > 0) {
  table_grob <- tableGrob(p3, theme = mytheme)

} else {
  empty_df <- data.frame (Hz  = "0",
                  dB = "0",
                  dur = "0",
                  content ="0"
                  )
  table_grob <- tableGrob(empty_df, theme = mytheme)
}

# Combine plot and table
grid.arrange(p1, table_grob, ncol = 1, heights = c(3, 1))

8.4.5.8.5 Pattern IVb
setDF(data2)
data3 <- data2%>%filter(pattern=="PIVb",order_pattern_group==1)%>%mutate(time_center = time_start+((time_end-time_start)/2))
data3q1 <- data3%>%filter(q1piHz_q2piHZ>10 | q1piHz_q2piHZ< -10)%>%rowwise()%>%mutate(center_Hz = (q1piHz + q2piHz)/2, time_center = (q1piHz_time + q2piHz_time)/2,standard_pitch = standard_pitch + (standard_pitch* (q1piHz_q2piHZ/100)), time_center=ifelse(is.na(time_center),(time_start+((time_end-time_start)/2))-0.01,time_center))%>%arrange(time_start)
data3q4 <- data3%>%filter(q3piHz_q4piHZ>10 | q3piHz_q4piHZ< -10)%>%rowwise()%>%mutate(center_Hz = (q3piHz + q4piHz)/2, time_center = (q3piHz_time + q4piHz_time)/2, standard_pitch = standard_pitch + (standard_pitch* (q3piHz_q4piHZ/100)),time_center=ifelse(is.na(time_center),(time_start+((time_end-time_start)/2))+0.01,time_center))%>%arrange(time_start)

dataplot <- rbind(data3,data3q1,data3q4)%>%arrange(filename,time_center)%>%mutate(time_center_table=ifelse( 
 between(lag(time_center,1),time_center-0.05,time_center+0.05),time_center+0.04,time_center))
plot_data <- dataplot
p1 <- ggplot() +
    # geom_line(data = plot_data, aes(x = q3piHz_time, y = standard_q3_pitch))+
geom_bar(data = plot_data,stat = "identity", aes(x = time_center, y = standard_dur),fill="lightgrey")+
geom_line(data = plot_data, aes(x = time_center, y = standard_pitch),color="blue")+
geom_line(data = plot_data, aes(x = time_center, y = standard_intensity),color="green") +
geom_point(data = plot_data, aes(x = time_center, y = standard_pitch, color = word, shape=MAS_structure),size=5) +
geom_text(data = plot_data, aes(x = time_center, y = standard_pitch, label = content), vjust = 0.1, hjust = 0.5,size=3.2) +
    # geom_text(data = plot_data, aes(x = q3piHz_time, y = standard_q3_pitch, label = content), vjust = -0.5, hjust = 0.5)+
    # geom_text(data = plot_data, aes(x = q4piHz_time, y = standard_q4_pitch, label = content), vjust = -0.5, hjust = 0.5) +
ggtitle(paste("Group:", dataplot$id_ip,dataplot$ip,dataplot$pattern)) + theme_light() +
    theme(legend.position = "right",plot.title = element_text(size = 10),legend.text = element_text(size=10),legend.title = element_text(size = 10), axis.title=element_text(size=10,face="bold"))+guides(size="none") + xlab("Timeline") + ylab("Prosodic features")
  
  
   p2 <- plot_data %>%    mutate(     Hz = standard_pitch,     dB = standard_intensity,     dur= standard_dur) %>%mutate(Hz=round(Hz,1),dB=round(dB,1),dur=round(dur,1))%>%arrange((time_start))%>% select(Hz,dB,dur,content)
 p3 <- t(p2)
 
if (nrow(p2) > 0) {
  table_grob <- tableGrob(p3, theme = mytheme)

} else {
  empty_df <- data.frame (Hz  = "0",
                  dB = "0",
                  dur = "0",
                  content ="0"
                  )
  table_grob <- tableGrob(empty_df, theme = mytheme)
}

# Combine plot and table
grid.arrange(p1, table_grob, ncol = 1, heights = c(3, 1))

8.4.5.8.6 Pattern V
setDF(data2)
data3 <- data2%>%filter(pattern=="PV",order_pattern_group==1,(time_end_ip-time_start_ip)>1)%>%mutate(time_center = time_start+((time_end-time_start)/2))
data3q1 <- data3%>%filter(q1piHz_q2piHZ>10 | q1piHz_q2piHZ< -10)%>%rowwise()%>%mutate(center_Hz = (q1piHz + q2piHz)/2, time_center = (q1piHz_time + q2piHz_time)/2,standard_pitch = standard_pitch + (standard_pitch* (q1piHz_q2piHZ/100)), time_center=ifelse(is.na(time_center),(time_start+((time_end-time_start)/2))-0.01,time_center))%>%arrange(time_start)
data3q4 <- data3%>%filter(q3piHz_q4piHZ>10 | q3piHz_q4piHZ< -10)%>%rowwise()%>%mutate(center_Hz = (q3piHz + q4piHz)/2, time_center = (q3piHz_time + q4piHz_time)/2, standard_pitch = standard_pitch + (standard_pitch* (q3piHz_q4piHZ/100)),time_center=ifelse(is.na(time_center),(time_start+((time_end-time_start)/2))+0.01,time_center))%>%arrange(time_start)

dataplot <- rbind(data3,data3q1,data3q4)%>%arrange(filename,time_center)%>%mutate(time_center_table=ifelse( 
 between(lag(time_center,1),time_center-0.05,time_center+0.05),time_center+0.04,time_center))
plot_data <- dataplot
p1 <- ggplot() +
    # geom_line(data = plot_data, aes(x = q3piHz_time, y = standard_q3_pitch))+
geom_bar(data = plot_data,stat = "identity", aes(x = time_center, y = standard_dur),fill="lightgrey")+
geom_line(data = plot_data, aes(x = time_center, y = standard_pitch),color="blue")+
geom_line(data = plot_data, aes(x = time_center, y = standard_intensity),color="green") +
geom_point(data = plot_data, aes(x = time_center, y = standard_pitch, color = word, shape=MAS_structure),size=5) +
geom_text(data = plot_data, aes(x = time_center, y = standard_pitch, label = content), vjust = 0.1, hjust = 0.5,size=3.2) +
    # geom_text(data = plot_data, aes(x = q3piHz_time, y = standard_q3_pitch, label = content), vjust = -0.5, hjust = 0.5)+
    # geom_text(data = plot_data, aes(x = q4piHz_time, y = standard_q4_pitch, label = content), vjust = -0.5, hjust = 0.5) +
ggtitle(paste("Group:", dataplot$id_ip,dataplot$ip,dataplot$pattern)) + theme_light() +
    theme(legend.position = "right",plot.title = element_text(size = 10),legend.text = element_text(size=10),legend.title = element_text(size = 10), axis.title=element_text(size=10,face="bold"))+guides(size="none") + xlab("Timeline") + ylab("Prosodic features")
  
  
   p2 <- plot_data %>%    mutate(     Hz = standard_pitch,     dB = standard_intensity,     dur= standard_dur) %>%mutate(Hz=round(Hz,1),dB=round(dB,1),dur=round(dur,1))%>%arrange((time_start))%>% select(Hz,dB,dur,content)
 p3 <- t(p2)
 
if (nrow(p2) > 0) {
  table_grob <- tableGrob(p3, theme = mytheme)

} else {
  empty_df <- data.frame (Hz  = "0",
                  dB = "0",
                  dur = "0",
                  content ="0"
                  )
  table_grob <- tableGrob(empty_df, theme = mytheme)
}


# Combine plot and table
grid.arrange(p1, table_grob, ncol = 1, heights = c(3, 1))

8.4.5.8.7 Pattern VIa
setDF(data2)
data3 <- data2%>%filter(pattern=="PVIa",order_pattern_group==1,(time_end_ip-time_start_ip)>1,(time_end_ip-time_start_ip)>1)%>%mutate(time_center = time_start+((time_end-time_start)/2))
data3q1 <- data3%>%filter(q1piHz_q2piHZ>10 | q1piHz_q2piHZ< -10)%>%rowwise()%>%mutate(center_Hz = (q1piHz + q2piHz)/2, time_center = (q1piHz_time + q2piHz_time)/2,standard_pitch = standard_pitch + (standard_pitch* (q1piHz_q2piHZ/100)), time_center=ifelse(is.na(time_center),(time_start+((time_end-time_start)/2))-0.01,time_center))%>%arrange(time_start)
data3q4 <- data3%>%filter(q3piHz_q4piHZ>10 | q3piHz_q4piHZ< -10)%>%rowwise()%>%mutate(center_Hz = (q3piHz + q4piHz)/2, time_center = (q3piHz_time + q4piHz_time)/2, standard_pitch = standard_pitch + (standard_pitch* (q3piHz_q4piHZ/100)),time_center=ifelse(is.na(time_center),(time_start+((time_end-time_start)/2))+0.01,time_center))%>%arrange(time_start)

dataplot <- rbind(data3,data3q1,data3q4)%>%arrange(filename,time_center)%>%mutate(time_center_table=ifelse( 
 between(lag(time_center,1),time_center-0.05,time_center+0.05),time_center+0.04,time_center))
plot_data <- dataplot
p1 <- ggplot() +
    # geom_line(data = plot_data, aes(x = q3piHz_time, y = standard_q3_pitch))+
geom_bar(data = plot_data,stat = "identity", aes(x = time_center, y = standard_dur),fill="lightgrey")+
geom_line(data = plot_data, aes(x = time_center, y = standard_pitch),color="blue")+
geom_line(data = plot_data, aes(x = time_center, y = standard_intensity),color="green") +
geom_point(data = plot_data, aes(x = time_center, y = standard_pitch, color = word, shape=MAS_structure),size=5) +
geom_text(data = plot_data, aes(x = time_center, y = standard_pitch, label = content), vjust = 0.1, hjust = 0.5,size=3.2) +
    # geom_text(data = plot_data, aes(x = q3piHz_time, y = standard_q3_pitch, label = content), vjust = -0.5, hjust = 0.5)+
    # geom_text(data = plot_data, aes(x = q4piHz_time, y = standard_q4_pitch, label = content), vjust = -0.5, hjust = 0.5) +
ggtitle(paste("Group:", dataplot$id_ip,dataplot$ip,dataplot$pattern)) + theme_light() +
    theme(legend.position = "right",plot.title = element_text(size = 10),legend.text = element_text(size=10),legend.title = element_text(size = 10), axis.title=element_text(size=10,face="bold"))+guides(size="none") + xlab("Timeline") + ylab("Prosodic features")
  
  
   p2 <- plot_data %>%    mutate(     Hz = standard_pitch,     dB = standard_intensity,     dur= standard_dur) %>%mutate(Hz=round(Hz,1),dB=round(dB,1),dur=round(dur,1))%>%arrange((time_start))%>% select(Hz,dB,dur,content)
 p3 <- t(p2)
 
if (nrow(p2) > 0) {
  table_grob <- tableGrob(p3, theme = mytheme)

} else {
  empty_df <- data.frame (Hz  = "0",
                  dB = "0",
                  dur = "0",
                  content ="0"
                  )
  table_grob <- tableGrob(empty_df, theme = mytheme)
}
# Combine plot and table
grid.arrange(p1, table_grob, ncol = 1, heights = c(3, 1))

8.4.5.8.8 Pattern VIb
setDF(data2)
data3 <- data2%>%filter(pattern=="PVIb",order_pattern_group==1)%>%mutate(time_center = time_start+((time_end-time_start)/2))
data3q1 <- data3%>%filter(q1piHz_q2piHZ>10 | q1piHz_q2piHZ< -10)%>%rowwise()%>%mutate(center_Hz = (q1piHz + q2piHz)/2, time_center = (q1piHz_time + q2piHz_time)/2,standard_pitch = standard_pitch + (standard_pitch* (q1piHz_q2piHZ/100)), time_center=ifelse(is.na(time_center),(time_start+((time_end-time_start)/2))-0.01,time_center))%>%arrange(time_start)
data3q4 <- data3%>%filter(q3piHz_q4piHZ>10 | q3piHz_q4piHZ< -10)%>%rowwise()%>%mutate(center_Hz = (q3piHz + q4piHz)/2, time_center = (q3piHz_time + q4piHz_time)/2, standard_pitch = standard_pitch + (standard_pitch* (q3piHz_q4piHZ/100)),time_center=ifelse(is.na(time_center),(time_start+((time_end-time_start)/2))+0.01,time_center))%>%arrange(time_start)

dataplot <- rbind(data3,data3q1,data3q4)%>%arrange(filename,time_center)%>%mutate(time_center_table=ifelse( 
 between(lag(time_center,1),time_center-0.05,time_center+0.05),time_center+0.04,time_center))
plot_data <- dataplot
p1 <- ggplot() +
    # geom_line(data = plot_data, aes(x = q3piHz_time, y = standard_q3_pitch))+
geom_bar(data = plot_data,stat = "identity", aes(x = time_center, y = standard_dur),fill="lightgrey")+
geom_line(data = plot_data, aes(x = time_center, y = standard_pitch),color="blue")+
geom_line(data = plot_data, aes(x = time_center, y = standard_intensity),color="green") +
geom_point(data = plot_data, aes(x = time_center, y = standard_pitch, color = word, shape=MAS_structure),size=5) +
geom_text(data = plot_data, aes(x = time_center, y = standard_pitch, label = content), vjust = 0.1, hjust = 0.5,size=3.2) +
    # geom_text(data = plot_data, aes(x = q3piHz_time, y = standard_q3_pitch, label = content), vjust = -0.5, hjust = 0.5)+
    # geom_text(data = plot_data, aes(x = q4piHz_time, y = standard_q4_pitch, label = content), vjust = -0.5, hjust = 0.5) +
ggtitle(paste("Group:", dataplot$id_ip,dataplot$ip,dataplot$pattern)) + theme_light() +
    theme(legend.position = "right",plot.title = element_text(size = 10),legend.text = element_text(size=10),legend.title = element_text(size = 10), axis.title=element_text(size=10,face="bold"))+guides(size="none") + xlab("Timeline") + ylab("Prosodic features")
  
  
   p2 <- plot_data %>%    mutate(     Hz = standard_pitch,     dB = standard_intensity,     dur= standard_dur) %>%mutate(Hz=round(Hz,1),dB=round(dB,1),dur=round(dur,1))%>%arrange((time_start))%>% select(Hz,dB,dur,content)
 p3 <- t(p2)
 
if (nrow(p2) > 0) {
  table_grob <- tableGrob(p3, theme = mytheme)

} else {
  empty_df <- data.frame (Hz  = "0",
                  dB = "0",
                  dur = "0",
                  content ="0"
                  )
  table_grob <- tableGrob(empty_df, theme = mytheme)
}
# Combine plot and table
grid.arrange(p1, table_grob, ncol = 1, heights = c(3, 1))

8.4.5.8.9 Pattern VII
setDF(data2)
data3 <- data2%>%filter(pattern=="PVII",order_pattern_group==1)%>%na.omit()%>%mutate(time_center = time_start+((time_end-time_start)/2))
data3q1 <- data3%>%filter(q1piHz_q2piHZ>10 | q1piHz_q2piHZ< -10)%>%rowwise()%>%mutate(center_Hz = (q1piHz + q2piHz)/2, time_center = (q1piHz_time + q2piHz_time)/2,standard_pitch = standard_pitch + (standard_pitch* (q1piHz_q2piHZ/100)), time_center=ifelse(is.na(time_center),(time_start+((time_end-time_start)/2))-0.01,time_center))%>%arrange(time_start)
data3q4 <- data3%>%filter(q3piHz_q4piHZ>10 | q3piHz_q4piHZ< -10)%>%rowwise()%>%mutate(center_Hz = (q3piHz + q4piHz)/2, time_center = (q3piHz_time + q4piHz_time)/2, standard_pitch = standard_pitch + (standard_pitch* (q3piHz_q4piHZ/100)),time_center=ifelse(is.na(time_center),(time_start+((time_end-time_start)/2))+0.01,time_center))%>%arrange(time_start)

dataplot <- rbind(data3,data3q1,data3q4)%>%arrange(filename,time_center)%>%mutate(time_center_table=ifelse( 
 between(lag(time_center,1),time_center-0.05,time_center+0.05),time_center+0.04,time_center))
plot_data <- dataplot
p1 <- ggplot() +
    # geom_line(data = plot_data, aes(x = q3piHz_time, y = standard_q3_pitch))+
geom_bar(data = plot_data,stat = "identity", aes(x = time_center, y = standard_dur),fill="lightgrey")+
geom_line(data = plot_data, aes(x = time_center, y = standard_pitch),color="blue")+
geom_line(data = plot_data, aes(x = time_center, y = standard_intensity),color="green") +
geom_point(data = plot_data, aes(x = time_center, y = standard_pitch, color = word, shape=MAS_structure),size=5) +
geom_text(data = plot_data, aes(x = time_center, y = standard_pitch, label = content), vjust = 0.1, hjust = 0.5,size=3.2) +
    # geom_text(data = plot_data, aes(x = q3piHz_time, y = standard_q3_pitch, label = content), vjust = -0.5, hjust = 0.5)+
    # geom_text(data = plot_data, aes(x = q4piHz_time, y = standard_q4_pitch, label = content), vjust = -0.5, hjust = 0.5) +
ggtitle(paste("Group:", dataplot$id_ip,dataplot$ip,dataplot$pattern)) + theme_light() +
    theme(legend.position = "right",plot.title = element_text(size = 10),legend.text = element_text(size=10),legend.title = element_text(size = 10), axis.title=element_text(size=10,face="bold"))+guides(size="none") + xlab("Timeline") + ylab("Prosodic features")
  
  
   p2 <- plot_data %>%    mutate(     Hz = standard_pitch,     dB = standard_intensity,     dur= standard_dur) %>%mutate(Hz=round(Hz,1),dB=round(dB,1),dur=round(dur,1))%>%arrange((time_start))%>% select(Hz,dB,dur,content)
 p3 <- t(p2)
 
if (nrow(p2) > 0) {
  table_grob <- tableGrob(p3, theme = mytheme)

} else {
  empty_df <- data.frame (Hz  = "0",
                  dB = "0",
                  dur = "0",
                  content ="0"
                  )
  table_grob <- tableGrob(empty_df, theme = mytheme)
}
# Combine plot and table
grid.arrange(p1, table_grob, ncol = 1, heights = c(3, 1))

8.4.5.8.10 Pattern VIII
setDF(data2)
data3 <- data2%>%filter(pattern=="PVIII",order_pattern_group==1)%>%mutate(time_center = time_start+((time_end-time_start)/2))
data3q1 <- data3%>%filter(q1piHz_q2piHZ>10 | q1piHz_q2piHZ< -10)%>%rowwise()%>%mutate(center_Hz = (q1piHz + q2piHz)/2, time_center = (q1piHz_time + q2piHz_time)/2,standard_pitch = standard_pitch + (standard_pitch* (q1piHz_q2piHZ/100)), time_center=ifelse(is.na(time_center),(time_start+((time_end-time_start)/2))-0.01,time_center))%>%arrange(time_start)
data3q4 <- data3%>%filter(q3piHz_q4piHZ>10 | q3piHz_q4piHZ< -10)%>%rowwise()%>%mutate(center_Hz = (q3piHz + q4piHz)/2, time_center = (q3piHz_time + q4piHz_time)/2, standard_pitch = standard_pitch + (standard_pitch* (q3piHz_q4piHZ/100)),time_center=ifelse(is.na(time_center),(time_start+((time_end-time_start)/2))+0.01,time_center))%>%arrange(time_start)

dataplot <- rbind(data3,data3q1,data3q4)%>%arrange(filename,time_center)%>%mutate(time_center_table=ifelse( 
 between(lag(time_center,1),time_center-0.05,time_center+0.05),time_center+0.04,time_center))
plot_data <- dataplot
p1 <- ggplot() +
    # geom_line(data = plot_data, aes(x = q3piHz_time, y = standard_q3_pitch))+
geom_bar(data = plot_data,stat = "identity", aes(x = time_center, y = standard_dur),fill="lightgrey")+
geom_line(data = plot_data, aes(x = time_center, y = standard_pitch),color="blue")+
geom_line(data = plot_data, aes(x = time_center, y = standard_intensity),color="green") +
geom_point(data = plot_data, aes(x = time_center, y = standard_pitch, color = word, shape=MAS_structure),size=5) +
geom_text(data = plot_data, aes(x = time_center, y = standard_pitch, label = content), vjust = 0.1, hjust = 0.5,size=3.2) +
    # geom_text(data = plot_data, aes(x = q3piHz_time, y = standard_q3_pitch, label = content), vjust = -0.5, hjust = 0.5)+
    # geom_text(data = plot_data, aes(x = q4piHz_time, y = standard_q4_pitch, label = content), vjust = -0.5, hjust = 0.5) +
ggtitle(paste("Group:", dataplot$id_ip,dataplot$ip,dataplot$pattern)) + theme_light() +
    theme(legend.position = "right",plot.title = element_text(size = 10),legend.text = element_text(size=10),legend.title = element_text(size = 10), axis.title=element_text(size=10,face="bold"))+guides(size="none") + xlab("Timeline") + ylab("Prosodic features")
  
  
   p2 <- plot_data %>%    mutate(     Hz = standard_pitch,     dB = standard_intensity,     dur= standard_dur) %>%mutate(Hz=round(Hz,1),dB=round(dB,1),dur=round(dur,1))%>%arrange((time_start))%>% select(Hz,dB,dur,content)
 p3 <- t(p2)
 
if (nrow(p2) > 0) {
  table_grob <- tableGrob(p3, theme = mytheme)

} else {
  empty_df <- data.frame (Hz  = "0",
                  dB = "0",
                  dur = "0",
                  content ="0"
                  )
  table_grob <- tableGrob(empty_df, theme = mytheme)
}
# Combine plot and table
grid.arrange(p1, table_grob, ncol = 1, heights = c(3, 1))

8.4.5.8.11 Pattern IX
setDF(data2)
data3 <- data2%>%filter(pattern=="PIX",order_pattern_group==1,(time_end_ip-time_start_ip)>1)%>%mutate(time_center = time_start+((time_end-time_start)/2))
data3q1 <- data3%>%filter(q1piHz_q2piHZ>10 | q1piHz_q2piHZ< -10)%>%rowwise()%>%mutate(center_Hz = (q1piHz + q2piHz)/2, time_center = (q1piHz_time + q2piHz_time)/2,standard_pitch = standard_pitch + (standard_pitch* (q1piHz_q2piHZ/100)), time_center=ifelse(is.na(time_center),(time_start+((time_end-time_start)/2))-0.01,time_center))%>%arrange(time_start)
data3q4 <- data3%>%filter(q3piHz_q4piHZ>10 | q3piHz_q4piHZ< -10)%>%rowwise()%>%mutate(center_Hz = (q3piHz + q4piHz)/2, time_center = (q3piHz_time + q4piHz_time)/2, standard_pitch = standard_pitch + (standard_pitch* (q3piHz_q4piHZ/100)),time_center=ifelse(is.na(time_center),(time_start+((time_end-time_start)/2))+0.01,time_center))%>%arrange(time_start)

dataplot <- rbind(data3,data3q1,data3q4)%>%arrange(filename,time_center)%>%mutate(time_center_table=ifelse( 
 between(lag(time_center,1),time_center-0.05,time_center+0.05),time_center+0.04,time_center))
plot_data <- dataplot
p1 <- ggplot() +
    # geom_line(data = plot_data, aes(x = q3piHz_time, y = standard_q3_pitch))+
geom_bar(data = plot_data,stat = "identity", aes(x = time_center, y = standard_dur),fill="lightgrey")+
geom_line(data = plot_data, aes(x = time_center, y = standard_pitch),color="blue")+
geom_line(data = plot_data, aes(x = time_center, y = standard_intensity),color="green") +
geom_point(data = plot_data, aes(x = time_center, y = standard_pitch, color = word, shape=MAS_structure),size=5) +
geom_text(data = plot_data, aes(x = time_center, y = standard_pitch, label = content), vjust = 0.1, hjust = 0.5,size=3.2) +
    # geom_text(data = plot_data, aes(x = q3piHz_time, y = standard_q3_pitch, label = content), vjust = -0.5, hjust = 0.5)+
    # geom_text(data = plot_data, aes(x = q4piHz_time, y = standard_q4_pitch, label = content), vjust = -0.5, hjust = 0.5) +
ggtitle(paste("Group:", dataplot$id_ip,dataplot$ip,dataplot$pattern)) + theme_light() +
    theme(legend.position = "right",plot.title = element_text(size = 10),legend.text = element_text(size=10),legend.title = element_text(size = 10), axis.title=element_text(size=10,face="bold"))+guides(size="none") + xlab("Timeline") + ylab("Prosodic features")
  
  
   p2 <- plot_data %>%    mutate(     Hz = standard_pitch,     dB = standard_intensity,     dur= standard_dur) %>%mutate(Hz=round(Hz,1),dB=round(dB,1),dur=round(dur,1))%>%arrange((time_start))%>% select(Hz,dB,dur,content)
 p3 <- t(p2)
 
if (nrow(p2) > 0) {
  table_grob <- tableGrob(p3, theme = mytheme)

} else {
  empty_df <- data.frame (Hz  = "0",
                  dB = "0",
                  dur = "0",
                  content ="0"
                  )
  table_grob <- tableGrob(empty_df, theme = mytheme)
}
# Combine plot and table
grid.arrange(p1, table_grob, ncol = 1, heights = c(3, 1))

8.4.5.8.12 Pattern Xa
setDF(data2)
data3 <- data2%>%filter(pattern=="PXa",order_pattern_group==1)%>%mutate(time_center = time_start+((time_end-time_start)/2))
data3q1 <- data3%>%filter(q1piHz_q2piHZ>10 | q1piHz_q2piHZ< -10)%>%rowwise()%>%mutate(center_Hz = (q1piHz + q2piHz)/2, time_center = (q1piHz_time + q2piHz_time)/2,standard_pitch = standard_pitch + (standard_pitch* (q1piHz_q2piHZ/100)), time_center=ifelse(is.na(time_center),(time_start+((time_end-time_start)/2))-0.01,time_center))%>%arrange(time_start)
data3q4 <- data3%>%filter(q3piHz_q4piHZ>10 | q3piHz_q4piHZ< -10)%>%rowwise()%>%mutate(center_Hz = (q3piHz + q4piHz)/2, time_center = (q3piHz_time + q4piHz_time)/2, standard_pitch = standard_pitch + (standard_pitch* (q3piHz_q4piHZ/100)),time_center=ifelse(is.na(time_center),(time_start+((time_end-time_start)/2))+0.01,time_center))%>%arrange(time_start)

dataplot <- rbind(data3,data3q1,data3q4)%>%arrange(filename,time_center)%>%mutate(time_center_table=ifelse( 
 between(lag(time_center,1),time_center-0.05,time_center+0.05),time_center+0.04,time_center))
plot_data <- dataplot
p1 <- ggplot() +
    # geom_line(data = plot_data, aes(x = q3piHz_time, y = standard_q3_pitch))+
geom_bar(data = plot_data,stat = "identity", aes(x = time_center, y = standard_dur),fill="lightgrey")+
geom_line(data = plot_data, aes(x = time_center, y = standard_pitch),color="blue")+
geom_line(data = plot_data, aes(x = time_center, y = standard_intensity),color="green") +
geom_point(data = plot_data, aes(x = time_center, y = standard_pitch, color = word, shape=MAS_structure),size=5) +
geom_text(data = plot_data, aes(x = time_center, y = standard_pitch, label = content), vjust = 0.1, hjust = 0.5,size=3.2) +
    # geom_text(data = plot_data, aes(x = q3piHz_time, y = standard_q3_pitch, label = content), vjust = -0.5, hjust = 0.5)+
    # geom_text(data = plot_data, aes(x = q4piHz_time, y = standard_q4_pitch, label = content), vjust = -0.5, hjust = 0.5) +
ggtitle(paste("Group:", dataplot$id_ip,dataplot$ip,dataplot$pattern)) + theme_light() +
    theme(legend.position = "right",plot.title = element_text(size = 10),legend.text = element_text(size=10),legend.title = element_text(size = 10), axis.title=element_text(size=10,face="bold"))+guides(size="none") + xlab("Timeline") + ylab("Prosodic features")
  
  
   p2 <- plot_data %>%    mutate(     Hz = standard_pitch,     dB = standard_intensity,     dur= standard_dur) %>%mutate(Hz=round(Hz,1),dB=round(dB,1),dur=round(dur,1))%>%arrange((time_start))%>% select(Hz,dB,dur,content)
 p3 <- t(p2)
 
if (nrow(p2) > 0) {
  table_grob <- tableGrob(p3, theme = mytheme)

} else {
  empty_df <- data.frame (Hz  = "0",
                  dB = "0",
                  dur = "0",
                  content ="0"
                  )
  table_grob <- tableGrob(empty_df, theme = mytheme)
}
# Combine plot and table
grid.arrange(p1, table_grob, ncol = 1, heights = c(3, 1))

8.4.5.8.13 Pattern Xb
setDF(data2)
data3 <- data2%>%filter(pattern=="PXb",order_pattern_group==1)%>%mutate(time_center = time_start+((time_end-time_start)/2))
data3q1 <- data3%>%filter(q1piHz_q2piHZ>10 | q1piHz_q2piHZ< -10)%>%rowwise()%>%mutate(center_Hz = (q1piHz + q2piHz)/2, time_center = (q1piHz_time + q2piHz_time)/2,standard_pitch = standard_pitch + (standard_pitch* (q1piHz_q2piHZ/100)), time_center=ifelse(is.na(time_center),(time_start+((time_end-time_start)/2))-0.01,time_center))%>%arrange(time_start)
data3q4 <- data3%>%filter(q3piHz_q4piHZ>10 | q3piHz_q4piHZ< -10)%>%rowwise()%>%mutate(center_Hz = (q3piHz + q4piHz)/2, time_center = (q3piHz_time + q4piHz_time)/2, standard_pitch = standard_pitch + (standard_pitch* (q3piHz_q4piHZ/100)),time_center=ifelse(is.na(time_center),(time_start+((time_end-time_start)/2))+0.01,time_center))%>%arrange(time_start)

dataplot <- rbind(data3,data3q1,data3q4)%>%arrange(filename,time_center)%>%mutate(time_center_table=ifelse( 
 between(lag(time_center,1),time_center-0.05,time_center+0.05),time_center+0.04,time_center))
plot_data <- dataplot
p1 <- ggplot() +
    # geom_line(data = plot_data, aes(x = q3piHz_time, y = standard_q3_pitch))+
geom_bar(data = plot_data,stat = "identity", aes(x = time_center, y = standard_dur),fill="lightgrey")+
geom_line(data = plot_data, aes(x = time_center, y = standard_pitch),color="blue")+
geom_line(data = plot_data, aes(x = time_center, y = standard_intensity),color="green") +
geom_point(data = plot_data, aes(x = time_center, y = standard_pitch, color = word, shape=MAS_structure),size=5) +
geom_text(data = plot_data, aes(x = time_center, y = standard_pitch, label = content), vjust = 0.1, hjust = 0.5,size=3.2) +
    # geom_text(data = plot_data, aes(x = q3piHz_time, y = standard_q3_pitch, label = content), vjust = -0.5, hjust = 0.5)+
    # geom_text(data = plot_data, aes(x = q4piHz_time, y = standard_q4_pitch, label = content), vjust = -0.5, hjust = 0.5) +
ggtitle(paste("Group:", dataplot$id_ip,dataplot$ip,dataplot$pattern)) + theme_light() +
    theme(legend.position = "right",plot.title = element_text(size = 10),legend.text = element_text(size=10),legend.title = element_text(size = 10), axis.title=element_text(size=10,face="bold"))+guides(size="none") + xlab("Timeline") + ylab("Prosodic features")
  
   p2 <- plot_data %>%    mutate(     Hz = standard_pitch,     dB = standard_intensity,     dur= standard_dur) %>%mutate(Hz=round(Hz,1),dB=round(dB,1),dur=round(dur,1))%>%arrange((time_start))%>% select(Hz,dB,dur,content)
 p3 <- t(p2)
 
if (nrow(p2) > 0) {
  table_grob <- tableGrob(p3, theme = mytheme)

} else {
  empty_df <- data.frame (Hz  = "0",
                  dB = "0",
                  dur = "0",
                  content ="0"
                  )
  table_grob <- tableGrob(empty_df, theme = mytheme)
}
# Combine plot and table
grid.arrange(p1, table_grob, ncol = 1, heights = c(3, 1))

8.4.5.8.14 Pattern XI
setDF(data2)
data3 <- data2%>%filter(pattern=="PXI",order_pattern_group==1,(time_end_ip-time_start_ip)>1)%>%mutate(time_center = time_start+((time_end-time_start)/2))
data3q1 <- data3%>%filter(q1piHz_q2piHZ>10 | q1piHz_q2piHZ< -10)%>%rowwise()%>%mutate(center_Hz = (q1piHz + q2piHz)/2, time_center = (q1piHz_time + q2piHz_time)/2,standard_pitch = standard_pitch + (standard_pitch* (q1piHz_q2piHZ/100)), time_center=ifelse(is.na(time_center),(time_start+((time_end-time_start)/2))-0.01,time_center))%>%arrange(time_start)
data3q4 <- data3%>%filter(q3piHz_q4piHZ>10 | q3piHz_q4piHZ< -10)%>%rowwise()%>%mutate(center_Hz = (q3piHz + q4piHz)/2, time_center = (q3piHz_time + q4piHz_time)/2, standard_pitch = standard_pitch + (standard_pitch* (q3piHz_q4piHZ/100)),time_center=ifelse(is.na(time_center),(time_start+((time_end-time_start)/2))+0.01,time_center))%>%arrange(time_start)

dataplot <- rbind(data3,data3q1,data3q4)%>%arrange(filename,time_center)%>%mutate(time_center_table=ifelse( 
 between(lag(time_center,1),time_center-0.05,time_center+0.05),time_center+0.04,time_center))
plot_data <- dataplot
p1 <- ggplot() +
    # geom_line(data = plot_data, aes(x = q3piHz_time, y = standard_q3_pitch))+
geom_bar(data = plot_data,stat = "identity", aes(x = time_center, y = standard_dur),fill="lightgrey")+
geom_line(data = plot_data, aes(x = time_center, y = standard_pitch),color="blue")+
geom_line(data = plot_data, aes(x = time_center, y = standard_intensity),color="green") +
geom_point(data = plot_data, aes(x = time_center, y = standard_pitch, color = word, shape=MAS_structure),size=5) +
geom_text(data = plot_data, aes(x = time_center, y = standard_pitch, label = content), vjust = 0.1, hjust = 0.5,size=3.2) +
    # geom_text(data = plot_data, aes(x = q3piHz_time, y = standard_q3_pitch, label = content), vjust = -0.5, hjust = 0.5)+
    # geom_text(data = plot_data, aes(x = q4piHz_time, y = standard_q4_pitch, label = content), vjust = -0.5, hjust = 0.5) +
ggtitle(paste("Group:", dataplot$id_ip,dataplot$ip,dataplot$pattern)) + theme_light() +
    theme(legend.position = "right",plot.title = element_text(size = 10),legend.text = element_text(size=10),legend.title = element_text(size = 10), axis.title=element_text(size=10,face="bold"))+guides(size="none") + xlab("Timeline") + ylab("Prosodic features")
  
   p2 <- plot_data %>%    mutate(     Hz = standard_pitch,     dB = standard_intensity,     dur= standard_dur) %>%mutate(Hz=round(Hz,1),dB=round(dB,1),dur=round(dur,1))%>%arrange((time_start))%>% select(Hz,dB,dur,content)
 p3 <- t(p2)
 
if (nrow(p2) > 0) {
  table_grob <- tableGrob(p3, theme = mytheme)

} else {
  empty_df <- data.frame (Hz  = "0",
                  dB = "0",
                  dur = "0",
                  content ="0"
                  )
  table_grob <- tableGrob(empty_df, theme = mytheme)
}
# Combine plot and table
grid.arrange(p1, table_grob, ncol = 1, heights = c(3, 1))

8.4.5.8.15 Pattern XIIa
setDF(data2)
data3 <- data2%>%filter(pattern=="PXIIa",order_pattern_group==1)%>%mutate(time_center = time_start+((time_end-time_start)/2))
data3q1 <- data3%>%filter(q1piHz_q2piHZ>10 | q1piHz_q2piHZ< -10)%>%rowwise()%>%mutate(center_Hz = (q1piHz + q2piHz)/2, time_center = (q1piHz_time + q2piHz_time)/2,standard_pitch = standard_pitch + (standard_pitch* (q1piHz_q2piHZ/100)), time_center=ifelse(is.na(time_center),(time_start+((time_end-time_start)/2))-0.01,time_center))%>%arrange(time_start)
data3q4 <- data3%>%filter(q3piHz_q4piHZ>10 | q3piHz_q4piHZ< -10)%>%rowwise()%>%mutate(center_Hz = (q3piHz + q4piHz)/2, time_center = (q3piHz_time + q4piHz_time)/2, standard_pitch = standard_pitch + (standard_pitch* (q3piHz_q4piHZ/100)),time_center=ifelse(is.na(time_center),(time_start+((time_end-time_start)/2))+0.01,time_center))%>%arrange(time_start)

dataplot <- rbind(data3,data3q1,data3q4)%>%arrange(filename,time_center)%>%mutate(time_center_table=ifelse( 
 between(lag(time_center,1),time_center-0.05,time_center+0.05),time_center+0.04,time_center))
plot_data <- dataplot
p1 <- ggplot() +
    # geom_line(data = plot_data, aes(x = q3piHz_time, y = standard_q3_pitch))+
geom_bar(data = plot_data,stat = "identity", aes(x = time_center, y = standard_dur),fill="lightgrey")+
geom_line(data = plot_data, aes(x = time_center, y = standard_pitch),color="blue")+
geom_line(data = plot_data, aes(x = time_center, y = standard_intensity),color="green") +
geom_point(data = plot_data, aes(x = time_center, y = standard_pitch, color = word, shape=MAS_structure),size=5) +
geom_text(data = plot_data, aes(x = time_center, y = standard_pitch, label = content), vjust = 0.1, hjust = 0.5,size=3.2) +
    # geom_text(data = plot_data, aes(x = q3piHz_time, y = standard_q3_pitch, label = content), vjust = -0.5, hjust = 0.5)+
    # geom_text(data = plot_data, aes(x = q4piHz_time, y = standard_q4_pitch, label = content), vjust = -0.5, hjust = 0.5) +
ggtitle(paste("Group:", dataplot$id_ip,dataplot$ip,dataplot$pattern)) + theme_light() +
    theme(legend.position = "right",plot.title = element_text(size = 10),legend.text = element_text(size=10),legend.title = element_text(size = 10), axis.title=element_text(size=10,face="bold"))+guides(size="none") + xlab("Timeline") + ylab("Prosodic features")
  
   p2 <- plot_data %>%    mutate(     Hz = standard_pitch,     dB = standard_intensity,     dur= standard_dur) %>%mutate(Hz=round(Hz,1),dB=round(dB,1),dur=round(dur,1))%>%arrange((time_start))%>% select(Hz,dB,dur,content)
 p3 <- t(p2)
 
if (nrow(p2) > 0) {
  table_grob <- tableGrob(p3, theme = mytheme)

} else {
  empty_df <- data.frame (Hz  = "0",
                  dB = "0",
                  dur = "0",
                  content ="0"
                  )
  table_grob <- tableGrob(empty_df, theme = mytheme)
}
# Combine plot and table
grid.arrange(p1, table_grob, ncol = 1, heights = c(3, 1))

8.4.5.8.16 Pattern XIIb
setDF(data2)
data3 <- data2%>%filter(pattern=="PXIIb",order_pattern_group==1)%>%mutate(time_center = time_start+((time_end-time_start)/2))
data3q1 <- data3%>%filter(q1piHz_q2piHZ>10 | q1piHz_q2piHZ< -10)%>%rowwise()%>%mutate(center_Hz = (q1piHz + q2piHz)/2, time_center = (q1piHz_time + q2piHz_time)/2,standard_pitch = standard_pitch + (standard_pitch* (q1piHz_q2piHZ/100)), time_center=ifelse(is.na(time_center),(time_start+((time_end-time_start)/2))-0.01,time_center))%>%arrange(time_start)
data3q4 <- data3%>%filter(q3piHz_q4piHZ>10 | q3piHz_q4piHZ< -10)%>%rowwise()%>%mutate(center_Hz = (q3piHz + q4piHz)/2, time_center = (q3piHz_time + q4piHz_time)/2, standard_pitch = standard_pitch + (standard_pitch* (q3piHz_q4piHZ/100)),time_center=ifelse(is.na(time_center),(time_start+((time_end-time_start)/2))+0.01,time_center))%>%arrange(time_start)

dataplot <- rbind(data3,data3q1,data3q4)%>%arrange(filename,time_center)%>%mutate(time_center_table=ifelse( 
 between(lag(time_center,1),time_center-0.05,time_center+0.05),time_center+0.04,time_center))
plot_data <- dataplot
p1 <- ggplot() +
    # geom_line(data = plot_data, aes(x = q3piHz_time, y = standard_q3_pitch))+
geom_bar(data = plot_data,stat = "identity", aes(x = time_center, y = standard_dur),fill="lightgrey")+
geom_line(data = plot_data, aes(x = time_center, y = standard_pitch),color="blue")+
geom_line(data = plot_data, aes(x = time_center, y = standard_intensity),color="green") +
geom_point(data = plot_data, aes(x = time_center, y = standard_pitch, color = word, shape=MAS_structure),size=5) +
geom_text(data = plot_data, aes(x = time_center, y = standard_pitch, label = content), vjust = 0.1, hjust = 0.5,size=3.2) +
    # geom_text(data = plot_data, aes(x = q3piHz_time, y = standard_q3_pitch, label = content), vjust = -0.5, hjust = 0.5)+
    # geom_text(data = plot_data, aes(x = q4piHz_time, y = standard_q4_pitch, label = content), vjust = -0.5, hjust = 0.5) +
ggtitle(paste("Group:", dataplot$id_ip,dataplot$ip,dataplot$pattern)) + theme_light() +
    theme(legend.position = "right",plot.title = element_text(size = 10),legend.text = element_text(size=10),legend.title = element_text(size = 10), axis.title=element_text(size=10,face="bold"))+guides(size="none") + xlab("Timeline") + ylab("Prosodic features")
  
   p2 <- plot_data %>%    mutate(     Hz = standard_pitch,     dB = standard_intensity,     dur= standard_dur) %>%mutate(Hz=round(Hz,1),dB=round(dB,1),dur=round(dur,1))%>%arrange((time_start))%>% select(Hz,dB,dur,content)
 p3 <- t(p2)
 
if (nrow(p2) > 0) {
  table_grob <- tableGrob(p3, theme = mytheme)

} else {
  empty_df <- data.frame (Hz  = "0",
                  dB = "0",
                  dur = "0",
                  content ="0"
                  )
  table_grob <- tableGrob(empty_df, theme = mytheme)
}
# Combine plot and table
grid.arrange(p1, table_grob, ncol = 1, heights = c(3, 1))

8.4.5.8.17 Pattern XIII
setDF(data2)
data3 <- data2%>%filter(pattern=="PXIII",order_pattern_group==1,(time_end_ip-time_start_ip)>1,(time_end_ip-time_start_ip)>1)%>%mutate(time_center = time_start+((time_end-time_start)/2))
data3q1 <- data3%>%filter(q1piHz_q2piHZ>10 | q1piHz_q2piHZ< -10)%>%rowwise()%>%mutate(center_Hz = (q1piHz + q2piHz)/2, time_center = (q1piHz_time + q2piHz_time)/2,standard_pitch = standard_pitch + (standard_pitch* (q1piHz_q2piHZ/100)), time_center=ifelse(is.na(time_center),(time_start+((time_end-time_start)/2))-0.01,time_center))%>%arrange(time_start)
data3q4 <- data3%>%filter(q3piHz_q4piHZ>10 | q3piHz_q4piHZ< -10)%>%rowwise()%>%mutate(center_Hz = (q3piHz + q4piHz)/2, time_center = (q3piHz_time + q4piHz_time)/2, standard_pitch = standard_pitch + (standard_pitch* (q3piHz_q4piHZ/100)),time_center=ifelse(is.na(time_center),(time_start+((time_end-time_start)/2))+0.01,time_center))%>%arrange(time_start)

dataplot <- rbind(data3,data3q1,data3q4)%>%arrange(filename,time_center)%>%mutate(time_center_table=ifelse( 
 between(lag(time_center,1),time_center-0.05,time_center+0.05),time_center+0.04,time_center))
plot_data <- dataplot
p1 <- ggplot() +
    # geom_line(data = plot_data, aes(x = q3piHz_time, y = standard_q3_pitch))+
geom_bar(data = plot_data,stat = "identity", aes(x = time_center, y = standard_dur),fill="lightgrey")+
geom_line(data = plot_data, aes(x = time_center, y = standard_pitch),color="blue")+
geom_line(data = plot_data, aes(x = time_center, y = standard_intensity),color="green") +
geom_point(data = plot_data, aes(x = time_center, y = standard_pitch, color = word, shape=MAS_structure),size=5) +
geom_text(data = plot_data, aes(x = time_center, y = standard_pitch, label = content), vjust = 0.1, hjust = 0.5,size=3.2) +
    # geom_text(data = plot_data, aes(x = q3piHz_time, y = standard_q3_pitch, label = content), vjust = -0.5, hjust = 0.5)+
    # geom_text(data = plot_data, aes(x = q4piHz_time, y = standard_q4_pitch, label = content), vjust = -0.5, hjust = 0.5) +
ggtitle(paste("Group:", dataplot$id_ip,dataplot$ip,dataplot$pattern)) + theme_light() +
    theme(legend.position = "right",plot.title = element_text(size = 10),legend.text = element_text(size=10),legend.title = element_text(size = 10), axis.title=element_text(size=10,face="bold"))+guides(size="none") + xlab("Timeline") + ylab("Prosodic features")
  
   p2 <- plot_data %>%    mutate(     Hz = standard_pitch,     dB = standard_intensity,     dur= standard_dur) %>%mutate(Hz=round(Hz,1),dB=round(dB,1),dur=round(dur,1))%>%arrange((time_start))%>% select(Hz,dB,dur,content)
 p3 <- t(p2)
 
if (nrow(p2) > 0) {
  table_grob <- tableGrob(p3, theme = mytheme)

} else {
  empty_df <- data.frame (Hz  = "0",
                  dB = "0",
                  dur = "0",
                  content ="0"
                  )
  table_grob <- tableGrob(empty_df, theme = mytheme)
}
# Combine plot and table
grid.arrange(p1, table_grob, ncol = 1, heights = c(3, 1))

8.4.6 TOBI (Tones Break Indices)

percent_treshold_ST <- 1.2

TOBI <- data2%>%filter(tonic=="yes")%>%mutate(center_ST = 12*log2(center_Hz/1),
circumflex_st =  
        case_when(
q1piHz_q2piHZ_ST <= -percent_treshold_ST & q2piHz_q3piHZ_ST >= percent_treshold_ST & q3piHz_q4piHZ_ST <= -percent_treshold_ST ~ "LHL",
q1piHz_q2piHZ_ST >= percent_treshold_ST & q2piHz_q3piHZ_ST <= -percent_treshold_ST & q3piHz_q4piHZ_ST >= percent_treshold_ST ~ "HLH",
q1piHz_q2piHZ_ST >= percent_treshold_ST & between(q2piHz_q3piHZ_ST,-percent_treshold_ST,percent_treshold_ST) & q3piHz_q4piHZ_ST <= -percent_treshold_ST ~ "HL",
q1piHz_q2piHZ_ST >= percent_treshold_ST & q2piHz_q3piHZ_ST <= -percent_treshold_ST & q3piHz_q4piHZ_ST <= -percent_treshold_ST ~ "HL",
q1piHz_q2piHZ_ST >= percent_treshold_ST & between(q2piHz_q3piHZ_ST,-percent_treshold_ST,percent_treshold_ST) & between(q3piHz_q4piHZ_ST,-percent_treshold_ST,percent_treshold_ST) ~ "H",
 between(q1piHz_q2piHZ_ST,-percent_treshold_ST,percent_treshold_ST)&  q2piHz_q3piHZ_ST >= percent_treshold_ST &between(q3piHz_q4piHZ_ST,-percent_treshold_ST,percent_treshold_ST) ~ "H",
 between(q1piHz_q2piHZ_ST,-percent_treshold_ST,percent_treshold_ST)& between(q2piHz_q3piHZ_ST,-percent_treshold_ST,percent_treshold_ST) & q3piHz_q4piHZ_ST >= percent_treshold_ST ~ "H",
q1piHz_q2piHZ_ST <= -percent_treshold_ST & between(q2piHz_q3piHZ_ST,-percent_treshold_ST,percent_treshold_ST) & between(q3piHz_q4piHZ_ST,-percent_treshold_ST,percent_treshold_ST) ~ "L",
q1piHz_q2piHZ_ST <= -percent_treshold_ST & between(q2piHz_q3piHZ_ST,-percent_treshold_ST,percent_treshold_ST) & q3piHz_q4piHZ_ST <= -percent_treshold_ST ~ "L>",
 between(q1piHz_q2piHZ_ST,-percent_treshold_ST,percent_treshold_ST)&  q2piHz_q3piHZ_ST <= -percent_treshold_ST &between(q3piHz_q4piHZ_ST,-percent_treshold_ST,percent_treshold_ST) ~ "L",
 between(q1piHz_q2piHZ_ST,-percent_treshold_ST,percent_treshold_ST)& between(q2piHz_q3piHZ_ST,-percent_treshold_ST,percent_treshold_ST) & q3piHz_q4piHZ_ST <= -percent_treshold_ST ~ "L",
 between(q1piHz_q2piHZ_ST,-percent_treshold_ST,percent_treshold_ST) & q2piHz_q3piHZ_ST >= percent_treshold_ST & q3piHz_q4piHZ_ST >= percent_treshold_ST ~ "H>",
q1piHz_q2piHZ_ST >= percent_treshold_ST & q2piHz_q3piHZ_ST >= percent_treshold_ST & between(q3piHz_q4piHZ_ST,-percent_treshold_ST,percent_treshold_ST) ~ "H>",
q1piHz_q2piHZ_ST >= percent_treshold_ST & q2piHz_q3piHZ_ST >= percent_treshold_ST & q3piHz_q4piHZ_ST >= percent_treshold_ST ~"H>",
q1piHz_q2piHZ_ST <= -percent_treshold_ST & q2piHz_q3piHZ_ST <= -percent_treshold_ST & q3piHz_q4piHZ_ST <= -percent_treshold_ST ~"L>",
q1piHz_q2piHZ_ST <= -percent_treshold_ST & q2piHz_q3piHZ_ST <= -percent_treshold_ST & between(q3piHz_q4piHZ_ST, -percent_treshold_ST,percent_treshold_ST) ~"L>",
between(q1piHz_q2piHZ_ST, -percent_treshold_ST,percent_treshold_ST) & between(q2piHz_q3piHZ_ST, -percent_treshold_ST,percent_treshold_ST)&between(q3piHz_q4piHZ_ST, -percent_treshold_ST,percent_treshold_ST) ~ "no_circumflex",
is.na(center_Hz) ~"non_applicable"))

TOBI <- TOBI%>%mutate(
  
TOBI_pattern=  
        case_when(.default="unchanged",
          
          inflexion_ST_from_prev < -percent_treshold_ST & between(inflexion_ST_to_next,-percent_treshold_ST,percent_treshold_ST)  ~ "H+L*",
          inflexion_ST_from_prev > percent_treshold_ST & between(inflexion_ST_to_next,-percent_treshold_ST,percent_treshold_ST)  ~ "L+H*",
          inflexion_ST_to_next < -percent_treshold_ST & between(inflexion_ST_from_prev,-percent_treshold_ST,percent_treshold_ST)  ~ "H*+L",
          inflexion_ST_to_next > percent_treshold_ST & between(inflexion_ST_from_prev,-percent_treshold_ST,percent_treshold_ST)  ~ "L*+H",
inflexion_ST_from_prev < -percent_treshold_ST & inflexion_ST_to_next > percent_treshold_ST &MAS_structure=="toneme"  ~ "H+L*H%",
inflexion_ST_from_prev > percent_treshold_ST & inflexion_ST_to_next < -percent_treshold_ST&MAS_structure=="toneme"  ~ "L+H*L%",
inflexion_ST_from_prev > percent_treshold_ST & inflexion_ST_to_next > -percent_treshold_ST &MAS_structure=="toneme" ~ "L+H*H%",
inflexion_ST_from_prev < -percent_treshold_ST & inflexion_ST_to_next < -percent_treshold_ST &MAS_structure=="toneme" ~ "H+L*L%",
inflexion_ST_from_prev < -percent_treshold_ST & inflexion_ST_to_next > percent_treshold_ST &MAS_structure!="toneme"  ~ "H+L*H-",
inflexion_ST_from_prev > percent_treshold_ST & inflexion_ST_to_next < -percent_treshold_ST&MAS_structure!="toneme"  ~ "L+H*L-",
inflexion_ST_from_prev > percent_treshold_ST & inflexion_ST_to_next > -percent_treshold_ST &MAS_structure!="toneme" ~ "L+H*H-",
inflexion_ST_from_prev < -percent_treshold_ST & inflexion_ST_to_next < -percent_treshold_ST &MAS_structure!="toneme" ~ "H+L*L-"
),TOBI_pattern= ifelse(is.na(TOBI_pattern)& !is.na(circumflex_st)&circumflex_st!="no_circumflex", circumflex_st, TOBI_pattern))


data2<- data2%>%left_join(TOBI%>%select(id_word,TOBI_pattern),by="id_word")

8.4.6.1 Results

TOBI_barplot <- TOBI%>%filter(MAS_structure=="toneme",tonic=="yes")%>%group_by(TOBI_pattern)%>%summarise(freq=n())
p_barplot <-ggplot(TOBI_barplot, aes(x=TOBI_pattern, y = freq, fill = TOBI_pattern)) +
  geom_bar(stat="identity")+theme_minimal()
p_barplot 

8.4.6.2 Comparing TOBI and MAS toneme patterns

TOBI_MAS <- data2%>%filter(MAS_structure=="toneme"&tonic=="yes")%>%select(pattern,TOBI_pattern)%>%filter(!is.na(pattern),!is.na(TOBI_pattern),pattern!="undefined_toneme")

p2_barplot <-ggplot(TOBI_MAS, aes(x=TOBI_pattern, fill = pattern)) +
  geom_bar(stat="count")+theme_minimal()
p2_barplot 

8.5 Ips

ips <- words %>%arrange(filename,time_start)%>% mutate(pause = lead(time_start) - time_end,changeip = ifelse(pause > 0.15 | lead(filename)!=filename, "change", NA)) %>% group_by(filename, spk, changeip) 
# %>%
#           mutate(id_ip = row_number(),
#                  id_ip = ifelse(
#                    changeip != "change",
#                    NA,
#                    paste(spk, time_start, time_end, id_ip, sep = "_")
#                  )) %>%ungroup()
#         ips <- ips%>% fill(id_ip, .direction = "up")
        ips <- ips %>% group_by(id_ip) %>%
          mutate(
            time_start_ip = first(time_start_word, na.rm = TRUE),
            time_end_ip = last(time_end_word, na.rm = TRUE), dur= time_end_ip  - time_start_ip,
  s_rate = n_distinct(id_word)/dur
          ) %>% ungroup() %>%arrange(filename,time_start_ip)
        
ips <- ips%>% group_by(filename,spk, id_word) %>% summarise(tier_name=first(tier_name),time_start=first(time_start_ip),time_end=last(time_end_ip),id_ip = first(id_ip), word =first(content)) %>% ungroup()
        ips <- ips%>%arrange(filename,time_start) %>% group_by(filename, id_ip) %>% mutate(content = paste(word, collapse = " ")) %>%ungroup()
        ips <- ips %>% group_by(filename, id_ip) %>% summarise(time_start=first(time_start),time_end=first(time_end),content=first(content),spk=first(spk),tier_name=first(tier_name))%>%ungroup()%>%mutate(time_start=as.numeric(time_start))%>%arrange(filename,time_start)
      ips <- ips%>%mutate(time_start= ifelse(time_start>time_end,time_end,time_start))

8.5.1 Ips with POS tagging

ips <- words 
# %>%arrange(filename,time_start)%>% mutate(pause = lead(time_start) - time_end,changeip = ifelse(pause > 0.15 | lead(filename)!=filename, "change", NA)) 
# %>% group_by(filename, spk, changeip) 
# %>%
#           mutate(id_ip = row_number(),
#                  id_ip = ifelse(
#                    changeip != "change",
#                    NA,
#                    paste(spk, time_start, time_end, id_ip, sep = "_")
#                  )) %>%ungroup()
#         ips <- ips%>% fill(id_ip, .direction = "up")
        ips <- ips %>% group_by(id_ip) %>%
          mutate(
            time_start_ip = first(time_start_word, na.rm = TRUE),
            time_end_ip = last(time_end_word, na.rm = TRUE)
          ) %>% ungroup() %>%arrange(filename,time_start_ip)
ips <- ips%>% group_by(filename,spk, id_word) %>% summarise(tier_name=first(tier_name),time_start=first(time_start_ip),time_end=last(time_end_ip),id_ip = first(id_ip), word =first(content),upos=first(upos)) %>% ungroup()
        ips <- ips%>%arrange(filename,time_start) %>% group_by(filename, id_ip) %>% mutate(content = paste(word, collapse = " ")) %>%ungroup()
        ips <- ips %>% group_by(filename, id_ip) %>% summarise(time_start=first(time_start),time_end=first(time_end),content=first(content),spk=first(spk),tier_name=first(tier_name),qnoun = sum(upos=="NOUN",na.rm = TRUE),qverb = sum(upos=="VERB",na.rm = TRUE),qadj = sum(upos=="ADJ",na.rm = TRUE),qsconj = sum(upos=="SCONJ",na.rm = TRUE),qcconj = sum(upos=="CCONJ",na.rm = TRUE),qadp = sum(upos=="ADP",na.rm = TRUE),qadv = sum(upos=="ADV",na.rm = TRUE),qintj = sum(upos=="INTJ",na.rm = TRUE),qaux = sum(upos=="AUX",na.rm = TRUE),qdet = sum(upos=="DET,na.rm = TRUE"),qpron = sum(upos=="PRON",na.rm = TRUE))%>%ungroup()%>%mutate(time_start=as.numeric(time_start))%>%arrange(filename,time_start)
      ips <- ips%>%mutate(time_start= ifelse(time_start>time_end,time_end,time_start))

8.5.2 Ips with sentiment tagging

ips_sentiment <- words

# %>%arrange(filename,time_start)%>% mutate(pause = lead(time_start) - time_end,changeip = ifelse(pause > 0.15 | lead(filename)!=filename, "change", NA)) %>% group_by(filename, spk, changeip) 
# %>%
#           mutate(id_ip = row_number(),
#                  id_ip = ifelse(
#                    changeip != "change",
#                    NA,
#                    paste(spk, time_start, time_end, id_ip, sep = "_")
#                  )) %>%ungroup()
#         ips_sentiment <- ips_sentiment%>% fill(id_ip, .direction = "up")
        ips_sentiment <- ips_sentiment %>% group_by(id_ip) %>%
          mutate(
            time_start_ip = first(time_start_word, na.rm = TRUE),
            time_end_ip = last(time_end_word, na.rm = TRUE)
          ) %>% ungroup() %>%arrange(filename,time_start_ip)
ips_sentiment <- ips_sentiment%>% group_by(filename,spk, id_word) %>% summarise(tier_name=first(tier_name),time_start=first(time_start_ip),time_end=last(time_end_ip),id_ip = first(id_ip), word =first(content),sentiment=first(sentiment)) %>% ungroup()
        ips_sentiment <- ips_sentiment%>%arrange(filename,time_start) %>% group_by(filename, id_ip) %>% mutate(content = paste(word, collapse = " ")) %>%ungroup()

ips_sentiment <- ips_sentiment %>% group_by(filename, id_ip) %>% summarise(time_start=first(time_start),time_end=first(time_end),content=first(content),spk=first(spk),tier_name=first(tier_name),qfear = sum(sentiment=="miedo",na.rm = TRUE),qsurprise = sum(sentiment=="sorpresa",na.rm = TRUE),qdisgust = sum(sentiment=="asco",na.rm = TRUE),qjoy = sum(sentiment=="alegría",na.rm = TRUE),qanticip = sum(sentiment=="anticipación",na.rm = TRUE))%>%ungroup()%>%mutate(time_start=as.numeric(time_start))%>%arrange(filename,time_start)
ips_sentiment <- ips_sentiment%>%mutate(time_start= ifelse(time_start>time_end,time_end,time_start))

8.5.3 Ips with prosodic values

ips_MAS <- MAS %>%group_by(id_ip)%>%summarise(

  toneme = mean(toneme,na.rm=TRUE),
  anacrusis = mean(anacrusis, na.rm=TRUE),
  body = mean(body,na.rm=TRUE),
  MAS_pattern = first(pattern)
  
)%>%ungroup()

ips_TOBI <- TOBI %>%filter(MAS_structure=="toneme")%>%group_by(id_ip,TOBI_pattern)%>%summarise(
  freq = n()
  
)%>%ungroup()%>%rename(TOBI=TOBI_pattern)

ips_prosody <- data2 %>%group_by(id_ip)%>% summarise(
  intensity = mean(intensity_mean_dB),
  pitch_reset_from_ST = mean(inflexion_ST_from_prev,na.rm = TRUE),
  pitch_reset_to_ST = mean(inflexion_ST_to_next,na.rm = TRUE),
  pitch_Hz = mean(center_Hz,na.rm = TRUE),
  pitch_St = mean(12*log2(center_Hz/1),na.rm = TRUE)
  
)

8.5.4 Complete ending unit

ips <- ips %>%left_join(ips_sentiment%>%select(id_ip,qjoy,qfear,qsurprise,qdisgust,qanticip),by="id_ip")%>%ungroup()
ips <- ips %>% left_join(ips_prosody%>%select(id_ip,pitch_Hz,pitch_St,intensity,pitch_reset_from_ST,pitch_reset_to_ST), by="id_ip")
ips <- ips %>% left_join(ips_MAS%>%select(id_ip,anacrusis,body,toneme,MAS_pattern), by="id_ip")%>%mutate_if(is.numeric,round,2)
ips <- ips %>% left_join(ips_TOBI%>%select(id_ip,TOBI), by="id_ip")
ips <- ips %>% mutate(time_start_ip = time_start, time_end_ip = time_end)

rm(ips_MAS,ips_prosody,ips_sentiment,ips_TOBI,TOBI)

8.6 Turns

ips <-  ips
turnsdb <- ips %>%arrange(filename,time_start)
turnsdb$change <- ifelse(turnsdb$filename == lead(turnsdb$filename,1) & turnsdb$tier_name != lead(turnsdb$tier_name,1) ,turnsdb$id_turn <- "change", 
                                        ifelse(turnsdb$filename != lead(turnsdb$filename,1),
                                               turnsdb$id_turn <- "change",
                                               turnsdb$id_turn <- NA))
      turnsdb <-  turnsdb %>% group_by(filename,change) %>% mutate(id_turn = row_number())
      turnsdb$id_turn <- ifelse(turnsdb$change != "change", turnsdb$id_turn <- NA, turnsdb$id_turn <- turnsdb$id_turn)
      
      # Orden de las turns
      
      turnsdb <-  turnsdb %>% group_by(filename) %>% 
        fill(id_turn, .direction = "up") #default direction down
      
      # time_start de las turns
      
      turnsdb$time_start_int <- ifelse(lag(turnsdb$change == "change",1) | turnsdb$id_turn == 1, turnsdb$time_start_int <- turnsdb$time_start, turnsdb$time_start_int <- NA)
      turnsdb <-  turnsdb %>% group_by(filename) %>% 
        fill(time_start_int, .direction = "down") #default direction down
      
      # time_end de las turns
      
      turnsdb$time_end_int <- ifelse(turnsdb$change == "change" | turnsdb$id_turn == 1, turnsdb$time_end_int <- turnsdb$time_end, turnsdb$time_end_int <- NA)
      turnsdb <-  turnsdb %>% group_by(filename) %>% 
        fill(time_end_int, .direction = "up") #default direction down
      
      # Duración de las turns
      
      turnsdb$dur_int <- turnsdb$time_end_int - turnsdb$time_start_int
      
      turnsdb <-  turnsdb %>% group_by(filename,id_turn) %>% mutate(ordgeint = row_number())
      
      #Creación de las pauses
      
      turnsdb$pause <- ifelse(turnsdb$filename == lead(turnsdb$filename,1) & turnsdb$tier_name == lead(turnsdb$tier_name,1), lead(turnsdb$time_start,1)-turnsdb$time_end,turnsdb$pause <-NA)
      turnsdb$pausecod <- ifelse(turnsdb$pause<500, turnsdb$pausecod <- "/", 
                                          ifelse(turnsdb$pause>=501 & turnsdb$pause <= 1000, turnsdb$pausecod <- "//", 
                                                 ifelse(turnsdb$pause>1001,turnsdb$pausecod <- paste("/// (", round((turnsdb$pause*0.001),1),")",sep = ""),turnsdb$pausecod <-NA))) 
      turnsdb$pausecod[is.na(turnsdb$pausecod)] <- " "
      
      #Creación del valor de fto
      
      turnsdb<- turnsdb%>%ungroup()%>%mutate(fto_post = ifelse((filename == lead(filename,1) 
                                                                                  & tier_name != lead(tier_name,1)), 
                                                                                 (lead(time_start,1)-time_end),NA),
                                                               fto_post_cod = ifelse(fto_post>0 & fto_post<50,"§",NA),
                                                               fto_pre = ifelse((filename == lag(filename,1) 
                                                                                 & tier_name != lag(tier_name,1)), 
                                                                                (time_start-lag(time_end,1)),NA),
                                                               fto_prev_cod = ifelse(fto_pre>0 & fto_pre<50,"§",NA))
      
      turnsdb <- turnsdb%>%mutate(content1 = ifelse(!is.na(fto_post_cod)|!is.na(fto_prev_cod),paste(fto_prev_cod,content,pausecod,fto_post_cod, sep = ""), content),
                                                    content1 = gsub("^NA","",content1),
                                                    content1 = gsub("NA$","",content1),
                                                    content1 = ifelse(!is.na(pausecod),paste(content1,pausecod, sep = ""), content1),
                                                    content1 = gsub("NA$","",content1))
      
      
      #Eliminación de los espacios iniciales
      
      turnsdb$content1 <- trimws(turnsdb$content1)
      
      #Creación de la variable "intervención" y de la base de datos "turns"
      
      turnsdb <- turnsdb %>% 
        group_by(filename,id_turn) %>% 
        mutate(intervencion = paste0(content1, collapse = " "))
      turnsdb$intervencion <- ifelse(turnsdb$change !="change", turnsdb$intervencion <- NA, turnsdb$intervencion <-turnsdb$intervencion)
      turnsdb <- turnsdb %>% mutate(id_turn = paste0(spk,"_int_",time_start_int, sep=""),grupo_id=id_ip)
      turnsdb <-  turnsdb %>% group_by(filename) %>% 
        fill(intervencion, .direction = "up") 
      
turns <- turnsdb%>%filter(change == "change")%>% select(tier_name,spk,filename,time_end_int,time_start_int,intervencion)
      turns <- turns %>% mutate(id_turn = paste(spk,"_int_",time_start_int, sep=""))
      turns$intervencion_export <- paste(turns$spk,": ",turns$intervencion,sep = "")
      turns_export <- subset(turns, select=c(id_turn,time_start_int,intervencion_export,filename))
      turns_export <- turns_export %>% mutate(inicio = format(as.POSIXct(time_start_int , "UTC", origin = "1970-01-01"), "%H:%M:%OS2"))%>%select(inicio,intervencion_export,filename)

9 Exporting documents

9.1 Save to database

If you generate new data, you can also add this data to the database without overwriting it. If you want to save generated data to a database, change “eval: false” to “eval: true”.

conn <-  duckdb::dbConnect(duckdb(), "oralstats.duckdb")


if (exists("oralstats.duckdb")) {

# Check if the table exists in the database
if ("oralstats_phones" %in% dbListTables(conn) &"oralstats_words" %in% dbListTables(conn)&"oralstats_ips" %in% dbListTables(conn)&"oralstats_turns" %in% dbListTables(conn)&"oralstats_prosody" %in% dbListTables(conn)) {
      # Identify new rows to be added
      new_rows <- anti_join(as_tibble(data2), dbReadTable(conn, "oralstats_phones"), by = "id_phone")
      new_rows1 <- anti_join(as_tibble(ips), dbReadTable(conn, "oralstats_ips"), by = "id_ip")
      new_rows2 <- anti_join(as_tibble(words), dbReadTable(conn, "oralstats_words"), by = "id_word")
      new_rows3 <- anti_join(as_tibble(turns), dbReadTable(conn, "oralstats_turns"), by = "id_turn")
      new_rows4 <- anti_join(as_tibble(prosody), dbReadTable(conn, "oralstats_prosody"), by = "filename")
      
      # Add new rows to the existing table
if (nrow(new_rows) > 0 & nrow(new_rows1) > 0 & nrow(new_rows2) > 0 & nrow(new_rows3) > 0 & nrow(new_rows4) > 0) {
        dbWriteTable(conn, "oralstats_phones", new_rows, append = TRUE, row.names = FALSE)
        dbWriteTable(conn, "oralstats_ips", new_rows1, append = TRUE, row.names = FALSE)
        dbWriteTable(conn, "oralstats_words", new_rows2, append = TRUE, row.names = FALSE)
        dbWriteTable(conn, "oralstats_turns", new_rows3, append = TRUE, row.names = FALSE)
        dbWriteTable(conn, "oralstats_prosody", new_rows4, append = TRUE, row.names = FALSE)
        dbDisconnect(conn, shutdown = TRUE)
        
      }
    }} else {
      # If table doesn't exist, create it and upload the entire data frame
      duckdb::dbWriteTable(conn, "oralstats_phones", as_tibble(data2), row.names = FALSE,append = TRUE)
      duckdb::dbWriteTable(conn, "oralstats_ips", as_tibble(ips), row.names = FALSE,append = TRUE)
      duckdb::dbWriteTable(conn, "oralstats_words", as_tibble(words), row.names = FALSE,append = TRUE)
      duckdb::dbWriteTable(conn, "oralstats_turns", as_tibble(turns), row.names = FALSE,append = TRUE)
      duckdb::dbWriteTable(conn, "oralstats_prosody", as_tibble(prosody), row.names = FALSE,append = TRUE)
      
      DBI::dbGetQuery(conn, "CREATE UNIQUE INDEX idx_id_phone ON oralstats_phones(id_phone)")
      DBI::dbGetQuery(conn, "CREATE UNIQUE INDEX idx_id_ip ON oralstats_ips(id_ip)")
      DBI::dbGetQuery(conn, "CREATE UNIQUE INDEX idx_id_word ON oralstats_words(id_word)")
      DBI::dbGetQuery(conn, "CREATE UNIQUE INDEX idx_id_turn ON oralstats_turns(id_turn)")
      DBI::dbGetQuery(conn, "CREATE UNIQUE INDEX idx_id_prosody ON oralstats_prosody(id_prosody)")
      dbDisconnect(conn, shutdown = TRUE)
    }

9.2 Save to external files

Oralstats will generate copy for all the units processed (ips, words, vowels and turns), in multiple format files: csv and rds. A time reference will be added to the filenames to know exactly when were these files generated.

write_rds(ips,paste("outputs/rds/ips_",Sys.time(),".rds",sep = ""))
write_rds(data2,paste("outputs/rds/data2_",Sys.time(),".rds",sep = ""))
write_rds(turns,paste("outputs/rds/turns_",Sys.time(),".rds",sep = ""))
write_rds(vowels,paste("outputs/rds/vowels_",Sys.time(),".rds",sep = ""))
write_rds(words,paste("outputs/rds/words_",Sys.time(),".rds",sep = ""))
write_rds(prosody,paste("outputs/rds/prosody_",Sys.time(),".rds",sep = ""))

write_csv2(ips,paste("outputs/csv/ips_",Sys.time(),".csv",sep = ""))
write_csv2(data2,paste("outputs/csv/data2_",Sys.time(),".csv",sep = ""))
write_csv2(turns,paste("outputs/csv/turns_",Sys.time(),".csv",sep = ""))
write_csv2(vowels,paste("outputs/csv/vowels_",Sys.time(),".csv",sep = ""))
write_csv2(words,paste("outputs/csv/words_",Sys.time(),".csv",sep = ""))
write_csv2(prosody,paste("outputs/csv/prosody_",Sys.time(),".csv",sep = ""))

9.3 Create textgrids

With this section, Oralstats will generate textgrids with tiers including vowels, and also its corresponding TOBI and MAS tags.

# 
# pauses <- words %>%group_by(filename,spk)%>%mutate(time_start_word=time_end_word,time_end_word=time_end_word+pause, content=NA)%>%filter(changeip=="change")%>%ungroup()
# 
# upos <- words %>%mutate(tier_name = paste0("upos_",tier_name),content=upos)
# 
# pauses_u <- words %>%group_by(filename,spk)%>%mutate(time_start_word=time_end_word,time_end_word=time_end_word+pause, content=NA,tier_name = paste0("upos_",tier_name))%>%filter(changeip=="change")%>%ungroup()
# 
# upos_pauses <- rbind(upos,pauses_u)%>%arrange(filename,time_start_word)%>%group_by(filename,spk)%>%slice(if(n() > 1)-n() else row_number())%>%ungroup()
# 
# words_pauses <- rbind(words,pauses)%>%arrange(filename,time_start_word)%>%group_by(filename,spk)%>%slice(if(n() > 1)-n() else row_number())%>%ungroup()
# 
# complete <- rbind(words_pauses,upos_pauses)%>%arrange(filename,time_start_word)
# 
# generate_textgrid_file <- function(filename, df) {
#   # Subset dataframe for the current filename
#   df_subset <- df
#   
#   # Set xmin and xmax
#   xmin <- min(df_subset$time_start_word)
#   xmax <- max(df_subset$time_end_word)
#   
#   # Open a file connection to write TextGrid format
#   file_conn <- file(paste0(filename, ".TextGrid"), "w")
#   
#   # Write TextGrid header
#   cat("File type = \"ooTextFile\"\n", file = file_conn)
#   cat("Object class = \"TextGrid\"\n\n", file = file_conn)
#   
#   cat("xmin = ", xmin, "\n", file = file_conn)
#   cat("xmax = ", xmax, "\n", file = file_conn)
#   cat("tiers? <exists>\n", file = file_conn)
#   
#   # Get unique tier names
#   unique_tiers <- unique(df_subset$tier_name)
#   cat("size = ", length(unique_tiers), "\n", file = file_conn)
#   
#   # Loop through unique tiers
#   for (tier_name in unique_tiers) {
#     cat("\nitem []:\n", file = file_conn)
#     cat("    item [1]:\n", file = file_conn)
#     cat("        class = \"IntervalTier\"\n", file = file_conn)
#     cat("        name = \"", tier_name, "\"\n", sep = "", file = file_conn)
#     cat("        xmin = ", xmin, "\n", file = file_conn)
#     cat("        xmax = ", xmax, "\n", file = file_conn)
#     
#     # Subset dataframe for the current tier
#     tier_df <- df_subset[df_subset$tier_name == tier_name, ]
#     
#     cat("        intervals: size = ", nrow(tier_df), "\n", file = file_conn)
#     
#     # Write intervals
#     for (i in 1:nrow(tier_df)) {
#       cat("        intervals [",i,"]:\n", file = file_conn,sep = "")
#       cat("            xmin = ", tier_df$time_start_word[i], "\n", file = file_conn)
#       cat("            xmax = ", tier_df$time_end_word[i], "\n", file = file_conn)
#       cat("            text = \"", tier_df$content[i], "\"\n", sep = "", file = file_conn)
#     }
#   }
#   
#   # Close the file connection
#   close(file_conn)
# }
# words_pauses %>%group_by(filename)
# complete %>%group_by(filename) %>% group_walk(~ generate_textgrid_file(.y$filename, .x))

9.4 Create textgrids MAS & TOBI

# generate_textgrid_file2 <- function(filename, df) {
#   # Subset dataframe for the current filename
#   df_subset <- df
#   
#   # Set xmin and xmax
#   xmin <- min(df_subset$time_start)
#   xmax <- max(df_subset$time_end)
#   
#   # Open a file connection to write TextGrid format
#   file_conn <- file(paste0(filename, ".TextGrid"), "w")
#   
#   # Write TextGrid header
#   cat("File type = \"ooTextFile\"\n", file = file_conn)
#   cat("Object class = \"TextGrid\"\n\n", file = file_conn)
#   
#   cat("xmin = ", xmin, "\n", file = file_conn)
#   cat("xmax = ", xmax, "\n", file = file_conn)
#   cat("tiers? <exists>\n", file = file_conn)
#   
#   # Get unique tier names
#   unique_tiers <- unique(df_subset$tier_name)
#   cat("size = ", length(unique_tiers), "\n", file = file_conn)
#   
#   # Loop through unique tiers
#   for (tier_name in unique_tiers) {
#     cat("\nitem []:\n", file = file_conn)
#     cat("    item [1]:\n", file = file_conn)
#     cat("        class = \"IntervalTier\"\n", file = file_conn)
#     cat("        name = \"", tier_name, "\"\n", sep = "", file = file_conn)
#     cat("        xmin = ", xmin, "\n", file = file_conn)
#     cat("        xmax = ", xmax, "\n", file = file_conn)
#     
#     # Subset dataframe for the current tier
#     tier_df <- df_subset[df_subset$tier_name == tier_name, ]
#     
#     cat("        intervals: size = ", nrow(tier_df), "\n", file = file_conn)
#     
#     # Write intervals
#     for (i in 1:nrow(tier_df)) {
#       cat("        intervals [",i,"]:\n", file = file_conn,sep = "")
#       cat("            xmin = ", tier_df$time_start[i], "\n", file = file_conn)
#       cat("            xmax = ", tier_df$time_end[i], "\n", file = file_conn)
#       cat("            text = \"", tier_df$content[i], "\"\n", sep = "", file = file_conn)
#     }
#   }
#   
#   # Close the file connection
#   close(file_conn)
# }

folder_path <- "outputs/textgrids/"
generate_textgrid_file2 <- function(filename, df, folder_path) {
  # Subset dataframe for the current filename
  df_subset <- df
  
  # Set xmin and xmax
  xmin <- min(df_subset$time_start)
  xmax <- max(df_subset$time_end)
  
  # Construct the full file path
  file_path <- file.path(folder_path, paste0(filename, ".TextGrid"))
  
  # Open a file connection to write TextGrid format
  file_conn <- file(file_path, "w")
  
  # Write TextGrid header
  cat("File type = \"ooTextFile\"\n", file = file_conn)
  cat("Object class = \"TextGrid\"\n\n", file = file_conn)
  
  cat("xmin = ", xmin, "\n", file = file_conn)
  cat("xmax = ", xmax, "\n", file = file_conn)
  cat("tiers? <exists>\n", file = file_conn)
  
  # Get unique tier names
  unique_tiers <- unique(df_subset$tier_name)
  cat("size = ", length(unique_tiers), "\n", file = file_conn)
  
  # Loop through unique tiers
  for (tier_name in unique_tiers) {
    cat("\nitem []:\n", file = file_conn)
    cat("    item [1]:\n", file = file_conn)
    cat("        class = \"IntervalTier\"\n", file = file_conn)
    cat("        name = \"", tier_name, "\"\n", sep = "", file = file_conn)
    cat("        xmin = ", xmin, "\n", file = file_conn)
    cat("        xmax = ", xmax, "\n", file = file_conn)
    
    # Subset dataframe for the current tier
    tier_df <- df_subset[df_subset$tier_name == tier_name, ]
    
    cat("        intervals: size = ", nrow(tier_df), "\n", file = file_conn)
    
    # Write intervals
    for (i in 1:nrow(tier_df)) {
      cat("        intervals [",i,"]:\n", file = file_conn,sep = "")
      cat("            xmin = ", tier_df$time_start[i], "\n", file = file_conn)
      cat("            xmax = ", tier_df$time_end[i], "\n", file = file_conn)
      cat("            text = \"", tier_df$content[i], "\"\n", sep = "", file = file_conn)
    }
  }
  
  # Close the file connection
  close(file_conn)
}

phones_tg <- phones%>%mutate(pause= ifelse((lead(time_start)-time_end) > 0.01 ,(lead(time_start)-time_end),NA),tier_name=spk,MAS_structure=NA,id_ip=NA)%>%select(filename,tier_name,spk,time_start,time_end, id_ip,content,pause,MAS_structure)

phones_pauses <- phones_tg %>%group_by(filename,spk)%>%mutate(time_start=time_end,time_end=time_end+pause, content=NA)%>%filter(pause>0.01)%>%ungroup()

phones_pauses2 <- rbind(phones_tg,phones_pauses)%>%arrange(filename,time_start)%>%group_by(filename,spk)%>%slice(if(n() > 1)-n() else row_number())%>%ungroup()

words_tg <- data2%>%mutate(tier_name=paste(spk,"_2_w",sep=""))%>%group_by(filename,tier_name,spk,id_word)%>%summarise(time_start=first(time_start_word),time_end=last(time_end_word),content=first(word))%>%ungroup()%>%mutate(pause= ifelse((lead(time_start)-time_end) > 0.01 ,(lead(time_start)-time_end),NA),MAS_structure=NA,id_ip=NA)%>%select(filename,tier_name,spk,time_start,time_end, id_ip,content,pause,MAS_structure)

words_pauses <- words_tg %>%group_by(filename,spk)%>%mutate(time_start=time_end,time_end=time_end+pause, content=NA)%>%filter(pause>0.01)%>%ungroup()

words_pauses2 <- rbind(words_tg,words_pauses)%>%arrange(filename,time_start)%>%group_by(filename,spk)%>%slice(if(n() > 1)-n() else row_number())%>%ungroup()

TOBI_tg <- data2%>%mutate(tier_name=paste(spk,"_4_TOBI",sep=""))%>%group_by(filename,tier_name,spk,id_word,TOBI_pattern)%>%summarise(time_start=first(time_start_word),time_end=last(time_end_word),content=first(TOBI_pattern))%>%ungroup()%>%mutate(pause= ifelse((lead(time_start)-time_end) > 0.01 ,(lead(time_start)-time_end),NA),MAS_structure=NA,id_ip=NA)%>%select(filename,tier_name,spk,time_start,time_end, id_ip,content,pause,MAS_structure)

TOBI_pauses <- TOBI_tg %>%group_by(filename,spk)%>%mutate(time_start=time_end,time_end=time_end+pause, content=NA)%>%filter(pause>0.01)%>%ungroup()

TOBI_pauses2 <- rbind(TOBI_tg,TOBI_pauses)%>%arrange(filename,time_start)%>%group_by(filename,spk)%>%slice(if(n() > 1)-n() else row_number())%>%ungroup()


MAS_tg <- data2%>%mutate(pause= ifelse((lead(time_start_word)-time_end_word) > 0.01 &lead(id_ip)!=id_ip,(lead(time_start_word)-time_end_word),NA),tier_name=paste("vowels_",spk,sep=""))%>%select(filename,id_ip,tier_name,spk,time_start,time_end, content,pause,MAS_structure)

MAS_pauses <- MAS_tg %>%group_by(filename,spk)%>%mutate(time_start=time_end,time_end=time_end+pause, content=NA)%>%filter(pause>0.01)%>%ungroup()

MAS_pauses2 <- rbind(MAS_tg,MAS_pauses)%>%arrange(filename,time_start)%>%group_by(filename,spk)%>%slice(if(n() > 1)-n() else row_number())%>%ungroup()

MAS_structure <- data2%>%mutate(pause= ifelse((lead(time_start_word)-time_end_word) > 0.01 &lead(id_ip)!=id_ip,(lead(time_start_word)-time_end_word),NA),tier_name=paste(spk,"_3_MAS",sep=""))%>%group_by(filename,tier_name,id_ip,MAS_structure)%>%summarise(time_start=first(time_start),time_end=last(time_end),spk=first(spk),content=first(content),pause=last(pause))%>%mutate(content=MAS_structure)

MAS_str <- MAS_structure %>%group_by(filename,spk)%>%mutate(time_start=time_end,time_end=time_end+pause, content=NA)%>%filter(pause>0.01)%>%ungroup()

MAS_str_pauses <- rbind(MAS_structure,MAS_str)%>%arrange(filename,time_start)%>%group_by(filename,spk)


completeMAS <- rbind(phones_pauses2,words_pauses2,MAS_str_pauses,TOBI_pauses2) %>%arrange(filename,tier_name,time_start)%>%group_by(filename, tier_name)%>%mutate(time_end=lead(time_start))%>%slice(if(n() > 1)-n() else row_number())%>%ungroup()%>%mutate(content=ifelse(is.na(content),"",content))

completeMAS %>%group_by(filename) %>% group_walk(~ generate_textgrid_file2(.y$filename, .x,folder_path))

9.5 Create plots MAS

dir.create("outputs/plots_MAS")

# Generate and save plots by group

# setDF(data2)
# data3 <- data2%>%left_join(MAS%>%select(id_ip,pattern),by="id_ip")%>%head(n=60)%>%mutate(time_center = time_start+((time_end-time_start)/2))
# data3q1 <- data3%>%filter(q1piHz_q2piHZ>10 | q1piHz_q2piHZ< -10)%>%rowwise()%>%mutate(center_Hz = (q1piHz + q2piHz)/2, time_center = (q1piHz_time + q2piHz_time)/2,standard_pitch = standard_pitch + (standard_pitch* (q1piHz_q2piHZ/100)))
# data3q4 <- data3%>%filter(q3piHz_q4piHZ>10 | q3piHz_q4piHZ< -10)%>%rowwise()%>%mutate(center_Hz = (q3piHz + q4piHz)/2, time_center = (q3piHz_time + q4piHz_time)/2, standard_pitch = standard_pitch + (standard_pitch* (q3piHz_q4piHZ/100)))
# 
# dataplot <- rbind(data3,data3q1,data3q4)%>%arrange(filename,time_center)%>%mutate(time_center_table=ifelse( 
#  between(lag(time_center,1),time_center-0.05,time_center+0.05),time_center+0.04,time_center))

data3 <- data2%>%mutate(time_center = time_start+((time_end-time_start)/2))
data3q1 <- data3%>%filter(q1piHz_q2piHZ>10 | q1piHz_q2piHZ< -10)%>%rowwise()%>%mutate(center_Hz = (q1piHz + q2piHz)/2, time_center = (q1piHz_time + q2piHz_time)/2,standard_pitch = standard_pitch + (standard_pitch* (q1piHz_q2piHZ/100)), time_center=ifelse(is.na(time_center),(time_start+((time_end-time_start)/2))-0.01,time_center))
data3q4 <- data3%>%filter(q3piHz_q4piHZ>10 | q3piHz_q4piHZ< -10)%>%rowwise()%>%mutate(center_Hz = (q3piHz + q4piHz)/2, time_center = (q3piHz_time + q4piHz_time)/2, standard_pitch = standard_pitch + (standard_pitch* (q3piHz_q4piHZ/100)),time_center=ifelse(is.na(time_center),(time_start+((time_end-time_start)/2))+0.01,time_center))

dataplot <- rbind(data3,data3q1,data3q4)%>%arrange(filename,time_center)%>%mutate(time_center_table=ifelse( 
 between(lag(time_center,1),time_center-0.05,time_center+0.05),time_center+0.04,time_center))

setDF(dataplot)

plots <- lapply(unique(dataplot$id_ip), function(grp) {

plot_data <- dataplot %>%filter(id_ip == grp)
    
  # plot_data <- dataplot
p1 <- ggplot() +
    # geom_line(data = plot_data, aes(x = q3piHz_time, y = standard_q3_pitch))+
geom_bar(data = plot_data,stat = "identity", aes(x = time_center, y = standard_dur),fill="lightgrey")+
geom_line(data = plot_data, aes(x = time_center, y = standard_pitch),color="blue")+
geom_line(data = plot_data, aes(x = time_center, y = standard_intensity),color="green") +
geom_point(data = plot_data, aes(x = time_center, y = standard_pitch, color = word, shape=MAS_structure),size=5) +
geom_text(data = plot_data, aes(x = time_center, y = standard_pitch, label = content), vjust = 0.1, hjust = 0.5,size=2.5) + ggtitle(paste("Group:", unique(plot_data$id_ip),unique(plot_data$ip),unique(plot_data$pattern)))+ theme_light() +
    theme(legend.position = "right",plot.title = element_text(size = 6),legend.text = element_text(size=6),legend.title = element_text(size = 6), axis.title=element_text(size=6,face="bold"))+guides(size="none",colour = guide_legend(override.aes = list(size=3))) + xlab("Timeline") + ylab("Prosodic features")

mytheme <- gridExtra::ttheme_default(
    core = list(fg_params=list(cex = 0.5)),
    colhead = list(fg_params=list(cex = 0.5)),
    rowhead = list(fg_params=list(cex = 0.5)))
  
 p2 <- plot_data %>% 
  mutate(
    Hz = standard_pitch,
    dB = standard_intensity,
    dur= standard_dur) %>%mutate(Hz=round(Hz,1),dB=round(dB,1),dur=round(dur,1))%>%arrange(time_start)%>% select(Hz,dB,dur,content)
 p3 <- t(p2)
 
if (nrow(p2) > 0) {
  table_grob <- tableGrob(p3, theme = mytheme)

} else {
  empty_df <- data.frame (Hz  = "0",
                  dB = "0",
                  dur = "0",
                  content ="0"
                  )
  table_grob <- tableGrob(empty_df, theme = mytheme)
}

# Combine plot and table
p3 <-grid.arrange(p1, table_grob, ncol = 1, heights = c(3, 1))
  
#   plot_data <- dataplot %>%
#     filter(id_ip == grp)
#   
#   p1 <- ggplot() +
#     # geom_line(data = plot_data, aes(x = q3piHz_time, y = standard_q3_pitch))+
# geom_bar(data = plot_data,stat = "identity", aes(x = time_center, y = standard_dur),fill="lightgrey")+
# geom_line(data = plot_data, aes(x = time_center, y = standard_pitch),color="blue")+
# geom_line(data = plot_data, aes(x = time_center, y = standard_intensity),color="green") +
# geom_point(data = plot_data, aes(x = time_center, y = standard_pitch, color = word, shape=MAS_structure),size=5) +
# geom_text(data = plot_data, aes(x = time_center, y = standard_pitch, label = content), vjust = 0.1, hjust = 0.5,size=3.2) +
#     # geom_text(data = plot_data, aes(x = q3piHz_time, y = standard_q3_pitch, label = content), vjust = -0.5, hjust = 0.5)+
#     # geom_text(data = plot_data, aes(x = q4piHz_time, y = standard_q4_pitch, label = content), vjust = -0.5, hjust = 0.5) +
# ggtitle(paste("Group:", grp,dataplot%>%filter(id_ip==grp)%>%select(ip)%>%head(n=1),dataplot%>%filter(id_ip==grp)%>%select(pattern)%>%head(n=1))) + theme_light() +
#     theme(legend.position = "right",plot.title = element_text(size = 10),legend.text = element_text(size=10),legend.title = element_text(size = 10), axis.title=element_text(size=10,face="bold"))+guides(size="none") + xlab("Timeline") + ylab("Prosodic features")
#   
#   
#  p2 <- plot_data %>% 
#   mutate(
#     Hz = standard_pitch,
#     dB = standard_intensity,
#     dur= standard_dur) %>%mutate(Hz=round(Hz,1),dB=round(dB,1),dur=round(dur,1))%>% 
#   pivot_longer(c("Hz","dB","dur"), names_to = "layer", values_to = "label") %>% 
#   ggplot(aes(x = time_center_table)) +
#   geom_text(aes(y = factor(layer, c("Hz","dB","dur")), label = label), size = 1.4) +
#   labs(y = "", x = NULL) +
#  theme_grey()+
#   theme(axis.line = element_blank(), axis.ticks = element_blank(), axis.text.x = element_blank(),
#         panel.grid = element_blank(),legend.text = element_text(size=1), strip.text = element_blank())
# 
# p3 <- p1 + p2 + plot_layout(ncol = 1,heights = c(5,1)) 
  
  filename <- paste0("outputs/plots_MAS/",grp,"_",unique(plot_data$pattern), "_MAS.png")
  ggsave(filename, plot = p3, width = 10, height = 6)
  
  return(filename)
})

9.6 Create plots TOBI

data4 <-  data2%>%mutate(time_center = time_start+((time_end-time_start)/2))%>%mutate(center_ST=12*log2(center_Hz/1))%>%mutate(time_center_table=ifelse( 
 between(lag(time_center,1),time_center-0.05,time_center+0.05),time_center+0.04,time_center))

plots <- lapply(unique(data2$id_ip), function(grp) {
  plot_data <- data4 %>%filter(id_ip == grp)
  
  p1 <- ggplot() +
    # geom_line(data = plot_data, aes(x = q3piHz_time, y = standard_q3_pitch))+

geom_line(data = plot_data, aes(x = time_center, y = center_ST),color="blue")+
geom_point(data = plot_data, aes(x = time_center, y = center_ST, color = word),size=5) +
geom_text(data = plot_data, aes(x = time_center, y = center_ST, label = content), vjust = 0.1, hjust = 0.5,size=3.2) +
    geom_text(data = plot_data, aes(x = time_center, y = center_ST, label = TOBI_pattern), vjust = -1, hjust = 0.5,size=3.2) +
    # geom_text(data = plot_data, aes(x = q3piHz_time, y = standard_q3_pitch, label = content), vjust = -0.5, hjust = 0.5)+
    # geom_text(data = plot_data, aes(x = q4piHz_time, y = standard_q4_pitch, label = content), vjust = -0.5, hjust = 0.5) +
ggtitle(paste("Group:", grp,data4%>%filter(id_ip==grp)%>%select(ip)%>%head(n=1),data4%>%filter(id_ip==grp)%>%select(TOBI_pattern)%>%head(n=1))) + theme_light() +
    theme(legend.position = "right",plot.title = element_text(size = 10),legend.text = element_text(size=10),legend.title = element_text(size = 10), axis.title=element_text(size=10,face="bold"))+guides(size="none") + xlab("Timeline") + ylab("Prosodic features")
  
p2 <- plot_data %>% 
  mutate(ST = round(center_ST, 1)) %>%
  pivot_longer(cols = c("ST"), names_to = "layer", values_to = "label") %>% 
  ggplot(aes(x = time_center)) +
  geom_text(aes(y = layer, label = label), size = 1.4) +
  labs(y = "", x = NULL) +
  theme_gray() +
  theme(axis.line = element_blank(), 
        axis.ticks = element_blank(), 
        axis.text.x = element_blank(),
        panel.grid = element_blank(),
        legend.text = element_text(size = 1),
        strip.text = element_blank())

p3 <- p1 + p2 + plot_layout(ncol = 1,heights = c(5,1)) 
  
  filename <- paste0("outputs/plots_TOBI/",grp,"_",data4%>%filter(id_ip==grp)%>%select(TOBI_pattern)%>%head(n=1), "_TOBI.png")
  ggsave(filename, plot = p3, width = 8, height = 4)
  
  return(filename)
})

10 Statistics

10.1 Descriptive statistics

library(psych)
describe(ips$pitch_St)
   vars    n  mean   sd median trimmed  mad   min    max range skew kurtosis
X1    1 2105 84.95 6.46  86.75    84.8 8.23 74.99 105.23 30.24    0    -1.26
     se
X1 0.14
describeBy(x = ips$pitch_St, group = ips$spk)

 Descriptive statistics by group 
group: 1990_alto_01_e
   vars    n  mean   sd median trimmed  mad   min    max range skew kurtosis
X1    1 1219 89.66 3.51  89.44   89.61 3.17 76.16 105.23 29.07    0     1.85
    se
X1 0.1
------------------------------------------------------------ 
group: 1990_alto_01_i
   vars   n  mean   sd median trimmed  mad   min   max range skew kurtosis   se
X1    1 106 82.89 4.96   81.4   82.32 4.51 75.11 98.86 23.75 0.99     0.36 0.48
------------------------------------------------------------ 
group: 1990_alto_02_e
   vars   n  mean   sd median trimmed mad   min   max range skew kurtosis   se
X1    1 693 77.51 1.56  77.22   77.32 1.2 74.99 90.01 15.02  2.3    10.91 0.06
------------------------------------------------------------ 
group: 1990_alto_02_i
   vars  n  mean   sd median trimmed  mad   min   max range skew kurtosis   se
X1    1 87 80.73 3.66     80   80.44 3.75 75.24 91.14  15.9 0.67    -0.07 0.39

10.2 Histograms

p <- ggplot(data2, aes(x=center_Hz)) + geom_histogram()
p+ geom_vline(aes(xintercept=mean(center_Hz,na.rm=TRUE)),
            color="blue", linetype="dashed", size=1)

10.3 Boxplots

p_boxplot <- ggplot(data2, aes(x=spk, y=standard_intensity, fill=spk)) + geom_boxplot()
p_boxplot

10.4 Barplots

MAS_barplot <- MAS%>%group_by(pattern)%>%summarise(freq=n())
p_barplot <-ggplot(MAS_barplot, aes(x=pattern, y = freq, fill = pattern)) +
  geom_bar(stat="identity")+theme_minimal()
p_barplot 

10.5 Correlation plots

library(corrplot)
corplotsel <- ips%>%select(pitch_Hz,anacrusis,toneme,body)%>%na.omit()

corrplot(cor(corplotsel),method = "circle",tl.cex = 0.6)

10.6 Chi-square test

10.6.1 Chi-square result

chisqresults <- chisq.test(table(data2$MAS_structure,data2$maximum_Hz))
chisqresults

    Pearson's Chi-squared test

data:  table(data2$MAS_structure, data2$maximum_Hz)
X-squared = 650.15, df = 2, p-value < 2.2e-16

10.6.2 Residuals

chisqresults$residuals
           
                    no        yes
  anacrusis  -4.635226  14.749630
  body        3.686061 -11.729316
  toneme     -4.833758  15.381371

10.7 Mosaic plots

library(vcd)
mosaicout <- mosaic(~ maximum_Hz + MAS_structure , data = data2, shade =TRUE)

mosaicout
           MAS_structure anacrusis  body toneme
maximum_Hz                                     
no                             983 16612   3851
yes                            269  1126    723

10.8 Decision trees

library(partykit)
library(ggparty)

treesel <- ips%>%select(spk,MAS_pattern,TOBI,intensity,pitch_St,intensity,pitch_reset_to_ST,pitch_reset_from_ST,toneme,anacrusis,body)%>%mutate_if(is.character,as.factor)%>%na.omit()
setDF(treesel)
treeout <- ctree(spk ~.,data=treesel,control = ctree_control(maxdepth = 3))

 p <- ggparty(treeout) +
      geom_edge() +
      geom_edge_label() +
      geom_node_label(
        line_list = list(
          aes(label = splitvar),
          aes(label = paste("N =", nodesize))
        ),
        line_gpar = list(
          list(size = 10),
          list(size = 10)
        ),
        ids = "inner"
      ) +
      geom_node_label(aes(label = paste0("N = ", nodesize)),
                      ids = "terminal", nudge_y = -0.35, nudge_x = 0.02
      ) +
      geom_node_plot(
        gglist = list(
          geom_bar(aes(x = "", fill = spk),
                   position = position_fill(), color = "black"
          ),
          theme_minimal(),
          xlab(""), ylab("porcentaje")
        )
      )
    plot(p)

10.9 Heatmap

  p <- ips%>%group_by(MAS_pattern) %>% summarize_if(., is.numeric, median, na.rm = TRUE)%>%na.omit()%>%ungroup()
  p <- p %>%column_to_rownames("MAS_pattern")
  heatmap(as.matrix(p), scale = "column")

10.10 Principal component analysis

library(FactoMineR)
library(factoextra)
selection <- MAS%>%filter(is.na(pattern))%>%select(inflexion_percent_Hz_from_prev,inflexion_percent_Hz_to_next,anacrusis,body_2,toneme,displacement,circumflex,intensity_mean_dB,first_tonic_displacement,anacrusis_displacement)%>%mutate(first_tonic_displacement=replace_na(first_tonic_displacement,"no"),circumflex=replace_na(circumflex,"no"))%>%na.omit()
d <- FAMD(selection, graph = FALSE)
summary(d)

Call:
FAMD(base = selection, graph = FALSE) 


Eigenvalues
                      Dim.1  Dim.2  Dim.3  Dim.4  Dim.5
Variance              2.816  2.092  1.476  1.328  1.106
% of var.            11.734  8.716  6.152  5.533  4.608
Cumulative % of var. 11.734 20.451 26.602 32.135 36.743

Individuals (the 10 first)
                                   Dist    Dim.1    ctr   cos2    Dim.2    ctr
1                              |  4.960 | -0.887  0.041  0.032 |  3.173  0.705
2                              |  4.507 |  0.553  0.016  0.015 |  1.526  0.163
3                              |  9.464 | -0.612  0.019  0.004 |  0.708  0.035
4                              |  2.962 |  1.366  0.097  0.213 |  1.846  0.239
5                              |  3.125 |  0.914  0.043  0.086 |  2.208  0.341
6                              | 10.999 |  0.386  0.008  0.001 |  1.977  0.273
7                              |  3.992 | -1.817  0.172  0.207 |  1.602  0.180
8                              |  3.439 |  0.934  0.045  0.074 | -2.122  0.315
9                              | 11.854 |  1.828  0.174  0.024 |  2.622  0.481
10                             |  2.619 | -2.039  0.216  0.606 |  0.942  0.062
                                 cos2    Dim.3    ctr   cos2  
1                               0.409 |  2.163  0.464  0.190 |
2                               0.115 |  1.030  0.105  0.052 |
3                               0.006 | -1.674  0.278  0.031 |
4                               0.389 | -1.133  0.127  0.146 |
5                               0.499 | -0.834  0.069  0.071 |
6                               0.032 |  2.080  0.429  0.036 |
7                               0.161 |  1.477  0.216  0.137 |
8                               0.381 |  1.071  0.114  0.097 |
9                               0.049 |  0.005  0.000  0.000 |
10                              0.129 |  0.048  0.000  0.000 |

Continuous variables
                                  Dim.1    ctr   cos2    Dim.2    ctr   cos2  
inflexion_percent_Hz_from_prev | -0.458  7.438  0.209 |  0.519 12.882  0.269 |
inflexion_percent_Hz_to_next   |  0.842 25.172  0.709 |  0.457 10.000  0.209 |
anacrusis                      |  0.194  1.331  0.037 | -0.381  6.934  0.145 |
body_2                         | -0.505  9.070  0.255 |  0.597 17.061  0.357 |
toneme                         |  0.842 25.193  0.709 |  0.464 10.275  0.215 |
intensity_mean_dB              | -0.182  1.170  0.033 |  0.137  0.898  0.019 |
                                Dim.3    ctr   cos2  
inflexion_percent_Hz_from_prev  0.436 12.879  0.190 |
inflexion_percent_Hz_to_next   -0.007  0.003  0.000 |
anacrusis                       0.101  0.684  0.010 |
body_2                          0.367  9.114  0.135 |
toneme                         -0.003  0.001  0.000 |
intensity_mean_dB              -0.366  9.088  0.134 |

Categories (the 10 first)
                                   Dim.1     ctr    cos2  v.test     Dim.2
no                             |  -0.590   3.373   0.659 -16.681 |  -0.306
yes                            |   1.946  11.115   0.659  16.681 |   1.010
asc_desc                       |  -1.616   2.411   0.166  -7.069 |   1.528
asc_desc_asc                   |   2.381   0.209   0.016   2.008 |  -0.011
asc_q1                         |  -0.196   0.008   0.001  -0.391 |   1.440
asc_q1q2                       |   0.034   0.000   0.000   0.029 |   2.655
asc_q1q2q3                     |   1.140   0.024   0.002   0.679 |   5.644
asc_q2                         |   1.958   0.425   0.033   2.869 |  -0.297
asc_q2q3                       |   4.147   1.588   0.118   5.542 |   2.449
asc_q3                         |  -0.270   0.011   0.001  -0.457 |   0.823
                                   ctr    cos2  v.test     Dim.3     ctr
no                               1.646   0.178 -10.044 |   0.015   0.008
yes                              5.425   0.178  10.044 |  -0.048   0.025
asc_desc                         3.904   0.148   7.752 |   1.933  12.546
asc_desc_asc                     0.000   0.000  -0.011 |   5.747   4.436
asc_q1                           0.763   0.033   3.327 |   0.867   0.555
asc_q1q2                         0.472   0.021   2.598 |   0.154   0.003
asc_q1q2q3                       1.066   0.046   3.902 |   1.877   0.237
asc_q2                           0.018   0.001  -0.504 |   2.532   2.584
asc_q2q3                         1.003   0.041   3.797 |   0.427   0.061
asc_q3                           0.181   0.008   1.617 |  -0.789   0.335
                                  cos2  v.test  
no                               0.000   0.574 |
yes                              0.000  -0.574 |
asc_desc                         0.237  11.675 |
asc_desc_asc                     0.094   6.693 |
asc_q1                           0.012   2.384 |
asc_q1q2                         0.000   0.179 |
asc_q1q2q3                       0.005   1.545 |
asc_q2                           0.055   5.123 |
asc_q2q3                         0.001   0.788 |
asc_q3                           0.007  -1.847 |

10.10.1 Dimension contribution

fviz_eig(d)

10.10.2 Contribution of categories

fviz_contrib(d, choice="var", axes = 1:2)

10.10.3 Chart of individuals

res.pca <- d
groups <- as.factor(selection$displacement)
d_plot<- fviz_pca_ind(res.pca,
                   habillage = groups, # color by groups
                   # palette = c("#00AFBB",  "#FC4E07"),
                   legend.title = "Groups",
                   repel = FALSE)

d_plot

10.11 Cluster classification based on PCA

res.hcpc = HCPC(d,nb.clust = -1)

10.11.0.1 Chart of clusters

fviz_cluster(res.hcpc, ellipse.type = "norm")

10.11.0.2 Characteristics of groups

plotclust <- res.hcpc$data.clust
plottree2 <- ctree(clust~., data=plotclust,control = ctree_control(maxdepth = 3))

ptree <- ggparty(plottree2) +
      geom_edge() +
      geom_edge_label() +
      geom_node_label(
        line_list = list(
          aes(label = splitvar),
          aes(label = paste("N =", nodesize))
        ),
        line_gpar = list(
          list(size = 10),
          list(size = 10)
        ),
        ids = "inner"
      ) +
      geom_node_label(aes(label = paste0("N = ", nodesize)),
                      ids = "terminal", nudge_y = -0.35, nudge_x = 0.02
      ) +
      geom_node_plot(
        gglist = list(
          geom_bar(aes(x = "", fill = clust),
                   position = position_fill(), color = "black"
          ),
          theme_minimal(),
          xlab(""), ylab("porcentaje")
        )
      )

plot(ptree)

plottree2

Model formula:
clust ~ inflexion_percent_Hz_from_prev + inflexion_percent_Hz_to_next + 
    anacrusis + body_2 + toneme + displacement + circumflex + 
    intensity_mean_dB + first_tonic_displacement + anacrusis_displacement

Fitted party:
[1] root
|   [2] displacement in no
|   |   [3] circumflex in asc_desc, asc_q1, asc_q1q2, asc_q3, desc_asc_desc, desc_q1, desc_q1q2q3, desc_q2, desc_q3, no, no_circumflex
|   |   |   [4] anacrusis_displacement in no: 1 (n = 316, err = 10.1%)
|   |   |   [5] anacrusis_displacement in yes: 2 (n = 201, err = 18.9%)
|   |   [6] circumflex in asc_desc_asc, asc_q2, asc_q2q3: 2 (n = 7, err = 42.9%)
|   [7] displacement in yes
|   |   [8] first_tonic_displacement in no
|   |   |   [9] body_2 <= 35.51: 3 (n = 110, err = 8.2%)
|   |   |   [10] body_2 > 35.51: 1 (n = 13, err = 46.2%)
|   |   [11] first_tonic_displacement in yes
|   |   |   [12] toneme <= 21.87: 2 (n = 25, err = 4.0%)
|   |   |   [13] toneme > 21.87: 3 (n = 11, err = 18.2%)

Number of inner nodes:    6
Number of terminal nodes: 7

10.12 ANOVA & HSD-Tukey

anovatest <- aov(toneme ~ spk,data=ips)
summary(anovatest)
              Df  Sum Sq Mean Sq F value   Pr(>F)    
spk            3   12125    4042   5.755 0.000647 ***
Residuals   1752 1230438     702                     
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
351 observations deleted due to missingness

10.12.1 HSD-Tukey results

TukeyHSD(anovatest)
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = toneme ~ spk, data = ips)

$spk
                                    diff       lwr       upr     p adj
1990_alto_01_i-1990_alto_01_e -0.7209639 -8.406498  6.964570 0.9950649
1990_alto_02_e-1990_alto_01_e  1.0347812 -2.551274  4.620837 0.8800700
1990_alto_02_i-1990_alto_01_e 13.2886832  4.986136 21.591230 0.0002359
1990_alto_02_e-1990_alto_01_i  1.7557451 -6.185572  9.697062 0.9414243
1990_alto_02_i-1990_alto_01_i 14.0096471  3.094657 24.924637 0.0054330
1990_alto_02_i-1990_alto_02_e 12.2539020  3.714033 20.793771 0.0013209

10.13 Complete descriptive stats report with Data Explorer

library(DataExplorer)

data_report <- data2%>%select(content,tonic,accent,pitch_mean_HZ,intensity_mean_dB,center_Hz,displacement,MAS_structure)
create_report(data_report)

11 Oralstats_Viewer

library(shiny)
runApp("oralstats_viewer/oralstats_viewer.R")

12 References

Boersma, Paul, and David Weenink. 2022. Praat: Doing Phonetics by Computer [Computer Program]. http://www.praat.org/.
Coretta, Stefano. 2024. “Speakr: A Wrapper for the Phonetic Software ’Praat’.” https://CRAN.R-project.org/package=speakr.
Davies, Mark. 2005. “The Advantage of Using Relational Databases for Large Corpora: Speed, Advanced Queries, and Unlimited Annotation.” International Journal of Corpus Linguistics 10 (3): 307–34. https://doi.org/10.1075/ijcl.10.3.02dav.
———. 2009. “The 385+ million word Corpus of Contemporary American English (19902008+): Design, architecture, and linguistic insights.” International journal of corpus linguistics 14 (2): 159–90.
———. 2010. “More Than a Peephole Using Large and Diverse Online Corpora.” International Journal of Corpus Linguistics 15 (3): 412–18. https://dialnet.unirioja.es/servlet/extart?codigo=4571143.
———. 2012. “Expanding horizons in historical linguistics with the 400-million word Corpus of Historical American English.” Corpora 7 (2): 121–57.
———. 2021. “The TV and Movies corpora: Design, construction, and use.” International journal of corpus linguistics 26 (1): 10–37.
Davies, Mark, and Jong-Bok Kim. 2019. “The advantages and challenges of "big data": Insights from the 14 billion word iWeb corpus.” Linguistic research 36 (1): 1–34.
Mateo Ruiz, Miguel. 2010. “Scripts En Praat Para La Extracción de Datos Tonales y Curva Estándar.” Phonica, no. 6.
Vnjis, Vincent. 2016. Radiant, Business Analytics Using r and Shiny. https://vnijs.github.io/radiant/.
Wijffels, Jan. 2023. “Udpipe: Tokenization, Parts of Speech Tagging, Lemmatization and Dependency Parsing with the ’UDPipe’ ’NLP’ Toolkit.” https://CRAN.R-project.org/package=udpipe.