library(phonfieldwork)
library(tidyverse)
library(udpipe)
library(data.table)
library(pool)
library(tidytext)
library(duckdb)
library(patchwork)
library(DataExplorer)
library(party)
library(ggparty)
library(FactoMineR)
library(factoextra)
library(gridExtra)
Oralstats. Code and brief tutorial
![]() |
![]() |
“Get to the choppa!” (Major Alan “Dutch” Schaefer character, Predator movie [1987])
1 Link to materials
https://drive.google.com/drive/folders/1gYTDeSO65jc43WijwTR6AjS4_H1JmKLB?usp=sharing
2 Structure and functionality of Oralstats
Oralstats (Cabedo, 2021) has been developed as an exploration environment combining information from speech transcripts with pitch and intensity data obtained from Praat.
(a) It creates three data frames (phones, words (POS tags included) and intonational phrases),
(b) provides the prosodic information for all of them, like pitch range, mean and median, intensity mean, or duration, for intonational phrases, and
(c) assigns a ToBI tag and MAS pattern tag. Further data frames can be generated with a slight modification of code (for instance, for wider units, like intonational clauses or paratones).
3 License
Oralstats code is licensed under GNU General Public License v3.0. Permissions of this strong copyleft license are conditioned on making available complete source code of licensed works and modifications, which include larger works using a licensed work, under the same license. Copyright and license notices must be preserved. Contributors provide an express grant of patent rights.
This document uses a CC BY 4.0 license:
Oralstats. Code and brief tutorial© 2024 by Adrián Cabedo Nebotis licensed under CC BY 4.0
4 The idea
I must give an special credit to Radiant, Business analytics using R and Shiny (Vnjis 2016). Although it is focused mainly on Business data mining, this tool was for me a deeply inspirational idea: in my case, it developed the idea of analyzing data online/offline, using Shiny, in a dynamic mode, adding and/or filtering this data to observe linguistic patterns from speech transcriptions.
Other inspiration for me was all the work done by Davies the last twenty years (Davies 2021, 2005, 2009, 2010; Davies and Kim 2019; Davies 2012, among many others) on the construction of several linguistic corpora; specially, I got really interested about joining data with SQL instances. This is really a good approach to structured data, as it could be the one coming from oral linguistic data, in which phonemes relate to words and these last two relate to utterances and all of them relate to speakers and so on.
Finally, about the ideas that have been key to develop Oralstats, I want to express my gratitude to web platforms like Spokes (http://pelcra.clarin-pl.eu/SpokesBNC/) and to software tools like the ones developed by Laurence Anthony (https://www.laurenceanthony.net/). Also, it wouldn’t have been possible to make speech and phonic analysis without the aid of tools like ELAN (https://archive.mpi.nl/tla/elan) and PRAAT (Boersma and Weenink 2022) (https://www.fon.hum.uva.nl/praat/); also in PRAAT, one of the most inspirational script was “analyse_tier.praat”, developed by Daniel Hirst..
5 Before beginning
To start using Oralstats for processing and analyzing speech data, you’ll need some files. Essentially, you’ll require a transcription file with alignment times (from Textgrid or TXT tabbed files) and an audio file.
5.1 Transcription
The transcription of the audio can be done manually or using other automatic transcription tools. Currently, when working primarily with non-overlapping genres, Whisper (large V2 or V3) performs admirably. However, it may miss some potentially interesting phenomena for linguists, such as question tags or paralinguistic nuances like laughing or coughing.
Adapting Whisper output to a TextGrid file can be somewhat challenging, but there are options available that can automate this process. For instance, Whisper transcriptions can be imported into ELAN files, from which they can then be exported into TextGrid files.
One of the primary issues with Whisper transcriptions is that they do not separate speakers. This task may need to be done manually.
5.2 Alignment
At present, Oralstats requires data to be structured in three aligned layers: intonational phrases (optional), words (required), and phones (required). These layers must be synchronized with the audio. Currently, there are three methods available for performing this alignment, which are outlined in the following sections.
5.2.1 PRAAT alignment by hand
Aligning long files such as interviews or conversations manually, annotating and aligning words and phonemes one by one, is not feasible and would be an unattainable task.
5.2.2 PRAAT using interval alignment
While the process is relatively straightforward, it is time-consuming. However, experienced users of PRAAT will appreciate being able to perform the alignment within their familiar environment. As of the time of writing this document, PRAAT alignment cannot be automated through scripting and can only be carried out within the TextGrid editor.
5.2.3 Using Forced Montreal Alignment
Learning curve: a bit steep. However, the benefits are truly remarkable. It’s worth noting that languages like Catalan are still not fully supported by native acoustic models, which may pose some challenges for users seeking to work with such languages.
5.2.4 SPPAS
SPPAS, a scientific computer software package, offers automated speech annotation and analysis. Continuously evolving, SPPAS aims to deliver a robust and dependable tool for researchers. It is freely available with open-source code. Whether you require automatic speech annotation, analysis of annotated data, or conversion of file formats, SPPAS simplifies the process.
5.2.5 Using Webmaus
The process is identical to Forced Montreal Alignment and is supported by the CLARIN infrastructure. However, it’s important to note that the results may not be as accurate as those obtained from Montreal alignment.
5.3 Google colab script
To automatically generate the necessary files to work with Oralstats, we have established a basic pipeline. This pipeline begins with an audio file and proceeds to generate a transcription using Whisper (attempting to run it directly in your terminal, although success is not guaranteed). Subsequently, this output will be aligned both by word and by phone using Montreal forced alignment.
The resulting files will be TextGrid files, which will not be separated by speaker. This task will need to be carried out manually. However, if your recording consists of a monologue, this won’t cause any problem.
6 Libraries
Libraries in R, also known as packages, are collections of functions, data sets, and other tools designed to extend the capabilities of R programming language.
7 Where do your dataset come from?
7.1 From duckdB database
This option considers that there has been done a previous converting process with Oralstats; so, that there are already data frames created for intonational phrases, words, phones and turns. Frequently, the next step of using this method is to access directly Statistics (Section 10) section or visualization with Oralstats_Viewer (Section 11). Uncomment the following lines if it is the case that you have already created a database (and do not execute importing from TextGrid or TXT section):
# mydb <- dbConnect(drv = duckdb(), "oralstats.duckdb")
# words <- mydb%>% tbl("oralstats_words")%>%collect()
# ips <- mydb%>% tbl("oralstats_ips")%>%collect()
# phones <- mydb%>% tbl("oralstats_phones")%>%collect()
# dbDisconnect(mydb, shutdown = TRUE)
7.2 Transcriptions from TextGrid
This method relies on an external library (phonfieldwork). Please note that you may need to make slight modifications to the code below for speaker variable definition. Currently, Oralstats expects the following labels for each tier:
1. "Speaker - utterance"
2. "Speaker - word"
3. "Speaker - phone"
For example, if you’re working with a TextGrid file from the Catalan folder containing an interview with Rosalia, you might encounter tier names such as “R - word” and “R - phone”. Since this TextGrid originates from WebMaus, there may not be an “utterance” tier.
# dirtextgrids <- "textgrid/spanish"
# # dirtextgrids <- "textgrid/catalan"
#
# textgrids <- list.files(dirtextgrids,full.names = TRUE,pattern = ".*textgrid|.*TextGrid")
# data <- map2_df(basename(textgrids),textgrids, ~ textgrid_to_df(.y)%>%mutate(filename=.x))
# data <-data %>% filter(content != "",grepl("(utterances|word|phone)", tier_name))
# data <-data %>% mutate(filename = gsub(".textgrid", "", filename),
# filename = gsub(".TextGrid", "", filename)) %>% select(-source, -id, -tier) %>% mutate(spk = gsub(" -.*", "", tier_name), spk = paste(filename, spk, sep = "_"))
# data <- data %>% na.omit()
# rm(dirtextgrids)
# rm(textgrids)
7.3 Transcriptions from TXT
Uncomment the following lines if your data does not come from a TextGrid file, but you have a .txt file with these columns separated by a tab character. You can download these files from PRAAT converting TextGrid to table and then downloaded it as a tab separated text. This action will download a UTF-16 txt file that should be converted to UTF-8 format (it can be easily done with a text editor like Sublime Text (MAC) or Notepad++(Windows):
tmin | tier | text | tmax |
---|
<- "textgrid/spanish"
dirtextgrids <- list.files(dirtextgrids,full.names = TRUE,pattern = ".*txt")
txts <- map2_df(basename(txts),txts, ~fread(.y,sep = "\t",header = TRUE)%>% mutate(filename = .x))
data <- data %>% filter(text != "",grepl("(utterances|word|phone)", tier))%>%rename(content=text)
data <- data %>% mutate(filename = gsub("TXT", "", filename),
data filename = gsub(".txt", "", filename)) %>%rename(tier_name=tier,time_start=tmin,time_end=tmax)%>% mutate(spk = gsub(" -.*", "", tier_name), spk = paste(filename, spk, sep = "_"))%>%relocate(time_end,.after = time_start)%>%relocate(tier_name,.after = content)
<- data %>% na.omit() data
7.4 Prosody files
7.4.1 Get pitch tier and intensity tier files from PRAAT Script
This section will create pitch and intensity files for each WAV file uploaded in a specified folder. The Speakr library (Coretta 2024) is used to connect R with PRAAT, but it is not obligatory to use it that way. You can find the corresponding PRAAT script in the ‘scripts_PRAAT’ folder and run it from there. By default, this chunk won’t be executed unless you change ‘eval’ from false to true.
library(speakr)
<- system.file("extdata", "scripts_PRAAT/create_pitchtiers_intensitytiers.praat", package = "speakr")
script <- "prosody_files/pitch"
pitchl <- "prosody_files/pitch"
pitchinterpolate <- "prosody_files/pitch"
intensity dir.create(file.path(pitch), showWarnings = FALSE)
dir.create(file.path(pitchinterpolate), showWarnings = FALSE)
dir.create(file.path(intensity), showWarnings = FALSE)
# setwd(file.path(mainDir, subDir))
start_praat()
praat_run(script = "scripts_PRAAT/create_pitchtiers_intensitytiers.praat")
<- list.files("prosody_files/pitchinterpolate/", pattern = "*.txt")
old_files_interpolate <- gsub("pitchinterpolate","",old_files_interpolate)
new_files_interpolate <- gsub("\\.wav","",new_files_interpolate )
new_files_interpolate
file.copy(from = paste0("prosody_files/pitchinterpolate/",old_files_interpolate), to = paste0("prosody_files/pitchinterpolate/",new_files_interpolate ))
<- list.files("prosody_files/pitch/", pattern = "*.txt")
old_files_pitch <- gsub("pitch","",old_files_pitch)
new_files_pitch <- gsub("\\.wav","",new_files_pitch )
new_files_pitch
file.copy(from = paste0("prosody_files/pitch/",old_files_pitch), to = paste0("prosody_files/pitch/",new_files_pitch ))
<- list.files("prosody_files/intensity/", pattern = "*.txt")
old_files_intensity <- gsub("intensity","",old_files_intensity)
new_files_intensity <- gsub("\\.wav","",new_files_intensity )
new_files_intensity
file.copy(from = paste0("prosody_files/intensity/",old_files_intensity), to = paste0("prosody_files/intensity/",new_files_intensity ))
7.4.2 Pitch
In this section, pitch information from PitchTier files is imported into the same dataframe:
# Define pitch or pitchinterpolate (uncomment the line below if it is the case)
<- "prosody_files/pitch"
dirpitch # dirpitch <- "pitchinterpolate"
<- list.files(dirpitch,full.names = TRUE,pattern = ".*txt|.*TXT")
pitch_list <- map2_df(basename(pitch_list),pitch_list,~ fread(.y) %>% mutate(filename = .x))
pitch <- pitch %>% rename(time = V1, pitch = V2) %>% mutate(
pitch time = round(time, 2),
time_ms = round(time, 2) * 1000,
filename = gsub("\\.txt", "", filename),pitch=round(pitch,1))
<- pitch%>%mutate(pitch_st= 12*log2(pitch/1)) pitch
7.4.3 Intensity
In this section, intensity information from IntensityTier files is imported into the same dataframe:
<- list.files("prosody_files/intensity",full.names = TRUE,pattern = ".*txt|.*TXT")
intensity_list <- map2_df(basename(intensity_list),intensity_list, ~ fread(.y) %>% mutate(filename = .x)) %>% rename(time = `Time (s)`, intensity = `Intensity (dB)`) %>% mutate(time = round(time, 2),time_ms = round(time, 2) * 1000, filename = gsub("\\.txt", "", filename),intensity=round(intensity,1))%>%select(-rowLabel) intensity
7.4.4 Prosody combined
In this section, we pair each pitch point with its corresponding intensity point.
<- pitch[intensity, on = c('filename', 'time'), nomatch = 0]
prosody c("filename","time", "time_ms","pitch","intensity")]
prosody[, order(filename,time), id_prosody := paste(filename,time_ms,sep="_")]
prosody[:= NULL] prosody[,i.time_ms
7.4.5 Interpolate
By applying this section, missing pitch and intensity points can be interpolated. Although this process is an approximation for the data, it enables the generation of more precise units and processing parts for melodic analysis of speech and TOBI annotation.
<- function(data) {
interpolate_by_filename <- seq(min(data$time), max(data$time), by=0.01)
new_times # Linear interpolation for pitch using the 'approx' function
<- approx(data$time, data$pitch, xout = new_times, method = "linear")
interpolated_pitch # Linear interpolation for intensity using the 'approx' function
<- approx(data$time, data$intensity, xout = new_times, method = "linear")
interpolated_intensity # Create a new data frame for the interpolated values
<- data.frame(filename = unique(data$filename),time = rep(new_times, each = length(unique(data$filename))), pitch = rep(interpolated_pitch$y, each = length(unique(data$filename))), intensity = rep(interpolated_intensity$y, each = length(unique(data$filename))))
interpolated_df
}# Group by filename and apply the interpolation function
<- prosody %>% group_split(filename) %>%
interpolated map_df(interpolate_by_filename)
<- interpolated%>%mutate(time_ms= time*1000,id_prosody= paste(filename,time_ms,sep="_"),pitch=round(pitch,1),intensity=round(intensity,1))
prosody <- NULL
interpolated <- NULL
pitch <- NULL
intensity rm(dirpitch,intensity,intensity_list,interpolated,pitch,pitch_list,textgrids, interpolate_by_filename, intensitydir,pitchinterpolate,pitchoriginal,script)
8 Speech units
The speech units covered by Oralstats range from smaller ones, such as phones, to broader ones, such as speech turns.
Words. Words aligned to timeline.
Phones. Phones aligned to timeline.
Ips (intonational phrases). Basically, combination of words consecutive between silences of circa 150 milliseconds.
Speech turns. Combination of ips defined by change of speaker.
8.1 Define language
As of now, Oralstats attempts to recognize the tonic structure or accent of languages such as Spanish or Catalan. If you are using data from another language, you can attempt to modify the computational rules below, or alternatively, add them manually (although this process is more time-consuming):
<- "spanish"
language # language <- "catalan"
8.2 Words
First of all, Oralstats will filter a word data frame from the general input acquired from Section 1.
<- data %>% filter(grepl("word", tier_name)) %>% mutate(id_word = paste(spk, "word", time_start, time_end, sep ="_"),time_start_word=time_start,time_end_word=time_end,word=content) words
8.2.1 Assign accent
Words need to have accents. At this point, Oralstats automatically assigns an accent to each word, considering two possible options: Catalan or Spanish. If you are using Oralstats for another language, you can configure specific rules here for assigning accent structure to the word (oxytone, paroxytone, proparoxytone), or you can perform this task using a lexicon.
setDT(words)
if(language=="spanish"){
:= gsub("[^áéíóúaeiouÁÉÍÓÚAEIOUÜü]", "", word)]
words[, vowels_structure := ifelse(grepl("[aeiou][^aeiouáéíóúns]$", word) & !grepl("[áéíóú]", vowels_structure), "oxitone",NA)]
words[,accent := ifelse(grepl("[áéíóú][ns]$", word),"oxitone", accent)]
words[,accent := ifelse(grepl("[áéíóú]$", word),"oxitone", accent)]
words[,accent := ifelse(grepl("[aeons]$", word) & !grepl("[áéíóú]", vowels_structure),"paroxitone",accent)]
words[,accent := ifelse(grepl("[áéíóú][aeiou][aeiou]", vowels_structure),"proparoxitone",accent)]
words[,accent := ifelse(vowels_structure %in% c("a", "e", "u", "o", "i"),"oxitone", accent)]
words[,accent := ifelse(vowels_structure == "í" |vowels_structure == "ú","oxitone", accent)]
words[,accent := ifelse(
words[,accent grepl("ía$", vowels_structure) |
grepl("íe$", vowels_structure) |
grepl("ío$", vowels_structure) |
grepl("úa$", vowels_structure) | grepl("úe$", vowels_structure) | grepl("úo$", vowels_structure), "paroxitone",accent)]
:= ifelse(word %in% c( "tu", "pero", "desde", "nuestro", "vuestro", "aun", "medio", "tan", "mi", "su", "me", "te", "se", "el", "la", "los", "las", "que", "a", "ante", "con", "de", "pues", "por", "para", "desde", "en", "al", "san", "porque", "y", "o", "u", "donde", "cuando", "como", "entre", "del"),
words[,accent "non_tonic",
accent
)]}
if(language=="catalan")
{setDT(words)
:= gsub("[^àáèéíòóúaeiouÀÁÈÉÍÒÓÚAEIOUÜü]", "", word)]
words[, vowels_structure := ifelse(grepl("(à|é|è|í|ó|ò|ú|én|èn|ín|às|és|ès|ís|ós|òs|ús)$", word), "oxitone",NA)]
words[,accent := ifelse(grepl("(a|e|i|o|u|en|in|as|es|is|os|us|ia)$",word) & !grepl("[àéèíòóú]",vowels_structure), "paroxitone",accent)]
words[,accent := ifelse(grepl("(à|é|è|í|ó|ò|í|ú)[aeiou]$",vowels_structure) , "paroxitone",accent)]
words[,accent := ifelse(grepl("(à|é|è|í|ó|ò|í|ú)[aeiou][aeiou]$", vowels_structure), "proparoxitone",accent)]
words[,accent := ifelse(vowels_structure %in% c("a","e","i","o","u","à","è","ò","é","í","ó","ú"),"oxitone",accent)]
words[,accent := ifelse(word %in% c("A","a","El","el","la","La","Els","els","com","per","amb","Amb","em","al","quan","doncs","Per","del","Doncs","ets","se'm","tan","Em","LA","dels","I","i"),"non_tonic",accent)]
words[,accent := ifelse(is.na(accent),"oxitone",accent)]
words[,accent
}
setDF(words)
<- words%>%select(-word)
words # From lexicon. It must have to columns (word and accent).
# words <- words %>% left_join(lexicon,by="word")
setDT(words)
8.2.2 Assign ip id and ip text
In this section, and from previous words data frame, it will compute how many words come together before a pause of at least 0.15 (this treshold can be modified in the code below). This operation will create intonational phrases along with an specific id for them.
# Ips_coming_from <- "transcriptions or textgrids"
<- "words"
Ips_coming_from
if(Ips_coming_from!="words")
{
<- data %>% filter(grepl("utterance", tier_name) | grepl("_ip_", tier_name)) %>% mutate(id_ip = paste(spk, "ip", time_start, time_end, sep ="_"),time_start_ip=time_start,time_end_ip=time_end,ip=content)
ips <- words %>%arrange(filename,time_start)%>% mutate(pause = lead(time_start) - time_end)
words
setDT(words) # make a data.table
setDT(ips) # make a data.table
# setDT(ips)
setkey(ips, filename,spk,time_start, time_end)
setkey(words, filename,spk,time_start, time_end)
# setkey(ips, filename, spk,time_start, time_end)
<- foverlaps(ips, words, nomatch = NA)%>%select(-i.time_start,-i.time_end,-i.tier_name)
words_1 <- words_1 %>%group_by(id_ip)%>%mutate(token_id=row_number(),token_id=as.character(token_id))%>%ungroup()%>%arrange(filename,time_start)
words rm(words_1)
}
if(Ips_coming_from=="words")
{<- words %>%arrange(filename,time_start)%>% mutate(pause = lead(time_start) - time_end,changeip = ifelse(pause > 0.15 | lead(filename)!=filename, "change", NA),changeturn = ifelse(lead(spk)!=spk|lead(filename)!=filename, "change", NA)) %>% group_by(filename, spk, changeip) %>%
words mutate(id_ip = row_number(),
id_ip = ifelse(
!= "change",
changeip NA,
paste(spk,time_start, time_end, id_ip, sep = "_")
%>%ungroup()%>%fill(id_ip, .direction = "up")%>%group_by(id_ip)%>%mutate(token_id=row_number(),token_id=as.character(token_id))%>%ungroup()%>%arrange(filename,time_start) %>% group_by(filename, id_ip) %>% mutate(ip = paste(content, collapse = " ")) %>%ungroup()%>%group_by(filename,id_ip)%>%mutate(id_ip= paste(spk,first(time_start), last(time_end), sep = "_"),time_start_ip=first(time_start),time_end_ip=last(time_end))%>%ungroup()}
)) rm(Ips_coming_from)
8.2.3 Assign turn id and turn text
In this section, turns will be taken directly from words. It computes the quantity of words uttered before a change of speaker. If data does not come from a dialogue, there will be no turns, as it will be a continuous monologue (that will be computed at any case).
<- words %>% group_by(filename, spk, changeturn) %>%
words mutate(id_turn = row_number(),
id_turn = ifelse(
!= "change",
changeturn NA,
paste(spk, time_start, time_end, id_turn, sep = "_")
%>%ungroup()%>%fill(id_turn, .direction = "up")%>%group_by(id_turn)%>%ungroup()%>%arrange(filename,time_start) %>% group_by(filename, id_turn) %>% mutate(pause_cod=ifelse(changeip=="change","/ ",""),pause_cod=ifelse(is.na(pause_cod),"",pause_cod),turn = paste(paste(content,pause_cod,sep=""), collapse = " ")) %>%ungroup()%>%group_by(filename,id_turn)%>%mutate(id_turn= paste(spk,first(time_start), last(time_end), sep = "_"))%>%ungroup()%>%arrange(filename,time_start) ))
8.2.4 Morphosyntactic tagging
8.2.4.1 Charge UdPipe model and select language
To automatically tag the words, Oralstats uses UdPipe (Wijffels 2023):
<- udpipe_download_model(language = language,model_dir = "udpipe/")
modeldownl <- udpipe_load_model(modeldownl) spmodel
8.2.4.2 POS tag
<- words%>%group_by(id_ip,ip)%>%summarise(ip=first(ip))%>%ungroup()
ips <- udpipe_annotate(x=ips$ip,object = spmodel, tokenizer = "horizontal",tagger = "default", doc_id=ips$id_ip)%>%as.data.frame()%>%rename(id_ip=doc_id)
tokenized_udpipe setDT(tokenized_udpipe)
setDT(words)
<- words[tokenized_udpipe[,c("id_ip","upos","lemma","token_id")], on = c('id_ip',"token_id"), nomatch = 0]
words <- words%>%mutate(accent=ifelse(upos%in%c("ADP","CCONJ"),"non_tonic",accent),accent=ifelse(content=="según","oxitone",accent),accent=ifelse(content%in%c("lo","los","si","le","les","mis","mi"),"non_tonic",accent),first_word=ifelse(token_id==1,"yes","no"))%>%group_by(id_ip)%>%mutate(toneme_word = ifelse(id_word==last(id_word)&accent!="non_tonic","yes","no"))%>%ungroup()
words
<- words%>%filter(accent=="non_tonic") %>%group_by(id_ip)%>%mutate(first_non_tonic_ip=ifelse(id_word ==first(id_word)&token_id==1,"yes","no"))%>%ungroup()%>%select(id_word,first_non_tonic_ip)
words_non_tonic_first
<- words%>%filter(accent!="non_tonic") %>%group_by(id_ip)%>%mutate(first_tonic_ip=ifelse(id_word ==first(id_word)&token_id==1,"yes","no"))%>%ungroup()%>%select(id_word,first_tonic_ip)
words_tonic_first
<- words%>%filter(accent!="non_tonic") %>%group_by(id_ip)%>%mutate(tonic_ip_no_first=ifelse(id_word ==first(id_word)&token_id>1,"yes","no"))%>%ungroup()%>%select(id_word,tonic_ip_no_first)
words_tonic_non_first
<- words%>%left_join(words_non_tonic_first,by="id_word")
words <- words%>%left_join(words_tonic_first,by="id_word")
words <- words%>%left_join(words_tonic_non_first,by="id_word")
words
rm(spmodel,tokenized_udpipe,modeldownl,words_non_tonic_first,words_tonic_first,words_tonic_non_first)
8.2.5 Sentiment tagging
In this section, a very basic sentiment annotation derived from a lexicon will be added to the word structure. Currently, there is no application of quantifiers.
<- read.csv("sentiment/sentiment.csv",header = TRUE,sep=",")
sentiment # sentiment <- read.csv("sentiment/sentiment_cat.csv",header = TRUE,sep=",")
<- sentiment %>%distinct(content,.keep_all = T)
sentiment setDT(sentiment)
<- merge(words, sentiment,by = "content", all.x = T)
words <- words%>%arrange(filename,time_start)
words rm(sentiment)
8.3 Words with proximities
With corpus approach, although not strictly covered in this version of Oralstats, proximity words are included in the general words data frame.
setDF(words)
<- words%>%group_by(id_ip)%>%mutate(word=content,
words wordleft2 = lag(word,2),
wordleft = lag(word,1),
wordright = lead(word,1),
wordright2 = lead(word,2),
uposleft2 = lag(upos,2),
uposleft = lag(upos,1),
uposright = lead(upos,1),
uposright2 = lead(upos,2)
%>%ungroup() )
8.4 Vowels
<- data %>% filter(grepl("phone", tier_name)) %>% mutate(id_phone = paste(spk, "phone", time_start*1000, time_end*1000, sep ="_"),category = ifelse(grepl("[aeiou@EO]", content),"vowel", ifelse(grepl("[jw]", content), "glide", "consonant")),category=ifelse(grepl("[nlñ]",content),"vowel",category),time_start = time_start + 0.001, time_end = time_end - 0.001)
phones
# words<-words%>%mutate(time_start_word = time_start, time_end_word = time_end,pause=lead(time_start)-time_end)
# ips <- ips%>% mutate(time_start_ip = time_start, time_end_ip = time_end)
8.4.1 Combine with words
setDT(words) # make a data.table
setDT(phones) # make a data.table
# setDT(ips)
setkey(words, filename,spk,time_start, time_end)
setkey(phones, filename,spk,time_start, time_end)
# setkey(ips, filename, spk,time_start, time_end)
<- foverlaps(words, phones, nomatch = NA)%>%select(-i.time_start,-i.time_end,-i.tier_name)
phones_1 # setkey(phones_1, filename, spk, time_start, time_end)
# phones_2 <- foverlaps(ips,phones_1,nomatch=NA)%>%mutate_if(is.numeric,round,2)%>%select(-i.time_start,-i.time_end,-i.tier_name)%>%rename(ip=i.content)
<- phones_1 %>% filter(category=="vowel",grepl("[aeiou@EOIAU]",content)|(grepl("[nlñ]",content) & lead(id_word)!=id_word) )%>%arrange(filename,time_start)
vowels
<- phones_1
phones rm(phones_1)
8.4.2 Combine vowels and prosody
setDT(vowels) # make a data.table
setDT(prosody) # make a data.table
<- prosody[, dummy := time]
prosody <- prosody[order(filename,time)] # sorting by time so I can choose first match
prosody setkey(vowels, filename, time_start, time_end) # setting keys tells data.table what to join on
setkey(prosody, filename, time, dummy) # setting keys tells data.table what to join on
<- foverlaps(vowels, prosody, nomatch = NA)[, dummy := NULL]
summary <- summary[, dur := time_end - time_start]
summary <- summary[,quarter := ifelse(time <= (time_start + (0.25 * dur)), "q1", NA)]
summary <- summary[,quarter := ifelse(time >= (time_start + (0.25 * dur)) & time <= (time_start + (0.5 * dur)),"q2",quarter)]
summary <- summary[,quarter := ifelse(time >= (time_start + (0.5 * dur)) &
summary <= (time_start + (0.75 * dur)),"q3",quarter)]
time <- summary[,quarter := ifelse(time >= (time_start + (0.75 * dur)) &
summary <= (time_start + (dur)), "q4",quarter)]
time <- summary[quarter == "q1", .(q1piHz = mean(pitch, na.rm = TRUE),q1piHz_time = mean(time, na.rm = TRUE)),by=(id_phone)]
q1alof <- summary[quarter == "q2", .(q2piHz = mean(pitch, na.rm = TRUE),q2piHz_time = mean(time, na.rm = TRUE)),by=(id_phone)]
q2alof <- summary[quarter == "q3", .(q3piHz = mean(pitch, na.rm = TRUE),q3piHz_time = mean(time, na.rm = TRUE)),by=(id_phone)]
q3alof <- summary[quarter == "q4", .(q4piHz = mean(pitch, na.rm = TRUE),q4piHz_time = mean(time, na.rm = TRUE)),by=(id_phone)]
q4alof
= summary[, .(pitch_mean_HZ = mean(pitch, na.rm = TRUE),IQR_Hz = IQR(pitch, na.rm = TRUE), intensity_mean_dB = mean(intensity, na.rm = TRUE),IQR_dB = IQR(intensity, na.rm = TRUE)), by=.(id_phone,id_ip,id_word,filename,spk,content,word,ip,first_tonic_ip,first_non_tonic_ip,tonic_ip_no_first,time_start,time_end,time_start_word,time_end_word,time_start_ip,time_end_ip,toneme_word,category,accent,upos,lemma,sentiment)]%>%mutate_if(is.numeric,round,2)
data2 setDF(data2)
<- data2%>%left_join(q1alof,by="id_phone")
data2 <- data2%>%left_join(q2alof,by="id_phone")
data2 <- data2%>%left_join(q3alof,by="id_phone")
data2 <- data2%>%left_join(q4alof,by="id_phone")
data2
<- data2%>%group_by(id_ip)%>%mutate(pitch_mean_HZ=ifelse(is.na(pitch_mean_HZ),mean(pitch_mean_HZ,na.rm = TRUE),pitch_mean_HZ),q1piHz=ifelse(is.na(q1piHz),mean(q1piHz,na.rm = TRUE),q1piHz),q2piHz=ifelse(is.na(q2piHz),mean(q2piHz,na.rm = TRUE),q2piHz),q3piHz=ifelse(is.na(q3piHz),mean(q3piHz,na.rm = TRUE),q3piHz),q4piHz=ifelse(is.na(q4piHz),mean(q4piHz,na.rm = TRUE),q4piHz),intensity_mean_dB=ifelse(is.na(intensity_mean_dB),mean(intensity_mean_dB,na.rm = TRUE),intensity_mean_dB))%>%ungroup()
data2
rm(q1alof,q2alof,q3alof,q4alof,summary)
8.4.3 Assigning tonicity
setDT(data2)
:= ifelse(id_phone==first(id_phone),"yes","no"), by=id_word]
data2[, first # data2[, first_ip := ifelse(id_phone==first(id_phone),"yes","no"), by=id_ip]
:= ifelse(id_phone==last(id_phone),"yes","no"), by=id_word]
data2[, last := ifelse(last=="yes"&accent=="oxitone","yes","no")]
data2[, tonic := ifelse(accent=="paroxitone"&shift(last,1,type="lead")=="yes","yes",tonic), by=id_word]
data2[, tonic := ifelse(accent=="proparoxitone"&shift(last,2,type="lead")=="yes","yes",tonic), by=id_word]
data2[, tonic
:= ifelse(is.na(tonic),"no",tonic)]
data2[, tonic setDF(data2)
<- data2%>%relocate(tonic, .after = "content")
data2 setDT(data2)
8.4.4 Compute tonal differences
setDT(data2)
<- 10
percent_treshold
:= ifelse(is.na(q1piHz),rowMeans(data2[,c("q2piHz","q3piHz","q4piHz")],na.rm = TRUE),q1piHz)]
data2[,q1piHz := ifelse(is.na(q2piHz),rowMeans(data2[,c("q1piHz","q3piHz","q4piHz")],na.rm = TRUE),q2piHz)]
data2[,q2piHz := ifelse(is.na(q3piHz),rowMeans(data2[,c("q2piHz","q1piHz","q4piHz")],na.rm = TRUE),q3piHz)]
data2[,q3piHz := ifelse(is.na(q4piHz),rowMeans(data2[,c("q2piHz","q3piHz","q1piHz")],na.rm = TRUE),q4piHz)]
data2[,q4piHz := rowMeans(data2[,c("q2piHz","q3piHz")],na.rm=TRUE)]
data2[,center_Hz ':='(
data2[,
inflexion_percent_Hz_from_prev = ((center_Hz - shift(center_Hz, 1,type="lag"
/ shift(center_Hz, 1,type="lag")) * 100,
)) inflexion_ST_from_prev = 12 * log2(center_Hz / shift(center_Hz, 1,type="lag")),
inflexion_percent_Hz_to_next = ((shift(
1,type="lead"
center_Hz, - center_Hz) / center_Hz) * 100)]
)
':='(
data2[,inflexion_percent_Hz_to_next = ifelse(
== "oxitone",
accent - q1piHz) / q1piHz) * 100,
((q4piHz
inflexion_percent_Hz_to_next
),inflexion_ST_to_next = 12 * log2(shift(center_Hz, 1,type="lead") /
center_Hz))]
':='(
data2[,inflexion_ST_to_next = ifelse(
== "oxitone",
accent 12*log2(q4piHz/q1piHz),
inflexion_ST_to_next
),q1piHz_q2piHZ = ((q2piHz - q1piHz) / q2piHz) * 100,
q2piHz_q3piHZ = ((q3piHz - q2piHz) / q3piHz) * 100,
q3piHz_q4piHZ = ((q4piHz - q3piHz) / q4piHz) * 100,
q1piHz_q3piHZ = ((q3piHz - q1piHz) / q3piHz) * 100,
q1piHz_q4piHZ = ((q4piHz - q1piHz) / q4piHz) * 100,
q2piHz_q4piHZ = ((q4piHz - q2piHz) / q4piHz) * 100,
q1piHz_q4piHZ_ST = 12*log2(q4piHz/q1piHz),
q1piHz_q2piHZ_ST = 12*log2(q2piHz/q1piHz),
q2piHz_q3piHZ_ST = 12*log2(q3piHz/q2piHz),
q3piHz_q4piHZ_ST = 12*log2(q4piHz/q3piHz),
q2piHz_q4piHZ_ST = 12*log2(q4piHz/q2piHz)
)]setDF(data2)
<- data2%>%mutate(circumflex =
data2 case_when(
<= -percent_treshold & q2piHz_q3piHZ >= percent_treshold & q3piHz_q4piHZ <= -percent_treshold ~ "desc_asc_desc",
q1piHz_q2piHZ >= percent_treshold & q2piHz_q3piHZ <= -percent_treshold & q3piHz_q4piHZ >= percent_treshold ~ "asc_desc_asc",
q1piHz_q2piHZ >= percent_treshold & between(q2piHz_q3piHZ,-percent_treshold,percent_treshold) & q3piHz_q4piHZ <= -percent_treshold ~ "asc_desc",
q1piHz_q2piHZ >= percent_treshold & q2piHz_q3piHZ <= -percent_treshold & q3piHz_q4piHZ <= -percent_treshold ~ "asc_desc",
q1piHz_q2piHZ >= percent_treshold & between(q2piHz_q3piHZ,-percent_treshold,percent_treshold) & between(q3piHz_q4piHZ,-percent_treshold,percent_treshold) ~ "asc_q1",
q1piHz_q2piHZ between(q1piHz_q2piHZ,-percent_treshold,percent_treshold)& q2piHz_q3piHZ >= percent_treshold &between(q3piHz_q4piHZ,-percent_treshold,percent_treshold) ~ "asc_q2",
between(q1piHz_q2piHZ,-percent_treshold,percent_treshold)& between(q2piHz_q3piHZ,-percent_treshold,percent_treshold) & q3piHz_q4piHZ >= percent_treshold ~ "asc_q3",
<= -percent_treshold & between(q2piHz_q3piHZ,-percent_treshold,percent_treshold) & between(q3piHz_q4piHZ,-percent_treshold,percent_treshold) ~ "desc_q1",
q1piHz_q2piHZ <= -percent_treshold & between(q2piHz_q3piHZ,-percent_treshold,percent_treshold) & q3piHz_q4piHZ <= -percent_treshold ~ "desc_q1q3",
q1piHz_q2piHZ between(q1piHz_q2piHZ,-percent_treshold,percent_treshold)& q2piHz_q3piHZ <= -percent_treshold &between(q3piHz_q4piHZ,-percent_treshold,percent_treshold) ~ "desc_q2",
between(q1piHz_q2piHZ,-percent_treshold,percent_treshold)& between(q2piHz_q3piHZ,-percent_treshold,percent_treshold) & q3piHz_q4piHZ <= -percent_treshold ~ "desc_q3",
between(q1piHz_q2piHZ,-percent_treshold,percent_treshold) & q2piHz_q3piHZ >= percent_treshold & q3piHz_q4piHZ >= percent_treshold ~ "asc_q2q3",
>= percent_treshold & q2piHz_q3piHZ >= percent_treshold & between(q3piHz_q4piHZ,-percent_treshold,percent_treshold) ~ "asc_q1q2",
q1piHz_q2piHZ >= percent_treshold & q2piHz_q3piHZ >= percent_treshold & q3piHz_q4piHZ >= percent_treshold ~"asc_q1q2q3",
q1piHz_q2piHZ <= -percent_treshold & q2piHz_q3piHZ <= -percent_treshold & q3piHz_q4piHZ <= -percent_treshold ~"desc_q1q2q3",
q1piHz_q2piHZ <= -percent_treshold & q2piHz_q3piHZ <= -percent_treshold & between(q3piHz_q4piHZ, -percent_treshold,percent_treshold) ~"desc_q1q2",
q1piHz_q2piHZ between(q1piHz_q2piHZ, -percent_treshold,percent_treshold) & between(q2piHz_q3piHZ, -percent_treshold,percent_treshold)&between(q3piHz_q4piHZ, -percent_treshold,percent_treshold) ~ "no_circumflex",
is.na(center_Hz) ~"non_applicable"))
setDT(data2)
:= time_end - time_start]
data2[,dur order(filename,time_start), midpoint := (time_start + time_end)/2]
data2[order(filename,time_start), distancias := midpoint - shift(midpoint,1,type="lag")]
data2[order(filename,time_start), pause_prev := time_start - shift(time_end,1,type="lag")]
data2[<-data2%>%group_by(id_ip)%>%mutate(order_ip=row_number())%>%ungroup()%>%mutate(distancias := ifelse(order_ip==1,distancias-pause_prev,distancias))
data2 setDT(data2)
order(filename,time_start), ':=' (
data2[dur_percent_from_prev = ((distancias - shift(
1,type="lag"
distancias, / shift(distancias, 1,type="lag"))*100,
)) intensity_percent_from_prev = ((intensity_mean_dB - shift(
1,type="lag"
intensity_mean_dB, / shift(intensity_mean_dB, 1,type="lag")) * 100)]
)) setDF(data2)
<- data2%>%mutate_if(is.numeric, round,2)
data2 <- data2%>%group_by(id_ip)%>%mutate(maximum_Hz = ifelse(center_Hz == max(center_Hz),"yes","no"),displacement = ifelse(tonic == "yes" & lead(center_Hz) > (center_Hz+ percent_treshold) & lead(tonic)!="yes","yes","no"),displacement=ifelse(is.na(displacement),"no",displacement))%>%ungroup()
data2 <- data2%>%filter(tonic=="yes")%>%group_by(id_ip)%>%summarise(q_tonic_vowels=n())%>%mutate(multiple_tonic=ifelse(q_tonic_vowels>1,"yes","no"))
tonics <- data2%>%left_join(tonics, by="id_ip")
data2 setDT(data2)
8.4.5 MAS (Melodic Analysis of Speech)
Fco. José Cantero Serena (MAS creator)
Linguists and phonetics experts apply MAS to:
Identify melodic patterns: MAS allows for the identification and description of intonational patterns, such as rising tones, falling tones, flat tones, and variations in pitch height that can indicate questions, statements, affirmative or negative intonations, among others.
Analyze intonation structure: It helps break down discourse into intonational units, such as word groups or phrases, and label these units with specific melodic patterns.
Understand communicative intention: MAS can reveal how variations in intonation reflect the speaker’s intention, such as emphasizing information, expressing surprise, or indicating uncertainty.
Linguistic research: It is used in linguistic research to understand intonational differences in different dialects or languages, as well as to study how intonation varies in different communicative contexts.
How are calculations made?
To do this, vowels and tonal segments produced in them are identified, relying on the sonogram, and the mean value is collected—if stable—or the central value. In the case of tonic vowels, an inflection can occur, and there may be two or even three tonal segments, circumflex inflection; these values are taken from the ends of the inflection (Mateo Ruiz 2010: 51).
8.4.5.1 Micromelodic variations in Oralstats
For computing micromelodic variations within vowels, they have been split into four quarters. If the difference among them is higher than 10 percent, we consider this micromelodic variation significant and, for this reason, it can be considered a circumflex tone. The specific cases considered are described in the following chart:
8.4.5.2 Get MAS structure
We assign parts to vowels: anacrusis, body and toneme.
setDF(data2)
<- data2%>%group_by(id_ip)%>%mutate(MAS_structure=
data2 case_when(.default = "body",
=="yes"~"toneme",
toneme_word=="yes" ~"anacrusis",
first_non_tonic_iplead(first_tonic_ip)=="yes" ~"body",
lead(tonic_ip_no_first) =="yes"~"body",
=="yes" ~"body",
first_tonic_ip=="yes"~"body"
tonic_ip_no_first
))
8.4.5.3 Basic computation
setDT(data2)
<- data2[tonic=="yes" & toneme_word=="yes"]
toneme <- data2[multiple_tonic=="yes" & MAS_structure=="body",c("id_phone","time_start","time_end","id_ip","content","upos","word","ip","center_Hz","displacement","circumflex","MAS_structure","intensity_mean_dB","tonic","inflexion_percent_Hz_from_prev")]
body
<- body%>%group_by(id_ip,ip)%>%mutate(body=cumsum(inflexion_percent_Hz_from_prev))%>%summarise(body=last(body)/n())
body2
<- body%>%group_by(id_ip,ip,circumflex,tonic)%>%summarise(freq=n())%>%group_by(id_ip,ip)%>%summarise(freq_total=sum(freq[tonic=="yes"]),freq_tonic_circumflex=sum(freq[circumflex!="no_circumflex"&tonic=="yes"]))%>%ungroup()%>%mutate(body_structure = ifelse(freq_tonic_circumflex/freq_total>0.5,"yes","no"))
body_saw
<- data2[multiple_tonic=="yes" & tonic=="no" & first_non_tonic_ip=="yes",c("id_phone","id_ip","content","upos","word","ip","center_Hz","displacement")]
anacrusis_non_tonic
<- data2[multiple_tonic=="yes" & tonic=="yes" & first_tonic_ip=="yes",c("id_phone","id_ip","content","accent","upos","word","ip","center_Hz","displacement","inflexion_percent_Hz_from_prev")]
first_tonic
<- data2[multiple_tonic=="yes" & tonic=="yes" & tonic_ip_no_first =="yes",c("id_phone","id_ip","accent","content","upos","word","ip","center_Hz","displacement","inflexion_percent_Hz_from_prev")]
anacrusis_tonic
<- data2[tonic=="yes"&toneme_word=="yes",c("id_phone","id_ip","ip","content","word","accent","center_Hz","q1piHz_q4piHZ","circumflex","displacement","inflexion_percent_Hz_from_prev","inflexion_percent_Hz_to_next","multiple_tonic","intensity_mean_dB","time_start","time_end","id_word")]
MAS
setDF(MAS)
<- MAS%>%left_join(anacrusis_non_tonic%>%select(center_Hz,displacement,word,id_ip)%>%rename(anacrusis_center_Hz = center_Hz,anacrusis_non_tonic_word=word,anacrusis_displacement = displacement),by="id_ip")
MAS
<- MAS%>%left_join(body_saw,by="id_ip")
MAS
<- MAS%>%left_join(anacrusis_tonic%>%select(center_Hz,displacement,word,id_ip,inflexion_percent_Hz_from_prev)%>%rename(anacrusis_tonic_center_Hz = center_Hz,anacrusis_tonic_inflexion_percent_Hz_from_prev = inflexion_percent_Hz_from_prev,anacrusis_tonic_displacement = displacement,anacrusis_tonic_word=word),by="id_ip")
MAS <- MAS%>%left_join(first_tonic%>%select(center_Hz,displacement,word,id_ip,inflexion_percent_Hz_from_prev)%>%rename(
MAS first_tonic_inflexion_percent_Hz_from_prev = inflexion_percent_Hz_from_prev,
first_tonic_center_Hz = center_Hz,first_tonic_displacement = displacement,first_tonic_word=word),by="id_ip")
<- MAS%>%mutate(anacrusis= anacrusis_tonic_inflexion_percent_Hz_from_prev,anacrusis= ifelse(is.na(anacrusis), first_tonic_inflexion_percent_Hz_from_prev,anacrusis),body = ((center_Hz - first_tonic_center_Hz)/first_tonic_center_Hz)*100, body=ifelse(is.na(body), ((center_Hz - anacrusis_tonic_center_Hz)/anacrusis_tonic_center_Hz)*100,body),toneme=ifelse(accent=="oxitone"& !grepl("[nl]$",word),q1piHz_q4piHZ,inflexion_percent_Hz_to_next))%>%mutate_if(is.numeric,round,2)%>%mutate(relocated = ifelse((center_Hz < first_tonic_center_Hz +10 & center_Hz > first_tonic_center_Hz -10) |(center_Hz < anacrusis_tonic_center_Hz +10 & center_Hz > anacrusis_tonic_center_Hz -10),"yes","no"),anacrusis_displacement= first_tonic_displacement,anacrusis_displacement=ifelse(is.na(anacrusis_displacement),anacrusis_tonic_displacement,anacrusis_displacement),anacrusis_displacement=ifelse(is.na(anacrusis_displacement),"no",anacrusis_displacement))
MAS
<- MAS %>%mutate(body_2=body)%>%select(-body)%>%left_join(body2,by="id_ip")%>%mutate_if(is.numeric,round,2)
MAS
<- MAS%>%distinct(id_phone, .keep_all = TRUE) MAS
8.4.5.4 Standard values
<- data2 %>%
data2 group_by(id_ip) %>%
arrange(filename, time_start) %>%
mutate(
standard_q1_pitch = cumprod(c(100, (1 + ((((q1piHz - shift(q1piHz, 1,type="lag")) / shift(q1piHz, 1,type="lag")) * 100) / 100))[-1])),
standard_q2_pitch = cumprod(c(100, (1 + ((((q2piHz - shift(q2piHz, 1,type="lag")) / shift(q2piHz, 1,type="lag")) * 100) / 100))[-1])),
standard_q3_pitch = cumprod(c(100, (1 + ((((q3piHz - shift(q3piHz, 1,type="lag")) / shift(q3piHz, 1,type="lag")) * 100) / 100))[-1])),
standard_q4_pitch = cumprod(c(100, (1 + ((((q4piHz - shift(q4piHz, 1,type="lag")) / shift(q4piHz, 1,type="lag")) * 100) / 100))[-1])),
standard_pitch = cumprod(c(100, (1 + (inflexion_percent_Hz_from_prev / 100))[-1])),
standard_dur = cumprod(c(100, (1 + (dur_percent_from_prev / 100))[-1])),
standard_intensity = cumprod(c(100, (1 + (intensity_percent_from_prev / 100))[-1]))
%>%mutate_if(is.numeric,round,2)%>%
) ungroup()%>%relocate(standard_pitch,.after=center_Hz)
8.4.5.5 Assigning MAS patterns
Anacrusis | Displacement | Body | multiple_tonic | body_structure | Toneme | relocated | circumflex | Pattern |
(anacrusis<40 | is.na(anacrusis)) | no | between(body,-60,-10) | between(toneme,-40,15) | no | PI | |||
(anacrusis<40 | is.na(anacrusis)) | no | between(body,-60,-10) | >70 | no | PII | |||
(anacrusis<40 | is.na(anacrusis)) | yes | between(body,-60,-10) | between(toneme,40,60) | no | PIII | |||
(anacrusis<40 | is.na(anacrusis)) | yes | between(body,-60,-10) | asc_desc, desc_q3, desc_q2 |
PIVa | ||||
between(body,-9,10) | asc_desc, desc_q3, desc_q2 |
PIVb | ||||||
(anacrusis<40 | is.na(anacrusis)) | no | between(body,-60,-10) | between(toneme,-9,9) | PV | ||||
(anacrusis<40 | is.na(anacrusis)) | no | between(body,-60,-10) | between(toneme,15,70) | PVIa | ||||
(anacrusis<40 | is.na(anacrusis)) | yes | between(body,-60,-10) | between(toneme,15,40) | PVIb | ||||
(anacrusis<40 | is.na(anacrusis)) | yes | toneme < -10 | yes | PVII | ||||
(anacrusis<40 | is.na(anacrusis)) | no | no | PVIII | |||||
(anacrusis<40 | is.na(anacrusis)) | no | between(body,-60,-10) | toneme < -40 | PIX | ||||
(anacrusis<40 | is.na(anacrusis)) | yes | between(body,-60,-10) | asc_desc, desc_q3, desc_q2 |
PXa | ||||
(anacrusis<40 | is.na(anacrusis)) | yes | yes | desc_asc, asc_q3, asc_q2, asc_desc_asc |
PXb | ||||
(anacrusis<40 | is.na(anacrusis)) | yes | toneme >60 | yes | PXI | ||||
(anacrusis<40 | is.na(anacrusis)) | yes | between(body,-9,10) | no | toneme < -10 | PXIIa | |||
between(body,-9,10) | yes | toneme< -10 | PXIIb | |||||
between(body,-60,-10) | yes | toneme< -10 | PXIIc | |||||
between(body,20,140) | toneme >40 | PXIII |
setDF(MAS)
# First round
<- MAS %>%
MAS mutate(
pattern = case_when(
between(anacrusis,-10,40) & anacrusis_displacement == "no")| is.na(anacrusis)) & (between(body_2,-80,10)|is.na(body)) & between(toneme, -40, 15) ~ "PI",
((between(anacrusis,-10,40) & anacrusis_displacement == "no")| is.na(anacrusis)) & (between(body_2,-80,10)|is.na(body)) & toneme > 70 ~ "PII",
((between(anacrusis,-10,40) & anacrusis_displacement == "no")| is.na(anacrusis)) & (between(body_2,-80,10)) & between(toneme, 40, 60) ~ "PIII",
((between(anacrusis,-10,40) & anacrusis_displacement == "no")| is.na(anacrusis)) & (between(body_2,-80,10)|is.na(body)) & circumflex %in% c("asc_desc", "desc_q3","desc_q2") ~ "PIVa",
((is.na(anacrusis) | between(anacrusis,-10,10)) & between(body_2, -3, 10) & circumflex %in% c("asc_desc","desc_q3","desc_q2") ~ "PIVb",
(between(anacrusis,-10,40) & anacrusis_displacement == "no")| is.na(anacrusis)) & (between(body_2,-80,10)|is.na(body)) & (between(toneme, -9, 9) | (between(inflexion_percent_Hz_from_prev,-5,10) & between(toneme,-5,5))) ~ "PV",
((between(anacrusis,-10,40) & anacrusis_displacement == "no")| is.na(anacrusis)) & (between(body_2,-80,10)|is.na(body)) & between(toneme, 15, 70) ~ "PVIa",
((between(anacrusis,-10,40) & anacrusis_displacement == "yes")| is.na(anacrusis)) & (between(body_2,-80,10)|is.na(body)) & between(toneme, 15, 60) ~ "PVIb",
((between(anacrusis,-10,40) & anacrusis_displacement == "yes")| is.na(anacrusis)) & relocated == "yes" & toneme < -10 ~ "PVII",
((is.na(anacrusis) & (between(body_2,-80,10)|is.na(body)) & multiple_tonic=="no" & !is.na(toneme) & toneme !="NaN" ~ "PVIII",
between(anacrusis,-10,40) & anacrusis_displacement == "yes")| is.na(anacrusis)) & (between(body_2,-80,10)|is.na(body)) & toneme < -40 ~ "PIX",
((between(anacrusis,-10,40) & anacrusis_displacement == "yes")| is.na(anacrusis))& between(toneme, -10, 60) & relocated == "yes" & circumflex %in% c("asc_desc","desc_q3","desc_q2") ~ "PXa",
((between(anacrusis,-10,40) & anacrusis_displacement == "yes")| is.na(anacrusis)) & relocated == "yes" & circumflex %in% c("desc_asc","asc_q3","asc_q2","asc_desc_asc") ~ "PXb",
((between(anacrusis,-10,40) & anacrusis_displacement == "yes")| is.na(anacrusis)) & between(body_2, -9, 10) & toneme < -10 ~ "PXI",
((between(anacrusis,-10,40) & anacrusis_displacement == "yes")| is.na(anacrusis)) & between(body_2, -3, 10) & toneme < -10 & body_structure == "no" ~ "PXIIa",
((is.na(anacrusis) & between(body_2, -3, 10) & toneme < -10 & body_structure == "yes" ~ "PXIIb",
is.na(anacrusis) & (between(body,-80,10)|is.na(body)) & toneme < -10 & body_structure == "yes" ~ "PXIIc",
between(body_2, 15, 140) & toneme > 40 ~ "PXIII",
is.na(toneme) | toneme=="NaN" ~ "undefined_toneme"
)
)
# Second round (uncomment below if you want to apply; it gets a more general approach)
# MAS <- MAS%>%mutate(
#
# pattern= ifelse(is.na(pattern) &relocated=="yes"&anacrusis <0 & toneme<0,"PVII",pattern),
# pattern= ifelse(is.na(pattern) &anacrusis <0 & inflexion_percent_Hz_from_prev > 60 & toneme<0 & body_2>60 & body<10,"PIII",pattern),
# pattern = ifelse(is.na(pattern) & anacrusis <0 & body_2< -20 & body < 0 & between(toneme,-10,10), "PI+",pattern),
# pattern = ifelse(is.na(pattern) &freq_tonic_circumflex >= 1 & between(toneme,-10,10) &multiple_tonic=="yes" & body_2 < -20,"PXIIc",pattern),
# pattern = ifelse(is.na(pattern) & between(toneme,-15,30) &multiple_tonic=="yes" & body_2 < -10 & between(anacrusis,-15,40) & between(inflexion_percent_Hz_from_prev,-20,30),"PI",pattern),
# pattern = ifelse(is.na(pattern) & between(toneme,-15,30) & body_2 > 0 & between(anacrusis,-15,40) ,"PIII",pattern),
# pattern = ifelse(is.na(pattern) & between(toneme,-15,30) & body_2 > 0 & between(anacrusis,-15,70) & relocated=="yes","PXb",pattern),
# pattern = ifelse(is.na(pattern) & accent=="oxitone"&inflexion_percent_Hz_from_prev < -30,"PIX",pattern),
# pattern = ifelse(is.na(pattern) & between(inflexion_percent_Hz_from_prev,-5,10) & between(toneme,-5,5) ,"PV",pattern),
# pattern = ifelse(is.na(pattern)&is.na(anacrusis)&body>10&toneme>10,"PIII",pattern),
# pattern = ifelse(is.na(pattern)&multiple_tonic=="yes"&body_2 < 0 &toneme < 0,"PIX",pattern),
# pattern = ifelse(is.na(pattern)&multiple_tonic=="yes"&body_structure=="yes"&body_2 < 0 &toneme > 5,"PXIId",pattern),
# pattern = ifelse(is.na(pattern) & body_2 < 0 & between(inflexion_percent_Hz_from_prev,-9.9,9.9) & between(toneme,-4.9,4.9),"PV",pattern),
# pattern = ifelse(is.na(pattern) & body_2 < 0 & between(inflexion_percent_Hz_from_prev,-9.9,9.9) & between(toneme,-4.9,4.9),"PV",pattern),
# pattern = ifelse(is.na(pattern) & freq_total>4 &body_2 < 0 & inflexion_percent_Hz_from_prev< -20 &toneme>20,"PXI",pattern),
# pattern = ifelse(is.na(pattern) &body > 10 & inflexion_percent_Hz_from_prev< -10 &between(toneme,-5,5),"PXIb",pattern),
# pattern = ifelse(is.na(pattern) &anacrusis > 5 & body_2 < -10 & inflexion_percent_Hz_from_prev< -5 &toneme>30,"PXb",pattern),
# pattern = ifelse(is.na(pattern) & body_2 > 10 &toneme>30,"PXI",pattern),
# pattern = ifelse(is.na(pattern) & relocated=="yes" & toneme> 0, "PXI",pattern),
# pattern = ifelse(is.na(pattern) &toneme < 0 & inflexion_percent_Hz_from_prev>0,"PXa",pattern ),
# pattern = ifelse(is.na(pattern) &toneme > 0 & inflexion_percent_Hz_from_prev<0,"PXb",pattern ),
# pattern = ifelse(is.na(pattern) & between(toneme,20,60) & between(body_2,-10,10),"PIII",pattern ),
# pattern = ifelse(is.na(pattern) & between(toneme,10,20) & inflexion_percent_Hz_from_prev>10,"PIII",pattern ),
# pattern = ifelse(is.na(pattern) & between(toneme,0,10) & between(body_2,10,100) & inflexion_percent_Hz_from_prev>10,"Xb",pattern ),
# pattern = ifelse(is.na(pattern) & between(toneme,0,50) & between(body_2,-20,20) ,"Xb",pattern ),
# pattern= ifelse(is.na(pattern) & between(toneme,-100,30), "PI",pattern),
# pattern= ifelse(is.na(pattern) & toneme >30, "PII",pattern)
# )
<- MAS %>% mutate( MAS_phonological= ifelse(pattern=="PI","assertion",NA))
MAS <- MAS %>% mutate( MAS_phonological = ifelse(pattern %in% c("PII","PIII","PIVa","PIVb"), "question",MAS_phonological))
MAS <- MAS %>% mutate( MAS_phonological = ifelse( pattern %in% c("PV","PVIa","VIb"), "continuous",MAS_phonological))
MAS <- MAS %>% mutate( MAS_phonological = ifelse(is.na(MAS_phonological),"emphatic",MAS_phonological ))
MAS
rm(anacrusis_non_tonic,anacrusis_tonic,body,body_saw,body2,first_tonic,toneme,tonics,percent_treshold)
<- MAS%>%group_by(pattern)%>%mutate(order_pattern_group = row_number())%>%ungroup()
MAS
<- data2%>%left_join(MAS%>%select(id_ip,pattern,order_pattern_group), by="id_ip") data2
8.4.5.6 Frequent words anacrusis
<- MAS%>%group_by(anacrusis_non_tonic_word)%>%summarise(freq=n())%>%filter(freq >1,!is.na(anacrusis_non_tonic_word))%>%arrange(desc(freq))%>%head(n=1)
anacrusis_words
<-ggplot(anacrusis_words, aes(x=anacrusis_non_tonic_word, y = freq, fill = anacrusis_non_tonic_word)) +
anacrusis_barplot geom_bar(stat="identity")+theme_minimal()
anacrusis_barplot
8.4.5.7 Frequent words toneme
<- MAS%>%group_by(word)%>%summarise(freq=n())%>%filter(freq >1,!is.na(word))%>%arrange(desc(freq))%>%head(n=1)
toneme_words
<-ggplot(toneme_words, aes(x=word, y = freq, fill = word)) +
anacrusis_barplot geom_bar(stat="identity")+theme_minimal()
anacrusis_barplot
8.4.5.8 Examples of patterns
8.4.5.8.1 Pattern I
setDF(data2)
<- data2%>%filter(pattern=="PI",order_pattern_group==1)%>%mutate(time_center = time_start+((time_end-time_start)/2))
data3 <- data3%>%filter(q1piHz_q2piHZ>10 | q1piHz_q2piHZ< -10)%>%rowwise()%>%mutate(center_Hz = (q1piHz + q2piHz)/2, time_center = (q1piHz_time + q2piHz_time)/2,standard_pitch = standard_pitch + (standard_pitch* (q1piHz_q2piHZ/100)), time_center=ifelse(is.na(time_center),(time_start+((time_end-time_start)/2))-0.01,time_center))
data3q1 <- data3%>%filter(q3piHz_q4piHZ>10 | q3piHz_q4piHZ< -10)%>%rowwise()%>%mutate(center_Hz = (q3piHz + q4piHz)/2, time_center = (q3piHz_time + q4piHz_time)/2, standard_pitch = standard_pitch + (standard_pitch* (q3piHz_q4piHZ/100)),time_center=ifelse(is.na(time_center),(time_start+((time_end-time_start)/2))+0.01,time_center))
data3q4
<- rbind(data3,data3q1,data3q4)%>%arrange(filename,time_center)%>%mutate(time_center_table=ifelse(
dataplot between(lag(time_center,1),time_center-0.05,time_center+0.05),time_center+0.04,time_center))
<- dataplot
plot_data <- ggplot() +
p1 # geom_line(data = plot_data, aes(x = q3piHz_time, y = standard_q3_pitch))+
geom_bar(data = plot_data,stat = "identity", aes(x = time_center, y = standard_dur),fill="lightgrey")+
geom_line(data = plot_data, aes(x = time_center, y = standard_pitch),color="blue")+
geom_line(data = plot_data, aes(x = time_center, y = standard_intensity),color="green") +
geom_point(data = plot_data, aes(x = time_center, y = standard_pitch, color = word, shape=MAS_structure),size=5) +
geom_text(data = plot_data, aes(x = time_center, y = standard_pitch, label = content), vjust = 0.1, hjust = 0.5,size=3.2) +
# geom_text(data = plot_data, aes(x = q3piHz_time, y = standard_q3_pitch, label = content), vjust = -0.5, hjust = 0.5)+
# geom_text(data = plot_data, aes(x = q4piHz_time, y = standard_q4_pitch, label = content), vjust = -0.5, hjust = 0.5) +
ggtitle(paste("Group:", dataplot$id_ip,dataplot$ip,dataplot$pattern)) + theme_light() +
theme(legend.position = "right",plot.title = element_text(size = 10),legend.text = element_text(size=10),legend.title = element_text(size = 10), axis.title=element_text(size=10,face="bold"))+guides(size="none") + xlab("Timeline") + ylab("Prosodic features")
<- gridExtra::ttheme_default(
mytheme core = list(fg_params=list(cex = 0.6)),
colhead = list(fg_params=list(cex = 0.6)),
rowhead = list(fg_params=list(cex = 0.6)))
<- plot_data %>%
p2 mutate(
Hz = standard_pitch,
dB = standard_intensity,
dur= standard_dur) %>%mutate(Hz=round(Hz,1),dB=round(dB,1),dur=round(dur,1))%>%arrange(time_start)%>% select(Hz,dB,dur,content)
<- t(p2)
p3
if (nrow(p2) > 0) {
<- tableGrob(p3, theme = mytheme)
table_grob
else {
} <- data.frame (Hz = "0",
empty_df dB = "0",
dur = "0",
content ="0"
)<- tableGrob(empty_df, theme = mytheme)
table_grob
}
# Combine plot and table
grid.arrange(p1, table_grob, ncol = 1, heights = c(3, 1))
# pivot_longer(c("Hz","dB","dur"), names_to = "layer", values_to = "label") %>%
# ggplot(aes(x = time_center_table)) +
# geom_text(aes(y = factor(layer, c("Hz","dB","dur")), label = label), size = 1.4) +
# labs(y = "", x = NULL) +
# theme_grey()+
# theme(axis.line = element_blank(), axis.ticks = element_blank(), axis.text.x = element_blank(),
# panel.grid = element_blank(),legend.text = element_text(size=1), strip.text = element_blank())
#
# p3 <- p1 + p2 + plot_layout(ncol = 1,heights = c(5,1))
# p3
8.4.5.8.2 Pattern II
setDF(data2)
<- data2%>%filter(pattern=="PII",order_pattern_group==1)%>%mutate(time_center = time_start+((time_end-time_start)/2))
data3 <- data3%>%filter(q1piHz_q2piHZ>10 | q1piHz_q2piHZ< -10)%>%rowwise()%>%mutate(center_Hz = (q1piHz + q2piHz)/2, time_center = (q1piHz_time + q2piHz_time)/2,standard_pitch = standard_pitch + (standard_pitch* (q1piHz_q2piHZ/100)), time_center=ifelse(is.na(time_center),(time_start+((time_end-time_start)/2))-0.01,time_center))
data3q1 <- data3%>%filter(q3piHz_q4piHZ>10 | q3piHz_q4piHZ< -10)%>%rowwise()%>%mutate(center_Hz = (q3piHz + q4piHz)/2, time_center = (q3piHz_time + q4piHz_time)/2, standard_pitch = standard_pitch + (standard_pitch* (q3piHz_q4piHZ/100)),time_center=ifelse(is.na(time_center),(time_start+((time_end-time_start)/2))+0.01,time_center))
data3q4
<- rbind(data3,data3q1,data3q4)%>%arrange(filename,time_center)%>%mutate(time_center_table=ifelse(
dataplot between(lag(time_center,1),time_center-0.05,time_center+0.05),time_center+0.04,time_center))
<- dataplot
plot_data <- ggplot() +
p1 # geom_line(data = plot_data, aes(x = q3piHz_time, y = standard_q3_pitch))+
geom_bar(data = plot_data,stat = "identity", aes(x = time_center, y = standard_dur),fill="lightgrey")+
geom_line(data = plot_data, aes(x = time_center, y = standard_pitch),color="blue")+
geom_line(data = plot_data, aes(x = time_center, y = standard_intensity),color="green") +
geom_point(data = plot_data, aes(x = time_center, y = standard_pitch, color = word, shape=MAS_structure),size=5) +
geom_text(data = plot_data, aes(x = time_center, y = standard_pitch, label = content), vjust = 0.1, hjust = 0.5,size=3.2) +
# geom_text(data = plot_data, aes(x = q3piHz_time, y = standard_q3_pitch, label = content), vjust = -0.5, hjust = 0.5)+
# geom_text(data = plot_data, aes(x = q4piHz_time, y = standard_q4_pitch, label = content), vjust = -0.5, hjust = 0.5) +
ggtitle(paste("Group:", dataplot$id_ip,dataplot$ip,dataplot$pattern)) + theme_light() +
theme(legend.position = "right",plot.title = element_text(size = 10),legend.text = element_text(size=10),legend.title = element_text(size = 10), axis.title=element_text(size=10,face="bold"))+guides(size="none") + xlab("Timeline") + ylab("Prosodic features")
<- plot_data %>% mutate( Hz = standard_pitch, dB = standard_intensity, dur= standard_dur) %>%mutate(Hz=round(Hz,1),dB=round(dB,1),dur=round(dur,1))%>%arrange((time_start))%>% select(Hz,dB,dur,content)
p2 <- t(p2)
p3
if (nrow(p2) > 0) {
<- tableGrob(p3, theme = mytheme)
table_grob
else {
} <- data.frame (Hz = "0",
empty_df dB = "0",
dur = "0",
content ="0"
)<- tableGrob(empty_df, theme = mytheme)
table_grob
}
# Combine plot and table
grid.arrange(p1, table_grob, ncol = 1, heights = c(3, 1))
8.4.5.8.3 Pattern III
setDF(data2)
<- data2%>%filter(pattern=="PIII",order_pattern_group==1,(time_end_ip-time_start_ip)>1)%>%mutate(time_center = time_start+((time_end-time_start)/2))
data3 <- data3%>%filter(q1piHz_q2piHZ>10 | q1piHz_q2piHZ< -10)%>%rowwise()%>%mutate(center_Hz = (q1piHz + q2piHz)/2, time_center = (q1piHz_time + q2piHz_time)/2,standard_pitch = standard_pitch + (standard_pitch* (q1piHz_q2piHZ/100)), time_center=ifelse(is.na(time_center),(time_start+((time_end-time_start)/2))-0.01,time_center))%>%arrange(time_start)
data3q1 <- data3%>%filter(q3piHz_q4piHZ>10 | q3piHz_q4piHZ< -10)%>%rowwise()%>%mutate(center_Hz = (q3piHz + q4piHz)/2, time_center = (q3piHz_time + q4piHz_time)/2, standard_pitch = standard_pitch + (standard_pitch* (q3piHz_q4piHZ/100)),time_center=ifelse(is.na(time_center),(time_start+((time_end-time_start)/2))+0.01,time_center))%>%arrange(time_start)
data3q4
<- rbind(data3,data3q1,data3q4)%>%arrange(filename,time_center)%>%mutate(time_center_table=ifelse(
dataplot between(lag(time_center,1),time_center-0.05,time_center+0.05),time_center+0.04,time_center))
<- dataplot
plot_data <- ggplot() +
p1 # geom_line(data = plot_data, aes(x = q3piHz_time, y = standard_q3_pitch))+
geom_bar(data = plot_data,stat = "identity", aes(x = time_center, y = standard_dur),fill="lightgrey")+
geom_line(data = plot_data, aes(x = time_center, y = standard_pitch),color="blue")+
geom_line(data = plot_data, aes(x = time_center, y = standard_intensity),color="green") +
geom_point(data = plot_data, aes(x = time_center, y = standard_pitch, color = word, shape=MAS_structure),size=5) +
geom_text(data = plot_data, aes(x = time_center, y = standard_pitch, label = content), vjust = 0.1, hjust = 0.5,size=3.2) +
# geom_text(data = plot_data, aes(x = q3piHz_time, y = standard_q3_pitch, label = content), vjust = -0.5, hjust = 0.5)+
# geom_text(data = plot_data, aes(x = q4piHz_time, y = standard_q4_pitch, label = content), vjust = -0.5, hjust = 0.5) +
ggtitle(paste("Group:", dataplot$id_ip,dataplot$ip,dataplot$pattern)) + theme_light() +
theme(legend.position = "right",plot.title = element_text(size = 10),legend.text = element_text(size=10),legend.title = element_text(size = 10), axis.title=element_text(size=10,face="bold"))+guides(size="none") + xlab("Timeline") + ylab("Prosodic features")
<- plot_data %>% mutate( Hz = standard_pitch, dB = standard_intensity, dur= standard_dur) %>%mutate(Hz=round(Hz,1),dB=round(dB,1),dur=round(dur,1))%>%arrange((time_start))%>% select(Hz,dB,dur,content)
p2 <- t(p2)
p3
if (nrow(p2) > 0) {
<- tableGrob(p3, theme = mytheme)
table_grob
else {
} <- data.frame (Hz = "0",
empty_df dB = "0",
dur = "0",
content ="0"
)<- tableGrob(empty_df, theme = mytheme)
table_grob
}
# Combine plot and table
grid.arrange(p1, table_grob, ncol = 1, heights = c(3, 1))
8.4.5.8.4 Pattern IVa
setDF(data2)
<- data2%>%filter(pattern=="PIVa",order_pattern_group==1)%>%mutate(time_center = time_start+((time_end-time_start)/2))
data3 <- data3%>%filter(q1piHz_q2piHZ>10 | q1piHz_q2piHZ< -10)%>%rowwise()%>%mutate(center_Hz = (q1piHz + q2piHz)/2, time_center = (q1piHz_time + q2piHz_time)/2,standard_pitch = standard_pitch + (standard_pitch* (q1piHz_q2piHZ/100)), time_center=ifelse(is.na(time_center),(time_start+((time_end-time_start)/2))-0.01,time_center))%>%arrange(time_start)
data3q1 <- data3%>%filter(q3piHz_q4piHZ>10 | q3piHz_q4piHZ< -10)%>%rowwise()%>%mutate(center_Hz = (q3piHz + q4piHz)/2, time_center = (q3piHz_time + q4piHz_time)/2, standard_pitch = standard_pitch + (standard_pitch* (q3piHz_q4piHZ/100)),time_center=ifelse(is.na(time_center),(time_start+((time_end-time_start)/2))+0.01,time_center))%>%arrange(time_start)
data3q4
<- rbind(data3,data3q1,data3q4)%>%arrange(filename,time_center)%>%mutate(time_center_table=ifelse(
dataplot between(lag(time_center,1),time_center-0.05,time_center+0.05),time_center+0.04,time_center))
<- dataplot
plot_data <- ggplot() +
p1 # geom_line(data = plot_data, aes(x = q3piHz_time, y = standard_q3_pitch))+
geom_bar(data = plot_data,stat = "identity", aes(x = time_center, y = standard_dur),fill="lightgrey")+
geom_line(data = plot_data, aes(x = time_center, y = standard_pitch),color="blue")+
geom_line(data = plot_data, aes(x = time_center, y = standard_intensity),color="green") +
geom_point(data = plot_data, aes(x = time_center, y = standard_pitch, color = word, shape=MAS_structure),size=5) +
geom_text(data = plot_data, aes(x = time_center, y = standard_pitch, label = content), vjust = 0.1, hjust = 0.5,size=3.2) +
# geom_text(data = plot_data, aes(x = q3piHz_time, y = standard_q3_pitch, label = content), vjust = -0.5, hjust = 0.5)+
# geom_text(data = plot_data, aes(x = q4piHz_time, y = standard_q4_pitch, label = content), vjust = -0.5, hjust = 0.5) +
ggtitle(paste("Group:", dataplot$id_ip,dataplot$ip,dataplot$pattern)) + theme_light() +
theme(legend.position = "right",plot.title = element_text(size = 10),legend.text = element_text(size=10),legend.title = element_text(size = 10), axis.title=element_text(size=10,face="bold"))+guides(size="none") + xlab("Timeline") + ylab("Prosodic features")
<- plot_data %>% mutate( Hz = standard_pitch, dB = standard_intensity, dur= standard_dur) %>%mutate(Hz=round(Hz,1),dB=round(dB,1),dur=round(dur,1))%>%arrange((time_start))%>% select(Hz,dB,dur,content)
p2 <- t(p2)
p3
if (nrow(p2) > 0) {
<- tableGrob(p3, theme = mytheme)
table_grob
else {
} <- data.frame (Hz = "0",
empty_df dB = "0",
dur = "0",
content ="0"
)<- tableGrob(empty_df, theme = mytheme)
table_grob
}
# Combine plot and table
grid.arrange(p1, table_grob, ncol = 1, heights = c(3, 1))
8.4.5.8.5 Pattern IVb
setDF(data2)
<- data2%>%filter(pattern=="PIVb",order_pattern_group==1)%>%mutate(time_center = time_start+((time_end-time_start)/2))
data3 <- data3%>%filter(q1piHz_q2piHZ>10 | q1piHz_q2piHZ< -10)%>%rowwise()%>%mutate(center_Hz = (q1piHz + q2piHz)/2, time_center = (q1piHz_time + q2piHz_time)/2,standard_pitch = standard_pitch + (standard_pitch* (q1piHz_q2piHZ/100)), time_center=ifelse(is.na(time_center),(time_start+((time_end-time_start)/2))-0.01,time_center))%>%arrange(time_start)
data3q1 <- data3%>%filter(q3piHz_q4piHZ>10 | q3piHz_q4piHZ< -10)%>%rowwise()%>%mutate(center_Hz = (q3piHz + q4piHz)/2, time_center = (q3piHz_time + q4piHz_time)/2, standard_pitch = standard_pitch + (standard_pitch* (q3piHz_q4piHZ/100)),time_center=ifelse(is.na(time_center),(time_start+((time_end-time_start)/2))+0.01,time_center))%>%arrange(time_start)
data3q4
<- rbind(data3,data3q1,data3q4)%>%arrange(filename,time_center)%>%mutate(time_center_table=ifelse(
dataplot between(lag(time_center,1),time_center-0.05,time_center+0.05),time_center+0.04,time_center))
<- dataplot
plot_data <- ggplot() +
p1 # geom_line(data = plot_data, aes(x = q3piHz_time, y = standard_q3_pitch))+
geom_bar(data = plot_data,stat = "identity", aes(x = time_center, y = standard_dur),fill="lightgrey")+
geom_line(data = plot_data, aes(x = time_center, y = standard_pitch),color="blue")+
geom_line(data = plot_data, aes(x = time_center, y = standard_intensity),color="green") +
geom_point(data = plot_data, aes(x = time_center, y = standard_pitch, color = word, shape=MAS_structure),size=5) +
geom_text(data = plot_data, aes(x = time_center, y = standard_pitch, label = content), vjust = 0.1, hjust = 0.5,size=3.2) +
# geom_text(data = plot_data, aes(x = q3piHz_time, y = standard_q3_pitch, label = content), vjust = -0.5, hjust = 0.5)+
# geom_text(data = plot_data, aes(x = q4piHz_time, y = standard_q4_pitch, label = content), vjust = -0.5, hjust = 0.5) +
ggtitle(paste("Group:", dataplot$id_ip,dataplot$ip,dataplot$pattern)) + theme_light() +
theme(legend.position = "right",plot.title = element_text(size = 10),legend.text = element_text(size=10),legend.title = element_text(size = 10), axis.title=element_text(size=10,face="bold"))+guides(size="none") + xlab("Timeline") + ylab("Prosodic features")
<- plot_data %>% mutate( Hz = standard_pitch, dB = standard_intensity, dur= standard_dur) %>%mutate(Hz=round(Hz,1),dB=round(dB,1),dur=round(dur,1))%>%arrange((time_start))%>% select(Hz,dB,dur,content)
p2 <- t(p2)
p3
if (nrow(p2) > 0) {
<- tableGrob(p3, theme = mytheme)
table_grob
else {
} <- data.frame (Hz = "0",
empty_df dB = "0",
dur = "0",
content ="0"
)<- tableGrob(empty_df, theme = mytheme)
table_grob
}
# Combine plot and table
grid.arrange(p1, table_grob, ncol = 1, heights = c(3, 1))
8.4.5.8.6 Pattern V
setDF(data2)
<- data2%>%filter(pattern=="PV",order_pattern_group==1,(time_end_ip-time_start_ip)>1)%>%mutate(time_center = time_start+((time_end-time_start)/2))
data3 <- data3%>%filter(q1piHz_q2piHZ>10 | q1piHz_q2piHZ< -10)%>%rowwise()%>%mutate(center_Hz = (q1piHz + q2piHz)/2, time_center = (q1piHz_time + q2piHz_time)/2,standard_pitch = standard_pitch + (standard_pitch* (q1piHz_q2piHZ/100)), time_center=ifelse(is.na(time_center),(time_start+((time_end-time_start)/2))-0.01,time_center))%>%arrange(time_start)
data3q1 <- data3%>%filter(q3piHz_q4piHZ>10 | q3piHz_q4piHZ< -10)%>%rowwise()%>%mutate(center_Hz = (q3piHz + q4piHz)/2, time_center = (q3piHz_time + q4piHz_time)/2, standard_pitch = standard_pitch + (standard_pitch* (q3piHz_q4piHZ/100)),time_center=ifelse(is.na(time_center),(time_start+((time_end-time_start)/2))+0.01,time_center))%>%arrange(time_start)
data3q4
<- rbind(data3,data3q1,data3q4)%>%arrange(filename,time_center)%>%mutate(time_center_table=ifelse(
dataplot between(lag(time_center,1),time_center-0.05,time_center+0.05),time_center+0.04,time_center))
<- dataplot
plot_data <- ggplot() +
p1 # geom_line(data = plot_data, aes(x = q3piHz_time, y = standard_q3_pitch))+
geom_bar(data = plot_data,stat = "identity", aes(x = time_center, y = standard_dur),fill="lightgrey")+
geom_line(data = plot_data, aes(x = time_center, y = standard_pitch),color="blue")+
geom_line(data = plot_data, aes(x = time_center, y = standard_intensity),color="green") +
geom_point(data = plot_data, aes(x = time_center, y = standard_pitch, color = word, shape=MAS_structure),size=5) +
geom_text(data = plot_data, aes(x = time_center, y = standard_pitch, label = content), vjust = 0.1, hjust = 0.5,size=3.2) +
# geom_text(data = plot_data, aes(x = q3piHz_time, y = standard_q3_pitch, label = content), vjust = -0.5, hjust = 0.5)+
# geom_text(data = plot_data, aes(x = q4piHz_time, y = standard_q4_pitch, label = content), vjust = -0.5, hjust = 0.5) +
ggtitle(paste("Group:", dataplot$id_ip,dataplot$ip,dataplot$pattern)) + theme_light() +
theme(legend.position = "right",plot.title = element_text(size = 10),legend.text = element_text(size=10),legend.title = element_text(size = 10), axis.title=element_text(size=10,face="bold"))+guides(size="none") + xlab("Timeline") + ylab("Prosodic features")
<- plot_data %>% mutate( Hz = standard_pitch, dB = standard_intensity, dur= standard_dur) %>%mutate(Hz=round(Hz,1),dB=round(dB,1),dur=round(dur,1))%>%arrange((time_start))%>% select(Hz,dB,dur,content)
p2 <- t(p2)
p3
if (nrow(p2) > 0) {
<- tableGrob(p3, theme = mytheme)
table_grob
else {
} <- data.frame (Hz = "0",
empty_df dB = "0",
dur = "0",
content ="0"
)<- tableGrob(empty_df, theme = mytheme)
table_grob
}
# Combine plot and table
grid.arrange(p1, table_grob, ncol = 1, heights = c(3, 1))
8.4.5.8.7 Pattern VIa
setDF(data2)
<- data2%>%filter(pattern=="PVIa",order_pattern_group==1,(time_end_ip-time_start_ip)>1,(time_end_ip-time_start_ip)>1)%>%mutate(time_center = time_start+((time_end-time_start)/2))
data3 <- data3%>%filter(q1piHz_q2piHZ>10 | q1piHz_q2piHZ< -10)%>%rowwise()%>%mutate(center_Hz = (q1piHz + q2piHz)/2, time_center = (q1piHz_time + q2piHz_time)/2,standard_pitch = standard_pitch + (standard_pitch* (q1piHz_q2piHZ/100)), time_center=ifelse(is.na(time_center),(time_start+((time_end-time_start)/2))-0.01,time_center))%>%arrange(time_start)
data3q1 <- data3%>%filter(q3piHz_q4piHZ>10 | q3piHz_q4piHZ< -10)%>%rowwise()%>%mutate(center_Hz = (q3piHz + q4piHz)/2, time_center = (q3piHz_time + q4piHz_time)/2, standard_pitch = standard_pitch + (standard_pitch* (q3piHz_q4piHZ/100)),time_center=ifelse(is.na(time_center),(time_start+((time_end-time_start)/2))+0.01,time_center))%>%arrange(time_start)
data3q4
<- rbind(data3,data3q1,data3q4)%>%arrange(filename,time_center)%>%mutate(time_center_table=ifelse(
dataplot between(lag(time_center,1),time_center-0.05,time_center+0.05),time_center+0.04,time_center))
<- dataplot
plot_data <- ggplot() +
p1 # geom_line(data = plot_data, aes(x = q3piHz_time, y = standard_q3_pitch))+
geom_bar(data = plot_data,stat = "identity", aes(x = time_center, y = standard_dur),fill="lightgrey")+
geom_line(data = plot_data, aes(x = time_center, y = standard_pitch),color="blue")+
geom_line(data = plot_data, aes(x = time_center, y = standard_intensity),color="green") +
geom_point(data = plot_data, aes(x = time_center, y = standard_pitch, color = word, shape=MAS_structure),size=5) +
geom_text(data = plot_data, aes(x = time_center, y = standard_pitch, label = content), vjust = 0.1, hjust = 0.5,size=3.2) +
# geom_text(data = plot_data, aes(x = q3piHz_time, y = standard_q3_pitch, label = content), vjust = -0.5, hjust = 0.5)+
# geom_text(data = plot_data, aes(x = q4piHz_time, y = standard_q4_pitch, label = content), vjust = -0.5, hjust = 0.5) +
ggtitle(paste("Group:", dataplot$id_ip,dataplot$ip,dataplot$pattern)) + theme_light() +
theme(legend.position = "right",plot.title = element_text(size = 10),legend.text = element_text(size=10),legend.title = element_text(size = 10), axis.title=element_text(size=10,face="bold"))+guides(size="none") + xlab("Timeline") + ylab("Prosodic features")
<- plot_data %>% mutate( Hz = standard_pitch, dB = standard_intensity, dur= standard_dur) %>%mutate(Hz=round(Hz,1),dB=round(dB,1),dur=round(dur,1))%>%arrange((time_start))%>% select(Hz,dB,dur,content)
p2 <- t(p2)
p3
if (nrow(p2) > 0) {
<- tableGrob(p3, theme = mytheme)
table_grob
else {
} <- data.frame (Hz = "0",
empty_df dB = "0",
dur = "0",
content ="0"
)<- tableGrob(empty_df, theme = mytheme)
table_grob
}# Combine plot and table
grid.arrange(p1, table_grob, ncol = 1, heights = c(3, 1))
8.4.5.8.8 Pattern VIb
setDF(data2)
<- data2%>%filter(pattern=="PVIb",order_pattern_group==1)%>%mutate(time_center = time_start+((time_end-time_start)/2))
data3 <- data3%>%filter(q1piHz_q2piHZ>10 | q1piHz_q2piHZ< -10)%>%rowwise()%>%mutate(center_Hz = (q1piHz + q2piHz)/2, time_center = (q1piHz_time + q2piHz_time)/2,standard_pitch = standard_pitch + (standard_pitch* (q1piHz_q2piHZ/100)), time_center=ifelse(is.na(time_center),(time_start+((time_end-time_start)/2))-0.01,time_center))%>%arrange(time_start)
data3q1 <- data3%>%filter(q3piHz_q4piHZ>10 | q3piHz_q4piHZ< -10)%>%rowwise()%>%mutate(center_Hz = (q3piHz + q4piHz)/2, time_center = (q3piHz_time + q4piHz_time)/2, standard_pitch = standard_pitch + (standard_pitch* (q3piHz_q4piHZ/100)),time_center=ifelse(is.na(time_center),(time_start+((time_end-time_start)/2))+0.01,time_center))%>%arrange(time_start)
data3q4
<- rbind(data3,data3q1,data3q4)%>%arrange(filename,time_center)%>%mutate(time_center_table=ifelse(
dataplot between(lag(time_center,1),time_center-0.05,time_center+0.05),time_center+0.04,time_center))
<- dataplot
plot_data <- ggplot() +
p1 # geom_line(data = plot_data, aes(x = q3piHz_time, y = standard_q3_pitch))+
geom_bar(data = plot_data,stat = "identity", aes(x = time_center, y = standard_dur),fill="lightgrey")+
geom_line(data = plot_data, aes(x = time_center, y = standard_pitch),color="blue")+
geom_line(data = plot_data, aes(x = time_center, y = standard_intensity),color="green") +
geom_point(data = plot_data, aes(x = time_center, y = standard_pitch, color = word, shape=MAS_structure),size=5) +
geom_text(data = plot_data, aes(x = time_center, y = standard_pitch, label = content), vjust = 0.1, hjust = 0.5,size=3.2) +
# geom_text(data = plot_data, aes(x = q3piHz_time, y = standard_q3_pitch, label = content), vjust = -0.5, hjust = 0.5)+
# geom_text(data = plot_data, aes(x = q4piHz_time, y = standard_q4_pitch, label = content), vjust = -0.5, hjust = 0.5) +
ggtitle(paste("Group:", dataplot$id_ip,dataplot$ip,dataplot$pattern)) + theme_light() +
theme(legend.position = "right",plot.title = element_text(size = 10),legend.text = element_text(size=10),legend.title = element_text(size = 10), axis.title=element_text(size=10,face="bold"))+guides(size="none") + xlab("Timeline") + ylab("Prosodic features")
<- plot_data %>% mutate( Hz = standard_pitch, dB = standard_intensity, dur= standard_dur) %>%mutate(Hz=round(Hz,1),dB=round(dB,1),dur=round(dur,1))%>%arrange((time_start))%>% select(Hz,dB,dur,content)
p2 <- t(p2)
p3
if (nrow(p2) > 0) {
<- tableGrob(p3, theme = mytheme)
table_grob
else {
} <- data.frame (Hz = "0",
empty_df dB = "0",
dur = "0",
content ="0"
)<- tableGrob(empty_df, theme = mytheme)
table_grob
}# Combine plot and table
grid.arrange(p1, table_grob, ncol = 1, heights = c(3, 1))
8.4.5.8.9 Pattern VII
setDF(data2)
<- data2%>%filter(pattern=="PVII",order_pattern_group==1)%>%na.omit()%>%mutate(time_center = time_start+((time_end-time_start)/2))
data3 <- data3%>%filter(q1piHz_q2piHZ>10 | q1piHz_q2piHZ< -10)%>%rowwise()%>%mutate(center_Hz = (q1piHz + q2piHz)/2, time_center = (q1piHz_time + q2piHz_time)/2,standard_pitch = standard_pitch + (standard_pitch* (q1piHz_q2piHZ/100)), time_center=ifelse(is.na(time_center),(time_start+((time_end-time_start)/2))-0.01,time_center))%>%arrange(time_start)
data3q1 <- data3%>%filter(q3piHz_q4piHZ>10 | q3piHz_q4piHZ< -10)%>%rowwise()%>%mutate(center_Hz = (q3piHz + q4piHz)/2, time_center = (q3piHz_time + q4piHz_time)/2, standard_pitch = standard_pitch + (standard_pitch* (q3piHz_q4piHZ/100)),time_center=ifelse(is.na(time_center),(time_start+((time_end-time_start)/2))+0.01,time_center))%>%arrange(time_start)
data3q4
<- rbind(data3,data3q1,data3q4)%>%arrange(filename,time_center)%>%mutate(time_center_table=ifelse(
dataplot between(lag(time_center,1),time_center-0.05,time_center+0.05),time_center+0.04,time_center))
<- dataplot
plot_data <- ggplot() +
p1 # geom_line(data = plot_data, aes(x = q3piHz_time, y = standard_q3_pitch))+
geom_bar(data = plot_data,stat = "identity", aes(x = time_center, y = standard_dur),fill="lightgrey")+
geom_line(data = plot_data, aes(x = time_center, y = standard_pitch),color="blue")+
geom_line(data = plot_data, aes(x = time_center, y = standard_intensity),color="green") +
geom_point(data = plot_data, aes(x = time_center, y = standard_pitch, color = word, shape=MAS_structure),size=5) +
geom_text(data = plot_data, aes(x = time_center, y = standard_pitch, label = content), vjust = 0.1, hjust = 0.5,size=3.2) +
# geom_text(data = plot_data, aes(x = q3piHz_time, y = standard_q3_pitch, label = content), vjust = -0.5, hjust = 0.5)+
# geom_text(data = plot_data, aes(x = q4piHz_time, y = standard_q4_pitch, label = content), vjust = -0.5, hjust = 0.5) +
ggtitle(paste("Group:", dataplot$id_ip,dataplot$ip,dataplot$pattern)) + theme_light() +
theme(legend.position = "right",plot.title = element_text(size = 10),legend.text = element_text(size=10),legend.title = element_text(size = 10), axis.title=element_text(size=10,face="bold"))+guides(size="none") + xlab("Timeline") + ylab("Prosodic features")
<- plot_data %>% mutate( Hz = standard_pitch, dB = standard_intensity, dur= standard_dur) %>%mutate(Hz=round(Hz,1),dB=round(dB,1),dur=round(dur,1))%>%arrange((time_start))%>% select(Hz,dB,dur,content)
p2 <- t(p2)
p3
if (nrow(p2) > 0) {
<- tableGrob(p3, theme = mytheme)
table_grob
else {
} <- data.frame (Hz = "0",
empty_df dB = "0",
dur = "0",
content ="0"
)<- tableGrob(empty_df, theme = mytheme)
table_grob
}# Combine plot and table
grid.arrange(p1, table_grob, ncol = 1, heights = c(3, 1))
8.4.5.8.10 Pattern VIII
setDF(data2)
<- data2%>%filter(pattern=="PVIII",order_pattern_group==1)%>%mutate(time_center = time_start+((time_end-time_start)/2))
data3 <- data3%>%filter(q1piHz_q2piHZ>10 | q1piHz_q2piHZ< -10)%>%rowwise()%>%mutate(center_Hz = (q1piHz + q2piHz)/2, time_center = (q1piHz_time + q2piHz_time)/2,standard_pitch = standard_pitch + (standard_pitch* (q1piHz_q2piHZ/100)), time_center=ifelse(is.na(time_center),(time_start+((time_end-time_start)/2))-0.01,time_center))%>%arrange(time_start)
data3q1 <- data3%>%filter(q3piHz_q4piHZ>10 | q3piHz_q4piHZ< -10)%>%rowwise()%>%mutate(center_Hz = (q3piHz + q4piHz)/2, time_center = (q3piHz_time + q4piHz_time)/2, standard_pitch = standard_pitch + (standard_pitch* (q3piHz_q4piHZ/100)),time_center=ifelse(is.na(time_center),(time_start+((time_end-time_start)/2))+0.01,time_center))%>%arrange(time_start)
data3q4
<- rbind(data3,data3q1,data3q4)%>%arrange(filename,time_center)%>%mutate(time_center_table=ifelse(
dataplot between(lag(time_center,1),time_center-0.05,time_center+0.05),time_center+0.04,time_center))
<- dataplot
plot_data <- ggplot() +
p1 # geom_line(data = plot_data, aes(x = q3piHz_time, y = standard_q3_pitch))+
geom_bar(data = plot_data,stat = "identity", aes(x = time_center, y = standard_dur),fill="lightgrey")+
geom_line(data = plot_data, aes(x = time_center, y = standard_pitch),color="blue")+
geom_line(data = plot_data, aes(x = time_center, y = standard_intensity),color="green") +
geom_point(data = plot_data, aes(x = time_center, y = standard_pitch, color = word, shape=MAS_structure),size=5) +
geom_text(data = plot_data, aes(x = time_center, y = standard_pitch, label = content), vjust = 0.1, hjust = 0.5,size=3.2) +
# geom_text(data = plot_data, aes(x = q3piHz_time, y = standard_q3_pitch, label = content), vjust = -0.5, hjust = 0.5)+
# geom_text(data = plot_data, aes(x = q4piHz_time, y = standard_q4_pitch, label = content), vjust = -0.5, hjust = 0.5) +
ggtitle(paste("Group:", dataplot$id_ip,dataplot$ip,dataplot$pattern)) + theme_light() +
theme(legend.position = "right",plot.title = element_text(size = 10),legend.text = element_text(size=10),legend.title = element_text(size = 10), axis.title=element_text(size=10,face="bold"))+guides(size="none") + xlab("Timeline") + ylab("Prosodic features")
<- plot_data %>% mutate( Hz = standard_pitch, dB = standard_intensity, dur= standard_dur) %>%mutate(Hz=round(Hz,1),dB=round(dB,1),dur=round(dur,1))%>%arrange((time_start))%>% select(Hz,dB,dur,content)
p2 <- t(p2)
p3
if (nrow(p2) > 0) {
<- tableGrob(p3, theme = mytheme)
table_grob
else {
} <- data.frame (Hz = "0",
empty_df dB = "0",
dur = "0",
content ="0"
)<- tableGrob(empty_df, theme = mytheme)
table_grob
}# Combine plot and table
grid.arrange(p1, table_grob, ncol = 1, heights = c(3, 1))
8.4.5.8.11 Pattern IX
setDF(data2)
<- data2%>%filter(pattern=="PIX",order_pattern_group==1,(time_end_ip-time_start_ip)>1)%>%mutate(time_center = time_start+((time_end-time_start)/2))
data3 <- data3%>%filter(q1piHz_q2piHZ>10 | q1piHz_q2piHZ< -10)%>%rowwise()%>%mutate(center_Hz = (q1piHz + q2piHz)/2, time_center = (q1piHz_time + q2piHz_time)/2,standard_pitch = standard_pitch + (standard_pitch* (q1piHz_q2piHZ/100)), time_center=ifelse(is.na(time_center),(time_start+((time_end-time_start)/2))-0.01,time_center))%>%arrange(time_start)
data3q1 <- data3%>%filter(q3piHz_q4piHZ>10 | q3piHz_q4piHZ< -10)%>%rowwise()%>%mutate(center_Hz = (q3piHz + q4piHz)/2, time_center = (q3piHz_time + q4piHz_time)/2, standard_pitch = standard_pitch + (standard_pitch* (q3piHz_q4piHZ/100)),time_center=ifelse(is.na(time_center),(time_start+((time_end-time_start)/2))+0.01,time_center))%>%arrange(time_start)
data3q4
<- rbind(data3,data3q1,data3q4)%>%arrange(filename,time_center)%>%mutate(time_center_table=ifelse(
dataplot between(lag(time_center,1),time_center-0.05,time_center+0.05),time_center+0.04,time_center))
<- dataplot
plot_data <- ggplot() +
p1 # geom_line(data = plot_data, aes(x = q3piHz_time, y = standard_q3_pitch))+
geom_bar(data = plot_data,stat = "identity", aes(x = time_center, y = standard_dur),fill="lightgrey")+
geom_line(data = plot_data, aes(x = time_center, y = standard_pitch),color="blue")+
geom_line(data = plot_data, aes(x = time_center, y = standard_intensity),color="green") +
geom_point(data = plot_data, aes(x = time_center, y = standard_pitch, color = word, shape=MAS_structure),size=5) +
geom_text(data = plot_data, aes(x = time_center, y = standard_pitch, label = content), vjust = 0.1, hjust = 0.5,size=3.2) +
# geom_text(data = plot_data, aes(x = q3piHz_time, y = standard_q3_pitch, label = content), vjust = -0.5, hjust = 0.5)+
# geom_text(data = plot_data, aes(x = q4piHz_time, y = standard_q4_pitch, label = content), vjust = -0.5, hjust = 0.5) +
ggtitle(paste("Group:", dataplot$id_ip,dataplot$ip,dataplot$pattern)) + theme_light() +
theme(legend.position = "right",plot.title = element_text(size = 10),legend.text = element_text(size=10),legend.title = element_text(size = 10), axis.title=element_text(size=10,face="bold"))+guides(size="none") + xlab("Timeline") + ylab("Prosodic features")
<- plot_data %>% mutate( Hz = standard_pitch, dB = standard_intensity, dur= standard_dur) %>%mutate(Hz=round(Hz,1),dB=round(dB,1),dur=round(dur,1))%>%arrange((time_start))%>% select(Hz,dB,dur,content)
p2 <- t(p2)
p3
if (nrow(p2) > 0) {
<- tableGrob(p3, theme = mytheme)
table_grob
else {
} <- data.frame (Hz = "0",
empty_df dB = "0",
dur = "0",
content ="0"
)<- tableGrob(empty_df, theme = mytheme)
table_grob
}# Combine plot and table
grid.arrange(p1, table_grob, ncol = 1, heights = c(3, 1))
8.4.5.8.12 Pattern Xa
setDF(data2)
<- data2%>%filter(pattern=="PXa",order_pattern_group==1)%>%mutate(time_center = time_start+((time_end-time_start)/2))
data3 <- data3%>%filter(q1piHz_q2piHZ>10 | q1piHz_q2piHZ< -10)%>%rowwise()%>%mutate(center_Hz = (q1piHz + q2piHz)/2, time_center = (q1piHz_time + q2piHz_time)/2,standard_pitch = standard_pitch + (standard_pitch* (q1piHz_q2piHZ/100)), time_center=ifelse(is.na(time_center),(time_start+((time_end-time_start)/2))-0.01,time_center))%>%arrange(time_start)
data3q1 <- data3%>%filter(q3piHz_q4piHZ>10 | q3piHz_q4piHZ< -10)%>%rowwise()%>%mutate(center_Hz = (q3piHz + q4piHz)/2, time_center = (q3piHz_time + q4piHz_time)/2, standard_pitch = standard_pitch + (standard_pitch* (q3piHz_q4piHZ/100)),time_center=ifelse(is.na(time_center),(time_start+((time_end-time_start)/2))+0.01,time_center))%>%arrange(time_start)
data3q4
<- rbind(data3,data3q1,data3q4)%>%arrange(filename,time_center)%>%mutate(time_center_table=ifelse(
dataplot between(lag(time_center,1),time_center-0.05,time_center+0.05),time_center+0.04,time_center))
<- dataplot
plot_data <- ggplot() +
p1 # geom_line(data = plot_data, aes(x = q3piHz_time, y = standard_q3_pitch))+
geom_bar(data = plot_data,stat = "identity", aes(x = time_center, y = standard_dur),fill="lightgrey")+
geom_line(data = plot_data, aes(x = time_center, y = standard_pitch),color="blue")+
geom_line(data = plot_data, aes(x = time_center, y = standard_intensity),color="green") +
geom_point(data = plot_data, aes(x = time_center, y = standard_pitch, color = word, shape=MAS_structure),size=5) +
geom_text(data = plot_data, aes(x = time_center, y = standard_pitch, label = content), vjust = 0.1, hjust = 0.5,size=3.2) +
# geom_text(data = plot_data, aes(x = q3piHz_time, y = standard_q3_pitch, label = content), vjust = -0.5, hjust = 0.5)+
# geom_text(data = plot_data, aes(x = q4piHz_time, y = standard_q4_pitch, label = content), vjust = -0.5, hjust = 0.5) +
ggtitle(paste("Group:", dataplot$id_ip,dataplot$ip,dataplot$pattern)) + theme_light() +
theme(legend.position = "right",plot.title = element_text(size = 10),legend.text = element_text(size=10),legend.title = element_text(size = 10), axis.title=element_text(size=10,face="bold"))+guides(size="none") + xlab("Timeline") + ylab("Prosodic features")
<- plot_data %>% mutate( Hz = standard_pitch, dB = standard_intensity, dur= standard_dur) %>%mutate(Hz=round(Hz,1),dB=round(dB,1),dur=round(dur,1))%>%arrange((time_start))%>% select(Hz,dB,dur,content)
p2 <- t(p2)
p3
if (nrow(p2) > 0) {
<- tableGrob(p3, theme = mytheme)
table_grob
else {
} <- data.frame (Hz = "0",
empty_df dB = "0",
dur = "0",
content ="0"
)<- tableGrob(empty_df, theme = mytheme)
table_grob
}# Combine plot and table
grid.arrange(p1, table_grob, ncol = 1, heights = c(3, 1))
8.4.5.8.13 Pattern Xb
setDF(data2)
<- data2%>%filter(pattern=="PXb",order_pattern_group==1)%>%mutate(time_center = time_start+((time_end-time_start)/2))
data3 <- data3%>%filter(q1piHz_q2piHZ>10 | q1piHz_q2piHZ< -10)%>%rowwise()%>%mutate(center_Hz = (q1piHz + q2piHz)/2, time_center = (q1piHz_time + q2piHz_time)/2,standard_pitch = standard_pitch + (standard_pitch* (q1piHz_q2piHZ/100)), time_center=ifelse(is.na(time_center),(time_start+((time_end-time_start)/2))-0.01,time_center))%>%arrange(time_start)
data3q1 <- data3%>%filter(q3piHz_q4piHZ>10 | q3piHz_q4piHZ< -10)%>%rowwise()%>%mutate(center_Hz = (q3piHz + q4piHz)/2, time_center = (q3piHz_time + q4piHz_time)/2, standard_pitch = standard_pitch + (standard_pitch* (q3piHz_q4piHZ/100)),time_center=ifelse(is.na(time_center),(time_start+((time_end-time_start)/2))+0.01,time_center))%>%arrange(time_start)
data3q4
<- rbind(data3,data3q1,data3q4)%>%arrange(filename,time_center)%>%mutate(time_center_table=ifelse(
dataplot between(lag(time_center,1),time_center-0.05,time_center+0.05),time_center+0.04,time_center))
<- dataplot
plot_data <- ggplot() +
p1 # geom_line(data = plot_data, aes(x = q3piHz_time, y = standard_q3_pitch))+
geom_bar(data = plot_data,stat = "identity", aes(x = time_center, y = standard_dur),fill="lightgrey")+
geom_line(data = plot_data, aes(x = time_center, y = standard_pitch),color="blue")+
geom_line(data = plot_data, aes(x = time_center, y = standard_intensity),color="green") +
geom_point(data = plot_data, aes(x = time_center, y = standard_pitch, color = word, shape=MAS_structure),size=5) +
geom_text(data = plot_data, aes(x = time_center, y = standard_pitch, label = content), vjust = 0.1, hjust = 0.5,size=3.2) +
# geom_text(data = plot_data, aes(x = q3piHz_time, y = standard_q3_pitch, label = content), vjust = -0.5, hjust = 0.5)+
# geom_text(data = plot_data, aes(x = q4piHz_time, y = standard_q4_pitch, label = content), vjust = -0.5, hjust = 0.5) +
ggtitle(paste("Group:", dataplot$id_ip,dataplot$ip,dataplot$pattern)) + theme_light() +
theme(legend.position = "right",plot.title = element_text(size = 10),legend.text = element_text(size=10),legend.title = element_text(size = 10), axis.title=element_text(size=10,face="bold"))+guides(size="none") + xlab("Timeline") + ylab("Prosodic features")
<- plot_data %>% mutate( Hz = standard_pitch, dB = standard_intensity, dur= standard_dur) %>%mutate(Hz=round(Hz,1),dB=round(dB,1),dur=round(dur,1))%>%arrange((time_start))%>% select(Hz,dB,dur,content)
p2 <- t(p2)
p3
if (nrow(p2) > 0) {
<- tableGrob(p3, theme = mytheme)
table_grob
else {
} <- data.frame (Hz = "0",
empty_df dB = "0",
dur = "0",
content ="0"
)<- tableGrob(empty_df, theme = mytheme)
table_grob
}# Combine plot and table
grid.arrange(p1, table_grob, ncol = 1, heights = c(3, 1))
8.4.5.8.14 Pattern XI
setDF(data2)
<- data2%>%filter(pattern=="PXI",order_pattern_group==1,(time_end_ip-time_start_ip)>1)%>%mutate(time_center = time_start+((time_end-time_start)/2))
data3 <- data3%>%filter(q1piHz_q2piHZ>10 | q1piHz_q2piHZ< -10)%>%rowwise()%>%mutate(center_Hz = (q1piHz + q2piHz)/2, time_center = (q1piHz_time + q2piHz_time)/2,standard_pitch = standard_pitch + (standard_pitch* (q1piHz_q2piHZ/100)), time_center=ifelse(is.na(time_center),(time_start+((time_end-time_start)/2))-0.01,time_center))%>%arrange(time_start)
data3q1 <- data3%>%filter(q3piHz_q4piHZ>10 | q3piHz_q4piHZ< -10)%>%rowwise()%>%mutate(center_Hz = (q3piHz + q4piHz)/2, time_center = (q3piHz_time + q4piHz_time)/2, standard_pitch = standard_pitch + (standard_pitch* (q3piHz_q4piHZ/100)),time_center=ifelse(is.na(time_center),(time_start+((time_end-time_start)/2))+0.01,time_center))%>%arrange(time_start)
data3q4
<- rbind(data3,data3q1,data3q4)%>%arrange(filename,time_center)%>%mutate(time_center_table=ifelse(
dataplot between(lag(time_center,1),time_center-0.05,time_center+0.05),time_center+0.04,time_center))
<- dataplot
plot_data <- ggplot() +
p1 # geom_line(data = plot_data, aes(x = q3piHz_time, y = standard_q3_pitch))+
geom_bar(data = plot_data,stat = "identity", aes(x = time_center, y = standard_dur),fill="lightgrey")+
geom_line(data = plot_data, aes(x = time_center, y = standard_pitch),color="blue")+
geom_line(data = plot_data, aes(x = time_center, y = standard_intensity),color="green") +
geom_point(data = plot_data, aes(x = time_center, y = standard_pitch, color = word, shape=MAS_structure),size=5) +
geom_text(data = plot_data, aes(x = time_center, y = standard_pitch, label = content), vjust = 0.1, hjust = 0.5,size=3.2) +
# geom_text(data = plot_data, aes(x = q3piHz_time, y = standard_q3_pitch, label = content), vjust = -0.5, hjust = 0.5)+
# geom_text(data = plot_data, aes(x = q4piHz_time, y = standard_q4_pitch, label = content), vjust = -0.5, hjust = 0.5) +
ggtitle(paste("Group:", dataplot$id_ip,dataplot$ip,dataplot$pattern)) + theme_light() +
theme(legend.position = "right",plot.title = element_text(size = 10),legend.text = element_text(size=10),legend.title = element_text(size = 10), axis.title=element_text(size=10,face="bold"))+guides(size="none") + xlab("Timeline") + ylab("Prosodic features")
<- plot_data %>% mutate( Hz = standard_pitch, dB = standard_intensity, dur= standard_dur) %>%mutate(Hz=round(Hz,1),dB=round(dB,1),dur=round(dur,1))%>%arrange((time_start))%>% select(Hz,dB,dur,content)
p2 <- t(p2)
p3
if (nrow(p2) > 0) {
<- tableGrob(p3, theme = mytheme)
table_grob
else {
} <- data.frame (Hz = "0",
empty_df dB = "0",
dur = "0",
content ="0"
)<- tableGrob(empty_df, theme = mytheme)
table_grob
}# Combine plot and table
grid.arrange(p1, table_grob, ncol = 1, heights = c(3, 1))
8.4.5.8.15 Pattern XIIa
setDF(data2)
<- data2%>%filter(pattern=="PXIIa",order_pattern_group==1)%>%mutate(time_center = time_start+((time_end-time_start)/2))
data3 <- data3%>%filter(q1piHz_q2piHZ>10 | q1piHz_q2piHZ< -10)%>%rowwise()%>%mutate(center_Hz = (q1piHz + q2piHz)/2, time_center = (q1piHz_time + q2piHz_time)/2,standard_pitch = standard_pitch + (standard_pitch* (q1piHz_q2piHZ/100)), time_center=ifelse(is.na(time_center),(time_start+((time_end-time_start)/2))-0.01,time_center))%>%arrange(time_start)
data3q1 <- data3%>%filter(q3piHz_q4piHZ>10 | q3piHz_q4piHZ< -10)%>%rowwise()%>%mutate(center_Hz = (q3piHz + q4piHz)/2, time_center = (q3piHz_time + q4piHz_time)/2, standard_pitch = standard_pitch + (standard_pitch* (q3piHz_q4piHZ/100)),time_center=ifelse(is.na(time_center),(time_start+((time_end-time_start)/2))+0.01,time_center))%>%arrange(time_start)
data3q4
<- rbind(data3,data3q1,data3q4)%>%arrange(filename,time_center)%>%mutate(time_center_table=ifelse(
dataplot between(lag(time_center,1),time_center-0.05,time_center+0.05),time_center+0.04,time_center))
<- dataplot
plot_data <- ggplot() +
p1 # geom_line(data = plot_data, aes(x = q3piHz_time, y = standard_q3_pitch))+
geom_bar(data = plot_data,stat = "identity", aes(x = time_center, y = standard_dur),fill="lightgrey")+
geom_line(data = plot_data, aes(x = time_center, y = standard_pitch),color="blue")+
geom_line(data = plot_data, aes(x = time_center, y = standard_intensity),color="green") +
geom_point(data = plot_data, aes(x = time_center, y = standard_pitch, color = word, shape=MAS_structure),size=5) +
geom_text(data = plot_data, aes(x = time_center, y = standard_pitch, label = content), vjust = 0.1, hjust = 0.5,size=3.2) +
# geom_text(data = plot_data, aes(x = q3piHz_time, y = standard_q3_pitch, label = content), vjust = -0.5, hjust = 0.5)+
# geom_text(data = plot_data, aes(x = q4piHz_time, y = standard_q4_pitch, label = content), vjust = -0.5, hjust = 0.5) +
ggtitle(paste("Group:", dataplot$id_ip,dataplot$ip,dataplot$pattern)) + theme_light() +
theme(legend.position = "right",plot.title = element_text(size = 10),legend.text = element_text(size=10),legend.title = element_text(size = 10), axis.title=element_text(size=10,face="bold"))+guides(size="none") + xlab("Timeline") + ylab("Prosodic features")
<- plot_data %>% mutate( Hz = standard_pitch, dB = standard_intensity, dur= standard_dur) %>%mutate(Hz=round(Hz,1),dB=round(dB,1),dur=round(dur,1))%>%arrange((time_start))%>% select(Hz,dB,dur,content)
p2 <- t(p2)
p3
if (nrow(p2) > 0) {
<- tableGrob(p3, theme = mytheme)
table_grob
else {
} <- data.frame (Hz = "0",
empty_df dB = "0",
dur = "0",
content ="0"
)<- tableGrob(empty_df, theme = mytheme)
table_grob
}# Combine plot and table
grid.arrange(p1, table_grob, ncol = 1, heights = c(3, 1))
8.4.5.8.16 Pattern XIIb
setDF(data2)
<- data2%>%filter(pattern=="PXIIb",order_pattern_group==1)%>%mutate(time_center = time_start+((time_end-time_start)/2))
data3 <- data3%>%filter(q1piHz_q2piHZ>10 | q1piHz_q2piHZ< -10)%>%rowwise()%>%mutate(center_Hz = (q1piHz + q2piHz)/2, time_center = (q1piHz_time + q2piHz_time)/2,standard_pitch = standard_pitch + (standard_pitch* (q1piHz_q2piHZ/100)), time_center=ifelse(is.na(time_center),(time_start+((time_end-time_start)/2))-0.01,time_center))%>%arrange(time_start)
data3q1 <- data3%>%filter(q3piHz_q4piHZ>10 | q3piHz_q4piHZ< -10)%>%rowwise()%>%mutate(center_Hz = (q3piHz + q4piHz)/2, time_center = (q3piHz_time + q4piHz_time)/2, standard_pitch = standard_pitch + (standard_pitch* (q3piHz_q4piHZ/100)),time_center=ifelse(is.na(time_center),(time_start+((time_end-time_start)/2))+0.01,time_center))%>%arrange(time_start)
data3q4
<- rbind(data3,data3q1,data3q4)%>%arrange(filename,time_center)%>%mutate(time_center_table=ifelse(
dataplot between(lag(time_center,1),time_center-0.05,time_center+0.05),time_center+0.04,time_center))
<- dataplot
plot_data <- ggplot() +
p1 # geom_line(data = plot_data, aes(x = q3piHz_time, y = standard_q3_pitch))+
geom_bar(data = plot_data,stat = "identity", aes(x = time_center, y = standard_dur),fill="lightgrey")+
geom_line(data = plot_data, aes(x = time_center, y = standard_pitch),color="blue")+
geom_line(data = plot_data, aes(x = time_center, y = standard_intensity),color="green") +
geom_point(data = plot_data, aes(x = time_center, y = standard_pitch, color = word, shape=MAS_structure),size=5) +
geom_text(data = plot_data, aes(x = time_center, y = standard_pitch, label = content), vjust = 0.1, hjust = 0.5,size=3.2) +
# geom_text(data = plot_data, aes(x = q3piHz_time, y = standard_q3_pitch, label = content), vjust = -0.5, hjust = 0.5)+
# geom_text(data = plot_data, aes(x = q4piHz_time, y = standard_q4_pitch, label = content), vjust = -0.5, hjust = 0.5) +
ggtitle(paste("Group:", dataplot$id_ip,dataplot$ip,dataplot$pattern)) + theme_light() +
theme(legend.position = "right",plot.title = element_text(size = 10),legend.text = element_text(size=10),legend.title = element_text(size = 10), axis.title=element_text(size=10,face="bold"))+guides(size="none") + xlab("Timeline") + ylab("Prosodic features")
<- plot_data %>% mutate( Hz = standard_pitch, dB = standard_intensity, dur= standard_dur) %>%mutate(Hz=round(Hz,1),dB=round(dB,1),dur=round(dur,1))%>%arrange((time_start))%>% select(Hz,dB,dur,content)
p2 <- t(p2)
p3
if (nrow(p2) > 0) {
<- tableGrob(p3, theme = mytheme)
table_grob
else {
} <- data.frame (Hz = "0",
empty_df dB = "0",
dur = "0",
content ="0"
)<- tableGrob(empty_df, theme = mytheme)
table_grob
}# Combine plot and table
grid.arrange(p1, table_grob, ncol = 1, heights = c(3, 1))
8.4.5.8.17 Pattern XIII
setDF(data2)
<- data2%>%filter(pattern=="PXIII",order_pattern_group==1,(time_end_ip-time_start_ip)>1,(time_end_ip-time_start_ip)>1)%>%mutate(time_center = time_start+((time_end-time_start)/2))
data3 <- data3%>%filter(q1piHz_q2piHZ>10 | q1piHz_q2piHZ< -10)%>%rowwise()%>%mutate(center_Hz = (q1piHz + q2piHz)/2, time_center = (q1piHz_time + q2piHz_time)/2,standard_pitch = standard_pitch + (standard_pitch* (q1piHz_q2piHZ/100)), time_center=ifelse(is.na(time_center),(time_start+((time_end-time_start)/2))-0.01,time_center))%>%arrange(time_start)
data3q1 <- data3%>%filter(q3piHz_q4piHZ>10 | q3piHz_q4piHZ< -10)%>%rowwise()%>%mutate(center_Hz = (q3piHz + q4piHz)/2, time_center = (q3piHz_time + q4piHz_time)/2, standard_pitch = standard_pitch + (standard_pitch* (q3piHz_q4piHZ/100)),time_center=ifelse(is.na(time_center),(time_start+((time_end-time_start)/2))+0.01,time_center))%>%arrange(time_start)
data3q4
<- rbind(data3,data3q1,data3q4)%>%arrange(filename,time_center)%>%mutate(time_center_table=ifelse(
dataplot between(lag(time_center,1),time_center-0.05,time_center+0.05),time_center+0.04,time_center))
<- dataplot
plot_data <- ggplot() +
p1 # geom_line(data = plot_data, aes(x = q3piHz_time, y = standard_q3_pitch))+
geom_bar(data = plot_data,stat = "identity", aes(x = time_center, y = standard_dur),fill="lightgrey")+
geom_line(data = plot_data, aes(x = time_center, y = standard_pitch),color="blue")+
geom_line(data = plot_data, aes(x = time_center, y = standard_intensity),color="green") +
geom_point(data = plot_data, aes(x = time_center, y = standard_pitch, color = word, shape=MAS_structure),size=5) +
geom_text(data = plot_data, aes(x = time_center, y = standard_pitch, label = content), vjust = 0.1, hjust = 0.5,size=3.2) +
# geom_text(data = plot_data, aes(x = q3piHz_time, y = standard_q3_pitch, label = content), vjust = -0.5, hjust = 0.5)+
# geom_text(data = plot_data, aes(x = q4piHz_time, y = standard_q4_pitch, label = content), vjust = -0.5, hjust = 0.5) +
ggtitle(paste("Group:", dataplot$id_ip,dataplot$ip,dataplot$pattern)) + theme_light() +
theme(legend.position = "right",plot.title = element_text(size = 10),legend.text = element_text(size=10),legend.title = element_text(size = 10), axis.title=element_text(size=10,face="bold"))+guides(size="none") + xlab("Timeline") + ylab("Prosodic features")
<- plot_data %>% mutate( Hz = standard_pitch, dB = standard_intensity, dur= standard_dur) %>%mutate(Hz=round(Hz,1),dB=round(dB,1),dur=round(dur,1))%>%arrange((time_start))%>% select(Hz,dB,dur,content)
p2 <- t(p2)
p3
if (nrow(p2) > 0) {
<- tableGrob(p3, theme = mytheme)
table_grob
else {
} <- data.frame (Hz = "0",
empty_df dB = "0",
dur = "0",
content ="0"
)<- tableGrob(empty_df, theme = mytheme)
table_grob
}# Combine plot and table
grid.arrange(p1, table_grob, ncol = 1, heights = c(3, 1))
8.4.6 TOBI (Tones Break Indices)
<- 1.2
percent_treshold_ST
<- data2%>%filter(tonic=="yes")%>%mutate(center_ST = 12*log2(center_Hz/1),
TOBI circumflex_st =
case_when(
<= -percent_treshold_ST & q2piHz_q3piHZ_ST >= percent_treshold_ST & q3piHz_q4piHZ_ST <= -percent_treshold_ST ~ "LHL",
q1piHz_q2piHZ_ST >= percent_treshold_ST & q2piHz_q3piHZ_ST <= -percent_treshold_ST & q3piHz_q4piHZ_ST >= percent_treshold_ST ~ "HLH",
q1piHz_q2piHZ_ST >= percent_treshold_ST & between(q2piHz_q3piHZ_ST,-percent_treshold_ST,percent_treshold_ST) & q3piHz_q4piHZ_ST <= -percent_treshold_ST ~ "HL",
q1piHz_q2piHZ_ST >= percent_treshold_ST & q2piHz_q3piHZ_ST <= -percent_treshold_ST & q3piHz_q4piHZ_ST <= -percent_treshold_ST ~ "HL",
q1piHz_q2piHZ_ST >= percent_treshold_ST & between(q2piHz_q3piHZ_ST,-percent_treshold_ST,percent_treshold_ST) & between(q3piHz_q4piHZ_ST,-percent_treshold_ST,percent_treshold_ST) ~ "H",
q1piHz_q2piHZ_ST between(q1piHz_q2piHZ_ST,-percent_treshold_ST,percent_treshold_ST)& q2piHz_q3piHZ_ST >= percent_treshold_ST &between(q3piHz_q4piHZ_ST,-percent_treshold_ST,percent_treshold_ST) ~ "H",
between(q1piHz_q2piHZ_ST,-percent_treshold_ST,percent_treshold_ST)& between(q2piHz_q3piHZ_ST,-percent_treshold_ST,percent_treshold_ST) & q3piHz_q4piHZ_ST >= percent_treshold_ST ~ "H",
<= -percent_treshold_ST & between(q2piHz_q3piHZ_ST,-percent_treshold_ST,percent_treshold_ST) & between(q3piHz_q4piHZ_ST,-percent_treshold_ST,percent_treshold_ST) ~ "L",
q1piHz_q2piHZ_ST <= -percent_treshold_ST & between(q2piHz_q3piHZ_ST,-percent_treshold_ST,percent_treshold_ST) & q3piHz_q4piHZ_ST <= -percent_treshold_ST ~ "L>",
q1piHz_q2piHZ_ST between(q1piHz_q2piHZ_ST,-percent_treshold_ST,percent_treshold_ST)& q2piHz_q3piHZ_ST <= -percent_treshold_ST &between(q3piHz_q4piHZ_ST,-percent_treshold_ST,percent_treshold_ST) ~ "L",
between(q1piHz_q2piHZ_ST,-percent_treshold_ST,percent_treshold_ST)& between(q2piHz_q3piHZ_ST,-percent_treshold_ST,percent_treshold_ST) & q3piHz_q4piHZ_ST <= -percent_treshold_ST ~ "L",
between(q1piHz_q2piHZ_ST,-percent_treshold_ST,percent_treshold_ST) & q2piHz_q3piHZ_ST >= percent_treshold_ST & q3piHz_q4piHZ_ST >= percent_treshold_ST ~ "H>",
>= percent_treshold_ST & q2piHz_q3piHZ_ST >= percent_treshold_ST & between(q3piHz_q4piHZ_ST,-percent_treshold_ST,percent_treshold_ST) ~ "H>",
q1piHz_q2piHZ_ST >= percent_treshold_ST & q2piHz_q3piHZ_ST >= percent_treshold_ST & q3piHz_q4piHZ_ST >= percent_treshold_ST ~"H>",
q1piHz_q2piHZ_ST <= -percent_treshold_ST & q2piHz_q3piHZ_ST <= -percent_treshold_ST & q3piHz_q4piHZ_ST <= -percent_treshold_ST ~"L>",
q1piHz_q2piHZ_ST <= -percent_treshold_ST & q2piHz_q3piHZ_ST <= -percent_treshold_ST & between(q3piHz_q4piHZ_ST, -percent_treshold_ST,percent_treshold_ST) ~"L>",
q1piHz_q2piHZ_ST between(q1piHz_q2piHZ_ST, -percent_treshold_ST,percent_treshold_ST) & between(q2piHz_q3piHZ_ST, -percent_treshold_ST,percent_treshold_ST)&between(q3piHz_q4piHZ_ST, -percent_treshold_ST,percent_treshold_ST) ~ "no_circumflex",
is.na(center_Hz) ~"non_applicable"))
<- TOBI%>%mutate(
TOBI
TOBI_pattern=
case_when(.default="unchanged",
< -percent_treshold_ST & between(inflexion_ST_to_next,-percent_treshold_ST,percent_treshold_ST) ~ "H+L*",
inflexion_ST_from_prev > percent_treshold_ST & between(inflexion_ST_to_next,-percent_treshold_ST,percent_treshold_ST) ~ "L+H*",
inflexion_ST_from_prev < -percent_treshold_ST & between(inflexion_ST_from_prev,-percent_treshold_ST,percent_treshold_ST) ~ "H*+L",
inflexion_ST_to_next > percent_treshold_ST & between(inflexion_ST_from_prev,-percent_treshold_ST,percent_treshold_ST) ~ "L*+H",
inflexion_ST_to_next < -percent_treshold_ST & inflexion_ST_to_next > percent_treshold_ST &MAS_structure=="toneme" ~ "H+L*H%",
inflexion_ST_from_prev > percent_treshold_ST & inflexion_ST_to_next < -percent_treshold_ST&MAS_structure=="toneme" ~ "L+H*L%",
inflexion_ST_from_prev > percent_treshold_ST & inflexion_ST_to_next > -percent_treshold_ST &MAS_structure=="toneme" ~ "L+H*H%",
inflexion_ST_from_prev < -percent_treshold_ST & inflexion_ST_to_next < -percent_treshold_ST &MAS_structure=="toneme" ~ "H+L*L%",
inflexion_ST_from_prev < -percent_treshold_ST & inflexion_ST_to_next > percent_treshold_ST &MAS_structure!="toneme" ~ "H+L*H-",
inflexion_ST_from_prev > percent_treshold_ST & inflexion_ST_to_next < -percent_treshold_ST&MAS_structure!="toneme" ~ "L+H*L-",
inflexion_ST_from_prev > percent_treshold_ST & inflexion_ST_to_next > -percent_treshold_ST &MAS_structure!="toneme" ~ "L+H*H-",
inflexion_ST_from_prev < -percent_treshold_ST & inflexion_ST_to_next < -percent_treshold_ST &MAS_structure!="toneme" ~ "H+L*L-"
inflexion_ST_from_prev TOBI_pattern= ifelse(is.na(TOBI_pattern)& !is.na(circumflex_st)&circumflex_st!="no_circumflex", circumflex_st, TOBI_pattern))
),
<- data2%>%left_join(TOBI%>%select(id_word,TOBI_pattern),by="id_word") data2
8.4.6.1 Results
<- TOBI%>%filter(MAS_structure=="toneme",tonic=="yes")%>%group_by(TOBI_pattern)%>%summarise(freq=n())
TOBI_barplot <-ggplot(TOBI_barplot, aes(x=TOBI_pattern, y = freq, fill = TOBI_pattern)) +
p_barplot geom_bar(stat="identity")+theme_minimal()
p_barplot
8.4.6.2 Comparing TOBI and MAS toneme patterns
<- data2%>%filter(MAS_structure=="toneme"&tonic=="yes")%>%select(pattern,TOBI_pattern)%>%filter(!is.na(pattern),!is.na(TOBI_pattern),pattern!="undefined_toneme")
TOBI_MAS
<-ggplot(TOBI_MAS, aes(x=TOBI_pattern, fill = pattern)) +
p2_barplot geom_bar(stat="count")+theme_minimal()
p2_barplot
8.5 Ips
<- words %>%arrange(filename,time_start)%>% mutate(pause = lead(time_start) - time_end,changeip = ifelse(pause > 0.15 | lead(filename)!=filename, "change", NA)) %>% group_by(filename, spk, changeip)
ips # %>%
# mutate(id_ip = row_number(),
# id_ip = ifelse(
# changeip != "change",
# NA,
# paste(spk, time_start, time_end, id_ip, sep = "_")
# )) %>%ungroup()
# ips <- ips%>% fill(id_ip, .direction = "up")
<- ips %>% group_by(id_ip) %>%
ips mutate(
time_start_ip = first(time_start_word, na.rm = TRUE),
time_end_ip = last(time_end_word, na.rm = TRUE), dur= time_end_ip - time_start_ip,
s_rate = n_distinct(id_word)/dur
%>% ungroup() %>%arrange(filename,time_start_ip)
)
<- ips%>% group_by(filename,spk, id_word) %>% summarise(tier_name=first(tier_name),time_start=first(time_start_ip),time_end=last(time_end_ip),id_ip = first(id_ip), word =first(content)) %>% ungroup()
ips <- ips%>%arrange(filename,time_start) %>% group_by(filename, id_ip) %>% mutate(content = paste(word, collapse = " ")) %>%ungroup()
ips <- ips %>% group_by(filename, id_ip) %>% summarise(time_start=first(time_start),time_end=first(time_end),content=first(content),spk=first(spk),tier_name=first(tier_name))%>%ungroup()%>%mutate(time_start=as.numeric(time_start))%>%arrange(filename,time_start)
ips <- ips%>%mutate(time_start= ifelse(time_start>time_end,time_end,time_start)) ips
8.5.1 Ips with POS tagging
<- words
ips # %>%arrange(filename,time_start)%>% mutate(pause = lead(time_start) - time_end,changeip = ifelse(pause > 0.15 | lead(filename)!=filename, "change", NA))
# %>% group_by(filename, spk, changeip)
# %>%
# mutate(id_ip = row_number(),
# id_ip = ifelse(
# changeip != "change",
# NA,
# paste(spk, time_start, time_end, id_ip, sep = "_")
# )) %>%ungroup()
# ips <- ips%>% fill(id_ip, .direction = "up")
<- ips %>% group_by(id_ip) %>%
ips mutate(
time_start_ip = first(time_start_word, na.rm = TRUE),
time_end_ip = last(time_end_word, na.rm = TRUE)
%>% ungroup() %>%arrange(filename,time_start_ip)
) <- ips%>% group_by(filename,spk, id_word) %>% summarise(tier_name=first(tier_name),time_start=first(time_start_ip),time_end=last(time_end_ip),id_ip = first(id_ip), word =first(content),upos=first(upos)) %>% ungroup()
ips <- ips%>%arrange(filename,time_start) %>% group_by(filename, id_ip) %>% mutate(content = paste(word, collapse = " ")) %>%ungroup()
ips <- ips %>% group_by(filename, id_ip) %>% summarise(time_start=first(time_start),time_end=first(time_end),content=first(content),spk=first(spk),tier_name=first(tier_name),qnoun = sum(upos=="NOUN",na.rm = TRUE),qverb = sum(upos=="VERB",na.rm = TRUE),qadj = sum(upos=="ADJ",na.rm = TRUE),qsconj = sum(upos=="SCONJ",na.rm = TRUE),qcconj = sum(upos=="CCONJ",na.rm = TRUE),qadp = sum(upos=="ADP",na.rm = TRUE),qadv = sum(upos=="ADV",na.rm = TRUE),qintj = sum(upos=="INTJ",na.rm = TRUE),qaux = sum(upos=="AUX",na.rm = TRUE),qdet = sum(upos=="DET,na.rm = TRUE"),qpron = sum(upos=="PRON",na.rm = TRUE))%>%ungroup()%>%mutate(time_start=as.numeric(time_start))%>%arrange(filename,time_start)
ips <- ips%>%mutate(time_start= ifelse(time_start>time_end,time_end,time_start)) ips
8.5.2 Ips with sentiment tagging
<- words
ips_sentiment
# %>%arrange(filename,time_start)%>% mutate(pause = lead(time_start) - time_end,changeip = ifelse(pause > 0.15 | lead(filename)!=filename, "change", NA)) %>% group_by(filename, spk, changeip)
# %>%
# mutate(id_ip = row_number(),
# id_ip = ifelse(
# changeip != "change",
# NA,
# paste(spk, time_start, time_end, id_ip, sep = "_")
# )) %>%ungroup()
# ips_sentiment <- ips_sentiment%>% fill(id_ip, .direction = "up")
<- ips_sentiment %>% group_by(id_ip) %>%
ips_sentiment mutate(
time_start_ip = first(time_start_word, na.rm = TRUE),
time_end_ip = last(time_end_word, na.rm = TRUE)
%>% ungroup() %>%arrange(filename,time_start_ip)
) <- ips_sentiment%>% group_by(filename,spk, id_word) %>% summarise(tier_name=first(tier_name),time_start=first(time_start_ip),time_end=last(time_end_ip),id_ip = first(id_ip), word =first(content),sentiment=first(sentiment)) %>% ungroup()
ips_sentiment <- ips_sentiment%>%arrange(filename,time_start) %>% group_by(filename, id_ip) %>% mutate(content = paste(word, collapse = " ")) %>%ungroup()
ips_sentiment
<- ips_sentiment %>% group_by(filename, id_ip) %>% summarise(time_start=first(time_start),time_end=first(time_end),content=first(content),spk=first(spk),tier_name=first(tier_name),qfear = sum(sentiment=="miedo",na.rm = TRUE),qsurprise = sum(sentiment=="sorpresa",na.rm = TRUE),qdisgust = sum(sentiment=="asco",na.rm = TRUE),qjoy = sum(sentiment=="alegría",na.rm = TRUE),qanticip = sum(sentiment=="anticipación",na.rm = TRUE))%>%ungroup()%>%mutate(time_start=as.numeric(time_start))%>%arrange(filename,time_start)
ips_sentiment <- ips_sentiment%>%mutate(time_start= ifelse(time_start>time_end,time_end,time_start)) ips_sentiment
8.5.3 Ips with prosodic values
<- MAS %>%group_by(id_ip)%>%summarise(
ips_MAS
toneme = mean(toneme,na.rm=TRUE),
anacrusis = mean(anacrusis, na.rm=TRUE),
body = mean(body,na.rm=TRUE),
MAS_pattern = first(pattern)
%>%ungroup()
)
<- TOBI %>%filter(MAS_structure=="toneme")%>%group_by(id_ip,TOBI_pattern)%>%summarise(
ips_TOBI freq = n()
%>%ungroup()%>%rename(TOBI=TOBI_pattern)
)
<- data2 %>%group_by(id_ip)%>% summarise(
ips_prosody intensity = mean(intensity_mean_dB),
pitch_reset_from_ST = mean(inflexion_ST_from_prev,na.rm = TRUE),
pitch_reset_to_ST = mean(inflexion_ST_to_next,na.rm = TRUE),
pitch_Hz = mean(center_Hz,na.rm = TRUE),
pitch_St = mean(12*log2(center_Hz/1),na.rm = TRUE)
)
8.5.4 Complete ending unit
<- ips %>%left_join(ips_sentiment%>%select(id_ip,qjoy,qfear,qsurprise,qdisgust,qanticip),by="id_ip")%>%ungroup()
ips <- ips %>% left_join(ips_prosody%>%select(id_ip,pitch_Hz,pitch_St,intensity,pitch_reset_from_ST,pitch_reset_to_ST), by="id_ip")
ips <- ips %>% left_join(ips_MAS%>%select(id_ip,anacrusis,body,toneme,MAS_pattern), by="id_ip")%>%mutate_if(is.numeric,round,2)
ips <- ips %>% left_join(ips_TOBI%>%select(id_ip,TOBI), by="id_ip")
ips <- ips %>% mutate(time_start_ip = time_start, time_end_ip = time_end)
ips
rm(ips_MAS,ips_prosody,ips_sentiment,ips_TOBI,TOBI)
8.6 Turns
<- ips
ips <- ips %>%arrange(filename,time_start)
turnsdb $change <- ifelse(turnsdb$filename == lead(turnsdb$filename,1) & turnsdb$tier_name != lead(turnsdb$tier_name,1) ,turnsdb$id_turn <- "change",
turnsdbifelse(turnsdb$filename != lead(turnsdb$filename,1),
$id_turn <- "change",
turnsdb$id_turn <- NA))
turnsdb<- turnsdb %>% group_by(filename,change) %>% mutate(id_turn = row_number())
turnsdb $id_turn <- ifelse(turnsdb$change != "change", turnsdb$id_turn <- NA, turnsdb$id_turn <- turnsdb$id_turn)
turnsdb
# Orden de las turns
<- turnsdb %>% group_by(filename) %>%
turnsdb fill(id_turn, .direction = "up") #default direction down
# time_start de las turns
$time_start_int <- ifelse(lag(turnsdb$change == "change",1) | turnsdb$id_turn == 1, turnsdb$time_start_int <- turnsdb$time_start, turnsdb$time_start_int <- NA)
turnsdb<- turnsdb %>% group_by(filename) %>%
turnsdb fill(time_start_int, .direction = "down") #default direction down
# time_end de las turns
$time_end_int <- ifelse(turnsdb$change == "change" | turnsdb$id_turn == 1, turnsdb$time_end_int <- turnsdb$time_end, turnsdb$time_end_int <- NA)
turnsdb<- turnsdb %>% group_by(filename) %>%
turnsdb fill(time_end_int, .direction = "up") #default direction down
# Duración de las turns
$dur_int <- turnsdb$time_end_int - turnsdb$time_start_int
turnsdb
<- turnsdb %>% group_by(filename,id_turn) %>% mutate(ordgeint = row_number())
turnsdb
#Creación de las pauses
$pause <- ifelse(turnsdb$filename == lead(turnsdb$filename,1) & turnsdb$tier_name == lead(turnsdb$tier_name,1), lead(turnsdb$time_start,1)-turnsdb$time_end,turnsdb$pause <-NA)
turnsdb$pausecod <- ifelse(turnsdb$pause<500, turnsdb$pausecod <- "/",
turnsdbifelse(turnsdb$pause>=501 & turnsdb$pause <= 1000, turnsdb$pausecod <- "//",
ifelse(turnsdb$pause>1001,turnsdb$pausecod <- paste("/// (", round((turnsdb$pause*0.001),1),")",sep = ""),turnsdb$pausecod <-NA)))
$pausecod[is.na(turnsdb$pausecod)] <- " "
turnsdb
#Creación del valor de fto
<- turnsdb%>%ungroup()%>%mutate(fto_post = ifelse((filename == lead(filename,1)
turnsdb& tier_name != lead(tier_name,1)),
lead(time_start,1)-time_end),NA),
(fto_post_cod = ifelse(fto_post>0 & fto_post<50,"§",NA),
fto_pre = ifelse((filename == lag(filename,1)
& tier_name != lag(tier_name,1)),
-lag(time_end,1)),NA),
(time_startfto_prev_cod = ifelse(fto_pre>0 & fto_pre<50,"§",NA))
<- turnsdb%>%mutate(content1 = ifelse(!is.na(fto_post_cod)|!is.na(fto_prev_cod),paste(fto_prev_cod,content,pausecod,fto_post_cod, sep = ""), content),
turnsdb content1 = gsub("^NA","",content1),
content1 = gsub("NA$","",content1),
content1 = ifelse(!is.na(pausecod),paste(content1,pausecod, sep = ""), content1),
content1 = gsub("NA$","",content1))
#Eliminación de los espacios iniciales
$content1 <- trimws(turnsdb$content1)
turnsdb
#Creación de la variable "intervención" y de la base de datos "turns"
<- turnsdb %>%
turnsdb group_by(filename,id_turn) %>%
mutate(intervencion = paste0(content1, collapse = " "))
$intervencion <- ifelse(turnsdb$change !="change", turnsdb$intervencion <- NA, turnsdb$intervencion <-turnsdb$intervencion)
turnsdb<- turnsdb %>% mutate(id_turn = paste0(spk,"_int_",time_start_int, sep=""),grupo_id=id_ip)
turnsdb <- turnsdb %>% group_by(filename) %>%
turnsdb fill(intervencion, .direction = "up")
<- turnsdb%>%filter(change == "change")%>% select(tier_name,spk,filename,time_end_int,time_start_int,intervencion)
turns <- turns %>% mutate(id_turn = paste(spk,"_int_",time_start_int, sep=""))
turns $intervencion_export <- paste(turns$spk,": ",turns$intervencion,sep = "")
turns<- subset(turns, select=c(id_turn,time_start_int,intervencion_export,filename))
turns_export <- turns_export %>% mutate(inicio = format(as.POSIXct(time_start_int , "UTC", origin = "1970-01-01"), "%H:%M:%OS2"))%>%select(inicio,intervencion_export,filename) turns_export
9 Exporting documents
9.1 Save to database
If you generate new data, you can also add this data to the database without overwriting it. If you want to save generated data to a database, change “eval: false” to “eval: true”.
<- duckdb::dbConnect(duckdb(), "oralstats.duckdb")
conn
if (exists("oralstats.duckdb")) {
# Check if the table exists in the database
if ("oralstats_phones" %in% dbListTables(conn) &"oralstats_words" %in% dbListTables(conn)&"oralstats_ips" %in% dbListTables(conn)&"oralstats_turns" %in% dbListTables(conn)&"oralstats_prosody" %in% dbListTables(conn)) {
# Identify new rows to be added
<- anti_join(as_tibble(data2), dbReadTable(conn, "oralstats_phones"), by = "id_phone")
new_rows <- anti_join(as_tibble(ips), dbReadTable(conn, "oralstats_ips"), by = "id_ip")
new_rows1 <- anti_join(as_tibble(words), dbReadTable(conn, "oralstats_words"), by = "id_word")
new_rows2 <- anti_join(as_tibble(turns), dbReadTable(conn, "oralstats_turns"), by = "id_turn")
new_rows3 <- anti_join(as_tibble(prosody), dbReadTable(conn, "oralstats_prosody"), by = "filename")
new_rows4
# Add new rows to the existing table
if (nrow(new_rows) > 0 & nrow(new_rows1) > 0 & nrow(new_rows2) > 0 & nrow(new_rows3) > 0 & nrow(new_rows4) > 0) {
dbWriteTable(conn, "oralstats_phones", new_rows, append = TRUE, row.names = FALSE)
dbWriteTable(conn, "oralstats_ips", new_rows1, append = TRUE, row.names = FALSE)
dbWriteTable(conn, "oralstats_words", new_rows2, append = TRUE, row.names = FALSE)
dbWriteTable(conn, "oralstats_turns", new_rows3, append = TRUE, row.names = FALSE)
dbWriteTable(conn, "oralstats_prosody", new_rows4, append = TRUE, row.names = FALSE)
dbDisconnect(conn, shutdown = TRUE)
}else {
}} # If table doesn't exist, create it and upload the entire data frame
::dbWriteTable(conn, "oralstats_phones", as_tibble(data2), row.names = FALSE,append = TRUE)
duckdb::dbWriteTable(conn, "oralstats_ips", as_tibble(ips), row.names = FALSE,append = TRUE)
duckdb::dbWriteTable(conn, "oralstats_words", as_tibble(words), row.names = FALSE,append = TRUE)
duckdb::dbWriteTable(conn, "oralstats_turns", as_tibble(turns), row.names = FALSE,append = TRUE)
duckdb::dbWriteTable(conn, "oralstats_prosody", as_tibble(prosody), row.names = FALSE,append = TRUE)
duckdb
::dbGetQuery(conn, "CREATE UNIQUE INDEX idx_id_phone ON oralstats_phones(id_phone)")
DBI::dbGetQuery(conn, "CREATE UNIQUE INDEX idx_id_ip ON oralstats_ips(id_ip)")
DBI::dbGetQuery(conn, "CREATE UNIQUE INDEX idx_id_word ON oralstats_words(id_word)")
DBI::dbGetQuery(conn, "CREATE UNIQUE INDEX idx_id_turn ON oralstats_turns(id_turn)")
DBI::dbGetQuery(conn, "CREATE UNIQUE INDEX idx_id_prosody ON oralstats_prosody(id_prosody)")
DBIdbDisconnect(conn, shutdown = TRUE)
}
9.2 Save to external files
Oralstats will generate copy for all the units processed (ips, words, vowels and turns), in multiple format files: csv and rds. A time reference will be added to the filenames to know exactly when were these files generated.
write_rds(ips,paste("outputs/rds/ips_",Sys.time(),".rds",sep = ""))
write_rds(data2,paste("outputs/rds/data2_",Sys.time(),".rds",sep = ""))
write_rds(turns,paste("outputs/rds/turns_",Sys.time(),".rds",sep = ""))
write_rds(vowels,paste("outputs/rds/vowels_",Sys.time(),".rds",sep = ""))
write_rds(words,paste("outputs/rds/words_",Sys.time(),".rds",sep = ""))
write_rds(prosody,paste("outputs/rds/prosody_",Sys.time(),".rds",sep = ""))
write_csv2(ips,paste("outputs/csv/ips_",Sys.time(),".csv",sep = ""))
write_csv2(data2,paste("outputs/csv/data2_",Sys.time(),".csv",sep = ""))
write_csv2(turns,paste("outputs/csv/turns_",Sys.time(),".csv",sep = ""))
write_csv2(vowels,paste("outputs/csv/vowels_",Sys.time(),".csv",sep = ""))
write_csv2(words,paste("outputs/csv/words_",Sys.time(),".csv",sep = ""))
write_csv2(prosody,paste("outputs/csv/prosody_",Sys.time(),".csv",sep = ""))
9.3 Create textgrids
With this section, Oralstats will generate textgrids with tiers including vowels, and also its corresponding TOBI and MAS tags.
#
# pauses <- words %>%group_by(filename,spk)%>%mutate(time_start_word=time_end_word,time_end_word=time_end_word+pause, content=NA)%>%filter(changeip=="change")%>%ungroup()
#
# upos <- words %>%mutate(tier_name = paste0("upos_",tier_name),content=upos)
#
# pauses_u <- words %>%group_by(filename,spk)%>%mutate(time_start_word=time_end_word,time_end_word=time_end_word+pause, content=NA,tier_name = paste0("upos_",tier_name))%>%filter(changeip=="change")%>%ungroup()
#
# upos_pauses <- rbind(upos,pauses_u)%>%arrange(filename,time_start_word)%>%group_by(filename,spk)%>%slice(if(n() > 1)-n() else row_number())%>%ungroup()
#
# words_pauses <- rbind(words,pauses)%>%arrange(filename,time_start_word)%>%group_by(filename,spk)%>%slice(if(n() > 1)-n() else row_number())%>%ungroup()
#
# complete <- rbind(words_pauses,upos_pauses)%>%arrange(filename,time_start_word)
#
# generate_textgrid_file <- function(filename, df) {
# # Subset dataframe for the current filename
# df_subset <- df
#
# # Set xmin and xmax
# xmin <- min(df_subset$time_start_word)
# xmax <- max(df_subset$time_end_word)
#
# # Open a file connection to write TextGrid format
# file_conn <- file(paste0(filename, ".TextGrid"), "w")
#
# # Write TextGrid header
# cat("File type = \"ooTextFile\"\n", file = file_conn)
# cat("Object class = \"TextGrid\"\n\n", file = file_conn)
#
# cat("xmin = ", xmin, "\n", file = file_conn)
# cat("xmax = ", xmax, "\n", file = file_conn)
# cat("tiers? <exists>\n", file = file_conn)
#
# # Get unique tier names
# unique_tiers <- unique(df_subset$tier_name)
# cat("size = ", length(unique_tiers), "\n", file = file_conn)
#
# # Loop through unique tiers
# for (tier_name in unique_tiers) {
# cat("\nitem []:\n", file = file_conn)
# cat(" item [1]:\n", file = file_conn)
# cat(" class = \"IntervalTier\"\n", file = file_conn)
# cat(" name = \"", tier_name, "\"\n", sep = "", file = file_conn)
# cat(" xmin = ", xmin, "\n", file = file_conn)
# cat(" xmax = ", xmax, "\n", file = file_conn)
#
# # Subset dataframe for the current tier
# tier_df <- df_subset[df_subset$tier_name == tier_name, ]
#
# cat(" intervals: size = ", nrow(tier_df), "\n", file = file_conn)
#
# # Write intervals
# for (i in 1:nrow(tier_df)) {
# cat(" intervals [",i,"]:\n", file = file_conn,sep = "")
# cat(" xmin = ", tier_df$time_start_word[i], "\n", file = file_conn)
# cat(" xmax = ", tier_df$time_end_word[i], "\n", file = file_conn)
# cat(" text = \"", tier_df$content[i], "\"\n", sep = "", file = file_conn)
# }
# }
#
# # Close the file connection
# close(file_conn)
# }
# words_pauses %>%group_by(filename)
# complete %>%group_by(filename) %>% group_walk(~ generate_textgrid_file(.y$filename, .x))
9.4 Create textgrids MAS & TOBI
# generate_textgrid_file2 <- function(filename, df) {
# # Subset dataframe for the current filename
# df_subset <- df
#
# # Set xmin and xmax
# xmin <- min(df_subset$time_start)
# xmax <- max(df_subset$time_end)
#
# # Open a file connection to write TextGrid format
# file_conn <- file(paste0(filename, ".TextGrid"), "w")
#
# # Write TextGrid header
# cat("File type = \"ooTextFile\"\n", file = file_conn)
# cat("Object class = \"TextGrid\"\n\n", file = file_conn)
#
# cat("xmin = ", xmin, "\n", file = file_conn)
# cat("xmax = ", xmax, "\n", file = file_conn)
# cat("tiers? <exists>\n", file = file_conn)
#
# # Get unique tier names
# unique_tiers <- unique(df_subset$tier_name)
# cat("size = ", length(unique_tiers), "\n", file = file_conn)
#
# # Loop through unique tiers
# for (tier_name in unique_tiers) {
# cat("\nitem []:\n", file = file_conn)
# cat(" item [1]:\n", file = file_conn)
# cat(" class = \"IntervalTier\"\n", file = file_conn)
# cat(" name = \"", tier_name, "\"\n", sep = "", file = file_conn)
# cat(" xmin = ", xmin, "\n", file = file_conn)
# cat(" xmax = ", xmax, "\n", file = file_conn)
#
# # Subset dataframe for the current tier
# tier_df <- df_subset[df_subset$tier_name == tier_name, ]
#
# cat(" intervals: size = ", nrow(tier_df), "\n", file = file_conn)
#
# # Write intervals
# for (i in 1:nrow(tier_df)) {
# cat(" intervals [",i,"]:\n", file = file_conn,sep = "")
# cat(" xmin = ", tier_df$time_start[i], "\n", file = file_conn)
# cat(" xmax = ", tier_df$time_end[i], "\n", file = file_conn)
# cat(" text = \"", tier_df$content[i], "\"\n", sep = "", file = file_conn)
# }
# }
#
# # Close the file connection
# close(file_conn)
# }
<- "outputs/textgrids/"
folder_path <- function(filename, df, folder_path) {
generate_textgrid_file2 # Subset dataframe for the current filename
<- df
df_subset
# Set xmin and xmax
<- min(df_subset$time_start)
xmin <- max(df_subset$time_end)
xmax
# Construct the full file path
<- file.path(folder_path, paste0(filename, ".TextGrid"))
file_path
# Open a file connection to write TextGrid format
<- file(file_path, "w")
file_conn
# Write TextGrid header
cat("File type = \"ooTextFile\"\n", file = file_conn)
cat("Object class = \"TextGrid\"\n\n", file = file_conn)
cat("xmin = ", xmin, "\n", file = file_conn)
cat("xmax = ", xmax, "\n", file = file_conn)
cat("tiers? <exists>\n", file = file_conn)
# Get unique tier names
<- unique(df_subset$tier_name)
unique_tiers cat("size = ", length(unique_tiers), "\n", file = file_conn)
# Loop through unique tiers
for (tier_name in unique_tiers) {
cat("\nitem []:\n", file = file_conn)
cat(" item [1]:\n", file = file_conn)
cat(" class = \"IntervalTier\"\n", file = file_conn)
cat(" name = \"", tier_name, "\"\n", sep = "", file = file_conn)
cat(" xmin = ", xmin, "\n", file = file_conn)
cat(" xmax = ", xmax, "\n", file = file_conn)
# Subset dataframe for the current tier
<- df_subset[df_subset$tier_name == tier_name, ]
tier_df
cat(" intervals: size = ", nrow(tier_df), "\n", file = file_conn)
# Write intervals
for (i in 1:nrow(tier_df)) {
cat(" intervals [",i,"]:\n", file = file_conn,sep = "")
cat(" xmin = ", tier_df$time_start[i], "\n", file = file_conn)
cat(" xmax = ", tier_df$time_end[i], "\n", file = file_conn)
cat(" text = \"", tier_df$content[i], "\"\n", sep = "", file = file_conn)
}
}
# Close the file connection
close(file_conn)
}
<- phones%>%mutate(pause= ifelse((lead(time_start)-time_end) > 0.01 ,(lead(time_start)-time_end),NA),tier_name=spk,MAS_structure=NA,id_ip=NA)%>%select(filename,tier_name,spk,time_start,time_end, id_ip,content,pause,MAS_structure)
phones_tg
<- phones_tg %>%group_by(filename,spk)%>%mutate(time_start=time_end,time_end=time_end+pause, content=NA)%>%filter(pause>0.01)%>%ungroup()
phones_pauses
<- rbind(phones_tg,phones_pauses)%>%arrange(filename,time_start)%>%group_by(filename,spk)%>%slice(if(n() > 1)-n() else row_number())%>%ungroup()
phones_pauses2
<- data2%>%mutate(tier_name=paste(spk,"_2_w",sep=""))%>%group_by(filename,tier_name,spk,id_word)%>%summarise(time_start=first(time_start_word),time_end=last(time_end_word),content=first(word))%>%ungroup()%>%mutate(pause= ifelse((lead(time_start)-time_end) > 0.01 ,(lead(time_start)-time_end),NA),MAS_structure=NA,id_ip=NA)%>%select(filename,tier_name,spk,time_start,time_end, id_ip,content,pause,MAS_structure)
words_tg
<- words_tg %>%group_by(filename,spk)%>%mutate(time_start=time_end,time_end=time_end+pause, content=NA)%>%filter(pause>0.01)%>%ungroup()
words_pauses
<- rbind(words_tg,words_pauses)%>%arrange(filename,time_start)%>%group_by(filename,spk)%>%slice(if(n() > 1)-n() else row_number())%>%ungroup()
words_pauses2
<- data2%>%mutate(tier_name=paste(spk,"_4_TOBI",sep=""))%>%group_by(filename,tier_name,spk,id_word,TOBI_pattern)%>%summarise(time_start=first(time_start_word),time_end=last(time_end_word),content=first(TOBI_pattern))%>%ungroup()%>%mutate(pause= ifelse((lead(time_start)-time_end) > 0.01 ,(lead(time_start)-time_end),NA),MAS_structure=NA,id_ip=NA)%>%select(filename,tier_name,spk,time_start,time_end, id_ip,content,pause,MAS_structure)
TOBI_tg
<- TOBI_tg %>%group_by(filename,spk)%>%mutate(time_start=time_end,time_end=time_end+pause, content=NA)%>%filter(pause>0.01)%>%ungroup()
TOBI_pauses
<- rbind(TOBI_tg,TOBI_pauses)%>%arrange(filename,time_start)%>%group_by(filename,spk)%>%slice(if(n() > 1)-n() else row_number())%>%ungroup()
TOBI_pauses2
<- data2%>%mutate(pause= ifelse((lead(time_start_word)-time_end_word) > 0.01 &lead(id_ip)!=id_ip,(lead(time_start_word)-time_end_word),NA),tier_name=paste("vowels_",spk,sep=""))%>%select(filename,id_ip,tier_name,spk,time_start,time_end, content,pause,MAS_structure)
MAS_tg
<- MAS_tg %>%group_by(filename,spk)%>%mutate(time_start=time_end,time_end=time_end+pause, content=NA)%>%filter(pause>0.01)%>%ungroup()
MAS_pauses
<- rbind(MAS_tg,MAS_pauses)%>%arrange(filename,time_start)%>%group_by(filename,spk)%>%slice(if(n() > 1)-n() else row_number())%>%ungroup()
MAS_pauses2
<- data2%>%mutate(pause= ifelse((lead(time_start_word)-time_end_word) > 0.01 &lead(id_ip)!=id_ip,(lead(time_start_word)-time_end_word),NA),tier_name=paste(spk,"_3_MAS",sep=""))%>%group_by(filename,tier_name,id_ip,MAS_structure)%>%summarise(time_start=first(time_start),time_end=last(time_end),spk=first(spk),content=first(content),pause=last(pause))%>%mutate(content=MAS_structure)
MAS_structure
<- MAS_structure %>%group_by(filename,spk)%>%mutate(time_start=time_end,time_end=time_end+pause, content=NA)%>%filter(pause>0.01)%>%ungroup()
MAS_str
<- rbind(MAS_structure,MAS_str)%>%arrange(filename,time_start)%>%group_by(filename,spk)
MAS_str_pauses
<- rbind(phones_pauses2,words_pauses2,MAS_str_pauses,TOBI_pauses2) %>%arrange(filename,tier_name,time_start)%>%group_by(filename, tier_name)%>%mutate(time_end=lead(time_start))%>%slice(if(n() > 1)-n() else row_number())%>%ungroup()%>%mutate(content=ifelse(is.na(content),"",content))
completeMAS
%>%group_by(filename) %>% group_walk(~ generate_textgrid_file2(.y$filename, .x,folder_path)) completeMAS
9.5 Create plots MAS
dir.create("outputs/plots_MAS")
# Generate and save plots by group
# setDF(data2)
# data3 <- data2%>%left_join(MAS%>%select(id_ip,pattern),by="id_ip")%>%head(n=60)%>%mutate(time_center = time_start+((time_end-time_start)/2))
# data3q1 <- data3%>%filter(q1piHz_q2piHZ>10 | q1piHz_q2piHZ< -10)%>%rowwise()%>%mutate(center_Hz = (q1piHz + q2piHz)/2, time_center = (q1piHz_time + q2piHz_time)/2,standard_pitch = standard_pitch + (standard_pitch* (q1piHz_q2piHZ/100)))
# data3q4 <- data3%>%filter(q3piHz_q4piHZ>10 | q3piHz_q4piHZ< -10)%>%rowwise()%>%mutate(center_Hz = (q3piHz + q4piHz)/2, time_center = (q3piHz_time + q4piHz_time)/2, standard_pitch = standard_pitch + (standard_pitch* (q3piHz_q4piHZ/100)))
#
# dataplot <- rbind(data3,data3q1,data3q4)%>%arrange(filename,time_center)%>%mutate(time_center_table=ifelse(
# between(lag(time_center,1),time_center-0.05,time_center+0.05),time_center+0.04,time_center))
<- data2%>%mutate(time_center = time_start+((time_end-time_start)/2))
data3 <- data3%>%filter(q1piHz_q2piHZ>10 | q1piHz_q2piHZ< -10)%>%rowwise()%>%mutate(center_Hz = (q1piHz + q2piHz)/2, time_center = (q1piHz_time + q2piHz_time)/2,standard_pitch = standard_pitch + (standard_pitch* (q1piHz_q2piHZ/100)), time_center=ifelse(is.na(time_center),(time_start+((time_end-time_start)/2))-0.01,time_center))
data3q1 <- data3%>%filter(q3piHz_q4piHZ>10 | q3piHz_q4piHZ< -10)%>%rowwise()%>%mutate(center_Hz = (q3piHz + q4piHz)/2, time_center = (q3piHz_time + q4piHz_time)/2, standard_pitch = standard_pitch + (standard_pitch* (q3piHz_q4piHZ/100)),time_center=ifelse(is.na(time_center),(time_start+((time_end-time_start)/2))+0.01,time_center))
data3q4
<- rbind(data3,data3q1,data3q4)%>%arrange(filename,time_center)%>%mutate(time_center_table=ifelse(
dataplot between(lag(time_center,1),time_center-0.05,time_center+0.05),time_center+0.04,time_center))
setDF(dataplot)
<- lapply(unique(dataplot$id_ip), function(grp) {
plots
<- dataplot %>%filter(id_ip == grp)
plot_data
# plot_data <- dataplot
<- ggplot() +
p1 # geom_line(data = plot_data, aes(x = q3piHz_time, y = standard_q3_pitch))+
geom_bar(data = plot_data,stat = "identity", aes(x = time_center, y = standard_dur),fill="lightgrey")+
geom_line(data = plot_data, aes(x = time_center, y = standard_pitch),color="blue")+
geom_line(data = plot_data, aes(x = time_center, y = standard_intensity),color="green") +
geom_point(data = plot_data, aes(x = time_center, y = standard_pitch, color = word, shape=MAS_structure),size=5) +
geom_text(data = plot_data, aes(x = time_center, y = standard_pitch, label = content), vjust = 0.1, hjust = 0.5,size=2.5) + ggtitle(paste("Group:", unique(plot_data$id_ip),unique(plot_data$ip),unique(plot_data$pattern)))+ theme_light() +
theme(legend.position = "right",plot.title = element_text(size = 6),legend.text = element_text(size=6),legend.title = element_text(size = 6), axis.title=element_text(size=6,face="bold"))+guides(size="none",colour = guide_legend(override.aes = list(size=3))) + xlab("Timeline") + ylab("Prosodic features")
<- gridExtra::ttheme_default(
mytheme core = list(fg_params=list(cex = 0.5)),
colhead = list(fg_params=list(cex = 0.5)),
rowhead = list(fg_params=list(cex = 0.5)))
<- plot_data %>%
p2 mutate(
Hz = standard_pitch,
dB = standard_intensity,
dur= standard_dur) %>%mutate(Hz=round(Hz,1),dB=round(dB,1),dur=round(dur,1))%>%arrange(time_start)%>% select(Hz,dB,dur,content)
<- t(p2)
p3
if (nrow(p2) > 0) {
<- tableGrob(p3, theme = mytheme)
table_grob
else {
} <- data.frame (Hz = "0",
empty_df dB = "0",
dur = "0",
content ="0"
)<- tableGrob(empty_df, theme = mytheme)
table_grob
}
# Combine plot and table
<-grid.arrange(p1, table_grob, ncol = 1, heights = c(3, 1))
p3
# plot_data <- dataplot %>%
# filter(id_ip == grp)
#
# p1 <- ggplot() +
# # geom_line(data = plot_data, aes(x = q3piHz_time, y = standard_q3_pitch))+
# geom_bar(data = plot_data,stat = "identity", aes(x = time_center, y = standard_dur),fill="lightgrey")+
# geom_line(data = plot_data, aes(x = time_center, y = standard_pitch),color="blue")+
# geom_line(data = plot_data, aes(x = time_center, y = standard_intensity),color="green") +
# geom_point(data = plot_data, aes(x = time_center, y = standard_pitch, color = word, shape=MAS_structure),size=5) +
# geom_text(data = plot_data, aes(x = time_center, y = standard_pitch, label = content), vjust = 0.1, hjust = 0.5,size=3.2) +
# # geom_text(data = plot_data, aes(x = q3piHz_time, y = standard_q3_pitch, label = content), vjust = -0.5, hjust = 0.5)+
# # geom_text(data = plot_data, aes(x = q4piHz_time, y = standard_q4_pitch, label = content), vjust = -0.5, hjust = 0.5) +
# ggtitle(paste("Group:", grp,dataplot%>%filter(id_ip==grp)%>%select(ip)%>%head(n=1),dataplot%>%filter(id_ip==grp)%>%select(pattern)%>%head(n=1))) + theme_light() +
# theme(legend.position = "right",plot.title = element_text(size = 10),legend.text = element_text(size=10),legend.title = element_text(size = 10), axis.title=element_text(size=10,face="bold"))+guides(size="none") + xlab("Timeline") + ylab("Prosodic features")
#
#
# p2 <- plot_data %>%
# mutate(
# Hz = standard_pitch,
# dB = standard_intensity,
# dur= standard_dur) %>%mutate(Hz=round(Hz,1),dB=round(dB,1),dur=round(dur,1))%>%
# pivot_longer(c("Hz","dB","dur"), names_to = "layer", values_to = "label") %>%
# ggplot(aes(x = time_center_table)) +
# geom_text(aes(y = factor(layer, c("Hz","dB","dur")), label = label), size = 1.4) +
# labs(y = "", x = NULL) +
# theme_grey()+
# theme(axis.line = element_blank(), axis.ticks = element_blank(), axis.text.x = element_blank(),
# panel.grid = element_blank(),legend.text = element_text(size=1), strip.text = element_blank())
#
# p3 <- p1 + p2 + plot_layout(ncol = 1,heights = c(5,1))
<- paste0("outputs/plots_MAS/",grp,"_",unique(plot_data$pattern), "_MAS.png")
filename ggsave(filename, plot = p3, width = 10, height = 6)
return(filename)
})
9.6 Create plots TOBI
<- data2%>%mutate(time_center = time_start+((time_end-time_start)/2))%>%mutate(center_ST=12*log2(center_Hz/1))%>%mutate(time_center_table=ifelse(
data4 between(lag(time_center,1),time_center-0.05,time_center+0.05),time_center+0.04,time_center))
<- lapply(unique(data2$id_ip), function(grp) {
plots <- data4 %>%filter(id_ip == grp)
plot_data
<- ggplot() +
p1 # geom_line(data = plot_data, aes(x = q3piHz_time, y = standard_q3_pitch))+
geom_line(data = plot_data, aes(x = time_center, y = center_ST),color="blue")+
geom_point(data = plot_data, aes(x = time_center, y = center_ST, color = word),size=5) +
geom_text(data = plot_data, aes(x = time_center, y = center_ST, label = content), vjust = 0.1, hjust = 0.5,size=3.2) +
geom_text(data = plot_data, aes(x = time_center, y = center_ST, label = TOBI_pattern), vjust = -1, hjust = 0.5,size=3.2) +
# geom_text(data = plot_data, aes(x = q3piHz_time, y = standard_q3_pitch, label = content), vjust = -0.5, hjust = 0.5)+
# geom_text(data = plot_data, aes(x = q4piHz_time, y = standard_q4_pitch, label = content), vjust = -0.5, hjust = 0.5) +
ggtitle(paste("Group:", grp,data4%>%filter(id_ip==grp)%>%select(ip)%>%head(n=1),data4%>%filter(id_ip==grp)%>%select(TOBI_pattern)%>%head(n=1))) + theme_light() +
theme(legend.position = "right",plot.title = element_text(size = 10),legend.text = element_text(size=10),legend.title = element_text(size = 10), axis.title=element_text(size=10,face="bold"))+guides(size="none") + xlab("Timeline") + ylab("Prosodic features")
<- plot_data %>%
p2 mutate(ST = round(center_ST, 1)) %>%
pivot_longer(cols = c("ST"), names_to = "layer", values_to = "label") %>%
ggplot(aes(x = time_center)) +
geom_text(aes(y = layer, label = label), size = 1.4) +
labs(y = "", x = NULL) +
theme_gray() +
theme(axis.line = element_blank(),
axis.ticks = element_blank(),
axis.text.x = element_blank(),
panel.grid = element_blank(),
legend.text = element_text(size = 1),
strip.text = element_blank())
<- p1 + p2 + plot_layout(ncol = 1,heights = c(5,1))
p3
<- paste0("outputs/plots_TOBI/",grp,"_",data4%>%filter(id_ip==grp)%>%select(TOBI_pattern)%>%head(n=1), "_TOBI.png")
filename ggsave(filename, plot = p3, width = 8, height = 4)
return(filename)
})
10 Statistics
10.1 Descriptive statistics
library(psych)
describe(ips$pitch_St)
vars n mean sd median trimmed mad min max range skew kurtosis
X1 1 2105 84.95 6.46 86.75 84.8 8.23 74.99 105.23 30.24 0 -1.26
se
X1 0.14
describeBy(x = ips$pitch_St, group = ips$spk)
Descriptive statistics by group
group: 1990_alto_01_e
vars n mean sd median trimmed mad min max range skew kurtosis
X1 1 1219 89.66 3.51 89.44 89.61 3.17 76.16 105.23 29.07 0 1.85
se
X1 0.1
------------------------------------------------------------
group: 1990_alto_01_i
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 106 82.89 4.96 81.4 82.32 4.51 75.11 98.86 23.75 0.99 0.36 0.48
------------------------------------------------------------
group: 1990_alto_02_e
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 693 77.51 1.56 77.22 77.32 1.2 74.99 90.01 15.02 2.3 10.91 0.06
------------------------------------------------------------
group: 1990_alto_02_i
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 87 80.73 3.66 80 80.44 3.75 75.24 91.14 15.9 0.67 -0.07 0.39
10.2 Histograms
<- ggplot(data2, aes(x=center_Hz)) + geom_histogram()
p + geom_vline(aes(xintercept=mean(center_Hz,na.rm=TRUE)),
pcolor="blue", linetype="dashed", size=1)
10.3 Boxplots
<- ggplot(data2, aes(x=spk, y=standard_intensity, fill=spk)) + geom_boxplot()
p_boxplot p_boxplot
10.4 Barplots
<- MAS%>%group_by(pattern)%>%summarise(freq=n())
MAS_barplot <-ggplot(MAS_barplot, aes(x=pattern, y = freq, fill = pattern)) +
p_barplot geom_bar(stat="identity")+theme_minimal()
p_barplot
10.5 Correlation plots
library(corrplot)
<- ips%>%select(pitch_Hz,anacrusis,toneme,body)%>%na.omit()
corplotsel
corrplot(cor(corplotsel),method = "circle",tl.cex = 0.6)
10.6 Chi-square test
10.6.1 Chi-square result
<- chisq.test(table(data2$MAS_structure,data2$maximum_Hz))
chisqresults chisqresults
Pearson's Chi-squared test
data: table(data2$MAS_structure, data2$maximum_Hz)
X-squared = 650.15, df = 2, p-value < 2.2e-16
10.6.2 Residuals
$residuals chisqresults
no yes
anacrusis -4.635226 14.749630
body 3.686061 -11.729316
toneme -4.833758 15.381371
10.7 Mosaic plots
library(vcd)
<- mosaic(~ maximum_Hz + MAS_structure , data = data2, shade =TRUE) mosaicout
mosaicout
MAS_structure anacrusis body toneme
maximum_Hz
no 983 16612 3851
yes 269 1126 723
10.8 Decision trees
library(partykit)
library(ggparty)
<- ips%>%select(spk,MAS_pattern,TOBI,intensity,pitch_St,intensity,pitch_reset_to_ST,pitch_reset_from_ST,toneme,anacrusis,body)%>%mutate_if(is.character,as.factor)%>%na.omit()
treesel setDF(treesel)
<- ctree(spk ~.,data=treesel,control = ctree_control(maxdepth = 3))
treeout
<- ggparty(treeout) +
p geom_edge() +
geom_edge_label() +
geom_node_label(
line_list = list(
aes(label = splitvar),
aes(label = paste("N =", nodesize))
),line_gpar = list(
list(size = 10),
list(size = 10)
),ids = "inner"
+
) geom_node_label(aes(label = paste0("N = ", nodesize)),
ids = "terminal", nudge_y = -0.35, nudge_x = 0.02
+
) geom_node_plot(
gglist = list(
geom_bar(aes(x = "", fill = spk),
position = position_fill(), color = "black"
),theme_minimal(),
xlab(""), ylab("porcentaje")
)
)plot(p)
10.9 Heatmap
<- ips%>%group_by(MAS_pattern) %>% summarize_if(., is.numeric, median, na.rm = TRUE)%>%na.omit()%>%ungroup()
p <- p %>%column_to_rownames("MAS_pattern")
p heatmap(as.matrix(p), scale = "column")
10.10 Principal component analysis
library(FactoMineR)
library(factoextra)
<- MAS%>%filter(is.na(pattern))%>%select(inflexion_percent_Hz_from_prev,inflexion_percent_Hz_to_next,anacrusis,body_2,toneme,displacement,circumflex,intensity_mean_dB,first_tonic_displacement,anacrusis_displacement)%>%mutate(first_tonic_displacement=replace_na(first_tonic_displacement,"no"),circumflex=replace_na(circumflex,"no"))%>%na.omit()
selection <- FAMD(selection, graph = FALSE)
d summary(d)
Call:
FAMD(base = selection, graph = FALSE)
Eigenvalues
Dim.1 Dim.2 Dim.3 Dim.4 Dim.5
Variance 2.816 2.092 1.476 1.328 1.106
% of var. 11.734 8.716 6.152 5.533 4.608
Cumulative % of var. 11.734 20.451 26.602 32.135 36.743
Individuals (the 10 first)
Dist Dim.1 ctr cos2 Dim.2 ctr
1 | 4.960 | -0.887 0.041 0.032 | 3.173 0.705
2 | 4.507 | 0.553 0.016 0.015 | 1.526 0.163
3 | 9.464 | -0.612 0.019 0.004 | 0.708 0.035
4 | 2.962 | 1.366 0.097 0.213 | 1.846 0.239
5 | 3.125 | 0.914 0.043 0.086 | 2.208 0.341
6 | 10.999 | 0.386 0.008 0.001 | 1.977 0.273
7 | 3.992 | -1.817 0.172 0.207 | 1.602 0.180
8 | 3.439 | 0.934 0.045 0.074 | -2.122 0.315
9 | 11.854 | 1.828 0.174 0.024 | 2.622 0.481
10 | 2.619 | -2.039 0.216 0.606 | 0.942 0.062
cos2 Dim.3 ctr cos2
1 0.409 | 2.163 0.464 0.190 |
2 0.115 | 1.030 0.105 0.052 |
3 0.006 | -1.674 0.278 0.031 |
4 0.389 | -1.133 0.127 0.146 |
5 0.499 | -0.834 0.069 0.071 |
6 0.032 | 2.080 0.429 0.036 |
7 0.161 | 1.477 0.216 0.137 |
8 0.381 | 1.071 0.114 0.097 |
9 0.049 | 0.005 0.000 0.000 |
10 0.129 | 0.048 0.000 0.000 |
Continuous variables
Dim.1 ctr cos2 Dim.2 ctr cos2
inflexion_percent_Hz_from_prev | -0.458 7.438 0.209 | 0.519 12.882 0.269 |
inflexion_percent_Hz_to_next | 0.842 25.172 0.709 | 0.457 10.000 0.209 |
anacrusis | 0.194 1.331 0.037 | -0.381 6.934 0.145 |
body_2 | -0.505 9.070 0.255 | 0.597 17.061 0.357 |
toneme | 0.842 25.193 0.709 | 0.464 10.275 0.215 |
intensity_mean_dB | -0.182 1.170 0.033 | 0.137 0.898 0.019 |
Dim.3 ctr cos2
inflexion_percent_Hz_from_prev 0.436 12.879 0.190 |
inflexion_percent_Hz_to_next -0.007 0.003 0.000 |
anacrusis 0.101 0.684 0.010 |
body_2 0.367 9.114 0.135 |
toneme -0.003 0.001 0.000 |
intensity_mean_dB -0.366 9.088 0.134 |
Categories (the 10 first)
Dim.1 ctr cos2 v.test Dim.2
no | -0.590 3.373 0.659 -16.681 | -0.306
yes | 1.946 11.115 0.659 16.681 | 1.010
asc_desc | -1.616 2.411 0.166 -7.069 | 1.528
asc_desc_asc | 2.381 0.209 0.016 2.008 | -0.011
asc_q1 | -0.196 0.008 0.001 -0.391 | 1.440
asc_q1q2 | 0.034 0.000 0.000 0.029 | 2.655
asc_q1q2q3 | 1.140 0.024 0.002 0.679 | 5.644
asc_q2 | 1.958 0.425 0.033 2.869 | -0.297
asc_q2q3 | 4.147 1.588 0.118 5.542 | 2.449
asc_q3 | -0.270 0.011 0.001 -0.457 | 0.823
ctr cos2 v.test Dim.3 ctr
no 1.646 0.178 -10.044 | 0.015 0.008
yes 5.425 0.178 10.044 | -0.048 0.025
asc_desc 3.904 0.148 7.752 | 1.933 12.546
asc_desc_asc 0.000 0.000 -0.011 | 5.747 4.436
asc_q1 0.763 0.033 3.327 | 0.867 0.555
asc_q1q2 0.472 0.021 2.598 | 0.154 0.003
asc_q1q2q3 1.066 0.046 3.902 | 1.877 0.237
asc_q2 0.018 0.001 -0.504 | 2.532 2.584
asc_q2q3 1.003 0.041 3.797 | 0.427 0.061
asc_q3 0.181 0.008 1.617 | -0.789 0.335
cos2 v.test
no 0.000 0.574 |
yes 0.000 -0.574 |
asc_desc 0.237 11.675 |
asc_desc_asc 0.094 6.693 |
asc_q1 0.012 2.384 |
asc_q1q2 0.000 0.179 |
asc_q1q2q3 0.005 1.545 |
asc_q2 0.055 5.123 |
asc_q2q3 0.001 0.788 |
asc_q3 0.007 -1.847 |
10.10.1 Dimension contribution
fviz_eig(d)
10.10.2 Contribution of categories
fviz_contrib(d, choice="var", axes = 1:2)
10.10.3 Chart of individuals
<- d
res.pca <- as.factor(selection$displacement)
groups <- fviz_pca_ind(res.pca,
d_plothabillage = groups, # color by groups
# palette = c("#00AFBB", "#FC4E07"),
legend.title = "Groups",
repel = FALSE)
d_plot
10.11 Cluster classification based on PCA
= HCPC(d,nb.clust = -1) res.hcpc
10.11.0.1 Chart of clusters
fviz_cluster(res.hcpc, ellipse.type = "norm")
10.11.0.2 Characteristics of groups
<- res.hcpc$data.clust
plotclust <- ctree(clust~., data=plotclust,control = ctree_control(maxdepth = 3))
plottree2
<- ggparty(plottree2) +
ptree geom_edge() +
geom_edge_label() +
geom_node_label(
line_list = list(
aes(label = splitvar),
aes(label = paste("N =", nodesize))
),line_gpar = list(
list(size = 10),
list(size = 10)
),ids = "inner"
+
) geom_node_label(aes(label = paste0("N = ", nodesize)),
ids = "terminal", nudge_y = -0.35, nudge_x = 0.02
+
) geom_node_plot(
gglist = list(
geom_bar(aes(x = "", fill = clust),
position = position_fill(), color = "black"
),theme_minimal(),
xlab(""), ylab("porcentaje")
)
)
plot(ptree)
plottree2
Model formula:
clust ~ inflexion_percent_Hz_from_prev + inflexion_percent_Hz_to_next +
anacrusis + body_2 + toneme + displacement + circumflex +
intensity_mean_dB + first_tonic_displacement + anacrusis_displacement
Fitted party:
[1] root
| [2] displacement in no
| | [3] circumflex in asc_desc, asc_q1, asc_q1q2, asc_q3, desc_asc_desc, desc_q1, desc_q1q2q3, desc_q2, desc_q3, no, no_circumflex
| | | [4] anacrusis_displacement in no: 1 (n = 316, err = 10.1%)
| | | [5] anacrusis_displacement in yes: 2 (n = 201, err = 18.9%)
| | [6] circumflex in asc_desc_asc, asc_q2, asc_q2q3: 2 (n = 7, err = 42.9%)
| [7] displacement in yes
| | [8] first_tonic_displacement in no
| | | [9] body_2 <= 35.51: 3 (n = 110, err = 8.2%)
| | | [10] body_2 > 35.51: 1 (n = 13, err = 46.2%)
| | [11] first_tonic_displacement in yes
| | | [12] toneme <= 21.87: 2 (n = 25, err = 4.0%)
| | | [13] toneme > 21.87: 3 (n = 11, err = 18.2%)
Number of inner nodes: 6
Number of terminal nodes: 7
10.12 ANOVA & HSD-Tukey
<- aov(toneme ~ spk,data=ips)
anovatest summary(anovatest)
Df Sum Sq Mean Sq F value Pr(>F)
spk 3 12125 4042 5.755 0.000647 ***
Residuals 1752 1230438 702
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
351 observations deleted due to missingness
10.12.1 HSD-Tukey results
TukeyHSD(anovatest)
Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = toneme ~ spk, data = ips)
$spk
diff lwr upr p adj
1990_alto_01_i-1990_alto_01_e -0.7209639 -8.406498 6.964570 0.9950649
1990_alto_02_e-1990_alto_01_e 1.0347812 -2.551274 4.620837 0.8800700
1990_alto_02_i-1990_alto_01_e 13.2886832 4.986136 21.591230 0.0002359
1990_alto_02_e-1990_alto_01_i 1.7557451 -6.185572 9.697062 0.9414243
1990_alto_02_i-1990_alto_01_i 14.0096471 3.094657 24.924637 0.0054330
1990_alto_02_i-1990_alto_02_e 12.2539020 3.714033 20.793771 0.0013209
10.13 Complete descriptive stats report with Data Explorer
library(DataExplorer)
<- data2%>%select(content,tonic,accent,pitch_mean_HZ,intensity_mean_dB,center_Hz,displacement,MAS_structure)
data_report create_report(data_report)
11 Oralstats_Viewer
library(shiny)
runApp("oralstats_viewer/oralstats_viewer.R")