Yorùbá vowel deletion involves compensatory lengthening: Evidence from phonetics

A phonetic pilot study of Yorùbá vowel deletion shows that the vowel that remains after an adjacent vowel deletes is slightly but significantly longer than a short vowel in non-deletional contexts (p < 0.001). In the configuration studied here, the vowel of a CV verb deletes before a V-initial direct object (/CV1 + V2/ → [CV2]). This process has previously been analysed as full vowel deletion (e.g. Akinlabi and Oyebade 1987, Ola Orie and Pulleyblank 2002); on the basis of the new phonetic evidence, a compensatory lengthening analysis is proposed instead. The experiment controlled for inherent vowel duration and for the voicing and manner of articulation of the surrounding consonants. The results are in line with a similar finding for Yorùbá tone (Ajíbóyè et al. 2011) in the same syntactic (verb + direct object) configuration.


Introduction
This work examines the phonetics of a vowel deletion process in Yorùbá (Atlantic-Congo, Nigeria). Specifically, in a pilot study, the duration of an underived short vowel is compared with the duration of a vowel that remains after deletion of an adjacent vowel (called here the "remnant vowel") in a VV sequence.
(1) a. /V1V2/
The duration of the remnant vowel (V2 above) is slightly but significantly longer than the duration of the underived short vowel (V3 above). The standard phonological account treats this process as full deletion. However, given the difference in duration, this account must be revised. Under a compensatory lengthening account, the segmental material deletes while its mora remains, causing lengthening of the remnant vowel. Under an incomplete neutralisation account, the phonetic module spells out the phonologically short vowel as slightly longer. Both options are discussed, and ultimately Yorùbá vowel deletion is re-analysed as compensatory lengthening.

The present study involves one speaker with vowels in one specific context. The results are significant, and this is the basis for the compensatory lengthening analysis. However, data from more speakers and more contexts will ultimately strengthen the analysis, a task left to future research.

Vowel deletion process
In discussing the deletion process, the vowel that remains after an adjacent vowel deletes is the remnant vowel (as stated above). Likewise, a short vowel outside of deletion is a "simple vowel". Any analysis that treats the remnant vowel as structurally identical to a simple short vowel is called the "standard phonological account" (see Akinlabi and Oyebade 1987, Ola Orie and Pulleyblank 2002, and references therein). For instance, Ola Orie and Pulleyblank (2002) analyse the vowel deletion process as full deletion, following previous accounts. They summarise the process investigated here as follows:
(2) Deletion (Ola Orie and Pulleyblank 2002: 105)
"In a V1 + V2 sequence, V1 deletes when contained in a word of a single syllable."
In this view, the structure of the remnant vowel is identical to that of a simple short vowel: each projects a single, unshared mora. The structure after deletion in Ola Orie and Pulleyblank (2002) is given in (3).
While not all previous analyses assume a moraic theory, the structure above fits the description of the standard account. A monomorphemic CVCV word would have the same prosodic structure as the form in (3), as it is assumed that the result of deletion is a phonologically short vowel. However, the phonetic duration of the vowel that remains after deletion suggests that the standard account must be revised, and that the process is more properly analysed as a case of compensatory lengthening.

Methodology
Data was recorded from one native speaker of Yorùbá. The speaker is female, 30 years old, and was born and raised in Kwara State (North Central Zone), Nigeria. She speaks the Ìgbómìnà dialect of Yorùbá. The only other language the subject speaks is English, and she has no reported speech or hearing problems. The subject has an advanced linguistics background and aided in the preparation of the elicitation materials, but was naïve to the purpose of the study.
At the time of the study, the subject had lived in the United States for the previous three months, and spoke Yorùbá daily.
Forms to be elicited were compiled with the aid of the native speaker in several sessions prior to recording, though the subject was not aware of the eventual goal of the experiment. To ease eventual segmentation, words with the vowel [a] between voiceless obstruents, preferably stops, were sought. To control for inherent durations among different vowels, only the vowel [a] is analysed. At the time of elicitation, the target sentences were given to the speaker in groups of seven sentences per page, with a total of 102 sentences, randomised throughout. The first and last sentences on each page were recorded but ignored, to control for list intonation effects, and the subject was instructed to speak naturally, as if to a friend, following similar techniques in Broselow, Chen and Huffman (1997). Each target sentence was repeated at least five times (non-consecutively) during the elicitation. The subject was instructed to read each sentence once, with a pause between sentences and a longer pause or break after each page. The sentences were written in standard Yorùbá orthography, with no English present.
The full wordlist used is given in (4). Vowel durations for only a subset of these are measured; these are given later in (5) and (6).
Recordings were made in a sound-attenuated booth in the Phonology and Field Research Laboratory at Rutgers University. The subject wore a head-mounted AKG C420 microphone connected through a digital pre-amp, and was recorded in Goldwave at 44.1 kHz. The file was saved as a WAV file and segmented in Praat.
Following Francis, Ciocca and Yu (2003) and Ladefoged (2003), vowel onset was marked at the first zero-crossing (where the amplitude is 0) before the first regular period of the periodic signal. Vowel offset was determined in a similar way: the zero-crossing at the last stable period of the signal was marked. In some cases, there were one or two extra pulses of voicing, but these were not considered part of the vowel, as the waveform had lost its shape.
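The onset criterion above can be illustrated with a minimal sketch. It assumes the start of the first regular period has already been identified by hand (segmentation in the study was done in Praat, not with this code); the function name and the sample values are hypothetical, and only the 44.1 kHz rate comes from the text.

```python
from typing import Optional, Sequence

SAMPLE_RATE_HZ = 44_100  # recording rate reported in the text


def zero_crossing_before(samples: Sequence[float], index: int) -> Optional[int]:
    """Return i for the last adjacent pair (i, i + 1), with i + 1 <= index,
    whose samples differ in sign or include a zero; None if there is none."""
    start = min(index, len(samples) - 1) - 1
    for i in range(start, -1, -1):
        if samples[i] * samples[i + 1] <= 0:
            return i
    return None


# Hypothetical waveform fragment; index 5 stands for the hand-picked start
# of the first regular period.
samples = [0.2, 0.1, -0.1, -0.3, 0.4, 0.9, 0.5]
onset_index = zero_crossing_before(samples, 5)
onset_seconds = None if onset_index is None else onset_index / SAMPLE_RATE_HZ
```

Vowel offset could be located by the mirror-image search, scanning forwards from the last stable period; vowel duration is then the difference between the two time points.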

Data and results
Each target word was elicited in the frame sentence mo ta _ lana, 'I sold _ yesterday', except where indicated below; in these latter cases, other verbs were deemed more natural by the consultant. After excluding certain forms to control for voicing and manner of articulation (see, e.g., Peterson and Lehiste 1960 and Van Santen 1992), the durations of the initial vowels (in bold) of the nouns in (5) and (6) were measured.
The null hypothesis under the standard phonological account is that there is no significant difference in vowel duration between the two groups. The two groups were compared via the t.test function in R. There is indeed a significant difference between the duration of remnant vowels and simple short vowels (t(40.458) = −4.1821, p < 0.001). This is summarised in Table 1, and the means are plotted in Figure 1. On average, the remnant vowels are about 12 ms longer than the simple vowels.
While the two sets of words are balanced for place of articulation and voicing, there are still imbalances between them, such as number of tokens, tone, and word length. However, there is one near-minimal pair between the two groups: [ata] 'pepper' and [tata] 'grasshopper'. The results for these two forms are given in Table 2. The difference within this pair narrowly misses significance: t(6.84) = 2.31, p = 0.055. The absolute difference in means is comparable to that in the full dataset, and more tokens of this type are likely to yield a more robust result. These forms are identical in segments, tone, and word length. Additionally, the tone of both words is M(id), and there is no significant difference in f0 between the two vowels (t(4.48) = −1.03, p = 0.36); the durational differences are therefore caused neither by phonological tone nor by fundamental frequency (see, e.g., Mamadou Y. 2017 on the effects of f0 and duration in Yorùbá).
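The fractional degrees of freedom reported above are what R's t.test produces by default, namely Welch's unequal-variance test. As a rough sketch of that computation (in Python rather than R, and with made-up durations in milliseconds, not the measurements from this study), the statistic and the Welch-Satterthwaite degrees of freedom can be obtained as follows; the function name is hypothetical.

```python
from math import sqrt
from statistics import mean, variance
from typing import Sequence, Tuple


def welch_t(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom
    for two independent samples with possibly unequal variances."""
    vx = variance(x) / len(x)  # squared standard error of mean(x)
    vy = variance(y) / len(y)  # squared standard error of mean(y)
    t = (mean(x) - mean(y)) / sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df


# Hypothetical vowel durations in ms (illustration only, not the study's data).
simple_ms = [100.0, 104.0, 98.0, 102.0, 96.0]
remnant_ms = [112.0, 110.0, 116.0, 108.0, 114.0]
t_stat, dof = welch_t(simple_ms, remnant_ms)
```

With the simple vowels as the first group, a longer remnant vowel yields a negative t, and the degrees of freedom are generally non-integer when the group sizes or variances differ. The p-value would then be read from the t distribution with those degrees of freedom, a step this sketch leaves out.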
The difference in duration between the two vowel types is right at the just noticeable difference for duration: Klatt (1976) finds that differences between 10 ms and 20 ms are perceptible, and the difference in means here is 12 ms. It is unclear whether this difference is actually perceptible to a native Yorùbá speaker; a perception experiment is necessary to test this.

Implications for phonology and phonetics
Assuming the results are robust, there are implications for either the phonological account, its phonetic implementation, or both. If we assume the standard phonological account, where the prosodic structures containing the remnant vowel and the simple vowel are identical, then this is a potential case of incomplete neutralisation: two forms that should otherwise be identical have slight phonetic differences. In this case, a /VV/ triggers full deletion, resulting in a monomoraic remnant vowel V, which is realised differently by the phonetics from a structurally identical simple vowel.
Alternatively, because of the phonetic differences found, the other option is a revised phonological account in which the process is no longer treated as full deletion, but rather as root node deletion plus compensatory lengthening: the remnant vowel lengthens because it is reassociated to the mora of the deleted vowel root node. However, this account is not completely straightforward, as compensatory lengthening usually yields a vowel whose duration is similar to that of a phonologically long vowel.

Revised phonological account
The Yorùbá facts fit the definition of compensatory lengthening given by Hayes (1989), namely "[…] the lengthening of a segment triggered by the deletion or shortening of a nearby segment" (Hayes 1989: 260). Assuming the process is compensatory lengthening, the standard phonological account is thus revised. Instead of full vowel deletion, the output of the relevant phonological processes is root node deletion, with the mora of the underlying vowel remaining and causing lengthening of the remnant vowel:
(7) Revised phonological account
a. Underlying moraic structure
b. V1 root node deletes
c. Remaining mora reassociates
Because the output of the phonological process is now structurally different from a simple short vowel (which projects only one mora; see the structure in (3)), there is a phonological reason why the remnant vowel should be phonetically realised longer than a simple vowel. Yorùbá has no contrastive vowel length distinctions, so the remnant vowel, even though it is only ∼12 ms longer than a short vowel, is the phonetic realisation of a Yorùbá long vowel. This pattern is structurally identical to a better-known case of compensatory lengthening in Luganda, as in the form glossed 'moon' (dim.) (Clements 1986, Goldrick 2000). However, the crucial difference is in the phonetic realisation of the bimoraic vowel: whereas in Luganda and in other cases of compensatory lengthening the realisation is akin to a long vowel, in Yorùbá it is only slightly longer. This phonetic fact, previously unreported, is why the process in Yorùbá is usually treated as full deletion and not compensatory lengthening.
Why should the phonetic realisation of a bimoraic vowel in Yorùbá be only slightly longer than a short vowel? De Chene and Anderson (1979) claim that "the existence of an independently-motivated length contrast in the language is a necessary condition for compensatory lengthening" (De Chene and Anderson 1979: 508).
In other words, if a language does not have phonologically long vowels elsewhere, a compensatory lengthening process will not result in a phonologically long vowel. While this claim has been weakened if not rejected since (see Gess 2011 for a review), there still might be a connection between the presence of a long vowel contrast and the phonetic realisation of compensatory lengthening. Recall that Yorùbá does not have contrastive long vowels.

Phonetic implementation: Incomplete neutralisation?
It is clear from the results that the remnant vowels in Yorùbá are not as long as would be expected for bimoraic vowels cross-linguistically. To emphasise this point, durations of phonologically short versus long vowels in languages with a true length distinction are given in (9). In Hindi, the long vowel is about twice as long as the short vowel, compared to an 11% difference for Yorùbá. The phonetic module would have to realise the bimoraic structure in Yorùbá well below the long vowel duration in Hindi and many other languages. However, because there are no vowel length contrasts, there is an extremely low functional load on long vowels, which might contribute to only a slight phonetic difference between short and long vowels.
An alternative approach would be to assume that the standard phonological account holds, but that this is a case of incomplete neutralisation. Incomplete neutralisation describes a process where the phonological module outputs two structures assumed to be identical, but the phonetic module interprets them differently. In the present case, assuming the standard account, the phonological module outputs two phonologically short vowels, but the phonetics implements one as slightly longer. Braver (2013: 4) defines incomplete neutralisation as instances where "the surface acoustic cues to two underlyingly distinct segments in a given context are less distinct than the segments' canonical realizations in non-neutralizing contexts, but are not completely identical". In the standard approach to Yorùbá vowel deletion, the simple vowels and remnant vowels are not underlyingly distinct; they are structurally the same. It is only in the revised approach that they differ, specifically in their moraic configurations. In the revised approach, it is not clear what the "canonical realisation" of a long vowel is, as there are no vowel length contrasts in the language.
Additionally, it is not clear where the phonetic pressure to lengthen would come from under the standard approach. Under a recent theory of incomplete neutralisation (Braver 2013), incomplete neutralisation occurs when there is a conflict between X, Y, and Z, where X and Y are identical phonological structures but Y and Z are words related in some paradigm.
For example, a case of incomplete neutralisation that Braver studies concerns vowel duration in Japanese, making it relevant to the facts here. A root can either occur with a particle or in isolation. When occurring with a particle, the root projects one mora; in isolation, two. However, the duration of the form in isolation is not as long phonetically as an underived bimoraic form. This is shown in (10).
(10) Japanese paradigm (Braver 2013: 127, with […])
While there is only a two-way contrast in phonological structure (one mora versus two), there is a three-way contrast in phonetic duration. For the lengthened form chi, there is a conflict between two constraints in the phonetic implementation. DUR(μμ)=TARGETDUR(μμ) returns a lower cost the more similar lengthened chi is in duration to underived bimoraic forms, like chii 'social.status'. Additionally, there is a constraint OO-ID-DUR, which returns a lower cost the more similar the lengthened form is to the monomoraic form that occurs with the particle; this form is the base of the paradigm, as it has the highest frequency. (I refer the reader to Braver (2013) for a full and clear exposition of the data and analysis.)
What, then, does the Yorùbá paradigm look like? Still assuming the standard account, it is given in (11).
Because Yorùbá has no underlying long vowels, and the standard account assumes the remnant vowel is monomoraic, there is no obvious pressure for the phonetics to realise the remnant vowel any longer than the short vowel. As one reviewer puts it, "[s]ince there is no short/long vowel length contrast in Yorùbá, a vowel of middling duration is not halfway between two contrastive categories in the language," because the second contrastive category (long vowels) does not exist. In fact, both relevant constraints, OO-ID-DUR and (in this case) DUR(μ)=TARGETDUR(μ), return the lowest cost when the duration of the short vowel equals the duration of the remnant vowel.
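The logic of the two competing constraints can be made concrete with a small numerical sketch. It rests on assumptions that are not in the source: each constraint is modelled as a weighted squared deviation from a target duration (in the general spirit of weighted scalar constraints), and all durations and weights are invented for illustration. Under those assumptions the cost-minimising duration is simply the weighted mean of the targets.

```python
from typing import Iterable, Tuple


def optimal_duration(constraints: Iterable[Tuple[float, float]]) -> float:
    """Duration d minimising sum(w * (d - target) ** 2) over (target, weight)
    pairs; for quadratic costs this is the weighted mean of the targets."""
    pairs = list(constraints)
    total_weight = sum(w for _, w in pairs)
    return sum(t * w for t, w in pairs) / total_weight


# Japanese-like case (hypothetical numbers): the bimoraic target pulls towards
# 160 ms, while similarity to the monomoraic base pulls towards 80 ms.
japanese_like = optimal_duration([(160.0, 1.0), (80.0, 1.0)])

# Yorùbá under the standard account (hypothetical numbers): the monomoraic
# target and the base are both the short-vowel duration, so nothing competes.
yoruba_standard = optimal_duration([(100.0, 1.0), (100.0, 1.0)])
```

In the first case the optimum falls between the two targets, giving an intermediate duration; in the second both constraints agree, so the predicted remnant vowel is exactly as long as the short vowel. This is the sense in which the standard account offers no source for the observed lengthening.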
The revised phonological account has a clear reason why the remnant vowel should be lengthened: it projects two moras. However, the question then becomes why it is lengthened by only about 10%. While this is not a case of incomplete neutralisation, it seems likely that the weighted, scalar phonetic constraints of Braver (2013), Flemming (2001), and others can be used to model this process. As an anonymous reviewer points out, unlike in Japanese, the Yorùbá case does not involve competing pressures from two contrastive categories in the language, as there are no contrastive long vowels. If there is no pressure in a vowel paradigm from a contrastive long vowel, the phonetics might be compelled to realise the long vowel only slightly longer than a short vowel due to this decreased functional load. However, a full account in a gradient theory of phonetics is left to future work.

Connection to syntax
The results presented here are very similar to experimental results for two tonal processes, one in the same syntactic environment as deletion. All cases of deletion and lengthening here occur between a verb and its direct object. Ajíbóyè et al. (2011) describe a tone-raising process in this same environment. Yorùbá has three standard tones, high (H), mid (M), and low (L), and "[in] Standard Yoruba, before a direct object […] monosyllabic L-tone verbs are raised to M" (Ajíbóyè et al. 2011: 1634). Additionally, there is a HL contour simplification process involving the reduplicative morpheme -kí-. Examples of these processes are shown in (12) and (13), respectively.
(12) L-raising (Ajíbóyè et al. 2011: (5d))
a. Mo na Títí
   1sg beat Titi
   'I beat Titi'
(13) HL-simplification (Ajíbóyè et al. 2011: (13c))
   /èrò-kí-èrò/ → èròkerò
   L L H L L      L L M L
What Ajíbóyè et al. (2011) find is that the M that results from HL-simplification, which they call "morphologically derived", is identical in f0 to an underived M tone. However, the M that results from L-raising, which they call "syntactically derived", is significantly lower in f0 than both the underived and morphologically derived M tone. In their analysis, the syntactic environment crucially prevents the L tone from fully deleting, and thus causes it to slightly lower the f0 of the resulting M tone.
This makes predictions for vowel deletion as well. Vowel deletion also occurs in the morphological environments for HL-simplification. Is the duration of the remaining vowel there more similar to the remnant vowels here or to simple short vowels? While further experiments must be done, the explanation by Ajíbóyè et al. (2011) predicts that syntactically conditioned vowel deletion should differ from morphologically conditioned deletion.

Further directions
The results presented here, while significant, constitute a pilot study. More work needs to be done to conclusively show robustness across several dimensions. First, future experiments should test multiple speakers across multiple dialects.
Additionally, there are both other deletion processes in other contexts, and other types of processes in similar contexts, that should be investigated. As mentioned in Section 5.3, the vowel of a CV reduplicative particle also deletes in certain cases. Phonetically, how does the duration of the remnant vowel there relate to the remnant vowel in verb + object constructions?
The interpretation of Ajíbóyè et al. (2011) suggests there will be a phonetic difference.
Between verb and vowel-initial object, there are a number of processes that can occur. The generalisation argued for in Ola Orie and Pulleyblank (2002) is that CV verbs trigger deletion, but larger verbs, such as CVCV verbs, cause vowel assimilation but not deletion.
(14) Ola Orie and Pulleyblank (2002: (3f))
   jáde opó   jàdo opó   *jádopo   'come out of mourning'
In the form in (14), the V1 + V2 sequence is resolved by changing the quality of V1 to match V2 instead of deleting it. While all the forms in the experimental stimuli fit the context for vowel deletion (in that the verb is CV), the quality of the deleted vowel and the remnant vowel is the same: both are [a]. The outcome is therefore ambiguous between deletion and assimilation. However, the process studied here is unlikely to be only assimilation (resulting in a VV sequence), as Ola Orie and Pulleyblank (2002: fn. 13) state that, "at the phonetic level, sequences of identical vowels appear to be produced with a slight rearticulation. That is, in a sequence like […e.e…], there appear to be two phonetic targets, not one". The remnant vowels here pattern like a single, slightly longer vowel, not like a sequence of two vowels. Nevertheless, it is worthwhile to compare clear cases of assimilation in VV sequences with both simple vowels and remnant vowels.

Summary and conclusion
While the standard account of vowel deletion in Yorùbá creates a phonological structure identical to a short vowel, new experimental results suggest that this should be revised. The duration of the vowel that remains after deletion in a VV sequence is slightly but significantly longer than that of a simple short vowel. A revised phonological account treats the deletion process as a type of compensatory lengthening: the vowel root node deletes, and the remaining mora reassociates with the remnant vowel. However, the remnant vowel is only about 10% longer than a short vowel, which is much shorter than bimoraic vowels cross-linguistically. This could be related to the fact that Yorùbá does not have contrastive long vowels, so while there is phonological pressure for the vowel to lengthen, there is a low functional load. This work will hopefully lead to more research, both on the phonetics and the phonology, and on vowel deletion and related processes in Yorùbá.