Anticipatory Processing in a Verb-Initial Mayan Language: Eye-Tracking Evidence During Sentence Comprehension in Tseltal

We present a visual world eye-tracking study on Tseltal (a Mayan language) and investigate whether verbal information can be used to anticipate an upcoming referent. Basic word order in transitive sentences in Tseltal is Verb–Object–Subject (VOS). The verb is usually encountered first, making argument structure and syntactic information available at the outset, which should facilitate anticipation of the post-verbal arguments. Tseltal speakers listened to verb-initial sentences with either an object-predictive verb (e.g., “eat”) or a general verb (e.g., “look for”) (e.g., “Ya slo’/sle ta stukel on te kereme,” Is eating/is looking (for) by himself the avocado the boy / “The boy is eating/is looking (for) an avocado by himself”) while seeing a visual display showing one potential referent (e.g., avocado) and three distractors (e.g., bag, toy car, coffee grinder).


Introduction
To comprehend spoken sentences, listeners must extract words from the incoming speech signal, retrieve them from memory, and integrate them into an interpretation. Despite the complexity of this process, listeners comprehend language with great speed and efficiency and with apparent ease. The results of a large and growing body of psycholinguistic research suggest that one reason language processing may be so effortless, fast, and efficient is that language users routinely make predictions about upcoming language input (Altmann & Mirković, 2009; DeLong, Urbach, & Kutas, 2005; Federmeier, 2007; Federmeier & Kutas, 1999; Huettig, Audring, & Jackendoff, 2022; Kamide, 2008; Levy, 2008; Pickering & Garrod, 2013; Van Berkum, Brown, Zwitserlood, Kooijman, & Hagoort, 2005; Wicha, Moreno, & Kutas, 2004). Empirical evidence for anticipatory language processing has been found at many levels of linguistic structure (phonological, lexical, syntactic, conversational) using a variety of neurophysiological and behavioral research methods (e.g., event-related potentials, reaction times, eye-tracking). Yet, in one respect, the empirical coverage of research on anticipatory processing remains small: it draws on a very restricted set of languages. Thus, while much psycholinguistic research emphasizes the central, if not crucial, role of anticipatory processing during language comprehension, there are currently very few data from speakers of typologically diverse languages that would allow us to test the generalizability of this claim.
The present study addresses this empirical gap by investigating anticipatory processing during sentence comprehension in Tseltal, a head-marking, verb-initial Mayan language. Using visual world eye tracking, our goal was to determine whether Tseltal listeners use information encoded in the sentence-initial verb to predict an upcoming direct object when listening to simple transitive sentences. As we discuss below, Tseltal verbs have certain semantic and morphological properties that might be expected to facilitate anticipation of an upcoming object. On the other hand, Tseltal is spoken largely in rural communities where there are lower levels of education and literacy compared to the kinds of populations typically studied by psycholinguists. Some research suggests that predictive processing may be attenuated by lack of experience with reading (Favier, Meyer, & Huettig, 2021; Huettig & Brouwer, 2015; Mani & Huettig, 2014; Mishra, Singh, Pandey, & Huettig, 2012; Ng, Payne, Stine-Morrow, & Federmeier, 2018). In our study, we assess whether such population-level factors modulate the extent of anticipatory processing in Tseltal. In what follows, we first describe existing models of sentence comprehension that assume prediction as a key mechanism for language processing and then discuss eye-tracking studies that support prediction during sentence comprehension. We then discuss the role of prediction in language processing and some of its limitations. Next, we turn to Tseltal and introduce some of its linguistic properties. Finally, we present our experiment and discuss its results.

Anticipatory processing during sentence comprehension
Recent theoretical accounts suggest that prediction is a fundamental property of human information processing. In the domains of perception, action, and learning, the brain is assumed to generate context-based predictions that guide and anticipate our processing goals (Bar, 2007, 2009; Clark, 2013; Friston, 2010). Many psycholinguists subscribe to the view that prediction, the pre-activation of linguistic input, is also central to language processing. Pickering and Garrod (2007, 2013) ascribe "a central role to prediction in language production, comprehension, and dialogue." Chang, Kidd, and Rowland (2013) argue that "prediction in processing is a by-product of language learning," though language acquisition constraints are critical for learning the syntactic and semantic representations that support prediction (Kidd, 2012; Rowland, Chang, Ambridge, Pine, & Lieven, 2012). Federmeier (2007; see also Federmeier & Kutas, 1999) concludes that the brain is continuously "thinking ahead" during sentence comprehension, using context to predict upcoming information while processing it at multiple levels. As a final example, Altmann and Mirković (2009) write that "most likely […] prediction has a neural basis that pervades cortical function." In short, prediction is thought to be an essential component of language comprehension that enables faster and more efficient processing of grammatical structure.
Predictive processing has been demonstrated at various levels of linguistic structure using a variety of experimental methods (e.g., reaction times, electrophysiology, eye movements). Listeners use the information they have already processed (linguistic and non-linguistic) to anticipate what will come next (Altmann & Mirković, 2009; DeLong, Groppe, Urbach, & Kutas, 2012; Federmeier & Kutas, 1999; Pickering & Garrod, 2013; Van Berkum et al., 2005; Wicha et al., 2004). Electrophysiological evidence shows that readers can form expectations about the phonological form of the upcoming word in sentences such as "The day was breezy, so the boy went outside to fly…a kite vs. an airplane" (e.g., DeLong et al., 2005; but see Nieuwland et al., 2018 for a more nuanced account). People read words faster when they can be predicted from context: for example, predictable words are fixated less than unpredictable ones in highly constraining contexts (e.g., Ehrlich & Rayner, 1981; Rayner & Well, 1996) or when the transitional probability between two words is greater than zero (McDonald & Shillcock, 2003). In addition, it has been demonstrated that syntactic structure can be predicted in coordinated sentences (Staub & Clifton, 2006) as well as in the processing of long-distance dependencies (Traxler & Pickering, 1996; Traxler, Bybee, & Pickering, 1997). Finally, there is evidence of prediction in everyday conversation. People predict upcoming turns in conversation from a variety of cues (e.g., pitch, the overall probability of turn durations of an interlocutor's current utterance, etc.) (Magyari, De Ruiter, & Levinson, 2017; Sacks, Schegloff, & Jefferson, 1974).

The visual world paradigm and sentence comprehension
Predictive processing during language comprehension has been studied using the visual world paradigm (VWP; see Huettig, Rommers, & Meyer, 2011 for a review). This paradigm (Cooper, 1974; Tanenhaus, Spivey-Knowlton, Eberhard, & Sedivy, 1995) provides a window into how visual perception, attention, working memory, and auditory input jointly determine the way speech is processed. In the VWP, anticipatory eye movements toward items within a visual display are taken as evidence for anticipatory processing.
In a seminal study, Altmann and Kamide (1999) demonstrated that semantic information extracted from verbs in English becomes rapidly available to listeners and guides their visual attention to an appropriate referent before it is mentioned. Participants' eye movements were recorded while they inspected a semirealistic scene that depicted, for example, a cake, a toy car, a ball, and a toy train, and listened to sentences such as "The boy will eat the cake" or "The boy will move the cake." When participants heard the verb "eat," they directed their gaze anticipatorily toward the only edible item depicted (e.g., the cake). In contrast, eye movements to the cake started much later when participants heard the verb "move." This pattern of looks was interpreted as reflecting anticipatory processing of an upcoming grammatical object that satisfied the semantic restrictions of the verb.
Using versions of this paradigm, subsequent work on English and a number of other languages has provided further evidence that prediction is important for sentence comprehension and that language users can rapidly integrate different kinds of information to generate predictions about what they will hear next. Kamide, Scheepers, and Altmann (2003) showed for German that case-marking information combined with verb semantic constraints can be used to predict upcoming post-verbal noun phrases (NPs). They found a similar effect for English post-verbal NPs, driven by the combination of verb semantics and voice information (active vs. passive marking). Kamide, Altmann, and Haywood (2003) found that in both Japanese and English, information from multiple constituents can be rapidly integrated and used to predict upcoming elements. In English, listeners used information from the verb and its direct object to predict an upcoming goal argument (e.g., "bread," in "The woman will spread butter on the bread"). In Japanese verb-final ditransitive constructions (e.g., Waitress-NOM customer-DAT merrily hamburger-ACC bring, "The waitress will merrily bring the hamburger to the customer"), the combined information of the first and second NPs allowed listeners to predict the third argument before hearing the verb, and therefore to anticipate the action itself. Anticipatory processing is not driven only by linguistic input: Knoeferle, Crocker, Scheepers, and Pickering (2005) demonstrated, for example, that information from visually presented events can be rapidly integrated with linguistic information to anticipate object referents in German verb-second and verb-final constructions.
The picture that emerges from these studies is that listeners exploit the particular "affordances" of their language as they become available in order to generate predictions about upcoming language input. These findings, together with the wealth of converging evidence from electrophysiological and reading-time studies, point to an important, perhaps crucial, role for prediction in language processing.

The role of prediction in language processing and its limitations
Experimental evidence of anticipatory processing in different fields such as language, action-perception, and motor control has led to a widespread scientific consensus that prediction is a fundamental property of human language processing. Clark (2013) has claimed that "brains…are essentially prediction machines." There is, however, also evidence suggesting that prediction, while important, may not be crucial for processing language. These studies suggest that comprehenders do not always engage predictive processing mechanisms and that some populations rely less on predictive processing without compromising language comprehension. A good deal of research, drawn from studies of language acquisition, second language learning, individual differences and cognitive resource limitations (e.g., working memory), and from populations such as healthy older adults and illiterate speakers, has shown either no effects of predictive processing or only weak engagement of predictive mechanisms during language comprehension.
Reading ability (Favier et al., 2021; Huettig & Brouwer, 2015; James & Watson, 2013; Mani & Huettig, 2014) and vocabulary knowledge (Hintz, Meyer, & Huettig, 2017; Kukona et al., 2016; Rommers, Meyer, & Huettig, 2015) have been related to anticipatory language-mediated eye movements in adults. It has been suggested that formal literacy enhances predictive language processing in a number of ways, including by facilitating the retrieval of associated words and the pre-activation of the grammatical representations of upcoming words (see Huettig & Pickering, 2019). Eye-tracking studies have shown that adult high literates predict object names following a sentential context, while low literates do not (Mishra et al., 2012). In addition, proficient readers can rapidly use phonological and semantic information to anticipate upcoming referents, while less proficient readers rely on these types of information to a lesser extent and do not engage in anticipatory language processing (Huettig, Singh, & Mishra, 2011). An important aspect of literacy is vocabulary knowledge. Recent studies suggest that anticipatory eye movements are associated with high vocabulary scores in adults. In three visual world eye-tracking studies, Hintz et al. (2017) evaluated the effects of different potential predictors (e.g., functional and general associations, receptive vocabulary knowledge, production skills, and non-verbal intelligence) of verb-mediated anticipatory eye movements and found that vocabulary knowledge is a robust predictor of anticipatory eye movements during language comprehension.
Finally, another potential mediating factor of prediction in language processing is language itself. To date, predictive processing has only been studied for a small handful of the world's languages (predominantly English, Dutch, and German, all closely related Germanic languages, with just a few studies on East Asian languages such as Japanese, Korean, and Chinese). Given how dramatically languages vary from one another at all levels of organization (Evans & Levinson, 2009), an important question is whether and to what extent different kinds of linguistic structure might themselves attenuate or facilitate prediction during language processing, and how linguistic context might interact with other factors that have been shown to moderate prediction, such as individual differences in working memory, bilingualism, or literacy levels.

The present study
The goal of this study is to expand the empirical base of research on predictive language processing by investigating sentence comprehension in Tseltal, a Mayan language spoken in the state of Chiapas in southern Mexico. As we describe in detail below, Tseltal has a number of grammatical properties that may facilitate the generation of predictions during sentence processing. On the other hand, Tseltal is largely spoken in rural communities where there are lower levels of education and literacy compared to the kinds of populations typically studied by psycholinguists. Any facilitating effect of language structure on predictive processing in Tseltal might therefore be modulated by the literacy levels of the population and by its lack of experience with experimental procedures.
To examine whether Tseltal listeners use information provided by the sentence-initial verb to anticipate an upcoming object, we carried out a visual world eye-tracking study in which participants heard simple verb-initial transitive sentences while looking at visual displays. The paradigm is adapted from Altmann and Kamide's (1999) study. We outline the properties of Tseltal most relevant to the present study below (for a full grammatical description of the language, see Polian, 2013), and then turn to the experiment and its results.

The Tseltal language
Tseltal is a Mayan language spoken in Mexico by over 400,000 people (INEGI, 2010). One of the most striking features of Tseltal grammar is its constituent order: verb-object-subject (VOS) for transitive sentences (Robinson, 2002). Only about 2% of the world's languages follow this basic word order (Dryer, 2013). Tseltal is also a "head-marking" language: verbs carry agreement markers that index the grammatical roles as well as the person and number of the verb's arguments. Head-marking in Tseltal is sensitive to transitivity (i.e., it is ergatively aligned): the agent (grammatical subject) of active transitive verbs is cross-referenced on the verb differently from the single argument (grammatical subject) of intransitive verbs, which is instead marked in the same way as the object argument of a transitive verb. The two sets of markers on verbs are customarily labeled "set A" (for the prefixes that encode information about the agent argument of transitive verbs) and "set B" (for the suffixes that index the object/intransitive subject) (Polian, 2013). Both set A and set B markers distinguish 1st, 2nd, and 3rd person, singular and plural.
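The ergative pattern described above can be made concrete with a small lookup table. This is a minimal sketch under stated assumptions: the role labels "A" (transitive agent), "O" (transitive object), and "S" (sole argument of an intransitive verb) are standard typological shorthand rather than terms used in the text, the accusative mapping is included only for contrast, and the actual Tseltal set A and set B morphemes are not modeled.

```python
# Typological shorthand (an assumption, not the paper's notation):
# "A" = transitive agent, "O" = transitive object, "S" = intransitive subject.
ERGATIVE = {"A": "set A", "O": "set B", "S": "set B"}  # pattern described for Tseltal
ACCUSATIVE = {"A": "nominative", "O": "accusative", "S": "nominative"}  # contrast only
AFFIX_POSITION = {"set A": "prefix", "set B": "suffix"}


def marked_alike(alignment: dict[str, str], role1: str, role2: str) -> bool:
    """True if two argument roles are cross-referenced by the same marker set."""
    return alignment[role1] == alignment[role2]


def affix_for(role: str) -> str:
    """Position of the Tseltal verbal marker that indexes `role`."""
    return AFFIX_POSITION[ERGATIVE[role]]


print(marked_alike(ERGATIVE, "S", "O"))    # S patterns with O
print(marked_alike(ACCUSATIVE, "S", "A"))  # S patterns with A
print(affix_for("A"), affix_for("S"))
```

The only point the table encodes is the grouping: in an ergative system the intransitive subject shares its marking with the transitive object, not with the transitive agent.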
The structure of a basic Tseltal transitive sentence is given in [1]. Many basic-level transitive verbs in Tseltal are highly semantically specific, imposing strong restrictions on the types of objects that can be involved in the event described by the verb. There are, for example, many different verbs for describing transitive events of eating, carrying, breaking, washing, and placing objects. These verbs encode properties of the objects that they select for, such as their shape, substance, or position (ti' is an "eat" verb for meat foods; top is a "break" verb for pottery items, etc.). They are in frequent use, morphologically simple (monomorphemic), and not restricted to any particular register (Brown, 2008).

Task and predictions
We used a visual world eye-tracking task to test whether Tseltal listeners engage in anticipatory processing when comprehending simple VOS transitive sentences. In this task, participants listened to pre-recorded spoken sentences while viewing displays featuring an array of simple objects (see Fig. 1). Displays showed one target referent (for example, an avocado) and three distractors (e.g., bag, toy car, coffee grinder). We manipulated verb type (predictive: the verb can only refer to one object in the display, e.g., "eat (soft things)"; general (control): the verb can refer to any of the items, e.g., "look at"), and we recorded participants' eye movements while they listened to the sentences and inspected the visual scene.
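The logic of the verb-type manipulation can be illustrated by treating each verb as a filter over the four depicted items. This is a minimal sketch, not the authors' model: the item names follow the example display, the compatibility sets are illustrative assumptions, and the "uniform choice among compatible items" listener is a deliberate simplification.

```python
# Example display from the text: one target and three distractors.
DISPLAY = ["avocado", "bag", "toy car", "coffee grinder"]

# Illustrative (assumed) selectional restrictions: the predictive verb
# ("eat (soft things)") admits only the avocado; the general verb
# ("look at") admits every depicted item.
COMPATIBLE = {
    "predictive": {"avocado"},
    "general": set(DISPLAY),
}


def candidate_objects(verb_type: str) -> list[str]:
    """Depicted items that satisfy the verb's selectional restrictions."""
    allowed = COMPATIBLE[verb_type]
    return [item for item in DISPLAY if item in allowed]


def target_probability(verb_type: str, target: str = "avocado") -> float:
    """Chance of the target if a listener picked uniformly among candidates."""
    candidates = candidate_objects(verb_type)
    return 1 / len(candidates) if target in candidates else 0.0


for condition in ("predictive", "general"):
    print(condition, candidate_objects(condition), target_probability(condition))
```

Under this simplification, the predictive verb narrows the candidates to the target alone, whereas the general verb leaves the target at the four-item chance level of 0.25.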
If Tseltal listeners use information from the sentence-initial verb to predict the upcoming object, then they should look anticipatorily at the target object (the avocado) more in the predictive condition than in the general condition. Previous studies on English have shown that presenting a sentence-initial agent in combination with a predictive verb (e.g., eat) is enough to yield anticipatory eye movements to an upcoming object (e.g., cake) (Altmann & Kamide, 1999). Other studies have shown that a specific agent in combination with a verb (e.g., man rides vs. girl rides) constrains anticipatory eye movements to the most plausible direct object (e.g., The man will ride the motorbike vs. The girl will ride the carousel) (Kamide, Altmann, et al., 2003). Here, we ask whether, when listening to VOS sentences, the sentence-initial verb alone, without its subject/agent (which occurs last in the sentence), provides sufficient linguistic cues for Tseltal listeners to predict the upcoming object.
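The prediction above can be expressed as a comparison of target-look proportions between conditions. The following is a minimal sketch, assuming gaze data already labeled by area of interest (AOI) and a per-trial analysis window; the data layout, the labels, and the toy trials are invented for illustration and are not the analysis or data of this study.

```python
from statistics import mean


def target_proportion(samples, start_ms, end_ms, target="target"):
    """Share of gaze samples in [start_ms, end_ms) that fall on the target AOI.

    `samples` is a list of (time_ms, aoi_label) pairs; returns None if the
    window contains no samples.
    """
    in_window = [aoi for t, aoi in samples if start_ms <= t < end_ms]
    if not in_window:
        return None
    return sum(aoi == target for aoi in in_window) / len(in_window)


def condition_means(trials):
    """Mean per-trial target proportion for each verb condition."""
    by_condition = {}
    for trial in trials:
        start, end = trial["window"]
        p = target_proportion(trial["samples"], start, end)
        if p is not None:
            by_condition.setdefault(trial["condition"], []).append(p)
    return {cond: mean(ps) for cond, ps in by_condition.items()}


# Invented toy trials (times in ms from verb onset); not data from the study.
toy_trials = [
    {"condition": "predictive", "window": (0, 40),
     "samples": [(0, "distractor"), (10, "target"), (20, "target"), (30, "target"), (40, "target")]},
    {"condition": "general", "window": (0, 40),
     "samples": [(0, "distractor"), (10, "distractor"), (20, "target"), (30, "distractor"), (40, "target")]},
]

means = condition_means(toy_trials)
# A positive difference means more anticipatory target looks with predictive verbs.
print(means["predictive"] - means["general"])
```

A real analysis would aggregate over many participants and items and test the difference statistically; the sketch only shows which quantity the hypothesis is about.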
The only eye-tracking comprehension studies undertaken so far on a verb-initial language are studies of anticipatory processing in Tagalog (Austronesian). Sauppe (2016) found that verbal information in Tagalog drove anticipatory eye movements toward the agent of the sentence, not the patient, regardless of its syntactic function (subject or object) and its position in the sentence (immediately after the verb or sentence-finally). In contrast, Garcia, Garrido Rodriguez, and Kidd (2021) found that Tagalog-speaking adults use morphosyntactic markers to anticipate the upcoming agent, but only in patient-voice sentences (compared to agent-voice utterances). These results might reflect an agent bias during comprehension (Bickel, Witzlack-Makarevich, Choudhary, Schlesewsky, & Bornkessel-Schlesewsky, 2015; Bornkessel & Schlesewsky, 2006; Bornkessel-Schlesewsky & Schlesewsky, 2013a, 2013b; Cohn & Paczynski, 2013; Kemmerer, 2012; Sauppe, 2016; Wang, Schlesewsky, Bickel, & Bornkessel-Schlesewsky, 2009). Unlike those of Sauppe and Garcia et al., our displays in the Tseltal experiment do not depict agents, so we do not measure shifts of attention to the agent in our study. Nevertheless, if such an agent bias also operates during Tseltal comprehension, it might attenuate predictive looks to the target object. On the other hand, as we outline below, transitive verbs in Tseltal have semantic and morphological properties that strongly orient toward object referents and that may therefore enhance listeners' ability to predict upcoming objects.
We hypothesised that three particular features of Tseltal verbs would facilitate the use of predictive processing in the predictive condition. First, sentence-initial verb placement: unlike most previously studied languages, Tseltal places the verb first in the sentence. This means that already by the first word of a simple sentence, listeners have information about the event structure of the unfolding utterance. Moreover, because the subject NP (or any other verbal argument) has not yet been mentioned, listeners do not have to devote cognitive resources to integrating the subject argument into the ongoing parse, potentially freeing up resources for predictive processing.
Second, head-marking morphology: Tseltal's head markers indicate the argument structure of the clause (whether the verb selects for one or two core arguments) and the person and number of those arguments. This means that, having parsed the initial head-marked verb, listeners already know whether the clause will contain an object referent and what the person and number features of that referent will be. This information arguably serves to restrict the space of possibilities when generating predictions about upcoming arguments.
Third, verb specificity: as described above, many transitive verbs in Tseltal are highly specific; their semantics strongly restricts the range of objects they can select for. Brown (2008) draws an interesting connection between the high semantic specificity of many of Tseltal's transitive verbs and a salient property of spoken Tseltal: in natural speech, object arguments are very often not mentioned. She suggests that object ellipsis is so frequent in Tseltal speech because objects are often easily recoverable from the verb's semantics. Supporting this possibility, Brown found in a corpus study that in both adult and child speech, object ellipsis occurred more often with semantically specific transitive verbs than with light verbs such as "give" and "see," which do not restrict the referent of their object argument (see also Resnik, 1996, for a similar correlation in English). If listeners can easily recover the object referents of semantically specific verbs, they should also find it easier to anticipate them than the referents of semantically general verbs.
Together, these three features of Tseltal transitive verbs (initial placement, head-marking morphology, and semantic specificity) may facilitate the prediction of object arguments during the comprehension of transitive clauses. There are, however, population-level factors that may influence the extent to which Tseltal listeners engage in predictive language processing. As discussed above, several studies have shown that language-mediated prediction may be modulated by formal literacy (Huettig & Mani, 2016; Huettig et al., 2011). In a study of Hindi-speaking low and high literates, for example, low literates used information from the unfolding spoken words to direct their eye gaze but, unlike high literates, did not use such information for prediction (Huettig et al., 2011). Tseltal is largely spoken in communities where there are low levels of education and literacy. If the attenuating effects of literacy are stronger than the facilitating effects of language structure, we may not find robust evidence for predictive processing. Furthermore, life in Tseltal communities revolves primarily around subsistence agriculture, and there is limited access and exposure to technology such as computers and to experimental settings. It has been suggested that people's ability to predict upcoming events is influenced by their level of expertise in the task at hand (e.g., sports psychology research: Aglioti, Cesari, Romani, & Urgesi, 2008; Mori, Ohtani, & Imanaka, 2002; Starkes, Edwards, Dissanayake, & Dunn, 1995; Williams, Ward, Knowles, & Smeeton, 2002). The fact that our Tseltal population is less specialized in such tasks than the populations usually studied may therefore also affect the extent to which listeners generate predictions in experimental settings.
In the present study, we recruited Tseltal participants with a range of educational levels, allowing us to explore indirectly the extent to which these population-level factors might interact with the effects of linguistic context (verb specificity/highly predictive verbs) on object prediction. This is not a study of literacy and its effects on predictive language processing. However, given the literature mentioned above, it is important to consider our study population's characteristics. We additionally took into account participants' level of bilingualism (in Spanish), as this has also been shown in previous studies to affect comprehenders' reliance on predictive processing (Contemori & Dussias, 2020; Dijkgraaf et al., 2017; Zirnstein et al., 2018).
Finally, we tested whether participants learned to predict over the course of the experiment. One hypothesis is that, as they receive regular input about the language structures used in the experiment and gain more exposure to the experimental setup itself, they might get better at generating predictions about the object argument. Thus, although the population is not "specialized" at the outset, participants may gain enough specialization over the course of the experiment to alter their processing strategies.
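One simple way to picture "learning to predict" is as a positive trend in the anticipation effect across successive blocks of trials. The sketch below computes an ordinary least-squares slope over invented toy numbers; it is not the analysis reported for this study, and a real test would more typically enter trial order as a predictor in a regression model with participant and item random effects.

```python
from statistics import mean


def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    mx, my = mean(xs), mean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


# Invented toy numbers: anticipation effect (predictive minus general
# proportion of target looks) in four successive blocks of trials.
blocks = [1, 2, 3, 4]
effect_by_block = [0.05, 0.10, 0.15, 0.20]

print(round(ols_slope(blocks, effect_by_block), 3))
```

A positive slope would be consistent with the learning hypothesis; a flat slope would suggest that any anticipation effect was stable from the start.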

Participants
Sixty-seven native speakers of Tseltal from the indigenous community of Majosik' were recruited and paid for their time to take part in this study. Majosik' is a small rural community of ∼1100 inhabitants located in the municipality of Tenejapa (Chiapas, Mexico). The population is largely monolingual and lives along traditional lines. Sixteen participants had to be excluded because of poor calibration of the eye tracker (n = 4), external distractions (n = 2; e.g., the participant could not pay attention because she had to attend to her child), falling asleep during the experiment (n = 8; e.g., because they had been working the land before the experiment), or unwillingness to continue with the experiment (n = 2). Our final sample consisted of 51 participants (30 women; mean age = 25.2 years, SD = 6.8, range: 16-42 years).
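The exclusion counts can be checked against the final sample with simple arithmetic. This sketch uses only the numbers reported above; the category labels are paraphrases.

```python
# Exclusion counts as reported in the Participants section.
recruited = 67
exclusions = {
    "poor eye-tracker calibration": 4,
    "external distraction": 2,
    "fell asleep": 8,
    "did not want to continue": 2,
}

excluded = sum(exclusions.values())
retained = recruited - excluded
print(excluded, retained, f"{excluded / recruited:.1%}")
```

The exclusions sum to 16, leaving the 51 participants in the final sample, an exclusion rate of roughly 24%.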
Conducting experiments in the field comes with the risk of higher participant exclusion rates than might be expected under ideal lab-based conditions (e.g., due to failure to understand or complete the experiment, or technical issues such as power outages; see Norcliffe, Harris, & Jaeger, 2015 and Whalen & McDonough, 2015 for discussion of the practical challenges of field-based experimental research). For these reasons, it is typical to run more participants than in an equivalent lab-based experiment, in the expectation that exclusion rates will be higher. In our study, we tested 67 participants, of whom we were able to retain 51. We note that this sample size is somewhat larger than those of previous lab-based visual world studies that have observed verb-mediated anticipatory effects, which have had between 20 and 40 participants (see, e.g., Altmann & Kamide, 1999; Boland, 2005; Kako & Trueswell, 2000; Kamide, Scheepers, et al., 2003), but comparable to other eye-tracking studies (production and comprehension) conducted in the field (Norcliffe et al., 2015).
We administered a short oral questionnaire in Tseltal asking participants to state their level of education (formal schooling) and knowledge of Spanish: whether they spoke the language, whether they understood it, and whether they read in Spanish. Indigenous students attending school in Majosik' are taught in both Tseltal (L1) and Spanish (L2) during primary education; after that, instruction is provided only in Spanish (SEP, 2017). Indigenous schools are multigrade, and students in the same class may be monolingual (Tseltal) speakers or have different levels of bilingualism (speaking another indigenous language or Spanish). The mean education level of participants was 7.8 years of schooling (SD = 3.3, range: 0-12 years). In Mexico, seventh grade is equivalent to the first year of middle school (i.e., the first year of junior high school). Participants' level of education varied widely. Three were illiterate, 14 participants had completed primary school (6 years of formal schooling), 15 had completed secondary school (9 years of schooling), and 11 had completed high school/COBACH (12 years of formal schooling). The remaining participants had received other amounts of formal education (n = 2: 3 years; n = 2: 4 years; n = 1: 5 years; n = 1: 7 years; n = 2: 11 years). Thirty-three participants reported knowing some Spanish (speaking, reading, and writing), and the rest described themselves as monolingual Tseltal speakers. All participants gave written or oral consent before the experiment. The study was approved by the ethics board of the Faculty of Social Sciences of Radboud University, Nijmegen.
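The reported schooling breakdown can be cross-checked against the summary statistics. This is a minimal sketch that uses the counts listed above, with one assumption flagged in the code: the three participants described as illiterate are coded as 0 years of schooling, which is consistent with the reported range of 0-12 years.

```python
from statistics import mean, stdev

# Years of formal schooling -> number of participants, as listed in the text.
# Assumption: the three participants described as illiterate are coded as 0 years.
schooling_counts = {0: 3, 3: 2, 4: 2, 5: 1, 6: 14, 7: 1, 9: 15, 11: 2, 12: 11}

# Expand the counts into one value per participant.
years = [y for y, n in schooling_counts.items() for _ in range(n)]
print(len(years), round(mean(years), 1), round(stdev(years), 1), min(years), max(years))
# prints: 51 7.8 3.3 0 12
```

Under this coding, the breakdown accounts for all 51 participants and reproduces the reported mean (7.8), sample SD (3.3), and range.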

Materials and design
The stimuli consisted of 32 pairs of active transitive sentences recorded in two experimental conditions: one sentence contained a predictive verb (e.g., "eat-soft thing") and the other a general verb (e.g., "look for"). Example sentences are provided in [2]. Each sentence pair was accompanied by a visual display that depicted four items. Only one of the items fulfilled the semantic restrictions of the predictive verb (i.e., the target direct object), while the three other items were distractors. All four depicted items were possible referents of the verb in the general condition (see Fig. 1). In all sentences, the aspect marker and the verb were mentioned first, followed by an adverbial phrase (ta stukel, "by himself/herself"), the direct object, and finally the subject. The adverbial phrase was included to give participants more time to process verbal semantic information and direct their gaze anticipatorily to the target object. Similar sentence structures have been used in previous research (Garcia et al., 2021; Kamide, Scheepers, et al., 2003; Sauppe, 2016). The sentences were recorded with neutral intonation by a male native speaker of Tseltal. The recordings were sampled at 44,100 Hz. The onsets and offsets of all words were marked using Praat (Boersma, 2002).
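Because word onsets were marked, an anticipatory time window can be derived for each recording. The sketch below shows one way to do this. Several things in it are assumptions: the word boundaries are invented for the example sentence, the 200 ms lag is a convention often adopted in visual world analyses to allow for saccade programming, and the window definition (verb onset to object onset, both shifted by the lag) is illustrative rather than the window used in this study.

```python
# Hypothetical word boundaries (ms from sentence onset) for one recording of
# "Ya slo' ta stukel on te kereme"; labels follow the example sentence,
# but all times are invented for illustration.
words = [
    ("ya", 0, 120),
    ("slo'", 120, 480),
    ("ta", 480, 600),
    ("stukel", 600, 1050),
    ("on", 1050, 1400),
    ("te", 1400, 1500),
    ("kereme", 1500, 2100),
]

SACCADE_LAG_MS = 200  # assumed delay before speech input can drive a saccade


def anticipatory_window(words, verb, obj, lag=SACCADE_LAG_MS):
    """(start, end) in ms: from verb onset to object onset, both shifted by `lag`.

    Looks before `end` cannot have been triggered by hearing the object noun itself.
    """
    onsets = {label: onset for label, onset, _offset in words}
    return onsets[verb] + lag, onsets[obj] + lag


print(anticipatory_window(words, "slo'", "on"))  # (320, 1250)
```

In this design, the adverbial phrase ta stukel falls entirely inside such a window, which is what gives listeners extra time to launch anticipatory looks before the object is named.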
The visual displays were created using photographs of familiar items in daily use, which were taken at the field site by the researchers. These were supplemented with culturally appropriate free images available on the internet (Google Images). The images were processed in Photoshop so that they all had the same white background. Thirty-two transitive sentences were used as fillers, and each of them was paired with a visual display of four items. We followed Altmann and Kamide's (1999) within-subjects experimental design with respect to the filler sentences: these were divided into four sets of eight sentences each. The direct object mentioned in the filler sentences was never depicted. However, the depicted items were chosen so that some of them could be possible referents of the verb. In set 1, three of the items fulfilled the selectional requirements of the verb, while one item was incompatible with the semantics of the verb. In set 2, two of the depicted items were possible referents of the verb and two were not. In set 3, only one item fulfilled the semantic requirements of the verb, while the others were distractors. In set 4, none of the depicted items were possible objects of the verb.
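The filler design can be summarized as a small data structure. This sketch encodes the four sets described above. The observation in the comment, that some filler displays also contain exactly one verb-compatible item, is our own illustration of why such a design may discourage a simple "one compatible item means it will be named" strategy; it is not a rationale stated in the text.

```python
ITEMS_PER_DISPLAY = 4
SENTENCES_PER_SET = 8
# Filler set -> number of depicted items compatible with the filler verb.
COMPATIBLE_PER_SET = {1: 3, 2: 2, 3: 1, 4: 0}

fillers = [
    {"set": s, "compatible": k, "incompatible": ITEMS_PER_DISPLAY - k}
    for s, k in COMPATIBLE_PER_SET.items()
    for _ in range(SENTENCES_PER_SET)
]

# Filler displays with exactly one verb-compatible item, as in the predictive
# experimental condition; in fillers, that item is never the object mentioned.
single_candidate = sum(f["compatible"] == 1 for f in fillers)
print(len(fillers), single_candidate)  # 32 8
```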
Two stimulus lists were created, each containing only one version of each sentence pair together with its accompanying visual display. The stimuli were arranged in a fixed random order so that every experimental trial was followed by a filler trial and the target item appeared in a different screen position from the previous trial. Participants were randomly assigned to the lists. The stimulus materials and a description of the objects in each scene are given in the Supporting Information (Section A).
Fig. 2. Trial example. A fixation cross appeared for 1 s, and this was followed by a 1200 ms preview of the visual display. After this, a sentence was played through the headphones while participants looked at the visual display.

Apparatus and procedure
Before the test session, participants read, or were given, explicit instructions about the experiment in Tseltal. They were asked to complete a short questionnaire on their linguistic and educational background. They were then seated in front of a 17-inch laptop computer (screen resolution 1024 × 768 pixels) at a distance of approximately 58 cm. Their eye movements were recorded with an SMI RED-m eye tracker (SensoMotoric Instruments, Teltow, Brandenburg, Germany), attached to the base of the computer's screen and sampling at 120 Hz. The auditory stimuli were presented via headphones. A native speaker of Tseltal then gave the instructions a second time. Participants were asked to listen to the sentences carefully and were told that they could look at whatever they wanted on the screen. They were not asked to perform any explicit task (i.e., this was a "look and listen" task; e.g., Huettig & Altmann, 2005; Huettig & McQueen, 2007; Huettig et al., 2011). In the absence of any metalinguistic task, participants have been found to shift their visual attention around the scene as the acoustic stimulus unfolds (see Huettig et al., 2011, for discussion).
Each trial began with a centrally located fixation cross that appeared for 1000 ms. This was followed by a 1200 ms preview of the visual display before the auditory stimulus. After the preview, the sentence was played over the headphones, and the display remained in view until the end of the trial. The experimental and filler visual displays were presented for 6000 ms in total. When the display disappeared, a fixation cross appeared in the center of the screen, signaling the next trial (see Fig. 2). There were four practice trials before the main experimental block. Before the practice and experimental sessions, the eye tracker was calibrated using a nine-point calibration procedure, which took about 20 s. The entire session lasted approximately 30 min.
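As a reading aid, the trial structure described above can be written out as a timeline. This is a minimal sketch with hypothetical names, and it assumes (the text does not say so explicitly) that the 6000 ms display duration includes the 1200 ms preview. Under that assumption, about 4800 ms of display time remain after sentence onset.

```python
# Hypothetical reconstruction of one trial's timeline (all values in ms).
FIXATION_MS = 1000  # central fixation cross
PREVIEW_MS = 1200   # visual preview before sentence onset
DISPLAY_MS = 6000   # total display duration; ASSUMED here to include the preview


def trial_timeline(fixation_ms=FIXATION_MS, preview_ms=PREVIEW_MS, display_ms=DISPLAY_MS):
    """Return event onsets relative to trial start, plus the display time left after sentence onset."""
    display_on = fixation_ms
    audio_on = display_on + preview_ms
    display_off = display_on + display_ms
    return {
        "fixation_on": 0,
        "display_on": display_on,
        "audio_on": audio_on,
        "display_off": display_off,
        "listening_ms": display_off - audio_on,
    }


if __name__ == "__main__":
    for event, t in trial_timeline().items():
        print(f"{event:>12}: {t} ms")
```

If the 6000 ms instead started at sentence onset, only `display_off` and `listening_ms` would change; the other onsets follow directly from the text.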

Data analyses
The auditory sentential stimuli were segmented in Praat (Boersma, 2002) to measure the mean duration of verbs, adverbial phrases, direct objects, and subjects in the two sentential conditions, in order to account for differing word lengths across our stimuli (see Table 1). We calculated the proportion of fixations to the target (e.g., avocado), to the averaged distractor objects (e.g., market bag, toy car, coffee mill), and to white space (empty areas of the screen), together with their 95% confidence intervals (calculated by participant and by item for each sampling step), for both the predictive and the general verb conditions. Fig. 3 illustrates the time course of fixations to the target and averaged distractors in the two sentential conditions.
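A minimal sketch of how per-sample fixation proportions and their 95% intervals might be computed. The input format (one record per eye-tracking sample with condition, time bin, participant, item, and a boolean saying whether the gaze fell on the target) is an assumption, and the interval uses a normal approximation (1.96 × SE across participants or items); the authors' actual scripts are in their OSF repository. Passing `group_key="item"` gives the by-item version.

```python
import math
from collections import defaultdict
from statistics import mean, stdev


def fixation_proportions(samples, group_key="participant", z=1.96):
    """Proportion of target fixations per (condition, time bin), with a normal-approximation
    95% CI computed across the units named by group_key ("participant" or "item").

    samples: iterable of dicts with keys "condition", "time", "participant", "item",
    and "on_target" (True if the gaze sample fell on the target picture).
    """
    counts = defaultdict(lambda: [0, 0])  # (condition, time, unit) -> [target samples, all samples]
    for s in samples:
        key = (s["condition"], s["time"], s[group_key])
        counts[key][0] += int(s["on_target"])
        counts[key][1] += 1

    # One proportion per participant (or item) in each condition x time bin.
    per_unit = defaultdict(list)
    for (condition, time, _unit), (hits, total) in counts.items():
        per_unit[(condition, time)].append(hits / total)

    summary = {}
    for key, props in per_unit.items():
        m = mean(props)
        half_width = z * stdev(props) / math.sqrt(len(props)) if len(props) > 1 else 0.0
        summary[key] = (m, m - half_width, m + half_width)
    return summary


if __name__ == "__main__":
    demo = [  # invented samples for illustration only
        {"condition": "predictive", "time": 0, "participant": "p1", "item": "i1", "on_target": True},
        {"condition": "predictive", "time": 0, "participant": "p1", "item": "i2", "on_target": False},
        {"condition": "predictive", "time": 0, "participant": "p2", "item": "i1", "on_target": True},
        {"condition": "predictive", "time": 0, "participant": "p2", "item": "i2", "on_target": True},
    ]
    print(fixation_proportions(demo))
```

Averaging within participants first and then across participants keeps each participant's weight equal, regardless of how many samples they contributed.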
We used multilevel logistic regression (cf. Barr, 2008, 2013; Jaeger, 2008) with random intercepts and slopes to analyze the dependent variable (fixations to the target, a categorical variable coded as 1 = yes, 0 = no) as a function of verb type (coded as 1 = predictive verb, 0 = general verb) and time (a continuous variable measured in milliseconds). We estimated the contribution of verb type to anticipatory eye movements toward the target object referent, and how these fixations changed over time, in two time windows (TWs). The first time window (TW 1), the Verb + Adverb region, included the aspect marker, the verb, and the adverbial phrase, and extended from 200 ms after aspect onset until 200 ms after adverb offset (duration: M = 1498.39 ms, SD = 172.99 ms). TW 1 is our predictive window: we expect verb-mediated anticipatory eye movements toward the appropriate object in this TW. The second time window (TW 2) covered the auditory presentation of the first NP, the target object. TW 2 started 200 ms after object onset and ended 200 ms after object offset (duration: M = 494.35 ms, SD = 113.79 ms). This TW serves as a control: we expect fixations to the target object in both conditions during this region, because participants hear the target noun in both sentential conditions. All of our analysis TWs were segmented according to the different regions of interest (i.e., word durations: Verb + Adverb and Object regions). For each analysis TW, we added 200 ms to the region onsets and offsets to account for the time it takes to program and launch a saccadic eye movement (Duchowski, 2007; Matin, Shao, & Boff, 1993; Saslow, 1967). The SMI RED-m eye tracker (sampling rate 120 Hz) recorded the eye position every 8.3 ms, including fixations, saccades, and blinks. Therefore, to account for variation in the duration of regions across stimuli due to differing word durations, and for the fixation pattern of each participant, the duration of each TW was standardized. That is, time within each TW was centered on the grand mean of that TW and divided by its standard deviation. We considered only fixations that occurred within the specific TW to compute these values. Thus, the fixed effect of time in each model has its own time scale, and a value of 0 on this variable represents the scaled grand mean of the TW of interest. More details about the operationalization of the time variable can be found in an OSF repository (https://osf.io/gqv5c/?view_only=7fce527ed0a344888d2ea2d5dd80c5e9).
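The window definitions and the time standardization can be sketched as follows. This is an illustration under stated assumptions, not the authors' code (their scripts are in the OSF repository): the word-boundary dictionary and its key names are hypothetical, the word times in the example are invented, and the text does not specify the reference point of the sample time stamps, so `standardize` simply z-scores whatever time stamps it is given. The SD uses the n - 1 denominator, as R's `scale()` does; which estimator the authors used is not stated.

```python
from statistics import mean, stdev

SACCADE_OFFSET_MS = 200  # time needed to program and launch a saccade
SAMPLING_RATE_HZ = 120   # SMI RED-m: one sample every 1000 / 120, i.e. about 8.33 ms


def analysis_windows(words):
    """Shift region boundaries by the saccade offset.

    words: hypothetical mapping {"aspect": (onset, offset), "adverb": (...), "object": (...)}
    with word times in ms from sentence onset.
    """
    return {
        # TW 1: 200 ms after aspect onset to 200 ms after adverb offset
        "TW1": (words["aspect"][0] + SACCADE_OFFSET_MS, words["adverb"][1] + SACCADE_OFFSET_MS),
        # TW 2: 200 ms after object onset to 200 ms after object offset
        "TW2": (words["object"][0] + SACCADE_OFFSET_MS, words["object"][1] + SACCADE_OFFSET_MS),
    }


def n_samples(window):
    """Number of whole samples in a window; multiplying before dividing avoids float drift."""
    start, end = window
    return (end - start) * SAMPLING_RATE_HZ // 1000


def standardize(times_ms):
    """Centre the pooled sample times of one TW on their mean and divide by their SD (n - 1)."""
    mu, sd = mean(times_ms), stdev(times_ms)
    return [(t - mu) / sd for t in times_ms]


if __name__ == "__main__":
    words = {"aspect": (0, 150), "adverb": (900, 1500), "object": (1500, 2000)}  # invented
    for name, window in analysis_windows(words).items():
        print(name, window, n_samples(window), "samples")
    print(standardize([0, 10, 20]))
```

With the invented word times, TW 1 spans 200 to 1700 ms (180 samples at 120 Hz) and TW 2 spans 1700 to 2200 ms (60 samples). At the reported mean TW 1 duration of about 1498 ms, a trial contributes roughly 180 samples to TW 1.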
We fit multilevel logistic regression models that included our main predictors of interest, verb type and time, and their interaction (Agresti, 2019). The random effects structure included intercepts and slopes for verb type and time for both participants and items. We used the maximal random effects structure justified by the design that allowed the models to converge (Barr, 2013; Barr, Levy, Scheepers, & Tily, 2013). In addition, we conducted an exploratory analysis in which participants' level of education (a continuous measure ranging from 0 to 12 years of formal schooling) and knowledge of Spanish (sum-coded as yes = 0.5, no = −0.5; see Brehm & Alday, 2022; Schad, Vasishth, Hohenstein, & Kliegl, 2020; Venables & Ripley, 2002) were included as fixed effects, to explore the extent to which these population characteristics might interact with the effect of verb specificity on object prediction. Part of the experiment (i.e., trial position) was also included as a predictor (sum-coded as first part of the experiment = −0.5 vs. second part = 0.5) to test whether participants learned to anticipate the target object over the course of the experiment.
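The coding scheme described above can be made concrete with a small helper. It is hypothetical: the field names are invented, and splitting the experiment at the midpoint of the trial sequence is an assumption (the text only distinguishes a first and a second part). The example uses 64 trials, which assumes each list presented its 32 experimental and 32 filler sentences. With ±0.5 sum coding, the coefficient for Spanish equals the difference between speakers with and without Spanish, and effects that interact with it are evaluated at the midpoint between the two groups rather than at one reference group.

```python
def code_predictors(obs, n_trials):
    """Hypothetical numeric coding of one observation, following the scheme in the text.

    obs: dict with "verb" ("predictive" or "general"), "knows_spanish" (bool),
    "trial" (1-based trial position), and "years_schooling" (0-12).
    """
    return {
        "verb_type": 1 if obs["verb"] == "predictive" else 0,        # treatment (dummy) coding
        "spanish": 0.5 if obs["knows_spanish"] else -0.5,           # sum coding
        "exp_part": -0.5 if obs["trial"] <= n_trials / 2 else 0.5,  # ASSUMED midpoint split
        "education": obs["years_schooling"],                        # continuous predictor
    }


if __name__ == "__main__":
    row = {"verb": "predictive", "knows_spanish": False, "trial": 40, "years_schooling": 6}
    print(code_predictors(row, n_trials=64))
```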
To evaluate the contribution of each predictor and whether it improved model fit, we built models from the most reduced version (only one predictor) up to the most complex one with the relevant effects and interactions, evaluated via forward model comparison using the likelihood ratio test (Pinheiro & Bates, 2000). All models were fit by maximum likelihood with the Laplace approximation using the lme4 package (version 1.1-21; Bates, Mächler, Bolker, & Walker, 2015). Confidence intervals (95%) are provided for the regression coefficients (Snijders & Bosker, 2012). All statistical analyses were conducted in R (version 3.6.1). The model that best described the data included the interaction of verb type and time and random intercepts and slopes for our main predictors for participants and items. Summaries of the model fit for the interaction of verb type and time during TW 1 (Verb + Adverb region) and TW 2 (Object region) are given in Table 4, together with the estimated variances of the random effects. Additional model summaries that include level of education, knowledge of Spanish, and part of the experiment can be found in the Supporting Information (Section B). All analyses are available in an OSF repository (https://osf.io/gqv5c/?view_only=7fce527ed0a344888d2ea2d5dd80c5e9).
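Forward model comparison with the likelihood ratio test amounts to comparing twice the difference in log-likelihood between two nested models against a chi-square distribution whose degrees of freedom equal the number of added parameters. In R, this is the test that `anova()` reports for nested lme4 fits; the standard-library Python sketch below only reproduces the arithmetic. The log-likelihood values in the example are invented for illustration and are not the paper's.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with integer df >= 1.

    Uses Q(s, h), the regularized upper incomplete gamma function with s = df / 2 and h = x / 2,
    built up from Q(1/2, h) = erfc(sqrt(h)) or Q(1, h) = exp(-h) with the recurrence
    Q(s + 1, h) = Q(s, h) + h**s * exp(-h) / Gamma(s + 1).
    """
    if x <= 0:
        return 1.0
    h = x / 2.0
    s, q = (1.0, math.exp(-h)) if df % 2 == 0 else (0.5, math.erfc(math.sqrt(h)))
    while s < df / 2.0:
        q += math.exp(s * math.log(h) - h - math.lgamma(s + 1.0))
        s += 1.0
    return q


def likelihood_ratio_test(loglik_reduced, loglik_full, df_diff):
    """LR statistic and p-value for a nested comparison (the full model adds df_diff parameters)."""
    stat = 2.0 * (loglik_full - loglik_reduced)
    return stat, chi2_sf(stat, df_diff)


if __name__ == "__main__":
    # Invented log-likelihoods, e.g. verb type + time vs. verb type * time (one extra parameter).
    stat, p = likelihood_ratio_test(-1000.0, -995.0, df_diff=1)
    print(f"chi2(1) = {stat:.2f}, p = {p:.4f}")
```

The closed-form recurrence is exact for integer degrees of freedom, which is all a comparison of nested models needs.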

Results
The mean durations of verbs, adverbial phrases, direct objects, and subjects in the predictive and general sentential conditions are given in Table 1. There were no significant differences in word duration between the two sentential conditions. Fig. 3 shows the time course of the proportion of fixations to the target object (e.g., the avocado) when participants heard the predictive verb "eat-soft things" (solid line), compared to fixations to the same object when they heard the general verb "look for" (dotted line). Visual inspection of the graph shows that in the predictive verb condition, fixations toward the appropriate object started to increase during the Verb + Adverb region and continued to increase during the Object region and all the way until the subject was heard. In contrast, in the general verb condition, participants directed their attention toward the target only once it was mentioned (i.e., during the Object region); fixations to the target then increased and continued to do so until the end of the sentence. This pattern of fixations suggests that Tseltal speakers anticipated the upcoming direct object before it was encountered in the sentence. In addition, participants fixated the target object in both sentential conditions once the linguistic expression referring to it was encountered in the utterance, and they continued fixating it until the end of the sentence.
Table 2 presents the proportion of fixations to the target as a function of verb type and knowledge of Spanish during TW 1. It shows that the proportion of fixations to the target was very similar across verb types and between participants with and without knowledge of Spanish. Descriptive statistics of fixations to the target as a function of verb type and education (TW 1) are given in Table 3. The fixation distributions were very similar across participants' years of formal schooling and verb type, with subtle differences in some groups. In the predictive verb condition, the proportion of fixations to the target was higher for the illiterate participants (note that there are only three illiterate participants in the sample), for those with 4 years of schooling, and for those with 9-12 years of education. Finally, Table 4 summarizes the estimated regression coefficients and variance components of the multilevel logistic regression model with the interaction between verb type and time during TW 1 (Verb + Adverb region) and TW 2 (Object region). The model that best described the data had an interaction of verb type and time and random intercepts and slopes of our predictors for participants and items. We discuss each TW next.
TW 1 Verb + Adverb region: There was no significant effect of time (β = 0.06, SE = 0.09, 95% CI [−0.12, 0.23]), but there was a significant effect of verb type (β = 0.54, SE = 0.15, 95% CI [0.25, 0.83]), suggesting that Tseltal speakers are more likely to fixate the target when hearing a predictive verb than a general one. In addition, there was a significant interaction between verb type and time (β = 0.41, SE = 0.12, 95% CI [0.18, 0.64]): the advantage for predictive verbs increased over the window as more verbal information became available. These results show that, during this TW, Tseltal participants anticipated the target object before it was mentioned.
TW 2 Object region: There were significant effects of verb type (β = 0.88, SE = 0.25, 95% CI [0.40, 1.36]) and time (β = 0.53, SE = 0.13, 95% CI [0.28, 0.78]), suggesting that participants were more likely to fixate the target when hearing a predictive verb than a general one, and as time increased. The interaction between verb type and time was not significant (β = −0.26, SE = 0.16, 95% CI [−0.58, 0.06]). As expected, participants directed their visual attention to the object that was being mentioned during this TW.
We conducted complementary analyses to test the effects that other factors might have on object prediction. Specifically, these models examined fixations to the target as a function of verb type, time, education, knowledge of Spanish, and section of the experiment. Interactions between these predictors and verb type were included as well, to address effects specific to predictive (vs. general) verbs. The models were estimated only for TW 1 (Verb + Adverb region) because this is the predictive region of interest, where we could test whether these factors influence verb-mediated anticipatory processing. The model summary can be found in the Supporting Information (Section B, Table S.B1). None of the additional predictors had a significant effect on fixations to the target. In addition, there were no significant interactions between education and verb type (β = −0.01, SE = 0.04, 95% CI [−0.08, 0.06]), knowledge of Spanish and verb type (β = 0.04, SE = 0.25, 95% CI [−0.45, 0.53]), or section of the experiment and verb type (β = 0.24, SE = 0.24, 95% CI [−0.23, 0.71]). Nevertheless, there were significant main effects of verb type and time and a significant interaction between the two (β = 0.36, SE = 0.01, 95% CI [0.34, 0.38]). Overall, the results suggest that education, knowledge of Spanish, and section of the experiment do not play a role in verb-mediated anticipatory eye fixations in Tseltal.
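Because the models are logistic, the coefficients reported above are on the log-odds scale. One quick way to read them is to exponentiate them into odds ratios and to form an approximate Wald interval as β ± 1.96 × SE. For the TW 1 verb type effect (β = 0.54, SE = 0.15), this gives about [0.25, 0.83], matching the reported interval, and an odds ratio of about 1.72: since time is centered, this is the predicted multiplicative increase in the odds of fixating the target with a predictive verb at the grand mean of the window. For some other coefficients, the Wald approximation differs from the reported intervals in the second decimal, so the intervals in Table 4 may have been computed with another method or from unrounded estimates. The sketch below is only a reading aid.

```python
import math


def wald_summary(beta, se, z=1.96):
    """Approximate 95% Wald CI for a logistic-regression coefficient, on log-odds and odds-ratio scales."""
    lower, upper = beta - z * se, beta + z * se
    return {
        "ci_log_odds": (lower, upper),
        "odds_ratio": math.exp(beta),
        "ci_odds_ratio": (math.exp(lower), math.exp(upper)),
    }


if __name__ == "__main__":
    # TW 1 verb type coefficient as reported in the text.
    s = wald_summary(0.54, 0.15)
    print("CI (log-odds): [%.2f, %.2f]" % s["ci_log_odds"])
    print("odds ratio: %.2f" % s["odds_ratio"])
```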

General discussion
While predictive processing is often assumed to be a central, if not crucial, aspect of language comprehension, very little evidence from typologically diverse languages has been marshaled to support this view. Our study makes a small contribution toward filling this empirical gap by investigating real-time sentence comprehension in Tseltal, a verb-initial Mayan language. Using the VWP, our goal was to test whether Tseltal speakers use verbal information, which is provided upfront in the sentence, to anticipate the upcoming grammatical object. Our study followed Altmann and Kamide's (1999) design, adapted to a smaller community of non-Western speakers, to assess whether the effects found in the literature for subject-initial languages also arise in Tseltal.
In our experiment, Tseltal speakers listened to verb-initial transitive sentences while seeing a visual display showing one potential referent and three distractors. We manipulated verb type (predictive vs. general) and recorded participants' eye movements while they listened and inspected the visual scene. We estimated the contribution of verb type to anticipatory eye movements toward the target object referent in two different TWs. In the first TW, which covered the initial verb (and its aspect marker) together with the adverbial phrase, we found a significant effect of verb type, indicating that participants fixated the target more during this window in the predictive condition than in the general condition. There was also a significant interaction between verb type and time, showing that fixations to the target increased over time when participants heard a predictive verb. In the predictive condition, fixations to the target remained elevated until the object was heard (the second TW). In contrast, in the general verb condition, participants directed their attention to the relevant object in the visual display only when the word for it was encountered in the sentence. By the time the object was heard in both sentential conditions, participants were fixating the only depicted object that matched that referring expression.
This pattern of language-mediated eye movements shows two things: (a) listeners make anticipatory looks to the depicted referent that most plausibly follows a verbal expression; and (b) Tseltal participants use verbal information to direct their visual attention toward the external world (i.e., the visual display). These results replicate the finding previously reported for subject-initial languages (Altmann & Kamide, 1999; Arai & Keller, 2013; Boland, 2005; Hintz et al., 2017; Kako & Trueswell, 2000; Kamide, Altmann, et al., 2003; Kamide, Scheepers, et al., 2003; Knoeferle et al., 2005; Mani & Huettig, 2012; etc.) that verbal information is extracted very quickly and guides anticipatory looks to whichever object in the visual display satisfies the selectional restrictions of the verb.
In line with previous studies, the results suggest that conceptual overlap between objects in the visual display and the unfolding language is one factor that mediates the direction of visual attention (Altmann & Kamide, 2007; Altmann & Mirković, 2009; Kamide, Altmann, et al., 2003; Tanenhaus, Magnuson, & Chambers, 2000). The items depicted in the visual display activate their own conceptual features (see Huettig & McQueen, 2007; McQueen & Huettig, 2014), possibly creating an "episodic trace" (Altmann & Kamide, 2007, p. 512). When linguistic input is presented, the verb's own specific semantic features are activated. Because the semantic features of the predictive verbs match only one of the depicted objects, the overlap between the features of that object and those activated by the unfolding linguistic expression causes a shift of attention toward the target object (Altmann & Kamide, 2007; Altmann & Mirković, 2009).
What is interesting about the Tseltal results is that semantic information encoded in the verb can rapidly guide eye movements, despite the fact that Tseltal verbs come first in the sentence. The absence of an initial subject NP (or any other verbal argument) may, in fact, facilitate anticipatory processing: because the subject NP has not yet been mentioned, listeners do not have to devote cognitive resources to integrating the subject argument into the ongoing parse, which might free up resources for anticipating upcoming arguments. Future work could capitalize on Tseltal's word order flexibility (the grammar permits fronted sentence-initial subjects; see Polian, 2013) to investigate whether anticipatory looks to the object are modulated by the position of the grammatical subject of the sentence.
Predictive processing in Tseltal may also be facilitated by the particular semantic properties of transitive verbs in the language. A property of Mayan languages is that many verb roots (i.e., transitive and positional verb roots) incorporate into their semantics physical properties (e.g., shape, substance, or position) of the object that they select for (cf. Brown, 2008). In Altmann and Kamide's study, when English speakers hear The boy will eat, the verb eat restricts interpretation possibilities to edible items a boy could eat. In Tseltal, by contrast, when hearing Ya slo' ("He/She/It is eating-soft things"), the possible referents are restricted even further, to only those edible items that are soft. The referents of Tseltal verbs are more concrete, and the range of contexts in which they might occur is more easily determined.
An important question for future research concerns the nature of the representations that are activated upon hearing such semantically rich verbs: Are the perceptual properties of upcoming objects (e.g., their shape, their texture) activated in such cases (see Huettig & Altmann, 2005, 2007; Huettig & McQueen, 2007)? Rommers, Meyer, and Huettig (2013) showed that Dutch listeners can predict perceptual attributes that will be referred to in an utterance, even in the absence of a visual depiction of the target word. In their study, participants' eye movements were recorded while they listened to sentences that were predictive of a specific word, for example, "moon" in "In 1969 Neil Armstrong was the first man to set foot on the moon." Participants inspected a visual display with three unrelated distractors and either the target object (e.g., moon), a shape competitor of the target object (e.g., tomato), or an unrelated control object (e.g., rice). As expected, participants looked significantly more at the target object, but interestingly, they also looked more at the shape competitor than at unrelated objects before "moon" was heard. Rommers et al.'s data provide evidence that pre-activating a predicted concept can also activate the visual representation of that object. In Rommers et al.'s study, listeners' predictions of perceptual attributes of noun referents were generated from the entire multiword sentential context before the target noun. In Tseltal, by contrast, verbal information alone may be sufficient to activate visual representations of upcoming objects, given the relative "nouniness" of transitive verbs in the language.
The only other visual world eye-tracking studies to investigate anticipatory processing in a verb-initial language, to the best of our knowledge, are Sauppe's (2016) and Garcia et al.'s (2021) studies of Tagalog (Austronesian). In Sauppe's study, participants heard sentences of the type "Eat frog fly" or "Eat fly frog" (both meaning "The frog will eat the fly") while inspecting a visual display that depicted an agent, a patient, and a distractor. The verb carried morphological marking that allowed listeners to infer the order and the syntactic status of the agent and patient. After hearing the verb, Tagalog listeners directed their gaze more to the agent, regardless of its syntactic function and position in the sentence. In a different study, Garcia and colleagues found that Tagalog adult comprehenders (the study was also conducted with children) did not always anticipate the agent when hearing verb-initial sentences. In their study, participants heard sentences such as "Bite last Tuesday diligent cow monkey" or "Bite last Tuesday diligent monkey cow" (both meaning "A (diligent) cow was biting a monkey last Tuesday") while viewing a picture depicting a transitive event between two animals (e.g., a cow biting a monkey). Voice marking (agent vs. patient) and NP argument order (VAP: agent-initial sentences vs. VPA: patient-initial sentences) were manipulated. Participants anticipated the agent in the patient-voice condition (compared to agent-voice) in agent-initial sentences (compared to patient-initial sentences), thus showing that Tagalog speakers use morphosyntactic markers on the verb to anticipate an upcoming argument. These results might reflect a bias toward agent identification in the early stages of sentence processing (cf. Bornkessel & Schlesewsky, 2006). In the present study, we did not include depictions of agents in the visual display, so we cannot determine whether such a bias is also operative in Tseltal. We can conclude, however, that at least in the absence of a competing visual representation of an agent, verbal information rapidly guides listeners' eye movements to a depicted object. In other words, an agent bias does not fully attenuate the rapid anticipation of a grammatical object, at least when a visual agent competitor is not present. We have suggested that the object-oriented nature of Tseltal's transitive verbs may encourage early attention to object referents; this may be a point of contrast with Tagalog. Future work should test this directly by examining whether Tseltal listeners preferentially anticipate agents or patients when both are visually depicted.
It has been suggested that literacy, vocabulary knowledge, and second language learning might mediate prediction during language processing. In our study, literacy levels were not manipulated systematically (i.e., we did not include measures from standardized tests of reading ability, vocabulary size, or level of bilingualism in our experimental design). Instead, we assumed that years of formal (rural) schooling and knowledge of Spanish are somewhat related to literacy skills (and vocabulary size). Bearing in mind that we could only examine these factors in an exploratory way, it is nevertheless notable that we did not find any effects of level of education or knowledge of Spanish on anticipatory processing and incremental interpretation of objects in Tseltal. These results differ from those reported in the literature, where participants with low literacy levels showed either reduced or no anticipation of upcoming language input compared to high literates (see Huettig & Brouwer, 2015; Huettig et al., 2011; Mishra et al., 2012). There are two possible (mutually compatible) explanations for these differences. First, the Tseltal speakers had more years of formal schooling (on average 7-8 years) than the low-literate Hindi speakers in Mishra et al.'s (2012) study (a mean of 2 years of formal education). This might suggest that even some exposure to formal education already facilitates prediction in spoken language (Araújo, Fernandes, & Huettig, 2019; Favier et al., 2021; Huettig & Pickering, 2019). An interesting question in this regard is the extent to which it is education (and therefore literacy) per se that matters here, as opposed to education and literacy obtained specifically in the target language. This is relevant because Tseltal children are taught to read and write in Tseltal only during the first 4 years of formal schooling; after that, schooling is conducted in Spanish. Further work is required to determine whether the capacity for predictive processing in Tseltal can be tied specifically to those 4 years of schooling in Tseltal and/or whether Spanish literacy levels and vocabulary size also mediate anticipatory processing in Tseltal.
Second, it is possible that Tseltal verbs provided a richer set of predictive cues than those used in the Hindi sentences. In this study, we pointed to a number of linguistic features of Tseltal verbs that set them apart from those of previously studied languages (their rich semantics and morphology) and that may serve to facilitate predictive processing. An important area of future work will be to disentangle these potential contributing factors to predictive processing in Tseltal.
Finally, we did not find a significant effect of experiment part on anticipatory eye movements. That is, there was no evidence that Tseltal participants acquired a predictive strategy during the experiment that might have driven anticipatory eye movements toward the target object. Taken together, the data suggest that anticipatory processing of upcoming linguistic information in Tseltal is driven by the verb's meaning, independently of education, knowledge of Spanish, or experiment effects. The visual display together with the semantics of the verb contributed to the speed with which Tseltal listeners activated upcoming constituents.

Conclusions
This study shows that it is possible to create a laboratory setting in the field to investigate an aspect of sentence processing (incrementality and prediction during sentence comprehension) that has received little typological coverage. Our results provide evidence that in Tseltal, a verb-initial Mayan language, predictive processing is an integral part of real-time language processing. This lends support to the view that predictive processing during language comprehension might be a universal processing principle.

Fig. 1. Stimulus example of visual display and sentential conditions. The auditory sentences and glosses are provided in Tseltal. The time windows relevant for analyses are indicated with arrows.

Fig. 3. Time course of fixation proportions for target and averaged distractor objects in the predictive and general conditions. Ribbons indicate 95% confidence intervals, calculated for each sampling step. Dotted lines indicate the mean onsets and offsets of words in the sentential conditions.

Table 2
Proportions and standard deviations of fixations to the target as a function of both verb type and knowledge of Spanish (N = 251,717 obs.). Note. Fixations to the target (vs. distractors and white space) during TW 1.

Table 3
Proportions and standard deviations of fixations to the target as a function of verb type and education (N = 251,717 obs.). Note. Fixations to the target (vs. distractors and white space).

Table 4
Regression coefficients and variance components for the multilevel logistic regression models. Fixations to the target were modeled as a function of verb type and time during two time windows: TW 1 (Verb + Adverb region) and TW 2 (Object region).
15516709, 2023, 1, Downloaded from https://onlinelibrary.wiley.com/doi/10.1111/cogs.13219 by MPI 378 Psycholinguistics, Wiley Online Library on [19/01/2023]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License