How to deal with spelling variation in early stages of French and English

Lay, M; Duchet, J

Conference item

Abstract:: The efficiency of search engines is based on the principle that the information sought can be retrieved by “looking for words” conveying the information. This amounts to taking for granted that words are always written in the same way. This view, which is well adapted to texts produced in contemporary periods of language history, is not suited to texts produced during the French Renaissance, and what is true for early French is true for early English too.
It is therefore necessary to adapt search engines based on word form identification if they are to render the service expected. Several strategies can be envisaged and the purpose of this paper is to focus on one which resorts to linguistic expertise.
A Java program was developed which first transforms a list of words into an extended list of forms by applying a rule set based on linguistic knowledge of morphology and spelling history. The next task is to locate, in a text, the old-spelling forms attested for the requested form. The process therefore has two phases.
(1) Generating the extended form of the request: at this step, the program generates three files. Two of them give summary information about the process (how often each rule was used) and about the end result (how many forms were generated). The third contains, for each word, the list of generated forms together with the rules used to derive them.
(2) Finding the right form within a text: once the extended request has been computed, the final test is to identify all the variants actually attested in the text. This is the second phase of the program. Its output is an HTML file in which each identified variant is highlighted graphically (or set in bold). In addition, each form is linked to a bubble showing the rules used to derive the variant.
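The two phases can be illustrated with a minimal Java sketch. It is not the authors' program: the class name `SpellingVariantSketch`, the `Rule` type, the two rewrite rules (`i/y`, `ai/oi`) and the use of a `title` attribute to stand in for the "bubble" are all assumptions chosen for illustration. The real rule set rests on linguistic expertise that the abstract does not detail.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SpellingVariantSketch {

    /** Hypothetical rewrite rule: replace the letter sequence `from` with `to`. */
    static final class Rule {
        final String name, from, to;
        Rule(String name, String from, String to) { this.name = name; this.from = from; this.to = to; }
    }

    // Illustrative rules only (assumed, not the authors' rule set).
    // Both preserve word length, so the set of reachable forms is finite.
    static final List<Rule> RULES = List.of(
        new Rule("i/y", "i", "y"),       // roi -> roy
        new Rule("ai/oi", "ai", "oi"));  // français -> françois

    /** Phase 1: every derivable form, mapped to the rules used to reach it. */
    static Map<String, List<String>> expand(String word) {
        Map<String, List<String>> forms = new LinkedHashMap<>();
        forms.put(word, List.of());
        Deque<String> queue = new ArrayDeque<>();
        queue.add(word);
        while (!queue.isEmpty()) {
            String form = queue.poll();
            for (Rule rule : RULES) {
                // Apply the rule at each matching position, one site at a time.
                for (int i = form.indexOf(rule.from); i >= 0; i = form.indexOf(rule.from, i + 1)) {
                    String next = form.substring(0, i) + rule.to + form.substring(i + rule.from.length());
                    if (forms.containsKey(next)) continue; // already derived
                    List<String> used = new ArrayList<>(forms.get(form));
                    used.add(rule.name);
                    forms.put(next, used);
                    queue.add(next);
                }
            }
        }
        return forms;
    }

    /** Summary information: how often each rule occurs across all derivations. */
    static Map<String, Integer> ruleUsage(Map<String, List<String>> forms) {
        Map<String, Integer> counts = new TreeMap<>();
        for (List<String> used : forms.values())
            for (String name : used) counts.merge(name, 1, Integer::sum);
        return counts;
    }

    /** Phase 2: wrap each attested variant in <b>, with its rules in a title attribute. */
    static String highlight(String text, Map<String, List<String>> forms) {
        Matcher m = Pattern.compile("\\p{L}+").matcher(text); // runs of letters
        StringBuilder html = new StringBuilder();
        int last = 0;
        while (m.find()) {
            html.append(escape(text.substring(last, m.start())));
            String token = m.group();
            List<String> used = forms.get(token.toLowerCase(Locale.ROOT));
            if (used == null) {
                html.append(escape(token));
            } else {
                String label = used.isEmpty() ? "base form" : String.join(", ", used);
                html.append("<b title=\"").append(escape(label)).append("\">")
                    .append(escape(token)).append("</b>");
            }
            last = m.end();
        }
        return html.append(escape(text.substring(last))).toString();
    }

    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    public static void main(String[] args) {
        Map<String, List<String>> forms = expand("vrai");
        System.out.println(forms);
        System.out.println(ruleUsage(forms));
        System.out.println(highlight("Il est vray que le roy vient.", forms));
    }
}
```

With these two assumed rules, `expand("vrai")` derives four forms: `vrai`, `vray`, `vroi` and `vroy`. The generation step deliberately overgenerates, since `vroi` is unlikely to occur. This is why the second phase keeps only the variants actually found in the text.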
The present paper aims to describe the tool which was first developed in the particular context of the Virtual Humanistic Library, and then to show the way to adapt it to early English.

Publication status:: Not published

Peer review status:: Reviewed (other)

Cite this record

APA Style

Lay, M., & Duchet, J. (2012). How to deal with spelling variation in early stages of French and English.

MLA Style

Lay, M, and J Duchet. “How to Deal with Spelling Variation in Early Stages of French and English.” 2012.

Chicago Style

Lay, M, and J Duchet. 2012. “How to Deal with Spelling Variation in Early Stages of French and English.”

Access Document

Files:: LayDuchetVarialog-Oxford2012v5.pdf

(pdf, 428.0KB)

Authors

Lay, M

Institution:: Université de Poitiers/University of Poitiers
Role:: Author

Duchet, J

Role:: Author

Publication date:: 2012-01-01

UUID:: uuid:36790232-3d92-4eba-8f56-09ca307362f9
Local pid:: EEBOTCP:7
Deposit date:: 2012-12-05
ARK identifier:: ark:/29072/ora_367902323d924eba8f5609ca307362f9

Terms of use

Copyright holder:: Lay and Duchet

Licence:: Terms and Conditions of Use for Oxford University Research Archive
