Conference item

How to deal with spelling variation in early stages of French and English

Abstract:

The efficiency of search engines is based on the principle that the information sought can be retrieved by “looking for words” conveying the information. This amounts to taking for granted that words are always written in the same way. This view, which is well adapted to texts produced in contemporary periods of language history, is not suited to texts produced during the French Renaissance, and what is true for early French is true for early English too.

It is therefore necessary to adapt search engines based on word form identification if they are to render the service expected. Several strategies can be envisaged and the purpose of this paper is to focus on one which resorts to linguistic expertise.

A Java program was developed which first transforms a list of words into an extended list of forms by applying a set of rules based on linguistic knowledge of morphology and spelling history. The next step is to locate, in a text, the old-spelling forms that correspond to the requested form. The process therefore falls into two “phases”.

(1) Generating the extended form of the request: at this step, the program generates three files. Two of them hold summary information, one about the process (how often each rule was used) and one about the end result (how many forms were generated). The third lists, for each word, the generated forms together with the rules used to produce them.
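
As an illustration of phase (1), the following is a minimal sketch of rule-based expansion, not the program described above. The class name, the method signature and the two rewriting rules (final "i" to "y", "ai" to "oi") are assumptions chosen for the example; the actual rule set is not given in this abstract. Each generated form is stored with the chain of rules that produced it, and a counter records how often each rule was applied.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class VariantGenerator {
    // Hypothetical rules: {name, regex on the current form, replacement}.
    // They are illustrative only and are not the rule set of the tool.
    static final String[][] RULES = {
        {"final-i-to-y", "i$", "y"},
        {"ai-to-oi", "ai", "oi"},
    };

    /**
     * Expands one modern word into candidate old spellings.
     * Returns each form mapped to the ordered list of rules that derived it
     * (the modern form itself maps to an empty list). The usage map is
     * updated with the number of times each rule produced a new form.
     */
    static Map<String, List<String>> expand(String word, Map<String, Integer> usage) {
        Map<String, List<String>> forms = new LinkedHashMap<>();
        forms.put(word, new ArrayList<>());
        List<String> queue = new ArrayList<>();
        queue.add(word);
        // Breadth-first closure: apply every rule to every form found so far.
        for (int i = 0; i < queue.size(); i++) {
            String current = queue.get(i);
            for (String[] rule : RULES) {
                // replaceAll rewrites every match of the pattern in one step.
                String next = current.replaceAll(rule[1], rule[2]);
                if (!next.equals(current) && !forms.containsKey(next)) {
                    List<String> trail = new ArrayList<>(forms.get(current));
                    trail.add(rule[0]);
                    forms.put(next, trail);
                    queue.add(next);
                    usage.merge(rule[0], 1, Integer::sum);
                }
            }
        }
        return forms;
    }

    public static void main(String[] args) {
        Map<String, Integer> usage = new LinkedHashMap<>();
        Map<String, List<String>> forms = expand("vrai", usage);
        // Per-word listing: each form with the rules used to derive it.
        forms.forEach((form, rules) -> System.out.println(form + "\t" + rules));
        // Summary information: rule usage and number of forms generated.
        System.out.println("rule usage: " + usage);
        System.out.println("forms generated: " + forms.size());
    }
}
```

Rules of this kind deliberately overgenerate: some candidates may never have been written. In this sketch that is acceptable because only forms actually found in a text matter in the next phase.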

(2) Finding the right form within a text: once the extended request has been computed, the final test is to identify all the variants actually attested in the text. This is the second phase of the program. Its output is an HTML file in which each identified variant is highlighted graphically (or set in bold). In addition, each form is linked to a bubble showing the rules used to derive the variant.
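
Phase (2) can be sketched in the same spirit. The code below is an assumption-laden illustration rather than the tool's implementation: the class and method names are hypothetical, words are matched case-insensitively as runs of letters, and the "bubble" is approximated with the HTML `title` attribute on a bold element. The map passed in pairs each candidate form with the rules that derived it.

```java
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class VariantHighlighter {
    // A word is taken to be a maximal run of Unicode letters (an assumption).
    private static final Pattern WORD = Pattern.compile("\\p{L}+");

    /** Escapes the characters that are special in HTML text and attributes. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    /**
     * Returns the text as an HTML fragment in which every token found in
     * the variants map is set in bold, with the deriving rules as tooltip.
     */
    static String highlight(String text, Map<String, List<String>> variants) {
        StringBuilder html = new StringBuilder();
        Matcher m = WORD.matcher(text);
        int last = 0;
        while (m.find()) {
            // Copy the material between the previous token and this one.
            html.append(escape(text.substring(last, m.start())));
            String token = m.group();
            List<String> rules = variants.get(token.toLowerCase(Locale.ROOT));
            if (rules != null) {
                html.append("<b title=\"")
                    .append(escape(String.join(", ", rules)))
                    .append("\">").append(escape(token)).append("</b>");
            } else {
                html.append(escape(token));
            }
            last = m.end();
        }
        html.append(escape(text.substring(last)));
        return html.toString();
    }

    public static void main(String[] args) {
        Map<String, List<String>> variants =
            Collections.singletonMap("roy", Collections.singletonList("final-i-to-y"));
        System.out.println(highlight("Le Roy dit", variants));
    }
}
```

Only variants that occur in the text are marked, so candidates generated in phase (1) but never attested simply leave no trace in the output.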

The present paper aims to describe the tool which was first developed in the particular context of the Virtual Humanistic Library, and then to show the way to adapt it to early English.

Publication status:
Not published
Peer review status:
Reviewed (other)

Authors

Institution:
Université de Poitiers/University of Poitiers
Role:
Author


Publication date:
2012-01-01


UUID:
uuid:36790232-3d92-4eba-8f56-09ca307362f9
Local pid:
EEBOTCP:7
Deposit date:
2012-12-05