This document is an historical remnant. It belongs to the collection Skeptron Web Archive (included in Donald Broady's archive) that mirrors parts of the public Skeptron web site as it appeared on 31 December 2019, containing material from the research group Sociology of Education and Culture (SEC) and the research programme Digital Literature (DL). The contents and file names are unchanged while character and layout encoding of older pages has been updated for technical reasons. Most links are dead. A number of documents of negligible historical interest as well as the collaborators’ personal pages are omitted.
The site's internet address was since Summer 1993 www.nada.kth.se/~broady/ and since 2006 www.skeptron.uu.se/broady/sec/.



URL of this page is www.skeptron.uu.se/broady/dl/palate-proposal-march2002.htm

Personalized Access to Large Text Archives (PALaTe)

 

Extract from
Wolfgang Nejdl et al, Personalized Access to Distributed Learning Repositories (PADLR). Final Proposal, March 25, 2001, pp. 17-19.



Contributing Research Groups and PIs:
Uppsala (Borin/Broady)
CID (Broady).

Working Title. PALaTe: Personalized Access to Large Text Archives

Problem Description. Text is still important in the teaching of almost any subject, viz. in the form of textbooks and other course texts. In Languages and Humanities education, (large) textual resources are also quite often objects of study in themselves. Arguably, their effective deployment as study objects in the context of ICT-based personalized learning demands some kind of language understanding. Hence, personalized access and navigation among such resources should – almost by definition – make use of Computational Linguistics (CL) / Natural Language Processing (NLP) techniques, to complement the more general personalization tools which will be developed in the submodule “PLeaSe: Personalized Learning Sequences”.

In this submodule/testbed, we thus consider the issue of personalized access to large text archives in Languages and Humanities education. In order to make the fruits of our labor in the proposed project useable also in other subject areas, we will focus on certain aspects of this issue, namely how (aspects of the) content and difficulty of texts or parts of texts can be inferred and utilized for creating personalized access to text material.

Research plan and deliverables. We will consider the use of two fairly different kinds of large text archives:

1. In language education and linguistics, large text archives are important mainly (but not only!) because of their (linguistic) form. Here, the so-called text corpus has become an important educational (and research) resource. The uses of text corpora in language education are manifold:

2. On the other hand, in such Humanities subjects as History, Literature Studies, History of Science, Teacher Training, etc., large text archives are important mainly because of their content, i.e. because of the information contained in the texts (and, as a rule, the range of languages dealt with will be much smaller; see below).

Typical issues which arise when such text archives are to be used in education (or research) are:

There are also more open-ended research issues in the list, e.g. the problem of entity references in text (already mentioned), or that of determining the level of difficulty of a text for a language learner having a particular linguistic background (see also submodule “Automatic extraction of metadata and ontological information”, where the related issue of “determining the level of information” is discussed). Generally, we believe that the realistic course of action for text corpora used in language education is to pursue so-called ‘shallow’, or ‘knowledge-light’, techniques, because they can potentially be applied to a large number of languages. Uppsala University currently offers courses at various levels in about 40 languages, which in practice precludes the use of ‘deep’, ‘knowledge-intensive’ techniques. When such techniques are available (as may be the case for English, German and a few other languages), they should of course be considered, but developing them from scratch is too costly. For the case of general Humanities textual resources, however, we should consider developing more knowledge-intensive methods for selected problems, such as the ‘PPT extraction’ already mentioned, where there is an expressed need among educators and researchers.
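To make the notion of ‘PPT extraction’ (marking text portions for persons, places and times) concrete, here is a minimal illustrative sketch in Python. It is not the method proposed in this document. The title list, the toy place gazetteer, the year range and the function name extract_ppt are all assumptions introduced for illustration, and the patterns only handle ASCII names.

```python
import re

# Hypothetical toy gazetteer of place names (an assumption, not from the proposal).
PLACES = {"Uppsala", "Stockholm", "Paris"}

# Person: an assumed title followed by one or more capitalised ASCII words.
PERSON_RE = re.compile(r"\b(?:Mr|Mrs|Dr|Prof)\.\s+[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*")
# Place: any gazetteer entry appearing as a whole word.
PLACE_RE = re.compile(
    r"\b(?:" + "|".join(sorted(re.escape(p) for p in PLACES)) + r")\b"
)
# Time: four-digit years from 1000 to 2099 only (an arbitrary choice).
TIME_RE = re.compile(r"\b(?:1\d{3}|20\d{2})\b")

PATTERNS = [("PERSON", PERSON_RE), ("PLACE", PLACE_RE), ("TIME", TIME_RE)]


def extract_ppt(text):
    """Return (start, end, label, matched_text) tuples sorted by position."""
    spans = []
    for label, pattern in PATTERNS:
        for match in pattern.finditer(text):
            spans.append((match.start(), match.end(), label, match.group()))
    return sorted(spans)


if __name__ == "__main__":
    sample = "Prof. Carl Linnaeus moved to Uppsala in 1741."
    for start, end, label, string in extract_ppt(sample):
        print(label, start, end, string)
```

A pattern-and-gazetteer approach like this is cheap to set up but brittle: it misses persons mentioned without a title and cannot tell a place name from an identical personal name. That is the kind of gap more knowledge-intensive methods would have to close.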

The work with large text archives will proceed along two interconnected lines of research:

1. We will explore the issue of using partial parsing and information extraction techniques for marking text portions for persons, places, and times, and carry out formative evaluation of these techniques in an educational setting. This work will be pursued in collaboration with the work in the submodules “Automatic extraction of metadata and ontological information” and “PLeaSe: Personalized learning sequences”.

Deliverables: Prototype person/place/time partial parser (‘PPT extractor’), and evaluation reports.

2. We will pursue the issue of how to (operationally) define and determine the level of difficulty (or “level of information”; see above) of a text or a portion of a text (for language education purposes it would be useful to be able to determine this even for small linguistic units such as phrases or clauses), and carry out formative evaluation of this definition in an educational setting. This work, too, will be a collaboration with the work in the submodules “Automatic extraction of metadata and ontological information” and “PLeaSe: Personalized learning sequences”.

Deliverables: Preliminary operational definition of level of difficulty (for particular foreign/second language learner), prototype application for determining level of difficulty at least for Swedish and English text material, and evaluation reports.
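As an illustration of what a ‘shallow’ operational measure of text difficulty could look like, the following Python sketch computes the LIX readability index, a formula originally developed for Swedish: average sentence length in words plus the percentage of words longer than six letters. LIX is not named in this proposal; it is swapped in here only as an example. The tokenisation rules and the function name lix are assumptions, and the score ignores the learner's linguistic background, which the operational definition sought above would have to take into account.

```python
import re


def lix(text, long_word_len=6):
    """Compute the LIX score: words per sentence + percentage of long words."""
    # Assumed sentence segmentation: split on runs of . ! ? or :
    sentences = [s for s in re.split(r"[.!?:]+", text) if s.strip()]
    # Words are runs of letters (Unicode-aware, so Swedish å, ä, ö count).
    words = re.findall(r"[^\W\d_]+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one word")
    long_words = sum(1 for w in words if len(w) > long_word_len)
    return len(words) / len(sentences) + 100 * long_words / len(words)


if __name__ == "__main__":
    easy = "The cat sat. The dog ran."
    hard = "The cat sat. The international committee deliberated."
    print(round(lix(easy), 1), round(lix(hard), 1))
```

Because the measure uses only surface counts, it can be applied to short units such as single clauses as well as to whole texts, although scores for very short units will be unstable.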

Dissemination, Testbeds and Evaluation. Dissemination of results will be done through reports and scientific publications on the different aspects outlined in the research plan. In general, we plan to do research/development and evaluation in parallel (i.e., formative evaluation), but for obvious reasons, the first year will be devoted mainly to research and development, while the second year will be dominated by deployment and evaluation in regular education. We will use existing courses in the departments of the Faculty of Languages, in the History Department and in the Department of Teacher Education as resources for our requirements analysis and as testbeds for our implementations.

Collaboration and Scholarly Exchange. Strong interactions with the submodules “PLeaSe: Personalized Learning Sequences”, “Automatic extraction of metadata and ontological information” and “Content Archives”.

Budget Overview (including overhead costs): Uppsala: 25K first year, 25K second year. Budget will pay for one part-time Postdoc, and for faculty involvement in testbed integration in regular Languages/Humanities curricula, overhead costs, travel and exchange. CID: 10K first year, 10K second year. Budget will pay for a part-time Ph.D. student, overhead costs, travel and exchange.


This HTML version created by Donald Broady. Last updated March 2001


