Get Involved

Participating in and Contributing to the Alpheios Project

Interested collaborators can get involved with the Alpheios Project in the following ways. The opportunities range from annotation tasks, through testing and reviewing our existing software, to porting our applications to new platforms and developing entirely new functionality.


Annotation and Curation

  • Treebanking, e.g. of Attic prose, Plato, Plutarch, Thucydides et al., using our treebank editor. See http://nlp.perseus.tufts.edu/syntax/treebank/getinvolved.html for details.
  • Improving the treebanking process
  • Manually adjusting the alignment of an original text with a translation using our alignment editor (working with either an automatically aligned translation or a wholly unaligned translation)
  • Providing metadata, such as composition year, for Latin and Greek texts that are to become part of a corpus for computational analysis and pedagogical use
  • Helping us mount and check the treebanked and aligned parallel text of the New Testament in Greek and Latin (which the Proiel Project gave us)
  • Applying a semantic layer on top of our grammatical identification tools: applying something like Princeton WordNet to Greek and Latin vocabulary so that we can improve the automatic identification of grammatical constructions (by using the meaning of the component words), recognize related passages by meaning rather than just grammar, etc.
  • Recording poetry for word-by-word and line-by-line display, with or without associated images, glosses, etc., to expand our prototype

Software Development

The various components of the Alpheios Project are coded in a number of different programming languages, including JavaScript, XSLT, XQuery, and C, and there are opportunities for students to contribute to any of these components. Tasks range from those that could easily be accomplished by relatively inexperienced computer science students with little or no linguistic background to those requiring advanced skills in natural language processing and computational linguistics. For example:

  • Help test, criticize and develop our existing reading tools:
    • upgrade C programs for compatibility with Windows 7
    • fix minor bugs in the JavaScript-based user interface
    • port the browser extensions to Google Chrome
    • make some version of the tools available on mobile devices and ebook readers
    • add the ability to find the next (or all) occurrences of a word or phrase in the current text
    • add the ability to do lookups in two foreign languages from a single text, e.g. Latin and Greek in Erasmus
    • add the ability to display three aligned translations at once, e.g. the New Testament in Latin, Greek and English
    • incorporate tools that our friends at Columbia have developed for using contextual clues to disambiguate Arabic morphology, and explore ways to optimize performance (see http://www.ccls.columbia.edu/project/madatokan/)
    • help us evolve a superior interface for handling Arabic dictionaries: relaxing matching requirements gracefully, and providing a root browsing capability within and perhaps across multiple dictionaries

  • Help test, criticize and develop our existing pedagogical tools:
    • prepare corpora to be used in a variety of pedagogical tasks and for automatically generating and scoring learning quizzes such as alignment exercises
    • treebank exercises
    • morphology and vocabulary exercises
    • exercises demonstrating the ability to repeat a phrase or sentence after brief exposure (or to choose the correct one among several automatically generated variants, e.g. requiring the user to recognize the correct inflected ending)
    • suggest additional pedagogical exercises that the application and its associated corpora can support
    • further develop the user model, with both local and server presence and with export and import functionality, for complete integration with several of the chief learning management systems
    • further develop the ability of users to indicate their target proficiency and of the system to automatically generate the appropriate supporting lexical and grammatical materials
    • further develop the user model for monitoring developing proficiency, varying ICALL (intelligent computer-assisted language learning) variables, and recording and analyzing the consequences for the individual, for possible groups, and for the average user
    • develop the ability of our system to collect and analyze useful data bearing on ICALL design, e.g. review interval, review methods, etc.
  • Further develop various text analysis tools, including named entity recognition and disambiguation algorithms in various subject domains