Ottoman Text Recognition Network

From the early 16th century at the latest, the Ottoman Empire had a highly efficient bureaucratic apparatus that generated a vast number of written texts and documents. Today, the Ottoman State Archives alone hold approximately 150 million documents.

Until the late 19th century, the empire spanned three continents, including regions such as Southeastern Europe, the Middle East, North Africa, and large parts of the Arabian Peninsula. As a result, its subjects included members of diverse ethnic groups, language communities, and confessions: Slavs, Turks, Arabs, Jews, Greeks, Roma, Armenians, Aramaeans, Kurds, and many others. They all produced manuscripts of all sorts, in which an abundance of languages and scripts were used.

The archival material and manuscripts relevant to the study of the Ottoman Empire are mostly handwritten texts and present researchers with several challenges. Peculiarities of the Arabic script, such as varying script styles, ligatures, and diacritics, make reading and transcribing these sources a hard task. In addition, there is an abundance of Turkish texts written in scripts other than Arabic, such as Armenian, Hebrew, or Greek.
Automatic text recognition can successfully address these challenges. Not only can transcriptions be generated automatically, but handwritten texts can also be converted into text formats that are machine-readable and searchable.

On 12 February 2021, the Chair of Ottoman and Turkish Studies at the University of Vienna organised a one-day international workshop to discuss the possibilities of automatic text recognition for Ottoman manuscripts (https://dh-ottoman.univie.ac.at). The workshop first presented findings from working with the HTR (handwritten text recognition) tools provided by the software platform Transkribus (transkribus.eu) and then discussed the potential and challenges of digital Ottoman research. One of the outcomes of this workshop was the decision to set up a network.

The Ottoman Text Recognition Network (OTRN) aims to bring together researchers and students of the Ottoman Empire who are interested in applying, testing, and developing text recognition technologies for handwritten and printed Ottoman Turkish texts, whether in Arabic script or in other scripts such as Armenian, Hebrew, or Greek.

Aims

OTRN will provide a platform for

exchange and discussions
using and testing text recognition technologies with a specific focus on Transkribus
testing page segmentation, layout analysis, and the peculiarities of different writing styles
developing transcription standards for text recognition of Ottoman Turkish texts
creating models for Ottoman handwritten and printed texts
sharing experiences in applying and developing standards in Ottoman text recognition
offering workshops for scholars and students interested in using Transkribus
sharing important events, activities, and projects

The network plans to organize an online workshop once a year.

Founding Members

Ahmet Abdullah Saçmalı (LexiQamus, Istanbul)
Ani Sargsyan (University of Hamburg)
Aysu Akcan (University of Vienna)
Achim Rabus (MultiHTR, University of Freiburg)
Claudia Römer (University of Vienna)
Grigor Boykov (ihb, Austrian Academy of Sciences (ÖAW), Vienna)
Hülya Çelik (Ruhr University Bochum)
Julia Brigitte Fröhlich (University of Vienna)
M. Fatih Çalışır (Ibn Haldun University, Istanbul)
Martin Gasteiner (University of Vienna)
Merve Tekgürler (Stanford University)
Milanka Matić-Chalkitis (MultiHTR, University of Freiburg)
Serhat Açar (Özyeğin University, Istanbul)
Stephan Kurz (ACDH_CH, QHOD, ÖAW)
Yasir Yılmaz (ihb, QHOD, ÖAW)
Yavuz Köse (University of Vienna)

Membership

For membership, please state your full name, your academic affiliation, your research interests as well as your motivation for joining the network.

To subscribe to OTRN’s newsletter, please send an e-mail to aysu.akcan@univie.ac.at.