Speech to Text Transcription Challenge


Eduardo Lleida, Alfonso Ortega, Antonio Miguel (UZ)
Virginia Bazán, Carmen Pérez, Alberto de Prada (RTVE)


The IberSpeech-RTVE Speech to Text Transcription Challenge is a new challenge in the ALBAYZIN evaluation series. It is supported by the Spanish Thematic Network on Speech Technology (RTTH) and Cátedra RTVE en la Universidad de Zaragoza, and is organized by Vivolab – Universidad de Zaragoza.

The Speech to Text Transcription evaluation consists of automatically transcribing different types of TV shows. For this evaluation, RTVE has licensed more than 550 hours of its own TV production together with the corresponding subtitles. The shows cover a wide variety of scenarios, from studio to live broadcast and from read speech to spontaneous speech, with different Spanish accents (including Latin American accents) and a wide variety of content. Some of the content has been labelled thanks to the Spanish Thematic Network on Speech Technology (RTTH) and Cátedra RTVE en la Universidad de Zaragoza. Eighty hours of different TV shows have been manually transcribed: around forty-five hours will be used for development and another thirty-five hours for testing. The TV shows used for development and those used for testing are different. The rest of the shows, more than 450 hours, can be used for training acoustic and language models. The selected TV shows range from broadcast news, whose subtitles are close to verbatim, to live shows subtitled by respeaking. Participants are free to use these TV shows or any other data to train their systems. The content of the training data must be described in the system description report to be presented at IberSpeech 2018.

The data is available to evaluation participants only and is subject to the terms of a license agreement with RTVE. The license agreement can be downloaded from the Cátedra RTVE-UZ web page (http://catedrartve.unizar.es).

System outputs will be scored in terms of word error rate (WER), computed with the NIST sclite tool. Two different scores will be computed: a primary score on the raw output of the systems, with and without punctuation marks (comma and period), and a secondary score computed after lemmatising the text and then removing stopwords (the FreeLing lemmatiser will be used).
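As an illustration of the metric only, WER is the word-level edit distance between reference and hypothesis (substitutions + deletions + insertions) divided by the number of reference words. The sketch below is not the official scoring procedure, which uses NIST sclite. The function names `normalize` and `word_error_rate`, the lowercasing step, the choice to treat each comma and period as a separate token when punctuation is kept, and the example sentences are all assumptions made for this example; the evaluation plan may define these details differently.

```python
import re


def normalize(text, keep_punctuation=False):
    """Lowercase and split into tokens (assumed normalization, not sclite's).

    With keep_punctuation=True, commas and periods become separate tokens;
    otherwise they are dropped. Decimal numbers are not handled specially.
    """
    text = text.lower()
    if keep_punctuation:
        text = re.sub(r"([,.])", r" \1 ", text)
    else:
        text = re.sub(r"[,.]", " ", text)
    return text.split()


def word_error_rate(reference, hypothesis):
    """Return (S + D + I) / N using Levenshtein distance over word tokens."""
    if not reference:
        raise ValueError("reference must contain at least one token")
    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis tokens.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hypothesis, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(reference)


# Made-up example sentences, for illustration only.
ref = "Buenas tardes, bienvenidos al telediario."
hyp = "buenas tardes bienvenido al telediario"

wer_without = word_error_rate(normalize(ref), normalize(hyp))
wer_with = word_error_rate(normalize(ref, keep_punctuation=True),
                           normalize(hyp, keep_punctuation=True))
print(f"WER without punctuation: {wer_without:.3f}")
print(f"WER with punctuation:    {wer_with:.3f}")
```

In this example a system that omits punctuation is penalized only in the punctuation-aware score, where each missing comma or period counts as a deletion. The secondary score would apply the same distance after lemmatisation and stopword removal, which are not sketched here.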

More details will be given in the evaluation plan.


June 18, 2018: Release of the evaluation plan, training and development data
July 15, 2018: Registration deadline
September 24, 2018: Release of evaluation data
October 21, 2018: Deadline for the submission of system outputs and description papers
October 31, 2018: Results distributed to participants
November 21-23, 2018: Evaluation Workshop at IberSpeech 2018


Interested groups must register for the evaluation before July 15, 2018, by contacting the organizing team at lleida@unizar.es, with a copy (CC) to the ALBAYZIN 2018 Evaluations Organising Committee. The message should contain the following information:

Research group (name and acronym)
Institution (university, research center, company, …)
Contact person (name)

To download the RTVE data, you will need to sign the data license agreement and return it to the IberSpeech-RTVE team.