How to export a plain-text conversation transcript for a sound file annotated with the Praat program



Mietta Lennes
30.12.2009




Please note: The Praat script described here is still being tested and may contain bugs! In case you believe the script is not working correctly or if you notice errors in the instructions below, I would be happy to receive suggestions for improvements by email.
However, I cannot provide further support for using or modifying the script.


Introduction

Once you have annotated an audio recording of conversational speech with the Praat program, you can use this Praat script for exporting a transcript of the conversation from the TextGrid object into a plain text file. It is then easy to modify and use the transcript as a human-readable description of your material in presentations and publications.
The same Praat script can also be used for exporting any other annotations from Praat. However, the instructions below are based on a fictitious example of conversational speech.

Prerequisites: Exactly one TextGrid object must be selected in the Object list of the Praat program. Each speaker's utterances must be represented by the labeled intervals in one interval tier in the selected TextGrid object. Thus, the TextGrid object must contain at least as many IntervalTiers as there are participants in the conversation.
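Before running the script, it can be handy to check which tiers in a TextGrid file are IntervalTiers. The following Python sketch is not part of the Praat script. It assumes the TextGrid was saved in Praat's long text format and has already been read into a string; the names `list_tiers`, `interval_tier_names` and `SAMPLE` are made up for this illustration.

```python
import re

# Assumption: long text format, where every tier starts with
# `class = "..."` immediately followed by `name = "..."`.
TIER_PATTERN = re.compile(r'class\s*=\s*"(\w+)"\s*name\s*=\s*"([^"]*)"')


def list_tiers(textgrid_text):
    """Return (tier_class, tier_name) pairs in file order."""
    return TIER_PATTERN.findall(textgrid_text)


def interval_tier_names(textgrid_text):
    """Return the names of the IntervalTiers only."""
    return [name for cls, name in list_tiers(textgrid_text)
            if cls == "IntervalTier"]


# A tiny hand-written TextGrid with two speaker tiers and one point tier
# (point tiers are stored with the class name "TextTier").
SAMPLE = '''File type = "ooTextFile"
Object class = "TextGrid"

xmin = 0
xmax = 2
tiers? <exists>
size = 3
item []:
    item [1]:
        class = "IntervalTier"
        name = "F1"
        xmin = 0
        xmax = 2
        intervals: size = 1
        intervals [1]:
            xmin = 0
            xmax = 2
            text = "blah"
    item [2]:
        class = "IntervalTier"
        name = "F2"
        xmin = 0
        xmax = 2
        intervals: size = 1
        intervals [1]:
            xmin = 0
            xmax = 2
            text = ""
    item [3]:
        class = "TextTier"
        name = "notes"
        xmin = 0
        xmax = 2
        points: size = 0
'''

print(interval_tier_names(SAMPLE))
```

In this sample, only the tiers F1 and F2 would qualify as speaker tiers; the point tier is skipped.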

Result: a plain text file containing a skeletal CA-style transcript with one utterance per line. Lines are saved in the order of their starting times within the TextGrid object. When the speaker changes, the code for the next speaker appears at the beginning of the line. Utterances that are completely overlapped by a preceding utterance appear in square brackets. You may choose to insert the durations of pauses (in seconds) between lines, where applicable. Here, a pause is any portion of the audio during which nobody is speaking. In addition, you can insert the starting times of overlapping speech relative to the end time of the preceding utterance (in seconds). Overlaps appear in parentheses like pause durations, but as negative numbers. For instance, the overlap time (-0.20 s) means that the next utterance overlaps the previous speaker and started 0.2 seconds before the preceding utterance ended.
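The ordering and the pause/overlap arithmetic described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the code of the Praat script: the `Utterance` type, the function names and the sample times are invented, and the gap is simply taken as the start of an utterance minus the end of the utterance listed just before it.

```python
from typing import NamedTuple


class Utterance(NamedTuple):
    speaker: str   # tier name, e.g. "F1"
    start: float   # start time in seconds
    end: float     # end time in seconds
    text: str


def format_gap(previous, following):
    """Positive value = pause, negative value = overlap."""
    gap = following.start - previous.end
    return f"({gap:.2f} s)"


def transcript_lines(utterances):
    """Order utterances by start time and build tab-separated lines."""
    ordered = sorted(utterances, key=lambda u: u.start)
    lines = []
    previous = None
    for utt in ordered:
        if previous is not None:
            lines.append("\t" + format_gap(previous, utt))
        # Print the speaker code only when the speaker changes.
        changed = previous is None or utt.speaker != previous.speaker
        code = f"{utt.speaker}:" if changed else ""
        lines.append(f"{code}\t{utt.text}")
        previous = utt
    return lines


# Hypothetical utterances, deliberately listed out of order.
DEMO = [
    Utterance("F2", 0.80, 2.00, "bli bla"),
    Utterance("F1", 0.25, 1.00, "blah"),
    Utterance("F2", 2.50, 3.00, "blaah"),
]

for line in transcript_lines(DEMO):
    print(line)
```

Here F2 starts 0.2 seconds before F1 has finished, so the first gap is printed as (-0.20 s); the second gap is a pause of (0.50 s), and the last line has no speaker code because F2 continues.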


Example

Imagine that the following conversation with two speakers (F1 and F2) has been annotated with Praat:

[Figure: an example picture of a conversation annotated with Praat.]

When the annotations above are exported into a plain text file with the Praat script, this is what the result would look like:

Plain transcript:

conversation_example_1	(Tue Dec 29 17:05:15 2009)

F1:	blah
F2:	bli bla bla blaah
F1:	bla blaaba blah bli
F2:	blaaaaa
F1:	blabla
F2:	blee blaa blabla blaah
F1:	bla
F2:	blaba blaa
F1:	blah blaaba blaaba blaaba blaa
F2:	blabla
F1:	bla bla-
	blaah
F2:	bli bla bla bla bli bla bla
F1:	blabl-
	bla bla bla bla bla blaaaaah

Full transcript with the start times of the utterances, pause durations and overlap durations:

conversation_example_1	(Tue Dec 29 16:45:22 2009)

[0.25 s]	F1:	blah
				(-0.04 s)
[1.12 s]	F2:	bli bla bla blaah
				(-0.54 s)
[4.77 s]	F1:	bla blaaba blah bli
				(-0.29 s)
[7.74 s]	F2:	blaaaaa
				(-0.57 s)
[8.44 s]	F1:	blabla
				(0.10 s)
[9.40 s]	F2:	blee blaa blabla blaah
				(0.20 s)
[11.75 s]	F1:	bla
				(-0.10 s)
[12.38 s]	F2:	blaba blaa
				(0.16 s)
[14.04 s]	F1:	blah blaaba blaaba blaaba blaa
				(0.11 s)
[17.37 s]	F2:	blabla
				(-0.46 s)
[17.50 s]	F1:	bla bla-
				(0.08 s)
[18.57 s]		blaah
				(-0.46 s)
[18.73 s]	F2:	bli bla bla bla bli bla bla
				(0.47 s)
[21.05 s]	F1:	blabl-
				(0.31 s)
[21.73 s]		bla bla bla bla bla blaaaaah
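Because the fields are separated by tab characters, a full transcript like the one above is easy to process further. The Python sketch below assumes the layout seen in this example (start time in square brackets, optional speaker code, utterance text, and pause or overlap values in parentheses on their own lines). The function name, the dictionary keys and the short sample are made up for illustration.

```python
import re

# Patterns for "[17.50 s]" and "(-0.46 s)" style fields.
TIME = re.compile(r"^\[(-?\d+(?:\.\d+)?) s\]$")
GAP = re.compile(r"^\((-?\d+(?:\.\d+)?) s\)$")


def parse_full_transcript(text):
    """Turn transcript lines into a list of utterance and gap records."""
    rows = []
    speaker = None
    for line in text.splitlines():
        fields = line.split("\t")
        time_match = TIME.match(fields[0])
        if time_match and len(fields) >= 3:
            # An empty speaker field means the previous speaker continues.
            if fields[1]:
                speaker = fields[1].rstrip(":")
            rows.append({
                "kind": "utterance",
                "start": float(time_match.group(1)),
                "speaker": speaker,
                "text": fields[2],
            })
            continue
        gap_match = GAP.match(fields[-1])
        if gap_match:
            rows.append({"kind": "gap",
                         "seconds": float(gap_match.group(1))})
    return rows


# Three lines copied from the example, plus the header line.
EXCERPT = (
    "conversation_example_1\t(Tue Dec 29 16:45:22 2009)\n"
    "\n"
    "[17.50 s]\tF1:\tbla bla-\n"
    "\t\t\t\t(0.08 s)\n"
    "[18.57 s]\t\tblaah\n"
)

for row in parse_full_transcript(EXCERPT):
    print(row)
```

The header line and empty lines are ignored, since they match neither pattern.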

Instructions for using the script

  1. Download and save the Praat script save_conversation_tiers_as_text_file.praat to a convenient location on your computer.

  2. Open the TextGrid file in Praat.

  3. If required, rename those annotation tiers in the TextGrid that you wish to include in the transcript. Tier names will be used to identify the different speakers. Short codes with 2-3 characters are recommended. In the example, the tiers have been named with the codes F1 and F2.
    The order of the tiers is not important - the utterances will be saved in the order of their starting times.
    NB. The script currently works with IntervalTiers only, i.e., not with PointTiers!

  4. Open the Praat script save_conversation_tiers_as_text_file.praat with the command Praat:Open Praat script... in the Object window (or use Read:Read from file...).

  5. In the script window, select Run:Run. The following dialog box should appear:
    [Image: the start-up form of the Praat script]

  6. Make sure that the TextGrid object is selected in the Object window, then press OK.
    If the TextGrid file is very long or contains several speakers, it may take a while to create the transcript. When the script has finished running, the transcript should appear in the text file you specified.
    You can now use the text file as is, or you may reformat it for presentation with, e.g., MS Word. The fields in the transcript are separated by tab characters, so you can even open or import the file in spreadsheet programs (such as MS Excel).

    NB: The script converts the TextGrid object into the "generic" format used by Praat, where special characters (e.g., phonetic symbols) are expressed as combinations of several characters ("backslash trigraphs"). This format is recommended especially when you need to open and edit the same TextGrid files on several computers and platforms (Windows/Mac/Linux). However, the script does not save the converted TextGrid object, so you need to save it yourself if you want to keep the converted version.

    AND YOU'RE DONE! :-)

