SpeCT - The Speech Corpus Toolkit for Praat

(formerly known as Mietta's Praat scripts)

The aim of the Speech Corpus Toolkit for Praat (SpeCT) is to provide an organized inventory of well-documented Praat scripts that can be easily downloaded, modified and used in order to perform small tasks during the various stages of building, organizing, annotating, analysing, searching and exporting data from a speech corpus.

List of releases that you can cite

Please note:
These scripts may not have been fully tested! You may use them at your own risk.
I cannot provide support for using the scripts, but I will gladly receive bug reports ;-)

Organize files and objects →

Annotate and manage TextGrids →

Import segmentation data
Labeling and TextGrid management
Automatic forced alignment of words and phonemes within previously annotated utterances

Analyze and visualize →

Analysis views
Drawing pictures
Calculating data from labeled TextGrids and sounds

Export data

These Praat scripts were written by Mietta Lennes. They provide additional functions and tools for Praat, a program for phonetic analysis (see the Praat home page at http://www.praat.org/). Praat is being developed by Paul Boersma and David Weenink at the University of Amsterdam.
These scripts are distributed under the GNU General Public License. The scripts are distributed without any warranty: I do not guarantee that the scripts work in your system, and I will not be held responsible for any harm or damage caused by their use. Please make sure that you know what you are doing.
Please refer any interested parties to this web page (https://lennes.github.io/spect/).

How to run and modify the scripts: see Scripting tutorial in the built-in Help pages within the Praat program (see the Help menu in the Objects list).
Requirements: In the Requirements column, you can find information on what type of objects have to be selected in the Object list or what sort of files are needed in order to run each script. This is important especially when you want to create new buttons or menu commands to use the script.
Compatibility note: Some of the scripts were originally written on a Windows machine, some on a Macintosh, and some on Linux. All of them should work on any platform running Praat, but you may want to change, e.g., the default path for files according to your system. The version number tells you the Praat version on which the script has been tested. Praat commands sometimes change between versions, so not every script will necessarily work in every Praat version.

Files and objects

File name	Requirements	Output	Praat version	Description
open_all_files_in_folder.praat	No selections needed	New objects for all the files in a given folder		This script will open all the files in a user-specified directory, given that they can be recognized by Praat. Unknown file formats in the folder will result in an error.
change_sample_rate_of_sound_files.praat	No selections needed	Resampled copies of sound files		This script will open all the sound files in a user-specified directory one by one, resample each file to a given sampling rate, and save resampled files to another directory in AIFF or WAV format. Tip: You can also use this script just to change the file format of all the sounds in a directory.
remove_all_objects.praat	No selections needed	An empty object list		Beware! This script will remove all the objects in the Object list. NO UNDO. All unsaved changes will be lost. To be on the safe side, the user will be prompted before the objects are removed. The script is useful as a menu command within the Object list if you handle large numbers of objects at once and often need to clean them out.

Analysis views

File name	Requirements	Output	Praat version	Description
draw_spectrum_from_selection.praat	A Sound editor window has to be open, and a single point has to be selected. Open the script from the File menu of the editor window.	A Spectrum editor window containing the spectrum around the cursor	4.0.5	This script will draw a spectrum from a window around the cursor. (Note: The original version of this script is not Mietta's idea - it can be found in the built-in Praat manual. Due to some changes in Praat commands, the original script would not work in new Praat versions. This one works.)
draw_LPC_spectrum_from_selection.praat	A Sound editor window has to be open, and a single point has to be selected. Open the script from the File menu of the editor window.	A Spectrum editor window containing an LPC spectrum around the cursor	4.0.5	This script is a modification of the one above: it will draw an LPC spectrum from a window around the cursor: first, a window will be extracted from around the cursor, then an LPC object will be calculated, and the center frame of the LPC analysis will be drawn as a Spectrum object. The user will be prompted for the LPC analysis options.
draw_formant_chart.praat	A folder containing sound files and the corresponding TextGrid files	An F1/F2 formant chart of all transcribed segments in a folder		This script reads sound files and the corresponding TextGrid files from a user-specified folder and draws F1-F2 values to the Picture window from the centre points of all segments that have a given label. All F1/F2 values are also written to a text file. Based on the Formant object (which is LPC based) supplemented with the Track command. (Hint: The script can be easily modified to make some other measurements from the segments!)
play_vowels_draw_formant_chart.praat	A Sound object and the corresponding TextGrid object have to be selected	An F1/F2 formant chart in the Picture window, with vowel points		This script will draw F1-F2 values from the centre points of segments according to the selected TextGrid object. The user will be prompted for the tier and for the interval labels that will be analysed. Based on the Formant object (which is LPC based) supplemented with the Track command. The script will also play each segment and the environment a couple of times - just for the fun of it!

Pause detection

File name	Requirements	Output	Praat version	Description
mark_pauses.praat	One or more LongSound objects have to be selected	A new TextGrid object for each LongSound object. TextGrid will be saved to a user-defined location.	4.0.5	This script will run a series of intensity analyses on a LongSound object and mark boundaries at pauses into a new TextGrid object: one boundary in the center of each pause, or two boundaries at the edges. The user is allowed to define the intensity criteria for a pause: the upper intensity limit, and how long the pause has to be. It works, but don't expect miracles! The optimal pause criteria may vary depending on the sound material. (Some bug fixes 23.1.2006)
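
The pause criteria described above (an upper intensity limit and a minimum pause duration) can be illustrated outside Praat. The following is a minimal Python sketch, not the actual implementation of mark_pauses.praat. It assumes the intensity contour is already available as a list of dB values sampled at a fixed time step; the function names and parameters are hypothetical.

```python
def find_pauses(intensities_db, time_step, max_intensity_db, min_pause_s):
    """Return (start, end) times of runs that stay below the intensity limit
    for at least min_pause_s seconds. Frame i is assumed to cover the time
    span from i * time_step to (i + 1) * time_step."""
    pauses = []
    run_start = None
    # A sentinel value above any limit closes a run that reaches the end.
    for i, value in enumerate(list(intensities_db) + [float("inf")]):
        if value < max_intensity_db:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            start, end = run_start * time_step, i * time_step
            if end - start >= min_pause_s:
                pauses.append((start, end))
            run_start = None
    return pauses


def pause_boundaries(pauses, at_edges=False):
    """One boundary at the centre of each pause, or two at its edges."""
    if at_edges:
        return [t for pause in pauses for t in pause]
    return [(start + end) / 2 for start, end in pauses]
```

With a limit of 45 dB and a minimum duration of 1 s, a short dip in intensity is ignored and only the longer quiet stretch is reported. As noted in the description, suitable values depend on the sound material.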

Cutting up long sound files

File name	Requirements	Output	Praat version	Description
save_intervals_to_aiff_sound_files.praat save_intervals_to_wav_sound_files.praat	One LongSound object and the corresponding TextGrid object have to be selected	Short, numbered AIFF or WAV sound files cut from the LongSound plus a text file with the labels of the original intervals		This script will save all the segments of a LongSound according to a selected tier in the TextGrid to small AIFF or WAV sound files. The sound files will get a running index number, and you can also add a special prefix and/or suffix to all filenames. The labels of the intervals in the original TextGrid object will be saved to a text file in the same directory. Convenient for cutting up long sound files.
save_labeled_intervals_to_aiff_sound_files.praat save_labeled_intervals_to_wav_sound_files.praat	One LongSound object and the corresponding TextGrid object have to be selected	Short AIFF or WAV sound files cut from the LongSound, named after the interval labels		This version of the above script will save all the segments of a LongSound according to a selected tier in the TextGrid to small AIFF or WAV sound files. The sound files will be named after the corresponding TextGrid labels, and you can also add a special prefix and/or suffix to all filenames. Convenient for cutting up long sound files. Note: you have to check yourself that the interval labels do not contain illegal characters (or extra long strings) for the filenames!
save_selection_to_sound_and_textgrid.praat	One LongSound object and the corresponding TextGrid object have to be selected in the Object window	A WAV sound file and a TextGrid file, that are copies of the selected location (without preserving times)		This script will extract the selected part of a LongSound and the corresponding TextGrid into separate files. You need to make sure the correct objects are selected in the Object list and to make a selection in the TextGrid editor. The script is handy as a menu command in the editor window. Use it for extracting sound samples for presentation.

Labeling and TextGrid management

File name	Requirements	Output	Praat version	Description
label_from_text_file.praat	One TextGrid object has to be selected, and a text file containing the interval labels must exist	Adds text lines from a text file as labels in a tier of a TextGrid object		This script will take text lines from a text file and add them as labels to the intervals of a selected tier of a TextGrid object. The user will be prompted for the location of the text file.
label_quickly_from_text_file.praat	One TextGrid object has to be selected in the object list, and a text file containing the interval labels must exist	Overwrites interval labels in the first tier of a TextGrid with text lines taken from a text file		A quick and ruthless version of the previous script: this one will not open a dialog box before overwriting all the labels! So, beware. (Hint: use this as a button in the Object list.)
save_selection_to_sound_and_textgrid.praat	One LongSound object and the corresponding TextGrid object have to be selected in the Object window	A WAV sound file and a TextGrid file, that are copies of the selected location (without preserving times)		This script will extract the selected part of a LongSound and the corresponding TextGrid into separate files. You need to make sure the correct objects are selected in the Object list and to make a selection in the TextGrid editor. The script is handy as a menu command in the editor window. Use it for extracting sound samples for presentation.
total_duration_of_labeled_segments.praat	One TextGrid object has to be selected	The total duration of transcribed intervals will be printed in the Info window	4.0.5	This script will go through all the intervals in a selected tier and calculate the total duration of intervals that have a non-empty label. You can also give an additional criterion for the contents of another tier. The script is handy for measuring your progress when you are labeling a LongSound! Better add a button for it in the dynamic menu.
align_boundaries_in_two_interval_tiers.praat	One TextGrid object needs to be selected	Boundaries in one interval tier (1) that are sufficiently close to boundaries in another interval tier (2) will be aligned with boundaries in tier 2.	4.3.21	This script will align boundaries in two tiers, in case they are only slightly misplaced. The selected TextGrid will be changed and the boundaries that are moved around are displayed to the user while the script is working. Tier 2 stays fixed. The script is handy when you need to check, e.g., that word and syllable boundaries are aligned. If you need to align boundaries in several tiers, you should first align a tier (1) with boundaries in the tier (2) you trust most.
tokenize_tiers_in_TextGrid.praat	A directory with TextGrid files ending in .TextGrid	The TextGrid files will be replaced with files where the tiers have been tokenized into new word tiers.	The script has not been fully tested yet!	A copy is created of each original tier, and all the intervals that contain text are tokenized (divided into words) in the new tiers. The new tiers will be called originaltier-word. The word boundaries will be placed roughly according to the number of (alphabetical) characters in each word with respect to the length of the string contained in the original interval. Tip: Save the script file in the same directory as the TextGrid files and leave the directory box empty when running the script. This will ensure a smooth run :-)
align_transcribed_utterance_intervals_with_sound.praat	Full path to a sound file and to the corresponding TextGrid file with an interval tier that includes the transcribed utterances	Two new tiers (labeled as ../word and ../phoneme) inserted in the TextGrid object	6.1.04 NB: In older Praat versions (<= v6.1.03), the script may fail due to a bug in the corresponding forced alignment command in Praat. The bug was fixed in Praat 6.1.04.	This script automatically aligns the words and phonemes within annotated utterances in a TextGrid. The language of the transcribed utterances can be selected by the user. The script automatically passes through each annotated interval in the given tier and runs the forced alignment command on them. New tiers will be created with the segmentation of words and phonemes suggested by eSpeak. The script utilizes the eSpeak based automatic aligner that is available in the TextGrid editor window in Praat, see, e.g., http://info.linguistlist.org/aardvarc/resources/AARDVARC_Boersma_Abstract.pdf.
replace_part_of_textgrid.praat	Two TextGrid objects have to be selected	A new TextGrid object, where IntervalTier(s) from the longer TextGrid are partly replaced with tiers in the shorter TextGrid	4.0.23	This script will copy all data from one or more tiers of a short TextGrid object to a selected time point in a longer TextGrid. Old data in the long TextGrid are thus partly overwritten - however, the original TextGrids will be saved. The script was written for a situation where people want to extract shorter sections of a LongSound object for labeling, but still they want to be able to later append the labeled sections back into the big TextGrid file. All you have to do is make a big TextGrid for the LongSound and mark boundaries at the starting points of each extracted region. It is also a good idea to label the extracted intervals with the filenames. This way, you can later find the correct places for replacements. Note: There are still errors in this script at the moment, but the partial replacement of one single tier seems to work nicely! :)
save_Finnish_tier_to_phoneme_string.praat	A TextGrid object with an interval tier that has text in Finnish orthography	A text file with the Finnish tier text roughly converted into phonemes		This script converts the Finnish text in a Praat IntervalTier into a phoneme string. The script has been used to prepare input files for some automatic segmentation tools, which generally need a phoneme string as input. In the case of Finnish, the grapheme-to-phoneme conversion rules are very few and simple, so the task is not difficult and you only need a transliteration of the speech signal to get a pretty good approximation of the phonemes. For technical reasons, the phonemes will be converted to corresponding uppercase characters.
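
The proportional boundary placement described for tokenize_tiers_in_TextGrid.praat can be sketched as follows. This is a Python illustration under stated assumptions, not the script's own code: words are split on whitespace, each word's share of the interval is proportional to its count of alphabetic characters, and the function name is hypothetical.

```python
def tokenize_interval(start, end, text):
    """Split one labeled interval into word intervals whose durations are
    proportional to the number of alphabetic characters in each word."""
    words = text.split()
    # Count only alphabetic characters; give every word a weight of at least 1
    # so that tokens such as "42" still get a non-empty interval (assumption).
    weights = [max(1, sum(ch.isalpha() for ch in word)) for word in words]
    total = sum(weights)
    intervals = []
    cumulative = 0
    for word, weight in zip(words, weights):
        word_start = start + (end - start) * cumulative / total
        cumulative += weight
        word_end = start + (end - start) * cumulative / total
        intervals.append((word_start, word_end, word))
    return intervals
```

Boundaries placed this way are only rough guesses and will usually need manual correction or a proper forced alignment.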

Importing segmentation data

File name	Requirements	Output	Praat version	Description
make_textgrid_from_segment_data.praat	No selections required. There should be a folder with sound files and corresponding text files with segmentation data.	New TextGrid objects will be created and saved for all sounds that have segmentation data available	4.0.5	This script will read segmentation and labeling data from simple text files in a user-specified directory and convert the information to Praat TextGrid objects, which will then be saved in the same directory. The segmentation data should consist of text lines in the form starting point of segment - space - segment label - line break, and the segments should be ordered according to time. The corresponding sound files will also be read, in order to get the correct duration for the TextGrid objects. The text file names should be identical with the corresponding sound file names, except for the file extension. Good for importing simple label data to Praat from other environments. A bug in this script was fixed on 30.6.2003. Please email me in case the script does not work correctly!
make_textgrid_from_segment_data_endpoints.praat	No selections required. There should be a folder with sound files and corresponding text files with segmentation data.	New TextGrid objects will be created and saved for all sounds that have segmentation data available	4.0.5	This script is a modification of the one above (but this one requires end point times of segments instead of starting points). It will read segmentation and labeling data from simple text files in a user-specified directory and convert the information to Praat TextGrid objects, which will then be saved in the same directory. The segmentation data should consist of text lines in the form end point of segment - space - segment label - line break, and the segments should be ordered according to time. The corresponding sound files will also be read, in order to get the correct duration for the TextGrid objects. The text file names should be identical with the corresponding sound file names, except for the file extension. Good for importing simple label data to Praat from other environments.
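
To make the two input formats concrete, here is a minimal Python sketch of how such lines map to intervals. It is an illustration only, not the scripts' own code. The function names are hypothetical, and it assumes that the time and the label are separated by a single space and that times are given in seconds.

```python
def intervals_from_start_points(lines, total_duration):
    """Convert 'start_time label' lines into (start, end, label) tuples."""
    starts = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        time_text, _, label = line.partition(" ")
        starts.append((float(time_text), label))
    intervals = []
    for index, (start, label) in enumerate(starts):
        # Each segment ends where the next one begins; the last segment
        # ends at the end of the sound.
        end = starts[index + 1][0] if index + 1 < len(starts) else total_duration
        intervals.append((start, end, label))
    return intervals


def intervals_from_end_points(lines, start_time=0.0):
    """Convert 'end_time label' lines into (start, end, label) tuples."""
    intervals = []
    previous_end = start_time
    for line in lines:
        line = line.strip()
        if not line:
            continue
        time_text, _, label = line.partition(" ")
        end = float(time_text)
        intervals.append((previous_end, end, label))
        previous_end = end
    return intervals
```

The start-point format needs the total duration of the sound to close the last segment, which is why the scripts also read the corresponding sound files.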

Exporting segmentation data

File name	Requirements	Output	Praat version	Description
save_interval_data_to_text_file.praat	One TextGrid object has to be selected	A new text file will be saved with the segment data of one IntervalTier	4.0.5	This script is the inverse of make_textgrid_from_segment_data.praat (see Importing segmentation data): it writes a user-specified tier of a selected TextGrid object to a simple text file. The lines in the text file will have the format starting point of segment - space - segment label - line break. Good for exporting simple label data from Praat to other environments.
save_conversation_tiers_as_text_file.praat	One TextGrid object has to be selected	A new text file will be saved containing the labels from intervals in up to four IntervalTiers in the order of their starting points	5.1	This script exports the labeled utterances in a conversation to a plain text file, one utterance per line. Each speaker must be represented by one interval tier in the selected TextGrid object. Tier names are used as individual codes for the speakers. Utterance transcriptions are written in the order of their starting times, which the user may choose to insert in front of each utterance line. Pause durations and overlap times may also be included in the transcript. Good for exporting a CA style conversation transcript from Praat.
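
The idea of merging speaker tiers into one transcript ordered by starting time can be sketched in Python as follows. This is a simplified illustration, not the script itself: the function name and the output line layout are assumptions, and pause durations and overlap times are left out.

```python
def conversation_lines(tiers, include_times=False):
    """Merge speaker tiers into transcript lines ordered by starting time.

    tiers maps a tier (speaker) name to a list of (start, end, label) tuples.
    Intervals with an empty label are skipped.
    """
    utterances = [
        (start, speaker, label.strip())
        for speaker, intervals in tiers.items()
        for start, end, label in intervals
        if label.strip()
    ]
    # Sorting is stable, so utterances with equal starting times keep
    # the order of their tiers.
    utterances.sort(key=lambda utterance: utterance[0])
    lines = []
    for start, speaker, label in utterances:
        prefix = f"{start:.2f}\t" if include_times else ""
        lines.append(f"{prefix}{speaker}: {label}")
    return lines
```

For example, two tiers named A and B yield lines such as "A: hello" followed by "B: how are you", with the starting time optionally placed in front of each line.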

Calculating data from labeled TextGrids and sounds

File name	Requirements	Output	Praat version	Description
calculate_segment_durations.praat	A TextGrid object has to be selected	A text file with duration data	4.0.5	This script will go through all the intervals in a selected tier of a selected TextGrid object, and save the durations of labeled intervals to a text file. Each new text line will contain the label of the interval, a tab separator, and the duration of the segment in question.
collect_pitch_data_from_files.praat	No selections required	A text file with pitch data	4.0.23-	A script that will go through all the TextGrid files and the Sound files in a given folder, find sound-grid pairs that have the same name, open each pair, run through the TextGrid, collect data from labeled intervals and append the information to a simple tabulated text file (which you can later open in a statistical or spreadsheet program). This version calculates the maximum pitch of each labeled interval. Take a look at the script; you should be able to modify the script to suit your own needs!
collect_formant_data_from_files.praat	No selections required	A text file with formant data	4.0.23-	A script that will go through all the TextGrid files and the Sound files in a given folder, find sound-grid pairs that have the same name, open each pair, run through the TextGrid, collect data from labeled intervals and append the information to a simple tabulated text file (which you can later open in a statistical or spreadsheet program). This version calculates formant values at the mid point of each labeled interval. Take a look at the script; you should be able to modify the script to suit your own needs! NB: Automatic formant measurements do not always produce sensible results, since the optimal analysis parameters and the interpretation of formants will vary for different signals. I do not want to patronize, but... please do not try to use the same parameters for male, female, and child speakers alike! You should always make sure you understand how the analysis works before drawing conclusions on the basis of your results.
collect_data_from_two_tiers_in_files.praat	No selections required	A text file with duration and pitch data	4.2.21	A script that will go through all the TextGrid files and the Sound files in a given folder, find sound-grid pairs that have the same name, open each pair, find two tiers that have the given names (phone tier and syllable tier), run through the TextGrid, collect duration and pitch maximum from labeled phone intervals along with durations of the corresponding syllable tier intervals, and append this information to a tabulated plain text file (which you can later open in a statistical or spreadsheet program). Each data row will also contain both the label of the phone and the label of the preceding phone interval. Take a look at it. You can easily add more tiers and analysis objects that the script should check!
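
The kind of tabulated output these scripts produce (one row per labeled interval) can be sketched in Python. This is a hypothetical illustration, not the scripts' code: it assumes the pitch contour is given as (time, F0) pairs for voiced frames only, and the column layout and the placeholder for missing values are assumptions.

```python
def interval_rows(intervals, pitch_points):
    """Build tab-separated rows: label, duration (s), maximum pitch (Hz).

    intervals: list of (start, end, label) tuples.
    pitch_points: list of (time, f0_hz) pairs for voiced frames.
    Intervals with an empty label are skipped.
    """
    rows = []
    for start, end, label in intervals:
        if not label.strip():
            continue
        values = [f0 for time, f0 in pitch_points if start <= time < end]
        # An interval without voiced frames has no pitch maximum.
        max_f0 = f"{max(values):.1f}" if values else "--undefined--"
        rows.append(f"{label}\t{end - start:.3f}\t{max_f0}")
    return rows
```

Rows in this form can be written to a text file and opened directly in a statistical or spreadsheet program.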

Drawing pictures

File name	Requirements	Output	Praat version	Description
draw_distribution_bar_from_data_file.praat	A text file, with each line containing a number, a space, and any text	A horizontal bar in the Picture window, showing the distribution of the lines in the text file		This script takes advantage of the TextGrid drawing functionalities. It creates a fake TextGrid object from the data you give in a text file, draws it as a horizontal bar with "boundaries" to the Picture window, and adds a title and some other information. The whole bar represents "100%", so you can see how your data is distributed... funny, but rather useful!
draw_formant_point_to_Bark_chart.praat	F1 and F2 values for a vowel point	A Bark-scale formant chart picture with a one-Bark formant circle		This script draws a one-Bark vowel circle from given formant values (Hz) on a Bark-scale F1/F2 chart.
draw_formant_point_to_ERB_chart.praat	F1 and F2 values for a vowel point	An ERB-scale formant chart picture with a one-ERB formant circle		This script draws a one-ERB vowel circle from given formant values (Hz) on an ERB-scale F1/F2 chart.
draw_f0_curves_from_files.praat	A directory with sound and/or Pitch files, each representing one unit, e.g., a sentence, to be plotted	A picture with pitch curves drawn on top of each other, the Pitch files, plus a text file with basic pitch statistics	4.4.30	This script draws pitch curves for sound or Pitch files into the same picture. The pitch scale can be selected (Hz, logHz, semitone, mel, ERB). Moreover, basic statistics (min, max, mean, median, stdev) of each pitch object are saved into a tabulated text file in the selected scale. Curves can be drawn as plain lines or speckled, and either in only black lines or in different colours and line types. You may let Praat "normalize time" in order to compare the general contour shapes, or you may choose to plot the curves in absolute time scale. For advanced users: The curves can be drawn in different colours according to, e.g., speaker ID or other group code that is contained by the file name. (See instructions within the script file.) This script is not very well tested yet. Please report any bugs!
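
For readers who want to see what converting formant values to the Bark or ERB scale involves, here is a minimal Python sketch. It uses the formulas documented for Praat's built-in hertzToBark and hertzToErb functions; whether the chart scripts use exactly these formulas is an assumption. The circle helper is hypothetical and assumes that "one-Bark circle" means a diameter of one Bark.

```python
import math


def hertz_to_bark(hz):
    """Bark-rate, following the formula documented for Praat's hertzToBark."""
    return 7.0 * math.log(hz / 650.0 + math.sqrt(1.0 + (hz / 650.0) ** 2))


def hertz_to_erb(hz):
    """ERB-rate, following the formula documented for Praat's hertzToErb."""
    return 11.17 * math.log((hz + 312.0) / (hz + 14680.0)) + 43.0


def one_bark_circle(f1_hz, f2_hz, n_points=36):
    """Return (F2, F1) points in Bark on a circle around a vowel point.

    Assumption: the circle has a diameter of one Bark (radius 0.5 Bark).
    """
    centre_f1 = hertz_to_bark(f1_hz)
    centre_f2 = hertz_to_bark(f2_hz)
    radius = 0.5
    return [
        (centre_f2 + radius * math.cos(2 * math.pi * k / n_points),
         centre_f1 + radius * math.sin(2 * math.pi * k / n_points))
        for k in range(n_points)
    ]
```

Because the conversion is nonlinear, a circle of constant size on the Bark chart corresponds to a much wider frequency range in Hz for high formant values than for low ones.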
