How to extract many small files out of a long sound file

with the Praat program


Mietta Lennes
23.9.2002


These instructions should help you to extract loads of small sound files from a long sound file - relatively painlessly.

Result: you will get either sound files named with running index numbers or files with special labels. (The latter option is easy to handle if you have a list of the words or sentences that the speaker has read aloud.)
Requirements: you should open the long sound file in Praat as a LongSound object.
  1. In a TextGrid object, mark the boundaries of the intervals that you want to extract. One IntervalTier is enough.
    (Select the LongSound object in the object list. Create a TextGrid object for it by pressing the To TextGrid... button on the right.)


  2. a) If you do not know precisely what each interval contains (e.g., you have no list of the words read by the speaker), but you wish to keep this information in the file names (see 4 b below), simply listen to each interval and type the desired file name into it in the TextGrid.

    b) If you happen to have a text file that contains, e.g., the sentences read by the speaker (one sentence per line), you can use the script label_from_text_file.praat, which reads the text file line by line and adds the text to the intervals marked in the TextGrid. Note: if every other interval is a pause, you must either add empty lines to the text file or mark all pauses with 'xxx' in the TextGrid (the mark_pauses.praat script can do this for you).
    If you would like to make things semi-automatic: in case the intervals you marked do not fully correspond to the text file (e.g., the speaker has repeated or mispronounced some sentences), check in the TextGrid exactly where the labeling starts to go wrong. Then edit the text file or the boundaries in the TextGrid accordingly, and run the labeling script again. If this happens often, you can use the faster version of the script, label_quickly_from_text_file.praat (you have to edit the text file path directly in the script file). You could even add a button for the script to the Object list. This way, you only have to press a single button to update the labels in the TextGrid.

    c) If you do not care about the file names or about saving special labels, just type anything (even a single character) into each interval that you want saved. Leave empty the intervals that you do not want to save. (You can also skip the typing if you really want to save all intervals, including pauses etc.)

  3. Save the TextGrid, just in case. (You may also find it useful for other purposes.)

  4. Select the original LongSound object and the TextGrid object together.
    a) If you want the small files to be named with a running index number (and the file extension .aiff or .wav), run the Praat script save_intervals_to_wav_sound_files.praat (makes WAV files) or save_intervals_to_aiff_sound_files.praat (makes AIFF files). You can even give affixes to the file names (e.g., a certain prefix for a certain speaker's files).
    b) If you want the small files to be named after the corresponding interval labels in the TextGrid, use the Praat script save_labeled_intervals_to_wav_sound_files.praat (makes WAV files) or save_labeled_intervals_to_aiff_sound_files.praat (makes AIFF files). Here, too, you can give affixes to the file names. If several intervals have an identical label string, their files will receive an additional running index number to prevent overwriting. However, you have to make sure that the interval labels do not contain illegal characters or excessively long strings.
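If you want to preview the file names before running the scripts, the naming rules of step 4 can be sketched roughly as follows. This is a Python illustration only, not the code of the Praat scripts. The function name plan_file_names, the set of characters treated as illegal, the length limit of 60 characters and the "_2", "_3" style of the duplicate index are all assumptions made for the example; the real scripts may number and check things differently. The sketch also assumes that empty intervals are skipped, as in step 2 c.

```python
import re

# Assumption: these characters and this length limit are illustrative choices,
# not rules taken from the Praat scripts.
UNSAFE_CHARACTERS = re.compile(r'[\\/:*?"<>|]')
MAX_LABEL_LENGTH = 60


def plan_file_names(labels, prefix="", suffix="", extension=".wav", use_labels=True):
    """Return one planned file name per non-empty interval label, in order.

    use_labels=False: names are running index numbers (step 4 a).
    use_labels=True: names follow the labels, and a repeated label gets an
    extra running index so that no file overwrites another (step 4 b).
    """
    names = []
    used_stems = set()
    index = 0
    for label in labels:
        label = label.strip()
        if not label:
            continue  # empty interval: assumed not to be saved
        index += 1
        if not use_labels:
            stem = str(index)
        else:
            if UNSAFE_CHARACTERS.search(label) or len(label) > MAX_LABEL_LENGTH:
                raise ValueError(f"label not usable as a file name: {label!r}")
            # Add a running index until the stem is unique.
            stem = label
            repeat = 1
            while stem in used_stems:
                repeat += 1
                stem = f"{label}_{repeat}"
        used_stems.add(stem)
        names.append(f"{prefix}{stem}{suffix}{extension}")
    return names
```

For example, plan_file_names(["cat", "", "dog", "cat"], prefix="sp1_") gives sp1_cat.wav, sp1_dog.wav and sp1_cat_2.wav: the empty interval is skipped and the second "cat" gets an index. A label such as "a/b" raises an error, which is a quick way to find labels that need fixing in the TextGrid before extraction.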

AND YOU'RE DONE! :-)
