Welcome to the Voice Annotation Tool documentation!
This program lets you annotate a list of audio files and export the annotations in a format similar to Mozilla’s CommonVoice dataset.
Contents
Usage
Installation
The tool can be installed using pip:
$ pip install voice-annotation-tool
To start the program, use the command-line script or run the Python module:
$ voice-annotation-tool
$ python -m voice_annotation_tool
Creating a Project
To get started, click the “Create Project” button. You will be prompted to open the project settings, where you can choose the folder with the audio samples and the location of the TSV file.
The TSV file stores the metadata of the audio samples, including the annotated text and the age, gender and accent of the speaker. For more about the format, see The CommonVoice TSV Format.
The User Interface
Sample List
In this list you can select the sample you want to annotate.
You can delete the selected samples with Edit > Delete Selected or by pressing Delete.
When you change the annotation of a sample, it is highlighted in green. To remove this highlight, select the samples and press Mark Unchanged.
Metadata Section
In this region you can edit the metadata of all selected annotations.
You can also import the profile metadata exported from the CommonVoice account page.
Annotation Field
Here you can edit the annotation of the selected sample.
Audio Playback Controls
To play the audio of the selected sample, press the play button. There are also buttons to move to the next or previous sample. To speed up the workflow, you can assign shortcuts to these buttons; see Keyboard Shortcuts.
Keyboard Shortcuts
Keyboard shortcuts can be configured under Settings > Keyboard Shortcuts.
To change a shortcut, click the button and press the shortcut you want to assign.
Right-click the button to clear the shortcut.
The CommonVoice TSV Format
The TSV format stores tabular data as tab-separated values.
The fields included in the TSV files of the CommonVoice dataset are explained here: https://github.com/common-voice/cv-dataset#fields
The order of the samples in the TSV file is kept, with new samples added to the end.
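As a rough illustration of the format, the sketch below reads tab-separated rows with Python's standard csv module. The sample rows and the selection of the path, sentence and age columns are assumptions made for this example; the full column set is described at the link above.

```python
import csv
import io

# Hypothetical sample data. A real file would be opened with
# open("tsv_file.tsv", newline="") instead of io.StringIO.
SAMPLE_TSV = (
    "path\tsentence\tage\n"
    "sample1.mp3\tAnnotation One\tthirties\n"
    "sample2.mp3\tAnnotation Two\t\n"
)


def read_tsv(stream):
    """Return the rows of a tab-separated stream as dicts, in file order."""
    return list(csv.DictReader(stream, delimiter="\t"))


rows = read_tsv(io.StringIO(SAMPLE_TSV))
for row in rows:
    print(row["path"], "->", row["sentence"])
```

Missing values, such as the age of the second sample here, come back as empty strings.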
Import and Export
The annotations can be exported and imported in multiple formats, including JSON and CSV.
These are meant for easy reading and modification by other applications, and only include the sample file name and the annotated text.
JSON
[
    {
        "file": "sample1.mp3",
        "text": "Annotation One"
    },
    {
        "file": "sample2.mp3",
        "text": "Annotation Two"
    }
]
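Another application might consume such an export roughly as follows. This is a minimal sketch using Python's standard json module; the helper name annotations_by_file and the inline sample text are made up for this example and are not part of the tool.

```python
import json

# Inline sample that mirrors the structure of the export.
# A real export would be read from a file instead.
EXPORTED_JSON = """
[
    {"file": "sample1.mp3", "text": "Annotation One"},
    {"file": "sample2.mp3", "text": "Annotation Two"}
]
"""


def annotations_by_file(json_text):
    """Map each sample file name to its annotated text (hypothetical helper)."""
    entries = json.loads(json_text)
    return {entry["file"]: entry["text"] for entry in entries}


texts = annotations_by_file(EXPORTED_JSON)
print(texts["sample2.mp3"])
```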
CSV
file;text
sample1.mp3;Annotation One
sample2.mp3;Annotation Two
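Because the fields are separated by semicolons rather than commas, a reader has to set the delimiter explicitly. The sketch below does this with Python's standard csv module; the helper name read_export is made up for this example, and the quoting rules of the actual export are assumed to match the csv module's defaults.

```python
import csv
import io

# Inline sample that mirrors the export; a real file would be opened
# with open(..., newline="") instead of io.StringIO.
EXPORTED_CSV = (
    "file;text\n"
    "sample1.mp3;Annotation One\n"
    "sample2.mp3;Annotation Two\n"
)


def read_export(stream):
    """Return (file, text) pairs from a semicolon-separated export (hypothetical helper)."""
    reader = csv.DictReader(stream, delimiter=";")
    return [(row["file"], row["text"]) for row in reader]


pairs = read_export(io.StringIO(EXPORTED_CSV))
for file_name, text in pairs:
    print(file_name, "->", text)
```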
API
The voice-annotation-tool package also includes an API to load, modify and save CommonVoice datasets.
from voice_annotation_tool.project import Project
# Create a new project.
project = Project()
# Load a TSV file.
with open('tsv_file.tsv') as file:
    project.load_tsv(file)
# Show all annotations.
print(project.annotations)
from pathlib import Path
from voice_annotation_tool.project import Project
from voice_annotation_tool.annotation import Annotation
# Create a project from a folder of audio files.
project = Project()
project.load_audio_files(Path('audio'))
annotation = project.annotations[0]
annotation.sentence = 'Annotated text'
annotation.age = 'thirties'
annotation.up_votes += 1
with open('tsv_file.tsv', 'w', newline='') as file:
    project.save_annotations(file)