FlightGear: Festival Voice Interface

This page describes how to use FlightGear's voice interface to the Festival speech synthesis system, so that ATC, pilot, and other messages can be made audible. Normally these messages are only displayed at the top of the screen. A raw socket mode also allows sending the messages to arbitrary servers.

Quick instructions (assuming that you have Festival installed)

```
$ festival --server &
$ fgfs --aircraft=j3cub --airport=KSQL --prop:/sim/sound/voices/enabled=true
```

Now, in FlightGear, enable ATC (in the menu under "ATC" -> "Options"), press the ' (apostrophe) key, and send a message to ATC. You should hear "your" voice, then that of ATC, and some time later those of AI planes.

Installing the Festival system

Make sure Festival is installed, or download it from here: http://www.cstr.ed.ac.uk/projects/festival/
Check if Festival works. Only the relevant lines are shown here. Note the parentheses!
```
$ festival
festival> (SayText "FlightGear")
festival> (quit)
```
Check whether MBROLA is installed, or download it from http://tcts.fpms.ac.be/synthesis/mbrola/ -> "Downloads" -> "MBROLA binary and voices" (the link is at the bottom of the page and easy to miss). Choose the binary for your platform. At the time of writing, no source code was available. If you don't like that, you can skip the whole MBROLA setup, but then you can't use the more realistic voices. Further MBROLA voices can also be installed from this page (see below).
Run MBROLA and marvel at the help screen. That's just to check if it's in the path and executable.
```
$ mbrola -h
```

Installing more voices

I'm afraid this is a bit tedious. You can skip it if you are happy with the default voice. First, find the Festival data directory: like FlightGear, Festival keeps all its data in a common file tree, for example /usr/local/share/festival/ on Unix-like systems. We'll call that directory $FESTIVAL from now on.

Check which voices are available. To select one for testing, call its name with voice_ prepended:

```
$ festival
festival> (print (mapcar (lambda (pair) (car pair)) voice-locations))
(kal_diphone rab_diphone don_diphone us1_mbrola us2_mbrola us3_mbrola en1_mbrola)
nil
festival> (voice_us3_mbrola)
festival> (SayText "I've got a nice voice.")
festival> (quit)
```

Festival voices and MBROLA wrappers can be downloaded from http://festvox.org/packed/festival/1.95/. The "don_diphone" voice isn't the best, but it is comparatively small and well suited for AI planes. If you install it, it should end up in the directory $FESTIVAL/voices/english/don_diphone/. It also needs the OALD lexicon: install "festlex_OALD.tar.gz" as $FESTIVAL/dicts/oald/ and run the Makefile in that directory. (You may have to add "--heap 10000000" to the festival command arguments in the Makefile.)
Quite good voices are "us2_mbrola", "us3_mbrola", and "en1_mbrola". For these you need MBROLA (see above) as well as the wrappers festvox_us2.tar.gz, festvox_us3.tar.gz, and festvox_en1.tar.gz, which create the directories $FESTIVAL/voices/english/us2_mbrola/ etc. The voice data itself, however, has to be downloaded separately from another site:
MBROLA voices can be downloaded from the MBROLA download page (see above). You want the voices labeled "us2", "us3", and "en1". Unpack them into the directories that the wrappers have created: $FESTIVAL/voices/english/us2_mbrola/ and likewise for "us3" and "en1".

Running FlightGear with voice support

First start the festival server:
```
$ festival --server
```
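Before starting FlightGear, you can check that the server is reachable with a small client. The Perl sketch below assumes the server runs on the local machine on Festival's default port 1314 and sends a single (SayText "...") expression, the same kind of command typed at the festival> prompt above. The function names are made up for this example. The server speaks through its own audio device, so the client just discards whatever the server sends back until it closes the connection.

```perl
#!/usr/bin/perl
# Minimal Festival client sketch: sends one (SayText "...") expression to a
# server started with "festival --server".
# Assumptions: server on localhost, default port 1314; audio is played by
# the server itself, so replies are read and discarded.
use strict;
use warnings;
use IO::Socket::INET;

# Wrap text in a Scheme SayText call, escaping backslashes and double quotes.
sub festival_saytext {
    my ($text) = @_;
    (my $escaped = $text) =~ s/(["\\])/\\$1/g;
    return qq{(SayText "$escaped")};
}

# Send one expression, signal end of input, and drain the replies.
sub send_to_festival {
    my ($cmd, $host, $port) = @_;
    my $sock = IO::Socket::INET->new(
        PeerAddr => $host,
        PeerPort => $port,
        Proto    => 'tcp',
    ) or die "cannot connect to Festival at $host:$port: $@\n";
    print $sock "$cmd\n";
    shutdown($sock, 1);    # no more data from us; the server sees end of input
    1 while <$sock>;       # discard replies until the server closes
    close $sock;
}

# Only talk to the network when a message is given on the command line.
if (@ARGV) {
    send_to_festival(festival_saytext(join(' ', @ARGV)), 'localhost', 1314);
}
```

Run it as, for example, perl festival-say.pl "Hello FlightGear" (the file name is arbitrary); you should hear the text in Festival's current voice.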
Start FlightGear with the voice subsystem enabled, for example:
```
$ fgfs --aircraft=j3cub --airport=KSQL --prop:/sim/sound/voices/enabled=true
```
Of course, you can put this option into your personal configuration file. That doesn't mean you always have to run FlightGear together with Festival: without a server you just get a few error messages in the terminal window, but that's it. Note that the voice subsystem currently cannot be enabled at runtime!
Open /sim/sound/voices/voice[0]/ in the property browser and write some text to the text property. You should now hear it spoken with the default voice ("voice_kal_diphone"). You can try the same with voice[1]/ etc. and should hear different voices if they are installed, or the default voice again otherwise.
Contact KSFO ATC via the ' (apostrophe) key dialog. You should hear "your" voice first (and see the text in yellow at the top of the screen), then ATC answering with a different voice (and see its text in light green).
You can edit the voice parameters in the preferences.xml file, and select different screen colors and voice assignments in $FG_ROOT/Nasal/screen.nas. The messages aren't written to the respective /sim/sound/voices/voice[*]/text properties directly, but to the aliases /sim/sound/voices/{atc,approach,ground,pilot,ai-plane}. (By the way, I haven't heard anything from ground or approach yet.)

Configuration & Internals

The voice subsystem offers only the common subsystem functions to the rest of FlightGear; there is no built-in function that sends data to the socket. The only way to make it speak is to write to the respective speech properties. The number of available voices, or rather "channels", isn't hard-coded: the number of <voice> groups in /sim/sound/voices decides how many channels are opened. This is a typical set of interface properties; the aliases at the end have nothing to do with the subsystem but are handy shortcuts:

```
<sim>
    <voices>
        <host type="string">localhost</host>
        <port type="string">1314</port>
        <enabled type="bool">false</enabled>

        <voice>
            <desc>Pilot</desc>
            <text type="string"></text>
            <volume type="double">1.0</volume>
            <pitch type="double">100.0</pitch>
            <speed type="double">1.0</speed>
            <preamble type="string">(voice_us3_mbrola)</preamble>
            <festival type="bool">true</festival>
        </voice>

        <voice>
            ...
        </voice>

        <!-- handy aliases, not part of the interface: -->

        <atc alias="/sim/sound/voices/voice[0]/text"/>
        <approach alias="/sim/sound/voices/voice[0]/text"/>
        <ground alias="/sim/sound/voices/voice[0]/text"/>
        <pilot alias="/sim/sound/voices/voice[1]/text"/>
        <copilot alias="/sim/sound/voices/voice[2]/text"/>
        <ai-plane alias="/sim/sound/voices/voice[3]/text"/>
    </voices>
</sim>
```

The <enabled> property decides at init time whether the subsystem is activated; there is currently no way to change this at runtime. Each <voice> group defines one channel. <text> is the output property: every value written to it is spoken by this channel. If <festival> is true, the channel sets up <pitch> and <speed> (<volume> currently does not work and must be 1) and wraps the text in Festival markup. If <festival> is false, all text is written verbatim to the socket. <preamble> is always written to the socket once, as the last step of socket creation. In "festival" mode it is used to select the voice, while in raw mode it can be used to identify the channel (assuming the server knows what to do with it).
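To make the two modes concrete, here is a hypothetical Perl model of the lines a single channel might write to its socket. It only mirrors the rules above: the preamble first, then either Festival setup lines plus each message wrapped in SayText, or the messages verbatim. The Festival expressions used for speed and pitch ((Parameter.set 'Duration_Stretch ...) and int_lr_params) are standard Festival settings chosen for illustration; how FlightGear actually maps <pitch> and <speed> onto Festival, and in which order it sends the setup lines, is an assumption.

```perl
#!/usr/bin/perl
# Hypothetical model of the text one voice channel writes to its socket.
# Keys mirror the <voice> group; the Festival expressions and their order
# are assumptions for illustration, not FlightGear's exact output.
use strict;
use warnings;

sub channel_lines {
    my ($voice, @messages) = @_;      # $voice: hash ref for one <voice> group
    my @out = ($voice->{preamble});   # the preamble always goes out first
    if ($voice->{festival}) {
        # Assumed mapping: faster speech means a smaller Duration_Stretch.
        my $stretch = $voice->{speed} ? 1 / $voice->{speed} : 1;
        push @out, "(Parameter.set 'Duration_Stretch $stretch)";
        # Assumed mapping: <pitch> becomes the mean target F0 in Hz.
        push @out, "(set! int_lr_params '((target_f0_mean $voice->{pitch})"
                 . " (target_f0_std 14) (model_f0_mean 170) (model_f0_std 34)))";
        for my $msg (@messages) {
            (my $esc = $msg) =~ s/(["\\])/\\$1/g;   # escape for a Scheme string
            push @out, qq{(SayText "$esc")};
        }
    } else {
        push @out, @messages;         # raw mode: text is sent verbatim
    }
    return @out;
}

# Demo: the "Pilot" channel from the configuration above, in Festival mode.
my %pilot = (preamble => '(voice_us3_mbrola)', festival => 1,
             pitch => 100, speed => 1.0);
print "$_\n" for channel_lines(\%pilot, 'Ready for departure');
```

Calling channel_lines({ preamble => 'ATC', festival => 0 }, @messages) models the raw-mode channels from the backward-compatibility configuration: only the preamble and the bare messages go out.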

Usage

The design principle is that message generators (e.g. the ATC subsystem) write to a message property (e.g. /sim/messages/pilot). A listener ($FG_ROOT/Nasal/screen.nas) watches this property and decides what to do with it. For pilot and ATC messages it writes the message to screen.log and copies it to the matching alias, e.g. /sim/sound/voices/pilot, which points to the real voice channel /sim/sound/voices/voice[1]/text. This gives the most control and makes every step user-configurable from Nasal scripts. Message generators should not write to a voice's <text> property directly, and should write to the /sim/sound/voices/* aliases only if a message is not meant to be displayed.
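An external program can act as a message generator, too. The following Perl sketch writes one message to /sim/messages/pilot through FlightGear's property server, so the screen.nas listener displays and speaks it like any other pilot message. It assumes FlightGear was started with the property (telnet) server enabled on port 5501, e.g. with --telnet=5501, and that the server accepts "set <path> <value>" lines terminated by CR LF. The port and the function name are illustrative choices.

```perl
#!/usr/bin/perl
# Sketch of an external message generator: writes one message to
# /sim/messages/pilot via FlightGear's property (telnet) server.
# Assumptions: fgfs runs locally with the property server on port 5501
# (e.g. --telnet=5501) and accepts "set <path> <value>" lines ending in CR LF.
use strict;
use warnings;
use IO::Socket::INET;

# Build one "set" command; embedded line breaks would end the command early.
sub set_command {
    my ($path, $value) = @_;
    $value =~ s/[\r\n]+/ /g;
    return "set $path $value\r\n";
}

# Only connect when a message is given on the command line.
if (@ARGV) {
    my $sock = IO::Socket::INET->new(
        PeerAddr => 'localhost',
        PeerPort => 5501,
        Proto    => 'tcp',
    ) or die "cannot connect to the FlightGear property server: $@\n";
    print $sock set_command('/sim/messages/pilot', join(' ', @ARGV));
    close $sock;
}
```

Writing to the message property rather than to an alias keeps the on-screen log and the speech in sync, which is the point of the design described above.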

Backward compatibility

The new voice subsystem is functionally compatible with the old one that was part of the ATC subsystem. You just need to set the <festival> properties to false and set the server address correctly. The messages are then sent without any Festival syntax added:

```
<sim>
    <voices>
        <host type="string">192.168.2.15</host>
        <port type="string">7100</port>
        <enabled type="bool">true</enabled>
        <voice>
            <desc>ATC/Approach/Ground</desc>
            <text type="string"></text>
            <preamble type="string">ATC</preamble>
            <festival type="bool">false</festival>
        </voice>
        <voice>
            <desc>Pilot</desc>
            <text type="string"></text>
            <preamble type="string">Pilot</preamble>
            <festival type="bool">false</festival>
        </voice>
        ...
    </voices>
</sim>
```

<volume>, <pitch>, and <speed> have no meaning in this mode and can be omitted. Note that the preamble is sent first in this mode, too, so it can be used to identify the channel. Of course, all messages could also be sent through a single channel.

Multichannel server

Raw mode, of course, requires a server other than Festival. Here is a small Perl example of a multichannel server. Note how the <preamble> is used to identify the channel:

```
#!/usr/bin/perl -Tw
# License: GPL V2
# Modified after Example from perlipc.pod ($ man perlipc)
#
# Usage: ./server.pl [port]   (default port 1314; run it directly, since
# "perl server.pl" refuses the -T switch on the #! line)
# Every client connection is one voice channel: its first line is the
# channel id (the <preamble>), every following line is printed as a message.

use strict;
use Socket;
use Carp;
use POSIX ":sys_wait_h";

$| = 1;     # flush log and message lines immediately

sub spawn;  # forward declaration
sub logmsg {
	print "$0 $$: @_ at ", scalar localtime, "\n";
}

my $port = shift || 1314;
my $proto = getprotobyname('tcp');

($port) = $port =~ /^(\d+)$/ or die "invalid port";

socket(Server, PF_INET, SOCK_STREAM, $proto) || die "socket: $!";
setsockopt(Server, SOL_SOCKET, SO_REUSEADDR, pack("l", 1)) || die "setsockopt: $!";
bind(Server, sockaddr_in($port, INADDR_ANY)) || die "bind: $!";
listen(Server, SOMAXCONN) || die "listen: $!";

logmsg "server started on port $port";

my $waitedpid = 0;
my $paddr;

# Reap finished channel processes. $waitedpid stays non-zero after a
# successful reap, so the accept loop below survives the interrupted accept().
sub REAPER {
	local $!;   # don't let waitpid() clobber accept()'s error
	my $child;
	while (($child = waitpid(-1, WNOHANG)) > 0) {
		$waitedpid = $child;
		logmsg "reaped $child" . ($? ? " with exit $?" : '');
	}
	$SIG{CHLD} = \&REAPER;  # loathe sysV
}

$SIG{CHLD} = \&REAPER;

for ($waitedpid = 0;
		($paddr = accept(Client, Server)) || $waitedpid;
		$waitedpid = 0, close Client) {
	next if $waitedpid and not $paddr;
	my ($cport, $iaddr) = sockaddr_in($paddr);
	my $name = gethostbyaddr($iaddr, AF_INET) || inet_ntoa($iaddr);

	logmsg "connection from $name [", inet_ntoa($iaddr), "] at port $cport";

	# Handle each channel in its own child process.
	spawn sub {
		my $id;
		while (my $line = <Client>) {
			$line =~ s/^\s+//;
			$line =~ s/\s+$//;
			next unless length $line;   # ignore empty lines

			# first line is voice channel id = "<preamble>"
			if (not defined $id) {
				$id = $line;
				next;
			}

			print "\033[32m$id: \033[m$line\n";
		}
		return 0;   # the client closed the connection
	};
}

sub spawn
{
	my $coderef = shift;

	unless (@_ == 0 && $coderef && ref($coderef) eq 'CODE') {
		confess "usage: spawn CODEREF";
	}

	my $pid;
	if (!defined($pid = fork)) {
		logmsg "cannot fork: $!";
		return;
	} elsif ($pid) {
		logmsg "creating child $pid";
		return; # I'm the parent
	}
	# else I'm the child: handle the channel, then exit
	exit $coderef->();
}
```
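To try the server without FlightGear, a small client can imitate a raw-mode channel. This Perl sketch (the names are hypothetical) sends its preamble as the first line and then each message on its own line, assuming the server listens on localhost:1314, its default port. Festival uses the same default port, so stop Festival before starting the server.

```perl
#!/usr/bin/perl
# Sketch of a raw-mode test client for the multichannel server: it behaves
# like a channel with <festival>false</festival>, sending the preamble first
# and then each message verbatim on its own line.
# Assumption: the server listens on localhost:1314 (its default port).
use strict;
use warnings;
use IO::Socket::INET;

# Preamble first, then one message per line.
sub raw_stream {
    my ($preamble, @messages) = @_;
    return join '', map { "$_\n" } $preamble, @messages;
}

# Usage: client.pl PREAMBLE MESSAGE...   (does nothing without arguments)
if (@ARGV >= 2) {
    my ($preamble, @messages) = @ARGV;
    my $sock = IO::Socket::INET->new(
        PeerAddr => 'localhost',
        PeerPort => 1314,
        Proto    => 'tcp',
    ) or die "cannot connect to the voice server: $@\n";
    print $sock raw_stream($preamble, @messages);
    close $sock;
}
```

Running ./client.pl ATC "Cleared to land" should make the server print "ATC: Cleared to land", with the channel id in green.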