FlightGear: Festival Voice Interface

This page describes how to use FlightGear's voice interface to the Festival speech synthesis system, so that ATC, pilot, and other messages can be made audible. These messages are normally only displayed at the top of the screen. A raw socket mode allows the messages to be sent to arbitrary servers.

Quick instructions (assuming that you have Festival installed)

$ festival --server &
$ fgfs --aircraft=j3cub --airport=KSQL --prop:/sim/sound/voices/enabled=true
Now, in FlightGear, enable ATC (in the menu under "ATC"->"Options"), press the ' key (apostrophe), and send a message to ATC. You should hear "your" voice, then that of ATC, and some time later those of AI planes.
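
If nothing is audible, it can help to check the Festival server on its own before involving FlightGear. The following is a minimal Python sketch, assuming the server started with "festival --server" listens on localhost port 1314 and evaluates Scheme commands such as (SayText "...") sent over the socket. The function names festival_command and say are hypothetical helpers made up for this illustration.

```python
import socket


def festival_command(text: str) -> bytes:
    """Build a Festival Scheme command that speaks `text`."""
    # Escape backslashes and double quotes so the text stays one Scheme string.
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'(SayText "{escaped}")\n'.encode("utf-8")


def say(text: str, host: str = "localhost", port: int = 1314,
        timeout: float = 5.0) -> None:
    """Send one SayText command to a running Festival server."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(festival_command(text))
        # Wait briefly for the server's reply so the connection is not
        # closed before the command has been processed.
        try:
            sock.recv(1024)
        except socket.timeout:
            pass
```

With the server running, calling say("FlightGear voice test") should produce speech in the default voice. If it does, the remaining problems are most likely on the FlightGear side.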

Installing the Festival system

Installing more voices

I'm afraid this is a bit tedious. You can skip it if you are happy with the default voice. First find the Festival data directory. All Festival data goes to a common file tree, like in FlightGear. This can be /usr/local/share/festival/ on Unices. We'll call that directory $FESTIVAL for now.

Running FlightGear with voice support

Configuration & Internals

The voice subsystem only offers the common subsystem functions to the rest of FlightGear. There's no built-in function to let it send data to the socket. The only way is to write to the respective speech properties. The number of available voices, or rather "channels", isn't hard-coded. It's the number of <voice> groups in "/sim/sound/voices" that decides how many channels should be opened. This is a typical setting of interface properties, whereby the aliases at the end have nothing to do with the subsystem, but are handy shortcuts:
<sim>
    <sound>
        <voices>
            <host type="string">localhost</host>
            <port type="string">1314</port>
            <enabled type="bool">false</enabled>

            <voice>
                <desc>Pilot</desc>
                <text type="string"></text>
                <volume type="double">1.0</volume>
                <pitch type="double">100.0</pitch>
                <speed type="double">1.0</speed>
                <preamble type="string">(voice_us3_mbrola)</preamble>
                <festival type="bool">true</festival>
            </voice>

            <voice>
                ...
            </voice>

            <!-- handy aliases, not part of the interface: -->

            <atc alias="/sim/sound/voices/voice[0]/text"/>
            <approach alias="/sim/sound/voices/voice[0]/text"/>
            <ground alias="/sim/sound/voices/voice[0]/text"/>
            <pilot alias="/sim/sound/voices/voice[1]/text"/>
            <copilot alias="/sim/sound/voices/voice[2]/text"/>
            <ai-plane alias="/sim/sound/voices/voice[3]/text"/>
        </voices>
    </sound>
</sim>
The <enabled> property decides at init time whether the subsystem is activated. There is currently no way to change this at runtime. Each <voice> group defines one channel. <text> is the output property: every value written to it is spoken by this channel. If <festival> is true, the channel sets up <pitch> and <speed> (<volume> does not currently work and has to be 1) and puts Festival markup around the text. If <festival> is false, all text is written verbatim to the socket. <preamble> is always written to the socket once, as the last step of socket creation. In "festival" mode it is used to set the voice, while in raw mode it can be used to identify the channel (assuming that the server knows what to do with it).
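
To make the channel rule concrete, here is a small Python sketch that reads a configuration fragment and lists one channel per <voice> group, together with the text property it would correspond to. It only illustrates the counting and indexing described above and is not how FlightGear itself reads its properties. The sample fragment, the function name list_channels, and the choice to treat a missing <festival> element as false are assumptions of this sketch.

```python
import xml.etree.ElementTree as ET

# Sample fragment made up for this sketch; it follows the layout shown above.
CONFIG = """
<sim>
  <sound>
    <voices>
      <host type="string">localhost</host>
      <port type="string">1314</port>
      <enabled type="bool">true</enabled>
      <voice>
        <desc>ATC</desc>
        <preamble type="string">(voice_us3_mbrola)</preamble>
        <festival type="bool">true</festival>
      </voice>
      <voice>
        <desc>Pilot</desc>
        <preamble type="string">Pilot</preamble>
        <festival type="bool">false</festival>
      </voice>
    </voices>
  </sound>
</sim>
"""


def list_channels(xml_text):
    """Return one dict per <voice> group, in document order."""
    root = ET.fromstring(xml_text)          # root element is <sim>
    voices = root.find("sound/voices")
    channels = []
    for index, voice in enumerate(voices.findall("voice")):
        # Assumption of this sketch: a missing <festival> counts as false.
        festival = voice.findtext("festival", default="false").strip()
        channels.append({
            "text_property": f"/sim/sound/voices/voice[{index}]/text",
            "desc": voice.findtext("desc", default=""),
            "preamble": voice.findtext("preamble", default=""),
            "festival": festival == "true",
        })
    return channels
```

Calling list_channels(CONFIG) yields two entries, one for voice[0] in Festival mode and one for voice[1] in raw mode. Adding a third <voice> group to the fragment would add a third channel without any other change.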

Usage

The design principle is that message generators (e.g. the ATC subsystem) write to a message property (e.g. /sim/messages/pilot). A listener ($FG_ROOT/Nasal/screen.nas) watches this property and decides what to do with it. For pilot and ATC messages it writes the message to the screen.log and copies it to the /sim/sound/voices/pilot property. This is an alias for the real voice channel /sim/sound/voices/voice[1]/text. This gives the most control and keeps all steps user-configurable from Nasal scripts. Message generators should not write to a voice's <text> property directly, and should write to the /sim/sound/voices/* aliases only if a message should be spoken without being displayed by the system.
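
The message flow described above can be modelled in a few lines of Python. This is a toy model only: ToyPropertyTree is a hypothetical stand-in for the property tree and shares nothing with FlightGear's or Nasal's actual API. The message texts are made up. It shows a generator writing to the message property, a listener logging the message and forwarding it to the alias, and the alias resolving to the voice channel's text property.

```python
class ToyPropertyTree:
    """Tiny stand-in for a property tree with aliases and listeners."""

    def __init__(self):
        self.values = {}
        self.aliases = {}
        self.listeners = {}

    def alias(self, path, target):
        self.aliases[path] = target

    def listen(self, path, callback):
        self.listeners.setdefault(path, []).append(callback)

    def set(self, path, value):
        path = self.aliases.get(path, path)   # follow an alias if present
        self.values[path] = value
        for callback in self.listeners.get(path, []):
            callback(value)


tree = ToyPropertyTree()
screen_log = []   # what would be shown on screen
spoken = []       # what the voice channel would speak

# The alias points at the real channel property.
tree.alias("/sim/sound/voices/pilot", "/sim/sound/voices/voice[1]/text")
# The voice channel "speaks" whatever is written to its text property.
tree.listen("/sim/sound/voices/voice[1]/text", spoken.append)


def on_pilot_message(message):
    """Plays the role of the screen.nas listener in this model."""
    screen_log.append(message)
    tree.set("/sim/sound/voices/pilot", message)


tree.listen("/sim/messages/pilot", on_pilot_message)

# A message generator writes to the message property only.
tree.set("/sim/messages/pilot", "Request taxi clearance")
```

After the last line, the message is in both screen_log and spoken. Writing to /sim/sound/voices/pilot directly would add to spoken only, which corresponds to the case of a message that should be heard but not displayed.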

Backward compatibility

The new voice subsystem is functionally compatible with the old one that was part of the ATC subsystem. You just need to turn the <festival> bool properties off and set the server address correctly. This sends only the messages without any Festival syntax added:
<sim>
    <sound>
        <voices>
            <host type="string">192.168.2.15</host>
            <port type="string">7100</port>
            <enabled type="bool">true</enabled>
            <voice>
                <desc>ATC/Approach/Ground</desc>
                <text type="string"></text>
                <preamble type="string">ATC</preamble>
                <festival type="bool">false</festival>
            </voice>
            <voice>
                <desc>Pilot</desc>
                <text type="string"></text>
                <preamble type="string">Pilot</preamble>
                <festival type="bool">false</festival>
            </voice>
            ...
        </voices>
    </sound>
</sim>
<volume>, <pitch>, and <speed> have no meaning here and can be omitted. Note that the preamble is sent first in this mode, too; it can be used to identify the channel. Of course, all messages could also be sent to just one channel.

Multichannel server

Raw mode does, of course, require a different server than Festival. Here's a small Perl example for a multichannel server. Note how the <preamble> is used as channel identification:
#!/usr/bin/perl -Tw
# License: GPL V2
# Modified after Example from perlipc.pod ($ man perlipc)

use strict;
BEGIN {
	$ENV{PATH} = '/usr/ucb:/bin';
}

use Socket;
use Carp;
my $EOL = "\015\012";

sub spawn;  # forward declaration
sub logmsg {
	print "$0 $$: @_ at ", scalar localtime, "\n";
}


my $port = shift || 1314;
my $proto = getprotobyname('tcp');


($port) = $port =~ /^(\d+)$/ or die "invalid port";


socket(Server, PF_INET, SOCK_STREAM, $proto) || die "socket: $!";
setsockopt(Server, SOL_SOCKET, SO_REUSEADDR, pack("l", 1)) || die "setsockopt: $!";
bind(Server, sockaddr_in($port, INADDR_ANY)) || die "bind: $!";
listen(Server,SOMAXCONN) || die "listen: $!";


logmsg "server started on port $port";


my $waitedpid = 0;
my $paddr;

use POSIX ":sys_wait_h";
sub REAPER {
	my $child;
	while (($waitedpid = waitpid(-1,WNOHANG)) > 0) {
		logmsg "reaped $waitedpid" . ($? ? " with exit $?" : '');
	}
	$SIG{CHLD} = \&REAPER;  # loathe sysV
}


$SIG{CHLD} = \&REAPER;

for ($waitedpid = 0;
		($paddr = accept(Client,Server)) || $waitedpid;
		$waitedpid = 0, close Client) {
	next if $waitedpid and not $paddr;
	my($port,$iaddr) = sockaddr_in ($paddr);
	my $name = gethostbyaddr($iaddr,AF_INET);

	logmsg "connection from $name [", inet_ntoa($iaddr), "] at port $port";

	spawn sub {
		$|=1;
		print "Hello there, $name, it's now ", scalar localtime, $EOL;
		exec '/usr/bin/fortune'           # XXX: `wrong' line terminators
			or confess "can't exec fortune: $!";
	};
}


sub spawn
{
	my $coderef = shift;

	unless (@_ == 0 && $coderef && ref($coderef) eq 'CODE') {
		confess "usage: spawn CODEREF";
	}

	my $pid;
	if (!defined($pid = fork)) {
		logmsg "cannot fork: $!";
		return;
	} elsif ($pid) {
		logmsg "creating child $pid";
		return; # I'm the parent
	}
	# else I'm the child -- go spawn

	# read lines from the client and print them, tagged with the channel id
	my $id;
	while (<Client>) {
		s/^\s+//;
		s/\s+$//;

		# first line is voice channel id = "<preamble>"
		if (not defined $id) {
			$id = $_;
			next;
		}

		print "\033[32m$id: \033[m$_\n";
		last unless /\S/;
	}

	open(STDIN,  "<&Client") || die "can't dup client to stdin";
	open(STDOUT, ">&Client") || die "can't dup client to stdout";
	## open(STDERR, ">&STDOUT") || die "can't dup stdout to stderr";
	exit &$coderef();
}
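
To try the server without starting FlightGear, a small client can imitate one raw-mode channel. The Python sketch below sends the preamble as the first line and then one line per message, which is what the read loop in the Perl server expects. The newline terminator, the function names raw_channel_lines and send_raw, and the sample message are assumptions of this sketch; the exact bytes FlightGear writes may differ.

```python
import socket


def raw_channel_lines(preamble, messages):
    """Bytes for one raw channel: the preamble first, then one line per message."""
    lines = [preamble, *messages]
    # Assumption: each item is terminated by a single newline.
    return b"".join((line + "\n").encode("utf-8") for line in lines)


def send_raw(preamble, messages, host="localhost", port=1314, timeout=5.0):
    """Open one connection (one channel) and send its preamble and messages."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(raw_channel_lines(preamble, messages))
```

With the Perl server running on its default port, send_raw("ATC", ["Cleared for takeoff"]) should make it print the message prefixed with the channel id "ATC". Note that the server's read loop ends at the first empty line, so messages in a test should not be empty.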