first version of a voice README. Because of the links I decided to write
it as HTML, although I prefer raw text otherwise.
parent 0bed47d554
commit 7ae98578f3
1 changed files with 196 additions and 0 deletions
196
Docs/README.voice.html
Normal file
@ -0,0 +1,196 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Frameset//EN" "http://www.w3.org/TR/html4/frameset.dtd">
<html>
<head>
<title>FlightGear: Festival Voice Interface</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
</head>

<body>


<h1>FlightGear: Festival Voice Interface</h1>

This page describes how to use FlightGear's voice interface to the Festival speech synthesis system, so that
ATC, pilot, and other messages can be made audible. These messages are normally only displayed at the top of the screen.
A raw socket mode allows the messages to be sent to arbitrary servers.


<h2>Quick instructions (assuming that you have Festival installed)</h2>

<blockquote><pre>
$ festival --server &
$ fgfs --aircraft=j3cub --airport=KSQL --prop:/sim/sound/voices/enabled=true</pre></blockquote>

Now, in FlightGear, enable ATC (in the menu under "ATC"->"Options"), press the '-key (apostrophe key), and
send a message to ATC. You should hear "your" voice, then that of ATC, and some time later those of the AI planes.



<h2>Installing the Festival system</h2>

<ul>
<li>
Make sure Festival is installed, or download it from here:
<a href="http://www.cstr.ed.ac.uk/projects/festival/">http://www.cstr.ed.ac.uk/projects/festival/</a>
</li><li>
Check that Festival works. Only the relevant lines are shown here. Note the parentheses!
<blockquote><pre>
$ festival
festival> (SayText "FlightGear")
festival> (quit)</pre></blockquote>
</li><li>
Check whether MBROLA is installed, or download it from here:
<a href="http://tcts.fpms.ac.be/synthesis/mbrola/">http://tcts.fpms.ac.be/synthesis/mbrola/</a> -> "Downloads"
-> "MBROLA binary and voices" (link at the bottom; hard to find). Choose the binary for your platform.
Unfortunately, no source code is available. If you don't like that, you can skip the whole MBROLA
setup, but then you can't use the more realistic voices. You can also install further MBROLA voices from
this page (see below).
</li><li>
Run MBROLA and marvel at the help screen. This just checks that it's in the path and executable.
<blockquote><pre>
$ mbrola -h</pre></blockquote>
</li>
</ul>
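The two path checks above can also be scripted. What follows is only a sketch in Python: the helper name <tt>find_speech_tools</tt> is made up for this example, and it merely looks the executables up on the PATH without running them, so it says nothing about whether sound output actually works.

```python
import shutil


def find_speech_tools(names=("festival", "mbrola")):
    """Map each executable name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_speech_tools().items():
        print(f"{name}: {path or 'not found on PATH'}")
```

If <tt>mbrola</tt> is reported as missing, Festival itself can still be used with the default voice.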


<h2>Installing more voices</h2>

I'm afraid this is a bit tedious. You can skip it if you are happy with the default voice. First find the
Festival data directory. All Festival data goes into a common file tree, as in FlightGear. On Unix systems
this is typically <tt>/usr/local/share/festival/</tt>. We'll call that directory <tt>$FESTIVAL</tt> from now on.

<ul>
<li>
Check which voices are available. You can test one by prepending <tt>voice_</tt> to its name and calling the result:
<blockquote><pre>
$ festival
festival> (print (mapcar (lambda (pair) (car pair)) voice-locations))
(kal_diphone rab_diphone don_diphone us1_mbrola us2_mbrola us3_mbrola en1_mbrola)
nil
festival> (voice_us3_mbrola)
festival> (SayText "I've got a nice voice.")
festival> (quit)</pre></blockquote>
</li><li>
Festival voices and MBROLA wrappers can be downloaded here:
<a href="http://festvox.org/packed/festival/1.95/">http://festvox.org/packed/festival/1.95/</a>
The "don_diphone" voice isn't the best, but it's comparatively small and well suited for "ai-planes".
If you install it, it should end up as the directory <tt>$FESTIVAL/voices/english/don_diphone/</tt>. You also need
to install "festlex_OALD.tar.gz" for it as <tt>$FESTIVAL/dicts/oald/</tt> and run the Makefile in that
directory. (You may have to add "<tt>--heap 10000000</tt>" to the festival command arguments in the Makefile.)
</li><li>
Quite good voices are "us2_mbrola", "us3_mbrola", and "en1_mbrola". For these you need to install
MBROLA (see above) as well as these wrappers: <tt>festvox_us2.tar.gz</tt>, <tt>festvox_us3.tar.gz</tt>,
and <tt>festvox_en1.tar.gz</tt>. They create directories <tt>$FESTIVAL/voices/english/us2_mbrola/</tt> etc.
The voice <em>data</em>, however, has to be downloaded separately from another site (see the next item).
</li><li>
MBROLA voices can be downloaded from the MBROLA download page (see above). You want the
voices labeled "us2", "us3", and "en1". Unpack them in the directories that the wrappers have created:
<tt>$FESTIVAL/voices/english/us2_mbrola/</tt>, and likewise for "us3" and "en1".
</li>
</ul>
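After unpacking, it may help to check that everything ended up where the steps above say it should. The following Python sketch does that. Note that <tt>$FESTIVAL</tt> is only this page's name for the data directory, so reading it from an environment variable is an assumption of the example, as is the helper name <tt>missing_voice_dirs</tt>.

```python
import os

# Directory names taken from the steps above, relative to the Festival data directory.
EXPECTED_DIRS = (
    "voices/english/don_diphone",
    "dicts/oald",
    "voices/english/us2_mbrola",
    "voices/english/us3_mbrola",
    "voices/english/en1_mbrola",
)


def missing_voice_dirs(festival_root, expected=EXPECTED_DIRS):
    """Return the expected sub-directories that do not exist under festival_root."""
    return [d for d in expected
            if not os.path.isdir(os.path.join(festival_root, *d.split("/")))]


if __name__ == "__main__":
    # Assumption: FESTIVAL is set by hand; otherwise fall back to the usual Unix location.
    root = os.environ.get("FESTIVAL", "/usr/local/share/festival")
    for d in missing_voice_dirs(root):
        print("missing:", os.path.join(root, d))
```

An empty result only means that the directories exist. Whether a voice really loads is still best tested from the <tt>festival</tt> prompt.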


<h2>Running FlightGear with voice support</h2>

<ul>
<li>First start the Festival server:
<blockquote><pre>
$ festival --server</pre></blockquote>
</li><li>
Start FlightGear with the voice subsystem enabled, for example with
<blockquote><pre>
$ fgfs --aircraft=j3cub --airport=KSQL --prop:/sim/sound/voices/enabled=true</pre></blockquote>
Of course, you can put this option into your personal configuration file. This doesn't mean that
you then <em>always</em> have to use FlightGear together with Festival. Without a Festival server you'll just get a few
error messages in the terminal window, but that's it. Note that you currently <em>cannot</em>
enable the voice subsystem at runtime!
</li><li>
Open the property browser at <tt>/sim/sound/voices/voice[0]/</tt> and write some text to the
<tt>text</tt> property. You should now hear this spoken with the default voice ("voice_kal_diphone").
You can try the same with <tt>voice[1]/</tt> etc. and should hear different voices if they
are installed, or the default voice again otherwise.
</li><li>
Contact the KSFO ATC via the '-key dialog (apostrophe key). You should hear "your" voice first (and see the
text in yellow at the top of the screen), then you should hear ATC answer in a different voice (and see
it in light green).
</li><li>
You can edit the voice parameters in the <tt>preferences.xml</tt> file, and select different
screen colors and voice assignments in <tt>$FG_ROOT/Nasal/voice.nas</tt>. The messages aren't written
to the respective <tt>/sim/sound/voices/voice[*]/text</tt> properties directly, but rather to the aliases
<tt>/sim/sound/voices/{atc,approach,ground,pilot,ai-plane}</tt>. (BTW: I've never heard anything from
<tt>ground</tt> and <tt>approach</tt> yet.)
</li>
</ul>
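If nothing is audible, it can help to test the Festival server on its own, without FlightGear. The Python sketch below sends a single <tt>SayText</tt> expression to port 1314, the port used on this page. The function names are made up for the example, and closing the connection right after sending is a simplification: a proper client would also read the server's reply.

```python
import socket


def saytext_command(text):
    """Build a Festival Scheme expression that speaks `text`."""
    # Escape backslashes and double quotes so the string literal stays intact.
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return '(SayText "%s")\n' % escaped


def speak(text, host="localhost", port=1314, timeout=5.0):
    """Send one SayText expression to a running `festival --server`."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(saytext_command(text).encode("iso-8859-1", "replace"))


if __name__ == "__main__":
    try:
        speak("FlightGear")
    except OSError as err:
        print("could not reach the Festival server:", err)
```

If this works but FlightGear stays silent, the problem is more likely on the FlightGear side (for instance, the voice subsystem was not enabled at startup).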



<h2>Configuration & Internals</h2>

The <em>voice</em> subsystem only offers the common subsystem functions to the rest of FlightGear.
There's no built-in function to make it send data to the socket. The only way is to write to the
respective speech properties. The number of available voices, or rather "channels", isn't hard-coded.
It's the number of &lt;voice&gt; groups in "/sim/sound/voices" that decides how many channels are
opened. Here is a typical set of interface properties. The aliases at the end have
nothing to do with the subsystem, but are handy shortcuts:

<blockquote><pre>
&lt;sim&gt;
  &lt;sound&gt;
    &lt;voices&gt;
      &lt;host type="string"&gt;localhost&lt;/host&gt;
      &lt;port type="string"&gt;1314&lt;/port&gt;
      &lt;enabled type="bool"&gt;false&lt;/enabled&gt;

      &lt;voice&gt;
        &lt;desc&gt;Pilot&lt;/desc&gt;
        &lt;text type="string"&gt;&lt;/text&gt;
        &lt;volume type="double"&gt;1.0&lt;/volume&gt;
        &lt;pitch type="double"&gt;100.0&lt;/pitch&gt;
        &lt;speed type="double"&gt;1.0&lt;/speed&gt;
        &lt;preamble type="string"&gt;(voice_us3_mbrola)&lt;/preamble&gt;
        &lt;festival type="bool"&gt;true&lt;/festival&gt;
      &lt;/voice&gt;

      &lt;voice&gt;
        ...
      &lt;/voice&gt;

      &lt;!-- handy aliases, not part of the interface: --&gt;

      &lt;atc alias="/sim/sound/voices/voice[0]/text"/&gt;
      &lt;approach alias="/sim/sound/voices/voice[0]/text"/&gt;
      &lt;ground alias="/sim/sound/voices/voice[0]/text"/&gt;
      &lt;pilot alias="/sim/sound/voices/voice[1]/text"/&gt;
      &lt;copilot alias="/sim/sound/voices/voice[2]/text"/&gt;
      &lt;ai-plane alias="/sim/sound/voices/voice[3]/text"/&gt;
    &lt;/voices&gt;
  &lt;/sound&gt;
&lt;/sim&gt;
</pre></blockquote>
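To illustrate how the number of channels follows from the configuration, here is a small Python sketch that counts the voice groups and collects the aliases. The embedded configuration is a shortened, made-up variant that uses the /sim/sound/voices path named in the text. The function name <tt>describe_voices</tt> and the voice descriptions are assumptions of this example, and FlightGear itself of course reads these values through its property tree, not like this.

```python
import xml.etree.ElementTree as ET

# Shortened, illustrative configuration (not the shipped preferences.xml).
CONFIG = """
<sim>
  <sound>
    <voices>
      <host type="string">localhost</host>
      <port type="string">1314</port>
      <enabled type="bool">false</enabled>
      <voice><desc>ATC</desc><preamble type="string">(voice_kal_diphone)</preamble></voice>
      <voice><desc>Pilot</desc><preamble type="string">(voice_us3_mbrola)</preamble></voice>
      <atc alias="/sim/sound/voices/voice[0]/text"/>
      <pilot alias="/sim/sound/voices/voice[1]/text"/>
    </voices>
  </sound>
</sim>
"""


def describe_voices(xml_text):
    """Return (channel descriptions in index order, alias name -> target path)."""
    voices = ET.fromstring(xml_text).find("sound/voices")
    # One channel per <voice> group; document order gives voice[0], voice[1], ...
    channels = [v.findtext("desc", default="") for v in voices.findall("voice")]
    aliases = {child.tag: child.get("alias") for child in voices if child.get("alias")}
    return channels, aliases


if __name__ == "__main__":
    channels, aliases = describe_voices(CONFIG)
    print(len(channels), "channels:", channels)
    for name, target in aliases.items():
        print(name, "->", target)
```

Adding a third voice group to the sample would yield a third channel without any other change, which is the point of not hard-coding the number.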

The &lt;enabled&gt; property decides at init time whether the subsystem is
activated. There's currently no way to change this at runtime.

Each &lt;voice&gt; group defines one channel. &lt;text&gt; is the output
property: every value that's written to it will be spoken by this channel.
If &lt;festival&gt; is true, then the channel sets up &lt;pitch&gt; and
&lt;speed&gt; (&lt;volume&gt; currently does not work and has to be <tt>1</tt>)
and puts Festival markup around the text. If &lt;festival&gt; is false,
then all text is written verbatim to the socket. &lt;preamble&gt; is always
written to the socket once, as the last step of socket creation. In "festival"
mode it's used to set the voice, while in raw mode it could be used to identify
the channel (assuming that the server knows what to do with it).
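
The difference between the two modes can be modelled in a few lines of Python. This is a toy model, not FlightGear's code: the class name is invented, a plain list stands in for the socket, and the Festival markup shown here (a <tt>SayText</tt> expression with pitch and speed in a trailing Scheme comment) is only a placeholder, since this page doesn't spell out the exact markup.

```python
class VoiceChannel:
    """Toy model of one voice channel; the list `sent` stands in for the socket."""

    def __init__(self, preamble="", festival=True, pitch=100.0, speed=1.0):
        self.festival = festival
        self.pitch = pitch
        self.speed = speed
        self.sent = []
        # The preamble is written exactly once, as the last step of the setup.
        if preamble:
            self.sent.append(preamble)

    def speak(self, text):
        if not self.festival:
            # Raw mode: the text goes out verbatim.
            self.sent.append(text)
            return
        # Festival mode: wrap the text in markup. Placeholder markup only;
        # the real subsystem may format pitch and speed quite differently.
        escaped = text.replace("\\", "\\\\").replace('"', '\\"')
        self.sent.append('(SayText "%s") ; pitch=%g speed=%g'
                         % (escaped, self.pitch, self.speed))


if __name__ == "__main__":
    raw = VoiceChannel(preamble="channel-1", festival=False)
    raw.speak("hello")
    print(raw.sent)
```

In raw mode the receiving server sees the preamble followed by the unmodified messages, so the preamble is the only place to tell channels apart.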



<h2>Usage</h2>

The design principle is that message generators (e.g. the ATC subsystem) write
to a message property (e.g. <tt>/sim/messages/pilot</tt>). A listener (<tt>$FG_ROOT/Nasal/screen.nas</tt>)
watches this property and decides what to do with it. For pilot and ATC messages it writes the message
to the screen.log and copies it to the corresponding alias, e.g. <tt>/sim/sound/voices/pilot</tt>. The latter
is an alias for the real voice channel <tt>/sim/sound/voices/voice[1]/text</tt>.
This allows the most control and makes all steps user-configurable from Nasal
scripts. Message generators should <em>not</em> write to a voice's &lt;text&gt;
property directly, and should write to the <tt>/sim/sound/voices/*</tt> aliases only if a
message should not be displayed by the system.
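
The message flow described above can be sketched with a tiny stand-in for the property tree. Everything here is illustrative Python: the real listener is a Nasal script, the <tt>PropertyTree</tt> class is invented for the example, and the message text is made up.

```python
class PropertyTree:
    """Tiny stand-in for the property tree: values, aliases, and listeners."""

    def __init__(self):
        self.values = {}
        self.aliases = {}
        self.listeners = {}

    def alias(self, name, target):
        self.aliases[name] = target

    def listen(self, path, callback):
        self.listeners.setdefault(path, []).append(callback)

    def set(self, path, value):
        path = self.aliases.get(path, path)  # follow one level of aliasing
        self.values[path] = value
        for callback in self.listeners.get(path, []):
            callback(value)


tree = PropertyTree()
tree.alias("/sim/sound/voices/pilot", "/sim/sound/voices/voice[1]/text")

screen_log = []  # what would be shown at the top of the screen
spoken = []      # what the voice channel would send to the socket


def on_pilot_message(msg):
    screen_log.append(msg)                    # display it ...
    tree.set("/sim/sound/voices/pilot", msg)  # ... and hand it to the voice alias


tree.listen("/sim/messages/pilot", on_pilot_message)
tree.listen("/sim/sound/voices/voice[1]/text", spoken.append)

# A message generator only ever writes to the message property.
tree.set("/sim/messages/pilot", "San Carlos tower, request taxi")
```

Because the generator never touches the voice channel, display and speech can be changed or switched off in the listener alone.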
</body>
</html>