fgdata/Docs/README.embedded-resources
Scott Giese c5460a3cf1 Doc/README: Only use Latin character set.
Eliminate overruns in PDF output.
2018-10-08 20:11:58 -05:00


-*- coding: utf-8; fill-column: 72; -*-
The Embedded Resources System
=============================
This document gives an overview of FlightGear's embedded resources
system and related classes. For specific information on the C++
functions, the reference documentation is in the corresponding header
files.
Contents
--------
1. The CharArrayStream and ZlibStream classes
2. The "embedded resources" system
3. About the XML resource declaration files
4. The EmbeddedResourceProxy class
Introduction
------------
The embedded resources system allows FlightGear to use data from files
without relying on FG_ROOT to be set. This can be used, for instance, to
grab the contents of XML files at FG build time, from any repository[1],
and use said contents in the C++ code. The term "embedded" is used to
avoid confusion with the ResourceProvider and ResourceManager classes
provided by SimGear, which have nothing to do with the system described
here.
The embedded resources system relies on classes present in
simgear/io/iostreams/{zlibstream.cxx,CharArrayStream.cxx}, which were
implemented as a way to address a concern that embedding a few XML files
in the fgfs binary could use precious memory. The resource compiler
(fgrcc) compresses resources before writing them in C++ form, except
for files with certain extensions; compression can also be configured on
a per-resource basis. The EmbeddedResourceManager instance, which lives
in the fgfs process, can then decompress them on the fly, incrementally
and transparently. So there is really no reason to worry about memory
consumption, even for several dozen XML files.
fgrcc is the resource compiler: it turns arbitrary files into C++ code
the EmbeddedResourceManager can make use of, in order to "serve" the
files' contents at runtime. It is named this way because it fulfills
the same role as Qt's rcc tool. It supports a thin superset of the
XML-based format used by rcc for declaring resources[2][3].
Run 'fgrcc --help' for detailed usage information.
1) The CharArrayStream and ZlibStream classes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The CharArrayStream* files in simgear/io/iostreams/ implement
CharArrayStreambuf and related IOStreams classes for working with char
arrays, namely:
  - CharArrayStreambuf    subclass of std::streambuf      stream buffer
  - ROCharArrayStreambuf  subclass of CharArrayStreambuf  stream buffer
  - CharArrayIStream      subclass of std::istream        input stream
  - CharArrayOStream      subclass of std::ostream        output stream
  - CharArrayIOStream     subclass of std::iostream       input/output stream
(in the 'simgear' namespace, of course)
CharArrayStreambuf is a stream buffer class that allows reading from,
and writing to, char arrays (std::strstream has been deprecated since
C++98). Contrary to std::strstream, this class does no dynamic
allocation: it is very simple and strictly stays within the bounds of
the buffer specified in its constructor, for both reads and writes.
Contrary to std::stringstream, CharArrayStreambuf allows one to work on
an array of char (static data, data on the stack, etc.) without having
to make a whole copy of it.
ROCharArrayStreambuf is a read-only subclass of CharArrayStreambuf
(useful for const-correctness). CharArrayIStream, CharArrayOStream and
CharArrayIOStream are very simple convenience stream classes using
either CharArrayStreambuf or ROCharArrayStreambuf as their associated
stream buffer class.
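To make the idea concrete, here is a minimal sketch of a read-only
stream buffer over a char array, written with the standard library
only. It is not SimGear's ROCharArrayStreambuf: the ReadOnlyArrayBuf
class, the 'greeting' array and the firstLine() function are made-up
names used for illustration. The point is only that std::streambuf can
expose an existing array through its get area without copying it.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <streambuf>
#include <string>

// Hypothetical stand-in for a read-only char array stream buffer; this is
// NOT SimGear's ROCharArrayStreambuf, only an illustration of the idea.
class ReadOnlyArrayBuf : public std::streambuf {
public:
    ReadOnlyArrayBuf(const char* data, std::size_t size) {
        assert(data != nullptr || size == 0);
        // setg() wants non-const pointers. Since this class does not
        // override overflow() or pbackfail(), nothing writes through them.
        char* p = const_cast<char*>(data);
        setg(p, p, p + size);   // get area = [data, data + size)
    }
};

// Static data, similar to what a resource compiler could emit.
static const char greeting[] = {'h', 'e', 'l', 'l', 'o', '\n',
                                'w', 'o', 'r', 'l', 'd'};

// Read the first line straight from the array; the array is not copied,
// only the extracted line is.
std::string firstLine() {
    ReadOnlyArrayBuf buf(greeting, sizeof(greeting));
    std::istream in(&buf);
    std::string line;
    std::getline(in, line);
    return line;
}
```

Any code that accepts an std::istream can be handed such a stream,
which is what makes this kind of class easy to plug into existing
parsers.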
While these classes are useful in general, they were written
specifically to make the embedded resources system clean and
memory-friendly. Concretely, this system supports both compressed and
uncompressed resources, all of which can be read from their respective
static arrays like this (think pipelines):
static char array
(uncompressed  --------------->  data available via an std::istream
 resource)     CharArrayIStream  or std::streambuf interface
            or ROCharArrayStreambuf

static char array
(compressed   --------------->  compressed data  ------------------->  ditto
 resource)    CharArrayIStream                   ZlibDecompressorIStream
                                              or ZlibDecompressorIStreambuf

where ditto = uncompressed data available via an std::istream or
              std::streambuf interface
So, whether the resource data stored in static arrays by fgrcc is
compressed or not, end-user code can read it in uncompressed form using
an std::istream or std::streambuf interface, which means the resource
never needs to be copied in memory a second time. This is particularly
interesting with compressed resources, because:
  1) The in-memory static data is in general much smaller than the
     uncompressed contents, and it is the only copy we really have to
     "pay" for when using these stream-based interfaces.
  2) The data is transparently decompressed on demand as the end-user
     code reads from the ZlibDecompressorIStream or
     ZlibDecompressorIStreambuf instance.
In other words, these CharArrayStream classes complement the ones in
zlibstream.cxx and make it easy to implement all kinds of pipelines to
incrementally read or write, and possibly on-the-fly compress or
decompress data from or to in-memory buffers (cf.
writeCompressedDataToBuffer() in
simgear/simgear/embedded_resources/embedded_resources_test.cxx, or
ResourceCodeGenerator::writeEncodedResourceContents() in
flightgear/src/EmbeddedResources/fgrcc.cxx for examples).
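The two-stage pipeline described above can be sketched with standard
C++ only. To keep the example self-contained, a trivial run-length
scheme stands in for zlib, so this shows the layering technique and not
the actual ZlibDecompressorIStreambuf. ArrayBuf, RunLengthDecoderBuf,
'encoded' and decodeAll() are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <streambuf>
#include <string>

// Stage 1 (hypothetical): expose a static array as a stream buffer.
struct ArrayBuf : std::streambuf {
    ArrayBuf(const char* data, std::size_t size) {
        char* p = const_cast<char*>(data);   // never written through
        setg(p, p, p + size);
    }
};

// Stage 2 (hypothetical): decode on demand. A run-length scheme stands in
// for zlib; the input is a sequence of (count, value) byte pairs.
class RunLengthDecoderBuf : public std::streambuf {
public:
    explicit RunLengthDecoderBuf(std::istream& source) : source_(source) {}

protected:
    // Called only when the get area is exhausted: decode one more pair.
    int_type underflow() override {
        if (gptr() < egptr()) {
            return traits_type::to_int_type(*gptr());
        }
        char pair[2];
        if (!source_.read(pair, 2)) {
            return traits_type::eof();       // no complete pair left
        }
        const std::size_t count = static_cast<unsigned char>(pair[0]);
        if (count == 0) {
            return traits_type::eof();       // zero count: treat as end
        }
        assert(count <= sizeof(out_));
        for (std::size_t i = 0; i < count; ++i) {
            out_[i] = pair[1];
        }
        setg(out_, out_, out_ + count);
        return traits_type::to_int_type(*gptr());
    }

private:
    std::istream& source_;
    char out_[255];                          // holds one decoded run
};

// "Compressed" static data: 3 x 'a', 2 x 'b', 1 x '\n'.
static const char encoded[] = {3, 'a', 2, 'b', 1, '\n'};

std::string decodeAll() {
    ArrayBuf raw(encoded, sizeof(encoded));
    std::istream rawStream(&raw);            // array -> istream
    RunLengthDecoderBuf decoder(rawStream);
    std::istream decoded(&decoder);          // istream -> decoded istream
    std::string result;
    char c;
    while (decoded.get(c)) {
        result += c;
    }
    return result;
}
```

Only one decoded run lives in memory at any time besides the static
array, which is the property the text above is after.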
Since all of these provide standard IOStreams interfaces, they can be
easily plugged into existing code. For instance, readXML() in
simgear/simgear/xml/easyxml.cxx and readProperties() in
simgear/props/props_io.cxx can incrementally read and parse data from an
std::istream instance, and thus are able to directly read from a
resource containing the compressed version of an XML file.
This incremental processing is of course most interesting with large
resources... which probably won't be used in FlightGear, in order not to
waste RAM[4][5]. The EmbeddedResourceManager also has a getString()
method that simply returns an std::string, for when you don't mind that
this operation, by std::string design, necessarily makes a copy of the
whole resource contents (in uncompressed form in the case of a
compressed resource). This getString() method should be convenient and
quite acceptable for reasonably-sized resources.
Finally, all of these classes---CharArray*Stream*, the classes in
zlibstream.cxx, the EmbeddedResourceManager and related classes---can
handle text and binary data in exactly the same way (std::string doesn't
care, and neither do the other classes).
2) The "embedded resources" system
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The embedded resources system works this way:
(1) The fgrcc resource compiler reads an XML file which has almost the
same syntax[2] as Qt's .qrc files[3] and writes a .cxx file
containing:
- static char arrays initialized with resource contents
  (possibly compressed; this is decided automatically unless
  explicitly specified in the XML file);
- a function definition containing calls to
EmbeddedResourceManager::addResource() that register each of
these resources with the EmbeddedResourceManager instance.
If you pass the --output-header-file option to fgrcc, it also
writes a header file that goes with the generated .cxx file. For
other options, see the output of 'fgrcc --help'.
It is quite possible to call fgrcc several times, each time with a
different (XML input file, .cxx/.hxx output files) tuple: for
instance, one call for resources present in the FlightGear repo,
and possibly another call for resources in FGData. The point of
this is that paths in the XML input file should be relative to
avoid being system-dependent, and fgrcc accepts a --root option to
indicate what you want them to be relative to, in order to let it
find the real files. Thus, on a first invocation of fgrcc, one can
make --root point to a path to the FlightGear repository when
building, and on the second call use it to indicate a path to the
FGData repository. Other variations are possible, of course.
Notes:
1) The example given here with FGData would *not* freeze the
FGData location at FG compile time; this is only to allow
files from FGData to be turned into generated .cxx files
inside the FG source tree, that will make their contents
available as embedded resources at runtime.
2) At the time of this writing, resources from the FlightGear
repository are compiled at build time, and resources from the
FGData repository are compiled offline using the
'rebuild-fgdata-embedded-resources' script[6] (a
convenience wrapper for fgrcc), before being committed to the
FlightGear repository.
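As a rough picture of step (1), the sketch below shows the general
shape of such a generated file: static data plus an init function that
registers it. Everything here is hypothetical (MiniResourceRegistry,
ResourceDescriptor, resource1_data, initExampleEmbeddedResources()) and
much simpler than the real thing; the code actually written by fgrcc
calls EmbeddedResourceManager::addResource(), whose signature is not
reproduced here.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical descriptor: points to static data, does not contain it.
struct ResourceDescriptor {
    const char* data;
    std::size_t size;
};

// Hypothetical stand-in for the resource manager.
class MiniResourceRegistry {
public:
    void addResource(const std::string& virtualPath, ResourceDescriptor desc) {
        assert(desc.data != nullptr);
        resources_[virtualPath] = desc;
    }

    // Copies the contents, as any function returning std::string must.
    std::string getString(const std::string& virtualPath) const {
        const ResourceDescriptor& d = resources_.at(virtualPath);
        return std::string(d.data, d.size);
    }

private:
    std::map<std::string, ResourceDescriptor> resources_;
};

// --- What a generated .cxx file could look like (names are made up) ---

// Contents of one (uncompressed) resource.
static const char resource1_data[] = {'<', 'a', '/', '>', '\n'};

// Init function, to be called once at startup.
void initExampleEmbeddedResources(MiniResourceRegistry& registry) {
    registry.addResource("/path/to/a.xml",
                         {resource1_data, sizeof(resource1_data)});
}
```

Calling fgrcc several times then simply yields several such init
functions, one per generated .cxx file.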
(2) SimGear contains an EmbeddedResourceManager class with, among
others, createInstance() and instance() methods similar to the
ones of NavDataCache. See [7] for the corresponding code.
FlightGear creates an EmbeddedResourceManager instance at startup
and calls the various init functions generated by fgrcc, each of
which registers the resources present in its containing .cxx file
(using EmbeddedResourceManager::addResource()).
End-user FG code can then use EmbeddedResourceManager methods such
as getResource(), getString(), getStreambuf() and getIStream()
to access resource contents:
- getResource() returns an
std::shared_ptr<const AbstractEmbeddedResource>
- getString() returns an std::string
- getStreambuf() returns an std::unique_ptr<std::streambuf>
- getIStream() returns an std::unique_ptr<std::istream>
AbstractEmbeddedResource is an abstract base class that you can
think of as a resource descriptor: it points to (does not contain!)
the resource data (which normally has static storage duration), and
holds and gives access to metadata such as the compression type
and resource size (compressed and uncompressed).
AbstractEmbeddedResource currently has two concrete derived
classes: RawEmbeddedResource for resources stored as-is
(uncompressed) and ZlibEmbeddedResource for resources compressed by
fgrcc. It's quite easy to add new subclasses if needed, e.g. for
LZMA compression or other things.
Resource fetching requires two things:
- an std::string key (fgrcc manipulates them with SGPath, but the
EmbeddedResourceManager code in SimGear is so far completely
agnostic of the kind of data stored in keys; this could be
changed, though, if we wanted for example to be able to query
at runtime all available resources in a given "virtual
directory");
- a "locale" name, similar to what FlightGear's XML translation
files and FGLocale use. We used double quotes here, because
fgrcc and the EmbeddedResourceManager expect "locale" names to
be of one of these forms:
* empty string: default locale, typically but not necessarily
English (it is "engineering English" in FlightGear, i.e.,
English written by programmers in the code, before
translators possibly fix it up :)
* en, fr, de, es, it...
* en_GB, en_US, fr_FR, fr_CA, de_DE, de_CH, it_IT...
There is no encoding part, contrary to POSIX locales, hence the
use of double quotes around the term "locale" in this context.
The FGLocale::getPreferredLanguage() method returns the preferred
"locale" in the form described above, according to user choice
(from fgfs' --language option) and/or settings (system locale).
This allows FG to tell the EmbeddedResourceManager the preferred
"locale" for resource fetching (same syntax as in Qt's rcc tool for
declaration in the XML file, using the 'lang' attribute on
'qresource' elements).
[ Regarding the default locale, the way things are currently set
up, I would use no 'lang' attribute for resources suitable for
English in the XML input file for fgrcc, except when a
country-specific variant is desired (en_GB, en_US, en_AU...). In
such a case, there should also be a generic variant with no
'lang' attribute declared for the same resource virtual path.
This matches what I did for FGLocale::getPreferredLanguage(),
that maps unset locales and locales such as C and C.UTF-8 to the
default locale for the EmbeddedResourceManager, which is the
empty string. This is a matter of policy, of course, and could be
changed if desired. ]
The EmbeddedResourceManager class has getLocale() and
selectLocale() methods to manage the _selected locale_. Each
resource-fetching method of this class (getResourceOrNullPtr(),
getResource(), getString(), getStreambuf() and getIStream()) has
two overloads:
- one taking only a virtual path (the key mentioned above);
- one taking a virtual path and a "locale" name.
(we'll write "locale" without enclosing double-quotes from now on,
otherwise it gets too painful to read; but we're *not* talking
about POSIX-style locales ending with an encoding part)
The first kind of overload uses the selected locale to look up the
resource, whereas the second kind uses the explicitly specified
locale. Resource lookup then falls back from the most specific
locale to the default one. For instance, when a resource is looked
up in the "fr_FR" locale, the EmbeddedResourceManager tries, in
this order:
- "fr_FR";
- if no resource has been registered for "fr_FR" with the provided
virtual path, it then tries with the "fr" locale;
- if this is also unsuccessful, it finally tries with the default
locale: "";
- if this third attempt fails, the resource-fetching method
throws an sg_exception, except for getResourceOrNullPtr(),
which returns a null
std::shared_ptr<const AbstractEmbeddedResource> instead.
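The lookup order above can be sketched as follows. This is not the
EmbeddedResourceManager code: the Registry type and the
localeFallbackChain() and findResource() functions are hypothetical,
and splitting the locale at the first underscore is an assumption based
on the "fr_FR" example. Returning a null pointer mirrors what the text
says about getResourceOrNullPtr().

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Candidate locales in lookup order, e.g. "fr_FR" -> {"fr_FR", "fr", ""}.
std::vector<std::string> localeFallbackChain(const std::string& locale) {
    std::vector<std::string> chain;
    if (!locale.empty()) {
        chain.push_back(locale);
        const std::string::size_type pos = locale.find('_');
        if (pos != std::string::npos) {
            chain.push_back(locale.substr(0, pos));   // language only
        }
    }
    chain.push_back("");                              // default locale
    // The chain always ends with the default locale.
    assert(chain.back().empty());
    return chain;
}

// Hypothetical registry: (virtual path, locale) -> resource contents.
using Registry = std::map<std::pair<std::string, std::string>, std::string>;

// Returns a pointer to the first match, or nullptr if every attempt fails.
const std::string* findResource(const Registry& reg,
                                const std::string& virtualPath,
                                const std::string& locale) {
    for (const std::string& candidate : localeFallbackChain(locale)) {
        const auto it = reg.find({virtualPath, candidate});
        if (it != reg.end()) {
            return &it->second;
        }
    }
    return nullptr;
}
```

With a registry holding "/greeting" for the "" and "fr" locales only, a
lookup in "fr_FR" would find the "fr" variant and a lookup in "de_DE"
would end up with the default one.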
To see how this is used, you can look at
simgear/simgear/embedded_resources/embedded_resources_test.cxx. The
only difference with real use is that in this file, resource
contents and registering calls with the EmbeddedResourceManager
have been written manually instead of by fgrcc. Apart from
embedded_resources_test.cxx, here are two examples of client usage
of the EmbeddedResourceManager:
(a) With EmbeddedResourceManager::getString():
    #include <simgear/embedded_resources/EmbeddedResourceManager.hxx>
    #include <simgear/debug/logstream.hxx>

    [...]

    const auto& resMgr = simgear::EmbeddedResourceManager::instance();
    SG_LOG(SG_GENERAL, SG_INFO,
           "Resource contents: '" <<
           resMgr->getString("/virtual/path/to/resource") << "'");
(b) With EmbeddedResourceManager::getIStream():
    #include <cstddef>              // std::size_t
    #include <memory>               // std::unique_ptr
    #include <simgear/misc/sg_path.hxx>
    #include <simgear/io/iostreams/sgstream.hxx>
    #include <simgear/embedded_resources/EmbeddedResourceManager.hxx>

    [...]

    sg_ofstream outFile(SGPath("/tmp/whatever"));
    if (!outFile) {
        // Handle the open error here.
    }

    const auto& resMgr = simgear::EmbeddedResourceManager::instance();
    auto resStream = resMgr->getIStream("/virtual/path/to/resource");
    // One possible way of handling errors from resStream[8]:
    // resStream->exceptions(std::ios_base::badbit);

    constexpr std::size_t bufSize = 4096;
    std::unique_ptr<char[]> buf(new char[bufSize]); // intermediate buffer

    do {
        // read() may return fewer than bufSize chars; gcount() tells how many
        resStream->read(buf.get(), bufSize);
        outFile.write(buf.get(), resStream->gcount());
    } while (*resStream && outFile); // resStream *points* to an std::istream

    // Handle possible errors that might have caused the loop to stop
    // prematurely.
3) About the XML resource declaration files
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
You may want to read the output of 'fgrcc --help', which explains a few
things, in particular how to write an XML resource declaration file that
fgrcc can use. At the time of this writing, such files are already
present as flightgear/src/EmbeddedResources/FlightGear-resources.xml and
flightgear/src/EmbeddedResources/FGData-resources.xml in the FlightGear
repository. In case you need resources from elsewhere, it's easy to add
other XML resource declaration files:
1) If you want the .cxx/.hxx resource files to be automatically
generated as part of the FlightGear build:
Copy and adapt the add_custom_command() call in
flightgear/src/Main/CMakeLists.txt[9] that invokes fgrcc on
flightgear/src/EmbeddedResources/FlightGear-resources.xml.
2) In flightgear/src/Main/CMakeLists.txt, add paths for your new
fgrcc-generated .cxx and .hxx files to the SOURCES and HEADERS
CMake variables for the 'fgfs' target.
3) Assuming you passed for instance
--init-func-name=initFoobarEmbeddedResources in step 1, add a call
to initFoobarEmbeddedResources() after this code in fgMainInit()
(flightgear/src/Main/main.cxx):

    simgear::EmbeddedResourceManager::createInstance();
    initFlightGearEmbeddedResources();
4) The EmbeddedResourceProxy class
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
SimGear contains an EmbeddedResourceProxy class that allows one to
access real files or embedded resources in a unified way. When using it,
one can switch from one data source to the other with minimal code
changes, possibly even at runtime (in which case there is obviously no
code change at all).
Sample usage (from FlightGear):
    simgear::EmbeddedResourceProxy proxy(globals->get_fg_root(), "/FGData");
    proxy.setUseEmbeddedResources(false); // can also be set via the constructor
    std::string s = proxy.getString("/some/path");
    std::unique_ptr<std::istream> streamp = proxy.getIStream("/some/path");
This example would retrieve contents from the real file
$FG_ROOT/some/path. If true had been passed in the
proxy.setUseEmbeddedResources() call, it would instead have used the
default-locale version of the embedded resource whose virtual path is
/FGData/some/path.
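The path mapping just described can be sketched with a small stand-in
class. This is not the EmbeddedResourceProxy implementation:
MiniResourceProxy and EmbeddedMap are hypothetical, the embedded side
is reduced to a plain map, and locales are ignored. It only illustrates
how one path argument is resolved against either the real root or the
virtual root.

```cpp
#include <cassert>
#include <fstream>
#include <iterator>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-in for the embedded side: virtual path -> contents.
using EmbeddedMap = std::map<std::string, std::string>;

class MiniResourceProxy {
public:
    MiniResourceProxy(std::string realRoot, std::string virtualRoot,
                      const EmbeddedMap& embedded, bool useEmbedded)
        : realRoot_(std::move(realRoot)),
          virtualRoot_(std::move(virtualRoot)),
          embedded_(embedded),
          useEmbedded_(useEmbedded) {}

    void setUseEmbeddedResources(bool use) { useEmbedded_ = use; }

    // 'path' must start with '/'; it is appended to one root or the other.
    std::string getString(const std::string& path) const {
        assert(!path.empty() && path[0] == '/');
        if (useEmbedded_) {
            // std::map::at() throws std::out_of_range if the key is missing
            return embedded_.at(virtualRoot_ + path);
        }
        std::ifstream file(realRoot_ + path, std::ios::binary);
        if (!file) {
            throw std::runtime_error("cannot open " + realRoot_ + path);
        }
        return std::string(std::istreambuf_iterator<char>(file),
                           std::istreambuf_iterator<char>());
    }

private:
    std::string realRoot_;
    std::string virtualRoot_;
    const EmbeddedMap& embedded_;   // must outlive the proxy
    bool useEmbedded_;
};
```

Client code only ever passes "/some/path"; which root gets prepended is
decided by the flag, so switching data sources needs no change at the
call sites.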
For more information about this class, see [10] and [11].
Footnotes
=========
[1] E.g., FlightGear or FGData, as long as the path to the latter is
provided to the FG build system, which is currently possible but not
required (passing '-D FG_DATA_DIR:PATH=...' to CMake when
configuring the FlightGear build).
[2] The differences with the QRC format[3] are explained in the output
of 'fgrcc --help'. Here is the relevant excerpt:
,----
| 1. The <!DOCTYPE RCC> declaration at the beginning should be omitted (or
| replaced with <!DOCTYPE FGRCC>, however such a DTD currently doesn't
| exist). I suggest to add an XML declaration instead, for instance:
|
| <?xml version="1.0" encoding="UTF-8"?>
|
| 2. <RCC> and </RCC> must be replaced with <FGRCC> and </FGRCC>,
| respectively.
|
| 3. The FGRCC format supports a 'compression' attribute for each 'file'
| element. At the time of this writing, the allowed values for this
| attribute are 'none', 'zlib' and 'auto'. When set to a value that is
| not 'auto', this attribute of course bypasses the algorithm for
| determining whether and how to compress a given resource (algorithm
| which relies on the file extension).
|
| 4. Resource paths (paths to the real files, not virtual paths) are
| interpreted relatively to the directory specified with the --root
| option. If this option is not passed to 'fgrcc', then the default root
| directory is the one containing INFILE, which matches the behavior of
| Qt's 'rcc' tool.
`----
[3] http://doc.qt.io/qt-5/resources.html
[4] The main reason why I wrote the classes in
simgear/simgear/io/iostreams/{CharArrayStream,zlibstream}.cxx is
thus not to maximize memory-efficiency with very large resources;
rather, it is to make the implementation of the following parts
simple, clean and modular:
- the resource compiler (fgrcc);
- the EmbeddedResourceManager.
[5] The EmbeddedResourceManager architecture would make it quite easy to
also support runtime loading of resources from files (a thing the Qt
resource system supports), but it is not very clear how interesting
this would be, compared to having the files loaded from $FG_ROOT.
Well, maybe for large files [apt.dat.gz & Co] that we would want to
load but not see in the FGData repository at all. But then there
would be the requirement, of course, that "something" puts the files
in a clearly-defined, platform-dependent location known to the
EmbeddedResourceManager.
[6] https://sourceforge.net/p/flightgear/fgmeta/ci/next/tree/python3-flightgear/
rebuild-fgdata-embedded-resources
[7] https://sourceforge.net/p/flightgear/simgear/ci/next/tree/simgear/
embedded_resources/
[8] We know that in some buggy C++ implementations, the
std::ios_base::failure exception can't be caught, at least not under
its name, due to some ABI compatibility mess:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66145
However, it still causes the program to abort, and since this
error handling technique makes for much more readable and less
error-prone code, I think it's still a good way to handle IOStreams
errors even now, unless you really need to *catch* the
std::ios_base::failure exception.
[9] flightgear/CMakeModules/GenerateFlightgearResources.cmake in my
'i18n-and-init-work-v2' branch (not merged into 'next' at the time
of this writing).
[10] https://sourceforge.net/p/flightgear/simgear/ci/next/tree/simgear/
embedded_resources/EmbeddedResourceProxy.hxx
[11] https://sourceforge.net/p/flightgear/simgear/ci/next/tree/simgear/
embedded_resources/embedded_resources_test.cxx