You may now call terrasync.py with --mode=sync or --mode=check. 'sync'
mode is the default and corresponds to terrasync.py's usual behavior.
In 'check' mode, terrasync.py never writes to disk and aborts at the
first mismatch between local and remote data. The exit status in 'check'
mode is:
- 0 if the program terminated successfully and no mismatch was found
between the local and remote repositories;
- 1 in case an error was encountered;
- 2 if there was a mismatch between local and remote data.
In 'sync' mode, the exit status is:
- 0 if the program terminated successfully;
- 1 in case an error was encountered.
A mismatch in 'check' mode is *not* an error; it is just one of the two
expected results. An error is a worse condition (uncaught exception,
network retrieval aborted after all retries failed, and so on).
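As an illustration of how a calling script might interpret these exit
statuses, here is a minimal sketch. The helper names describeExitStatus()
and runAndDescribe() are hypothetical and not part of terrasync.py; the
child process is a stand-in that merely exits with status 2.

```python
import subprocess
import sys

def describeExitStatus(mode, status):
    """Translate an exit status into 'ok', 'mismatch' or 'error'."""
    if status == 0:
        return "ok"
    if mode == "check" and status == 2:
        return "mismatch"   # an expected result in 'check' mode, not an error
    return "error"          # status 1, or anything unexpected

def runAndDescribe(command, mode):
    """Run 'command' (a list of arguments) and describe its exit status."""
    completed = subprocess.run(command)
    return describeExitStatus(mode, completed.returncode)

# Stand-in child process that exits with status 2, as terrasync.py would in
# 'check' mode after finding a mismatch. A real call would pass a command
# starting with [sys.executable, "terrasync.py", "--mode=check"] instead.
standIn = [sys.executable, "-c", "raise SystemExit(2)"]
demoResult = runAndDescribe(standIn, "check")
```

Treating status 2 as an error outside 'check' mode is a choice made for
this sketch, since only statuses 0 and 1 are documented for 'sync' mode.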
Additionally, calling terrasync.py with --report causes it to print
lists of:
- files and dirs that were missing or had mismatching hashes (this is
okay in 'sync' mode: these things have been "fixed" in the target
directory before the report was printed);
- files and dirs that have been found to be orphaned (i.e., found
under the target directory but not mentioned in the corresponding
.dirindex file). These are the ones removed in 'sync' mode when
--remove-orphan is passed.
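The notion of "orphaned" entries can be sketched as follows. This is only
an illustration: findOrphans() is a hypothetical helper, and it assumes the
names listed in the .dirindex file have already been parsed into a set.

```python
import os
import tempfile

def findOrphans(directory, indexedNames):
    """Return the sorted names found in 'directory' but not in 'indexedNames'."""
    orphans = []
    for name in sorted(os.listdir(directory)):
        if name == ".dirindex":
            continue            # assumption: the index itself is never an orphan
        if name not in indexedNames:
            orphans.append(name)
    return orphans

# Small demonstration in a throwaway directory
with tempfile.TemporaryDirectory() as target:
    for name in (".dirindex", "kept.txt", "stale.txt"):
        open(os.path.join(target, name), "w").close()
    os.mkdir(os.path.join(target, "staleDir"))
    # Pretend the .dirindex file only mentions kept.txt
    demoOrphans = findOrphans(target, {"kept.txt"})
```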
- Add computeHash() utility function that can work with any file-like
object (e.g., a connected socket).
- Rename hash_of_file() to hashForFile(), and of course implement it
using our new computeHash().
- Add class HTTPSocketRequest derived from HTTPGetCallback. It allows
one to process data from the network without storing it to a file (it
uses the file-like interface provided by http.client.HTTPResponse).
The callback returns the http.client.HTTPResponse object, which can be
conveniently used in a 'with' statement.
- Simplify the API of TerraSync.updateDirectory(): its 'dirIndexHash'
argument must now be a hash (a string); the None object is not allowed
anymore (with the upcoming addition of --mode=check, having to
deal with this special case in updateDirectory() would make the logic
too difficult to follow, or we would have to completely
separate check-only mode from update mode, which would entail code
duplication).
Since TerraSync.updateDirectory() must now always have a hash to work
with, compute the hash of the root '.dirindex' file from the server in
TerraSync.start(), using our new HTTPSocketRequest class. It was
written for this purpose, since this step must also work in check-only
mode, where we don't want to write any file to disk.
- TerraSync.updateFile(): correctly handle the case where a directory
inside the TerraSync repository is (now) a file according to the
server: the directory must be recursively removed before the file can
be downloaded in the place formerly occupied by the directory.
- Add stub class Report. Its methods do nothing for now, but are already
called in a couple of appropriate places. The class will be completed
in a future commit, of course.
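A minimal sketch of hashing through a file-like object, in the spirit of
computeHash() and hashForFile() above. The bodies, the blockSize parameter
and its value are assumptions, not the actual terrasync.py code. SHA-1 is
used because that is the hash terrasync.py works with.

```python
import hashlib
import io

def computeHash(fileLike, blockSize=8192):
    """Return the SHA-1 hex digest of everything readable from 'fileLike'.

    Only a read(size) method returning bytes is required, so this works for
    regular files opened in binary mode as well as for objects such as
    http.client.HTTPResponse.
    """
    sha1 = hashlib.sha1()
    while True:
        chunk = fileLike.read(blockSize)
        if not chunk:           # empty bytes object: end of stream
            break
        sha1.update(chunk)
    return sha1.hexdigest()

def hashForFile(path):
    """Hash a file on disk; exceptions (e.g., missing file) propagate."""
    with open(path, "rb") as f:
        return computeHash(f)

# Nothing is written to disk here: the data comes from an in-memory stream
demoHash = computeHash(io.BytesIO(b"hello"))
```

Reading in blocks keeps memory usage bounded regardless of how large the
stream is.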
The goal of removeDirectoryTree() is to provide a safety net around
recursive directory removal with shutil.rmtree(), in order to prevent
user or bug-caused catastrophic events such as /, /home, /home/joeuser or
C:\ being recursively erased.
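One possible shape for such a safety net is sketched below. The signature,
the specific checks and the UnsafeRemovalError class are assumptions made
for illustration; the actual removeDirectoryTree() may differ.

```python
import os
import shutil
import tempfile

class UnsafeRemovalError(Exception):
    """Raised when a removal request looks dangerous (hypothetical class)."""

def removeDirectoryTree(base, whatToRemove):
    """Recursively remove base/whatToRemove, refusing suspicious targets."""
    baseReal = os.path.realpath(base)
    target = os.path.realpath(os.path.join(baseReal, whatToRemove))
    # Refuse to work from a filesystem root such as / or a drive root
    if os.path.dirname(baseReal) == baseReal:
        raise UnsafeRemovalError("refusing root directory as base: " + baseReal)
    try:
        common = os.path.commonpath([baseReal, target])
    except ValueError:          # e.g., paths on different drives on Windows
        common = None
    # The target must lie strictly inside the base directory
    if target == baseReal or common != baseReal:
        raise UnsafeRemovalError("refusing to remove " + target)
    shutil.rmtree(target)

with tempfile.TemporaryDirectory() as base:
    os.makedirs(os.path.join(base, "Airports", "K"))
    removeDirectoryTree(base, "Airports")
    demoRemoved = not os.path.exists(os.path.join(base, "Airports"))
    try:
        removeDirectoryTree(base, "..")     # would escape the base directory
        demoRefused = False
    except UnsafeRemovalError:
        demoRefused = True
```

Resolving both paths with os.path.realpath() before comparing them means
that ".." components and symbolic links cannot be used to escape the base
directory.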
- Add method assembleUrl() to HTTPGetter.
- Raise a NetworkError exception with the particular URL and number of
retries when the retries have been exhausted.
- The number of retries is now trivial to expose as a parameter; it is
set to 5 in HTTPGetter.
- Sleep for one second between self.httpConnection.close() and
self.httpConnection.connect() when retrying a failed HTTP request.
- Apply DRY principle.
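The retry behavior can be sketched roughly as follows. This is not the
HTTPGetter code: getWithRetries() and its parameters are hypothetical, the
request is an arbitrary callable, time.sleep() stands in for the pause
between close() and connect(), and "retries" is counted here as the total
number of attempts.

```python
import time
from http.client import HTTPException

class TerraSyncPyException(Exception):
    """Generic base class for the program's own exceptions."""

class NetworkError(TerraSyncPyException):
    def __init__(self, url, retries):
        super().__init__("giving up on {} after {} attempts".format(url, retries))
        self.url = url
        self.retries = retries

def getWithRetries(doRequest, url, maxRetries=5, delay=1.0):
    """Call doRequest(url), trying at most 'maxRetries' times."""
    for attempt in range(maxRetries):
        try:
            return doRequest(url)
        except HTTPException:
            # Other exceptions, KeyboardInterrupt included, are not caught
            if attempt + 1 < maxRetries:
                time.sleep(delay)       # pause before reconnecting
    raise NetworkError(url, maxRetries)

calls = []
def flakyRequest(url):
    """Simulated request that fails twice, then succeeds."""
    calls.append(url)
    if len(calls) < 3:
        raise HTTPException("simulated failure")
    return "payload"

demoPayload = getWithRetries(flakyRequest, "http://example.invalid/.dirindex", delay=0)

def alwaysFailing(url):
    raise HTTPException("simulated failure")

try:
    getWithRetries(alwaysFailing, "http://example.invalid/.dirindex", delay=0)
    demoError = None
except NetworkError as exc:
    demoError = exc
```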
- New generic exception class TerraSyncPyException.
- Add subclass NetworkError of TerraSyncPyException.
- Raise a NetworkError exception when the HTTP return code is not 200.
- hash_of_file() does not silently ignore errors anymore; exceptions
should be dealt with wherever appropriate by the callers.
Whenever hash_of_file() returns, its return value is now the SHA-1
hash of the specified file. This is less error-prone IMHO than
returning None. Otherwise, calling code could erroneously conclude
that there is a matching hash when the file to check is actually
missing. For a concrete example, see the 'dirIndexHash' parameter of
TerraSync.updateDirectory(), which so far is used precisely with the
value None to express that "we are just starting the recursion and
have no hash from the server to compare to".
When called, the callback passed to HTTPGetter.get() is now explicitly
passed the URL and the http.client.HTTPResponse instance.
Remove the HTTPGetCallback.result attribute (not needed anymore, leaves
more freedom when implementing HTTPGetCallback subclasses...).
If the response to the HTTP request isn't 200 (success), then don't save
the response, and don't call the callback.
Additionally, only retry in the case of HTTPException. This allows
Ctrl-C to work correctly (and easily).
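The callback protocol described above might look like this in outline.
dispatch() and FakeResponse are hypothetical names used only for this
sketch; FakeResponse imitates the 'status' attribute and read() method that
http.client.HTTPResponse provides, so that no network access is needed.

```python
class NetworkError(Exception):
    """Stand-in for the program's NetworkError class."""

class FakeResponse:
    """Minimal imitation of http.client.HTTPResponse for the demonstration."""
    def __init__(self, status, body=b""):
        self.status = status
        self._body = body

    def read(self):
        return self._body

def dispatch(url, response, callback):
    """Hand 'url' and 'response' to 'callback', but only on HTTP status 200."""
    if response.status != 200:
        # The callback is never invoked for unsuccessful responses
        raise NetworkError("HTTP status {} for {}".format(response.status, url))
    return callback(url, response)

seen = []
def recordingCallback(url, response):
    """Example callback: remember the URL and return the body."""
    seen.append(url)
    return response.read()

demoBody = dispatch("http://example.invalid/a", FakeResponse(200, b"data"),
                    recordingCallback)
try:
    dispatch("http://example.invalid/b", FakeResponse(404), recordingCallback)
    demoRaised = False
except NetworkError:
    demoRaised = True
```

Because the callback receives everything it needs as arguments and returns
its result directly, no shared 'result' attribute is required.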
- add option --quick
check the SHA-1 hash of .dirindex files and skip a directory if the hash matches
- add option --remove-orphan
remove orphan files (files that exist locally but not on the server)
- be less verbose
- write .dirindex files locally
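The idea behind --quick can be sketched like this. canSkipDirectory() is a
hypothetical helper, and treating a missing local .dirindex file as "cannot
skip" is an assumption of the sketch rather than documented behavior.

```python
import hashlib
import os
import tempfile

def canSkipDirectory(localDir, remoteDirIndexHash):
    """Return True if the local .dirindex file has the expected SHA-1 hash."""
    path = os.path.join(localDir, ".dirindex")
    try:
        with open(path, "rb") as f:
            localHash = hashlib.sha1(f.read()).hexdigest()
    except FileNotFoundError:
        return False            # nothing to compare with: must update
    return localHash == remoteDirIndexHash

with tempfile.TemporaryDirectory() as localDir:
    content = b"example index content\n"    # placeholder, not a real index
    demoMissing = canSkipDirectory(localDir, hashlib.sha1(content).hexdigest())
    with open(os.path.join(localDir, ".dirindex"), "wb") as f:
        f.write(content)
    demoSkip = canSkipDirectory(localDir, hashlib.sha1(content).hexdigest())
    demoChanged = canSkipDirectory(localDir, hashlib.sha1(b"other").hexdigest())
```

This only works because the .dirindex files are now written locally, so
that there is something to compare the server-provided hash against.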