www.XuvTools.org
Accessing the Server Side Data Set Repository

File Formats and Folder Structure

The folder structure in the data set repository is depicted here:

    /data/xuvtools_img/BugListData   - data sets belonging to a trac bug report
    /data/xuvtools_img/FileFormats   - data sets and documents for file readers
    /data/xuvtools_img/ReferenceData - data sets that are publicly available for testing
    /data/xuvtools_img/OriginalData  - all other data sets (originals) that are available

The files are stored on the server in a specific format, to reduce the
likelihood of corruption and to consume as little space as possible. The
format is rar, created with the proprietary rar tools from
http://www.rarlab.com/download.htm. The proprietary rar format was chosen
because a) it supports recovery information, b) it compresses well and
c) it is available for all platforms we support.

To install rar on Windows, simply download and install the latest trial
("nagware") version. To install rar on Debian/Ubuntu (Ubuntu needs the
multiverse repository), do:

    sudo aptitude install rar unrar

Useful rar flags explained

    a     compress file(s)
    -m5   use compression strength "5"
    -rr1  add 1 percent recovery information
    x     extract file(s)

rar examples

To compress a data set with recovery information, do:

    rar a -m5 -rr1 "<filename>.rar" "<filename>"

To uncompress a data set with recovery information to the current
directory, do:

    unrar x "<filename>.rar"

If you need to compress a directory of files, you can use a loop like the
following:

    find "${PWD}" -type f|grep -v ".rar\$\|.md5\$"|while read FILE ; do
      J=$(dirname "$FILE") ; K=$(basename "$FILE")
      cd "$J" && if ! test -f "$K.rar" ; then
        rar a -m5 -rr1 "$K.rar" "$K" || break
      fi
    done

If you need to uncompress the full repository of files, you can use a loop
like the following:

    for DIR in BugListData FileFormats ReferenceData OriginalData ; do
      find "/data/xuvtools_img/$DIR" -type f -name \*\.rar|while read FILE ; do
        J=$(dirname "$FILE") ; K=$(basename "$FILE" ".rar")
        if ! test -f "$J/$K" ; then cd "$J" && unrar x -o- "$K.rar" ; fi
      done
    done

md5sum examples

We also provide md5sums of the original file (not the compressed rar
file), along with the compressed data set, in order to test the file after
extraction. To create a new md5 checksum file, pipe the result of the
md5sum tool into a file:

    md5sum --binary "<filename>" > "<filename>.md5"

To check an extracted file against its md5 checksum file, use the md5sum
tool in check mode:

    md5sum --check "<filename>.md5"
    <filename>: OK

If you need to add md5sums to a directory of files, you can use a loop
like the following:

    find "${PWD}" -type f|grep -v ".rar\$\|.md5\$"|while read FILE ; do
      J=$(dirname "$FILE") ; K=$(basename "$FILE")
      cd "$J" && if ! test -f "$K.md5" ; then
        md5sum --binary "$K" > "$K.md5" || break
      fi
    done
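To verify a whole directory tree after extraction, the check can be looped in the same way as the creation loop above. This is a sketch, assuming GNU md5sum and .md5 files created as described; the subshell keeps the working directory unchanged between iterations:

```shell
# Verify every file that has a .md5 checksum file next to it.
# md5sum --check is run from the file's own directory, so the
# file name recorded inside the .md5 file resolves correctly.
find "${PWD}" -type f -name '*.md5' | while read FILE ; do
  J=$(dirname "$FILE") ; K=$(basename "$FILE")
  ( cd "$J" && md5sum --check "$K" ) || break
done
```

Each verified file prints a `<filename>: OK` line; the loop stops at the first failure.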
rar with md5sum combined examples

Here is a combined example that will create compressed rar archives and
md5sums for a directory of files. This performs better than running the
two individual loops one after the other, because the files are likely
still in the filesystem cache:

    find "${PWD}" -type f|while read FILE ; do
      J=$(dirname "$FILE") ; K=$(basename "$FILE")
      cd "$J" && rar a -m5 -rr1 "$K.rar" "$K" && \
        md5sum --binary "$K" > "$K.md5" || break
    done

Useful rsync flags explained

    --dry-run            don't change any files, just print what would be done
    --verbose            print what is being done
    --progress           show a progress meter for each transfer
    --archive            preserve permissions and times
    --recursive          go through subdirectories
    --links              transfer links
    --devices            transfer device nodes
    --specials           preserve special files
    --rsh='ssh -p 22022' use the ssh port 22022
    --compress           compress before sending/receiving
    --include "P"        include all files matching pattern P
    --exclude "P"        exclude all files matching pattern P
    --prune-empty-dirs   do not create empty directories

Synchronize Data Set Repository: From Server to Local

Here is a call to rsync that would download (incrementally) the parts of
the repository BugListData that are missing locally:

    mkdir -p /data/xuvtools_img
    rsync --archive --verbose --progress --rsh='ssh -p 22022' \
      "xuvtools.org:/data/xuvtools_img/BugListData" \
      "/data/xuvtools_img/"

If you want to mirror/update all repositories, you can synchronize all
above listed folders BugListData, FileFormats, ReferenceData and
OriginalData individually:

    mkdir -p /data/xuvtools_img
    for DIR in BugListData FileFormats ReferenceData OriginalData ; do
      rsync --archive --verbose --progress --rsh='ssh -p 22022' \
        "xuvtools.org:/data/xuvtools_img/${DIR}" \
        "/data/xuvtools_img/" || \
        break
    done

Of course, you can just use WinSCP, scp or unison as well.
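The filter flags listed above can be tried out safely on local directories before touching the server. A minimal sketch, using temporary directories and hypothetical file names, showing --dry-run and the --include/--exclude pattern chain:

```shell
# Create a small source tree with mixed file types.
SRC=$(mktemp -d) ; DST=$(mktemp -d)
mkdir -p "$SRC/sub"
touch "$SRC/sub/set1.rar" "$SRC/sub/set1.md5" "$SRC/sub/set1.tif"

# --dry-run first: rsync only reports what it would transfer.
rsync --archive --verbose --dry-run "$SRC/" "$DST/"

# Real run, restricted to *.rar and *.md5 files; directories are
# traversed ("*/"), but empty ones are not created on the destination.
rsync --recursive --prune-empty-dirs --include "*/" \
      --include "*.rar" --include "*.md5" --exclude "*" \
      "$SRC/" "$DST/"
find "$DST" -type f
```

After the real run, only set1.rar and set1.md5 exist on the destination; set1.tif was caught by the final catch-all exclude. rsync evaluates the patterns in order, so the includes must come before the exclude.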
Synchronize Data Set Repository: From Local to Server

Please upload only rar-compressed files with recovery information to the
server:

    rsync --recursive --links --devices --specials --verbose \
      --progress --rsh='ssh -p 22022' --prune-empty-dirs --include "*/" \
      --include "*.rar" --include "*.md5" --exclude "*" \
      "/data/xuvtools_img/FileFormats" \
      "xuvtools.org:/data/xuvtools_img/"

To upload all repositories in one go:

    for DIR in BugListData FileFormats ReferenceData OriginalData ; do
      rsync --recursive --links --devices --specials --verbose \
        --progress --rsh='ssh -p 22022' --prune-empty-dirs --include "*/" \
        --include "*.rar" --include "*.md5" --exclude "*" \
        "/data/xuvtools_img/${DIR}" \
        "xuvtools.org:/data/xuvtools_img/" || \
        break
    done

Of course, you can just use WinSCP, scp or unison as well.

Fixing Permissions

If you work on Unix, you might want to synchronize with the correct user
names and permissions, so everyone accessing the server has correct user
access rights. For that, it might be helpful to add a user account
www-xuvtools on your local machine, and become part of its group:

    sudo addgroup --gid 1020 www-xuvtools
    sudo adduser --uid 1020 --gid 1020 www-xuvtools
    sudo adduser ${USER} www-xuvtools
    # you may need to log out, and back in, for the group addition to
    # become effective

    for DIR in BugListData FileFormats ReferenceData OriginalData ; do
      sudo mkdir -p "/data/xuvtools_img/${DIR}" && \
      sudo find "/data/xuvtools_img/${DIR}" -type d -exec chmod 770 {} \; && \
      sudo find "/data/xuvtools_img/${DIR}" -type f -exec chmod 660 {} \; && \
      sudo chown -R www-xuvtools:www-xuvtools "/data/xuvtools_img/${DIR}" || \
      break
    done

Finding duplicate Datasets (based on MD5)

We sometimes find duplicate data sets in the upload folder.
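Since every data set ships with an .md5 file, identical files can be spotted by their checksums alone. As a quick first check, this sketch (assuming .md5 files created with md5sum --binary as shown earlier, run from the repository root) lists every checksum that occurs more than once:

```shell
# Collect the MD5 sums from all .md5 files and print only the sums
# that occur more than once (uniq -d prints duplicated lines once).
find "${PWD}" -type f -name '*.md5' -exec cut -d' ' -f1 {} \; \
  | sort | uniq -d
```

An empty output means no duplicates; otherwise each printed sum can be grepped in the .md5 files to locate the offending data sets.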
To remove duplicate entries, you can use the following script lines:

    find BugListData FileFormats OriginalData ReferenceData -name \*md5 \
      |while read I ; do \
        J=$(cat "$I"|cut -d' ' -f1)
        K=$(echo "$I"|perl -pe 's/.md5$//g')
        echo "$J:$K"
      done | sort > /tmp/server-side-data-sorted.txt

    cat /tmp/server-side-data-sorted.txt \
      |while read I ; do
        MD5=$(echo "$I"|cut -d':' -f1)
        NAME=$(echo "$I"|cut -d':' -f2-)
        COUNT=$(grep "$MD5" /tmp/server-side-data-sorted.txt|wc -l)
        if test $COUNT -gt 1 ; then
          grep "$MD5" /tmp/server-side-data-sorted.txt
        fi
      done

Other Data Repositories