My name is Philipp C. Heckel and I write about nerdy things.

Minimizing remote storage usage and synchronization time using deduplication and multichunking: Syncany as an example



Contents


1. Introduction
2. Related Work
3. Deduplication
4. Syncany
5. Implications of the Architecture
6. Experiments
7. Future Research
8. Conclusion
A. List of Configurations
B. Pre-Study Folder Statistics
C. List of Variables Recorded
D. Best Algorithms by Deduplication Ratio
E. Best Algorithms by Duration
F. Best Algorithms by CPU Usage
Bibliography

Download as PDF: This article is a web version of my Master’s thesis. Feel free to download the original PDF version.


C. List of Variables Recorded

The following table contains a detailed explanation of each variable recorded in experiment 1 (cf. chapter 6.4.1). For each dataset version and each configuration, one set of these variables was saved.

| Measured Variable | Description |
|---|---|
| totalDurationSec | Duration of the chunking and indexing process for the run |
| totalChunkingDurationSec | Duration of the chunking process for the run (excludes index lookups and other management tasks) |
| totalDatasetFileCount | Number of analyzed files in the run, i.e. the number of files in this version of the dataset |
| totalChunkCount | Number of chunks created from the analyzed files |
| totalDatasetSize | Size in bytes of all analyzed files in the run |
| totalNewChunkCount | Number of chunks that were not found in the index (new chunks, negative chunk index lookup) |
| totalNewChunkSize | Size in bytes of all new chunks |
| totalNewFileCount | Number of new files (negative file index lookup) |
| totalNewFileSize | Size in bytes of all new files |
| totalDuplicateChunkCount | Number of chunks that were found in the index (positive chunk index lookup) |
| totalDuplicateChunkSize | Size in bytes of all duplicate chunks |
| totalDuplicateFileCount | Number of duplicate files found during this run (positive file index lookup) |
| totalDuplicateFileSize | Size in bytes of all duplicate files found during this run |
| totalMultiChunkCount | Number of multichunks created from the new chunks |
| totalMultiChunkSize | Size in bytes of the created multichunks |
| totalIndexSize | Size in bytes of the incremental index file for this run |
| totalCpuUsage | CPU usage during this run in percent |
| tempDedupRatio | Temporal deduplication ratio, excluding the size of the index; calculated by dividing the cumulated input bytes by the cumulated size of the generated multichunks in bytes (both from t0 to tn) |
| tempSpaceReducationRatio | Temporal space reduction in percent, excluding the size of the index; calculated as one minus the inverse of the temporal deduplication ratio |
| tempDedupRatioInclIndex | Temporal deduplication ratio, including the size of the index |
| tempSpaceRedRatioInclIndex | Temporal space reduction in percent, including the size of the index |
| recnstChunksNeedBytes | Size in bytes of chunks needed to reconstruct the current dataset version if no previous version has been downloaded before |
| recnstMultChnksNeedBytes | Size in bytes of multichunks needed to reconstruct the current dataset version if no previous version has been downloaded before |
| recnstMultOverhDiffBytes | Difference in bytes between the required size of multichunks and the size of chunks |
| recnst5NeedMultChnkBytes | Size in bytes of multichunks needed to reconstruct the current dataset version if the last five dataset versions are missing |
| recnst5NeedChunksBytes | Size in bytes of chunks needed to reconstruct the current dataset version if the last five dataset versions are missing |
| recnst5OverheadBytes | Difference in bytes between the required size of multichunks and the size of chunks (if five versions are missing) |
| recnst10NeedMultChnkBytes | Size in bytes of multichunks needed to reconstruct the current dataset version if the last ten dataset versions are missing |
| recnst10NeedChunksBytes | Size in bytes of chunks needed to reconstruct the current dataset version if the last ten dataset versions are missing |
| recnst10OverhBytes | Difference in bytes between the required size of multichunks and the size of chunks (if ten versions are missing) |

Table C.1: Variables recorded for each dataset version during experiment 1.
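
To make the derived variables in Table C.1 more concrete, the following Python sketch shows one way the temporal deduplication ratio, the temporal space reduction and the reconstruction overhead could be computed from the raw per-version values. It is an illustration only, not the code used in the experiments: the `RunRecord` container and the function names are hypothetical, and the assumption that the "including index" variants simply add the cumulated index size to the stored bytes is mine, since the table does not spell this out.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class RunRecord:
    """Hypothetical container for a subset of the variables in Table C.1."""
    totalDatasetSize: int     # bytes of all analyzed files in this version
    totalMultiChunkSize: int  # bytes of the multichunks created in this version
    totalIndexSize: int       # bytes of the incremental index for this version


def temporal_ratios(runs: Iterable[RunRecord],
                    include_index: bool = False) -> Tuple[float, float]:
    """Return (deduplication ratio, space reduction in percent) over t0..tn.

    The ratio is cumulated input bytes divided by cumulated multichunk bytes.
    With include_index=True the cumulated index size is added to the stored
    bytes (an assumption about how the "InclIndex" variables are defined).
    """
    runs = list(runs)
    input_bytes = sum(r.totalDatasetSize for r in runs)
    stored_bytes = sum(r.totalMultiChunkSize for r in runs)
    if include_index:
        stored_bytes += sum(r.totalIndexSize for r in runs)
    if input_bytes <= 0 or stored_bytes <= 0:
        raise ValueError("need non-zero input and stored bytes")

    dedup_ratio = input_bytes / stored_bytes
    # One minus the inverse of the deduplication ratio, expressed in percent.
    space_reduction_percent = 100.0 * (1.0 - stored_bytes / input_bytes)
    return dedup_ratio, space_reduction_percent


def reconstruction_overhead(multichunk_bytes: int, chunk_bytes: int) -> int:
    """Difference between the multichunk bytes and the chunk bytes needed."""
    return multichunk_bytes - chunk_bytes


# Example with made-up numbers for two dataset versions.
example_runs = [RunRecord(1000, 500, 10), RunRecord(1000, 100, 10)]
ratio, reduction = temporal_ratios(example_runs)
ratio_idx, reduction_idx = temporal_ratios(example_runs, include_index=True)
overhead = reconstruction_overhead(multichunk_bytes=4096, chunk_bytes=3000)
```

With these made-up numbers, 2000 input bytes are stored in 600 bytes of multichunks, which gives a ratio of roughly 3.33 and a space reduction of about 70 percent. Including the index lowers both values slightly. The overhead function corresponds to variables such as recnstMultOverhDiffBytes: multichunks may contain chunks that are not needed for the requested version, so the multichunk bytes are at least as large as the chunk bytes.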

