Minimizing remote storage usage and synchronization time using deduplication and multichunking: Syncany as an example

May 20 / 2013
Cloud Computing, Deduplication, Distributed Systems, Security, Syncany, Synchronization, Version Control

Contents

1. Introduction
2. Related Work
3. Deduplication
4. Syncany
5. Implications of the Architecture
6. Experiments
7. Future Research
8. Conclusion

A. List of Configurations
B. Pre-Study Folder Statistics
C. List of Variables Recorded
D. Best Algorithms by Deduplication Ratio
E. Best Algorithms by Duration
F. Best Algorithms by CPU Usage
Bibliography

Download as PDF: This article is a web version of my Master’s thesis. Feel free to download the original PDF version.

C. List of Variables Recorded

The following table contains a detailed explanation of each variable recorded in experiment 1 (cf. chapter 6.4.1). For each dataset version and each configuration, one set of these variables was saved.

Measured Variable	Description
totalDurationSec	Duration in seconds of the chunking and indexing process for the run
totalChunkingDurationSec	Duration in seconds of the chunking process for the run (excludes index lookups and other management tasks)
totalDatasetFileCount	Number of analyzed files in the run, i.e. the number of files in this version of the dataset
totalChunkCount	Number of chunks created from the analyzed files
totalDatasetSize	Size in bytes of all analyzed files in the run
totalNewChunkCount	Number of chunks not found in the index (new chunks; negative chunk index lookup)
totalNewChunkSize	Size in bytes of all new chunks
totalNewFileCount	Number of new files (negative file index lookup)
totalNewFileSize	Size in bytes of all new files
totalDuplicateChunkCount	Number of chunks found in the index (positive chunk index lookup)
totalDuplicateChunkSize	Size in bytes of all duplicate chunks
totalDuplicateFileCount	Number of duplicate files found during this run (positive file index lookup)
totalDuplicateFileSize	Size in bytes of all duplicate files found during this run
totalMultiChunkCount	Number of multichunks created from the new chunks
totalMultiChunkSize	Size in bytes of the created multichunks
totalIndexSize	Size in bytes of the incremental index file for this run
totalCpuUsage	CPU usage during this run in percent
tempDedupRatio	Temporal deduplication ratio, excluding the size of the index; calculated by dividing the cumulative input bytes by the cumulative size in bytes of the generated multichunks (both from t₀ to tₙ)
tempSpaceReducationRatio	Temporal space reduction in percent, excluding the size of the index; calculated as one minus the inverse of the temporal deduplication ratio
tempDedupRatioInclIndex	Temporal deduplication ratio, including the size of the index
tempSpaceRedRatioInclIndex	Temporal space reduction in percent, including the size of the index
recnstChunksNeedBytes	Size in bytes of the chunks needed to reconstruct the current dataset version if no previous version has been downloaded
recnstMultChnksNeedBytes	Size in bytes of the multichunks needed to reconstruct the current dataset version if no previous version has been downloaded
recnstMultOverhDiffBytes	Multichunk overhead in bytes: difference between the required size of the multichunks and the size of the chunks
recnst5NeedMultChnkBytes	Size in bytes of the multichunks needed to reconstruct the current dataset version if the last five dataset versions are missing
recnst5NeedChunksBytes	Size in bytes of the chunks needed to reconstruct the current dataset version if the last five dataset versions are missing
recnst5OverheadBytes	Multichunk overhead in bytes: difference between the required size of the multichunks and the size of the chunks (if five versions are missing)
recnst10NeedMultChnkBytes	Size in bytes of the multichunks needed to reconstruct the current dataset version if the last ten dataset versions are missing
recnst10NeedChunksBytes	Size in bytes of the chunks needed to reconstruct the current dataset version if the last ten dataset versions are missing
recnst10OverhBytes	Multichunk overhead in bytes: difference between the required size of the multichunks and the size of the chunks (if ten versions are missing)

Table C.1: Variables recorded for each dataset version during experiment 1.
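The temporal ratios in Table C.1 are cumulative quantities derived from the per-run variables. The following Python snippet is a minimal sketch of how they could be computed. It assumes that the "input bytes" correspond to totalDatasetSize, that "including the index" means adding totalIndexSize to the denominator, and that space reduction is expressed in percent. The RunStats class and all function names are hypothetical and are not taken from the Syncany code base.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class RunStats:
    """Hypothetical subset of the variables recorded for one dataset version (run)."""
    total_dataset_size: int      # totalDatasetSize: input bytes of this run
    total_multi_chunk_size: int  # totalMultiChunkSize: bytes of multichunks created
    total_index_size: int        # totalIndexSize: bytes of the incremental index file


def temp_dedup_ratio(runs: Sequence[RunStats], include_index: bool = False) -> float:
    """Cumulative input bytes divided by cumulative output bytes over t_0..t_n.

    Assumption: 'including the index' adds the index sizes to the denominator.
    """
    input_bytes = sum(r.total_dataset_size for r in runs)
    output_bytes = sum(r.total_multi_chunk_size for r in runs)
    if include_index:
        output_bytes += sum(r.total_index_size for r in runs)
    if output_bytes == 0:
        raise ValueError("no output bytes recorded; ratio is undefined")
    return input_bytes / output_bytes


def temp_space_reduction_pct(runs: Sequence[RunStats], include_index: bool = False) -> float:
    """Space reduction in percent: (1 - 1 / deduplication ratio) * 100."""
    return (1 - 1 / temp_dedup_ratio(runs, include_index)) * 100


def reconstruction_overhead(multichunk_bytes: int, chunk_bytes: int) -> int:
    """Extra bytes caused by downloading whole multichunks instead of single chunks."""
    return multichunk_bytes - chunk_bytes


# Example: two dataset versions (t_0 and t_1) with made-up sizes
runs = [RunStats(1000, 400, 50), RunStats(1000, 100, 50)]
print(temp_dedup_ratio(runs))          # 4.0
print(temp_space_reduction_pct(runs))  # 75.0
```

Because the ratio is computed from cumulated sums rather than averaged per run, a dataset version that adds many bytes influences the result more than one that adds few. Under the same assumptions, the overhead variables (e.g. recnstMultOverhDiffBytes) would correspond to reconstruction_overhead applied to the respective multichunk and chunk byte counts.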

