My name is Philipp C. Heckel and I write about nerdy things.
This site moved here from blog.philippheckel.com/blog.heckel.xyz!

Minimizing remote storage usage and synchronization time using deduplication and multichunking: Syncany as an example


Cloud Computing, Distributed Systems, Security, Synchronization



Contents


1. Introduction
2. Related Work
3. Deduplication
4. Syncany
5. Implications of the Architecture
6. Experiments
7. Future Research
8. Conclusion
A. List of Configurations
B. Pre-Study Folder Statistics
C. List of Variables Recorded
D. Best Algorithms by Deduplication Ratio
E. Best Algorithms by Duration
F. Best Algorithms by CPU Usage
Bibliography

Download as PDF: This article is a web version of my Master’s thesis. Feel free to download the original PDF version.


B. Pre-Study Folder Statistics

To get a better idea of what kind of data users would store in their Syncany folders, the pre-study asked users and developers to run a small Java program that collects data about file types and sizes. In particular, they were asked to run the following commands on a folder containing the type of files they would store in Syncany. Since many of them were likely to use Dropbox, they were given the option to choose their Dropbox folder.
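The source of the pre-study program is not reproduced in this appendix. As a rough illustration only, the following Java sketch shows how such a tool might aggregate file-type statistics for a chosen folder. It assumes that "file type" means the lowercase file extension, and the class `TypeStats` and its comma-separated column layout are hypothetical, not taken from the original tool.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Stream;

// Hypothetical sketch: aggregates file count and total size per file type,
// where "type" is approximated by the lowercase file extension.
class TypeStats {

    /** Per-type counters: number of files and total bytes. */
    static final class Entry {
        long count;
        long totalBytes;
    }

    /** Returns the lowercase extension of a file name, or "(none)" if it has none. */
    static String extensionOf(Path file) {
        String name = file.getFileName().toString();
        int dot = name.lastIndexOf('.');
        // Dotfiles like ".bashrc" and names ending in "." count as having no extension
        if (dot <= 0 || dot == name.length() - 1) {
            return "(none)";
        }
        return name.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    /** Recursively walks root and groups all regular files by extension. */
    static Map<String, Entry> collect(Path root) throws IOException {
        Map<String, Entry> stats = new TreeMap<>();
        try (Stream<Path> paths = Files.walk(root)) {
            paths.filter(Files::isRegularFile).forEach(file -> {
                try {
                    long size = Files.size(file);
                    Entry e = stats.computeIfAbsent(extensionOf(file), k -> new Entry());
                    e.count++;
                    e.totalBytes += size;
                } catch (IOException ex) {
                    // Lambdas cannot throw checked exceptions, so wrap them
                    throw new UncheckedIOException(ex);
                }
            });
        }
        return stats;
    }

    /** Formats the statistics as CSV rows: type,count,totalBytes (hypothetical layout). */
    static String toCsv(Map<String, Entry> stats) {
        StringBuilder sb = new StringBuilder("type,count,totalBytes\n");
        stats.forEach((type, e) -> sb.append(type).append(',')
                .append(e.count).append(',').append(e.totalBytes).append('\n'));
        return sb.toString();
    }
}
```

In this sketch, `root` would be the folder the participant chose, for instance their Dropbox folder. Using a `TreeMap` keeps the CSV rows sorted by type, which makes files from different participants easier to compare.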

After running the commands, they were asked to upload the resulting CSV files via an interface on the Syncany website. The program generated two CSV files: one containing data about the file types, and one representing a histogram of the file sizes found:

Excerpt of syncany-type-categories.csv:
Excerpt of syncany-size-categories.csv:
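A file-size histogram like the one in the second file might be built as follows. The bucket boundaries used by the original program are not stated here, so this Java sketch assumes power-of-two buckets: each row counts the files whose size is at most `maxBytes` and greater than the previous bound. The class `SizeHistogram` and its column names are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: builds a file-size histogram with power-of-two buckets.
class SizeHistogram {

    /** Returns the bucket's upper bound: the smallest power of two >= size (1 for sizes 0 and 1). */
    static long bucketOf(long size) {
        if (size <= 1) {
            return 1;
        }
        // Valid for realistic file sizes (below 2^62 bytes)
        return Long.highestOneBit(size - 1) << 1;
    }

    /** Counts how many of the given file sizes fall into each bucket. */
    static Map<Long, Long> build(long[] sizes) {
        Map<Long, Long> histogram = new TreeMap<>();
        for (long size : sizes) {
            histogram.merge(bucketOf(size), 1L, Long::sum);
        }
        return histogram;
    }

    /** Formats the histogram as CSV rows: maxBytes,count (hypothetical layout). */
    static String toCsv(Map<Long, Long> histogram) {
        StringBuilder sb = new StringBuilder("maxBytes,count\n");
        histogram.forEach((bound, count) -> sb.append(bound).append(',').append(count).append('\n'));
        return sb.toString();
    }
}
```

Power-of-two buckets keep the number of rows small (about 40 for files up to a terabyte) while still separating small configuration files from large media files.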





