My name is Philipp C. Heckel and I write about nerdy things.

Minimizing remote storage usage and synchronization time using deduplication and multichunking: Syncany as an example




Contents


1. Introduction
2. Related Work
3. Deduplication
4. Syncany
5. Implications of the Architecture
6. Experiments
7. Future Research
8. Conclusion
A. List of Configurations
B. Pre-Study Folder Statistics
C. List of Variables Recorded
D. Best Algorithms by Deduplication Ratio
E. Best Algorithms by Duration
F. Best Algorithms by CPU Usage
Bibliography

Download as PDF: This article is a web version of my Master’s thesis. Feel free to download the original PDF version.


7. Future Research

Taken together, the experiments show that the overall goal of the thesis has been more than fulfilled: the selected algorithm performs well in terms of chunking efficiency and also improves the synchronization time of Syncany clients. However, given the complexity of the system and its proximity to other research areas, there are many possibilities for future research.

Before Syncany is ready for end-user machines, its CPU usage must be brought down to an acceptable level. While this thesis tried to reduce processor usage, it did not introduce a hard limit on maximum CPU usage. As mentioned in previous chapters, one way to solve this problem is to use a dynamic write pause parameter instead of a static one. Other options may exist and warrant further research.
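To make the idea of a dynamic write pause more concrete, here is a minimal sketch. It assumes that the write pause is a sleep inserted between write operations and that some measurement of the current CPU load (as a fraction between 0 and 1) is available; how that load is measured is left open. The class `DynamicWritePause`, its parameters, and the step sizes are hypothetical and not part of Syncany. Note that such a feedback loop only steers the load towards a target; it does not guarantee a hard limit.

```java
// Hypothetical sketch: class and parameter names are illustrative, not Syncany's.
final class DynamicWritePause {
    // Assumed step by which the pause shrinks while the load is under budget.
    private static final long DECREASE_STEP_MILLIS = 5;

    private final double targetCpuLoad; // assumed CPU budget as a fraction, 0 < target <= 1
    private final long minPauseMillis;
    private final long maxPauseMillis;
    private long pauseMillis;

    DynamicWritePause(double targetCpuLoad, long minPauseMillis, long maxPauseMillis) {
        if (targetCpuLoad <= 0.0 || targetCpuLoad > 1.0) {
            throw new IllegalArgumentException("targetCpuLoad must be in (0, 1]");
        }
        if (minPauseMillis < 0 || maxPauseMillis < minPauseMillis) {
            throw new IllegalArgumentException("need 0 <= minPauseMillis <= maxPauseMillis");
        }
        this.targetCpuLoad = targetCpuLoad;
        this.minPauseMillis = minPauseMillis;
        this.maxPauseMillis = maxPauseMillis;
        this.pauseMillis = minPauseMillis;
    }

    /** Adjusts the pause after a load measurement and returns the new pause. */
    long update(double measuredCpuLoad) {
        if (measuredCpuLoad > targetCpuLoad) {
            // Over budget: double the pause (at least 1 ms) so the load drops quickly.
            pauseMillis = Math.min(maxPauseMillis, Math.max(1L, pauseMillis * 2));
        } else {
            // Under budget: shorten the pause gradually to regain throughput.
            pauseMillis = Math.max(minPauseMillis, pauseMillis - DECREASE_STEP_MILLIS);
        }
        return pauseMillis;
    }

    long current() {
        return pauseMillis;
    }

    public static void main(String[] args) throws InterruptedException {
        DynamicWritePause pause = new DynamicWritePause(0.5, 0, 100);
        double[] simulatedLoads = {0.9, 0.9, 0.8, 0.4, 0.3}; // made-up measurements

        for (double load : simulatedLoads) {
            long millis = pause.update(load);
            System.out.printf("load=%.1f -> pause=%d ms%n", load, millis);
            Thread.sleep(millis); // a real client would perform its next write after this pause
        }
    }
}
```

The multiplicative increase and additive decrease are just one possible policy; whether it keeps CPU usage acceptable in practice would have to be shown experimentally.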

In terms of resource usage, this thesis only analyzed processor performance and not disk I/O or memory usage. RAM usage in particular is crucial when an in-memory index is used. Future research might hence focus on optimizing memory usage, for example with the help of techniques such as extreme binning [14].
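The following sketch illustrates the basic idea behind extreme binning as a way to shrink the in-memory index: instead of keeping every chunk ID in RAM, only one representative chunk ID per file (here the smallest one) is kept in a primary index, and the full chunk lists live in bins that would reside on disk. This is a simplified illustration under several assumptions: chunk IDs are plain strings, the bins are held in memory as a stand-in for disk storage, and the class `BinnedChunkIndex` is hypothetical rather than Syncany's index.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of an extreme-binning-style index; not Syncany's implementation.
final class BinnedChunkIndex {
    // Kept in RAM: one entry per bin (representative chunk ID -> bin number).
    private final Map<String, Integer> primaryIndex = new HashMap<>();

    // Stand-in for bins stored on disk; each bin holds the chunk IDs of similar files.
    private final List<Set<String>> bins = new ArrayList<>();

    /** Registers a file's chunk IDs and returns the chunks that still need to be stored. */
    List<String> addFile(List<String> chunkIds) {
        List<String> newChunks = new ArrayList<>();
        if (chunkIds.isEmpty()) {
            return newChunks;
        }

        // The smallest chunk ID acts as the file's representative.
        String representative = Collections.min(chunkIds);

        Integer binId = primaryIndex.get(representative);
        if (binId == null) {
            binId = bins.size();
            bins.add(new HashSet<>());
            primaryIndex.put(representative, binId);
        }

        // Deduplicate only against the one bin selected by the representative.
        Set<String> bin = bins.get(binId);
        for (String chunkId : chunkIds) {
            if (bin.add(chunkId)) {
                newChunks.add(chunkId);
            }
        }
        return newChunks;
    }

    int primaryIndexSize() {
        return primaryIndex.size();
    }

    public static void main(String[] args) {
        BinnedChunkIndex index = new BinnedChunkIndex();
        System.out.println(index.addFile(List.of("c3", "a1", "b2")));
        System.out.println(index.addFile(List.of("a1", "b2", "d4")));
        System.out.println("entries in RAM: " + index.primaryIndexSize());
    }
}
```

The trade-off is that duplicates stored in a different bin go undetected, so memory is saved at the cost of some deduplication efficiency. Whether that trade-off pays off for Syncany's workloads is exactly the kind of question future research would need to answer.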

Given the number of possible input parameters identified by the thesis, the experiments were not able to test all possible combinations. A complete analysis, however, would have to include the remaining parameter values. In fact, there might be other parameters that this thesis did not take into account. Research could focus on further improving the chunking efficiency with a more complete set of parameters. Instead of exhaustively testing every algorithm configuration, a future analysis could use evolutionary methods to find the best combination.
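As a rough illustration of what such an evolutionary search could look like, the sketch below encodes a configuration as an array of option indices (one entry per parameter) and evolves a population using selection, uniform crossover, and mutation. Everything here is an assumption made for illustration: the class `EvolutionaryConfigSearch`, the encoding, and above all the fitness function, which in a real study would run an experiment and score the result (for example by deduplication ratio, duration, or CPU usage).

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.Random;
import java.util.function.ToDoubleFunction;

// Hypothetical sketch of a simple genetic algorithm over parameter combinations.
final class EvolutionaryConfigSearch {

    /**
     * optionCounts[d] is the number of possible values of parameter d.
     * A configuration is an int[] holding one option index per parameter.
     * Higher fitness is better.
     */
    static int[] search(int[] optionCounts, ToDoubleFunction<int[]> fitness,
                        int populationSize, int generations,
                        double mutationRate, long seed) {
        if (populationSize < 2) {
            throw new IllegalArgumentException("populationSize must be at least 2");
        }
        for (int count : optionCounts) {
            if (count < 1) {
                throw new IllegalArgumentException("every parameter needs at least one option");
            }
        }

        Random random = new Random(seed);
        // Best first. A real run should cache fitness values, since each
        // evaluation would correspond to a full experiment.
        Comparator<int[]> bestFirst =
                Comparator.comparingDouble((int[] c) -> -fitness.applyAsDouble(c));

        // Start with random configurations.
        int[][] population = new int[populationSize][];
        for (int i = 0; i < populationSize; i++) {
            population[i] = new int[optionCounts.length];
            for (int d = 0; d < optionCounts.length; d++) {
                population[i][d] = random.nextInt(optionCounts[d]);
            }
        }

        for (int g = 0; g < generations; g++) {
            Arrays.sort(population, bestFirst);
            int survivors = Math.max(2, populationSize / 2);

            // Keep the better half and replace the rest with mutated offspring.
            for (int i = survivors; i < populationSize; i++) {
                int[] a = population[random.nextInt(survivors)];
                int[] b = population[random.nextInt(survivors)];
                int[] child = new int[optionCounts.length];
                for (int d = 0; d < optionCounts.length; d++) {
                    child[d] = random.nextBoolean() ? a[d] : b[d]; // uniform crossover
                    if (random.nextDouble() < mutationRate) {
                        child[d] = random.nextInt(optionCounts[d]); // mutation
                    }
                }
                population[i] = child;
            }
        }

        Arrays.sort(population, bestFirst);
        return population[0];
    }

    public static void main(String[] args) {
        // Four made-up parameters with 2, 3, 3 and 4 options each.
        int[] optionCounts = {2, 3, 3, 4};
        // Toy fitness: the closer to the arbitrary target {1, 2, 0, 3}, the better.
        int[] target = {1, 2, 0, 3};
        ToDoubleFunction<int[]> toyFitness = c -> {
            int distance = 0;
            for (int d = 0; d < c.length; d++) {
                distance += Math.abs(c[d] - target[d]);
            }
            return -distance;
        };
        int[] best = search(optionCounts, toyFitness, 12, 30, 0.1, 42L);
        System.out.println("best configuration found: " + Arrays.toString(best));
    }
}
```

Because the better half of each generation is carried over unchanged, the best configuration found never gets worse from one generation to the next. There is no guarantee, however, that the search finds the global optimum.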

Beyond theoretical research, Syncany needs to be tested with real user data on real user machines. Future work could therefore include tests of Syncany in real-life scenarios.


3 Comments

  1. JP

    Hi,

    I would love to see an ebook version of your thesis (epub or mobi). Would that be possible?

    thanks



  2. Thiruven Madhavan

    Hi Philipp,
    Good morning. Would it be possible to receive a PDF version of your thesis?
    cheers
    Madhavan