Snippet 0x03: Recursively watch folders on Java (using the Java 7 WatchService)
The “new” Java 7 WatchService provides a mechanism to easily monitor a single folder for file system events. Java uses the underlying OS mechanisms to do that (inotify on Linux, and ReadDirectoryChangesW on Windows). What the WatchService cannot do, however, is monitor a folder recursively, meaning monitoring all sub-folders and registering new folders as they are created. This tiny code snippet article shows you how.
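For reference, this is roughly what watching a single folder looks like with the plain API. It is only a minimal sketch: the class name SingleFolderWatchSketch and the temp folder are made up for illustration, and which events actually show up (and how quickly) depends on the platform.

```java
import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.WatchEvent;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;
import java.util.concurrent.TimeUnit;

import static java.nio.file.StandardWatchEventKinds.ENTRY_CREATE;
import static java.nio.file.StandardWatchEventKinds.ENTRY_DELETE;
import static java.nio.file.StandardWatchEventKinds.ENTRY_MODIFY;

// Hypothetical demo class, not part of Syncany
class SingleFolderWatchSketch {
    /** Registers exactly one directory; its subfolders are NOT covered by this watch. */
    static WatchKey watch(WatchService watchService, Path dir) throws IOException {
        return dir.register(watchService, ENTRY_CREATE, ENTRY_MODIFY, ENTRY_DELETE);
    }

    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("watch-demo");

        try (WatchService watchService = FileSystems.getDefault().newWatchService()) {
            watch(watchService, dir);

            // A direct child of the watched folder: reported to the watch on 'dir'
            Path sub = Files.createDirectory(dir.resolve("sub"));

            // One level deeper: nested.txt itself is not reported, because 'sub' has no watch
            Files.createFile(sub.resolve("nested.txt"));

            // Wait for the first signalled key (some platforms deliver events with a delay)
            WatchKey key = watchService.poll(15, TimeUnit.SECONDS);

            if (key != null) {
                for (WatchEvent<?> event : key.pollEvents()) {
                    System.out.println(event.kind() + " " + event.context());
                }
                key.reset(); // re-arm the key so it can be signalled again
            }
        }
    }
}
```

The nested file is the whole problem in a nutshell: to hear about it, 'sub' needs its own registration.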
1. Full code
The full working code (embedded in my open source file sync software Syncany) is available on GitHub. Feel free to use it, or to leave feedback in the comments.
2. A simpler approach
The basic idea is simple: To monitor an entire file tree, we first must walk the file tree and register a watch on every single folder. Once the initial watches are set, new watches must be added whenever the existing ones report changes.
In my first attempt to implement this (back in 2011 with jpathwatch), I tried to interpret the events to add new folders and remove deleted/moved folders. This turned out to be very, very difficult, because quick changes on the file system are hard to trace, especially if you move things around very quickly. Files vanish, file watches get cancelled while you’re trying to set watches for their subfolders, etc.
Long story short: This time, I settled for a simpler approach:
- Walk the file tree and register watches in all folders and subfolders (walkTreeAndSetWatches())
- When an event occurs, wait 3 seconds for events to settle (resetWaitSettlementTimer())
- When the settlement time has passed, walk the tree again, cancel stale watches and fire a watch event
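The bookkeeping behind the first and last step could be sketched like this. Note that this is an assumption about how such a registry might look, not the code from Syncany: the class name WatchRegistrySketch, the map field and watchedPaths() are made up for illustration; only the method names registerWatch() and unregisterStaleWatches() appear in this post.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;

import static java.nio.file.StandardWatchEventKinds.ENTRY_CREATE;
import static java.nio.file.StandardWatchEventKinds.ENTRY_DELETE;
import static java.nio.file.StandardWatchEventKinds.ENTRY_MODIFY;

// Hypothetical helper class: remembers which folder is watched by which key
class WatchRegistrySketch {
    private final WatchService watchService;
    private final Map<Path, WatchKey> watchPathKeyMap = new HashMap<>();

    WatchRegistrySketch(WatchService watchService) {
        this.watchService = watchService;
    }

    /** Registers a folder unless it already has a valid watch. */
    synchronized void registerWatch(Path dir) throws IOException {
        WatchKey existingKey = watchPathKeyMap.get(dir);

        if (existingKey == null || !existingKey.isValid()) {
            WatchKey watchKey = dir.register(watchService, ENTRY_CREATE, ENTRY_DELETE, ENTRY_MODIFY);
            watchPathKeyMap.put(dir, watchKey);
        }
    }

    /** Cancels and forgets the watches of folders that no longer exist. */
    synchronized void unregisterStaleWatches() {
        Iterator<Map.Entry<Path, WatchKey>> entries = watchPathKeyMap.entrySet().iterator();

        while (entries.hasNext()) {
            Map.Entry<Path, WatchKey> entry = entries.next();

            if (!Files.exists(entry.getKey(), LinkOption.NOFOLLOW_LINKS)) {
                entry.getValue().cancel(); // harmless if the key is already invalid
                entries.remove();
            }
        }
    }

    /** Returns a copy of the currently watched folders. */
    synchronized Set<Path> watchedPaths() {
        return new HashSet<>(watchPathKeyMap.keySet());
    }
}
```

Because registering an already watched folder is a no-op here, walking the whole tree again after every settled burst of events stays cheap.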
3. One thread to monitor them all
The central watch thread basically runs this algorithm. It first walks the file tree using walkTreeAndSetWatches() (basically using Java 7’s Files.walkFileTree()) and then enters a loop polling for events. In case you haven’t worked with the file tree walker before, here is what it looks like in the wild:
    private synchronized void walkTreeAndSetWatches() {
        try {
            Files.walkFileTree(root, new FileVisitor<Path>() {
                @Override
                public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs) throws IOException {
                    if (ignorePaths.contains(dir)) {
                        return FileVisitResult.SKIP_SUBTREE;
                    }
                    else {
                        registerWatch(dir);
                        return FileVisitResult.CONTINUE;
                    }
                }

                // Return FileVisitResult.CONTINUE for other methods
                // ...
            });
        }
        catch (IOException e) {
            // Don't care
        }
    }
It applies a classic visitor pattern, firing callback methods for every node it touches. Depending on what you return (CONTINUE, SKIP_SUBTREE, SKIP_SIBLINGS or TERMINATE), the rest of the algorithm behaves accordingly. I use this functionality just to ignore certain sub trees — in the case of Syncany that’s just the .syncany sub tree.
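If you don’t want to spell out the other callback methods, extending SimpleFileVisitor is one way around it, since its default implementations already return CONTINUE (and rethrow the IOException if a file cannot be visited). The following is a sketch under that assumption, not the Syncany code: the class TreeWalkSketch and the method collectWatchableDirs() are hypothetical, and instead of registering watches it simply collects the folders that would get one.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical helper class for illustration
class TreeWalkSketch {
    /** Returns root and all of its subfolders, skipping ignored folders and everything below them. */
    static List<Path> collectWatchableDirs(Path root, final Set<Path> ignorePaths) throws IOException {
        final List<Path> dirs = new ArrayList<>();

        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs) {
                if (ignorePaths.contains(dir)) {
                    return FileVisitResult.SKIP_SUBTREE; // do not descend into ignored folders
                }

                dirs.add(dir);
                return FileVisitResult.CONTINUE;
            }
        });

        return dirs;
    }
}
```

Calling collectWatchableDirs(root, ignorePaths) with the .syncany folder in the ignore set would return every folder except .syncany and anything below it.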
After initially walking the tree and registering the watches for the sub folders, the poll-wait-fire-loop is started. Inside the loop, the watchService.take() method is blocking so it’s important that this happens in a separate thread:
    public void start() throws IOException {
        watchService = FileSystems.getDefault().newWatchService();

        watchThread = new Thread(new Runnable() {
            @Override
            public void run() {
                running.set(true);
                walkTreeAndSetWatches();

                while (running.get()) {
                    try {
                        WatchKey watchKey = watchService.take();
                        watchKey.pollEvents(); // Take events, but don't care what they are!
                        watchKey.reset();

                        resetWaitSettlementTimer();
                    }
                    catch (InterruptedException | ClosedWatchServiceException e) {
                        running.set(false);
                    }
                }
            }
        }, "Watcher");

        watchThread.start();
    }
When an event happens, we actually don’t process events at all. Instead, we just throw them away by not collecting the events returned by watchKey.pollEvents(). We just reset the watch key (= make it listen to events again) and start the above-mentioned 3-second timer:
    private synchronized void resetWaitSettlementTimer() {
        if (timer != null) {
            timer.cancel();
            timer = null;
        }

        timer = new Timer("WatchTimer");
        timer.schedule(new TimerTask() {
            @Override
            public void run() {
                walkTreeAndSetWatches();
                unregisterStaleWatches();

                fireListenerEvents();
            }
        }, 3000);
    }
This method is called for every event the watch service registers, meaning that if the 3 seconds haven’t run out yet but another event occurs, the method is called again and the timer starts over. If, for instance, you move a file, wait a second, then move it again, and then wait more than 3 seconds, this method will be called at least twice, but the timer task will only fire once.
This is a basic mechanism to avoid passing too many events to the application — believe me, it’s worth waiting at least 3 seconds. Other applications wait even longer: As far as I know, SparkleShare even waits 5 seconds.
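The same settle-then-fire idea can also be built without creating a new Timer for every event. The sketch below swaps in a ScheduledExecutorService, which is a different technique than the Timer-based method shown above; the class name SettleTimerSketch and its methods are made up for illustration.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical helper class: runs a task once no reset() has happened for a while
class SettleTimerSketch {
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
    private final long settleDelayMillis;
    private final Runnable onSettled;
    private ScheduledFuture<?> pendingTask;

    SettleTimerSketch(long settleDelayMillis, Runnable onSettled) {
        this.settleDelayMillis = settleDelayMillis;
        this.onSettled = onSettled;
    }

    /** Call this for every event: drops the pending run (if any) and starts the wait again. */
    synchronized void reset() {
        if (pendingTask != null) {
            pendingTask.cancel(false); // has no effect on a task that is already running
        }

        pendingTask = scheduler.schedule(onSettled, settleDelayMillis, TimeUnit.MILLISECONDS);
    }

    /** Stops the scheduler thread; a pending run is discarded. */
    void shutdown() {
        scheduler.shutdownNow();
    }
}
```

With new SettleTimerSketch(3000, task), calling reset() on every watch event would run the task once, 3 seconds after the last event of a burst.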
All that’s left now is to actually fire the event to our listener(s). In the case of Syncany, I only need one listener, so this is pretty easy:
    private synchronized void fireListenerEvents() {
        if (listener != null) {
            listener.watchEventsOccurred();
        }
    }
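In case the listener part is unclear: the listener is just a callback object that the application hands to the watcher. Here is a minimal, self-contained sketch of that wiring. The interface name WatchListener and the class ListenerSketch are assumptions for illustration; only the method name watchEventsOccurred() is taken from the code above.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical callback interface; the application implements it
interface WatchListener {
    void watchEventsOccurred();
}

// Hypothetical stand-in for the watcher, reduced to the listener handling
class ListenerSketch {
    private final WatchListener listener;

    ListenerSketch(WatchListener listener) {
        this.listener = listener;
    }

    /** In the real watcher, this would be called by the timer task after the settle time. */
    synchronized void fireListenerEvents() {
        if (listener != null) {
            listener.watchEventsOccurred();
        }
    }

    public static void main(String[] args) {
        final AtomicInteger syncRuns = new AtomicInteger();

        ListenerSketch watcher = new ListenerSketch(new WatchListener() {
            @Override
            public void watchEventsOccurred() {
                // This is where an application would start its sync/upload
                syncRuns.incrementAndGet();
            }
        });

        watcher.fireListenerEvents(); // simulate one settled burst of file system events
        System.out.println("Sync runs triggered: " + syncRuns.get());
    }
}
```

The listener deliberately gets no details about what changed; it is only told that something changed and has to figure out the rest itself.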
A. About this post
I’m trying a new section for my blog. I call it Code Snippets. It’ll be very short, code-focused posts of things I recently discovered or find fascinating or helpful. I hope this helps.
I want to make a java ftp auto uploader and use your script to detect changes in the folder system or files. I don’t get the whole listener thingy you have there, so I am unable to use them. Thanks for help.
Usually when file system events are going on, you don’t want to react right away. In most cases, you want to wait a while until the user has stopped doing stuff and then react. If you don’t do that, the sync will start while the user is still copying/moving/creating files and that usually ends up being a big mess.
You can look at the complete and up-to-date code here:
https://github.com/syncany/syncany/tree/develop/syncany-lib/src/main/java/org/syncany/operations/watch
Just check out RecursiveWatcher, WindowsRecursiveWatcher and DefaultRecursiveWatcher. Due to some issues on Windows, I had to implement a special jpathwatch-based watcher for Windows.
Hi Phillip
Nice article! I am on the verge of doing something a bit like this myself so what you wrote is a great head start. In your opinion how many folders can the JVM watch at any given time without grinding to a halt? In some file systems one may find 1000’s of folders. Seems perhaps too much to expect for the watcher to monitor all of these.
Another approach I am half considering is not to use the watcher at all, but just to poll the file-system myself at business-convenient intervals – then diff the tree with the last snapshot to find out what’s changed. Of course, this is difficult in the general case, but might be OK with some agreed simplifications. Be interested to hear of your experience with java file watchers capacity.
rgds
Rob
For Syncany I use both approaches, because the monitoring cannot always be relied on. 1000 folders seems a lot. It really depends on the OS (for Linux, for instance, the inotify limit). I would definitely implement a backup mechanism with polling.
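A polling fallback along those lines might look roughly like the sketch below. To be clear, this is not the Syncany code: the class SnapshotDiffSketch and both methods are made up for illustration, and comparing only size and last-modified time is one of those "agreed simplifications" (a change that keeps both values identical goes unnoticed).

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical helper class for illustration
class SnapshotDiffSketch {
    /** Maps every non-directory entry below root to a "size:lastModifiedMillis" fingerprint. */
    static Map<Path, String> snapshot(Path root) throws IOException {
        final Map<Path, String> state = new TreeMap<>();

        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                state.put(file, attrs.size() + ":" + attrs.lastModifiedTime().toMillis());
                return FileVisitResult.CONTINUE;
            }
        });

        return state;
    }

    /** Returns all paths that were added, removed or changed between two snapshots. */
    static Set<Path> changedPaths(Map<Path, String> before, Map<Path, String> after) {
        Set<Path> changed = new TreeSet<>();

        for (Map.Entry<Path, String> entry : after.entrySet()) {
            if (!entry.getValue().equals(before.get(entry.getKey()))) {
                changed.add(entry.getKey()); // new file, or different fingerprint
            }
        }

        for (Path path : before.keySet()) {
            if (!after.containsKey(path)) {
                changed.add(path); // file is gone
            }
        }

        return changed;
    }
}
```

Taking a snapshot at every polling interval and diffing it against the previous one tells you what to sync, independent of any watch limits.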
Thanks Phillipp – great advice.
Hi Phillip, thanks for this great piece. Two quick questions:
1. I was trying to figure out why you decided to revert to WatchService after starting Syncany with JPathWatch …. you mentioned there are some difficulties with the latter. Could you specify the advantages of combining both approaches as opposed to using either?
2. Syncany uses a client-only architecture, what do you think will be the difference for a server-client approach, with regards to syncing and versioning?
Many thanks.