My name is Philipp C. Heckel and I write about nerdy things.


  • Jun 20 / 2021

elastictl: Import, export, re-shard and performance-test Elasticsearch indices

At Datto, I work a lot with Elasticsearch. Elasticsearch is pretty famous by now, so I doubt that it needs an introduction. But if you happen not to know what it is: it’s a document store with unique search capabilities and incredible scalability.

Despite its incredible features, though, it has its rough edges. And no, I don’t mean the horrific query language (honestly, who thought that was a good idea?). I mean the fact that without external tools it’s practically impossible to import, export, copy, move or re-shard an Elasticsearch index. Indices are, unfortunately, very final.

This is often very inconvenient if you have a growing index whose shards are outgrowing their maximum size (Lucene caps a single shard at roughly 2 billion documents). Or you may have the opposite problem: an ES cluster with too many shards (I think the recommendation is ~800 shards per host) because you have too many indices.

This is why I wrote elastictl, a simple tool to export Elasticsearch indices to a file, import them from a file, and/or re-shard an index. In this short post, I’ll show a few examples of how it can be used.


  • Dec 17 / 2020
Code Snippets, Programming

Snippet 0x0F: Recursive search/replace tool “re”

Two and a half years ago, I wrote my first Go program. I wanted to learn another language, and Go looked like a ton of fun: straightforward, easy to learn, and a static binary with no runtime shenanigans. I picked a project and I started hacking. Looking back, the code I wrote is a little cringy, but not terrible. I’d surely do things differently these days, now that I have more Go experience. But we all start somewhere.

However, the tool that I wrote, a recursive search/replace tool which I intelligently dubbed re, is actually incredibly useful: to my own surprise, I use it every day. I haven’t made a single modification to it in all that time (until today for this post). And since I’m in the sharing mood today, I thought I’d share it with the millions of people (cough) that come here every day. Ha!


  • Dec 13 / 2020
Scripting, Security

Go: Calculating public key hashes for public key pinning in curl

Something occurred to me the other day. This is my blog, and that means I can write about whatever I want. Now you may think that’s totally obvious, but it’s not. For the longest time I wouldn’t blog about anything that I didn’t deem blog-worthy. Small things, like “this is a cool function I found” or “I learned this thing today”, were not blog-worthy in my mind for some reason.

Well, today I’m changing that. I like writing, but not so much that I always want to write a super long post. Sometimes, things should be short. Like this one.

So in this super short post I’m gonna show you a cool thing I figured out: how to calculate, in Go, the value that curl’s --pinnedpubkey option needs.


  • Oct 08 / 2020

Reliably rebooting Ubuntu using watchdogs

Rebooting Ubuntu is hard. I don’t really know why, but in my twelve years as an Ubuntu user, I’ve encountered countless “stuck at reboot” scenarios. Somehow, typing reboot always comes with that extra special feeling of uncertainty and the thrill of danger — Will it come back? Where will it get stuck this time? If it’s your home computer or your laptop, that’s fine, because you can always manually hard reset. If it’s a remote computer to which you have IPMI access, it’s a little bit annoying, but not tragic. But if you’re attempting to reboot tens of thousands of devices across the globe, that level of uncertainty is nothing short of terrifying.

I know I’m being unfair, because more often than not, rebooting Ubuntu actually completes successfully. However, my incredibly unscientific estimate of how often things get stuck forever on shutdown or reboot is this: 1-3%. That’s how often I believe reboots hang. That’s shockingly high, right? Well, I pulled that number out of my hat, but it is based on many hundreds of thousands of reboots I’ve witnessed in our fleet of backup devices. That rate is not too terrible when you deal with a handful of machines that you rarely reboot. It is, however, incredibly terrible if you reboot tens of thousands of devices running Ubuntu every two weeks as part of an upgrade process (I wrote about our image-based upgrade mechanism in another post).

This post tells the short story of how we managed to make Ubuntu machines reboot reliably.

I cross-posted this post on the Datto Engineering Blog. Feel free to head over there and check out the other cool things our engineers do.


  • Sep 18 / 2019

Image based upgrades: Upgrading software and OS of 80k servers every two weeks

Anyone who has ever managed a few dozen or a few hundred physical servers knows how hard it can be to keep all of them up to date with security updates, or, more generally, in sync with their configuration and state. Sysadmins typically solve this problem with Puppet or Salt, or by putting applications in containers. While those are great options if you control your environment, they are less applicable to Datto’s BCDR appliance (or really any appliance or server that doesn’t reside in your infrastructure). On top of that, these solutions don’t cover replacing the kernel, major distribution upgrades, or any other larger upgrade that requires a reboot.

Faced with this problem for Datto’s fleet of BCDR devices, we started exploring alternatives and came up with something that has worked reliably for almost two years on a fleet of now over 80,000 devices. In this blog post, I’d like to talk about how we solved this problem using images, loop devices and lots of Grub magic. If you’d like to know more, keep reading.

I originally published this post on the Datto Engineering Blog. Feel free to head over there and check out the other cool things our engineers do.


  • Jul 22 / 2019

Deduplicating NTFS file systems (fsdup)

At Datto, we store hundreds of thousands of block-level backups for our customers. Since our customer base is mostly Windows-focused, most of these backups are copies of NTFS file systems. As of today, we’re not performing any data deduplication on these backups, which is pretty crazy considering how well you’d expect Windows systems to dedup.

So I set out on a journey to dedup NTFS. This blog post briefly describes that journey and my thoughts, and also introduces a tool called fsdup that I developed as part of a 3-week proof of concept. Please note that while the tool works, it’s highly experimental and should not be used in production!


  • Aug 05 / 2018
Linux, Security

Using Let’s Encrypt for internal servers

Let’s Encrypt is a revolutionary new certificate authority that provides free certificates through a completely automated process. These certificates are issued via the ACME protocol. Over the last 2 years or so, the Internet has widely adopted Let’s Encrypt: over 50% of the web’s SSL/TLS certificates are now issued by Let’s Encrypt.

But while there are many tools to automatically renew certificates for publicly available web servers (certbot, simp_le; I wrote about how to do that 3 years back), it’s hard to find useful information about how to issue Let’s Encrypt certificates for internal, non-Internet-facing servers and/or devices.

This blog post describes how to issue Let’s Encrypt certificates for internal servers. At Datto, we issued a certificate for each of our 90,000+ BCDR appliances using this exact mechanism.


  • Mar 18 / 2018
Linux, Virtualization

USB disk causes blinking cursor at boot; how to “fix” the MBR bootstrap code

Have you ever rebooted your computer only to see a black screen with a blinking cursor? If you have a USB drive attached, chances are the blinking cursor is caused by invalid bootstrap code in the Master Boot Record (MBR) of that drive, which stopped the normal boot process without returning control to the BIOS. If you have physical access to the machine, simply remove the USB drive and/or change the boot order so the OS disk comes first.

If you don’t have physical access, things are a bit trickier. This exact thing happened to me at work the other day. Unfortunately, it didn’t happen to my computer, but to a few dozen of our customers’ backup appliances during their scheduled upgrade/reboot. Now, while a few dozen out of over 60k isn’t that much, our customers rely on these devices, so it’s not acceptable for them to fail to boot.

In this short post, I’ll demonstrate how to reproduce the blinking cursor problem, and how to “fix” the MBR to ensure the computer still boots, regardless of the boot order.
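To check whether an attached disk is a likely culprit, you can look at its first sector. In the common MBR layout, bytes 0-439 hold bootstrap code, bytes 440-443 the disk signature, bytes 446-509 the four partition entries, and bytes 510-511 the boot signature `0x55 0xAA`. The Go sketch below (the helper `inspectMBR` is hypothetical) only reports whether the signature is present and whether the bootstrap area is all zeros; it does not perform the fix described in the full post, and reading a raw device such as `/dev/sdb` usually requires root.

```go
package main

import (
	"fmt"
	"io"
	"os"
)

const (
	mbrSize          = 512
	bootstrapCodeLen = 440 // bytes 0-439; 440-443 hold the disk signature
)

// inspectMBR reports whether sector carries the 0x55 0xAA boot signature at
// offsets 510-511 and whether its bootstrap code area is entirely zero.
func inspectMBR(sector []byte) (hasBootSignature, emptyBootstrap bool, err error) {
	if len(sector) < mbrSize {
		return false, false, fmt.Errorf("need %d bytes, got %d", mbrSize, len(sector))
	}
	hasBootSignature = sector[510] == 0x55 && sector[511] == 0xAA
	emptyBootstrap = true
	for _, b := range sector[:bootstrapCodeLen] {
		if b != 0 {
			emptyBootstrap = false
			break
		}
	}
	return hasBootSignature, emptyBootstrap, nil
}

// readFirstSector reads the first 512 bytes of a disk or image file.
func readFirstSector(path string) ([]byte, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()
	sector := make([]byte, mbrSize)
	if _, err := io.ReadFull(f, sector); err != nil {
		return nil, err
	}
	return sector, nil
}

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: mbrcheck /dev/sdX")
		os.Exit(2)
	}
	sector, err := readFirstSector(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	sig, empty, err := inspectMBR(sector)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("boot signature: %v, empty bootstrap code: %v\n", sig, empty)
}
```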


  • May 28 / 2017

Creating a BIOS/GPT and UEFI/GPT Grub-bootable Linux system

The good old Master Boot Record (MBR) unfortunately cannot address anything beyond 2 TiB (with the usual 512-byte sectors), so large disks cannot be fully partitioned and made bootable using MBR. The GUID Partition Table (GPT) solves this problem: it supports disks up to 16 EB. However, booting a GPT disk via BIOS with GRUB requires a special BIOS boot partition. If you also want to support booting the same system via UEFI, another partition, the EFI System Partition (ESP), is necessary.

This post shows you how to partition a disk with GPT and make a Linux system bootable via both BIOS/Legacy and UEFI.
