A command-line tool for managing your photo backups

Like many, I use an external hard drive for backing up my photos. I take photos with my iPhone, and for important events I use my digital cameras. Photography is not my hobby, but I have managed to collect about 90k files on my hard drive.

Backing up photos from different devices can be as easy as copying and pasting from your memory cards to the hard drive. With an iPhone, I have to use the Photos app to first import files to my laptop and then export them all to the hard drive.

The problem is that sometimes I forget whether I have already backed up the photos on a memory card. To be on the safe side, I would copy and paste to the hard drive again just to make sure I am not missing any photos. My hard drive ended up with many duplicated photo files. What makes it worse is that sometimes photo files are renamed, resulting in duplicated photos with different names.

Introducing Photoman

I want a tool that can help me do two things:

  1. Remove duplicated photo files;
  2. Organise my photo files into Year/Month directories, e.g. 2017/2017_01/my_photo_name.jpg.

I did some searching but could not really find a tool that could easily identify duplicate files, extract EXIF date data from photos, and move/reorganise photos. I ended up writing my first ever Go tool for this purpose. I’ve been using it for about a year and would like to share it with you.

Repository: https://github.com/dlin-me/photoman

For installation and usage instructions, please visit the GitHub page above.

How it works

  1. photoman works on the files under the current directory (where the photoman command is executed);
  2. For photoman to work, it first needs to index all the files under the current directory. After indexing, it creates a .photoman_db file in that directory;
  3. The index is a map of all files with their file paths, MD5 hashes and EXIF creation dates;
  4. Updating the index involves making sure every file has an entry in the index, and creating one if it doesn’t;
  5. It identifies duplicate files by simply looking up files that share the same MD5 hash;
  6. Moving files is also easy, making use of the EXIF date available in the index;
  7. For files with no EXIF data, you have the option of using the file creation time instead.

Performance

  1. It indexes 90k files (280 GB) on a USB 3 external hard drive in about 30 minutes (disk I/O is the bottleneck);
  2. Reindexing with no new files takes about 20 seconds;
  3. Reindexing with 2,000 new files takes less than a minute;
  4. De-duplicating and moving files takes less than 5 seconds.
