Thursday, December 17, 2009

De-Duping a PC

Over the years, my Documents directory has accreted a lot of duplicate files. What's usually happened is that I've retired a secondary computer, such as a laptop, and stuck its data files in a directory on my main computer. The issue is that there's usually a fair bit of overlap between those files and the ones already there.

But there are also a lot that were created on and remained on the laptop. So I can't just blow everything away. On the other hand, it takes work to actually merge everything neatly, so it's easier to just stuff everything in the attic--so to speak--rather than figure out what to keep and what to toss.

But things were reaching a breaking point for me. I was up to something like five to-go-through-someday directories that were getting backed up and generally making it harder to find things.

So I finally decided to do something about it.

It turns out that there are some programs out there which can go through a directory or set of directories and find duplicates. The one I chose is called clonespy. After running through the process and eliminating about 11GB of files, here are some thoughts and cautions.
  • Make sure that you have a good backup (and one that won't be overwritten by any automated backup processes). It is very easy to cause serious damage here and, as a practical matter, you're going to have to let the software do a lot of its work in an automated way. (In my case, we were talking about tens of thousands of files.)
  • You should be sure to exclude (or plan to copy back) any directories that require all their files to remain in place even if they are duplicates. A typical example is your local copy of a hosted Web site.
  • In retrospect, I should have played around more with program settings or date stamps or other means of making sure that duplicates were preferentially removed from my archived directories rather than elsewhere in my Documents folder. (In other words, to the degree that you have an existing folder hierarchy, you don't want to pull files out of there.)
  • You will probably be left with archive directories with a whole bunch of empty or near-empty folders. There is no straightforward way to tell Windows to "just give me all the files in this directory tree and forget about the folders." There is a neat workaround though. Do a search from the top-level folder for * and you'll get all the files in the hierarchy returned. Just select them, copy them, and paste them into a ToBeFiled folder or something along those lines. [UPDATE: Upon further examination, I'm not sure this completely works--not quite sure what was going on.]
  • Bottom line: If I did it again, I'd do a few things differently to avoid some of the back-end cleanup, but it's worth doing if your files are getting out of control.
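
For anyone who would rather script this than rely on a GUI tool, here is a rough Python sketch of the ideas above. To be clear, this is not how clonespy works internally (I don't know that); it's just one plausible approach. It treats files as duplicates when their sizes and SHA-256 hashes match, prefers to remove the copies that live under archive directories, and can copy every file in a tree into one flat folder. The function names (find_duplicates, pick_removals, flatten_tree) and the archive_dirs parameter are made up for illustration, and flatten_tree assumes the destination folder sits outside the tree being flattened.

```python
import hashlib
import os
import shutil
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root):
    """Return lists of paths under root whose contents are identical."""
    by_size = defaultdict(list)
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            p = Path(dirpath) / name
            by_size[p.stat().st_size].append(p)
    by_hash = defaultdict(list)
    for paths in by_size.values():
        if len(paths) > 1:  # only hash files that share a size with another
            for p in paths:
                by_hash[file_digest(p)].append(p)
    return [sorted(group) for group in by_hash.values() if len(group) > 1]


def pick_removals(group, archive_dirs):
    """Pick the copies to delete, keeping a non-archive copy when one exists.

    archive_dirs is a hypothetical list of "attic" directories; use the same
    path style (absolute or relative) as the paths in group.
    """
    archive_dirs = [Path(a) for a in archive_dirs]

    def in_archive(p):
        return any(a in p.parents for a in archive_dirs)

    # Non-archive copies sort first; keep the first path, remove the rest.
    ordered = sorted(group, key=lambda p: (in_archive(p), str(p)))
    return ordered[1:]


def flatten_tree(src, dest):
    """Copy every file under src into the single folder dest (outside src)."""
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    for dirpath, _dirs, files in os.walk(src):
        for name in files:
            target = dest / name
            n = 1
            while target.exists():  # don't clobber same-named files
                target = dest / f"{Path(name).stem}_{n}{Path(name).suffix}"
                n += 1
            shutil.copy2(Path(dirpath) / name, target)
```

Nothing here deletes anything: pick_removals only returns a list of candidates, so you can look it over (and check your backup) before removing a single file. Directories that need to stay intact, like a local copy of a Web site, would still have to be filtered out of that list by hand.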
