Calculating duplicate data across servers with Get-FileHash?

So I’ve added all of our VMs’ disks and other fun stuff together and it comes to 17 TB of data.

I’m preeeetty sure a good chunk of that is duplicate data between servers.

I kinda want to just pull a list of all the files and run an MD5 hash on each one to find identical files. Shouldn’t I theoretically be able to see how much duplicate storage we have across servers?
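Something like this is what I’m picturing for a single server (the root path is just an example, and MD5 should be fine for a dedupe estimate):

```powershell
# Hash every file under a root folder, then group by hash to find duplicates.
$root = 'D:\Data'   # example path

$files = Get-ChildItem -Path $root -Recurse -File -ErrorAction SilentlyContinue

$hashed = foreach ($f in $files) {
    $h = Get-FileHash -Path $f.FullName -Algorithm MD5 -ErrorAction SilentlyContinue
    if ($h) {
        [pscustomobject]@{
            Path   = $f.FullName
            Hash   = $h.Hash
            Length = $f.Length   # Get-FileHash doesn't emit a size, so take it from the FileInfo
        }
    }
}

# Any group with more than one file is duplicate data; every copy past the
# first is reclaimable, so sum (Count - 1) * size per group.
$dupes  = $hashed | Group-Object Hash | Where-Object Count -gt 1
$wasted = ($dupes | ForEach-Object { ($_.Count - 1) * $_.Group[0].Length } |
           Measure-Object -Sum).Sum
'{0:N2} GB of duplicate data' -f ($wasted / 1GB)
```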

Could I do something like get a list of servers, run Get-FileHash -Algorithm MD5 on every file, select the path, hash, and file size (Get-FileHash doesn’t output a size on its own, so that would have to come from Get-ChildItem), dump that to $serverpath.txt, and then pull all of those into an Excel sheet to see which files have the same hash?
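For the multi-server version, roughly this is what I mean (assuming WinRM remoting works to those boxes; the server names and paths are made up, and CSV is friendlier to Excel than plain text):

```powershell
# Collect hashes from each server remotely, tag each row with the server name,
# and write one CSV per server on the local machine.
$servers = 'SERVER01', 'SERVER02'   # example names

foreach ($server in $servers) {
    Invoke-Command -ComputerName $server -ScriptBlock {
        Get-ChildItem -Path 'D:\Data' -Recurse -File -ErrorAction SilentlyContinue |
            ForEach-Object {
                $h = Get-FileHash -Path $_.FullName -Algorithm MD5 -ErrorAction SilentlyContinue
                if ($h) {
                    [pscustomobject]@{
                        Server = $env:COMPUTERNAME
                        Path   = $_.FullName
                        Hash   = $h.Hash
                        Length = $_.Length
                    }
                }
            }
    } | Export-Csv -Path "$server.csv" -NoTypeInformation
}

# You could even skip Excel: combine the CSVs and group by hash right here,
# listing the biggest duplicate groups first.
Get-ChildItem -Path '*.csv' | Import-Csv |
    Group-Object Hash |
    Where-Object Count -gt 1 |
    Sort-Object { [int64]$_.Group[0].Length * ($_.Count - 1) } -Descending |
    Select-Object -First 20 Count,
        @{ n = 'Size';  e = { $_.Group[0].Length } },
        @{ n = 'Paths'; e = { $_.Group.Path -join '; ' } }
```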
