Loading millions of objects into variables – memory consumption? Or would a 3rd party tool be better?

This is more of an abstract question than a code-related one. I'd like to write a PowerShell script (almost a program, really) that grabs all objects on our various file share servers matching certain extensions (.doc, .xls, .csv, etc.) and loads them into variables. Then I want the script to go through each object and check both the file name and the content for certain string matches (if you guessed PHI/PII, you're right!). Finally, I want it to produce some kind of output that's usable and actionable by IT staff and/or management (though the exact use case for that amount of raw data is something we still need to discuss as a team). Roughly, I'm picturing something like the sketch below.
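Server paths, extensions, and regex patterns here are all placeholders of my own, purely illustrative rather than real PHI/PII detection:

    # Rough sketch of the idea. The paths and regexes are placeholders,
    # not real PHI/PII detection logic.
    $roots    = '\\fileserver01\share', '\\fileserver02\share'   # hypothetical UNC paths
    $exts     = '*.doc', '*.xls', '*.csv'
    $patterns = '\b\d{3}-\d{2}-\d{4}\b',   # SSN-like
                '\b\d{3}-\d{3}-\d{4}\b'    # phone-number-like

    # Naive version: collect every matching file object into one variable first
    $files = Get-ChildItem -Path $roots -Include $exts -Recurse -File -ErrorAction SilentlyContinue

    $results = foreach ($file in $files) {
        # Check the file name itself
        foreach ($p in $patterns) {
            if ($file.Name -match $p) {
                [pscustomobject]@{ Path = $file.FullName; MatchedIn = 'Name'; Pattern = $p }
            }
        }
        # Check the content (fine for text formats like .csv; binary Office
        # formats would need a different approach)
        Select-String -Path $file.FullName -Pattern $patterns -ErrorAction SilentlyContinue |
            ForEach-Object {
                [pscustomobject]@{ Path = $_.Path; MatchedIn = "Line $($_.LineNumber)"; Pattern = $_.Pattern }
            }
    }

    $results | Export-Csv -Path .\phi-scan-results.csv -NoTypeInformation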

If there are over 6 million such objects, would PowerShell be able to handle it? Should I plan to run this on its own server with dedicated CPU, memory, etc.? Or would a third-party tool written in a faster, more versatile language be better for this?
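For the memory side specifically, the variant I keep coming back to is streaming through the pipeline instead of collecting everything into $files first, so only one object is in flight at a time. A sketch, reusing the same placeholder variables as above:

    # Streaming variant: pipe rather than collect, so memory stays roughly
    # flat instead of holding ~6M file objects in a variable.
    Get-ChildItem -Path $roots -Include $exts -Recurse -File -ErrorAction SilentlyContinue |
        ForEach-Object {
            Select-String -Path $_.FullName -Pattern $patterns -ErrorAction SilentlyContinue
        } |
        Select-Object Path, LineNumber, Pattern |
        Export-Csv -Path .\phi-scan-results.csv -NoTypeInformation

I'm just not sure whether that scales to this size, or whether runtime becomes the bottleneck instead.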

submitted by /u/Marquis77