Help with compare-object and increasing efficiency

Hey all, this is a continuation of my post from yesterday: https://old.reddit.com/r/PowerShell/comments/d2bz0g/need_a_faster_way_to_remove_duplicates_from_large/?ref=share&ref_source=link

You were all extremely helpful in solving that piece of my puzzle, but now I'm running into more issues. For context, folder A has about 100,000 files and folder B has about 50,000. The script gathers a list of all unique filenames across both directories and checks each one: is it only in A, only in B, in both and equal, or in both and different? Here is my script in its entirety:

    # Grabbed this function from this site: https://blogs.technet.microsoft.com/ashleymcglone/2017/08/07/use-hash-tables-to-go-faster-than-powershell-compare-object/
    Function Compare-Object2 {
        param(
            [psobject[]] $ReferenceObject,
            [psobject[]] $DifferenceObject,
            [switch] $IncludeEqual,
            [switch] $ExcludeDifferent
        )

        # Put the difference array into a hash table,
        # then destroy the original array variable for memory efficiency.
        $DifHash = @{}
        $DifferenceObject | ForEach-Object {$DifHash.Add($_,$null)}
        Remove-Variable -Name DifferenceObject

        # Put the reference array into a hash table.
        # Keep the original array for enumeration use.
        $RefHash = @{}
        for ($i=0;$i -lt $ReferenceObject.Count;$i++) {
            $RefHash.Add($ReferenceObject[$i],$null)
        }

        # This code is ugly but faster.
        # Do the IF only once per run instead of every iteration of the ForEach.
        If ($IncludeEqual) {
            $EqualHash = @{}
            # You cannot enumerate with ForEach over a hash table while you remove
            # items from it.
            # Must use the static array of reference to enumerate the items.
            ForEach ($Item in $ReferenceObject) {
                If ($DifHash.ContainsKey($Item)) {
                    $DifHash.Remove($Item)
                    $RefHash.Remove($Item)
                    $EqualHash.Add($Item,$null)
                }
            }
        } Else {
            ForEach ($Item in $ReferenceObject) {
                If ($DifHash.ContainsKey($Item)) {
                    $DifHash.Remove($Item)
                    $RefHash.Remove($Item)
                }
            }
        }

        If ($IncludeEqual) {
            $EqualHash.Keys | Select-Object @{Name='InputObject';Expression={$_}},`
                @{Name='SideIndicator';Expression={'=='}}
        }

        If (-not $ExcludeDifferent) {
            $RefHash.Keys | Select-Object @{Name='InputObject';Expression={$_}},`
                @{Name='SideIndicator';Expression={'<='}}
            $DifHash.Keys | Select-Object @{Name='InputObject';Expression={$_}},`
                @{Name='SideIndicator';Expression={'=>'}}
        }
    }

    # This prompts the user with a folder dialog
    Function Get-Folder {
        param([String]$selectDescription)

        [System.Reflection.Assembly]::LoadWithPartialName("System.windows.forms") | Out-Null
        $foldername = New-Object System.Windows.Forms.FolderBrowserDialog
        $foldername.Description = $selectDescription
        $foldername.rootfolder = "MyComputer"

        if ($foldername.ShowDialog() -eq "OK") {
            $folder += $foldername.SelectedPath
        }
        return $folder
    }

    $folderA = Get-Folder "Select your parent A folder"
    $folderB = Get-Folder "Select your parent B folder"
    $resultsFolder = Get-Folder "Create or Select a folder where you would like to place results"

    # Recursively grab all filenames as suggested by u/graysky311, u/madbomb122, & u/night_filter
    $folderAFiles = Get-ChildItem $folderA -File -Recurse | Select-Object -ExpandProperty Name
    $folderBFiles = Get-ChildItem $folderB -File -Recurse | Select-Object -ExpandProperty Name

    [System.Collections.ArrayList]$AllFiles = @()
    [System.Collections.ArrayList]$objArray = @()

    # Assigns all unique filenames across the 2 folders to the ArrayList
    $AllFiles = (Compare-Object2 -ReferenceObject ($folderAFiles) -DifferenceObject ($folderBFiles) -IncludeEqual | Select-Object -ExpandProperty InputObject | Sort-Object)

    $AllFiles | ForEach-Object {
        $filename = $_
        [String]$splitFN = "$($filename)"
        [String]$componentName = $splitFN.split(".", 2)[0]
        # Each file is expected at <folder>\<componentName>\<filename>
        [String]$filepathA = Join-Path (Join-Path $folderA $componentName) $filename
        [String]$filepathB = Join-Path (Join-Path $folderB $componentName) $filename
        [String]$location = ""
        $diff = ""

        if ([System.IO.File]::Exists($filepathA)) {
            # file exists in A
            if ([System.IO.File]::Exists($filepathB)) {
                # file exists in A & B
                # This is now where my script takes a huge amount of time. It runs correctly with
                # this Compare-Object but throws an exception when using Compare-Object2
                $diff = Compare-Object -ReferenceObject (Get-Content $filepathA) -DifferenceObject (Get-Content $filepathB) | Format-List *
                if ($null -ne $diff) {
                    # files differ, write the differences to the log text file
                    $logFile = Join-Path $resultsFolder "$($componentName)Differences.txt"
                    $filename | Out-File $logFile -Append
                    $diff | Out-File $logFile -Append
                    Write-Output ([Environment]::NewLine) | Out-File $logFile -Append
                    $location = "Files differ"
                } else {
                    # files are equal
                    $location = "Files equal"
                }
            } else {
                # file exists in ONLY A
                $location = "Only in A"
            }
        } else {
            # file does not exist in A
            if ([System.IO.File]::Exists($filepathB)) {
                # file exists in ONLY B
                $location = "Only in B"
            } else {
                # file exists in NEITHER
                $location = "In neither"
            }
        }

        $fileDataObject = [PSCustomObject]@{
            Location       = $location
            Component      = $componentName
            ExportFileName = $filename
        }
        $objArray.Add($fileDataObject) | Out-Null
    }

    $objArray | Export-CSV (Join-Path $resultsFolder "AtoBCompare.csv") -NoTypeInformation
    Write-Host "The Script is complete"

The script works; it just takes a long time. I believe that using the Compare-Object2 function at the file level (to compare the contents of files that exist in both folders) would drastically improve efficiency, but I get this exception when I use it instead of the built-in Compare-Object cmdlet:

    Exception calling "Add" with "2" argument(s): "Item has already been added. Key in dictionary: ' <ID xsi:type="xsd:string">1000095</ID>' Key being added: ' <ID xsi:type="xsd:string">1000095</ID>'"
    At F:configDiff_09112019.ps1:16 char:41
    + $DifferenceObject | ForEach-Object {$DifHash.Add($_,$null)}
    +                                     ~~~~~~~~~~~~~~~~~~~~~~
        + CategoryInfo          : NotSpecified: (:) [], MethodInvocationException
        + FullyQualifiedErrorId : ArgumentException
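The message points at the `Add` call: a hashtable cannot hold the same key twice, so Compare-Object2 throws as soon as a file contains a repeated line (here, an identical `<ID>` line), whereas the built-in Compare-Object accepts duplicates. A minimal sketch of a duplicate-tolerant replacement, assuming repeated lines and line order don't matter when deciding whether two files differ, uses `HashSet[string]`, whose `Add` method returns `$false` for a duplicate instead of throwing. `Compare-LineSet` is a hypothetical name, not part of the blog's function. It uses the `Ordinal` comparer, so it is case-sensitive, unlike Compare-Object's default; swap in `OrdinalIgnoreCase` to match Compare-Object.

```powershell
# Hypothetical helper: a duplicate-tolerant, set-based line comparison.
# Assumption: repeated lines and line order do not matter for "differ".
function Compare-LineSet {
    param(
        [string[]] $ReferenceObject,
        [string[]] $DifferenceObject
    )
    # Ordinal comparer = case-sensitive; use OrdinalIgnoreCase to match Compare-Object's default
    $refSet = New-Object 'System.Collections.Generic.HashSet[string]' ([System.StringComparer]::Ordinal)
    $difSet = New-Object 'System.Collections.Generic.HashSet[string]' ([System.StringComparer]::Ordinal)

    # HashSet.Add returns $false on a duplicate instead of throwing like Hashtable.Add
    foreach ($line in $ReferenceObject)  { [void]$refSet.Add($line) }
    foreach ($line in $DifferenceObject) { [void]$difSet.Add($line) }

    # Lines only in the reference file
    foreach ($line in $refSet) {
        if (-not $difSet.Contains($line)) {
            [PSCustomObject]@{ InputObject = $line; SideIndicator = '<=' }
        }
    }
    # Lines only in the difference file
    foreach ($line in $difSet) {
        if (-not $refSet.Contains($line)) {
            [PSCustomObject]@{ InputObject = $line; SideIndicator = '=>' }
        }
    }
}
```

In the script it would stand in for the built-in call, e.g. `$diff = Compare-LineSet -ReferenceObject (Get-Content $filepathA) -DifferenceObject (Get-Content $filepathB)`. Because it compares sets, two files that differ only in line order or in how often a line repeats would be reported as equal.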

Does anyone have ideas on how to fix this? Or any general advice or tips on making the script run faster?
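For the general speed question, one common approach (assuming most files that exist in both folders are unchanged, which isn't measured here) is to skip the line-by-line diff for identical files: compare file sizes first, then SHA256 hashes from `Get-FileHash` (PowerShell 4.0+), and only run Compare-Object on pairs that actually differ. `Test-SameFileContent` is a hypothetical helper name.

```powershell
# Hypothetical helper: $true if two files have byte-identical contents.
# Checks sizes first (cheap), then SHA256 hashes via Get-FileHash (PowerShell 4.0+).
function Test-SameFileContent {
    param(
        [string] $PathA,
        [string] $PathB
    )
    # Files of different length can never be identical, so skip hashing
    if ((Get-Item -LiteralPath $PathA).Length -ne (Get-Item -LiteralPath $PathB).Length) {
        return $false
    }
    $hashA = (Get-FileHash -LiteralPath $PathA -Algorithm SHA256).Hash
    $hashB = (Get-FileHash -LiteralPath $PathB -Algorithm SHA256).Hash
    return $hashA -eq $hashB
}
```

Inside the "file exists in A & B" branch this could gate the existing code, e.g. `if (Test-SameFileContent $filepathA $filepathB) { $location = "Files equal" } else { <existing Compare-Object and logging> }`. Files whose bytes differ only in encoding or line endings still fall through to Compare-Object, so they are classified as before.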

Once again, I'd like to thank you all for the continued guidance and support. As a self-taught programmer, it's good to know there are people out there who are willing to help. Just wanted to note my appreciation. Thanks!!

submitted by /u/tahp_master