Most efficient way to filter a massive List<List<T>>

As the title suggests, I am trying to find the fastest way to filter a huge nested list of objects. The list can have millions or even billions of records, and I need to filter it based on a variety of different criteria.

The nested list holds all combinations of another flat list, and each inner list is filtered based on the sums and counts of properties of its objects, like so:

foreach (var list in lists)
{
    if (list.Sum(l => l.prop1) > X) continue;
    if (list.Sum(l => l.prop2) > Y) continue;

    var prop3Counts = list.GroupBy(l => l.prop3);
    if (prop3Counts.Count() < X) continue;
    if (prop3Counts.Any(p => p.Count() > X)) continue;

    listsThatMadeTheCut.Add(list);
}

There is another GroupBy property, but I think this is sufficient to show the issue I am facing. With roughly 2 million combinations I got the runtime down from 30 seconds to 21. It would be nice to get this much faster, as the list size could potentially be in the billions.
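For reference, here is a self-contained sketch of the same checks done in a single pass over each inner list, so the sums and the prop3 counts are accumulated in one enumeration instead of several. Item, the prop names, and the X/Y thresholds are just placeholders standing in for my real types:

using System.Collections.Generic;
using System.Linq;

// Placeholder item type standing in for the real object.
record Item(int prop1, int prop2, string prop3);

static class SinglePassFilter
{
    // Same checks as the LINQ version above, but each inner list is
    // enumerated exactly once: both sums and the per-prop3 counts are
    // accumulated together before the thresholds are applied.
    public static List<List<Item>> Filter(IEnumerable<List<Item>> lists, int X, int Y)
    {
        var kept = new List<List<Item>>();

        foreach (var list in lists)
        {
            long sum1 = 0, sum2 = 0;
            var prop3Counts = new Dictionary<string, int>();

            foreach (var item in list)
            {
                sum1 += item.prop1;
                sum2 += item.prop2;
                prop3Counts.TryGetValue(item.prop3, out var count);
                prop3Counts[item.prop3] = count + 1;
            }

            if (sum1 > X || sum2 > Y) continue;
            if (prop3Counts.Count < X) continue;
            if (prop3Counts.Values.Any(c => c > X)) continue;

            kept.Add(list);
        }

        return kept;
    }
}

This mainly avoids re-walking each inner list for every check; whether it helps much presumably depends on how large the inner lists are.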

I have googled a lot but don't really see anything different from what I am doing. Is there anything within C# or LINQ I could do to get a significant speed increase here? I have thought about using HashSets, but that would require changing a lot more code. Any other suggestions would be fantastic.

Hopefully what I am asking is clear; if you have any questions, feel free to ask.

by WhiteXHysteria via /r/csharp
