Using LINQ to extract Attributes from XML.

I've got an XML file and I'm trying to extract some information from it. The file has a list of products and their attributes. I'm trying to create 3D models for Tekla, so only some of these attributes are relevant to me. The intern before me was doing this manually. My issue is that there are 10 files and each file is over 100 MB. I'm not interested in wasting a significant portion of my existence sifting through over 1 million lines of code. Here's the basic layout of each Product entry in the XML file.

    <Product ID="productID" UserTypeID="USERTYPE">
      <Name>PRODUCT NAME</Name>
      <ClassificationReference ClassificationID=" CLASSIFICATION_PARKING" Type="LINK_TYPE_CLASSIFICATION_SYSTEM"/>
      <Values>
        <Value AttributeID="CHA_STREETPRICE_STD_NETAMOUNT">0.00</Value>
        <Value AttributeID="CHA_SAP_MATMAS_WERKS">0000</Value>
        <Value AttributeID="CHA_STREETPRICE_STD_CURRENCY">EUR</Value>
        <Value AttributeID="CHA_SAP_MATMAS_ZZPUBLISH">00000</Value>
        <Value AttributeID="CHA_SAP_MATMAS_ZZCATALOG_TYPE">00000</Value>
        <Value AttributeID="CHA_SAP_MATMAS_MARM_PCE_MEINH">0000</Value>
        <Value AttributeID="CHA_STREETPRICE_STD_QUANTITY">1</Value>
        <Value AttributeID="CHA_SAP_MATMAS_MARM_PCE_UMREZ">1</Value>
        <Value AttributeID="CHA_SAP_MATMAS_ZZDISCGRP">000000</Value>
        <Value AttributeID="CHA_STREETPRICE_STD_NETPRICE">0.00</Value>
      </Values>
    </Product>

I've only just discovered LINQ but I think it might be able to help me here. My problem is I only seem to know the basics of LINQ and XML. I've got a basic approach in my head but I'm not quite sure how to write the queries. Here's what I'm thinking:

  • I only need products of a certain user type, so I'd ignore all Product elements whose UserTypeID attribute doesn't match.

  • Then I'd like to extract the Product attributes "ID" and "UserTypeID" together with the "Name" node.

  • Then extract the Value entries inside the Values node based on their AttributeID. I don't want all attributes, just some.

  • Write the result for each product to a text file on a single line.
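To make the four steps above concrete, here is a minimal sketch of how they might look with LINQ to XML. The class name `ProductExtractor`, the method `ExtractLines`, and its parameters `userType` and `wantedIds` are made up for illustration, and the tab separator is just an assumption about the output format.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class ProductExtractor
{
    // Hypothetical helper: returns one tab-separated line per matching product.
    // userType and wantedIds are illustrative parameters, not part of the original files.
    public static List<string> ExtractLines(XContainer root, string userType, string[] wantedIds)
    {
        return root.Descendants("Product")
            // Step 1: keep only products with the requested UserTypeID.
            .Where(p => (string)p.Attribute("UserTypeID") == userType)
            .Select(p =>
            {
                // Step 2: the ID and UserTypeID attributes plus the Name child element.
                var fields = new List<string>
                {
                    (string)p.Attribute("ID"),
                    (string)p.Attribute("UserTypeID"),
                    (string)p.Element("Name")
                };
                // Step 3: look up each wanted Value by AttributeID; empty string if absent.
                foreach (var id in wantedIds)
                {
                    var value = p.Descendants("Value")
                        .FirstOrDefault(v => (string)v.Attribute("AttributeID") == id);
                    fields.Add((string)value ?? "");
                }
                // Step 4: one line per product.
                return string.Join("\t", fields);
            })
            .ToList();
    }
}
```

The returned lines could then be passed to `File.WriteAllLines` together with an output path. Note that feeding this an `XDocument.Load(file)` reads the whole file into memory, which may or may not be acceptable for files of this size.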

However, I've stumbled at the first step. I've got this query:

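    // Requires: using System.Collections.Generic; using System.Linq; using System.Xml.Linq;
    // Goal: find Products with USERTYPE "PRD" (currently this returns the ID of every Product)
    static IEnumerable<string> GetKeyWordNames(string file)
    {
        return XDocument.Load(file)
            .Descendants("Product")
            .Attributes("ID") // how do you write a query to select multiple attributes?
            .Select(attr => attr.Value)
            .ToList();
    }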

So my questions in short are:

  • How do I query multiple attributes and only select a product based on its UserTypeID attribute?
  • How do I query Value nodes based on AttributeID? Is Values a descendant of Product or an inner node?
  • How do I store the results?

TL;DR: Please save me from a life reduced to CTRL+F and Excel.

