How would you structure this?

I guess I mean architecturally. Sorry, English isn't my primary language.

I started programming again recently after a loooong break and decided for once I'd try to do stuff in a more "correct" way instead of my usual "get shit working, then move on", so I thought I'd ask here for some tips.

I'm currently implementing a scraper for my small program, which is easy enough. The problem is that I need to scrape anywhere from 1 to 20 different sites depending on options. They all have the same type of data and even about 80-90% overlap in content, but each site displays that data differently, which means I need to customize the scraping for each site.

The three different ways I figured would be appropriate are:

1) A bare scraper base class for the formatted data, then extending/overriding it for each site.

2) Encapsulating everything into a superclass that just handles everything with if (site == SiteA) CrawlSiteA. This sounds like the absolute worst way and just a nightmare.

3) Text-based templates kept in config files, so that something like SiteA_template would somehow be parsed at runtime and feed that info into variables. The problem is I have no idea how I would do that, or what libraries to use, if any even exist for that kind of feature.

Am I right in thinking 1 is my best option? How would I best implement it?
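To make option 1 concrete, this is roughly the shape I have in mind. It's only a sketch: the Item record, the SiteScraper/SiteAScraper names and the SiteA markup are all made up for illustration, and a regex stands in for the real HTML parsing just to keep the example dependency-free.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical shape of the data that every site has in common.
public record Item(string Name, string Price);

// Base class: owns the shared steps, leaves the site-specific extraction to subclasses.
public abstract class SiteScraper
{
    public abstract string SiteName { get; }

    // The only thing each site overrides: raw HTML in, common Item records out.
    protected abstract IEnumerable<Item> ParsePage(string html);

    // Shared step: clean up whatever the site-specific parser produced.
    public List<Item> Scrape(string html)
    {
        var result = new List<Item>();
        foreach (var item in ParsePage(html))
            result.Add(item with { Name = item.Name.Trim(), Price = item.Price.Trim() });
        return result;
    }
}

public sealed class SiteAScraper : SiteScraper
{
    public override string SiteName => "SiteA";

    // Made-up markup for illustration: <li data-price="9.99"> Widget </li>
    private static readonly Regex Row =
        new(@"<li data-price=""(?<price>[^""]*)"">(?<name>[^<]*)</li>");

    protected override IEnumerable<Item> ParsePage(string html)
    {
        foreach (Match m in Row.Matches(html))
            yield return new Item(m.Groups["name"].Value, m.Groups["price"].Value);
    }
}
```

The idea would be that the app only ever holds a list of SiteScraper instances picked from the options, and adding a site means adding one small subclass instead of touching a big if/else.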

Also, the scraping, though launched from the app (a simple button click), should be completely asynchronous (almost fire-and-forget). The app only wants to know when it's finished, which might take anywhere from a minute to, well, a lot longer. Best case it's scraping about 400 pages (normal usage), worst case 100-150k (extreme usage :P). Which would be preferred: a simple await in the click handler, or an event-based model where I just send it off to do its stuff and have it report back through a callback/event?
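So it's clear what I mean by the await variant, here is a rough sketch. The ScrapeJob class, the fetch delegate and the maxParallel limit are my own placeholder names, not anything that exists yet; the delegate stands in for whatever actually downloads a page. A SemaphoreSlim caps how many pages are in flight at once, and IProgress<int> is there in case the app ever wants more than just "finished".

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public sealed class ScrapeJob
{
    // Hypothetical page downloader, injected so the job itself stays testable.
    private readonly Func<string, CancellationToken, Task<string>> _fetch;
    private readonly int _maxParallel;

    public ScrapeJob(Func<string, CancellationToken, Task<string>> fetch, int maxParallel = 8)
    {
        _fetch = fetch;
        _maxParallel = maxParallel;
    }

    // Completes when every page has been fetched; returns how many were done.
    public async Task<int> RunAsync(IReadOnlyCollection<string> urls,
                                    IProgress<int>? progress = null,
                                    CancellationToken ct = default)
    {
        using var gate = new SemaphoreSlim(_maxParallel);
        int done = 0;

        async Task FetchOne(string url)
        {
            await gate.WaitAsync(ct);          // wait for a free slot
            try
            {
                string html = await _fetch(url, ct);
                // ... hand html to the site-specific scraper here ...
                progress?.Report(Interlocked.Increment(ref done));
            }
            finally
            {
                gate.Release();                // free the slot even on failure
            }
        }

        await Task.WhenAll(urls.Select(FetchOne));
        return done;
    }
}
```

With this shape the click handler would just be an async method doing `int pages = await job.RunAsync(urls);` and then updating the UI, whereas the event-based variant would raise something like a Finished event from inside the job instead of returning a Task.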

By the way, I'm using HtmlAgilityPack for the HTML parsing/scraping, as I'm terrible at XSLT/XPath and needed an easy way to just get what I needed.

I don't mind learning new things, and actually quite enjoy it, especially if it's helpful for future projects. LINQ has been on my list of things to pick up for years, for example; I just never really liked the syntax.

Hope the question isn't too vague/broad or stupid.

Thanks =)

by FigurativelyLiterate via /r/csharp
