Scraping HTML into an Excel cell using VBA

Hi there! I’ve got a column with a whole bunch of search terms that I just need a single paragraph/sentence from a search query; I got the below from StackOverflow which provides exactly what I need:

https://en.wikipedia.org/w/api.php?format=json&action=query&prop=extracts&exintro&explaintext&exsentences=1&titles=Stack_Overflow

Raw data Output: {“batchcomplete”:””,”query”:{“normalized”:[{“from”:”Stack_Overflow”,”to”:”Stack Overflow”}],”pages”:{“21721040”:{“pageid”:21721040,”ns”:0,”title”:”Stack Overflow”,”extract”:”Stack Overflow is a question and answer site for professional and enthusiast programmers.”}}}} (I can filter everything out in Excel, but the sentence that I need is there).

I’m trying to use VBA to call the first part of the url and then the cell that has the search term (example, D2), and then output the entire string into the next cell, E1.

My code is below:

Sub ScrapeHTML() Set xmlhttp = CreateObject("MSXML2.serverXMLHTTP") Dim term As String x = 2 Do While Sheet2.Cells(x, 4) <> "" term = Sheet2.Cells(x, 4) xmlhttp.Open "GET", "http://www.google.com/search?q=" & term, False xmlhttp.setRequestHeader "Content-Type", "text/xml" xmlhttp.send Sheet2.Cells(x, 5).Value = xmlhttp.responseText x = x + 1 Loop End Sub 

When I use the above, Google (with search term like “hello”), it works perfectly, drops the scraped HTML into cell E2. When I replace Google with the Wikipedia address above, it can’t pull it; I get an operation timeout error on xmlhttp.send. Any help would be greatly appreciated!

submitted by /u/404NotFounded
[link] [comments]

Leave a Reply