If this doesn’t belong here, can you please point me to the right sub? I didn’t know where else to post this, but I know it’s not an easy task to execute.
I only need the source code; I don’t need the page rendered in a browser.
I will not crawl images, PDF files, or anything else beyond what is in the source code, and I will store that source compressed using a different kind of compression.
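To make the idea concrete, here is a minimal sketch of "fetch only the source, store it compressed". This is just an illustration using the Python standard library; `gzip` stands in for whatever compression scheme is actually used, and the function names are placeholders, not part of any existing crawler:

```python
import gzip
import urllib.request

def fetch_source(url: str, timeout: float = 10.0) -> bytes:
    """Fetch the raw page source only: no rendering, no images, no sub-resources."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()

def store_compressed(html: bytes) -> bytes:
    """Compress the source before writing it to disk."""
    return gzip.compress(html, compresslevel=9)

# Round-trip check on a local sample, so no network access is needed:
sample = b"<html><body>Hello, crawl</body></html>"
packed = store_compressed(sample)
assert gzip.decompress(packed) == sample
```

A real crawler would add politeness (robots.txt, rate limiting) and deduplication on top of this, but the storage path itself is this simple.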
I have built my own fully functioning web browser for iOS (including cloud sync), so I have some experience with the subject.
Assume that cost isn’t an issue: I can purchase 30 petabytes of storage, 10 medium-spec PCs, and the fastest available connection OTE (the national internet provider of Greece) can offer.
My questions are:
- What are the minimum requirements for this (other than storage)?
- Roughly, what is a realistic time frame for completion? (Ignore pages that need frequent re-crawling or any other scheduling delays; assume a straightforward one-pass crawl.)
I am experimenting with a new format that can compress any website into a 1 KB text file containing all of the meaning of that page, but this is not the reason I chose to do this.
Google has indexed roughly 30 trillion pages, so that is 30*10^12 * 1KB = 3*10^16 bytes = 30 petabytes of data (or about 27.28 pebibytes if a KB is counted as 1024 bytes).
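A quick back-of-envelope check of that storage figure; the decimal and binary prefixes give slightly different numbers, which is worth keeping straight when sizing the disks:

```python
PAGES = 30 * 10**12          # ~30 trillion indexed pages (the figure above)
BYTES_PER_PAGE = 1_000       # 1 KB per compressed page, decimal KB

total_bytes = PAGES * BYTES_PER_PAGE
petabytes = total_bytes / 10**15        # decimal petabytes (PB)
pebibytes = PAGES * 1024 / 2**50        # if "KB" is taken as 1024 bytes (PiB)

print(f"{petabytes:.1f} PB (decimal), {pebibytes:.2f} PiB (binary)")
# → 30.0 PB (decimal), 27.28 PiB (binary)
```

Either way, the quoted 30 PB of storage covers the raw estimate, though with no headroom for indexes, metadata, or re-crawls.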