Originally Posted by AdultSites
(Post 20478222)
I am a lot more familiar with this, and it is still not an easy thing. I mean, it is easy, but you can't get 100% of the pages without scraping the site or scraping the SERPs, and even then there is no guarantee that you will get 100% of the pages.
For a page to be known, it needs to be linked to from some other place on the Internet, and that place could be a search engine. If there is not a single link to a page, there is no way for people to know that it exists, except for the people who own the site or have access to its admin.
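The point about unlinked pages can be illustrated with a small reachability sketch. The site below is a made-up, in-memory link graph (the page names are hypothetical); following links breadth-first from the home page finds every linked page, but a page that nothing links to never turns up, no matter how long you crawl.

```python
from collections import deque


def discoverable(link_graph, seeds):
    """Return every page reachable by following links from the seed pages."""
    seen = set(seeds)
    queue = deque(seeds)
    while queue:
        page = queue.popleft()
        for target in link_graph.get(page, ()):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


# Hypothetical site: "/orphan" exists on the server, but no page links to it.
site = {
    "/": ["/about", "/products"],
    "/about": ["/"],
    "/products": ["/products/1", "/products/2"],
    "/products/1": [],
    "/products/2": ["/products/1"],
    "/orphan": ["/"],
}

found = discoverable(site, ["/"])
print(sorted(found))     # ['/', '/about', '/products', '/products/1', '/products/2']
print("/orphan" in found)  # False
```

Only someone with access to the server or its admin would know that "/orphan" exists.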
There are multiple methods of scraping, but SEO Spider by Screaming Frog and Xenu's Link Sleuth are not good enough for large websites (over 3,000,000 URLs). People do some tweaking and it works better, but there is no perfect solution to this.
So, in general, to scrape large websites you need to do it programmatically, and that takes time too.
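As a rough idea of what "doing it programmatically" can look like, here is a minimal breadth-first crawler using only the Python standard library. It is a sketch, not a tool for multi-million-URL sites: the function names (`crawl`, `fetch_html`) and the `max_pages` limit are illustrative assumptions, it keeps its queue and seen-set in memory, and it does not handle robots.txt, rate limiting, retries, or JavaScript-rendered links.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkExtractor(HTMLParser):
    """Collects the href values of <a> tags in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def fetch_html(url):
    """Default fetcher: download one page with urllib (no retries, no robots.txt)."""
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl(start_url, fetch=fetch_html, max_pages=1000):
    """Breadth-first crawl restricted to the start URL's host.

    Returns the URLs that were fetched, in crawl order.
    """
    host = urlparse(start_url).netloc
    seen = {start_url}          # every URL ever queued, to avoid duplicates
    queue = deque([start_url])  # crawl frontier
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        try:
            html = fetch(url)
        except OSError:
            continue  # network errors (urllib's URLError is an OSError): skip the page
        visited.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.hrefs:
            # Resolve relative links and drop "#fragment" parts.
            link, _ = urldefrag(urljoin(url, href))
            # Same-host filter; also drops mailto:/javascript: links (empty netloc).
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return visited
```

The `fetch` parameter makes it easy to test against a fake site, e.g. `crawl("https://example.com/", fetch=pages.__getitem__)` with a dict of HTML strings. For sites in the millions of URLs, the in-memory `seen` set and queue are exactly what runs out first, which is why large crawls usually move them to disk or a database.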
So getting an idea of the structure of a website is not an easy thing, and like I said, I am very familiar with everything that can or needs to be done for this.