Critique of my method for finding all live URLs on a large site
Hello, can anyone please critique my method for locating every live page on a large website? I am working on a site with more than 3,000 pages, including product pages, and there has been little to no governance prior to this. There is no way to export a list of pages from our headless CMS, so I am pulling together every page GA found over the past year, every page GSC logged, and every page that shows up in a comprehensive Screaming Frog crawl using the GA4 and GSC APIs.

I am going section by section, pulling the list of pages in a given subfolder from each source into a spreadsheet, with each source's pages in its own column. I manually remove URLs with parameters, then use a script to identify the pages present in all three columns and the ones that are not (the exceptions).

I then run the list of exceptions through Screaming Frog to surface 404 and 500 errors. The ones that return a 200 status get added to the list of pages found in all three columns. At this stage I also spot-check the exception URLs for ones that would be correct once I manually fix typos. That is partly why I am going section by section rather than doing the entire site at once: the list stays small enough to spot-check, and I can keep an eye out for trends.

Once I am done, I can either spot-check against what is in our headless CMS or manually go folder by folder in the CMS to look for anything I missed. I imagine I will miss a few non-indexed or orphan pages, but this should get me close.

Any thoughts, suggestions, or constructive criticism appreciated! Thanks in advance.
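For reference, the comparison step works roughly like the sketch below. The file names and the one-URL-per-line export format are assumptions for illustration, not my exact script:

```python
# Rough sketch of the three-way comparison (illustrative only).
# Assumes each source is exported as one URL per line in a text file.
from urllib.parse import urlparse


def load_urls(path: str) -> set[str]:
    """Read one URL per line, skipping blanks and URLs with query parameters."""
    urls = set()
    with open(path, encoding="utf-8") as f:
        for line in f:
            url = line.strip()
            if not url:
                continue
            if urlparse(url).query:  # drop parameterized URLs
                continue
            urls.add(url.rstrip("/").lower())  # light normalization
    return urls


ga_urls = load_urls("ga_subfolder.txt")        # pages GA found in the past year
gsc_urls = load_urls("gsc_subfolder.txt")      # pages GSC logged
crawl_urls = load_urls("crawl_subfolder.txt")  # Screaming Frog crawl export

# Pages present in all three sources
confirmed = ga_urls & gsc_urls & crawl_urls

# Exceptions: pages in at least one source but not all three
exceptions = (ga_urls | gsc_urls | crawl_urls) - confirmed

print(f"{len(confirmed)} confirmed, {len(exceptions)} exceptions")
for url in sorted(exceptions):
    print(url)
```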