All my spiders used to take all the content from a website in a single visit, starting from the beginning every time.
It seems that the idea of remembering which URLs “produce” links to content is not such a bad one.
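Here is a minimal sketch of that idea in Python. `ProductiveUrlMemory` and the toy `site` dictionary are hypothetical. They stand in for a real crawler and real HTTP fetches. The sketch counts how many not-yet-seen links each page yields, so a later visit could start from the most productive pages instead of from the beginning.

```python
from collections import Counter


class ProductiveUrlMemory:
    """Hypothetical helper: remembers which pages yielded links to new content."""

    def __init__(self):
        self.yield_count = Counter()  # page URL -> number of new links found on it
        self.seen = set()             # links already collected on earlier pages

    def record(self, page_url, links):
        """Store the links found on page_url and return only the new ones."""
        # dict.fromkeys drops duplicates on the same page but keeps their order
        new_links = [link for link in dict.fromkeys(links) if link not in self.seen]
        self.seen.update(new_links)
        self.yield_count[page_url] += len(new_links)
        return new_links

    def seeds(self, n):
        """Return up to n pages that produced new links, most productive first."""
        return [url for url, count in self.yield_count.most_common(n) if count > 0]


# Toy site standing in for real HTTP fetches: page URL -> links found on it.
site = {
    "/": ["/about", "/news"],
    "/news": ["/news/1", "/news/2", "/news/3"],
    "/about": [],
}

memory = ProductiveUrlMemory()
for page, links in site.items():
    memory.record(page, links)

print(memory.seeds(2))  # ['/news', '/']
```

On the next visit the spider would fetch `/news` first, because it gave the most new links last time. Pages that gave nothing, like `/about` here, are never suggested as starting points.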
Here is what I found on diri.bg, a local Bulgarian search engine.
I see that diri.bg still hasn’t removed show_categories.php from its results, even though I have no links to this page.
Check the result here.
Oops. Google does the same: here
So how do you get rid of old pages without leaving “bad” links on the internet?
I will try adding show_categories.php to robots.txt and see what happens to this page.
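Here is a sketch of that experiment. It assumes the page sits at the site root, because the real path isn't given here. `example.com` and the user agent `MySpider` are placeholders. Python's standard `urllib.robotparser` reads the rule the way a well-behaved crawler would. Whether a search engine then drops the page from its index is up to the engine.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt content: ask every crawler to skip the old page.
# The path /show_categories.php assumes the page lives at the site root.
ROBOTS_TXT = """\
User-agent: *
Disallow: /show_categories.php
"""


def is_allowed(url, user_agent="MySpider"):
    """Return True if the rules above let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


print(is_allowed("http://example.com/show_categories.php"))  # False
print(is_allowed("http://example.com/index.php"))            # True
```

`Disallow` only asks crawlers not to fetch the page. It does not delete anything that has already been indexed.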