Changes to robots.txt?
Posted: Sun Mar 29, 2015 3:24 am
by manmanman (imported)
Until recently (about two weeks ago), it was possible to read old stories via archive.org. Since then, the only thing that comes up is "Page cannot be crawled or displayed due to robots.txt." Were there any recent changes to robots.txt that archive.org is now applying retroactively? If so, is it possible to revert them?
Thanks in advance.
Re: Changes to robots.txt?
Posted: Sun Mar 29, 2015 4:32 am
by kristoff
I don't understand. Are you searching in Google? A couple of weeks ago I removed a restriction and allowed Google to crawl us. If anything, that should have made searching easier. Please explain step by step what you are doing.
I just did some story searching and everything worked fine.
Re: Changes to robots.txt?
Posted: Mon Mar 30, 2015 1:39 am
by Skeith27 (imported)
I am searching on web.archive.org or clicking on links in this thread (
http://forums.eunuch.org/showthread.php ... ht=Stories)
and I always get the same error message:
manmanman (imported) wrote: Sun Mar 29, 2015 3:24 am
"Page cannot be crawled or displayed due to robots.txt. See www.eunuch.org robots.txt page. Learn more about robots.txt."
So either archive.org has a bug or your robots.txt is the cause?
Re: Changes to robots.txt?
Posted: Mon Mar 30, 2015 7:19 am
by fhunter
Skeith27 (imported) wrote: Mon Mar 30, 2015 1:39 am
So either archive.org has a bug or your robots.txt is the cause?
Probably archive.org. It applies robots.txt as it stands right now and does not keep a history of it, so even if something was crawlable at the time it was archived, it disappears once the owner changes robots.txt later.
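A minimal sketch of the behaviour described above, assuming the Wayback Machine checks each archived URL against the site's *current* robots.txt at viewing time rather than the one in force when the snapshot was taken. The robots.txt contents, the story URL, and the `ia_archiver` user-agent token are all assumptions for illustration; `snapshot_viewable` is a hypothetical helper, not an archive.org API. It uses Python's standard `urllib.robotparser`.

```python
from urllib.robotparser import RobotFileParser


def snapshot_viewable(current_robots_txt: str, url: str, agent: str = "ia_archiver") -> bool:
    """Return True if an archived copy of `url` would be shown, ASSUMING the
    archive consults only the robots.txt that is live today, regardless of
    what robots.txt said when the snapshot was captured."""
    parser = RobotFileParser()
    parser.parse(current_robots_txt.splitlines())  # parse text instead of fetching
    return parser.can_fetch(agent, url)


# Hypothetical robots.txt at capture time (nothing blocked) and today.
robots_then = "User-agent: *\nDisallow:\n"
robots_now = "User-agent: *\nDisallow: /Alpha/\n"

url = "http://www.eunuch.org/Alpha/example.htm"  # hypothetical story URL

print(snapshot_viewable(robots_then, url))  # True
print(snapshot_viewable(robots_now, url))   # False: the old snapshot "disappears"
```

The snapshot itself never changes in this model; only the rule it is checked against does, which matches the "it worked until two weeks ago" symptom.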
Re: Changes to robots.txt?
Posted: Mon Mar 30, 2015 8:33 am
by kristoff
You're clicking on old links that take you to the Wayback Machine, not the EA. That is old archived material on another system. They no longer archive us, so you are getting errors.
Re: Changes to robots.txt?
Posted: Mon Mar 30, 2015 9:04 am
by manmanman (imported)
Thanks for the quick reply, and also for confirming that there have been changes to robots.txt.
What did I do?
To read stories that didn't make it into the new archive, I used the Wayback Machine. The index pages are still visible (e.g.
https://web.archive.org/web/20110107120 ... al=stories), yet the stories themselves aren't visible anymore (e.g.
https://web.archive.org/web/20110107120 ... e_eunu.htm). The second link worked until two weeks ago.
My guess is that archive.org applies the current version of robots.txt even retroactively, which would make sense because it is a sort of search engine. I'm no expert on this file, but could the changes be preventing the old archive from being viewed in the Wayback Machine?
Re: Changes to robots.txt?
Posted: Mon Mar 30, 2015 3:07 pm
by kristoff
We have made no changes here at eunuch.org that would have caused an issue. Whatever is going on is on the Wayback Machine's side. I really don't even know what robots.txt is, except that it is causing an issue.
ADD: I just went and read a bunch about robots.txt. If that is what is stopping access to the Wayback Machine, it was placed there by them. We do not own or control the Wayback Machine. I suggest finding a way to contact them directly and ask.
Re: Changes to robots.txt?
Posted: Thu Apr 02, 2015 1:04 am
by manmanman (imported)
Thanks for taking the time to look at this issue.
I had a look at www.eunuch.org/robots.txt too, and I think the line "Disallow: /Alpha/" is what blocks the stories in the Wayback Machine.
Because the new archive is located at eunuchworld.org, removing this line shouldn't have any detrimental effect.
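To check the claim that this one line is the blocker, here is a sketch using Python's standard `urllib.robotparser`. Only `Disallow: /Alpha/` comes from the post; the `User-agent: *` group, the extra `/cgi-bin/` rule, the story path, and the `ia_archiver` agent token are hypothetical stand-ins, and `allowed`/`drop_rule` are illustrative helpers. It assumes, as the post implies, that the old story pages live under `/Alpha/`.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt: str, path: str, agent: str = "ia_archiver") -> bool:
    """Check whether `agent` may fetch `path` under the given robots.txt text."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp.can_fetch(agent, path)


def drop_rule(robots_txt: str, rule: str) -> str:
    """Return robots.txt with every line equal to `rule` removed
    (leading/trailing whitespace ignored); all other lines are kept."""
    kept = [line for line in robots_txt.splitlines() if line.strip() != rule]
    return "\n".join(kept) + "\n"


# Hypothetical robots.txt: only the /Alpha/ line is quoted in the thread.
robots = "User-agent: *\nDisallow: /Alpha/\nDisallow: /cgi-bin/\n"
story = "/Alpha/example.htm"  # hypothetical story path under /Alpha/

fixed = drop_rule(robots, "Disallow: /Alpha/")
print(allowed(robots, story))  # False: blocked by Disallow: /Alpha/
print(allowed(fixed, story))   # True: the story path is no longer matched
print(allowed(fixed, "/cgi-bin/x"))  # False: other rules still apply
```

Removing the single line unblocks everything under `/Alpha/` while leaving any other rules in the file untouched.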
Re: Changes to robots.txt?
Posted: Sat Apr 25, 2015 11:47 pm
by Skeith27 (imported)
Nothing has changed, has it? It would be really annoying if the reason is a simple robots.txt line. -.-
Re: Changes to robots.txt?
Posted: Wed Apr 29, 2015 3:26 pm
by jearns1985 (imported)
manmanman (imported) wrote: Thu Apr 02, 2015 1:04 am
Thanks for taking the time to look at this issue.
I had a look at www.eunuch.org/robots.txt too, and I think the line "Disallow: /Alpha/" is what blocks the stories in the Wayback Machine.
Because the new archive is located at eunuchworld.org, removing this line shouldn't have any detrimental effect.
This. I have a master's in tech. Please just fix the robots.txt. Until everything is ported over, you're just screwing loyal readers and writers. Please, it's not that hard to take out one line of code.