Changes to robots.txt?

manmanman (imported)
Articles: 0
Posts: 9
Joined: Thu Sep 20, 2012 7:48 am

Changes to robots.txt?

Post by manmanman (imported) »

Until recently (two weeks ago or so) it was possible to read old stories via archive.org. Since then the only thing coming up is "Page cannot be crawled or displayed due to robots.txt." Were there any recent changes to robots.txt which are now applied by archive.org retroactively? If so, is it possible to revert them?

Thanks in advance
kristoff
Articles: 0
Posts: 4756
Joined: Sat Sep 17, 2005 5:45 pm

Re: Changes to robots.txt?

Post by kristoff »

I don't understand. Are you searching in Google? A couple weeks ago I removed a restriction and allowed Google to crawl us. If anything, it should have made searching easier. Please explain step by step what you are doing.

I just did some story searching and everything worked fine.
Skeith27 (imported)
Articles: 0
Posts: 6
Joined: Sat Aug 16, 2014 7:45 am

Re: Changes to robots.txt?

Post by Skeith27 (imported) »

I am searching via web.archive.org or clicking on links in this thread (http://forums.eunuch.org/showthread.php ... ht=Stories)

And the same error message always comes up:
manmanman (imported) wrote: Sun Mar 29, 2015 3:24 am "Page cannot be crawled or displayed due to robots.txt.

See www.eunuch.org robots.txt page. Learn more about robots.txt."

So either archive.org has a bug or your robots.txt does?
fhunter
Site Admin
Articles: 0
Posts: 1634
Joined: Wed Nov 27, 2024 9:57 am
Location: Serbia
Has thanked: 57 times
Been thanked: 18 times

Re: Changes to robots.txt?

Post by fhunter »

Skeith27 (imported) wrote: Mon Mar 30, 2015 1:39 am I am searching via web.archive.org or clicking on links in this thread (http://forums.eunuch.org/showthread.php ... ht=Stories)

And the same error message always comes up:
manmanman (imported) wrote: Sun Mar 29, 2015 3:24 am "Page cannot be crawled or displayed due to robots.txt.

See www.eunuch.org robots.txt page. Learn more about robots.txt."

So either archive.org has a bug or your robots.txt does?

It is probably archive.org. It seems to apply whatever robots.txt is in place right now and does not keep a history of it, so even if something was indexable when it was captured, it disappears once the owner changes robots.txt later.
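
To make that concrete, here is a minimal Python sketch of the idea. It is not the Wayback Machine's actual code. It assumes playback is gated by the site's current robots.txt rather than by the version that was live when the page was captured. The two robots.txt bodies, the example.org URL, and the function names are made up for illustration, and `ia_archiver` is assumed to be the user agent the archive checks against.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt as it looked when the page was captured:
# an empty Disallow means nothing is blocked.
ROBOTS_AT_CAPTURE = "User-agent: *\nDisallow:\n"

# Hypothetical robots.txt as it looks today: one directory is blocked.
ROBOTS_TODAY = "User-agent: *\nDisallow: /stories/\n"

# Hypothetical URL of an old capture.
CAPTURED_URL = "http://example.org/stories/old.htm"


def is_displayed(retroactive: bool) -> bool:
    """Decide whether an old capture is shown.

    retroactive=True  -> today's rules are applied to old captures
    retroactive=False -> the rules from capture time are applied
    """
    rules = ROBOTS_TODAY if retroactive else ROBOTS_AT_CAPTURE
    return allowed(rules, "ia_archiver", CAPTURED_URL)
```

Under these assumptions, `is_displayed(retroactive=True)` hides a capture that `is_displayed(retroactive=False)` would still show, which matches what is being reported in this thread.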
kristoff
Articles: 0
Posts: 4756
Joined: Sat Sep 17, 2005 5:45 pm

Re: Changes to robots.txt?

Post by kristoff »

You're clicking on old links that take you to the Wayback Machine, not the EA. That is old archived material on another system. They no longer archive us, so you are getting errors.
manmanman (imported)
Articles: 0
Posts: 9
Joined: Thu Sep 20, 2012 7:48 am

Re: Changes to robots.txt?

Post by manmanman (imported) »

Thanks for the quick reply and also for confirming that there have been changes to robots.txt.

What did I do?

To read stories that didn't make it into the new archive I used the wayback machine. The index pages are still visible (e.g. https://web.archive.org/web/20110107120 ... al=stories), yet the stories themselves aren't visible anymore (e.g. https://web.archive.org/web/20110107120 ... e_eunu.htm). The second link worked until two weeks ago.

My guess is that archive.org respects robots.txt in its current incarnation even retroactively, which would make sense because they are sort of a search engine. I'm no expert when it comes to this file, but could it be that the changes bar the old archive from being viewed in the wayback machine?
kristoff
Articles: 0
Posts: 4756
Joined: Sat Sep 17, 2005 5:45 pm

Re: Changes to robots.txt?

Post by kristoff »

manmanman (imported) wrote: Mon Mar 30, 2015 9:04 am Thanks for the quick reply and also for confirming that there have been changes to robots.txt.

What did I do?

To read stories that didn't make it into the new archive I used the wayback machine. The index pages are still visible (e.g. https://web.archive.org/web/20110107120 ... al=stories), yet the stories themselves aren't visible anymore (e.g. https://web.archive.org/web/20110107120 ... e_eunu.htm). The second link worked until two weeks ago.

My guess is that archive.org respects robots.txt in its current incarnation even retroactively, which would make sense because they are sort of a search engine. I'm no expert when it comes to this file, but could it be that the changes bar the old archive from being viewed in the wayback machine?

We have made no changes here at eunuch.org that would have caused an issue. Whatever is going on is on the Wayback Machine's side. I really don't even know what robots.txt is, except that it is causing a problem.

ADD: I just went and read a bunch about robots.txt. If that is what is stopping access to the Wayback Machine, it was placed there by them. We do not own or control the Wayback Machine. I suggest finding a way to contact them directly to inquire.
manmanman (imported)
Articles: 0
Posts: 9
Joined: Thu Sep 20, 2012 7:48 am

Re: Changes to robots.txt?

Post by manmanman (imported) »

Thanks for taking the time to look at this issue.

I had a look at www.eunuch.org/robots.txt, too, and I think it is the line "Disallow: /Alpha/" that blocks the stories in the Wayback machine.

Because the new archive is located at eunuchworld.org, removing this line shouldn't have any detrimental effect.
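
Here is a rough Python sketch of why I suspect that line. It is only an illustration, built on assumptions. I am assuming the rule sits under `User-agent: *`, that the archive checks as `ia_archiver`, and that the old stories live under /Alpha/. The two sample paths and the function name are invented for the example.

```python
from urllib.robotparser import RobotFileParser


def blocked_paths(robots_txt, paths, user_agent="ia_archiver"):
    """Return the subset of paths that robots_txt forbids for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [p for p in paths if not parser.can_fetch(user_agent, p)]


# Assumed shape of the current file: only the quoted Disallow line is
# taken from the real robots.txt, the rest is a guess.
WITH_RULE = """User-agent: *
Disallow: /Alpha/
"""

# The same file with the rule removed (an empty Disallow blocks nothing).
WITHOUT_RULE = """User-agent: *
Disallow:
"""

# Hypothetical paths: one story under /Alpha/ and one index page.
PATHS = ["/Alpha/a/story.htm", "/index.php?story=1"]
```

With these inputs, `blocked_paths(WITH_RULE, PATHS)` flags only the path under /Alpha/, while `blocked_paths(WITHOUT_RULE, PATHS)` flags nothing. Disallow rules match by path prefix, which would explain why the index pages still show up but the stories do not.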
Skeith27 (imported)
Articles: 0
Posts: 6
Joined: Sat Aug 16, 2014 7:45 am

Re: Changes to robots.txt?

Post by Skeith27 (imported) »

Nothing has changed, has it? It would be really annoying if the cause is a single robots.txt line. -.-
jearns1985 (imported)
Articles: 0
Posts: 26
Joined: Tue Jan 20, 2009 2:18 pm

Re: Changes to robots.txt?

Post by jearns1985 (imported) »

manmanman (imported) wrote: Thu Apr 02, 2015 1:04 am Thanks for taking the time to look at this issue.

I had a look at www.eunuch.org/robots.txt, too, and I think it is the line "Disallow: /Alpha/" that blocks the stories in the Wayback machine.

Because the new archive is located at eunuchworld.org, removing this line shouldn't have any detrimental effect.

This. I have a master's in tech. Please, just fix the robots.txt. Until everything is ported over, you're just screwing loyal readers and writers. Please, it's not that hard to take out one line of code.