Robots.txt / Internet Archive Wayback Machine
Posted: Tue Mar 03, 2015 3:22 am
by GreyPhoenix (imported)
I'm not sure whether something changed here or at the IAWM, but I can no longer look up old stories over there. It was working as recently as last week or so.
"
Page cannot be crawled or displayed due to robots.txt.
See www.eunuch.org robots.txt page. Learn more about robots.txt.
"
The robots.txt says:
"
User-agent: *
Disallow:
Disallow: /cgi-bin/
Disallow: /Alpha/
Disallow: /forums/
Disallow: /personals/
"
If it's on this end, I believe the issue is the "Disallow: /Alpha/" line, as that is the path the old stories were under.
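For anyone who wants to check this, here is a minimal sketch of how those rules read when treated as plain path prefixes. It is a simplification, not how archive.org evaluates the file: it ignores User-agent grouping and skips the empty "Disallow:" line. The helper names and the story path are made up for illustration.

```python
# The robots.txt content quoted above.
ROBOTS_TXT = """\
User-agent: *
Disallow:
Disallow: /cgi-bin/
Disallow: /Alpha/
Disallow: /forums/
Disallow: /personals/
"""


def disallowed_prefixes(robots_txt):
    """Collect non-empty Disallow values (simplified: no User-agent grouping)."""
    prefixes = []
    for line in robots_txt.splitlines():
        field, _, value = line.partition(":")
        if field.strip().lower() == "disallow" and value.strip():
            prefixes.append(value.strip())
    return prefixes


def is_blocked(path, robots_txt=ROBOTS_TXT):
    """True if the path starts with any disallowed prefix."""
    return any(path.startswith(p) for p in disallowed_prefixes(robots_txt))


# "/Alpha/story.html" is a hypothetical path, not a real story URL.
print(is_blocked("/Alpha/story.html"))
print(is_blocked("/index.html"))
```

Under these assumptions, anything beneath /Alpha/ counts as blocked while the site root does not, which matches what I'm seeing.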
Re: Robots.txt / Internet Archive Wayback Machine
Posted: Tue Mar 03, 2015 8:52 pm
by Losethem (imported)
Just checked, I got the same problem.
My guess is one of two things has happened:
1. Archive.org (not to be confused with eunuch.org) has decided to only archive pages down to a certain level. Perhaps they have hit their bandwidth/storage limit.
2. Someone at Archive.org saw the content here and banned everything beyond the links that lead into the stories.
The only way to fix it is to wait on them, or hope people volunteer to help out the folks here and get the rest of the old stories back online. I'm busy working on fixing problems with my own site, otherwise I'd chip in.
--LT
Re: Robots.txt / Internet Archive Wayback Machine
Posted: Wed Dec 16, 2015 10:46 am
by ttswitch (imported)
Just wanted to comment on this. Archive.org now follows the policy that if a section of a website is disallowed in the current robots.txt, it retroactively blocks all of their old archives of that section. So because the current
http://www.eunuch.org/robots.txt has the line that disallows /Alpha/, none of the old stories can be accessed through archive.org.
Kind of a poor policy for them, since if a website changes ownership the new owners can block the content of the old owners in the archive, but I imagine it's to avoid lawsuits. If that "Disallow: /Alpha/" line could be removed from the current robots.txt, it should let archive.org start showing the old archives again. It wouldn't affect the current site, since /Alpha no longer exists on eunuch.org.
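To illustrate, here is a toy model of that retroactive behavior as I understand it. It is only a sketch under my own assumptions: the function name and the archived paths are hypothetical, and archive.org's real logic is certainly more involved than a prefix check.

```python
# Disallow prefixes from the current robots.txt quoted in this thread.
current_rules = ["/cgi-bin/", "/Alpha/", "/forums/", "/personals/"]

# Hypothetical archived paths, for illustration only.
archived = ["/Alpha/old_story.html", "/index.html"]


def viewable(paths, rules):
    """Toy model: a snapshot is hidden if today's rules block its path."""
    return [p for p in paths if not any(p.startswith(r) for r in rules)]


# The same rules with the /Alpha/ line removed.
without_alpha = [r for r in current_rules if r != "/Alpha/"]

print(viewable(archived, current_rules))  # ['/index.html']
print(viewable(archived, without_alpha))  # ['/Alpha/old_story.html', '/index.html']
```

In this model, dropping the one line is enough to make the old /Alpha/ snapshots viewable again, and nothing else changes.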
Re: Robots.txt / Internet Archive Wayback Machine
Posted: Mon Jun 06, 2016 1:13 am
by SammyTissues (imported)
Why is "Disallow: /Alpha/" still in robots.txt?
Re: Robots.txt / Internet Archive Wayback Machine
Posted: Mon Jun 06, 2016 8:03 am
by kristoff
When you find out, let us know. Eunuch.org has nothing whatsoever to do with the wayback machine.
Re: Robots.txt / Internet Archive Wayback Machine
Posted: Mon Jun 06, 2016 1:10 pm
by Hamburger (imported)
kristoff wrote: Mon Jun 06, 2016 8:03 am
Eunuch.org has nothing whatsoever to do with the wayback machine.
That doesn't seem to be the point.
Current list:
GreyPhoenix (imported) wrote: Tue Mar 03, 2015 3:22 am
User-agent: *
Disallow:
Disallow: /cgi-bin/
Disallow: /Alpha/
Disallow: /forums/
Disallow: /personals/
Re: Robots.txt / Internet Archive Wayback Machine
Posted: Mon Jun 06, 2016 1:11 pm
by fhunter
Because no one really understands what it is, or cares to fix it. Also, probably, liability limitation by eunuch.org/moderators (which recently kind of skyrocketed; that is my feeling, nothing more, but it is here).
kristoff wrote: Mon Jun 06, 2016 8:03 am
When you find out, let us know. Eunuch.org has nothing whatsoever to do with the wayback machine.
It does. The robots.txt in question is located on eunuch.org, as you can check from this link:
Per the specification, a robots.txt file on a web server tells search engines and other crawlers (the Wayback Machine being one example) not to index certain pages. See here:
https://en.wikipedia.org/wiki/Robots_exclusion_standard for details.
In short, the Wayback Machine, for some reason, applies only the current version of robots.txt instead of the version that was in effect at the time of each capture.
See here:
https://archive.org/post/406632/why-doe ... -robotstxt
and here:
https://archive.org/post/1019415/retroa ... ive-policy
PS. There is no reason to keep the /Alpha/ line in the robots.txt, because there is no longer anything there. Check for yourself:
http://www.eunuch.org/Alpha/