I'm not sure if this is something that got changed here or something that got changed at the IAWM but I can no longer look for old stories over there. It was working just last week or so.
"
Page cannot be crawled or displayed due to robots.txt.
See www.eunuch.org robots.txt page. Learn more about robots.txt.
"
The robots.txt says:
"
User-agent: *
Disallow:
Disallow: /cgi-bin/
Disallow: /Alpha/
Disallow: /forums/
Disallow: /personals/
"
If it's on this end, I believe the issue is the "Disallow: /Alpha/" line, as that is what the old stories were under.
Robots.txt / Internet Archive Wayback Machine
-
GreyPhoenix (imported)
- Articles: 0
- Posts: 1
- Joined: Tue Mar 03, 2015 3:22 am
-
Losethem (imported)
- Articles: 0
- Posts: 3342
- Joined: Tue Dec 25, 2001 9:01 am
Re: Robots.txt / Internet Archive Wayback Machine
Just checked, I got the same problem.
My guess is one of two things has happened. 1. Archive.org (not to be confused with eunuch.org) has decided to only archive pages down to a certain level. Perhaps they have hit their bandwidth/storage wall. 2. Someone at Archive.org saw the content here and banned it beyond the links to get into the stories.
The only way to fix it is to wait on them, or hope people volunteer to help out the folks here and get the rest of the old stories back online. I'm busy working on fixing problems with my own site, otherwise I'd chip in.
--LT
-
ttswitch (imported)
- Articles: 0
- Posts: 8
- Joined: Wed Dec 16, 2015 10:46 am
Re: Robots.txt / Internet Archive Wayback Machine
Just wanted to comment on this: archive.org now follows the policy that if a section of a website is disallowed in the current robots.txt, it retroactively blocks all of their old archives of that section. So because the current http://www.eunuch.org/robots.txt has the line that disallows /Alpha/, none of the old stories can be accessed through archive.org.
Kind of a poor policy for them, since if a website changes ownership the new owners can block the content of the old owners in the archive, but I imagine it's to avoid lawsuits. If that "Disallow: /Alpha/" line could be removed from the current robots.txt, it should let archive.org start showing the old archives again. It wouldn't affect the current site, since /Alpha no longer exists on eunuch.org.
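For anyone who wants to see the effect of that one line, here is a rough sketch in Python. It is not how archive.org actually implements its check; it assumes simple prefix matching on the "Disallow:" lines under "User-agent: *" and ignores everything else (Allow lines, wildcards, other user-agent groups). The function names and the path /Alpha/story.html are made up for illustration.

```python
def disallowed_prefixes(robots_txt):
    """Collect the non-empty Disallow paths listed under "User-agent: *"."""
    prefixes = []
    applies = False
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            applies = (value == "*")
        elif field == "disallow" and applies and value:
            # An empty "Disallow:" blocks nothing, so it is skipped.
            prefixes.append(value)
    return prefixes


def is_blocked(path, robots_txt):
    """Simplified check: blocked if the path starts with a Disallow prefix."""
    return any(path.startswith(p) for p in disallowed_prefixes(robots_txt))


# The robots.txt quoted at the top of this thread.
CURRENT = """\
User-agent: *
Disallow:
Disallow: /cgi-bin/
Disallow: /Alpha/
Disallow: /forums/
Disallow: /personals/
"""

# The same file with only the /Alpha/ line removed.
WITHOUT_ALPHA = "\n".join(
    line for line in CURRENT.splitlines() if line.strip() != "Disallow: /Alpha/"
)

print(is_blocked("/Alpha/story.html", CURRENT))        # True
print(is_blocked("/Alpha/story.html", WITHOUT_ALPHA))  # False
print(is_blocked("/forums/index.php", WITHOUT_ALPHA))  # True
```

Under these assumptions, dropping the one line unblocks /Alpha/ while /cgi-bin/, /forums/ and /personals/ stay blocked.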
-
SammyTissues (imported)
- Articles: 0
- Posts: 1
- Joined: Mon Jun 06, 2016 1:13 am
Re: Robots.txt / Internet Archive Wayback Machine
SammyTissues (imported) wrote: Mon Jun 06, 2016 1:13 am Why is "Disallow: /Alpha/" still in robot.txt?
When you find out, let us know. Eunuch.org has nothing whatsoever to do with the wayback machine.
-
Hamburger (imported)
- Articles: 0
- Posts: 26
- Joined: Fri Jan 15, 2016 12:42 pm
Re: Robots.txt / Internet Archive Wayback Machine
kristoff wrote: Mon Jun 06, 2016 8:03 am Eunuch.org has nothing whatsoever to do with the wayback machine.
It doesn't seem to be the point.
Current list:
GreyPhoenix (imported) wrote: Tue Mar 03, 2015 3:22 am User-agent: *
Disallow:
Disallow: /cgi-bin/
Disallow: /Alpha/
Disallow: /forums/
Disallow: /personals/
-
fhunter
- Site Admin
- Articles: 0
- Posts: 1634
- Joined: Wed Nov 27, 2024 9:57 am
- Location: Serbia
- Has thanked: 57 times
- Been thanked: 18 times
Re: Robots.txt / Internet Archive Wayback Machine
SammyTissues (imported) wrote: Mon Jun 06, 2016 1:13 am Why is "Disallow: /Alpha/" still in robot.txt?
Because no one really understands what it is, or cares to fix it. Also, probably, liability limitation by eunuch.org/moderators (which recently kind of skyrocketed - that is my feeling, nothing more, but it is here).
kristoff wrote: Mon Jun 06, 2016 8:03 am When you find out, let us know. Eunuch.org has nothing whatsoever to do with the wayback machine.
It does. The robots.txt in question is located on eunuch.org, as you can check from this link:
As per the specification, a robots.txt file on a web server tells crawlers (of which the Wayback Machine is one example) not to index certain pages. See here: https://en.wikipedia.org/wiki/Robots_exclusion_standard for details.
In short - the Wayback Machine, for some reason, applies only the current version of robots.txt, instead of the version that was in effect when each page was archived.
See here:
https://archive.org/post/406632/why-doe ... -robotstxt
and here:
https://archive.org/post/1019415/retroa ... ive-policy
PS. There is no reason to keep the /Alpha/ line in the robots.txt, because there is no longer anything there. Check for yourself: http://www.eunuch.org/Alpha/