User:Purple Wrench/Homestar Offsite
From Homestar Runner Wiki
Revision as of 22:38, 29 June 2018
Homestar Offsite is currently incomplete. Pending changes:
Here you will find an HTML list of all files on the various official Homestar Runner websites. This allows you to download each entire website with minimal effort while retaining as much of the browsing experience as possible. For example, this would make it possible to put homestarrunner.com on a CD-ROM for offline use. This page has a similar purpose to User:Nerd42/List and is equally compatible with DownThemAll, but has a much larger scope, including all secret pages (and a few which the wiki has not yet documented).
Please read the instructions before attempting to use these lists.
Instructions
- Using Firefox, go to the DownThemAll add-on page and download it. (Update: This version is not compatible with Firefox Quantum as of 14 Jan 2018, though it appears to be coming soon. There is now also a Google Chrome version available although I have not yet tried using it, so the interface may differ somewhat.)
- Customize the Firefox toolbar to include a DownThemAll button.
- Go to one of the website links below and copy the HTML code you see inside the box. NOTE: Do NOT view the source page, as it uses templates to display the code.
- Paste the HTML code into a text editor and save it as a .html file on your computer. NOTE: If you have a text editor that only supports 64KB files or less, you will need to download a better text editor. This will most likely not be a problem for most users.
- Open the .html file in Firefox and click on the DownThemAll button.
- In the "Save Files In:" section, choose a directory where you want to save the files.
- In the "Filters" section, uncheck everything and then check "All Files".
- Make sure nothing is selected in the "Fast Filtering" section.
- In the "Renaming Mask:" section, clear the text box and paste the following (without quotes):
  - If you have Windows, use "*subdirs*\*name*.*ext*"
  - If you have Mac or Linux, use "*subdirs*/*name*.*ext*"
- Click "Start!" and wait a few minutes.
- Eventually all files should be downloaded in the directory you specified. Files in subdirectories on the website will now be in subfolders of the directory.
- Open the directory you specified and open "index.html" to start browsing.*
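To illustrate what the renaming mask in the steps above accomplishes, here is a minimal Python sketch that maps a URL to the relative path it would be saved under, keeping the website's subdirectories intact. This is not DownThemAll's own code; the function name `local_path`, the example URL, and the choice to save a bare site root as "index.html" are assumptions made for illustration.

```python
from pathlib import PurePosixPath, PureWindowsPath
from urllib.parse import unquote, urlsplit


def local_path(url, windows=False):
    """Return the relative save path for a URL, mirroring its subdirectories."""
    # Split the URL path into its directory and file name components.
    parts = [unquote(p) for p in urlsplit(url).path.split("/") if p]
    if not parts:
        # Assumption: a bare site root is saved as index.html.
        parts = ["index.html"]
    # Use backslashes on Windows and forward slashes elsewhere,
    # matching the two renaming masks given in the instructions.
    path_cls = PureWindowsPath if windows else PurePosixPath
    return str(path_cls(*parts))


# Hypothetical URL, used only to show the mapping.
print(local_path("http://www.homestarrunner.com/a/b/file.swf"))
print(local_path("http://www.homestarrunner.com/a/b/file.swf", windows=True))
```

The point of the mask is the same as the point of this function: a file in a subdirectory on the website ends up in the matching subfolder locally, so relative links between pages keep working.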
Issues
If you find any issues* when using these lists, please discuss them on the talk page. If the issue can be confirmed it will be posted here.
- All the NavBar links, except for "main", "rando" and "youtube", redirect to homestarrunner.com instead of the local directory. This means that, from the main pages, it is impossible to directly access the local copy of Legal. (Likewise, "store" redirects to a blank page on homestarrunner.com that then redirects to the current store website.) Oddly enough, this problem does not occur in the NavBar buttons found within virus. Therefore, Legal is not an orphaned page entirely.
- Flash and JavaScript are used a lot in these pages. Sometimes Flash links or JavaScript links will be blocked by the browser.* For example, I was not able to get the pop-up in crazy cartoon for the AIM icons to appear.
- The favicon does not work.*
- I have "flagged" a few files which are not included in the lists below. On homestarrunner.com, they include a low-level file that is often blocked by web browsers; the regionally-visible files from April Fools' 2006; and the original version of Who Said What Now?. None of these are actively linked to on the website. In addition, some secret files may return a 404 and therefore will not download (such as "Default.htm" and "theme.ZIP", which is an exact copy of the theme song download for Windows), but none of them are actively linked to either.
- Some CSS and JS files and code, especially from late 2017 on, may not link properly or work at all. This could break page functionality or degrade the page's appearance.*
- Links (Videlectrix, the Email Processing Room link) and embedded files (YouTube videos) that are stored outside of homestarrunner.com are not downloaded. Whether these links are functional depends on your network connection and whether the external sites are still functional and honor those links.
*To those of you with some programming/network experience: It appears to be possible to mitigate some of these issues by creating a local static server and setting its root directory to the directory you specified. This can be relatively simple to do; Python, for example, has this functionality built in as a very short, simple command. I have tried this out and, in general, it does solve these problems, although the experience certainly isn't perfect. With a slightly more involved setup, the server could return "404error.html" as its 404 error page for that extra bit of authenticity. However, a task of this nature goes beyond the scope of this wiki page and the steps involved can vary significantly depending on your computer/device and network setup, not to mention which software/programming language/application you are using to do this. Further, it is possible to modify the hosts file on your computer to map "www.homestarrunner.com" to your local server, which will solve a few more problems, but this is not recommended due to security concerns.
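For those who want to try the local static server, the simplest form in Python 3.7 or later is the built-in command `python -m http.server 8000 --directory <your directory>`. The "slightly more involved setup" that returns "404error.html" could look something like the sketch below. It is only one possible approach, assuming Python 3.7 or later and assuming "404error.html" sits in the root of the download directory; the names `OffsiteHandler` and `make_server` and the port number are made up for this example.

```python
import functools
import http.server
from pathlib import Path


class OffsiteHandler(http.server.SimpleHTTPRequestHandler):
    """Serve files from a directory; answer missing paths with 404error.html."""

    def send_error(self, code, message=None, explain=None):
        # Assumption: the custom error page lives in the served root directory.
        page = Path(self.directory) / "404error.html"
        if code == 404 and page.is_file():
            body = page.read_bytes()
            self.send_response(404)
            self.send_header("Content-Type", "text/html")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            if self.command != "HEAD":
                self.wfile.write(body)
        else:
            # Any other error (or no custom page) uses the default behavior.
            super().send_error(code, message, explain)


def make_server(root, port=8000):
    """Build a server bound to this computer only (127.0.0.1)."""
    handler = functools.partial(OffsiteHandler, directory=str(root))
    return http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)
```

To run it, call `make_server("path/to/your/directory").serve_forever()` and browse to http://127.0.0.1:8000/index.html. Binding to 127.0.0.1 keeps the server reachable from your own computer only.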
Websites Listed
- www.homestarrunner.com - list is 210KB, site is approx. 640MB, updated 14 Jan 2018
- podstar.homestarrunner.com - will be added if there is enough interest
- trogdorboardgame.homestarrunner.com - will be added if there is enough interest (and if it remains after the board game is released)
- www.videlectrix.com - probably won't be added, as most of the content has been moved to homestarrunner.com proper.
  - homestarrunner.com/videlectrix - may be added
- www.thoraxcorp.com - will be added if there is enough interest
- www.homestarrunnerstore.com - site closed before I had the chance; please see archive.org instead
How I Did This
It's not easy to find a surefire way to identify every file on a website. Sitemap generators are great when the site predominantly uses HTML links, but homestarrunner.com's links are embedded almost exclusively in Flash files. As a result, most sitemap generators stop at the Index Page. Google's search results provide a better solution, but it's difficult to pull those results into a complete list (even though Google supposedly allows that to be done using the Spreadsheets app).
The best solution, if you're willing to use outdated results, is archive.org. I typed in "http://www.homestarrunner.com/*", which retrieves all files found in the www hostname of homestarrunner.com. The archive returned over 10,000 different URLs. I saved the source code of the page to a file (which was 3MB alone) and searched it for text enclosed by "<a href= ... >" and "</a>", which returned the actual URLs rather than their archive.org mirrors.
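The extraction step described above can be sketched in Python as follows. This is not the method I used at the time, just an illustration of the same idea: collect the text between each "<a href= ... >" and "</a>", on the assumption that the link text is the original URL while the href points at the archive.org mirror. The class and function names, and the sample markup at the bottom, are hypothetical.

```python
from html.parser import HTMLParser


class AnchorTextParser(HTMLParser):
    """Collect the visible text of every <a href=...> element."""

    def __init__(self):
        super().__init__()
        self.urls = []
        self._buf = None  # text pieces of the link currently being read

    def handle_starttag(self, tag, attrs):
        # Only anchors that carry an href count as links.
        if tag == "a" and dict(attrs).get("href"):
            self._buf = []

    def handle_data(self, data):
        if self._buf is not None:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._buf is not None:
            text = "".join(self._buf).strip()
            if text:
                self.urls.append(text)
            self._buf = None


def original_urls(html):
    """Return the link texts found in a saved results page, in order."""
    parser = AnchorTextParser()
    parser.feed(html)
    parser.close()
    return parser.urls


# Hypothetical markup, standing in for one row of the saved page.
sample = '<a href="/web/*/http://www.homestarrunner.com/a.html">http://www.homestarrunner.com/a.html</a>'
print(original_urls(sample))
```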
About 4,000 of the URLs were obviously wrong, so I eliminated them. I pasted the rest into an Excel spreadsheet and wrote a macro to determine whether each one was a valid URL.
Under normal circumstances, an HTTP request for a URL that does not exist returns status code 404. However, homestarrunner.com serves a custom 404 page whenever an invalid URL is requested, and that page is itself a valid response, so the request returns 200 like every other page. Instead of checking the HTTP status code, the macro checks whether the text "404error.swf" is found in the response, which means the page must be invalid. Non-HTML files were treated as an exceptional occurrence, which made the macro very sluggish. That said, it revealed that, out of the remaining 6,000 URLs, about 1 in 3 were valid.
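The original check was an Excel macro, which is not reproduced here. As a rough Python equivalent of the same idea, the sketch below treats a response as invalid when its body mentions "404error.swf", regardless of the status code. The function names and the timeout value are assumptions for illustration.

```python
import urllib.request

# The custom error page embeds this file, so its name marks a missing page.
MARKER = b"404error.swf"


def is_soft_404(body):
    """Return True if the response body is the site's custom error page."""
    return MARKER in body


def is_valid_url(url, timeout=30.0):
    """Fetch a URL and report whether it points at real content."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            # A 200 status is not enough here; the body has to be inspected.
            return not is_soft_404(resp.read())
    except OSError:
        # Covers genuine HTTP errors, DNS failures and timeouts.
        return False
```

Note that this reads every file in full, including large non-HTML files, so it would be slow for much the same reason the macro was.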
I then took all of the valid URLs and, by hand, labeled them by name (or category, in cases like Menu Previews and Fan Stuff) and extension. This allows them to be organized in a more meaningful manner, and allows new URLs to be added much more easily.
I converted the spreadsheet to a plain text document, and wrote a quick program to format its contents using HTML code. The program also gave each name/category its own heading, allowing multiple files with the same name to be grouped together. As I said previously, this makes it easier to add new URLs, and I quickly realized that when I tried downloading the site and found that the 2014 Toons had no Menu Previews. Manually adding the four remaining menu previews was only as difficult as finding their names and putting them in their correct order alphabetically under "Menu Previews".
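The formatting program itself is not shown here, but the grouping idea can be sketched in a few lines of Python. The input layout (pairs of name and URL), the `<h2>` headings and the `<br>` line breaks are assumptions, not the actual markup of the lists, and the URLs in the example are placeholders.

```python
from collections import defaultdict
from html import escape


def build_list(rows):
    """Turn (name, url) pairs into HTML with one heading per name."""
    groups = defaultdict(list)
    for name, url in rows:
        groups[name].append(url)

    lines = []
    # Headings and the links under them are both kept in alphabetical order.
    for name in sorted(groups, key=str.lower):
        lines.append("<h2>%s</h2>" % escape(name))
        for url in sorted(groups[name]):
            lines.append('<a href="%s">%s</a><br>' % (escape(url, quote=True), escape(url)))
    return "\n".join(lines)


# Placeholder rows; the real spreadsheet had one row per valid URL.
rows = [
    ("Menu Previews", "http://example.com/b.swf"),
    ("Intro", "http://example.com/intro.html"),
    ("Menu Previews", "http://example.com/a.swf"),
]
print(build_list(rows))
```

Because the links are grouped under headings, adding a missing file later only means appending one more row (or one more line under the right heading).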
Enjoy! -- ■■ PURPLE WRENCH ■■ 19:21, 22 February 2015 (UTC)
