This is cross-posted from the Muhlenberg College Digital Learning blog (https://diglearn.bergbuilds.domains).
This past June, I attended the Reclaim Roadshow, a two-day staff development opportunity for folks who manage Domain of One’s Own at a college or university. Reclaim Hosting always puts on great conferences and workshops. Over the two days I attended many remote presentations and breakouts full of useful how-to information and some great discussion of digital learning and digital pedagogy.
We did much of our engaging in the Discord application, which allowed me to work backward through breakouts I wasn’t able to attend. One of the livelier conversations concerned how Domains institutions like Muhlenberg might handle short-, medium-, and long-term preservation and archiving of student and faculty work on Domains. WordPress is far and away the most used application on Berg Builds at Muhlenberg, and we’re not unusual. But WordPress presents a particularly thorny question: how do you archive and preserve websites built with WordPress that depend upon a database and have a lot of moving parts, so to speak, regarding software and software dependencies?
WordPress is, among other things, a dynamically generated web content management application that employs a relational database. So, to archive a site and make it available in the future, one may need to archive the entire operating environment: the operating system, the relational database management system, the WordPress software itself, and of course the content and configuration of the particular site. This is a pretty complicated thing to do with any assurance that it will all work as desired five or more years into the future. This scenario also demands a sizable amount of storage space, and the resulting archival object is complex and full of dependencies.
Let’s assume a future reader or visitor to an archived Domain site is primarily interested in the content of the site, and may be willing to sacrifice the digital or technological gestalt of engaging with a dynamic CMS like WordPress. A better approach might be to generate a static version of the WordPress site and store that for future access. What do I mean exactly? Well, essentially every access of a WordPress page or post is a call to a database that, in that moment, generates a version of that page for your browser. Visiting a single WordPress link might involve several queries across multiple database tables to pull words and images from associated records in those tables. What looks like a standard web page to visitors is actually a bunch of chunks of content stored within tables and served up when requested. In fact, you may have seen some WordPress posts identified by record IDs in the URLs themselves.
By contrast, the files in a static version of a site are:
- nearly human-readable as markup files
- much smaller with respect to storage of bytes, and compressible
- organized hierarchically in nested folders
- rendered more simply (maybe via a single index.html file in a future browser that works on old web pages)
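To make the dynamic-versus-static distinction concrete, here is a minimal Python sketch. It is not how WordPress works internally: the `posts` table, its columns, and all of the function names are made up for illustration, and an in-memory SQLite database stands in for the real thing. `render_post` assembles a page from a database row each time it is asked, while `export_static` runs that step once per post and saves the result as nested folders of index.html files.

```python
import sqlite3
import tempfile
from html import escape
from pathlib import Path


def build_demo_db():
    """Create a tiny in-memory database standing in for a CMS (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE posts (id INTEGER PRIMARY KEY, slug TEXT, title TEXT, body TEXT)"
    )
    conn.executemany(
        "INSERT INTO posts (slug, title, body) VALUES (?, ?, ?)",
        [
            ("hello-world", "Hello World", "My first post."),
            ("archiving", "Archiving", "Notes on preservation."),
        ],
    )
    return conn


def render_post(conn, slug):
    """Dynamic approach: query the database and assemble the HTML on request."""
    row = conn.execute(
        "SELECT title, body FROM posts WHERE slug = ?", (slug,)
    ).fetchone()
    if row is None:
        raise KeyError(slug)
    title, body = row
    return (
        f"<html><head><title>{escape(title)}</title></head>"
        f"<body><h1>{escape(title)}</h1><p>{escape(body)}</p></body></html>"
    )


def export_static(conn, out_dir):
    """Static approach: render every post once and save it as <slug>/index.html."""
    out_dir = Path(out_dir)
    written = []
    slugs = [row[0] for row in conn.execute("SELECT slug FROM posts ORDER BY id")]
    for slug in slugs:
        page = out_dir / slug / "index.html"
        page.parent.mkdir(parents=True, exist_ok=True)
        page.write_text(render_post(conn, slug), encoding="utf-8")
        written.append(page)
    return written


if __name__ == "__main__":
    # Write the demo site into a throwaway folder and list what was produced.
    with tempfile.TemporaryDirectory() as tmp:
        for page in export_static(build_demo_db(), tmp):
            print(page.relative_to(tmp))
```

Once the export has run, the database is no longer needed to read the pages, which is the whole appeal for archiving.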
These were the merits and demerits flying back and forth in that Discord channel during the Reclaim Roadshow. Folks also mentioned the power of reflective assessment for faculty who teach with domains, and the benefit to future scholars of access to the volumes and volumes of student and faculty work that would otherwise likely be lost forever.
Lots of folks in the conversation came to this conclusion: generate flat files from a dynamic WordPress site and store them. But what’s the best way? Several had tried building crawlers, but that means writing scripts using some of the more arcane Unix/Linux tools like wget or cURL and then parsing streams with any of a dozen different libraries out there. Another option is automating the process with software like Site Sweeper, but when I tried this, I was mistaken for a crawler on our campus network and blocked by security software partway through extracting a WordPress site. Scraping the web is, as it sounds, painful.
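For a sense of what the "parsing" half of a homemade crawler involves, here is a rough sketch using only Python's standard library. It is not taken from anyone's actual crawler; `LinkCollector` and `same_site_links` are names I made up, and the sketch deliberately leaves out the fetching, politeness delays, and error handling a real crawler would need. Given the HTML of one page, it collects the links that stay on the same site, which is the list a crawler would visit next.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, resolved against the page's URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative hrefs and drop any "#fragment" part.
                absolute, _fragment = urldefrag(urljoin(self.base_url, value))
                self.links.append(absolute)


def same_site_links(html, base_url):
    """Return the unique links in `html` that point at the same host as `base_url`."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    host = urlparse(base_url).netloc
    unique = []
    for link in collector.links:
        if urlparse(link).netloc == host and link not in unique:
            unique.append(link)
    return unique


if __name__ == "__main__":
    sample = (
        '<a href="/blog/about/">About</a> '
        '<a href="post-1/#comments">Post 1</a> '
        '<a href="https://example.org/elsewhere">External</a>'
    )
    for link in same_site_links(sample, "https://diglearn.bergbuilds.domains/blog/"):
        print(link)
```

Even this small piece hints at why rolling your own gets painful: every page has to be fetched, parsed, rewritten, and saved, and the site's security software may object along the way.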
In this fast and amazing conversation, Ed Beck from SUNY-Oneonta asked, “Why isn’t everyone just using the Simply Static plugin?” The what? OK, please tell us more…!
Simply Static is a really easy WordPress plugin that converts your WordPress site to a nested set of folders and files containing everything. Posts, pages, images, categories, tags, dates, links. All that stuff. And built to work as the public sees your site. The administrative backend of WordPress is gone. The site is frozen at the point of conversion. But it’s all there.
And with that, everything I had been struggling to do for years became easy and fast to accomplish. That crucial piece of information that one colleague had, and that others like me needed, was provided.
Below I walk through installing and using the Simply Static WordPress plugin to generate a snapshot static version of a WordPress site. Something suitable for storing into the future with a fair degree of certainty it will be accessible at that time, presuming we can parse the various file formats that comprise our web (HTML, CSS, JS, JPEG, PNG, and so on). Hopefully you’ll see how easy it is to use, and maybe try it out, too.
Installing Simply Static
Within your WordPress administration dashboard, click on Plugins, and then click Add New.
Search for the plugin named “Simply Static”, then install and activate it.
You should notice a new menu option on the left of your WordPress dashboard titled “Simply Static”. Here, you can configure the settings of the Simply Static plugin and generate a static version of your site.
There are a couple of things to consider. First, if you merely want to create a usable archive of your site, you’ll likely want to select the option “save for offline use” when you generate a static version. This will give you a ZIP archive that can be unzipped and navigated on your local drive as if it were your online WordPress site. Some features, like the search box widget, will not operate. But navigating the content of your site via menu links, hyperlinks, and category or tag links will work perfectly.
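If you end up with many of these archives, unpacking and spot-checking them can be scripted. Here is a small sketch using Python's standard `zipfile` module; `unpack_static_archive` is a name I made up, and the file names in the demo are placeholders rather than a guaranteed picture of Simply Static's output. It extracts a ZIP into a folder and lists the HTML pages it finds, so you can confirm there is an index.html to open.

```python
import tempfile
import zipfile
from pathlib import Path


def unpack_static_archive(zip_path, dest_dir):
    """Extract a static-site ZIP and return its HTML pages as relative paths."""
    dest_dir = Path(dest_dir)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)
    return sorted(
        page.relative_to(dest_dir).as_posix() for page in dest_dir.rglob("*.html")
    )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Build a tiny stand-in archive so the demo runs without a real export.
        demo_zip = Path(tmp) / "demo-site.zip"
        with zipfile.ZipFile(demo_zip, "w") as archive:
            archive.writestr("index.html", "<h1>Home</h1>")
            archive.writestr("about/index.html", "<h1>About</h1>")
        for page in unpack_static_archive(demo_zip, Path(tmp) / "site"):
            print(page)
```

With a real export you would point `zip_path` at the ZIP file you downloaded and then open the extracted index.html in a browser.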
On the other hand, if you wish to host on the web a fully functioning version of your WordPress site as a static site, you’ll need to decide if you want this static version to be accessible from its current web URL, or a different URL. Do you have a lot of visitors to your existing site? Is your site used mainly by your students, or are there others who may have bookmarked your site and continue to visit it? Is your site essentially complete in terms of its contents, or will it continue to grow and evolve in future iterations, perhaps when you teach the class again?
There is an option when generating a static version of a site to create either absolute or relative links. An absolute link incorporates the current fully formed domain name (for example, https://diglearn.bergbuilds.domains/blog/). The internal links within your static site (any pages, posts, or images hyperlinked within it) will have this fully formed URL prepended (for example, https://diglearn.bergbuilds.domains/blog/resources/simply-static). This will only work properly if you place the static site generated by Simply Static within the directory or folder currently housing your WordPress site. In other words, after generating your static site, you’ll need to remove your existing WordPress site and replace it with the static version.
Relative links work a bit differently. These are links that point to files relative to the site’s main directory. Put another way, absolute links resolve by going out onto the web, looking up domain names via DNS, and resolving down to the area of a web server where these files reside. Relative links resolve by navigating the local file and folder hierarchy, relative to the top-most folder (or directory) of your website.
Think of it like this: there are two ways to get to your kitchen (at the rear of your house) from your couch in the living room (at the front of your house). The absolute path is to stand up, leave through the front door, walk around to the back door, and enter the kitchen. The relative path is to stand up, walk down the hallway connecting these rooms, and enter the kitchen. Absolute links are resolved, absolutely, using a specific, globally unique web address. But relative links navigate a local file structure to arrive, essentially, at the same file or image.
To summarize, if you’re making a static copy of your WordPress site that will overwrite the original and answer to the same knock on the front door (the known URL to your site’s homepage), then you’ll want the option to use absolute URLs. However, if your static surrogate for your WordPress site will reside elsewhere on your server, you’ll likely want to choose the option to use relative URLs, and furthermore, you’ll want to know in advance what these URLs are relative to. Put another way, you’ll need to know the folder or directory where your index.html file will reside, and put it into the Simply Static form field for the “use relative URLs” option.
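The difference between the two kinds of links can be sketched in a few lines of Python. This is only an illustration of the idea, not Simply Static's own code: `to_relative` is a helper I made up, and https://example.edu/archive/ is a hypothetical new home for the site. The function rewrites an absolute link as a path relative to the page it appears on, and `urljoin` then shows how a browser would resolve that relative link after the site has moved.

```python
import posixpath
from urllib.parse import urljoin, urlparse


def to_relative(link_url, page_url):
    """Rewrite an absolute same-site link as a path relative to the linking page."""
    link, page = urlparse(link_url), urlparse(page_url)
    if (link.scheme, link.netloc) != (page.scheme, page.netloc):
        return link_url  # links to other sites have to stay absolute
    page_dir = posixpath.dirname(page.path) or "/"
    relative = posixpath.relpath(link.path or "/", start=page_dir)
    # relpath drops a trailing slash, so restore it for folder-style URLs.
    if link.path.endswith("/") and not relative.endswith("/"):
        relative += "/"
    return relative


if __name__ == "__main__":
    page = "https://diglearn.bergbuilds.domains/blog/index.html"
    link = "https://diglearn.bergbuilds.domains/blog/resources/simply-static/"
    relative = to_relative(link, page)
    print(relative)
    # The same relative link still resolves after the site moves (hypothetical URL).
    print(urljoin("https://example.edu/archive/index.html", relative))
```

The absolute version of that link would keep pointing at the old address no matter where the files live, which is why absolute links only suit a static copy that replaces the original in place.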
In nearly every case, I’ve worked with ZIP file archives when generating static versions of my WordPress sites. Once you generate a ZIP archive, Simply Static will give you a link to click to download the ZIP file.
Advanced Options and Alternate Uses
There are more advanced options that can be configured when using Simply Static to exclude specific file types (for instance, CSS) or specific directories (for instance, a bunch of stashed PDF files in /wp-content/uploads/documents/) as you generate your static site. I’m happy to work through these and why they might be useful; just let me know.
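To think through what an exclusion rule would leave out before you generate anything, a quick sketch like the one below can help. To be clear, this is not how Simply Static matches its exclusions, which I have not inspected; `should_exclude` and its parameters are hypothetical, and the paths are examples only. It flags a path if it has an excluded file extension or sits under an excluded directory.

```python
from pathlib import PurePosixPath


def should_exclude(path, excluded_suffixes=(), excluded_dirs=()):
    """Return True if `path` has an excluded extension or lives in an excluded folder."""
    suffixes = {suffix.lower() for suffix in excluded_suffixes}
    if PurePosixPath(path).suffix.lower() in suffixes:
        return True
    return any(path.startswith(folder.rstrip("/") + "/") for folder in excluded_dirs)


if __name__ == "__main__":
    site_paths = [
        "/index.html",
        "/wp-content/themes/demo/style.css",
        "/wp-content/uploads/documents/syllabus.pdf",
        "/wp-content/uploads/2021/photo.jpg",
    ]
    for path in site_paths:
        skipped = should_exclude(
            path,
            excluded_suffixes=(".css",),
            excluded_dirs=("/wp-content/uploads/documents/",),
        )
        print("skip" if skipped else "keep", path)
```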
There are also other potential uses for static versions of sites generated with WordPress. If you would like to discuss this, explore, goof around, or share ideas, please just reach out.
You may also have security concerns that are not addressed directly here but that a static version of a WordPress site could ameliorate. If you do, please let me know and we can work through them together.
Archiving student and faculty work within WordPress has presented challenges in the past. The Simply Static plugin makes it easy and fast to generate an off-site archive of student work that takes up little space and that should operate far into the future. This presents, really for the first time, a feasible path to reflecting on or assessing past work (our own, our students’, across programs and time) created with WordPress as part of Domain of One’s Own initiatives.
Simply Static is quick to learn and generates a compressed archive of files and directories from a dynamic WordPress site. This ZIP file, when expanded, works as a static website and can be a substitute for or snapshot of a site originally built in WordPress. Some possible advantages of a static site versus a database-dependent (i.e., dynamic) WordPress site include faster load time, decreased future maintenance of software updates and upgrades, a diminished attack surface, and use of fewer system resources like storage space and processing cycles.