Archiving Studio ORKA's website and social media accounts

From Tracks
Revision by Bart Magnus on 28 March 2025 at 12:30

Theatre company Studio ORKA transferred their archive to the Letterenhuis following the cessation of their activities. The archive includes a website and various social media accounts.

Problem definition

Studio ORKA wanted to ensure that all information about their theatre productions, along with the visual material, would be preserved during the transfer of their archive. This information was located on their website, and they also expressed the desire to transfer their social media accounts.

Method and results

Website

Since the website contained extensive descriptions of the performances, the archivist decided to start with this material. The first attempt was to automate the process with a web crawler that would scan and store the entire website. Heritrix, a versatile web crawler often used for such tasks, was chosen for this. For this project, however, it was crucial that every link was captured correctly, and here Heritrix proved problematic: some links were saved, while others were missing or did not work correctly, which made the results unreliable and incomplete. The archivist therefore moved away from Heritrix and opted for Archive WebPage, manually going through all the links on the Studio ORKA website to save the entire site in both WARC (Web ARChive) and WACZ formats.
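
The completeness problem described above can be checked in a rough way by comparing the links found on a page with the list of URLs that were actually captured. The following is a minimal sketch using only the Python standard library, not the method used for Studio ORKA. The function names, the example URL and the idea of supplying the captured URLs as a plain list are assumptions made for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in an HTML document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def internal_links(html, page_url):
    """Return the set of absolute, same-host URLs linked from one page."""
    collector = LinkCollector()
    collector.feed(html)
    host = urlparse(page_url).netloc
    links = set()
    for href in collector.hrefs:
        # Resolve relative links and drop "#fragment" parts.
        absolute, _fragment = urldefrag(urljoin(page_url, href))
        if urlparse(absolute).netloc == host:
            links.add(absolute)
    return links


def missing_from_capture(html, page_url, captured_urls):
    """List links on the page that are absent from the captured set."""
    return sorted(internal_links(html, page_url) - set(captured_urls))


# Hypothetical example: one of two internal links was not captured.
page = '<a href="/shows/a">A</a> <a href="b.html#top">B</a>'
print(missing_from_capture(page, "https://example.org/shows/",
                           ["https://example.org/shows/a"]))
```

A check like this only covers links that are present in the static HTML. Links generated by scripts would still need the manual walkthrough described above.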

The WARC format saves not only the HTML pages but also all associated files such as images, videos and scripts, so the website remains fully interactive later. The WACZ format (Web Archive Collection Zipped) is a compressed (zipped) package of the WARC data with additional metadata, which makes the archived website easier to open and ensures that dynamic content, such as videos and forms, is preserved correctly.
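
Because a WACZ file is an ordinary ZIP package, its contents can be inspected with standard tools. The sketch below assumes the layout described in the WACZ specification (WARC files under `archive/`, a page list at `pages/pages.jsonl` and a `datapackage.json` manifest); the function name and the file name in the usage note are hypothetical.

```python
import json
import zipfile


def describe_wacz(path):
    """Summarise a WACZ file: its WARC members and declared resources."""
    with zipfile.ZipFile(path) as archive:
        names = archive.namelist()
        summary = {
            # WARC data is expected under the archive/ folder.
            "warc_files": [n for n in names if n.startswith("archive/")],
            # The page list is what viewers use to show an entry page.
            "has_page_list": "pages/pages.jsonl" in names,
            "resources": [],
        }
        if "datapackage.json" in names:
            package = json.loads(archive.read("datapackage.json"))
            summary["resources"] = [
                resource.get("path")
                for resource in package.get("resources", [])
            ]
    return summary
```

Calling `describe_wacz("studio-orka.wacz")` on a capture (the file name is only an example) gives a quick indication of whether the package contains the parts a viewer needs before it is added to the archive.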


These WARC/WACZ files can be viewed in various ways, and several online tools are available for consulting them. ReplayWeb.page proved to be the best choice, because its companion tool, Archive WebPage, had been used to archive the website. Archive WebPage also allows archived websites to be opened and explored locally. The process is simple: you load the WARC/WACZ files into Archive WebPage, click on the links you want to view, and the website appears with all its buttons still working. More information is available in the Archive WebPage guide.


Social media

In addition to the website, Studio ORKA's Facebook and Instagram accounts were archived. Meta, the parent company of both platforms, offers built-in options that let users archive their accounts and export all their data in a user-friendly way.

On both Facebook and Instagram, the data was requested and downloaded via the privacy settings of Studio ORKA's account. The download includes everything Studio ORKA ever posted, liked or shared, along with other account activity that Meta itself records. For Studio ORKA, the full account was archived, although it is also possible to select what to archive and what to leave out.


When downloading the data, you can choose the desired output format: JSON or HTML. HTML was the clearest option and yielded an approximate representation of the web version of Studio ORKA's Facebook/Instagram pages. This representation is not an exact copy in terms of design, but the content matches 1:1.


The data was also downloaded in JSON format, which is the better option if you want to analyse the data or import it into other systems. The trade-off is that it is less easy to read.
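
To make a JSON export easier to survey before analysing or importing it, a short script can list which files it contains and how each one is structured. The sketch below makes no assumptions about the field names Meta uses, since these are not described here. It only reports the top-level shape of each file; the function name and the folder name in the usage note are hypothetical.

```python
import json
from pathlib import Path


def inventory_json_export(export_dir):
    """Map each JSON file in an export folder to its top-level shape."""
    report = {}
    for path in sorted(Path(export_dir).rglob("*.json")):
        with path.open(encoding="utf-8") as handle:
            data = json.load(handle)
        if isinstance(data, list):
            shape = f"list of {len(data)} items"
        elif isinstance(data, dict):
            shape = "object with keys: " + ", ".join(sorted(data))
        else:
            shape = type(data).__name__
        # Use paths relative to the export folder as report keys.
        report[path.relative_to(export_dir).as_posix()] = shape
    return report
```

Running `inventory_json_export("facebook-export")` on an unpacked export folder (the folder name is only an example) returns a dictionary that can be printed or saved alongside the archive as a simple table of contents.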


Author: Ghaith Al-Ani (Letterenhuis)