Collaborative Archiving Digital Art

Team Members

  • Annet Dekker (University of Amsterdam / aaaan.net)

  • Claudia Roeck (University of Amsterdam)

  • Dusan Barok (University of Amsterdam / monoskop.org)

  • Julie Boschat Thorez (researcher / artist)

  • David Gauthier

  • Judith Hartstein

  • Megan Phipps

  • Larissa Tijsterman

  • Jim Wraith

Contents

Summary of Key Findings

Briefly describe your most significant findings.

1. Introduction

The authors present their ongoing research on distributed and decentralized platforms to supplement standard collections management databases for art documentation, in particular for networked and processual artworks. The documentation and process of these types of artworks do not fit well into standard database applications, due to the inflexibility of those applications. While standard applications can be adjusted to specific needs, most of them are developed by commercial companies, so this kind of flexibility comes at a price. Trying to get away from these proprietary systems and, more importantly, looking for a solution that can easily be shared and (re)used by others, we focused on open source alternatives. As an interdisciplinary team of a conservator, students, researchers, artists and programmers, we spent a week exploring and comparing the functionalities of two version-controlled platforms: MediaWiki and GitLab. As source material we used research and documentation material from the artwork Chinese Gold by UBERMORGEN. In this report we reflect on the technical details of the two systems and their pros and cons for documenting the material, while looking at the potential of collaborative workflows.

Networked / Processual art

New media art, digital art, software art, networked art, Internet art, net.art, post-Internet, new aesthetics… Over the past decades, many terms have been used to signify contemporary art that works with networked media (Dekker 2018). Rather than giving a definition of the artworks that we are interested in, we will describe the two characteristics that are most important for our current research: being networked and processual. These features are not a priori technical; they connect with the concepts and practices of the artwork and are thus part of the artwork's specificity. This means that these artworks are not objects, or even a collection of components, that are defined as a final product/artwork. Rather they consist of various building blocks that can be combined, composed, and compiled in different ways, at different times and locations (online and offline) and by different people. In other words, the artworks are not necessarily the consequence of a straightforward procedure that leads to specific results. Since the process of creation and (re)creation is heterogeneous and involves a certain level of improvisation that continually re-negotiates its structure, the documentation archive should ideally reflect this approach, while also enabling a revisioning of past iterations and information. Whereas most content management systems are built around a relational database that can easily link different types of information, to achieve our goals we focused on version control systems that would allow us to revisit the history of changes. Moreover, as we were interested in the potential of collaborative workflows, we decided to focus on version control systems that also make it possible to work at the same time from different locations.

Version control system

Another incentive for choosing this direction relates to several experiments that have been done in archival and conservation practices to test the usefulness of wiki-based platforms and version control systems for documenting artworks (both for traditional artworks, for example at NYU, and for software-based artworks, for example at SFMOMA). This research could thus be useful to further the discussion and expand the working methods and possibilities.

Finding a coherent and structured way to organise and control revisions has always been at the core of archival practices. In the era of computing, these fundamentals became even more urgent and complex, and engendered the production of version control systems (VCS). Version control systems track the differences between versions of code or text. By recording each change with a timestamp and the name of its author, and by keeping successive versions of a project available, a VCS allows multiple people to work on elements of a project without overwriting someone else's text. Changes can easily be compared, restored, or, in some cases, merged. There are two types of version control systems: 1) centralized and 2) decentralized (also called distributed).

Figure: Local version control system. local.png

Centralized version control systems (CVCSs) have long been the standard for version control: a single server contains all the versioned files, which can be accessed by multiple people, making it possible to collaborate with developers on other systems.

Figure: Centralized version control system. centralized.png

Decentralized (or distributed) version control systems, by contrast, give every collaborator a full copy of the repository, including its history, so that collaboration does not depend on a single central server.

Figure: Distributed version control system. distributed.png

Wikipedia is perhaps the most well-known example of version control in use, through its ‘page history’. Using ‘QuickDiff’, which is based on character-by-character analysis, it allows users to check the differences between new and previous versions. Thinking about version control began in the late 1960s (Mansoux 2017: 343); it was used in particular to understand when something goes wrong in a program, as a way to trace the bug that caused the problem (Rochkind 1975). A lack of performance, especially in relation to speed when applying patches and updating metadata, the requirement of a simpler design, support for non-linear development (or parallel branching), and the need for a fully distributed system (Chacon and Straub 2014) induced the development of Git in 2005. Git is a source code management system, or a file storage system, which makes it possible to write code in a decentralised and distributed way by encouraging branching, or working on multiple versions at the same time and across different people. In addition, and of particular interest to our research, it facilitates tracking and auditing of changes. In 2007 GitHub started hosting Git repositories (or repos). Interestingly, Git as used on GitHub evolved the environment into a site of ‘social coding’ (Fuller et al. 2017). Rather quickly the collaborative coding repositories came to be used for widely diverse needs: from software development to writing license agreements, sharing Gregorian chants, and announcing a wedding – anything that needs a quick way of sharing and improving information. As GitHub founder Tom Preston-Werner put it: ‘The open, collaborative workflow we have created for software development is so appealing that it's gaining traction for non-software projects that require significant collaboration’ (McMillan, 2013). Whereas a wiki's version control is a simple mechanism that is useful for collaborative writing and documentation, Git provides a more comprehensive approach to collaborative working.
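To make this concrete, the following is a minimal sketch of such version tracking with Git, driving the git command line from Python; the repository folder and file names are hypothetical and only illustrate how every change is stored with an author, a timestamp and a comparable diff.

import pathlib
import subprocess

repo = pathlib.Path("chinese_gold_docs")   # hypothetical documentation folder
repo.mkdir(exist_ok=True)

def git(*args):
    """Run a git command inside the repository and return its output."""
    return subprocess.run(["git", "-C", str(repo), *args],
                          capture_output=True, text=True, check=True).stdout

git("init")
git("config", "user.name", "CADA team")          # local identity, so commits record an author
git("config", "user.email", "cada@example.org")

(repo / "notes.md").write_text("First description of the Belgrade series.\n")
git("add", "notes.md")
git("commit", "-m", "Add first description of the Belgrade series")

(repo / "notes.md").write_text("Revised description after checking the TIFF originals.\n")
git("add", "notes.md")
git("commit", "-m", "Revise description of the Belgrade series")

# Every revision is kept with author, timestamp and message ...
print(git("log", "--pretty=format:%h %an %ad %s"))
# ... and any two revisions can be compared line by line.
print(git("diff", "HEAD~1", "HEAD"))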

2. Initial Data Sets

The artists behind UBERMORGEN, Lizvlx and Hans Bernhard, provided us with our initial data set for this session: a personal folder archive consisting of files used for the Chinese Gold project. In order to scale down from the archive's initial size (10 GB), the specific data set for this research considered only the oldest folder of the Chinese Gold archive, named "CHINESE_GOLD_2006". The selection was based on the folder's diversity of files, which makes it highly representative of the artwork's overall complexity. "CHINESE_GOLD_2006" contains multiple versions of the overall work. It also contains duplicate files which have been re-selected by the artists for varying purposes, made explicit in the naming of the folders, e.g. 2014_CONTEMPORARY_ISTANBUL_HIGH_RES, IMAGES_FOR_CULTURAS_2008_CATALOGUE, NiMK _exhibition. Due to the varying colours, sizes and selections of images, the process of creating this particular series necessarily involved finding the folder with the best assortment of the striking images, which would then also fit with the project's narrative aim.
Aside from the work files, video and images, and the many copies of the work's files in various sizes and resolutions, the folders also contain the research made by the artists for the work. IMAGES contains one folder called SCREENSHOTS which, unlike the eight other ones, contains image research. The first folder contains photographs of a relevant currency-trading website. The next folders show the relationship between the work's name and the visual correspondences that can be established through search engine queries: Chinese gold vs encrypted gold bars, golden coins, old golden objects, and wedding rings. Although these items have not been exploited by the artists for the work's production, they are exemplary of their work process and of how they attempted to envision Chinese culture's take on “golden digital currency”.
The folder "CHINESE_GOLD_2006" consists of 337 files (41 of them hidden files) in 66 directories. The maximum depth of the file tree is 6 (counting from zero starting at the root folder), for example in the case of "CHINESE_GOLD_2006/IMAGES/IMAGES_WOW_BELGRAD_SERIES/For_Belgien_Collector/source_files/Contrast_pushed_not_good/P1051336_c.jpg". Most of the files are pictures (like .jpg, . png, .tif).
count type of extension
------------------------------
117 jpg
82 png
67 none (folder)
57 tif
41 DS_Store (hidden file)
12 zip
6 html
5 doc
4 mov
3 gif
2 rtf
1 dv
1 iMovieProj
1 iMovieProject
1 mp4
1 odt
1 pdf
1 xls
[Insert picture here: DigitalMethods2018/DMI_WS_2018_Presentation/bar_chart_extensions.png]
[Permalink: bar_chart_extensions.png]
Caption: Bar chart showing the distribution of types of file extensions by category (Source: own representation, created with RAWgraphs.io)
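As a rough sketch of how such a survey of the data set could be reproduced (assuming a local copy of the folder and Python), the snippet below walks the file tree, tallies extensions and reports the maximum depth; the path would need to be adjusted.

import os
from collections import Counter

ROOT = "CHINESE_GOLD_2006"          # path to a local copy of the data set (adjust as needed)

counts = Counter()
max_depth = 0

for dirpath, dirnames, filenames in os.walk(ROOT):
    counts["none (folder)"] += len(dirnames)
    for name in filenames:
        if name.startswith("."):                       # hidden files such as .DS_Store
            counts[name.lstrip(".") + " (hidden file)"] += 1
        else:
            ext = os.path.splitext(name)[1].lstrip(".")
            counts[ext if ext else "none"] += 1
        # depth counted from zero at the root folder, as in the text above
        rel = os.path.relpath(os.path.join(dirpath, name), ROOT)
        max_depth = max(max_depth, rel.count(os.sep) + 1)

for ext, n in counts.most_common():
    print(f"{n:4d}  {ext}")
print("maximum depth of the file tree:", max_depth)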

3. Research Questions

The main questions that we want to answer are:
a. Are GitLab and MediaWiki useful platforms for archival purposes and, if so, in what ways?
b. How can VCS be effectively integrated into the practice of archiving? This is to be understood from the perspective of what is being archived, and for what purpose (preservational, art historical, etc.).
Alongside the practical experimentation, and to gain a better understanding of the underlying structures that support these environments, we will focus on answering the following questions:
- What is the value of concepts such as provenance, appraisal and selection in GitLab and MediaWiki?
- What is the function of metadata in these systems?
- How stable and secure is the data in a version-controlled archive?
- Would these systems enable the reorganisation of the archive's structure through time (for instance if a document's status changed, e.g. documentation vs artwork)?

4. Methodology

To aid the research we used a case study by the artist group UBERMORGEN, particularly the research and documentation about their project Chinese Gold (2005-present). UBERMORGEN.COM is a well-known artist duo, founded in 1999 by Lizvlx and Hans Bernhard. They developed a series of landmark projects in digital art, including Vote-Auction (2000), a media performance involving a false site where Americans could supposedly put their vote up for auction, and Google Will Eat Itself (GWEI, 2005, in collaboration with Alessandro Ludovico and Paolo Cirio), a project that proposed using Google’s own advertising revenue to buy up every single share in the company. Using dark humour and activism, UBERMORGEN creates alternative narratives to critically reflect on networked culture, revealing the inside and downside of a post-truth society. With the project Chinese Gold they investigated the phenomenon of gold mining within World of Warcraft. The project revolves around partly fictive research into the socio-economic position of virtual currencies. Chinese Gold spans over a decade and deals with a mix of research, documentation, appropriation, storytelling and remixing. It is constantly evolving, growing and in flux.

Git(Lab)

Git is a ‘source code management’ (SCM) storage system that encourages developers to duplicate, track, integrate, and merge code repositories throughout multiple versions of the same project. Git also provides its users with the ability to trace and track work in progress (Fuller et al. 2017: 87). Around Git there exists a variety of hosting systems/sites such as GitHub, GitLab, BitBucket, and more. Prior to the start of this project, GitHub was the chosen platform to compare with MediaWiki, specifically comparing each platform's affordances with regard to the collaborative effort of archiving Chinese Gold (2006 and ongoing) by UBERMORGEN. After re-evaluating the features unique to GitHub and comparing them to those of GitLab, we concluded that the latter platform was more suitable to the needs of this specific case study.
GitLab, a web-based repository manager and open source alternative to GitHub, simplifies the process(es) of Git itself and provides additional functions to support collaborative development (e.g. a wiki). Some of these additional functions include unlimited, free shared runners for public projects, built-in Continuous Integration/Continuous Delivery (CI/CD), promotion of innersourcing of internal repositories, the designing, tracking, and managing of project 'milestones', exporting one's project to other systems, and the creation of new branches from an issue. Moreover, GitHub's handling of repository and/or branch permissions, as well as of edits from upstream maintainers in a branch, is less than ideal for collaborative digital archiving projects: the need for repository administrators to approve each push to the repository and/or branch makes collaborative archival projects evolve more slowly. A sketch of the basic Git branching workflow is given after the notes below.
  • the need to co-ordinate across often increasingly large-scale projects, or the need to develop project requirements as the system develops (Fuller et al. 2017: 89)
  • version control as a necessary archival allowance for the arts, due to the frequency of 'memory-intensive files and variations on those files' within the evolution of artworks (Fuller et al. 2017: 89)
  • distributed vs. centralized
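As announced above, here is a minimal sketch of the branching workflow that makes Git(Lab) attractive for parallel work on an archive. It builds a small local repository as a stand-in for a clone of a hypothetical GitLab project; file and branch names are illustrative only.

import pathlib
import subprocess

repo = pathlib.Path("chinese_gold_archive")      # stand-in for a clone of the GitLab project
repo.mkdir(exist_ok=True)

def git(*args):
    return subprocess.run(["git", "-C", str(repo), *args],
                          capture_output=True, text=True, check=True).stdout

git("init")
git("config", "user.name", "CADA team")
git("config", "user.email", "cada@example.org")
default_branch = git("symbolic-ref", "--short", "HEAD").strip()   # 'main' or 'master'

(repo / "README.md").write_text("Chinese Gold archive\n")
git("add", "README.md")
git("commit", "-m", "Initial layout of the archive")

# One collaborator reorganises the Belgrade series on a separate branch ...
git("checkout", "-b", "reorganise-belgrade-series")
(repo / "BELGRADE_SERIES.md").write_text("Notes on IMAGES_WOW_BELGRAD_SERIES\n")
git("add", "BELGRADE_SERIES.md")
git("commit", "-m", "Describe the Belgrade series folder")

# ... and the branch is later merged back; on GitLab this would happen via a merge request.
git("checkout", default_branch)
git("merge", "--no-ff", "reorganise-belgrade-series", "-m", "Merge Belgrade series notes")
print(git("log", "--oneline", "--graph", "--all"))

# For the many large binaries (.tif, .mov) Git LFS could be added, e.g.
# 'git lfs install' and 'git lfs track "*.tif"', provided git-lfs is installed.

On GitLab the same branch would be pushed and merged through a merge request, which keeps the discussion about the reorganisation attached to the change itself.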

MediaWiki

MediaWiki is free and open-source software that runs a website which allows users to create and collaboratively modify pages or entries via a web browser. The platform is continually developed by the community around the Wikimedia Foundation, the operator of Wikipedia. Its core functionality can be extended by hundreds of extensions for file, data and user management, metadata, layout, and so on.
A wiki also exists within GitLab. It uses Markdown, in contrast to MediaWiki, which uses its own wikitext syntax. MediaWiki is much more sophisticated than the GitLab wiki; in particular, it allows pages to be categorized and semantically tagged.
Non-default extensions employed: Upload Wizard, EmbedVideo, Semantic MediaWiki. (A sketch of uploading files through the API follows the figures below.)
Figure 1. File archive on MediaWiki. CADA.mediawiki.filelist.png
Figure 2. Non-default settings in MediaWiki's LocalSettings.php configuration file. CADA.mediawiki.nondefault_settings.png
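Besides the Upload Wizard interface, files can also be pushed to the wiki through MediaWiki's standard action API. The following is a minimal sketch, assuming a hypothetical api.php endpoint and a bot account with upload rights; recording the original file path in the upload comment is one simple way to preserve provenance that would otherwise be lost in the wiki's flat file list.

import requests

API = "https://cada-wiki.example.org/api.php"      # hypothetical wiki endpoint
USER, PASSWORD = "ArchiveBot", "bot-password"      # hypothetical bot credentials

s = requests.Session()

# 1. Fetch a login token and log in.
login_token = s.get(API, params={"action": "query", "meta": "tokens", "type": "login",
                                 "format": "json"}).json()["query"]["tokens"]["logintoken"]
s.post(API, data={"action": "login", "lgname": USER, "lgpassword": PASSWORD,
                  "lgtoken": login_token, "format": "json"})

# 2. Fetch a CSRF token for the upload.
csrf = s.get(API, params={"action": "query", "meta": "tokens",
                          "format": "json"}).json()["query"]["tokens"]["csrftoken"]

# 3. Upload one file, keeping its original folder path in the upload comment.
path = "IMAGES/IMAGES_WOW_BELGRAD_SERIES/Webversion/example.jpg"   # hypothetical file
with open(path, "rb") as f:
    result = s.post(API, files={"file": f},
                    data={"action": "upload", "filename": "example.jpg",
                          "comment": "Bulk upload; original path: " + path,
                          "token": csrf, "format": "json"})
print(result.json())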

5. Findings

Assessment criteria of findings:
  • Collaboration (workflow/rules/how-to/complexity/user-friendliness, user hierarchies)
  • Storage management (central, distributed, LFS, upload and retrieval of files)
  • Presentation of archive (what the user sees)
  • Overview/visualisation of changes, description possibilities and interpretation of changes
  • Handling metadata / file paths etc.
  • What the tool can do extremely well
  • Biggest deficiency/weakness
What properties would we like to track (and keep) in an archive?
  • provenance of a file (creation date) and other file creation metadata (for instance EXIF data; see the sketch after this list)
  • who uploaded the file to the archive and when and who changed the file
  • file structure as uploaded
  • Being able to differentiate between description and object of description. Being able to link description and object of description
  • Being able to change a text file or an image file and compare the changes with a previous version
  • Being able to describe (give reasons for) changes
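As referred to in the first point of this list, part of this file-level provenance can be read directly from embedded image metadata. Below is a minimal sketch, assuming Pillow is installed and a local copy of the data set, that prints the EXIF date (where present) of every JPEG.

import os
from PIL import Image
from PIL.ExifTags import TAGS

ROOT = "CHINESE_GOLD_2006"                     # path to a local copy of the data set

for dirpath, _, filenames in os.walk(ROOT):
    for name in filenames:
        if not name.lower().endswith((".jpg", ".jpeg")):
            continue
        path = os.path.join(dirpath, name)
        with Image.open(path) as img:
            exif = img.getexif()
        # Map numeric EXIF tag ids to readable names (DateTime, Make, Model, ...).
        readable = {TAGS.get(tag_id, tag_id): value for tag_id, value in exif.items()}
        print(path, "-", readable.get("DateTime", "no EXIF date"))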

Findings: Git vs. MediaWiki
git: (task oriented) ; --> more accessible for the curator
-more technical
-less (forced) organization or structure - just one page, sections made by making/organizing by multiple new pages (via 'New Page')
-tagging workflow more difficult
-branching via own folders
-keep pages/folders as the artists' original, but can link categories and/or semantics (expanded categories, 'any word: x' replaces 'category: x') [susceptible to change, allows freedom in changing the definition of ongoing, undefinable artworks]
-available offline
-possibility of needing explanation of organization from previous or current 'top admin'
-the influence of one individual on the direction of the sectioning/formulating the artwork
-need space on personal hard drive (disk space) vs. MediaWiki not needing own personal space
-can link to original pages rather than having to screenshot them
-can see the size of the archive on personal computer via cloning repository
MediaWiki: (content-oriented) ; --> more accessible to the conservator (more suitable to (re)presentation?)
-can upload bulk files >> easy uploading, but still time consuming in regards to organizing the files via renaming (via 'Move' files)
-allows for artist to comment on organization without having to be explained the 'team's' choice of organization (or does it?) --> could screenshot git personal folder but would possibly be more work
-never have the master image; always a translation of it
-must be online unless maintained by the admin (relying on others/past admins)
-can link to GitLab
MediaWiki as an archiving tool:
  • Collaboration: Administrator assigns editing rights to the users. There are many different levels of editing rights (governing what the user is allowed to edit, for instance edit pages, delete pages, edit the sidebar; see https://www.mediawiki.org/wiki/Manual:User_rights). MediaWiki also has discussion tabs for each page (content-centred). The creation of categories and semantic tags has to be organised, otherwise it becomes chaotic.
  • Storage Management: all data is stored on one central server. In the context of preservation this is less desirable than the distributed system used by git.
  • Files can be bulk uploaded, but into a flat list structure (folder hierarchies are not kept). How can they be downloaded in bulk? Through the API (see the sketch after this list).
  • Also possible with a web crawler?
  • Presentation of archive: The user sees the most current version. There are no forks or splitting up of projects.
  • It does track changes, but comparability is somewhat opaque: textual changes are easily compared (similar to diff), less so in the case of binaries.
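As a partial answer to the bulk-download question above, the action API can list every uploaded file together with its URL, uploader and upload timestamp, which a short script can then fetch. A sketch, again assuming a hypothetical api.php endpoint on a publicly readable wiki:

import os
import requests

API = "https://cada-wiki.example.org/api.php"      # hypothetical wiki endpoint
OUT = "wiki_file_dump"
os.makedirs(OUT, exist_ok=True)

s = requests.Session()
params = {"action": "query", "list": "allimages",
          "aiprop": "url|timestamp|user", "ailimit": "max", "format": "json"}

while True:
    data = s.get(API, params=params).json()
    for image in data["query"]["allimages"]:
        # 'user' and 'timestamp' record who uploaded the file and when.
        content = s.get(image["url"]).content
        with open(os.path.join(OUT, image["name"]), "wb") as f:
            f.write(content)
    if "continue" not in data:                     # follow the API's continuation for large archives
        break
    params.update(data["continue"])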
MediaWiki has a flat structure (pages linked amongst each other, no hierarchy) --> find a way to keep the file paths of the stored objects --> and also maintain some of the functionality of the folder organization (or indeed improve on it), through use of e.g. the Semantic MediaWiki extension, which allows a range of properties to be applied to a given item.
There is a limited range of file formats that can be rendered in MediaWiki. This range can be extended with MediaWiki extensions. CHINESE GOLD's file formats are quite ideal in that sense, in that they are compressed and common. MediaWiki allows files to be bulk-uploaded. It automatically lists all the files, including a preview (the EmbedVideo extension is required for video formats that MediaWiki does not natively support, such as MP4) and image metadata, but it does not keep the hierarchical folder structure. MediaWiki also allows context data to be added.
What we did with wiki:
For the wiki we set up two different pages, each with its own view on how to use a wiki for archival purposes:
  • For the Belgrade Series page we took the file structure of the IMAGES\IMAGES_WOW_BELGRAD_SERIES folder as the basis. In the folder there are six subfolders in which the same set of photos appears in different formats and sizes for different purposes. The Belgrade Series page therefore has the following contents:
  • 1 Photo series
  • 1.1 Used on the web -> after some digging we found that these images came from Hans's own personal website. There was some text on it as well, so we copied the text and added it, along with a screenshot of the webpage, followed by an ordered gallery of the pictures that were in the Webversion folder. Within the folder there was another subfolder called 'not perfect', with two images in the same format as the Webversion pictures, so we included them under this subheader.
  • 1.2 REX Exhibition -> there was a folder called tif_f_REX_exhibition__45x60_300dpi which contained larger TIFF files that looked similar to the Webversion files. These files were probably used for the REX exhibition in Belgrade (2007, REX Gallery, Belgrade, „Chinese Gold, Amazon Noir & GWEI Slideshow“). We haven't looked into the details of this exhibition, so we cannot add more information on the how, what and why at the moment. There was no logical order to the TIFF files, so we created a gallery within the wiki with those files; however, while saving the page we noticed that the wiki cannot display TIFF files, so now they are all hyperlinks linking to each individual file page.
  • 1.3 Originals -> There was one folder called originals, with pictures in no particular logical structure; these were the original pictures in the series. We added them to the gallery in the same order as the Webversion pictures so that they are easier to compare, especially as some pictures within the gallery look very similar.
  • 1.4 For collector -> There is one folder named For_Belgien_Collector, which suggests that these files were specially prepared for a Belgian collector; however, we know that the artists don't even remember who this was. We did not yet have time to put these files up.
  • For the NiMK exhibition page we took a different approach: here we wanted to gather as much information as possible connected to this particular exhibition. Because of this, the page has been perfect for the use of semantic linking.
  • The Semantic MediaWiki extension allows annotations to be made within MediaWiki. The format looks similar to the category tag; however, it is much more powerful, as it can be used to make direct queries to the database where all files and pages are stored, and the results can also be exported as RDF and CSV.
  • Using the semantics we can link information on a page to certain properties that can be found on other pages as well. For example, raw facts about the exhibition, such as the institution where the works have been exhibited, would be formatted as Held_in_venue::NiMK, and the city where the works have been shown as Held_in_city::Amsterdam. Such annotations can help query searches when the wiki is scaled up; questions that could be answered relatively easily with the semantics are, for example, in which institutions the Chinese Gold series, or even a particular work, has been exhibited, or, given a city, which artworks or exhibitions have been in Amsterdam. Semantics can be really powerful for creating specific overviews; however, the implementation needs to be done carefully, consistently and thoughtfully, as the annotations you create are for potential future uses by unknown users. Annotating every word is not useful, as it would create too many field names. A possible solution is to create beforehand, on the basis of a controlled vocabulary, a set list of field names to use, and to spread that list among all the users. Using Semantic MediaWiki requires you to carefully deliberate how you are going to organise your wiki; we would recommend making an architecture of what you would like to be searchable before you start annotating. (A sketch of how such a semantic query can be run through the wiki's API is given after this list.)
  • The structure of the page is also very important. Because we wanted to strive for a consistent format, it stopped us and really made us think about what we would want to describe and show on this page. Again, the wiki forces you to stop and think before you act. We decided for this page to gather everything that is connected to this particular exhibition and display it on the wiki. We included the artworks that were used, information about the exhibition, documentation about the installation, pictures and videos of the installation, visitors' comments, critical reception and other media publications that we could find.
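As mentioned in the notes above, the semantic annotations can also be queried programmatically: Semantic MediaWiki exposes its #ask queries through the action API ('ask' module). A sketch, assuming the hypothetical endpoint used earlier, the property names described above, and that the extension's API module is enabled:

import requests

API = "https://cada-wiki.example.org/api.php"      # hypothetical wiki endpoint

# "Which pages document something held in the venue NiMK, and in which city?"
query = "[[Held_in_venue::NiMK]]|?Held_in_city"

response = requests.get(API, params={"action": "ask", "query": query, "format": "json"})
results = response.json()["query"]["results"]

for page, data in results.items():
    cities = [c["fulltext"] if isinstance(c, dict) else c
              for c in data["printouts"].get("Held_in_city", [])]
    print(page, "-", ", ".join(str(c) for c in cities) or "no city recorded")

The same query can also be embedded in a wiki page with the #ask parser function to produce a live overview that updates as new pages are annotated.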

6. Discussion

Discuss and interpret the implications of your findings and make recommendations for future research and application, be it societal, academic or technical (or some combination).

7. Conclusions

Present a summary of what you have found, and its significance.

8. References

  • Chacon, Scott and Ben Straub. 2014. Pro Git. Everything You Need to Know About Git. New York, NY: Apress. https://git-scm.com
  • Dekker, Annet. 2018. Collecting and Conserving Net Art. London/New York: Routledge.
  • Ernst, Wolfgang. 2009. "Underway to the Dual System: Classical Archives and/or Digital Memory" in Daniels and Reisinger, eds., Netpioneers 1.0, pp. 81-99.
  • Fuller, M., Goffey, A., Mackenzie, A., Mills, R. and Sharples, S. 2017. “Big Diff, Granularity, Incoherence, and Production,” In Blom, I. Lundemo, T. and Røssaak, E. (eds.) Memory in Motion: Archives, Technology, and the Social. Amsterdam: Amsterdam University Press, 87-102.
  • Mansoux, A. 2017. "Fork the System", chapter 8.2 in Mansoux, Sandbox Culture. PhD dissertation, Goldsmiths, University of London. Available online: https://www.bleu255.com/~aymeric/dump/aymeric_mansoux-sandbox_culture_phd_thesis-2017.pdf
  • Rochkind, Marc J. 1975. “The Source Code Control System,” IEEE Transactions on Software Engineering. Vol. 1, no. 4, pp. 364-70.

Presentation slides

Coming soon...
Topic revision: r8 - 18 Jan 2018, MeganPhipps