CollectiveAccess II - blandford.tech

Jumping in with both feet

Now that CA is installed, it’s time to get it working. I am going to export everything from an Artwork Archive account holding over 400 pieces and import it all into CA.

Sounds easy, no? Let’s find out…

Process Overview

The metadata and the media files are imported in two coordinated steps. First, prepare the metadata and a mapping spreadsheet, then import the records (objects, entities, etc.) via the “Import → Data” interface. Second, upload the actual media files (images, PDFs, etc.) into the designated import folder (set in app.conf, typically something like <ca_base_dir>/import) and use the “Import → Media” interface to batch-attach them to the records.
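
The file-staging half of the second step can be sketched roughly like this. It is only a sketch under assumptions: `stage_media` is a made-up helper name, and both directory arguments are placeholders, since the real import folder is whatever app.conf points at on your install.

```bash
#!/bin/bash
# Hypothetical helper: copy regular files from a source directory into the
# CA import folder without overwriting anything that is already there.
# Prints the number of files copied.
stage_media() {
    local src_dir="$1" import_dir="$2"
    local copied=0 f target
    mkdir -p "$import_dir"
    for f in "$src_dir"/*; do
        [[ -f "$f" ]] || continue          # skip subdirectories and an empty glob
        target="$import_dir/${f##*/}"
        [[ -e "$target" ]] && continue     # never overwrite a file already staged
        cp "$f" "$target" && copied=$((copied + 1))
    done
    echo "$copied"
}
```

Usage would be something like `stage_media downloads /path/to/ca/import`, with the second path replaced by the import folder from your own configuration. Running it twice is harmless because files that already exist are skipped.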

Step One > Exporting from AA

For the purposes of this task, the export tool on the AA ‘pieces’ page is the ticket!
It can save out every data point for every piece, including the URLs of all images that belong to each piece.
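
Before feeding the export to any script, it helps to see which column sits in which position. A minimal sketch, assuming the header row has no quoted commas; `list_columns` is a name I made up for illustration:

```bash
#!/bin/bash
# Hypothetical helper: print each header column of a CSV file with its
# 1-based position, so the column order can be checked by eye.
# Assumption: the header row contains no quoted commas.
list_columns() {
    awk -F',' 'NR == 1 {
        sub(/\r$/, "")                      # tolerate Windows line endings
        for (i = 1; i <= NF; i++) printf "%d: %s\n", i, $i
        exit
    }' "$1"
}
```

Calling `list_columns data.csv` prints one numbered line per column. If the order does not match what a script expects, trim or reorder the export first.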

Step Two > Corralling the media

I wrote a bash script that reads the exported CSV and downloads each image, naming the file by combining the piece’s creation date and name.

NOTE: the script and initial downloads are stored in code-server on unraid

#!/bin/bash

# Define the CSV file name. Adjust this if your CSV file has a different name or path.
CSV_FILE="data.csv"

# Create a directory for downloads if it doesn't already exist.
DOWNLOAD_DIR="downloads"
mkdir -p "$DOWNLOAD_DIR"

# Function to sanitize filenames:
# - Replaces spaces with underscores.
# - Removes any characters that are not alphanumeric, dots, underscores, or hyphens.
sanitize_filename() {
    local filename="$1"
    # Replace spaces with underscores.
    filename="${filename// /_}"
    # Remove any character that is not allowed in a filename.
    filename="$(echo "$filename" | sed 's/[^A-Za-z0-9._-]//g')"
    echo "$filename"
}

# Function to extract a file extension from a URL.
# If no extension is detected, it defaults to 'jpg'.
get_extension() {
    local url="$1"
    # Remove any query string and get the base name.
    local base=$(basename "${url%%\?*}")
    # Extract the extension (the substring after the last dot).
    local ext="${base##*.}"
    # If the extension is identical to the base, then no extension was found.
    if [[ "$ext" == "$base" ]]; then
        echo "jpg"
    else
        echo "$ext"
    fi
}

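# Skip the header row and process the CSV file line by line.
# "tail -n +2" skips the first header line; "tr -d '\r'" strips Windows line endings.
# IFS=, sets the delimiter to a comma. This is a naive split: a quoted field that
# contains a comma (e.g. a piece name like "Vase, blue") will shift the columns.
tail -n +2 "$CSV_FILE" | tr -d '\r' | while IFS=, read -r piece_id date_added name primary_url add1_url add2_url add3_url add4_url; do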
    # Sanitize the date and name to create a safe base filename.
    sanitized_date=$(sanitize_filename "$date_added")
    sanitized_name=$(sanitize_filename "$name")
    base_filename="${sanitized_date}_${sanitized_name}"
    
    # Function to download an image from a given URL.
    # Parameters:
    #   $1 - URL to download.
    #   $2 - Optional suffix for additional images (e.g., "1" for Additional Image 1).
    download_image() {
        local url="$1"
        local suffix="$2"
        # If the URL is empty, skip this image.
        if [ -z "$url" ]; then
            return 0
        fi
        # Determine the file extension from the URL.
        ext=$(get_extension "$url")
        # Construct the filename: primary image uses no suffix; additional images get a numeric suffix.
        if [ -z "$suffix" ]; then
            filename="${base_filename}.${ext}"
        else
            filename="${base_filename}_${suffix}.${ext}"
        fi
        
        # Download the image using curl and save it in the downloads directory.
        # -f makes curl return a non-zero status on HTTP errors (e.g. 404) instead of
        # saving the error page as if it were an image; -L follows redirects.
        echo "Downloading $url to ${DOWNLOAD_DIR}/${filename}"
        if curl -fsSL -o "${DOWNLOAD_DIR}/${filename}" "$url"; then
            echo "Downloaded: ${filename}"
        else
            echo "Error downloading $url" >&2
        fi
    }
    
    # Download the primary image (no suffix) and additional images with numeric suffixes.
    download_image "$primary_url" ""
    download_image "$add1_url" "1"
    download_image "$add2_url" "2"
    download_image "$add3_url" "3"
    download_image "$add4_url" "4"
    
done
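
To see what names the script will produce without downloading anything, the naming rule can be restated as a small standalone function. This is a sketch, not a copy of the script: `preview_filename` and `sanitize` are hypothetical names, it uses `tr` and parameter expansion where the script uses `sed` and `basename`, and the sample values in the note below are made up.

```bash
#!/bin/bash
# Compact restatement of the naming rule: <date>_<name>[_<suffix>].<ext>

sanitize() {
    # Spaces become underscores; anything outside A-Z a-z 0-9 . _ - is dropped.
    printf '%s' "${1// /_}" | tr -cd 'A-Za-z0-9._-'
}

preview_filename() {
    local date_added="$1" name="$2" url="$3" suffix="$4"
    local base ext stem
    base="${url%%\?*}"                      # drop any query string
    base="${base##*/}"                      # keep the last path component
    ext="${base##*.}"                       # text after the last dot
    [[ "$ext" == "$base" ]] && ext="jpg"    # no dot found: default to jpg
    stem="$(sanitize "$date_added")_$(sanitize "$name")"
    if [[ -n "$suffix" ]]; then
        printf '%s_%s.%s\n' "$stem" "$suffix" "$ext"
    else
        printf '%s.%s\n' "$stem" "$ext"
    fi
}
```

For example, `preview_filename "2021-03-14" "Blue Vase (study)" "https://example.com/img/abc.png?x=1" ""` prints `2021-03-14_Blue_Vase_study.png`. One thing this makes obvious: two pieces with the same date and the same name map to the same filename, so the later download overwrites the earlier one.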

Step Three > Create the Import Mapping

This has proven to be quite tricky… and has led me to explore an alternative to CollectiveAccess! When I was researching FOSS alternatives to Artwork Archive, Omeka Classic and CollectiveAccess were the two final contenders. Here are some thoughts…

Omeka Classic vs CollectiveAccess

Omeka Classic and CollectiveAccess are both open-source content management systems designed for managing digital collections, but they have different strengths and ideal use cases.

Omeka Classic

Best for: Small to medium-sized digital collections, libraries, museums, and academic institutions.

Ease of Use: User-friendly with a simple installation process and an intuitive interface.
Customization: Uses themes and plugins for extensibility.
Metadata Support: Primarily uses Dublin Core, but can be extended with plugins.
Scalability: More suited for smaller projects; Omeka S is better for larger or linked data collections.
Hosting: Can be self-hosted or hosted via Omeka.net.
User Management: Simple role-based permissions.
Strengths: Great for quickly setting up digital exhibits with minimal technical knowledge.

CollectiveAccess

Best for: Museums, archives, and research institutions that need deep cataloging features.

Flexibility: Highly configurable with support for complex metadata and relationships.
Metadata Support: Supports multiple metadata standards like Dublin Core, MODS, PBCore, and more.
Scalability: Designed for large collections with complex relationships.
Hosting: Self-hosted with more complex installation and configuration.
User Management: More granular control over permissions and workflows.
Strengths: Ideal for institutions needing highly detailed, structured cataloging and data management.

Key Differences

Feature	Omeka Classic	CollectiveAccess
Ease of Use	Easier, plugin-based	Steeper learning curve, highly customizable
Metadata	Mostly Dublin Core	Supports multiple metadata schemas
Scalability	Best for small to medium collections	Designed for large and complex collections
Hosting	Self-hosted or Omeka.net	Self-hosted only
Customization	Plugin & theme-based	Highly configurable data model
Ideal Use Case	Digital exhibits, academic & community projects	Museums, archives, & research databases

Which One to Choose?

Choose Omeka Classic if you need a quick and easy way to display digital collections online with a focus on storytelling and user-friendly management.
Choose CollectiveAccess if you need a robust cataloging system for managing complex metadata and large collections with deep relationships.

Decision Time!

…I will set up both, slap in some samples, compare and then make a final decision. See you in the next post!