ALICE-SHARK User Meeting 2023

Announcement

We are excited to announce that the ALICE-SHARK User Meeting 2023 will take place on Tuesday, 6 June 2023 from 09:00 to 13:00.

If you are a user of the ALICE HPC cluster at Leiden University or the SHARK HPC cluster at the Leiden University Medical Center (or both), this meeting is for you. Our goal is to bring the user communities of both clusters together to connect with each other and the support teams of both clusters. The meeting is also of interest to users who are not yet actively using one of the two clusters.

The meeting will include an overview and update for both clusters, a selection of talks from users and an interactive Q&A session with the support teams.

Attendance is possible in person and remotely. For in-person attendees, the meeting will be held in room 1B01 of the Pieter de la Court building. For remote attendees, all sessions will be streamed live with the possibility to ask questions remotely.

Registration

Registration for the meeting is mandatory. The number of in-person attendees is limited by the size of the room, and places will be allocated on a first-come, first-served basis.

There is no registration fee for the meeting.

To register for the meeting, please send an e-mail to the ALICE Helpdesk with the following information:

  • ALICE or SHARK username:

  • Have you already been using ALICE or SHARK actively? (Yes/No):

  • Do you plan to attend in person? (Yes/No):

Important dates

  • Abstract submission deadline: Sunday, 14 May 2023 at 23:59 CEST

  • Speaker selection and schedule: Tuesday, 16 May 2023

  • Deadline for registration: Sunday, 4 June 2023 at 23:59 CEST

  • User Meeting: Tuesday, 6 June 2023 from 09:00 till 13:00 CEST

When & Where

  • When: Tuesday, 6 June 2023 from 09:30 till 13:00 CEST

  • Where:

    • in-person participants: Pieter de la Court building, room 1B01, and Microsoft Teams

    • remote participants: Microsoft Teams

Microsoft Teams space (for in-person and remote participants)

All participants (in-person and remote) will be added to a Microsoft Teams space created specifically for the meeting. This worked well last year, which is why we are using it again this year.

The advantage of Teams is that it combines, in a single application, live streaming (with a chat for questions) and separate chat channels for all participants that are independent of the live stream. Teams is also officially supported by both organizations (LEI and LUMC).

The Teams space can be actively used by all participants. We strongly encourage you to make use of it before and during the meeting, regardless of how you attend. In addition to the Files tab, the General channel contains two tabs with information about the Teams space and the schedule.

If you have registered for the meeting but have not received an invitation to join the Teams space, please contact the ALICE Helpdesk as soon as possible.

Schedule

The schedule of the meeting is as follows:

  • 09:15-09:30: Arrival of participants

  • 09:30-09:40: Welcome (speaker: TBA)

  • 09:40-10:00: Overview/Update ALICE (ALICE Team)

  • 10:00-10:20: Overview/Update SHARK (SHARK Team)

  • 10:20-10:40: Break

  • 10:40-11:00: Session 1

    • Speaker: John Boy (LEI, FSW)

    • Title: Using nix for computational social science workflows

    • Abstract: The nix package manager is a powerful way to install exactly the software needed for a specific workflow. It also makes those workflows reproducible, and humanly comprehensible, by specifying the exact version of each program used. On top of that, because so much software is packaged for it, nix makes experimenting with new packages extremely easy. I will share how I set up and use nix on ALICE so that I can bring my computational environment with me when running jobs on the cluster.

  • 11:00-11:20: Session 1

    • Speaker: Christopher Handy (LEI, HUM)

    • Title: Cross-linguistic Text Alignment: An Evolutionary Approach

    • Abstract: Text alignment is the process of finding similar passages across two or more documents. It can be useful for examining multiple versions of a document, whether in one language or several, or for searching for text re-use within a collection of disparate documents. Traditionally, text alignment is done by a human being, and the boundaries of aligned segments are largely intuited from the education and experience of the researcher. However, if we attempt to automate this process, we quickly find that formally defining what similarity means can be a non-trivial task. This talk focuses on one particular solution to this problem, developed initially for a project in Buddhist Studies but then generalised to cover a wide variety of text alignment problems across languages and genres. The basic idea is that the data presented to us are always in a less-than-ideal state, and that the alignment of any two passages can never have a single correct solution. Instead of attempting to achieve perfect alignments, my method approaches a hypothetical ideal alignment through an iterative process that begins with a series of educated guesses about aligned passages and then refines those guesses using a customisable scoring system. I use a custom genetic algorithm, which I designed in Python, to create a population of "agents", each possessing a sequence of data called a "gene" that dictates the alignment guesses the agent makes about a set of texts. Agents assign scores to themselves based on dictionary matches and other information, and a master controller combines the genes of the top-scoring agents to create the next generation of agents. Over multiple generations, these agents evolve toward the desired alignments, in a way that is similar to dog breeding or other processes of artificial selection among biological organisms.

      The system I have designed is free, open source and easy to use, allowing a researcher to select population sizes, mutation rates, scoring mechanisms and other variables to suit any particular alignment project. While the software is designed to run on nearly any device, the ALICE cluster is an ideal platform because of the high processing requirements for texts of any significant length. In this talk I will also discuss the practical aspects of using my alignment system, including how to get the input files onto ALICE, set up and run SLURM job scripts for specific text pairs, and then offload the output to a personal computer for later incorporation into the Django web interface included in my alignment system toolkit. This talk may also be of more general interest to users who want to learn how to use Python's multiprocessing library to run computations efficiently across multiple CPUs. I am also happy to walk through the actual code of the project if there is interest among attendees.

  • 11:20-11:40: Session 1

    • Speaker: Ben Companjen (LEI, Library)

    • Title: Batch-processing videos with ffmpeg and YOLO

    • Abstract: As part of PhD research that looks for symbolism in a Turkish television series, in early 2022 we trained a YOLOv5 model to recognise a few specific symbols in the video files. Our goal was to point out scenes or shots that likely contained any of these symbols, so that the symbols could be viewed "manually" in the context of the scene, without having to closely watch 54 episodes of 2.5 hours each.

      After training the model (on ALICE) and determining how long a job would take, we ran two interdependent array jobs to collect the final results. Each task in the first array job downloaded a video file from SURFdrive, ran ffmpeg to detect shot changes and extract every tenth frame, and uploaded those results back to SURFdrive. Each task in the second array job started after its counterpart in the first had finished, used our YOLO model to run the recognition on the frames, and uploaded those results to SURFdrive as well. Further processing of the results was done on a laptop. We will also discuss some lessons learned about doing a few of these steps more efficiently.

  • 11:40-12:00: Break

  • 12:00-12:20: Results from User Survey (ALICE & SHARK Teams)

  • 12:20-12:50: Q&A with the ALICE and SHARK Teams

  • 12:50-12:55: Closing