Announcement

We are excited to announce that the ALICE-SHARK User Meeting 2023 will take place on Tuesday, 6 June 2023 from 09:00 to 13:00.

If you are a user of the ALICE HPC cluster at Leiden University or the SHARK HPC cluster at the Leiden University Medical Center (or both), this meeting is for you. Our goal is to bring the user communities of both clusters together to connect with each other and the support teams of both clusters. The meeting is also of interest to users who are not yet actively using one of the two clusters.

The meeting will include an overview and update for both clusters, a selection of talks from users and an interactive Q&A session with the support teams.

We invite you to submit an abstract for a talk. Each talk slot is 15 minutes in total, and we recommend splitting the time into 10 minutes for slides and 5 minutes for questions. Following the feedback from last year's meeting, talks should have a practical or technical focus and limited scientific content. Suitable topics include, for example, your experience with the clusters, your workflows (anything from setting up jobs to moving data), plans for future projects, tools and software, and tips and tricks.

Attendance is possible in person and remotely. For in-person attendees, the meeting will be held in room 1B01 of the Pieter de la Court building. For remote attendees, all sessions will be streamed live with the possibility to ask questions remotely.

Registration and abstract submission

Registration for the meeting is mandatory. The number of people who can attend in person is limited by the size of the room, and places will be allocated on a first-come, first-served basis.

There is no registration fee for the meeting.

If you want to register for the meeting, please send an e-mail to the ALICE helpdesk with the information below:

  • ALICE or SHARK username:

  • Have you already been using ALICE or SHARK actively? (Yes/No):

  • Do you plan to attend in person? (Yes/No):

  • Would you like to give a talk? (Yes/No):

If you want to give a talk, please provide us with the following by the abstract submission deadline (see Important dates):

  • Title:

  • Short abstract:

  • Do you consent to the streaming of your talk? (Yes/No):

  • Do you consent that the speaker name and title of the talk will be publicly listed? (Yes/No):

Important dates

  • Abstract submission deadline: Sunday, 14 May 2023 at 23:59 CEST

  • Speaker selection and schedule: Tuesday, 16 May 2023

  • Deadline for registration: Sunday, 4 June 2023 at 23:59 CEST

  • User Meeting: Tuesday, 6 June 2023 from 09:00 till 13:00 CEST

When & Where

  • When: Tuesday, 6 June 2023 from 09:30 till 13:00 CEST

  • Where:

    • in-person participants: Pieter de la Court building, room 1B01, and Microsoft Teams

    • remote participants: Microsoft Teams

Microsoft Teams space (for in-person and remote participants)

All participants (in-person and remote) will be added to a Microsoft Teams space created specifically for the meeting. This worked well last year, which is why we will use it again this year.

The advantage for us is that Teams provides, in a single application, both live streaming (with chat for questions) and separate chat channels for all participants that are independent of the live stream. The tool is also officially supported by both organizations (LEI and LUMC).

Registered participants will receive an invitation to join the Teams space one to two weeks in advance of the meeting. Once the space is open, it can be actively used by all participants.

We strongly encourage you to make use of the Teams space before and during the meeting, regardless of how you attend. The General channel contains two tabs next to Files with information about the Teams space and the schedule.

If you have registered for the meeting and you did not receive an invitation to join the Teams space, please contact the ALICE Helpdesk as soon as possible.

Schedule

The following table shows the schedule of the meeting:

| Start Time | End Time | Session | Speaker | Title | Abstract |
|---|---|---|---|---|---|
| 9:15 | 9:30 | Arrival Participants | | | |
| 9:30 | 9:40 | Welcome | TBA | | |
| 9:40 | 10:00 | Overview/Update ALICE | ALICE Team | | |
| 10:00 | 10:20 | Overview/Update SHARK | SHARK Team | | |
| 10:20 | 10:40 | Break | | | |
| 10:40 | 11:00 | Session 1 | John Boy (LEI, FSW) | Using nix for computational social science workflows | The nix package manager is a powerful way to install exactly the software needed for a specific workflow. It also allows making those workflows reproducible, and humanly comprehensible, by specifying the exact versions of each program used. On top of that, because of the vast amount of software that is packaged for it, it makes experimenting with new packages extremely easy. I will share how I set up and use nix on ALICE so that I can bring my computational environment with me when running jobs on the cluster. |
| 11:00 | 11:20 | Session 1 | Christopher Handy (LEI, HUM) | Cross-linguistic Text Alignment: An Evolutionary Approach | Text alignment is the process of finding similar passages across two or more documents. Text alignment is a process that can be useful in examining multiple versions of a document, whether in one or several languages, or in searching for text re-use within a collection of disparate documents. Traditionally, the process of text alignment is done by a human being, and the determination of the boundaries of aligned segments is largely intuited from the education and experience of the researcher. However, if we attempt to automate this process, we quickly find that defining formally what similarity means can be a non-trivial task. This talk focuses on one particular solution to this problem, developed initially for a project in Buddhist Studies but then generalised to cover a wide variety of text alignment problems across any languages and genres. The basic idea is that the data presented to us are always in a less than ideal state, and that alignment of any two passages can never have a single correct solution. Instead of attempting to achieve perfect alignments, my method is to approach a hypothetical ideal alignment through an iterative process that begins with a series of educated guesses about aligned passages and then refines those guesses using a customisable scoring system. I use a custom genetic algorithm that I designed in Python to create a population of "agents", each possessing a sequence of data called a "gene" that dictates the alignment guesses each agent makes about a set of texts. Agents assign scores to themselves based on dictionary matches and other information, and a master controller combines the genes of the top scoring agents to create the next generation of agents. Over multiple generations, these agents evolve toward desired alignments, in a way that is similar to dog breeding or other processes of artificial selection among biological organisms. The system I have designed is free, open source and easy to use, allowing a researcher to select population sizes, mutation rates, scoring mechanisms and other variables to suit any particular alignment project. While the software is designed to run on nearly any device, the ALICE cluster is an ideal solution due to the high processing requirements for texts of any significant length. In this talk I will also discuss the practical aspects of using my alignment system, including how to get the input files to ALICE, set up and run SLURM files for specific text pairs, and then offload the output to a personal computer for later incorporation into a Django web interface included as part of my alignment system toolkit. This talk may be of interest more generally to users interested in learning to use the multiprocessing library in Python, in order to run computations efficiently across multiple CPUs. I am also happy to walk through the actual code of the project if there is interest among attendees. |
| 11:20 | 11:40 | Session 1 | Ben Companjen (LEI, Library) | Batch-processing videos with ffmpeg and YOLO | As part of PhD research that looks for symbolism in a Turkish television series, in early 2022 we trained a YOLOv5 model to recognise a few specific symbols in the video files. Our goal was to point out scenes or shots that likely contained any of these symbols, so that the symbols could be viewed in the context of the scene, "manually", without having to closely watch 54 episodes of 2.5 hours each. After training (on ALICE) and finding how long a job would take, we ran two interdependent array jobs to collect the final results. The first job downloaded each video file from SURFdrive, ran ffmpeg to detect shot changes and extract every tenth frame, and uploaded those results back to SURFdrive. The jobs in the second array job started after their counterpart in the first finished and used our YOLO model to do the recognition on the frames, uploading those results to SURFdrive as well. Further processing of the results was done on a laptop. We also discuss a few lessons learned on how to do a few of these steps more efficiently. |
| 11:40 | 12:00 | Break | | | |
| 12:00 | 12:20 | Results from User Survey | ALICE & SHARK Teams | | |
| 12:20 | 12:50 | Q&A with ALICE - SHARK Team | ALICE & SHARK Teams | | |
| 12:50 | 12:55 | Closing | | | |
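
For readers unfamiliar with the evolutionary approach mentioned in Christopher Handy's abstract, the general idea of a genetic algorithm (a population of candidate "genes", a scoring step, recombination of the top scorers, and random mutation) can be sketched as below. This is only an illustrative toy and not the speaker's software: the function names, the integer-list gene, the match-counting score and all parameter defaults are assumptions made for this example.

```python
import random


def score(gene, target):
    """Toy score (assumption): count positions where the guess matches the target."""
    return sum(1 for g, t in zip(gene, target) if g == t)


def crossover(parent_a, parent_b, rng):
    """Combine two parent genes at a random cut point (single-point crossover)."""
    cut = rng.randrange(1, len(parent_a))
    return parent_a[:cut] + parent_b[cut:]


def mutate(gene, choices, rate, rng):
    """Replace each position by a random choice with probability `rate`."""
    return [rng.choice(choices) if rng.random() < rate else g for g in gene]


def evolve(target, choices, population_size=30, mutation_rate=0.05,
           generations=200, top_k=5, seed=0):
    """Evolve a population of genes toward `target` and return the best gene found."""
    rng = random.Random(seed)
    # Start from random guesses, one gene per agent.
    population = [[rng.choice(choices) for _ in target]
                  for _ in range(population_size)]
    for _ in range(generations):
        ranked = sorted(population, key=lambda g: score(g, target), reverse=True)
        if score(ranked[0], target) == len(target):
            break  # a perfect gene already exists
        elite = ranked[:top_k]
        # Keep the top scorers unchanged and fill the rest of the next
        # generation with mutated children of randomly paired top scorers.
        population = elite + [
            mutate(crossover(rng.choice(elite), rng.choice(elite), rng),
                   choices, mutation_rate, rng)
            for _ in range(population_size - top_k)
        ]
    return max(population, key=lambda g: score(g, target))


if __name__ == "__main__":
    toy_target = [0, 1, 2, 3, 2, 1, 0, 1]
    best = evolve(toy_target, choices=[0, 1, 2, 3])
    print(best, score(best, toy_target))
```

Because the top scorers are carried over unchanged, the best score in this sketch never decreases from one generation to the next. In a real alignment project the score would come from something like dictionary matches rather than a known target, and the population size and mutation rate would be tuned per project.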