SEMLA 2024, June 12 – 13, 2024
The Software Engineering for Machine Learning Applications (SEMLA) international symposium aims at bringing together leading researchers and practitioners in software engineering and machine learning to reflect on and discuss the challenges and implications of engineering complex data-intensive software systems.

From early attempts in the late 1980s (such as the MAIA project) to the most recent breakthroughs in applications of deep learning, humankind has dreamed of building machines capable of learning new tasks, adapting to their environment, and evolving. Yet this exploration poses important computational, practical, and ethical challenges. Failure to properly address these challenges in such software-intensive systems can lead to catastrophic consequences. Consider, for example, the recent human toll caused by the $47-million Michigan Integrated Data Automated System (MiDAS) (see Broken: The human toll of Michigan’s unemployment fraud saga), or the recent finding that simple tweaks can fool neural networks into misidentifying street signs (see Robust Physical-World Attacks on Deep Learning Visual Classification).
The growing concern about machine learning's impact on people’s lives found a strong advocate in Prof. David Parnas, who expressed his concern in a Communications of the ACM article. These challenges are also reflected in new IEEE standardization initiatives. With data science and deep learning becoming increasingly pervasive in the contemporary world, it is now imperative to engage software engineers and machine learning experts in in-depth conversations about the necessary perspectives, approaches, and roadmaps to address these challenges and concerns.
We are interested in (but not limited to) discussing the following topics concerning software-intensive machine learning applications “in the wild”:
- Architecture and software design
- Model/data verification and validation
- Change management
- User experience evaluation and adjustment
- Privacy, safety, security issues and ethical concerns
The theme of SEMLA 2024 is “Verification, Validation, and Operations of AI Systems”. This year, we are pleased to have world-renowned speakers from academia and industry, from both the software engineering and machine learning communities. Our program includes academic and industry talks, panels on research, practice, and education, as well as two tutorials on AI software testing (from academia and industry). We encourage and welcome experts from all sub-fields of software engineering and machine learning to participate in this discussion.
Venue
Polytechnique Montréal (in person)
Address: 2500 Chem. de Polytechnique, Montréal, QC H3T 1J4
Presentation sessions: Building: Pavillon Principal, Room: Amphithéâtre Bernard-Lamarre
Poster session and reception: Building: Pavillon Lassonde, Room: Atrium Lassonde (M-3500)
Streaming channels
YouTube streaming, Zoom meeting
Registration
Online registration is available here.
Fees:
General (Entire event): $250 (plus taxes)
General (1st day – Industry Track): $150 (plus taxes)
General (2nd day – Research Track): $150 (plus taxes)
General Student Admission: $75 (plus taxes)
Polytechnique Students and Faculty: $5.75
These low fees are possible thanks to the contributions of the Département de génie informatique et génie logiciel of Polytechnique Montréal and the Institute for Data Valorization (IVADO).

Abstract: Anomaly detection plays an important role in the management of modern large-scale distributed systems. Logs, which record system runtime information, are widely used for anomaly detection. However, unsupervised anomaly detection algorithms struggle with complex systems, which generate vast amounts of multivariate time series data. Timely anomaly detection is crucial for managing these systems effectively: it minimizes downtime and plays a vital role in incident management for large-scale systems. To address these challenges, a method called Multi-Scale Convolutional Recurrent Encoder-Decoder (MSCRED) has been developed for detecting anomalies in CN PTC system logs. MSCRED uses multivariate time series data to perform anomaly detection and diagnosis. It creates multi-scale signature matrices that capture different levels of system status across time steps. The method uses a convolutional encoder to capture inter-sensor correlations and a Convolutional Long Short-Term Memory (ConvLSTM) network with attention mechanisms to capture temporal patterns.
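The multi-scale signature matrices mentioned in this abstract can be illustrated with a minimal sketch. It assumes, following the published MSCRED formulation as we understand it, that each matrix holds the pairwise inner products of the sensor series over a trailing window, scaled by the window length. The function name `signature_matrices`, the window lengths, and the synthetic data are illustrative assumptions, not details taken from the talk.

```python
import numpy as np


def signature_matrices(series, t, windows=(10, 30, 60)):
    """Return one sensor-by-sensor matrix per window length, ending at step t.

    series: array of shape (n_sensors, n_steps).
    windows: trailing window lengths (assumed values, chosen for illustration).
    """
    series = np.asarray(series, dtype=float)
    if t >= series.shape[1]:
        raise ValueError("t is beyond the end of the series")
    if t + 1 < max(windows):
        raise ValueError("not enough history before step t for the largest window")
    matrices = []
    for w in windows:
        segment = series[:, t - w + 1 : t + 1]  # the w most recent steps, including t
        matrices.append(segment @ segment.T / w)  # pairwise inner products, scaled by w
    return np.stack(matrices)  # shape: (len(windows), n_sensors, n_sensors)


# Synthetic data standing in for real log-derived metrics (hypothetical).
rng = np.random.default_rng(0)
data = rng.normal(size=(4, 200))
sig = signature_matrices(data, t=199)
print(sig.shape)  # (3, 4, 4)
```

In the method described in the abstract, a stack like this would feed the convolutional encoder; that stage and the ConvLSTM are not sketched here.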
Abstract: Language models such as RoBERTa, CodeBERT, and GraphCodeBERT have received much attention in the past three years for various Software Engineering (SE) tasks. Though these models have been shown to achieve state-of-the-art performance on many SE tasks, such as code summarization, they often need to be fully fine-tuned for the downstream task. Is there a better way to fine-tune these models that requires training fewer parameters? Can we add new information to the current models without pre-training them again? How do these models perform for different programming languages, especially low-resource ones with less training data available? How can we use the knowledge learned from other programming languages to improve performance on low-resource languages? This talk will review a series of experiments and our contributions to answering these questions.