SEMLA 2022 Location
Address: 2500 Chem. de Polytechnique, Montréal, QC H3T 1J4
Building: Pavillon Principal
Room: Amphithéâtre Bernard‑Lamarre
Abstract: Demand is growing in both industry and academia for employing Deep Learning (DL) in various domains to solve real-world problems. Deep Reinforcement Learning (DRL) is the application of DL in the domain of Reinforcement Learning. Like any other software system, DRL applications can fail because of faults in their programs. However, testing DL systems is a complex task, as they do not behave like traditional systems, notably because of their stochastic nature.
Amin Nikanjam is a research associate in the SWAT research team at Polytechnique Montréal. He is studying 1) how Software Engineering practices (like testing and fault localization) can be leveraged for Machine Learning Software Systems, and 2) how Machine Learning techniques can be applied to safety-critical systems in terms of reliability, robustness, and explainability. He received his Master’s and Ph.D. in Artificial Intelligence from Iran University of Science and Technology, Iran, and his Bachelor’s in Software Engineering from University of Isfahan. Before joining Polytechnique Montréal, he was an invited researcher at University of Montréal, and before that, he was an assistant professor at K. N. Toosi University of Technology, Iran. His research interests include Systems Engineering for Machine Learning, (Deep) Reinforcement Learning, and Multi-Agent Systems.
Abstract: A class of machine learning algorithms known as deep learning has received much attention in academia and industry. Deep learning has a large number of important societal applications, from self-driving cars to question-answering systems such as Siri and Alexa. A deep learning algorithm uses multiple layers of transformation functions to convert inputs to outputs, each layer learning successively higher-level abstractions in the data. The availability of large datasets has made it feasible to train deep learning models. Since the layers are organized in the form of a network, such models are also referred to as deep neural networks (DNNs). While the jury is still out on the impact of deep learning on the overall understanding of software’s behavior, a significant uptick in its usage and applications in wide-ranging areas and safety-critical systems, e.g., autonomous driving, aviation systems, medical analysis, etc., warrants research on software engineering practices in the presence of deep learning. One challenge is to enable the reuse and replacement of the parts of a DNN, which has the potential to make DNN development more reliable. This talk will describe a comprehensive approach to systematically investigate the decomposition of deep neural networks into modules to enable reuse, replacement, and independent evolution of those modules. A module is an independent part of a software system that can be tested, validated, or utilized without a major change to the rest of the system. Allowing the reuse of DNN modules is expected to reduce energy- and data-intensive training efforts to construct DNN models. Allowing replacement is expected to help replace faulty functionality in DNN models without needing costly retraining steps. Our preliminary work has shown that it is possible to decompose fully connected neural networks and CNN models into modules and conceptualize the notion of modules.
A serious problem facing the current software development workforce is that deep learning is widely utilized in our software systems, but scientists and practitioners do not yet have a clear handle on critical problems such as explainability of DNN models, DNN reuse, replacement, independent testing, and independent development. There was no apparent need to investigate the notions of modularity as neural network models trained before the deep learning era were mostly small, trained on small datasets, and were mostly used as experimental features. The notion of DNN modules developed by our work is helping make significant advances on a number of open challenges in this area. DNN modules enable the reuse of already trained DNN modules in another context. Viewing a DNN as a composition of DNN modules instead of a black box enhances the explainability of a DNN’s behavior. More modular deep learning will thus have a large positive impact on the productivity of these programmers, the understandability and maintainability of the DNN models that they deploy, and the scalability and correctness of software systems that they produce.
Hridesh Rajan is the Kingland Professor and Chair in the Department of Computer Science at Iowa State University (ISU), where he has been since August 2005. He served as the Professor-In-Charge of the ISU Data Science program from 2017 to October 2019. He has held visiting positions at the University of Bristol, Harvard University, and the University of Texas, Austin. Prof. Rajan earned his Ph.D. in Computer Science from the University of Virginia. He is an AAAS Fellow, an ACM Distinguished Scientist, and a Fulbright Scholar. He has also been recognized by the US National Science Foundation (NSF) with a CAREER award in 2009, by the Iowa State University College of LAS with an Early Achievement in Research Award in 2010, a Big-12 Fellowship in 2012, an ACM Senior Membership in 2014, an exemplary mentor for Junior Faculty award in 2017, a Kingland Endowed Professorship in 2017, and an early achievement in departmental leadership award in 2022. Prof. Rajan specializes in data science, programming languages and software engineering. He is credited with giving the definitive treatment for how to modularly reason about crosscutting concerns, and for the design and implementation of the Boa infrastructure for large-scale analysis of open source software and its evolution. Prof. Rajan served as an associate editor for the IEEE Transactions on Software Engineering and as an associate editor for the ACM SIGSOFT Software Engineering Notes. He served as the general chair of SPLASH 2020 and SPLASH 2021, the ACM SIGPLAN conference on Systems, Programming, Languages, and Applications: Software for Humanity.
Abstract: With AI being adopted in a rapidly growing number of real-world applications, the trustworthiness of AI-based systems has gained attention not only from researchers and practitioners, but also from regulatory bodies around the world. Two important topics in engineering trustworthy AI-based systems are the reproducibility of the models (i.e., given the same code and training data, can the training process be repeated to reproduce models with the same behavior), and the consistency of the interpretations of models (i.e., models that are produced to solve the same task agree with one another on feature importance), as they are closely tied to various tasks like training, testing, debugging, auditing, and decision making. However, machine learning models are challenging to reproduce due to issues like randomness in the software (e.g., optimizing algorithms) and non-determinism in the hardware (e.g., GPU). In addition, many studies violate established practices in the machine learning community when deriving interpretations, such as interpreting models with suboptimal performance, though the impact of such violations on the interpretation consistency has not been studied. In this talk, we will introduce the trustworthy AI engineering research at Huawei, and dive into the specific research on model reproducibility and interpretation consistency that has been carried out at Huawei to tackle these challenges.
Dayi Lin is a Senior Researcher at Centre for Software Excellence, Huawei Canada, where he leads the research on software engineering for AI systems. He and his team develop engineering technologies and guidelines to ensure compliance, quality, and productivity in the lifecycle of AI systems. His research interests include SE4AI, AI4SE, mining software repositories, and game engineering. His work has been published at several top-tier software engineering venues, such as TSE, ICSE, TOSEM, and EMSE, and has attracted wide media coverage. He has served as a program committee member for several conferences such as ICSE-SEIP 2023, ICSE 2022 Poster Track, and RAISE 2021. He is also the co-chair of GAS 2022. He received a Ph.D. in Computer Science from Queen’s University, Canada.
Abstract: Although artificial intelligence (AI) is solving real-world challenges and transforming industries, there are serious concerns about its ability to behave and make decisions in a responsible way. To address the responsible AI challenges, a number of AI ethics principles frameworks have been published recently, which AI systems are supposed to conform to. However, without further best practice guidance, practitioners are left with nothing much beyond truisms. In addition, significant effort has been put into algorithm-level solutions, which mainly focus on a subset of mathematics-amenable ethical principles (such as privacy and fairness). However, ethical issues can occur at any step of the development lifecycle, crosscutting many AI, non-AI and data components of systems beyond AI algorithms and models. In this talk, we will discuss the challenges in operationalising responsible AI at scale and end-to-end system-level solutions to tackle those challenges.
Qinghua Lu leads the Responsible AI science team at CSIRO’s Data61, Australia. She is a principal research scientist at CSIRO’s Data61. She received her PhD from the University of New South Wales in 2013. Her current research interests include responsible AI, software engineering for AI, software architecture, and blockchain. She has published 100+ academic papers in international journals and conferences. Her recent paper “Towards a Roadmap on Software Engineering for Responsible AI” won the ACM Distinguished Paper Award.
Sushmitha Bala is an AI Architect for the AI Factory team at National Bank of Canada. The team is responsible for the deployment and industrialization of AI models in production. Following a master’s degree focused on game theoretic constructs in applied economics and statistics, Sushmitha has spent the last decade in various analytics and data-centric roles in the financial services industry, delivering crucial initiatives for companies such as JP Morgan and National Bank. In her current role, she is responsible for designing AI model architecture that balances delivering value while remaining pragmatic, scalable and secure.
Emmanuel Thepie Fapi is currently a Senior Data Scientist with Ericsson, GAIA AI-Hub, Canada. He obtained a bachelor’s degree in applied mathematics from Douala University, Cameroon. He holds a master’s degree in engineering mathematics and computer tools from Orleans University and a PhD in signal processing and telecommunications from IMT Atlantique in France (formerly ENSTB de Bretagne), with Nokia Siemens Network as host laboratory in Munich, Germany. From 2010 to 2016 he worked with GENBAND US LLC, QNX software System Limited as audio software developer, MDA system as analyst and EasyG as senior DSP engineer in Vancouver, Canada. In 2017 he joined Amazon Lab 126 in Boston, USA as Audio Software Developer for Echo dot 3rd generation. His main areas of interest are 5G network and beyond, Anomaly and Intrusion detection-based AI/ML, Network Observability, Predictive maintenance, Real-time embedded OS, Distributed AI/ML, IoT, Voice and Audio Quality Enhancement. In 2022 he received the Ericsson Impact Award, and he is the SPOC of the Edge Computer Cluster project of the MITACS Program at Ericsson, with six projects, in collaboration with four Canadian universities.
Abstract: Data-driven AI systems (e.g., machine/deep learning) continue to make substantial strides in enabling cutting-edge intelligent applications. However, the development of current data-driven AI systems still lacks systematic quality assurance and engineering support with regard to the adoption of quality, security and reliability assurance standards, as well as mature and interpretable toolchain support. In this talk, I will provide a high-level overview of our team’s continuous efforts to establish the early foundation of Trustworthy Data-Driven AI System Engineering in the past few years across Canada, Japan and Singapore. I will also give a high-level introduction to the challenges and opportunities in laying down the foundations for engineering safe, secure and reliable systems in the data-driven era.
Lei Ma is currently an Associate Professor with shared appointments between (1) University of Alberta, Canada and (2) Kyushu University, Japan. He has also been selected as a Canada CIFAR AI Chair and Fellow at Alberta Machine Intelligence Institute (Amii). Previously, he received the B.E. degree from Shanghai Jiao Tong University, Shanghai, China, and the M.E. and Ph.D. degrees from The University of Tokyo, Tokyo, Japan. His recent research centers around the interdisciplinary fields of software engineering (SE) and trustworthy artificial intelligence (AI), with a special focus on the quality, reliability, safety and security assurance of machine learning and AI systems. For more detailed information, please visit the website, https://www.malei.xyz.
Abstract: Deep Neural Networks (DNNs) are often used in safety-critical systems, such as autonomous driving. Hazards of these systems are usually linked to specific error patterns of the DNN, such as specific misclassifications. In the context of the “Engineerable AI Techniques for Practical Applications of High-Quality Machine Learning-based Systems” (eAI project), we are investigating techniques to repair a DNN to fix some given misclassifications that are considered particularly critical by stakeholders. The first step of these repair approaches consists in applying fault localization (FL) to identify the DNN components (neurons or weights) responsible for the misclassifications. However, the components responsible for one type of misclassification could be different from those responsible for another type; depending on the granularity of the analyzed dataset, FL may not reveal these differences: failure types that are more frequent in the dataset may mask less frequent ones. The talk will present a way to perform FL for DNNs that avoids this masking effect by selecting test data in a granular way. We conducted an empirical study, using a spectrum-based FL approach for DNNs, to assess how FL results change when the granularity of the analyzed test data changes. Namely, we performed FL by using test data with two different granularities: following a state-of-the-art approach that considers all misclassifications for a given class together, and the proposed fine-grained approach. Results show that FL should be done for each misclassification, such that practitioners have a more detailed analysis of the DNN faults and can make a more informed decision on what to repair in the DNN.
Paolo Arcaini is a project associate professor at the National Institute of Informatics (NII), Japan. He received a PhD in Computer Science from the University of Milan, Italy, in 2013. Before joining NII, he held an assistant professor position at Charles University, Czech Republic. His main research interests are related to search-based testing, fault-based testing, model-based testing, software product lines, and automated repair. In the context of the “Metamathematics for Systems Design” (MMSD) project, he has worked on search-based testing of autonomous driving systems. Currently, he is involved in the “Engineerable AI Techniques for Practical Applications of High-Quality Machine Learning-based Systems” project (eAI), where he works on fault localisation and automated repair for deep neural networks.
Jinqiu Yang is an Assistant Professor in the Department of Computer Science and Software Engineering at Concordia University, Montreal, Canada. Her research interests include automated program repair, software testing, quality assurance of machine learning software, and mining software repositories. Her work has been published in flagship conferences and journals such as ICSE, FSE, EMSE. She serves regularly as a program committee member of international conferences in Software Engineering, such as ASE, ICSE, ICSME and SANER. She is a regular reviewer for Software Engineering journals such as EMSE, TSE, TOSEM and JSS. Dr. Yang obtained her BEng from Nanjing University, and MSc and PhD from University of Waterloo. More information at: https://jinqiuyang.github.io/.
Foutse Khomh is a Full Professor, a Canada CIFAR AI Chair, and FRQ-IVADO Research Chair at Polytechnique Montréal, where he heads the SWAT Lab (http://swat.polymtl.ca/). He received a Ph.D. in Software Engineering from the University of Montreal in 2011. His research interests include software maintenance and evolution, cloud engineering, machine learning systems engineering, empirical software engineering, software analytics, and dependable and trustworthy AI/ML. His work has received four ten-year Most Influential Paper (MIP) Awards, and six Best/Distinguished Paper Awards. He has served on the program committees of several international conferences including ICSE, FSE, ASE, ICSM(E), SANER, MSR, ICPC, SCAM, ESEM and has reviewed for top international journals such as SQJ, JSS, EMSE, TSE, TPAMI, and TOSEM. He was program chair for Satellite Events at SANER 2015, program co-chair of SCAM 2015, ICSME 2018, PROMISE 2019, and ICPC 2019, general chair of ICPC 2018 and SCAM 2020, and general co-chair of SANER 2020. He initiated and co-organizes the Software Engineering for Machine Learning Applications (SEMLA) symposium. He is one of the organizers of the RELENG workshop series (http://releng.polymtl.ca) and Associate Editor for IEEE Software, EMSE, and JSEP.
Abstract: Post-hoc explanation is the problem of explaining how a machine learning model — whose internal logic is hidden to the end-user and generally complex — produces its outcomes. Current approaches for solving this problem include model explanations and outcome explanations. While these techniques can be beneficial by providing interpretability, there are two fundamental threats to their deployment in real-world applications: the risk of explanation manipulation that targets the trustworthiness of post-hoc explanation techniques and the risk of model extraction that jeopardizes their privacy guarantees. In this talk, we will discuss common explanation manipulation and privacy vulnerabilities in state-of-the-art post-hoc explanation techniques as well as existing lines of research that try to make these techniques more reliable.
Ulrich Aïvodji is an Assistant Professor of Computer Science at ETS Montreal in the Software and Information Technology Engineering Department. He is also a regular member of the International Observatory on the Societal Impacts of AI and Digital Technologies. Before his current position, he was a postdoctoral researcher at UQAM, working on machine learning ethics and privacy. He earned his Ph.D. in Computer Science at Université Toulouse III. His research areas of interest are computer security, data privacy, optimization, and machine learning. His current research focuses on several aspects of trustworthy machine learning, such as fairness, privacy-preserving machine learning, and explainability.
Emad Shihab is Associate Dean of Research and Innovation and Full Professor in the Gina Cody School of Engineering and Computer Science at Concordia University. He holds a Concordia University Research Chair in Software Analytics. His research interests are in Software Engineering, Mining Software Repositories, Software Analytics, and Software Bots. Dr. Shihab received the 2019 MSR Early Career Achievement Award and the 2019 CS-CAN/INFO-CAN Outstanding Young Computer Science Researcher Prize. His work has been published in some of the most prestigious SE venues, including ICSE, ESEC/FSE, MSR, ICSME, EMSE, TOSEM, and TSE. He is recognized as a leader in the field, serving on numerous steering and organization committees of core software engineering conferences. Dr. Shihab has secured more than $2.7 Million, as PI, to support his research, including a highly competitive NSERC Discovery Accelerator Supplement. His work has been done in collaboration with world-renowned researchers from Australia, Brazil, China, Europe, Japan, the United Kingdom, Singapore and the USA and adopted by some of the biggest software companies, such as Microsoft, Avaya, BlackBerry, and Ericsson. He is a senior member of the IEEE. His homepage is: http://das.encs.concordia.ca/.
Reihaneh Rabbany is an Assistant Professor at the School of Computer Science, McGill University. She is a core faculty member of Mila – Quebec’s artificial intelligence institute, and a Canada CIFAR AI Chair. She is also a faculty member at the Center for the Study of Democratic Citizenship. Before joining McGill, she was a Postdoctoral fellow at the School of Computer Science, Carnegie Mellon University. She completed her Ph.D. in the Computing Science Department at the University of Alberta. Her research is at the intersection of network science, data mining and machine learning, with a focus on analyzing real-world interconnected data, and social good applications.
Qinghua Lu leads the Responsible AI science team at CSIRO’s Data61, Australia. She is a principal research scientist at CSIRO’s Data61. She received her PhD from the University of New South Wales in 2013. Her current research interests include responsible AI, software engineering for AI, software architecture, and blockchain. She has published 100+ academic papers in international journals and conferences. Her recent paper “Towards a Roadmap on Software Engineering for Responsible AI” won the ACM Distinguished Paper Award.
Ipek Ozkaya is the technical director of Engineering Intelligent Software Systems group at Carnegie Mellon University Software Engineering Institute (SEI). Her main areas of expertise and interest include software architecture, software design automation, and managing technical debt in software-reliant and AI-enabled systems. At the SEI she has worked with several government and industry organizations in domains including avionics, power and automation, IoT, healthcare, and IT. Ozkaya is the co-author of a practitioner book titled Managing Technical Debt and is the Editor-in-Chief of IEEE Software Magazine. She holds a PhD in Computational Design from Carnegie Mellon University.
Abstract: Unit tests are a valuable (and increasingly essential) tool when building software systems. Indeed, test-driven development is a mainstay of most modern software development processes. In research, however, unit tests are typically eschewed for the sake of expediency and uncertainty about the long-term usage of the research code. This is unfortunate, as the reliability and reproducibility of the code used for research is essential for the advancement of science. Although efforts such as reproducibility checklists and challenges help mitigate some of these concerns, they come only at the end of the software development process. In this talk I will argue for the use of unit tests when writing code for machine learning research, as a means of ensuring correctness and reliability of the code we use for scientific progress.
Pablo Samuel was born and raised in Quito, Ecuador, and moved to Montreal after high school to study at McGill. He obtained his PhD from McGill, focusing on Reinforcement Learning under the supervision of Doina Precup and Prakash Panangaden. He has been working at Google for over 10 years, and is currently a staff research Software Developer in Google Brain in Montreal, focusing on fundamental Reinforcement Learning research, Machine Learning and Creativity, and being a regular advocate for increasing the LatinX representation in the research community. Aside from his interest in coding/AI/math, Pablo Samuel is an active musician.
Abstract: Software testing is about finding failures, assuming that failures are due to faults in the system under test (SUT). Failures, however, may not always indicate SUT faults. For example, when testing is applied at the system level to complex cyber-physical systems, e.g., self-driving cars, a failure may indicate insufficiencies such as performance limitations, physical constraints, or misuse by human operators. In these situations, there is a need for techniques that not only generate individual tests leading to failures but also either explain the circumstances around failures or identify constraints that can steer the system clear of failures. In this talk, I discuss how Interpretable ML can broaden the focus of verification and testing so as to include the learning of insufficiencies caused by the SUT environment. I will present how, using Interpretable ML, one can generate environment conditions that characterize system correctness or, alternatively, explain system failures. To illustrate applications, I will use case studies from the domains of cyber-physical systems and network systems.
Shiva Nejati is an Associate Professor at the School of Electrical Engineering and Computer Science at the University of Ottawa (uOttawa) and Co-director of uOttawa’s recently established IoT Lab (Sedna). Prior to joining the University of Ottawa, she was a Senior Scientist at the SnT Centre, University of Luxembourg and a Scientist at Simula Research Laboratory, Norway. Nejati received her Ph.D. from the University of Toronto, Canada. Her research interests are in software engineering, focussing on software testing, analysis of IoT and cyber-physical systems, search-based software engineering, applied machine learning, and formal and empirical software engineering methods. Nejati has published more than 70 scientific papers and received eight best or ACM distinguished paper awards as well as a 10-Year Most Influential Paper Award from CASCON. She serves as an Associate Editor for IEEE Transactions on Software Engineering and was PC co-chair for SSBSE 2019 and ACM/IEEE MODELS 2021. She has more than 15 years of experience conducting research in collaboration with the IoT, telecom, automotive, aerospace, maritime and energy sectors.
Abstract: There is a great appeal in using machine learning to assist in software development, as it promises to enable the experience of one software engineer to be recorded and then generalized to provide guidance to another. I’ll give an overview of some of our recent work on using deep learning for modeling software development and assisting software engineers, and I’ll reflect on open questions for SEMLA related to developing applications in this space for use “in the wild.”
Danny Tarlow is a Research Scientist at Google Research, Brain Team in Montreal. He is primarily interested in machine learning methods for understanding and generating programs. However, he has fairly broad interests across Machine Learning. On the academic side, he is also an Adjunct Professor in the School of Computer Science at McGill University and an associate member at MILA. He co-supervises a couple of PhD students at MILA. He holds a Ph.D. from the Machine Learning group at University of Toronto (2013). Before coming to Montreal, he spent four years as a postdoc and then Researcher at Microsoft Research, Cambridge (UK).
Abstract: In this talk, I will discuss designing methods for analyzing complex data from online societies. Complex data is often interconnected, evolving, and hard to label. With my group, we work on designing methods for analyzing such data, building on techniques for graph mining, graph representation learning, unsupervised and self-supervised learning, anomaly detection, learning with weak and/or uncertain labels, etc. I will highlight one of our projects on measuring polarization in social media, which works with real-world data from online societies and in which we design methods closely with domain experts within an interdisciplinary team.
Reihaneh Rabbany is an Assistant Professor at the School of Computer Science, McGill University. She is a core faculty member of Mila – Quebec’s artificial intelligence institute, and a Canada CIFAR AI Chair. She is also a faculty member at the Center for the Study of Democratic Citizenship. Before joining McGill, she was a Postdoctoral fellow at the School of Computer Science, Carnegie Mellon University. She completed her Ph.D. in the Computing Science Department at the University of Alberta. Her research is at the intersection of network science, data mining and machine learning, with a focus on analyzing real-world interconnected data, and social good applications.
Abstract: AI systems are software-reliant systems which include data and components that implement algorithms mimicking learning and problem solving. The increasing availability of computing resources and off-the-shelf ML solutions gives the impression that engineering, deploying, and maintaining an AI system is trivial once the appropriate data is available. The challenges of developing and deploying ML-enabled systems have been extensively reported in the literature and practitioner blogs and articles, with increasing emphasis on responsible AI implementations. Some of these challenges stem from characteristics inherent to ML components, such as data-dependent behavior, detecting and responding to drift over time, and timely capture of ground truth to inform retraining. The sneaky part about engineering AI systems is that they are “just like” conventional software systems we can design and reason about, until they are not. Regardless, many principles and practices of building long-lived software systems that are sustainable still apply to engineering AI systems. This presentation will take a software architecture lens and introduce foundational software engineering practices and research gaps in software engineering of ML systems.
Ipek Ozkaya is the technical director of Engineering Intelligent Software Systems group at Carnegie Mellon University Software Engineering Institute (SEI). Her main areas of expertise and interest include software architecture, software design automation, and managing technical debt in software-reliant and AI-enabled systems. At the SEI she has worked with several government and industry organizations in domains including avionics, power and automation, IoT, healthcare, and IT. Ozkaya is the co-author of a practitioner book titled Managing Technical Debt and is the Editor-in-Chief of IEEE Software Magazine. She holds a PhD in Computational Design from Carnegie Mellon University.
Philippe Molaret is VP research & technology at Thales Digital Solutions.
TDS is a center that supports the Thales Group’s digital transformation, building on Montreal’s digital intelligence ecosystem. He is a cofounder of the cortAIx AI research lab at TDS and is currently an active sponsor of the creation and buildup of the Confiance.ai program at CRIM and of the ENGINE NFPO for the adoption of 5G technologies. He is a member of the Ivado technology transfer committee and the Prompt board of directors. He occasionally teaches technology and innovation management in Polytechnique Montreal’s master’s and PhD programs. Between 2010 and 2012 he was ETS Research and Innovation ambassador. In 2002 he was one of the founding members of CRIAQ, and he sat on its board of directors until 2015. He was an industry member of the MEI strategic council for research and innovation in 2009, leading to the development of the 2010-2013 Quebec Strategy for Research and Innovation.
Before joining Thales, Mr. Molaret worked at CAE for 18 years.
Mr. Molaret graduated in electrical engineering from ETS in 1990 and obtained a master’s degree in Technology and Innovation Management from Polytechnique Montreal in 2017.
Gabriela Nicolescu is a full professor and the director of the Department of Computer and Software Engineering at Polytechnique Montreal. She obtained her B. Sc. A and her MSc degree from Politechnica Bucharest. She obtained her Ph.D. degree, in 2002, from INPG (Institut National Polytechnique de Grenoble) in France, with the award for Best Thesis in Microelectronics. She has been working at Ecole Polytechnique de Montréal (Canada) since august 2003, where she is a professor in the Computer and Software Engineering Department. Dr. Nicolescu’s research interests are in the field of design methodologies, programming and security for systems with advanced technologies, such as 3D multi-processor systems-on-chip integrating liquid cooling and optical networks. She published five books, and she is the author of more than a hundred articles in journals, international conferences and book chapters.
Eric Laufer is the lead data scientist at Peritus.ai, a startup building tools to help monitor and grow online communities. After a master’s degree focusing on recommender systems at the MILA, Eric has worked for the last decade as an applied scientist / ML engineer for various startups and large companies. This work includes NLP (NER/Q&A/Search) for Dow Jones, Element AI and Peritus.ai, along with recommendation and supply chain forecasting for JDA. His main focus as a ML practitioner is to build efficient, scalable and useful models in the context of application development.
Mélanie Bosc Mélanie Bosc holds a diploma in training engineering from University of Paris 1 Panthéon-Sorbonne and a DESS in training management from the University of Sherbrooke. She developed her expertise in the field of training by starting her career at National Institute of Agricultural Research in France and then working for organizations in the banking and university fields in Quebec. In this regard, she held the position of Director of Continuing Education at the Faculty of Continuing Education of the University of Montréal. Passionate about the challenges raised by workforce and human resources issues, as well as by learning and training in all its forms, as Executive Director of the sectoral committee of the ICT workforce, Mélanie has worked to promote the ICT sector and its workforce, as well as the digital transformation of the Quebec economy in general.
Patrick St-Amant is the CTO and cofounder of Zetane Systems with advanced education in mathematics. He is the inventor of Zetane’s technology and leads the development of Zetane Protector (ML models robustness testing and evaluation) and Zetane Insight Engine (models introspection 3D engine).
He has successfully led several end-to-end ML projects with industrial clients and partners in the fields of Security, Defense, Aerospace, Construction, Aviation, Simulation and Manufacturing. This included project scoping, ML solution design, planning, data engineering, implementation, robustness testing and client interactions. He has spent years as a researcher in number theory, set theory and fundamentals of mathematics. He did PhD studies in mathematics, category theory and foundations of computing at the University of Ottawa (2007). He was invited to the Institute for Advanced Study in Princeton (2006 & 2007) where he presented his work on a universal mathematical language. He has an M.Sc. degree in computer science and fundamental mathematics from UQAM and holds the patent “Scalable Transform Processing Unit for Heterogeneous Data”.
Over the last five years, he met with over 200 leaders and data scientists in the field of AI and ML. Some examples include IBM, Nvidia, Thales, Microsoft, US Department of Defense, Amazon, MILA, Polytechnique, Université de Montreal, Unity, Quantum Black, CAE, MDA, Creative Destruction Lab, CNRC and others. He presented at the World Summit AI, AI for Defense, Big Data Toronto, Deep Learning Montreal and the yearly ONNX conference.
Houari A. Sahraoui is a full professor at the department of computer science and operations research (GEODES, software engineering group) of University of Montreal. Before joining the university, he held the position of lead researcher of the software engineering group at CRIM (Research center on computer science, Montreal). He holds an Engineering Diploma from the National Institute of computer science (1990), Algiers, and a Ph.D. in Computer Science, Pierre & Marie Curie University LIP6, Paris, 1995. His research interests include automated software engineering (SE), Search-based SE, Model-Driven Engineering, software visualization, program comprehension, and re-engineering. He has published around 200 papers in conferences, workshops, books, and journals, edited three books, and regularly gives invited talks. He has served as program committee member in several IEEE and ACM conferences, as member of the editorial boards of three journals, and as organization member of many conferences and workshops. He was the general chair of IEEE Automated Software Engineering Conference in 2003, PC co-chair of VISSOFT 2011, and general chair of VISSOFT 2013.
Emad Shihab is Associate Dean of Research and Innovation and Full Professor in the Gina Cody School of Engineering and Computer Science at Concordia University. He holds a Concordia University Research Chair in Software Analytics. His research interests are in Software Engineering, Mining Software Repositories, Software Analytics, and Software Bots. Dr. Shihab received the 2019 MSR Early Career Achievement Award and the 2019 CS-CAN/INFO-CAN Outstanding Young Computer Science Researcher Prize. His work has been published in some of the most prestigious SE venues, including ICSE, ESEC/FSE, MSR, ICSME, EMSE, TOSEM, and TSE. He is recognized as a leader in the field, serving on numerous steering and organization committees of core software engineering conferences. Dr. Shihab has secured more than $2.7 Million, as PI, to support his research, including a highly competitive NSERC Discovery Accelerator Supplement. His work has been done in collaboration with world-renowned researchers from Australia, Brazil, China, Europe, Japan, the United Kingdom, Singapore and the USA and adopted by some of the biggest software companies, such as Microsoft, Avaya, BlackBerry, and Ericsson. He is a senior member of the IEEE. His homepage is: http://das.encs.concordia.ca/.
Mike Rabbat is a Research Scientist and Manager in FAIR, the fundamental AI research group of Meta Platforms. He earned the BSc degree from the University of Illinois Urbana-Champaign, the MSc degree from Rice University, and the PhD from the University of Wisconsin-Madison, all in electrical engineering. Before joining FAIR he was a professor at McGill University and he has held visiting positions at IMT-Atlantique (Brest, France), the Inria Bretagne-Atlantique Research Centre (Rennes, France), and KTH Royal Institute of Technology (Stockholm, Sweden). His research interests include optimization for machine learning, large-scale and distributed optimization, and federated learning.
Software engineering and machine learning are two different worlds. There is a lot of research towards applying machine learning to software engineering but the reciprocal is not true. In this poster, we present an example where principles of software engineering were applied successfully to a machine learning prototype algorithm. The machine learning developer was able to improve his workflow by applying simple heuristics borrowed from software engineering. Then, we highlight other common problems that can be explored with software engineering to increase the velocity of machine learning projects and raise questions about various ways to apply software engineering to this domain.
The migration of legacy software systems to Service Oriented Architectures (SOA) has become a mainstream trend to modernize enterprise software systems. A key step in SOA migration is the identification of services in the target application, but it is a challenging one because the potential services should (1) embody reusable functionalities, (2) be developed in a cost-effective manner, and (3) be easy to maintain. In this poster, we report on the state of the practice of SOA migration in industry. We surveyed 45 practitioners of legacy-to-SOA migration to understand how migration, in general, and service identification, in particular, are done. Key findings include: (1) reducing maintenance costs is a key driver in SOA migration, (2) domain knowledge and source code of legacy applications are most often used respectively in a hybrid top-down and bottom-up approach for service identification, (3) service identification focuses on domain services (as opposed to technical services), (4) the process of service identification remains essentially manual, and (5) RESTful services and microservices are the most frequent target architectures. We conclude with a set of recommendations and best practices.
System logs are widely used and play a critical role in system forensics. However, log analysis faces several challenges: logs are massive in volume and contain complex kinds of messages, they are unstructured and lack homogeneity, and log data does not contain explicit information for anomaly detection. Therefore, it is impossible to perform log analysis manually in large-scale router systems. Developers, however, face the challenging task of choosing the most appropriate automated log analysis method, and there is a lack of literature reviews on state-of-the-art machine learning methods for log analysis. Our aim is to help developers choose the most appropriate automated log analysis method for their task and to answer the following research questions: What are the current challenges and proposals in software log analysis? What are the state-of-the-art ML methods for anomaly detection (supervised and unsupervised)? What are the uses of ML in log analysis? When should ML be chosen, or not chosen, over other practices?
Q&A website (e.g., Stack Overflow) designers have devised several incentive systems to encourage users to answer questions. However, the current incentive systems primarily focus on the quantity and quality of the answers instead of encouraging the rapid answering of questions. In this paper, we use a logistic regression model to analyze 46 factors along four dimensions in order to understand the relationship between the studied factors and the time needed to get an accepted answer. We find that i) factors in the answerer dimension have the strongest effect on the time needed to get an accepted answer; ii) the non-frequent answerers are the bottleneck for fast answers; and iii) the current incentive system motivates frequent answerers well, but such frequent answerers tend to answer short questions. Our findings suggest that Q&A website designers should improve their incentive systems to motivate non-frequent answerers to be more active and to answer questions fast.
A common way to customize a framework is by passing a framework-related object as an argument to an API call. The formal parameter of the method is referred to as the extension point. Such an object can be created by subclassing an existing framework class or an interface, or by directly customizing an existing framework object. However, this requires extensive knowledge of the framework’s extension points and their interactions. We develop a technique that mines a large number of code examples to discover all extension points and patterns for each framework class. Given a framework class that is being used, our approach first recommends all extension points that are available in the class. Once the developer chooses an extension point, our approach discovers all of its usage patterns and recommends the best code examples for each pattern. We evaluate the performance of our two-step recommendation using five different frameworks.
Continuous Integration (CI) allows developers to generate software builds more quickly and periodically, which helps in identifying errors at early stages. When builds are generated frequently, a long build duration may keep developers from performing other development tasks. Our initial investigation shows that many projects experience long build durations (e.g., in the scale of hours). In this research, we model long CI build durations of 63 GitHub projects to study the factors that may lead to longer CI build durations. Our preliminary results indicate that common wisdom factors (e.g., lines of code and build configuration) do not fully explain long build durations. Therefore, we study the relationship of long build durations with CI, code, density, commit, and file factors. Our results show that test density and build jobs have a strong influence on build duration. Our research provides recommendations to developers on how to optimize the duration of their builds.
An important challenge in many real-world machine learning applications is imbalance between classes. Learning from imbalanced data is challenging due to bias of performance towards the majority class rather than the minority class of interest. This bias may exist because: (1) classification systems are often optimized and compared using performance measurements that are unsuitable for imbalance problems; (2) most learning algorithms are designed and tested on a fixed imbalance level, which may differ from operational scenarios; (3) the preference of classes is different from one application to another. In this poster, a summary of two papers from my PhD thesis is presented that includes: (1) a new ensemble learning algorithm called Progressive Boosting (PBoost); and (2) a new global evaluation space for the F-measure that represents a classifier over all of its decision thresholds and a range of possible imbalance levels for the desired preference of TPR to precision.
Defect prediction is an important task for preserving software quality. Most prior work on defect prediction uses software features, such as the number of lines of code, to predict whether a file or commit will be defective in the future. Feature selection and reduction techniques can help to reduce the number of features in a model. Using a small number of features avoids the problem of multicollinearity and makes the prediction models simpler. However, while several recent studies have investigated the impact of feature selection techniques on defect prediction, no studies have investigated the impact of feature reduction techniques. In our research, we study the impact of eight feature reduction techniques on the performance and the variance in performance of five supervised learning and five unsupervised defect prediction models.
Several large-scale systems have faced system failures in the past due to their inability to handle a very large number of concurrent requests. Therefore, load tests are designed to verify the scalability, robustness, and reliability of the system (apart from the functionality) to meet the demands of millions of users. In our work, we survey the state of load testing research and practice. We compare techniques, data sources and results that are used in the three phases of a load test: Design, Execution, and Analysis. We focus on the work that was published after 2013. Our work complements existing surveys on load testing.
The popularity of mobile apps has continued to grow over the past few years. Mobile app stores, such as the Google Play Store and Apple’s App Store, provide a unique user feedback mechanism to app developers through app reviews. In the Google Play Store (and most recently in the Apple App Store), developers are able to respond to such user feedback. In our work, we analyze the dynamic nature of the review-response mechanism by studying 4.5 million reviews with 126,686 responses of 2,328 top free-to-download apps in the Google Play Store. One of the major findings of our study is that the assumption that reviews are static is incorrect. Our findings show that it can be worthwhile for app owners to respond to reviews, as responding may lead to an increase in the given rating. In addition, we identify four patterns of developers (e.g., developers who primarily respond to negative reviews).
Developer behavior is a common research topic in software engineering, studied to support the future maintenance and evolution of software systems. Studying developers’ behavior for the purpose of recommending the most common behavior is an area that captures great interest. Given this interest, our work aims to apply consensus algorithms to developers’ behaviors to generate a consensual behavior. We conduct a number of experiments to analyze how developers behave while performing a programming task. We collect developers’ interaction traces (ITs) through Eclipse Mylyn and VLC video captures. To obtain the best results, we perform an in-depth comparison between the results of applying each consensus algorithm. Preliminary results show that the Kwiksort algorithm outperforms all other algorithms in producing the most common developer behavior. This study demonstrates how using consensus algorithms can help recommend to developers a consensual behavior when performing a particular programming task.
Logs are widely used to monitor, understand and improve software performance. However, developers often face the challenge of making logging decisions. Prior works on automated logging guidance techniques are rather general, without considering a particular goal, such as monitoring software performance. We present Log4Perf, an automated approach that provides suggestions of where to insert logging statements with the goal of monitoring web-based systems’ software performance. In particular, our approach builds and manipulates a statistical performance model to identify the locations in the source code that statistically significantly influence software performance. Our evaluation results show that Log4Perf can build well-fit statistical performance models, which can be leveraged to investigate the influence of locations in the source code on performance. Also, our approach is an ideal complement to traditional approaches that are based on software metrics or performance hotspots. Log4Perf is integrated into the release engineering process of a commercial software system to provide logging suggestions on a regular basis.
Developers rely on software logs for a variety of tasks. Recent research on logs often considers only the appropriateness of a log as an individual item, while logs are typically analyzed in tandem. Thus we focus on studying duplicate logging code, i.e., log lines that have the same static text message. Such duplication in logs is a potential indication of logging code smells, which may affect developers’ understanding of the system. We uncover five patterns of duplicate logging code smells by manually studying a statistical sample of duplicate logs from four large-scale open source systems. We further manually study all the code smell instances and identify the problematic and justifiable cases of the uncovered patterns. Then, we contact developers in order to verify our results. We integrated our manual study results and developers’ feedback into our static analysis tool, DLFinder, which helps developers identify and refactor duplicate logging code smells.
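As a rough illustration of what "log lines that have the same static text message" means, the following minimal Python sketch groups logging calls by their first string literal. It is not DLFinder and does not reproduce its analysis: the regular expression, the logger names it recognizes, and the function name `find_duplicate_logs` are all assumptions made for this example, and a real tool would work on parsed code and not on raw lines.

```python
import re
from collections import defaultdict

# Hypothetical pattern: matches calls such as log.error("text" + var) and
# captures the first string literal as the static message.
LOG_CALL = re.compile(
    r'\b(?:log|logger)\.(?:trace|debug|info|warn|error)\(\s*"((?:[^"\\]|\\.)*)"',
    re.IGNORECASE,
)


def find_duplicate_logs(source: str) -> dict[str, list[int]]:
    """Map each static log message seen more than once to its line numbers."""
    by_message = defaultdict(list)
    for line_no, line in enumerate(source.splitlines(), start=1):
        match = LOG_CALL.search(line)
        if match:
            by_message[match.group(1)].append(line_no)
    # Keep only messages that appear in at least two logging statements.
    return {msg: lines for msg, lines in by_message.items() if len(lines) > 1}


SAMPLE = '''
try { open(); } catch (IOException e) { log.error("Failed to open file"); }
log.info("Started");
try { read(); } catch (IOException e) { log.error("Failed to open file"); }
'''

duplicates = find_duplicate_logs(SAMPLE)
```

Here the two `catch` blocks log the same static text even though they guard different operations, which is the kind of candidate a reviewer would then inspect by hand.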
An enormous amount of knowledge in software engineering is accumulated on Stack Overflow. However, as time passes, knowledge embedded in answers may become obsolete. Such obsolete answers, if not identified or documented clearly, may mislead answer seekers and cause unexpected problems (e.g., using an outdated security protocol). In this paper, we study the characteristics of obsolete answers. We find that: 1) 58.4% of the obsolete answers were already obsolete when they were first posted. 2) Only 23.5% of such answers are ever updated. 3) Answers in web and mobile development tags are more likely to become obsolete. 4) 79.5% of obsolete observations are supported by evidence (e.g., version information and obsolete time). We suggest that 1) Stack Overflow should encourage the whole community to maintain obsolete answers. 2) Answerers are suggested to include the information of valid versions/time when posting answers. 3) Answer seekers are suggested to go through comments in case of answer obsolescence.
Because of the voluntary nature of open source, sometimes it is hard to find a developer to work on a particular issue. However, these issues may be of high priority to others. To motivate developers to address these particular issues, people can offer monetary rewards (i.e., bounties) for addressing an issue report. To better understand how bounties can be leveraged to evolve an open source project, we investigated 3,509 GitHub projects’ issues for which bounties ($406,425 in total) were offered on Bountysource. We collect 31 factors and build a logistic regression model to understand the relationship between the bounty and the issue-addressed likelihood. We find that (1) providing a bounty for an issue earlier on and adding a bounty label are related to an increased issue-addressing likelihood, and (2) the bounty value of an issue does not have a strong relationship with the likelihood of an issue being addressed.
Code comments play a fundamental role in Software Maintenance and Evolution. As such, they need to be kept up-to-date. A decade ago, Malik et al. introduced a classification model to flag whether the comments of a function need to be updated when such a function is changed. The authors claimed that their model had an overall accuracy of 80%. We discovered and addressed eight drawbacks in the design and evaluation of their model. In particular, we noticed that the out-of-bag performance evaluation yielded unrealistic results in all cases considered. In addition, we observed that the feature ranking tends to be biased towards the features that are important for the most-frequently occurring type of comment change (i.e., either inner or outer comments). Finally, we introduce and evaluate a simpler model and conclude that its performance is statistically similar to that of the full model and that it is more easily interpretable.
Performance issues may compromise user experiences, increase resource costs, and cause field failures. One of the most prevalent performance issues is performance regression. Prior research proposes various automated approaches that detect performance regressions. However, the performance regression detection is conducted after the system is built and deployed. Hence, large amounts of resources are still required to locate and fix performance regressions. In our paper, we propose an approach that automatically predicts whether a test would manifest performance regression in a code commit. We conduct case studies on three open-source systems. Our results show that our approach can predict performance-regression-prone tests with high AUC values. In addition, we find that traditional size metrics are still the most important factors. On the other hand, performance-related metrics that are associated with Loop and Adding Expensive Variable are also risky for introducing performance regressions. Our approach and the study results can be leveraged by practitioners to effectively cope with performance regressions in a timely and proactive manner.
Logging is a common practice in software development, and logs contain rich information. However, little is known about mobile apps’ logging practices. Therefore, we conduct a case study on 1,444 open source Android apps in the F-Droid repository. We find that although logging is less pervasive in mobile apps than in large software systems, logging is leveraged in almost all studied apps. We compare the log level of each logging statement and developers’ rationale for using the logs. All too often (over 30%), developers choose an inappropriate log level. Such an inappropriate log level may prevent useful run-time information from being recorded or may generate unnecessary logs, causing performance overhead and security issues. Finally, we conduct a performance evaluation by disabling logging messages in four open-source Android apps. We observe a significant performance overhead on response time, CPU and I/O. Our results imply the need for systematic guidance to assist in mobile logging practices.
In collaborative software development platforms (such as GitHub and GitLab), the role of reviewers is key to maintaining an effective review process for pull requests. However, the number of decisions that reviewers can make is far exceeded by the increasing number of pull request submissions. To help reviewers make more decisions, we propose a learning-to-rank (LtR) approach to recommend pull requests that can be quickly reviewed by reviewers. Our ranking approach complements the existing list of pull requests based on their likelihood of being quickly merged or rejected. We conduct empirical studies on 74 Java projects. We observe that: (1) The random forest LtR algorithm performs better than both the FIFO and the small-first baselines obtained from existing pull request prioritization criteria, which means our LtR approach can help reviewers make more decisions and improve their productivity. (2) The contributor’s social connections are the most influential metrics to rank pull requests that can be quickly merged.
Software developers insert logging statements in their source code to record important runtime information. However, providing proper logging statements remains a challenging task. In this work, we first studied why developers make log changes in their source code. We then proposed an automated approach to provide developers with log change suggestions as soon as they commit a code change. Our automated approach can effectively suggest whether a log change is needed for a code change with an AUC of 0.84 to 0.91. We also studied how developers assign log levels to their logging statements and proposed an automated approach to help developers determine the most appropriate log level when they add a new logging statement. Our automated approach can accurately suggest the levels of logging statements with an AUC of 0.75 to 0.81.
In most software ecosystems, developers use versioning statements to indicate which versions of a provider package are acceptable for fulfilling a dependency. There is an ongoing debate about the benefits and challenges of using versioning statements. On the one hand, flexible versioning statements automatically upgrade a provider’s version, helping to keep providers up-to-date. On the other hand, flexible versioning statements can introduce unexpected breaking changes. We study three different strategies used by developers to define versioning statements, ranging from accepting a large/flexible range of provider versions to a conservative strategy. Using a flexible strategy, one can expect to have more provider upgrades than with other strategies while having to modify fewer versioning statements. Flexible packages with more than 100 providers should be aware of the possibility of larger inter-release times. Finally, the majority of the strategy shifts are from flexible to mixed and vice-versa.
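The difference between a flexible and a conservative versioning statement can be sketched in a few lines of Python. The abstract does not name an ecosystem or a notation, so this example assumes npm-style caret ranges (`^1.2.0` accepts any version from 1.2.0 up to, but excluding, 2.0.0) for the flexible case and an exact pin for the conservative case. The helper names `parse`, `satisfies`, and `resolve` are hypothetical, and the sketch deliberately ignores pre-release tags and `0.x` caret rules.

```python
def parse(version: str) -> tuple[int, int, int]:
    """Split a plain MAJOR.MINOR.PATCH string into integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def satisfies(version: str, statement: str) -> bool:
    """Check a provider version against a versioning statement."""
    if statement.startswith("^"):
        base = parse(statement[1:])
        if base[0] == 0:
            # Assumption: 0.x caret ranges follow different rules; not covered.
            raise ValueError("sketch only handles major versions >= 1")
        candidate = parse(version)
        # Flexible: same major version, and not older than the base version.
        return candidate >= base and candidate[0] == base[0]
    # Conservative: only the exact pinned version is acceptable.
    return parse(version) == parse(statement)


def resolve(available: list[str], statement: str):
    """Return the newest available version allowed by the statement, if any."""
    matching = [v for v in available if satisfies(v, statement)]
    return max(matching, key=parse) if matching else None


available = ["1.2.0", "1.3.1", "2.0.0"]
flexible_choice = resolve(available, "^1.2.0")
pinned_choice = resolve(available, "1.2.0")
```

With the flexible statement the dependency moves to 1.3.1 without any edit to the statement, while the pinned statement stays on 1.2.0 until a developer changes it. That is the trade-off between automatic upgrades and exposure to unexpected changes described above.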
It is common practice to discretize continuous defect counts into defective and non-defective classes and use them as a target variable when building defect classifiers (discretized classifiers). However, this discretization of continuous defect counts leads to information loss that might affect the performance and interpretation of defect classifiers. Another possible approach to build defect classifiers is through the use of regression models then discretizing the predicted defect counts into defective and non-defective classes (regression-based classifiers). In this paper, we compare the performance and interpretation of defect classifiers that are built using both approaches (i.e., discretized classifiers and regression-based classifiers) across six commonly used machine learning classifiers and 17 datasets. We find that: i) Random forest based classifiers outperform other classifiers (best AUC) for both classifier building approaches; ii) In contrast to common practice, building a defect classifier using discretized defect counts does not always lead to better performance.
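The two ways of building a defect classifier can be contrasted with a small, dependency-free Python sketch. Everything in it is illustrative and none of it comes from the study: a single feature (lines of code) stands in for the real metrics, ordinary least squares on one variable stands in for the regression models, and the `cutoff` of 0.5 on predicted counts is a hypothetical choice.

```python
def discretized_targets(defect_counts):
    """Discretized approach: turn counts into defective / non-defective labels
    before any model is trained (the count information is lost here)."""
    return [count > 0 for count in defect_counts]


def fit_line(xs, ys):
    """Ordinary least squares for one feature; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def regression_based_labels(xs, defect_counts, new_xs, cutoff=0.5):
    """Regression-based approach: predict continuous defect counts first,
    then discretize the predictions with a (hypothetical) cutoff."""
    slope, intercept = fit_line(xs, defect_counts)
    return [slope * x + intercept >= cutoff for x in new_xs]


# Toy data: lines of code per file and the number of defects found in each.
loc = [100, 200, 300, 400]
defects = [0, 1, 2, 3]

labels_for_training = discretized_targets(defects)
predicted_labels = regression_based_labels(loc, defects, [120, 350])
```

In the discretized approach a file with three defects and a file with one defect become the same training label. The regression-based approach keeps the counts during training and only thresholds at prediction time.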
“Early access” is a model that allows players to purchase an unfinished version of the game. In turn, players can provide developers with early feedback. Recently, the benefits of the early access model have been questioned by the community. We conducted an empirical study on 1,182 early access games on the Steam platform to understand the characteristics, advantages and limitations of the early access model. We observe that developers update their games more frequently in the early access stage. On the other hand, the reviewing activity during the early access stage is lower than that after the early access stage. However, the percentage of positive reviews is much higher during the early access stage, suggesting that players are more tolerant of imperfections in the early access stage. Hence, we suggest that developers use the early access model for eliciting early feedback and more positive reviews to attract future customers.
When APIs evolve, consumers are left with the difficult task of migration. Studies on API migration often assume that software documentation lacks explicit information for migration guidance and is impractical for API consumers. Past research has shown that it is possible to present migration suggestions based on historical code-change information. Yet, the assumptions made by prior approaches have not been evaluated on large-scale practical systems. We report our recent practical experience migrating the use of Android APIs in F-Droid apps when leveraging approaches based on documentation and historical code changes. Our experiences suggest that migration through historical code changes presents various challenges and that API documentation is undervalued. More importantly, during our practice, we found that the challenges of API migration lie beyond migration suggestions, in aspects such as coping with parameter type changes in the new API. Future research should aim to design automated approaches to address these challenges.