2021-2022

Project List

Supervisor Project Title
Amber Simpson AI-based survival analysis of intrahepatic cholangiocarcinoma
Predicting endovascular thrombectomy with CT imaging
Anna Panchenko A platform for the analysis of DNA-transcription factor binding sites
Bram Adams Bazel build process tracking and visualisation tool
Benchmark & dashboard for language server implementations
Burton Ma Implementing an extensible version of Sokoban
A grammar, parser, and compiler for simple unit tests
Catherine Stinson Testing a benchmark task for enormous language models
Mapping police interactions app
Christian Muise Modelling Human Behaviour
Powerful Puzzling
Any Other Thoughts?
CISC 352 Assignment Design
CISC 352 Auto-composition of Digital Assets
PDDL Editor for Education
Automatic Map Ingestion for ABM
Interactive Agent-based Simulation
Sports Analytics Dashboard
Natural Language → Planning
Sports Analytics
David Skillicorn Detecting scam web sites using deep learning
Predicting outcome in childhood leukemia (2 projects)
Finding the significant participants in a hacker forum
Predicting surge in ER visits
Detecting themes/depression in incel forum posts
Farhana Zulkernine 3D Object Detection for Autonomous Driving
Deeper learning of Deep Learning by experimenting on activations, learning rates and regularizations
Anomaly Detection and Correlation Analysis of Sensor Data for Autonomous Metro Rail Operation
Deep Layerwise Canonical Correlation Analysis
Using sense-embedding for improving neural language models
Hybrid EMR Data Analytics on Diabetes: Diagnosis, Prevalence and Primary Care Quality
Remote Photoplethysmography for Glycated Hemoglobin (HbA1c) Level Estimation
Using advanced analytics to study the prevalence of osteoarthritis pain in Canadian primary care
Human Activity Recognition
An information retrieval-based question answering system by using document ranking
A Comparative Study on Speech Recognition Tools
Healthcare question answering knowledge graph for patients with chronic conditions
Francois Rivest (and Farhana Zulkernine) Using Unity to Develop Activity Recognition Training Set for Deep Neural Networks.
Furkan Alaca Implementing a Security Tool Proposed in Security Research
Security Extension/Plugin for an Open-Source Project
Furkan Alaca and Juergen Dingel Formal Analysis of Security Protocols
Jana Dunfield “lint” for TeX proofs
From Haskell types to Prolog predicates
Juergen Dingel Testing 2.0
Domain-specific Language Support in VS Code
Kai Salomaa Computational complexity of decision problems for regular languages
Template-guided recombination: finite automaton construction
Kathrin Tyryshkin Developing a broadly enabling computational pipeline for accurate microRNA curation
Genomic data analysis and visualization GUI
Nicholas Graham Toward Live Theatre Hosted in Virtual Reality
Parvin Mousavi Margin assessment for cancer surgeries using computer-assisted diagnosis and interventions
Sara Nabil Digital Fabrication using E-textiles
Interactive Smart Spaces for Living with COVID-19
Selim Akl Homomorphic Encryption Implementation
Sidney Givigi A simulation environment for robot arms
Steven Ding [OD1NF1ST] Intrusion Detection for MIL-1553-based F-15/F-35 Avionic Platform
[N-Lights] AI-powered Malware Detection through Constant-Memory Implicit Recurrent Neural Network
[JARV1S] AI-powered Clone Search Engine for Cyber Threat Intelligence
[CH4OS] A Malware Generation and Detection Game
[GCB – Team Blue] A virtualized cybersecurity OpenAI gym for training autonomous attack/defence agents
[GCB – Team Red] A virtualized cybersecurity OpenAI gym for training autonomous attack/defence agents
[VulANalyzeR] Practical Vulnerability Discovery with Artificial Intelligence
Ting Hu Evolutionary music
Disease subtype detection using layer-wise relevance propagation
Networked genetic algorithm
Yuan Tian An Intelligent Course Comparison Tool for MOOC
Analyzing and Tracking Technical Debts Generated in Code Review
A Powerful Data Science-related Code Example Search Tool
Yuanzhu Chen Research collaboration network analysis
Flight viz
Air Travel Studies

Details

  • Margin assessment for cancer surgeries using computer-assisted diagnosis and interventions
    Supervisor: Parvin Mousavi
    Description:
    The clinical management of cancer mostly involves surgical resection of the tumor, and the goal is to remove all affected cells to avoid recurrence of the cancer. To determine whether this goal has been achieved, the resected specimens are examined by a histopathologist post-operatively. The presence of cancer cells at the margins indicates incomplete tumor resection. Novel mass spectrometry-based technologies can provide enriched feedback to surgeons about the molecular signature of the resected specimen to avoid positive margins. The goal of this project is to develop deep learning models capable of characterizing cancer signatures in recorded mass spectra. For related publications visit the Medical Informatics Laboratory.
  • Homomorphic Encryption Implementation
    Supervisor: Selim Akl
    Description:
    The emergence and widespread use of cloud services in recent years represented one of the most important evolutions in information technology. By offering abundant, conveniently accessible, and relatively inexpensive storage for data, the cloud is certainly a very attractive option for businesses and citizens alike. However, this ease of access to often personal and sometimes sensitive information is clearly coupled with serious concerns over the privacy and security of the data. The most effective approach to mitigating threats to data safety by unscrupulous individuals is cryptography. Unfortunately, encrypting the data involves an inevitable trade-off: convenience is diminished. Users wishing to process their data must download the encrypted information from the cloud, decrypt it, process the plaintext version, re-encrypt the result, and re-upload the ciphertext to the cloud. A special kind of cryptography exists, however, called homomorphic cryptography, that allows users to operate remotely on their encrypted data, directly on the untrusted database.

    This project aims to implement the use of homomorphic encryption for data represented as graphs. A successful project will provide a working homomorphic encryption system that: (i) receives inputs, (ii) encrypts the inputs, (iii) applies operations on the encrypted inputs, and (iv) decrypts the results.
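
    As a rough illustration of the homomorphic property, here is a minimal sketch of the Paillier cryptosystem, a classic additively homomorphic scheme: multiplying two ciphertexts yields an encryption of the sum of the plaintexts. The tiny primes and parameter choices are for demonstration only and are not secure; a real project would likely build on a vetted library.

      # Toy Paillier cryptosystem (additively homomorphic). Demo-sized primes,
      # no padding, not secure -- illustration only. Needs Python 3.8+ for pow(x, -1, n).
      import math, random

      def keygen(p=293, q=433):
          n = p * q
          lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)   # lcm(p-1, q-1)
          g = n + 1                                           # standard simple choice of g
          mu = pow((pow(g, lam, n * n) - 1) // n, -1, n)      # L(g^lam mod n^2)^-1 mod n
          return (n, g), (lam, mu)

      def encrypt(pub, m):
          n, g = pub
          r = random.randrange(1, n)                          # should be coprime with n
          return (pow(g, m, n * n) * pow(r, n, n * n)) % (n * n)

      def decrypt(pub, priv, c):
          n, _ = pub
          lam, mu = priv
          return ((pow(c, lam, n * n) - 1) // n) * mu % n

      pub, priv = keygen()
      c1, c2 = encrypt(pub, 17), encrypt(pub, 25)
      # Multiplying ciphertexts adds the underlying plaintexts: 17 + 25 = 42.
      assert decrypt(pub, priv, (c1 * c2) % (pub[0] ** 2)) == 42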
  • Natural Language → Planning
    Supervisor: Christian Muise
    Description:
    The idea for this project is to extract planning models (descriptions of actions, their preconditions+effects, etc) from natural language instructions. Examples include recipes and WikiHow instructions, and other language-based datasets.
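
    As a rough, hypothetical illustration of the target output (the names and predicates below are invented, not the project's actual representation), a recipe step such as "crack the egg into the bowl" might be lifted into a STRIPS-style action schema:

      # Hypothetical action model extracted from "crack the egg into the bowl".
      from dataclasses import dataclass, field

      @dataclass
      class Action:
          """STRIPS-style schema: name, parameters, preconditions, effects."""
          name: str
          parameters: list
          preconditions: list = field(default_factory=list)
          add_effects: list = field(default_factory=list)
          del_effects: list = field(default_factory=list)

      crack_egg = Action(
          name="crack",
          parameters=["?egg", "?bowl"],
          preconditions=[("holding", "?egg"), ("empty", "?bowl")],
          add_effects=[("in", "?egg", "?bowl"), ("cracked", "?egg")],
          del_effects=[("holding", "?egg"), ("empty", "?bowl")],
      )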


    More details on MuLab projects.
  • Sports Analytics Dashboard
    Supervisor: Christian Muise
    Description:
    The aim of this project is to create a framework and pipeline to enable research into sports analytics at Queen’s University. Co-advised by Prof. Catherine Pfaff and Prof. Muise, students will work with other researchers focusing on sports analytics, as well as contacts within sports organizations (both Queen’s varsity and more broadly). (image source)


    More details on MuLab projects.
  • Interactive Agent-based Simulation
    Supervisor: Christian Muise
    Description:
    With large-scale agent-based models (ABMs) being used to simulate the spread of COVID, having a visual means to interact with the simulation can play a crucial role in understanding the unfolding of events. This project aims to provide such an interface (text-based) for a pre-existing ABM system. (image source)


    More details on MuLab projects.
  • Automatic Map Ingestion for ABM
    Supervisor: Christian Muise
    Description:
    Agent-based modelling typically requires many settings to be configured and tweaked in order to obtain more accurate results. This project aims to streamline the process of creating a simulation for the spread of COVID in a particular jurisdiction by ingesting geospatial map data for a town or city and converting that to the proper format for a pre-existing simulator.
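
    A purely hypothetical sketch of the ingestion step (GeoJSON is a standard input format, but the "locations.csv" output schema is invented for illustration; the real simulator's configuration format would drive the actual conversion):

      # Read building footprints from a GeoJSON file and emit a simple location table.
      import csv, json

      def ingest(geojson_path, out_path):
          with open(geojson_path) as f:
              features = json.load(f)["features"]
          with open(out_path, "w", newline="") as out:
              writer = csv.writer(out)
              writer.writerow(["id", "kind", "lon", "lat"])
              for i, feat in enumerate(features):       # assumes Polygon geometries
                  kind = feat["properties"].get("building", "unknown")
                  # Use the first vertex of the footprint as a crude centroid stand-in.
                  lon, lat = feat["geometry"]["coordinates"][0][0]
                  writer.writerow([i, kind, lon, lat])

      # ingest("kingston_buildings.geojson", "locations.csv")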


    More details on MuLab projects.
  • PDDL Editor for Education
    Supervisor: Christian Muise
    Description:
    The aim of this project is to improve the online editor for planning specifications for the purposes of education. This may include anything from new visualization techniques to novel plugins for interacting with or debugging planning problems. (image source)


    More details on MuLab projects.
  • CISC 352 Auto-composition of Digital Assets
    Supervisor: Christian Muise
    Description:
    This project aims to extend ongoing efforts to build a framework of automated pedagogy for AI educational resources. In particular, the aim is to convert a set of given specifications into a visual representation for further modification. Core components of the visual assets will be produced in the lead-up to this project, and the results will be used in future incarnations of CISC 352.


    More details on MuLab projects.
  • CISC 352 Assignment Design
    Supervisor: Christian Muise
    Description:
    As part of the re-design of CISC 352, this project aims to re-imagine what assignments are used within the course. This not only involves putting together a compelling assignment to assess the material, but further involves the application of AI techniques to (1) generate the unique problems for each student; (2) automatically mark the student submissions; and (3) automatically generate meaningful feedback when submissions are incorrect.


    More details on MuLab projects.
  • Modelling Human Behaviour
    Supervisor: Christian Muise
    Description:
    This project aims to capture interpretable insight from human users of a system using modern AI techniques. The learned representation will capture the core elements of observed human behaviour in a form detailing how and when the user transitions from one mental state to another. The source of behavioural information will be data retrieved from biomedical devices such as heart rate or skin conductance sensors. The elements of the learned representation, and the mechanics it captures, will all be learned entirely in a data-driven fashion. The research will be conducted on a driving simulation testbed that will allow for mixed human-machine control of the (virtual) vehicle.

    This is part of a larger research project, and the scope of the project will be limited to one aspect of the larger system.


    More details on MuLab projects.
  • Sports Analytics
    Supervisor: Christian Muise
    Description:
    This open-ended project focuses on advanced sports analytics. This includes vision-based applications to locate and identify players, geometric analysis of games to determine team influence on player behaviour, etc. Interested students are encouraged to propose ideas they would like to consider. (image source)
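
    As one self-contained example of the kind of geometric analysis mentioned above (the player positions below are made up), pitch control can be approximated Voronoi-style by assigning each point on the pitch to the team whose nearest player is closest:

      # Fraction of the pitch "controlled" by team A under a nearest-player model.
      import numpy as np

      def control_grid(team_a, team_b, pitch=(105, 68), res=1.0):
          xs, ys = np.arange(0, pitch[0], res), np.arange(0, pitch[1], res)
          gx, gy = np.meshgrid(xs, ys)
          cells = np.stack([gx.ravel(), gy.ravel()], axis=1)
          da = np.linalg.norm(cells[:, None, :] - team_a[None, :, :], axis=2).min(axis=1)
          db = np.linalg.norm(cells[:, None, :] - team_b[None, :, :], axis=2).min(axis=1)
          return (da < db).reshape(gy.shape)      # True where team A's nearest player wins

      team_a = np.array([[30.0, 34.0], [50.0, 20.0], [60.0, 50.0]])
      team_b = np.array([[55.0, 34.0], [70.0, 30.0], [80.0, 40.0]])
      share_a = control_grid(team_a, team_b).mean()   # team A's share of the pitch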


    More details on MuLab projects.
  • Powerful Puzzling
    Supervisor: Christian Muise
    Description:
    The aim of this project is to highlight, on an image of puzzle pieces, a pair of piece edges that likely go together. If done in real time, this would allow an interactive puzzle-solving experience where the system suggests the next move and the human continually responds by attempting the suggestion. (image source)
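
    A hedged sketch of a possible first step with OpenCV (the input mask file name is hypothetical, and whole-contour shape matching is only a crude stand-in for proper edge-profile matching):

      # Extract piece outlines and rank pairs by a simple shape-similarity score.
      import cv2

      mask = cv2.imread("pieces_mask.png", cv2.IMREAD_GRAYSCALE)     # binary piece mask
      _, binary = cv2.threshold(mask, 127, 255, cv2.THRESH_BINARY)
      contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)

      scores = []
      for i in range(len(contours)):
          for j in range(i + 1, len(contours)):
              # Lower matchShapes score = more similar outlines (candidate fit).
              d = cv2.matchShapes(contours[i], contours[j], cv2.CONTOURS_MATCH_I1, 0.0)
              scores.append((d, i, j))
      best_pairs = sorted(scores)[:5]    # candidate pairs to highlight for the user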


    More details on MuLab projects.
  • Any Other Thoughts?
    Supervisor: Christian Muise
    Description:
    At the intersection between the two areas of Machine Learning and Interaction Design, this project seeks to develop a tangible device that supports a healthy in-person workplace environment. A common obstacle to having productive group meetings is having some members speak too much or too little. This project aims to reflect this potential imbalance in contribution levels in real-time, through the integration of live LED feedback and machine learning-driven analysis of speaker localization and conversation duration. The team will put together an Arduino-controlled physical device that sits on a boardroom table, passively listens to the meeting (without recording the audio), and provides real-time feedback visually. This project is joint between the MuLab and the iStudio Lab. (Image: Group of individuals engaging in a team meeting by Jacob Lund Photography from NounProject.com)


    More details on MuLab projects.
  • Toward Live Theatre Hosted in Virtual Reality
    Supervisor: Nicholas Graham
    Description:
    This project will be co-supervised by Prof. Michael Wheeler of the Dan School of Drama and Music here at Queen’s.

    Our vision is that plays can be performed through virtual reality. The actors’ positions and movements are tracked in real-time, and used to manipulate avatars that appear in a virtual world. Audience members use a virtual reality headset (like the Oculus Quest) to experience the play.

    While the basics of this approach are currently being developed in Film and Media, there is much experimentation to be done to find out how to make VR theatre a compelling and enjoyable experience. A traditional approach might involve audience members positioned around a stage where the play takes place. A more ambitious approach could place the theatre-goer right in the action. It is unclear to what extent the audience should be allowed to freely roam the set, versus being positioned in real-time by a director.

    In this project, students will work with a professional programmer to develop interesting forms of audience interaction in a VR play, including exploration of the audience’s viewpoint into the play, their ability to move around, and the role of a director to manage the audience in real-time.

    Ideally, students will have taken CISC 226 and be confident with Unity development.

    This project can be performed by a team of up to three students.
  • A platform for the analysis of DNA-transcription factor binding sites
    Supervisor: Anna Panchenko
    Description:
    This project is to design a web server for analyzing the binding sites between double-stranded DNA and transcription factors. A pipeline based on the command line (Linux) has been proposed by the Computational Biology and Biophysics Lab at Queen’s University. The goal is to optimize the pipeline and make it usable through a graphical user interface. The web interface should allow users to search and analyze DNA-transcription factor binding sites from a user-provided PDB ID (the structure identifier used in the Protein Data Bank) or a PDB-formatted file. It is required that the results can be directly visualized or downloaded. This project will help researchers understand the molecular details of how transcription factors recognize DNA and provide important clues for therapeutic targeting of oncogenic transcription factors.
  • AI-based survival analysis of intrahepatic cholangiocarcinoma
    Supervisor: Amber Simpson
    Description:
    Intrahepatic cholangiocarcinoma is a cancer of the bile ducts that is difficult to diagnose and has a poor prognosis and minimal treatment options. Prognostic information developed from survival analysis could provide important insight for treatment development and general understanding of this disease. This project aims to compare the predictive performance of a deep learning model for recurrence-free survival when it is trained with different levels of segmentation of a CT image: organ-level and tumor-level. The question students would be exploring is whether predictive information is lost from the surrounding organ tissue when segmenting only the tumor, and whether that information is useful for predicting recurrence-free survival.
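
    A possible evaluation sketch for comparing the two training regimes is the concordance index, a standard metric for survival predictions (computed here with the lifelines package; all numbers below are placeholders):

      import numpy as np
      from lifelines.utils import concordance_index

      months_to_recurrence = np.array([14.0, 30.0, 7.0, 22.0, 41.0])   # observed follow-up
      recurred = np.array([1, 0, 1, 1, 0])                             # 1 = event observed
      risk_tumor = np.array([0.8, 0.2, 0.9, 0.5, 0.1])   # model trained on tumor-only masks
      risk_organ = np.array([0.7, 0.3, 0.8, 0.6, 0.2])   # model trained on organ-level masks

      # concordance_index expects predictions where higher = longer survival,
      # so negate the risk scores before comparing the two models.
      print(concordance_index(months_to_recurrence, -risk_tumor, recurred))
      print(concordance_index(months_to_recurrence, -risk_organ, recurred))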
  • Predicting endovascular thrombectomy with CT imaging
    Supervisor: Amber Simpson
    Description:
    Can machine learning algorithms accurately predict poor or incomplete recanalization after endovascular thrombectomy using imaging features on CT Angiography and CT Perfusion in patients with late-window acute ischemic stroke? Endovascular thrombectomy (EVT) is a medical procedure that has significantly improved patients’ outcomes following a stroke. However, nearly 30% of patients experience negative outcomes following EVT, ranging from dependence on nursing care to death. The use of CT Angiography (CTA) and CT Perfusion (CTP) has been shown to greatly improve the selection of patients who would benefit the most from EVT. In some instances, despite the ideal imaging criteria provided by CTA and CTP, the results of the EVT are poor, failing to achieve a satisfactory outcome (i.e., blood cannot re-circulate in the area affected by the stroke). The use of machine learning (ML) may provide additional value regarding the imaging features on CTA and CTP that are predictors of the technical success of EVT. Through identification of radiomic features from CTA and CTP imaging, in addition to the prognostic and clinical variables available for stroke patients, ML can provide significant value in poor outcome prediction.
    The aim of this study is to identify, using ML algorithms, possible imaging features as predictors of poor or incomplete recanalization after EVT in patients who otherwise fulfill the imaging selection criteria up to 24 hours after symptom onset.
  • Security Extension/Plugin for an Open-Source Project
    Supervisor: Furkan Alaca
    Description:
    Propose, as a group of two, an implementation-based project for a security-related extension or plugin to an open source project (I also plan to post a more research-oriented project description – if you are interested more in security research than in implementation, please feel free to contact me and share your interests). The following are two possible ideas as starting points:

    1. John the Ripper (JtR) is a popular password-cracking utility that has plugins supporting a variety of file formats. For example, see the following two files which contain the source for (i) the zip2john utility that processes an input ZIP file to extract the necessary information to aid the password cracking process, and (ii) the format plugin that reads the output from zip2john and initiates the password cracking process:
    https://github.com/openwall/john/blob/bleeding-jumbo/src/zip2john.c
    https://github.com/openwall/john/blob/bleeding-jumbo/src/zip_fmt_plug.c

    Your project may propose a file format that is not yet supported by JtR, and write a plugin for it. The project will involve learning how JtR works, how to write a plugin, and the fundamentals of how password cracking works, including, for example, what information from the file is required to verify that the correct password has been guessed.

    It is expected that the quality of the project should be high enough to submit a pull request for inclusion in the official JtR project.

    2. OpenWRT is a popular open-source Linux distribution for routers. I encourage you to try OpenWRT, take a look at the kinds of packages that are included in the official repository, and propose a security-related package of your own. One basic idea that you may build on is a plugin, configurable through the web interface, that allows the creation of firewall rules that are dynamically created/enforced based on certain conditions. For example, blocking outbound network traffic from IP cameras (to protect your privacy) when your smartphone is present on the WiFi network (which would presumably indicate that you are at home).

    It is also expected that the quality of the project should be production-ready and not just prototype quality (i.e., you should submit a package that anybody can install on their router and use with little trouble).
  • Digital Fabrication using E-textiles
    Supervisor: Sara Nabil
    Description:
    Imagine future computers that are woven into your shirt sleeve, stitched to your pillow, knitted into your scarf or embroidered on your favourite jacket. Apart from software, coding and digital data, this project uniquely focuses on physical fabrication and making of novel interactive tangible objects. Hardware prototyping does not always mean using electronic sensors, motors and LEDs. Instead, we can innovate new materials that have computational properties, such as textiles, wood, paper, stone, etc. In this project, we will explore an alternative approach to creating interactive everyday things, which is to incorporate smart materials directly into the making stages of everyday materials, e.g. sewn fabrics. Smart materials that have morphological (shape and colour-changing) capabilities, such as thermochromic inks and shape-memory wires, can be literally stitched, knitted and woven into different materials. This research project aims to explore the different emergent materialities that can be digitally designed, fabricated and crafted from such smart materials. To achieve this, we will adopt a ‘Research through Design’ (RtD) approach, which we refer to as ‘co-designing with our materials’. We draw on the insights of RtD to frame the production of annotated portfolios as a rigorous theory and a developing form to underpin our presentation of a series of hands-on laboratory experiments. We will use specialized fabrication equipment and digital computational design in our design explorations in a creative practice to offer novel insight into the interactive potentials of the techniques we will exploit. For example, laser-cutting, 3D-printing and digital embroidery are some of the fabrication methods we can utilize.
    In this project, we bring the material science innovation of actuating wires to a new context and appropriated practices, as threads. This bridging between technology and crafting enables ‘smart’ materials to have new encounters with other materials (such as fabrics and textiles), other tools (such as needles and bobbins) and other machines (such as sewing machines or embroidery machines). This approach broadens the accessibility of technology prototyping and has the potential to enable new, previously unrealizable possibilities. For example, we can innovate shape-changing wearables and colour-changing garments as a means of assistive technology to support marginalized groups including people with disabilities. See examples of similar design projects at: https://istudio.cs.queensu.ca/publications/
  • Interactive Smart Spaces for Living with COVID-19
    Supervisor: Sara Nabil
    Description:
    Make your room change its size and physical appearance (literally, not virtually) or have your desk deform its shape when others poke theirs during the next lockdown. Just touch wood to transform it back!

    Smart Spaces, including Smart Homes, are not only those that collect massive data and can control lights, curtains, temperature and humidity. Smart spaces can have interactive capabilities that respond to people as actuating physical interfaces characterized by being aesthetically pleasing, intuitively manipulated and ubiquitously embedded in our daily life. In this project, we will be designing and building interactive walls, floors, ceilings, furniture or entire buildings that have the potential to (finally) transform the vision of smart homes and ubiquitous computing environments into reality. We can propose interactive spaces for both exterior and interior design, arguing that interaction design should be at the core of a new interdisciplinary field driving research and practice in architecture. The design concept will focus on supporting people during the pandemic to tolerate the wellbeing challenges of self-isolation and lockdown. Applications include (but are not limited to) designing for working from home, studying from home, sleeping at home, or eating alone. Based on this agenda, we will be innovating future technology of smart spaces and utilizing interactive smart materials (e.g. conductive paints and fabrics, shape-changing and colour-changing materials). We will also be addressing the challenges and opportunities of this novel design space. This agenda offers us new means through which to deliver a future of smart cities.
  • [GCB – Team Red] A virtualized cybersecurity OpenAI gym for training autonomous attack/defence agents
    Supervisor: Steven Ding
    Description:
    We are looking for two teams to work on building a virtualized network into an OpenAI Gym interface for cyber attack/defense scenarios. Both teams will share responsibility for building the virtual network. The network will be fully defined in software to enable quickly resetting the environment at the end of an episode and should also feature user simulation agents to provide background noise to hide adversarial effects.

    Once the infrastructure is built, responsibilities will be divided into red and blue teams. The red team would be responsible for building an interface between an OpenAI Gym environment and an attacker C2 platform such as Metasploit or Covenant. This interface will cover actions such as lateral movement into another machine, exfiltrating data, or running ransomware payloads. The blue team would be responsible for connecting a defensive agent to the environment. This agent should have the ability to get current host/network alerts, take investigative actions with an agent such as Velociraptor or osquery, and take remediation actions such as killing processes or re-imaging machines.
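
    For a feel of the target interface, here is a minimal, hypothetical sketch of a red-team Gym environment (the spaces, state encoding and dynamics are placeholders; the real environment would call into the virtual network and a C2 platform):

      import gym
      import numpy as np
      from gym import spaces

      class RedTeamEnv(gym.Env):
          """Toy cyber-range environment: pick a host and an action each step."""
          ACTIONS = ["scan", "lateral_move", "exfiltrate", "deploy_ransomware"]

          def __init__(self, n_hosts=10):
              super().__init__()
              self.n_hosts = n_hosts
              self.action_space = spaces.MultiDiscrete([n_hosts, len(self.ACTIONS)])
              # One flag per host: 0 = unknown, 1 = discovered, 2 = compromised.
              self.observation_space = spaces.Box(0, 2, shape=(n_hosts,), dtype=np.int64)

          def reset(self):
              self.state = np.zeros(self.n_hosts, dtype=np.int64)
              self.state[0] = 2                    # initial foothold
              return self.state.copy()

          def step(self, action):
              host, act = action
              reward = 0.0
              # Placeholder dynamics; the real step would drive the C2 platform.
              if self.ACTIONS[act] == "scan":
                  self.state[host] = max(self.state[host], 1)
              elif self.ACTIONS[act] == "lateral_move" and self.state[host] == 1:
                  self.state[host] = 2
                  reward = 1.0
              done = bool((self.state == 2).all())
              return self.state.copy(), reward, done, {}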

    (Image source: https://m.imdb.com/title/tt0113243/mediaviewer/rm234199552/)


    All the 499 projects at L1NNA lab will have the option to be jointly supervised by the Canadian Centre for Cyber Security (Cyber Centre) through a selection process. At the end of the project, the team will present their projects in front of a panel of experts from the Cyber Centre. Winning projects can be presented to GeekWeek VIII, a hands-on cyber security workshop organized by the Cyber Centre and gathering 200+ security professionals across Government departments and industry sectors including but not limited to Bell Canada, National Bank, THALES, National Defence, TD, and Telus.

    More Details
  • [GCB – Team Blue] A virtualized cybersecurity OpenAI gym for training autonomous attack/defence agents
    Supervisor: Steven Ding
    Description:
    We are looking for two teams to work on building a virtualized network into an OpenAI Gym interface for cyber attack/defense scenarios. Both teams will share responsibility for building the virtual network. The network will be fully defined in software to enable quickly resetting the environment at the end of an episode and should also feature user simulation agents to provide background noise to hide adversarial effects.

    Once the infrastructure is built, responsibilities will be divided into red and blue teams. The red team would be responsible for building an interface between an OpenAI Gym environment and an attacker C2 platform such as Metasploit or Covenant. This interface will cover actions such as lateral movement into another machine, exfiltrating data, or running ransomware payloads. The blue team would be responsible for connecting a defensive agent to the environment. This agent should have the ability to get current host/network alerts, take investigative actions with an agent such as Velociraptor or osquery, and take remediation actions such as killing processes or re-imaging machines.


    All the 499 projects at L1NNA lab will have the option to be jointly supervised by the Canadian Centre for Cyber Security (Cyber Centre) through a selection process. At the end of the project, the team will present their projects in front of a panel of experts from the Cyber Centre. Winning projects can be presented to GeekWeek VIII, a hands-on cyber security workshop organized by the Cyber Centre and gathering 200+ security professionals across Government departments and industry sectors including but not limited to Bell Canada, National Bank, THALES, National Defence, TD, and Telus.

    More Details
  • [CH4OS] A Malware Generation and Detection Game
    Supervisor: Steven Ding
    Description:
    Currently, there exists a two-player game environment for adversarial malware generation and detection. In our game, there is a modification agent, which performs functionality-agnostic modifications on malware samples, and a detection agent, which attempts to detect the obfuscated malware. Intuitively, this can be thought of as a very cool game of hide and seek. The hiding player is trying to modify malware to stay hidden, and the seeking player is trying to detect the malware.

    Right now the system only performs minor modifications on malware samples. Your task will be to design a system for choosing the optimal benign content based on the results of the previous game rounds. The system will be written in Python, and you do not need to have a background in cyber security!
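
    A purely hypothetical sketch of one game round (the agent interfaces and the pool of "benign content" are invented for illustration, not the existing system's API):

      import random

      BENIGN_SNIPPETS = [b"\x90" * 64, b"MZ-padding", b"resource-section-bytes"]

      def choose_benign_content(history):
          """Pick the snippet that evaded detection most often in earlier rounds."""
          if not history:
              return random.choice(BENIGN_SNIPPETS)
          wins = {s: 0 for s in BENIGN_SNIPPETS}
          for snippet, detected in history:
              if not detected:
                  wins[snippet] += 1
          return max(wins, key=wins.get)

      def play_round(malware_bytes, detector, history):
          snippet = choose_benign_content(history)
          modified = malware_bytes + snippet    # naive append; real modifications differ
          detected = detector(modified)         # detection agent's verdict (True/False)
          history.append((snippet, detected))
          return detected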


    All the 499 projects at L1NNA lab will have the option to be jointly supervised by the Canadian Centre for Cyber Security (Cyber Centre) through a selection process. At the end of the project, the team will present their projects in front of a panel of experts from the Cyber Centre. Winning projects can be presented to GeekWeek VIII, a hands-on cyber security workshop organized by the Cyber Centre and gathering 200+ security professionals across Government departments and industry sectors including but not limited to Bell Canada, National Bank, THALES, National Defence, TD, and Telus.

    More Details
  • Flight viz
    Supervisor: Yuanzhu Chen
    Description:
    Visualize airline flight data in North America and the world using statistics from the US Bureau of Transportation Statistics. Study empirically how travel patterns evolve over time.
  • Research collaboration network analysis
    Supervisor: Yuanzhu Chen
    Description:
    Choose from PubMed, arXiv, and DBLP. Build networks of collaborators and publication citations. Study them from structural and topical perspectives. Explore communities and paradigm shifts.
  • [VulANalyzeR] Practical Vulnerability Discovery with Artificial Intelligence
    Supervisor: Steven Ding
    Description:
    A vulnerability is a weakness or flaw of a software system that allows a hacker to gain unauthorized access or manipulate the system’s behaviour. Discovering existing and new vulnerabilities in software systems has been a major challenge in cybersecurity.

    This project involves analyzing binary files for vulnerability detection. In machine learning, training and testing data can be quite different, so thorough experiments are required to evaluate the robustness of a model. Since synthetic datasets are easy to obtain, it is easy to overlook the practicality of ML models in the real world. In this project, students need to first find an appropriate codebase (such as a GitHub repository or an existing vulnerability database) that is associated with vulnerabilities, and then compile it into object files. Finally, labels also need to be available for each individual file in order to evaluate the performance of a model.
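
    A hedged sketch of the data-preparation step (the paths, the labels file and the compiler flags below are assumptions for illustration only):

      # Compile each C source file into an object file and record its label.
      import csv, pathlib, subprocess

      SRC_DIR = pathlib.Path("codebase/src")
      OUT_DIR = pathlib.Path("dataset/objects")
      OUT_DIR.mkdir(parents=True, exist_ok=True)

      # Hypothetical ground truth: source file name -> 1 (vulnerable) / 0 (clean).
      labels = {row[0]: int(row[1]) for row in csv.reader(open("codebase/labels.csv"))}

      with open("dataset/labels.csv", "w", newline="") as f:
          writer = csv.writer(f)
          for src in sorted(SRC_DIR.glob("*.c")):
              obj = OUT_DIR / (src.stem + ".o")
              # Compile without linking so each source file yields one labelled object file.
              subprocess.run(["gcc", "-c", "-O1", str(src), "-o", str(obj)], check=True)
              writer.writerow([obj.name, labels.get(src.name, 0)])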


    All the 499 projects at L1NNA lab will have the option to be jointly supervised by the Canadian Centre for Cyber Security (Cyber Centre) through a selection process. At the end of the project, the team will present their projects in front of a panel of experts from the Cyber Centre. Winning projects can be presented to GeekWeek VIII, a hands-on cyber security workshop organized by the Cyber Centre and gathering 200+ security professionals across Government departments and industry sectors including but not limited to Bell Canada, National Bank, THALES, National Defence, TD, and Telus.

    More Details
  • [JARV1S] AI-powered Clone Search Engine for Cyber Threat Intelligence
    Supervisor: Steven Ding
    Description:
    re-Google was a large-scale search engine for assembly code clones created by Google. It was extremely popular among hackers and security researchers for discovering re-used code that may contain system flaws and vulnerabilities open to exploitation and intrusion. However, it was later taken down by Google from public access due to copyright and security considerations, since hackers can also use it for malicious purposes, uncovering hidden information in existing systems.

    The goal of this project is to re-create re-Google with better code search performance, enhanced by other searchable information such as strings or constants contained in the software. We have already built a platform for clone search, named JARV1S, going online this coming summer. Your goal is to help us add additional features, improve the testing pipeline, improve performance, and add additional machine learning components before summer 2022. You will learn and use popular frontend and backend solutions, including React, RESTful APIs, Socket.IO, Elasticsearch, tensorflow-sim, Docker containers, and Docker Compose. Join us for this exciting opportunity!

    (Image source: Star Wars: The Clone Wars)



    All the 499 projects at L1NNA lab will have the option to be jointly supervised by the Canadian Centre for Cyber Security (Cyber Centre) through a selection process. At the end of the project, the team will present their projects in front of a panel of experts from the Cyber Centre. Winning projects can be presented to GeekWeek VIII, a hands-on cyber security workshop organized by the Cyber Centre and gathering 200+ security professionals across Government departments and industry sectors including but not limited to Bell Canada, National Bank, THALES, National Defence, TD, and Telus.

    More Details
  • Template-guided recombination: finite automaton construction
    Supervisor: Kai Salomaa
    Description:
    Template-guided recombination (TGR) is a formal model for the gene descrambling process occurring in certain unicellular organisms called stichotrichous ciliates (M. Daley, I. McQuillan, 2005). The mechanism by which these genes are descrambled is of interest both as a biological process and as a model of natural computation.

    This project studies template-guided recombination as an operation on strings and languages, with the goal of better understanding its computational capabilities. A goal is to implement the TGR operation for finite automata and use this in experiments to study the complexity of the operation. The TGR operation is of particular theoretical interest because the iterated version is known to preserve regularity but the result is nonconstructive (i.e., there is no known algorithm to produce the automaton): M. Daley, I. McQuillan, Template guided DNA recombination, Theoret. Comput. Sci. 330 (2005) 237-250.

    In this project you will implement a simulator for the non-iterated TGR operation. The operation gets as input two nondeterministic finite automata, or NFAs (and some numerical parameters), and outputs an NFA for the resulting language. The software implementing the TGR operation should use an input/output format similar to FAdo (or Vaucanson).

    The software is intended to be used in conjunction with software libraries such as FAdo or, alternatively, Vaucanson. These libraries provide a collection of operations that allow us to determinize and minimize the resulting NFAs in order to study the state complexity of the operation. From a theoretical point of view, a question of particular interest is to find examples where iterating the operation significantly increases the size of the minimized DFA.
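
    As a minimal sketch of the determinization and state-counting step (the NFA encoding below is a throwaway convention for illustration, not FAdo's API):

      # Subset construction: count how many subset-states are reachable.
      from itertools import chain

      def determinize(alphabet, delta, start, finals):
          """delta maps (state, symbol) -> set of successor states."""
          start_set = frozenset([start])
          dfa_states, todo, dfa_delta = {start_set}, [start_set], {}
          while todo:
              current = todo.pop()
              for a in alphabet:
                  nxt = frozenset(chain.from_iterable(delta.get((q, a), ()) for q in current))
                  dfa_delta[(current, a)] = nxt
                  if nxt not in dfa_states:
                      dfa_states.add(nxt)
                      todo.append(nxt)
          dfa_finals = {S for S in dfa_states if S & finals}
          return dfa_states, dfa_delta, start_set, dfa_finals

      # NFA over {a, b} accepting strings whose second-to-last symbol is 'a'.
      delta = {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "a"): {2}, (1, "b"): {2}}
      dfa_states, *_ = determinize(["a", "b"], delta, 0, {2})
      print(len(dfa_states))    # 4 reachable subset-states for this example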

    This is a larger project, including both programming and theoretical work, and is suitable for a group of up to three students. The implementation of the TGR operation for regular languages (NFAs) requires a good theoretical background in formal language concepts.
  • Computational complexity of decision problems for regular languages
    Supervisor: Kai Salomaa
    Description:
    This is a theoretical topic requiring the ability to read about algorithm complexity and computational complexity, and good familiarity with finite state machines (and grammars).

    The topic is suitable for students with a strong record from CISC-223 and CISC-365.

    It is known that all natural problems, like membership, emptiness, equivalence, etc., are decidable for finite state automata and regular expressions. However, what is the algorithmic complexity of these problems?

    The goal of this project is to investigate the known complexity results for the basic decision problems for deterministic and nondeterministic finite automata and for regular expressions. Often the questions are PSPACE-complete for NFAs and regular expressions, or log-space complete for DFAs. Besides a survey of the known results, central goals of the project include identifying:

    1. examples of natural problems for finite automata/regular expressions where the precise complexity is unknown,

    2. examples of finite automaton problems that are not known to be solvable, or that are known to be unsolvable.

    An optional part of the project can study the computational complexity of decision problems for context-free languages. In this case, many questions are known to be unsolvable.

    The goal of the project is to present the findings in the report in a systematic way (the terminology in different articles in the literature may not always be consistent). The project involves a fairly large amount of literature search since the complexity results are not included in typical textbooks. I can provide some survey articles to be used as a starting point.

    The project is for 2 (or 3) students. If three students are working on the project, we would also include decision problems for context-free grammars (the optional part).


  • [N-Lights] AI-powered Malware Detection through Constant-Memory Implicit Recurrent Neural Network
    Supervisor: Steven Ding
    Description:
    A major problem in cyber security is detecting malicious software (malware), especially as society becomes increasingly dependent on computer systems. Many ransom attacks through malware have succeeded in the past few years, causing financial losses in the millions across the globe.

    The goal of this project is to perform malware detection from the raw byte sequences of software. In this case, we do not need to execute the software, which guarantees the safety of the whole analysis. However, binary files are of significant length if we treat them as byte sequences (>250M time steps), and it becomes difficult to use them as input to a neural network due to memory constraints and the vanishing/exploding gradient problem. In this project, we will leverage an implicit neural network with recurrent back-propagation for a constant-memory solution that is efficient and effective against malware samples of arbitrary size.
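
    For intuition only (this is a simple chunked-GRU baseline, not the constant-memory implicit recurrent network the project targets), raw bytes can be streamed through a recurrent model in fixed-size chunks, carrying the hidden state forward and detaching it so memory stays bounded:

      import torch
      import torch.nn as nn

      class ByteStreamClassifier(nn.Module):
          def __init__(self, emb=16, hidden=64):
              super().__init__()
              self.embed = nn.Embedding(256, emb)        # one embedding per byte value
              self.rnn = nn.GRU(emb, hidden, batch_first=True)
              self.head = nn.Linear(hidden, 1)           # malicious / benign logit

          def forward(self, byte_chunks):
              h = None
              for chunk in byte_chunks:                  # each chunk: LongTensor [1, chunk_len]
                  _, h = self.rnn(self.embed(chunk), h)
                  h = h.detach()                         # truncate backprop, bound the memory
              return self.head(h[-1])

      model = ByteStreamClassifier()
      with open("/bin/ls", "rb") as f:                   # any binary works as a demo input
          data = torch.tensor(list(f.read()), dtype=torch.long).unsqueeze(0)
      logit = model(torch.split(data, 4096, dim=1))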


    All the 499 projects at L1NNA lab will have the option to be jointly supervised by the Canadian Centre for Cyber Security (Cyber Centre) through a selection process. At the end of the project, the team will present their projects in front of a panel of experts from the Cyber Centre. Winning projects can be presented to GeekWeek VIII, a hands-on cyber security workshop organized by the Cyber Centre and gathering 200+ security professionals across Government departments and industry sectors including but not limited to Bell Canada, National Bank, THALES, National Defence, TD, and Telus.

    More Details
  • [OD1NF1ST] Intrusion Detection for MIL-1553-based F-15/F-35 Avionic Platform
    Supervisor: Steven Ding
    Description:
    MIL-STD-1553 is a communication bus that has been used by many military avionics platforms such as the F-15 and F-35 fighter jets for almost 50 years. The original specification focuses heavily on fault tolerance and reliability due to the intended military and aerospace applications, with little consideration of security by design. Since first being installed in the F-16 Fighting Falcon, MIL-STD-1553 has spread to billions of dollars’ worth of Internet-connected military hardware across the globe. However, the system is not robust against modern cyber-attacks such as denial of service (DoS) or man-in-the-middle (MITM). Redesigning MIL-STD-1553 from scratch is impractical due to the bus’ wide-reaching installations. An alternative solution is to augment the existing protocol with protection.

    In this project, you will help improve a MIL-1553-based flight simulation system that is built upon Microsoft Flight Simulator. We have implemented the platform with ten categories of cyber attacks targeting the jet system. Your task will be to set up an actual physical bus system to collect real-life data, calibrate the simulation system with the physical bus system, and improve the streamlined process for simulation play. Tools/platforms involved: docker, docker-compose, Microsoft Flight Simulator 2021, SimConnect API, RESTful API, Streaming API, audio/visual recording, ReactJS, and plotly.
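
    As a small, hedged starting point for inspecting traffic collected from the physical bus, the fields of a 16-bit 1553 command word can be unpacked as commonly documented (5-bit RT address, transmit/receive bit, 5-bit subaddress/mode, 5-bit word count or mode code):

      def decode_command_word(word: int) -> dict:
          return {
              "rt_address": (word >> 11) & 0x1F,
              "transmit": bool((word >> 10) & 0x1),
              "subaddress": (word >> 5) & 0x1F,
              "word_count": word & 0x1F,   # read as a mode code when subaddress is 0 or 31
          }

      print(decode_command_word(0b01010_1_00011_00100))
      # {'rt_address': 10, 'transmit': True, 'subaddress': 3, 'word_count': 4}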


    All the 499 projects at L1NNA lab will have the option to be jointly supervised by the Canadian Centre for Cyber Security (Cyber Centre) through a selection process. At the end of the project, the team will present their projects in front of a panel of experts from the Cyber Centre. Winning projects can be presented to GeekWeek VIII, a hands-on cyber security workshop organized by the Cyber Centre and gathering 200+ security professionals across Government departments and industry sectors including but not limited to Bell Canada, National Bank, THALES, National Defence, TD, and Telus.

    More Details
  • Evolutionary music
    Supervisor: Ting Hu
    Description:
    Evolutionary computing is a creative approach to AI. This project explores the possibility of using an evolutionary algorithm to create music pieces to a user’s personal preference.
  • Networked genetic algorithm
    Supervisor: Ting Hu
    Description:
    Genetic algorithms maintain a population of diverse individual solutions. Some algorithms incorporate a spatial context into the solutions and only allow solutions in close proximity to recombine and mutate in order to produce offspring solutions. This project explores the idea of organizing solutions in a network where nodes are individuals and edges are relationships (e.g., parent-offspring).
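
    A minimal sketch of the idea, assuming a made-up toy fitness function (maximize the number of ones in a bit string) and the networkx package for the population topology:

      import random
      import networkx as nx

      def fitness(bits):
          return sum(bits)

      G = nx.random_regular_graph(d=4, n=30, seed=1)     # population topology
      genomes = {v: [random.randint(0, 1) for _ in range(20)] for v in G.nodes}

      for generation in range(50):
          for v in G.nodes:
              # Local selection: recombine only with the fittest neighbour in the network.
              mate = max(G.neighbors(v), key=lambda u: fitness(genomes[u]))
              cut = random.randrange(1, 20)
              child = genomes[v][:cut] + genomes[mate][cut:]   # one-point crossover
              if random.random() < 0.1:                        # mutation
                  i = random.randrange(20)
                  child[i] ^= 1
              if fitness(child) >= fitness(genomes[v]):        # local replacement
                  genomes[v] = child

      best = max(genomes.values(), key=fitness)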
  • Disease subtype detection using layer-wise relevance propagation
    Supervisor: Ting Hu
    Description:
    Layer-wise relevance propagation (LRP) is an explanation technique for complex deep learning models. This project explores the utility of LRP for detecting subtypes in complex human diseases.
  • Formal Analysis of Security Protocols
    Supervisor: Furkan Alaca and Juergen Dingel
    Description:
    Security is an important quality attribute in many applications. Unfortunately, it can also be very challenging to establish. Formal methods offer techniques and tools that can allow more rigorous analysis of software artifacts than traditional techniques such as inspection and testing. The use of formal methods to verify security-sensitive applications is of great interest for researchers and practitioners [1]. ProVerif, Tamarin, and FDR4 are some of the leading tools for formally verifying properties of cryptographic protocols [2].

    The goal of this project is to explore and compare the use of (some of) these tools. As a first step, this can be done in the context of small, existing examples such as [3]. However, the application of these tools to detect unknown vulnerabilities in recently published protocols such as [4] is also an intriguing possibility.

    The project is most suited for students with interests and background in security and formal methods, ideally in the form of CISC 447 and 422.

    [1] S. Chong, et al. Report on the NSF workshop on formal methods for security. August 1, 2016. Available here
    [2] D. Basin. Formal Methods for Security Knowledge Area (Version 1.0.0). Available here
    [3] B. Kiesl. Tamarin Toy Protocol. GitHub Repository. Available here
    [4] K. Krawiecka, A. Kurnikov, A. Paverd, M. Mannan, N. Asokan. SafeKeeper: Protecting Web Passwords using Trusted Execution Environments. Proceedings of the 2018 World Wide Web Conference. 2018. Available here
  • Genomic data analysis and visualization GUI
    Supervisor: Kathrin Tyryshkin
    Description:
    Analysis of genomic data often involves pre-processing, quality control, normalization, feature selection and classification, and differential expression analysis. Many methods exist; however, the best technique depends on the dataset. Therefore, it is often necessary to try different techniques to select the one that works best for a given dataset.
    This project involves further development and improvement of a user interface for a feature selection algorithm and feature analysis. The objective is to implement new components for feature selection and visualization of data. The interface would be published online for other researchers to use.

  • Developing a broadly enabling computational pipeline for accurate microRNA curation
    Supervisor: Kathrin Tyryshkin
    Description:
    The goal of this project is to develop a computational pipeline that enables accurate microRNA curation in any species of interest. The project will involve developing algorithms for identifying correct miRNA sequences, which must present a bimodal distribution, in non-human species. This project is ideal for students interested in developing innovative and sophisticated algorithms and working with genomic data.
  • Testing 2.0
    Supervisor: Juergen Dingel
    Description:
    Cucumber is a testing tool that allows the expression of test cases in plain English, making it possible for end-users of the software to write and execute tests. In other words, it allows testing in terms of requirements-level scenarios and without knowledge of the implementation. Cucumber has been quite successful with significant industrial uptake. Cucumber is freely available for a range of languages including Python, Java, JavaScript, C++, Kotlin, and Go.
    Property-based testing (PBT) generalizes traditional, ‘example-based’ testing through the specification of properties that the output has to satisfy under, perhaps, the assumption that the inputs satisfy some other properties. PBT tools allow for the automatic generation of inputs that satisfy the input properties and then check that the output produced indeed has the desired output properties. Moreover, once an input that results in incorrect output has been found, the tool will automatically try to ‘shrink’ the input, i.e., look for the smallest input that makes the software fail. PBT thus not only allows for the specification of more general and expressive tests that avoid unnecessary or even misleading specialization, but also features smart test case generation. More detailed descriptions of PBT can easily be found online (e.g., here). PBT can be used in any language, and PBT tools exist for, e.g., Python, Haskell, Java, JavaScript, C++, Kotlin, and Go.
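
    For a flavour of PBT, here is a small property-based test using the Hypothesis library for Python (one of the PBT tools mentioned above); the property is stated over all integer lists, and Hypothesis both generates inputs and shrinks any counterexample it finds:

      from hypothesis import given, strategies as st

      def my_sort(xs):
          return sorted(xs)

      @given(st.lists(st.integers()))
      def test_sort_is_ordered_and_keeps_elements(xs):
          out = my_sort(xs)
          assert all(a <= b for a, b in zip(out, out[1:]))   # output is non-decreasing
          assert sorted(out) == sorted(xs)                   # no elements added or lost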

    The goal of the project is to first explore Cucumber and PBT in isolation on a range of sample programs (using, e.g., solutions to assignments in previous courses). Then, the extent to which both approaches could be used together is to be investigated. Students are free to choose the programming language they want to work in (as long as it has Cucumber and PBT support, of course). The project is most suitable for students with interests in testing and software development.
  • Domain-specific Language Support in VS Code
    Supervisor: Juergen Dingel
    Description:
    Love VS Code? Interested in learning more about how it uses language servers to provide support for different, even user-defined languages? The goal of this project is to help us improve RTPoet, a VS Code extension implementing a textual version of UML-RT, a language which allows the description of reactive systems using a collection of communicating state machines. The extension supports code generation (for C++ and JavaScript) and execution. The execution of the generated JavaScript code can be animated using XState. The extension powers a web interface for UML-RT.

    While the extension already performs some validation on the specified state machines, this validation could be much improved by, e.g., checking for declared but unused entities, or unreachable states. To achieve this, the extension must be extended appropriately using Java or Kotlin (a JVM-based language much more compact and elegant than Java). This project is most suitable for students with interests in (domain-specific) programming languages and IDE development using VS Code.
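
    To make one of the proposed checks concrete, detecting unreachable states reduces to graph reachability; a sketch in Python follows (the real check would live in the extension's Java/Kotlin code and operate on the parsed UML-RT model, not this toy dict):

      from collections import deque

      def unreachable_states(initial, transitions):
          """transitions: dict mapping state -> list of successor states."""
          seen, todo = {initial}, deque([initial])
          while todo:
              for nxt in transitions.get(todo.popleft(), []):
                  if nxt not in seen:
                      seen.add(nxt)
                      todo.append(nxt)
          return set(transitions) - seen

      sm = {"Idle": ["Running"], "Running": ["Idle", "Error"], "Error": [], "Debug": ["Idle"]}
      print(unreachable_states("Idle", sm))   # {'Debug'}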
  • Implementing a Security Tool Proposed in Security Research
    Supervisor: Furkan Alaca
    Description:
    As a group of two, build your own implementation of a security-related tool proposed in the literature. Ideally your implementation would be a variation or a small extension of the existing proposed idea.

    Examples include:

    “Sound-Proof: Usable Two-Factor Authentication Based on Ambient Sound”: https://www.usenix.org/conference/usenixsecurity15/technical-sessions/presentation/karapanos
    This is a two-factor authentication scheme where the second authentication factor is the proximity of the user’s smartphone to the device that is being logged in on. Proximity is determined by recording the ambient audio from both the user’s smartphone and the device that the user is logging in on (laptop, desktop, etc.) to conclude whether both devices are within the same physical vicinity.

    “HomeSnitch: behavior transparency and control for smart home IoT devices”: https://www.enck.org/pubs/oconnor-wisec19a.pdf
    This is a system for monitoring IoT device traffic and inferring the device’s activity (e.g., whether it is uploading audio/video, performing a firmware update, or sending a “heartbeat” message to keep its connection active with a cloud server) by classifying it based on traffic characteristics and known device behaviours.

    “FirmAE: Towards Large-Scale Emulation of IoT Firmware for Dynamic Analysis”: https://github.com/pr0v3rbs/FirmAE
    This is a framework published at a recent conference (ACSAC 2020) for emulating IoT firmware images, primarily for the purpose of vulnerability scanning (but with other potential applications as well). You may be interested in extending this project, e.g., to allow emulation of a wider set of firmware images not tested in the 2020 paper.

    You may check this Google Doc to see if I have added any additional papers after this description has been posted: https://docs.google.com/document/d/1Fm7Du04v6s162l0jVNiQWVV1URfc_-Yl8z1yURl3cNE/edit?usp=sharing

    Some of these projects may require access to special hardware (e.g., for the IoT project, a wireless router or Raspberry Pi upon which you can install open-source firmware to develop on). If you are interested in a project along these lines, I would advise finding a suitable paper that interests you (whether from the above list, or a paper of interest that you find on your own), determining the scope of what you would like to do, and then reaching out to me to discuss.

    Google Scholar (https://scholar.google.ca/) is a good place to search using key words that interest you, and it would be helpful to narrow your selection by putting emphasis on security-focused conferences such as the one on this list: https://people.engr.tamu.edu/guofei/sec_conf_stat.htm

    Please feel free to reach out to me if you have any questions or have any ideas that you would like to discuss further with me. Ideally, given that the project will be security-related I would expect for you to be registered in CISC 447 (Introduction to Cybersecurity) this term. Exceptions can be made if you can demonstrate that you have the knowledge/skills required for the project.
  • Remote Photoplethysmography for Glycated Hemoglobin (HbA1c) Level Estimation
    Supervisor: Farhana Zulkernine
    Description:
    Diabetes is a serious disease that affects the insulin cycle in the human body. Serious long-term complications include cardiovascular disease, strokes, chronic kidney disease, foot ulcers, damage to the nerves, damage to the eyes and cognitive impairment. Thus, monitoring blood glucose levels in high-risk populations, pregnant women (gestational diabetes) and diabetic patients is very important.

    Noninvasive diabetes-diagnosis procedures are very new and require thorough studies to become error-resistant and user-friendly. In our previous work we built a smartphone application called Veyetals using the Remote Photoplethysmography (rPPG) technique, which can successfully estimate users’ vital signs, including heart rate, heart rate variability, respiration rate, oxygen saturation and stress level, with smartphone front cameras. In this project, we engage in monitoring the arterial condition by estimating diabetic states (HbA1c level greater than 10 percent).
  • A Comparative Study on Speech Recognition Tools
    Supervisor: Farhana Zulkernine
    Description:
    In recent years, human-like conversational agents have been created and deployed in a wide range of domains. This project explores the voice part of a conversational agent and presents a comparative study of different speech recognition APIs and state-of-the-art pre-trained models.
    Students need to evaluate those APIs and models on different domain-specific benchmarks and record performance measures such as accuracy and inference time.
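
    A possible shape for the evaluation harness, scoring each recognizer's transcript against a reference with word error rate (WER) via the jiwer package (the transcripts below are stand-ins for real API/model outputs):

      import jiwer

      reference = "the patient reported mild chest pain after exercise"
      transcripts = {
          "api_a": "the patient reported mild chest pain after exercise",
          "api_b": "the patient reported wild chest pains after exercise",
      }

      for name, hypothesis in transcripts.items():
          print(name, round(jiwer.wer(reference, hypothesis), 3))   # lower WER is better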
  • An information retrieval-based question answering system by using document ranking
    Supervisor: Farhana Zulkernine
    Description:
    In recent years, human-like conversational agents have been created and deployed in a wide range of domains, such as e-commerce, medicine, and education. During the recent pandemic, online QA services became extremely important and popular. A powerful and intelligent question answering (QA) system can understand users’ intent and answer different types of questions by searching a huge knowledge base quickly. However, existing state-of-the-art information retrieval-based QA systems such as bert-squad [1] and albert-squad [2] can only retrieve the answer from a provided document. It is extremely inefficient for a deep learning model to search for the answer in each document sequentially across a whole knowledge base. Therefore, a powerful document ranking algorithm is needed to rank the documents in the knowledge base based on the relevance between input questions and documents.

    In this project, students need to implement a powerful and efficient deep learning system for document ranking. The system should be able to list the documents most relevant to a question, with ranking scores.
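
    As a simple non-neural baseline for the ranking stage (not the deep model the project asks for), BM25 can score every document against the question so only the top few are passed to the reader; the sketch below uses the rank_bm25 package and a placeholder corpus:

      from rank_bm25 import BM25Okapi

      docs = [
          "Influenza vaccines are updated every year to match circulating strains.",
          "BERT is a transformer model pre-trained on large text corpora.",
          "Type 2 diabetes is often managed with diet, exercise and metformin.",
      ]
      bm25 = BM25Okapi([d.lower().split() for d in docs])

      question = "how is type 2 diabetes treated"
      scores = bm25.get_scores(question.lower().split())
      ranked = sorted(zip(scores, docs), reverse=True)    # highest-scoring documents first
      top_k = [doc for _, doc in ranked[:2]]              # candidates for the QA reader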

    Reference
    [1] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1(Long and Short Papers), pages 4171–4186, Minneapolis, Minnesota, June 2019. Association for Computational Linguistics.

    [2] Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, and Radu Soricut. Albert: A lite bert for self-supervised learning of language representations. arXiv, abs/1909.11942, 2020.
  • Deeper learning of Deep Learning by experimenting on activations, learning rates and regularizations
    Supervisor: Farhana Zulkernine
    Description:
    In this project, we take a curious approach for learning Deep Learning. We test unusual parameter settings to find out how neural networks work.
    The aim of this project is to find simple ways to answer the following questions:
    How do deep neural networks extract high-level features?
    Why are activation functions necessary in neural networks?
    Why does Softmax have extraordinary power in focusing on the right target? Can we manipulate it to double its power?
    Why cross-entropy loss? Why not least squares? Can we visualize the gradient behavior?
    How large can our max-pooling size be without damaging the data?

    In this project, we will investigate the above questions by testing, comparing and visualizing on a task of interest (e.g., emotion recognition).
    This insightful project requires you to (a minimal sketch of the first item follows the list):
    – implement layer-specific learning rates, compare them with a global learning rate setting, and tabulate and plot the comparison results;
    – compare commonly used activation functions (ReLU, PReLU, Leaky ReLU, …) and tabulate and plot the results on the task of interest;
    – assess the impact of regularizations (such as sparsity and within-data distances) on the resulting accuracy;
    – compare different settings of residual connections in a neural network;
    – visualize the gradient values that cross-entropy loss backpropagates during training, and compare them to least squares;
    – compare different pooling sizes with each other.
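    A minimal PyTorch sketch of the first item (layer-specific learning rates via optimizer parameter groups); the tiny network, input size and learning rates are assumptions made for illustration.

      import torch
      import torch.nn as nn

      # Hypothetical small network for an emotion-recognition task
      model = nn.Sequential(
          nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
          nn.Flatten(),
          nn.Linear(16 * 48 * 48, 7),     # assumes 48x48 RGB inputs and 7 emotion classes
      )

      # Layer-specific learning rates: one parameter group per layer of interest
      per_layer = torch.optim.SGD([
          {"params": model[0].parameters(), "lr": 1e-3},   # convolutional layer
          {"params": model[3].parameters(), "lr": 1e-2},   # classifier layer
      ], momentum=0.9)

      # Baseline for comparison: a single global learning rate for all parameters
      global_lr = torch.optim.SGD(model.parameters(), lr=1e-2, momentum=0.9)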
  • Deep Layerwise Canonical Correlation Analysis
    Supervisor: Farhana Zulkernine
    Description:
    Deep learning-based data fusion has attracted a lot of attention in a variety of fields such as visuo-linguistics and emotion recognition. Commonly used fusion methods include CNN-RNN combinations, transformers, feature-level fusion and decision-level fusion. However, each of these approaches has its own limitations.

    In this project, to address these problems, you will draw on the idea of Canonical Correlation Analysis (CCA) to design a new ANN-based data fusion algorithm capable of correlating two data modalities simultaneously (e.g., audio and language) at different states or nonlinear levels.

    Depending on the selected multimodal dataset (audio-video emotion recognition, audio-textual emotion recognition, image captioning dataset, etc.), you need to combine canonical correlation analysis with models like CNNs, LSTMs, FFNNs, etc., to learn cross-modal dependencies.
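    A minimal sketch of classical (linear) CCA with scikit-learn, as a starting point before designing the deep, layerwise variant; the random feature matrices stand in for two modalities.

      import numpy as np
      from sklearn.cross_decomposition import CCA

      # Toy stand-ins for two modalities (e.g., audio features and text embeddings);
      # in the project these would be per-layer activations of two networks.
      rng = np.random.default_rng(0)
      audio_feats = rng.normal(size=(500, 64))
      text_feats = rng.normal(size=(500, 128))

      cca = CCA(n_components=8)
      audio_c, text_c = cca.fit_transform(audio_feats, text_feats)

      # Per-component correlation between the two projected views
      corrs = [float(np.corrcoef(audio_c[:, i], text_c[:, i])[0, 1]) for i in range(8)]
      print(corrs)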
  • Using sense-embedding for improving neural language models
    Supervisor: Farhana Zulkernine
    Description:
    Embedding words as vectors is a blessing: it gives us a sense of what each word conveys as a series of numbers. Compared to other representations such as bag-of-words and one-hot codes, embedding vectors can correct themselves during training. However, one problem remains with word embeddings: they assign the same vector to different senses of a word. This can cause ambiguity for a great number of words. Consider, for instance, the word “brother”: it can mean either “sibling” or “boss”. An embedding vector trained on groups of sentences about family, siblings and friendship will end up far from a vector trained on collocated words about guidance, management, bosses, and so on.


    In this project, you will extract word senses using a large vocabulary and language database. After extracting the senses, you will embed each sense with a different embedding vector. The main task is to associate each word with its corresponding sense index, then use the sense-embedding vectors in a neural language model (LSTM, biLSTM, Transformer) and compare its language modeling performance with that of a typical neural language model (without sense embeddings).
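    A minimal sketch of sense enumeration, assuming NLTK’s WordNet interface as the lexical database; the sense-indexed vocabulary at the end is hypothetical.

      from nltk.corpus import wordnet as wn    # requires nltk.download("wordnet") once

      # Enumerate the senses (synsets) of a word; each sense index could later be
      # mapped to its own embedding vector.
      word = "brother"
      senses = wn.synsets(word, pos=wn.NOUN)
      for idx, sense in enumerate(senses):
          print(idx, sense.name(), "-", sense.definition())

      # Hypothetical sense-aware vocabulary entries: ("brother", 0), ("brother", 1), ...
      sense_vocab = {(word, idx): None for idx in range(len(senses))}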
  • Hybrid EMR Data Analytics on Diabetes: Diagnosis, Prevalence and Primary Care Quality
    Supervisor: Farhana Zulkernine
    Description:
    With the widespread use of technology, huge volumes of clinical data are generated every day and stored digitally in the form of Electronic Medical Records (EMR). Data contained in EMRs is used not only for patients’ primary care but also for various secondary purposes such as clinical research, automated disease diagnosis, quality enhancement and better-informed clinical decision making. With a view to improving disease identification using EMRs, this project will develop an automatic diagnostic model for diabetes in the Canadian population. EMR data from primary care clinics in seven provinces across Canada will be used to develop the predictive models. The comprehensive nature of this EMR data, which includes structured numeric, categorical, hybrid, and unstructured text data, makes selecting the right data types and preprocessing the data for clinical prediction a challenging task. By applying natural language processing and different classification algorithms, this project will examine the various EMR data types, including structured data, short unstructured diagnostic text, and longer free-text narrative patient-physician encounter notes. This will allow us to investigate the capacity of each data type in terms of disease diagnosis and clinical prediction power.
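    A minimal sketch of combining structured fields and free-text notes in one classifier, assuming scikit-learn and pandas; the three-row frame and its column names are made up for illustration.

      import pandas as pd
      from sklearn.compose import ColumnTransformer
      from sklearn.feature_extraction.text import TfidfVectorizer
      from sklearn.linear_model import LogisticRegression
      from sklearn.pipeline import Pipeline
      from sklearn.preprocessing import StandardScaler

      # Hypothetical EMR frame: numeric fields plus a free-text encounter note
      df = pd.DataFrame({
          "age": [54, 61, 47],
          "bmi": [31.2, 27.5, 35.0],
          "note": ["elevated fasting glucose, polyuria",
                   "routine visit, no complaints",
                   "HbA1c 8.9, started metformin"],
          "diabetes": [1, 0, 1],
      })

      # One branch per data type: scaled numerics and TF-IDF features from the note
      features = ColumnTransformer([
          ("structured", StandardScaler(), ["age", "bmi"]),
          ("text", TfidfVectorizer(), "note"),
      ])

      model = Pipeline([("features", features), ("clf", LogisticRegression())])
      model.fit(df[["age", "bmi", "note"]], df["diabetes"])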
  • Using advanced analytics to study the prevalence of osteoarthritis pain in Canadian primary care
    Supervisor: Farhana Zulkernine
    Description:
    Osteoarthritis (OA) is a progressive chronic joint disease resulting in a breakdown of articular cartilage and bone when damaged joint tissues are not able to normally repair themselves. Symptomatic OA of the hip and knee affects over 300 million adults worldwide.
    The primary objective of this project is to apply Information Extraction (IE), Natural Language Processing (NLP) and Machine Learning (ML) techniques to extract data from the patients’ EMR in addition to the current structured data to determine the following.
    1) Determine which major index joints are affected and identify patients suffering from hip and/or knee OA.
    2) Classify the severity of the pain.
    3) Identify patients who have tried ≥3 analgesics (or equivalent).
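    As a rough illustration of the information-extraction step, a keyword-matching pass over an encounter note; the drug list, note text and patterns are assumptions, and a real pipeline would use clinical NLP rather than regular expressions.

      import re

      ANALGESICS = {"acetaminophen", "ibuprofen", "naproxen", "tramadol", "celecoxib"}
      JOINTS = re.compile(r"\b(hip|knee)\b", re.I)
      SEVERITY = re.compile(r"\b(mild|moderate|severe)\b\s+pain", re.I)

      def screen_note(note: str) -> dict:
          """Extract affected joints, pain severity, and the count of analgesics tried."""
          severity = SEVERITY.search(note)
          return {
              "joints": sorted({m.lower() for m in JOINTS.findall(note)}),
              "severity": severity.group(1).lower() if severity else None,
              "analgesics_tried": sum(drug in note.lower() for drug in ANALGESICS),
          }

      print(screen_note("Severe pain in right knee; tried ibuprofen and naproxen, now on tramadol."))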
  • Implementing an extensible version of Sokoban
    Supervisor: Burton Ma
    Description:
    Sokoban is a classic puzzle video game. The goal of this project is to create a Java implementation of Sokoban that is suitable for study by CISC124/CMPE212 students. The implementation must be carefully designed with extensibility in mind, be fully documented, and be fully tested.

    Requirements: Strong Java programming skills, JUnit 4. Knowledge of design patterns would be useful.
  • Healthcare question answering knowledge graph for patients with chronic conditions
    Supervisor: Farhana Zulkernine
    Description:
    The digitization of clinical records requires information models describing assets and information sources to enable the semantic integration and interoperable exchange of data. Improving healthcare for people with chronic conditions requires clinical information systems that support integrated care and information exchange, and multiple systems have been implemented to support the full life cycle of the virtual knowledge graph concept. This project aims to answer SPARQL queries over a virtual knowledge graph through query reformulation, mapping tools for assisting mapping design, syndication for evaluating queries over multiple data sources, and query formulation tools for interacting with the virtualized knowledge graph, in support of healthcare knowledge management to diagnose people with chronic conditions.
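    A minimal sketch of the query side, assuming the rdflib package; the namespace, the two triples and the query are toy examples of the kind the project would reformulate over the virtual knowledge graph.

      from rdflib import Graph, Namespace, RDF

      # Hypothetical mini-graph of patients and chronic conditions
      EX = Namespace("http://example.org/health#")
      g = Graph()
      g.add((EX.alice, RDF.type, EX.Patient))
      g.add((EX.alice, EX.hasCondition, EX.Type2Diabetes))

      # A SPARQL query over the graph
      results = g.query("""
          PREFIX ex: <http://example.org/health#>
          SELECT ?patient WHERE { ?patient ex:hasCondition ex:Type2Diabetes . }
      """)
      for row in results:
          print(row.patient)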
  • Using Unity to Develop Activity Recognition Training Set for Deep Neural Networks.
    Supervisor: Francois Rivest (and Farhana Zulkernine)
    Description:
    Activity recognition from videos is an important application of machine learning and artificial intelligence, from self-driving cars and automated video surveillance to team sports analysis. A key challenge in this field is the absence of a large labelled dataset required by machine learning algorithms to train neural networks on activity recognition. In this project, you will have to develop an open-source activity simulator in Unity, which could combine existing skeleton and activity animation, to generate labelled activity videos. The goal is to automatically produce thousands of short activity clips labelled by activity for deep neural network training. Given enough students on the project, it could include training a simple convolutional neural network to show the ability of the network to generalize to real video.
    [Figure: screenshot from within Unity]
  • Human Activity Recognition
    Supervisor: Farhana Zulkernine
    Description:
    Human Activity Recognition (HAR) is the process of automatically identifying human activities based on stream data obtained from various sensors (such as cameras, position sensors, inertial sensors, physiological sensors, etc.). HAR has proven to be beneficial in various research fields, especially in healthcare, aged care, personal care, rehabilitation engineering, ambient living, social science, and many other fields. Due to the recent progress of computing power, deep learning algorithms have become the most effective and efficient choices for solving HAR problems.

    In this project, a video camera will be installed in the lab and the system should be able to recognize the lab students and report their activities. Specifically, the following two tasks will be completed separately by two students (a minimal sketch of task (1) follows the list):
    (1) Recognize the person’s identity and anonymize the data by creating a skeletal video with depth from the 2D video.
    (2) Detect the person’s activity with one person in the scene or attending to a specific person.
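    A minimal sketch of task (1), assuming the MediaPipe Pose solution and OpenCV; the input file name is a placeholder, and any pose-estimation library could be substituted.

      import cv2
      import mediapipe as mp

      # Replace each video frame with a skeleton-only frame so identity is removed.
      pose = mp.solutions.pose.Pose()
      draw = mp.solutions.drawing_utils

      cap = cv2.VideoCapture("lab_video.mp4")
      anonymized = []
      while True:
          ok, frame = cap.read()
          if not ok:
              break
          result = pose.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
          canvas = frame * 0                     # black canvas: only the skeleton remains
          if result.pose_landmarks:
              draw.draw_landmarks(canvas, result.pose_landmarks,
                                  mp.solutions.pose.POSE_CONNECTIONS)
          anonymized.append(canvas)
      cap.release()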


    Dataset: video collected by a camera installed in the lab.

    References:
    [1] Nida Saddaf Khan & Muhammad Sayeed Ghani (2021). A Survey of Deep Learning Based Models for Human Activity Recognition. Wireless Personal Communications volume 120.
    [2] Zhang, H.-B., et al. (2019). A comprehensive survey of vision-based human action recognition methods. Sensors, 19(5), 1005.
    [3] Zhang, S., Wei, Z., Nie, J., Huang, L., Wang, S., Li, Z. (2017). A review on human activity recognition using vision-based method. Journal of Healthcare Engineering.

  • A grammar, parser, and compiler for simple unit tests
    Supervisor: Burton Ma
    Description:
    Unit testing is commonly introduced in introductory programming courses. Students sometimes have difficulty writing unit tests because the tests are written in the same programming language that the student is currently learning. JUnit 5 in particular uses features of the Java language that are beyond the scope of most introductory programming courses.

    Unit testing is also commonly used to assess student work. Manually writing unit tests that have detailed feedback is tedious (at best) because the feedback text needs to be embedded in the test.

    Both of these issues can potentially be addressed by creating a small language that can describe the common tests that might be written in an introductory programming course. Such a language would be specified by a grammar that could be parsed and compiled to the target language and unit test framework. The goal of this project is to create such a grammar, use a tool such as ANTLR to create a parser for it, and then create a compiler that can generate the unit tests the grammar describes.

    Requirements: CISC223. CISC458 may be useful. Strong Python and Java programming skills.
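    A toy sketch of the idea, in Python: a single hypothetical spec line is ‘compiled’ into a JUnit 5 test method. The spec syntax is invented for illustration; the real project would define a proper grammar and use ANTLR.

      import re

      # Hypothetical one-line spec:  expect add(2, 3) == 5 : "message"
      SPEC = re.compile(r'expect (\w+)\((.*)\) == (.+?) : "(.*)"')

      def compile_to_junit(line: str) -> str:
          """Emit a JUnit 5 test method (as text) for one spec line."""
          call, args, expected, message = SPEC.fullmatch(line).groups()
          return (
              "@Test\n"
              f"void test_{call}() {{\n"
              f'    assertEquals({expected}, {call}({args}), "{message}");\n'
              "}"
          )

      print(compile_to_junit('expect add(2, 3) == 5 : "add should sum two positive integers"'))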
  • 3D Object Detection for Autonomous Driving
    Supervisor: Farhana Zulkernine
    Description:
    Autonomous driving technology has developed rapidly over the past decade. The perception system of Autonomous Vehicles (AVs) needs to be accurate and robust enough to ensure safe driving. Object detection is a fundamental and critical module of this system: it localizes the surrounding objects of interest and helps AVs understand the environment. 2D object detection can identify objects and localize them with a 2D bounding box. However, AVs require more information to understand the surrounding environment. Compared to 2D object detection, 3D object detection provides more spatial information such as location, direction and object size, which is more helpful to AVs. 3D object detection also has a wide range of applications not only in autonomous driving, but also in housekeeping robots and augmented/virtual reality. In this research we plan to implement a deep learning-based 3D object detector that detects traffic participants such as vehicles, pedestrians, and cyclists.

    3D object detectors can be categorized into image-based, point cloud-based and multimodal fusion-based methods. We mainly focus on the point cloud-based and multimodal fusion-based methods. Some related papers are listed below: Bird’s Eye View (BEV)-based methods [1], a Range View (RV)-based method [2] and a point-based method [3].

    [1] Simony, M., Milzy, S., Amendey, K. and Gross, H.M., 2018. Complex-yolo: An euler-region-proposal for real-time 3d object detection on point clouds. In Proceedings of the European Conference on Computer Vision (ECCV) Workshops (pp. 0-0).
    [2] Fan, L., Xiong, X., Wang, F., Wang, N. and Zhang, Z., 2021. Rangedet: In defense of range view for lidar-based 3d object detection. arXiv preprint arXiv:2103.10039.
    [3] Qi, C.R., Su, H., Mo, K. and Guibas, L.J., 2017. Pointnet: Deep learning on point sets for 3d classification and segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 652-660).
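    A minimal numpy sketch of a Bird’s Eye View (BEV) projection, the representation used by detectors such as Complex-YOLO [1]; the random point cloud, ranges and cell size are illustrative.

      import numpy as np

      # Random stand-in for a LiDAR point cloud: columns are x, y, z in metres
      points = np.random.uniform(low=[-40, -40, -2], high=[40, 40, 1], size=(10000, 3))

      x_range, y_range, cell = (-40, 40), (-40, 40), 0.1       # 0.1 m per BEV cell
      h = int((x_range[1] - x_range[0]) / cell)
      w = int((y_range[1] - y_range[0]) / cell)

      ix = ((points[:, 0] - x_range[0]) / cell).astype(int).clip(0, h - 1)
      iy = ((points[:, 1] - y_range[0]) / cell).astype(int).clip(0, w - 1)

      # Height map: maximum z per cell, a common BEV channel next to density/intensity
      bev = np.full((h, w), -np.inf)
      np.maximum.at(bev, (ix, iy), points[:, 2])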
  • A Powerful Data Science-related Code Example Search Tool
    Supervisor: Yuan Tian
    Description:
    Writing high-quality code effectively is challenging for many data science practitioners, who focus more on data collection, algorithm design, and statistical analysis. For instance, one has to search for existing APIs and frameworks that support traditional data analysis techniques and machine learning algorithms, and learn how to use those APIs/frameworks to develop the code for one’s own task. This process is usually manual and time-consuming.

    In this project, we aim to design and develop a new data science code example search tool that can support free-text search as well as more complex search functions, e.g., template-based search, search for similar code, search for code for a specific statistical test, and search for code with quality or API-usage requirements. The tool can be a web search application or a bot on the GitHub platform.

    More than two students are required to form a group.
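    A toy starting point for the search index: an inverted index over code-snippet identifiers; the snippets and the AND-style query semantics are made up for illustration.

      import keyword
      import re
      from collections import defaultdict

      # Two toy snippets; in practice these would be mined from notebooks or GitHub
      snippets = {
          "s1": "df = pd.read_csv(path)\ndf.groupby('label').mean()",
          "s2": "model = LogisticRegression().fit(X_train, y_train)",
      }

      # Inverted index: identifier token -> set of snippet ids containing it
      index = defaultdict(set)
      for sid, code in snippets.items():
          for token in re.findall(r"[A-Za-z_]\w*", code):
              if not keyword.iskeyword(token):
                  index[token.lower()].add(sid)

      def search(query: str) -> set:
          """Return ids of snippets containing every token of the query."""
          hits = [index.get(t.lower(), set()) for t in query.split()]
          return set.intersection(*hits) if hits else set()

      print(search("groupby mean"))   # -> {'s1'}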
  • An Intelligent Course Comparison Tool for MOOC
    Supervisor: Yuan Tian
    Description:
    Massive Open Online Courses (MOOCs) are free online courses that anyone can enroll in. These online courses offer critical opportunities for students and job seekers to gain knowledge in a more flexible way. However, even for the same topic, e.g., data analysis, plenty of courses can be found, and it takes time for people to figure out the differences between those courses and the prerequisite concepts each one requires. Moreover, how these courses can support job hunting is unknown.

    This project aims to fill the above gap by proposing a new approach to summarize the learning concepts/skills achieved and required for each candidate MOOC and link them with available job posts that require those skills.

    The goal might be adjusted based on the number of students who take the project. Group is preferred.
  • Analyzing and Tracking Technical Debts Generated in Code Review
    Supervisor: Yuan Tian
    Description:
    When developers cut corners and make haste to rush out code, that code often contains technical debt (TD), i.e. decisions that must be repaid, later on, with further work. Technical debts can be generated in every phase of the software development and evolution process.

    In this project, we aim to analyze technical debts introduced in the code review process. Once a code change to the project is submitted, another developer usually needs to review it and check whether it involves any problems. However, not all questions and requests raised by code reviewers are resolved in the final submitted code change; some of them may become technical debts. So far, no tool is available for automatically detecting and tracking these technical debts, so their impact on the project remains unknown. To fill this gap, this project aims to first empirically analyze the appearance of technical debts in the code review process and then propose a machine-learning-based approach to identify technical debts in code review comments. Depending on the number of students who take this project, we might also analyze the impact of debts in the code review on the quality of the software project.

    Note: Individual or group.
  • “lint” for TeX proofs
    Supervisor: Jana Dunfield
    Description:
    Proof assistants such as Agda and Coq are increasingly used to automate metatheory (proofs about programming languages). However, these tools require great effort to use; I used a proof assistant for just one of my research papers. And how do you do the proofs about the proof assistants? At some point, you have to rely on natural language (“paper”) proofs.

    The goal of this project is to perform certain checks on proofs written in TeX—not to show that the proof is definitely correct, but to identify possible or probable bugs in the proof. To avoid attempting to understand general language, the tool will be restricted to proofs written in TeX using certain macros. The kinds of possible bugs being detected might include:

    – certain wrong uses of an induction hypothesis;

    – circular references;

    – mismatched lemmas (the proof “uses” Lemma 2 but the result derived has the wrong shape);

    – “obvious” missing cases.

    Basic knowledge of TeX is strongly recommended. Intermediate to advanced knowledge of a functional programming language (for example, Haskell) is strongly recommended. Knowledge of material from CISC 465 and/or CISC 458 would be ideal, though since most students will not have already taken those courses, concurrent enrolment in 465 and/or 458 in the Winter term is helpful. Knowledge of natural language processing may be useful.
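    A toy sketch, in Python, of just the circular-reference check; the two macros (\begin{lemmaproof}{name} and \uselemma{name}) are hypothetical stand-ins for whatever macros the tool would standardize on.

      import re

      def find_circular_lemmas(tex: str):
          """Return names of lemmas whose proofs (transitively) rely on themselves."""
          deps = {}
          for m in re.finditer(r"\\begin\{lemmaproof\}\{(\w+)\}(.*?)\\end\{lemmaproof\}",
                               tex, re.DOTALL):
              name, body = m.groups()
              deps[name] = set(re.findall(r"\\uselemma\{(\w+)\}", body))

          def reaches(start, target, seen=frozenset()):
              return any(d == target or (d not in seen and reaches(d, target, seen | {d}))
                         for d in deps.get(start, ()))

          return [name for name in deps if reaches(name, name)]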
  • From Haskell types to Prolog predicates
    Supervisor: Jana Dunfield
    Description:
    Prolog lacks any equivalent to Haskell’s ‘data’ declarations, which is irritating and causes confusion in CISC 360 when we switch from Haskell to Prolog halfway through the course.

    For this project, you would develop a tool to turn Haskell ‘data’ declarations into Prolog predicates, ideally handling some of the more advanced Haskell features (barely covered in CISC 360) such as type classes and type guards.

    Knowledge of both Haskell and Prolog is required. Knowledge of grammars and parsing would be ideal.
  • Predicting surge in ER visits
    Supervisor: David Skillicorn
    Description:
    Emergency rooms would like to be able to estimate the number of cases they will see tomorrow, to help with staffing. We have time series data of previous visits to hospitals in Kingston and the GTA for several respiratory syndromes. The project is to develop surge prediction models for such diseases using data analytic techniques.
  • Detecting themes/depression in incel forum posts
    Supervisor: David Skillicorn
    Description:
    We have a dataset of incel forum posts. The participants in such forums are more heterogeneous than most other forums and the membership and motivations are not well understood. These projects will explore the themes present in the posts, and build predictive models for properties of interest such as depression or an intent to carry out an attack.
  • Predicting outcome in childhood leukemia (2 projects)
    Supervisor: David Skillicorn
    Description:
    Given a dataset of phenotype and genotype information about patients diagnosed with childhood acute lymphoblastic leukemia, build a predictive model for outcome. There are hints that prediction from SNPs is actually more accurate than from phenotype; if this is true, then it’s an important finding. One project will use deep learning techniques, the other will use graph-based similarity methods.
  • Bazel build process tracking and visualisation tool
    Supervisor: Bram Adams
    Description:
    The build process is responsible for taking the (textual) source code files and other resources of a software project, then calling the minimal set of compilers, preprocessors and other transformation tools in the right order to generate the executables and packages needed to install and run the project. Since the 1970s, several build technologies have been developed, from the original Make, over Ant/Maven and CMake to modern technologies like Bazel. In particular, Bazel (open-sourced by Google) is part of a new breed of distributed build technologies, able to decompose a complex build process into smaller processes that can be sandboxed and run in a distributed manner.

    The goal of this project is to design and implement a web-based tool to track and visualize Bazel build processes. If a developer, tester or build engineer would like to debug an issue (either incorrect behaviour or performance bottleneck), or would like to understand what the build currently is doing, they should be able to fire up their browser and interact with the proposed tool. This tool would build on ideas prototyped in https://mcis.cs.queensu.ca/makao.html (for GNU Make), and would allow (amongst others) to:
    – visualize the dependency graph of a Bazel build process in your browser
    – navigate, query, filter, search, etc. the graph in a scalable manner (dependency graphs are known to be huge)
    – link the nodes, edges and graph metadata to the Bazel specification files
    – track (in real-time) the progress of the current build
    – the tool should scale to actual Bazel build processes of open-source projects

    Building this tool requires a healthy combination of web and shell development skills, an interest in learning about (modern) build processes, a fascination for open-source development, and a motivation to decompose complex problems into smaller tasks.
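    A minimal sketch of extracting the dependency graph, assuming `bazel query --output graph` (which emits Graphviz DOT) is available in the workspace; the parsed edges would then feed the browser-side visualization.

      import re
      import subprocess

      # Dump the dependency graph of every target as Graphviz DOT text
      dot = subprocess.run(
          ["bazel", "query", "deps(//...)", "--output", "graph"],
          capture_output=True, text=True, check=True,
      ).stdout

      # Parse "a" -> "b" edges out of the DOT output
      edges = re.findall(r'"([^"]+)"\s*->\s*"([^"]+)"', dot)
      print(f"{len(edges)} dependency edges")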
  • Benchmark & dashboard for language server implementations
    Supervisor: Bram Adams
    Description:
    Five years ago, the language server protocol specification (LSP; https://microsoft.github.io/language-server-protocol/) was open-sourced by Microsoft. This spec has caused a revolution in the world of IDEs and editors, both for developers and tool builders. Developers rejoice since they no longer need to switch IDEs for every separate programming language they would need to handle, but can stick to their favourite tool (e.g., VS Code, Emacs), as long as it is able to talk to language-specific server processes via the LSP protocol. Similarly, tool builders are ecstatic, since they no longer need to rewrite editing tools from scratch for every language their organization needs to support, they just need to be able to talk via LSP to language-specific servers. Put briefly: editing and programming language analysis are effectively decoupled through LSP.

    Now, five years later, there is a wealth of LSP implementations, for any programming language imaginable (see https://microsoft.github.io/language-server-protocol/implementors/servers/). There are 2 LSP implementations for C, 4 for C++, 2 for Go, 3 for Groovy, 4 for PHP, 4 for Python, and even 5 for Ruby! Each LSP implementation can choose which capabilities to support, from simple code completion and syntax checks, to code cross-referencing or even (simple) refactorings like renaming. Furthermore, each implementation could show differences in terms of performance (CPU time, memory usage, network delay).

    To help developers choose the right LSP implementation for their needs and programming language, this project would aim at building a benchmark and dashboard that, given two LSP implementations, is able to clearly summarize:
    – differences in supported capabilities
    – differences in accuracy for the supported capabilities based on a benchmark test set
    – differences in performance when using the capabilities (latency, perhaps memory usage)

    A major part of this project involves coming up with a clever way to generate test data for the LSP implementations. This could range from automatically identifying and extracting test cases of either LSP implementation and leveraging it on both implementations, to generating new test cases by mining GitHub projects containing source code of that programming language. This might not be trivial.

    To work on this project, one would need a healthy interest in exploring the LSP protocol and its implementations, in software testing (particularly benchmarking), and web development (dashboard).
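    A minimal latency probe for one capability handshake, assuming an LSP server that speaks JSON-RPC over stdio (here the Python server `pylsp` as an example) and a single Content-Length header per message.

      import json
      import subprocess
      import time

      server = subprocess.Popen(["pylsp"], stdin=subprocess.PIPE, stdout=subprocess.PIPE)

      def send(msg: dict) -> None:
          body = json.dumps(msg).encode()
          server.stdin.write(f"Content-Length: {len(body)}\r\n\r\n".encode() + body)
          server.stdin.flush()

      start = time.perf_counter()
      send({"jsonrpc": "2.0", "id": 1, "method": "initialize",
            "params": {"processId": None, "rootUri": None, "capabilities": {}}})

      header = server.stdout.readline()              # b"Content-Length: N\r\n"
      length = int(header.split(b":")[1])
      server.stdout.readline()                       # blank line ending the header block
      response = json.loads(server.stdout.read(length))

      print("initialize latency:", round(time.perf_counter() - start, 3), "s")
      print("advertised capabilities:", list(response["result"]["capabilities"]))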
  • Detecting scam web sites using deep learning
    Supervisor: David Skillicorn
    Description:
    A project to detect financial scam web sites using sophisticated linguistic analysis was quite successful. Can this expensive process be replicated using deep learning techniques (word embeddings, biLSTMs, attention) at much lower cost?
  • Finding the significant participants in a hacker forum
    Supervisor: David Skillicorn
    Description:
    Given the posts by a large group of participants in a hacker forum, can the key players be identified? What about finding the most important topics being discussed?
  • Mapping police interactions app
    Supervisor: Catherine Stinson
    Description:
    Based on specifications by a community group that experiences police harassment, build a mobile app that is capable of organizing and mapping incidents between community and police, while keeping the data secure and confidential to a degree chosen by the user.
    The idea the community group described is similar to the Green Book app in the new season of Dear White People.
    Students taking on this project should be aware of local racial justice issues, or willing to do the reading in preparation.
  • Testing a benchmark task for enormous language models
    Supervisor: Catherine Stinson
    Description:
    “Conceptual Combinations” is a task recently included in Google’s Beyond the Imitation Game Benchmark (BIG Bench) for large language models. https://github.com/google/BIG-bench. The point of the task is to challenge language models to be able to answer questions that require semantic understanding, not just statistical correlations between sets of words.
    This project will involve running an online behavioural experiment to test the performance of English speakers on the benchmark task, running the task on several large language models, like BERT and GPT-2 for comparison, and designing models that would be expected to succeed on the task.
    This is suitable for COGS students.
  • Anomaly Detection and Correlation Analysis of Sensor Data for Autonomous Metro Rail Operation
    Supervisor: Farhana Zulkernine
    Description:
    Anomaly Detection (AD) is a key research problem in the fast-evolving area of Intelligent Transportation Systems (ITS). Industries that leverage Machine Learning (ML) methods to enable higher levels of autonomous operation require robust fault detection. To be proactive in ensuring reliable operation before faults occur, AD methods are applied to detect deviation from regular behaviour. AD methods therefore first model the regular behaviour of a system, which is then used to detect deviation from that behaviour, i.e., irregular or anomalous behaviour, to verify system performance and highlight areas of concern for system administrators. AD in big data is predominantly an unsupervised learning task, meaning that the data does not contain ground-truth labels. As a result, researchers either turn to simulated datasets or use benchmarked data to develop and validate unsupervised learning methods for clustering the data.

    For this project, Thales Canada has provided data from their Next-Generation Positioning System (NGPS) on board their Train Autonomy Platform (TAP). This data contains speed and positioning readings from sensors on board multiple trains collected over 2 years. The data has no labelled anomalies, and characterization of anomalous behaviour is relatively limited. In addition, exploring the impact of environmental factors such as weather conditions, neighbouring trains, and track curvature on NGPS readings is important to define anomalous behaviour, which can be done through correlation analysis of the driving pattern with the environmental variables. Due to the nature of this new technology, Thales requires performance verification of NGPS in order to validate its use. Providing an AD tool to model the regular behaviour of the system and identify anomalous behaviour, plus a correlation analysis tool to show the relationship between the system’s behaviour and external factors, would contribute to the real-world application of this autonomous technology.

    Objective 1: Build an IBM SPSS statistical data analytics pipeline to perform clustering and correlation analysis of external variables with train speed profile.
    Deliverable: Software and the results of correlation between weather data and presence of other trains in the parallel track.
    Objective 2: Develop a predictive model using ANN to output train speed given sequential data and compute the deviation from given data.
    Deliverable: Software in Python and labelled anomalous train behaviour.

    Data: Provided by industry partner Thales Group. Numeric sensor data from train’s NGPS system.

  • Air Travel Studies
    Supervisor: Yuanzhu Chen
    Description:
    Analyze airline flight data in North America and the world using statistics from the US Bureau of Transportation Statistics. Study empirically how travel patterns evolve over time. Requirements: programming in Python (numpy and pandas), graph theory, graph algorithms.
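    A minimal sketch of turning one month of BTS on-time data into a directed flight network, assuming pandas and networkx; the file name follows the BTS “Origin”/“Dest” column convention but is a placeholder.

      import networkx as nx
      import pandas as pd

      # One month of flights; only origin and destination airports are needed here
      df = pd.read_csv("bts_ontime_2022_01.csv", usecols=["Origin", "Dest"])

      # Count flights per route and build a weighted directed graph
      edges = df.value_counts(["Origin", "Dest"]).reset_index(name="flights")
      G = nx.from_pandas_edgelist(edges, "Origin", "Dest",
                                  edge_attr="flights", create_using=nx.DiGraph)

      # A simple snapshot metric to track month over month
      print(G.number_of_nodes(), "airports,", G.number_of_edges(), "routes")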
  • A simulation environment for robot arms
    Supervisor: Sidney Givigi
    Description:
    Recently, there has been growing interest in using robot manipulators in industry and healthcare. However, existing training environments are not easy to use because they do not integrate well with existing hardware. The development of appropriate simulations is an important step in the development of robotic applications, especially when humans are involved.

    The goal of this project is to extend an existing simulator for robot manipulator training in industrial and healthcare environments. The application should be developed using MuJoCo made recently available by DeepMind. The robot arms should be modeled after collaborative robotic arms available at Dr. Givigi’s lab.

    The project is most suitable for 1-3 students interested in robotics and simulation.
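    A minimal sketch using DeepMind’s `mujoco` Python bindings: a toy one-joint arm defined inline in MJCF (illustrative only, not the lab’s actual robot), stepped for 1000 simulation steps under a constant torque.

      import mujoco

      # Toy one-joint "arm" in MJCF; illustrative only
      MJCF = """
      <mujoco>
        <worldbody>
          <body name="link1" pos="0 0 0.1">
            <joint name="shoulder" type="hinge" axis="0 1 0"/>
            <geom type="capsule" fromto="0 0 0 0.3 0 0" size="0.02"/>
          </body>
        </worldbody>
        <actuator>
          <motor joint="shoulder" gear="1"/>
        </actuator>
      </mujoco>
      """

      model = mujoco.MjModel.from_xml_string(MJCF)
      data = mujoco.MjData(model)

      data.ctrl[:] = 0.5                     # constant torque on the shoulder motor
      for _ in range(1000):
          mujoco.mj_step(model, data)
      print("shoulder angle (rad):", float(data.qpos[0]))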