I am a graduate student at UCLA pursuing a Master of Science in Computer Science with a concentration in Artificial Intelligence. I anticipate graduating in December 2023. Previously, I completed my undergraduate degree in Computer Science at UC San Diego.
My interests lie in computer vision and multidisciplinary deep learning applications. I am currently seeking machine learning roles starting in 2024.
📍 San Jose, CA ✉️ allencheung@g.ucla.edu
In this project, we are leveraging Faster R-CNN and multimodal fusion mechanisms (MVX-Net) to develop a deep learning model that fuses 2D images and 3D point clouds to improve action explainability and navigation for autonomous driving. Prior work by Xu et al. has demonstrated that multi-task learning improves 2D object detection and provides intuitive explanations for the actions performed by an AV. We aim to extend this work by incorporating the 3D modality, in hopes of improving performance in scenarios such as bad weather or nighttime, where the 2D image modality may fail. To gather data, I designed an Amazon MTurk annotation interface to collect labels on the Waymo Open dataset for driver actions and explanations.
Try the annotation interface!
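The sketch below is a minimal PyTorch illustration of the MVX-Net-style pointwise fusion idea we are building on, not our actual model: per-point image features are sampled from a 2D CNN feature map and concatenated with the raw point coordinates. The class name `PointImageFusion`, the toy backbone, and all dimensions are placeholders, and the projection of points into the image is assumed to be handled by upstream calibration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PointImageFusion(nn.Module):
    def __init__(self, image_channels=64, fused_channels=128):
        super().__init__()
        # Toy 2D backbone standing in for the Faster R-CNN feature extractor.
        self.image_backbone = nn.Sequential(
            nn.Conv2d(3, image_channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(image_channels, image_channels, 3, padding=1), nn.ReLU(),
        )
        # Per-point MLP over concatenated [x, y, z] + sampled image features.
        self.fuse = nn.Sequential(
            nn.Linear(3 + image_channels, fused_channels), nn.ReLU(),
            nn.Linear(fused_channels, fused_channels),
        )

    def forward(self, image, points, pixel_uv):
        """
        image:    (B, 3, H, W) camera frame
        points:   (B, P, 3) LiDAR points
        pixel_uv: (B, P, 2) projection of each point into the image,
                  normalized to [-1, 1] (calibration assumed done upstream)
        """
        feats = self.image_backbone(image)                # (B, C, H, W)
        grid = pixel_uv.unsqueeze(2)                      # (B, P, 1, 2)
        sampled = F.grid_sample(feats, grid, align_corners=False)
        sampled = sampled.squeeze(-1).permute(0, 2, 1)    # (B, P, C)
        return self.fuse(torch.cat([points, sampled], dim=-1))  # (B, P, F)

# Example usage with random tensors.
model = PointImageFusion()
fused = model(torch.rand(2, 3, 128, 128), torch.rand(2, 500, 3),
              torch.rand(2, 500, 2) * 2 - 1)
print(fused.shape)  # torch.Size([2, 500, 128])
```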
Fully supervised methods [1, 2, 4] rely on unreasonably large amounts of synthetic data, while current unsupervised methods [5, 6, 7] fall well short in performance. We wanted to explore whether incorporating a small amount of 3D data into the latter group could close this gap, which would be especially useful for classes with complex topologies or multiple modes, such as pianos, cups, or chairs. We selected the paper Shape and Viewpoints without Keypoints (UCMR) [5] as the basis of our work. Our first step was to verify our hypothesis that the mean template shape greatly affects performance, and thus that a base shape better matched to the specific class mode would lead to improvements.
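The hypothesis can be stated concretely: the predicted shape is a fixed mean template plus a predicted per-vertex deformation, so a template closer to the instance's mode leaves less for the deformation network to explain. The sketch below is illustrative PyTorch, not the UCMR codebase; `TemplatePlusDeformation`, the feature dimension, and the vertex count are all placeholders.

```python
import torch
import torch.nn as nn

class TemplatePlusDeformation(nn.Module):
    def __init__(self, template_verts, feat_dim=256):
        super().__init__()
        # (V, 3) mean shape; swapping this buffer is how a different
        # class- or mode-specific template would be plugged in.
        self.register_buffer("template", template_verts)
        v = template_verts.shape[0]
        self.deform = nn.Sequential(
            nn.Linear(feat_dim, 512), nn.ReLU(),
            nn.Linear(512, v * 3),
        )

    def forward(self, image_feat):
        # Predict a per-vertex offset and add it to the fixed template.
        dv = self.deform(image_feat).view(-1, self.template.shape[0], 3)
        return self.template.unsqueeze(0) + dv   # per-instance vertices

# Placeholder template and image features, just to show the shapes involved.
generic = torch.rand(642, 3)
model = TemplatePlusDeformation(generic)
verts = model(torch.rand(4, 256))
print(verts.shape)  # torch.Size([4, 642, 3])
```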
SMART 5.0 is a deep learning model that uses a Transformer architecture for natural product identification in the drug discovery workflow. Previous iterations of SMART adopted a CNN-based architecture, taking images of HSQC spectra plots as input. Given the sequential nature of HSQC spectrum coordinates, Morgan fingerprints, and SMILES strings, a sequence-to-sequence Transformer may be better suited to the problem.
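Below is a minimal sketch of the sequence-to-sequence framing using PyTorch's built-in nn.Transformer; the peak embedding, vocabulary size, and class name `HsqcToSmiles` are hypothetical placeholders rather than the actual SMART 5.0 architecture, and positional encodings are omitted for brevity.

```python
import torch
import torch.nn as nn

class HsqcToSmiles(nn.Module):
    def __init__(self, smiles_vocab=100, d_model=256):
        super().__init__()
        self.peak_embed = nn.Linear(2, d_model)      # (1H, 13C) shift pairs
        self.tok_embed = nn.Embedding(smiles_vocab, d_model)
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=8,
            num_encoder_layers=4, num_decoder_layers=4,
            batch_first=True,
        )
        self.out = nn.Linear(d_model, smiles_vocab)

    def forward(self, peaks, smiles_tokens):
        # peaks: (B, N, 2) HSQC coordinates; smiles_tokens: (B, T) token ids.
        src = self.peak_embed(peaks)
        tgt = self.tok_embed(smiles_tokens)
        # Causal mask so each SMILES token only attends to earlier tokens.
        mask = self.transformer.generate_square_subsequent_mask(tgt.size(1))
        hidden = self.transformer(src, tgt, tgt_mask=mask)
        return self.out(hidden)                      # (B, T, vocab) logits

model = HsqcToSmiles()
logits = model(torch.rand(2, 40, 2), torch.randint(0, 100, (2, 32)))
print(logits.shape)  # torch.Size([2, 32, 100])
```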
In the domain of computer vision, point cloud data represents objects in three-dimensional space as a large set of (𝑥, 𝑦, 𝑧) geometric coordinates. This project uses a dataset of 49 images of a bird plushie on a platform, captured from various perspectives, to reconstruct a 3D point-cloud representation of the object. I approached the problem with multi-view stereo (MVS): drawing point correspondences between neighboring stereo images, leveraging triangulation to reconstruct a set of 3D points, and finally checking the resulting point cloud for outliers to remove noise. Prior to deriving point correspondences, I used the Scale-Invariant Feature Transform (SIFT) detector in the OpenCV library to identify points of interest across the 49 images.
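A condensed sketch of the correspondence and triangulation steps with OpenCV is shown below; the camera projection matrices P1 and P2, the ratio-test threshold, the depth-based outlier cutoff, and the file paths are assumptions for illustration, not the exact values used in the project.

```python
import cv2
import numpy as np

def triangulate_pair(img1_path, img2_path, P1, P2):
    img1 = cv2.imread(img1_path, cv2.IMREAD_GRAYSCALE)
    img2 = cv2.imread(img2_path, cv2.IMREAD_GRAYSCALE)

    # 1. SIFT keypoints and descriptors in each view.
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(img1, None)
    kp2, des2 = sift.detectAndCompute(img2, None)

    # 2. Match descriptors and keep confident matches (Lowe's ratio test).
    matcher = cv2.BFMatcher()
    matches = matcher.knnMatch(des1, des2, k=2)
    good = [m for m, n in matches if m.distance < 0.75 * n.distance]
    pts1 = np.float32([kp1[m.queryIdx].pt for m in good]).T  # (2, N)
    pts2 = np.float32([kp2[m.trainIdx].pt for m in good]).T

    # 3. Triangulate homogeneous 3D points and de-homogenize.
    X_h = cv2.triangulatePoints(P1, P2, pts1, pts2)           # (4, N)
    X = (X_h[:3] / X_h[3]).T                                  # (N, 3)

    # 4. Simple outlier check: drop points with implausible depth or scale.
    return X[(X[:, 2] > 0) & (np.abs(X).max(axis=1) < 100)]
```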
Shadow mapping is a common technique for rendering hard shadows in a scene with the help of depth texture buffers. For this project, I experimented with shadow mapping by employing two rendering passes: the first renders the scene from a camera placed at the light source, which is responsible for casting the light and shadows, and the second renders from the actual camera representing the viewer's perspective.
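The core of the second pass is a per-fragment depth comparison against the map written by the first pass. The snippet below illustrates that test on the CPU with NumPy rather than in the actual GLSL shader; the bias value and the depth/texture-coordinate conventions are assumptions.

```python
import numpy as np

def in_shadow(world_pos, light_view_proj, shadow_map, bias=1e-3):
    """world_pos: (3,) point; light_view_proj: (4, 4); shadow_map: (H, W) depths in [0, 1]."""
    # Transform into the light's clip space and perspective-divide.
    p = light_view_proj @ np.append(world_pos, 1.0)
    ndc = p[:3] / p[3]                 # normalized device coords in [-1, 1]
    uv = ndc[:2] * 0.5 + 0.5           # shadow-map texture coordinates
    depth = ndc[2] * 0.5 + 0.5         # fragment depth as seen from the light

    h, w = shadow_map.shape
    x = int(np.clip(uv[0] * (w - 1), 0, w - 1))
    y = int(np.clip(uv[1] * (h - 1), 0, h - 1))
    # In shadow if something closer to the light was recorded in pass one.
    return depth - bias > shadow_map[y, x]
```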
In this project, we aim to employ domain-specific pre-training to improve results on the downstream task of hate speech classification. We fine-tune a BERT language model to perform binary and multi-class classification on the Twitter [1], Reddit [3], and Gab [3] datasets as a baseline for our experiment. Then, we pre-train a BERT model on a large corpus of 636K hate speech entries from the Parler [5] dataset. Our pre-trained model, Parler5, achieved its highest MCC score of 0.592 on the Twitter dataset, marginally outperforming our baseline model by 1.2%.
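Below is a shortened sketch of the fine-tuning stage using Hugging Face transformers; the off-the-shelf bert-base-uncased checkpoint and the placeholder texts stand in for our Parler-pre-trained model and the actual datasets, and the Parler MLM pre-training step and MCC evaluation are omitted.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)   # binary hate / non-hate head

texts = ["example post one", "example post two"]   # placeholder data
labels = torch.tensor([0, 1])
batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

# One training step: the model computes cross-entropy loss internally
# when labels are passed in.
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
out = model(**batch, labels=labels)
out.loss.backward()
optimizer.step()
print(float(out.loss))
```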