聴・忍者 (chou.ninja)

Creation date
30 May 2019

Description

Before I relocated to Tokyo, I had already studied Japanese for a couple of years, both in college and for a little while after graduation. But upon arrival I quickly realized that my listening comprehension was lacking, especially when it came to numbers: a waitress telling you the bill at a restaurant (e.g., ¥2294), or friends sharing the dates of events, like July 4, or years, like 1964.

Looking for solutions, I came across a site called Gou Ninja, a sort of trainer for listening comprehension, but it had some quirks I didn’t like. For example, it could throw invalid numbers at you, like 55時 (55 o’clock); it didn’t support the days of the week, which were of particular interest to me; and the UX made it impractical to use while walking down the street.

So I hacked together a webapp that fit my needs and named it 聴・忍者, or chou ninja, a shameless derivative of its inspiration.
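
At its core, the trainer is just a generator of valid random prompts plus text-to-speech. A minimal Python sketch of the generation step (the categories, ranges, and names here are illustrative, not the app’s actual code):

```python
import random

# Hypothetical prompt generators, one per practice category. The point
# is to draw only *valid* values (hours capped at 23, real weekdays),
# avoiding impossibilities like 55時.
DAYS = ["月曜日", "火曜日", "水曜日", "木曜日", "金曜日", "土曜日", "日曜日"]

def random_prompt():
    category = random.choice(["price", "time", "year", "weekday"])
    if category == "price":
        return f"¥{random.randint(100, 9999)}"    # e.g., a restaurant bill
    if category == "time":
        return f"{random.randint(0, 23)}時"        # never 55時
    if category == "year":
        return f"{random.randint(1900, 2019)}年"
    return random.choice(DAYS)

# Each prompt would be fed to a text-to-speech engine, and the user
# asked to type the number or day they heard.
print(random_prompt())
```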

Links:

Photographic mosaicker

Creation date
28 March 2016

Description

This was an experiment in exposing image processing algorithms to the web.
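
The underlying algorithm is the classic one: divide the target image into cells and replace each cell with the library tile whose average colour is closest. A minimal NumPy sketch of that idea (illustrative only; the actual implementation is described in the linked posts):

```python
import numpy as np

def build_mosaic(target, tiles, cell=16):
    """Naive photographic mosaic.

    target: H x W x 3 uint8 image; tiles: list of cell x cell x 3 tiles.
    Each cell of `target` is replaced by the tile whose mean colour is
    nearest in Euclidean distance.
    """
    h = (target.shape[0] // cell) * cell
    w = (target.shape[1] // cell) * cell
    tile_means = np.array([t.reshape(-1, 3).mean(axis=0) for t in tiles])
    out = np.empty((h, w, 3), dtype=np.uint8)
    for y in range(0, h, cell):
        for x in range(0, w, cell):
            mean = target[y:y+cell, x:x+cell].reshape(-1, 3).mean(axis=0)
            best = np.argmin(((tile_means - mean) ** 2).sum(axis=1))
            out[y:y+cell, x:x+cell] = tiles[best]
    return out
```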

Links: [post 1] [post 2]

The case for visual-inertial SLAM over visual SLAM

Submission date
20 May 2016

Author(s)
Jia-Shen Boon

Project type
CS799 Master’s Project

Abstract

The ability to map one’s environment and locate oneself within that map has always been important in robotics, and it has applications in augmented reality as well. It is possible to realize this capability with one or more cameras alone, but the resulting system has one serious flaw: a failure mode known as “track failure”. In this report, we argue that, to achieve the aforementioned capability, it is overwhelmingly beneficial for the system engineer to have a sensor suite consisting of at least a camera and an inertial measurement unit, instead of a camera alone.
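
Track failure occurs when the visual front end cannot match enough features between frames (motion blur, occlusion, textureless surfaces), and a camera-only system then loses its pose estimate outright. With an IMU in the suite, the estimator can dead-reckon through the gap. A toy Python sketch of that bridging step, under heavily simplified assumptions (world-frame, gravity-compensated accelerations; no gyro, bias, or noise model):

```python
import numpy as np

def dead_reckon(p0, v0, accels, dt):
    """Carry the position estimate through a visual track failure by
    integrating IMU accelerations from the last good visual fix.

    p0, v0: position and velocity (3-vectors) at the last good fix.
    accels: world-frame, gravity-compensated accelerations, one per dt.
    """
    p, v = np.asarray(p0, dtype=float), np.asarray(v0, dtype=float)
    for a in accels:
        v = v + np.asarray(a, dtype=float) * dt   # integrate acceleration
        p = p + v * dt                            # integrate velocity
    return p, v

# e.g., bridging a 0.5 s blackout at 100 Hz under constant acceleration:
p, v = dead_reckon([0, 0, 0], [1, 0, 0], [[0.2, 0, 0]] * 50, 0.01)
```

Because position comes from double integration, drift grows quadratically with the length of the blackout, which is why the IMU bridges short gaps rather than replacing vision.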

Links:

Lower PAC bound on Upper Confidence Bound-based Q-learning with examples

Submission date
13 May 2016

Author(s)
Jia-Shen Boon, Xiaomin Zhang

Project type
CS761 Advanced Machine Learning course project

Abstract

Recently, there has been significant progress in understanding reinforcement learning in Markov decision processes (MDPs). We focus on improving Q-learning and analyzing its sample complexity. We investigate the performance of tabular Q-learning, Approximate Q-learning, and UCB-based Q-learning. We also derive a lower PAC bound of \( \Omega(\frac{\vert\mathcal{S}\vert^2\vert\mathcal{A}\vert}{\epsilon^2}\ln \frac{\vert\mathcal{A}\vert}{\delta}) \) for UCB-based Q-learning. Two tasks, CartPole and Pac-Man, are each solved using these three methods, and results and discussion are presented at the end. Compared with its alternatives, UCB-based learning does better in exploration but loses its advantage in exploitation.
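
For reference, UCB-based Q-learning replaces ε-greedy exploration with an optimism bonus that shrinks as a state-action pair is visited. A minimal tabular sketch (the bonus form and hyperparameters here are the textbook ones, not necessarily the exact variant analyzed above):

```python
import math
from collections import defaultdict

Q = defaultdict(float)   # Q[(state, action)] value estimates
N = defaultdict(int)     # visit counts per (state, action)
actions = [0, 1]
alpha, gamma, c = 0.1, 0.99, 1.0   # illustrative hyperparameters

def select_action(s, t):
    """Pick the action maximizing Q plus a UCB exploration bonus."""
    def ucb(a):
        if N[(s, a)] == 0:
            return float("inf")   # try every action at least once
        return Q[(s, a)] + c * math.sqrt(math.log(t) / N[(s, a)])
    return max(actions, key=ucb)

def update(s, a, r, s_next):
    """Standard Q-learning update after observing (s, a, r, s_next)."""
    N[(s, a)] += 1
    target = r + gamma * max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])
```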

Links:

CUDA-accelerated feature matching for image stitching

Submission date
21 Dec 2015

Author(s)
Jia-Shen Boon

Project type
CS759 High Performance Computing course project

Abstract

Image stitching is an algorithmic pipeline that takes as input an unordered set of images and combines a subset of them into one continuous canvas. It has applications in a wide variety of areas, including mobile phone photography, agriculture, military surveillance, and medical imaging. For large problem sizes, the compute time of the pipeline is dominated by a step known as “feature matching”, and bringing this time down is still an area of active research. In this paper, we speed up feature matching by implementing it on a CUDA-enabled GPU. We present performance results and discuss future plans.
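
Concretely, feature matching is an all-pairs nearest-neighbour search over descriptor vectors, which is why it dominates the runtime at scale and why it parallelizes so well on a GPU. A CPU-side NumPy sketch of the standard formulation with Lowe’s ratio test (illustrative; not the paper’s CUDA kernel):

```python
import numpy as np

def match_features(desc_a, desc_b, ratio=0.75):
    """Brute-force descriptor matching with Lowe's ratio test.

    desc_a: (n, d) and desc_b: (m, d) float descriptors (e.g., SIFT).
    Returns (i, j) index pairs. The O(n*m*d) all-pairs distance below
    is the step that a GPU implementation parallelizes.
    """
    # Squared Euclidean distance between every descriptor pair.
    d2 = ((desc_a[:, None, :] - desc_b[None, :, :]) ** 2).sum(axis=-1)
    nearest = np.argsort(d2, axis=1)[:, :2]   # two best candidates per query
    matches = []
    for i, (j1, j2) in enumerate(nearest):
        if d2[i, j1] < (ratio ** 2) * d2[i, j2]:   # keep unambiguous matches
            matches.append((i, j1))
    return matches
```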

Links:

Robust image retrieval using topic modeling on captioned image data

Submission date
08 Jun 2015

Author(s)
Jia-Shen Boon, Akshay Sood, Meenakshi Syamkumar

Project type
CS766 Computer Vision course project

Abstract

Image retrieval based only on image features emphasizes visual similarity and does not capture semantic similarity between images. To capture semantic similarity, textual data associated with images can be very useful. We demonstrate that some semantics of an image, while poorly captured by the image alone, can be captured by the text that accompanies it; these semantics include artistic feel and sociocultural events. We capture them by modeling the topics of the accompanying text, referred to as captions, while visual features are extracted with a deep convolutional network. A joint model of text and images is applied to the Flickr8K dataset. We also collect a custom dataset of over 32K images and 110K captions, crawled from Imgur. This model has applications to image retrieval in general, as well as to generating links to similar images within popular photo-sharing websites such as Imgur. In the latter application, such links would allow users to ‘account-hop’, increasing visitor stay duration.
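
To make the retrieval mechanics concrete, here is a minimal sketch of one joint-representation scheme: topic proportions inferred from the captions, concatenated with precomputed CNN image features, and compared by cosine similarity. It illustrates the general approach rather than the project’s exact model (the topic count and weighting below are arbitrary):

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.preprocessing import normalize

def build_index(captions, cnn_feats, n_topics=20, text_weight=0.5):
    """Joint text+image index: caption topic proportions alongside
    (precomputed) CNN features, all L2-normalized."""
    counts = CountVectorizer(stop_words="english").fit_transform(captions)
    topics = LatentDirichletAllocation(n_components=n_topics,
                                       random_state=0).fit_transform(counts)
    joint = np.hstack([text_weight * normalize(topics),
                       (1 - text_weight) * normalize(cnn_feats)])
    return normalize(joint)   # unit rows, so dot product = cosine similarity

def retrieve(index, query_row, k=5):
    """Return the k images most similar to the query image."""
    sims = index @ index[query_row]
    return np.argsort(-sims)[1:k + 1]   # skip the query itself
```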

Links:

Distributed Representation of Sentences for Speculative Language Recognition in Biomedical Articles

Submission date
18 Dec 2014

Author(s)
Jia-Shen Boon, Rasiga Gowrisankar, Akshay Sood

Project type
CS760 Machine Learning course project

Abstract

We explore the automatic identification of speculative language in biomedical articles using distributed representations of sentences and deep learning methods. Such identification has potential applications in information retrieval, multi-document summarization, and knowledge discovery. We explore two methods that learn distributed representations of sentences, the Paragraph Vector model and the Recursive Neural Tensor Network (RNTN), and compare them against three baseline algorithms: Support Vector Machines, Naive Bayes, and pattern matching. We show that the RNTN (\(F1 = 0.885\)) marginally outperforms the best baseline algorithm, a linear bigram SVM (\(F1 = 0.881\)), while the paragraph vector approach performs poorly (\(F1 = 0.368\)) even after training with a large unlabeled dataset. We discuss reasons for these differences in performance and give suggestions for future work.
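
For reference, a linear bigram SVM baseline of the kind compared against above takes only a few lines with standard tooling (the project’s exact preprocessing and hyperparameters may differ):

```python
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

# Unigram+bigram features into a linear SVM: a "linear bigram SVM".
clf = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    LinearSVC(C=1.0),
)

# Toy usage: sentences labelled speculative (1) or definite (0).
sentences = ["These results suggest a possible role for p53.",
             "The protein binds DNA."]
labels = [1, 0]
clf.fit(sentences, labels)
print(clf.predict(["This may indicate an interaction."]))
```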

Links: