Liwei Wu...

always curious about the world around me

Liwei's Homepage~

Or Liwei's home, Liwei's page, whatever you call it ;)

My name is Liwei Wu. I am studying Statistics and Computer Science at the University of California, Davis. As a fourth-year PhD candidate, I focus my research on designing and implementing novel machine learning algorithms for recommender systems that can handle massive datasets. I am jointly advised by Professor Cho-Jui Hsieh and Professor James Sharpnack. Along the way in the doctoral program, I obtained an MS in Statistics in 2016 and an MS in Computer Science in 2018. Before moving to Davis, I received a BSc (First Class Honors) in Computing Mathematics from City University of Hong Kong in 2014. I did my undergraduate thesis on Extreme Value Theory under the supervision of Professor Xiang Zhou.

Currently, I am actively looking for full-time job opportunities starting in 2019.


Research/Teaching/Working et al...

To learn more about my research and teaching experience, please refer to the Research and Teaching sections of the homepage I built from scratch.

As for my work experience, I interned at AT&T Labs on the cloud services (AT&T Integrated Clouds) team during summer 2017. During the 10-week internship (well, actually 9 weeks, since I had to travel to Canada for 1 week to present my work at KDD'17), I was able to file a patent application for a scalable, multi-threaded and distributed approach I invented for Operational Management for sequence data on Cloud Platforms, incorporating the tools and novel algorithms described below.

Now, looking back, I can safely conclude that my internship at AT&T Labs was really an applied research internship. I spent most of my time reading papers, working on hard problems, designing algorithms, writing code, and testing on real data. I love the idea that my research was driven by real-world problems faced by the company. I feel more motivated to solve complicated problems when I know that solving them can make a real impact and make other people's lives easier, a feeling that is sometimes hard to get from pure research in academia. This really motivated me to work hard and be creative.

My official title on the offer letter was actually "PhD technical intern", which is super uninformative and does sound like my job was mainly fixing technical issues with computers rather than developing novel algorithms... At first I thought of myself as a data engineer/software engineer/machine learning engineer, since during the first two weeks all I did was build an internal tool in Python that simplifies unstructured text mining tasks using a novel algorithm I proposed. The data I was dealing with was raw log data from the cloud, which was very messy and unstructured. It turned out that the tool I built was crucial to what I achieved later. The parsing tool has a nice user interface (it is very flexible and can easily cater to different demands) and supports parallel computing on multicore SMP machines. What is more, it is 10x faster than the previous regular-expression-based implementation and works with data in various forms: multi-terabyte flat files stored on disk, in ElasticSearch or in HDFS, as well as real-time data from the RabbitMQ message queue on OpenStack. Later, I even re-implemented my algorithm in Python and Scala to fit into the Spark framework for distributed cluster computing. On top of the parsing tool, I developed a real-time, multi-threaded audit tool on the cloud that alerts the operations team and is used in production. Each thread is responsible for one of the following tasks: listening to RabbitMQ continuously, querying the Vertica production database periodically, and detecting discrepancies in real time.

During the data cleaning process, I realized that unknown failure root causes on the cloud were causing a lot of headaches for the operations team. If a machine could correctly predict the root causes, the failures could be fixed promptly. After identifying the problem, I formulated it and proposed to my boss that it might be solved using machine learning algorithms. I am grateful that I was allowed to pursue this new project for the rest of my internship. I did a lot of research on the topic and, after much trial and error, eventually achieved over 96% prediction accuracy for one production zone I was considering in the OpenStack Failure Root Causes Prediction Project I proposed, using an ensemble of several algorithms, including a novel algorithm I designed and implemented in TensorFlow. I started the patent application just before I finished my internship. It usually takes about 5 years to get a patent approved, so hopefully I will eventually get it.

Programming Skills

As for my programming skills, I can write production code in Python and C++. I use Julia mainly for research in machine learning/optimization and am very proficient in it. I can build websites with knowledge of both the front end (HTML, CSS, JavaScript) and the back end (PHP, relational (SQL)/NoSQL databases). I also have a good knowledge of other languages such as R/MATLAB and of various machine learning libraries, including TensorFlow and XGBoost. I can read Java code and write simple Java programs as well as simple shell scripts. I feel I can easily pick up any language within a short time. (I wrote this sentence in June, and during my internship I picked up Scala because I needed to fit my novel parsing algorithm into the Spark framework, so that kind of verified my statement.) Anyway, all the languages mentioned are connected and similar to each other to some extent. For me, the knowledge base itself is not as important as curiosity about new things and the willingness and ability to pick up something new quickly.

Recent News

I will intern with the Facebook News Feed Core Machine Learning team in summer 2018.

I travelled to Seattle, Washington during Oct 17-21, 2017 for Amazon's Annual Graduate Research Symposium.

I travelled to Halifax, Nova Scotia, Canada to give an oral presentation at KDD 2017 during Aug 13-17, 2017.

I just finished my internship at AT&T Labs on the cloud services (AT&T Integrated Clouds) team during summer 2017, during which I filed a patent application for a scalable, multi-threaded and distributed approach I invented for Operational Management for sequence data on Cloud Platforms, incorporating the tools and novel algorithms I designed and implemented. You can refer to the Research/Teaching/Working section above for more details.

Promotional Video for my paper on Collaborative Ranking


Office: Room 1113, Mathematical Sciences Building


LinkedIn: Liwei Wu

Github: wuliwei9278

For potential future career and research collaboration opportunities, you can request a copy of my latest resume by filling out the following Google form: link

My Greatest Achievement

The one thing I am proudest of in my life is that I lost over 80 pounds and finished a couple of full marathons; the difference is easy to see in the photos below. The first photo shows me back in 2013. The second and third show me at Big Sur during Spring Break 2017.

Template design by Andreas Viklund