LI, Mingyang

An alumnus of the Data Science (DATS) program at the School of Engineering and Applied Science (SEAS), University of Pennsylvania (UPenn). Resume.

About Me


Mingyang Li received two Bachelor's degrees studying nanomaterials. Just when he thought he'd create his own self-healing superhero armor, he found himself more in love with data crunching than with magnetron sputtering¹. Following his heart, he joined the Scientific Computing (SCMP) program at UPenn and soon afterwards submatriculated into DATS.


I am a Research Associate at Wharton Research Data Services (WRDS), where I build pipelines to parse SEC filings (Form 10-K, etc.). I also conduct social science research at the World Well-Being Project (WWBP), mainly comparing Chinese and American cultures using microblog data. I am joining Google as a software engineer in late September 2019.

1 I have to admit that, as a huge fan of Iron Man, I got into PVD in the first place largely because a magnetron sputter resembles Stark's arc reactor.


Current Projects


  • Word embedding models trained on 5,000,000 Weibo posts per year for each year from 2012 to 2018. Posts are deduplicated, in Mandarin, and in Simplified Chinese. Each year has 10 fastText models (10-fold), for a total of 70 models. Request collaboration here.
  • Sectionalized Form 10-Ks in plain text. Each filing is split into separate items. We have the Item IDs, too. Request collaboration via email.

Course Notes

Shared here are my review notes and summaries for some of the courses I have taken. I hope you find them helpful.

"Learn a course as if you are to teach it." -- Mingyang Li

CIS545 Big Data Analytics - Spring 2018

This was a fun course to attend.

STAT512 Mathematical Statistics - Spring 2018

Taught by Professor Ewens, this course was a charm.

CIS550 Database - Fall 2017

"A SQL query goes into a bar, walks up to two tables and asks, 'can I join you?'" -- Anonymous

CIS502 Algorithm Analysis - Fall 2017

I got a C in this course. Don't believe anything I wrote in my notes.

CIS519 Intro To Machine Learning - Fall 2017

Notes here are in Markdown format. I recommend Typora as your Markdown editor/viewer.

MNS321 Electrical and Optical Properties of Materials - Spring 2016

Taught by Professor Gunter Scholz at the University of Waterloo -- yes, I was a nanomaterials student before diving into data science.

Miscellaneous Notes

Listed here are notes not related to any course and also not suitable as a blog post.