Feature Engineering with IBM Watson Studio by Tsun Chow
Machine Learning and Artificial Intelligence are perhaps the two most talked-about topics in technology nowadays. Despite all the recent advances in ML and AI, some of the most challenging parts of pursuing ML are still gaining insight into the data and finding the best way to represent that data for machine learning. The age-old problem that pioneering AI computer scientists battled for several decades is the same problem that modern-day ML data scientists are stumbling upon.
I went to the presentation by Dr. Tsun Chow on the topic of "Feature Engineering". If "Data Structures/Algorithms" is considered the final boss of Computer Science, "Feature Engineering" is perhaps the final boss of Data Science. Dr. Chow talked about some of the best practices for feature engineering and gave a demo using the Titanic survival data set in IBM Watson Studio. During the lecture, Dr. Chow walked us through the definition of FE, the importance of data insight (domain knowledge), techniques for handling missing data, and adding random data to artificially introduce "noise".
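To make the last two techniques a little more concrete, here is a minimal sketch in plain Python. It is not taken from Dr. Chow's slides or from Watson Studio. The sample rows, the column names (age, survived, noise) and the helper functions impute_median and add_noise_feature are all hypothetical, made up for illustration. It assumes one common approach to each idea: filling a missing numeric value with the median of the observed values (while flagging that it was filled), and adding a purely random column that can later serve as a baseline, since a real feature that a model ranks below random noise is probably not carrying much signal.

```python
import random
import statistics

# Hypothetical mini-sample shaped like Titanic rows; the values are made up.
passengers = [
    {"name": "A", "age": 22.0, "survived": 0},
    {"name": "B", "age": None, "survived": 1},
    {"name": "C", "age": 38.0, "survived": 1},
    {"name": "D", "age": None, "survived": 0},
    {"name": "E", "age": 30.0, "survived": 0},
]


def impute_median(rows, column):
    """Return copies of rows with missing `column` values filled by the median.

    A `<column>_missing` flag records which rows were filled, so the fact
    that a value was absent is not lost.
    """
    observed = [row[column] for row in rows if row[column] is not None]
    median = statistics.median(observed)
    result = []
    for row in rows:
        filled = dict(row)  # copy, so the input rows are left untouched
        filled[column + "_missing"] = row[column] is None
        if row[column] is None:
            filled[column] = median
        result.append(filled)
    return result


def add_noise_feature(rows, column="noise", seed=0):
    """Return copies of rows with an extra column of uniform random values in [0, 1)."""
    rng = random.Random(seed)  # seeded so the result is reproducible
    return [dict(row, **{column: rng.random()}) for row in rows]


cleaned = add_noise_feature(impute_median(passengers, "age"))
for row in cleaned:
    print(row)
```

The median is used here only because it is less sensitive to outliers than the mean. Which imputation strategy is appropriate for a given column is exactly the kind of decision that depends on domain knowledge.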
I go to many lectures and presentations on various technology topics, but it is not common to meet someone who really understands a topic inside out. Personally, his emphasis on data insight/domain knowledge and the use of random data was an eye-opener for me. I had always felt that technologists were blindly applying different ML algorithms to a data set, hoping to run into a better prediction by chance, without being able to explain how and why the machine made that prediction. Leaving the lecture room, I felt a renewed sense of the technologist's responsibility to "really be the master of what we do" rather than simply chasing quick results and/or profit. Thank you, Dr. Chow.
Copy of the presentation HERE
Below is a short bio of Dr. Tsun Chow.
Tsun Chow has a PhD in Computer Science and Electrical Engineering from U. C. Berkeley. Formerly an IT professional at AT&T Bell Labs, Dr. Chow has been teaching IT and Business Analytics at a number of local universities in the Chicago area.
https://www.meetup.com/DuPage-Business-Analytics-Meetup/events/265543397/
Free IBM Watson Studio account
https://dataplatform.cloud.ibm.com/
Titanic dataset used in the demo - one of the most analyzed datasets (data insight)
https://github.com/Meaad96s/datapreparation_titanic/blob/master/titanic.csv