Saturday, July 27, 2024
I talked to a data scientist who keeps winning at Kaggle!

Talk to Kaggle Grandmasters: In this series of interviews, we talk with Kaggle Grandmasters and data scientists working at H2O.ai about their journeys, inspirations, and achievements. These interviews are meant to motivate and encourage those who want to understand what it takes to become a Kaggle Grandmaster.

In this interview, I share my conversation with Philipp Singer, known as Psi in the Kaggle world. He is a Kaggle Double Grandmaster and Senior Data Scientist at H2O.ai. He holds a PhD in Computer Science with honors from Graz University of Technology, where he also completed a master’s degree in software development and business administration.

(*Translation Note 1) Graz University of Technology is one of the leading technical universities in Austria, founded in 1811. Among its former students is Nikola Tesla, who developed the alternating current electricity system.

Philipp’s achievements include scientific accolades such as the Best Paper Award at the prestigious World Wide Web Conference, as well as multiple Kaggle wins and top prizes. He is currently ranked 3rd in the world in Kaggle competitions, an achievement that is both impressive and inspiring for Kaggle participants.

One of Philipp’s most notable achievements was winning the NFL’s second annual Big Data Bowl competition together with fellow H2O.ai data scientist Dmitry Gordeev. More than 2,000 data scientists from all over the world competed on Kaggle to predict the results of run plays. Philipp and Dmitry won a prize of $50,000 for their unique approach to the challenge.

(*Translation Note 2) The NFL Big Data Bowl is a Kaggle competition hosted by the NFL, the American professional football league. It was held from October 9, 2019 to January 6, 2020, and the prize money was $75,000 (about 7.8 million yen). The task was to develop a model that predicts the yards gained on a run play (a play in which a player receives the ball from the quarterback and runs toward the goal), which is one of the offensive tactics of American football. The game data provided included the on-field position, body orientation, and movement speed of the player in possession of the ball, and so on.

(*Translation Note 3) Philipp and Dmitry divided the features in the provided game data into the following three categories.

  1. Data such as body orientation for the player in possession of the ball.
  2. Positional and other data about the opposing players who defend to stop the ball carrier’s advance.
  3. Positional and other data about the offensive players (on the same team as the ball carrier) who block those defenders.

They used CNNs to extract features from these data groups and build a predictive model (the original article shows a model schematic at this point).
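The note does not say how the CNN consumed the three groups, so the following is only an illustrative sketch, not Philipp and Dmitry's actual architecture. It assumes hypothetical `Player` records with `x`, `y`, and `role` fields, arranges relative positions into a defender-by-blocker grid, applies a 1x1 convolution (the same small linear map plus ReLU on every grid cell), and max-pools the grid into one vector per play. The weights are hand-picked toy values, not learned ones.

```python
from dataclasses import dataclass


@dataclass
class Player:
    x: float    # position along the field (hypothetical field name and units)
    y: float    # position across the field
    role: str   # "carrier", "defense", or "offense" (hypothetical labels)


def split_groups(players):
    """Split one play into the three categories listed in the note."""
    carrier = next(p for p in players if p.role == "carrier")
    defense = [p for p in players if p.role == "defense"]
    offense = [p for p in players if p.role == "offense"]
    return carrier, defense, offense


def pair_grid(carrier, defense, offense):
    """grid[d][o] holds relative features for one (defender, blocker) pair."""
    return [
        [[d.x - carrier.x, d.y - carrier.y, o.x - d.x, o.y - d.y] for o in offense]
        for d in defense
    ]


def conv1x1_relu(grid, weights, bias):
    """Apply the same linear map + ReLU to every cell (a 1x1 convolution)."""
    def cell(features):
        return [
            max(0.0, sum(w * f for w, f in zip(row, features)) + b)
            for row, b in zip(weights, bias)
        ]
    return [[cell(c) for c in row] for row in grid]


def max_pool(grid):
    """Max over blockers, then over defenders: one value per channel."""
    per_defender = [[max(channel) for channel in zip(*row)] for row in grid]
    return [max(channel) for channel in zip(*per_defender)]


# Toy play: one ball carrier, two defenders, two blockers.
players = [
    Player(0.0, 0.0, "carrier"),
    Player(3.0, 1.0, "defense"),
    Player(5.0, -2.0, "defense"),
    Player(1.0, 1.0, "offense"),
    Player(2.0, -1.0, "offense"),
]
carrier, defense, offense = split_groups(players)
grid = pair_grid(carrier, defense, offense)
weights = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, -1.0, 0.0]]  # toy values, not learned
bias = [0.0, 0.0]
play_vector = max_pool(conv1x1_relu(grid, weights, bias))
print(play_vector)
```

Because the same weights are shared across all cells and the result is pooled, the output does not depend on the order in which defenders or blockers are listed, which is one reason this kind of layout suits player-tracking data.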

In this interview, we learn more about his education, his passion for Kaggle, and his work as a data scientist. Below is an excerpt from a conversation with Philipp.

You have a PhD in computer science. Why did you choose data science as a career instead of sticking to academic research?

Philipp: I have a PhD in Computer Science from Graz University of Technology, Austria, and worked as a postdoc in Germany. During my academic career, I was exposed to many areas of data science and published many papers and articles at prestigious conferences and in journals. As the next step in such a career, I thought I should aim for a professorship. In fact, a professorship seemed very attractive. However, although I love teaching, I also wanted to do something more applied, because I wanted my work to have more impact than academic research alone can have. So I decided to take a job in data science. After thoroughly enjoying my PhD and learning a lot during that time, I am now happy to be at the forefront of data science and machine learning and excited to create real value at H2O.ai.

・・・

How did your relationship with Kaggle begin? And what kept you motivated on your journey to Grandmaster?

Philipp: I signed up for Kaggle about eight years ago, when I was close to starting my PhD. I signed up because I had heard about the platform and wanted to check it out. I did little more than a sample submission and then did not touch Kaggle for six years. About two years ago, Dmitry (then known as dott1718 on Kaggle, now a colleague at work) and I decided to try a Kaggle competition together as a side project at work. Initially, I had very low expectations for the competition, but we won it. That win totally blew my mind, and thus began my Kaggle journey. My approach on Kaggle has always been to tackle new types of problems to keep myself motivated, and there are still new and exciting problems on Kaggle to solve properly. I also enjoy meeting and working with talented people on Kaggle and seeing how the community works.

・・・

You have been doing great on Kaggle’s leaderboards lately, coming in second in the recent NFL 1st and Future – Impact Detection competition. What approach do you take to solve problems successfully?

Philipp: I’m often asked how to win a Kaggle competition, but I don’t believe there is a universal secret sauce that will give you a win. Much of our success at Kaggle is based on experience and a willingness to learn about the seemingly lesser known. Over time, I’ve assembled my own generic toolbox of pieces from each competition I’ve worked on. For example, I know how to properly set up cross-validation, which libraries to use for modeling, and so on. That’s why in recent competitions I have had more time to focus on the new and important parts. I’m also always trying to improve my workflow after a competition to become more efficient and competitive.
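Philipp does not describe his cross-validation setup here. As one hedged illustration of what setting it up "properly" can involve: when several rows come from the same game or play, a plain random split can leak information between training and validation. The sketch below keeps every group on one side of each split. The function name `group_k_fold` and the game labels are hypothetical; scikit-learn's `GroupKFold` offers comparable behaviour for real projects.

```python
from collections import defaultdict


def group_k_fold(groups, n_splits):
    """Yield (train_idx, valid_idx) so that no group spans both sides."""
    members = defaultdict(list)
    for idx, group in enumerate(groups):
        members[group].append(idx)

    # Greedy balancing: place the largest groups first, each into the
    # fold that currently holds the fewest rows.
    folds = [[] for _ in range(n_splits)]
    for group in sorted(members, key=lambda g: (-len(members[g]), str(g))):
        smallest = min(range(n_splits), key=lambda f: len(folds[f]))
        folds[smallest].extend(members[group])

    for f in range(n_splits):
        valid = sorted(folds[f])
        train = sorted(i for k in range(n_splits) if k != f for i in folds[k])
        yield train, valid


# Hypothetical example: eight rows drawn from four games.
groups = ["game1", "game1", "game2", "game2", "game2", "game3", "game4", "game4"]
splits = list(group_k_fold(groups, n_splits=2))
for train_idx, valid_idx in splits:
    print("train:", train_idx, "valid:", valid_idx)
```

Each row appears in exactly one validation fold, and the validation score then reflects performance on games the model has not seen, which is closer to how a private leaderboard behaves.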

(*Translation Note 4) NFL 1st and Future – Impact Detection is a Kaggle competition hosted by the NFL on the subject of American football. The task was to detect, from images and videos of NFL games, the impacts that occur when players collide. As training data, each play was provided as a paired set of videos, one taken from the end zone and one from the side of the field.

・・・

How do you decide which competitions to enter?

Philipp’s list of top achievements on Kaggle

Philipp: I mainly try to work on new types of problems and on competitions where I find the data and the problem to be solved interesting. Sometimes I also enter more standard competitions to check the state of the art, which changes every week.

・・・

How do you typically approach Kaggle problems? Let us know if you have any favorite machine learning materials (online courses, blogs, etc.) that you would like to share with the community.

Philipp: I try to study the specific problem at hand by drawing on the repertoire of methods, tools, and experience that I have already accumulated. That means researching previous solutions to similar problems on Kaggle and reading related papers. The best way to learn about a problem is to work on it and learn in the process.

・・・

As a data scientist at H2O.ai, what is your role and area of expertise?

Philipp joins H2O.ai as a Kaggle Grandmaster Fellow

In the list of H2O.ai member photos above, the photo in the lower left corner is Parul Pandey, the author of this article.

Philipp: At H2O.ai, my role is very multifaceted. I regularly work on customer-facing projects, where my goal is to support the project with my data science expertise. In addition, as a Kaggle Grandmaster, I try to stay on the cutting edge in experience and knowledge so that we can continuously improve our products and develop new prototypes and solutions. For example, we are proposing new features for Driverless AI and developing AI applications with Wave to demonstrate new technologies and full-pipeline data science solutions.

(*Translation Note 6) Driverless AI is a machine learning platform developed and provided by H2O.ai. It performs feature engineering, tuning, and similar tasks efficiently, so that machine learning models can be developed in minutes to hours.
Wave is a Python application development framework developed and provided by the same company. It enables rapid development of interactive AI applications.

・・・

What’s the best thing you’ve learned from Kaggle and applied to your area of expertise at H2O.ai?

Philipp: One of the key things you learn from Kaggle is how to build robust models that generalize well and resist overfitting. Practicing this know-how is very important on Kaggle, because a model has to handle private, unseen data well. That means it is important to learn a lot about robust cross-validation and to pay attention to different aspects of the data, such as shifts in feature distributions and other essential properties. I have successfully applied this knowledge to my work at H2O.ai. We want to use these learnings to help our customers run robust machine learning, supported by our expertise and domain knowledge.
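Philipp does not name a specific technique for spotting shifts in feature distributions. One simple check, shown here only as an illustrative sketch, is the two-sample Kolmogorov-Smirnov statistic: the largest gap between the empirical distributions of a feature in the training data and in the test data. The feature names and the 0.5 threshold below are hypothetical; `scipy.stats.ks_2samp` provides the same statistic along with a p-value.

```python
from bisect import bisect_right


def ks_statistic(train_values, test_values):
    """Largest gap between the two empirical CDFs (0 = identical, 1 = disjoint)."""
    a, b = sorted(train_values), sorted(test_values)
    gap = 0.0
    for v in a + b:
        cdf_a = bisect_right(a, v) / len(a)  # share of train values <= v
        cdf_b = bisect_right(b, v) / len(b)  # share of test values <= v
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def shifted_features(train, test, threshold):
    """Return the names of features whose KS statistic exceeds the threshold."""
    return [
        name for name in train
        if ks_statistic(train[name], test[name]) > threshold
    ]


# Hypothetical columns: "speed" looks similar in both sets, "season" does not.
train = {"speed": [1.0, 2.0, 3.0, 4.0], "season": [2017.0, 2017.0, 2018.0, 2018.0]}
test = {"speed": [1.5, 2.5, 3.5], "season": [2019.0, 2019.0, 2019.0]}
flagged = shifted_features(train, test, threshold=0.5)
print(flagged)
```

A flagged feature is a prompt to look closer, for example by validating on the most recent data or dropping the feature, rather than proof that a model will fail.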

・・・

The data science field is evolving rapidly. How do you keep up with the latest developments?

Philipp: I mostly use Kaggle to keep up with the latest developments. Kaggle is also a great filter for determining whether a new technology is practical and applicable to a problem or not. Robust methods usually survive, and marginal methods that work only occasionally get filtered out. At the same time, I keep up to date by following well-known researchers and practitioners on Twitter and other platforms.

Are there any areas or problems you would like to apply your machine learning expertise to?

Philipp spoke at the Vienna Data Science Group meetup on January 9, 2020

Philipp: I’m not particularly picky (about where I use my expertise). I always like to be surprised by the interesting problems I encounter at work and on Kaggle. It’s also very important to delve into seemingly uninteresting problems. By working on a variety of problems, you can take an unbiased view of each one and apply your experience from other problems to the data at hand.

・・・

Do you have any advice for people who are new to data science, Kaggle, or wanting to start their data science journey?

Philipp: Get your hands dirty, don’t be afraid to fail, and always be willing to learn new things.

・・・

Philipp’s Kaggle journey is quite remarkable. His journey, dedication, and accomplishments are sure to be a source of inspiration for those already working in this field or looking to build a career.
