Description as a Tweet:

We devise a method to measure how cancer risk depends on genes. We predict the risk and severity of cancer from a small pool of 15 features so that people at higher risk have a better chance of getting treatment.

Inspiration:

Cancer is one of the most common causes of death today, and there are more than 100 types of cancer. If it is caught early, there is a 70% better chance of survival than if it is caught in later stages. This hack aims to identify people at high risk of cancer early on so that they can get the treatment they need and, hopefully, improve their survival rate.

What it does:

Our project identifies the key features from a collection of over 300 features. Using these features as the basis, we predict the severity of a case for any person, given the values associated with these genes in their body.

How we built it:

We developed a machine learning model in Python to predict the likelihood of a severe case. We first preprocessed the data to make it more useful, using techniques such as visual exploration and imputation. We used Python's sklearn library to determine the features (genes) most relevant to the severity of cancer. We then trained a 4-layer neural network with the keras library to engineer the features and map them to a probability value representing the chance of a severe case of cancer.
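The feature-selection step can be illustrated with a minimal sketch. The write-up does not say which sklearn selector was used, so this example swaps in a simple plain-Python stand-in: it ranks features by the absolute Pearson correlation between each feature and a binary severity label, which is similar in spirit to a univariate selector such as sklearn's `SelectKBest`. The function names, gene names, and toy data are all hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def top_k_features(rows, labels, names, k):
    """Return the k feature names with the largest |correlation| to the label."""
    scores = []
    for j, name in enumerate(names):
        column = [row[j] for row in rows]
        scores.append((abs(pearson(column, labels)), name))
    # Sort by score only, highest first; ties keep their original order.
    scores.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scores[:k]]


# Hypothetical toy data: 3 features, 6 samples, label 1 = severe case.
names = ["gene_a", "gene_b", "gene_c"]
rows = [
    [0.1, 5.0, 1.0],
    [0.2, 3.0, 1.0],
    [0.3, 4.0, 1.0],
    [0.9, 5.0, 1.0],
    [1.0, 3.0, 1.0],
    [1.1, 4.0, 1.0],
]
labels = [0, 0, 0, 1, 1, 1]

selected = top_k_features(rows, labels, names, k=1)
print(selected)
```

In this toy data only `gene_a` varies with the label, so it is the one kept. On the real dataset, `k` would be 15 to match the feature pool described above.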

Technologies we used:

  • HTML/CSS
  • Python
  • Flask
  • AI/Machine Learning

Challenges we ran into:

A major challenge was the data. While many datasets are available for training and testing, a major flaw in all of them is missing data. Our dataset had over 320 features and 6300 empty values. Filtering out the fields of interest and filling in the gaps in the data was therefore a major problem. We tried a variety of methods, such as removing empty values and using k-NN imputation, before finally settling on imputing with the mean value.
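Mean imputation, the method we settled on, can be sketched in a few lines. This is a minimal plain-Python illustration rather than our actual preprocessing code: missing values are assumed to be represented as `None`, and the function name and sample data are hypothetical. In sklearn, `SimpleImputer(strategy="mean")` and `KNNImputer` cover the mean and k-NN approaches mentioned above.

```python
def impute_column_means(rows):
    """Replace None entries with the mean of the observed values in the same column."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if row[j] is not None]
        # Assumption: fall back to 0.0 when a column has no observed values.
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [
        [means[j] if value is None else value for j, value in enumerate(row)]
        for row in rows
    ]


# Hypothetical sample: 3 rows, 3 features, one gap per column.
data = [
    [1.0, None, 3.0],
    [2.0, 4.0, None],
    [None, 6.0, 9.0],
]

filled = impute_column_means(data)
print(filled)
```

Mean imputation keeps every row, which matters when dropping rows with gaps would discard a large share of the samples. The trade-off is that it shrinks the variance of the imputed columns.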

Accomplishments we're proud of:

We are really proud of the neural network model that we have created. Although it is a small neural network, it performs remarkably well, with an average accuracy of about 97%. Additionally, the model showed a 100% precision rate, meaning that everyone it flagged as high risk for cancer was correctly identified, which is a vital part of any such algorithm.
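For readers less familiar with these metrics, here is a minimal sketch of how accuracy and precision are computed from binary predictions. The labels below are made up for illustration and are not our results; the function name is hypothetical, and label 1 is assumed to mean "high risk".

```python
def accuracy_and_precision(y_true, y_pred):
    """Compute accuracy and precision for binary labels (1 = high risk)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    # Precision is undefined with no positive predictions; return 0.0 in that case.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return accuracy, precision


# Made-up labels: one high-risk case is missed, but nothing is flagged wrongly.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 0]

acc, prec = accuracy_and_precision(y_true, y_pred)
print(acc, prec)
```

Precision only counts the cases that were flagged, so it can be 100% even when some high-risk cases are missed. Recall, tp / (tp + fn), is the complementary metric for missed cases.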

What we've learned:

This project was a great learning experience. We created a custom neural network to work with the dataset. We also spent a significant amount of time understanding the Google Cloud environment and trying to deploy to it; however, we hit a major roadblock there and did not get any significant help with it. Even though we did not end up using Google Cloud, we taught ourselves quite a lot about cloud computing and its advantages.

Our major takeaways were preprocessing the data, which is an essential step in machine learning for making the data more useful and usable, and building our own neural network with high accuracy.

What's next:

As next steps, we are optimistic that, given training data for more such diseases, we will be able to expand our model to predict severity for even more diseases, enabling people to get proper treatment at the right time.
We also want to build a portal that medical organizations and agencies can use to predict the likelihood of some common and life-threatening diseases globally. We further want to work on usability by improving the frontend and incorporating a database so that information only needs to be entered once.

Built with:

We mainly used Python for this project. The tools we used include Python libraries such as sklearn, Flask, and TensorFlow Keras. We also used some HTML and CSS for the frontend.

Prizes we're going for:

  • Best Documentation
  • Best Web Hack
  • Best Machine Learning Hack
  • Best Healthcare Hack
  • Best Beginner Web Hack

Team Members

Akash Munjial
Anushree Jana

Table Number

Table TBD