IBM recently launched Project CodeNet, an open source dataset intended for training AI to better understand code. The idea is to automate more of the engineering process by applying Artificial Intelligence to it. This is not the first project to attempt this and it won’t be the last. For some reason AI has become the cure-all for every ‘ill’ in every part of life. It doesn’t matter whether it is required or not: if there is a problem, someone out there is trying to apply AI and Machine Learning to it.
This is not to say that Artificial Intelligence shouldn’t be explored and developed. It has its uses, but it doesn’t need to be applied everywhere. At one of my previous companies we interacted with a lot of companies who would pitch their products to us. At the last conference we attended, over 90% of the ideas pitched involved AI and/or Machine Learning. It got to the point where we started telling the companies that we knew what AI/ML was and asked them to just explain how they were using it in their product.
Coming back to Project CodeNet, it consists of over 14M code samples and over 500M lines of code in 55 different programming languages. The dataset is high quality and curated. It contains samples from open programming competitions, and includes not just the code but also the problem statements, sample input and output files, and details like code size, memory footprint and CPU run time. Having this curated dataset will allow developers to benchmark their software against a standard dataset and improve it over time.
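To give a rough feel for how that per-sample metadata could be used for benchmarking, here is a minimal Python sketch. It is only an illustration: the `Submission` record, its field names and the sample values are all made up for this example and are not the actual CodeNet file layout.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import median


@dataclass
class Submission:
    # Hypothetical record; field names are assumptions, not CodeNet's schema.
    problem_id: str
    language: str
    code_size_bytes: int
    memory_kb: int
    cpu_time_ms: int


def median_cpu_time_by_language(submissions):
    """Group submissions by language and return the median CPU time for each."""
    times = defaultdict(list)
    for sub in submissions:
        times[sub.language].append(sub.cpu_time_ms)
    return {lang: median(values) for lang, values in times.items()}


# Invented sample data for a single problem.
samples = [
    Submission("p0001", "Python", 180, 9000, 30),
    Submission("p0001", "Python", 220, 9100, 50),
    Submission("p0001", "C++", 400, 3200, 4),
]

result = median_cpu_time_by_language(samples)
print(result)
```

With real data you would load the records from the dataset's metadata files instead of hard-coding them, and could compare your own tool's output against the same grouping.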
Potential use cases to come from the project include code search and clone detection, automatic code correction, regression studies and prediction.
Press release: Kickstarting AI for Code: Introducing IBM’s Project CodeNet
– Suramya