This repository provides a demonstration of the deep learning package for classifying the code parsed by the fast utility. See also the Visual Studio Code Extension.
You can of course run fast on your own machine as a Docker container, but here you do not even need that: all the binary and Python dependencies are provided, including the trained models and the pre-trained embeddings.
To reproduce the results, all you need to do is allow the GitPod app to access your GitHub account so that the commands can run on a remote server of your own.
Examples of algorithms in Java and C++ are provided to test the deep learning tool for algorithm classification. Once your GitPod machine is running, it will launch the following command:
run.sh datasets/github_java_10/4/1.java
Note: TensorFlow 1.15 appears to be no longer supported by default, so you need to set up an older Python environment that is compatible with this version.
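One way to set up such an environment is sketched below. It assumes a Python 3.7 interpreter is installed as `python3.7` (the TensorFlow 1.15 wheels on PyPI for Python 3 target versions 3.5 to 3.7). The helper names `tf115_python_ok` and `setup_tf115_env` and the directory `tf115-env` are illustrative and not part of this repository.

```shell
# Succeed if the given Python version (e.g. "3.7.9") is one that
# TensorFlow 1.15 wheels were published for (assumption: 3.5, 3.6 or 3.7).
tf115_python_ok() {
    case "$1" in
        3.5|3.5.*|3.6|3.6.*|3.7|3.7.*) return 0 ;;
        *) return 1 ;;
    esac
}

# Create and populate a virtual environment (only runs when called).
# Assumes `python3.7` is on the PATH; `tf115-env` is a hypothetical directory.
setup_tf115_env() {
    version=$(python3.7 -c 'import platform; print(platform.python_version())') || return 1
    tf115_python_ok "$version" || { echo "unsupported Python $version" >&2; return 1; }
    python3.7 -m venv tf115-env &&
    . tf115-env/bin/activate &&
    pip install "tensorflow==1.15"
}
```

Calling `setup_tf115_env` once in the terminal before `run.sh` should be enough in this setup. Other ways of pinning an older interpreter would work just as well.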
You will see the predicted probability distribution over the class labels: a correctly classified label is shown in blue, and a misclassified label is shown in red.
To understand why, click on the HTML file "datasets/github_java_10/4/1.html" and use the Preview button in the upper-right corner of the tab to see the visualisation in a split pane. The colours on the tokens indicate which parts of the code received the most attention from the classification algorithm.
To run other examples, type:
run.sh datasets/github_java_10/4/3.java
run.sh datasets/github_cs_10/4/1.cs
run.sh datasets/github_cpp_10/4/1.cpp
These examples show that even though the model was trained on Java programs, it usually also works well when applied to other programming languages such as C# or C++. We call this feature "Cross-Language Algorithm Classification" [Bui et al., SANER'19].
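The cross-language runs above can be batched in a small loop. The sketch below assumes that `run.sh` is on the PATH, as in the commands above, and that it accepts one source file per invocation. The function name `classify_all` and its first argument (the command to invoke) are illustrative.

```shell
# Run a classifier command once per source file.
# $1 is the command (e.g. run.sh); the remaining arguments are source files.
classify_all() {
    runner=$1
    shift
    for src in "$@"; do
        echo "== $src"
        "$runner" "$src" || echo "failed: $src" >&2
    done
}

# Usage, with the example files from this repository:
# classify_all run.sh datasets/github_java_10/4/3.java \
#     datasets/github_cs_10/4/1.cs datasets/github_cpp_10/4/1.cpp
```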
cd datasets
# print the command line options and arguments
fast
# convert C++ code into a protobuf representation
fast tensorflow-1.0.1/tensorflow/cc/saved_model/loader_test.cc tensorflow-1.0.1/tensorflow/cc/saved_model/loader_test.cc.pb
# convert Java code into a flatbuffers representation
fast RxJava-1.2.9/src/test/java/rx/ErrorHandlingTests.java RxJava-1.2.9/src/test/java/rx/ErrorHandlingTests.java.fbs
# convert a flatbuffers representation back to C#
fast corefx-1.0.4/src/System.IO.IsolatedStorage/ref/System.IO.IsolatedStorage.cs.fbs corefx-1.0.4/src/System.IO.IsolatedStorage/ref/System.IO.IsolatedStorage.cs
# slice a program
fast -S -G RxJava-1.2.9/src/test/java/rx/ErrorHandlingTests.java RxJava-1.2.9/src/test/java/rx/ErrorHandlingTests-ggnn.fbs
# diff two programs
fast -D github_java_10/4/1.java github_java_10/4/3.java
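The conversion commands above follow one pattern: the output file is the input path with `.pb` or `.fbs` appended. A minimal sketch of a batch helper under that assumption follows. The function name `fast_convert_all` is hypothetical, and it relies only on the two-argument `fast INPUT OUTPUT` form shown above.

```shell
# Convert each source file with fast, appending the chosen extension
# ("pb" for protobuf, "fbs" for flatbuffers) to the input path.
fast_convert_all() {
    ext=$1
    shift
    case "$ext" in
        pb|fbs) ;;
        *) echo "unknown format: $ext" >&2; return 2 ;;
    esac
    for src in "$@"; do
        fast "$src" "$src.$ext" || return 1
    done
}

# Usage: fast_convert_all fbs github_java_10/4/1.java github_java_10/4/3.java
```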
cd usr/bin
java -cp /workspace/demo/usr/config:/workspace/demo/usr/config/lic:/workspace/demo/usr/lib/ConCodeSe-1.0.0.jar com.concodese.ConCodeSeJettyServerStarter SERVER_PORT=8081
alias fast="docker run -v $PWD:/e yijun/fast"
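In bash, aliases are not expanded in non-interactive scripts by default, so a shell function can serve the same purpose there. This is a sketch rather than part of the repository. It assumes the `yijun/fast` image named in the alias above and that paths are resolved relative to the mounted `/e` directory. The `--rm` flag (remove the container on exit) is an addition that is not in the original alias.

```shell
# Run fast inside Docker, mounting the current directory at /e.
# "$PWD" is evaluated at call time, so the mount follows `cd`.
fast() {
    docker run --rm -v "$PWD":/e yijun/fast "$@"
}

# Usage: fast -D github_java_10/4/1.java github_java_10/4/3.java
```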
Yijun Yu. "fAST: Flattening Abstract Syntax Trees for Efficiency". In: 41st ACM/IEEE International Conference on Software Engineering, 25-31 May 2019, Montreal, Canada, ACM and IEEE. demo, paper, poster
Nghi D. Q. Bui, Yijun Yu, Lingxiao Jiang. "Learning Cross-Language API Mappings with Little Knowledge", In the 27th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE), Tallinn, Estonia, 26-30 August 2019.
Nghi D. Q. Bui, Yijun Yu, Lingxiao Jiang. "Bilateral Dependency Neural Networks for Cross-Language Algorithm Classification", In the 26th edition of the IEEE International Conference on Software Analysis, Evolution and Reengineering, Research Track, Hangzhou, China, February 24-27, 2019. GGNN, DTBCNN
Nghi D. Q. Bui, Lingxiao Jiang, and Yijun Yu. "Cross-Language Learning for Program Classification Using Bilateral Tree-Based Convolutional Neural Networks", In the proceedings of the 32nd AAAI Conference on Artificial Intelligence (AAAI) Workshop on NLP for Software Engineering, New Orleans, Louisiana, USA, 2018. Bi-TBCNN
Miltiadis Allamanis, Marc Brockschmidt, Mahmoud Khademi. "Learning to Represent Programs with Graphs", In: 6th International Conference on Learning Representations (ICLR), 2018. GGNN
Y. Li, D. Tarlow, M. Brockschmidt, R. Zemel. "Gated graph sequence neural networks", In: 4th International Conference on Learning Representations (ICLR), 2016.
Lili Mou, Ge Li, Lu Zhang, Tao Wang, Zhi Jin: "Convolutional Neural Networks over Tree Structures for Programming Language Processing". In: AAAI 2016: 1287-1293. TBCNN, datasets/pku_cpp_104/
M. L. Collard and J. I. Maletic, "srcML 1.0: Explore, Analyze, and Manipulate Source Code," 2016 IEEE International Conference on Software Maintenance and Evolution (ICSME), Raleigh, NC, 2016, pp. 649-649. srcML
Parr, T. J. and Quong, R. W. 1995. "ANTLR: a predicated-LL(k) parser generator". Softw. Pract. Exper. 25, 7 (Jul. 1995), 789-810. ANTLR
Hakam W. Alomari, Michael L. Collard, Jonathan I. Maletic, Nouh Alhindawi and Omar Meqdadi. "srcSlice: very efficient and scalable forward static slicing". Software: Evolution and Process, 26(11):931-961, November 2014.
Jean-Rémy Falleri, Floréal Morandat, Xavier Blanc, Matias Martinez, and Martin Monperrus. 2014. "Fine-grained and accurate source code differencing". In Proceedings of the 29th ACM/IEEE international conference on Automated software engineering (ASE '14). ACM, New York, NY, USA, 313-324. GumTreeDiff
Yijun Yu, Thein Thun Tun, and Bashar Nuseibeh, "Specifying and detecting meaningful changes in programs," In: Proc. of the 26th IEEE/ACM Conference on Automated Software Engineering, pp. 273-282, 2011. MCT
Tezcan Dilshener, Michel Wermelinger, Yijun Yu: "Locating bugs without looking back". Automated Software Engineering 25(3): 383-434 (2018). ConCodeSe