Improving Code Quality based on AI
Date
2024
Abstract
This thesis develops an LSTM model for correcting erroneous Python
code snippets. The methodology comprises several key steps. First, a carefully curated dataset
containing 450 incorrect Python code snippets with corresponding corrections was constructed,
covering a diverse range of code patterns and functionalities. The dataset, structured as a CSV
file, encapsulates various aspects of Python programming, including basic function
definitions, mathematical calculations, and data processing tasks such as string manipulation
(Dataset Construction). Next, the dataset underwent meticulous data processing, involving
tokenization, padding, and splitting into training and testing sets to prepare it for model
training (Data Processing). The model architecture, following an encoder-decoder framework
with Long Short-Term Memory (LSTM) layers and a Dense layer for output, was designed to
facilitate effective sequence generation (Model Architecture). Subsequently, the model
was trained with an iterative optimization algorithm such as stochastic gradient
descent (SGD), with parameters updated from the gradients of the loss function computed
by backpropagation (Training). Evaluation of the trained model's performance on a
separate test dataset revealed robust performance metrics, including high accuracy scores and
low test loss values, indicating the model's effectiveness in distinguishing between correct and
incorrect code snippets (Evaluation). The results underscore the potential of the developed
classification and correction models to improve various aspects of software development,
including code validation processes and educational platforms, with implications for
automated code analysis and correction in real-world applications (Conclusion).
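
The data processing step summarized in the abstract (tokenization, padding, and a train/test split) can be sketched roughly as follows. This is a minimal illustration in plain Python, not the thesis's actual pipeline: the whitespace tokenizer, the `<pad>` and `<unk>` tokens, the 80/20 split ratio, the toy code pairs, and all function names are assumptions made for the example.

```python
import random

PAD, UNK = "<pad>", "<unk>"  # hypothetical special tokens, not taken from the thesis


def tokenize(code):
    """Split a snippet on whitespace (a deliberately simple stand-in tokenizer)."""
    return code.split()


def build_vocab(snippets):
    """Map each token to an integer id; ids 0 and 1 are reserved for PAD and UNK."""
    vocab = {PAD: 0, UNK: 1}
    for snippet in snippets:
        for tok in tokenize(snippet):
            vocab.setdefault(tok, len(vocab))
    return vocab


def encode(snippet, vocab, max_len):
    """Convert a snippet to token ids, truncated or right-padded to max_len."""
    ids = [vocab.get(tok, vocab[UNK]) for tok in tokenize(snippet)][:max_len]
    return ids + [vocab[PAD]] * (max_len - len(ids))


def train_test_split(pairs, test_fraction=0.2, seed=0):
    """Shuffle the pairs reproducibly and hold out a fraction for testing."""
    pairs = list(pairs)
    random.Random(seed).shuffle(pairs)
    n_test = max(1, int(len(pairs) * test_fraction))
    return pairs[n_test:], pairs[:n_test]


# Toy (incorrect, corrected) pairs standing in for rows of the CSV dataset.
pairs = [
    ("def add(a, b) return a + b", "def add(a, b): return a + b"),
    ("print 'hi'", "print('hi')"),
    ("for i in range(3) print(i)", "for i in range(3): print(i)"),
    ("x = [1, 2, 3", "x = [1, 2, 3]"),
    ("if x = 1: pass", "if x == 1: pass"),
]

vocab = build_vocab([s for pair in pairs for s in pair])
max_len = max(len(tokenize(s)) for pair in pairs for s in pair)
encoded = [(encode(src, vocab, max_len), encode(tgt, vocab, max_len))
           for src, tgt in pairs]
train, test = train_test_split(encoded)
```

The fixed-length integer sequences produced here are the kind of input an encoder-decoder LSTM typically consumes; a real pipeline would likely use a code-aware tokenizer rather than whitespace splitting.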