Prediction and detection of personal information from written text using AI techniques
Date
2024
Abstract
Machine learning is a key branch of artificial intelligence that enables computer systems to learn
from data rather than being explicitly programmed for each task. In many fields, predicting and
detecting personal information in texts has become crucial for information security and privacy
protection. Because machine learning methods operate on numerical input, textual data must first be
converted into numerical form, a problem addressed by text vectorization techniques. Bag-of-words
representations encode how often each word occurs in a text, while word embeddings map words to dense
vectors that capture semantic relationships between words; such representations can improve the
accuracy of prediction and detection models. This combination of machine learning and text
vectorization offers promising prospects for privacy protection and for compliance with data
protection regulations.
In this thesis, we focus on predicting and detecting personal information about a text's author, such
as age and gender, from written texts and other data collected from online blogs. To achieve these
objectives, we adopt machine learning methods, in particular multilayer neural networks for
classification, together with the TF-IDF text vectorization technique for keyword extraction.
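The TF-IDF weighting used for keyword extraction can be sketched in a few lines of plain Python. This is a minimal illustration, assuming the basic weighting tf(t, d) = count(t, d) / |d| and idf(t) = ln(N / df(t)), where N is the number of documents and df(t) is the number of documents containing t. The abstract does not state which TF-IDF variant the thesis uses, and libraries commonly apply smoothed idf formulas instead. The tokenizer, the function names (`tf_idf`, `top_keywords`), and the example corpus are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and split into letter-only tokens (hypothetical tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def tf_idf(docs: list[str]) -> list[dict[str, float]]:
    """Return one {term: weight} dict per document.

    Assumed weighting: tf = count / document length, idf = ln(N / df).
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term at least once.
    df = Counter(term for toks in tokenized for term in set(toks))
    weights = []
    for toks in tokenized:
        counts = Counter(toks)
        length = len(toks)
        weights.append({
            term: (count / length) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


def top_keywords(docs: list[str], k: int = 3) -> list[list[str]]:
    """Select the k highest-weighted terms of each document as its keywords.

    Ties are broken alphabetically so the output is deterministic.
    """
    result = []
    for w in tf_idf(docs):
        ranked = sorted(w.items(), key=lambda item: (-item[1], item[0]))
        result.append([term for term, _ in ranked[:k]])
    return result


if __name__ == "__main__":
    # Hypothetical toy corpus standing in for blog posts.
    posts = [
        "school exam school homework",
        "work mortgage work kids",
        "school work",
    ]
    print(top_keywords(posts, k=2))
    # -> [['exam', 'homework'], ['kids', 'mortgage'], ['school', 'work']]
```

A term that occurs in every document gets idf = ln(1) = 0 and so never ranks as a keyword, while terms concentrated in few documents rank highest. In a classification pipeline, per-document weights like these, laid out over a fixed vocabulary, could serve as input features for a multilayer neural network; how the thesis connects the two steps is not described in this abstract.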