# Classification of Textual Data

Text classification with LSTM and BERT on the Web of Science dataset, implemented in Python.
## Repository Contents

- `.github`
- `logs`
- `.gitignore`
- `attention_plot_correct_0.png`
- `attention_plot_correct_1.png`
- `attention_plot_incorrect_2.png`
- `attention_plot_incorrect_3.png`
- `attention_plot_incorrect_4.png`
- `BERT.png`
- `BERT.py`
- `BERT_comparaison.png`
- `dataset-acquisition.png`
- `dataset-acquisition.py`
- `Experiment-1-LSTM-Standardized-Split.py`
- `Experiment-2-BERT-Standardized-Split.py`
- `Experiment-3-LSTM-Single.py`
- `Experiment-4-BERT-Single.py`
- `Experiment-5-BERT-Attention-Matrix.py`
- `LICENSE.md`
- `LSTM.png`
- `LSTM.py`
- `LSTM_accuracies.png`
- `LSTM_for_comparaison.png`
- `README.md`
- `X.txt`
- `Y.txt`
- `YL1.txt`
- `YL2.txt`
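The four `.txt` files match the common flat-file release of the Web of Science dataset, where `X.txt` holds one abstract per line and `YL1.txt`/`YL2.txt` hold the parent-field and sub-field labels (with a combined label in `Y.txt`). That layout is an assumption about this repository's data, not something the scripts here are guaranteed to follow; a minimal loading sketch under that assumption:

```python
# Sketch for loading the flat-file Web of Science data. Assumes one
# abstract per line in X.txt and one integer label per line in
# YL1.txt (parent field) and YL2.txt (sub-field); the experiment
# scripts in this repository may parse these files differently.

def load_lines(path):
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f]

texts = load_lines("X.txt")
parent_labels = [int(y) for y in load_lines("YL1.txt")]
child_labels = [int(y) for y in load_lines("YL2.txt")]

assert len(texts) == len(parent_labels) == len(child_labels)
print(f"{len(texts)} abstracts, {len(set(parent_labels))} fields, "
      f"{len(set(child_labels))} sub-fields")
```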
## Overview

In this repository, we explore two models, an LSTM and BERT, on the Web of Science dataset. The project involves building a custom LSTM model from scratch and fine-tuning a pre-trained BERT model for text classification, then comparing how well the two classify scientific paper abstracts into their corresponding fields and sub-fields. The main objectives were to pre-process the raw text data, implement the LSTM and BERT pipelines, run the experiments, and analyze the results in terms of accuracy and overall model performance. The final report includes a detailed comparison of the two models, insights into the impact of pre-training, and a discussion of performance based on our findings.
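As a concrete illustration, a BERT fine-tuning step typically looks like the sketch below. This is not taken from the experiment scripts; it is a minimal sketch assuming the Hugging Face `transformers` library, the `bert-base-uncased` checkpoint, a 7-way parent-field task, and placeholder hyperparameters (learning rate, sequence length), shown for a single training step:

```python
# Hypothetical sketch of a BERT fine-tuning step; the actual experiment
# scripts may use a different stack, checkpoint, or hyperparameters.
import torch
from transformers import BertTokenizerFast, BertForSequenceClassification

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=7)  # assumed: 7 parent fields

# Tokenize a toy batch of abstracts into padded tensors.
batch = tokenizer(["example abstract ..."], truncation=True,
                  padding=True, max_length=256, return_tensors="pt")
labels = torch.tensor([0])

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
optimizer.zero_grad()
outputs = model(**batch, labels=labels)  # loss computed internally
outputs.loss.backward()
optimizer.step()
```

The LSTM baseline, by contrast, replaces the pre-trained encoder with an embedding layer and a recurrent stack trained from scratch, which is what makes the pre-training comparison meaningful.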
## Authors
- Batuhan Berk Başoğlu, 260768350 - batuhan-basoglu
- Jared Tritt, 260763506 - Jaredtritt
- Alys Pisani-Houze, 261093153