A beginner's guide for business students
Data analysis means examining large amounts of information in order to discover patterns, trends, and useful insights.
In business and finance, organizations collect huge amounts of data every day. Data analysis helps decision makers understand this information and make better strategic choices.
In this project we analyse more than 100,000 financial news articles. Because it would be impossible for a human to read them all manually, we use computer programs to automatically analyse the text and identify the main topics discussed in financial media.
Natural Language Processing (NLP) is a field of Artificial Intelligence that enables computers to process and interpret human language.
This allows financial analysts to analyse large volumes of news automatically.
Python is a programming language commonly used for data analysis and machine learning. A one-line Python program looks like this:
print("Hello World")
In this assignment Python is used to analyse large datasets of financial news articles.
Anaconda is a software platform used to run Python for data science.
It comes with many tools already installed, such as Jupyter Notebook and the pandas and NumPy libraries.
Kaggle is an online platform where students and researchers share datasets and run data science projects.
The assignment uses a dataset of more than 100,000 financial news articles.
These articles are stored in JSON files.
{
  "title": "Stock market rises",
  "date": "2022-05-10",
  "text": "Investors reacted positively..."
}
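A record like the one above can be read into Python with the built-in json module. The sketch below is only an illustration: the `load_articles` helper is a hypothetical name, and it assumes each JSON file holds exactly one article, which the actual dataset may not do.

```python
import json
from pathlib import Path

# One article in the format shown above, held in a string for the demo.
raw = '{"title": "Stock market rises", "date": "2022-05-10", "text": "Investors reacted positively..."}'

# json.loads turns JSON text into a Python dictionary.
record = json.loads(raw)
print(record["title"])
print(record["date"])


def load_articles(folder):
    """Hypothetical helper: read every .json file in a folder.

    Assumes one article per file; adjust if a file stores a list of articles.
    """
    articles = []
    for path in sorted(Path(folder).glob("*.json")):
        with open(path, encoding="utf-8") as f:
            articles.append(json.load(f))
    return articles
```

Once loaded, each article is an ordinary dictionary, so `article["text"]` gives the text that is passed on to the cleaning step.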
Before using machine learning the text must be cleaned.
This is done using spaCy.
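As a rough illustration of what cleaning can look like, the sketch below uses a blank English spaCy pipeline to lowercase the text and drop stop words, punctuation, and numbers. The `clean_text` function is a hypothetical name, and the choice of filters is an assumption rather than the assignment's exact recipe. Lemmatization is left out because it normally needs a trained pipeline such as en_core_web_sm to be downloaded first.

```python
import spacy

# A blank English pipeline provides the tokenizer and the stop-word list
# without downloading a trained model.
nlp = spacy.blank("en")


def clean_text(text):
    """Return lowercase word tokens, skipping stop words, punctuation and numbers."""
    doc = nlp(text)
    return [
        token.lower_
        for token in doc
        if token.is_alpha and not token.is_stop
    ]


tokens = clean_text("Investors reacted positively to the news.")
print(tokens)
```

Very common words such as "to" and "the" are removed because they appear in almost every article and say little about its topic.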
Typical steps include splitting the text into tokens, removing stop words and punctuation, and lemmatization.
LDA (Latent Dirichlet Allocation) is a machine learning algorithm that finds hidden topics inside documents.
Each article can contain multiple topics with different probabilities.
Different topic models must be compared.
Two common evaluation metrics are perplexity and topic coherence.
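Perplexity is one widely used metric for comparing LDA models: a lower value means the model finds the word counts less surprising. The sketch below compares models with 2, 3, and 4 topics. It assumes scikit-learn, uses a made-up mini corpus, and scores the models on the same data they were trained on for brevity, whereas a real comparison would normally use held-out articles.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Made-up mini corpus, for illustration only.
docs = [
    "stock market rises as investors buy shares",
    "central bank raises interest rates to fight inflation",
    "investors sell shares as stock market falls",
    "inflation slows after central bank rate decision",
    "oil prices climb as supply concerns grow",
    "energy companies report higher profits on oil prices",
]

counts = CountVectorizer(stop_words="english").fit_transform(docs)

# Fit one model per candidate number of topics and record its perplexity.
perplexities = {}
for k in (2, 3, 4):
    model = LatentDirichletAllocation(n_components=k, random_state=0)
    model.fit(counts)
    perplexities[k] = model.perplexity(counts)

for k, value in perplexities.items():
    print(f"{k} topics: perplexity = {value:.1f}")

# The candidate with the lowest perplexity on this data.
best_k = min(perplexities, key=perplexities.get)
print("Lowest perplexity with", best_k, "topics")
```

A low perplexity does not guarantee topics that make sense to a human reader, so it is usually worth also reading the top words of each topic before choosing a model.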