SEMINARS AND WORKSHOPS

Data Viz: NLP Sentiment Analytics

On June 7, 2024, the Alumni Association and the Department of Computer Science and Engineering (Data Science) at New Horizon College of Engineering organized a workshop titled “Data Viz: NLP Sentiment Analytics.” The workshop was held from 09:30 AM to 11:00 AM in Data Science Computer Lab-1 and was conducted by Mr. Krishnav Dave, Founder & CEO of PreProd Corp. This report summarizes the activities and learnings from the workshop: procuring data, storing it in a local environment, web scraping with BeautifulSoup (bs4), and the concepts of monolithic and microservices architectures.

The workshop aimed to give participants practical knowledge of how to perform sentiment analysis using natural language processing (NLP) techniques and how to visualize the results effectively. The specific focus was on working with data related to U.S. presidents.

Key Topics Covered

1. Procuring Data
Data Sources: Identification of reliable sources for gathering data on U.S. presidents, such as official websites, news articles, and historical databases.
Data Collection: Methods for collecting textual data such as speeches and public opinion.

2. Storing Data in Local Environment
Environment Setup: Steps to set up the local environment for data storage and analysis, including installing the necessary Python libraries and tools.
Data Storage: Techniques for storing the collected data in a structured format, such as CSV or JSON files, to facilitate easy retrieval and processing.
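
The storage step can be illustrated with a minimal sketch using only Python's standard library. The sample records, field names, and helper function names below are hypothetical and are not taken from the workshop material; they only show one possible way to write collected text to CSV and JSON files and read it back.

```python
import csv
import json

# Hypothetical sample records; the workshop's actual dataset is not given in this report.
records = [
    {"president": "George Washington", "source": "speech", "text": "Sample text, with a comma."},
    {"president": "Abraham Lincoln", "source": "article", "text": "Another sample text."},
]


def save_csv(rows, path):
    """Write a list of dictionaries to a CSV file with a header row."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)


def load_csv(path):
    """Read a CSV file back into a list of dictionaries (all values are strings)."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def save_json(rows, path):
    """Write a list of dictionaries to a JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(rows, f, indent=2, ensure_ascii=False)


def load_json(path):
    """Read a JSON file back into Python objects."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

CSV suits flat, tabular records that will be opened in a spreadsheet or loaded into a data frame, while JSON preserves nesting and non-string types, so the choice depends on how the data will be processed later.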

3. Web Scraping Using BeautifulSoup (bs4)

Introduction to BeautifulSoup: Overview of BeautifulSoup, a Python library used for web scraping purposes.

Web Scraping Process: Detailed steps on how to scrape data from websites. The process involves:

Sending HTTP Requests: Using libraries such as requests to fetch the content of web pages.
Parsing HTML Content: Utilizing BeautifulSoup to parse the HTML content of the web pages.
Extracting Data: Techniques for extracting specific pieces of information (e.g., text, links, and images) from the parsed HTML content.
Storing Scraped Data: Methods for storing the scraped data in a structured format for further analysis.
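
The four steps above can be sketched as follows, assuming the third-party packages requests and beautifulsoup4 are installed. The function names, the choice of tags to extract, and the sample HTML are illustrative assumptions, not the code used in the workshop.

```python
import json

import requests
from bs4 import BeautifulSoup


def fetch_html(url, timeout=10):
    """Step 1: send an HTTP GET request and return the page body as text."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # raise an exception on 4xx/5xx responses
    return response.text


def extract_items(html):
    """Steps 2 and 3: parse the HTML and pull out paragraph text, links, and image sources."""
    soup = BeautifulSoup(html, "html.parser")
    return {
        "paragraphs": [p.get_text(" ", strip=True) for p in soup.find_all("p")],
        "links": [a["href"] for a in soup.find_all("a", href=True)],
        "images": [img["src"] for img in soup.find_all("img", src=True)],
    }


def save_items(items, path):
    """Step 4: store the extracted items as a JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(items, f, indent=2, ensure_ascii=False)


# Hypothetical inline page, so the parsing step can be tried without network access.
SAMPLE_HTML = """
<html><body>
  <p>First paragraph.</p>
  <p>Second paragraph.</p>
  <a href="https://example.com/page">A link</a>
  <img src="portrait.png" alt="portrait">
</body></html>
"""

items = extract_items(SAMPLE_HTML)
```

For a live page, the same parsing function would be applied to the text returned by fetch_html for a chosen URL. Before scraping a real site, it is worth checking its terms of use and robots.txt.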

4. Monolithic and Microservices Architectures

Monolithic Architecture: Understanding the traditional approach where a single, unified application is responsible for multiple tasks. Monolithic systems are simpler to develop initially but can become difficult to maintain and scale as they grow.
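
As a rough illustration of the monolithic style, the sketch below keeps collection, storage, and analysis inside a single class running in one process. The class name, its methods, and the tiny word lists are hypothetical, and the word-count scoring is only a toy stand-in for a real NLP sentiment model.

```python
class SentimentReportApp:
    """A toy monolith: one application object handles every task itself."""

    # Hypothetical word lists used only for this illustration.
    POSITIVE = {"great", "good", "strong"}
    NEGATIVE = {"bad", "weak", "poor"}

    def __init__(self):
        self._store = []  # in-memory storage lives inside the same application

    def collect(self, texts):
        """Data collection: accept new pieces of text."""
        self._store.extend(texts)

    def score(self, text):
        """Analysis: count positive words minus negative words."""
        words = [w.strip(".,!?").lower() for w in text.split()]
        positives = sum(w in self.POSITIVE for w in words)
        negatives = sum(w in self.NEGATIVE for w in words)
        return positives - negatives

    def report(self):
        """Reporting: pair every stored text with its score."""
        return [(text, self.score(text)) for text in self._store]


app = SentimentReportApp()
app.collect(["A great and strong speech.", "A weak, poor answer."])
results = app.report()
```

Because every responsibility lives in one class and one process, a change to any single task means rebuilding and redeploying the whole application, which is the maintenance and scaling difficulty described above.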