Buffer-6.0

Women-Harassment-Detection

Protect. Detect. Empower.

Overview

Tech Features That Safeguard Women, One Message at a Time. A Streamlit-based web app that uses an ML model to detect online harassment in user-submitted text. If harassment is detected, it generates a Huffman-encoded report containing the message, sender details, and location metadata to help authorities take action.

Data Structures Used

1. pandas.DataFrame

📍 Used in:

Reading and displaying CSV data
Passing user message input to the model

🔎 Why:

Structured format suitable for ML
Compatible with scikit-learn pipelines

2. list

📍 Used in:

Collecting values (labels, sizes) for plotting
Creating the heap in Huffman encoding

🔎 Why:

Dynamic and easy to manipulate
Supports priority queues via heapq
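As a minimal sketch of the plotting use case, the snippet below collects labels and sizes into two parallel lists. The `counts` dictionary is hypothetical example data (it reuses the class supports from the Model Accuracy section), and the variable names are assumptions rather than the project's own.

```python
# Hypothetical class counts, used only for illustration.
counts = {"harassing": 3731, "not_harassing": 1226}

labels = []  # category names for the chart
sizes = []   # matching values, in the same order as labels

for label, count in counts.items():
    labels.append(label)
    sizes.append(count)

# Share of each class as a percentage, rounded to one decimal place.
total = sum(sizes)
shares = [round(100 * size / total, 1) for size in sizes]

print(labels)
print(sizes)
print(shares)
```

Two parallel lists like these are the shape that a plotting call such as matplotlib's `plt.pie(sizes, labels=labels)` expects.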

3. dict

📍 Used in:

Storing frequency counts
Huffman encoding map
Reverse lookup during decoding
IP and location API responses

🔎 Why:

O(1) average time complexity for lookups
Flexible for structured data mapping
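The encoding map and the reverse lookup can be sketched with two plain dictionaries. The code map below is the one from the "aabbbc" example in the Huffman Tree section; the function names `encode` and `decode` are assumptions for illustration, not necessarily the names used in the project.

```python
# Code map taken from the "aabbbc" example; function names are hypothetical.
code_map = {"c": "00", "a": "01", "b": "1"}


def encode(text, code_map):
    """Replace every character with its bit string."""
    return "".join(code_map[char] for char in text)


def decode(bits, code_map):
    """Walk the bit string and emit a character whenever the buffer matches a code."""
    reverse_map = {code: char for char, code in code_map.items()}
    decoded = []
    buffer = ""
    for bit in bits:
        buffer += bit
        if buffer in reverse_map:  # O(1) average-time lookup
            decoded.append(reverse_map[buffer])
            buffer = ""
    return "".join(decoded)


encoded = encode("aabbbc", code_map)
print(encoded)
print(decode(encoded, code_map))
```

Greedy matching works here because Huffman codes are prefix-free: no code is the prefix of another, so the first match is always the right one.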

4. defaultdict

📍 Used in:

Counting character frequency in create_frequency_dict

🔎 Why:

Prevents key errors while incrementing
Cleaner and safer syntax than regular dict
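A minimal sketch of how `create_frequency_dict` could use `defaultdict` is shown below. Only the function name comes from this README; the signature and body are assumptions.

```python
from collections import defaultdict


def create_frequency_dict(text):
    """Count how often each character occurs (assumed signature)."""
    frequency = defaultdict(int)  # missing keys start at 0
    for char in text:
        frequency[char] += 1      # no KeyError on first occurrence
    return frequency


frequency = create_frequency_dict("aabbbc")
print(dict(frequency))
```

With a regular dict, the increment would need an explicit `if char in frequency` check or a `frequency.get(char, 0)` call.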

5. heapq (Min Heap using list)

📍 Used in:

build_huffman_tree() for character priority

🔎 Why:

Huffman coding needs the two least frequent elements repeatedly
Efficient for implementing priority queue
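To illustrate why a min-heap fits, the sketch below tracks only the weights that Huffman coding merges at each step. `merge_steps` is a hypothetical helper written for this example; it is not part of the project.

```python
import heapq


def merge_steps(frequencies):
    """Return the pairs of weights merged at each Huffman step."""
    heap = list(frequencies)
    heapq.heapify(heap)               # a plain list becomes a min-heap in O(n)
    steps = []
    while len(heap) > 1:
        first = heapq.heappop(heap)   # smallest weight
        second = heapq.heappop(heap)  # second smallest weight
        steps.append((first, second))
        heapq.heappush(heap, first + second)
    return steps


# Frequencies of 'a', 'b', 'c' in "aabbbc".
print(merge_steps([2, 3, 1]))
```

Each pop and push costs O(log n), so repeatedly extracting the two smallest weights stays cheap even for large alphabets.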

6. tuple

📍 Used in:

Storing (char, encoding) pairs
Returning multiple values like latitude, longitude

🔎 Why:

Lightweight, immutable group of values
Ideal for returning compact data from functions
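Returning latitude and longitude as a tuple might look like the sketch below. It assumes the location arrives as a "latitude,longitude" string, as in the `loc` field of an IPinfo response; `parse_loc` and the sample coordinates are hypothetical.

```python
def parse_loc(loc):
    """Split a 'latitude,longitude' string into a (float, float) tuple."""
    latitude_text, longitude_text = loc.split(",")
    return float(latitude_text), float(longitude_text)


# Sample value for illustration only.
latitude, longitude = parse_loc("12.9716,77.5946")
print(latitude, longitude)
```

Tuple unpacking at the call site keeps the two values together without needing a dedicated class.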

7. str (String)

📍 Used in:

User inputs
Huffman text compression
File naming
Report formatting

🔎 Why:

Native support for encoding/decoding operations
Fast and memory-efficient for text

8. Custom Linked List

📍 Used in:

Organizing and displaying phone number and IP address metadata for reports

🔎 Why:

Good for sequential insertion without resizing
Educational purpose: reinforces understanding of pointers and nodes
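A singly linked list for report metadata could be sketched as follows. The class names, field names, and sample values are all assumptions made for illustration; the phone number is a placeholder and the IP address comes from a documentation-only range.

```python
class Node:
    """One metadata entry plus a reference to the next entry."""

    def __init__(self, label, value):
        self.label = label
        self.value = value
        self.next = None


class MetadataList:
    """Singly linked list that keeps entries in insertion order."""

    def __init__(self):
        self.head = None
        self.tail = None

    def append(self, label, value):
        node = Node(label, value)
        if self.head is None:
            self.head = node       # first entry
        else:
            self.tail.next = node  # link after the current last entry
        self.tail = node

    def __iter__(self):
        current = self.head
        while current is not None:
            yield current.label, current.value
            current = current.next


report = MetadataList()
report.append("Phone", "+00-0000-000000")   # placeholder number
report.append("IP address", "203.0.113.7")  # documentation-only address

lines = [f"{label}: {value}" for label, value in report]
print("\n".join(lines))
```

Keeping a `tail` reference makes each append O(1), which matches the "sequential insertion without resizing" point above.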

9. Huffman Tree (Nested List/Tree Structure)

📍 Used in:

build_huffman_tree(), which builds the tree as a nested list of the form [total_weight, [char1, code1], [char2, code2], ...]

🔎 Why:

The binary tree represents prefix codes for each character
Left child adds '0', right child adds '1'
Helps generate minimum average bit-length encoding

🔺 Example: If you encode "aabbbc", the tree might look like [6, ['c', '00'], ['a', '01'], ['b', '1']]. Each [char, code] pair shows the binary code assigned to that character.
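The nested-list representation can be built with a short heap-based routine. The sketch below is one common way to do it and is not necessarily the project's implementation: only the name `build_huffman_tree` comes from this README, while the signature (taking the raw text) and the single-character handling are assumptions.

```python
import heapq
from collections import Counter


def build_huffman_tree(text):
    """Return [total_weight, [char, code], ...] for a non-empty text."""
    if not text:
        raise ValueError("text must not be empty")
    # One heap entry per character: [weight, [char, code_so_far]].
    heap = [[weight, [char, ""]] for char, weight in Counter(text).items()]
    heapq.heapify(heap)
    if len(heap) == 1:
        heap[0][1][1] = "0"  # a lone symbol still needs a one-bit code
    while len(heap) > 1:
        low = heapq.heappop(heap)   # least frequent subtree
        high = heapq.heappop(heap)  # second least frequent subtree
        for pair in low[1:]:
            pair[1] = "0" + pair[1]  # left branch prepends '0'
        for pair in high[1:]:
            pair[1] = "1" + pair[1]  # right branch prepends '1'
        heapq.heappush(heap, [low[0] + high[0]] + low[1:] + high[1:])
    return heap[0]


tree = build_huffman_tree("aabbbc")
codes = dict(tree[1:])
total_bits = sum(len(codes[char]) for char in "aabbbc")
print(tree)
print(total_bits)
```

For "aabbbc" this sketch assigns one code of length 1 and two of length 2, for 9 bits in total, the same total as the example tree above. The exact bit patterns differ because ties between equal weights can be broken either way without affecting optimality.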

Summary Table:

Flow Diagram

1. Information Available

2. Non-Harassing Speech Detected

3. Harassing Speech Detected

4. Encoded Text in Report Using Huffman Algorithm

5. Approximate location of the sender's SIM origin, as per telecom registration

Model Accuracy

Accuracy: 0.9729675206778293

Classification Report:

               precision    recall  f1-score   support

    harassing       0.99      0.98      0.98      3731
not_harassing       0.94      0.96      0.95      1226

     accuracy                           0.97      4957
    macro avg       0.96      0.97      0.96      4957
 weighted avg       0.97      0.97      0.97      4957
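The summary rows can be sanity-checked from the per-class rows. The sketch below recomputes the macro and weighted averages from the rounded values in the report; the helper names are hypothetical, and because the inputs are rounded to two decimals the results only approximate the reported figures.

```python
# Per-class values copied from the classification report above.
report = {
    "harassing": {"precision": 0.99, "recall": 0.98, "f1": 0.98, "support": 3731},
    "not_harassing": {"precision": 0.94, "recall": 0.96, "f1": 0.95, "support": 1226},
}


def macro_average(report, metric):
    """Unweighted mean over classes."""
    return sum(row[metric] for row in report.values()) / len(report)


def weighted_average(report, metric):
    """Mean over classes, weighted by the number of samples in each class."""
    total = sum(row["support"] for row in report.values())
    return sum(row[metric] * row["support"] for row in report.values()) / total


macro_recall = macro_average(report, "recall")
weighted_recall = weighted_average(report, "recall")
print(round(macro_recall, 2), round(weighted_recall, 4))
```

In exact arithmetic, support-weighted recall equals overall accuracy, which is why the recomputed value lands close to the reported accuracy of about 0.973.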

Key Features and Their Role in Women’s Safety

🔍 1. Real-Time Harassment Detection (ML-powered)
Feature: Uses machine learning to classify messages as harassing or non-harassing.
Advantage: Immediate alert for any suspicious or abusive text content.
Why Use It: Reduces response time and prevents escalation.
Women’s Safety Link: Helps identify verbal abuse and threats at an early stage, encouraging timely intervention.

2. AI-Powered Sentiment Analysis
Feature: Analyzes emotional tone using logistic regression with TF-IDF vectorization.
Advantage: Weighs the intensity and intent behind messages.
Why Use It: Goes beyond fixed keyword matching by weighting how informative each term is.
Women’s Safety Link: Detects manipulation, gaslighting, or persistent threats often missed by simple filters.
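A TF-IDF plus logistic regression classifier can be sketched with scikit-learn as below. This is an illustrative setup under stated assumptions, not the project's training script: the six messages are made-up toy data, the default hyperparameters are assumed, and the label names are borrowed from the classification report.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy training data, invented for this example only.
messages = [
    "you are worthless and nobody wants you here",
    "i know where you live, watch your back",
    "shut up or you will regret it",
    "thanks for the notes, see you at the meeting",
    "happy birthday, hope you have a great day",
    "can you send me the report by friday",
]
labels = ["harassing"] * 3 + ["not_harassing"] * 3

# TF-IDF turns each message into a weighted term vector;
# logistic regression learns one weight per term.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(messages, labels)

new_message = ["see you at the meeting on friday"]
prediction = model.predict(new_message)[0]
probabilities = dict(zip(model.classes_, model.predict_proba(new_message)[0]))
print(prediction, probabilities)
```

Wrapping both steps in one pipeline means raw text can be passed straight to `predict`, so the vectorizer fitted during training is always the one applied at inference time.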

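🔐 3. Huffman Encoding for Privacy
Feature: Compresses and encodes reports with Huffman coding before storage.
Advantage: Reduces file size and keeps stored reports from being readable at a glance. Huffman coding is a compression scheme rather than cryptographic encryption, so anyone holding the code table can decode a report.
Why Use It: Supports user safety, privacy, and compact report storage.
Women’s Safety Link: Empowers victims to report safely without fear of exposure.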
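The file-size benefit only materializes once the Huffman bit string is stored as bytes rather than as text characters '0' and '1'. The sketch below shows one possible packing scheme; `pack_bits`, `unpack_bits`, and the one-byte padding header are assumptions made for this example, not the project's storage format.

```python
def pack_bits(bits):
    """Pack a string of '0'/'1' into bytes, prefixed by the padding length."""
    padding = (8 - len(bits) % 8) % 8      # zero bits needed to fill the last byte
    padded = bits + "0" * padding
    body = int(padded, 2).to_bytes(len(padded) // 8, "big") if padded else b""
    return bytes([padding]) + body


def unpack_bits(data):
    """Reverse pack_bits and return the original bit string."""
    padding, body = data[0], data[1:]
    bits = "".join(f"{byte:08b}" for byte in body)
    return bits[:len(bits) - padding] if padding else bits


encoded = "010111100"  # "aabbbc" under the example code map
packed = pack_bits(encoded)
print(len(encoded), len(packed))
print(unpack_bits(packed))
```

Here 9 bits fit into 2 payload bytes plus a 1-byte header. On such a short input the header and code table outweigh the savings, so the gain shows up mainly on longer reports.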

🌍 4. Geo-location APIs (Phone/IP-based)
Feature: Uses OpenCage (phone number) and IPinfo (IP address) to trace approximate location.
Advantage: Tracks suspect’s location without revealing victim’s location.
Why Use It: Builds credible reports that authorities can use.
Women’s Safety Link: Offers location-aware evidence, useful in legal escalation or emergency action.

🧾 5. Auto-Generated Reports (LinkedList Structured)
Feature: Final reports are created automatically in a clean, structured format.
Advantage: Saves time, ensures consistency, and makes filing easy.
Why Use It: Reduces emotional burden on the victim to explain everything repeatedly.
Women’s Safety Link: Supports quicker filing of complaints or proof submission to trusted authorities.

📊 6. Minimal UI with Streamlit
Feature: Simple, intuitive interface.
Advantage: Anyone can use it; no tech skills required.
Why Use It: Lower barrier for entry, even for first-time users.
Women’s Safety Link: Encourages self-empowerment through easy-to-access protection tools.

🚺 Why It Matters for Women’s Safety

📱 Accessible: Even through mobile.
🧩 Smart Detection: ML adapts with more data.
🛡 Private Reporting: Secure, encoded, and discreet.
🧭 Location Insight: Helps trace origin of threats.
💬 Psychological Signal Detection: Picks up hidden or veiled abuse.

🔮 Future Work & Improvements

1. Advanced ML Models
Upgrade to deep learning models like LSTM, BERT, or DistilBERT for better language understanding.
Fine-tune on larger datasets to improve accuracy and detect subtler forms of harassment.

🌍 2. Multilingual Support
Extend detection to regional and international languages to support diverse users.
Use libraries like spaCy, transformers, or Google Translate API.

🔐 3. End-to-End Encryption & Cloud Integration
Implement secure cloud storage (e.g., AWS, Firebase) for report management.
Add end-to-end encryption to protect user data during transfer.

🤖 4. Real-Time Monitoring
Integrate with platforms (email, social media, chat apps) for live scanning of abusive content.
Optional alert systems for parents, authorities, or platform moderators.

📱 5. Mobile App Version
Build an Android/iOS app for easier, on-the-go reporting.
Could integrate voice-to-text input for accessibility.

🧾 6. Automated Report Filing
Connect directly with local cyber cells or women’s helplines.
Send auto-generated reports with minimal manual effort.

🧪 7. Community Feedback & Learning
Allow users to flag false positives/negatives to improve model accuracy over time.
Introduce feedback loops into the model training pipeline.

Sample Simulation

https://drive.google.com/drive/folders/1by1_dJPkAFGDlz5LYm7MY3uaH3umJ4yT

References

https://www.geeksforgeeks.org/huffman-coding-greedy-algo-3/

https://thesai.org/Downloads/Volume15No3/Paper_103-Detection_of_Harassment_Toward_Women_in_Twitter.pdf

https://www.sciencedirect.com/science/article/abs/pii/S0140366420319101?via%3Dihub
