
Pritee3011/-170


Buffer-6.0

Women-Harassment-Detection

Protect. Detect. Empower.

Overview

Tech features that safeguard women, one message at a time. This is a Streamlit-based web app that uses an ML model to detect online harassment in user-submitted text. If harassment is detected, it generates an encoded report containing the message, sender details, and location metadata to help authorities take action.

Data Structures Used

1. pandas.DataFrame

📍 Used in:

  • Reading and displaying CSV data
  • Passing user message input to the model

🔎 Why:

  • Structured format suitable for ML
  • Compatible with scikit-learn pipelines

2. list

📍 Used in:

  • Collecting values (labels, sizes) for plotting
  • Creating the heap in Huffman encoding

🔎 Why:

  • Dynamic and easy to manipulate
  • Supports priority queues via heapq

3. dict

📍 Used in:

  • Storing frequency counts
  • Huffman encoding map
  • Reverse lookup during decoding
  • IP and location API responses

🔎 Why:

  • O(1) average time complexity for lookups
  • Flexible for structured data mapping
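The two Huffman-related uses of dict above (the encoding map and the reverse lookup during decoding) can be sketched as follows. This is a minimal illustration, not the app's actual code: the `codes` table is hard-coded from the "aabbbc" example in this README, and the helper names `encode` and `decode` are assumptions.

```python
# Hypothetical prefix-code table, taken from the "aabbbc" example in this README.
# In the app, the table would be derived from the message being encoded.
codes = {"c": "00", "a": "01", "b": "1"}


def encode(text, codes):
    """Replace every character with its bit string from the encoding map."""
    return "".join(codes[ch] for ch in text)


def decode(bits, codes):
    """Decode by reverse lookup: grow a buffer until it matches a known code."""
    reverse = {code: ch for ch, code in codes.items()}
    out, buf = [], ""
    for bit in bits:
        buf += bit
        if buf in reverse:  # O(1) average-time dict lookup
            out.append(reverse[buf])
            buf = ""
    return "".join(out)


encoded = encode("aabbbc", codes)
decoded = decode(encoded, codes)
print(encoded, decoded)
```

Because no code in the table is a prefix of another, the decoder can emit a character as soon as the buffer matches an entry, without looking ahead.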

4. defaultdict

📍 Used in:

  • Counting character frequency in create_frequency_dict

🔎 Why:

  • Prevents key errors while incrementing
  • Cleaner and safer syntax than regular dict
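A minimal sketch of how character counting with defaultdict might look. The function name `create_frequency_dict` comes from this README, but the body below is an assumption about its behavior, not a copy of the project's code.

```python
from collections import defaultdict


def create_frequency_dict(text):
    """Count how often each character occurs in text (assumed behavior)."""
    freq = defaultdict(int)  # missing keys start at 0, so no KeyError on +=
    for ch in text:
        freq[ch] += 1
    return freq


freq = create_frequency_dict("aabbbc")
print(dict(freq))
```

With a regular dict, the loop body would need `freq[ch] = freq.get(ch, 0) + 1` or an explicit membership check before incrementing.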

5. heapq (Min Heap using list)

📍 Used in:

  • build_huffman_tree() for character priority

🔎 Why:

  • Huffman coding needs the two least frequent elements repeatedly
  • Efficient for implementing priority queue

6. tuple

📍 Used in:

  • Storing (char, encoding) pairs
  • Returning multiple values like latitude, longitude

🔎 Why:

  • Lightweight, immutable group of values
  • Ideal for returning compact data from functions

7. str (String)

📍 Used in:

  • User inputs
  • Huffman text compression
  • File naming
  • Report formatting

🔎 Why:

  • Native support for encoding/decoding operations
  • Fast and memory-efficient for text

8. Custom Linked List

📍 Used in:

  • Organizing and displaying phone number and IP address metadata for reports

🔎 Why:

  • Good for sequential insertion without resizing
  • Educational purpose: reinforces understanding of pointers and nodes
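A singly linked list for report metadata could be sketched as below. The class names `Node` and `MetadataList`, the field names, and the sample values are all hypothetical placeholders chosen for illustration; the project's own linked list may differ.

```python
class Node:
    """One labelled metadata entry plus a pointer to the next entry."""

    def __init__(self, label, value):
        self.label = label
        self.value = value
        self.next = None


class MetadataList:
    """Singly linked list that keeps a tail pointer for O(1) appends."""

    def __init__(self):
        self.head = None
        self.tail = None

    def append(self, label, value):
        node = Node(label, value)
        if self.head is None:
            self.head = self.tail = node
        else:
            self.tail.next = node
            self.tail = node

    def __iter__(self):
        node = self.head
        while node is not None:
            yield node.label, node.value
            node = node.next


# Placeholder values only; 203.0.113.7 is from a documentation-reserved range.
report = MetadataList()
report.append("Phone number", "+00 0000000000")
report.append("IP address", "203.0.113.7")
lines = [f"{label}: {value}" for label, value in report]
print("\n".join(lines))
```

Keeping a tail pointer means each new entry is linked in constant time, which matches the "sequential insertion without resizing" motivation above.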

9. Huffman Tree (Nested List/Tree Structure)

📍 Used in:

  • build_huffman_tree(). The tree is built as a nested list of the form:

        [total_weight, [char1, code1], [char2, code2], ...]

🔎 Why:

  • The binary tree represents prefix codes for each character
  • Left child adds '0', right child adds '1'
  • Helps generate minimum average bit-length encoding

🔺 Example: If you encode "aabbbc", the tree might look like:

        [6, ['c', '00'], ['a', '01'], ['b', '1']]

Each [char, code] pair shows the binary code assigned to that character.
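One common way to build this nested-list tree with heapq is sketched below. It is an assumption about how `build_huffman_tree()` might work, not the project's actual implementation: here the function takes the raw text, counts characters with `collections.Counter`, and raises on empty input.

```python
import heapq
from collections import Counter


def build_huffman_tree(text):
    """Return [total_weight, [char, code], ...] for a non-empty text."""
    if not text:
        raise ValueError("text must be non-empty")
    # Each heap entry is [weight, [char, code_so_far], ...].
    heap = [[weight, [char, ""]] for char, weight in Counter(text).items()]
    heapq.heapify(heap)
    if len(heap) == 1:
        heap[0][1][1] = "0"  # a single distinct character still needs one bit
    while len(heap) > 1:
        lo = heapq.heappop(heap)  # least frequent subtree
        hi = heapq.heappop(heap)  # second least frequent subtree
        for pair in lo[1:]:
            pair[1] = "0" + pair[1]  # left branch prepends '0'
        for pair in hi[1:]:
            pair[1] = "1" + pair[1]  # right branch prepends '1'
        heapq.heappush(heap, [lo[0] + hi[0]] + lo[1:] + hi[1:])
    return heap[0]


tree = build_huffman_tree("aabbbc")
codes = dict(tree[1:])
total_bits = sum(len(codes[ch]) for ch in "aabbbc")
print(tree, total_bits)
```

Ties between equal weights can be broken differently from one implementation to another, so the exact bit strings may not match the example above. Any valid Huffman code for "aabbbc" still uses 9 bits in total.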

Summary Table:

[Image: summary table]

Flow Diagram

[Image: flow diagram, part 1 of 2]

[Image: flow diagram, part 2 of 2]

1. Information Available

[Image]

2. Non-Harassing Speech Detected

[Image]

3. Harassing Speech Detected

[Image]

4. Encoded Text in Report Using the Huffman Algorithm

[Image 1 of 2]

[Image 2 of 2]

5. Approximate Location of the Sender's SIM Origin as per Telecom Registration

[Image]

Model Accuracy

[Image]

Accuracy: 0.9729675206778293

Classification Report:

                   precision    recall  f1-score   support

        harassing       0.99      0.98      0.98      3731
    not_harassing       0.94      0.96      0.95      1226

         accuracy                           0.97      4957
        macro avg       0.96      0.97      0.96      4957
     weighted avg       0.97      0.97      0.97      4957

Key Features and Their Role in Women's Safety

๐Ÿ” 1. Real-Time Harassment Detection (ML-powered) Feature: Uses machine learning to classify messages as harassing or non-harassing. Advantage: Immediate alert for any suspicious or abusive text content. Why Use It: Reduces response time and prevents escalation. Womenโ€™s Safety Link: Helps identify verbal abuse and threats at an early stage, encouraging timely intervention.

2. AI-Powered Sentiment Analysis

  • Feature: Analyzes emotional tone using logistic regression with TF-IDF vectorization.
  • Advantage: Weighs the intensity and intent behind messages.
  • Why Use It: Goes beyond fixed keyword lists by learning which terms signal harassment from labelled data.
  • Women's Safety Link: Aims to detect manipulation, gaslighting, or persistent threats that simple filters often miss.
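The TF-IDF plus logistic regression approach named above can be sketched with scikit-learn as follows. This is an illustration under stated assumptions: the four training messages are made up, the hyperparameters are defaults apart from `max_iter`, and the project's real training data and settings are not shown in this README. The label names follow the classification report above.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny made-up training set, for illustration only.
messages = [
    "I will hurt you if you ignore me",
    "stop ignoring me or else you idiot",
    "see you at the meeting tomorrow",
    "thanks for sharing the notes",
]
labels = ["harassing", "harassing", "not_harassing", "not_harassing"]

# TF-IDF turns each message into a weighted term vector;
# logistic regression learns a linear decision boundary over those vectors.
model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(messages, labels)

new_message = ["are we still meeting tomorrow"]
prediction = model.predict(new_message)[0]
probabilities = model.predict_proba(new_message)[0]
classes = list(model.named_steps["logisticregression"].classes_)
print(prediction, dict(zip(classes, probabilities)))
```

A model trained on four sentences is not meaningful; the point is only the shape of the pipeline, which accepts raw strings and returns a label plus class probabilities.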

๐Ÿ” 3. Huffman Encoding for Privacy Feature: Compresses and encrypts reports before storage. Advantage: Protects sensitive data and reduces file size. Why Use It: Ensures user safety, privacy, and report integrity. Womenโ€™s Safety Link: Empowers victims to report safely without fear of exposure.

๐ŸŒ 4. Geo-location APIs (Phone/IP-based) Feature: Uses OpenCage (phone number) and IPinfo (IP address) to trace approximate location. Advantage: Tracks suspectโ€™s location without revealing victimโ€™s location. Why Use It: Builds credible reports that authorities can use. Womenโ€™s Safety Link: Offers location-aware evidence, useful in legal escalation or emergency action.

🧾 5. Auto-Generated Reports (LinkedList Structured)

  • Feature: Final reports are created automatically in a clean, structured format.
  • Advantage: Saves time, ensures consistency, and makes filing easy.
  • Why Use It: Reduces the emotional burden on the victim of explaining everything repeatedly.
  • Women's Safety Link: Supports quicker filing of complaints or proof submission to trusted authorities.

📊 6. Minimal UI with Streamlit

  • Feature: Simple, intuitive interface.
  • Advantage: Anyone can use it; no tech skills required.
  • Why Use It: Lowers the barrier to entry, even for first-time users.
  • Women's Safety Link: Encourages self-empowerment through easy-to-access protection tools.

🚺 Why It Matters for Women's Safety

  • 📱 Accessible: Even through mobile.
  • 🧩 Smart Detection: ML adapts with more data.
  • 🛡 Private Reporting: Encoded and discreet.
  • 🧭 Location Insight: Helps trace origin of threats.
  • 💬 Psychological Signal Detection: Picks up hidden or veiled abuse.

🔮 Future Work & Improvements

1. Advanced ML Models

  • Upgrade to deep learning models like LSTM, BERT, or DistilBERT for better language understanding.
  • Fine-tune on larger datasets to improve accuracy and detect subtler forms of harassment.

2. Multilingual Support

  • Extend detection to regional and international languages to support diverse users.
  • Use libraries like spaCy, transformers, or Google Translate API.

3. End-to-End Encryption & Cloud Integration

  • Implement secure cloud storage (e.g., AWS, Firebase) for report management.
  • Add end-to-end encryption to protect user data during transfer.

🤖 4. Real-Time Monitoring

  • Integrate with platforms (email, social media, chat apps) for live scanning of abusive content.
  • Optional alert systems for parents, authorities, or platform moderators.

📱 5. Mobile App Version

  • Build an Android/iOS app for easier, on-the-go reporting.
  • Could integrate voice-to-text input for accessibility.

🧾 6. Automated Report Filing

  • Connect directly with local cyber cells or women's helplines.
  • Send auto-generated reports with minimal manual effort.

🧪 7. Community Feedback & Learning

  • Allow users to flag false positives/negatives to improve model accuracy over time.
  • Introduce feedback loops into the model training pipeline.

Sample Simulation

https://drive.google.com/drive/folders/1by1_dJPkAFGDlz5LYm7MY3uaH3umJ4yT

References

https://www.geeksforgeeks.org/huffman-coding-greedy-algo-3/

https://thesai.org/Downloads/Volume15No3/Paper_103-Detection_of_Harassment_Toward_Women_in_Twitter.pdf

https://www.sciencedirect.com/science/article/abs/pii/S0140366420319101?via%3Dihub

About

She Secure
