Tech Features That Safeguard Women, One Message at a Time.
A Streamlit-based web app that detects online harassment from user-submitted text using an ML model. If harassment is detected, it encodes a report with message, sender details, and location metadata to help authorities take action.
1. pandas.DataFrame
Used in:
- Reading and displaying CSV data
- Passing user message input to the model
Why:
- Structured format suitable for ML
- Compatible with scikit-learn pipelines
2. list
Used in:
- Collecting values (labels, sizes) for plotting
- Creating the heap in Huffman encoding
Why:
- Dynamic and easy to manipulate
- Supports priority queues via heapq
3. dict
Used in:
- Storing frequency counts
- Huffman encoding map
- Reverse lookup during decoding
- IP and location API responses
Why:
- O(1) average time complexity for lookups
- Flexible for structured data mapping
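The encoding map and the reverse lookup can be sketched with two plain dicts. This is a minimal illustration, not the app's actual code: the function names `huffman_encode` and `huffman_decode` are hypothetical, and the code map is the one from this README's "aabbbc" example.

```python
def huffman_encode(text, codes):
    """Concatenate the bit string of every character in text."""
    return "".join(codes[ch] for ch in text)


def huffman_decode(bits, codes):
    """Rebuild the text by matching prefixes against a reversed code map."""
    reverse = {code: ch for ch, code in codes.items()}
    decoded, buffer = [], ""
    for bit in bits:
        buffer += bit
        if buffer in reverse:  # prefix codes make the first match unambiguous
            decoded.append(reverse[buffer])
            buffer = ""
    if buffer:
        raise ValueError("trailing bits do not form a complete code")
    return "".join(decoded)


codes = {"c": "00", "a": "01", "b": "1"}  # code map from the README example
bits = huffman_encode("aabbbc", codes)
print(bits)                         # 010111100
print(huffman_decode(bits, codes))  # aabbbc
```

Both directions rely on average O(1) dict lookups, so encoding and decoding are linear in the length of the input.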
4. defaultdict
Used in:
- Counting character frequency in create_frequency_dict
Why:
- Prevents key errors while incrementing
- Cleaner and safer syntax than a regular dict
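A minimal sketch of the counting step, assuming `create_frequency_dict` takes a string and returns a character-to-count mapping (the signature in the app may differ):

```python
from collections import defaultdict


def create_frequency_dict(text):
    """Count how often each character occurs in text."""
    freq = defaultdict(int)  # missing keys start at 0
    for ch in text:
        freq[ch] += 1  # no KeyError the first time ch is seen
    return freq


print(dict(create_frequency_dict("aabbbc")))  # {'a': 2, 'b': 3, 'c': 1}
```

With a regular dict, the same loop would need an explicit `if ch not in freq` check or a call to `freq.get(ch, 0)`.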
5. heapq (Min Heap using list)
Used in:
- build_huffman_tree() for character priority
Why:
- Huffman coding repeatedly needs the two least frequent elements
- Efficient for implementing a priority queue
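The repeated "pop the two least frequent, merge, push back" loop can be sketched with `heapq` as follows. This is one common way to write it, using the nested-list layout `[total_weight, [char, code], ...]` that this README describes for the tree; the signature of `build_huffman_tree` is an assumption, and the app's version may differ.

```python
import heapq


def build_huffman_tree(freq):
    """Return [total_weight, [char, code], ...] for a {char: count} mapping."""
    if not freq:
        return [0]
    # One heap entry per character: [weight, [char, code_so_far]].
    heap = [[weight, [char, ""]] for char, weight in freq.items()]
    heapq.heapify(heap)

    if len(heap) == 1:
        heap[0][1][1] = "0"  # a single distinct character still needs one bit

    while len(heap) > 1:
        lo = heapq.heappop(heap)  # least frequent subtree
        hi = heapq.heappop(heap)  # second least frequent subtree
        for pair in lo[1:]:
            pair[1] = "0" + pair[1]  # left branch prepends 0
        for pair in hi[1:]:
            pair[1] = "1" + pair[1]  # right branch prepends 1
        heapq.heappush(heap, [lo[0] + hi[0]] + lo[1:] + hi[1:])

    return heap[0]


tree = build_huffman_tree({"a": 2, "b": 3, "c": 1})
codes = dict(tree[1:])
print(tree)  # [6, ['b', '0'], ['c', '10'], ['a', '11']]
```

The exact codes depend on how ties are broken, so a different but equally valid assignment is possible for the same frequencies; the total encoded length (9 bits for "aabbbc") stays the same.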
6. tuple
Used in:
- Storing (char, encoding) pairs
- Returning multiple values like latitude, longitude
Why:
- Lightweight, immutable group of values
- Ideal for returning compact data from functions
7. str (String)
Used in:
- User inputs
- Huffman text compression
- File naming
- Report formatting
Why:
- Native support for encoding/decoding operations
- Fast and memory-efficient for text
8. Custom Linked List
Used in:
- Organizing and displaying phone number and IP address metadata for reports
Why:
- Good for sequential insertion without resizing
- Educational purpose: reinforces understanding of pointers and nodes
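A minimal sketch of such a linked list, assuming each node holds one labeled report field. The class names `Node` and `LinkedList`, the method names, and the sample values are hypothetical placeholders, not taken from the app.

```python
class Node:
    """One report field, e.g. ("IP address", "203.0.113.7")."""

    def __init__(self, label, value):
        self.label = label
        self.value = value
        self.next = None


class LinkedList:
    """Singly linked list that keeps fields in insertion order."""

    def __init__(self):
        self.head = None
        self.tail = None

    def append(self, label, value):
        node = Node(label, value)
        if self.head is None:
            self.head = node
        else:
            self.tail.next = node
        self.tail = node  # tail pointer keeps append at O(1)

    def __iter__(self):
        node = self.head
        while node is not None:
            yield node.label, node.value
            node = node.next

    def to_report(self):
        return "\n".join(f"{label}: {value}" for label, value in self)


details = LinkedList()
details.append("Phone number", "+1 202 555 0100")  # placeholder value
details.append("IP address", "203.0.113.7")        # placeholder value
print(details.to_report())
```

Walking the list from `head` yields the fields in the order they were added, which is what a report needs.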
9. Huffman Tree (Nested List/Tree Structure)
Used in:
- build_huffman_tree(). The tree is built as a nested list of the form [total_weight, [char1, code1], [char2, code2], ...]
Why:
- The binary tree represents prefix codes for each character
- Left child adds '0', right child adds '1'
- Helps generate minimum average bit-length encoding
Example:
If you encode "aabbbc", the tree might look like:
[6, ['c', '00'], ['a', '01'], ['b', '1']]
Each [char, code] pair shows the binary code assigned to that character.
Accuracy: 0.9729675206778293
Classification Report:
               precision    recall  f1-score   support
    harassing       0.99      0.98      0.98      3731
not_harassing       0.94      0.96      0.95      1226
     accuracy                           0.97      4957
    macro avg       0.96      0.97      0.96      4957
 weighted avg       0.97      0.97      0.97      4957
1. Real-Time Harassment Detection (ML-powered)
- Feature: Uses machine learning to classify messages as harassing or non-harassing.
- Advantage: Immediate alert for any suspicious or abusive text content.
- Why Use It: Reduces response time and helps prevent escalation.
- Women's Safety Link: Helps identify verbal abuse and threats at an early stage, encouraging timely intervention.
2. AI-Powered Sentiment Analysis
- Feature: Analyzes emotional tone using logistic regression with TF-IDF vectorization.
- Advantage: Understands the intensity and intent behind messages.
- Why Use It: More than keyword-based: understands nuance.
- Women's Safety Link: Detects manipulation, gaslighting, or persistent threats often missed in simple filters.
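A minimal sketch of a TF-IDF plus logistic regression classifier with scikit-learn. The six training sentences are toy data made up for illustration, and the pipeline settings are library defaults rather than the app's actual configuration; only the label names follow the classification report in this README.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy training data for illustration only; the real model is trained on a
# much larger labeled dataset.
texts = [
    "you are worthless and I will find you",
    "stop replying or you will regret it",
    "nobody likes you, just disappear",
    "see you at the meeting tomorrow",
    "thanks for sharing the notes",
    "happy birthday, have a great day",
]
labels = ["harassing"] * 3 + ["not_harassing"] * 3

# TF-IDF turns each message into a weighted term vector; logistic
# regression learns one weight per term from the labeled examples.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(texts, labels)

message = "I will find you and you will regret it"
prediction = model.predict([message])[0]
probabilities = dict(zip(model.classes_, model.predict_proba([message])[0]))
print(prediction, probabilities)
```

The class probabilities give a rough confidence score that could be shown next to the predicted label or used as an alert threshold.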
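3. Huffman Encoding for Privacy
- Feature: Compresses reports into a Huffman-encoded bit stream before storage.
- Advantage: Reduces file size and keeps stored reports out of plain text. Huffman coding is compression rather than encryption, so it obscures the data but does not secure it on its own.
- Why Use It: Supports user privacy and compact, consistent report storage.
- Women's Safety Link: Empowers victims to report safely without fear of exposure.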
4. Geo-location APIs (Phone/IP-based)
- Feature: Uses OpenCage (phone number) and IPinfo (IP address) to trace approximate location.
- Advantage: Tracks the suspect's location without revealing the victim's location.
- Why Use It: Builds credible reports that authorities can use.
- Women's Safety Link: Offers location-aware evidence, useful in legal escalation or emergency action.
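A rough sketch of the IP-based lookup using only the standard library. It assumes the IPinfo JSON endpoint `https://ipinfo.io/<ip>/json` and a `loc` field formatted as "latitude,longitude"; the helper names `parse_loc` and `lookup_ip` are hypothetical, the sample response is made up, and production use may need an access token.

```python
import json
from urllib.request import urlopen


def parse_loc(response):
    """Return (latitude, longitude) from an IPinfo-style response dict."""
    lat, lon = response["loc"].split(",")
    return float(lat), float(lon)


def lookup_ip(ip, timeout=5):
    """Fetch location data for ip (network call; not executed in this sketch)."""
    with urlopen(f"https://ipinfo.io/{ip}/json", timeout=timeout) as resp:
        return json.load(resp)


# Illustrative response, not real lookup output.
sample = {"ip": "203.0.113.7", "city": "Exampleville", "loc": "12.9716,77.5946"}
latitude, longitude = parse_loc(sample)
print(latitude, longitude)  # 12.9716 77.5946
```

Returning the coordinates as a tuple lets the caller unpack both values in one assignment.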
5. Auto-Generated Reports (LinkedList Structured)
- Feature: Final reports are created automatically in a clean, structured format.
- Advantage: Saves time, ensures consistency, and makes filing easy.
- Why Use It: Reduces emotional burden on the victim to explain everything repeatedly.
- Women's Safety Link: Supports quicker filing of complaints or proof submission to trusted authorities.
6. Minimal UI with Streamlit
- Feature: Simple, intuitive interface.
- Advantage: Anyone can use it; no tech skills required.
- Why Use It: Lower barrier for entry, even for first-time users.
- Women's Safety Link: Encourages self-empowerment through easy-to-access protection tools.
- Accessible: Even through mobile.
- Smart Detection: ML adapts with more data.
- Private Reporting: Secure, encoded, and discreet.
- Location Insight: Helps trace origin of threats.
- Psychological Signal Detection: Picks up hidden or veiled abuse.
1. Advanced ML Models
- Upgrade to deep learning models like LSTM, BERT, or DistilBERT for better language understanding.
- Fine-tune on larger datasets to improve accuracy and detect subtler forms of harassment.
2. Multilingual Support
- Extend detection to regional and international languages to support diverse users.
- Use libraries like spaCy, transformers, or Google Translate API.
3. End-to-End Encryption & Cloud Integration
- Implement secure cloud storage (e.g., AWS, Firebase) for report management.
- Add end-to-end encryption to protect user data during transfer.
4. Real-Time Monitoring
- Integrate with platforms (email, social media, chat apps) for live scanning of abusive content.
- Optional alert systems for parents, authorities, or platform moderators.
5. Mobile App Version
- Build an Android/iOS app for easier, on-the-go reporting.
- Could integrate voice-to-text input for accessibility.
6. Automated Report Filing
- Connect directly with local cyber cells or women's helplines.
- Send auto-generated reports with minimal manual effort.
7. Community Feedback & Learning
- Allow users to flag false positives/negatives to improve model accuracy over time.
- Introduce feedback loops into the model training pipeline.
https://drive.google.com/drive/folders/1by1_dJPkAFGDlz5LYm7MY3uaH3umJ4yT
https://www.geeksforgeeks.org/huffman-coding-greedy-algo-3/
https://www.sciencedirect.com/science/article/abs/pii/S0140366420319101?via%3Dihub









