SmartNotes is a production-ready, full-stack web application designed to help students and professionals quickly digest lectures, meetings, and notes. It processes text (pasted or uploaded as TXT, PDF, or DOCX) entirely in memory: no files are ever stored permanently, and all data is discarded when the session ends.
The application leverages Natural Language Processing (NLP) and Machine Learning (ML) models via Python, spaCy, NLTK, and HuggingFace Transformers to extract the most pertinent information.
- In-Memory Processing only: Extremely secure; files and data exist only during the active session.
- Multiple Input Formats: Paste raw text or upload .txt, .pdf, or .docx files.
- Extractive Summarization: Preserves original syntax using a custom TextRank graph algorithm.
- Abstractive Summarization: Uses HuggingFace Transformers (BART) to generate human-like concise summaries.
- Length Control: Choose between Short (20%), Medium (40%), and Detailed (60%) summaries.
- Sentence Classification: Classifies sentences into ⭐ Very Important, ✅ Key Concept, and ℹ️ Supporting Information.
- Keyword & NER Extraction: Identifies the most relevant terms via YAKE and Named Entity Recognition (spaCy).
- Automated Flashcards: Auto-generates Q&A flip-cards based on masked entities from the text.
- Word Cloud: Dynamic base64 rendered word cloud identifying major topics.
- Exporters: Download artifacts as PDF, CSV, DOCX, TXT, or Markdown, all streamed via io.BytesIO.
- Premium UI: TailwindCSS, glassmorphism headers, dark mode toggling, and fully responsive layout.
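The streamed-export pattern above never touches the filesystem: each exporter writes into an in-memory buffer and hands raw bytes to the browser. A minimal sketch for the CSV case, using only the standard library (the function name and payload shape are illustrative assumptions, not the actual exporters.py API):

```python
import csv
import io

def export_keywords_csv(keywords):
    """Serialize (keyword, score) pairs to CSV bytes without touching disk.

    Hypothetical helper illustrating the in-memory export pattern:
    the result can be wrapped in io.BytesIO and sent to the client.
    """
    text_buf = io.StringIO()
    writer = csv.writer(text_buf)
    writer.writerow(["keyword", "score"])
    for kw, score in keywords:
        writer.writerow([kw, f"{score:.4f}"])
    # Encode to bytes so Flask can stream it, e.g. send_file(io.BytesIO(payload))
    return text_buf.getvalue().encode("utf-8")

payload = export_keywords_csv([("graph", 0.91), ("summarization", 0.87)])
```

The same buffer-then-bytes approach generalizes to the PDF (reportlab) and DOCX (python-docx) exporters, since both can write into a file-like object.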
+-----------------------------------------------------------------------------------------+
| USER INTERFACE |
| [ index.html ] --> (Upload/Paste) --> [ results.html ] <-- (Flip-cards, WordCloud) |
| | | |
| (Tailwind CSS, style.css) (script.js logic) |
+---------|-------------------------------------|-----------------------------------------+
| POST /process | GET /download/<type>/<format>
v v
+-----------------------------------------------------------------------------------------+
| FLASK BACKEND |
| (app.py) Routes & Orchestration [ MEMORY_STORE {session_id: results_dict} ] |
+-------------------------------------------|---------------------------------------------+
|
+-------------------------------------------|---------------------------------------------+
| NLP & ML PIPELINE |
| |
| 1. text_cleaner.py: NLTK Tokenization, Stopwords, DocX/PDF parsing stream |
| 2. extractive.py: TextRank via NetworkX and Cosine Similarity |
| 3. abstractive.py: HuggingFace Pipeline (BART) chunking |
| 4. keyword_extractor.py: YAKE algorithm & spaCy NER |
| 5. importance.py: Heuristic TextRank scoring percentiles -> 3 importance tiers |
| 6. flashcards.py: spaCy entity masking generator |
| 7. wordcloud_generator.py: matplotlib -> io.BytesIO -> base64 string |
| 8. exporters.py: reportlab, python-docx, CSV exporters streamed straight to bytes |
+-----------------------------------------------------------------------------------------+
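Step 2 of the pipeline ranks sentences with TextRank over a cosine-similarity graph. The real extractive.py uses NetworkX; the following is a simplified, standard-library-only sketch of the same idea (bag-of-words cosine similarity plus the iterative PageRank update), not the project's actual code:

```python
import math
import re
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    num = sum(a[w] * b[w] for w in set(a) & set(b))
    den = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

def textrank(sentences, d=0.85, iters=30):
    """Score sentences by iterating the PageRank update over a
    sentence-similarity graph (illustration only; extractive.py
    delegates this to NetworkX)."""
    bags = [Counter(re.findall(r"\w+", s.lower())) for s in sentences]
    n = len(sentences)
    sim = [[cosine(bags[i], bags[j]) if i != j else 0.0 for j in range(n)]
           for i in range(n)]
    scores = [1.0] * n
    for _ in range(iters):
        new = []
        for i in range(n):
            rank = 0.0
            for j in range(n):
                out = sum(sim[j])  # total outgoing edge weight of node j
                if sim[j][i] and out:
                    rank += sim[j][i] / out * scores[j]
            new.append((1 - d) + d * rank)
        scores = new
    return scores

sents = ["Graphs rank sentences.", "Sentences form a graph.", "Bananas are yellow."]
scores = textrank(sents)
```

An extractive summary then keeps the top-k sentences by score, in their original order, which is what preserves the source syntax.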
Requirements: Python 3.11 (recommended). No database. No login. In-memory processing only.
Windows (PowerShell or CMD):
setup.bat
Mac / Linux:
chmod +x setup.sh
./setup.sh
This creates a virtual environment (venv), installs all dependencies from requirements.txt (stable versions, Transformers 4.x), and downloads the spaCy model en_core_web_sm.
- Clone or download the repository, then navigate to the project directory:
cd smart_notes
- Create and activate a virtual environment:
python -m venv venv # Windows: venv\Scripts\activate # macOS/Linux: source venv/bin/activate
- Install dependencies:
pip install -r requirements.txt
- Download spaCy model (if not done by the app on first run):
python -m spacy download en_core_web_sm
Note: First run may download HuggingFace model distilbart-cnn-12-6 and NLTK data; this can take a few minutes.
- Activate the virtual environment (if not already):
# Windows: venv\Scripts\activate # macOS/Linux: source venv/bin/activate
- Start the application:
python app.py
- Open your browser at: http://127.0.0.1:5000
To confirm that summarization and NLP modules work without starting the web app:
python test_model.py
All six pipeline steps (text cleaning, extractive & abstractive summarization, keywords, importance, word cloud) are tested.
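Individual pipeline steps can also be exercised in isolation. For example, the flashcard generator builds Q&A pairs by masking a named entity in a sentence; a minimal sketch of that idea (the function name and dict shape are assumptions — the real flashcards.py uses spaCy NER to find the entity):

```python
def make_flashcard(sentence, entity):
    """Mask one named entity to form a question/answer pair.

    Hypothetical helper mirroring the entity-masking approach of
    flashcards.py; in the app, `entity` comes from spaCy NER.
    """
    question = sentence.replace(entity, "_____")
    return {"question": question, "answer": entity}

card = make_flashcard("Alan Turing proposed the imitation game in 1950.", "Alan Turing")
```

The front of the flip-card shows the masked question; the back reveals the entity.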
- Models and libraries — List of every model and library (Flask, spaCy, NLTK, HuggingFace, YAKE, NetworkX, WordCloud, ReportLab, python-docx, etc.) and why each is used.
- Landing Page: A beautiful hero section featuring a pastel-indigo gradient header. Options to toggle between "Paste Text" and "Upload File" in a clean glass-effect card. Length control slider positioned clearly. Top right features a Dark Mode toggle (Moon/Sun icon).
- Analysis View (Results):
- Top banner metrics show Original Format (words), Summarized (words), and Reduction (%).
- Left side: Scrollable cards displaying the original source text, and below it, an Analysis Section categorizing extracted sentences by "⭐ Very Important", "✅ Key Concept", etc., with beautifully colored badges.
- Right side: The Summaries panel combining Extractive and Abstractive approaches (controlled by aesthetic toggle buttons). Below it, Flashcards rendered as interactive 3D CSS flip-cards, alongside the generated Word Cloud image and pill-styled Keywords.
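The three importance badges come from bucketing sentence scores into percentile tiers (pipeline step 5). A standard-library sketch of how such tiering could work — the quartile thresholds here are an assumption, not the exact cutoffs in importance.py:

```python
import statistics

def assign_tiers(scores):
    """Bucket sentence scores into three importance tiers by percentile.

    Illustrative only: importance.py uses its own heuristic percentile
    thresholds; here we assume the median and upper quartile as cutoffs.
    """
    if len(scores) < 2:
        return ["⭐ Very Important"] * len(scores)
    _, q2, q3 = statistics.quantiles(scores, n=4)  # quartile cut points
    tiers = []
    for s in scores:
        if s >= q3:
            tiers.append("⭐ Very Important")
        elif s >= q2:
            tiers.append("✅ Key Concept")
        else:
            tiers.append("ℹ️ Supporting Information")
    return tiers

scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.9, 0.95]
tiers = assign_tiers(scores)
```

Each tier then maps to one of the colored badges in the Analysis Section.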
- Implement WebSocket streaming for the abstractive summarizer so the user sees text generating in real-time. ✅ Done: On the results page, switch to the "Abstractive" tab and click "Watch live generation" to stream the summary chunk-by-chunk over WebSocket.
- Introduce advanced LLM QA generation for more robust questions on the flashcards rather than simply masking Named Entities.
- Add multi-language summarization support utilizing XLM-RoBERTa.
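The chunk-by-chunk streaming in the first roadmap item can be reduced to a transport-agnostic generator: the server produces fixed-size pieces of the summary and the WebSocket layer forwards each one as it arrives. A minimal sketch (chunk size and function name are illustrative assumptions):

```python
def stream_chunks(text, size=40):
    """Yield a finished summary in fixed-size chunks, the way a
    WebSocket handler might emit them to the client one by one.

    Transport-agnostic sketch: the real app pushes each yielded
    chunk over the socket as it becomes available.
    """
    for i in range(0, len(text), size):
        yield text[i:i + size]

chunks = list(stream_chunks("a" * 100, size=40))
```

On the client, script.js would append each received chunk to the Abstractive panel, giving the appearance of live generation.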