vvvvvivekkk/smart_notes


SmartNotes: AI-Powered Lecture Notes Summarization & Flashcard Generation

📖 Project Overview

SmartNotes is a full-stack, production-ready web application that helps students and professionals quickly digest lectures, meetings, and notes. It processes text (pasted, or uploaded as TXT, PDF, or DOCX) entirely in memory: no uploaded file is written to permanent storage, so the data exists only for the active session.
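As a minimal sketch of what in-memory handling can look like, the snippet below decodes an uploaded .txt file straight from a binary stream without touching the disk. The function name `read_txt_upload` and the `max_bytes` limit are hypothetical and not taken from the project; in a web handler the stream would be whatever file-like object the framework provides for the upload.

```python
import io
from typing import BinaryIO


def read_txt_upload(stream: BinaryIO, max_bytes: int = 5_000_000) -> str:
    """Decode an uploaded .txt file in memory (hypothetical helper)."""
    # Read one byte more than the limit so oversized uploads can be detected.
    raw = stream.read(max_bytes + 1)
    if len(raw) > max_bytes:
        raise ValueError("upload exceeds the in-memory size limit")
    # utf-8-sig drops a byte-order mark if present; bad bytes are replaced.
    return raw.decode("utf-8-sig", errors="replace")


# An in-memory buffer stands in for the uploaded file.
upload = io.BytesIO("Lecture 1: Graphs are made of nodes and edges.".encode("utf-8"))
text = read_txt_upload(upload)
```

Capping the read size is one way to keep a large upload from exhausting memory when everything is held in RAM.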

The application leverages Natural Language Processing (NLP) and Machine Learning (ML) models via Python, spaCy, NLTK, and HuggingFace Transformers to extract the most pertinent information.

✨ Features

  • In-Memory Processing Only: Files and data exist only during the active session; nothing is written to permanent storage.
  • Multiple Input Formats: Paste raw text or upload .txt, .pdf, .docx.
  • Extractive Summarization: Preserves the original wording by selecting whole sentences with a custom TextRank graph algorithm.
  • Abstractive Summarization: Uses HuggingFace Transformers (BART) to generate human-like concise summaries.
  • Length Control: Choose between Short (20%), Medium (40%), and Detailed (60%) summaries.
  • Sentence Classification: Classifies sentences into ⭐ Very Important, ✅ Key Concept, and ℹ️ Supporting Information.
  • Keyword & NER Extraction: Identifies the most relevant terms via YAKE and Named Entity Recognition (spaCy).
  • Automated Flashcards: Auto-generates Q&A flip-cards based on masked entities from the text.
  • Word Cloud: Dynamically generated word cloud of the major topics, delivered to the page as a base64-encoded image.
  • Exporters: Download artifacts as PDF, CSV, DOCX, TXT, or Markdown, all streamed via io.BytesIO.
  • Premium UI: TailwindCSS, glassmorphism headers, dark mode toggling, and fully responsive layout.
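Two of the features above, entity-masked flashcards and exports streamed through io.BytesIO, can be illustrated together. This is a sketch under stated assumptions, not the project's code: the entity list is hard-coded here (in the app it would come from spaCy NER), and the helper names `make_flashcards` and `flashcards_to_csv` are hypothetical.

```python
import csv
import io
import re


def make_flashcards(sentences, entities):
    """Build (question, answer) pairs by blanking one known entity per sentence."""
    cards = []
    for sentence in sentences:
        for entity in entities:
            # Whole-word match so a short entity does not hit inside a longer word.
            pattern = r"\b" + re.escape(entity) + r"\b"
            if re.search(pattern, sentence):
                question = re.sub(pattern, "_____", sentence, count=1)
                cards.append((question, entity))
                break  # one card per sentence keeps the deck short
    return cards


def flashcards_to_csv(cards):
    """Serialize cards to CSV inside a BytesIO buffer; no file is created on disk."""
    text_buffer = io.StringIO(newline="")
    writer = csv.writer(text_buffer)
    writer.writerow(["question", "answer"])
    writer.writerows(cards)
    return io.BytesIO(text_buffer.getvalue().encode("utf-8"))


sentences = [
    "Alan Turing proposed the imitation game in 1950.",
    "The course covers graph algorithms.",
]
# Assumed entity list; a real pipeline would obtain it from an NER model.
cards = make_flashcards(sentences, ["Alan Turing", "1950"])
csv_bytes = flashcards_to_csv(cards).getvalue()
```

The returned buffer can be handed to a download response as-is, which is what makes the export path disk-free.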

🏗️ Architecture Diagram

+-----------------------------------------------------------------------------------------+
|                                    USER INTERFACE                                       |
|  [ index.html ] --> (Upload/Paste) --> [ results.html ] <-- (Flip-cards, WordCloud)     |
|         |                                     |                                         |
|    (Tailwind CSS, style.css)           (script.js logic)                                |
+---------|-------------------------------------|-----------------------------------------+
          | POST /process                       | GET /download/<type>/<format>
          v                                     v
+-----------------------------------------------------------------------------------------+
|                                     FLASK BACKEND                                       |
|  (app.py) Routes & Orchestration    [ MEMORY_STORE {session_id: results_dict} ]         |
+-------------------------------------------|---------------------------------------------+
                                            |
+-------------------------------------------|---------------------------------------------+
|                                    NLP & ML PIPELINE                                    |
|                                                                                         |
| 1. text_cleaner.py: NLTK Tokenization, Stopwords, DocX/PDF parsing stream               |
| 2. extractive.py: TextRank via NetworkX and Cosine Similarity                           |
| 3. abstractive.py: HuggingFace Pipeline (BART) chunking                                 |
| 4. keyword_extractor.py: YAKE algorithm & spaCy NER                                     |
| 5. importance.py: Heuristic TextRank scoring percentiles -> 3 importance tiers          |
| 6. flashcards.py: spaCy entity masking generator                                        |
| 7. wordcloud_generator.py: matplotlib -> io.BytesIO -> base64 string                    |
| 8. exporters.py: reportlab, python-docx, CSV exporters streamed straight to bytes       |
+-----------------------------------------------------------------------------------------+
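Steps 2 and 5 of the pipeline, together with the Short/Medium/Detailed length ratios, can be sketched with the standard library alone. This is an illustration, not the project's implementation: plain power iteration replaces NetworkX, bag-of-words counts replace whatever sentence vectors the project uses, and the tier cut-offs (top 20%, next 30%) are assumptions.

```python
import math
import re
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def textrank_scores(sentences, damping=0.85, iterations=50):
    """Score sentences by PageRank-style power iteration over a similarity graph."""
    n = len(sentences)
    if n == 0:
        return []
    bags = [Counter(re.findall(r"[a-z']+", s.lower())) for s in sentences]
    # Edge weight is the cosine similarity; self-loops are excluded.
    weights = [[cosine(bags[i], bags[j]) if i != j else 0.0 for j in range(n)] for i in range(n)]
    out_sum = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) / n
            + damping * sum(scores[j] * weights[j][i] / out_sum[j] for j in range(n) if out_sum[j])
            for i in range(n)
        ]
    return scores


def summarize(sentences, ratio):
    """Keep the top `ratio` share of sentences, restored to document order."""
    if not sentences:
        return []
    keep = max(1, round(len(sentences) * ratio))
    scores = textrank_scores(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:keep]
    return [sentences[i] for i in sorted(top)]


def importance_tiers(scores, top=0.2, middle=0.5):
    """Label sentences by rank percentile; the cut-offs are assumed values."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    labels = [""] * len(scores)
    for rank, i in enumerate(order):
        share = rank / len(scores)
        if share < top:
            labels[i] = "Very Important"
        elif share < middle:
            labels[i] = "Key Concept"
        else:
            labels[i] = "Supporting Information"
    return labels


sentences = [
    "Graphs consist of nodes and edges.",
    "Edges connect pairs of nodes in graphs.",
    "Lunch is served at noon.",
    "Weighted graphs attach costs to edges.",
    "Nodes in graphs can carry labels.",
]
short_summary = summarize(sentences, 0.20)     # Short
detailed_summary = summarize(sentences, 0.60)  # Detailed
labels = importance_tiers(textrank_scores(sentences))
```

The off-topic lunch sentence shares no words with the others, so it has no edges in the graph, receives only the baseline score, and lands in the lowest tier.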

🚀 Installation & Setup

Requirements: Python 3.11 (recommended). No database. No login. In-memory processing only.

Option A — One-command setup (recommended)

Windows (PowerShell or CMD):

setup.bat

Mac / Linux:

chmod +x setup.sh
./setup.sh

This creates a virtual environment (venv), installs all dependencies from requirements.txt (stable versions, Transformers 4.x), and downloads the spaCy model en_core_web_sm.

Option B — Manual setup

  1. Clone or download the repository, then navigate to the project directory:
    cd smart_notes
  2. Create and activate a virtual environment:
    python -m venv venv
    # Windows:
    venv\Scripts\activate
    # macOS/Linux:
    source venv/bin/activate
  3. Install dependencies:
    pip install -r requirements.txt
  4. Download spaCy model (if not done by the app on first run):
    python -m spacy download en_core_web_sm

Note: The first run may download the HuggingFace model distilbart-cnn-12-6 and NLTK data; this can take a few minutes.

💻 How to Run

  1. Activate the virtual environment (if not already):
    # Windows:
    venv\Scripts\activate
    # macOS/Linux:
    source venv/bin/activate
  2. Start the application:
    python app.py
  3. Open your browser at: http://127.0.0.1:5000

Verify the pipeline (no server)

To confirm that summarization and NLP modules work without starting the web app:

python test_model.py

All six pipeline steps (text cleaning, extractive & abstractive summarization, keywords, importance, word cloud) are tested.

📚 Documentation

  • Models and libraries — List of every model and library (Flask, spaCy, NLTK, HuggingFace, YAKE, NetworkX, WordCloud, ReportLab, python-docx, etc.) and why each is used.

📸 Sample Screenshots Description

  1. Landing Page: A beautiful hero section featuring a pastel-indigo gradient header. Options to toggle between "Paste Text" and "Upload File" in a clean glass-effect card. Length control slider positioned clearly. Top right features a Dark Mode toggle (Moon/Sun icon).
  2. Analysis View (Results):
    • Top banner metrics show Original Format (words), Summarized (words), and Reduction (%).
    • Left side: Scrollable cards displaying the original source text, and below it, an Analysis Section categorizing extracted sentences by "⭐ Very Important", "✅ Key Concept", etc., with beautifully colored badges.
    • Right side: The Summaries panel combining Extractive and Abstractive approaches (controlled by aesthetic toggle buttons). Below it, Flashcards rendered as interactive 3D CSS flip-cards, alongside the generated Word Cloud image and pill-styled Keywords.

🔮 Future Improvements

  • Implement WebSocket streaming for the abstractive summarizer so the user sees text being generated in real time. Done: on the results page, switch to the "Abstractive" tab and click "Watch live generation" to stream the summary chunk by chunk over WebSocket.
  • Introduce advanced LLM QA generation for more robust questions on the flashcards rather than simply masking Named Entities.
  • Add multi-language summarization support utilizing XLM-RoBERTa.
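The chunk-by-chunk streaming mentioned in the WebSocket item can be pictured as a generator that splits the text on sentence boundaries and yields one partial summary per chunk. The sketch below assumes a 400-word default chunk budget and uses a hypothetical `first_sentence` stand-in where the BART summarizer would be called; the WebSocket transport itself is omitted.

```python
import re
from typing import Callable, Iterator, List


def chunk_sentences(text: str, max_words: int = 400) -> List[str]:
    """Group whole sentences into chunks of at most `max_words` words.

    A single sentence longer than the budget is kept whole in its own chunk.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current, count = [], [], 0
    for sentence in sentences:
        words = len(sentence.split())
        if current and count + words > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += words
    if current:
        chunks.append(" ".join(current))
    return chunks


def stream_summary(text: str, summarize_chunk: Callable[[str], str],
                   max_words: int = 400) -> Iterator[str]:
    """Yield one partial summary per chunk so a client can render them as they arrive."""
    for chunk in chunk_sentences(text, max_words):
        yield summarize_chunk(chunk)


def first_sentence(chunk: str) -> str:
    """Hypothetical stand-in for a model call: returns the chunk's first sentence."""
    return re.split(r"(?<=[.!?])\s+", chunk)[0]


text = "One two three. Four five six. Seven eight nine. Ten eleven twelve."
parts = list(stream_summary(text, first_sentence, max_words=6))
```

In a server, each yielded piece would be sent to the client as soon as it is produced instead of being collected into a list.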
