
lindsey98/PhishIntention

PhishIntention: Phishing detection through webpage intention.

Python · 258 stars · 23 forks · pushed Jun 5, 2026 · CC0-1.0



PhishIntention


<a href="https://www.usenix.org/conference/usenixsecurity22/presentation/liu-ruofan">Paper</a> • <a href="https://sites.google.com/view/phishintention">Website</a> • <a href="https://www.youtube.com/watch?v=yU7FrlSJ818">Video</a> • <a href="#citation">Citation</a>

</div>

Official implementation of "Inferring Phishing Intention via Webpage Appearance and Dynamics: A Deep Vision-Based Approach" (USENIX Security 2022).

Motivation

Existing reference-based phishing detectors only capture brand intention, which makes them prone to false positives. PhishIntention is, to our knowledge, the first system to analyze both brand intention and credential-taking intention in a systematic way, and it powers a phishing-monitoring pipeline that reports phishing webpages daily with state-of-the-art precision.

Framework

Input: a screenshot (and optionally the HTML source). Output: Phish/Benign and the predicted phishing target.

1. Abstract Layout Detector (AWL): detect page elements (logos, inputs, buttons, …).
2. OCR-aided Siamese Logo Comparison: if no brand is matched, return Benign; otherwise continue with the matched target.
3. CRP Classifier: decide whether the page requests credentials (CRP). If yes, go to step 5; if no and the CRP locator has not run yet, go to step 4; otherwise return Benign.
4. CRP Locator (dynamic analysis): click login/signup links to reach a credential page. On success, restart from step 1 with the updated URL and screenshot; on failure, return Benign.
5. Decision: a credential page with a matched brand inconsistent with the domain ⇒ Phish + target; otherwise Benign.
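
The control flow of the five steps can be sketched as follows. This is a minimal illustration, not the repository's code: the `Page` type and the callables `detect_layout`, `match_logo`, `is_crp`, and `locate_crp` are hypothetical stand-ins for the real modules, and `domain_matches` is a simplified assumption about how brand/domain consistency might be checked.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple
from urllib.parse import urlparse


@dataclass
class Page:
    """Hypothetical container for one page: its URL and a screenshot path."""
    url: str
    screenshot: str


def domain_matches(url: str, brand_domains: Sequence[str]) -> bool:
    """Simplified check: the host equals, or is a subdomain of, a brand domain."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in brand_domains)


def classify(
    page: Page,
    detect_layout: Callable[[Page], list],
    match_logo: Callable[[Page, list], Optional[Tuple[str, Sequence[str]]]],
    is_crp: Callable[[Page, list], bool],
    locate_crp: Callable[[Page], Optional[Page]],
) -> Tuple[str, Optional[str]]:
    """Return ("Phish", target) or ("Benign", None), following steps 1-5."""
    locator_used = False
    while True:
        elements = detect_layout(page)            # step 1: layout detection
        match = match_logo(page, elements)        # step 2: logo comparison
        if match is None:
            return "Benign", None
        target, brand_domains = match
        if not is_crp(page, elements):            # step 3: CRP classifier
            if locator_used:
                return "Benign", None
            locator_used = True
            next_page = locate_crp(page)          # step 4: CRP locator
            if next_page is None:
                return "Benign", None
            page = next_page                      # restart from step 1
            continue
        # step 5: decision on brand/domain consistency
        if domain_matches(page.url, brand_domains):
            return "Benign", None
        return "Phish", target


# Stubbed run: a credential page showing a PayPal logo on an unrelated domain.
result = classify(
    Page("http://paypa1-login.example/", "shot.png"),
    detect_layout=lambda p: ["logo", "input"],
    match_logo=lambda p, e: ("PayPal", ["paypal.com"]),
    is_crp=lambda p, e: True,
    locate_crp=lambda p: None,
)
print(result)  # ('Phish', 'PayPal')
```

The `locator_used` flag mirrors the "has not run yet" condition in step 3, so the dynamic analysis is attempted at most once per input.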

Project Structure

PhishIntention/
├── src/phishintention/           # Importable package (installed via pyproject.toml)
│   ├── pipeline.py               #   Entry point / pipeline orchestrator (python -m phishintention)
│   ├── config.py                 #   Loads configs and builds all models
│   ├── modules/                  #   Pipeline components
│   │   ├── awl_detector.py       #     Abstract layout detector (Faster R-CNN)
│   │   ├── crp_classifier.py     #     Credential-requiring-page classifier + HTML heuristic
│   │   ├── crp_locator.py        #     Dynamic analysis to locate credential pages
│   │   └── logo_matching.py      #     OCR-aided Siamese logo matcher
│   ├── networks/                 #   Neural-network architectures
│   │   ├── bit_backbone.py       #     Shared BiT ResNet-v2 building blocks
│   │   ├── crp_models.py         #     CRP classifier networks
│   │   ├── siamese_models.py     #     Siamese logo-matching network
│   │   └── ocr/                  #     Vendored OCR encoder (ASTER text recognizer)
│   └── utils/                    #   Shared helpers (image/brand/web)
├── configs/                      # Config: configs.yaml + detectron2/ detector configs
├── scripts/                      # Install/setup scripts (PyTorch, Detectron2, Chrome, weights)
├── tests/                        # Unit tests
├── datasets/                     # Example test sites
└── pyproject.toml                # Package metadata + build configuration

# Created at setup time (downloaded by scripts/setup.sh|setup.bat, not committed):
models/                          # Model weights (*.pth) + reference data
                                 #   (expand_targetlist = brand logos, domain_map.pkl = brand→domain)

Installation

The setup scripts auto-detect your system and GPU and install the appropriate PyTorch and Detectron2 builds (CUDA if an NVIDIA GPU is present, otherwise CPU). They also download all model weights and the reference list into models/.
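
The CUDA-or-CPU choice can be approximated as shown below. This is only a sketch of one plausible detection strategy (probing for the `nvidia-smi` executable); the actual logic in `scripts/` may differ, and `pick_torch_build` is a hypothetical name.

```python
import shutil
import subprocess


def pick_torch_build(which=shutil.which, run=subprocess.run) -> str:
    """Return "cuda" if an NVIDIA driver reports a GPU, otherwise "cpu"."""
    smi = which("nvidia-smi")
    if smi is None:
        return "cpu"
    try:
        # `nvidia-smi -L` lists the installed GPUs, one per line.
        proc = run([smi, "-L"], capture_output=True, text=True, timeout=10)
    except (OSError, subprocess.SubprocessError):
        return "cpu"
    return "cuda" if proc.returncode == 0 and "GPU" in proc.stdout else "cpu"


print(pick_torch_build())
```

The `which` and `run` parameters exist only so the function can be exercised without a GPU.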

Option A — Docker (recommended for Linux/Windows)

git clone https://github.com/lindsey98/PhishIntention.git
cd PhishIntention
docker build -t phishintention .

docker run --rm phishintention \
  pixi run python -m phishintention --folder datasets/test_sites --output_fn test.json

Option B — Native install

Prerequisite: Pixi.

<details> <summary><b>Linux</b></summary>

git clone https://github.com/lindsey98/PhishIntention.git
cd PhishIntention
export KMP_DUPLICATE_LIB_OK=TRUE

# Install pixi (restart your terminal afterwards)
curl -fsSL https://pixi.sh/install.sh | sh

# Install Chrome (Ubuntu/Debian)
wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
sudo dpkg -i google-chrome-stable_current_amd64.deb
sudo apt-get install -f

pixi install
chmod +x scripts/setup.sh && ./scripts/setup.sh

</details>

<details> <summary><b>macOS</b></summary>

git clone https://github.com/lindsey98/PhishIntention.git
cd PhishIntention
export KMP_DUPLICATE_LIB_OK=TRUE

# Install pixi (restart your terminal afterwards)
curl -fsSL https://pixi.sh/install.sh | sh

# Install Chrome and expose it on PATH (use ~/.zshrc for zsh)
brew install --cask google-chrome
echo 'export CHROME_BIN="/Applications/Google Chrome.app/Contents/MacOS/Google Chrome"' >> ~/.bash_profile
echo 'export PATH="/Applications/Google Chrome.app/Contents/MacOS:$PATH"' >> ~/.bash_profile
source ~/.bash_profile

pixi install
chmod +x scripts/setup.sh && ./scripts/setup.sh

</details>

<details> <summary><b>Windows</b></summary>

git clone https://github.com/lindsey98/PhishIntention.git
cd PhishIntention

# Install latest Chrome and ChromeDriver
.\scripts\chrome_setup.bat

# Install pixi (restart your terminal afterwards)
powershell -ExecutionPolicy ByPass -c "irm -useb https://pixi.sh/install.ps1 | iex"

pixi install
.\scripts\setup.bat

</details>

If the automatic PyTorch/Detectron2 install fails, run it interactively with pixi run python scripts/auto_install_detectron2.py, or install PyTorch and Detectron2 manually.

ChromeDriver

The setup scripts install a matching ChromeDriver automatically. To pin it manually, check your Chrome version (google-chrome --version or chrome://version), download the matching driver from this repository, and place the binary under ./chromedriver/.
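
When pinning manually, ChromeDriver generally needs to match Chrome's major version. A small helper for extracting that number is sketched below; it assumes the usual `Google Chrome <version>` output of `google-chrome --version`, and `chrome_major_version` is a hypothetical name rather than part of this project.

```python
import re
from typing import Optional


def chrome_major_version(version_output: str) -> Optional[int]:
    """Extract the major version from text such as 'Google Chrome 120.0.6099.109'."""
    match = re.search(r"(\d+)\.\d+\.\d+\.\d+", version_output)
    return int(match.group(1)) if match else None


print(chrome_major_version("Google Chrome 120.0.6099.109"))  # 120
```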

Usage

pixi run python -m phishintention --folder datasets/test_sites --output_fn test.json

On the first run, the reference list is embedded and cached to LOGO_FEATS.npy, which can take a few minutes.

The input --folder must contain one sub-directory per site:

test_site_1/
├── info.txt    # the URL (required)
├── shot.png    # the screenshot (required)
└── html.txt    # the HTML source (optional)
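
Before a long run it can help to verify that every site folder contains the required files. The checker below is a sketch based only on the layout described above; `find_incomplete_sites` is a hypothetical helper, not part of the package.

```python
from pathlib import Path
from typing import Dict, List

REQUIRED_FILES = ("info.txt", "shot.png")  # html.txt is optional


def find_incomplete_sites(folder: str) -> Dict[str, List[str]]:
    """Map each site sub-directory to the required files it is missing."""
    problems: Dict[str, List[str]] = {}
    for site_dir in sorted(Path(folder).iterdir()):
        if not site_dir.is_dir():
            continue
        missing = [name for name in REQUIRED_FILES if not (site_dir / name).is_file()]
        if missing:
            problems[site_dir.name] = missing
    return problems


if __name__ == "__main__":
    root = Path("datasets/test_sites")
    if root.is_dir():
        for site, missing in find_incomplete_sites(str(root)).items():
            print(f"{site}: missing {', '.join(missing)}")
```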

Results are written to the --output_fn JSON file; a predict.png visualization is saved next to the screenshot when phishing is detected.

Baselines

The phishing detection and identification baselines from our paper are available here.

Citation

@inproceedings{liu2022inferring,
  title={Inferring Phishing Intention via Webpage Appearance and Dynamics: A Deep Vision Based Approach},
  author={Liu, Ruofan and Lin, Yun and Yang, Xianglin and Ng, Siang Hwee and Divakaran, Dinil Mon and Dong, Jin Song},
  booktitle={31st USENIX Security Symposium (USENIX Security 22)},
  year={2022}
}

Contact

For questions, open an issue or email liu.ruofan16@u.nus.edu, lin_yun@sjtu.edu.cn, or dcsdjs@nus.edu.sg.