Welcome to Incels.is - Involuntary Celibate Forum

What if we made a locally run incel search engine?

fukurou

Everyone knows mainstream search engines are absolute trash.
They show nothing but paid results and the same 100 or so big sites that circle-jerk each other.

What if we ran search engines locally?
 
Would be cool.
 

Make another thread in suggestions and ask the jannies.

 
Python:
import requests
from bs4 import BeautifulSoup
from collections import deque
from urllib.parse import urljoin, urlparse


class LocalSearchEngine:
    def __init__(self, start_url, max_pages=50):
        self.start_url = start_url
        self.max_pages = max_pages
        self.index = {}  # url -> extracted page text

    def crawl(self):
        """Breadth-first crawl of pages on the same host as start_url."""
        host = urlparse(self.start_url).netloc
        visited = set()
        to_visit = deque([self.start_url])  # deque: O(1) pops from the left

        print(f"[Crawler] Starting crawl at: {self.start_url}")

        # Stop once max_pages pages have been indexed successfully
        while to_visit and len(self.index) < self.max_pages:
            url = to_visit.popleft()
            if url in visited:
                continue
            # Mark before fetching so a failing URL is not retried later
            visited.add(url)

            print(f"[Crawler] Fetching: {url}")

            try:
                r = requests.get(url, timeout=5)
            except requests.RequestException as e:
                print(f"[Crawler] Error fetching {url}: {e}")
                continue

            soup = BeautifulSoup(r.text, "html.parser")
            self.index[url] = soup.get_text(" ", strip=True)

            # Discover new links, staying on the start URL's host
            for link in soup.find_all("a", href=True):
                new = urljoin(url, link["href"])
                if urlparse(new).netloc == host and new not in visited:
                    to_visit.append(new)

        print(f"[Crawler] Done. Indexed {len(self.index)} pages.\n")

    def search(self, query):
        q = query.lower()
        results = []

        for url, text in self.index.items():
            if q in text.lower():
                results.append(url)

        return results


def main():
    start_url = "https://incels.is/"  # change this to your domain
    engine = LocalSearchEngine(start_url, max_pages=30)

    engine.crawl()

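    while True:
        try:
            q = input("Search query: ").strip()
        except (EOFError, KeyboardInterrupt):
            # Ctrl+D / Ctrl+C ends the session without a traceback
            print()
            break
        if not q:
            continue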

        hits = engine.search(q)

        print("\nResults:")
        if not hits:
            print("  No matches found.")
        else:
            for h in hits:
                print(" -", h)
        print()


if __name__ == "__main__":
    main()

[Crawler] Starting crawl at: https://incels.is/
[Crawler] Fetching: https://incels.is/
[Crawler] Fetching: https://incels.is/login/
[Crawler] Fetching: https://incels.is/login/register
[Crawler] Fetching: https://incels.is/whats-new/
[Crawler] Fetching: https://incels.is/whats-new/posts/
[Crawler] Fetching: https://incels.is/misc/contact
[Crawler] Fetching: https://incels.is/forums/-/index.rss
C:\Users\Lenovo\PycharmProjects\sandbox1\main graveyard.py:26: XMLParsedAsHTMLWarning: It looks like you're using an HTML parser to parse an XML document.

Assuming this really is an XML document, what you're doing might work, but you should know that using an XML parser will be more reliable. To parse this document as XML, make sure you have the Python package 'lxml' installed, and pass the keyword argument `features="xml"` into the BeautifulSoup constructor.

If you want or need to use an HTML parser on this document, you can make this warning go away by filtering it. To do that, run this code before calling the BeautifulSoup constructor:

from bs4 import XMLParsedAsHTMLWarning
import warnings

warnings.filterwarnings("ignore", category=XMLParsedAsHTMLWarning)

soup = BeautifulSoup(r.text, "html.parser")
[Crawler] Fetching: https://incels.is/#site-information.26
[Crawler] Fetching: https://incels.is/link-forums/rules-faq.27/
[Crawler] Fetching: https://incels.is/#incels.1
[Crawler] Fetching: https://incels.is/forums/must-read-content.23/
[Crawler] Fetching: https://incels.is/forums/must-read-content.23/index.rss
[Crawler] Fetching: https://incels.is/threads/i-expected-stories-like-this-from-jewmerica-not-south-africa.692848/latest
[Crawler] Fetching: https://incels.is/threads/i-expecte...wmerica-not-south-africa.692848/post-24202018
[Crawler] Fetching: https://incels.is/forums/inceldom-discussion.2/
[Crawler] Fetching: https://incels.is/forums/inceldom-discussion.2/index.rss
[Crawler] Fetching: https://incels.is/threads/incel-trait-u-like-hentai.867212/latest
[Crawler] Fetching: https://incels.is/threads/incel-trait-u-like-hentai.867212/post-24206885
[Crawler] Fetching: https://incels.is/#copes.41
[Crawler] Fetching: https://incels.is/forums/gaming.35/
[Crawler] Fetching: https://incels.is/forums/gaming.35/index.rss
[Crawler] Fetching: https://incels.is/threads/whats-the-best-gacha-game.868314/latest
[Crawler] Fetching: https://incels.is/threads/whats-the-best-gacha-game.868314/post-24206337
[Crawler] Fetching: https://incels.is/forums/anime-manga.36/
[Crawler] Fetching: https://incels.is/forums/anime-manga.36/index.rss
[Crawler] Fetching: https://incels.is/threads/has-anyon...spired-movie-polytechnique-2009.868527/latest
[Crawler] Fetching: https://incels.is/threads/has-anyon...movie-polytechnique-2009.868527/post-24206787
[Crawler] Fetching: https://incels.is/#offtopic.24
[Crawler] Fetching: https://incels.is/forums/the-lounge.4/
[Crawler] Fetching: https://incels.is/forums/the-lounge.4/index.rss
[Crawler] Done. Indexed 30 pages.

Search query: foid

Results:
- https://incels.is/
- https://incels.is/whats-new/
- https://incels.is/whats-new/posts/
- https://incels.is/forums/-/index.rss
- https://incels.is/#site-information.26
- https://incels.is/link-forums/rules-faq.27/
- https://incels.is/#incels.1
- https://incels.is/forums/must-read-content.23/
- https://incels.is/forums/must-read-content.23/index.rss
- https://incels.is/threads/i-expected-stories-like-this-from-jewmerica-not-south-africa.692848/latest
- https://incels.is/threads/i-expecte...wmerica-not-south-africa.692848/post-24202018
- https://incels.is/forums/inceldom-discussion.2/
- https://incels.is/forums/inceldom-discussion.2/index.rss
- https://incels.is/threads/incel-trait-u-like-hentai.867212/latest
- https://incels.is/threads/incel-trait-u-like-hentai.867212/post-24206885
- https://incels.is/#copes.41
- https://incels.is/forums/gaming.35/
- https://incels.is/forums/gaming.35/index.rss
- https://incels.is/threads/whats-the-best-gacha-game.868314/latest
- https://incels.is/threads/whats-the-best-gacha-game.868314/post-24206337
- https://incels.is/forums/anime-manga.36/
- https://incels.is/forums/anime-manga.36/index.rss
- https://incels.is/#offtopic.24
- https://incels.is/forums/the-lounge.4/
- https://incels.is/forums/the-lounge.4/index.rss

Search query: fukurou

Results:
- https://incels.is/
- https://incels.is/whats-new/
- https://incels.is/whats-new/posts/
- https://incels.is/forums/-/index.rss
- https://incels.is/#site-information.26
- https://incels.is/#incels.1
- https://incels.is/#copes.41
- https://incels.is/forums/gaming.35/
- https://incels.is/forums/gaming.35/index.rss
- https://incels.is/forums/anime-manga.36/
- https://incels.is/forums/anime-manga.36/index.rss
- https://incels.is/#offtopic.24
- https://incels.is/forums/the-lounge.4/
- https://incels.is/forums/the-lounge.4/index.rss

Search query: creamed corn

Results:
No matches found.

Search query: slut

Results:
- https://incels.is/whats-new/posts/
- https://incels.is/forums/-/index.rss
- https://incels.is/forums/must-read-content.23/index.rss
- https://incels.is/forums/inceldom-discussion.2/
- https://incels.is/forums/inceldom-discussion.2/index.rss
- https://incels.is/forums/the-lounge.4/
- https://incels.is/forums/the-lounge.4/index.rss
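
One thing the run above shows: the crawler spends fetches on `#fragment` variants of the homepage (for example `https://incels.is/#copes.41`) and on `index.rss` feeds, which is most likely where the XMLParsedAsHTMLWarning comes from. Below is a minimal sketch of a filter that could run before a link is queued, using only the standard library. The helper names `normalize_url`, `is_probably_feed` and `filter_links` are hypothetical and not part of the code above, and the feed check is a rough heuristic based on the URL suffix and the Content-Type header.

```python
from urllib.parse import urldefrag, urlparse

# Assumption: feeds can be recognised by these suffixes or MIME types.
FEED_SUFFIXES = (".rss", ".xml", ".atom")
FEED_TYPES = {
    "text/xml",
    "application/xml",
    "application/rss+xml",
    "application/atom+xml",
}


def normalize_url(url):
    """Drop the #fragment so page anchors collapse onto one URL."""
    clean, _fragment = urldefrag(url)
    return clean


def is_probably_feed(url, content_type=""):
    """Heuristic: feed-like path, or an XML/RSS/Atom Content-Type."""
    path = urlparse(url).path.lower()
    mime = content_type.split(";")[0].strip().lower()
    return path.endswith(FEED_SUFFIXES) or mime in FEED_TYPES


def filter_links(links):
    """Normalize links, drop feeds, and remove duplicates, keeping order."""
    seen = set()
    kept = []
    for link in links:
        url = normalize_url(link)
        if url in seen or is_probably_feed(url):
            continue
        seen.add(url)
        kept.append(url)
    return kept
```

Inside `crawl`, this would mean passing each discovered link through `normalize_url` before the `visited` check, and skipping a response when `is_probably_feed(url, r.headers.get("Content-Type", ""))` is true.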
 
Mogs me if you have the technical skills to program a full-blown search engine that indexes the entire internet :shock:
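
On the gap between this and a full engine: the substring search above treats a page that contains the word once the same as a page that is all about it, so common words return nearly everything. One small step up is an inverted index with term-frequency scoring. This is a rough sketch, assuming the same url -> text dictionary the crawler builds. `tokenize`, `build_index` and `ranked_search` are made-up names, the scoring is plain term counts (nothing like BM25 or link analysis), and it matches whole words with AND semantics rather than substrings.

```python
import re
from collections import Counter, defaultdict

TOKEN_RE = re.compile(r"[a-z0-9]+")


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return TOKEN_RE.findall(text.lower())


def build_index(pages):
    """pages: dict of url -> text. Returns dict of term -> {url: count}."""
    index = defaultdict(dict)
    for url, text in pages.items():
        for term, count in Counter(tokenize(text)).items():
            index[term][url] = count
    return index


def ranked_search(index, query):
    """Return URLs containing every query term, highest total count first."""
    terms = tokenize(query)
    if not terms:
        return []
    postings = [index.get(term, {}) for term in terms]
    # Keep only pages that contain all query terms
    common = set(postings[0]).intersection(*postings[1:])
    scores = {url: sum(p[url] for p in postings) for url in common}
    # Sort by score descending, then by URL for a stable order
    return sorted(scores, key=lambda u: (-scores[u], u))
```

With the crawler above this could be wired in as `index = build_index(engine.index)` after `engine.crawl()`, then `ranked_search(index, q)` in place of `engine.search(q)`.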
 
Just began for Incels.searchcels
 
We would need seed sites as a starting point for the crawler.
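
Seeding could look something like this: a frontier that starts from several seed URLs and only follows links to hosts that appear among the seeds. It is a sketch under assumptions. `fetch_links` is a hypothetical callable (URL in, list of absolute links out) standing in for the requests + BeautifulSoup step, and `crawl_from_seeds` is an illustrative name rather than anything from the code posted earlier.

```python
from collections import deque
from urllib.parse import urlparse


def crawl_from_seeds(seeds, fetch_links, max_pages=50):
    """Breadth-first crawl from several seeds, restricted to the seeds' hosts.

    fetch_links(url) must return an iterable of absolute URLs found on the page.
    Returns the visited URLs in the order they were fetched.
    """
    allowed_hosts = {urlparse(seed).netloc for seed in seeds}
    unique_seeds = list(dict.fromkeys(seeds))  # drop duplicate seeds, keep order
    queue = deque(unique_seeds)
    queued = set(unique_seeds)  # everything ever queued, to avoid repeats
    order = []

    while queue and len(order) < max_pages:
        url = queue.popleft()
        order.append(url)
        for link in fetch_links(url):
            if urlparse(link).netloc in allowed_hosts and link not in queued:
                queued.add(link)
                queue.append(link)
    return order
```

Because the queue is shared, pages from different seeds are interleaved level by level, so one large site does not use up the whole `max_pages` budget before the other seeds are touched.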
 
