adbar/trafilatura
Python & command-line tool to gather text on the Web: web crawling/scraping, extraction of text, metadata, comments
Language: Python
Total stars: 1562
Stars trend:
14 Aug 2023
7pm ██▎ +18
8pm ███▎ +26
9pm ███▊ +30
10pm ██▍ +19
11pm ██ +16
15 Aug 2023
12am ██ +16
1am █▊ +14
2am ███ +24
3am █▋ +13
4am ██▌ +20
5am ██▌ +20
#python
#articleextractor, #corpus, #corpusbuilder, #corpustools, #crawler, #htmltomarkdown, #html2text, #news, #newsaggregator, #newscrawler, #nlp, #readability, #rssfeed, #scraping, #tei, #textcleaning, #textextraction, #textmining, #textpreprocessing, #webscraping
mendableai/firecrawl
🔥 Turn entire websites into LLM-ready markdown
Language: TypeScript
Total stars: 106
Stars trend:
16 Apr 2024
5pm ▋ +5
6pm ██▍ +19
7pm ██▎ +18
8pm █▊ +14
9pm █▎ +10
10pm █ +8
11pm ▎ +2
17 Apr 2024
12am ██▎ +18
#typescript
#ai, #crawler, #data, #htmltomarkdown, #llm, #markdown, #rag, #scraper, #scraping, #webcrawler
JohannesKaufmann/html-to-markdown
⚙️ Convert HTML to Markdown. Even works with entire websites and can be extended through rules.
Language: Go
Total stars: 1097
Stars trend:
9 Nov 2024
10am █▋ +13
11am ████████▏ +65
12pm ████▍ +35
1pm ███▊ +30
2pm ███▋ +29
#go
#cli, #converter, #go, #golang, #html, #htmltomarkdown, #markdown
mendableai/firecrawl
🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API.
Language: TypeScript
Total stars: 21906
Stars trend:
20 Jan 2025
5pm ▏ +1
6pm ▎ +2
7pm ▉ +7
8pm ▊ +6
9pm ▉ +7
10pm █ +8
11pm ▋ +5
21 Jan 2025
12am ▉ +7
1am ▌ +4
2am ▍ +3
3am ██ +16
4am █▍ +11
#typescript
#ai, #aiscraping, #crawler, #data, #htmltomarkdown, #llm, #markdown, #rag, #scraper, #scraping, #webcrawler, #webscraping
mendableai/firecrawl
🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API.
Language: TypeScript
Total stars: 43926
Stars trend:
30 Jul 2025
11pm ▏ +1
31 Jul 2025
12am ▉ +7
1am ▌ +4
2am █▎ +10
3am █▎ +10
4am ▌ +4
5am █▏ +9
6am █▍ +11
7am █▏ +9
8am ▋ +5
9am █▏ +9
10am █▍ +11
#typescript
#ai, #aiscraping, #crawler, #data, #htmltomarkdown, #llm, #markdown, #rag, #scraper, #scraping, #webcrawler, #webscraping
any4ai/AnyCrawl
AnyCrawl 🚀: A Node.js/TypeScript crawler that turns websites into LLM-ready data and extracts structured SERP results from Google/Bing/Baidu/etc. Native multi-threading for bulk processing.
Language: TypeScript
Total stars: 1201
Stars trend:
31 Jul 2025
4pm █ +8
5pm █▍ +11
6pm ▉ +7
7pm ▋ +5
8pm █▍ +11
9pm █▏ +9
10pm ▉ +7
11pm █ +8
1 Aug 2025
12am █▌ +12
1am ▊ +6
#typescript
#aiscraping, #aitools, #crawl, #data, #htmltomarkdown, #rag, #scrape, #scraping, #serp, #webscraper