adbar/trafilatura
Python & command-line tool to gather text on the Web: web crawling/scraping, extraction of text, metadata, comments
Language: Python
Total stars: 1562
Stars trend:
14 Aug 2023
7pm ██▎ +18
8pm ███▎ +26
9pm ███▊ +30
10pm ██▍ +19
11pm ██ +16
15 Aug 2023
12am ██ +16
1am █▊ +14
2am ███ +24
3am █▋ +13
4am ██▌ +20
5am ██▌ +20
#python
#articleextractor, #corpus, #corpusbuilder, #corpustools, #crawler, #htmltomarkdown, #html2text, #news, #newsaggregator, #newscrawler, #nlp, #readability, #rssfeed, #scraping, #tei, #textcleaning, #textextraction, #textmining, #textpreprocessing, #webscraping
mendableai/firecrawl
🔥 Turn entire websites into LLM-ready markdown
Language: TypeScript
Total stars: 106
Stars trend:
16 Apr 2024
5pm ▋ +5
6pm ██▍ +19
7pm ██▎ +18
8pm █▊ +14
9pm █▎ +10
10pm █ +8
11pm ▎ +2
17 Apr 2024
12am ██▎ +18
#typescript
#ai, #crawler, #data, #htmltomarkdown, #llm, #markdown, #rag, #scraper, #scraping, #webcrawler
JohannesKaufmann/html-to-markdown
⚙️ Convert HTML to Markdown. Even works with entire websites and can be extended through rules.
Language: Go
Total stars: 1097
Stars trend:
9 Nov 2024
10am █▋ +13
11am ████████▏ +65
12pm ████▍ +35
1pm ███▊ +30
2pm ███▋ +29
#go
#cli, #converter, #go, #golang, #html, #htmltomarkdown, #markdown
mendableai/firecrawl
🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API.
Language: TypeScript
Total stars: 21906
Stars trend:
20 Jan 2025
5pm ▏ +1
6pm ▎ +2
7pm ▉ +7
8pm ▊ +6
9pm ▉ +7
10pm █ +8
11pm ▋ +5
21 Jan 2025
12am ▉ +7
1am ▌ +4
2am ▍ +3
3am ██ +16
4am █▍ +11
#typescript
#ai, #aiscraping, #crawler, #data, #htmltomarkdown, #llm, #markdown, #rag, #scraper, #scraping, #webcrawler, #webscraping