Text Extractor For Web Pages and Text
Use a regular expression to find and extract anything in text, web pages, data files, ...
What can this tool do?
Use this tool to extract anything from web pages and data files into a CSV file.
The output is one or more columns of results. You can view it below or as an Excel file.
You can scan a list of URLs and extract text from each page.
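Scanning a list of URLs can be pictured roughly as follows. This is a minimal sketch, not the tool's actual implementation: the names fetch_text and scan_urls are hypothetical, and the fetcher is passed in as a parameter so the logic can be tried without network access.

```python
import re
from urllib.request import urlopen


def fetch_text(url):
    """Download a page and decode it (assumed helper, not the tool's own code)."""
    with urlopen(url, timeout=10) as resp:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def scan_urls(urls, pattern, fetch=fetch_text):
    """Return (source_url, match) pairs for every regex match on every page."""
    rows = []
    for url in urls:
        for m in re.finditer(pattern, fetch(url)):
            rows.append((url, m.group(0)))
    return rows
```

Keeping the source URL next to each match mirrors the option to output the originating URL when scanning a list of pages.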
What are my options?
You can choose the number of results per line (default 1).
You can choose the output delimiter for multi-column output, or use a comma (the default).
You can remove duplicate results.
You can sort the results.
You can add a heading.
You can force all results to lower case.
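The options above can be sketched in a few lines of Python. This is an illustration under assumptions: the function name extract, its parameters, and the order in which the options are applied are all hypothetical, and the tool itself may behave differently.

```python
import re


def extract(text, pattern, columns=1, delimiter=",", dedupe=False,
            sort=False, lower=False, heading=None):
    """Return delimited output lines built from every regex match in text."""
    # group(0) is the full match, even when the pattern contains groups.
    results = [m.group(0) for m in re.finditer(pattern, text)]
    if lower:
        results = [r.lower() for r in results]
    if dedupe:
        # dict.fromkeys drops repeats while keeping first-seen order.
        results = list(dict.fromkeys(results))
    if sort:
        results.sort()
    lines = [heading] if heading else []
    # Group the results into rows of `columns` items each.
    for i in range(0, len(results), columns):
        lines.append(delimiter.join(results[i:i + columns]))
    return lines
```

For example, extract("Call 555-1234 or 555-9876, then 555-1234", r"\d{3}-\d{4}", columns=2, dedupe=True, heading="phone") returns two lines: the heading, then both distinct numbers joined by a comma.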
See also
Phone Extractor
Step 1: Select your input
Enter Data
Choose File
Enter URL
Scan list of web pages in input
Enter the Regular Expression:
* This is required
Test using Regex101.com
Whole Numbers
Words
YYYY/MM/DD
HTML Tags
Alphanumeric
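The preset names above suggest patterns along the following lines. The regular expressions in this sketch are guesses chosen for illustration; the page does not show the patterns the tool actually uses.

```python
import re

# Hypothetical patterns for each preset name; the tool's own may differ.
PRESETS = {
    "Whole Numbers": r"\d+",
    "Words": r"[A-Za-z]+",
    "YYYY/MM/DD": r"\d{4}/\d{2}/\d{2}",
    "HTML Tags": r"<[^>]+>",
    "Alphanumeric": r"[A-Za-z0-9]+",
}

sample = "<b>Order 42</b> shipped 2024/01/15"
for name, pattern in PRESETS.items():
    print(name, re.findall(pattern, sample))
```

Note that a simple pattern such as \d+ also matches the digits inside the date, so a more specific pattern is needed when only standalone numbers are wanted.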
Choose File
Encoding
-Default-
ISO-8859-1 (Latin No. 1)
ISO-8859-2 (Latin No. 2)
ISO-8859-3 (Latin No. 3)
ISO-8859-4 (Latin No. 4)
ISO-8859-5 (Latin/Cyrillic)
ISO-8859-6 (Latin/Arabic)
ISO-8859-7 (Latin/Greek)
ISO-8859-8 (Latin/Hebrew)
ISO-8859-9 (Latin No. 5)
ISO-8859-13 (Latin No. 7)
ISO-8859-15 (Latin No. 9)
Mac OS Roman
UTF-8
UTF-16
UTF-16 (Big-Endian)
UTF-16 (Little-Endian)
UTF-32
UTF-32 (Big-Endian)
UTF-32 (Little-Endian)
windows-1250 (Win East European)
windows-1251 (Win Cyrillic)
windows-1252 (Win Latin-1)
windows-1253 (Win Greek)
windows-1254 (Win Turkish)
windows-1255 (Win Hebrew)
windows-1256 (Win Arabic)
windows-1257 (Win Baltic)
windows-1258 (Win Vietnamese)
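The encoding choice matters because the same bytes decode to different text under different encodings. A minimal sketch in Python, where decode_bytes is a hypothetical helper and not part of the tool:

```python
def decode_bytes(raw: bytes, encoding: str = "utf-8") -> str:
    """Decode file bytes using the chosen encoding."""
    # errors="replace" substitutes U+FFFD for undecodable bytes instead of raising.
    return raw.decode(encoding, errors="replace")


raw = "café".encode("windows-1252")          # b'caf\xe9'
print(decode_bytes(raw, "windows-1252"))     # decoded with the matching encoding
print(decode_bytes(raw, "utf-8"))            # wrong encoding: last character is replaced
```

If extracted text shows replacement characters or unexpected accented letters, the file was probably read with the wrong encoding.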
Enter URL as data source
Step 2: Choose output options
(optional)
Output Options
Output Field Separator:
,
;
:
Bar-|
Tab
Other-Choose
Include header in first row
# of Columns Per Line:
Minimum Number of characters:
Maximum Number of characters:
Sort results
Use lower case on results
Remove duplicate results
Result contains this string
Is regular expression
Force CSV style output
Append results
If scanning a list of web pages, output the From URL also
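The length limits and the "Result contains this string" filter can be pictured as a pass over the extracted results. The sketch below is an assumption about how such a filter might work; filter_results and its parameters are hypothetical names.

```python
import re


def filter_results(results, min_len=0, max_len=None, contains=None, is_regex=False):
    """Keep results within the length limits that also contain the given string."""
    kept = []
    for r in results:
        if len(r) < min_len:
            continue
        if max_len is not None and len(r) > max_len:
            continue
        if contains:
            # Treat `contains` as a regex only when asked to; otherwise plain substring.
            found = re.search(contains, r) if is_regex else contains in r
            if not found:
                continue
        kept.append(r)
    return kept
```

For example, filter_results(["a@x.io", "bob@example.com", "n/a"], min_len=4, contains="@") keeps the two email-like strings and drops "n/a".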
Step 3: Extract Text
Result Data:
Save your result:
.csv
Download Result
EOL:
CRLF
LF