How to Make a Simple Email Extractor in Python?

Ganga Siva Krishna Palla
Feb 24, 2019

Web Scraping?

Web scraping, web harvesting, or web data extraction is data scraping used to extract data from websites. Web scraping software may access the World Wide Web directly using the Hypertext Transfer Protocol (HTTP) or through a web browser.

How does it work?

First, it sends an HTTP GET request to a specific website. Then it parses the HTML document it receives in response. Next, the scraper searches the document for the data you need and, finally, converts that data into the specified format.
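These four steps can be sketched with only the Python standard library. This is a minimal illustration, not the author's code. It swaps in urllib.request for requests and html.parser for BeautifulSoup so it runs without extra installs. The choice of what to "search for" (the page title and the link targets) and the output format (JSON) are assumptions made for the example, and the function and class names are hypothetical.

```python
import json
from html.parser import HTMLParser
from urllib.request import Request, urlopen


def fetch(url: str) -> str:
    """Step 1: send an HTTP GET request and return the response body as text."""
    req = Request(url, headers={"User-Agent": "simple-scraper-example"})
    with urlopen(req, timeout=10) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


class TitleAndLinks(HTMLParser):
    """Step 2: parse the HTML, collecting the <title> text and every href."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def scrape(html: str) -> str:
    """Steps 3 and 4: pick out the wanted data and convert it to JSON."""
    parser = TitleAndLinks()
    parser.feed(html)
    return json.dumps({"title": parser.title.strip(), "links": parser.links})
```

Usage (this needs network access): `print(scrape(fetch("https://kore.ai/")))`. Keeping `fetch` separate from `scrape` means the parsing logic can be tried on any HTML string.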

Popular Python modules for web scraping:

  • Mechanize
  • BeautifulSoup
  • Selenium
  • lxml
  • Scrapy

Building an Email Extractor in Python

Here are the primary steps involved in crawling:

  • Defining the source, i.e., the website
  • Checking feasibility via the site's robots.txt file
  • Using the source URL to crawl the web page
  • Fetching content
  • Extracting outgoing links from the page
  • Crawling the new pages
  • Deduplication, so that only newly discovered links are crawled (visited URLs can be kept in a database)
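Several of these steps (the robots.txt check, extracting outgoing links, crawling the new pages, and skipping URLs already seen) fit into a small breadth-first crawler. This is a sketch under stated assumptions, not the author's implementation. The `fetch` function is passed in so the crawl logic can be tried without network access. Visited URLs are kept in an in-memory set rather than a database, and the crawl stays on the starting host. The `max_pages` limit is a hypothetical safeguard. robots.txt rules are checked with the standard library's `urllib.robotparser`.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, List, Optional
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.robotparser import RobotFileParser


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag on a page."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def crawl(start_url: str,
          fetch: Callable[[str], str],
          robots: Optional[RobotFileParser] = None,
          user_agent: str = "*",
          max_pages: int = 20) -> List[str]:
    """Breadth-first crawl of start_url's host; returns the pages fetched, in order."""
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}  # deduplication: each URL is queued at most once
    crawled: List[str] = []
    while queue and len(crawled) < max_pages:
        url = queue.popleft()
        # Feasibility check: skip URLs that robots.txt disallows for this agent
        if robots is not None and not robots.can_fetch(user_agent, url):
            continue
        html = fetch(url)
        crawled.append(url)
        # Extract outgoing links from the page
        parser = LinkCollector()
        parser.feed(html)
        for href in parser.hrefs:
            # Resolve relative links and drop #fragments before comparing
            link, _ = urldefrag(urljoin(url, href))
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return crawled
```

For a live site, a `RobotFileParser` can be pointed at `https://<site>/robots.txt` with `set_url()` and loaded with `read()`. A `fetch` built on `requests.get(url, timeout=10).text` could then be passed in.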

Building a basic web crawler in Python is straightforward.

Here, I am using the requests module to send a request to a website and BeautifulSoup to parse the content.

Basic code for extracting all links from a page:

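import requests
from bs4 import BeautifulSoup

url = 'https://kore.ai/'
# Send an HTTP GET request; fail fast on network or HTTP errors
response = requests.get(url, timeout=10)
response.raise_for_status()
# Parse the HTML and print the href attribute of every <a> tag
soup = BeautifulSoup(response.text, 'html.parser')
for link in soup.find_all('a'):
    print(link.get('href'))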

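This prints the target of every link on that page. Note that it covers only this one page, not the whole website. Some values may be relative paths (such as /contact), and some may be None for <a> tags without an href.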
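Because href values are often relative or point to other sites, it can help to turn them into absolute URLs and keep only pages on the same site before filtering them. Here is a minimal sketch using `urllib.parse` from the standard library. The function name `normalize_links` is hypothetical. Treating only http/https links on the same host as internal is an assumption.

```python
from typing import Iterable, List, Optional
from urllib.parse import urldefrag, urljoin, urlparse


def normalize_links(base_url: str, hrefs: Iterable[Optional[str]]) -> List[str]:
    """Resolve hrefs against base_url and keep unique same-site http(s) URLs."""
    host = urlparse(base_url).netloc
    result: List[str] = []
    for href in hrefs:
        if not href:  # link.get('href') returns None when an <a> tag has no href
            continue
        # Make the link absolute and drop any '#section' fragment
        absolute, _ = urldefrag(urljoin(base_url, href))
        parts = urlparse(absolute)
        same_site = parts.scheme in ("http", "https") and parts.netloc == host
        if same_site and absolute not in result:
            result.append(absolute)
    return result
```

With the BeautifulSoup code above, this could be called as `normalize_links(url, (link.get('href') for link in soup.find_all('a')))`.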

Most email addresses are found on the Contact, Careers, About and Services pages, so I apply a filter to all the links to keep only those pages.
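One way to apply such a filter is to keep a link only if its URL path contains one of a few keywords. This is a sketch, not the author's filter. The keyword list is illustrative, and matching is assumed to be case-insensitive on the path only. Partial keywords such as "career" and "service" also match "careers" and "services".

```python
from typing import Iterable, List, Sequence
from urllib.parse import urlparse

# Hypothetical keywords for pages that commonly list email addresses
PAGE_KEYWORDS = ("contact", "career", "about", "service")


def filter_links(links: Iterable[str],
                 keywords: Sequence[str] = PAGE_KEYWORDS) -> List[str]:
    """Keep links whose URL path mentions any of the keywords."""
    wanted = []
    for link in links:
        path = urlparse(link).path.lower()
        if any(word in path for word in keywords):
            wanted.append(link)
    return wanted
```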

To extract emails from text, we can use a regular expression. The approach uses the regular expression package re to define the pattern of an email ID, and then the match() function checks whether a string matches that pattern.
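A sketch of the regular-expression approach with the re module follows. The pattern is a deliberately simple assumption. It covers common addresses such as name@example.co.uk but not every address the email standards allow. Note that re.match() only tests whether the *start* of a string matches. To pull every address out of a page's text, re.findall() is the more direct tool, and re.fullmatch() checks that a whole string is an address. The function names are hypothetical.

```python
import re
from typing import List

# Simplified email pattern (an assumption; not a full RFC 5322 validator)
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails(text: str) -> List[str]:
    """Return the unique email addresses found in text, in order of appearance."""
    return list(dict.fromkeys(EMAIL_RE.findall(text)))


def is_email(candidate: str) -> bool:
    """True only if the whole string looks like an email address."""
    return EMAIL_RE.fullmatch(candidate) is not None
```

In the crawler setting, `text` would be the HTML or visible text of each filtered page, for example `response.text`.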

You can find the Python package on PyPI and the source code on GitHub.

I believe the script is pretty self-explanatory.

Thanks for reading.
