

Real estate

ABOUT


The application is a desktop scraper backed by an MS SQL database. It runs several crawlers that collect data from three target websites: https://www.redfin.com/, https://www.glassdoor.com/ and http://www.zillow.com/. The scraped data is stored in the application's database and exported in CSV format.
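The database-to-CSV step described above can be sketched as follows. This is only an illustration, not the project's code: it uses Python's built-in `sqlite3` as a stand-in for MS SQL, and the `listings` table, its columns, and the sample rows are hypothetical.

```python
import csv
import io
import sqlite3


def export_listings_to_csv(conn, out):
    """Write every row of the (hypothetical) `listings` table to `out` as CSV.

    Returns the number of data rows written.
    """
    cursor = conn.execute(
        "SELECT source, listing_id, address, price "
        "FROM listings ORDER BY source, listing_id"
    )
    writer = csv.writer(out)
    # The header row is taken from the column names reported by the cursor.
    writer.writerow([column[0] for column in cursor.description])
    count = 0
    for row in cursor:
        writer.writerow(row)
        count += 1
    return count


def demo():
    """Build an in-memory database with made-up rows and export it."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE listings "
        "(source TEXT, listing_id TEXT, address TEXT, price INTEGER)"
    )
    conn.executemany(
        "INSERT INTO listings VALUES (?, ?, ?, ?)",
        [
            ("redfin", "r-1", "1 Example St, Springfield", 250000),
            ("zillow", "z-9", "9 Sample Ave", 410000),
        ],
    )
    buffer = io.StringIO()
    written = export_listings_to_csv(conn, buffer)
    conn.close()
    return written, buffer.getvalue()
```

Writing to a file object rather than a path keeps the export function easy to test and lets the caller decide where the CSV ends up.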

 

The project was quite complex: https://www.glassdoor.com/ uses browser identification, and when a single user sends a large number of queries the server responds with a CAPTCHA. The problem was solved with IP rotation, plus client identification for the cases where a CAPTCHA still had to be handled. A dedicated service was built for this purpose. It scans the free proxies available on the web at a predefined frequency, checks whether each one can be used with each target website, and saves the working proxy addresses to the database for later use.
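The proxy checking and rotation described above might look roughly like this. It is a minimal sketch under stated assumptions: `check_proxies`, `ProxyRotator`, and the `is_usable` callback are hypothetical names, the real network probe is left to the caller, and results are kept in memory instead of the database.

```python
from typing import Callable, Dict, Iterable, List


def check_proxies(
    candidates: Iterable[str],
    sites: Iterable[str],
    is_usable: Callable[[str, str], bool],
) -> Dict[str, List[str]]:
    """Return, for each target site, the proxies that passed the check.

    `is_usable(proxy, site)` is supplied by the caller; in a real service
    it would send a test request to the site through the proxy.
    """
    site_list = list(sites)
    usable: Dict[str, List[str]] = {site: [] for site in site_list}
    for proxy in candidates:
        for site in site_list:
            if is_usable(proxy, site):
                usable[site].append(proxy)
    return usable


class ProxyRotator:
    """Hands out proxies in round-robin order for one target site."""

    def __init__(self, proxies: Iterable[str]) -> None:
        self._proxies = list(proxies)
        self._index = 0

    def next_proxy(self) -> str:
        if not self._proxies:
            raise RuntimeError("no usable proxies left")
        proxy = self._proxies[self._index % len(self._proxies)]
        self._index += 1
        return proxy

    def discard(self, proxy: str) -> None:
        """Drop a proxy, e.g. after it starts triggering CAPTCHAs."""
        if proxy in self._proxies:
            self._proxies.remove(proxy)
```

A scheduler would call `check_proxies` at the predefined interval and rebuild one `ProxyRotator` per site from the result, so each crawler only ever sees proxies that recently worked for its own target.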

 

The real estate website https://www.redfin.com/ is protected by Google CAPTCHA, which is very difficult to solve. The crawler was integrated with the third-party service http://www.deathbycaptcha.com, which provides correct answers for Google CAPTCHA, and a browser emulator was built to click the correct Google CAPTCHA options.

 

The difficulty with http://www.zillow.com/ was a limit of 300 items that could be scraped per day. As a solution, postal codes were used to narrow each search to a small area, so that all new items on the website could be scraped daily without hitting the limit.
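The postal-code approach can be sketched as below. This is an illustration only, and it assumes the 300-item limit applies to each search: `collect_new_items` and the `search` callback are hypothetical, and each item is assumed to be a dict with an `id` key.

```python
from typing import Callable, Dict, Iterable, List, Set, Tuple

Item = Dict[str, str]


def collect_new_items(
    postal_codes: Iterable[str],
    search: Callable[[str], List[Item]],
    seen_ids: Set[str],
    cap: int = 300,
) -> Tuple[List[Item], List[str]]:
    """Run one search per postal code and keep only items not seen before.

    `search(code)` is supplied by the caller and returns the items the
    site lists for that postal code. `seen_ids` is updated in place.
    Postal codes whose result reaches `cap` are returned separately,
    since their result may be truncated and the area may need a finer split.
    """
    new_items: List[Item] = []
    saturated: List[str] = []
    for code in postal_codes:
        results = search(code)
        if len(results) >= cap:
            saturated.append(code)
        for item in results:
            if item["id"] not in seen_ids:
                seen_ids.add(item["id"])
                new_items.append(item)
    return new_items, saturated
```

Tracking `seen_ids` across runs is what turns a daily pass over every postal code into a feed of new items only; in the application described here that set would presumably be backed by the database.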

 

 

PROPERTIES


Solution type: Windows service, desktop application
Items in database: ~5 million
Built on: dynamic-link library (.dll)
Crawler business logic: implemented in the crawler core

Features


Data search
Data crawling
Anti-CAPTCHA
Anti-Google CAPTCHA
Data collection
Multithreading
XML parsing
HTML parsing
CSV export

 

EFFORT


 200 hours

 

Services


Web Crawling
Data Collection
Data export
Jobs

Contact us
Feel free to contact us.
Our work hours:
Monday-Friday 8am - 8pm



Our location:

Megapolis Office Center, Office 607,
Moskovskiy av. 179-B Kharkiv, 61068, Ukraine