I have been pretty quiet on the internet for a while.

For those of you who know me well, this typically means I am working on my next big project.

So what might this project be? It turns out I am revisiting something I built recently to unlock more of its potential. If you follow this blog, you saw me posting about SCRAPE quite a bit back in March.

Well, since I worked on SCRAPE v2 and scanned the internet for over a billion emails, a lot of people have reached out with interest in this data. I decided to put a little more effort into the email gathering process of SCRAPE for a couple of reasons.

  1. I have been telling my wife for years that I think data is the most valuable thing about the majority of companies.
  2. With every project I have started, the first question I needed to answer once it was finished was how best to "reach the masses".

With that being said, I have recently worked on revamping the SCRAPE email gathering process to provide an industry-best solution to cold email list acquisition.

First, I should talk about the elephant in the room: cold emailing. I know there is a lot of gray area around cold emailing, and it might do some companies more harm than good. On the other hand, some businesses have launched and successfully gained funding thanks to their cold emailing efforts.

Regardless, the platform I am building is not meant to send, or even encourage sending, cold emails. It is quite simply a platform that gives you the information you need to decide whether a given email address belongs to someone you would benefit from reaching out to.

Your next question might be: what is wrong with the billion emails you already collected for SCRAPE? Well, the short answer is nothing. But collecting emails and nothing else only gets people so far. You would not want to reach out to a list of 1,000 emails without knowing anything about them.

Enter V2. First, when collecting data, we now capture as much metadata as we can about the page each email is found on, to enable far better searching and filtering. If you need to find pet sitters in the state of Michigan, this metadata will allow us to return FAR better results.
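To give a feel for why page metadata helps, here is a rough Python sketch of filtering records by keyword and location. It is not the actual SCRAPE code: the `EmailRecord` fields, the `search` function, and the sample data are all made up for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class EmailRecord:
    # Hypothetical shape of one crawled record; the real schema may differ.
    email: str
    domain: str
    page_title: str = ""
    keywords: List[str] = field(default_factory=list)
    state: str = ""


def search(records: List[EmailRecord], keyword: str, state: str) -> List[EmailRecord]:
    """Return records whose page metadata mentions `keyword` and matches `state`."""
    kw = keyword.lower()
    wanted_state = state.lower()
    return [
        r
        for r in records
        if r.state.lower() == wanted_state
        and (kw in r.page_title.lower() or any(kw in k.lower() for k in r.keywords))
    ]


records = [
    EmailRecord("hello@example-petcare.com", "example-petcare.com",
                "Trusted Pet Sitter in Ann Arbor", ["dog walking"], "Michigan"),
    EmailRecord("sales@example-tools.com", "example-tools.com",
                "Power Tools Outlet", ["drills"], "Michigan"),
    EmailRecord("hi@example-sitters.com", "example-sitters.com",
                "Pet Sitter Directory", ["pet sitter"], "Ohio"),
]

matches = search(records, "pet sitter", "Michigan")
```

With only a bare list of addresses, all three records would look the same. With the metadata, only the first one comes back for "pet sitter" in Michigan.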

Second, we validate every email and assign it a score. This score gives you a rough idea of the percentage chance that the email is active and whether sending to it might bounce.
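As a rough idea of what a score like this could look like, here is a minimal Python sketch. It is only an assumption about one possible approach, not how SCRAPE actually scores emails: the checks, the weights, the `score_email` name, and the `domain_accepts_mail` flag (which a caller would fill in from a DNS lookup) are all hypothetical. A heuristic like this is also not a true probability.

```python
import re

# Deliberately simple syntax check; real address grammar is more permissive.
EMAIL_RE = re.compile(r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$")

# Illustrative lists only; a real service would maintain much larger ones.
DISPOSABLE_DOMAINS = {"mailinator.com"}
ROLE_LOCAL_PARTS = {"info", "admin", "support", "noreply", "no-reply"}


def score_email(email: str, domain_accepts_mail: bool) -> int:
    """Return a 0-100 heuristic score; higher means less likely to bounce."""
    email = email.strip().lower()
    if not EMAIL_RE.match(email):
        return 0  # malformed addresses cannot be delivered
    if not domain_accepts_mail:
        return 0  # the caller found no mail server for this domain
    local, domain = email.rsplit("@", 1)
    score = 100
    if domain in DISPOSABLE_DOMAINS:
        score -= 60  # throwaway inboxes are rarely read
    if local in ROLE_LOCAL_PARTS:
        score -= 20  # shared inboxes are less likely to reach a person
    return max(score, 0)
```

For example, `score_email("info@example.com", True)` loses 20 points for the role address, while anything that fails the syntax check scores 0 outright.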

This is all very much still a work in progress. We have currently revamped our Rust code to pull in metadata and store it to S3, as we did previously with just emails. With this refactor, pulling in emails + domains + metadata is faster than v1 was at pulling in just emails.

Next, we are working on Lambdas that will upsert the data into a managed RDS Postgres instance with all the proper indexing. Querying across a billion records quickly is not trivial, so we will likely go through a lot of trial and error here.
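For anyone unfamiliar with upserts, here is a small Python sketch of the idea. To keep it runnable anywhere it uses an in-memory SQLite database as a stand-in for Postgres, and the `emails` table, its columns, and the `upsert_batch` helper are hypothetical. The `ON CONFLICT ... DO UPDATE` clause is the part that carries over to Postgres, although a Postgres driver would use different placeholders.

```python
import sqlite3

# Insert a row, or update the existing row when the email is already stored.
UPSERT_SQL = """
INSERT INTO emails (email, domain, page_title, score)
VALUES (?, ?, ?, ?)
ON CONFLICT (email) DO UPDATE SET
    domain = excluded.domain,
    page_title = excluded.page_title,
    score = excluded.score
"""


def upsert_batch(conn: sqlite3.Connection, rows) -> None:
    """Write a batch of (email, domain, page_title, score) tuples in one transaction."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(UPSERT_SQL, rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE emails ("
    "email TEXT PRIMARY KEY, domain TEXT NOT NULL, page_title TEXT, score INTEGER)"
)
# A secondary index so lookups by domain do not scan the whole table.
conn.execute("CREATE INDEX idx_emails_domain ON emails (domain)")

upsert_batch(conn, [("jane@example.com", "example.com", "Pet sitting", 80)])
# The same address seen again on a later crawl overwrites the old row.
upsert_batch(conn, [("jane@example.com", "example.com", "Pet sitting in Michigan", 95)])

rows = conn.execute("SELECT email, page_title, score FROM emails").fetchall()
```

After both batches the table still holds a single row for the address, carrying the newer title and score, which is what keeps repeated crawls from piling up duplicates.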

This RDS instance and our crawling process are quite costly, but thankfully we got approved for the AWS startup grant, which gives us $1,300 to work with. We are silently chugging along and will post more as we make progress. Our goal is to launch a new service by January 2021 that will be a very affordable, credit-based email list building platform.

I will try to be better about posting progress here. Thanks for taking the time to read.

Till next time. ✌️