Reducing spam on your site forms

No one likes spammers…

So you created your portfolio site and included a sweet contact form that lets visitors get hold of you for meetups, chats, or even job offers. Awesome! Imagine your excitement as the first email hits your inbox. You're positively giddy as you fly into your mail app, only to find that your first contact is a spammer explaining that your SEO stats could be way better: just hit this definitely-not-malicious link to find out more! Pretty deflating, right? No worries. By the end of this article, you'll have several ways to tackle this all-too-common problem.

A quick note on Netlify

You probably already know that Netlify is an amazing tool that you should definitely have in your dev toolbelt. Netlify makes deploying projects, managing teams, and basically anything else you can think of super enjoyable, and that includes forms! If you've deployed your project via Netlify but are struggling to handle your form submissions and the spam that comes with them, read up on how Netlify can simplify your life. Netlify Forms are parsed by the build bots whenever you deploy (and, at the time of writing, were enabled by default), and with a few simple steps you can get Netlify to weed out a lot, if not all, of the spammers and scammers abusing your forms. This article focuses on methods that don't assume you've deployed on Netlify, so I won't be getting into the Netlify Forms steps here. If you're intrigued and want to read more, check out the Netlify Forms docs.

Method 1: Catching flies with honey

So let's assume you didn't deploy via Netlify and you're looking for ways to manage all this spammy spam. The first method we'll look at is the honeypot. Named for being too sweet to resist, a honeypot (in the context of forms) is an input field on your contact form that is invisible to humans but irresistible to bots. Imagine a bot for a moment. It's just a bunch of code that parses the HTML of millions of sites a day, finds any forms in that HTML, and fills them out with robo-garbage. Let's look at a super basic form example.

<form method="POST" action="/">
  <label for="name">Name</label>
  <input type="text" id="name" name="name" required />
  <label for="email">Email</label>
  <input type="email" id="email" name="email" required />
  <label for="message">Your message</label>
  <input type="text" id="message" name="message" required />
  <input type="submit" value="submit" />
</form>

Notice there is a name, an email, and a message field. Pretty standard stuff, right? Our spammy bot is going to see this HTML and fill out the form, putting text in the Name field, an email in the Email field, and some ridiculous text in the Message field. Pretty straightforward. And that's really the problem: bots love straightforward forms. Now let's look at a form with a honeypot in it.

<form method="POST" action="/">
  <label for="name">Name</label>
  <input type="text" id="name" name="name" required />
  <label for="email">Email</label>
  <input type="email" id="email" name="email" required />
  <p hidden><label for="subject">Subject</label><input type="text" id="subject" name="subject" /></p>
  <label for="message">Your message</label>
  <input type="text" id="message" name="message" required />
  <input type="submit" value="submit" />
</form>

In this form example I once again have the name, email, and message fields, but I've notably added a Subject field. The Subject field is our honeypot. It can be whatever you want: I've made mine a subject field, but yours could be anything that fits your form, such as an address or a brand. Take note of the p tag with the "hidden" HTML attribute. It wraps our honeypot label and input, which keeps the honeypot from being displayed to the humans filling out your form (elements marked "hidden" aren't announced by screen readers either). But a bot that parses your HTML and submits your form will usually fill in that honeypot field too, giving away its botty intentions! Now you just need some logic on your server that checks whether a submission has content in the honeypot field. If it does, it's almost certainly a bot. You can either route it somewhere for further scrutiny or simply ditch it. Your call.
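Here's a minimal sketch of that server-side check in plain JavaScript. It assumes your server receives the form fields as a flat object (for example the req.body produced by Express's urlencoded middleware) and that the honeypot is named subject, as in the form above. The isLikelyBot and triageSubmission names are hypothetical helpers, not part of any library.

```javascript
// Hypothetical helper: true when the hidden honeypot field came back filled in.
// Assumes `fields` is a flat object of submitted values, like Express's req.body.
function isLikelyBot(fields, honeypotName = 'subject') {
  const value = fields[honeypotName];
  // Humans never see the field, so any non-blank value is a red flag.
  // A missing field is treated leniently here; tighten this if you prefer.
  return typeof value === 'string' && value.trim() !== '';
}

// Hypothetical triage step: flag suspected bots and strip the honeypot
// from real submissions before they go anywhere else.
function triageSubmission(fields, honeypotName = 'subject') {
  if (isLikelyBot(fields, honeypotName)) {
    return { accepted: false, reason: 'honeypot' };
  }
  // Drop the (empty) honeypot so it never reaches your inbox or database.
  const { [honeypotName]: _honeypot, ...data } = fields;
  return { accepted: true, data };
}
```

What you do with a rejected submission (drop it, log it, or queue it for review) is the judgment call described above; the helper only makes the decision explicit.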

Method 2: reCAPTCHA

There's a lot of debate around reCAPTCHA, and rightly so. There are accessibility concerns, user friction, and the nagging worry that you've ticked off a visitor who has already had enough of identifying the stupid crosswalks on the last 5 sites they visited. By using a reCAPTCHA, you might actually drive away a real human who was looking to contact you. All that being said, it doesn't hurt to know how to implement a reCAPTCHA, and it can definitely help cut down on spammy bots.

So firstly, what is a reCAPTCHA? It's Google's CAPTCHA service, and CAPTCHA stands for Completely Automated Public Turing test to tell Computers and Humans Apart. Wow, that's a lot, but it's really pretty simple: it's a test that is pretty good (not perfect) at telling a human user from a spammy bot. If you've used the interwebs at all in the last decade, you've undoubtedly had to confirm that you aren't a robot at one time or another by checking a box or deciphering fuzzy characters with lines through them. And this brings us back to the start of this section, because let's be honest: you hate these tests and they are crazy annoying. So have a good think before you start chucking CAPTCHAs on everything. You can really hurt UX. But we're squashing bots here, so we forge ahead!

There are several versions of reCAPTCHA available, and depending on your use case and developer environment, you may favor one over another. In the following example I'll be building out an invisible reCAPTCHA v2 in a React application and using a custom NodeJS server to validate the user's CAPTCHA tokens.

Before we start firing up projects, head over to the Google reCAPTCHA Admin Console and register a new site that will be associated with reCAPTCHA. Choose which version of reCAPTCHA you'd like to use (I'll be using reCAPTCHA v2, invisible, in this section). Note that the package I use later in the React portion only supports the v2 reCAPTCHA options, so if you want to stick with this guide, pick one of the v2 options. Give your site a label and be sure to add your domain, as well as localhost and 127.0.0.1, to the domains section. This should let you test your reCAPTCHA locally. Save your details and Google will give you your reCAPTCHA site key and secret key. Both are important, and you can always get them again from the reCAPTCHA admin settings screen, so no worries if you don't copy them down immediately.

With this done, let’s talk about the steps necessary to implement this reCAPTCHA.

Step One. I’ll be building out a quick backend server to validate the captcha tokens generated by our frontend. This will be a separate NodeJS project. Like me, you may be implementing reCAPTCHA after the fact, meaning your frontend is already deployed and you may or may not have control of the backend server. If you have access/control - feel free to build the token validation route into your already existing server code.

Step Two. I’ll build a form in React and will use the react-google-recaptcha package to create a simple reCAPTCHA component that will live inside my form.

Step Three. I’ll need to wire these two together to effectively generate a reCAPTCHA token on our frontend and validate it with our backend.

Let’s gooooo!

Quick side note - this is not a NodeJS tutorial and I won’t be going into each and every little thing I’m doing with my Node server. I’ll just be building out a basic server with one route to receive my frontend token and to respond back to my frontend with a success or failure.

To start with, I'm just going to run “npm init” in a directory of my choosing to get rolling with an extremely basic Node project. Then I'll add the packages we'll need with “npm i express nodemon axios dotenv cors”. With this done, I'll create a .env file at the root of the project and add a YOUR_RECAPTCHA_SECRET environment variable, set to the secret key shown in my reCAPTCHA admin panel for the domain I set up. Next I'll create a .gitignore file at the root of my project and add “/node_modules” as well as “.env”. This is EXTREMELY important if you decide to push this code to an external repo. We don't need to go exposing our secrets, and we certainly don't need to go pushing 7 billion node modules to GitHub (way to go last time…PHIL). With that done, add the entry point file you selected during npm init. The default is index.js, but I usually call mine server.js. Before we look at the index.js (or server.js if you're like me) code, let's quickly update package.json. The server code below uses import statements, so add "type": "module" at the top level (otherwise Node treats .js files as CommonJS and the imports fail). Then add start and dev scripts under “scripts” so you can test this sucker, making sure both point at your actual entry file:

"type": "module",
"scripts": {
  "start": "node index.js",
  "dev": "nodemon index.js"
}

Okay, great. I think that should do it for the setup. Let’s see some code!

import dotenv from 'dotenv';
import express from 'express';
import cors from 'cors';
import axios from 'axios';

// load the variables in .env into process.env
dotenv.config();

// creating the server
const app = express();
const corsOptions = {
	origin: [
		'http://localhost:3000',  // make sure this is your frontend local
		'https://your-prod.site', // make sure this is your prod deployed
		// include any other origins you want to greenlight access to
	],
	optionsSuccessStatus: 200,
	methods: 'POST',
};

// middleware
app.use(express.urlencoded({ extended: true }));
app.use(express.json());
app.use(cors(corsOptions));

// needed vars
const captchaSecret = process.env.YOUR_RECAPTCHA_SECRET;
const captchaAPIBase = 'https://www.google.com/recaptcha/api/siteverify';

// define routes
app.post('/verify', async (req, res) => {
	const { token } = req.body;
	// no token at all means there is nothing to verify
	if (!token) {
		return res.status(400).send({ success: false });
	}
	try {
		// siteverify expects POST parameters; URLSearchParams sends them
		// form-encoded and escapes the token for us
		const response = await axios.post(
			captchaAPIBase,
			new URLSearchParams({ secret: captchaSecret, response: token })
		);
		if (response.data.success) {
			return res.status(200).send({ success: true });
		}
		return res.status(400).send({ success: false });
	} catch (error) {
		// log details on the server, but don't hand them to the client
		console.error(error);
		return res
			.status(500)
			.send({ message: 'Server issues...', success: false });
	}
});

// create a port and listen for server
const PORT = process.env.PORT || 5000;
app.listen(PORT, () => console.log(`Server is up on port ${PORT}`));

As I mentioned, this is not a Node tutorial, but I'll quickly explain this code. Dotenv lets us read environment variables from our .env file, so we import it and configure it right away. We're using Express to get a server going, and I like to use axios for the outgoing API call to Google that validates the token our frontend sends us via a POST request. I'm also setting up some CORS (cross-origin) options so that both our frontend dev environment (localhost) and our production frontend can call the server. The POST /verify route expects a token in the request body and then calls Google's reCAPTCHA siteverify API, which in turn gives us back an object that looks like:

// success is true | false
// timestamp is of the challenge load in ISO format
// hostname is the domain that got the token
// error codes are an optional array
{
  "success": boolean,
  "challenge_ts": timestamp,
  "hostname": string, 
  "error-codes": [...]
}

Ideally, we should verify that the response object from the Google API contains the hostname we expect, that the timestamp is within the last two minutes, and that the outcome was a success. In the above Node example, I'm being lazy and just checking the success of the reCAPTCHA challenge. But you get the idea!
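For the less lazy version, here's a minimal sketch that applies all three checks. It swaps axios for the fetch built into Node 18+, sends secret and response as form-encoded POST parameters, and then checks success, hostname, and the age of challenge_ts. The verifyCaptchaToken name, its options, and the reason strings are hypothetical choices of mine, not part of the reCAPTCHA API; the now and fetchImpl options exist only so the function can be tested without calling Google.

```javascript
const SITEVERIFY_URL = 'https://www.google.com/recaptcha/api/siteverify';

// Hypothetical helper (not from any library): verify a reCAPTCHA token and
// apply the stricter checks described above. Assumes Node 18+ for global fetch.
async function verifyCaptchaToken(token, {
  secret,
  expectedHostname,
  maxAgeMs = 2 * 60 * 1000,     // the two-minute window mentioned above
  now = Date.now(),             // injectable clock, handy for testing
  fetchImpl = globalThis.fetch, // injectable fetch, handy for testing
} = {}) {
  if (typeof token !== 'string' || token.length === 0) {
    return { ok: false, reason: 'missing-token' };
  }

  // siteverify takes POST parameters; URLSearchParams sends them as
  // application/x-www-form-urlencoded and escapes the token for us.
  const res = await fetchImpl(SITEVERIFY_URL, {
    method: 'POST',
    body: new URLSearchParams({ secret, response: token }),
  });
  const data = await res.json();

  if (data.success !== true) {
    return { ok: false, reason: 'challenge-failed', errorCodes: data['error-codes'] ?? [] };
  }
  if (data.hostname !== expectedHostname) {
    return { ok: false, reason: 'hostname-mismatch' };
  }
  // challenge_ts is an ISO timestamp of when the challenge was loaded.
  const challengeTime = Date.parse(data.challenge_ts);
  if (Number.isNaN(challengeTime) || now - challengeTime > maxAgeMs) {
    return { ok: false, reason: 'stale-token' };
  }
  return { ok: true };
}
```

Inside the /verify route you could then call verifyCaptchaToken(req.body.token, { secret: captchaSecret, expectedHostname: 'your-prod.site' }) and reply with { success: result.ok }. While testing locally the hostname should come back as localhost, so reading expectedHostname from an environment variable is one way to cover both cases.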

With the backend all set up - we can move on to our frontend and start working with this reCAPTCHA in our form that’s getting blasted by bots!

Just another quick aside: I'm using a React project as an example, but the principles involved here should translate to other libraries and frameworks as well. You could even use vanilla JS if that's your fancy. You do you!

I'm assuming you have the bulk of your frontend project complete and probably deployed, so I won't get into how to start a React project, but googling npx create-react-app is as good a place as any to start if you're wondering where to begin. Assuming you're beyond that, add the react-google-recaptcha npm package to your project with “npm i react-google-recaptcha”. You can check out the package details on its npm page. If you don't have it already, get axios as well with “npm i axios”. Let's see the code!

import { useRef, useState } from 'react';
import ReCAPTCHA from 'react-google-recaptcha';
import axios from 'axios';

export default function App() {
  const [formData, setFormData] = useState({
    name: '',
    email: '',
    message: '',
  });
  const { name, email, message } = formData;
  const reCaptchaInstance = useRef(null);
  // create-react-app only exposes env vars prefixed with REACT_APP_ to the browser
  const siteKey = process.env.REACT_APP_YOUR_SITE_KEY;

  const handleInput = (e) => {
    setFormData({ ...formData, [e.target.name]: e.target.value });
  };

  const checkForm = (e) => {
    e.preventDefault();
    // handle your form input validation here...
    // if all passes proceed to submit...
    submitForm();
  };

  const submitForm = async () => {
    try {
      const token = await reCaptchaInstance.current.executeAsync();
      const body = { token };
      // point this at the /verify route of your Node server
      const response = await axios.post('https://your-node-server.com/verify', body, {
        // resolve (instead of throwing) on 4xx so a failed check lands below
        validateStatus: (status) => status < 500,
      });
      if (response.data.success) {
        // grab your form data from state and submit your form
        // to wherever it goes...
        return;
      }
      // if we make it here our node server returned
      // response.data.success as false
      // handle the failure however you choose...
      // but DO NOT GIVE THE BOTS TOO MUCH INFO!
      // some kind of modal or message that says
      // couldn't submit form will be fine...
    } catch (error) {
      // if we've thrown an error we could have server issues
      // etc...handle this how you wish as well, but again
      // I wouldn't go too in depth...try again later would be fine
    } finally {
      // tokens are single use, so reset the widget before the next attempt
      reCaptchaInstance.current?.reset();
    }
  };

  return (
    <div className="App">
      <form onSubmit={checkForm}>
        <label htmlFor="name">Name</label>
        <input
          id="name"
          type="text"
          name="name"
          value={name}
          onChange={handleInput}
        />
        <label htmlFor="email">Email</label>
        <input
          id="email"
          type="email"
          name="email"
          value={email}
          onChange={handleInput}
        />
        <label htmlFor="message">Message</label>
        <input
          id="message"
          type="text"
          name="message"
          value={message}
          onChange={handleInput}
        />
        <button type="submit">submit</button>
        <ReCAPTCHA
          ref={reCaptchaInstance}
          size="invisible"
          sitekey={siteKey}
          theme="dark"
          badge="inline"
        />
      </form>
    </div>
  );
}

Nothing too crazy happening here, and hopefully most of it looks familiar. Notice that I have a reCaptchaInstance variable that is set to the useRef() hook. It's passed as the ref of the ReCAPTCHA element at the bottom of the form. Notice too that the ReCAPTCHA element has a sitekey prop that I'm setting from an environment variable (with create-react-app, only variables prefixed with REACT_APP_ make it into the browser bundle). It's important to note that this is the SITE KEY, not the SECRET. We used the secret on the backend, remember! I've then got a basic form whose inputs update a state object with name, email, and message. When the form is submitted, we trigger a function that checks that our form fields contain what we'd expect. If we pass our validations, we move on to handle the form submission with our async function. Here's where things get interesting. First we create a const called “token” where we await what comes back from calling “executeAsync()” on reCaptchaInstance.current. This runs the invisible reCAPTCHA check, and if Google isn't convinced the visitor is human, it pops up a challenge on your form for the user (or spammy bot) to complete. What gets stored in the token variable is the resulting challenge token that we need to verify via our Node backend.

Important misconception! Simply receiving this token on your frontend says nothing about whether the challenge was passed or failed. The token NEEDS to be verified through Google's siteverify API, and that needs to happen on your backend, where your secret stays private, so that you get the ACTUAL verification result and not something a particularly nasty bot could fake. Read more about it in Google's reCAPTCHA verification docs.

Once we get this token (if you console log it, you'll see a massive string), we put it into an object I'm calling body. I then create a variable called response where we await what comes back from an axios.post to the Node server route we built in the first part. Make sure to pass body as the second argument to axios.post, since our endpoint expects a token in the request body. If our user passed the challenge, response.data.success will be true. And while this doesn't guarantee a human submission, it definitely helps! Based on the response, we can either proceed with the form submission or bail out with a generic message about not being able to submit. Don't give those bots too much info to work with.
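Stripped of React, the submit flow above comes down to three awaited steps. Here's a framework-agnostic sketch with those steps passed in as functions, so the same logic could sit behind React, vanilla JS, or a test. The submitWithCaptcha name and the getToken, verifyToken, and sendForm parameters are hypothetical stand-ins for executeAsync(), the axios call to your Node server, and whatever actually delivers your form.

```javascript
// Deliberately vague messages: don't tell bots which check tripped them up.
const FAILURE_MESSAGE = "Sorry, we couldn't submit your form.";
const ERROR_MESSAGE = 'Something went wrong. Please try again later.';

// Hypothetical helper: get a token, have the backend verify it, and only then
// send the form. The three dependencies are injected so any UI layer can use it.
async function submitWithCaptcha(formData, { getToken, verifyToken, sendForm }) {
  try {
    // 1. Obtain a challenge token (in the React example: executeAsync()).
    const token = await getToken();
    // 2. Ask your own backend about it; holding a token proves nothing by itself.
    const { success } = await verifyToken(token);
    if (!success) {
      return { submitted: false, message: FAILURE_MESSAGE };
    }
    // 3. Only a verified submission gets delivered.
    await sendForm(formData);
    return { submitted: true };
  } catch {
    // Network or server trouble: keep the message generic here too.
    return { submitted: false, message: ERROR_MESSAGE };
  }
}
```

In the React component, getToken could wrap reCaptchaInstance.current.executeAsync(), and verifyToken could be something like async (token) => (await axios.post(yourServerUrl, { token })).data, where yourServerUrl is a placeholder for your /verify route.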

Method 3: Scorched Earth

I hate to tell you this, but even your best efforts to thwart bots can, and often will, fail. Bots can be vicious. Sometimes we have to admit that the best solution to a problem may not be the one we coded ourselves. If you're still seeing WAY too much spam in your forms even after implementing the above solutions, you can outsource your form collection and protection to a third party. Formspark is a great tool if you find yourself in this boat; check out the Formspark documentation for the details. Signing up for, installing, customizing, and using Formspark is dead simple and works with just about any project platform you're likely to be using. Create an account, register a form, grab its form ID, and implementation and testing look as simple as this:

<!-- installation -->
<form action="https://submit-form.com/your-form-id">
  <input type="email" name="email" />
  <button type="submit">Subscribe</button>
</form>
<!-- testing -->
<form action="https://submit-form.com/echo">
  <input type="text" name="message" />
  <button type="submit">Send</button>
</form>
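If your form is submitted from JavaScript, as in the React example earlier, you might prefer posting to Formspark with fetch instead of a plain HTML form action. Here's a minimal sketch, assuming Formspark accepts JSON submissions sent with an Accept: application/json header, as its AJAX docs describe at the time of writing (double-check before relying on it). The submitToFormspark name is hypothetical, your-form-id is the same placeholder as above, and fetchImpl is only there so the function can be tested without a network.

```javascript
// Hypothetical helper: send form data to a Formspark form with fetch instead
// of a plain HTML form post. Assumes Formspark accepts JSON bodies as its
// docs describe; "your-form-id" stays a placeholder for your own form ID.
async function submitToFormspark(formId, data, fetchImpl = globalThis.fetch) {
  const response = await fetchImpl(`https://submit-form.com/${encodeURIComponent(formId)}`, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Accept: 'application/json',
    },
    body: JSON.stringify(data),
  });
  // fetch only rejects on network errors, so check the HTTP status ourselves
  if (!response.ok) {
    throw new Error(`Formspark responded with status ${response.status}`);
  }
  return true;
}
```

Usage might look like await submitToFormspark('your-form-id', { name, email, message }) inside the success branch of the React submit handler. During development you could pass 'echo' as the form ID to hit the same testing endpoint as the second form above.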

The best part is that Formspark, at the time of writing, offers ridiculously simple pricing. You can literally try the product for absolutely no cost: trying Formspark for free gets you 250 total form submissions. Once those initial 250 submissions run out, you can simply buy an "upgrade" package with a dead simple, single payment of $25 USD. This "upgrade" is not a recurring payment. It's one time, and nets you 50,000 form submissions. No strings. Until the end of the internet. If you blow through those 50k submissions, just make another payment to get yourself more! This could be a great fit for smaller portfolio or product sites with a naturally smaller contact base.

Wrap up

None of these methods is going to completely stop the junk that floods your forms. And some of them (arguably the CAPTCHA) are quite involved and can add user friction, which is bad. My goal with this article was simply to share some knowledge and techniques that, at the very least, might send you down a path toward less spam and fewer bots telling you that the SEO on your site could really use a boost. Your SEO is totally fine. (I actually have no idea…you may benefit from checking your SEO..omg…did I just become a bot…)