A few days back I created Butler-AI, a Chrome extension that lets you use ChatGPT on any website by simply typing the command “butler: whatever you want to do;”. It works like a charm on many different websites.
You can see how it works in the following image.
After receiving an overwhelming response, I decided to write a simple tutorial on how it works and how you can create a similar extension of your own.
So let’s get started.
Set up the Chrome extension
Chrome currently recommends Manifest V3 for defining extensions, and that is what we are going to use. manifest.json is the config file that defines any Chrome extension.
Create a manifest.json in your directory and add the following to it.
{
  "name": "Butler AI - Powered by ChatGPT",
  "description": "Use the power of ChatGPT at your fingertips, The butler will serve its master.",
  "author": "Prashant Yadav",
  "version": "0.0.1",
  "manifest_version": 3,
  "permissions": ["storage", "activeTab"],
  "host_permissions": ["<all_urls>"],
  "action": {
    "default_popup": "popup.html"
  },
  "content_scripts": [
    {
      "matches": ["<all_urls>"],
      "run_at": "document_end",
      "js": ["script.js"],
      "all_frames": true
    }
  ]
}
Most of these are self-explanatory, so let’s walk through the important properties.
- permissions: Defines what the extension has access to. We want activeTab to work with the tab the user is typing in and respond to the command, and storage to use the chrome.storage API for keeping secrets such as the ChatGPT API key.
- action: default_popup is the HTML page that opens when you click the extension’s icon.
- content_scripts: Defines which JavaScript file to load in a page and when to run it. Here, script.js is injected at document_end (once the DOM has finished loading), for all URLs (<all_urls>), and in all frames, iframes included. All our logic will live inside script.js.
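Since the storage permission is there to hold the API key, here is a minimal sketch of how saving and reading it could look. The key name openaiApiKey and the helpers saveApiKey and loadApiKey are my own placeholders, not part of the Butler-AI source; the sketch assumes the promise-based chrome.storage.local API that Manifest V3 extensions can use.

```javascript
// Hypothetical storage field name; pick whatever suits your extension.
const API_KEY_FIELD = "openaiApiKey";

// Inside the extension this resolves to chrome.storage.local.
// It is wrapped in a function so the helpers can also be exercised
// with a stand-in storage object outside Chrome.
const defaultArea = () => chrome.storage.local;

// Save the key, e.g. from a form in popup.html.
const saveApiKey = async (key, area = defaultArea()) => {
  await area.set({ [API_KEY_FIELD]: key });
};

// Read the key back, e.g. in script.js before calling the API.
// Resolves to "" when nothing has been stored yet.
const loadApiKey = async (area = defaultArea()) => {
  const items = await area.get(API_KEY_FIELD);
  return items[API_KEY_FIELD] || "";
};
```

In the content script you could then do something like const apikey = await loadApiKey(); before making the request.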
popup.html
A simple message, just to make sure popup.html is loading properly.
<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="UTF-8" />
    <meta http-equiv="X-UA-Compatible" content="IE=edge" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
    <title>Butler AI</title>
  </head>
  <body>
    <h1>Butler AI Powered By ChatGPT</h1>
  </body>
</html>
script.js
Print a message to check that the script is being injected properly.
console.log("ButlerAI");
Load the Chrome extension
Now that our boilerplate is ready, let’s load the extension and see if it is working.
Remember that we have to load it in developer mode on the local machine.
- Open the Chrome browser.
- Go to settings > extensions.
- Enable Developer mode in the top right-hand corner.
- Click the Load unpacked button in the top left-hand corner.
- Navigate and load your directory.
Once you have loaded the extension, open a new tab and navigate to StackEdit; the message ButlerAI should be printed in the console.
Observe the text being written on any webpage
For this tutorial, I am going to run the extension on StackEdit, a popular markdown editor. Now that the extension is loaded, script.js has to observe what the user is typing and check whether anything of the form butler: whatever is the command; has been written.
To do this, we will listen to the keypress event on the whole window (the active tab) and, when the user stops typing, parse the HTML and look for any text that starts with butler and ends with ;.
Observe what the user is typing on the active tab
We will do this in a debounced handler, because we only want to search once the user has stopped typing.
// helper function to debounce function calls
function debounce(func, delay) {
  let inDebounce;
  return function () {
    const context = this;
    const args = arguments;
    clearTimeout(inDebounce);
    inDebounce = setTimeout(() => func.apply(context, args), delay);
  };
}

// debounced function call
// (scrapText is a const arrow function, so in script.js it must be declared before this line)
const debouncedScrapText = debounce(scrapText, 1000);

// observe what the user is typing
window.addEventListener("keypress", debouncedScrapText);
Here scrapText is debounced by 1000 milliseconds; that is, scrapText is invoked only once the user has stopped typing for 1 second.
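If you want to convince yourself of the debounce behaviour outside the browser, the small sketch below can be run with Node. It uses a compact variant of the same debounce helper with a short 50 ms delay; the calls array and the record function are only there for the demonstration.

```javascript
// Compact variant of the debounce helper, using rest parameters.
const debounce = (func, delay) => {
  let timer;
  return function (...args) {
    clearTimeout(timer); // cancel the previously scheduled call
    timer = setTimeout(() => func.apply(this, args), delay);
  };
};

// Demonstration only: remember every argument the debounced function receives.
const calls = [];
const record = debounce((key) => calls.push(key), 50);

// Three rapid calls, as if the user typed three keys quickly.
record("a");
record("b");
record("c");

console.log(calls.length); // prints 0, nothing has run yet

// After the delay has passed, only the last call has gone through.
setTimeout(() => console.log(calls), 100); // prints [ 'c' ]
```

The three rapid calls collapse into a single invocation that receives the arguments of the last one, which is exactly what we want for a "user stopped typing" check.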
Finding text that starts with butler
We first have to collect the text from the page and then check which text starts with butler and ends with ;. If any such text is found, we extract the command and store the HTML node that contains it, so that we can populate that node with the response to the command.
To narrow down the search, rather than parsing the whole page, we will only parse the HTML elements that accept text, that is, where the user can type or provide input.
On StackEdit, the area where the user writes has the attribute contenteditable="true", so we can select this element, read its text, and parse it.
// regex to check the text is in the form "butler: command;"
const getTextParsed = (text) => {
  const parsed = /butler:(.*?);/gi.exec(text);
  return parsed ? parsed[1] : "";
};

// helper function to get the nodes, extract their text
const getTextContentFromDOMElements = (nodes, textarea = false) => {
  if (!nodes || nodes.length === 0) {
    return null;
  }

  for (let node of nodes) {
    if (!node) continue;
    const value = textarea ? node.value : node.textContent;
    if (value) {
      const text = getTextParsed(value);
      // return the first node that contains a command; otherwise keep looking
      if (text) return [node, text];
    }
  }

  return null;
};

// function to find the text on active tab
const scrapText = () => {
  const ele = document.querySelectorAll('[contenteditable="true"]');
  const parsedValue = getTextContentFromDOMElements(ele);
  if (parsedValue) {
    const [node, text] = parsedValue;
    makeChatGPTCall(text, node);
  }
};
Here we get all the matching HTML elements, extract their text, and check whether it matches the pattern we expect. Once it matches, we have the node (HTML element) and the command.
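To see the pattern matching in isolation, here is a small sketch you can run with Node. extractCommand mirrors the regex idea from the tutorial but also trims the command, and replaceCommand is a hypothetical helper of mine (not in the original code) that swaps only the command for the response instead of overwriting all the text in the node.

```javascript
// Matches "butler: <command>;" case-insensitively.
// The lazy group (.*?) stops at the first semicolon after "butler:".
const COMMAND_PATTERN = /butler:(.*?);/i;

// Returns the trimmed command, or "" when the text contains no command.
const extractCommand = (text) => {
  const match = COMMAND_PATTERN.exec(text);
  return match ? match[1].trim() : "";
};

// Hypothetical helper: replaces only the matched command with the response.
// A replacer function is used so "$" characters in the response are kept literally.
const replaceCommand = (text, response) =>
  text.replace(COMMAND_PATTERN, () => response);

const sample = "Notes. butler: write a haiku about tea; more notes";

console.log(extractCommand(sample));
// prints "write a haiku about tea"

console.log(replaceCommand(sample, "<haiku>"));
// prints "Notes. <haiku> more notes"

console.log(extractCommand("no command here") === "");
// prints true
```

Note that . does not match line breaks, so with this pattern the whole command has to sit on one line.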
Different websites accept input in different ways, which is why the helper takes a textarea flag and reads the node’s value instead of its textContent when the flag is set.
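One way to avoid passing a flag around is to decide per node how to read and write its text. The sketch below is an assumption on my part rather than how Butler-AI does it: isFormControl, readText, writeText, and EDITABLE_SELECTOR are all hypothetical names.

```javascript
// Hypothetical selector covering rich-text areas and plain form fields.
const EDITABLE_SELECTOR = '[contenteditable="true"], textarea, input[type="text"]';

// Form controls keep their text in .value; other elements in .textContent.
// tagName is upper-case for elements in HTML documents.
const isFormControl = (node) =>
  node.tagName === "TEXTAREA" || node.tagName === "INPUT";

// Read the text of a node regardless of its kind.
const readText = (node) =>
  (isFormControl(node) ? node.value : node.textContent) || "";

// Write text back to a node regardless of its kind.
const writeText = (node, text) => {
  if (isFormControl(node)) {
    node.value = text;
  } else {
    node.textContent = text;
  }
};
```

In scrapText you could then call document.querySelectorAll(EDITABLE_SELECTOR) and use readText(node) for every node it returns.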
Once we have both, we pass them on to make the ChatGPT API call.
Get the command response with ChatGPT API
Next, create a function that accepts the command and the node, and populates the node with the ChatGPT API’s response to that command.
We are going to use the completions API with the text-davinci-003 model. You can use any of the APIs and models you prefer, but remember that the free tier has a limited number of tokens, so keep an eye on usage while testing to avoid exhausting the limit. OpenAI also retires older models over time, so check that the one you pick is still available. Explore your options in the ChatGPT playground.
You will have to pass your API key in the Authorization header to make it work.
const makeChatGPTCall = async (text, node) => {
  try {
    // apikey must hold your OpenAI API key; it is not defined in this snippet
    const myHeaders = new Headers();
    myHeaders.append("Content-Type", "application/json");
    myHeaders.append("Authorization", `Bearer ${apikey}`);

    // set request payload
    const raw = JSON.stringify({
      model: "text-davinci-003",
      prompt: text,
      max_tokens: 2048,
      temperature: 0,
      top_p: 1,
      n: 1,
      stream: false,
      logprobs: null,
    });

    // set request options
    const requestOptions = {
      method: "POST",
      headers: myHeaders,
      body: raw,
      redirect: "follow",
    };

    // make the api call
    const response = await fetch("https://api.openai.com/v1/completions", requestOptions);
    if (!response.ok) {
      throw new Error(`Request failed with status ${response.status}`);
    }
    const { choices } = await response.json();

    // remove the leading and trailing whitespace from the response text
    // (named answer so it does not clash with the text parameter used in the payload above)
    const answer = choices[0].text.trim();

    // populate the node with the response
    node.textContent = answer;
  } catch (e) {
    console.error("Error while calling openai api", e);
  }
};
That’s it. Reload the extension and see the magic; it should work like a charm on StackEdit.
The most challenging part of this extension is reading the values from different websites as the user writes and then populating them with the response. For security purposes, websites do many things internally that block updating a value by simply changing the content through JavaScript.
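As a starting point for such sites, here is a sketch of one commonly used workaround for textarea and input fields that are controlled by a UI framework: set the value through the setter defined on the element’s prototype and then dispatch an input event so the page notices the change. This is an assumption about what may help, not the approach taken in the Butler-AI source, and setValueAndNotify is a hypothetical name; it will not work on every website.

```javascript
// Hypothetical helper for textarea/input elements on framework-driven pages.
const setValueAndNotify = (element, value) => {
  // Look for the value setter on the element's prototype
  // (e.g. HTMLTextAreaElement.prototype in a browser), so that a
  // setter overridden on the element instance itself is bypassed.
  const proto = Object.getPrototypeOf(element);
  const descriptor = Object.getOwnPropertyDescriptor(proto, "value");

  if (descriptor && descriptor.set) {
    descriptor.set.call(element, value);
  } else {
    // Fall back to a plain assignment when no prototype setter is found.
    element.value = value;
  }

  // Let the page know the content changed, as if the user had typed it.
  element.dispatchEvent(new Event("input", { bubbles: true }));
};
```

For contenteditable elements you would still set textContent as before; dispatching an input event afterwards may help there too, depending on the editor.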
I made this work on many websites. You can get the source code of Butler-AI for $10, but I leave it up to you to try and make it work on as many websites as possible.