I was approached by a fresher on LinkedIn who was looking for a solution to a question he had been asked in a recent interview.
The question was to "Write a script to fetch top 3 results from Google search" in JavaScript.
I didn't have the answer right away, but as a curious engineer I decided to solve it.
Since Google is the most popular search engine, I knew it would return different result layouts for different search queries, and that these may also include ads, featured posts, Q&A, news, etc.
For example, when I queried "Titanic", I got two relevant links and then the remaining search results.
For another query, "last element of string javascript", I got one featured post and then the relevant results.
And then there was a normal result for the query "last element of string java", but it contained a suggestion section.
I am sure there are many more types of search results, so I decided not to consider all the edge cases, as handling them would be redundant. I went with the above three only and return only the links of the relevant results (the ones with a heading).
The first thing I did after this was inspect the DOM to find out how the HTML elements are generated, so that I could write a script around that structure to get the links.
What I found was that Google search results were generated inside a div with the id "search" that contained another div with the id "rso", and inside that the results were differentiated by classes based on their type. For example, a result with a heading (a normal result) was placed inside an element with the class "MjjYud".
Inside this, the links were placed inside a div with the attribute [data-header-feature="0"].
That gave me a hook, and based on it I could write a script that fetches the links.
const getTop3Links = () => {
  const result = [];
  const multipleLinksResults = document.querySelectorAll("#rso [data-header-feature='0'] a");
  for (const link of multipleLinksResults) {
    const href = link.getAttribute('href');
    if (href) result.push(href);
  }
  return result.slice(0, 3);
};
This div with the attribute was common to the normal result as well as to multiple-link results like the one for Titanic. The third case, the featured post, was a little different: it has a hidden h2 with the text "Featured snippet from the web". Based on this, I had to write a script that finds the nearest ancestor with the class "MjjYud" and gets the link from it.
Thus I had to write a different script to cover this case.
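const getFeaturedLinks = () => {
  const h2s = document.getElementsByTagName('h2');
  for (const h2 of h2s) {
    if (h2.innerText === 'Featured snippet from the web') {
      // closest() and querySelector() return null when nothing matches,
      // so optional chaining avoids a TypeError on unexpected markup.
      const parent = h2.closest('.MjjYud');
      return parent?.querySelector('.yuRUbf a')?.getAttribute('href') ?? undefined;
    }
  }
  return undefined;
};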
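The strict comparison against the heading text is the most fragile part of this lookup. Here is a minimal sketch of a more forgiving check, assuming the heading may carry stray whitespace or different casing. The helper name `isFeaturedHeading` is hypothetical and not part of the original script, and the English heading text presumably differs when Google is shown in another language.

```javascript
// Heading text observed in this post; treat it as an assumption that may
// change over time or differ by interface language.
const FEATURED_HEADING = 'Featured snippet from the web';

// Hypothetical helper: compares heading text while ignoring surrounding
// whitespace and letter case, and tolerates null/undefined input.
const isFeaturedHeading = (text) =>
  (text ?? '').trim().toLowerCase() === FEATURED_HEADING.toLowerCase();
```

Inside the loop, the condition could then read `if (isFeaturedHeading(h2.innerText))`.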
Combining both scripts, we can get the top 3 links from a Google search: the featured post first if it exists, followed by the normal heading links.
// Returns the link of the featured snippet, or undefined if there is none.
const getFeaturedLinks = () => {
  const h2s = document.getElementsByTagName('h2');
  for (const h2 of h2s) {
    if (h2.innerText === 'Featured snippet from the web') {
      const parent = h2.closest('.MjjYud');
      // Optional chaining guards against missing elements.
      return parent?.querySelector('.yuRUbf a')?.getAttribute('href') ?? undefined;
    }
  }
  return undefined;
};

// Returns up to three links: the featured link first, then normal results.
const getTop3Links = () => {
  const result = [];
  const featured = getFeaturedLinks();
  if (featured) result.push(featured);
  const multipleLinksResults = document.querySelectorAll("#rso [data-header-feature='0'] a");
  for (const link of multipleLinksResults) {
    const href = link.getAttribute('href');
    if (href) result.push(href);
  }
  return result.slice(0, 3);
};
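One thing the combined script does not handle is duplicates. This sketch assumes the featured link may also show up among the normal results, and that some href values may be relative or not real web links. I have not verified either against Google's markup. The helper name `topUniqueLinks` and the default base URL are my own illustrative choices, not part of the original solution.

```javascript
// Hypothetical helper: turns raw href strings into absolute http(s) URLs,
// drops duplicates while keeping the original order, and caps the list.
const topUniqueLinks = (hrefs, limit = 3, base = 'https://www.google.com') => {
  const seen = new Set();
  const result = [];
  for (const href of hrefs) {
    if (typeof href !== 'string' || href === '') continue;
    let url;
    try {
      url = new URL(href, base); // resolves relative hrefs against base
    } catch {
      continue; // skip values that cannot be parsed as a URL
    }
    // Ignore non-web links such as "javascript:" pseudo-URLs.
    if (url.protocol !== 'http:' && url.protocol !== 'https:') continue;
    if (seen.has(url.href)) continue;
    seen.add(url.href);
    result.push(url.href);
    if (result.length === limit) break;
  }
  return result;
};
```

To use it, the last line of `getTop3Links` could return `topUniqueLinks(result)` instead of `result.slice(0, 3)`, so that the cap of three is applied after duplicates are removed.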
Try it out yourself and cover the remaining edge cases.
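One edge case worth covering is a selector that stops matching. Class names like "MjjYud" look auto-generated, so I assume they can change without notice. Below is a minimal sketch that tries several selectors in order and uses the first one that matches. The function name `queryWithFallback` and the second selector in the usage are hypothetical. Only the first selector comes from this post.

```javascript
// Hypothetical helper: returns the matches of the first selector that finds
// anything under `root`, or an empty array if none of them match.
const queryWithFallback = (root, selectors) => {
  for (const selector of selectors) {
    let matches;
    try {
      matches = Array.from(root.querySelectorAll(selector));
    } catch {
      continue; // selector not supported by this browser; try the next one
    }
    if (matches.length > 0) return matches;
  }
  return [];
};

// Usage in the browser console (skipped where no DOM is available).
if (typeof document !== 'undefined') {
  const anchors = queryWithFallback(document, [
    "#rso [data-header-feature='0'] a", // selector used in this post
    '#rso a:has(h3)', // assumed fallback: anchors that wrap a result title
  ]);
  console.log(anchors.slice(0, 3).map((a) => a.getAttribute('href')));
}
```

Passing `root` in as a parameter keeps the helper independent of the global `document`, which also makes it easy to test against a stub object.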