Beautiful Soup search()
Beautiful Soup searching problem
import requests
from bs4 import BeautifulSoup

URL = "https://realpython.github.io/fake-jobs/"
page = requests.get(URL)

soup = BeautifulSoup(page.content, "html.parser")
results = soup.find(id="ResultsContainer")
print(results)
Find Elements by Class Name and Text Content
Not all of the job listings are developer jobs.
Instead of printing out all the jobs listed on the website, you’ll first filter them using keywords.
You know that job titles in the page are kept within <h2> elements.
To filter for only specific jobs, you can use the string argument:
python_jobs = results.find_all("h2", string="Python")
This code finds all <h2> elements where the contained string matches “Python” exactly.
Note that you’re directly calling the method on your first results variable.
If you go ahead and print() the output of the above code snippet to your console,
then you might be disappointed because it’ll be an empty list:
[]
There was a Python job in the search results, so why is it not showing up?
When you use string= as you did above,
your program looks for that string exactly.
Any differences in the spelling, capitalization,
or whitespace will prevent the element from matching.
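To see the exact-match behavior without hitting the network, here is a minimal sketch that runs the same kind of search against a small inline document. The SAMPLE_HTML markup and its job titles are made up for illustration and are not taken from the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the job board; not the real page.
SAMPLE_HTML = """
<div id="ResultsContainer">
  <h2 class="title">Senior Python Developer</h2>
  <h2 class="title">Python</h2>
  <h2 class="title"> Python </h2>
  <h2 class="title">python</h2>
  <h2 class="title">Energy engineer</h2>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
results = soup.find(id="ResultsContainer")

# A plain string is compared for equality with each element's text,
# so extra words, different capitalization, or padding all fail to match.
exact_matches = results.find_all("h2", string="Python")

print(len(exact_matches))
print([str(h2.string) for h2 in exact_matches])
```

In this sample, only the heading whose text is exactly Python is returned. The longer title, the lowercase variant, and the one padded with spaces are all skipped.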
In the next section, you’ll find a way to make your search string more general.
Pass a Function to a Beautiful Soup Method
import requests
from bs4 import BeautifulSoup

URL = "https://realpython.github.io/fake-jobs/"
page = requests.get(URL)

soup = BeautifulSoup(page.content, "html.parser")
results = soup.find(id="ResultsContainer")

python_jobs = results.find_all(
    "h2", string=lambda text: "python" in text.lower()
)
print(python_jobs)
Now you’re passing an anonymous function to the string= argument.
The lambda function looks at the text of each <h2> element,
converts it to lowercase, and checks whether the substring
“python” is found anywhere. You can check whether you managed to identify all the Python jobs with this approach:
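One way to run such a check offline is sketched below, again on made-up inline markup rather than the real page. The SAMPLE_HTML content and the mentions_python helper are hypothetical names introduced for this example. The helper also guards against None: an <h2> that contains nested tags has no single string, and depending on the Beautiful Soup version the function may then be called with None, which would make text.lower() raise an AttributeError.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the job board; not the real page.
SAMPLE_HTML = """
<div id="ResultsContainer">
  <h2 class="title">Senior Python Developer</h2>
  <h2 class="title">Energy engineer</h2>
  <h2 class="title">Software Engineer (Python)</h2>
  <h2 class="title">PYTHON programmer</h2>
  <h2 class="title">Back-end <em>Python</em> developer</h2>
</div>
"""


def mentions_python(text):
    """Return True if text contains "python" in any capitalization."""
    # text can be None when the element has more than one child node,
    # so check for that before calling .lower().
    return text is not None and "python" in text.lower()


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
results = soup.find(id="ResultsContainer")
python_jobs = results.find_all("h2", string=mentions_python)

# Count the matches, then list the titles to inspect them by eye.
print(len(python_jobs))
for job in python_jobs:
    print(job.text.strip())
```

In this sample, the three headings made of a single text node are found. The last heading is skipped even though it mentions Python, because its text is split around the nested <em> tag. If titles on a page can contain nested markup, matching on each element's .text in a list comprehension may be a better fit than string=.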