https://berlinstartupjobs.com/engineering
Jobs in IT & Software Development | Berlin Startup Jobs
Tech jobs in Berlin for frontend and backend developers specializing in Python, PHP, Ruby on Rails, or Java, iOS & Android Developers, JavaScript Developers, MySQL and HTML/CSS specialists.
berlinstartupjobs.com
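The approach is to crawl the programming skill keywords first, then crawl again using the collected keywords.
There are two options, depending on whether a skill list is specified:
either only the specified skills are crawled, or every skill listed on the page is extracted and crawled.
I find the select_one method provided by bs4 more convenient to use.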
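As a small illustration of why select_one can feel more convenient, the sketch below compares it with the equivalent chained find calls. The HTML snippet is made up for this example and only imitates the class names used in the scraper; it is not taken from the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, loosely modeled on the class names used in this post
html = """
<div class="bjs-jlid__wrapper">
  <h4 class="bjs-jlid__h"><a href="https://example.com/job/1">Backend Engineer</a></h4>
  <a class="bjs-jlid__b" href="https://example.com/company">Example GmbH</a>
</div>
"""
soup = BeautifulSoup(html, 'html.parser')

# find: one call per nesting level, with filters passed as keyword arguments
link_via_find = soup.find('h4', class_='bjs-jlid__h').find('a')

# select_one: a single CSS selector expresses the same path
link_via_select = soup.select_one('h4.bjs-jlid__h > a')

print(link_via_find['href'], link_via_select.text)
```

Both calls reach the same element; the CSS selector just keeps the whole path in one string.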
import requests
from bs4 import BeautifulSoup
class Scraper:
    # Browser-like User-Agent sent with every request
    HEADERS = {
        'User-Agent':
        'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/121.0.0.0 Safari/537.36'
    }

    def __init__(self):
        self.skills = []

    def set_skills(self, skills=None):
        # Only the specified skills
        if skills:
            self.skills = skills
            return
        # All skills listed on the engineering page
        response = requests.get('https://berlinstartupjobs.com/engineering',
                                headers=self.HEADERS)
        soup = BeautifulSoup(response.content, 'html.parser')
        # The code assumes the fourth 'ul.links' block holds the skill links
        links_box = soup.find_all('ul', class_='links')[3]
        skills_links = links_box.find_all('a')
        for link in skills_links:
            # Keep only the first word of the link text as the skill name
            self.skills.append(link.text.split(' ')[0].strip())

    def get_resource(self, skill):
        response = requests.get(f'https://berlinstartupjobs.com/skill-areas/{skill}/',
                                headers=self.HEADERS)
        # Treat any 4xx/5xx response as "no page for this skill"
        if response.status_code >= 400:
            return None
        return BeautifulSoup(response.content, 'html.parser')

    def refine(self, data):
        # Pull the fields of one job listing out of its wrapper element
        link = data.select_one('h4.bjs-jlid__h > a')
        title = link.text
        company = data.select_one('a.bjs-jlid__b').text
        description = data.select_one('div.bjs-jlid__description').text
        info = {
            'link': link['href'],
            'title': title,
            'company': company,
            'description': description
        }
        return info

    def get_jobs(self):
        result = {}
        for skill in self.skills:
            soup = self.get_resource(skill)
            # Not Found page
            if soup is None:
                continue
            jobs = soup.find_all('div', class_='bjs-jlid__wrapper')
            jobs_data = []
            for job in jobs:
                jobs_data.append(self.refine(job))
            result[skill] = jobs_data
        return result
skills = [
    "python",
    "typescript",
    "javascript",
    "rust"
]
s = Scraper()
s.set_skills(skills=skills)
result = s.get_jobs()
for key, value in result.items():
    print(key, ':', len(value))
    for job in value:
        print(job)
    print()
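The refine method above reads .text directly from each select_one result, so a listing that lacks one of those elements would raise an error. As a hedged sketch only, a more tolerant variant might look like the following. The helper names text_or_none and refine_safe and the sample HTML are made up for illustration; the selectors are the ones used in this post.

```python
from bs4 import BeautifulSoup

def text_or_none(node, selector):
    """Return the stripped text of the first match, or None if nothing matches."""
    found = node.select_one(selector)
    return found.get_text(strip=True) if found else None

def refine_safe(job):
    """Hypothetical variant of refine that returns None for missing fields."""
    link = job.select_one('h4.bjs-jlid__h > a')
    return {
        'link': link.get('href') if link else None,
        'title': link.get_text(strip=True) if link else None,
        'company': text_or_none(job, 'a.bjs-jlid__b'),
        'description': text_or_none(job, 'div.bjs-jlid__description'),
    }

# Made-up listing with no description element
sample = BeautifulSoup("""
<div class="bjs-jlid__wrapper">
  <h4 class="bjs-jlid__h"><a href="https://example.com/job/1"> Backend Engineer </a></h4>
  <a class="bjs-jlid__b">Example GmbH</a>
</div>
""", 'html.parser')

job = sample.select_one('div.bjs-jlid__wrapper')
info = refine_safe(job)
print(info)
```

Missing fields come back as None instead of stopping the whole crawl, which leaves it to the caller to decide whether to skip or keep incomplete listings.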
bs4(BeautifulSoup)
First install the modules needed for crawling: pip install requests, pip install bs4
cloakinghost.tistory.com