src.kg package

Submodules

src.kg.knowledge_graph module

Knowledge graph module for this project.

KnowledgeBaseError

Exception intended for knowledge base errors.

KnowledgeGraphError

Exception intended for knowledge graph errors.

KnowledgeBase

Dataclass intended to encapsulate knowledge bases.

KnowledgeGraph

Dataclass intended to encapsulate knowledge graphs.

scrape_sbu_solar

Scrape Stony Brook University's course catalog for a specific major's course information.

parse_requirements

Parse major requirements from a string into a list of lists of course codes.

parse_prerequisites

Parse course prerequisites from a string into a list of lists of course codes.

clean_course_title

Clean course title by removing any additional information after '**'.

remove_non_numeric

Remove any non-digit characters from the course number.

get_course_components

Helper function to get course components.

get_sbu_cse_undergrad_course_offered_info

Scrape Stony Brook University's undergraduate CSE course offering webpage.

get_sbu_cse_grad_course_offered_info

Scrape Stony Brook University's CSE graduate course offering webpage.

get_sbu_cse_course_offered_info

Scrape Stony Brook University's CSE undergraduate and graduate course offering webpages.

class src.kg.knowledge_graph.KnowledgeBase(url='', pdf='', txt='', ergo='', lp='')[source]

Bases: object

Dataclass intended to encapsulate knowledge bases.

Note

  • Only one knowledge base file needs to be specified.

Example usage:
>>> kb = KnowledgeBase(url="https://www.stonybrook.edu")
>>> kb.url
'https://www.stonybrook.edu'
Raises:

KnowledgeBaseError – Arises if the knowledge base file or representation is not specified. Valid knowledge base files include PDF, TXT or ERGO files, and valid representations include a URL.

url

URL knowledge base website link.

pdf

PDF knowledge base file.

txt

TXT knowledge base file.

ergo

ERGO knowledge base file.

lp

Logic programming (Clingo) knowledge base file.

ergo: str = ''
lp: str = ''
pdf: str = ''
txt: str = ''
url: str = ''
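
The validation described under Raises can be pictured with a minimal sketch. The names SketchKnowledgeBase and SketchKnowledgeBaseError are hypothetical stand-ins, and the rule that "not specified" means every field is an empty string is an assumption; the actual class may check a narrower set of fields.

```python
from dataclasses import dataclass, fields


class SketchKnowledgeBaseError(Exception):
    """Hypothetical stand-in for KnowledgeBaseError."""


@dataclass
class SketchKnowledgeBase:
    """Hypothetical stand-in that mirrors the documented fields."""

    url: str = ""
    pdf: str = ""
    txt: str = ""
    ergo: str = ""
    lp: str = ""

    def __post_init__(self) -> None:
        # Assumption: "not specified" means every field is an empty string.
        if not any(getattr(self, f.name) for f in fields(self)):
            raise SketchKnowledgeBaseError(
                "No knowledge base file or URL was specified."
            )


kb = SketchKnowledgeBase(url="https://www.stonybrook.edu")
print(kb.url)
```

Running the check in __post_init__ keeps it next to the field definitions, so an empty object cannot be constructed by accident.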
exception src.kg.knowledge_graph.KnowledgeBaseError[source]

Bases: Exception

Exception intended for knowledge base errors.

class src.kg.knowledge_graph.KnowledgeGraph(json='', ergo='', rdf='', owl='', csv='', df=Empty DataFrame Columns: [] Index: [], lp='')[source]

Bases: object

Dataclass intended to encapsulate knowledge graphs.

Note

  • Only one knowledge graph file needs to be specified.

Example usage:
>>> kg = KnowledgeGraph(json="path/to/file.json")
>>> kg.json
'path/to/file.json'
Raises:

KnowledgeGraphError – Arises if the knowledge graph file is not specified. Valid knowledge graph files include JSON, ERGO, RDF, OWL, and CSV files.

json

JSON knowledge graph file.

ergo

ERGO knowledge graph file.

rdf

RDF knowledge graph file.

owl

OWL knowledge graph file.

csv

CSV knowledge graph file.

lp

Logic programming (Clingo) knowledge graph file.

csv: str = ''
df: DataFrame = Empty DataFrame Columns: [] Index: []
ergo: str = ''
json: str = ''
lp: str = ''
owl: str = ''
rdf: str = ''
exception src.kg.knowledge_graph.KnowledgeGraphError[source]

Bases: Exception

Exception intended for knowledge graph errors.

src.kg.knowledge_graph.clean_course_title(course_title)[source]

Clean course title by removing any additional information after ‘**’.

Parameters:

course_title (str) – Course title string.

Return type:

str

Returns:

Cleaned course title string.
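
One way to read this behavior is sketched below. The function name clean_course_title_sketch is hypothetical, and two details are assumptions: the '**' marker itself is dropped along with what follows it, and surrounding whitespace is trimmed.

```python
def clean_course_title_sketch(course_title: str) -> str:
    """Hypothetical sketch: keep only the text before the first '**'."""
    # Split once on the marker, keep the left part, and trim whitespace.
    return course_title.split("**", 1)[0].strip()


print(clean_course_title_sketch("Computer Science I ** Offered in Fall"))
```

A title that contains no '**' marker passes through unchanged apart from the whitespace trim.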

src.kg.knowledge_graph.get_course_components(driver)[source]

Helper function to get course components. Course components may include more than one word.

Parameters:

driver (Selenium WebDriver) – Input WebDriver object.

Returns:

Tuple that consists of course components.

src.kg.knowledge_graph.get_sbu_cse_course_offered_info(undergrad_url, grad_url)[source]

Scrape Stony Brook University’s CSE undergraduate and graduate course offering webpages.

Warning

  • The URLs in the usage example were last accessed, and were current, as of May 03 2024.

  • The tables at each URL cover Spring 2023, Fall 2023, Spring 2024, and Fall 2024; the semesters handled by this function will need to be updated in the future.

Usage example:
>>> undergrad_url = "https://www.cs.stonybrook.edu/students/Undergraduate-Studies/csecourses"
>>> grad_url = "https://www.cs.stonybrook.edu/students/Graduate-Studies/courses"
>>> df = get_sbu_cse_course_offered_info(undergrad_url=undergrad_url, grad_url=grad_url)
Parameters:
  • undergrad_url (str) – URL of the Stony Brook University undergraduate course offering webpage.

  • grad_url (str) – URL of the Stony Brook University graduate course offering webpage.

Return type:

DataFrame

Returns:

Pandas DataFrame containing the undergraduate and graduate course offering information.

src.kg.knowledge_graph.get_sbu_cse_grad_course_offered_info(url)[source]

Scrape Stony Brook University’s CSE graduate course offering webpage.

Usage example:
>>> url = "https://www.cs.stonybrook.edu/students/Graduate-Studies/courses"
>>> df = get_sbu_cse_grad_course_offered_info(url=url)
Parameters:

url (str) – URL of the Stony Brook University graduate course offering webpage.

Return type:

DataFrame

Returns:

Pandas DataFrame containing the graduate course offering information.

src.kg.knowledge_graph.get_sbu_cse_undergrad_course_offered_info(url)[source]

Scrape Stony Brook University’s undergraduate CSE course offering webpage.

Usage example:
>>> url = "https://www.cs.stonybrook.edu/students/Undergraduate-Studies/csecourses"
>>> df = get_sbu_cse_undergrad_course_offered_info(url=url)
Parameters:

url (str) – URL of the Stony Brook University undergraduate course offering webpage.

Return type:

DataFrame

Returns:

Pandas DataFrame containing the undergraduate course offering information.

src.kg.knowledge_graph.parse_prerequisites(input_string)[source]

Parse course prerequisites from a string into a list of lists of course codes. This function is mainly used to separate disjunctions and conjunctions of course prerequisites. Disjunctions are grouped together in the same sub-list, while conjunctions are separated into different sub-lists. For example, "Prerequisite: CSE 216 or CSE 260; AMS 310; CSE major" would be parsed as: [["CSE 216", "CSE 260"], ["AMS 310"], ["CSE major"]].

Warning

  • Use parse_requirements() in place of this function.

Usage example:
>>> input_string = "Prerequisite: CSE 216 or CSE 260; AMS 310; CSE major"
>>> parse_prerequisites(input_string)
[['CSE 216', 'CSE 260'], ['AMS 310'], ['CSE major']]
Parameters:

input_string (str) – Input string containing major course requirements.

Return type:

Union[str, List[List[str]]]

Returns:

List of lists of strings that correspond to course prerequisites.
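
The grouping rule can be illustrated with a short sketch. The function name parse_prerequisites_sketch is hypothetical, and the splitting rules are assumptions: ';' separates conjunctions and the word 'or' separates alternatives. The documented return type also allows a plain string, which this sketch does not model.

```python
import re
from typing import List


def parse_prerequisites_sketch(input_string: str) -> List[List[str]]:
    """Hypothetical sketch of the disjunction/conjunction grouping."""
    # Drop a leading "Prerequisite:" or "Prerequisites:" label if present.
    body = re.sub(
        r"^\s*Prerequisites?:\s*", "", input_string, flags=re.IGNORECASE
    )
    groups: List[List[str]] = []
    # Assumption: ';' separates requirements that must all hold.
    for clause in body.split(";"):
        # Assumption: the standalone word 'or' separates alternatives.
        options = [opt.strip(" .") for opt in re.split(r"\s+or\s+", clause)]
        options = [opt for opt in options if opt]
        if options:
            groups.append(options)
    return groups


print(parse_prerequisites_sketch(
    "Prerequisite: CSE 216 or CSE 260; AMS 310; CSE major"
))
```

Requiring whitespace on both sides of 'or' keeps words such as "major" from being split.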

src.kg.knowledge_graph.parse_requirements(input_string)[source]

Parse major requirements from a string into a list of lists of course codes. This function is mainly used to separate the disjunctions and conjunctions found in course prerequisites, anti-requisites, and corequisites. Disjunctions are grouped together in the same sub-list, while conjunctions are separated into different sub-lists. One list is returned for each of prerequisites, anti-requisites, and corequisites.

Note

  • Disjunctive statements will appear in the same sub-list, while conjunctive statements will appear in a separate sub-list.

  • Use this function in place of parse_prerequisites().

Usage example:
>>> input_string = "Prerequisite: CSE 216 or CSE 260; AMS 310; Anti-requisite: CSE 260. Corequisite: CSE 161."
>>> parse_requirements(input_string)
([['CSE216', 'CSE260'], ['AMS310']], [['CSE260']], [['CSE161']])
Parameters:

input_string (str) – Input string containing major course requirements.

Return type:

Tuple[List[List[str]], List[List[str]], List[List[str]]]

Returns:

Tuple of lists containing strings that correspond to course prerequisites, anti-requisites, and corequisites.
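
A minimal sketch of the three-way split follows. The function name parse_requirements_sketch and both regular expressions are hypothetical. It assumes that section labels end with a colon, that course codes are three uppercase letters followed by three digits, and that ';' or the word 'and' marks a conjunction.

```python
import re
from typing import List, Tuple

Groups = List[List[str]]

# Assumption: each section starts with one of these labels and a colon.
LABEL_RE = re.compile(
    r"(Prerequisite|Anti-requisite|Corequisite)s?\s*:", re.IGNORECASE
)
# Assumption: a course code is three capital letters plus three digits.
COURSE_RE = re.compile(r"\b([A-Z]{3})\s*(\d{3})\b")
SLOT = {"prerequisite": 0, "anti-requisite": 1, "corequisite": 2}


def parse_requirements_sketch(input_string: str) -> Tuple[Groups, Groups, Groups]:
    """Hypothetical sketch: (prerequisites, anti-requisites, corequisites)."""
    result: Tuple[Groups, Groups, Groups] = ([], [], [])
    labels = list(LABEL_RE.finditer(input_string))
    for i, label in enumerate(labels):
        # A section runs from its label to the next label or the end.
        end = labels[i + 1].start() if i + 1 < len(labels) else len(input_string)
        section = input_string[label.end():end]
        target = result[SLOT[label.group(1).lower()]]
        # Assumption: ';' and the word 'and' separate conjunctions.
        for clause in re.split(r";|\band\b", section):
            codes = [subj + num for subj, num in COURSE_RE.findall(clause)]
            if codes:
                target.append(codes)
    return result


print(parse_requirements_sketch(
    "Prerequisite: CSE 216 or CSE 260; AMS 310; "
    "Anti-requisite: CSE 260. Corequisite: CSE 161."
))
```

Matching course codes directly, rather than splitting on 'or', means free-text conditions such as "CSE major" are dropped, which is consistent with the space-free codes in the usage example.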

src.kg.knowledge_graph.remove_non_numeric(course_number)[source]

Remove any non-digit characters from the course number.

Parameters:

course_number (str) – Course number string.

Return type:

str

Returns:

Cleaned course number string.
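
As a minimal sketch of this cleaning step, the hypothetical function below keeps only ASCII digits. Whether the actual function treats other Unicode digits the same way is not stated, so that detail is an assumption.

```python
import re


def remove_non_numeric_sketch(course_number: str) -> str:
    """Hypothetical sketch: drop every character that is not 0-9."""
    # Assumption: only ASCII digits count as part of the course number.
    return re.sub(r"[^0-9]", "", course_number)


print(remove_non_numeric_sketch("114H"))
```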

src.kg.knowledge_graph.scrape_sbu_solar(url, major_three_letter_code, wait_time=10, headless=True, verbose=False, output_filename=None)[source]

Scrape Stony Brook University’s course catalog for a specific major’s course information. This function scrapes Stony Brook University’s course catalog and stores the information in a KnowledgeGraph object. The course information includes course number, title, career, units, grading basis, enrollment requirements, anti-requisites, corequisites, course components, academic group, academic organization, and course description. Information on when courses are offered over a four-semester span is also included, but only for CSE courses: it is scraped from the CSE department’s website and is hardcoded for CSE (see get_sbu_cse_course_offered_info()).

Warning

  • This function uses a Selenium WebDriver and specific div IDs to scrape the course catalog.

Usage example:
>>> url = "https://prod.ps.stonybrook.edu/psc/csprodg/EMPLOYEE/CAMP/c/COMMUNITY_ACCESS.SSS_BROWSE_CATLG.GBL?"
>>> kg = scrape_sbu_solar(
...        url=url,
...        major_three_letter_code="cse",
...        wait_time=10,
...        headless=True,
...        verbose=True,)
Parameters:
  • url (Union[KnowledgeBase, str]) – Input Stony Brook URL (or KnowledgeBase object) to scrape.

  • major_three_letter_code (str) – Three-letter code for the major (e.g. CSE for computer science).

  • wait_time (int) – Maximum wait time (in seconds) for each click operation. Defaults to 10.

  • headless (bool) – Do not open a browser window. Defaults to True.

  • verbose (bool) – Print output to screen. Defaults to False.

  • output_filename (Optional[str]) – Output filename for the JSON file. Defaults to None.

Raises:

ValueError – Arises if the course table is not displayed, is empty, or if the wait time is less than 0 seconds.

Return type:

KnowledgeGraph

Returns:

KnowledgeGraph object containing course information that corresponds to an output JSON file.

Module contents