src.kg package
Submodules
src.kg.knowledge_graph module
Knowledge graph module for this project.
| KnowledgeBaseError | Exception intended for knowledge base errors. |
| KnowledgeGraphError | Exception intended for knowledge graph errors. |
| KnowledgeBase | Dataclass intended to encapsulate knowledge bases. |
| KnowledgeGraph | Dataclass intended to encapsulate knowledge graphs. |
| scrape_sbu_solar | Scrape Stony Brook University's course catalog for a specific major's course information. |
| parse_prerequisites | Parse major requirements from a string into a list of lists of course codes. |
| parse_requirements | Parse major requirements from a string into a list of lists of course codes. |
| clean_course_title | Clean course title by removing any additional information after '**'. |
| remove_non_numeric | Remove any non-digit characters from the course number. |
| get_course_components | Helper function to get course components. |
| get_sbu_cse_undergrad_course_offered_info | Scrape Stony Brook University's undergraduate CSE course offering webpage. |
| get_sbu_cse_grad_course_offered_info | Scrape Stony Brook University's CSE graduate course offering webpage. |
| get_sbu_cse_course_offered_info | Scrape Stony Brook University's CSE undergraduate and graduate course offering webpages. |
- class src.kg.knowledge_graph.KnowledgeBase(url='', pdf='', txt='', ergo='', lp='')[source]
Bases: object
Dataclass intended to encapsulate knowledge bases.
Note
Only one knowledge base file needs to be specified.
- Example usage:
>>> kb = KnowledgeBase(url="https://www.stonybrook.edu")
>>> kb.url
'https://www.stonybrook.edu'
- Raises:
KnowledgeBaseError – Arises if the knowledge base file or representation is not specified. Valid knowledge base files include PDF, TXT or ERGO files, and valid representations include a URL.
- url
URL knowledge base website link.
- pdf
PDF knowledge base file.
- txt
TXT knowledge base file.
- ergo
ERGO knowledge base file.
- lp
Logic programming (Clingo) knowledge base file.
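The "Raises" behaviour described above can be sketched as a dataclass that validates its fields after construction. This is a minimal illustration, not the module's source: the class name `KnowledgeBaseSketch`, the use of `__post_init__`, the error message, and the choice to count `lp` as a valid source are all assumptions.

```python
from dataclasses import dataclass


class KnowledgeBaseError(Exception):
    """Stand-in for the module's exception of the same name."""


@dataclass
class KnowledgeBaseSketch:
    """Hypothetical stand-in mirroring the documented fields."""

    url: str = ""
    pdf: str = ""
    txt: str = ""
    ergo: str = ""
    lp: str = ""

    def __post_init__(self) -> None:
        # Assumption: the check runs right after construction and
        # passes when at least one source is non-empty.
        if not any((self.url, self.pdf, self.txt, self.ergo, self.lp)):
            raise KnowledgeBaseError("No knowledge base file or URL was specified.")
```

With this sketch, `KnowledgeBaseSketch(url="https://www.stonybrook.edu")` constructs normally, while `KnowledgeBaseSketch()` raises `KnowledgeBaseError`.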
- exception src.kg.knowledge_graph.KnowledgeBaseError[source]
Bases: Exception
Exception intended for knowledge base errors.
- class src.kg.knowledge_graph.KnowledgeGraph(json='', ergo='', rdf='', owl='', csv='', df=Empty DataFrame Columns: [] Index: [], lp='')[source]
Bases: object
Dataclass intended to encapsulate knowledge graphs.
Note
Only one knowledge graph file needs to be specified.
- Example usage:
>>> kg = KnowledgeGraph(json="path/to/file.json")
>>> kg.json
'path/to/file.json'
- Raises:
KnowledgeGraphError – Arises if the knowledge graph file is not specified. Valid knowledge graph files include JSON, ERGO, RDF, OWL, and CSV files.
- json
JSON knowledge graph file.
- ergo
ERGO knowledge graph file.
- rdf
RDF knowledge graph file.
- owl
OWL knowledge graph file.
- csv
CSV knowledge graph file.
- lp
Logic programming (Clingo) knowledge graph file.
- df: DataFrame = Empty DataFrame Columns: [] Index: []
- exception src.kg.knowledge_graph.KnowledgeGraphError[source]
Bases: Exception
Exception intended for knowledge graph errors.
- src.kg.knowledge_graph.clean_course_title(course_title)[source]
Clean course title by removing any additional information after ‘**’.
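One way to read this description is as a split on the first `**` marker. The sketch below assumes the function takes and returns a string and trims surrounding whitespace; the name `clean_course_title_sketch` and the sample titles in the usage note are hypothetical.

```python
def clean_course_title_sketch(course_title: str) -> str:
    """Keep only the text before the first '**' marker, trimmed of whitespace."""
    # str.partition returns the whole string as the head when '**' is absent,
    # so titles without the marker pass through unchanged (apart from trimming).
    head, _, _ = course_title.partition("**")
    return head.strip()
```

For instance, `clean_course_title_sketch("Data Structures ** Offered in Fall")` gives `'Data Structures'`.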
- src.kg.knowledge_graph.get_course_components(driver)[source]
Helper function to get course components. Course components may include more than one word.
- Parameters:
driver – (Selenium WebDriver) Input webdriver object.
- Returns:
Tuple that consists of course components.
- src.kg.knowledge_graph.get_sbu_cse_course_offered_info(undergrad_url, grad_url)[source]
Scrape Stony Brook University’s CSE undergraduate and graduate course offering webpages.
Warning
The URLs used in Usage example were accessed and current as of May 03 2024. The tables located at each URL contain information for Spring 2023, Fall 2023, Spring 2024, and Fall 2024; these semesters will need to be updated in this function in the future.
- Usage example:
>>> undergrad_url = "https://www.cs.stonybrook.edu/students/Undergraduate-Studies/csecourses"
>>> grad_url = "https://www.cs.stonybrook.edu/students/Graduate-Studies/courses"
>>> df = get_sbu_cse_course_offered_info(undergrad_url=undergrad_url, grad_url=grad_url)
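- Parameters:
undergrad_url (str) – URL of the Stony Brook University undergraduate course offering webpage.
grad_url (str) – URL of the Stony Brook University graduate course offering webpage.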
- Return type:
DataFrame
- Returns:
Pandas DataFrame containing the undergraduate and graduate course offering information.
- src.kg.knowledge_graph.get_sbu_cse_grad_course_offered_info(url)[source]
Scrape Stony Brook University’s CSE graduate course offering webpage.
- Usage example:
>>> url = "https://www.cs.stonybrook.edu/students/Graduate-Studies/courses"
>>> df = get_sbu_cse_grad_course_offered_info(url=url)
- Parameters:
url (str) – URL of the Stony Brook University graduate course offering webpage.
- Return type:
DataFrame
- Returns:
Pandas DataFrame containing the graduate course offering information.
- src.kg.knowledge_graph.get_sbu_cse_undergrad_course_offered_info(url)[source]
Scrape Stony Brook University’s undergraduate CSE course offering webpage.
- Usage example:
>>> url = "https://www.cs.stonybrook.edu/students/Undergraduate-Studies/csecourses"
>>> df = get_sbu_cse_undergrad_course_offered_info(url=url)
- Parameters:
url (str) – URL of the Stony Brook University undergraduate course offering webpage.
- Return type:
DataFrame
- Returns:
Pandas DataFrame containing the undergraduate course offering information.
- src.kg.knowledge_graph.parse_prerequisites(input_string)[source]
Parse major requirements from a string into a list of lists of course codes. This function is mainly used to separate disjunctions and conjunctions of course prerequisites. Disjunctions are grouped together in the same sub-list, while conjunctions are separated into different sub-lists. For example, "Prerequisite: CSE 216 or CSE 260; AMS 310; CSE major" would be parsed as: [["CSE 216", "CSE 260"], ["AMS 310"], ["CSE major"]].
Warning
This function is deprecated. Use parse_requirements() instead.
- Usage example:
>>> input_string = "Prerequisite: CSE 216 or CSE 260; AMS 310; CSE major"
>>> parse_prerequisites(input_string)
[['CSE 216', 'CSE 260'], ['AMS 310'], ['CSE major']]
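The grouping rule described for this function (alternatives joined by "or" share a sub-list, clauses separated by ";" become separate sub-lists) can be sketched with two splits. This is an illustration of the rule only, under the assumption that ";" and the word "or" are the only separators; the name `parse_prerequisites_sketch` is hypothetical and the real function may handle more cases.

```python
import re


def parse_prerequisites_sketch(input_string: str) -> list:
    """Group 'or' alternatives together and split ';' clauses apart."""
    # Drop a leading label such as "Prerequisite:" or "Prerequisites:".
    body = re.sub(r"^\s*Prerequisites?:\s*", "", input_string, flags=re.IGNORECASE)
    groups = []
    for clause in body.split(";"):
        # Require whitespace around "or" so words like "major" are not split.
        options = [opt.strip(" .") for opt in re.split(r"\s+or\s+", clause)]
        options = [opt for opt in options if opt]
        if options:
            groups.append(options)
    return groups
```

Applied to the input from the usage example, the sketch returns `[['CSE 216', 'CSE 260'], ['AMS 310'], ['CSE major']]`.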
- src.kg.knowledge_graph.parse_requirements(input_string)[source]
Parse major requirements from a string into a list of lists of course codes. This function is mainly used to separate disjunctions and conjunctions of course prerequisites, anti-requisites, and corequisites. Disjunctions are grouped together in the same sub-list, while conjunctions are separated into different sub-lists. Returns one such list each for prerequisites, anti-requisites, and corequisites.
Note
Disjunctive statements will appear in the same sub-list, while conjunctive statements will appear in a separate sub-list.
Use this function in place of parse_prerequisites().
- Usage example:
>>> input_string = "Prerequisite: CSE 216 or CSE 260; AMS 310; Anti-requisite: CSE 260. Corequisite: CSE 161."
>>> parse_requirements(input_string)
([['CSE216', 'CSE260'], ['AMS310']], [['CSE260']], [['CSE161']])
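A possible way to obtain three lists from one string is to split the text at its section labels first and then group the course codes within each section. The sketch below assumes the three label spellings shown in the usage example, three-letter department codes followed by three digits, and ";" as the only conjunction separator; every name in it is hypothetical.

```python
import re

# Position of each section in the returned tuple.
SECTION_INDEX = {"prerequisite": 0, "anti-requisite": 1, "corequisite": 2}
LABEL_RE = re.compile(r"(Prerequisite|Anti-requisite|Corequisite)s?:", re.IGNORECASE)
COURSE_RE = re.compile(r"\b([A-Z]{3})\s*(\d{3})\b")


def parse_requirements_sketch(input_string: str) -> tuple:
    """Return (prerequisites, anti_requisites, corequisites) as lists of lists."""
    result = ([], [], [])
    # Splitting on a pattern with one capturing group yields
    # [prefix, label, text, label, text, ...].
    parts = LABEL_RE.split(input_string)
    for label, text in zip(parts[1::2], parts[2::2]):
        target = result[SECTION_INDEX[label.lower()]]
        for clause in text.split(";"):
            options = []
            for option in re.split(r"\s+or\s+", clause):
                match = COURSE_RE.search(option)
                if match:
                    # Join department and number without a space, e.g. "CSE216".
                    options.append(match.group(1) + match.group(2))
            if options:
                target.append(options)
    return result
```

On the input from the usage example this sketch yields `([['CSE216', 'CSE260'], ['AMS310']], [['CSE260']], [['CSE161']])`. Clauses without a recognizable course code are dropped, which is a simplification of the sketch rather than documented behaviour.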
- src.kg.knowledge_graph.remove_non_numeric(course_number)[source]
Remove any non-digit characters from the course number.
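A regular-expression substitution is one straightforward way to do this. The sketch assumes the input and output are both strings; the name `remove_non_numeric_sketch` is hypothetical.

```python
import re


def remove_non_numeric_sketch(course_number: str) -> str:
    """Delete every character that is not a decimal digit."""
    # \D matches any non-digit character, including spaces and letters.
    return re.sub(r"\D", "", course_number)
```

For example, `remove_non_numeric_sketch("CSE 214")` gives `'214'`.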
- src.kg.knowledge_graph.scrape_sbu_solar(url, major_three_letter_code, wait_time=10, headless=True, verbose=False, output_filename=None)[source]
Scrape Stony Brook University’s course catalog for a specific major’s course information. This function scrapes Stony Brook University’s course catalog and stores the information in a KnowledgeGraph object. The course information includes course number, title, career, units, grading basis, enrollment requirements, anti-requisites, corequisites, course components, academic group, academic organization, and course description. Information on when courses are offered over a four-semester span is also included, for CSE courses only. This offering information is scraped from the CSE department’s website and is hardcoded for CSE courses (see get_sbu_cse_course_offered_info()).
Warning
This function uses a Selenium WebDriver and specific div IDs to scrape the course catalog.
- Usage example:
>>> url = "https://prod.ps.stonybrook.edu/psc/csprodg/EMPLOYEE/CAMP/c/COMMUNITY_ACCESS.SSS_BROWSE_CATLG.GBL?"
>>> kg = scrape_sbu_solar(
...     url=url,
...     major_three_letter_code="cse",
...     wait_time=10,
...     headless=True,
...     verbose=True,
... )
- Parameters:
url (Union[KnowledgeBase, str]) – Input Stony Brook URL (or KnowledgeBase object) to scrape.
major_three_letter_code (str) – Three-letter code for the major (e.g. CSE for computer science).
wait_time (int) – Maximum wait time (in seconds) for each click operation. Defaults to 10.
headless (bool) – Do not open a browser window. Defaults to True.
verbose (bool) – Print output to screen. Defaults to False.
output_filename (Optional[str]) – Output filename for the JSON file. Defaults to None.
- Raises:
ValueError – Arises if the course table is not displayed, is empty, or if the wait time is less than 0 seconds.
- Return type:
KnowledgeGraph
- Returns:
KnowledgeGraph object containing course information that corresponds to an output JSON file.
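Two details of the signature can be illustrated without a browser: the `url` argument accepts either a string or a `KnowledgeBase` object, and a negative `wait_time` raises `ValueError`. The helper below is a hypothetical sketch of how such inputs might be normalized before scraping starts; `KnowledgeBaseStub`, `resolve_scrape_inputs`, and the error message are assumptions, not part of the module.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass
class KnowledgeBaseStub:
    """Hypothetical stand-in exposing only the documented `url` attribute."""

    url: str = ""


def resolve_scrape_inputs(
    url: Union[KnowledgeBaseStub, str], wait_time: int = 10
) -> Tuple[str, int]:
    """Return a plain URL string and a validated wait time."""
    # Documented behaviour: a wait time below 0 seconds is an error.
    if wait_time < 0:
        raise ValueError("wait_time must be at least 0 seconds.")
    # Accept either a bare string or an object that carries the URL.
    resolved = url if isinstance(url, str) else url.url
    return resolved, wait_time
```

Normalizing the arguments up front keeps the scraping logic free of type checks, though the actual function may organize this differently.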