James Cook University Subject Handbook - 2024

For subject information from 2025 and onwards, please visit the new JCU Course and Subject Handbook website.

MA3831 - Natural Language Processing, Web Scraping and Large Data Processing

Credit points:03
Year:2024
Student Contribution Band:Band 1
Prerequisites:CP1404 AND MA3405
Administered by:College of Science and Engineering

Subject Description

    This subject will provide students with cutting-edge tools and techniques for data science. There are two parts to this subject. In the first half, students will explore natural language processing (NLP), web scraping and APIs to harvest data with Python, and explore the data science workbench approach to managing production pipelines of work that can be re-used in different data science projects. In the second half, students will focus on computer models and software designed to handle Big Data sets in a distributed and/or parallel fashion. Particular focus will be given to distributed and parallel computing using MapReduce/Hadoop and similar models for processing Big Data sets.

Learning Outcomes

  • understand and apply new data science skills, knowledge and techniques to solve problems in data science using NLP
  • apply data science skills, knowledge and techniques to solve problems in data science NLP projects with a focus on web scraping
  • understand how to deploy data science projects into production pipelines
  • compare and evaluate different systems and approaches to high-performance and large-scale computing for analytics on standard data and big data
  • manage and prepare data using standard management frameworks, transforming and cleaning it to ensure classical characteristic outcomes are achieved
  • examine and deploy data processing tasks in the Hadoop ecosystem for big data

Subject Assessment

  • Written > Case report 1 - (20%) - Individual
  • Written > Project report - (50%) - Individual
  • Written > Technical report - (30%) - Individual

Note that minor variations might occur due to the continuous subject quality improvement process. Where assessment details vary, the Subject Outline represents the latest official information.

Availabilities

Cairns Nguma-bada, Study Period 5, Intensive, (Face to Face dates exist for this availability)

Census date:Thursday, 02 May 2024
Study Period Dates:Monday, 15 Apr 2024 to Friday, 14 Jun 2024
Face to face teaching:Monday, 15 Apr 2024 to Friday, 24 May 2024
Coordinator(s):
DR Dianna Hardy
Workload expectations:The student workload for this 3 credit point subject is approximately 130 hours.
  • 26 Hours - Online activity
  • 26 Hours - Online Workshops

JCU Singapore, Study Period 51, Internal

Usually available in even-numbered years.

Census date:Thursday, 07 Mar 2024
Study Period Dates:Thursday, 15 Feb 2024 to Friday, 26 Apr 2024
Coordinator(s):
DR David Donald
Lecturer(s):
DR Eric Tham
Workload expectations:The student workload for this 3 credit point subject is approximately 130 hours.
  • 30 Hours - Lectures
  • 30 Hours - Tutorials

JCU Singapore, Trimester 3, Internal

Usually available in even-numbered years.

Census date:Thursday, 10 Oct 2024
Study Period Dates:Monday, 16 Sep 2024 to Saturday, 14 Dec 2024
Coordinator(s):
DR Dianna Hardy
Lecturer(s):
DR Stanley Loo
Workload expectations:The student workload for this 3 credit point subject is approximately 130 hours.
  • 30 Hours - Lectures
  • 30 Hours - Tutorials

Townsville Bebegu Yumba, Study Period 5, Intensive, (Face to Face dates exist for this availability)

Census date:Thursday, 02 May 2024
Study Period Dates:Monday, 15 Apr 2024 to Friday, 14 Jun 2024
Face to face teaching:Monday, 15 Apr 2024 to Friday, 24 May 2024
Coordinator(s):
DR Dianna Hardy
Workload expectations:The student workload for this 3 credit point subject is approximately 130 hours.
  • 26 Hours - Online activity
  • 26 Hours - Online Workshops