Case Western Reserve University (Website, Academic Catalog)
Computer and Data Sciences (Department Website)
BS Degree in Data Science and Analytics (source 1, source 2, ABET)
CS Courses
- Programming in Java, CSDS 132 (3), intro
An in-depth survey of modern programming language features, computer programming and algorithmic problem solving with an emphasis on the Java language. Computers and code compilation; conditional statements, subprograms, loops, methods; object-oriented design, inheritance and polymorphism, abstract classes and interfaces; types, type systems, generic types, abstract data types, strings, arrays, linked lists; software development, modular code design, unit testing; strings, text and file I/O; GUI components, GUI event handling; threads; comparison of Java to C, C++, and C#. Offered as CSDS 132 and ECSE 132.
- Introduction to Data Science and Engineering for Majors, CSDS 133 (3), intro
This course is an introduction to data science and analytics. In the first half of the course, students will develop a basic understanding of how to manipulate, analyze and visualize large data in a distributed computing environment, with an appreciation of open source development, security and privacy issues. Case studies and team project assignments in the second half of the course will be used to implement the ideas. Topics covered will include: Overview of large scale parallel and distributed (cloud) computing; file systems and file i/o; open source coding and distributed versioning, data query and retrieval; basic data analysis; visualization; data security, privacy and provenance.
- Introduction to Data Structures, CSDS 233 (4), intro
Different representations of data: lists, stacks and queues, trees, graphs, and files. Manipulation of data: searching and sorting, hashing, recursion and higher order functions. Abstract data types, templating, and the separation of interface and implementation. Introduction to asymptotic analysis. The Java language is used to illustrate the concepts and as an implementation vehicle throughout the course. Offered as CSDS 233 and ECSE 233.
- Structured and Unstructured Data, CSDS 234 (3), intro
This course is an introduction to types of data and their representation, storage, processing and analysis. The course has three parts. In the first part of the course, students will develop a basic understanding and the ability to represent, store, process and analyze structured data. Structured data include catalogs, records, tables, logs, etc., with a fixed dimension and well-defined meaning for each data point. Suitable representation and storage mechanisms include lists and arrays. Relevant techniques include keys, hashes, stacks, queues and trees. In the second part of the course, students will develop a basic understanding and the ability to represent, store, process and analyze semi-structured data. Semi-structured data include texts, web pages and networks, without a fixed dimension and structure, but with well-defined meaning for each data point. Suitable representation and storage mechanisms include trees, graphs and RDF triples. Relevant techniques include XML, YAML, JSON, parsing, annotation, language processing. In the third part of the course, students will develop a basic understanding and the ability to represent, store, process and analyze unstructured data. Unstructured data include images, video, and time series data, with neither a fixed dimension and structure nor well-defined meaning for individual data points. Suitable representation and storage mechanisms include large matrices, EDF, DICOM. Relevant techniques include feature extraction, segmentation, clustering, rendering, indexing, and visualization.
- pick 3
Signals and Systems, ECSE 246 (4), math
Mathematical representation, characterization, and analysis of continuous-time signals and systems. Development of elementary mathematical models of continuous-time dynamic systems. Time domain and frequency domain analysis of linear time-invariant systems. Fourier series, Fourier transforms, and Laplace transforms. Sampling theorem. Filter design. Introduction to feedback control systems and feedback controller design.
Software Craftsmanship, CSDS 293 (4), softeng
A course to improve programming skills, software quality, and the software development process. Software design; Version control; Control issues and routines; Pseudo-code programming process and developer testing; Defensive programming; Classes; Debugging; Self-documenting code; Refactoring.
Files, Indexes and Access Structures for Big Data, CSDS 305 (3), sys
Database management has become a central component of a modern computing environment, and, as a result, knowledge about database systems has become an essential part of education in computer science and data science. This course is an introduction to the nature and purpose of database systems, fundamental concepts for designing, implementing and querying a database, and database architectures. Objectives: an expert knowledge of basic data structures, basic searching and sorting methods, and algorithm techniques (such as greedy and divide and conquer); and in-depth knowledge of search and index structures for large, heterogeneous data, including multidimensional data, high-dimensional data and data in metric spaces (e.g., sequences, images), of different search methods (e.g., similarity searching, partial match, exact match), and of dimensionality reduction techniques.
Signal Processing, ECSE 313 (3), math
Fourier series and transforms. Analog and digital filters. Fast-Fourier transforms, sampling, and modulation for discrete time signals and systems. Consideration of stochastic signals and linear processing of stochastic signals using correlation functions and spectral analysis. The course will incorporate the use of Grand Challenges in the areas of Energy Systems, Control Systems, and Data Analytics in order to provide a framework for problems to study in the development and application of the concepts and tools studied in the course. Various aspects of important engineering skills relating to leadership, teaming, emotional intelligence, and effective communication are integrated into the course.
Data Mining for Big Data, CSDS 335 (3), ai
With the unprecedented rate at which data is being collected today in almost all fields of human endeavor, there is an emerging economic and scientific need to extract useful information from it. Data mining is the process of automatic discovery of patterns, changes, associations and anomalies in massive databases, and is a highly interdisciplinary field representing the confluence of several disciplines, including database systems, data warehousing, machine learning, statistics, algorithms, data visualization, and high-performance computing. This course is an introduction to the commonly used data mining techniques.
Intro to Operating Systems and Concurrent Programming, CSDS 338 (4), sys
Intro to OS: OS Structures, processes, threads, CPU scheduling, deadlocks, memory management, file system implementations, virtual machines, cloud computing. Concurrent programming: fork, join, concurrent statement, critical section problem, safety and liveness properties of concurrent programs, process synchronization algorithms, semaphores, monitors. UNIX systems programming: system calls, UNIX System V IPCs, threads, RPCs, shell programming. Offered as CSDS 338, ECSE 338, CSDS 338N and ECSE 338N.
Introduction to Machine Learning, CSDS 340 (3), ai
Machine learning is a sub-field of Artificial Intelligence that is concerned with the design and analysis of algorithms that 'learn' and improve with experience. This course is an introduction to algorithms for machine learning and their implementation in the context of big data.
Computer Security, CSDS 344 (3), sys
General types of security attacks; approaches to prevention; secret key and public key cryptography; message authentication and hash functions; digital signatures and authentication protocols; information gathering; password cracking; spoofing; session hijacking; denial of service attacks; buffer overruns; viruses, worms, etc.; principles of secure software design; threat modeling; access control; least privilege; storing secrets; socket security; firewalls; intrusions; auditing; mobile security.
Engineering Optimization, ECSE 346 (3), math
Optimization techniques including linear programming and extensions; transportation and assignment problems; network flow optimization; quadratic, integer, and separable programming; geometric programming; and dynamic programming. Nonlinear optimization topics: optimality criteria, gradient and other practical unconstrained and constrained methods. Computer applications using engineering and business case studies. The course will incorporate the use of Grand Challenges in the areas of Energy Systems, Control Systems, and Data Analytics in order to provide a framework for problems to study in the development and application of the concepts and tools studied in the course. Various aspects of important engineering skills relating to leadership, teaming, emotional intelligence, and effective communication are integrated into the course.
Data Privacy, CSDS 356 (3), impact
Introduction to privacy, economics and incentives, crypto-based solution for privacy, hiding data from the database user, hiding access patterns from the database owner, anonymous routing and TOR, privacy in online social networks, privacy in cellular and Wi-Fi networks, location privacy, privacy in e-cash systems, privacy in e-voting, genomic privacy.
Advanced Game Development Project, CSDS 390 (3), capstone
This game development project course will bring together an inter-professional group of students in the fields of engineering, computer science, and art to focus on the design and development of a complete, fully functioning computer game as an interdisciplinary team.
Software Engineering, CSDS 393 (3), softeng
Topics: Introduction to software engineering; software lifecycle models; development team organization and project management; requirements analysis and specification techniques; software design techniques; programming practices; software validation techniques; software maintenance practices; software engineering ethics. Undergraduates work in teams to complete a significant software development project. Graduate students are required to complete a research project.
Convex Optimization for Engineering, ECSE 416 (3), math
This course will focus on the development of a working knowledge and skills to recognize, formulate, and solve convex optimization problems that are so prevalent in engineering. Applications in control systems; parameter and state estimation; signal processing; communications and networks; circuit design; data modeling and analysis; data mining including clustering and classification; and combinatorial and global optimization will be highlighted. New reliable and efficient methods, particularly those based on interior-point methods and other special methods to solve convex optimization problems, will be emphasized. Implementation issues will also be underscored.
Data Mining, CSDS 435 (3), ai
Data Mining is the process of discovering interesting knowledge from large amounts of data stored either in databases, data warehouses, or other information repositories. Topics to be covered include: Data Warehouse and OLAP technology for data mining, Data Preprocessing, Data Mining Primitives, Languages, and System Architectures, Mining Association Rules from Large Databases, Classification and Prediction, Cluster Analysis, Mining Complex Types of Data, and Applications and Trends in Data Mining.
Causal Learning from Data, CSDS 442 (3), ai
This course introduces key concepts and techniques for characterizing, from observational or experimental study data and from background information, the causal effect of a specific treatment, exposure, or intervention (e.g., a medical treatment) upon an outcome of interest (e.g., disease status). The fundamental problem of causal inference is the impossibility of observing the effects of different and incompatible treatments on the same individual or unit. This problem is overcome by estimating an average causal effect over a study population. Making valid causal inferences with observational data is especially challenging, because of the greater potential for biases (confounding bias, selection bias, and measurement bias) that can badly distort causal effect estimates. Consequently, this topic has been the focus of intense cross-disciplinary research in recent years. Causal inference techniques will be illustrated by applications in several fields such as computer science, engineering, medicine, public health, biology, genomics, neuroscience, economics, and social science. Course grading will be based on quizzes, homeworks, a class presentation, and a causal data analysis project.
Specific topics: treatments, exposures, and interventions; causal effects and causal effect measures; confounding bias; potential outcomes and counterfactuals; randomized experiments; observational studies; causal directed acyclic graphs (DAGs); exchangeability and conditional exchangeability; effect modification; causal interactions; nonparametric structural equations; Pearl's Back-Door Criterion, Front-Door Criterion, and related results; covariate adjustment; matching on covariates; selection bias; measurement bias; instrumental variables; causal modeling; inverse probability weighting; marginal structural models; standardization; structural nested models; outcome regression; propensity scores; sensitivity analysis.
Advanced Algorithms, CSDS 477 (3), algs
Design and analysis of efficient algorithms, with emphasis on network flow, combinatorial optimization, and randomized algorithms. Linear programming: duality, complementary slackness, total unimodularity. Minimum cost flow: optimality conditions, algorithms, applications. Game theory: two-person zero-sum games, minimax theorems. Probabilistic analysis and randomized algorithms: examples and lower bounds. Approximation algorithms for NP-hard problems: examples, randomized rounding of linear programs.
Artificial Intelligence: Probabilistic Graphical Models, CSDS 491 (3), ai
This course is a graduate-level introduction to Artificial Intelligence (AI), the discipline of designing intelligent systems, and focuses on probabilistic graphical models. These models can be applied to a wide variety of settings from data analysis to machine learning to robotics. The models allow intelligent systems to represent uncertainties in an environment or problem space in a compact way and reason intelligently in a way that makes optimal use of available information and time. The course covers directed and undirected probabilistic graphical models, latent variable models, associated exact and approximate inference algorithms, and learning in both discrete and continuous problem spaces. Practical applications are covered throughout the course.
Introduction to Linear Algebra for Applications, MATH 201 (3), math
Matrix operations, systems of linear equations, vector spaces, subspaces, bases and linear independence, eigenvalues and eigenvectors, diagonalization of matrices, linear transformations, determinants. Less theoretical than MATH 307. Appropriate for majors in science, engineering, economics.
Statistical Theory with Application I, STAT 243 (3), math
Introduction to fundamental concepts of statistics through examples including design of an observational study, industrial simulation. Theoretical development motivated by sample survey methodology. Randomness, distribution functions, conditional probabilities. Derivation of common discrete distributions. Expectation operator. Statistics as random variables, point and interval estimation. Maximum likelihood estimators. Properties of estimators.
Statistical Theory with Application II, STAT 244 (3), math
Extension of inferences to continuous-valued random variables. Common continuous-valued distributions. Expectation operator. Maximum likelihood estimators for the continuous case. Simple linear, multiple and polynomial regression. Properties of regression estimators when errors are Gaussian. Regression diagnostics. Class or student projects gathering real data or generating simulated data, fitting models and analyzing residuals from fit.
Linear Algebra, MATH 307 (3), math
A course in linear algebra that studies the fundamentals of vector spaces, inner product spaces, and linear transformations on an axiomatic basis. Topics include: solutions of linear systems, matrix algebra over the real and complex numbers, linear independence, bases and dimension, eigenvalues and eigenvectors, singular value decomposition, and determinants. Other topics may include least squares, general inner product and normed spaces, orthogonal projections, finite dimensional spectral theorem. This course is required of all students majoring in mathematics and applied mathematics. More theoretical than MATH 201.
Convexity and Optimization, MATH 327 (3), math
Introduction to the theory of convex sets and functions and to the extremes in problems in areas of mathematics where convexity plays a role. Among the topics discussed are basic properties of convex sets (extreme points, facial structure of polytopes), separation theorems, duality and polars, properties of convex functions, minima and maxima of convex functions over convex sets, various optimization problems. Offered as MATH 327, MATH 427, and OPRE 427.
Any STAT 300 level or above course
- Discrete Mathematics, CSDS 302 (3), math
A general introduction to basic mathematical terminology and the techniques of abstract mathematics in the context of discrete mathematics. Topics introduced are mathematical reasoning, Boolean connectives, deduction, mathematical induction, sets, functions and relations, algorithms, graphs, combinatorial reasoning. Offered as CSDS 302, ECSE 302 and MATH 304.
- Algorithms, CSDS 310 (3), algs
The course covers fundamentals in algorithm design and analysis and provides practice in professional algorithm writing and presentations. Loop invariants, asymptotic notation, recurrence relations, sorting algorithms, divide-and-conquer, dynamic programming, greedy algorithms, basic graph algorithms. Offered as CSDS 310 and CSDS 310N. Counts as a Disciplinary Communication course.
- Introduction to Data Science Systems, CSDS 312 (3), sys
An introduction to the software and hardware architecture of data science systems, with an emphasis on Operating Systems and Computer Architecture that are relevant to Data Sciences systems. At the end of the course, the student should understand the principles and architecture of storage systems, file systems (especially, HDFS), memory hierarchy, and GPU. The student should have carried out projects in these areas, and should be able to critically compare various design decisions in terms of capability and performance.
- Introduction to Data Analysis, CSDS 313 (3), math
This course provides a conceptual and hands-on introduction to reasoning with data. Introduction of basic statistical concepts; models vs. observations, common distributions, parameters vs. statistics, statistical inference, hypothesis testing, multiple hypotheses, confidence intervals. Use of computational approaches to address statistical problems; data representation, empirical assessment of statistical significance, assessment of the association between variables, dimensionality reduction, model building, evaluation, and validation. Data visualization and accessibility/interpretability of patterns in data and predictive models. Computational thinking and critical approaches in data science; common mistakes and issues in data analysis, causality vs. correlation, confounders, statistical artifacts, Simpson's paradox, base rate fallacy, stage migration, survivorship bias, censoring, misleading visualization. Offered as CSDS 313 and CSDS 413.
- Introduction to Database Systems, CSDS 341 (3), sys
Relational model, ER model, relational algebra and calculus, SQL, QBE, security, views, files and physical database structures, query processing and query optimization, normalization theory, concurrency control, object relational systems, multimedia databases, Oracle SQL server, Microsoft SQL server.
- Computer Security, CSDS 344 (3), sys, or
General types of security attacks; approaches to prevention; secret key and public key cryptography; message authentication and hash functions; digital signatures and authentication protocols; information gathering; password cracking; spoofing; session hijacking; denial of service attacks; buffer overruns; viruses, worms, etc.; principles of secure software design; threat modeling; access control; least privilege; storing secrets; socket security; firewalls; intrusions; auditing; mobile security.
Data Privacy, CSDS 356 (3), impact
Introduction to privacy, economics and incentives, crypto-based solution for privacy, hiding data from the database user, hiding access patterns from the database owner, anonymous routing and TOR, privacy in online social networks, privacy in cellular and Wi-Fi networks, location privacy, privacy in e-cash systems, privacy in e-voting, genomic privacy.
- Senior Project in Data Science, CSDS 398 (4), capstone
Capstone course for data science seniors. Material from previous and concurrent courses used to apply tools of the data science lifecycle to practical applications. Professional engineering topics such as project management, engineering design, communications, and professional ethics. Requirements include periodic reporting of progress, plus a final oral presentation and written report. Scheduled formal project presentations during the last week of classes. Counts as a SAGES Senior Capstone course.
- pick 2
Introduction to Bioinformatics, CSDS 458 (3), ai
Fundamental algorithmic and statistical methods in computational molecular biology and bioinformatics will be discussed. Topics include introduction to molecular biology and genetics, DNA sequence analysis, polymorphisms and personal genomics, structural variation analysis, gene mapping and haplotyping algorithms, phylogenetic analysis, biological network analysis, and computational drug discovery. Much of the course will focus on the algorithmic techniques, including but not limited to, dynamic programming, hidden Markov models, string algorithms, graph theories and algorithms, and some representative data mining algorithms. Paper presentations and course projects are also required.
Bioinformatics for Systems Biology, CSDS 459 (3), ai
Description of omic data (biological sequences, gene expression, protein-protein interactions, protein-DNA interactions, protein expression, metabolomics, biological ontologies), regulatory network inference, topology of regulatory networks, computational inference of protein-protein interactions, protein interaction databases, topology of protein interaction networks, module and protein complex discovery, network alignment and mining, computational models for network evolution, network-based functional inference, metabolic pathway databases, topology of metabolic pathways, flux models for analysis of metabolic networks, network integration, inference of domain-domain interactions, signaling pathway inference from protein interaction networks, network models and algorithms for disease gene identification, identification of dysregulated subnetworks, network-based disease classification.
Survey of Bioinformatics: Technologies in Bioinformatics, BIOL 311A (1), sci
SYBB 311A/411A is a 5-week course that introduces students to the high-throughput technologies used to collect data for bioinformatics research in the fields of genomics, proteomics, and metabolomics. In particular, we will focus on mass spectrometer-based proteomics, DNA and RNA sequencing, genotyping, protein microarrays, and mass spectrometry-based metabolomics. This is a lecture-based course that relies heavily on out-of-class readings. Graduate students will be expected to write a report and give an oral presentation at the end of the course. SYBB 311A/411A is part of the SYBB survey series which is composed of the following course sequence: (1) Technologies in Bioinformatics, (2) Data Integration in Bioinformatics, (3) Translational Bioinformatics, and (4) Programming for Bioinformatics. Each standalone section of this course series introduces students to an aspect of a bioinformatics project - from data collection (SYBB 311A/411A), to data integration (SYBB 311B/411B), to research applications (SYBB 311C/411C), with a fourth module (SYBB 311D/411D) introducing basic programming skills. Graduate students have the option of enrolling in all four courses or choosing the individual modules most relevant to their background and goals with the exception of SYBB 411D, which must be taken with SYBB 411A.
Survey of Bioinformatics: Data Integration in Bioinformatics, BIOL 311B (1), sci
SYBB 311B/411B is a five-week course that surveys the conceptual models and tools used to analyze and interpret data collected by high-throughput technologies, providing an entry point for students new to the field of bioinformatics. The knowledge structures that we will cover include: biomedical ontologies, signaling pathways, and interaction networks. We will also cover tools for genome exploration and analysis. The SYBB survey series is composed of the following course sequence: (1) Technologies in Bioinformatics, (2) Data Integration in Bioinformatics, (3) Translational Bioinformatics, and (4) Programming for Bioinformatics. Each standalone section of this course series introduces students to an aspect of a bioinformatics project - from data collection (SYBB 311A/411A), to data integration (SYBB 311B/411B), to research applications (SYBB 311C/411C), with a fourth module (SYBB 311D/411D) introducing basic programming. Graduate students have the option of enrolling in all four courses or choosing the individual modules most relevant to their background and goals with the exception of SYBB 411D, which must be taken with SYBB 411A.
Survey of Bioinformatics: Translational Bioinformatics, BIOL 311C (1), sci
SYBB 311C/411C is a longitudinal course that introduces students to the latest applications of bioinformatics, with a focus on translational research. Topics include: 'omic drug discovery, pharmacogenomics, microbiome analysis, and genomic medicine. The focus of this course is on illustrating how bioinformatic technologies can be paired with data integration tools for various applications in medicine. The course is organized as a weekly journal club, with instructors leading the discussion of recent literature in the field of bioinformatics. Students will be expected to complete readings beforehand; students will also work in teams to write weekly reports reviewing journal articles in the field. The SYBB survey series is composed of the following course sequence: (1) Technologies in Bioinformatics, (2) Data Integration in Bioinformatics, (3) Translational Bioinformatics, and (4) Programming for Bioinformatics. Each standalone section of this course series introduces students to an aspect of a bioinformatics project - from data collection (SYBB 311A/411A), to data integration (SYBB 311B/411B), to research applications (SYBB 311C/411C), with a fourth module (SYBB 311D/411D) introducing basic programming. Graduate students have the option of enrolling in all four courses or choosing the individual modules most relevant to their background and goals with the exception of SYBB 411D, which must be taken with SYBB 411A.
Applied Probability and Stochastic Processes for Biology, BIOL 319 (3), sci
Applications of probability and stochastic processes to biological systems. Mathematical topics will include: introduction to discrete and continuous probability spaces (including numerical generation of pseudo random samples from specified probability distributions), Markov processes in discrete and continuous time with discrete and continuous sample spaces, point processes including homogeneous and inhomogeneous Poisson processes and Markov chains on graphs, and diffusion processes including Brownian motion and the Ornstein-Uhlenbeck process. Biological topics will be determined by the interests of the students and the instructor. Likely topics include: stochastic ion channels, molecular motors and stochastic ratchets, actin and tubulin polymerization, random walk models for neural spike trains, bacterial chemotaxis, signaling and genetic regulatory networks, and stochastic predator-prey dynamics. The emphasis will be on practical simulation and analysis of stochastic phenomena in biological systems. Numerical methods will be developed using a combination of MATLAB, the R statistical package, MCell, and/or URDME, at the discretion of the instructor. Student projects will comprise a major part of the course.
Cognition and Computation, DSCI 330 (3), ai
An introduction to (1) theories of the relationship between cognition and computation; (2) computational models of human cognition (e.g. models of decision-making or concept creation); and (3) computational tools for the study of human cognition. All three dimensions involve data science: theories are tested against archives of brain imaging data; models are derived from and tested against datasets of e.g., financial decisions (markets), legal rulings and findings (juries, judges, courts), legislative actions, and healthcare decisions; computational tools aggregate data and operate upon it analytically, for search, recognition, tagging, machine learning, statistical description, and hypothesis testing.
Exploratory Data Science, DSCI 351 (3), ai
In this course, we will learn data science and analysis approaches to identify statistically significant relationships and better model and predict the behavior of these systems. We will assemble and explore real-world datasets, perform clustering and pair plot analyses to investigate correlations, and logistic regression will be employed to develop associated predictive models. Results will be interpreted, visualized and discussed. We will introduce basic elements of statistical analysis using R Project open source software for exploratory data analysis and model development. R is an open-source software project with broad abilities to access machine-readable open-data resources, data cleaning and munging functions, and a rich selection of statistical packages, used for data analytics, model development and prediction. This will include an introduction to R data types, reading and writing data, looping, plotting and regular expressions, so that one can start performing variable transformations for linear fitting and developing structural equation models, while exploring for statistically significant relationships.
Math/Stat Courses
- Calculus for Science and Engineering I, MATH 121 (4), math
Functions, analytic geometry of lines and polynomials, limits, derivatives of algebraic and trigonometric functions. Definite integral, antiderivatives, fundamental theorem of calculus, change of variables.
- Calculus for Science and Engineering II (or), MATH 122 (4), math
Continuation of MATH 121. Exponentials and logarithms, growth and decay, inverse trigonometric functions, related rates, basic techniques of integration, area and volume, polar coordinates, parametric equations. Taylor polynomials and Taylor's theorem.
- Calculus for Science and Engineering III (or), MATH 223 (3), math
Introduction to vector algebra; lines and planes. Functions of several variables: partial derivatives, gradients, chain rule, directional derivative, maxima/minima. Multiple integrals, cylindrical and spherical coordinates. Derivatives of vector valued functions, velocity and acceleration. Vector fields, line integrals, Green's theorem.
- or
- Statistical Theory with Application I (or), STAT 243 (3), math
Introduction to fundamental concepts of statistics through examples including design of an observational study, industrial simulation. Theoretical development motivated by sample survey methodology. Randomness, distribution functions, conditional probabilities. Derivation of common discrete distributions. Expectation operator. Statistics as random variables, point and interval estimation. Maximum likelihood estimators. Properties of estimators.
Basic Statistics for Engineering and Science, STAT 312 (3), math
For advanced undergraduate students in engineering, physical sciences, life sciences. Comprehensive introduction to probability models and statistical methods of analyzing data with the object of formulating statistical models and choosing appropriate methods for inference from experimental and observational data and for testing the model's validity. Balanced approach with equal emphasis on probability, fundamental concepts of statistics, point and interval estimation, hypothesis testing, analysis of variance, design of experiments, and regression modeling.
- Statistical Theory with Application II (or), STAT 244 (3), math
Extension of inferences to continuous-valued random variables. Common continuous-valued distributions. Expectation operator. Maximum likelihood estimators for the continuous case. Simple linear, multiple and polynomial regression. Properties of regression estimators when errors are Gaussian. Regression diagnostics. Class or student projects gathering real data or generating simulated data, fitting models and analyzing residuals from fit.
Data Analysis and Linear Models, STAT 325 (3), math
Basic exploratory data analysis for univariate response with single or multiple covariates. Graphical methods and data summarization, model-fitting using S-plus computing language. Linear and multiple regression. Emphasis on model selection criteria, on diagnostics to assess goodness of fit and interpretation. Techniques include transformation, smoothing, median polish, robust/resistant methods. Case studies and analysis of individual data sets. Notes of caution and some methods for handling bad data. Knowledge of regression is helpful.
- Introduction to Probability, MATH 380 (3), math
Combinatorial analysis. Permutations and combinations. Axioms of probability. Sample space and events. Equally likely outcomes. Conditional probability. Bayes' formula. Independent events and trials. Discrete random variables, probability mass functions. Expected value, variance. Bernoulli, binomial, Poisson, geometric, negative binomial random variables. Continuous random variables, density functions. Expected value and variance. Uniform, normal, exponential, Gamma random variables. The De Moivre-Laplace limit theorem. Joint probability mass functions and densities. Independent random variables and the distribution of their sums. Covariance. Conditional expectations and distributions (discrete case). Moment generating functions. Law of large numbers. Central limit theorem. Additional topics (time permitting): the Poisson process, finite state space Markov chains, entropy.
Engineering Courses
- Impact of Engineering on Society, ENGR 399 (3), impact
As engineers, we design and implement technical solutions with the goal of improving people's lives, locally and globally. However, the technical solutions can have disparate impacts, in that they are beneficial to some people but less beneficial, or even detrimental, to others. What are our ethical and professional responsibilities to understand, consider, and perhaps address the disparate impacts of our work on the affected local and/or global populations?
Science Courses
- Principles of Chemistry for Engineers, CHEM 111 (4), sci
A first course in university chemistry emphasizing chemistry of materials for engineering students. Atomic theory and quantitative relationships; gas laws and kinetic theory; solutions, acid-base properties and pH; thermodynamics and equilibrium; kinetics, catalysis, and mechanisms; molecular structure and bonding.
- General Physics I - Mechanics (or), PHYS 121 (4), sci
Particle dynamics, Newton's laws of motion, energy and momentum conservation, rotational motion, and angular momentum conservation. This course has a laboratory component. Recommended preparation: MATH 121 or MATH 123 or MATH 125 or one year of high school calculus. Students who do not have the appropriate background should not enroll in PHYS 121 without first consulting the instructor. Students may earn credit for only one of the following courses: PHYS 115, PHYS 121, PHYS 123.
Physics and Frontiers I - Mechanics, PHYS 123 (4), sci
The Newtonian dynamics of a particle and of rigid bodies. Energy, momentum, and angular momentum conservation with applications. A selection of special frontier topics as time permits, including fractals and chaos, special relativity, fluid mechanics, cosmology, quantum mechanics. This course has a laboratory component. Admission to this course is by invitation only. Students may earn credit for only one of the following courses: PHYS 115, PHYS 121, PHYS 123.
- General Physics II - Electricity and Magnetism (or), PHYS 122 (4), sci
Electricity and magnetism, emphasizing the basic electromagnetic laws of Gauss, Ampere, and Faraday. Maxwell's equations and electromagnetic waves, interference, and diffraction. This course has a laboratory component.
Physics and Frontiers II - Electricity and Magnetism, PHYS 124 (4), sci
Time-independent and time-dependent electric and magnetic fields. The laws of Coulomb, Gauss, Ampere, and Faraday. Microscopic approach to dielectric and magnetic materials. Introduction to the usage of vector calculus; Maxwell's equations in integral and differential form. The role of special relativity in electromagnetism. Electromagnetic radiation. This course has a laboratory component.
Other Courses
- 4 × Free Elective
Program Educational Objectives
Graduates from the Data Science and Analytics Bachelor of Science program will be prepared to:
- Analyze real-world problems and create data-driven solutions based on the fundamentals of data science and computing.
- Work effectively, professionally, and ethically.
- Assume positions of leadership in industry, academia, public service, and entrepreneurship.
- Successfully progress in advanced degree programs in data science, computing, and related fields.
Learning Outcomes
- Students analyze a complex computing problem and apply principles of computing and other relevant disciplines to identify solutions.
- Students design, implement, and evaluate a computing-based solution to meet a given set of computing requirements in the context of the program’s discipline.
- Students communicate effectively in a variety of professional contexts.
- Students recognize professional responsibilities and make informed judgments in computing practice based on legal and ethical principles.
- Students function effectively as a member or leader of a team engaged in activities appropriate to the program's discipline.
- Students apply theory, techniques, and tools throughout the data analysis life cycle and employ the resulting knowledge to satisfy stakeholders' needs.