Data Science Academy

The 3-week Wharton faculty-led program will bring state-of-the-art machine learning and data science tools to high school students. We aim to stimulate curiosity in the fast-moving field of machine learning through this rigorous yet approachable program. Building up statistical foundations together with empirical and critical thinking skills will be the main theme throughout. 

Overview

Wharton Data Science Academy (WDS), launched in 2020, is an intensive pre-college program designed for curious, high-achieving students from around the world. The curriculum mirrors the rigor of upper-level Wharton undergraduate courses, while being taught in a way that is accessible to motivated high school learners. We start from the foundations, including data wrangling and visualization, fundamental probability and statistics (distributions, confidence intervals, hypothesis testing), and core modeling (simple/multiple regression, classification, model assessment).

From there, we build steadily toward modern machine learning and artificial intelligence (AI). The curriculum evolves continually; we update modules with state-of-the-art topics such as neural networks and large language models (LLMs). Throughout, we emphasize not only technical skills but also the responsible use of data, attention to bias and fairness, and the ability to communicate results clearly. Coursework is hands-on, using R as the primary language, with selective use of Python (e.g., Colab) for deep learning and LLMs.

All participants who complete the program will earn a Wharton Global Youth Certificate of Completion.

What students can expect:

  • Wharton Professors who are data science experts will lead the lectures and will also be available to students outside of class. 
  • Wharton upper-level undergraduate rigor, delivered accessibly. High expectations paired with scaffolding: short lectures, case studies, and tailored support, so prepared high school students can thrive. 
  • Step-by-step hands-on learning: pre-work, live demos, guided labs, and daily hands-on projects with real-world datasets. 
  • Capstone project + Data Science Live (DSL) showcase: identify a high-impact real-world challenge, build a data-driven solution, and present results to peers/mentors at the end of the program. 
  • Close mentorship from Penn undergraduate and graduate TAs through recitations, office hours, and project coaching. 
  • Enrichment sessions (guest speakers, college/admissions insights, student panels) connecting data skills to academic and career paths. 
  • An evolving curriculum: new cases, datasets, and techniques are added each year, keeping the content up to date while preserving a strong foundation in the fundamentals of data science. 

Details

Academic classes are held Monday-Friday, with extracurricular activities available in the evenings and on weekends. Students move in on the Sunday before the program begins and move out on the final Saturday of the program. For more information on campus life, visit our residential experience page.

While each day varies by module, a typical rhythm includes: 

  • Short, focused lectures on core methods or applications 
  • Guided labs and case discussions using live notebooks 
  • Team project time with structured checkpoints 
  • TA-led recitations/office hours for deeper practice 
  • Guest talks or special workshops (e.g., responsible AI, careers in data) 

Session topics may include: 

  • Acquiring, preparing, exploring, understanding, and visualizing data (R/RStudio/RMarkdown; dplyr/ggplot/plotly) 
  • Foundations of probability and statistics (LLN/CLT, intervals, tests) 
  • Model-based learning (regression/classification, cross-validation, LASSO), trees & ensembles 
  • Text analytics (vectorization/embeddings, n-grams) and model evaluation/communication 
  • Modern AI: introductions to neural networks and LLMs (transformers, attention, embeddings), practical prompting/fine-tuning concepts, and responsible AI
    Depth and emphasis evolve over time as new methods mature and are gradually incorporated. 

In the evenings, students can join organized activities, meet with TAs, push their capstone forward, or relax with their cohort. Some days may deviate from the usual rhythm for site visits or special simulations. 

Eligibility

The program is open to high school students currently enrolled in grades 10-11 who have a strong background in math and coding and an interest in data analytics. A prior understanding of statistics is preferred. Students must be open to the challenge of a rigorous curriculum similar to that of an intermediate Wharton undergraduate course. International applicants are welcome.

Admission

Admission to the Data Science Academy is selective. Wharton will select approximately 75 students to participate in the Academy. Selections are based on a record of academic excellence and a demonstrated background in mathematics and/or statistics. Interested students are strongly encouraged to submit an application by the priority deadline.

Please note that participation in the Data Science Academy does not guarantee admission into Penn.

Instructional Team

Program Leader: Linda Zhao

Linda Zhao is a full professor of statistics at the Wharton School. An expert in machine learning, she has been teaching a modern data mining course to undergraduate, MBA, master's, and Ph.D. students from across the Penn campus. Students comment that her data mining course is one of the most fun and useful courses offered at Penn. In addition to teaching regular Wharton students, Linda served as a co-director of the Wharton–SAC (Securities Association of China) executive program, which she ran and taught. Having taught students at many levels, Linda is able to design and deliver state-of-the-art machine learning instruction to students from all backgrounds. A fellow of the IMS, a leading international statistics organization, Linda has been active throughout her academic career. Her specialties include modern machine learning methods, the replicability crisis in science, high-dimensional data, housing price prediction, and Bayesian methods. Her work has been supported by the NSF for over 20 years. She is also an avid ballroom dancer and loves to travel around the world.

Teaching Assistants

The teaching assistants are undergraduate and graduate students from the University of Pennsylvania. TAs facilitate small-group discussions, lead small-group lab work, check student understanding, assist with final project development, and hold office hours to answer student questions.

“My favorite part of the Data Science Academy was the final group project where my team and I were able to put our statistical learning skills to the test with a completely new set of data!” - Ramya S., California, USA