Statistics for linguistic research
Spring 2010
4th year, elective course for students from all departments

Instructor: doc. dr. Damir Cavar
Lectures: 2 hours/week
Seminar: 1 hour/week
ECTS points: 8
Possible languages of instruction: Croatian, English, German, Polish

Time and location: Office, Relja building, WED 11-13, THU 10-11
Office hours: see online calendar

This course introduces statistical analysis for text and language processing, covering

  • univariate analysis (descriptive statistics)
  • bivariate analysis (e.g. correlations)
  • multivariate analysis (e.g. regression, cluster analysis, principal component analysis)

All analytical methods will be applied to the processing of spoken and textual language data, using programming environments and applications such as R and Microsoft Excel or OpenOffice Spreadsheet. Students will acquire an understanding of the underlying statistical analysis procedures, the ability to choose the appropriate analysis methods for specific questions and applications, and the ability to use common software tools to perform their own analyses of language data.

Text and materials (selection)

  • reading assignments
  • 2 quizzes of 5-10 minutes each
  • participation in class
  • project
  • presentation of the project
  • term paper on the basis of the project

  • GET AN EMAIL ACCOUNT. Send me an email with “Statistics for Linguists” in the subject field and your name in the body of the text, so that I can add you to the course mailing list. I will send you announcements, information, links, and the readings in electronic form.
  • The best way to contact me is via email.
  • Seminar papers handed in late will not receive credit, except in documented, justified cases.
  • You have the option to turn in the seminar paper electronically if you can create a readable pdf file.
  • Make-up quizzes will not be given (a quiz is not an official exam), except in documented, justified cases. In such cases, you have to contact me beforehand.
  • Materials will be sent via email. On demand, they can also be made available for copying in the copy shop ‘kod referade’, and for reading in the reading room, first floor.
  • Please check your email regularly.

The following factors will be taken into account in the calculation of the final grade:
  • Term paper 35%
  • Exams 20%
  • Participation 25%
  • Presentation of the project 20%

Grading scale
5: 90-100%
4: 80-89%
3: 70-79%
2: 65-69%
1: 0-64%
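
As an illustration of how the weights and the grading scale combine, here is a minimal sketch in Python. It assumes that each component is scored on a 0-100 scale and that the grade bands apply to the unrounded weighted total (so 64.9% still counts as a 1); neither detail is specified above, and the example scores are hypothetical.

```python
# Weights from the syllabus, in percent of the final grade.
WEIGHTS = {
    "term_paper": 35,
    "exams": 20,
    "participation": 25,
    "presentation": 20,
}

# Lower bound of each grade band, highest first.
# Assumption: bands are applied to the unrounded weighted total.
GRADE_BANDS = [(90, 5), (80, 4), (70, 3), (65, 2), (0, 1)]


def final_percentage(scores):
    """Weighted average of the component scores (each on a 0-100 scale)."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


def grade(percentage):
    """Map a percentage to a grade from 1 to 5 using the bands above."""
    for lower_bound, mark in GRADE_BANDS:
        if percentage >= lower_bound:
            return mark
    raise ValueError("percentage must be non-negative")


# Hypothetical component scores, for illustration only.
example_scores = {
    "term_paper": 80,
    "exams": 70,
    "participation": 100,
    "presentation": 90,
}
total = final_percentage(example_scores)
print(total, grade(total))
```

With these hypothetical scores the weighted total is 85%, which falls in the 80-89% band.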

Academic Misconduct
Academic integrity means honesty and responsibility in scholarship. Academic misconduct is any activity that undermines the academic integrity of the institution. Examples include:
  • Using unauthorized materials or help during exams or for home assignments.
  • Having another person take an exam in the student's place.
  • Submitting substantial portions of the same academic work for credit more than once without permission of the instructor or program to whom the work is being submitted.
  • Falsifying or inventing information or data in an academic exercise including, but not limited to, records or reports, laboratory results, and citation.
  • Plagiarism, i.e., presenting someone else’s work, including the work of other students, as one’s own. Any ideas or materials taken from another source for either written or oral use must be fully acknowledged, unless the information is common knowledge.

Students and professors are expected to adhere to the broadly recognized standards of academic conduct. Academic misconduct will be penalized in an appropriate manner. The penalty can range from a lower grade on the affected work to failure of the entire course. In some cases, the professor may require extra work before the course can be completed.

Disclaimer: All information in this syllabus, including course requirements and daily lesson plans, is subject to change and should not be considered a substitute for attending class or for any information that is provided to you by your instructor.

TENTATIVE SCHEDULE (subject to change)
  1. Intro
  2. -
  3. -
  4. -
  5. -
  6. -
  7. -
  8. -
  9. -
  10. -
  11. -
  12. -
  13. -
  14. -