Datasets
- Mandarin Chinese Cued Speech Dataset (MCCSD)
The MCCS dataset is the first large-scale Mandarin Chinese Cued Speech dataset. It covers 23 major categories of scenarios (e.g., communication, transportation, and shopping) and 72 subcategories (e.g., meeting, dating, and introduction). It was recorded by four skilled native Mandarin Chinese Cued Speech cuers using portable mobile phone cameras. The Cued Speech videos are recorded at 30 fps with a resolution of 1280x720. We provide the raw Cued Speech videos, a text file (with 1000 sentences), and the corresponding annotations, which contain two kinds of data annotation.
- Towards Automatic Evaluation for LLMs’ Clinical Capabilities: Metric, Data, and Algorithm
This benchmark provides an automatic evaluation paradigm tailored to assess LLMs’ capabilities in delivering clinical services, e.g., disease diagnosis and treatment. The evaluation paradigm contains three basic elements: metric, data, and algorithm. (1) The LLM-specific clinical pathway (LCP) defines the clinical capabilities that a doctor agent should possess. (2) The Standardized Patients (SPs) include 44 medical cases with complete information. (3) Retrieval-Augmented Evaluation (RAE) simulates the interactive environment between SPs and a doctor agent.
- RJUA-QA: A Comprehensive QA Dataset for Urology
Originating from authentic clinical contexts, RJUA-QA empowers LLMs to generate precise diagnoses and recommendations. It encompasses 2,132 meticulously assembled Question-Context-Answer sets. These instances cover 67 prevalent urological conditions, addressing over 97.6% of the demands in urological healthcare services. Each record in RJUA-QA includes a patient-like inquiry about symptoms or health conditions, paired with a context rich in expert insights pertinent to medical evaluation and diagnosis.
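
As a rough illustration of the Question-Context-Answer structure described for RJUA-QA, a record could be represented and turned into a context-grounded prompt along the following lines. This is only a sketch: the class name `QCARecord`, its field names, the helper `build_prompt`, and the prompt layout are hypothetical and are not taken from the released dataset, whose actual schema may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QCARecord:
    """One Question-Context-Answer set (field names are hypothetical)."""

    question: str  # patient-like inquiry about symptoms or health conditions
    context: str   # expert insights pertinent to evaluation and diagnosis
    answer: str    # reference diagnosis and recommendation


def build_prompt(record: QCARecord) -> str:
    """Format a record as a prompt; the reference answer is left out."""
    return (
        f"Context:\n{record.context}\n\n"
        f"Question:\n{record.question}\n\n"
        "Answer:"
    )


# Placeholder content only, not drawn from the dataset.
example = QCARecord(
    question="<patient inquiry>",
    context="<expert context>",
    answer="<reference answer>",
)
print(build_prompt(example))
```

Keeping the reference answer out of the prompt means it can be compared against the model's output afterwards, which is one plausible way to use such a record for evaluation.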