Starrydata Datasets

Per-project CSV splits of the Starrydata2 dataset, refreshed daily.

Dataset on GitHub: starrydata/starrydata_datasets

Dataset Repositories

Repository	Description	Update Schedule	Period
Google Drive	Latest dataset only	Twice daily at 00:00 and 12:00	from 2024/06/13
Figshare	Past datasets	Daily until 2024/06/06, then monthly	from 2022/12/22
GitHub	Past datasets	As needed	from 2019/07/11 until 2022/12/22

Changelog

Added per-project dataset downloads at starrydata.github.io/starrydata_datasets. Each project (ThermoelectricMaterials, BatteryMaterials, MagneticMaterials, etc.) can now be downloaded separately as papers / samples / curves files, alongside the full unsplit dataset.
Compressed all downloads as gzipped CSV (.csv.gz) to reduce file size. Load directly with pandas.read_csv(url, compression="gzip") or decompress before opening in Excel.
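Where pandas is not available, the gzipped files can also be read with the Python standard library alone. The following is a minimal sketch, not the project's own tooling: `read_gzipped_csv` is a hypothetical helper, and the column names in the stand-in data are invented for illustration rather than taken from the real files.

```python
import csv
import gzip
import io


def read_gzipped_csv(source):
    """Return the rows of a gzipped CSV as a list of dicts.

    `source` may be a file path or a binary file object.
    """
    # "utf-8-sig" drops a leading BOM if present and is harmless otherwise.
    with gzip.open(source, mode="rt", encoding="utf-8-sig", newline="") as handle:
        return list(csv.DictReader(handle))


# Build a tiny stand-in .csv.gz in memory (hypothetical columns).
buffer = io.BytesIO()
with gzip.open(buffer, mode="wt", encoding="utf-8", newline="") as handle:
    handle.write("sample_id,value\n1,299.8597\n2,324.8683\n")
buffer.seek(0)

rows = read_gzipped_csv(buffer)
print(rows[0]["value"])
```

Passing the path of a downloaded `.csv.gz` file instead of the in-memory buffer works the same way. Note that `csv.DictReader` returns every field as a string, so numeric columns still need converting.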

Excluded entries whose descriptor has the data type "calculation" from the sample dataset and the curve dataset. As of 2024/07/01 12:00:01 UTC+0900 (JST), there were 346 such samples.

Changed the dataset file name prefix from "all" to "starrydata". For example, all_curves.csv is now starrydata_curves.csv.
Changed the file format of the paper dataset from JSON to CSV to make it easier to open.
Reduced the columns in the paper dataset to only those necessary for citation, cutting the file size from 400 MB to about 50 MB.
Added project_names and created_at to the paper dataset.

Fixed character corruption that occurred when opening all_samples.csv in certain applications, such as Excel, by adding a byte order mark (BOM).
Changed the Figshare upload schedule from daily to monthly.
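For anyone reading the files programmatically, the BOM mentioned above is three extra bytes at the start of the file. The sketch below uses a made-up one-column CSV, not real Starrydata content. It shows how Python's `utf-8-sig` codec writes and strips the BOM, whereas plain `utf-8` leaves it attached to the first field.

```python
import codecs

# Hypothetical CSV content with a non-ASCII character.
text = "composition\nBi₂Te₃\n"

# Encoding with "utf-8-sig" prepends the three-byte UTF-8 BOM.
encoded = text.encode("utf-8-sig")
has_bom = encoded.startswith(codecs.BOM_UTF8)

# Decoding as plain "utf-8" keeps the BOM as an invisible first character,
# which would end up inside the first column name.
decoded_plain = encoded.decode("utf-8")

# Decoding as "utf-8-sig" removes it.
decoded_sig = encoded.decode("utf-8-sig")

print(has_bom, repr(decoded_plain[0]), decoded_sig == text)
```

If the first column name looks wrong after loading a file, opening it with `encoding="utf-8-sig"` is one thing to try.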

Fixed the incorrect timestamp format in the dataset. For example, corrected "2024-05-17 00:00:01 JST+0900" to "2024-05-17 00:00:01 GMT+0900 (JST)".
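The corrected format can be parsed with Python's `datetime.strptime`. This sketch assumes that every timestamp follows the example exactly, including the literal "GMT" prefix and the "(JST)" suffix. `TIMESTAMP_FORMAT` and `parse_timestamp` are illustrative names, not part of the dataset tooling.

```python
from datetime import datetime, timedelta

# Assumed layout, copied from the changelog example; "%z" reads the "+0900" offset.
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S GMT%z (JST)"


def parse_timestamp(text):
    """Parse a timestamp such as '2024-05-17 00:00:01 GMT+0900 (JST)'."""
    return datetime.strptime(text, TIMESTAMP_FORMAT)


parsed = parse_timestamp("2024-05-17 00:00:01 GMT+0900 (JST)")
print(parsed.isoformat())  # 2024-05-17T00:00:01+09:00
```

The result is timezone-aware, so it can be compared safely with timestamps in other zones.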

The values in the XY value list were originally strings enclosed in double quotation marks. The quotation marks were removed so the values can be parsed as numbers directly.
e.g. ["299.8597", "324.8683"] → [299.8597, 324.8683]
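To illustrate the practical difference, the sketch below parses both forms of the list from the example above. It assumes the cell content is valid JSON, which holds for the two strings shown but is not confirmed here for every row of the dataset.

```python
import json

# Cell contents copied from the changelog example.
old_cell = '["299.8597", "324.8683"]'
new_cell = "[299.8597, 324.8683]"

# New format: the values come back as floats straight away.
new_values = json.loads(new_cell)

# Old format: each value was a string and needed an explicit conversion.
old_values = [float(value) for value in json.loads(old_cell)]

print(new_values)  # [299.8597, 324.8683]
```

Files generated before this change need the extra `float` conversion step.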

Added updated_at, created_at, and composition_details to all_samples.csv.