OpenDataLab: Empowering General Artificial Intelligence with Open Datasets

Conghui He, Wei Li, Zhenjiang Jin, Chao Xu, Bin Wang, Dahua Lin
Source: arXiv - CS - Digital Libraries
Publication date: 2024-06-04
Identifier: arxiv-2407.13773 (https://doi.org/arxiv-2407.13773)
Citations: 0

Abstract

The advancement of artificial intelligence (AI) hinges on the quality and accessibility of data, yet the current fragmentation and variability of data sources hinder efficient data utilization. The dispersion of data sources and diversity of data formats often lead to inefficiencies in data retrieval and processing, significantly impeding the progress of AI research and applications. To address these challenges, this paper introduces OpenDataLab, a platform designed to bridge the gap between diverse data sources and the need for unified data processing. OpenDataLab integrates a wide range of open-source AI datasets and enhances data acquisition efficiency through intelligent querying and high-speed downloading services. The platform employs a next-generation AI Data Set Description Language (DSDL), which standardizes the representation of multimodal and multi-format data, improving interoperability and reusability. Additionally, OpenDataLab optimizes data processing through tools that complement DSDL. By integrating data with unified data descriptions and smart data toolchains, OpenDataLab can improve data preparation efficiency by 30%. We anticipate that OpenDataLab will significantly boost artificial general intelligence (AGI) research and facilitate advancements in related AI fields. For more detailed information, please visit the platform's official website: https://opendatalab.com.
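The abstract's central idea, a single description shared by datasets that arrive in different formats, can be illustrated with a minimal Python sketch. This is not DSDL syntax and not OpenDataLab's API: the `Sample` record, the two adapter functions, and both source layouts are hypothetical, chosen only to show why downstream tools get simpler once every source maps to one schema.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Sample:
    """Hypothetical unified record: one media reference plus one label."""
    media_path: str
    modality: str
    label: str


def from_csv_row(row: dict[str, str]) -> Sample:
    # Assumed source A: a CSV row with "file" and "class" columns.
    return Sample(media_path=row["file"], modality="image", label=row["class"])


def from_json_record(record: dict[str, Any]) -> Sample:
    # Assumed source B: nested JSON with the label under "annotation".
    return Sample(
        media_path=record["audio"]["uri"],
        modality="audio",
        label=record["annotation"]["name"],
    )


def label_counts(samples: list[Sample]) -> dict[str, int]:
    # Downstream tooling only needs to understand Sample,
    # not the layout of each original source.
    counts: dict[str, int] = {}
    for sample in samples:
        counts[sample.label] = counts.get(sample.label, 0) + 1
    return counts


samples = [
    from_csv_row({"file": "img/001.jpg", "class": "cat"}),
    from_json_record(
        {"audio": {"uri": "clips/7.wav"}, "annotation": {"name": "cat"}}
    ),
]
```

In this sketch, adding a new source format means writing one more adapter; `label_counts` and any other consumer of `Sample` stay unchanged.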