SynGen: Synthetic Data Generation
Akash Kothare, Shridhara Chaube, Yash Moharir, Gaurav Bajodia, S. Dongre
2021 International Conference on Computational Intelligence and Computing Applications (ICCICA), published 2021-11-26
DOI: 10.1109/iccica52458.2021.9697232 (https://doi.org/10.1109/iccica52458.2021.9697232)
Synthetic data is artificial data generated using various machine learning techniques. It can be used to preserve privacy, test systems, or create training data for machine learning algorithms. Synthetic data generation is critical because demand for task-specific data is high; for example, synthetic data can be used to practice data science tasks and techniques while keeping the generated samples anonymous. We used the open-source engine Faker (v5.6.1) and a Gaussian copula to build a platform that generates datasets based on user requirements and available resources. The user can also run a variety of machine learning algorithms and compare their performance on either the generated dataset or a predefined dataset.
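
The abstract names a Gaussian copula as one of the platform's building blocks but does not describe how it is applied. The following is a minimal sketch of the general technique for two numeric columns, not the authors' implementation: it estimates the correlation of rank-based normal scores, draws correlated standard normals, and maps them back through the empirical quantiles of each column. It uses only the Python standard library; Faker is not used here, and all function names (pearson, normal_scores, empirical_quantile, sample_gaussian_copula) are hypothetical and chosen for illustration.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, mean 0 and standard deviation 1


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def normal_scores(values):
    """Map each value to a standard-normal score based on its rank."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    scores = [0.0] * n
    for rank, idx in enumerate(order):
        # (rank + 0.5) / n keeps the probability strictly inside (0, 1).
        scores[idx] = STD_NORMAL.inv_cdf((rank + 0.5) / n)
    return scores


def empirical_quantile(sorted_values, u):
    """Inverse empirical CDF: the observed value at probability u."""
    idx = min(int(u * len(sorted_values)), len(sorted_values) - 1)
    return sorted_values[idx]


def sample_gaussian_copula(xs, ys, n_samples, seed=0):
    """Draw synthetic (x, y) pairs that keep the marginals and dependence of xs, ys."""
    rng = random.Random(seed)
    # Dependence is captured by one correlation in normal-score space.
    rho = pearson(normal_scores(xs), normal_scores(ys))
    residual = math.sqrt(max(0.0, 1.0 - rho * rho))
    xs_sorted, ys_sorted = sorted(xs), sorted(ys)
    rows = []
    for _ in range(n_samples):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + residual * rng.gauss(0.0, 1.0)  # correlated with z1
        u1, u2 = STD_NORMAL.cdf(z1), STD_NORMAL.cdf(z2)  # uniforms in [0, 1]
        rows.append((empirical_quantile(xs_sorted, u1),
                     empirical_quantile(ys_sorted, u2)))
    return rows
```

Calling sample_gaussian_copula(ages, incomes, 500) on two correlated lists returns 500 new pairs whose values are drawn from the observed columns and whose correlation approximates the original. Because this sketch resamples observed values, it does not by itself anonymize data; a production generator would typically fit parametric marginals instead.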