Indoor air temperature is one of the key variables for indoor air quality, building energy consumption and moisture safety. Measurements are needed to determine how well the indoor air temperature during operation matches the target values set in the design phase. However, beyond the information acquired during the measurements, a more comprehensive understanding is also needed of how the temperature conditions behave outside the measurement campaign, in different years and in future climatic conditions. The purpose of this paper is to compare machine learning (ML) methods for long-term prediction of hourly indoor air temperature, where the predictions are based on outdoor climatic conditions only. According to the results, the prediction error (mean absolute error) was between 0.78 °C and 1.71 °C for the baseline method (arithmetic mean of the training data) and between 0.5 °C and 0.8 °C for the best methods. Prediction methods should be evaluated using multiple datasets and sufficiently long measurement periods. The most influential factor for prediction accuracy was the choice of prediction method, whereas the optimisation method, the number of cross-validation splits and the number of lagged values of the climatic variables were of secondary importance. The best combination of prediction accuracy, calculation time and robustness to variation in the measured data was found with decision-tree-based methods, such as RandomForest, XGBoost, LightGBM and ExtraTreesRegressor. In the future, common datasets and a benchmarking system should be defined to allow a better comparison of different ML methods for indoor air temperature prediction.
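To make the evaluation terms above concrete, the following is a minimal Python sketch of the baseline method (arithmetic mean of the training data), the mean absolute error, and the construction of lagged values of an outdoor climatic variable. The function names (`baseline_predict`, `add_lags`) and all numbers are hypothetical illustrations, not the code or data used in this paper.

```python
from statistics import mean


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between measured and predicted values."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


def baseline_predict(y_train, n):
    """Baseline: predict the arithmetic mean of the training data for every hour."""
    return [mean(y_train)] * n


def add_lags(series, n_lags):
    """Build one row [x_t, x_(t-1), ..., x_(t-n_lags)] for each hour t >= n_lags."""
    return [
        [series[t - k] for k in range(n_lags + 1)]
        for t in range(n_lags, len(series))
    ]


# Synthetic hourly values in degrees Celsius (assumed, not measured data).
indoor_train = [21.0, 21.5, 22.0, 21.5]
indoor_test = [21.0, 22.5, 22.0]
outdoor = [5.0, 6.0, 8.0, 7.0, 4.0]

# Baseline error that any ML method would need to improve on.
baseline = baseline_predict(indoor_train, len(indoor_test))
baseline_mae = mean_absolute_error(indoor_test, baseline)

# Feature rows with the current outdoor value and two lagged values.
lagged = add_lags(outdoor, n_lags=2)

print(f"baseline MAE: {baseline_mae:.2f} °C")
print(lagged)
```

In a sketch like this, the lagged rows would serve as inputs to a regression model (for example a decision-tree ensemble), and its mean absolute error on held-out hours would be compared against the baseline value.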